Biography

Short Bio: My name is Yue ZHAO (赵越 in Chinese). I am a third-year Ph.D. student at Heinz College, Carnegie Mellon University (CMU). Before joining CMU, I earned my master's degree from the University of Toronto and worked as a senior consultant at PwC Canada. I have coauthored more than 20 papers (in JMLR, TKDE, NeurIPS, etc.) on anomaly detection and its applications in security and healthcare. In terms of service, I serve on the program committees of KDD, AAAI, and IJCAI, and review for JMLR, TPAMI, and TKDE. I am one of the two recipients of the 2022 Norton Labs Graduate Fellowship.

Outlier detection systems and applications: I build automated, scalable, and accelerated machine learning systems (MLSys) to support large-scale, real-world outlier detection applications in security, finance, and healthcare; these systems have millions of downloads. I designed CPU-based (PyOD), GPU-based (TOD), and distributed (SUOD) detection systems for tabular (PyOD), time-series (TODS), and graph data (PyGOD).

Research outcomes (related to outlier detection if not specified):

| Primary field | Secondary | Method | Year | Venue | Lead author |
|---|---|---|---|---|---|
| machine learning systems | | PyOD | 2019 | JMLR | Y |
| machine learning systems | time series | TODS | 2020 | AAAI | |
| machine learning systems | benchmark | TODS | 2021 | NeurIPS | |
| machine learning systems | | SUOD | 2021 | MLSys | Y |
| machine learning systems | distributed systems | TOD | 2022 | Preprint | Y |
| machine learning systems | graph neural networks | PyGOD | 2022 | Preprint | Y |
| ensemble learning | semi-supervised | XGBOD | 2018 | IJCNN | Y |
| ensemble learning | | LSCP | 2019 | SDM | Y |
| ensemble learning | machine learning systems | combo | 2020 | AAAI | Y |
| ensemble learning | interpretable ML | COPOD | 2020 | ICDM | Y |
| ensemble learning | interpretable ML | ECOD | 2022 | TKDE | Y |
| automated machine learning | graph mining | AutoAudit | 2022 | BigData | |
| automated machine learning | | MetaOD | 2021 | NeurIPS | Y |
| graph neural networks | contrastive learning | CONAD | 2022 | PAKDD | |
| AI x Science | benchmark | HR manage. | 2018 | Intellisys | Y |
| AI x Science | | CIBS | 2020 | BIBM | |
| AI x Science | | PyHealth | 2020 | Preprint | Y |
| AI x Science | benchmark | TDC | 2021 | NeurIPS | |

At CMU, I work with Prof. Leman Akoglu (DATA Lab), Prof. Zhihao Jia (Catalyst), and Prof. George H. Chen. Externally, I collaborate with Prof. Jure Leskovec at Stanford University and Prof. Xia “Ben” Hu at Rice University.


Open-source Contribution: I have led or contributed as a core member to more than 10 ML open-source initiatives, receiving 13,000 GitHub stars (top 0.002%: ranked 800 out of 40M GitHub users) and >8,000,000 total downloads. Popular ones:

  • PyOD: A Python Toolbox for Scalable Outlier Detection (Anomaly Detection).
  • TOD: Tensor-based Outlier Detection. The first large-scale GPU-based system for accelerating outlier detection!
  • SUOD: An Acceleration System for Large-scale Heterogeneous Outlier Detection.
  • anomaly-detection-resources: The most starred resources (books, courses, etc.)!
  • PyTorch Geometric (PyG): Graph Neural Network Library for PyTorch. Contributed to profiler & benchmarking, and heterogeneous data transformation, as a member of the PyG team.
  • Python Graph Outlier Detection (PyGOD): A Python Library for Graph Outlier Detection.
  • Therapeutics Data Commons (TDC): Machine learning for drug discovery.
  • combo: A Python Toolbox for ML Model Combination (Ensemble Learning).
  • TODS: Time-series Outlier Detection. Contributed to core detection models.
  • MetaOD: Automatic Unsupervised Outlier Model Selection (AutoML).

[#1] I host WeChat groups on anomaly detection (异常检测微信讨论组) and machine learning systems (MLSys讨论组), with more than four hundred researchers (e.g., from Berkeley, MIT, and Tsinghua) and industry practitioners (e.g., from Alibaba, IBM, and Meta), for collaboration and intern/full-time opportunities. Join by scanning WeChat (微信) @ 加群小助手!

[#2] I host a WeChat group for ML Ph.D. students and graduates (北美ML博士求职分享群), where we share postdoc, intern, and full-time job openings. Join by scanning WeChat (微信) @ 加群小助手!

[#3] I am a dedicated writer with more than 300 articles (in Chinese) and 170,000 followers on Zhihu (知乎), the Chinese counterpart of Quora (200 million+ registered users). I have been officially recognized as a “Top Writer” (优秀回答者) in four fields (AI, ML, DM, and STAT). My articles have been read more than 20,000,000 times. See my Zhihu page (微调).

Contact me by Email (zhaoy [AT] cmu.edu) or WeChat (微信) @ yzhao062.

Interests

  • Outlier Detection Systems (ODSys)
  • Outlier & Anomaly Detection
  • AI + Security
  • Machine Learning Systems (MLSys)
  • Automated Machine Learning
  • Graph Neural Networks
  • Ensemble Learning
  • Scalable Machine Learning
  • Healthcare AI & ML for Therapeutics
  • Information Systems

Education

  • Ph.D. Student in Information Systems, 2019-2023

    Carnegie Mellon University

  • M.S. in Applied Computing, 2015-2017

    University of Toronto

  • B.S. in Computer Engineering (Minor in Computer Science and Math), 2015

    University of Cincinnati

  • High School Diploma, 2010

    Shanxi Experimental Secondary School 山西省实验中学

Miscellaneous

News & Travel

May 2022: Invited to present on automated outlier detection at Morgan Stanley!

Apr 2022: 🌟 Reached 800 citations on Google Scholar!

Apr 2022: We released PyGOD (Python Graph Outlier Detection), and it received 400+ stars in a week! With PyGOD, you can do anomaly detection with the latest graph neural networks in 5 lines! See paper here!

Mar 2022: Invited to present on large-scale anomaly detection systems at Morgan Stanley!

Mar 2022: 🎉 I received the prestigious 2022 Norton Labs Graduate Fellowship (one of only two graduate student recipients worldwide). Thanks to the selection committee and my advisors!

Mar 2022: ECOD: Unsupervised Outlier Detection Using Empirical Cumulative Distribution Functions is accepted to IEEE Transactions on Knowledge and Data Engineering (TKDE)! ECOD is a simple yet effective detection algorithm with extremely fast O(nd) runtime.
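To give a feel for the idea behind ECOD, here is a minimal pure-Python sketch, assuming a simplified variant: for each dimension it estimates left-tail and right-tail probabilities from the empirical CDF, sums their negative logs across dimensions, and takes the larger of the two totals. This is not the PyOD implementation; it omits the skewness-corrected variant described in the paper, and because it sorts each column it is not tuned for the runtime quoted above. The function name `ecdf_tail_scores` is hypothetical.

```python
import math
from bisect import bisect_left, bisect_right


def ecdf_tail_scores(X):
    """Simplified ECDF-based outlier scores (illustrative sketch only).

    X is a list of equal-length numeric rows. Higher score = more outlying.
    """
    n, d = len(X), len(X[0])
    left = [0.0] * n   # accumulated -log of left-tail probabilities
    right = [0.0] * n  # accumulated -log of right-tail probabilities
    for j in range(d):
        col = sorted(row[j] for row in X)
        for i, row in enumerate(X):
            v = row[j]
            p_left = bisect_right(col, v) / n        # fraction of values <= v
            p_right = (n - bisect_left(col, v)) / n  # fraction of values >= v
            left[i] += -math.log(p_left)
            right[i] += -math.log(p_right)
    # A point is suspicious if it sits far in either tail overall.
    return [max(l, r) for l, r in zip(left, right)]


# Toy data (made up for illustration): the last row is far from the rest.
X = [[1.0, 2.1], [1.1, 1.9], [0.9, 2.05], [1.05, 2.0], [10.0, 20.0]]
scores = ecdf_tail_scores(X)
print(scores.index(max(scores)))
```

Because the scores depend only on ranks within each dimension, the sketch needs no distance computations or hyperparameters.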

Feb 2022: Proposed a new initiative called Detected AI (detected.ai) for large-scale anomaly detection applications. It is still too early to tell, but it will be exciting!

Feb 2022: Released a new system, TOD: GPU-accelerated Outlier Detection via Tensor Operations, with George H. Chen and Zhihao Jia. A preprint is available, and the code is being released.

  • TOD is the first fast, comprehensive, GPU-based outlier detection system.
  • 🌟 on average it is 11 times faster than PyOD!
  • 🌟 it supports various OD algorithms, e.g., kNN, LOF, ABOD, HBOS, etc.
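
For readers unfamiliar with the detectors listed above, here is a minimal CPU-only sketch of the kNN detector, assuming the common formulation in which a point's outlier score is its distance to its k-th nearest neighbor. It is plain Python for illustration and does not reflect how TOD decomposes the computation into tensor operations on GPUs; the function name `knn_outlier_scores` is hypothetical.

```python
import math


def knn_outlier_scores(points, k=2):
    """Score each point by the distance to its k-th nearest neighbor.

    Brute force, O(n^2) distance computations; higher score = more outlying.
    """
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, sorted ascending.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


# Toy data (made up for illustration): four clustered points and one far away.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (8, 8)]
scores = knn_outlier_scores(points, k=2)
print(scores.index(max(scores)))
```

The pairwise distance computation dominates the cost, which is the kind of workload that benefits from GPU acceleration.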

Publications

See my Google Scholar, DBLP, ORCID, and ResearchGate.

Preprints & Working Papers

[w22a] TOD: GPU-accelerated Outlier Detection via Tensor Operations, with George H. Chen and Zhihao Jia. Under submission at a key ML conference. Preprint.

[w21c] A Large-scale Study on Unsupervised Outlier Model Selection: Do Internal Strategies Suffice? with Martin Q. Ma (equal contribution), Xiaorong Zhang, and Leman Akoglu. Preprint.


Peer-reviewed Papers

(2022). ECOD: Unsupervised Outlier Detection Using Empirical Cumulative Distribution Functions. IEEE Transactions on Knowledge and Data Engineering (TKDE) (Co-first author; equal contribution).

(2021). Automatic Unsupervised Outlier Model Selection. Advances in Neural Information Processing Systems (NeurIPS).

(2021). Revisiting Time Series Outlier Detection: Definitions and Benchmarks. Advances in Neural Information Processing Systems (NeurIPS), Datasets and Benchmarks Track.

(2020). Combining Machine Learning Models Using combo Library. Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI), demo track.

(2020). SynC: A Unified Framework for Generating Synthetic Population with Gaussian Copula. Workshops at the Thirty-Fourth AAAI Conference on Artificial Intelligence.

(2018). DCSO: Dynamic Combination of Detector Scores for Outlier Ensembles. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), Workshop on Outlier Detection De-constructed (ODD).

(2017). An empirical study of touch-based authentication methods on smartwatches. Proceedings of the 2017 ACM International Symposium on Wearable Computers (ISWC) (Co-first author; equal contribution).

Services

I am open to peer-review and organizing opportunities in the fields of outlier & anomaly detection, ensemble learning, clustering, ML libraries & systems, and information systems.

Journal Reviewer

Program Committee and/or Reviewer for Conferences and Workshops

Awards and Funds

The Norton Labs Graduate Fellowship

The Norton Labs Graduate Fellowship provides up to $20,000 USD that may be used to cover one year of the student's tuition fees and/or reimburse expenses incurred by the student during collaboration with Norton Labs. I was one of only two graduate students selected to receive the award.

CMU GSA/Provost Conference Funding

Part of the travel grant for attending ICDM 2020.

AAAI Student Travel Grant & CMU GSA/Provost Conference Funding

Part of the travel grant for attending AAAI 2020.

Mitacs-Accelerate Research and Development Funding

Project IT07884 ($30,000): machine learning in HR analytics.

Mantei/Mae Award & Scholar

Awarded to the highest-performing students in Electrical Engineering, Computer Engineering, and Computer Science ($40,000 over four years).

University Global Award and Scholarship

Awarded to top-performing international students ($32,000 over four years).

Experience

Professional Positions

Visiting Student Researcher

Stanford University, Computer Science Department,

May 2021 – Aug 2021 Stanford, CA, USA

Designed new GNN systems and models.

Supervised by Prof. Jure Leskovec.

Machine Learning Research Intern

IQVIA, Analytics Center of Excellence

May 2020 – Aug 2020 Boston, MA, USA

Designed new machine learning systems and models in healthcare.

Supervised by Dr. Cao (Danica) Xiao (IQVIA) and Prof. Jimeng Sun (UIUC).

Senior Consultant

PwC Canada, Consulting & Deals

Feb 2017 – Jun 2019 Toronto, ON, Canada
I was a senior consultant with the following duties:

  • Designed fraud analytic solutions for major Canadian banks and insurance firms.
  • Led applied data analytics projects, e.g., client segmentation and churn analysis.
  • Developed multiple pricing optimization models with statistical methods.

Research Associate (Intern)

PwC Canada, Consulting & Deals

May 2016 – Dec 2016 Toronto, ON, Canada

Applied research in people analytics with machine learning.

Supervised by Prof. Anthony Bonner; the project was partly supported by Mitacs-Accelerate Research and Development Funding (IT07884).

Software Engineer (Contract & Intern)

Siemens PLM Software USA

Mar 2012 – Dec 2014 Cincinnati, Ohio, USA
As a co-op student and contractor, my work included the following:

  • Managed a Java project to transition the LabManager system to vCloud Director.
  • Refactored outdated automation code and added new modules and JUnit test cases.
  • Led a C++ Code Coverage project on Teamcenter platform to strengthen its stability.

Teaching Positions

Teaching Assistant

Carnegie Mellon University, Heinz College of Information Systems and Public Policy

Feb 2020 – Present Pittsburgh, PA, United States

I am a teaching assistant for the following courses:

  • Intro to Artificial Intelligence taught by Prof. David Steier (Fall 2020, Spring 2021, Fall 2021, Spring 2022).
  • Digital Transformation taught by Prof. James Riel (Spring 2022).
  • Statistics for IT Managers taught by Prof. Daniel Nagin (Fall 2021).

The main duties include grading assignments and giving lectures on selected topics.

Teaching Assistant

University of Toronto, Department of Computer Science

Sep 2015 – Dec 2015 Toronto, ON, Canada
I was a teaching assistant for Embedded Systems taught by Prof. Philip Anderson.

Teaching Assistant

University of Cincinnati, Department of Electrical Engineering & Computer Science

Sep 2014 – Dec 2014 Cincinnati, OH, USA
I was a teaching assistant for Introduction to Programming taught by Prof. George Purdy.

Open-source Initiatives

To find more of my open-source initiatives, see my GitHub.

PyGOD (Python Graph Outlier Detection)

A Python Library for Graph Outlier Detection (Anomaly Detection)

PyG (PyTorch Geometric)

Graph Neural Network Library for PyTorch

Therapeutics Data Commons (TDC)

Machine Learning Datasets and Tasks for Drug Discovery and Development

SUOD

SUOD: Accelerating Large-scale Unsupervised Heterogeneous Outlier Detection

combo

A Python Toolbox for Machine Learning Model Combination.

Python Outlier Detection Toolbox

PyOD: A Python Toolbox for Scalable Outlier Detection (Anomaly Detection).

Talks

Recent Talks

Anomaly Detection Algorithms, Applications, and Systems (in Chinese)

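This video introduces a range of anomaly detection algorithms, their applications, and practical usage tips, and looks ahead to future research directions.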

Contact

[WeChat (微信) @ yzhao062 | 微信 @ 加群小助手]