Yue Zhao

Ph.D. Candidate in Information Systems

Carnegie Mellon University

Machine Learning Systems, Anomaly Detection, Automated ML

Author of PyOD, PyGOD, ADBench

Biography

📰 I am on the job market, with expected graduation in Spring 2023. I am broadly interested in machine learning, data mining/science, and information systems positions. I can work in the U.S. and Canada without sponsorship; please reach out if you have an open opportunity! Contact me by Email (zhaoy [AT] cmu.edu) or WeChat (微信) @ yzhao062.

I will be at ICDM in Orlando from Nov 27 to 29 and at NeurIPS in New Orleans from Nov 29 to Dec 2. Let's chat!

Who am I (赵越)? I am a fourth-year (final-year) Ph.D. candidate at Carnegie Mellon University (CMU). Before joining CMU, I earned my MS degree from the University of Toronto (2016) and my BS degree from the University of Cincinnati (2015), and worked as a senior consultant at PwC Canada (2016-19). I am an expert in anomaly detection (a.k.a. outlier detection), machine learning systems (MLSys), and automated machine learning (AutoML), with more than seven years of professional experience and 30+ papers (in JMLR, NeurIPS, VLDB, MLSys, etc.). I appreciate the generous support from the CMU Presidential Fellowship and the Norton Labs Graduate Fellowship.

At CMU, I work with Prof. Leman Akoglu on automated data mining, Prof. Zhihao Jia on machine learning systems, and Prof. George H. Chen on general ML. I am a member of the CMU automated learning systems group (Catalyst) and the Data Analytics Techniques Algorithms (DATA) Lab. Externally, I collaborate with Prof. Jure Leskovec at Stanford and Prof. Philip S. Yu at UIC.

Contributions

Machine Learning Systems for Anomaly Detection: I use machine learning systems (MLSys) techniques to support large-scale, real-world outlier detection applications in security, finance, and healthcare. I designed CPU-based (PyOD), GPU-based (TOD), and distributed (SUOD) detection systems, covering tabular (PyOD), time-series (TODS), and graph data (PyGOD); together these tools have millions of downloads. To understand the characteristics of OD algorithms, I co-authored large-scale benchmarks for tabular data (ADBench), time-series data (paper), and graph data (BOND). My work has been used by thousands of projects and applications, including at Amazon, IBM, Morgan Stanley, and Tesla. See more applications.

Research outcomes (primarily for outlier detection if not specified):

Primary field | Secondary | Method | Year | Venue | Lead author
large-scale Benchmark | tabular anomaly detection | ADBench | 2022 | NeurIPS | Y
large-scale Benchmark | graph anomaly detection | BOND | 2022 | NeurIPS | Y
large-scale Benchmark | sequence anomaly detection | TODS | 2021 | NeurIPS |
automated machine learning | outlier model selection | MetaOD | 2021 | NeurIPS | Y
automated machine learning | outlier model selection | ELECT | 2022 | ICDM | Y
automated machine learning | outlier HP optimization | HPOD | 2022 | Preprint | Y
automated machine learning | outlier evaluation | IPM | 2021 | Preprint | Y
machine learning systems | | PyOD | 2019 | JMLR | Y
machine learning systems | time series | TODS | 2020 | AAAI |
machine learning systems | | SUOD | 2021 | MLSys | Y
machine learning systems | distributed systems | TOD | 2022 | VLDB | Y
machine learning systems | graph neural networks | PyGOD | 2022 | Preprint | Y
robust ML | semi-supervised | XGBOD | 2018 | IJCNN | Y
robust ML | ensemble learning | LSCP | 2019 | SDM | Y
robust ML | ensemble learning | combo | 2020 | AAAI | Y
robust ML | ensemble learning | COPOD | 2020 | ICDM | Y
robust ML | ensemble learning | ECOD | 2022 | TKDE | Y
robust ML | noisy label learning | ADMoE | 2023 | AAAI | Y
graph mining | finance | AutoAudit | 2020 | BigData |
graph neural networks | contrastive learning | CONAD | 2022 | PAKDD |
Diffusion Models | survey | | 2022 | Preprint |
AI x Science | synthetic data | SynC | 2020 | ICDMW |
AI x Science | healthcare AI | PyHealth | 2020 | Preprint | Y
AI x Science | Datasets & Benchmark | TDC | 2021 | NeurIPS |
AI x Science | Datasets & Benchmark | TDC V2 | 2022 | NCHEMB |

Open-source Contribution: I have led or contributed as a core member to more than 10 ML open-source initiatives, receiving 15,000 GitHub stars (top 0.002%: ranked 800 out of 40M GitHub users) and >10,000,000 total downloads. Popular ones:

  • PyOD: A Python Toolbox for Scalable Outlier Detection (Anomaly Detection).
  • ADBench: The most comprehensive tabular anomaly detection benchmark (30 anomaly detection algorithms on 57 benchmark datasets).
  • TOD: Tensor-based Outlier Detection. The first large-scale GPU-based system for acceleration!
  • SUOD: An Acceleration System for Large-scale Heterogeneous Outlier Detection.
  • anomaly-detection-resources: The most starred resources (books, courses, etc.)!
  • Python Graph Outlier Detection (PyGOD): A Python Library for Graph Outlier Detection.
  • Therapeutics Data Commons (TDC): Machine learning for drug discovery.
  • PyTorch Geometric (PyG): Graph Neural Network Library for PyTorch. Contributed to profiler & benchmarking, and heterogeneous data transformation.
  • combo: A Python Toolbox for ML Model Combination (Ensemble Learning).
  • TODS: Time-series Outlier Detection. Contributed to core detection models.
  • MetaOD: Automatic Unsupervised Outlier Model Selection (AutoML).

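[#1] I organize and maintain several WeChat groups for the machine learning research community, including:

  • anomaly detection (WeChat discussion group), machine learning systems (WeChat discussion group), and groups for other machine learning research areas
  • ML Ph.D. (a job-sharing group for ML Ph.D.s in North America), where we share postdoc, intern, and full-time jobs for ML Ph.D. students and graduates. Join them by scanning the QR code of the WeChat account @ 加群小助手 (group-joining assistant)!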

[#2] I am a dedicated writer with more than 300 articles (in Chinese) and 200,000 followers on Zhihu (知乎), the Chinese counterpart of Quora (200 million+ registered users). I have been officially recognized as a “Top Writer” (优秀回答者) in four fields (AI, ML, DM, and STAT). My articles have been read more than 20,000,000 times. See my Zhihu page (微调).

Interests

  • Outlier & Anomaly Detection
  • Machine Learning Systems (MLSys)
  • Automated Machine Learning (AutoML)
  • Unsupervised Machine Learning
  • AI + Security
  • Graph Neural Networks
  • Healthcare AI & ML for Therapeutics
  • Ensemble Learning

Education

  • Ph.D. Candidate in Information Systems, 2019-2023 (expected)

    Carnegie Mellon University

  • M.S. in Applied Computing, 2015-2017

    University of Toronto

  • B.S. in Computer Engineering (Minor in Computer Science and Math), 2015

    University of Cincinnati

  • High School Diploma, 2010

    Shanxi Experimental Secondary School 山西省实验中学

Miscellaneous

News & Travel

Nov 2022: I am at ICDM from Nov 27-29th in Orlando and NeurIPS from Nov 29th to Dec 2nd in New Orleans (Poster Session 6, Thu Dec 01 02:30 PM–04:00 PM (PST) @ Hall J #1032).

Nov 2022: Happy to serve as the workflow co-chair for KDD 2023!

Nov 2022: ADMoE: Anomaly Detection with Mixture-of-Experts from Noisy Labels will appear in AAAI 2023. It is the first framework to use multiple sets of noisy labels for anomaly detection.

Oct 2022: A new systems paper is out: TOD: GPU-accelerated Outlier Detection via Tensor Operations, with George H. Chen and Zhihao Jia. VLDB paper, Code.

Oct 2022: Great news! Our proposal (led by Prof. Zhihao Jia) for AI-assisted systems has been funded via Meta 2022 AI4AI Research!

Sep 2022: Artificial Intelligence Foundation for Therapeutic Science was published in Nature Chemical Biology. The paper describes Therapeutics Data Commons (TDC) and its various use cases, laying the foundation of therapeutic science.

Sep 2022: Two large-scale anomaly detection benchmarks for tabular data (ADBench) and graph data (BOND) accepted at NeurIPS 2022.


Publications

See my Google Scholar, DBLP, ORCID, and ResearchGate.

Preprints & Working Papers

[w22f] Diffusion Models: A Comprehensive Survey of Methods and Applications, with Ling Yang, Zhilong Zhang, Yang Song, Shenda Hong, Runsheng Xu, Yingxia Shao, Wentao Zhang, Bin Cui, and Ming-Hsuan Yang. Preprint.

[w22e] Hyperparameter Optimization for Unsupervised Outlier Detection, with Leman Akoglu. Preprint.

[w21c] A Large-scale Study on Unsupervised Outlier Model Selection: Do Internal Strategies Suffice? with Martin Q. Ma (equal contribution), Xiaorong Zhang, and Leman Akoglu. Preprint.


Peer-reviewed Papers

(2023). TOD: GPU-accelerated Outlier Detection via Tensor Operations. International Conference on Very Large Data Bases (VLDB).

PDF Code

(2022). ADBench: Anomaly Detection Benchmark. Advances in Neural Information Processing Systems (NeurIPS) (Equal contribution).

PDF Code

(2022). ELECT: Toward Unsupervised Outlier Model Selection. IEEE International Conference on Data Mining (ICDM).

PDF Code

(2022). ECOD: Unsupervised Outlier Detection Using Empirical Cumulative Distribution Functions. IEEE Transactions on Knowledge and Data Engineering (TKDE) (Co-first author; equal contribution).

PDF Code DOI IEEE Xplore

(2021). Automatic Unsupervised Outlier Model Selection. Advances in Neural Information Processing Systems (NeurIPS).

PDF Code Project

(2020). Combining Machine Learning Models Using combo Library. Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI), demo track.

PDF Code Video DOI

(2020). SynC: A Unified Framework for Generating Synthetic Population with Gaussian Copula. Workshops at the Thirty-Fourth AAAI Conference on Artificial Intelligence.

PDF Code PPAI Arxiv

(2018). DCSO: Dynamic Combination of Detector Scores for Outlier Ensembles. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), Workshop on Outlier Detection De-constructed (ODD).

PDF Slides

(2017). An empirical study of touch-based authentication methods on smartwatches. Proceedings of the 2017 ACM International Symposium on Wearable Computers (ISWC) (Co-first author; equal contribution).

PDF DOI ACM DL

Awards and Funds

Meta 2022 AI4AI Research Award

To foster further innovation in this area, and to deepen our collaboration with academia, Meta is pleased to invite faculty to respond to this call for research proposals pertaining to the aforementioned topics.

The Norton Labs Graduate Fellowship

The Norton Labs Graduate Fellowship provides up to $20,000 USD that may be used to cover one year of the student's tuition fees and/or reimburse expenses incurred by the student during collaboration with Norton Labs. I was one of only two graduate students selected to receive the award.

CMU Presidential Fellowship

PwC Presidential Fellowship ($80,000).

Mitacs-Accelerate Research and Development Funding

Project IT07884 ($30,000): machine learning in HR analytics.

Mantei/Mae Award & Scholar

Awarded to highest-performing students in Electrical Engineering, Computer Engineering, and Computer Science ($40,000 in four years).

University Global Award and Scholarship

Awarded to top-performing international students ($32,000 in four years).

Services

Conference Organizing Committee

External Reviewer for Funding Proposals

Journal Reviewer

Program Committee and/or Reviewer for Conferences and Workshops

Experience

Professional Positions

Machine Learning Research Intern

NortonLifeLock Research Group

May 2022 – Present

Supervised by Dr. Acar Tamersoy and Dr. Kevin Roundy.

Machine Learning Research Intern

Microsoft Research

Jan 2022 – Mar 2022

Designed weakly supervised anomaly detection algorithms.

Supervised by Dr. Guoqing Zheng and Dr. Subhabrata (Subho) Mukherjee.

Visiting Student Researcher

Stanford University, Computer Science Department

May 2021 – Aug 2021 Stanford, CA, USA

Designed new GNN systems and models.

Supervised by Prof. Jure Leskovec.

Machine Learning Research Intern

IQVIA, Analytics Center of Excellence

May 2020 – Aug 2020 Boston, MA, USA

Designed new machine learning systems and models in healthcare.

Supervised by Dr. Cao (Danica) Xiao (IQVIA) and Prof. Jimeng Sun (UIUC).

Senior Consultant

PwC Canada, Consulting & Deals

Feb 2017 – Jun 2019 Toronto, ON, Canada

I was a senior consultant with the following duties:

  • Designed fraud analytic solutions for major Canadian banks and insurance firms.
  • Led applied data analytics projects, e.g., client segmentation and churn analysis.
  • Developed multiple pricing optimization models with statistical methods.

Research Associate (Intern)

PwC Canada, Consulting & Deals

May 2016 – Dec 2016 Toronto, ON, Canada

Applied research in people analytics with machine learning.

Supervised by Prof. Anthony Bonner. The project was partly supported by Mitacs-Accelerate Research and Development Funding (IT07884).

Software Engineer (Contract & Intern)

Siemens PLM Software USA

Mar 2012 – Dec 2014 Cincinnati, Ohio, USA

As a co-op student and contractor, my work included:

  • Managed a Java project to transition the LabManager system to vCloud Director.
  • Refactored outdated automation code and added new modules and JUnit test cases.
  • Led a C++ Code Coverage project on Teamcenter platform to strengthen its stability.

Teaching Positions

Teaching Assistant

Carnegie Mellon University, Heinz College of Information Systems and Public Policy

Feb 2020 – Present Pittsburgh, PA, United States

I am a teaching assistant for the following courses:

  • Intro to Artificial Intelligence taught by Prof. David Steier (Fall 2020, Spring 2021, Fall 2021, Spring 2022).
  • Digital Transformation taught by Prof. James Riel (Spring 2022).
  • Statistics for IT Managers taught by Prof. Daniel Nagin (Fall 2021).

The main duties include grading assignments and giving lectures on selected topics.

Teaching Assistant

University of Toronto, Department of Computer Science

Sep 2015 – Dec 2015 Toronto, ON, Canada

I was a teaching assistant for Embedded Systems taught by Prof. Philip Anderson.

Teaching Assistant

University of Cincinnati, Department of Electrical Engineering & Computer Science

Sep 2014 – Dec 2014 Cincinnati, OH, USA

I was a teaching assistant for Introduction to Programming taught by Prof. George Purdy.

Open-source Initiatives

To find more of my open-source initiatives, see my GitHub.

ADBench

ADBench: Anomaly Detection Benchmark

PyGOD (Python Graph Outlier Detection)

A Python Library for Graph Outlier Detection (Anomaly Detection) and Its Benchmark

Therapeutics Data Commons (TDC)

Machine Learning Datasets and Tasks for Drug Discovery and Development

SUOD

SUOD: Accelerating Large-scale Unsupervised Heterogeneous Outlier Detection

combo

A Python Toolbox for Machine Learning Model Combination.

Python Outlier Detection Toolbox

PyOD: A Python Toolbox for Scalable Outlier Detection (Anomaly Detection).

Talks

Recent Talks

Contact

[WeChat (微信) @ yzhao062 | WeChat @ 加群小助手]