
Yue Zhao

Ph.D. Candidate in Information Systems

Carnegie Mellon University

Machine Learning Systems (MLSys), Anomaly Detection, Out-of-distribution (OOD) detection, Automated Machine Learning

Author of PyOD, PyGOD, ADBench

📰 I am on the job market, with expected graduation in Spring 2023. I am broadly interested in machine learning, data mining/science, and information systems positions. I can work in the U.S. and Canada without sponsorship; please reach out if you have an open opportunity! Contact me by Email (zhaoy [AT] cmu.edu) or WeChat (微信) @ yzhao062.

Who am I (赵越)? I am a fourth-year (final-year) Ph.D. candidate at Carnegie Mellon University (CMU). Before joining CMU, I earned my MS degree from the University of Toronto (2016) and my BS degree from the University of Cincinnati (2015), and worked as a senior consultant at PwC Canada (2016-19). I am an expert on anomaly detection (a.k.a. outlier detection), machine learning systems (MLSys), and automated machine learning (AutoML), with more than 7 years of professional experience and 30+ papers (in JMLR, NeurIPS, VLDB, MLSys, etc.). I appreciate the generous support from the CMU Presidential Fellowship and the Norton Labs Graduate Fellowship.

At CMU, I work with Prof. Leman Akoglu on automated data mining, Prof. Zhihao Jia on machine learning systems, and Prof. George H. Chen on general ML. I am a member of the CMU automated learning systems group (Catalyst) and the Data Analytics Techniques Algorithms (DATA) Lab. Externally, I collaborate with Prof. Jure Leskovec at Stanford and Prof. Philip S. Yu at UIC.

Contributions. Machine Learning Systems for Anomaly Detection and Out-of-distribution (OOD) Detection: I use machine learning systems (MLSys) techniques to support large-scale, real-world outlier detection applications in security, finance, and healthcare; the resulting tools have millions of downloads. I designed CPU-based (PyOD), GPU-based (TOD), and distributed (SUOD) detection systems covering tabular (PyOD), time-series (TODS), and graph data (PyGOD). To understand the characteristics of outlier detection (OD) algorithms, I co-authored large-scale benchmarks for tabular data (ADBench), time-series data (paper), and graph data (BOND). My work has been used by thousands of projects and applications, including at Amazon, IBM, Morgan Stanley, and Tesla. See more applications.

Research outcomes (on outlier detection unless otherwise specified):

| Primary field | Secondary | Method | Year | Venue | Lead author |
|---|---|---|---|---|---|
| large-scale Benchmark | tabular anomaly detection | ADBench | 2022 | NeurIPS | Y |
| large-scale Benchmark | graph anomaly detection | BOND | 2022 | NeurIPS | Y |
| large-scale Benchmark | sequence anomaly detection | TODS | 2021 | NeurIPS | |
| automated machine learning | outlier model selection | MetaOD | 2021 | NeurIPS | Y |
| automated machine learning | outlier model selection | ELECT | 2022 | ICDM | Y |
| automated machine learning | outlier HP optimization | HPOD | 2022 | Preprint | Y |
| automated machine learning | outlier evaluation | IPM | 2021 | Preprint | Y |
| machine learning systems | | PyOD | 2019 | JMLR | Y |
| machine learning systems | time series | TODS | 2020 | AAAI | |
| machine learning systems | | SUOD | 2021 | MLSys | Y |
| machine learning systems | distributed systems | TOD | 2022 | VLDB | Y |
| machine learning systems | graph neural networks | PyGOD | 2022 | Preprint | Y |
| robust ML | semi-supervised | XGBOD | 2018 | IJCNN | Y |
| robust ML | ensemble learning | LSCP | 2019 | SDM | Y |
| robust ML | ensemble learning | combo | 2020 | AAAI | Y |
| robust ML | ensemble learning | COPOD | 2020 | ICDM | Y |
| robust ML | ensemble learning | ECOD | 2022 | TKDE | Y |
| robust ML | noisy label learning | ADMoE | 2023 | AAAI | Y |
| graph mining | finance | AutoAudit | 2020 | BigData | |
| graph neural networks | contrastive learning | CONAD | 2022 | PAKDD | |
| Diffusion Models | survey | | 2022 | Preprint | |
| AI x Science | synthetic data | SynC | 2020 | ICDMW | |
| AI x Science | healthcare AI | PyHealth | 2020 | Preprint | Y |
| AI x Science | Datasets & Benchmark | TDC | 2021 | NeurIPS | |
| AI x Science | Datasets & Benchmark | TDC V2 | 2022 | NCHEMB | |

Open-source Contribution: I have led, or contributed as a core member to, more than 10 ML open-source initiatives. Together they have received 15,000 GitHub stars (top 0.002%: ranked 800 out of 40M GitHub users) and >10,000,000 total downloads. Popular ones:

  • PyOD: A Python Toolbox for Scalable Outlier Detection (Anomaly Detection).
  • ADBench: The most comprehensive tabular anomaly detection benchmark (30 anomaly detection algorithms on 57 benchmark datasets).
  • TOD: Tensor-based Outlier Detection. The first large-scale GPU-based system for acceleration!
  • SUOD: An Acceleration System for Large-scale Heterogeneous Outlier Detection.
  • anomaly-detection-resources: The most-starred collection of anomaly detection resources (books, courses, etc.)!
  • Python Graph Outlier Detection (PyGOD): A Python Library for Graph Outlier Detection.
  • Therapeutics Data Commons (TDC): Machine learning for drug discovery.
  • PyTorch Geometric (PyG): Graph Neural Network Library for PyTorch. Contributed to profiler & benchmarking, and heterogeneous data transformation.
  • combo: A Python Toolbox for ML Model Combination (Ensemble Learning).
  • TODS: Time-series Outlier Detection. Contributed to core detection models.
  • MetaOD: Automatic Unsupervised Outlier Model Selection (AutoML).

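[#1] I organize and maintain several WeChat groups for machine learning research, including:

  • anomaly detection (异常检测微信讨论组), machine learning systems (机器学习系统讨论组), and groups for other machine learning research areas
  • ML Ph.D. (北美ML博士求职分享群), where we share postdoc, internship, and full-time openings for ML Ph.D.s and Ph.D. students. Join them via WeChat (微信) @ 加群小助手 (the group-joining assistant)!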

[#2] I am a dedicated writer with more than 300 articles (in Chinese) and 200,000 followers on Zhihu (知乎), a Chinese counterpart to Quora (200 million+ registered users). I have been officially recognized as a “Top Writer” (优秀回答者) in four fields (AI, ML, DM, and STAT). My articles have been read more than 20,000,000 times. See my Zhihu page (微调).


Interests

  • Anomaly Detection
  • Out-of-distribution (OOD) detection
  • Machine Learning Systems (MLSys)
  • Automated Machine Learning (AutoML)
  • Unsupervised Machine Learning
  • AI + Security
  • Graph Neural Networks
  • Healthcare AI & ML for Therapeutics
  • Ensemble Learning

Education

  • Ph.D. Candidate in Information Systems, 2019-2023 (expected)

    Carnegie Mellon University

  • M.S. in Applied Computing, 2015-2017

    University of Toronto

  • B.S. in Computer Engineering (Minor in Computer Science and Math), 2015

    University of Cincinnati

  • High School Diploma, 2010

    Shanxi Experimental Secondary School 山西省实验中学

Latest