
Yue Zhao

Ph.D. Student in Information Systems

Carnegie Mellon University

Expert with 7 Years of Professional Experience in Anomaly Detection

Author of PyOD, PyGOD, ADBench

Who am I (赵越)? I am a third-year Ph.D. student at Carnegie Mellon University (CMU). Before joining CMU, I earned my Master's degree from the University of Toronto (2016) and worked as a senior consultant at PwC Canada (2019). I am an expert on anomaly detection (a.k.a. outlier detection) algorithms, systems, and their applications in security, healthcare, and finance, with more than 7 years of professional experience and 20+ papers (in JMLR, TKDE, NeurIPS, etc.). My research is partly supported by the Norton Labs Graduate Fellowship.

Contributions to outlier detection systems, benchmarks, and applications: I build automated, scalable, and accelerated machine learning systems (MLSys) to support large-scale, real-world outlier detection applications in security, finance, and healthcare; these systems have millions of downloads. I designed CPU-based (PyOD), GPU-based (TOD), and distributed (SUOD) detection systems for tabular (PyOD), time-series (TODS), and graph data (PyGOD). To understand the characteristics of outlier detection (OD) algorithms, I co-authored large-scale benchmarks for tabular data (ADBench), time-series data (paper), and graph data (UNOD). My work has been used by thousands of projects and applications, including at leading firms such as IBM, Morgan Stanley, and Tesla. See more applications.

Research outcomes (primarily for outlier detection if not specified):

| Primary field | Secondary | Method | Year | Venue | Lead author |
|---|---|---|---|---|---|
| large-scale benchmark | tabular anomaly detection | ADBench | 2022 | Preprint | Y |
| large-scale benchmark | graph anomaly detection | UNOD | 2022 | Preprint | Y |
| large-scale benchmark | time series | TODS | 2021 | NeurIPS | |
| machine learning systems | | PyOD | 2019 | JMLR | Y |
| machine learning systems | time series | TODS | 2020 | AAAI | |
| machine learning systems | | SUOD | 2021 | MLSys | Y |
| machine learning systems | distributed systems | TOD | 2022 | Preprint | Y |
| machine learning systems | graph neural networks | PyGOD | 2022 | Preprint | Y |
| ensemble learning | semi-supervised | XGBOD | 2018 | IJCNN | Y |
| ensemble learning | | LSCP | 2019 | SDM | Y |
| ensemble learning | machine learning systems | combo | 2020 | AAAI | Y |
| ensemble learning | interpretable ML | COPOD | 2020 | ICDM | Y |
| ensemble learning | interpretable ML | ECOD | 2022 | TKDE | Y |
| automated machine learning | graph mining | AutoAudit | 2022 | BigData | |
| automated machine learning | | MetaOD | 2021 | NeurIPS | Y |
| automated machine learning | outlier evaluation | IPM | 2021 | Preprint | Y |
| graph neural networks | contrastive learning | CONAD | 2022 | PAKDD | |
| AI x Science | large-scale benchmark | HR manage. | 2018 | Intellisys | Y |
| AI x Science | super-resolution | DRS | 2020 | ICME | |
| AI x Science | single-cell | CIBS | 2020 | BIBM | |
| AI x Science | synthetic data | SynC | 2020 | ICDMW | |
| AI x Science | healthcare AI | PyHealth | 2020 | Preprint | Y |
| AI x Science | large-scale benchmark | TDC | 2021 | NeurIPS | |

At CMU, I work with Prof. Leman Akoglu (DATA Lab), Prof. Zhihao Jia (Catalyst), and Prof. George H. Chen. Externally, I collaborate with Prof. Jure Leskovec at Stanford University and Prof. Xia “Ben” Hu at Rice University.


Open-source Contribution: I have led or contributed as a core member to more than 10 ML open-source initiatives, receiving 13,000 GitHub stars (top 0.002%: ranked 800 out of 40M GitHub users) and >10,000,000 total downloads. Popular ones:

  • PyOD: A Python Toolbox for Scalable Outlier Detection (Anomaly Detection).
  • ADBench: The most comprehensive tabular anomaly detection benchmark (30 anomaly detection algorithms on 55 benchmark datasets).
  • TOD: Tensor-based Outlier Detection, the first large-scale GPU-based system for acceleration.
  • SUOD: An Acceleration System for Large-scale Heterogeneous Outlier Detection.
  • anomaly-detection-resources: The most-starred collection of anomaly detection resources (books, courses, etc.).
  • PyTorch Geometric (PyG): Graph Neural Network Library for PyTorch. Contributed to profiler & benchmarking, and heterogeneous data transformation, as a member of the PyG team.
  • Python Graph Outlier Detection (PyGOD): A Python Library for Graph Outlier Detection.
  • Therapeutics Data Commons (TDC): Machine learning for drug discovery.
  • combo: A Python Toolbox for ML Model Combination (Ensemble Learning).
  • TODS: Time-series Outlier Detection. Contributed to core detection models.
  • MetaOD: Automatic Unsupervised Outlier Model Selection (AutoML).

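[#1] I organize and maintain several WeChat groups for machine learning research networking, including

  • anomaly detection (WeChat discussion group),
  • machine learning systems (discussion group),
  • ML Ph.D. (job-search sharing group for ML Ph.D.s in North America), where we share postdoc, intern, and full-time jobs for ML Ph.D. students and graduates,
  • groups on other machine learning research directions. Join them by scanning on WeChat @ 加群小助手 (group-joining assistant)!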

[#2] I am a dedicated writer with more than 300 articles (in Chinese) and 170,000 followers on Zhihu (知乎), the Chinese counterpart of Quora (200 million+ registered users). I have been officially recognized as a “Top Writer” (优秀回答者) in four fields (AI, ML, DM, and STAT). My articles have been read more than 20,000,000 times. See my Zhihu page (微调).

Contact me by Email (zhaoy [AT] cmu.edu) or WeChat (微信) @ yzhao062.

Interests

  • Outlier Detection Systems (ODSys)
  • Outlier & Anomaly Detection
  • AI + Security
  • Machine Learning Systems (MLSys)
  • Automated Machine Learning
  • Graph Neural Networks
  • Ensemble Learning
  • Scalable Machine Learning
  • Healthcare AI & ML for Therapeutics
  • Information Systems

Education

  • Ph.D. Student in Information Systems, 2019-2023

    Carnegie Mellon University

  • M.S. in Applied Computing, 2015-2017

    University of Toronto

  • B.S. in Computer Engineering (Minor in Computer Science and Math), 2015

    University of Cincinnati

  • High School Diploma, 2010

    Shanxi Experimental Secondary School 山西省实验中学

Latest