📰 I am on the market for tenure-track AP positions. I am broadly interested in machine learning (ML), data mining/science, and information systems positions. I am a US and Canadian permanent resident with full work authorization. See my latest CV.
Summary. In June 2023, I will finish my Ph.D. in 3.5 years at Carnegie Mellon University (CMU),
with support from the CMU Presidential Fellowship and Norton Graduate Fellowship.
My research accelerates and automates unsupervised ML: (1) how to support large-scale learning tasks with ML systems and (2) how to automate unsupervised ML model selection and hyperparameter optimization.
I build AI/ML applications in healthcare and security.
Mentors. At CMU, I work with Prof. Leman Akoglu on automated data mining, Prof. Zhihao Jia on machine learning systems, and Prof. George H. Chen on general ML. I am a member of the CMU automated learning systems group (Catalyst) and the Data Analytics Techniques Algorithms (DATA) Lab. I have collaborated with Prof. Jure Leskovec at Stanford and Prof. Philip S. Yu at UIC.
Open-source Contribution: I have led or contributed as a core member to more than 10 ML open-source initiatives, receiving 15,000 GitHub stars (top 0.002%: ranked 800 out of 40M GitHub users) and >12,000,000 total downloads.
I am a dedicated writer with more than 300 articles (in Chinese) and 200,000 followers on Zhihu (知乎), the Chinese counterpart of Quora (200 million+ registered users). I have been officially recognized as a “Top Writer” (优秀回答者) in four fields (AI, ML, DM, and STAT). My articles have been read more than 20,000,000 times. See my Zhihu page (微调).
Contact me by email (zhaoy [AT] cmu.edu) or WeChat @ yzhao062.
Ph.D. Candidate in Information Systems, 2019-2023
Carnegie Mellon University
M.S. in Applied Computing, 2015-2017
University of Toronto
B.S. in Computer Engineering (Minor in Computer Science and Math), 2015
University of Cincinnati
High School Diploma, 2010
Shanxi Experimental Secondary School 山西省实验中学
Mar 2023: Automated and Scalable Algorithms and Systems for Unsupervised ML @ USC
Mar 2023: Automated and Scalable Algorithms and Systems for Unsupervised ML @ UC Davis
Feb 2023: Automated and Scalable Algorithms and Systems for Unsupervised ML @ SBU
Feb 2023: Automated and Scalable Algorithms and Systems for Unsupervised ML @ U Chicago
Feb 2023: Automated and Scalable Algorithms and Systems for Unsupervised ML @ CMU (PDL)
Feb 2023: Automated and Scalable Algorithms and Systems for Unsupervised ML @ UCM
Feb 2023: Weakly Supervised Anomaly Detection: A Survey is out! [code]
Dec 2022: The Need for Unsupervised Outlier Model Selection: A Review and Evaluation of Internal Evaluation Strategies will appear in ACM SIGKDD Explorations Newsletter 2023 (joint work with Leman Akoglu).
Nov 2022: Happy to serve as the workflow co-chair for KDD 2023!
Nov 2022: ADMoE: Anomaly Detection with Mixture-of-Experts from Noisy Labels will appear in AAAI 2023. It is the first framework to use multiple sets of noisy labels for detection.
Oct 2022: A new systems paper is out: TOD: GPU-accelerated Outlier Detection via Tensor Operations, with George H. Chen and Zhihao Jia. VLDB paper, Code.
Oct 2022: Great news! Our proposal (led by Prof. Zhihao Jia) for AI-assisted systems has been funded via Meta 2022 AI4AI Research!
See my Google Scholar, DBLP, ORCID, and ResearchGate.
Research outcomes. I have published more than 30 papers in leading journals and conferences such as JMLR, NeurIPS, VLDB, and MLSys (primarily on unsupervised ML unless otherwise specified):
Primary field | Secondary | Method | Year | Venue | Lead author |
---|---|---|---|---|---|
large-scale Benchmark | tabular data | ADBench | 2022 | NeurIPS | Y |
large-scale Benchmark | graph learning | BOND | 2022 | NeurIPS | Y |
large-scale Benchmark | sequence data | TODS | 2021 | NeurIPS | |
automated machine learning | model selection | MetaOD | 2021 | NeurIPS | Y |
automated machine learning | model selection | ELECT | 2022 | ICDM | Y |
automated machine learning | HP optimization | HPOD | 2022 | Preprint | Y |
automated machine learning | evaluation metrics | IPM | 2023 | KDD Explor. | Y |
machine learning systems | | PyOD | 2019 | JMLR | Y |
machine learning systems | time series | TODS | 2020 | AAAI | |
machine learning systems | | SUOD | 2021 | MLSys | Y |
machine learning systems | distributed systems | TOD | 2022 | VLDB | Y |
machine learning systems | graph neural networks | PyGOD | 2022 | Preprint | Y |
robust ML | semi-supervised | XGBOD | 2018 | IJCNN | Y |
robust ML | ensemble learning | LSCP | 2019 | SDM | Y |
robust ML | ensemble learning | combo | 2020 | AAAI | Y |
robust ML | ensemble learning | COPOD | 2020 | ICDM | Y |
robust ML | ensemble learning | ECOD | 2022 | TKDE | Y |
robust ML | noisy label learning | ADMoE | 2023 | AAAI | Y |
graph mining | finance | AutoAudit | 2020 | BigData | |
graph neural networks | contrastive learning | CONAD | 2022 | PAKDD | |
Diffusion Models | survey | | 2022 | Preprint | |
AI x Science | synthetic data | SynC | 2020 | ICDMW | |
AI x Science | healthcare AI | PyHealth | 2020 | Preprint | Y |
AI x Science | Datasets & Benchmark | TDC | 2021 | NeurIPS | |
AI x Science | Datasets & Benchmark | TDC V2 | 2022 | NCHEMB | |
[w23a] Weakly Supervised Anomaly Detection: A Survey, with Minqi Jiang, Chaochuan Hou, Ao Zheng, Xiyang Hu, Songqiao Han, Hailiang Huang, Xiangnan He, Philip S. Yu. Preprint.
[w22f] Diffusion Models: A Comprehensive Survey of Methods and Applications, with Ling Yang, Zhilong Zhang, Yang Song, Shenda Hong, Runsheng Xu, Yingxia Shao, Wentao Zhang, Bin Cui, and Ming-Hsuan Yang. Preprint.
[w22e] Hyperparameter Optimization for Unsupervised Outlier Detection, with Leman Akoglu. Preprint.
Professional Positions
Designed weakly supervised anomaly detection algorithms.
Supervised by Dr. Guoqing Zheng and Dr. Subhabrata (Subho) Mukherjee.
Designed new GNN systems and models.
Supervised by Prof. Jure Leskovec.
Designed new machine learning systems and models in healthcare.
Supervised by Dr. Cao (Danica) Xiao (IQVIA) and Prof. Jimeng Sun (UIUC).
Applied research in people analytics with machine learning.
Supervised by Prof. Anthony Bonner; the project is partly supported by Mitacs-Accelerate Research and Development Funding (IT07884).
Teaching Positions
I am a teaching assistant for the following courses:
The main duties include grading assignments and giving lectures on selected topics.
To find more of my open-source initiatives, see my GitHub. Popular ones: