📰 I am on the job market, with expected graduation in Spring 2023. I am broadly interested in machine learning, data mining/science, and information systems positions. I can work in the U.S. and Canada without sponsorship; please reach out if you have an open opportunity! Contact me by email (zhaoy [AT] cmu.edu) or WeChat (微信) @ yzhao062.
Who am I (赵越)? I am a fourth-year (final-year) Ph.D. candidate at Carnegie Mellon University (CMU). Before joining CMU, I earned my MS degree from the University of Toronto (2016) and my BS degree from the University of Cincinnati (2015), and worked as a senior consultant at PwC Canada (2016-19). I am an expert on anomaly detection (a.k.a. outlier detection), machine learning systems (MLSys), and automated machine learning (AutoML), with more than 7 years of professional experience and 30+ papers (in JMLR, NeurIPS, VLDB, MLSys, etc.). I appreciate the generous support from the CMU Presidential Fellowship and the Norton Labs Graduate Fellowship.
At CMU, I work with Prof. Leman Akoglu on automated data mining, Prof. Zhihao Jia on machine learning systems, and Prof. George H. Chen on general ML. I am a member of the CMU automated learning systems group (Catalyst) and the Data Analytics Techniques Algorithms (DATA) Lab. Externally, I collaborate with Prof. Jure Leskovec at Stanford and Prof. Philip S. Yu at UIC.
Contributions
Machine Learning Systems for Anomaly Detection: I use machine learning systems (MLSys) techniques to support large-scale, real-world outlier detection (OD) applications in security, finance, and healthcare; the resulting tools have millions of downloads. I designed CPU-based (PyOD), GPU-based (TOD), and distributed (SUOD) detection systems, covering tabular (PyOD), time-series (TODS), and graph data (PyGOD). To understand the characteristics of OD algorithms, I co-authored large-scale benchmarks for tabular data (ADBench), time-series data (paper), and graph data (BOND). My work has been used by thousands of projects and applications, including those at Amazon, IBM, Morgan Stanley, and Tesla. See more applications.
Research outcomes (primarily for outlier detection if not specified):
|Primary field|Secondary field|Method|Year|Venue|Lead author|
|---|---|---|---|---|---|
|large-scale benchmark|tabular anomaly detection|ADBench|2022|NeurIPS|Y|
|large-scale benchmark|graph anomaly detection|BOND|2022|NeurIPS|Y|
|large-scale benchmark|sequence anomaly detection|TODS|2021|NeurIPS| |
|automated machine learning|outlier model selection|MetaOD|2021|NeurIPS|Y|
|automated machine learning|outlier model selection|ELECT|2022|ICDM|Y|
|automated machine learning|outlier HP optimization|HPOD|2022|Preprint|Y|
|automated machine learning|outlier evaluation|IPM|2021|Preprint|Y|
|machine learning systems| |PyOD|2019|JMLR|Y|
|machine learning systems|time series|TODS|2020|AAAI| |
|machine learning systems| |SUOD|2021|MLSys|Y|
|machine learning systems|distributed systems|TOD|2022|VLDB|Y|
|machine learning systems|graph neural networks|PyGOD|2022|Preprint|Y|
|robust ML|ensemble learning|LSCP|2019|SDM|Y|
|robust ML|ensemble learning|combo|2020|AAAI|Y|
|robust ML|ensemble learning|COPOD|2020|ICDM|Y|
|robust ML|ensemble learning|ECOD|2022|TKDE|Y|
|robust ML|noisy label learning|ADMoE|2023|AAAI|Y|
|graph neural networks|contrastive learning|CONAD|2022|PAKDD| |
|AI x Science|synthetic data|SynC|2020|ICDMW| |
|AI x Science|healthcare AI|PyHealth|2020|Preprint|Y|
|AI x Science|datasets & benchmark|TDC|2021|NeurIPS| |
|AI x Science|datasets & benchmark|TDC V2|2022|Nature Chemical Biology| |
Open-source Contribution: I have led or contributed as a core member to more than 10 ML open-source initiatives, receiving 15,000 GitHub stars (top 0.002%: ranked 800 out of 40M GitHub users) and >10,000,000 total downloads. Popular ones:
[#2] I am a dedicated writer with more than 300 articles (in Chinese) and 200,000 followers on Zhihu (知乎), the Chinese Quora (200 million+ registered users). I have been officially recognized as a “Top Writer” (优秀回答者) in four fields (AI, ML, DM, and STAT). My articles have been read more than 20,000,000 times. See my Zhihu page (微调).
Ph.D. Candidate in Information Systems, 2019-2023 (expected)
Carnegie Mellon University
M.S. in Applied Computing, 2015-2017
University of Toronto
B.S. in Computer Engineering (Minor in Computer Science and Math), 2015
University of Cincinnati
High School Diploma, 2010
Shanxi Experimental Secondary School 山西省实验中学
Nov 2022: Happy to serve as the workflow co-chair for KDD 2023!
Nov 2022: ADMoE: Anomaly Detection with Mixture-of-Experts from Noisy Labels will appear in AAAI 2023. It is the first framework for using multiple sets of noisy labels for anomaly detection.
Oct 2022: Great news! Our proposal (led by Prof. Zhihao Jia) for AI-assisted systems has been funded via Meta 2022 AI4AI Research!
Sep 2022: Artificial Intelligence Foundation for Therapeutic Science has been published in Nature Chemical Biology. The paper describes Therapeutics Data Commons (TDC) and its various use cases, laying the foundation of therapeutic science.
[w22f] Diffusion Models: A Comprehensive Survey of Methods and Applications, with Ling Yang, Zhilong Zhang, Yang Song, Shenda Hong, Runsheng Xu, Yingxia Shao, Wentao Zhang, Bin Cui, and Ming-Hsuan Yang. Preprint.
[w22e] Hyperparameter Optimization for Unsupervised Outlier Detection, with Leman Akoglu. Preprint.
[w21c] A Large-scale Study on Unsupervised Outlier Model Selection: Do Internal Strategies Suffice? with Martin Q. Ma (equal contribution), Xiaorong Zhang, and Leman Akoglu. Preprint.
Designed new machine learning systems and models in healthcare.
Supervised by Dr. Cao (Danica) Xiao (IQVIA) and Prof. Jimeng Sun (UIUC).
Applied research in people analytics with machine learning.
Supervised by Prof. Anthony Bonner; the project is partly supported by Mitacs-Accelerate Research and Development Funding (IT07884).
I am a teaching assistant for the following courses:
The main duties include grading assignments and giving lectures on selected topics.
To find more of my open-source initiatives, see my GitHub.