I am pursuing a Joint Ph.D. degree in Machine Learning & Public Policy at Carnegie Mellon University, advised by Prof. Leman Akoglu. Before coming to CMU, I had more than five years of software engineering and management consulting experience.
Research Keywords: Outlier & Anomaly Detection; Ensemble Learning; Scalable Machine Learning; ML Library and System
I am always open to collaboration opportunities (anywhere on Earth) and applied research internships in the United States (CPT), Canada (Residency), and China (Residency); I need visa sponsorship for other countries. I have been working with researchers from both industry and academia (U of Toronto, UIUC, IQVIA, PwC, Arima, etc.). Feel free to reach out by email (zhaoy [AT] cmu.edu) or WeChat (微信).
If you want to meet in person for a coffee in Pittsburgh (it is on me), drop me a line ☕
Joint Ph.D. in Machine Learning & Public Policy (Expected); Ph.D. in Information Systems & Management (Primary), 2019-2024
Carnegie Mellon University
M.S. in Applied Computing, 2015-2017
University of Toronto
B.S. in Computer Engineering (Minor in Computer Science and Math), 2015
University of Cincinnati
High School Diploma, 2010
Shanxi Experimental Secondary School 山西省实验中学
Dec 2019: Our paper “SynC: A Unified Framework for Generating Synthetic Population with Gaussian Copula” has been accepted at the AAAI Workshop on Privacy-Preserving Artificial Intelligence (PPAI). See you in New York!
Dec 2019: We have a preliminary paper on accelerating training and prediction with a large number of unsupervised anomaly detectors: “SUOD: Toward Scalable Unsupervised Outlier Detection”. A more in-depth theoretical justification and an accompanying scalable Python toolkit, SUOD, will be released for KDD 2020 (ADS track).
Dec 2019: PyOD has been downloaded more than 350,000 times!
Nov 2019: Received more than 5,000 ⭐ on GitHub.
Oct 2019: Our demo paper “Combining Machine Learning Models and Scores Using combo library” on the ML library combo has been accepted at AAAI 2020. See you in New York! Check out our Demo Video!
[#1] I am an active software developer with more than 5,000 GitHub stars in total (top 1,200 among 37,000,000 GitHub developers, as ranked by Gitstar Ranking). I have led multiple popular open-source ML initiatives, including PyOD (more than 350,000 total downloads), combo, anomaly-detection-resources, and awesome-ensemble-learning.
[#2] I am a dedicated technical writer with more than 200 articles (in Chinese) and 90,000 followers on Zhihu (知乎), the Chinese counterpart of Quora (200 million+ registered users). Since 2018, I have been officially recognized as a “Top Zhihu Writer” (优秀回答者) in four fields (AI, ML, DM, and STAT). My articles have been read more than 8,000,000 times and have received 95,000 upvotes (statistics provided by Zhihu). See my Zhihu page.
High-resolution profile pictures can be downloaded here: Professional, Casual.
I am open to peer review and organizing opportunities (all types of venues) in the fields of outlier & anomaly detection, ensemble learning, clustering, ML libraries & systems, and information systems. Please send me an email (zhaoy@cmu.edu) or a request in the corresponding reviewing/organizing system.
[w20a] DNA: Differentiating Noise from Anomaly
[w20b] Outlier Detection via Semi-supervised Generative Models
[w19e] Yue Zhao, Xueying Ding, Jianing Yang, and Haoping Bai. SUOD: Toward Scalable Unsupervised Outlier Detection. AAAI Conference on Artificial Intelligence Workshop, 2020. Submitted, under review. Under R&R for KDD 2020 ADS track. [PDF] [Code] [Slides]
[w19c] Yiqun Mei (UIUC), Yue Zhao. A New Image Super-Resolution Method (Name masked due to the double-blind policy). Submitted to a major CV conference, under review.
I will be happy to give talks on the series of tools I have built, e.g., PyOD and combo. I am also happy to discuss my experience as an ML developer and researcher and how to build ML tools starting from the design stage. Please drop me a line with an invitation :)
I am an enthusiastic open-source developer: I build machine learning libraries and systems. Specifically, I initiated the Python Outlier Detection library (PyOD) in 2018, which has become the most popular Python outlier detection toolkit. I also initiated combo: A Python Toolbox for Machine Learning Model Combination in July 2019; it is currently under active development. Watch/Star/Follow welcome!
Professional Positions
Applied research in people analytics with machine learning.
Supervised by Prof. Anthony Bonner; the project is partly supported by Mitacs-Accelerate Research and Development Funding (IT07884).
Teaching Positions