My name is Yue ZHAO (赵越 in Chinese). I am a Ph.D. student at Carnegie Mellon University (CMU) and a former management consultant at PwC Canada. I am a technical writer at Zhihu with 150,000 followers and more than 15M article reads. As a seasoned ML software/system architect, I have led or participated in more than 10 ML library initiatives, with 10,000 GitHub stars (top 0.002%: ranked 800 out of 40M GitHub users) and more than 3,000,000 total downloads. Popular ones:
My research focuses on three independent but interleaved streams:
At CMU, I work with Prof. Leman Akoglu (Heinz) on anomaly detection, and Prof. Zhihao Jia (CSD) on machine learning systems (MLSys).
General Notes: I am open to ML/DM internships (2022). Please reach out :)
Contact me by email (zhaoy [AT] cmu.edu) or WeChat (微信) @ yzhao062. [#1] I am open to collaboration opportunities (anytime & anywhere) and research internships (summer 2021/2022). I can legally work in the United States (CPT), Canada (permanent residency), and China (permanent residency). I have worked with professionals from both industry and academia (e.g., Stanford, Harvard, Facebook).
[#2] Call for review opportunities. I am looking for paper review, tutorial, workshop, and talk opportunities (in anomaly detection, scalable ML, machine learning systems, AutoML, information systems, and ensemble learning).
[#3] I host a WeChat group on anomaly detection (异常检测微信讨论组) with more than three hundred researchers (e.g., Berkeley, MIT, Tsinghua) and industry people (e.g., Alibaba, IBM, Facebook) for collaboration and intern/full-time opportunities. Ping me to join!
Ph.D. in Machine Learning and Information Systems, 2019-2024
Carnegie Mellon University
M.S. in Applied Computing, 2015-2017
University of Toronto
B.S. in Computer Engineering (Minor in Computer Science and Math), 2015
University of Cincinnati
High School Diploma, 2010
Shanxi Experimental Secondary School 山西省实验中学
Mar 2020: (confirmed!) I will join Prof. Jure Leskovec's team @ Stanford University for a summer research internship :)
Feb 2021: Therapeutics Data Commons (TDC), a large collection of more than 60 machine learning-ready datasets across more than 20 therapeutic tasks, is released. See the paper on arXiv! Great work led by Kexin Huang and Prof. Marinka Zitnik from Harvard!
Jan 2021: Our new system paper (SUOD: Accelerating Large-scale Unsupervised Heterogeneous Outlier Detection) has been accepted at the Conference on Machine Learning and Systems (MLSys). SUOD, developed with Xiyang Hu, is an acceleration system for large-scale unsupervised outlier detection. Included as part of PyOD, it has been downloaded more than 900,000 times.
Jan 2021: We released PyHealth, a new library with more than 30 state-of-the-art predictive health algorithms (mostly deep learning based). See the corresponding paper as well!
Jan 2021: Invited by the University of Nottingham to give a talk on general ML applications and career development. Link to be shared soon! See my previous talks.
[#1] I am a dedicated writer with more than 200 articles (in Chinese) and 140,000 followers on Zhihu (知乎), the Chinese Quora (200 million+ registered users). Since 2018, I have been officially recognized as a “Top Zhihu Writer” (优秀回答者) in four fields (AI, ML, DM, and STAT). My articles have been read more than 10,000,000 times and have received 100,000 upvotes (statistics provided by Zhihu). See my Zhihu page (微调).
I am open to peer review and organizing opportunities in the fields of outlier & anomaly detection, ensemble learning, clustering, ML libraries & systems, and information systems.
IEEE Transactions on Neural Networks and Learning Systems (TNNLS)
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
See my Google Scholar, DBLP, ORCID, and ResearchGate.
[w21b] Therapeutics Data Commons: Machine Learning Datasets and Tasks for Therapeutics, with Kexin Huang, Tianfan Fu, Wenhao Gao, Yusuf Roohani, Jure Leskovec, Connor W. Coley, Cao Xiao, Jimeng Sun, Marinka Zitnik. Preprint.
[w21a] PyHealth: A Python Library for Health Predictive Models, with Zhi Qiao (equal contribution), Cao (Danica) Xiao, Lucas M. Glass, and Jimeng Sun. Preprint.
[w20i] Automating Outlier Detection via Meta-Learning, with Ryan A. Rossi and Leman Akoglu. Submitted to a major CS conference, under review. Preprint.
I am happy to give talks on the series of tools I built, e.g., PyOD, combo, and SUOD. I am also willing to share my experience as an ML developer and researcher, especially on how to build ML tools starting from design. Please drop me a line to invite me :)
I am an enthusiastic open-source developer: I build machine learning libraries and systems. Specifically, I initiated the Python Outlier Detection library (PyOD) in 2018, which has become the most popular Python outlier detection toolkit. I also initiated combo: A Python Toolbox for Machine Learning Model Combination in July 2019; it is currently under active development.
I am currently working on a new ML system called SUOD (Scalable Unsupervised Outlier Detection), which accelerates model training and prediction when a large number of outlier detectors are applied to large, high-dimensional datasets. Watch/Star/Follow welcome!
Professional Positions
Designed new machine learning systems and models in healthcare.
Supervised by Dr. Cao (Danica) Xiao (IQVIA) and Prof. Jimeng Sun (UIUC).
Applied research in people analytics with machine learning.
Supervised by Prof. Anthony Bonner; the project is partly supported by Mitacs-Accelerate Research and Development Funding (IT07884).
Teaching Positions