Yue Zhao

Ph.D. Student in Machine Learning and Public Policy (expected)

H. John Heinz III College

Carnegie Mellon University


My name is Yue ZHAO (赵越 in Chinese). I am pursuing a Ph.D. in Machine Learning and Public Policy (expected) at Carnegie Mellon University. At CMU, I have been fortunate to work with Prof. Leman Akoglu and Prof. Pedro Ferreira. I focus on:

  • data mining topics related to scalability, reliability, and automation and
  • information systems topics related to interactions, trade-offs, and cooperation between humans and “AI”
Contact me by email (zhaoy [AT] cmu.edu) or WeChat (微信).

[#1] I am open to collaboration opportunities (anytime & anywhere) and research internships (open for Summer 2021). I can legally work in the United States (CPT), Canada (permanent residency), and China (permanent residency). I have been working with professionals from both industry and academia (U Toronto, UIUC, Texas A&M University, Tsinghua U, Purdue University, Northeastern U, IQVIA, Adobe, PwC, Arima, etc.).

[#2] I am actively looking for paper review, tutorial, workshop, and talk opportunities (in anomaly detection, AutoML, ensemble learning, scalable ML, and learning systems).

[#3] I host a WeChat group on anomaly detection (异常检测微信讨论组) with more than a hundred researchers (e.g., Berkeley, Tsinghua) and industry people (e.g., Alibaba, IBM, Facebook) for collaboration and intern/full-time position opportunities. Ping me to join!

[#4] In addition to developing PyOD, the most popular outlier detection toolbox, I also maintain a knowledge repository of anomaly detection resources, including related books, papers, videos, and toolboxes. Check it out to learn more about the field!

Word Cloud from Paper Titles


  • Outlier & Anomaly Detection
  • Automated Machine Learning
  • Scalable Machine Learning
  • Machine Learning Systems
  • Ensemble Learning
  • Clustering
  • Active Learning
  • Information Systems


  • Ph.D. in Machine Learning and Public Policy (expected), 2019-2024

    Carnegie Mellon University

  • M.S. in Applied Computing, 2015-2017

    University of Toronto

  • B.S. in Computer Engineering (Minor in Computer Science and Math), 2015

    University of Cincinnati

  • High School Diploma, 2010

    Shanxi Experimental Secondary School 山西省实验中学


News & Travel

Jul 2020: Congrats to my many great friends and fellows who became partners and principals at PwC Canada and US in 2020! Applause to David P., Jon Wong, and Marie K.!

Jul 2020: PyOD has been downloaded more than 1,000,000 times!

Jun 2020: Busy with multiple AutoML projects. I have submitted three papers to CIKM and ICDM. Fingers crossed!

May-Aug 2020: I am doing a summer ML research internship at IQVIA, one of the largest human data science firms, in collaboration with Dr. Cao (Danica) Xiao (IQVIA) and Prof. Jimeng Sun (UIUC).

May 2020: A new system paper (SUOD: A Scalable Unsupervised Outlier Detection Framework) is ready to submit. SUOD is an acceleration system for large-scale unsupervised outlier detection. It has been downloaded more than 300,000 times and was presented at the AAAI Workshop on Artificial Intelligence for Cyber Security (AICS).

Fun Facts

[#1] I am an active software/system developer with more than 7,000 GitHub stars in total (top 1,000 among 37,000,000 GitHub developers ranked by Gitstar Ranking). I led multiple popular open-source ML initiatives, including PyOD, combo, SUOD, anomaly-detection-resources, and awesome-ensemble-learning.

[#2] I am a dedicated technical writer with more than 200 articles (in Chinese) and 130,000 followers on Zhihu (知乎), the Chinese counterpart of Quora (200 million+ registered users). Since 2018, I have been officially recognized as a “Top Zhihu Writer” (优秀回答者) in four fields (AI, ML, DM, and STAT). My articles have been read more than 10,000,000 times and have received 100,000 upvotes (statistics provided by Zhihu). See my Zhihu page.

Profile & Casual Pictures



I am open to peer-review and organizing opportunities in the fields of outlier & anomaly detection, ensemble learning, clustering, ML libraries & systems, and information systems.

Journal/Conference Reviewer

Program Committee

Talks and Presentations

[04/17/2020; Pittsburgh, PA] I will present “Developing Python Libraries for Machine Learning: Best Practices and Lessons Learned” at The Python Conference (PyCon) 2020, the largest annual gathering for the Python community.


See my Google Scholar, ORCID, and ResearchGate.

Preprints & Working Papers

[w20g] SALINE: A Scalable and Flexible System for Machine Learning in Healthcare, with Zhi Qiao, Cao (Danica) Xiao, and Jimeng Sun. To be submitted to JMLR (MLOSS track).

[w20f] A Statistical Based Approach for Synthetic Data Generation (Name masked due to the double-blind policy), with Zheng Li, Jialin Fu. Submitted to a major data mining conference, under review.

[w20e] A Statistical Based Approach for Outlier Detection (Name masked due to the double-blind policy), with Zheng Li, Nicola Botta, Cezar Ionescu, Xiyang Hu. Submitted to a major data mining conference, under review.

[w20f] A Cell Clustering Paper (Name masked due to the double-blind policy), with Changlin Wan, Dongya Jia, Wennan Chang, Sha Cao, Xiao Wang, Chi Zhang. Submitted to a major data mining conference, under review.

[w20b] A New Semi-supervised Anomaly Detection Model (Name masked due to the double-blind policy), with Cheng Cheng (co-first author), Xiyang Hu, Cao (Danica) Xiao (IQVIA), Yunlong Wang (IQVIA), Prof. Jimeng Sun (UIUC), and Prof. Jeremy C. Weiss.
To be submitted to a major data mining conference.

[w20a] SUOD: An Acceleration System for Large-Scale Unsupervised Outlier Detection, with Xiyang Hu, Cheng Cheng, Cong Wang (CMU & Tsinghua U), Changlin Wan (Purdue U), Cao (Danica) Xiao (IQVIA), Yunlong Wang (IQVIA), Prof. Jimeng Sun (UIUC), and Prof. Leman Akoglu.
Accepted at an AAAI 2020 workshop; to be submitted.

[w20e] DNA: Differentiating Noise from Anomaly by Generative Models.

[w20f] Improving Supervised Anomaly Detection via Unsupervised Representation Learning.

Peer-reviewed Papers

(2020). DSR: An Accurate Single Image Super Resolution Approach for Various Degradations. IEEE International Conference on Multimedia and Expo (ICME).

IEEE Xplore

(2020). Combining Machine Learning Models Using combo Library. Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI), demo track.

PDF Code Video

(2020). SUOD: Toward Scalable Unsupervised Outlier Detection. Workshops at the Thirty-Fourth AAAI Conference on Artificial Intelligence.

PDF Code Slides AICS

(2020). SynC: A Unified Framework for Generating Synthetic Population with Gaussian Copula. Workshops at the Thirty-Fourth AAAI Conference on Artificial Intelligence.

PDF Code PPAI Arxiv

(2018). DCSO: Dynamic Combination of Detector Scores for Outlier Ensembles. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), Workshop on Outlier Detection De-constructed (ODD).

PDF Poster Slides

(2017). An empirical study of touch-based authentication methods on smartwatches. Proceedings of the 2017 ACM International Symposium on Wearable Computers (Equal contribution).



I am happy to give talks on the series of tools I have built, e.g., PyOD, combo, and SUOD. I am also willing to share my experience as an ML developer and researcher, especially on how to build ML tools starting from design. Please drop me a line to invite me :)

I am an enthusiastic open-source developer: I build machine learning libraries and systems. Specifically, I initiated the Python Outlier Detection library (PyOD) in 2018, which has become the most popular Python outlier detection toolkit. I also initiated combo: A Python Toolbox for Machine Learning Model Combination in July 2019; it is currently under active development.

I am currently working on a new ML system called SUOD (Scalable Unsupervised Outlier Detection), which accelerates model training and prediction when a large number of outlier detectors are applied to large, high-dimensional datasets. Watches, stars, and follows are welcome!
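To give a rough idea of what running and combining several outlier detectors means, here is a minimal sketch in plain Python. It is not the API of PyOD, combo, or SUOD: the function names (`zscore_detector`, `knn_detector`, `combine_by_average`), the two toy detectors, and the sample data are all hypothetical and chosen only for illustration. Each detector assigns every point an outlier score, the scores are standardized so that they are comparable, and the standardized scores are averaged.

```python
import statistics


def zscore_detector(points):
    """Score each 1-D point by its distance from the mean, in standard deviations."""
    mean = statistics.fmean(points)
    std = statistics.pstdev(points) or 1.0  # avoid division by zero on constant data
    return [abs(p - mean) / std for p in points]


def knn_detector(points, k=2):
    """Score each 1-D point by the distance to its k-th nearest neighbour."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(abs(p - q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def standardize(scores):
    """Rescale scores to zero mean and unit variance so detectors are comparable."""
    mean = statistics.fmean(scores)
    std = statistics.pstdev(scores) or 1.0
    return [(s - mean) / std for s in scores]


def combine_by_average(score_lists):
    """Average the standardized scores of several detectors, point by point."""
    normalized = [standardize(scores) for scores in score_lists]
    return [statistics.fmean(column) for column in zip(*normalized)]


# Toy data (an assumption for illustration): six similar values and one far away.
data = [1.0, 1.1, 0.9, 1.2, 0.8, 1.05, 8.0]
combined = combine_by_average([zscore_detector(data), knn_detector(data)])

# The point with the highest combined score is the most anomalous one.
most_anomalous = max(range(len(data)), key=combined.__getitem__)
print(data[most_anomalous])
```

Averaging is only one simple way to combine scores; taking the maximum or using weighted schemes are common alternatives, and real toolkits add many more detectors and much faster implementations.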


An Acceleration System for Large Scale Unsupervised Anomaly Detection


A Python Toolbox for Machine Learning Model Combination.

Python Outlier Detection Toolbox

PyOD: A Python Toolbox for Scalable Outlier Detection (Anomaly Detection).


Professional Positions


Machine Learning Research Intern

IQVIA, Analytics Center of Excellence

May 2020 – Aug 2020 Boston, MA, USA

Designed new machine learning systems and models in healthcare.

Supervised by Dr. Cao (Danica) Xiao (IQVIA) and Prof. Jimeng Sun (UIUC).


Senior Consultant

PwC Canada, Consulting & Deals

Feb 2017 – Jun 2019 Toronto, ON, Canada
I was a senior consultant with the following duties:

  • Designed fraud analytic solutions for major Canadian banks and insurance firms.
  • Led applied data analytics projects, e.g., client segmentation and churn analysis.
  • Developed multiple pricing optimization models with statistical methods.

Research Associate (Intern)

PwC Canada, Consulting & Deals

May 2016 – Dec 2016 Toronto, ON, Canada

Applied research in people analytics with machine learning.

Supervised by Prof. Anthony Bonner; the project was partly supported by Mitacs-Accelerate Research and Development Funding (IT07884).


Software Engineer (Contract & Intern)

Siemens PLM Software USA

Mar 2012 – Dec 2014 Cincinnati, Ohio, USA
As a co-op student and contractor, my work included:

  • Managed a Java project to transition the LabManager system to vCloud Director.
  • Refactored outdated automation code and added new modules and JUnit test cases.
  • Led a C++ code coverage project on the Teamcenter platform to strengthen its stability.


Teaching Positions


Teaching Assistant

University of Toronto, Department of Computer Science

Sep 2015 – Dec 2015 Toronto, ON, Canada
I was a teaching assistant for Embedded Systems taught by Prof. Philip Anderson.

Teaching Assistant

University of Cincinnati, Department of Electrical Engineering & Computer Science

Sep 2014 – Dec 2014 Cincinnati, OH, USA
I was a teaching assistant for Introduction to Programming taught by Prof. George Purdy.

Funds and Awards

AAAI Student Travel Grant & CMU GSA/Provost Conference Funding

Part of the travel grant for attending AAAI 2020.

Mitacs-Accelerate Research and Development Funding

Project IT07884 ($30,000): machine learning in HR analytics.

Mantei/Mae Award & Scholar

Awarded to the highest-performing students in Electrical Engineering, Computer Engineering, and Computer Science ($40,000 over four years).

University Global Award and Scholarship

Awarded to top-performing international students ($32,000 over four years).