Biography

2022 Summer Internship: Open to ML/AI research and systems internships. Please reach out :)

My name is Yue ZHAO (赵越 in Chinese). I am a third-year Ph.D. student at Heinz College, Carnegie Mellon University (CMU), the best interdisciplinary research institute in the world. Before joining CMU, I was a senior consultant at PwC Canada.

I have led or contributed as a core member to more than 10 ML open-source initiatives, receiving 11,000 GitHub stars (top 0.002%: ranked 900 out of 40M GitHub users) and >500,0000 total downloads. Popular ones:

  • [JMLR] PyOD: A Python Toolbox for Scalable Outlier Detection (Anomaly Detection).
  • [MLSys] SUOD: An Acceleration System for Large-scale Heterogeneous Outlier Detection.
  • [NeurIPS] MetaOD: Automatic Unsupervised Outlier Model Selection (AutoML).
  • PyG (PyTorch Geometric): Graph Neural Network Library for PyTorch. Contributed to the profiler, benchmarking, and heterogeneous data transformation as a member of the PyG team.
  • [NeurIPS] TDC: An extensive machine learning data hub for drug discovery.
  • [ICDM] COPOD: A fast and parameter-free outlier detection method.
  • [AAAI] combo: A Python Toolbox for ML Model Combination (Ensemble Learning).
  • [NeurIPS, AAAI] TODS: Time-series Outlier Detection. Contributed to core detection models.

I specialize in designing and building machine learning systems (MLSys), with implementations and applications in outlier detection, healthcare, graph neural networks, and ensemble learning. My research focuses on the intersection of two fields:

  • machine learning systems that can speed/scale up and automate underlying algorithms
  • data mining algorithms like outlier detection (anomaly detection) and ensemble learning

At CMU, I work with Prof. Leman Akoglu from DATA Lab on outlier detection, Prof. Zhihao Jia from Catalyst on machine learning systems, and Prof. George H. Chen on general ML and statistics. Externally, I am also fortunate to visit and collaborate with Prof. Jure Leskovec at Stanford University.

Startup and VC: I am interested in capitalizing on my expertise in machine learning systems and outlier detection. Let's connect!

Contact me by Email (zhaoy [AT] cmu.edu) or WeChat (微信) @ yzhao062.

[#1] Call for review opportunities. I am looking for paper review, tutorial, workshop, and talk opportunities (in anomaly detection, scalable ML, machine learning systems, and AutoML).

[#2] I host WeChat groups on anomaly detection (异常检测微信讨论组) and machine learning systems (MLSys讨论组), with more than three hundred researchers (e.g., from Berkeley, MIT, and Tsinghua) and industry practitioners (e.g., from Alibaba, IBM, and Facebook), for collaboration and intern/full-time opportunities. Ping me to join!

[#3] I am a dedicated writer with more than 300 articles (in Chinese) and 160,000 followers on Zhihu (知乎), the Chinese counterpart of Quora (200 million+ registered users). I have been officially recognized as a “Top Writer” (优秀回答者) in four fields (AI, ML, DM, and STAT). My articles have been read more than 20,000,000 times. See my Zhihu page (微调).

Interests

  • Outlier & Anomaly Detection
  • Machine Learning Systems (MLSys)
  • Automated Machine Learning
  • Scalable Machine Learning
  • Parallel Computing
  • Healthcare AI & ML for Therapeutics
  • Graph Neural Networks
  • Ensemble Learning
  • Information Systems

Education

  • Ph.D. Student in Information Systems and Management, 2019-2023

    Carnegie Mellon University

  • M.S. in Applied Computing, 2015-2017

    University of Toronto

  • B.S. in Computer Engineering (Minor in Computer Science and Math), 2015

    University of Cincinnati

  • High School Diploma, 2010

    Shanxi Experimental Secondary School 山西省实验中学

Miscellaneous

News & Travel

Oct 2021: 🌟 Reached 500 citations on Google Scholar!

NeurIPS 2021: Happy to have multiple papers accepted at NeurIPS 2021, including my latest work on Automatic Unsupervised Outlier Model Selection and my participation in two impactful ML initiatives: (i) Therapeutics Data Commons: Machine Learning Datasets and Tasks for Drug Discovery and Development by {Harvard, Gatech, and MIT} and (ii) Revisiting Time Series Outlier Detection: Definitions and Benchmarks by Rice University.

Sep 2021: Happy to have spent my summer (as a member of the PyG team) adding some cool features to PyTorch Geometric (PyG). Check out PyG 2.0!




Resources

Services

I am open to peer-review and organizing opportunities in the fields of outlier & anomaly detection, ensemble learning, clustering, ML libraries & systems, and information systems.

Journal/Conference Reviewer

Journal:

Conference:

Program Committee

Publications

See my Google Scholar, DBLP, ORCID, and ResearchGate.

Preprints & Working Papers

[w21d] Copula-Based Outlier Detection, with Zheng Li, Xiyang Hu, Nicola Botta, Cezar Ionescu, and George H. Chen. 1st round revision at TKDE. Preprint.

[w21c] A Large-scale Study on Unsupervised Outlier Model Selection: Do Internal Strategies Suffice? with Martin Q. Ma (equal contribution), Xiaorong Zhang, and Leman Akoglu. Preprint.


Peer-reviewed Papers

(2021). Automatic Unsupervised Outlier Model Selection. Advances in Neural Information Processing Systems (NeurIPS).


(2021). Revisiting Time Series Outlier Detection: Definitions and Benchmarks. Advances in Neural Information Processing Systems (NeurIPS), Datasets and Benchmarks Track.


(2020). Combining Machine Learning Models Using combo Library. Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI), demo track.


(2020). SynC: A Unified Framework for Generating Synthetic Population with Gaussian Copula. Workshops at the Thirty-Fourth AAAI Conference on Artificial Intelligence.

PDF Code PPAI Arxiv

(2018). DCSO: Dynamic Combination of Detector Scores for Outlier Ensembles. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), Workshop on Outlier Detection De-constructed (ODD).


(2017). An empirical study of touch-based authentication methods on smartwatches. Proceedings of the 2017 ACM International Symposium on Wearable Computers (Equal contribution).


Experience

Professional Positions

Visiting Student Researcher

Stanford University, Computer Science Department

May 2021 – Aug 2021 Stanford, CA, USA

Designed new GNN systems and models.

Supervised by Prof. Jure Leskovec.

Machine Learning Research Intern

IQVIA, Analytics Center of Excellence

May 2020 – Aug 2020 Boston, MA, USA

Designed new machine learning systems and models in healthcare.

Supervised by Dr. Cao (Danica) Xiao (IQVIA) and Prof. Jimeng Sun (UIUC).

Senior Consultant

PwC Canada, Consulting & Deals

Feb 2017 – Jun 2019 Toronto, ON, Canada

I was a senior consultant with the following duties:

  • Designed fraud analytic solutions for major Canadian banks and insurance firms.
  • Led applied data analytics projects, e.g., client segmentation and churn analysis.
  • Developed multiple pricing optimization models with statistical methods.

Research Associate (Intern)

PwC Canada, Consulting & Deals

May 2016 – Dec 2016 Toronto, ON, Canada

Applied research in people analytics with machine learning.

Supervised by Prof. Anthony Bonner. The project was partly supported by Mitacs-Accelerate Research and Development Funding (IT07884).

Software Engineer (Contract & Intern)

Siemens PLM Software USA

Mar 2012 – Dec 2014 Cincinnati, Ohio, USA

As a co-op student and contractor, my work included:

  • Managed a Java project to transition the LabManager system to vCloud Director.
  • Refactored outdated automation code and added new modules and JUnit test cases.
  • Led a C++ code coverage project on the Teamcenter platform to strengthen its stability.

Teaching Positions

Teaching Assistant

Carnegie Mellon University, Heinz College of Information Systems and Public Policy

Feb 2020 – Present Pittsburgh, PA, United States

I am a teaching assistant for the following courses:

  • Intro to Artificial Intelligence taught by Prof. David Steier (Fall 2020, Spring 2021, Fall 2021).
  • Statistics for IT Managers taught by Prof. Daniel Nagin (Fall 2021).

The main duties include grading assignments and giving lectures on selected topics.

Teaching Assistant

University of Toronto, Department of Computer Science

Sep 2015 – Dec 2015 Toronto, ON, Canada

I was a teaching assistant for Embedded Systems taught by Prof. Philip Anderson.

Teaching Assistant

University of Cincinnati, Department of Electrical Engineering & Computer Science

Sep 2014 – Dec 2014 Cincinnati, OH, USA

I was a teaching assistant for Introduction to Programming taught by Prof. George Purdy.

Funds and Awards

CMU GSA/Provost Conference Funding

Part of the travel grant for attending ICDM 2020.

AAAI Student Travel Grant & CMU GSA/Provost Conference Funding

Part of the travel grant for attending AAAI 2020.

Mitacs-Accelerate Research and Development Funding

Project IT07884 ($30,000): machine learning in HR analytics.

Mantei/Mae Award & Scholar

Awarded to the highest-performing students in Electrical Engineering, Computer Engineering, and Computer Science ($40,000 over four years).

University Global Award and Scholarship

Awarded to top-performing international students ($32,000 over four years).

Open-source Initiatives

I am happy to give talks on the series of tools I have built, e.g., PyOD, combo, and SUOD. I am willing to share my experience as an ML developer and researcher, especially on how to build ML systems. Please drop me a line to invite me :)

I am an enthusiastic open-source developer: I build machine learning libraries and systems. Specifically, I started the Python Outlier Detection library (PyOD) in 2018, which has become the most popular Python outlier detection toolkit. I also started combo: A Python Toolbox for Machine Learning Model Combination in July 2019; it is currently under active development.

My recent work is a new ML system called SUOD (Scalable Unsupervised Outlier Detection), which accelerates model training and prediction when a large number of outlier detectors are applied to large, high-dimensional datasets. Watches, stars, and follows are welcome!

To find more of my open-source initiatives, see my GitHub.

PyG (PyTorch Geometric)

Graph Neural Network Library for PyTorch

Therapeutics Data Commons (TDC)

Machine Learning Datasets and Tasks for Drug Discovery and Development

SUOD

SUOD: Accelerating Large-scale Unsupervised Heterogeneous Outlier Detection

combo

A Python Toolbox for Machine Learning Model Combination.

Python Outlier Detection Toolbox

PyOD: A Python Toolbox for Scalable Outlier Detection (Anomaly Detection).

Contact

WeChat (微信) @ yzhao062