Miscellaneous

News & Travel

Oct 2019: I gave a talk on “text generation by generative models”. Check out the slides.

Oct 2019: PyOD has been downloaded more than 300,000 times!

Oct 2019: Our demo paper “Combining Machine Learning Models and Scores Using combo library” on the ML library combo has been accepted at AAAI 2020. See you in New York! Check out our demo video!


Fun Facts

[#1] I am an active software developer with more than 4,800 GitHub stars in total (ranked in the top 1,200 among 37,000,000 GitHub developers by Gitstar Ranking). I have led multiple popular open-source ML initiatives, including PyOD (more than 300,000 downloads), combo, anomaly-detection-resources, and awesome-ensemble-learning.

[#2] I am a dedicated technical writer with more than 200 articles (in Chinese) and 85,000 followers on Zhihu (知乎), often called the Chinese Quora (200 million+ registered users). Since 2018, I have been officially recognized as a “Top Zhihu Writer” (优秀回答者) in four fields (AI, ML, DM, and STAT). My articles have been read more than 8,000,000 times (statistics by Zhihu). See my Zhihu page.

[#3] From Oct 2019 to May 2020, I serve as a GSA International Student Co-Advocate, along with Sandra Fomete. The Graduate Student Assembly (GSA) is the branch of student government that represents all graduate students at CMU. I work with administrators at different levels and with student groups to improve international graduate students’ experience at CMU. My focus areas include immigration (visa and CPT/OPT), career development, and mental health, which are particularly relevant to international students. If you are a CMU graduate student who wants to discuss related issues with me, please shoot me an email.


Profile Pictures

High-resolution profile pictures can be downloaded here: Professional, Casual.

Publications

I am open to peer-review and organizing opportunities (all types of venues) in the fields of outlier & anomaly detection, ensemble learning, clustering, ML libraries & systems, and information systems. Please send me an email (zhaoy@cmu.edu) or a request through the corresponding reviewing/organizing system.

Journal Reviewer


Working Papers

[w19e] SUOD: Scalable Unsupervised Outlier Detection by Projection, Approximation, and Parallelization. With Haoping Bai (CMU), Xueying Ding (CMU), and Jianing Yang (CMU).

Short Intro: As ground truth is often absent in anomaly detection tasks, relying on a single detection model can be highly risky. As a remedy, practitioners resort to training a large group of detectors and combining them to improve both performance and stability. However, training many unsupervised models can be computationally expensive, which can make this approach infeasible in real-world applications. In this work, we propose a three-phase framework called Scalable Unsupervised Outlier Detection (SUOD) that speeds up this process through projection, approximation, and parallelization. Empirically, SUOD performs well in terms of both accuracy and efficiency. More to come.
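The combination step mentioned above (training several detectors and merging their outputs) can be illustrated with a minimal sketch. It assumes each detector emits one outlier score per sample, with higher meaning more anomalous; it z-score standardizes each detector's scores so they share a comparable scale, then averages (or takes the maximum) across detectors. This shows only the generic score-combination idea behind outlier ensembles; it is not SUOD's projection, approximation, or parallelization pipeline, and the function names and scores are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def standardize(scores: Sequence[float]) -> List[float]:
    """Z-score normalize one detector's outlier scores (higher = more anomalous)."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # A constant detector carries no ranking information; map it to zeros.
        return [0.0] * len(scores)
    return [(s - mu) / sigma for s in scores]


def combine_scores(score_lists: Sequence[Sequence[float]], method: str = "average") -> List[float]:
    """Combine several detectors' scores sample by sample after standardization.

    method="average" takes the mean of the standardized scores per sample;
    method="max" takes the largest standardized score per sample.
    """
    standardized = [standardize(s) for s in score_lists]
    reducer = {"average": mean, "max": max}[method]
    # zip(*...) regroups the scores so each tuple holds one sample's scores.
    return [reducer(per_sample) for per_sample in zip(*standardized)]


# Hypothetical scores from three detectors on five samples (illustrative only).
det_a = [0.1, 0.2, 0.15, 0.9, 0.2]
det_b = [10, 12, 11, 30, 50]
det_c = [1.0, 1.1, 0.9, 3.0, 1.2]

combined = combine_scores([det_a, det_b, det_c])
ranking = sorted(range(len(combined)), key=lambda i: combined[i], reverse=True)
print(ranking)  # sample indices, most anomalous first
```

Standardization is the key design choice here: raw scores from different detectors can live on very different scales (for instance, distances versus probabilities), and without it the detector with the widest range would dominate the combined score.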

[w19c] A New Image Super-Resolution (SR) Method (name withheld pending submission). With Yiqun Mei (UIUC).

Under Review

[w19d] Colin Wan, Zheng Li, Alicia Guo, Yue Zhao. SynC: A Unified Framework for Generating Synthetic Population with Gaussian Copula. AAAI Conference on Artificial Intelligence Workshop, 2020. Submitted, under review. Under R&R for KDD 2020.



Peer-reviewed Papers

(2019). Combining Machine Learning Models Using combo Library. AAAI Conference on Artificial Intelligence (AAAI), demo track. Accepted, to appear.


(2018). DCSO: Dynamic Combination of Detector Scores for Outlier Ensembles. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), Workshop on Outlier Detection De-constructed (ODD).


(2017). An empirical study of touch-based authentication methods on smartwatches. Proceedings of the 2017 ACM International Symposium on Wearable Computers (Equal contribution).


Software

I am happy to give talks on the tools I have built, including PyOD and combo (listed below), and to discuss my experience as an ML developer and researcher and how to build ML tools. Please drop me a line if you would like to invite me :)

I am an enthusiastic open-source developer who builds machine learning libraries and systems. Specifically, I initiated the Python Outlier Detection library (PyOD) in 2018, which has become the most popular Python outlier detection toolkit. I also initiated combo: A Python Toolbox for Machine Learning Model Combination in July 2019; it is currently under active development. Watches, stars, and follows are welcome!

combo: A Python Toolbox for Machine Learning Model Combination.

PyOD: A Python Toolbox for Scalable Outlier Detection (Anomaly Detection).

Experience

Professional Positions


Senior Consultant

PwC Canada, Consulting & Deals

Feb 2017 – Jun 2019 Toronto, ON, Canada
I was a senior consultant with the following duties:

  • Designed fraud analytics solutions for major Canadian banks and insurance firms.
  • Led applied data analytics projects, e.g., client segmentation and churn analysis.
  • Developed multiple pricing optimization models with statistical methods.

Research Associate (Intern)

PwC Canada, Consulting & Deals

May 2016 – Dec 2016 Toronto, ON, Canada

Applied research in people analytics with machine learning.

Supervised by Prof. Anthony Bonner; the project was partly supported by Mitacs-Accelerate Research and Development Funding (IT07884).


Software Engineer (Intern & Contract)

Siemens PLM Software USA

Mar 2012 – Dec 2014 Cincinnati, OH, USA
As a co-op student and contractor, my work included:

  • Managed a Java project to transition the LabManager system to vCloud Director.
  • Refactored outdated automation code and added new modules and JUnit test cases.
  • Led a C++ code coverage project on the Teamcenter platform to strengthen its stability.


Teaching Positions


Teaching Assistant

University of Toronto, Department of Computer Science

Sep 2015 – Dec 2015 Toronto, ON, Canada
I was a teaching assistant for Embedded Systems taught by Prof. Philip Anderson.

Teaching Assistant

University of Cincinnati, Department of Electrical Engineering & Computer Science

Sep 2014 – Dec 2014 Cincinnati, OH, USA
I was a teaching assistant for Introduction to Programming taught by Prof. George Purdy.

Funds and Awards

Mitacs-Accelerate Research and Development Funding

Project IT07884 ($30,000): machine learning in HR analytics.

Mantei/Mae Award & Scholar

Awarded to the highest-performing students in Electrical Engineering, Computer Engineering, and Computer Science ($40,000 over four years).

University Global Award and Scholarship

Awarded to top-performing international students ($32,000 over four years).

Contact

  • zhaoy@cmu.edu
  • Hamburg Hall 2005, 4800 Forbes Ave, Pittsburgh, PA, USA, 15213
  • Thursday 15:00 to 18:00
    Friday 15:00 to 18:00
    or by appointment