I am pursuing a Joint Ph.D. degree in Machine Learning & Public Policy at Carnegie Mellon University. Before coming to CMU, I spent more than five years in industry as a software engineer and management consultant. See my professional experience. My advisors are Prof. Leman Akoglu and Prof. Amelia Haviland.
My interests lie on the applied side of learning algorithms. I especially care about why, when, and how to use learning models to bring social impact. In addition to proposing new ensemble learning, outlier detection, and clustering algorithms, I design and implement accessible and scalable machine learning systems and libraries. More importantly, I enjoy applying learning algorithms to solve real-world problems (e.g., in healthcare, education, security, and finance), i.e., building applications and understanding their implications.
I am always open to collaboration opportunities (anywhere on earth) and applied research internships (United States (CPT), Canada (Residency), and China (Residency); I will need visa sponsorship for other countries). I have been working with researchers from both industry and academia (U of Toronto, UIUC, PwC, etc.). Please do not hesitate to reach out by email (zhaoy [AT] cmu.edu) or WeChat (微信).
If you would like to meet in person for a coffee (it is on me), just drop me a line. I know several excellent coffee shops around CMU/Pitt that you would not want to miss :)
Joint Ph.D. in Machine Learning & Public Policy (Expected); Ph.D. in Information Systems & Management (Primary), 2019-2024
Carnegie Mellon University
M.S. in Applied Computing, 2016
University of Toronto
B.S. in Computer Engineering (Minor in Computer Science and Math), 2015
University of Cincinnati
High School Diploma, 2010
Shanxi Experimental Secondary School 山西省实验中学
Oct 2019: I gave a talk on “text generation by generative models”. Check out the slides.
Oct 2019: PyOD has been downloaded more than 300,000 times!
Oct 2019: Our demo paper “Combining Machine Learning Models and Scores Using combo library”, on the ML library combo, has been accepted at AAAI 2020. See you in New York! Check out our Demo Video!
[#1] I am an active software developer with more than 4,800 GitHub stars in total (top 1,200 among 37,000,000 GitHub developers ranked by Gitstar Ranking). I have led multiple popular open-source ML initiatives, including PyOD (more than 300,000 total downloads), combo, anomaly-detection-resources, and awesome-ensemble-learning.
[#2] I am a dedicated technical writer with more than 200 articles (in Chinese) and 85,000 followers on Zhihu (知乎), the Chinese counterpart of Quora (200 million+ registered users). Since 2018, I have been officially recognized as a “Top Zhihu Writer” (优秀回答者) in four fields (AI, ML, DM, and STAT). My articles have been read more than 8,000,000 times (statistics by Zhihu). See my Zhihu page.
[#3] From Oct 2019 to May 2020, I am serving as a GSA International Student Co-Advocate, along with Sandra Fomete. The Graduate Student Assembly (GSA) is the branch of student government that represents all graduate students at CMU. I am working with administrators at different levels and with student groups to improve international graduate students’ experience at CMU. My core focus includes immigration (visa and CPT/OPT), career development, and mental health, issues that are particularly relevant to international students. If you are a CMU graduate student who wants to discuss related issues with me, please send me an email.
I am open to peer review and organizing opportunities (all types of venues) in the fields of outlier & anomaly detection, ensemble learning, clustering, ML libraries & systems, and information systems. Please send me an email (email@example.com) or a request in the corresponding reviewing/organizing system.
[w19e] SUOD: Scalable Unsupervised Outlier Detection by Projection, Approximation, and Parallelization. With Haoping Bai (CMU), Xueying Ding (CMU), and Jianing Yang (CMU).
Short Intro: Because ground truth is often absent in anomaly detection tasks, relying on a single detection model can be risky. As a remedy, practitioners resort to training a large group of detectors and combining them to improve both performance and stability. However, training many unsupervised models can be computationally expensive, which makes this approach infeasible in many real-world applications. In this work, we propose a three-phase framework called Scalable Unsupervised Outlier Detection (SUOD) to speed up this process. Empirically, the proposed SUOD framework shows great performance in both accuracy and efficiency. More to come.
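To illustrate the general idea of combining many unsupervised detectors, here is a minimal sketch in plain Python. It is not the SUOD algorithm: it simply averages standardized k-nearest-neighbour distance scores computed on random Gaussian projections of the data. All function names, parameter values, and the toy data are assumptions made for this illustration.

```python
import math
import random
import statistics


def random_projection(data, out_dim, rng):
    """Project each row of `data` onto `out_dim` random Gaussian directions."""
    in_dim = len(data[0])
    matrix = [[rng.gauss(0.0, 1.0) / math.sqrt(out_dim) for _ in range(in_dim)]
              for _ in range(out_dim)]
    return [[sum(w * x for w, x in zip(row, point)) for row in matrix]
            for point in data]


def knn_scores(data, k):
    """Score each point by the distance to its k-th nearest neighbour."""
    scores = []
    for i, p in enumerate(data):
        dists = sorted(math.dist(p, q) for j, q in enumerate(data) if j != i)
        scores.append(dists[k - 1])
    return scores


def zscore(scores):
    """Standardize scores so that different detectors are comparable."""
    mean = statistics.fmean(scores)
    std = statistics.pstdev(scores) or 1.0
    return [(s - mean) / std for s in scores]


def ensemble_scores(data, ks=(3, 5, 7), out_dim=3, seed=0):
    """Average standardized scores of one kNN detector per value in `ks`,
    each fitted on its own random projection (hypothetical settings)."""
    rng = random.Random(seed)
    combined = [0.0] * len(data)
    for k in ks:
        projected = random_projection(data, out_dim, rng)
        for i, z in enumerate(zscore(knn_scores(projected, k))):
            combined[i] += z / len(ks)
    return combined


# Toy data: 40 Gaussian inliers in 6 dimensions plus one planted outlier.
rng = random.Random(42)
inliers = [[rng.gauss(0.0, 1.0) for _ in range(6)] for _ in range(40)]
data = inliers + [[25.0] * 6]

scores = ensemble_scores(data)
top = max(range(len(data)), key=scores.__getitem__)
print("index with the highest combined score:", top)
```

Averaging standardized scores is only one simple way to combine detectors; in this toy setup the planted point (the last row) is expected to receive the highest combined score. The brute-force neighbour search here is quadratic in the number of points, which hints at why scaling up a large group of detectors needs extra care.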
[w19c] A New Image Super-Resolution (SR) Method (name masked due to upcoming submission). With Yiqun Mei (UIUC).
[w19d] Colin Wan, Zheng Li, Alicia Guo, Yue Zhao. SynC: A Unified Framework for Generating Synthetic Population with Gaussian Copula. AAAI Conference on Artificial Intelligence Workshop, 2020. Submitted, under review. Under R&R for KDD 2020.
It is my pleasure to give talks on the series of tools I have built, including PyOD and combo, listed below. I am also happy to discuss my experience as an ML developer and researcher and how to build ML tools. Please drop me a line to invite me :)
I am an enthusiastic open-source developer: I build machine learning libraries and systems. Specifically, I started the Python Outlier Detection library (PyOD) in 2018, which has become the most popular Python outlier detection toolkit. I also started combo: A Python Toolbox for Machine Learning Model Combination in July 2019; it is currently under active development. Watch/Star/Follow welcome!
Applied research in people analytics with machine learning.
Supervised by Prof. Anthony Bonner; the project is partly supported by Mitacs-Accelerate Research and Development Funding (IT07884).