I am an assistant professor of information systems at Carnegie Mellon University's Heinz College, and an affiliated faculty member of the Machine Learning Department. I work on machine learning for healthcare and for information systems in developing countries. In these applications, my work revolves around forecasting, such as predicting how long a patient will stay in a hospital, or how produce prices will change in a week at over a thousand Indian markets. To produce forecasts, I typically use nonparametric methods that, instead of specifying a model for the data in advance, let the data decide what model to use, essentially through an election-like process where each data point casts a vote. Since these methods inform interventions that can be costly and affect people's well-being, ensuring that predictions are reliable and interpretable is essential. To this end, in addition to developing nonparametric predictors, I also produce theory for when and why they work, and identify forecast evidence that would help practitioners make decisions.
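The "election-like" idea above can be illustrated with a minimal nearest neighbor regression sketch (this is an illustrative toy, not code from my papers; the function name, features, and numbers are made up for the example): to forecast at a query point, the k closest training points each cast an equal-weight vote, and the prediction is the average of their labels.

```python
import numpy as np

def knn_forecast(X_train, y_train, x_query, k=3):
    """Predict y at x_query by letting the k nearest training
    points each cast an equal-weight vote (their label average)."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    # Euclidean distance from the query point to every training point
    dists = np.linalg.norm(X_train - np.asarray(x_query, dtype=float), axis=1)
    # The k closest training points are the "voters"
    nearest = np.argsort(dists)[:k]
    return y_train[nearest].mean()

# Toy example: forecast a length of stay (days) from two patient features
X = [[1.0, 0.0], [1.1, 0.1], [5.0, 5.0], [5.2, 4.9]]
y = [2.0, 4.0, 10.0, 12.0]
print(knn_forecast(X, y, [1.05, 0.05], k=2))  # averages the two nearby labels -> 3.0
```

No model is specified in advance: the prediction is determined entirely by which training points sit near the query, which is what makes the method nonparametric.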
Prospective PhD students: If you are interested in working with me but have not already been accepted into a PhD program at Carnegie Mellon University, then apply to a CMU PhD program first instead of contacting me (PhD admissions are done at the department or college level, depending on the program, and are not done by me individually, so I cannot grant you admission or say whether you will get accepted). Specifically within CMU's Heinz College, the PhD programs most relevant to my work are the joint PhD program in Machine Learning and Public Policy and the PhD program in Information Systems and Management. If you are already a CMU PhD student and would like to work with me, feel free to contact me directly.
Email: georgechen [at symbol] cmu.edu
Office: HBH 2216 (the west wing of Hamburg Hall, second floor)
My book with Devavrat Shah is out: "Explaining the Success of Nearest Neighbor Methods in Prediction" (Foundations and Trends in Machine Learning, May 2018). Despite nearest neighbor methods appearing in text as early as the 11th century in Alhazen's "Book of Optics", it was not until fairly recently that arguably the most general nonasymptotic theory for nearest neighbor classification was developed by Chaudhuri and Dasgupta (2014). This book surveys some of the latest nonasymptotic theoretical guarantees for nearest neighbor and related kernel regression and classification methods, both in general metric spaces and in contemporary applications where clustering structure appears (time series forecasting, recommendation systems, medical image segmentation). The book also covers some recent advances in approximate nearest neighbor search, explains why decision tree and related ensemble methods are nearest neighbor methods, and discusses the potential for far away neighbors to help in prediction. We also organized a related workshop at NeurIPS 2017 (slides are available for all the talks).
Before joining Carnegie Mellon, I finished my Ph.D. in Electrical Engineering and Computer Science at MIT, advised by Polina Golland and Devavrat Shah. My thesis was on nonparametric machine learning methods for forecasting viral news, recommending products to people, and finding human organs in medical images. I also worked on satellite image analysis to help bring electricity to rural India, and modeled brain activation patterns evoked by reading sentences. Between grad school and becoming faculty, I helped develop the recommendation engine at the predictive analytics startup Celect and then was a teaching postdoc in MIT's Digital Learning Lab, where I was the primary instructor and course developer for a new edX course on computational probability and inference.
I enjoy teaching and pondering the future of education! I have previously taught at MIT, at UC Berkeley, and in Jerusalem at MEET, a program that brings together Israeli and Palestinian high school students. As a grad student, I served on the Task Force on the Future of MIT Education, and my time as a teaching postdoc was all about better understanding the digital learning space.
Last updated March 29, 2019.