I am an assistant professor of information systems at Carnegie Mellon University's Heinz College, and an affiliated faculty member of the Machine Learning Department. I work on machine learning for healthcare and for information systems in developing countries. In these applications, my work revolves around forecasting, such as predicting how long a patient will stay in a hospital, or when and where farmers in rural India should sell their crops. To produce forecasts, I typically use nonparametric methods that, instead of specifying a model for the data in advance, let the data decide what model to use, essentially through an election-like process where each data point casts a vote. Since these methods inform interventions that can be costly and affect people's well-being, ensuring that predictions are reliable and interpretable is essential. To this end, in addition to developing nonparametric predictors, I also produce theory for when and why they work, and identify forecast evidence that would be helpful to practitioners for decision making.

Email: [click here]

Office: HBH 2216 (the west wing of Hamburg Hall, second floor)

Teaching schedule (Fall 2018): I will be teaching 95-865 "Unstructured Data Analytics" during mini-2.

My book with Devavrat Shah is out: "Explaining the Success of Nearest Neighbor Methods in Prediction" (Foundations and Trends in Machine Learning, May 2018). Although nearest neighbor methods appear in text as early as the 11th century, in Alhazen's "Book of Optics", arguably the most general nonasymptotic theory for nearest neighbor classification was developed only fairly recently, by Chaudhuri and Dasgupta (2014). This book goes over some of the latest nonasymptotic theoretical guarantees for nearest neighbor and related kernel regression and classification methods, both in general metric spaces and in contemporary applications where clustering structure appears (time series forecasting, recommendation systems, medical image segmentation). The book also covers some recent advances in approximate nearest neighbor search, explains why decision tree and related ensemble methods are nearest neighbor methods, and discusses the potential for far away neighbors to help in prediction. We also organized a related workshop at NIPS 2017 (slides are available for all the talks).

Current Projects


Before joining Carnegie Mellon, I finished my Ph.D. in Electrical Engineering and Computer Science at MIT, advised by Polina Golland and Devavrat Shah. My thesis was on nonparametric machine learning methods for forecasting viral news, recommending products to people, and finding human organs in medical images. I also worked on satellite image analysis to help bring electricity to rural India, and modeled brain activation patterns evoked by reading sentences. Between grad school and becoming faculty, I helped develop the recommendation engine at the predictive analytics startup Celect and then was a teaching postdoc in MIT's Digital Learning Lab, where I was the primary instructor and course developer for a new edX course on computational probability and inference.

I enjoy teaching and pondering the future of education! I have previously taught at MIT, UC Berkeley, and in Jerusalem at MEET, a program that brings together Israeli and Palestinian high school students. As a grad student, I served on the Task Force on the Future of MIT Education, and my time as a teaching postdoc was all about better understanding the digital learning space.

Last updated October 5, 2018.