George H. Chen
Assistant Professor of Information Systems,
Heinz College
Affiliated Faculty,
Machine Learning Department
Carnegie Mellon University
Email: georgechen [at symbol] cmu.edu
Office: HBH 2216 (the west wing of Hamburg Hall, second floor)
About
I primarily work on machine learning for healthcare and for sustainable
development, with an emphasis on forecasting problems involving survival
analysis as well as time series data. A recurring theme in my work is
the use of nonparametric prediction methods that aim to make few
assumptions about the underlying data. Since these methods inform
interventions that can be costly and affect people's well-being,
ensuring that predictions are reliable is essential. To this end, in
addition to developing nonparametric predictors, I also produce theory
to understand when and why they work, and identify forecast evidence to
help practitioners make decisions.
Research areas:
nonparametric prediction, survival analysis, time series forecasting,
missing data, healthcare, sustainable development
CoolCrop:
I am a co-founder and advisor for
CoolCrop, an AgriTech startup based
in India that provides cold storage hardware for farmers, along with
market forecasts to help them make decisions about
business operations.
Pre-historic:
I obtained my Ph.D. in Electrical
Engineering and Computer Science at
MIT, advised by
Polina Golland and
Devavrat Shah. My
thesis was on
nonparametric machine learning methods. At MIT, I also worked on
satellite
image analysis to help bring electricity to rural India, and
taught twice in Jerusalem at
MEET, a program that brings together Israeli
and Palestinian high school students to learn computer science and
entrepreneurship. Between grad school and becoming faculty, I helped
develop the recommendation engine at the predictive analytics startup
Celect (since acquired by Nike) and then was a teaching postdoc in
MIT's Digital Learning Lab, where I was the primary instructor and
course developer for an
edX
course on computational probability and inference. I completed my
undergraduate studies at UC Berkeley, dual majoring in
Electrical Engineering and Computer Sciences, and
Engineering Mathematics and Statistics.
My CV can be found here.
Survival Analysis Tutorial
July 23, 2020:
Jeremy
Weiss and I co-taught a tutorial on survival analysis at the 2020
Conference on Health, Inference, and Learning (CHIL):
[tutorial webpage]
Teaching (Fall 2020)
95-865 "Unstructured Data Analytics" (mini 2)
Research Group
I've had the fortune of working with some wonderful students over the years. If you're interested in working with me, shoot me an email telling me what you're particularly excited about working on, why it overlaps with my research interests, and what skills you've already cultivated (if you're a master's student or an undergrad, ideally you should have already taken some machine learning and statistics courses). Note that currently I do not take on students who are not already admitted to CMU.
Current students:
- Emaad Manzoor (PhD), starting as Assistant Professor at the UW-Madison School of Business in Fall 2021
- Xinyu Yao (PhD)
- Vinayak Bhatia (PhD)
- Xiaobin Shen (MISM)
- Shahriar Noroozizadeh (master's in ML)
Past students and where they went after graduating (* = PhD student who worked with me on a secondary master's):
- Mi Zhou (PhD 2020), Assistant Professor at UBC Sauder School of Business
- *Wei Ma (master's in ML 2018), Assistant Professor at Hong Kong Polytechnic University in the Civil Engineering Department
- *Lynn H. Kaack (master's in ML 2018), postdoc at ETH Zurich in the Energy Politics Group
- Xiaotong (Maggie) Lu (MISM 2020), McKinsey
- Runtong (Fred) Yang (MISM 2019), Capital One
- Ren Zuo (MISM 2018), Cornerstone Research
- Linhong (Lexie) Li (undergrad in Statistics and Machine Learning 2020), McKinsey
- Junyan Pu (undergrad in Statistics and Machine Learning 2020), CMU master's student in CS
Papers
You can also find my papers listed on
Google Scholar.
Working Papers
-
"Validation for Missing Data Imputation: Which Method Should I Choose?"
Shun Liao*, Yuhuai Wu*, Zhaolei Zhang, George H. Chen†, Marzyeh Ghassemi† (*, † = equal contribution respectively)
(Under review)
-
"Influence via Ethos: On the Persuasive Power of Reputation in Deliberation Online"
Emaad Ahmed Manzoor, George H. Chen, Dokyun Lee, Michael D. Smith
(Under review)
[arXiv]
Best paper at AAAI Workshop on AI for Behavior Change 2021
-
"Consumer Behavior in the Online Classroom: Using Video Analytics and Machine Learning to Understand the Consumption of Video Courseware"
Mi Zhou, George H. Chen, Pedro Ferreira, Michael D. Smith
INFORMS Conference on Information Systems & Technology (CIST), October 2019
Workshop on Information Systems & Economics (WISE), December 2019
(Under review)
2020
-
"Neural Topic Models with Survival Supervision: Jointly Predicting Time-to-Event Outcomes and Learning How Clinical Features Relate"
Linhong Li, Ren Zuo, Amanda Coston, Jeremy C. Weiss, George H. Chen
International Conference on Artificial Intelligence in Medicine (AIME), August 2020
[arXiv] [code] [talk slides]
-
"Predicting Mortality Risk in Viral and Unspecified Pneumonia to Assist Clinicians with COVID-19 ECMO Planning"
Helen Zhou*, Cheng Cheng*, Zachary C. Lipton, George H. Chen, Jeremy C. Weiss (* = equal contribution)
International Conference on Artificial Intelligence in Medicine (AIME), August 2020
[arXiv] [code]
(Also presented at the International Conference on Machine Learning (ICML) Workshop on Machine Learning for Global Health, July 2020)
-
"Deep Kernel Survival Analysis and Subject-Specific Survival Time Prediction Intervals"
George H. Chen
Machine Learning for Healthcare (MLHC), August 2020
[arXiv] [code] [poster]
Note:
This paper is essentially a sequel to my theory paper on nearest neighbor and kernel survival analysis (ICML 2019), which left open the problem of how to automatically learn kernel functions for survival analysis other than by using random survival forests.
2019
-
"Missing Not at Random in Matrix Completion: The Effectiveness of Estimating Missingness Probabilities Under a Low Nuclear Norm Assumption"
Wei Ma*, George H. Chen* (* = equal contribution)
Neural Information Processing Systems (NeurIPS), December 2019
[arXiv] [code] [poster] [talk slides]
Note: We have a longer version in preparation analyzing a collection of missingness probability estimators, with more debiasing guarantees.
Best paper (theoretical track) at INFORMS Data Mining and Decision Analytics Workshop 2019
-
"Truck Traffic Monitoring with Satellite Images"
Lynn H. Kaack, George H. Chen, M. Granger Morgan
ACM Conference on Computing and Sustainable Societies (COMPASS), July 2019
[arXiv]
(Also presented at the International Conference on Machine Learning (ICML) Workshop on Climate Change, June 2019)
-
"Nearest Neighbor and Kernel Survival Analysis: Nonasymptotic Error Bounds and Strong Consistency Rates"
George H. Chen
International Conference on Machine Learning (ICML), June 2019
[arXiv] [code] [talk slides] [poster]
Note:
In my follow-up work at MLHC 2020, I show how to automatically learn kernel functions for survival analysis in a neural net framework, and how to use these kernel functions to help construct survival time prediction intervals.
-
"An Interpretable Produce Price Forecasting System for Small Farmers in India using Collaborative Filtering and Adaptive Nearest Neighbors"
Wei Ma, Kendall Nowocin, Niraj Marathe, George H. Chen
Information and Communication Technologies and Development (ICTD), January 2019
[arXiv]
2018
-
"Explaining the Success of Nearest Neighbor Methods in Prediction"
George H. Chen, Devavrat Shah
Foundations and Trends in Machine Learning, May 2018
[DOI]
2017
-
"Survival-Supervised Topic Modeling with Anchor Words: Characterizing Pancreatitis Outcomes"
George H. Chen, Jeremy C. Weiss
Neural Information Processing Systems (NeurIPS) Workshop on Machine Learning for Health, December 2017
[arXiv (short workshop version)]
(Also presented at Society for Medical Decision Making North American Meeting, October 2017)
-
"Toward Reducing Crop Spoilage and Increasing Small Farmer Profits in India: a Simultaneous Hardware and Software Solution"
George H. Chen, Kendall Nowocin, Niraj Marathe
Information and Communication Technologies and Development, November 2017
[arXiv]
2015
-
"A Latent Source Model for Patch-Based Image Segmentation"
George H. Chen, Devavrat Shah, Polina Golland
Medical Image Computing and Computer-Assisted Intervention (MICCAI), October 2015
[arXiv]
[paper]
[poster]
Note:
For a more comprehensive exposition of this paper, consider
reading Chapter 5 of my
Ph.D. thesis.
-
"Latent Source Models for Nonparametric Inference"
George H. Chen
Ph.D. thesis, MIT, May 2015
[paper]
Received the George M. Sprowls award for best Ph.D. thesis in Computer Science at MIT
-
"Targeting Villages for Rural Development Using Satellite Image Analysis"
Kush R. Varshney, George H. Chen, Brian Abelson, Kendall Nowocin, Vivek Sakhrani, Ling Xu, Brian L. Spatocco
Big Data, March 2015
[paper]
2014
-
"A Latent Source Model for Online Collaborative Filtering"
(alphabetical author ordering)
Guy Bresler, George H. Chen, Devavrat Shah
Neural Information Processing Systems (NeurIPS), December 2014
[arXiv - longer version]
[paper - short conference version]
[poster]
Selected for spotlight (one of 62 out of 1678 submissions)
Note:
An expanded version including intuition for how collaborative
filtering relates to an MAP item recommender and derivations for
the examples is in Chapter 4 of my
Ph.D. thesis;
the notation has also been changed to be more similar to the
other two papers that went toward my thesis.
2013
-
"A Latent Source Model for Nonparametric Time Series Classification"
(alphabetical author ordering)
George H. Chen, Stanislav Nikolov, Devavrat Shah
Neural Information Processing Systems (NeurIPS), December 2013
[arXiv - longer version]
[paper - short conference version]
[poster]
Note:
An expanded version with a lower bound on the misclassification
rate and further discussion is in Chapter 3 of my
Ph.D. thesis.
-
"Sparse Projections of Medical Images onto Manifolds"
George H. Chen, Christian Wachinger, Polina Golland
Information Processing in Medical Imaging (IPMI), June-July 2013
[arXiv]
[paper]
[poster]
2012
-
"Deformation-Invariant Sparse Coding"
George H. Chen
Master's thesis, MIT, May 2012
[paper]
[poster]
2011
-
"Deformation-Invariant Sparse Coding for Modeling Spatial Variability of Functional Patterns in the Brain"
George H. Chen, Evelina G. Fedorenko, Nancy G. Kanwisher, Polina Golland
Neural Information Processing Systems (NeurIPS) Workshop on Machine Learning and Interpretation in Neuroimaging, December 2011
[paper]
[talk slides]
2010
-
"Indoor Localization and Visualization Using a Human-Operated Backpack System"
Timothy Liu, Matthew Carlberg, George Chen, Jacky Chen, John Kua, Avideh Zakhor
International Conference on Indoor Positioning and Indoor Navigation (IPIN), September 2010
[paper]
-
"Indoor Localization Algorithms for a Human-Operated Backpack System"
George Chen, John Kua, Stephen Shum, Nikhil Naikal, Matthew Carlberg, Avideh Zakhor
International Symposium on 3D Data Processing, Visualization and Transmission (3DPVT), May 2010
[paper]
2009
-
"Classifying Urban Landscape in Aerial LIDAR Using 3D Shape Analysis"
Matthew Carlberg, Peiran Gao, George Chen, Avideh Zakhor
International Conference on Image Processing (ICIP), November 2009
[paper]
-
"2D Tree Detection in Large Urban Landscapes Using Aerial LIDAR Data"
George Chen, Avideh Zakhor
International Conference on Image Processing (ICIP), November 2009
[paper]
-
"Image Augmented Laser Scan Matching for Indoor Dead Reckoning"
Nikhil Naikal, John Kua, George Chen, Avideh Zakhor
International Conference on Intelligent Robots and Systems (IROS), October 2009
[paper]
Last updated 10/25/2020.