George H. Chen
Assistant Professor of Information Systems,
Heinz College
Affiliated Faculty,
Machine Learning Department
Carnegie Mellon University
Email: georgechen [at symbol] cmu.edu
Office: HBH 2216 (the west wing of Hamburg Hall, second floor)
About
I work on forecasting problems in healthcare and in sustainable
development, such as predicting how long a patient will stay in a
hospital, or how produce prices will change in a week at over a
thousand Indian markets. To produce forecasts, I typically use
nonparametric methods that make very few assumptions about the underlying
data. Since these methods inform interventions that can be costly and
affect people's well-being, ensuring that predictions are reliable is
essential. To this end, in addition to developing nonparametric
predictors, I also produce theory to understand when and why they work,
and I identify forecast evidence to help practitioners make decisions.
Research areas:
nonparametric prediction, survival analysis, time series forecasting,
missing data, healthcare, sustainable development
Pre-historic:
I obtained my Ph.D. in Electrical Engineering and Computer Science at
MIT, advised by Polina Golland and Devavrat Shah. My thesis was on
nonparametric machine learning methods. At MIT, I also worked on
satellite image analysis to help bring electricity to rural India, and
taught twice in Jerusalem at MEET, a program that brings together
Israeli and Palestinian high school students to learn computer science
and entrepreneurship. Between grad school and becoming faculty, I helped
develop the recommendation engine at the predictive analytics startup
Celect (since acquired by Nike) and then was a teaching postdoc in
MIT's Digital Learning Lab, where I was the primary instructor and
course developer for an edX course on computational probability and
inference. I completed my undergraduate studies at UC Berkeley, dual
majoring in Electrical Engineering and Computer Sciences, and
Engineering Mathematics and Statistics.
My CV can be found here.
Survival Analysis Tutorial
July 23, 2020:
Jeremy Weiss and I taught a tutorial on survival analysis at the 2020
Conference on Health, Inference, and Learning (CHIL):
[tutorial webpage]
Teaching (Fall 2020)
95-865 "Unstructured Data Analytics" (mini 2)
Papers
You can also find my papers listed on Google Scholar.
Working Papers
-
"Validation for Missing Data Imputation: Which Method Should I Choose?"
Shun Liao*, Yuhuai Wu*, Zhaolei Zhang, George H. Chen†, Marzyeh Ghassemi† (*, † = equal contribution respectively)
(Under review)
-
"Influence via Ethos: On the Persuasive Power of Reputation in Deliberation Online"
Emaad Ahmed Manzoor, George H. Chen, Dokyun Lee, Michael D. Smith
(Under review)
[arXiv]
-
"Consumer Behavior in the Online Classroom: Using Video Analytics and Machine Learning to Understand the Consumption of Video Courseware"
Mi Zhou, George H. Chen, Pedro Ferreira, Michael D. Smith
INFORMS Conference on Information Systems & Technology (CIST), October 2019
Workshop on Information Systems & Economics (WISE), December 2019
(Under review)
2020
-
"Neural Topic Models with Survival Supervision: Jointly Predicting Time-to-Event Outcomes and Learning How Clinical Features Relate"
Linhong Li, Ren Zuo, Amanda Coston, Jeremy C. Weiss, George H. Chen
International Conference on Artificial Intelligence in Medicine (AIME), August 2020
[arXiv] [code] [talk slides]
-
"Predicting Mortality Risk in Viral and Unspecified Pneumonia to Assist Clinicians with COVID-19 ECMO Planning"
Helen Zhou*, Cheng Cheng*, Zachary C. Lipton, George H. Chen, Jeremy C. Weiss (* = equal contribution)
International Conference on Artificial Intelligence in Medicine (AIME), August 2020
[arXiv] [code]
(Also presented at the International Conference on Machine Learning (ICML) Workshop on Machine Learning for Global Health, July 2020)
-
"Deep Kernel Survival Analysis and Subject-Specific Survival Time Prediction Intervals"
George H. Chen
Machine Learning for Healthcare (MLHC), August 2020
[arXiv] [code] [poster]
Note:
This paper is essentially a sequel to my theory paper on nearest neighbor and kernel survival analysis (ICML 2019), which left open the problem of how to automatically learn kernel functions for survival analysis other than by using random survival forests.
2019
-
"Missing Not at Random in Matrix Completion: The Effectiveness of Estimating Missingness Probabilities Under a Low Nuclear Norm Assumption"
Wei Ma*, George H. Chen* (* = equal contribution)
Neural Information Processing Systems (NeurIPS), December 2019
[arXiv] [code] [poster] [talk slides]
Note: We have a longer version in preparation analyzing a collection of missingness probability estimators, with more debiasing guarantees.
Best paper (theoretical track) at INFORMS Data Mining and Decision Analytics Workshop 2019
-
"Truck Traffic Monitoring with Satellite Images"
Lynn H. Kaack, George H. Chen, M. Granger Morgan
ACM Conference on Computing and Sustainable Societies (COMPASS), July 2019
[arXiv]
(Also presented at the International Conference on Machine Learning (ICML) Workshop on Climate Change, June 2019)
-
"Nearest Neighbor and Kernel Survival Analysis: Nonasymptotic Error Bounds and Strong Consistency Rates"
George H. Chen
International Conference on Machine Learning (ICML), June 2019
[arXiv] [code] [talk slides] [poster]
Note:
In my follow-up work at MLHC 2020, I show how to automatically learn kernel functions for survival analysis in a neural net framework, and how to use these kernel functions to help construct survival time prediction intervals.
-
"An Interpretable Produce Price Forecasting System for Small Farmers in India using Collaborative Filtering and Adaptive Nearest Neighbors"
Wei Ma, Kendall Nowocin, Niraj Marathe, George H. Chen
Information and Communication Technologies and Development (ICTD), January 2019
[arXiv]
2018
-
"Explaining the Success of Nearest Neighbor Methods in Prediction"
George H. Chen, Devavrat Shah
Foundations and Trends in Machine Learning, May 2018
[DOI]
2017
-
"Survival-Supervised Topic Modeling with Anchor Words: Characterizing Pancreatitis Outcomes"
George H. Chen, Jeremy C. Weiss
Neural Information Processing Systems (NeurIPS) Workshop on Machine Learning for Health, December 2017
[arXiv (short workshop version)]
(Also presented at Society for Medical Decision Making North American Meeting, October 2017)
-
"Toward Reducing Crop Spoilage and Increasing Small Farmer Profits in India: a Simultaneous Hardware and Software Solution"
George H. Chen, Kendall Nowocin, Niraj Marathe
Information and Communication Technologies and Development (ICTD), November 2017
[arXiv]
2015
-
"A Latent Source Model for Patch-Based Image Segmentation"
George H. Chen, Devavrat Shah, Polina Golland
Medical Image Computing and Computer-Assisted Intervention (MICCAI), October 2015
[arXiv] [paper] [poster]
Note:
For a more comprehensive exposition of this paper, consider reading Chapter 5 of my Ph.D. thesis.
-
"Latent Source Models for Nonparametric Inference"
George H. Chen
Ph.D. thesis, MIT, May 2015
[paper]
Received the George M. Sprowls award for best Ph.D. thesis in Computer Science at MIT
-
"Targeting Villages for Rural Development Using Satellite Image Analysis"
Kush R. Varshney, George H. Chen, Brian Abelson, Kendall Nowocin, Vivek Sakhrani, Ling Xu, Brian L. Spatocco
Big Data, March 2015
[paper]
2014
-
"A Latent Source Model for Online Collaborative Filtering"
(alphabetical author ordering)
Guy Bresler, George H. Chen, Devavrat Shah
Neural Information Processing Systems (NeurIPS), December 2014
[arXiv - longer version] [paper - short conference version] [poster]
Selected for spotlight (one of 62 out of 1678 submissions)
Note:
An expanded version including intuition for how collaborative filtering relates to an MAP item recommender and derivations for the examples is in Chapter 4 of my Ph.D. thesis; the notation has also been changed to be more similar to the other two papers that went toward my thesis.
2013
-
"A Latent Source Model for Nonparametric Time Series Classification"
(alphabetical author ordering)
George H. Chen, Stanislav Nikolov, Devavrat Shah
Neural Information Processing Systems (NeurIPS), December 2013
[arXiv - longer version] [paper - short conference version] [poster]
Note:
An expanded version with a lower bound on the misclassification rate and further discussion is in Chapter 3 of my Ph.D. thesis.
-
"Sparse Projections of Medical Images onto Manifolds"
George H. Chen, Christian Wachinger, Polina Golland
Information Processing in Medical Imaging (IPMI), June-July 2013
[arXiv] [paper] [poster]
2012
-
"Deformation-Invariant Sparse Coding"
George H. Chen
Master's thesis, MIT, May 2012
[paper] [poster]
2011
-
"Deformation-Invariant Sparse Coding for Modeling Spatial Variability of Functional Patterns in the Brain"
George H. Chen, Evelina G. Fedorenko, Nancy G. Kanwisher, Polina Golland
Neural Information Processing Systems (NeurIPS) Workshop on Machine Learning and Interpretation in Neuroimaging, December 2011
[paper] [talk slides]
2010
-
"Indoor Localization and Visualization Using a Human-Operated Backpack System"
Timothy Liu, Matthew Carlberg, George Chen, Jacky Chen, John Kua, Avideh Zakhor
International Conference on Indoor Positioning and Indoor Navigation (IPIN), September 2010
[paper]
-
"Indoor Localization Algorithms for a Human-Operated Backpack System"
George Chen, John Kua, Stephen Shum, Nikhil Naikal, Matthew Carlberg, Avideh Zakhor
International Symposium on 3D Data Processing, Visualization and Transmission (3DPVT), May 2010
[paper]
2009
-
"Classifying Urban Landscape in Aerial LIDAR Using 3D Shape Analysis"
Matthew Carlberg, Peiran Gao, George Chen, Avideh Zakhor
International Conference on Image Processing (ICIP), November 2009
[paper]
-
"2D Tree Detection in Large Urban Landscapes Using Aerial LIDAR Data"
George Chen, Avideh Zakhor
International Conference on Image Processing (ICIP), November 2009
[paper]
-
"Image Augmented Laser Scan Matching for Indoor Dead Reckoning"
Nikhil Naikal, John Kua, George Chen, Avideh Zakhor
International Conference on Intelligent Robots and Systems (IROS), October 2009
[paper]
Last updated 10/25/2020.