Note: "♣" denotes an author list that is alphabetical by last name, as is customary in fields like math and theoretical computer science.

You can also find my papers listed on Google Scholar.

Nonasymptotic theory for nearest neighbor methods in prediction

Although nearest neighbor methods appear in text as early as the 11th century in Alhazen's "Book of Optics", arguably the most general nonasymptotic theory for nearest neighbor classification was developed only fairly recently, by Chaudhuri and Dasgupta (2014). I've worked on a book that covers some of the latest nonasymptotic theoretical guarantees for nearest neighbor and related kernel regression and classification methods, both in general metric spaces and in contemporary applications where clustering structure appears (time series forecasting, recommendation systems, medical image segmentation). The book also covers some recent advances in approximate nearest neighbor search, explains why decision trees and related ensemble methods are nearest neighbor methods, and discusses the potential for far-away neighbors to help in prediction. I helped organize a related workshop at NIPS 2017 (slides are available for all the talks).

[Image: nearest neighbor survey book thumbnail]
  • "Explaining the Success of Nearest Neighbor Methods in Prediction"
    George H. Chen, Devavrat Shah
    Foundations and Trends in Machine Learning, May 2018
    [DOI]

Chapter 5 of the above book is on theoretical results using clustering structure. This chapter is based on my PhD thesis and provides a better overview than my thesis does. Proofs for the chapter are deferred to my thesis:

  • "Latent Source Models for Nonparametric Inference"
    George H. Chen
    Ph.D. thesis, MIT, May 2015
    [paper]
    Received the George M. Sprowls award for best Ph.D. thesis in Computer Science at MIT

My thesis unifies and builds on the following trilogy of papers:

  • "A Latent Source Model for Patch-Based Image Segmentation"
    George H. Chen, Devavrat Shah, Polina Golland
    Medical Image Computing and Computer-Assisted Intervention, October 2015
    [arXiv] [paper] [poster]
    Note: For a more comprehensive exposition of this paper, consider reading Chapter 5 of my Ph.D. thesis.
  • "A Latent Source Model for Online Collaborative Filtering"
    ♣ Guy Bresler, George H. Chen, Devavrat Shah
    Neural Information Processing Systems, December 2014
    [arXiv - longer version] [paper - short conference version] [poster]
    Selected for spotlight (one of 62 spotlights among 1678 submissions)
    Note: An expanded version including intuition for how collaborative filtering relates to an MAP item recommender and derivations for the examples is in Chapter 4 of my Ph.D. thesis; the notation has also been changed to be more similar to the rest of the trilogy of papers.
  • "A Latent Source Model for Nonparametric Time Series Classification"
    George H. Chen, Stanislav Nikolov, Devavrat Shah
    Neural Information Processing Systems, December 2013
    [arXiv - longer version] [paper - short conference version] [poster]
    Note: An expanded version with a lower bound on the misclassification rate and further discussion is in Chapter 3 of my Ph.D. thesis.

Forecasting patient outcomes in electronic health records

I'm working on topic modeling combined with survival analysis, currently focusing on patients with pancreatitis admitted to the intensive care unit:

  • "Survival-Supervised Topic Modeling with Anchor Words: Characterizing Pancreatitis Outcomes"
    George H. Chen, Jeremy C. Weiss
    Neural Information Processing Systems Workshop on Machine Learning for Health, December 2017
    [arXiv (short workshop version)] [longer version in preparation]
    (Also presented at an abstract-only venue, the Society for Medical Decision Making North American Meeting, October 2017)

Rural development

With a startup called CoolCrop, I am working on providing small and marginal farmers in rural India with access to cost-effective refrigeration and predictive analytics:

  • "Toward Reducing Crop Spoilage and Increasing Small Farmer Profits in India: a Simultaneous Hardware and Software Solution"
    George H. Chen, Kendall Nowocin, Niraj Marathe
    Information and Communication Technologies for Development, November 2017
    [arXiv]

Previously, as part of a startup called GridForm, I analyzed satellite images of enormous tracts of land to help plan development projects. We focused on helping renewable energy companies bring electricity to rural India, and we won the $10,000 grand prize at the 2014 MIT IDEAS Global Challenge. Here's a joint paper with Kush Varshney and Brian Abelson of DataKind:

  • "Targeting Villages for Rural Development Using Satellite Image Analysis"
    Kush R. Varshney, George H. Chen, Brian Abelson, Kendall Nowocin, Vivek Sakhrani, Ling Xu, Brian L. Spatocco
    Big Data, March 2015
    [paper]

Real-time medical image analysis

Various real-time medical imaging applications could be enabled by speeding up dimensionality reduction, a subroutine used in many image analysis algorithms. To do this, we create a sparse description of a manifold; our work relates to sparse multivariate regression:

[Image: sparsification graphic]
  • "Sparse Projections of Medical Images onto Manifolds"
    George H. Chen, Christian Wachinger, Polina Golland
    Information Processing in Medical Imaging, June-July 2013
    [arXiv] [paper] [poster]

Modeling brain activation patterns

My master's thesis presented a probabilistic model of brain activation patterns evoked by functional stimuli such as reading sentences; the model combines sparse coding and image alignment:

[Image: deformation-invariant sparse coding graphic]
  • "Deformation-Invariant Sparse Coding"
    George H. Chen
    Master's thesis, MIT, May 2012
    [paper] [poster]

Preliminary version:

  • "Deformation-Invariant Sparse Coding for Modeling Spatial Variability of Functional Patterns in the Brain"
    George H. Chen, Evelina G. Fedorenko, Nancy G. Kanwisher, Polina Golland
    Neural Information Processing Systems Workshop on Machine Learning and Interpretation in Neuroimaging, December 2011
    [paper] [talk slides]

Backpack with sensors for indoor modeling

I developed algorithms that use laser scanners to track where this fancy backpack is indoors. After I graduated from Berkeley, this project progressed quite a bit! Be sure to check out the latest developments on the Video and Image Processing Lab's website. Preliminary results:

[Image: photo of backpack with sensors]
  • "Indoor Localization and Visualization Using a Human-Operated Backpack System"
    Timothy Liu, Matthew Carlberg, George Chen, Jacky Chen, John Kua, Avideh Zakhor
    International Conference on Indoor Positioning and Indoor Navigation, September 2010
    [paper]
  • "Indoor Localization Algorithms for a Human-Operated Backpack System"
    George Chen, John Kua, Stephen Shum, Nikhil Naikal, Matthew Carlberg, Avideh Zakhor
    International Symposium on 3D Data Processing, Visualization and Transmission, May 2010
    [paper]
  • "Image Augmented Laser Scan Matching for Indoor Dead Reckoning"
    Nikhil Naikal, John Kua, George Chen, Avideh Zakhor
    International Conference on Intelligent Robots and Systems, October 2009
    [paper]

Analyzing aerial images of cities

How to automatically find buildings, trees, ground, and water in aerial LIDAR images:

[Image: example labeling of LIDAR image]
  • "Classifying Urban Landscape in Aerial LIDAR Using 3D Shape Analysis"
    Matthew Carlberg, Peiran Gao, George Chen, Avideh Zakhor
    International Conference on Image Processing, November 2009
    [paper]
  • "2D Tree Detection in Large Urban Landscapes Using Aerial LIDAR Data"
    George Chen, Avideh Zakhor
    International Conference on Image Processing, November 2009
    [paper]

2018

  • "Explaining the Success of Nearest Neighbor Methods in Prediction"
    George H. Chen, Devavrat Shah
    Foundations and Trends in Machine Learning, May 2018
    [DOI]

2017

  • "Survival-Supervised Topic Modeling with Anchor Words: Characterizing Pancreatitis Outcomes"
    George H. Chen, Jeremy C. Weiss
    Neural Information Processing Systems Workshop on Machine Learning for Health, December 2017
    [arXiv (short workshop version)] [longer version in preparation]
    (Also presented at an abstract-only venue, the Society for Medical Decision Making North American Meeting, October 2017)
  • "Toward Reducing Crop Spoilage and Increasing Small Farmer Profits in India: a Simultaneous Hardware and Software Solution"
    George H. Chen, Kendall Nowocin, Niraj Marathe
    Information and Communication Technologies for Development, November 2017
    [arXiv]

2015

  • "A Latent Source Model for Patch-Based Image Segmentation"
    George H. Chen, Devavrat Shah, Polina Golland
    Medical Image Computing and Computer-Assisted Intervention, October 2015
    [arXiv] [paper] [poster]
    Note: For a more comprehensive exposition of this paper, consider reading Chapter 5 of my Ph.D. thesis.
  • "Latent Source Models for Nonparametric Inference"
    George H. Chen
    Ph.D. thesis, MIT, May 2015
    [paper]
    Received the George M. Sprowls award for best Ph.D. thesis in Computer Science at MIT
  • "Targeting Villages for Rural Development Using Satellite Image Analysis"
    Kush R. Varshney, George H. Chen, Brian Abelson, Kendall Nowocin, Vivek Sakhrani, Ling Xu, Brian L. Spatocco
    Big Data, March 2015
    [paper]

2014

  • "A Latent Source Model for Online Collaborative Filtering"
    ♣ Guy Bresler, George H. Chen, Devavrat Shah
    Neural Information Processing Systems, December 2014
    [arXiv - longer version] [paper - short conference version] [poster]
    Selected for spotlight (one of 62 spotlights among 1678 submissions)
    Note: An expanded version including intuition for how collaborative filtering relates to an MAP item recommender and derivations for the examples is in Chapter 4 of my Ph.D. thesis; the notation has also been changed to be more similar to the other two papers that went toward my thesis.

2013

  • "A Latent Source Model for Nonparametric Time Series Classification"
    George H. Chen, Stanislav Nikolov, Devavrat Shah
    Neural Information Processing Systems, December 2013
    [arXiv - longer version] [paper - short conference version] [poster]
    Note: An expanded version with a lower bound on the misclassification rate and further discussion is in Chapter 3 of my Ph.D. thesis.
  • "Sparse Projections of Medical Images onto Manifolds"
    George H. Chen, Christian Wachinger, Polina Golland
    Information Processing in Medical Imaging, June-July 2013
    [arXiv] [paper] [poster]

2012

  • "Deformation-Invariant Sparse Coding"
    George H. Chen
    Master's thesis, MIT, May 2012
    [paper] [poster]

2011

  • "Deformation-Invariant Sparse Coding for Modeling Spatial Variability of Functional Patterns in the Brain"
    George H. Chen, Evelina G. Fedorenko, Nancy G. Kanwisher, Polina Golland
    Neural Information Processing Systems Workshop on Machine Learning and Interpretation in Neuroimaging, December 2011
    [paper] [talk slides]

2010

  • "Indoor Localization and Visualization Using a Human-Operated Backpack System"
    Timothy Liu, Matthew Carlberg, George Chen, Jacky Chen, John Kua, Avideh Zakhor
    International Conference on Indoor Positioning and Indoor Navigation, September 2010
    [paper]
  • "Indoor Localization Algorithms for a Human-Operated Backpack System"
    George Chen, John Kua, Stephen Shum, Nikhil Naikal, Matthew Carlberg, Avideh Zakhor
    International Symposium on 3D Data Processing, Visualization and Transmission, May 2010
    [paper]

2009

  • "Classifying Urban Landscape in Aerial LIDAR Using 3D Shape Analysis"
    Matthew Carlberg, Peiran Gao, George Chen, Avideh Zakhor
    International Conference on Image Processing, November 2009
    [paper]
  • "2D Tree Detection in Large Urban Landscapes Using Aerial LIDAR Data"
    George Chen, Avideh Zakhor
    International Conference on Image Processing, November 2009
    [paper]
  • "Image Augmented Laser Scan Matching for Indoor Dead Reckoning"
    Nikhil Naikal, John Kua, George Chen, Avideh Zakhor
    International Conference on Intelligent Robots and Systems, October 2009
    [paper]