Nonasymptotic theory for nearest neighbor methods in prediction
Despite nearest neighbor methods appearing in text as early as the
11th century in Alhazen's "Book of Optics", it was not until fairly
recently that arguably the most general, nonasymptotic theory for
nearest neighbor classification was developed by
Chaudhuri and Dasgupta (2014).
I've worked on a book that goes over some of the latest nonasymptotic theoretical guarantees
for nearest neighbor and related kernel regression and classification methods,
both in general metric spaces and in contemporary applications where
clustering structure appears (time series forecasting, recommendation
systems, medical image segmentation). The book also covers some recent
advances in approximate nearest neighbor search, explains why
decision trees and related ensemble methods are nearest neighbor
methods, and discusses the potential for far-away neighbors to help
in prediction. I helped organize a related workshop at NIPS 2017
(slides are available for all the talks).
-
"Explaining the Success of Nearest Neighbor Methods in Prediction"
George H. Chen, Devavrat Shah
Foundations and Trends in Machine Learning, May 2018
[DOI]
Chapter 5 of the above book is on theoretical results using clustering
structure. This chapter is based on my PhD thesis and provides a better
overview than my thesis does. Proofs for the chapter are deferred to my thesis:
-
"Latent Source Models for Nonparametric Inference"
George H. Chen
Ph.D. thesis, MIT, May 2015
[paper]
Received the George M. Sprowls award for best Ph.D. thesis in Computer Science at MIT
My thesis unifies and builds on the following trilogy of papers:
-
"A Latent Source Model for Patch-Based Image Segmentation"
George H. Chen, Devavrat Shah, Polina Golland
Medical Image Computing and Computer-Assisted Intervention, October 2015
[arXiv]
[paper]
[poster]
Note:
For a more comprehensive exposition of this paper, consider
reading Chapter 5 of my Ph.D. thesis.
-
"A Latent Source Model for Online Collaborative Filtering"
♣
Guy Bresler, George H. Chen, Devavrat Shah
Neural Information Processing Systems, December 2014
[arXiv - longer version]
[paper - short conference version]
[poster]
Selected for spotlight (one of 62/1678 submissions)
Note:
An expanded version including intuition for how collaborative
filtering relates to an MAP item recommender and derivations for
the examples is in Chapter 4 of my Ph.D. thesis;
the notation has also been changed to be more similar to the
rest of the trilogy of papers.
-
"A Latent Source Model for Nonparametric Time Series Classification"
♣
George H. Chen, Stanislav Nikolov, Devavrat Shah
Neural Information Processing Systems, December 2013
[arXiv - longer version]
[paper - short conference version]
[poster]
Note:
An expanded version with a lower bound on the misclassification
rate and further discussion is in Chapter 3 of my Ph.D. thesis.
Forecasting patient outcomes in electronic health records
I'm working on topic modeling combined with survival analysis,
currently focusing on patients with pancreatitis admitted to the
intensive care unit:
-
"Survival-Supervised Topic Modeling with Anchor Words: Characterizing Pancreatitis Outcomes"
George H. Chen, Jeremy C. Weiss
Neural Information Processing Systems Workshop on Machine Learning for Health, December 2017
[arXiv (short workshop version)] [longer version in preparation]
(Also presented at an abstract-only venue: Society for Medical Decision Making
North American Meeting, October 2017)
Rural development
With a startup called CoolCrop, I am working on providing small and marginal
farmers in rural India with access to cost-effective refrigeration and
predictive analytics:
-
"Toward Reducing Crop Spoilage and Increasing Small Farmer Profits in India: a Simultaneous Hardware and Software Solution"
George H. Chen, Kendall Nowocin, Niraj Marathe
Information and Communication Technologies for Development, November 2017
[arXiv]
Previously, as part of the startup GridForm,
I analyzed satellite images of enormous tracts of land to help plan
development projects. We focused on helping renewable energy companies
bring electricity to rural India. We won the $10,000 grand prize at the
2014 MIT IDEAS Global Challenge. Here's a joint paper with
Kush Varshney and Brian Abelson of DataKind:
-
"Targeting Villages for Rural Development Using Satellite Image Analysis"
Kush R. Varshney, George H. Chen, Brian Abelson, Kendall Nowocin, Vivek Sakhrani, Ling Xu, Brian L. Spatocco
Big Data, March 2015
[paper]
Real-time medical image analysis
Various real-time medical imaging applications could be enabled by speeding up
dimensionality reduction, a subroutine used in many image analysis algorithms.
To do this, we create a sparse description of a manifold; our work relates to
sparse multivariate regression:
-
"Sparse Projections of Medical Images onto Manifolds"
George H. Chen, Christian Wachinger, Polina Golland
Information Processing in Medical Imaging, June-July 2013
[arXiv]
[paper]
[poster]
Modeling brain activation patterns
My master's thesis presented a probabilistic model of brain
activation patterns evoked by functional stimuli such as reading
sentences; the model combines sparse coding and image alignment:
-
"Deformation-Invariant Sparse Coding"
George H. Chen
Master's thesis, MIT, May 2012
[paper]
[poster]
Preliminary version:
-
"Deformation-Invariant Sparse Coding for Modeling Spatial Variability of Functional Patterns in the Brain"
George H. Chen, Evelina G. Fedorenko, Nancy G. Kanwisher, Polina Golland
Neural Information Processing Systems Workshop on Machine Learning and Interpretation in Neuroimaging, December 2011
[paper]
[talk slides]
Backpack with sensors for indoor modeling
I developed algorithms that use laser scanners to track where this fancy
backpack is indoors.
This project has progressed quite a bit since I graduated from Berkeley!
Be sure to check out the latest developments on the
Video and Image Processing Lab's website.
Preliminary results:
-
"Indoor Localization and Visualization Using a Human-Operated Backpack System"
Timothy Liu, Matthew Carlberg, George Chen, Jacky Chen, John Kua, Avideh Zakhor
International Conference on Indoor Positioning and Indoor Navigation, September 2010
[paper]
-
"Indoor Localization Algorithms for a Human-Operated Backpack System"
George Chen, John Kua, Stephen Shum, Nikhil Naikal, Matthew Carlberg, Avideh Zakhor
International Symposium on 3D Data Processing, Visualization and Transmission, May 2010
[paper]
-
"Image Augmented Laser Scan Matching for Indoor Dead Reckoning"
Nikhil Naikal, John Kua, George Chen, Avideh Zakhor
International Conference on Intelligent Robots and Systems, October 2009
[paper]
Analyzing aerial images of cities
How to automatically find buildings, trees, ground, and water in
aerial LIDAR images:
-
"Classifying Urban Landscape in Aerial LIDAR Using 3D Shape Analysis"
Matthew Carlberg, Peiran Gao, George Chen, Avideh Zakhor
International Conference on Image Processing, November 2009
[paper]
-
"2D Tree Detection in Large Urban Landscapes Using Aerial LIDAR Data"
George Chen, Avideh Zakhor
International Conference on Image Processing, November 2009
[paper]