I am an Assistant Professor in the Machine Learning Department at Carnegie Mellon University. I received my PhD from the Computer Science Department at Princeton University, where I was advised by Sanjeev Arora. My research interests lie in machine learning and statistics, spanning topics such as representation learning, generative models, word embeddings, variational inference and MCMC, and non-convex optimization. The broad goal of my research is a principled, mathematical understanding of statistical and algorithmic phenomena arising in modern machine learning. The easiest way to reach me is by email. My address is aristesk |

- Masked prediction tasks: a parameter identifiability view. With Bingbin Liu, Daniel Hsu and Pradeep Ravikumar.
  *Manuscript 2022.*
- Domain-Adjusted Regression or: ERM May Already Learn Features Sufficient for Out-of-Distribution Generalization. With Elan Rosenfeld and Pradeep Ravikumar.
  *Manuscript 2022.*
- The Effects of Invertibility on the Representational Complexity of Encoders in Variational Autoencoders. With Divyansh Pareek.
  *ICLR 2022.*
- Parametric Complexity Bounds for Approximating PDEs with Neural Networks. With Tanya Marwah and Zachary C. Lipton.
  *NeurIPS 2021, Spotlight.*
- Universal Approximation for Log-concave Distributions using Well-conditioned Normalizing Flows. With Holden Lee, Chirag Pabbaraju and Anish Sevekari.
  *NeurIPS 2021.*
- An Online Learning Approach to Interpolation and Extrapolation in Domain Generalization. With Elan Rosenfeld and Pradeep Ravikumar.
  *AISTATS 2022.*
- The Limitations of Limited Context for Constituency Parsing. With Yuchen Li.
  *ACL 2021.*
- The Risks of Invariant Risk Minimization. With Elan Rosenfeld and Pradeep Ravikumar.
  *ICLR 2021.*
- Representational aspects of depth and conditioning in normalizing flows. With Frederic Koehler and Viraj Mehta.
  *ICML 2021.*
- On Learning Language-Invariant Representations for Universal Machine Translation. With Han Zhao and Junjie Hu.
  *ICML 2020.*
- Benefits of Overparameterization in Single-Layer Latent Variable Generative Models. With Rares Buhai, Yoni Halpern and David Sontag.
  *ICML 2020.*
- Approximability of Discriminators Implies Diversity in GANs. With Yu Bai and Tengyu Ma.
  *ICLR 2019.*
- Representational Power of ReLU Networks and Polynomial Kernels: Beyond Worst-Case Analysis. With Frederic Koehler.
  *ICLR 2019.*
- Do GANs learn the distribution? Some theory and empirics. With Sanjeev Arora and Yi Zhang.
  *ICLR 2018.*
- Linear algebraic structure of word senses, with applications to polysemy. With Sanjeev Arora, Yuanzhi Li, Yingyu Liang and Tengyu Ma.
  *Transactions of the Association for Computational Linguistics (TACL), 2018.*
- Automated WordNet Construction Using Word Embeddings. With Mikhail Khodak, Christiane Fellbaum and Sanjeev Arora.
  *EACL Workshop on Sense, Concept and Entity Representations and their Applications, 2017.*
- On the ability of neural nets to express distributions. With Holden Lee, Rong Ge, Tengyu Ma and Sanjeev Arora.
  *COLT 2017.*
- RAND-WALK: a latent variable model approach to word embeddings. With Sanjeev Arora, Yuanzhi Li, Yingyu Liang and Tengyu Ma.
  *Transactions of the Association for Computational Linguistics (TACL), 2016.*

- Sampling Approximately Low-Rank Ising Models: MCMC meets Variational Methods. With Frederic Koehler and Holden Lee.
  *Manuscript 2022.*
- Analyzing and improving the optimization landscape of noise-contrastive estimation. With Bingbin Liu, Elan Rosenfeld and Pradeep Ravikumar.
  *ICLR 2022, Spotlight.*
- Variational autoencoders in the presence of low-dimensional data: landscape and implicit bias. With Frederic Koehler, Viraj Mehta and Chenghui Zhou.
  *ICLR 2022.*
- Contrastive learning of strong-mixing continuous-time stochastic processes. With Bingbin Liu and Pradeep Ravikumar.
  *AISTATS 2021.*
- Efficient sampling from the Bingham distribution. With Rong Ge, Holden Lee and Jianfeng Lu.
  *ALT 2021.*
- Fast Convergence for Langevin Diffusion with Matrix Manifold Structure. With Ankur Moitra.
  *Manuscript 2020.*
- Beyond Log-concavity: Provable Guarantees for Sampling Multi-modal Distributions using Simulated Tempering Langevin Monte Carlo. With Rong Ge and Holden Lee.
  *NeurIPS 2018.*
- Mean-field approximation, convex hierarchies, and the optimality of correlation rounding: a unified perspective. With Vishesh Jain and Frederic Koehler.
  *STOC 2019.*
- Provable learning of noisy-or networks. With Sanjeev Arora, Rong Ge and Tengyu Ma.
  *STOC 2017.*
- How to calculate partition functions using convex programming hierarchies: provable bounds for variational methods.
  *COLT 2016, long talk.*
- Approximate maximum entropy principles via Goemans-Williamson with applications to provable variational methods. With Yuanzhi Li.
  *NeurIPS 2016.*
- Non-negative matrix factorization using a decode-and-update approach. With Yuanzhi Li and Yingyu Liang.
  *NeurIPS 2016.*
- Recovery guarantee of weighted low-rank approximation via alternating minimization. With Yuanzhi Li and Yingyu Liang.
  *ICML 2016.*
- On some provably correct cases of variational inference for topic models. With Pranjal Awasthi.
  *NeurIPS 2015, Spotlight.*

- Sum-of-squares meets square loss: Fast rates for agnostic tensor completion. With Dylan J. Foster.
  *COLT 2019.*
- Tight algorithms and lower bounds for approximately convex optimization. With Yuanzhi Li.
  *NeurIPS 2016.*
- Label optimal regret bounds for online local learning. With Pranjal Awasthi, Moses Charikar and Kevin A. Lai.
  *COLT 2015.*

- Yuchen Li
- Bingbin Liu (co-advised with Pradeep Ravikumar)
- Tanya Marwah (co-advised with Zachary Lipton)
- Elan Rosenfeld (co-advised with Pradeep Ravikumar)

- Diffusing along manifolds of local optima via Langevin dynamics
  - Microsoft Research New England, 03/19
  - MIFODS Workshop on Learning with Complex Structure, MIT, 01/20
- Mean-field approximation and variational methods via convex relaxations
  - Harvard Physics and Computation Seminar, 10/18
  - MIT Seminar on Stochastic Processes, 11/18
- Beyond Log-concavity: Provable Guarantees for Sampling Multi-modal Distributions using Simulated Tempering Langevin Monte Carlo
  - MIT Algorithms and Complexity Seminar, 11/01/17
- Provable algorithms for learning noisy-OR networks
  - STOC (Montreal, 2017)
- Theoretical aspects of representation learning
  - Simons Institute for the Theory of Computing, 03/27/17
- New techniques for learning and inference in probabilistic graphical models
  - MIT Stochastics and Statistics Seminar, 09/08/17
  - Microsoft Research Redmond, 02/08/17
- How to calculate partition functions using convex programming hierarchies: provable bounds for variational methods
  - Stanford Theory Seminar, 02/02/17
  - Los Alamos National Laboratory, 11/07/16
  - Rutgers University, 10/19/16
  - COLT (New York City, 2016) [Video]
- On some provably correct cases of variational inference for topic models
  - NeurIPS (Montreal, 2015) [Video, talk starts circa 11:45]
- Random walks on context spaces: towards an explanation of the mysteries of semantic word embeddings
  - China Theory Week (Jiao Tong University, Shanghai, 2015)
- Label optimal regret bounds for online local learning
  - COLT (Paris, 2015) [Video]

- Instructor for 10-707 (Advanced Deep Learning) at CMU: Spring 2021
- Instructor for 10-417 (Intermediate Deep Learning) at CMU: Fall 2020
- Instructor for 10-707 (Advanced Deep Learning) at CMU: Spring 2020
- Instructor for 18.200A (Principles of Discrete Applied Mathematics) at MIT: Fall 2017/18 and Fall 2018/19