Harry Dong
PhD Student, Carnegie Mellon University
I am a fifth- and final-year Electrical and Computer Engineering (ECE) PhD student at Carnegie Mellon University (CMU), where I have the pleasure of exploring my research interests in efficient machine learning algorithms with my advisor, Professor Yuejie Chi. Currently, I am also a part-time research intern at Meta. I frequently collaborate with Beidi Chen and folks at AFRL. Prior to CMU, I graduated with High Distinction from UC Berkeley in 2021 with degrees in statistics and computer science. Before then, I grew up in the peaceful Central Valley of California.
Feel free to reach me (about anything) via email: harryd [at] andrew [dot] cmu [dot] edu
CV / Google Scholar / LinkedIn / GitHub / X (Twitter)
Research Overview
My research broadly aims to make powerful deep learning models more practical to deploy. My main focus is the algorithmic side of LLM inference efficiency and scaling, leveraging inherent structures and patterns within the architecture, data, and/or pretrained weights. Along the way, I also like to uncover and understand the subtle idiosyncrasies of these models.
I also devote a significant amount of time to investigating how LLMs and diffusion models can be applied to challenging scientific problems, particularly in materials science, where there may be physical constraints and low error tolerance.
Previously, I worked on provable optimization methods for estimation, traffic routing, and neuroscience.
Areas
AI Efficiency: Efficient algorithms/architectures to reduce inference time and costs, with a focus on LLMs.
Inference Scaling: Understanding and improving inference scaling behavior of LLMs, with a focus on reasoning.
Machine Learning in Materials Science: Scalable methods for materials data governed by underlying physical relationships.
Research Highlights
Generalized Parallel Scaling with Interdependent Generations
Harry Dong, David Brandfonbrener, Eryk Helenowski, Yun He, Mrinal Kumar, Han Fang, Yuejie Chi, Karthik Abinav Sankararaman
Preprint, 2025
Paper
Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference
Harry Dong, Xinyu Yang, Zhenyu Zhang, Zhangyang Wang, Yuejie Chi, Beidi Chen
International Conference on Machine Learning (ICML), 2024
Paper / Code
Prompt-prompted Adaptive Structured Pruning for Efficient LLM Generation
Harry Dong, Beidi Chen, Yuejie Chi
Conference on Language Modeling (COLM), 2024
Paper / Code / Oral
A Lightweight Transformer for Faster and Robust EBSD Data Collection
Harry Dong, Sean Donegan, Megna Shah, Yuejie Chi
Scientific Reports, 2024
Paper / Code
Fast and Provable Tensor Robust Principal Component Analysis via Scaled Gradient Descent
Harry Dong, Tian Tong, Cong Ma, Yuejie Chi
Information and Inference: A Journal of the IMA, 2023
Paper / Code
Awards
Wei Shen and Xuehong Zhang Presidential Fellowship (2024)
Liang Ji-Dian Graduate Fellowship (2023)
Michel and Kathy Doreau Graduate Fellowship (2023)
NSF GRFP Honorable Mention (2023)
UC Berkeley High Distinction (2021)