Stephen McAleer

Postdoc at Carnegie Mellon University

Contact: smcaleer@cs.cmu.edu
Google Scholar | CV | Twitter


I am broadly interested in algorithms for robust sequential decision-making, drawing from reinforcement learning, search, and game theory. My long-term goal is to create an agent that can do anything that a human can do on a computer. Toward this goal, I am currently working on foundation models for decision-making and AI alignment.

I am a postdoc at CMU working with Tuomas Sandholm. I received a PhD in computer science from the University of California, Irvine, where I worked with Pierre Baldi. During my PhD, I did research scientist internships at Intel Labs and DeepMind. Before that, I received my bachelor's degree in mathematics and economics from Arizona State University in 2017.

I am currently on the job market! Please reach out if you think I would be a good fit.


Research

Language Models can Solve Computer Tasks
Geunwoo Kim, Pierre Baldi, Stephen McAleer
NeurIPS 2023
Paper | Code

Team-PSRO for Learning Approximate TMECor in Large Team Games via Cooperative Reinforcement Learning
Stephen McAleer, Gabriele Farina, Gaoyue Zhou, Mingzhi Wang, Yaodong Yang, Tuomas Sandholm
NeurIPS 2023
Paper

ESCHER: Eschewing Importance Sampling in Games by Computing a History Value Function to Estimate Regret
Stephen McAleer, Gabriele Farina, Marc Lanctot, Tuomas Sandholm
ICLR 2023
Paper | Code

Mastering the Game of Stratego With Model-Free Multiagent Reinforcement Learning
Julien Perolat*, Bart de Vylder*, Daniel Hennes, Eugene Tarassov, Florian Strub, Vincent de Boer, Paul Muller, Jerome T Connor, Neil Burch, Thomas Anthony, Stephen McAleer, Romuald Elie, Sarah H Cen, Zhe Wang, Audrunas Gruslys, Aleksandra Malysheva, Mina Khan, Sherjil Ozair, Finbarr Timbers, Toby Pohlen, Tom Eccles, Mark Rowland, Marc Lanctot, Jean-Baptiste Lespiau, Bilal Piot, Shayegan Omidshafiei, Edward Lockhart, Laurent Sifre, Nathalie Beauguerlange, Remi Munos, David Silver, Satinder Singh, Demis Hassabis, Karl Tuyls*
Science 2022
Paper

XDO: A Double Oracle Algorithm for Extensive-Form Games
Stephen McAleer, John Lanier, Kevin A Wang, Pierre Baldi, Roy Fox
NeurIPS 2021
Paper | Code

Pipeline PSRO: A Scalable Approach for Finding Approximate Nash Equilibria in Large Games
Stephen McAleer*, John Lanier*, Roy Fox, Pierre Baldi
NeurIPS 2020
Paper | Code

Solving the Rubik's Cube With Approximate Policy Iteration
Stephen McAleer*, Forest Agostinelli*, Alexander Shmakov*, Pierre Baldi
ICLR 2018
Paper

Llemma: An Open Language Model For Mathematics
Zhangir Azerbayev, Hailey Schoelkopf, Keiran Paster, Marco Dos Santos, Stephen McAleer, Albert Q. Jiang, Jia Deng, Stella Biderman, Sean Welleck
ArXiv 2023
Paper | Code

Confronting Reward Model Overoptimization with Constrained RLHF
Ted Moskovitz, Aaditya K. Singh, DJ Strouse, Tuomas Sandholm, Ruslan Salakhutdinov, Anca D. Dragan, Stephen McAleer
ArXiv 2023
Paper

Language Models can Solve Computer Tasks
Geunwoo Kim, Pierre Baldi, Stephen McAleer
NeurIPS 2023
Paper | Code

Team-PSRO for Learning Approximate TMECor in Large Team Games via Cooperative Reinforcement Learning
Stephen McAleer, Gabriele Farina, Gaoyue Zhou, Mingzhi Wang, Yaodong Yang, Tuomas Sandholm
NeurIPS 2023
Paper

Policy Space Diversity for Non-Transitive Games
Jian Yao, Weiming Liu, Haobo Fu, Yaodong Yang, Stephen McAleer, Qiang Fu, Wei Yang
NeurIPS 2023
Paper

Computing Optimal Equilibria and Mechanisms via Learning in Zero-Sum Extensive-Form Games
Brian Hu Zhang, Gabriele Farina, Ioannis Anagnostides, Federico Cacciamani, Stephen McAleer, Andreas Alexander Haupt, Andrea Celli, Nicola Gatti, Vincent Conitzer, Tuomas Sandholm
NeurIPS 2023
Paper

Algorithms and Complexity for Computing Nash Equilibria in Adversarial Team Games
Ioannis Anagnostides, Fivos Kalogiannis, Ioannis Panageas, Emmanouil-Vasileios Vlatakis-Gkaragkounis, Stephen McAleer
EC 2023
Paper

Regret-Minimizing Double Oracle for Extensive-Form Games
Xiaohang Tang, Le Cong Dinh, Stephen McAleer, Yaodong Yang
ICML 2023
Paper

MANSA: Learning Fast and Slow in Multi-Agent Systems
David Henry Mguni, Taher Jafferjee, Haojun Chen, Jianhong Wang, Long Fei, Xidong Feng, Stephen McAleer, Feifei Tong, Jun Wang, Yaodong Yang
ICML 2023
Paper

A Game-Theoretic Framework for Managing Risk in Multi-Agent Systems
Oliver Slumbers, David Henry Mguni, Stephen McAleer, Stefano B Blumberg, Yaodong Yang, Jun Wang
ICML 2023
Paper

ESCHER: Eschewing Importance Sampling in Games by Computing a History Value Function to Estimate Regret
Stephen McAleer, Gabriele Farina, Marc Lanctot, Tuomas Sandholm
ICLR 2023
Paper | Code

Mastering the Game of Stratego With Model-Free Multiagent Reinforcement Learning
Julien Perolat*, Bart de Vylder*, Daniel Hennes, Eugene Tarassov, Florian Strub, Vincent de Boer, Paul Muller, Jerome T Connor, Neil Burch, Thomas Anthony, Stephen McAleer, Romuald Elie, Sarah H Cen, Zhe Wang, Audrunas Gruslys, Aleksandra Malysheva, Mina Khan, Sherjil Ozair, Finbarr Timbers, Toby Pohlen, Tom Eccles, Mark Rowland, Marc Lanctot, Jean-Baptiste Lespiau, Bilal Piot, Shayegan Omidshafiei, Edward Lockhart, Laurent Sifre, Nathalie Beauguerlange, Remi Munos, David Silver, Satinder Singh, Demis Hassabis, Karl Tuyls*
Science 2022
Paper

Towards Human-Level Bimanual Dexterous Manipulation with Reinforcement Learning
Yuanpei Chen, Yaodong Yang, Tianhao Wu, Shengjie Wang, Xidong Feng, Jiechuang Jiang, Stephen McAleer, Hao Dong, Zongqing Lu, Song-Chun Zhu
NeurIPS 2022
Paper | Code

Reducing Variance in Temporal-Difference Value Estimation via Ensemble of Deep Networks
Litian Liang, Yaosheng Xu, Stephen McAleer, Dailin Hu, Alexander Ihler, Pieter Abbeel, Roy Fox
ICML 2022
Paper

Proving Theorems using Incremental Learning and Hindsight Experience Replay
Eser Aygün, Laurent Orseau, Ankit Anand, Xavier Glorot, Stephen McAleer, Vlad Firoiu, Lei Zhang, Doina Precup, Shibl Mourad
ICML 2022
Paper

Independent Natural Policy Gradient Always Converges in Markov Potential Games
Roy Fox, Stephen McAleer, Will Overman, Ioannis Panageas
AISTATS 2022
Paper

XDO: A Double Oracle Algorithm for Extensive-Form Games
Stephen McAleer, John Lanier, Kevin A Wang, Pierre Baldi, Roy Fox
NeurIPS 2021
Paper | Code

Discovering Multi-Agent Auto-Curricula in Two-Player Zero-Sum Games
Xidong Feng, Oliver Slumbers, Yaodong Yang, Ziyu Wan, Bo Liu, Stephen McAleer, Ying Wen, Jun Wang
NeurIPS 2021
Paper

Online Double Oracle
Le Cong Dinh, Yaodong Yang, Stephen McAleer, Nicolas Perez Nieves, Oliver Slumbers, Zheng Tian, David Henry Mguni, Haitham Bou Ammar, Jun Wang
TMLR 2021
Paper

Pipeline PSRO: A Scalable Approach for Finding Approximate Nash Equilibria in Large Games
Stephen McAleer*, John Lanier*, Roy Fox, Pierre Baldi
NeurIPS 2020
Paper | Code

Evolutionary Reinforcement Learning for Sample-Efficient Multiagent Coordination
Shauharda Khadka, Somdeb Majumdar, Santiago Miret, Stephen McAleer, Kagan Tumer
ICML 2020
Paper

Solving the Rubik's Cube with Deep Reinforcement Learning and Search
Forest Agostinelli*, Stephen McAleer*, Alexander Shmakov*, Pierre Baldi
Nature Machine Intelligence 2019
Paper | Code

Solving the Rubik's Cube With Approximate Policy Iteration
Stephen McAleer*, Forest Agostinelli*, Alexander Shmakov*, Pierre Baldi
ICLR 2018
Paper


Teaching

Tuomas and I are co-teaching a course on computational game solving this semester. The first half focuses on fundamental concepts in game theory, and the second half covers state-of-the-art methods for large games such as Stratego and Diplomacy. Throughout, we emphasize how these methods combine concepts from reinforcement learning and game theory.


Selected Press

VentureBeat: Meet LLEMMA, the math-focused open source AI that outperforms rivals.

Popular Science: Here's how a new AI mastered the tricky game of Stratego.

TechCrunch: Now AI can outmaneuver you at both Stratego and Diplomacy.

Gizmodo: DeepMind's New AI Uses Game Theory to Trounce Humans in 'Stratego'.

MIT Technology Review: A machine has figured out Rubik's Cube all by itself.

Washington Post: How quickly can AI solve a Rubik's Cube? In less time than it took you to read this headline.

LA Times: A machine taught itself to solve Rubik's Cube without human help, UC Irvine researchers say.

BBC: AI Solves Rubik's Cube in One Second.