Stephen McAleer

Postdoc at Carnegie Mellon University

Contact: smcaleer@cs.cmu.edu
Google Scholar | CV | Twitter


I am a postdoc at CMU working on reinforcement learning and game theory with Tuomas Sandholm.

I am interested in algorithms that make optimal decisions in the presence of other decision makers. In particular, I work on developing scalable algorithms that have game-theoretic guarantees. Some topics that I am currently working on include:

  • Reinforcement learning for two-player zero-sum games
  • Algorithms for mediators in general-sum games
  • Multi-agent reinforcement learning with game-theoretic guarantees
  • Generalization in reinforcement learning
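As a rough sketch of what "game-theoretic guarantees" means in this context (my own notation, not tied to any particular paper below): in a two-player zero-sum game with payoff u(π₁, π₂) to player 1, an algorithm has such a guarantee if it provably drives the exploitability (NashConv) of its strategy profile toward zero,

\[
\mathrm{NashConv}(\pi_1, \pi_2) \;=\; \max_{\pi_1'} u(\pi_1', \pi_2) \;-\; \min_{\pi_2'} u(\pi_1, \pi_2') \;\ge\; 0,
\]

where the two terms measure how much each player could gain by deviating to a best response; the quantity equals zero exactly when (π₁, π₂) is a Nash equilibrium.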

I received my PhD in computer science from the University of California, Irvine, where I worked with Pierre Baldi. During my PhD, I did research scientist internships at Intel Labs and DeepMind. Before that, I earned my bachelor's degree in mathematics and economics from Arizona State University in 2017.

Please reach out if you are interested in talking!


Representative Papers

Multi-Agent Reinforcement Learning

ESCHER: Eschewing Importance Sampling in Games by Computing a History Value Function to Estimate Regret
Stephen McAleer, Gabriele Farina, Marc Lanctot, Tuomas Sandholm
Preprint 2022
Paper | Code

Mastering the Game of Stratego With Model-Free Multiagent Reinforcement Learning
Julien Perolat*, Bart de Vylder*, Daniel Hennes, Eugene Tarassov, Florian Strub, Vincent de Boer, Paul Muller, Jerome T Connor, Neil Burch, Thomas Anthony, Stephen McAleer, Romuald Elie, Sarah H Cen, Zhe Wang, Audrunas Gruslys, Aleksandra Malysheva, Mina Khan, Sherjil Ozair, Finbarr Timbers, Toby Pohlen, Tom Eccles, Mark Rowland, Marc Lanctot, Jean-Baptiste Lespiau, Bilal Piot, Shayegan Omidshafiei, Edward Lockhart, Laurent Sifre, Nathalie Beauguerlange, Remi Munos, David Silver, Satinder Singh, Demis Hassabis, Karl Tuyls*
Preprint 2022
Paper

XDO: A Double Oracle Algorithm for Extensive-Form Games
Stephen McAleer, John Lanier, Kevin A Wang, Pierre Baldi, Roy Fox
Conference on Neural Information Processing Systems (NeurIPS) 2021
Paper | Code

Neural Auto-Curricula in Two-Player Zero-Sum Games
Xidong Feng, Oliver Slumbers, Ziyu Wan, Bo Liu, Stephen McAleer, Ying Wen, Jun Wang, Yaodong Yang
Conference on Neural Information Processing Systems (NeurIPS) 2021
Paper

Pipeline PSRO: A Scalable Approach for Finding Approximate Nash Equilibria in Large Games
Stephen McAleer*, John Lanier*, Roy Fox, Pierre Baldi
Conference on Neural Information Processing Systems (NeurIPS) 2020
Paper | Code

Evolutionary Reinforcement Learning for Sample-Efficient Multiagent Coordination
Somdeb Majumdar, Shauharda Khadka, Santiago Miret, Stephen McAleer, Kagan Tumer
International Conference on Machine Learning (ICML) 2020
Paper

Single-Agent Reinforcement Learning

Reducing Variance in Temporal-Difference Value Estimation via Ensemble of Deep Networks
Litian Liang, Yaosheng Xu, Stephen McAleer, Dailin Hu, Alexander Ihler, Pieter Abbeel, Roy Fox
International Conference on Machine Learning (ICML) 2022
Paper

Proving Theorems Using Incremental Learning and Hindsight Experience Replay
Eser Aygün, Ankit Anand, Laurent Orseau, Xavier Glorot, Stephen McAleer, Vlad Firoiu, Lei M Zhang, Doina Precup, Shibl Mourad
International Conference on Machine Learning (ICML) 2022
Paper

Solving the Rubik's Cube With Deep Reinforcement Learning and Search
Forest Agostinelli*, Stephen McAleer*, Alexander Shmakov*, Pierre Baldi
Nature Machine Intelligence 2019
Paper

Solving the Rubik's Cube With Approximate Policy Iteration
Stephen McAleer*, Forest Agostinelli*, Alexander Shmakov*, Pierre Baldi
International Conference on Learning Representations (ICLR) 2019
Paper


Selected Press

MIT Technology Review: A machine has figured out Rubik's Cube all by itself.

Washington Post: How quickly can AI solve a Rubik's Cube? In less time than it took you to read this headline.

LA Times: A machine taught itself to solve Rubik's Cube without human help, UC Irvine researchers say.

BBC: AI Solves Rubik's Cube in One Second.