Research Overview

We are interested in stochastic modeling and analysis that provides sharp insights into the design of big data infrastructure. Our vision is to democratize machine learning by transcending the need for big compute and big data. We design scalable and sample-efficient algorithms that enable users to perform accurate training and inference even with limited computation capabilities and small amounts of data.

Tools and Approach

We use a diverse set of analytical tools drawn from probability, queueing, coding theory, and statistical learning. Our approach involves the following steps, often traversed in an iterative manner:

  1. Choosing a model that captures the essential elements of the system, while being simple and tractable.
  2. Designing algorithms that are provably optimal within the model framework.
  3. Evaluating the proposed algorithms on the real system, going beyond the assumptions of the model.

Selected Projects and Talk Videos

Computational Heterogeneity in Federated Learning

Federated learning seeks to democratize machine learning by orchestrating distributed model training across a large number of resource-constrained mobile devices that collect data from their environment. Due to limited communication capabilities as well as privacy concerns, the data collected by these devices cannot be sent to the cloud for centralized processing. Instead, each node performs local training updates and sends only the resulting model to the cloud. In this project, we tackle computational heterogeneity in federated optimization, for example, heterogeneous local updates made by the edge clients and intermittent client availability.
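The flavor of this setting can be seen in a minimal federated-averaging sketch with heterogeneous local computation. The synthetic data, function names, and hyperparameters below are illustrative assumptions, not the project's actual algorithm: each client runs a different number of local steps, and the server averages the returned models weighted by local data size.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_update(w, X, y, lr=0.1, steps=1):
    """Run a few local (full-batch, for simplicity) SGD steps on a least-squares loss."""
    w = w.copy()
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def fedavg_round(w_global, clients, step_counts):
    """One communication round: clients train locally for different numbers of
    steps, and the server averages the returned models, weighted by data size."""
    updates, weights = [], []
    for (X, y), steps in zip(clients, step_counts):
        updates.append(local_update(w_global, X, y, steps=steps))
        weights.append(len(y))
    return np.average(updates, axis=0, weights=weights)

# Synthetic noiseless linear-regression data split across 4 clients.
w_true = np.array([1.0, -2.0])
clients = []
for _ in range(4):
    X = rng.normal(size=(50, 2))
    clients.append((X, X @ w_true))

w = np.zeros(2)
for _ in range(100):
    # Computational heterogeneity: clients make 1 to 4 local updates per round.
    w = fedavg_round(w, clients, step_counts=[1, 2, 3, 4])
```

In this toy setup all clients share the same minimizer, so naive averaging still converges; the research questions arise precisely when local objectives and update counts pull the clients in different directions.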

Distributed Machine Learning

In large-scale machine learning, training is performed by running stochastic gradient descent (SGD) in a distributed fashion using multiple worker nodes. Synchronization delays due to straggling workers and communication delays in aggregating updates from worker nodes can significantly increase the run-time of the training job. Our goal is to design system-aware distributed SGD algorithms that strike the best trade-off between the training time and the error in the trained model.
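One common straggler-mitigation strategy in this space is to wait for only the fastest k of the P workers in each round. The sketch below simulates this under assumed i.i.d. exponential compute delays and illustrative synthetic data; it is meant to show the runtime-versus-synchronization trade-off, not any specific published algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)

def shard_gradient(w, X, y):
    """Least-squares gradient computed by one worker on its data shard."""
    return X.T @ (X @ w - y) / len(y)

def k_sync_sgd(shards, k, lr=0.1, rounds=200):
    """Each round, wait for the k fastest of the P workers, average their
    gradients, and ignore the stragglers. Worker compute times are simulated
    as i.i.d. exponential delays (an illustrative model)."""
    w = np.zeros(shards[0][0].shape[1])
    wall_clock = 0.0
    for _ in range(rounds):
        delays = rng.exponential(1.0, size=len(shards))
        order = np.argsort(delays)
        wall_clock += delays[order[k - 1]]       # round ends at the k-th arrival
        grads = [shard_gradient(w, *shards[i]) for i in order[:k]]
        w -= lr * np.mean(grads, axis=0)
    return w, wall_clock

# Noiseless synthetic data: every shard shares the same minimizer w_true.
w_true = np.array([0.5, -1.5, 2.0])
shards = []
for _ in range(8):
    X = rng.normal(size=(40, 3))
    shards.append((X, X @ w_true))

w_fast, t_fast = k_sync_sgd(shards, k=4)   # ignore the 4 slowest workers
w_full, t_full = k_sync_sgd(shards, k=8)   # fully synchronous baseline
```

Waiting for the 4th of 8 exponential arrivals rather than the last one sharply reduces per-round wall-clock time; the cost, in realistic settings with noisy heterogeneous shards, is extra error from the discarded gradients.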

Correlated Multi-armed Bandits

Multi-armed bandit algorithms have numerous data-driven decision-making applications, including advertising, A/B testing, and hyper-parameter tuning. Unlike classical work, which assumes that the rewards observed by sampling different arms are independent, we consider correlated rewards. This correlation can be exploited to identify sub-optimal or "non-competitive" arms and avoid exploring them, allowing us to reduce a K-armed bandit problem to a C-armed bandit problem, where C is the number of "competitive" arms.
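A simplified sketch of the elimination idea follows: known "pseudo-reward" upper bounds, which bound one arm's mean given another arm's observed reward, let the learner rule out non-competitive arms before applying a UCB rule. The arm means, pseudo-reward tables, and thresholds below are toy assumptions, and the code is an illustration of the principle rather than the exact published algorithm.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy problem: 3 Bernoulli arms. pseudo[l, k, r] is a known upper bound on
# arm l's mean reward given that pulling arm k returned r (r in {0, 1}).
mu = np.array([0.7, 0.5, 0.2])
K = len(mu)

pseudo = np.ones((K, K, 2))        # trivial bound: all means are <= 1
for l in range(K):
    pseudo[l, l] = [0.0, 1.0]      # an arm's own reward estimates itself
pseudo[2, 0] = [0.30, 0.25]        # assumed correlation: arm 0 tightly bounds arm 2

def correlated_ucb(mu, pseudo, horizon=2000):
    counts, sums = np.zeros(K), np.zeros(K)
    psums = np.zeros((K, K))       # psums[l, k]: pseudo-rewards for l summed over pulls of k

    def pull(a):
        r = float(rng.random() < mu[a])
        counts[a] += 1
        sums[a] += r
        psums[:, a] += pseudo[:, a, int(r)]

    for a in range(K):             # initialize with one pull of each arm
        pull(a)
    for t in range(K, horizon):
        means = sums / counts
        k_star = int(np.argmax(counts))
        active = counts >= t / (2 * K)        # only well-sampled arms give reliable bounds
        phi = (psums[:, active] / counts[active]).min(axis=1)
        ucb = means + np.sqrt(2 * np.log(t) / counts)
        non_competitive = phi < means[k_star]
        non_competitive[k_star] = False       # the most-pulled arm always stays playable
        ucb[non_competitive] = -np.inf        # skip arms ruled out by the pseudo-bounds
        pull(int(np.argmax(ucb)))
    return counts

counts = correlated_ucb(mu, pseudo)
# Arm 2 is flagged non-competitive via arm 0's observed rewards and is rarely pulled.
```

Here the tight pseudo-bound from arm 0 certifies that arm 2 cannot beat the leader, so exploration concentrates on the competitive arms, which is the source of the K-to-C reduction described above.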

Redundancy in Parallel Computing

In large-scale computing, where a job is divided into parallel tasks that run on different nodes, slow or straggling nodes can become the bottleneck. We use redundancy in the form of replication or erasure coding to design robust distributed computing algorithms that can seamlessly adjust to varying node computation speeds. Applications include distributed matrix computations, data analytics, and inference.
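A toy sketch of erasure-coded matrix-vector multiplication illustrates the idea: with an (n, k) = (3, 2) code, the product A @ x can be recovered from any 2 of the 3 workers, so a single straggler never delays the job. The Vandermonde generator and the simulated straggler below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

def encode(A, n, k):
    """Split A row-wise into k blocks and form n coded blocks using a real
    Vandermonde generator; any k rows of G are invertible, so any k of the
    n coded tasks suffice to decode."""
    blocks = np.split(A, k)                        # assumes k divides A's row count
    G = np.vander(np.arange(1.0, n + 1), k, increasing=True)
    coded = [sum(G[i, j] * blocks[j] for j in range(k)) for i in range(n)]
    return coded, G

def decode(results, finished, G):
    """Recover A @ x from the partial products of any k finished workers."""
    Z = np.linalg.solve(G[finished, :], np.stack(results))
    return Z.reshape(-1)                           # rows of Z are the blocks of A @ x

A = rng.normal(size=(6, 4))
x = rng.normal(size=4)

coded, G = encode(A, n=3, k=2)                     # an MDS-style (3, 2) code
# Simulate a straggler: worker 1 is slow, workers 0 and 2 finish first.
finished = [0, 2]
results = [coded[i] @ x for i in finished]
y = decode(results, finished, G)                   # equals A @ x despite the straggler
```

Replication is the special case where coded blocks are plain copies; erasure coding achieves the same straggler tolerance with less redundant computation, which is why it is attractive for distributed matrix computations.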