18-847F: Foundations of Cloud and ML Infrastructure

Course Description

The objective of this seminar course is to introduce students to modern cloud and machine learning infrastructure and its theoretical foundations. Students will read, present, and critique a curated set of research papers spanning both theory and systems. There will also be a final project based on the topics discussed.

The first half of the course will cover distributed computing and storage systems. We will study frameworks such as MapReduce and Spark, and discuss the scheduling and load-balancing policies used in them. In the context of distributed storage systems, we will discuss coding-theoretic techniques used to improve availability and to repair failed nodes. The second half of the course will focus on machine learning infrastructure. A key discussion topic will be stochastic gradient descent and its implementation in large-scale systems. Other topics include hyperparameter tuning in neural networks and generative adversarial networks.

Class Hours

MW 4:30-6:00 pm, starting Aug 30th
Scaife Hall 222

Syllabus

PDF Syllabus and Schedule (subject to change)