Course Overview
The objective of this course is to introduce students to state-of-the-art algorithms in large-scale machine learning and distributed optimization, in particular the emerging field of federated learning. Topics to be covered include, but are not limited to, the following (a short illustrative code sketch of the first topic appears after this list):
- Mini-batch SGD and its convergence analysis
- Momentum and variance reduction methods
- Synchronous and asynchronous SGD
- Local-update SGD
- Gradient compression/quantization
- Differential privacy in federated learning
- Decentralized SGD
- Hyperparameter optimization
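To make the first topic concrete, here is a minimal sketch of plain mini-batch SGD on a synthetic least-squares problem. This is an illustrative NumPy example, not course-provided code; the function name `minibatch_sgd` and the hyperparameter values (step size, batch size, number of epochs) are arbitrary choices for this sketch.

```python
# Illustrative sketch only (not course code): mini-batch SGD on least squares.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear-regression data: y = X w* + noise
n, d = 1000, 10
X = rng.normal(size=(n, d))
w_star = rng.normal(size=d)
y = X @ w_star + 0.1 * rng.normal(size=n)

def minibatch_sgd(X, y, lr=0.05, batch_size=32, n_epochs=20):
    """Mini-batch SGD on the mean-squared-error objective (hypothetical helper)."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_epochs):
        perm = rng.permutation(n)  # reshuffle the data once per epoch
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            # Stochastic gradient of (1/(2b)) * ||Xb w - yb||^2
            grad = Xb.T @ (Xb @ w - yb) / len(idx)
            w -= lr * grad
    return w

w_hat = minibatch_sgd(X, y)
print("estimation error:", np.linalg.norm(w_hat - w_star))
```

The distributed variants listed above (synchronous, asynchronous, local-update, and decentralized SGD) differ mainly in how this inner gradient step is parallelized across workers and how the workers' models are aggregated; the course studies those trade-offs in depth.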
Prerequisites
- A prerequisite is an introductory course in machine learning: 18-461/661, 10-601/701, or equivalent.
- Undergraduate-level training or coursework in algorithms, linear algebra, calculus, probability, and statistics is strongly encouraged.
- A background in programming will also be necessary for the problem sets; students are expected to be familiar with Python or to learn it during the course.
Comparison with Related Courses
- 18-847F: Foundations of Cloud and ML Infrastructure: Unlike 18-847F, which had student presentations on the latest research papers, 18-667 will have lectures by the instructor. Also, since the focus of 18-667 is on large-scale ML, the cloud computing topics from 18-847F will not be covered in 18-667.
- 18-660: Optimization: While 18-660 covers the fundamentals of convex and non-convex optimization and stochastic gradient descent, 18-667 will discuss state-of-the-art research papers in federated learning and optimization. 18-667 can be taken after or along with 18-660.
Textbooks
There will be no required textbooks. Students are expected to read the research papers covered in each lecture.
Piazza
We will use Piazza for class discussions. We strongly encourage students to post on this forum rather than emailing the course staff directly (this will be more efficient for both students and staff). Students should use Piazza to:
- Ask clarifying questions about the course material.
- Share useful resources with classmates (so long as they do not contain homework solutions).
- Look for students to form study groups.
- Answer questions posted by other students to solidify your own understanding of the material.
Grading Policy
Grades will be based on the following components:
- Homework (50%): There will be four homework assignments.
- Late submissions will not be accepted.
- There is one exception to this rule: You are given 3 late days (self-granted 24-hr extensions) which you can use to give yourself extra time without penalty. At most one late day can be used per assignment. This will be monitored automatically via Gradescope.
- Solutions will be graded on both correctness and clarity. If you cannot solve a problem completely, you will get more partial credit by identifying the gaps in your argument than by attempting to cover them up.
- Three Quizzes (50%): Short quizzes with multiple-choice questions based on the papers discussed each week.
Collaboration Policy
Group studying and collaborating on problem sets are encouraged, as working together is a great way to understand new material. Students are free to discuss the homework problems with anyone under the following conditions:
- Students must write their own solutions and understand the solutions that they write down.
- Students must list the names of their collaborators (i.e., anyone with whom the assignment was discussed).
- Students may not use old solution sets from other classes under any circumstances, unless the instructor grants special permission.
Schedule
Date | Lecture | Readings | Announcements |
---|---|---|---|
Mon, 1 Feb | Intro and Logistics [Slides] | | |
Wed, 3 Feb | SGD in Machine Learning [Slides] | | |
Fri, 5 Feb | OH | | HW1 release |
Mon, 8 Feb | SGD in Neural Network Training, Momentum and Adaptive Methods [Slides] | | |
Wed, 10 Feb | SGD Convergence Analysis [Slides] | | |
Fri, 12 Feb | Math Quiz Review | | |
Mon, 15 Feb | Distributed Synchronous SGD [Slides] | | |
Wed, 17 Feb | Lecture Canceled | | |
Fri, 19 Feb | | | HW1 due, HW2 release |
Mon, 22 Feb | Asynchronous SGD, AdaSync, Hogwild [Slides] | | |
Wed, 24 Feb | Break Day; No Class | | |
Fri, 26 Feb | | | |
Mon, 1 Mar | Local-update SGD [Slides] | | |
Wed, 3 Mar | Adacomm, Elastic Averaging, Overlap SGD [Slides] | | |
Fri, 5 Mar | | | |
Mon, 8 Mar | First Quiz | | |
Wed, 10 Mar | Quantized SGD, AdaQuant [Slides] | | |
Fri, 12 Mar | | | |
Sun, 14 Mar | | | HW2 due |
Mon, 15 Mar | Federated Learning Intro [Slides] | | |
Wed, 17 Mar | Data Heterogeneity in Federated Learning [Slides] | | HW3 release |
Fri, 19 Mar | Midsemester Break; No Class | | |
Mon, 22 Mar | Computational Heterogeneity in Federated Learning [Slides] | | |
Wed, 24 Mar | Fairness in FL [Slides] | | |
Fri, 26 Mar | | | |
Mon, 29 Mar | Client Selection and Importance Sampling [Slides] | | |
Wed, 31 Mar | Robustness in FL [Slides] | | |
Fri, 2 Apr | | | |
Mon, 5 Apr | Break Day; No Class | | |
Wed, 7 Apr | Second Quiz | | |
Fri, 9 Apr | | | HW3 due |
Mon, 12 Apr | Privacy and Security in FL [Slides] | | |
Wed, 14 Apr | Personalized and Multi-task Learning [Slides] | | HW4 release |
Fri, 16 Apr | | | |
Mon, 19 Apr | Lecture Canceled | | |
Wed, 21 Apr | Decentralized SGD [Slides] | | |
Fri, 23 Apr | | | |
Mon, 26 Apr | Hyperparameter Optimization [Slides] | | |
Wed, 28 Apr | Review Lecture [Slides] | | |
Fri, 30 Apr | | | |
Mon, 3 May | No Class | | |
Wed, 5 May | Third Quiz | | |
Fri, 7 May | | | HW4 due |