Convex Optimization: Spring 2020

Machine Learning 10-725

Instructor: Yuanzhi Li (yuanzhil at andrew dot cmu dot edu)

Course descriptions:
According to Wikipedia: Convex optimization is a subfield of mathematical optimization that studies the problem of minimizing convex functions over convex sets. Many classes of convex optimization problems admit polynomial-time algorithms, whereas mathematical optimization is in general NP-hard.

According to Yintat Lee: The design of algorithms is traditionally a discrete endeavor. However, many advances have come from a continuous viewpoint. Typically, a continuous process, deterministic or randomized is designed (or shown) to have desirable properties, such as approaching an optimal solution or a desired distribution, and an algorithm is derived from this by appropriate discretization. In interesting and general settings, the current fastest methods are a consequence of this perspective.

In this course: Convex optimization was an extremely useful tool in Machine Learning, capable of solving many real-world problems efficiently, such as linear regression, logistic regression, linear programming, SDP programming etc. However, in modern machine learning, as the non-convex neural networks growing to be the dominating models in this field, principles developed in traditional convex optimization are now becoming insufficient.

In these lectures, you will see a convex optimization course that hasn't been taught (or close to being taught) anywhere else: This newly designed course spans the old topics in convex optimization such as Gradient Descent, Stochastic Gradient Descent, Mirror Descent, Accelerated Gradient Descent, Interior Point Method, and making connections all the way to optimizations topics in deep learning, such as Neural Tangent Kernel, Scheduled Learning Rate, Momentums, Batch normalization, Adversarial training, Generative Adversarial Networks and Reinforcement Learning.

In the end of the course, you will have a sense about the spirit of convex optimization, and how it can still be applied to other real-world problems that are not convex at all.

Jerry Ding (dalud at andrew dot cmu dot edu), office hour: Thursday, 10:30 AM to 11:30 AM.
Vishwak Srinivasan (vishwaks at andrew dot cmu dot edu), office hour: Wednesday, 3:30 PM to 4:30 PM
Cinnie Hsiung (cinnieh at andrew dot cmu dot edu), office hour: Monday, 10:00 AM to 11:00 AM.
Stefani Karp (shkarp at andrew dot cmu dot edu), office hour: Friday, 3:30 PM to 4:30 PM.

Lectures: Tuesday and Thursday 9:00-10:20am, Wean Hall 7500

Course evaluation: 100% Homeworks (and they will be very simple and easy :) some amount of coding is expected).

Tentative Schedule (to be changed with students requests)

Fundamentals of convex optimization
Lecture 1 (Finished) Introduction: Why am I taking this course? (Definitely not just because of the gradings are super gentle)
Lecture 2 (Finished) Gradient Descent and Basic Mirror Descent (One of the "only two" optimization algorithms)
Lecture 3 (Finished) Momentum is all you need: The Accelerated Gradient Descent Algorithm (Based on the paper "Linear Coupling: An Ultimate Unification of Gradient and Mirror Descent")
Lecture 4 (Finished) Constraint optimization: Projected Gradient Descent
Lecture 5 (Finished) The real Mirror Descent (One of the "only two" optimization algorithms)
Lecture 6-7 (Finished) Stochastic Gradient Descent
Lecture 8-9 (Finished) Distributed Optimization
Lecture 10 (Finished) Proximal Gradient Descent (For optimization over non-smooth regularizers)
Lecture 11 (Finished) Duality and Minmax Optimization (The gradient descent ascent algorithm)
Lecture 12 (Finished) Fundations of Convex optimization: Recap
Convex geometry
Lecture 13 (Finished) Hessian Matrix and Pre-conditioned Gradient Descent
Lecture 14 (Finished) Self-concordant functions and Interior Point Methods (How to eat sandwiches elegantly)
Lecture 15 (Finished) Adaptive Algorithms (The famous Adagrad and Adam)
Lecture 16 (Finished) Ellipsoid algorithm (How to cut cakes elegantly)
Introduction to non-convex optimization
Lecture 17 (Finished) Saddle Points, Local Minima, Hessian Descent and Noisy Gradient Descent
Lecture 18 (Finished) Introduction to Bayesian Optimization
Lecture 19 (Finished) Introduction to evolutionary optimization algorithms
Lecture 20 (Finished) Over-parameterization in non-convex optimization (Larger non-convex models are easier to train?)
Lecture 21 (Finished) Over-parameterization in deep learning (Larger neural networks are easier to train?)
Lecture 22 (Finished) Training and Generalization: Algorithmic regularization and inductive bias in non-convex optimization (Larger learning rate in SGD improves generalization?)
Lecture 23 (Finished) Adversarial examples and adversarial training in deep learning
Lecture 24 (Finished) Feature visualization in neural networks using adversarial training and projected gradient ascent (Feature purification: How adversarial training performs robust deep learning)
Lecture 26 (Finished) Generative Adversarial Networks and Minmax non-convex non-concave optimization (Optimality conditions and the local adversarial training algorithm)
Lecture 27 (Finished) Online optimization: Online learning (Making sequential decisions online)
Lecture 28 (Finished) Online optimization: Introduction to Reinforcement Learning