Convex Optimization: Spring 2020

Machine Learning 10-725

Instructor: Yuanzhi Li (yuanzhil at andrew dot cmu dot edu)

Course descriptions:
According to Wikipedia: Convex optimization is a subfield of mathematical optimization that studies the problem of minimizing convex functions over convex sets. Many classes of convex optimization problems admit polynomial-time algorithms, whereas mathematical optimization is in general NP-hard.

According to Yintat Lee: The design of algorithms is traditionally a discrete endeavor. However, many advances have come from a continuous viewpoint. Typically, a continuous process, deterministic or randomized is designed (or shown) to have desirable properties, such as approaching an optimal solution or a desired distribution, and an algorithm is derived from this by appropriate discretization. In interesting and general settings, the current fastest methods are a consequence of this perspective.

In this course: Convex optimization was an extremely useful tool in Machine Learning, capable of solving many real-world problems efficiently, such as linear regression, logistic regression, linear programming, SDP programming etc. However, in modern machine learning, as the non-convex neural networks growing to be the dominating models in this field, principles developed in traditional convex optimization are now becoming insufficient.

In these lectures, you will see a convex optimization course that hasn't been taught (or close to being taught) anywhere else: This newly designed course spans the old topics in convex optimization such as Gradient Descent, Stochastic Gradient Descent, Mirror Descent, Accelerated Gradient Descent, Interior Point Method, and making connections all the way to optimizations topics in deep learning, such as Neural Tangent Kernel, Scheduled Learning Rate, Momentums, Batch normalization and Linear Reinforcement Learning.

In the end of the course, you will have a sense about the spirit of convex optimization, and how it can still be applied to other real-world problems that are not convex at all.

TAs:
Jerry Ding (dalud at andrew dot cmu dot edu), office hour: Thursday, 10:30 AM to 11:30 AM.
Zhiqian Qiao (zhiqianq at andrew dot cmu dot edu), office hour: TBD.
Vishwak Srinivasan (vishwaks at andrew dot cmu dot edu), office hour: Wednesday, 3:30 PM to 4:30 PM
Cinnie Hsiung (cinnieh at andrew dot cmu dot edu), office hour: Monday, 10:00 AM to 11:00 AM.
Stefani Karp (shkarp at andrew dot cmu dot edu), office hour: Friday, 3:30 PM to 4:30 PM.

Lectures: Tuesday and Thursday 9:00-10:20am, Wean Hall 7500

Course evaluation: 100% Homeworks (and they will be very simple and easy :) some amount of coding is expected).

Tentative Schedule (to be changed with students requests)

Fundamentals of convex optimization
Lecture 1 (Finished)	Introduction: Why am I taking this course?	(Definitely not just because of the gradings are super gentle)
Lecture 2 (Finished)	Gradient Descent and Basic Mirror Descent	(One of the "only two" optimization algorithms)
Lecture 3 (Finished)	Accelerated Gradient Descent	(Based on the paper "Linear Coupling: An Ultimate Unification of Gradient and Mirror Descent")
Lecture 4 (Finished)	Constraint optimization: Projected Gradient Descent
Lecture 5 (Finished)	The real Mirror Descent	(One of the "only two" optimization algorithms)
Lecture 6-7 (Finished)	Stochastic Gradient Descent
Lecture 8-9 (Finished)	Distributed Optimization
Lecture 10 (Finished)	Proximal Gradient Descent	(For optimization over non-smooth regularizers)
Lecture 11 (Finished)	Duality and Minmax Optimization
Lecture 12 (Finished)	Fundations of Convex optimization: Recap
Convex geometry
Lecture 13	Self-concordant Barrier Functions and Interior Point Method	(Mainly for solving linear programming)
Lecture 14	Cutting Plane Method
Lecture 15	Optimization under uncertainty: Online Convex Optimization
Lecture 16	Adaptive Methods	(Adagrad and Adam)
Beyond convexity: What can we do?
Lecture ?	One-point Convex Functions: Gradient Descent and Mirror Descent	(With applications to matrix completion)
Lecture ?	Escaping Saddle Points and Hessian Descent	(With applications to neural networks with quadratic activations)
Neural networks and convex optimization
Lecture ?	How convex are neural networks?-- The Neural Tangent Kernel approach	(Based on the paper "A convergence theory of deep learning via over-parameterization")
Lecture ?	Scheduled Learning Rate in Deep Learning: A Mirror Descent view	(Based on the paper "Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks")
Lecture ?	Momentums in Deep Learning: A Dual Averaging view	(TBD)
Lecture ?	Batch Normalization, Weight Decay and Optimization: A discussion	(TBD)
Lecture ?	Contextual Bandit and Reinforcement Learning	(TBD)
Lecture ?	Limitations of Convex Methods and Shallow Learners	(Based on the paper "What Can ResNet Learn Efficiently, Going Beyond Kernels?")

Top