Deep Reinforcement Learning and Control
Spring 2019, CMU 10403


Instructor: Katerina Fragkiadaki
Lectures: Tuesdays/Thursdays, 3:00-4:20pm, Posner Hall 152
Recitations: Fridays, 1:30-2:50pm, Posner Hall 146
Office Hours: Katerina: Tuesdays/Thursdays, 4:20-4:50pm, outside Posner Hall 152
Teaching Assistants:
Communication: Piazza is intended for all announcements, general questions about the course, clarifications about assignments, student questions to each other, discussions about material, and so on. We strongly encourage all students to participate in discussions and to ask and answer questions on Piazza.

Class goals

  • Implement and experiment with existing algorithms for learning control policies guided by reinforcement, demonstrations, and intrinsic curiosity.
  • Evaluate the sample complexity, generalization, and generality of these algorithms.
  • Understand research papers in the field of robot learning.
  • Try out some ideas/extensions of your own, with a particular focus on incorporating sensory input from visual sensors.

Prerequisites

The prerequisite for this course is a full-semester introductory course in machine learning. A similar semester-long course passed at another university will also be accepted.

Schedule

The following schedule is tentative; it will change based on time constraints and the interests of the class. Reading materials and lecture notes will be added as lectures progress.

Date Topic (slides) Assignments Readings
01/15 Course Introduction [1, SB Ch1, Ch16]
01/17 Imitation via Behavior Cloning [20,22,23,36]
01/18 RECITATION: TensorFlow, Keras, OpenAI Gym, and AWS
01/22 Introduction to policy search, MDPs, Dynamic Programming HW1 is out. [SB Ch3, Ch4, 39]
01/24 DP cont., Monte Carlo learning [SB Ch4, Ch5]
01/29 TD learning [SB Ch6]
01/30 CLASS CANCELLED HW2 is out, HW1 is due
02/05 Exploration-exploitation in multi-armed bandits, Thompson sampling [SB Ch2 (2.1-2.7), 38]
02/07 N-step bootstrapping, Monte Carlo Tree Search [SB Ch7, Ch8]
02/12 On-policy prediction with Function Approximation [SB Ch9]
02/14 Deep Q learning [4,5,6,40,41,42]
02/15 RECITATION HW3 out, HW2 due
02/19 Policy gradients, actor-critic methods [SB Ch 13, 7]
02/21 Policy gradients, actor-critic methods (cont.) [SB Ch 13, 7]
02/26 Policy gradients, actor-critic methods (cont.) [7,14,15]
02/28 Natural policy gradients [12,13]
03/04 HW3 due
03/05 Natural policy gradients (cont.), MCTS with neural networks, multigoal RL HW4 out [17,18,2,3]
03/07 Advanced evolutionary methods [37]
03/12 SPRING BREAK
03/14 SPRING BREAK
03/19 Model learning and model-based RL in low dim state space [47,48,49 ch1-2,50]
03/21 Model learning and model-based RL in high dim sensory space [45,44,51,52]
03/22 RECITATION: DDPG, HER, Natural Policy Gradient
03/26 Deep Exploration, intrinsic motivation HW4 due [9,10,11]
03/28 Variational Autoencoders, Bayes by Backprop [53,54,55,56]
03/29 RECITATION: Gaussian processes for Bayesian Optimization HW5 out [49 ch2]
04/02 Special topic: experimental design [57,58]
04/04 Inverse reinforcement learning, Generative adversarial imitation learning [26,16]
04/09 GANs, Generative adversarial imitation learning [26,16,59,60]
04/11 NO CLASSES
04/16 Sim2Real transfer [33,61,63,64]
04/18 Maximum-entropy RL [19]
04/19 HW5 due
04/23 Policy learning from imitating local controllers, iLQR [29]
04/25 iLQR, Visual imitation [65,66]
04/26 RECITATION: Model-Based Reinforcement Learning HW6 out
04/30 Causality and CV for RL
05/03 Very recent advances and open problems
05/10 HW6 due

Resources

Readings

  • [SB] Sutton & Barto, Reinforcement Learning: An Introduction, 2nd edition
  • [GBC] Goodfellow, Bengio & Courville, Deep Learning
  1. Smith & Gasser, The Development of Embodied Cognition: Six Lessons from Babies
  2. Silver et al., Mastering the Game of Go with Deep Neural Networks and Tree Search
  3. Silver et al., Mastering the Game of Go without Human Knowledge
  4. Mnih et al., Playing Atari with Deep Reinforcement Learning
  5. Guo et al., Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning
  6. van Hasselt et al., Deep Reinforcement Learning with Double Q-learning
  7. Mnih et al., Asynchronous Methods for Deep Reinforcement Learning
  8. Houthooft et al., VIME: Variational Information Maximizing Exploration
  9. Stadie et al., Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models
  10. Pathak et al., Curiosity-driven Exploration by Self-supervised Prediction
  11. Burda et al., Large-Scale Study of Curiosity-Driven Learning
  12. Kakade, A Natural Policy Gradient
  13. Rajeswaran et al., Towards Generalization and Simplicity in Continuous Control
  14. Lillicrap et al., Continuous control with deep reinforcement learning
  15. Heess et al., Learning Continuous Control Policies by Stochastic Value Gradients
  16. Ho et al., Generative Adversarial Imitation Learning
  17. Andrychowicz et al., Hindsight Experience Replay
  18. Nair et al., Visual Reinforcement Learning with Imagined Goals
  19. Haarnoja et al., Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
  20. Bagnell, An Invitation to Imitation
  21. Nguyen, Imitation Learning with Recurrent Neural Networks
  22. Bojarski et al., End to End Learning for Self-Driving Cars
  23. Ross et al., A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning
  24. Ziebart et al., Navigate Like a Cabbie: Probabilistic Reasoning from Observed Context-Aware Behavior
  25. Ho et al., Model-Free Imitation Learning with Policy Optimization
  26. Finn et al., Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization
  27. Ziebart et al., Maximum Entropy Inverse Reinforcement Learning
  28. Ziebart et al., Human Behavior Modeling with Maximum Entropy Inverse Optimal Control
  29. Levine et al., Guided Policy Search
  30. Levine et al., End-to-End Training of Deep Visuomotor Policies
  31. Kumar et al., Learning Dexterous Manipulation Policies from Experience and Imitation
  32. Mordatch et al., Combining model-based policy search with online model learning for control of physical humanoids
  33. Rajeswaran et al., EPOpt: Learning Robust Neural Network Policies Using Model Ensembles
  34. Ganin et al., Domain-Adversarial Training of Neural Networks
  35. Finn et al., Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
  36. Rahmatizadeh et al., From Virtual Demonstration to Real-World Manipulation Using LSTM and MDN
  37. Salimans et al., Evolution Strategies as a Scalable Alternative to Reinforcement Learning
  38. Russo et al., A Tutorial on Thompson Sampling
  39. Hansen, The CMA Evolution Strategy: A Tutorial
  40. Schaul et al., Prioritized Experience Replay
  41. Hessel et al., Rainbow: Combining Improvements in Deep Reinforcement Learning
  42. Pritzel et al., Neural Episodic Control
  43. Jaderberg et al., Population Based Training of Neural Networks
  44. Dosovitskiy et al., Learning to Act by Predicting the Future
  45. Fragkiadaki et al., Learning Visual Predictive Models of Physics for Playing Billiards
  46. Sanchez-Gonzalez et al., Graph Networks as Learnable Physics Engines for Inference and Control
  47. Nagabandi et al., Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning
  48. Chua et al., Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models
  49. Rasmussen and Williams, Gaussian Processes for Machine Learning
  50. Deisenroth et al., PILCO: A Model-Based and Data-Efficient Approach to Policy Search
  51. Oh et al., Action-Conditional Video Prediction using Deep Networks in Atari Games
  52. Ebert et al., Self-Supervised Visual Planning with Temporal Skip Connections
  53. Osband et al., Deep Exploration via Bootstrapped DQN
  54. Blundell et al., Weight Uncertainty in Neural Networks
  55. Kingma and Welling, Auto-Encoding Variational Bayes
  56. Doersch, Tutorial on Variational Autoencoders
  57. Kandasamy et al., Tuning Hyperparameters without Grad Students: Scalable and Robust Bayesian Optimisation with Dragonfly
  58. Brochu et al., A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning
  59. Goodfellow et al., Generative Adversarial Nets
  60. Zhu et al., Reinforcement and Imitation Learning for Diverse Visuomotor Skills
  61. Tobin et al., Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World
  62. Tremblay et al., Training Deep Networks with Synthetic Data: Bridging the Reality Gap by Domain Randomization
  63. Müller et al., Driving Policy Transfer via Modularity and Abstraction
  64. Chebotar et al., Closing the Sim-to-Real Loop: Adapting Simulation Randomization with Real World Experience
  65. Peng et al., SFV: Reinforcement Learning of Physical Skills from Videos
  66. Pathak et al., Zero-Shot Visual Imitation

General references

Online courses

AWS Resources

For those of you who need GPU resources, for future homeworks or the project, please read through this section carefully.
  • If you are not officially registered for this class, you are not allowed to request resources. We will be checking before we submit requests, so please do not request access.
  • We will be offering AWS resources. All students should join AWS Educate at https://aws.amazon.com/education/awseducate/ using their @andrew.cmu.edu email address; if you do not use your andrew email address, your resources may be denied. Do this as soon as possible, as it can take time to set up your account.
  • AWS NOTE: You need to back this account with your own credit/debit card. We will give out allocation codes of $50; should you go over this $50, charges will go to your card, so please keep an eye on your funds and do not forget to terminate instances (see the sketch after this list). The university holds no responsibility for paying for additional usage.
  • We will ask you to complete an Allocation Form in order to apply for your resources. This will be made available later in the semester. Note: HW1 does not require AWS resources.
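To help you avoid draining the $50 allocation, here is a minimal sketch (not course-provided) of checking for forgotten instances programmatically. It assumes the boto3 package is installed and your AWS credentials and default region are configured locally:

    # Minimal sketch: list (and optionally terminate) running EC2 instances
    # so forgotten instances don't keep charging against your $50 allocation.
    # Assumes boto3 is installed and AWS credentials are configured locally.
    import boto3

    ec2 = boto3.client("ec2")

    # Find all instances currently in the "running" state.
    resp = ec2.describe_instances(
        Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
    )
    instance_ids = [
        inst["InstanceId"]
        for reservation in resp["Reservations"]
        for inst in reservation["Instances"]
    ]
    print("Running instances:", instance_ids)

    # Uncomment to terminate everything listed above once you are done.
    # if instance_ids:
    #     ec2.terminate_instances(InstanceIds=instance_ids)

The same check can be done by hand in the EC2 console. Note that stopped instances no longer bill for compute time, but any attached EBS volumes continue to accrue storage charges.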

Assignments and grading

The course grade is a weighted average of assignments (90%) and a final project (10% with an additional 10% in extra credit). This year the project will be a competition on playing Atari games using deep learning.
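To make the weighting concrete, below is a minimal sketch in Python. The helper final_grade is hypothetical, and the assumption that extra credit simply adds on top of the base weights is our reading of the scheme, not an official formula:

    # A minimal sketch of the grade weighting described above, assuming
    # extra credit adds on top of the base grade (an interpretation, not
    # the official formula). All inputs are fractions in [0, 1].
    def final_grade(hw_avg, project, extra_credit=0.0):
        return 0.9 * hw_avg + 0.1 * project + 0.1 * extra_credit

    # Example: an 85% homework average, a 90% project, and half the
    # available extra credit come out to 90.5%.
    print(final_grade(0.85, 0.90, 0.5))  # 0.765 + 0.09 + 0.05 = 0.905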

Homeworks

There will be six homework assignments for this class. Unless otherwise stated, all homeworks must be completed independently (although discussion with your fellow students is allowed and encouraged, as long as your interaction follows the course integrity policy). Certain assignments will allow teams of two; this will be stated explicitly in the instructions for the assignment. For assignments with teams, only one person should submit the writeup and code on Gradescope. Additionally, you should upload your code to Autolab; please make sure the same person who submitted the writeup and code to Gradescope is the one who submits to Autolab. Make sure you mark your partner as a collaborator on Gradescope (you do not need to do this in Autolab) and that both names are listed in the writeup. Writeups should be typeset in LaTeX and submitted as PDF. All code, including auxiliary scripts used for testing, should be submitted with a README file to explain/document it.

Projects

The project for this course will be on using deep reinforcement learning to teach an agent to play Atari games. You may work in teams of 2 people. Only one person should submit the project report and code on Gradescope. Additionally, you should upload your code to Autolab; please make sure the same person who submitted the writeup and code to Gradescope is the one who submits to Autolab. Make sure you mark your partner(s) as collaborators on Gradescope (you do not need to do this in Autolab) and that all names are listed in the writeup. All code, including auxiliary scripts used for testing, should be submitted with a README file to explain/document it. Please write your project report in LaTeX using the NIPS style file and submit it as PDF. (sty file, tex example)

Grace Day/Late Homework Policy

  • Homeworks: Each student has a total of 4 grace days that may be applied to the homework assignments. No more than 3 grace days may be used on any single assignment; any assignment submitted more than 3 days past the deadline will get zero credit. Grace days are subtracted from both students in a homework team, e.g. an assignment submitted 1 day late will cost both team members 1 grace day from their total allotment.
  • Projects: Each team will be allotted a total of 3 grace days for the project, separate from homework grace days (unused grace days from the homework assignments CANNOT be applied to the project). Project grace days may be used on the midway and final reports, but not on the poster presentation. Any project submitted more than 3 days past the deadline will get zero credit.

Course Policies

Auditing

  • Official auditing of the course (i.e. taking the course for an “Audit” grade) is not permitted this semester.
  • Unofficial auditing of the course (i.e. watching the lectures online or attending them in person, but not turning in homeworks to be graded) is welcome and permitted without prior approval. We give priority to students who are officially registered for the course, so informal auditors may only take a seat in the classroom if there is one available 10 minutes after the start of class. Unofficial auditors will not be given access to course materials such as homework assignments and exams.
  • Please email Shreyan if you need further clarification.

Extensions

In general, we do not grant extensions on assignments. There are several exceptions:
  • Medical Emergencies: If you are sick and unable to complete an assignment or attend class, please go to University Health Services. For minor illnesses, we expect grace days or our late penalties to provide sufficient accommodation. For medical emergencies (e.g. prolonged hospitalization), students may request an extension afterwards and should include a note from University Health Services.
  • Family/Personal Emergencies: If you have a family emergency (e.g. death in the family) or a personal emergency (e.g. mental health crisis), please contact your academic adviser or Counseling and Psychological Services (CaPS). In addition to offering support, they will reach out to the instructors for all your courses on your behalf to request an extension.
  • University-Approved Absences: If you are attending an out-of-town university approved event (e.g. multi-day athletic/academic trip organized by the university), you may request an extension for the duration of the trip. You must provide confirmation of your attendance, usually from a faculty or staff organizer of the event.
For any of the above situations, you may request an extension by emailing Shreyan. The email should be sent as soon as you are aware of the conflict, and at least 5 days prior to the deadline. In the case of an emergency, no advance notice is needed.

Pass/Fail Policy

We allow you to take the course as Pass/Fail. Instructor permission is not required. You must complete all aspects of the course (all homeworks and the project) if you take the course as Pass/Fail. The grade cutoff for a Pass depends on your program. Be sure to check with your program/department as to whether you can count a Pass/Fail course towards your degree requirements, notify us that you want to take the course Pass/Fail, and notify us of the Pass threshold your department uses (i.e., does it correspond to a grade of A, B, C, or D?).

Online and Waitlisted Students

  • All lecture videos will be recorded and made available online. We are currently working with the administration to create an online section for this course (10-403 B) so that we can get everybody off the waitlist and officially enrolled. Please be patient, as this may take time. Past experience suggests that there will be sufficient seats in the classroom for everybody who wants to take the course, so we are optimistic that all students on the waitlist will be able to register within the first few weeks.
  • Waitlisted students should complete all homework assignments with the rest of the class.
  • The first couple of lectures are likely to be quite full, so it is best for waitlisted students to use the livestream. We will let you know when seats start to become available. Once that happens, you are welcome to take a physical seat in Posner Hall 152 if there is an open one 5 minutes after class has started (e.g. 3:05pm).

Students with course conflicts

Students with timing conflicts (i.e., who have another class offered at the same time) will be permitted to take this course. However, there may be occasional days when we need you to attend in person during class time (e.g. for student presentations). We will let you know the dates on which we require you to be available as soon as we know them.

Academic Integrity (Read this carefully!)

(Adapted from Roni Rosenfeld's 10-601 Spring 2016 Course Policies.)

Collaboration among Students

  • The purpose of student collaboration is to facilitate learning, not to circumvent it. Studying the material in groups is strongly encouraged. It is also allowed to seek help from other students in understanding the material needed to solve a particular homework problem, provided no written notes (including code) are shared, or are taken at that time, and provided learning is facilitated, not circumvented. The actual solution must be done by each student alone.
  • The presence or absence of any form of help or collaboration, whether given or received, must be explicitly stated and disclosed in full by all involved. Specifically, each assignment solution must include answering the following questions:
    1. Did you receive any help whatsoever from anyone in solving this assignment? Yes / No.
      • If you answered 'yes', give full details: ____________
      • (e.g. "Jane Doe explained to me what is asked in Question 3.4")
    2. Did you give any help whatsoever to anyone in solving this assignment? Yes / No.
      • If you answered 'yes', give full details: _____________
      • (e.g. "I pointed Joe Smith to section 2.3 since he didn't know how to proceed with Question 2")
    3. Did you find or come across code that implements any part of this assignment? Yes / No. (See the policy on "found code" below.)
      • If you answered 'yes', give full details: _____________
      • (book & page, URL & location within the page, etc.).
  • If you gave help after turning in your own assignment and/or after answering the questions above, you must update your answers before the assignment's deadline, if necessary by emailing the course staff.
  • Collaboration without full disclosure will be handled severely, in compliance with CMU's Policy on Cheating and Plagiarism.

Previously Used Assignments

Some of the homework assignments used in this class may have been used in prior versions of this class, or in classes at other institutions, or elsewhere. Solutions to them may be, or may have been, available online, or from other people or sources. It is explicitly forbidden to use any such sources, or to consult people who have solved these problems before. It is explicitly forbidden to search for these problems or their solutions on the internet. You must solve the homework assignments completely on your own. We will be actively monitoring your compliance. Collaboration with other students who are currently taking the class is allowed, but only under the conditions stated above.

Policy Regarding "Found Code"

You are encouraged to read books and other instructional materials, both online and offline, to help you understand the concepts and algorithms taught in class. These materials may contain example code or pseudo code, which may help you better understand an algorithm or an implementation detail. However, when you implement your own solution to an assignment, you must put all materials aside, and write your code completely on your own, starting “from scratch”. Specifically, you may not use any code you found or came across. If you find or come across code that implements any part of your assignment, you must disclose this fact in your collaboration statement.

Duty to Protect One's Work

Students are responsible for pro-actively protecting their work from copying and misuse by other students. If a student's work is copied by another student, the original author is also considered to be at fault and in gross violation of the course policies. It does not matter whether the author allowed the work to be copied or was merely negligent in preventing it from being copied. When overlapping work is submitted by different students, both students will be punished.
To protect future students, do not post your solutions publicly, either during the course or afterwards.

Penalties for Violations of Course Policies

All violations (even the first) of course policies will always be reported to the university authorities (your Department Head, Associate Dean, Dean of Student Affairs, etc.) as an official Academic Integrity Violation and will carry severe penalties.
  1. The penalty for the first violation is a one-and-a-half letter grade reduction. For example, if your final letter grade for the course was to be an A-, it would become a C+.
  2. The penalty for the second violation is failure in the course, and can even lead to dismissal from the university.

Accommodations for Students with Disabilities

If you have a disability and have an accommodations letter from the Disability Resources office, please discuss your accommodation needs with Shreyan or one of the instructors as early in the semester as possible. We will work with you to ensure that accommodations are provided as appropriate. If you suspect that you may have a disability and would benefit from accommodations but are not yet registered with the Office of Disability Resources, we encourage you to contact them at access@andrew.cmu.edu.

Take care of yourself

Do your best to maintain a healthy lifestyle this semester by eating well, exercising, avoiding drugs and alcohol, getting enough sleep, and taking some time to relax. This will help you achieve your goals and cope with stress. All of us benefit from support during times of struggle; you are not alone. There are many helpful resources available on campus, and an important part of the college experience is learning how to ask for help. Asking for support sooner rather than later is often helpful. If you or anyone you know experiences academic stress, difficult life events, or feelings like anxiety or depression, we strongly encourage you to seek support. Counseling and Psychological Services (CaPS) is here to help: call 412-268-2922 or visit their website at www.cmu.edu/counseling. Consider reaching out to a friend, faculty member, or family member you trust for help getting connected to the support that can help.