Spring 2026, University of Wisconsin-Madison
| Item | Details |
|---|---|
| Instructor | Tengyang Xie |
| Time | Monday, 2:25pm - 5:25pm |
| Location | Morgridge Hall 2538 |
| Office Hours | After class + on-demand |
| Announcements | Canvas |
| Homework Submission | Gradescope |
| Q&A | Piazza |
This is a PhD-level course on the theory of reinforcement learning (RL). While it is impossible to cover all important topics in RL, the central theme of this course is to understand the mathematical principles behind classic RL ideas and algorithms. Topics include (tentative):
| Date | Topic | Materials |
|---|---|---|
| Jan 26 | Course Overview, MDPs. Bellman Equation and Optimality. | Slides, Note1-2 |
| Feb 02 | Value Iteration and Policy Iteration. | Note3, Note4 |
| Feb 09 | Policy Iteration (cont'd). Concentration Inequalities and Uniform Convergence. | Note5 |
| Feb 16 | Tabular Analysis. | Note6, HW1 (Due 03/02) |
| Feb 23 | Fitted Q-Iteration. | Note7 |
| Mar 02 | Policy Evaluation. | Note8, HW2 (Due 03/18) |
| Mar 09 | Policy Optimization. | Note9 |
| Mar 16 | Natural Policy Gradient and Global Optimality. | Note10 |
| Mar 23 | Strategic Exploration. | Note11 |
The grading for the course will be based on (tentative, subject to ±10% changes):
Problem sets and deadlines are posted on the course webpage (see schedule above), and solutions should be submitted to Gradescope. The deadlines on the schedule are tentative; please refer to Gradescope for the exact time.
Homework must be submitted by the posted due date. You may use up to 6 late days in total across all homework assignments during the semester. Late days are deducted automatically whenever an assignment is submitted late.
After your late days are used up, late penalties will be applied:
We will track your late days, and any deductions will be applied when computing final grades. If you cannot turn in homework on time beyond the permitted late days, please do not enroll in the course.
See Project Topics & References for a list of suggested papers.
Students will complete a course project, which includes:
The project can take one of the following forms:
This is a theory-oriented course. Students should have a solid background in:
Prior exposure to reinforcement learning is helpful but not required.
Homework assignments and the project must be completed individually. Cheating and plagiarism will be dealt with in accordance with University procedures (see the UW-Madison Academic Misconduct Rules and Procedures).
You are encouraged to discuss ideas, approaches, and techniques broadly with your peers or the instructor, but not at a level of detail where anyone describes a specific solution.
If you have any questions about this policy, please ask the instructor before you act.
The use of artificial intelligence (AI) tools and applications (including, but not limited to, Copilot, ChatGPT, Gemini, Claude, and others) for course assignments and assessments does not support the learning objectives of this course and is prohibited. However, you may use AI tools to help understand lecture content.
Using AI for homework assignments or the project is a violation of the course's expectations and will be addressed through UW-Madison's academic misconduct policy, specifically UWS 14.03(1)(b): Uses unauthorized materials or fabricated data in any academic exercise.