CS 839: Mathematical Principles of Reinforcement Learning

Spring 2026, University of Wisconsin-Madison

Course Information

Instructor: Tengyang Xie
Time: Monday, 2:25pm - 5:25pm
Location: Morgridge Hall 2538
Office Hours: After class + on-demand
Announcements: Canvas
Homework Submission: Gradescope
Q&A: Piazza

Course Description

This is a PhD-level course on the theory of reinforcement learning (RL). While it is impossible to cover all important topics in RL, the central theme of this course is to understand the mathematical principles behind classic RL ideas and algorithms. Topics include (tentative):

Schedule

Important: This schedule is tentative and subject to change. Please check back often. In particular, the deadlines for the homework sets/project may change; please see Gradescope for the actual deadlines.
Date | Topic | Materials
Jan 26 | Course Overview, MDPs. Bellman Equation and Optimality. | Slides, Note1-2
Feb 02 | Value Iteration and Policy Iteration. | Note3, Note4
Feb 09 | Policy Iteration (cont'd). Concentration Inequalities and Uniform Convergence. | Note5
Feb 16 | Tabular Analysis. | Note6, HW1 (Due 03/02)
Feb 23 | Fitted Q-Iteration. | Note7
Mar 02 | Policy Evaluation. | Note8, HW2 (Due 03/18)
Mar 09 | Policy Optimization. | Note9
Mar 16 | Natural Policy Gradient and Global Optimality. | Note10
Mar 23 | Strategic Exploration. | Note11

Grading

The grading for the course will be based on (tentative, subject to ±10% changes):

Problem sets and deadlines are posted on the course webpage (see schedule above), and solutions should be submitted to Gradescope. The deadlines on the schedule are tentative; please refer to Gradescope for the exact time.

Late Homework Policy

Homework must be submitted by the posted due date. You are allowed up to 6 late days in total across all homework assignments for the entire semester. Late days are deducted automatically whenever an assignment is submitted late.

How late days are counted: Late days are counted by the ceiling of hours late divided by 24. For example, being 1 hour late uses 1 late day, and being 25 hours late uses 2 late days.
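The counting rule above can be sketched in a few lines of Python. This is only an illustration of the stated formula, not the official grading script; the function name `late_days_used` is hypothetical, and how lateness is actually measured (for example, from Gradescope timestamps) is an assumption here.

```python
import math


def late_days_used(hours_late: float) -> int:
    """Late days charged for one submission: ceil(hours_late / 24).

    Assumption: a submission at or before the deadline (hours_late <= 0)
    uses no late days.
    """
    if hours_late <= 0:
        return 0
    return math.ceil(hours_late / 24)


print(late_days_used(1))   # 1 late day, as in the example above
print(late_days_used(25))  # 2 late days, as in the example above
```

Under the rule as stated, a submission exactly 24 hours late uses 1 late day, and anything past 24 hours rolls over to 2.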

After your late days are used up, late penalties will be applied:

We will track all of your late days, and any deductions will be applied when computing final grades. If you are unable to turn in homework on time, beyond the permitted late days, please do not enroll in the course.

Course Project

Last Updated: 02/09/2026

See Project Topics & References for a list of suggested papers.

Students will complete a course project, which includes:

The project can take one of the following forms:

Prerequisites

This is a theory-oriented course. Students should have a solid background in:

Prior exposure to reinforcement learning is helpful but not required.

Academic Integrity and Collaboration Policy

Homework assignments and the project must be completed individually. Cheating and plagiarism will be dealt with in accordance with University procedures (see the UW-Madison Academic Misconduct Rules and Procedures).

You are encouraged to discuss ideas, approaches, and techniques broadly with your peers or the instructor, but not at a level of detail where specific solutions are described by anyone.

If you have any questions about this policy, please ask the instructor before you act.

AI Policy

The use of artificial intelligence (AI) tools and applications (including, but not limited to, Copilot, ChatGPT, Gemini, Claude and others) for course assignments and assessments does not support the learning objectives of this course and is prohibited. However, you may use AI tools to help understand lecture content.

Using AI for homework assignments or the project is a violation of the course's expectations and will be addressed through UW-Madison's academic misconduct policy, specifically UWS 14.03(1)(b): Uses unauthorized materials or fabricated data in any academic exercise.

Resources
