CS 839: Mathematical Principles of Reinforcement Learning

Spring 2026, University of Wisconsin-Madison

Course Information

Instructor	Tengyang Xie
Time	Monday, 2:25pm - 5:25pm
Location	Morgridge Hall 2538
Office Hours	After class + on-demand
Announcements	Canvas
Homework Submission	Gradescope
Q&A	Piazza

Course Description

This is a PhD-level course on the theory of reinforcement learning (RL). While it is impossible to cover all important topics in RL, the central theme of this course is to understand the mathematical principles behind classic RL ideas and algorithms. Topics include (tentative):

Fundamentals: Markov Decision Processes (MDPs), value iteration, policy iteration
Classic RL theory: fitted Q-iteration, error propagation, importance sampling, policy gradient
Exploration and exploitation: optimism, pessimism
Modern topics

Schedule

Important: This schedule is tentative and subject to change. Please check back often. In particular, the deadlines for the homework sets/project may change; please see Gradescope for the actual deadlines.

Date	Topic	Materials
Jan 26	Course Overview, MDPs. Bellman Equation and Optimality.	Slides, Note1-2
Feb 02	Value Iteration and Policy Iteration.	Note3, Note4
Feb 09	Policy Iteration (cont'd). Concentration Inequalities and Uniform Convergence.	Note5
Feb 16	Tabular Analysis.	Note6, HW1 (Due 03/02)
Feb 23	Fitted Q-Iteration.	Note7
Mar 02	Policy Evaluation.	Note8, HW2 (Due 03/18)
Mar 09	Policy Optimization.	Note9
Mar 16	Natural Policy Gradient and Global Optimality.	Note10
Mar 23	Strategic Exploration.	Note11
Mar 30	Spring break (no class).
Apr 06	Strategic Exploration.	Note11
Apr 13	Offline Reinforcement Learning.	Note12

Grading

The grading for the course will be based on (tentative, subject to ±10% changes):

3-4 Homework Assignments ~ 50%
Course Project ~ 50%

Problem sets and deadlines are posted on the course webpage (see schedule above), and solutions should be submitted to Gradescope. The deadlines on the schedule are tentative; please refer to Gradescope for the exact time.

Late Homework Policy

Homework must be submitted by the posted due date. You are allowed up to 6 late days in total across all homework assignments for the entire semester. These are deducted automatically whenever an assignment is submitted late.

How late days are counted: The number of late days used is the number of hours late divided by 24, rounded up. For example, being 1 hour late uses 1 late day, and being 25 hours late uses 2 late days.
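The counting rule above can be sketched in a few lines of Python. This is only an illustration: the function name `late_days_used` is hypothetical, and Gradescope remains the authoritative record of submission times.

```python
import math


def late_days_used(hours_late: float) -> int:
    """Late days consumed by one submission: ceil(hours_late / 24).

    `hours_late` is the time past the deadline in hours; a value of
    zero or less means the submission was on time.
    """
    if hours_late <= 0:
        return 0
    return math.ceil(hours_late / 24)


print(late_days_used(1))   # 1 late day, as in the example above
print(late_days_used(25))  # 2 late days, as in the example above
```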

After your late days are used up, late penalties will be applied:

Up to 24 hours late: 33% penalty
Up to 48 hours late: 66% penalty
Beyond 48 hours: no credit
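As a rough sketch, the penalty tiers above can be written as a lookup. Two things here are assumptions rather than stated policy: the function name `late_penalty` is hypothetical, and `hours_late` is taken to be the lateness of a submission made after all late days are used up. How lateness that is only partly covered by remaining late days is handled is not specified here, so ask the instructor in that case.

```python
def late_penalty(hours_late: float) -> float:
    """Fraction of credit deducted once no late days remain.

    Assumption: `hours_late` is the lateness of a submission that is
    not covered by any remaining late days.
    """
    if hours_late <= 0:
        return 0.0   # on time: no penalty
    if hours_late <= 24:
        return 0.33  # up to 24 hours late
    if hours_late <= 48:
        return 0.66  # up to 48 hours late
    return 1.0       # beyond 48 hours: no credit


print(late_penalty(30))  # 0.66
```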

We track all late days, and any deductions are applied when computing final grades. If you cannot submit homework on time, apart from the permitted late days, do not enroll in the course.

Course Project

Last Updated: 02/09/2026

See Project Topics & References for a list of suggested papers.

Students will complete a course project, which includes:

A short project proposal
A project report
A final presentation

The project can take one of the following forms:

Reproduce and understand the theoretical analysis from an existing paper
Work on novel research questions related to RL theory (discuss with me first)
Start by reproducing an existing analysis, then try to extend it (recommended for most students)

Prerequisites

This is a theory-oriented course. Students should have a solid background in:

Linear algebra
Probability and statistics
Machine learning fundamentals (e.g., CS 760)
Mathematical maturity and proof-writing experience

Prior exposure to reinforcement learning is helpful but not required.

Academic Integrity and Collaboration Policy

Homework assignments and the project must be completed individually. Cheating and plagiarism will be dealt with in accordance with University procedures (see the UW-Madison Academic Misconduct Rules and Procedures).

You are encouraged to discuss ideas, approaches, and techniques broadly with your peers or the instructor, but not at a level of detail where specific solutions are described by anyone.

If you have any questions about this policy, please ask the instructor before you act.

AI Policy

The use of artificial intelligence (AI) tools and applications (including, but not limited to, Copilot, ChatGPT, Gemini, Claude and others) for course assignments and assessments does not support the learning objectives of this course and is prohibited. However, you may use AI tools to help understand lecture content.

Using AI for homework assignments or the project is a violation of the course's expectations and will be addressed through UW-Madison's academic misconduct policy, specifically UWS 14.03(1)(b): "Uses unauthorized materials or fabricated data in any academic exercise."

Resources

Reinforcement Learning: An Introduction by Sutton and Barto
Reinforcement Learning: Theory and Algorithms by Agarwal, Jiang, Kakade, and Sun
