ECE 18-813B: Special Topics in Artificial Intelligence: Foundations of Reinforcement Learning

Reinforcement learning (RL), modeled as sequential decision making in the face of uncertainty, has garnered growing interest in recent years due to its remarkable success in practice. However, the explosion of complexity in emerging applications and the presence of nonconvexity exacerbate the challenge of achieving efficient RL in resource-constrained situations, where data collection and computation are expensive, time-consuming, or even high-stakes (e.g., in clinical trials, autonomous systems, and online advertising). Despite decades-long research efforts, the theoretical underpinnings of RL remain far from mature, especially when it comes to understanding and enhancing the sample and computational efficiency of RL algorithms. An explosion of research has been conducted over the past few years to advance the frontiers of these topics, leveraging toolkits that sit at the intersection of multiple fields, including but not limited to control, optimization, statistics, and learning.

This course aims to present a coherent framework that covers important algorithmic developments in modern RL, highlighting the connections between new ideas and classical topics. Employing Markov Decision Processes (MDPs) as the central mathematical framework, we will cover multiple important scenarios including but not limited to the simulator setting, online RL, offline RL, and multi-agent RL, centering our discussions on issues such as sample complexity, computational efficiency, function approximation, distributional robustness, as well as information-theoretic and algorithm-dependent lower bounds.
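To make the MDP framework concrete, here is a minimal value-iteration sketch on a toy two-state, two-action MDP. The transition probabilities, rewards, and discount factor below are hypothetical, chosen only for illustration; the iteration itself is the standard Bellman optimality update covered early in the course.

```python
import numpy as np

# Toy MDP (hypothetical numbers, for illustration only).
# P[a, s, s'] = probability of moving s -> s' under action a.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # action 0
    [[0.5, 0.5], [0.0, 1.0]],   # action 1
])
# r[s, a] = immediate reward for taking action a in state s.
r = np.array([
    [1.0, 0.0],
    [0.0, 2.0],
])
gamma = 0.9  # discount factor

def value_iteration(P, r, gamma, tol=1e-8):
    """Iterate the Bellman optimality operator until convergence.

    Returns the optimal value function V* and a greedy policy.
    """
    V = np.zeros(P.shape[1])
    while True:
        # Q[s, a] = r[s, a] + gamma * sum_{s'} P[a, s, s'] * V[s']
        Q = r + gamma * np.einsum('ast,t->sa', P, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new

V_star, policy = value_iteration(P, r, gamma)
print(V_star, policy)
```

Because the discount factor is strictly below one, the Bellman operator is a contraction, so this loop converges geometrically regardless of initialization; with access only to samples rather than the model `P`, one instead turns to the sample-efficiency questions this course studies.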

Course Syllabus

Recommended Readings

Lecture Notes

Note: some figures in the slides are taken from the internet or other public resources without proper acknowledgement. I apologize in advance for these omissions, which will be fixed at a later time.

Suggested Readings for Presentations/Projects (incomplete)

You can also select another paper with the instructor's approval.

  1. Constrained MDP: CRPO, sample complexity, multi-objective RL

  2. Robust MDP: sample complexity, policy optimization, corruption-robust RL

  3. Risk-sensitive RL: regret

  4. Convex MDP: reward is enough

  5. Multi-agent RL: V-learning, CCE learning

  6. Representation learning: decision-estimation coefficient, contrastive representation learning, low-rank MDP

  7. Policy gradient methods: general theory, PG for control, variance-reduced PG

  8. Offline RL: doubly-robust OPE, pessimism, offline RL with realizability, actor-critic, adversarial model

  9. Additional topics: RLHF, the role of coverage, POMDP