Statistical and Algorithmic Foundations of Reinforcement Learning

Reinforcement learning (RL), which is frequently modeled as learning and decision making in Markov decision processes (MDPs), has garnered growing interest in recent years due to its remarkable success in practice. A core objective of RL is to search for a policy - based on a collection of noisy data samples - that approximately maximizes expected rewards in an MDP, without direct access to a precise description of the underlying model. In contemporary applications, it is increasingly common to encounter environments with prohibitively large state and action spaces, which challenges the RL paradigm in terms of both sample and computational efficiency.
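Concretely, for an infinite-horizon discounted MDP (a formulation not spelled out above; the discount factor gamma and reward function r are assumed notation), this objective can be written as

```latex
V^{\pi}(s) \;=\; \mathbb{E}\!\left[\, \sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t) \;\Big|\; s_0 = s,\ a_t \sim \pi(\cdot \mid s_t) \right],
\qquad
\pi^{\star} \in \arg\max_{\pi} V^{\pi}(s) \ \text{ for all } s,
```

i.e., the learner seeks a policy whose value function is near-maximal at every state, using only sampled transitions rather than knowledge of the transition kernel.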

Broadly speaking, there are two common algorithmic approaches: model-based and model-free. In the model-based approach, one first estimates the unknown model from the data samples at hand, and then leverages the fitted model to perform planning - a task that can be accomplished by resorting to Bellman's principle of optimality. In comparison, the model-free approach attempts to compute the optimal policy (and the optimal value function) without learning the model explicitly, which lends itself to scenarios where a realistic model is difficult to construct or changes on the fly. There is great interest in characterizing the trade-offs among the sample complexity, computational complexity, and statistical accuracy of RL algorithms, and in designing improved algorithms that achieve better trade-offs and meet new considerations. At the same time, we actively pursue applications of RL in real-world science and engineering domains such as wireless networks, graph mining, and more.
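As a rough illustration of this distinction, here is a minimal tabular sketch (not taken from any specific work discussed here; the function names, the batch-of-samples interface, and the uniform fallback for unvisited state-action pairs are all illustrative assumptions):

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Planning step of the model-based approach: apply Bellman's
    principle of optimality to a (fitted) model (P, R)."""
    S, A = R.shape
    V = np.zeros(S)
    while True:
        Q = R + gamma * (P @ V)        # Q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] * V[s']
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return Q.argmax(axis=1), V_new
        V = V_new

def model_based(samples, S, A, gamma=0.9):
    """Model-based approach: fit an empirical transition/reward model
    from (s, a, r, s') samples, then plan in the fitted model."""
    counts = np.zeros((S, A, S))
    rewards = np.zeros((S, A))
    visits = np.zeros((S, A))
    for s, a, r, s_next in samples:
        counts[s, a, s_next] += 1
        rewards[s, a] += r
        visits[s, a] += 1
    visits = np.maximum(visits, 1)                 # avoid division by zero
    P_hat = counts / visits[:, :, None]
    P_hat[P_hat.sum(axis=2) == 0] = 1.0 / S        # unvisited pairs: uniform guess (illustrative choice)
    R_hat = rewards / visits
    return value_iteration(P_hat, R_hat, gamma)

def model_free(samples, S, A, gamma=0.9, lr=0.1):
    """Model-free approach (tabular Q-learning): update Q directly from
    samples, never forming P or R explicitly."""
    Q = np.zeros((S, A))
    for s, a, r, s_next in samples:
        td_target = r + gamma * Q[s_next].max()
        Q[s, a] += lr * (td_target - Q[s, a])
    return Q.argmax(axis=1), Q.max(axis=1)
```

The contrast in the sketch mirrors the trade-off described above: the model-based routine spends memory on the fitted tables but can then plan to convergence offline, while the model-free routine touches each sample once and stores only a Q-table.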

Overview

Sample Complexity of Multi-agent and Federated Reinforcement Learning

Sample Complexity of Single-agent Reinforcement Learning

Global Convergence of Multi-agent and Federated Policy Optimization

Global Convergence of Single-agent Policy Optimization

Function Approximation

Empirical RL