Recent years have witnessed both tremendous empirical successes and fast-growing theoretical development of reinforcement learning (RL) in solving sequential decision-making and control tasks. However, many RL algorithms remain far from ready for deployment on practical autonomous systems, which usually involve more complicated scenarios with multiple decision-makers and safety-critical concerns. In this talk, I will introduce our work on developing RL algorithms with provable guarantees, with a focus on multi-agent and safety-critical settings. I will first show that policy optimization, one of the main drivers of the empirical successes of RL, enjoys global convergence and sample complexity guarantees for a class of robust control problems. More importantly, we show that certain policy optimization approaches automatically preserve some "robustness" during the iterations, a property we term "implicit regularization". Interestingly, this setting naturally unifies other important benchmark settings in control and game theory: risk-sensitive control design and linear quadratic zero-sum dynamic games, the latter being the benchmark multi-agent RL (MARL) setting that mirrors the role played by the linear quadratic regulator (LQR) in single-agent RL. Despite the nonconvexity and the fundamental challenges in the optimization landscape, our theory shows that policy optimization enjoys global convergence guarantees in these problems as well. These results provide theoretical justification for several basic robust RL and MARL settings that are popular in the empirical RL literature. In addition, I will introduce several other works along this line of provable MARL and robust RL, including decentralized MARL with networked agents and the sample complexity of model-based MARL. Time permitting, I will also share several future directions based on these results, towards large-scale and reliable autonomy.
Bio: Kaiqing is a PhD candidate in the Department of Electrical and Computer Engineering (ECE) and the Coordinated Science Laboratory (CSL) at the University of Illinois at Urbana-Champaign (UIUC), working with Professor Tamer Başar. He received his BS from Tsinghua University in 2015 and MS degrees in both Applied Mathematics and ECE from UIUC in 2017. His research interests lie at the intersection of control theory, game theory, and reinforcement learning theory, with applications in intelligent and distributed multi-agent systems, including smart grids, robotics, and transportation systems.