Multi-Agent Reinforcement Learning in Stochastic Games: From AlphaGo to Robust Control

APA

Zhang, K. (2022). Multi-Agent Reinforcement Learning in Stochastic Games: From AlphaGo to Robust Control. The Simons Institute for the Theory of Computing. https://simons.berkeley.edu/talks/multi-agent-reinforcement-learning-stochastic-games-alphago-robust-control

MLA

Zhang, Kaiqing. "Multi-Agent Reinforcement Learning in Stochastic Games: From AlphaGo to Robust Control." The Simons Institute for the Theory of Computing, 11 Feb. 2022, https://simons.berkeley.edu/talks/multi-agent-reinforcement-learning-stochastic-games-alphago-robust-control

BibTex

@misc{scivideos_19620,
  author    = {Zhang, Kaiqing},
  title     = {Multi-Agent Reinforcement Learning in Stochastic Games: From AlphaGo to Robust Control},
  publisher = {The Simons Institute for the Theory of Computing},
  year      = {2022},
  month     = {feb},
  language  = {en},
  url       = {https://simons.berkeley.edu/talks/multi-agent-reinforcement-learning-stochastic-games-alphago-robust-control},
  note      = {19620; see \url{https://scivideos.org/Simons-Institute/19620}}
}
Kaiqing Zhang (MIT)
Source Repository: Simons Institute

Abstract

Reinforcement learning (RL) has recently achieved tremendous success in several artificial intelligence applications. Many of the forefront applications of RL involve "multiple agents", e.g., playing chess and Go, autonomous driving, and robotics. In this talk, I will introduce several recent works on multi-agent reinforcement learning (MARL) with theoretical guarantees. Specifically, we focus on solving the most basic multi-agent RL setting, infinite-horizon zero-sum stochastic games (Shapley 1953), using three common RL approaches: model-based, value-based, and policy-based methods. We first show that, in the tabular setting, "model-based multi-agent RL" (estimating the model first and then planning) can achieve near-optimal sample complexity when a generative model of the game environment is available. Second, we show that a simple variant of "Q-learning" (value-based) can find the Nash equilibrium of the game, even when the agents run it independently, i.e., in a "fully decentralized" fashion. Third, we show that "policy gradient" methods (policy-based) can solve zero-sum stochastic games with linear dynamics and quadratic costs, which equivalently solves a robust and risk-sensitive control problem. Through this connection to robust control, we discover that our policy gradient methods automatically preserve the robustness of the system throughout the iterations, a phenomenon we refer to as "implicit regularization". Time permitting, I will also discuss some ongoing and future directions along these lines.
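
To make the "estimate the model, then plan" setting in the abstract concrete, the sketch below (not the speaker's implementation) runs Shapley-style value iteration on a finite zero-sum stochastic game with a known or estimated model, solving a zero-sum matrix game at every state with a linear program. The array names (P, R), the problem sizes, and the random toy instance are illustrative assumptions; in the sample-complexity results mentioned above, P would be an empirical model built from generative-model samples.

# Minimal sketch, assuming a tabular zero-sum stochastic game with model arrays
#   P[s, a, b, s'] : transition probabilities, R[s, a, b] : maximizer's reward.
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(M):
    """Value of the zero-sum matrix game M for the row (maximizing) player."""
    m, n = M.shape
    # Variables: the row player's mixed strategy x (m entries) and the game value v.
    c = np.zeros(m + 1); c[-1] = -1.0               # minimize -v  <=>  maximize v
    A_ub = np.hstack([-M.T, np.ones((n, 1))])       # v - x^T M[:, b] <= 0 for every column b
    b_ub = np.zeros(n)
    A_eq = np.zeros((1, m + 1)); A_eq[0, :m] = 1.0  # mixed strategy sums to one
    b_eq = np.ones(1)
    bounds = [(0, None)] * m + [(None, None)]       # x >= 0, v unconstrained
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1]

def shapley_value_iteration(P, R, gamma=0.95, tol=1e-6, max_iter=1000):
    """Value iteration for a discounted zero-sum stochastic game (Shapley 1953)."""
    n_states = P.shape[0]
    V = np.zeros(n_states)
    for _ in range(max_iter):
        V_new = np.empty(n_states)
        for s in range(n_states):
            Q = R[s] + gamma * P[s] @ V             # stage-game payoff matrix at state s
            V_new[s] = matrix_game_value(Q)
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    return V

# Toy usage on a random game with 3 states and 2 actions per player.
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(3), size=(3, 2, 2))       # random transition kernel
R = rng.standard_normal((3, 2, 2))
print(shapley_value_iteration(P, R))

The per-state LP is the textbook way to compute a matrix-game value; the value-based and policy-based results in the abstract (decentralized Q-learning and policy gradient for linear-quadratic games) replace this planning step with sample-based updates and are not shown here.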