Zap Q-learning with Nonlinear Function Approximation

APA

Meyn, S. (2020). Zap Q-learning with Nonlinear Function Approximation. The Simons Institute for the Theory of Computing. https://simons.berkeley.edu/talks/tbd-244

MLA

Meyn, Sean. "Zap Q-learning with Nonlinear Function Approximation." The Simons Institute for the Theory of Computing, 2 Dec. 2020, https://simons.berkeley.edu/talks/tbd-244

BibTeX

@misc{scivideos_16824,
  author    = {Sean Meyn},
  title     = {Zap Q-learning with Nonlinear Function Approximation},
  publisher = {The Simons Institute for the Theory of Computing},
  year      = {2020},
  month     = {dec},
  url       = {https://simons.berkeley.edu/talks/tbd-244},
  language  = {en},
  note      = {See \url{https://scivideos.org/Simons-Institute/16824}}
}
          
Sean Meyn (University of Florida)
Source Repository: Simons Institute

Abstract

Zap Q-learning is a recent class of reinforcement learning algorithms, motivated primarily as a means to accelerate convergence. Stability theory has been absent outside of two restrictive classes: the tabular setting and optimal stopping. This paper introduces a new framework for the analysis of a more general class of recursive algorithms known as stochastic approximation. Based on this general theory, it is shown that Zap Q-learning is consistent under a non-degeneracy assumption, even when the function approximation architecture is nonlinear. Zap Q-learning with neural network function approximation emerges as a special case, and is tested on examples from OpenAI Gym. Based on multiple experiments with a range of neural network sizes, it is found that the new algorithms converge quickly and are robust to the choice of function approximation architecture.
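
For readers unfamiliar with the algorithm, the following is a minimal sketch of the Zap Q-learning iteration, assuming a linear parameterization Q_theta(x, u) = theta' phi(x, u) on a small, randomly generated MDP. The feature map, step-size exponents, matrix-gain initialization, and pseudo-inverse regularization are illustrative choices, not the settings used in the paper; the nonlinear (neural network) variant discussed in the talk replaces phi(x, u) with the gradient of the network output with respect to its parameters.

import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions, beta = 6, 2, 0.95
dim = n_states * n_actions                 # one-hot (tabular) features, purely for illustration

# Toy randomly generated MDP: transition kernel P[x, u, x'] and reward r[x, u]
P = rng.random((n_states, n_actions, n_states))
P /= P.sum(axis=2, keepdims=True)
r = rng.random((n_states, n_actions))

def phi(x, u):
    """One-hot feature vector for the state-action pair (x, u)."""
    v = np.zeros(dim)
    v[x * n_actions + u] = 1.0
    return v

theta = np.zeros(dim)                      # Q-function parameters
A_hat = -np.eye(dim)                       # illustrative initialization of the matrix-gain estimate
x = 0
for n in range(1, 10_001):
    alpha = 1.0 / n                        # slow step size for theta
    gamma = 1.0 / n ** 0.85                # faster step size for the matrix gain (two time scales)

    u = int(rng.integers(n_actions))       # uniform exploration (behavior policy)
    x_next = int(rng.choice(n_states, p=P[x, u]))

    q_next = np.array([theta @ phi(x_next, a) for a in range(n_actions)])
    u_star = int(np.argmax(q_next))        # greedy action at the next state

    zeta = phi(x, u)
    td = r[x, u] + beta * q_next[u_star] - theta @ zeta   # temporal-difference term

    # One-step estimate of the linearization, averaged on the fast time scale
    A_n = np.outer(zeta, beta * phi(x_next, u_star) - zeta)
    A_hat += gamma * (A_n - A_hat)

    # Zap (stochastic Newton-Raphson) update: theta <- theta + alpha * (-A_hat)^{-1} * zeta * td
    G = np.linalg.pinv(-A_hat)             # pseudo-inverse guards against a singular estimate
    theta += alpha * G @ (zeta * td)

    x = x_next

The distinguishing feature is the matrix gain: the estimate A_hat tracks the mean linearization of the update on a faster time scale (gamma decays more slowly than alpha), so the parameter recursion behaves like a stochastic Newton-Raphson method, which is the source of the acceleration described in the abstract.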