Q-learning with Uniformly Bounded Variance

APA

Devraj, A. (2020). Q-learning with Uniformly Bounded Variance. The Simons Institute for the Theory of Computing. https://simons.berkeley.edu/talks/tbd-245

MLA

Devraj, Adithya. "Q-learning with Uniformly Bounded Variance." The Simons Institute for the Theory of Computing, 2 Dec. 2020, https://simons.berkeley.edu/talks/tbd-245

BibTeX

@misc{scivideos_16825,
  author    = {Devraj, Adithya},
  title     = {Q-learning with Uniformly Bounded Variance},
  publisher = {The Simons Institute for the Theory of Computing},
  year      = {2020},
  month     = {dec},
  language  = {en},
  url       = {https://simons.berkeley.edu/talks/tbd-245},
  note      = {See \url{https://scivideos.org/Simons-Institute/16825}}
}
Adithya Devraj (Stanford)
Source Repository: Simons Institute

Abstract

Sample complexity bounds are a common performance metric in the RL literature. In the discounted-cost, infinite-horizon setting, all of the known bounds grow without limit as the discount factor approaches unity. For a large discount factor, these bounds suggest that a very large number of samples is required to obtain an epsilon-optimal policy. In this talk, we will discuss a new class of algorithms whose sample complexity is uniformly bounded over all discount factors. One might argue that this is impossible, in view of a recent minimax lower bound. The explanation is that this lower bound applies to a specific problem formulation, which we modify without compromising the ultimate objective of obtaining an epsilon-optimal policy. Specifically, we show that the asymptotic covariance of the Q-learning algorithm with an optimized step-size sequence is a quadratic function of 1/(1 - gamma), which diverges as the discount factor gamma approaches 1; this is an expected, and essentially known, result. The new relative Q-learning algorithm proposed here is shown to have asymptotic covariance that is uniformly bounded over all discount factors.
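
For intuition, below is a minimal tabular sketch contrasting the standard Q-learning update with the relative variant the abstract describes: the relative update subtracts a scalar offset delta * <nu, Q> from the Bellman target, where nu is a fixed probability distribution over state-action pairs and delta > 0 is a scalar, so the fixed point differs from Q* only by a constant and the greedy policy is unchanged. The environment, the 1/n step size, and the particular choices of delta and nu here are illustrative assumptions, not the talk's optimized setup.

```python
import numpy as np

def q_learning(P, c, gamma, num_steps, relative=False, delta=None, nu=None, seed=0):
    """Tabular Q-learning on a finite MDP (cost minimization).

    P: transition tensor of shape (S, A, S); c: cost matrix of shape (S, A).
    With relative=True, each Bellman target is reduced by delta * <nu, Q>,
    the scalar offset of relative Q-learning. The offset shifts the fixed
    point by a constant, so the greedy policy is unaffected.
    """
    rng = np.random.default_rng(seed)
    S, A = c.shape
    Q = np.zeros((S, A))
    if relative:
        delta = 1.0 if delta is None else delta             # illustrative: any delta > 0
        nu = np.full((S, A), 1.0 / (S * A)) if nu is None else nu  # uniform by default
    x = 0
    for n in range(1, num_steps + 1):
        u = rng.integers(A)                                 # exploratory (uniform) policy
        x_next = rng.choice(S, p=P[x, u])                   # sample next state
        target = c[x, u] + gamma * Q[x_next].min()          # min-cost Bellman target
        if relative:
            target -= delta * float(np.sum(nu * Q))        # relative offset: delta * <nu, Q>
        a_n = 1.0 / n                                       # vanilla step size; the talk
                                                            # analyzes optimized schedules
        Q[x, u] += a_n * (target - Q[x, u])
        x = x_next
    return Q
```

Under this vanilla 1/n step size, the talk's point is that the noise in the standard recursion blows up as gamma approaches 1, while the relative recursion's asymptotic covariance stays uniformly bounded.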