On the Global Convergence and Approximation Benefits of Policy Gradient Methods

APA

Russo, D. (2020, October 30). On the Global Convergence and Approximation Benefits of Policy Gradient Methods [Video]. The Simons Institute for the Theory of Computing. https://simons.berkeley.edu/talks/global-convergence-and-approximation-benefits-policy-gradient-methods

MLA

Russo, Daniel. "On the Global Convergence and Approximation Benefits of Policy Gradient Methods." The Simons Institute for the Theory of Computing, 30 Oct. 2020, https://simons.berkeley.edu/talks/global-convergence-and-approximation-benefits-policy-gradient-methods.

BibTeX

@misc{scivideos_16707,
  author    = {Russo, Daniel},
  title     = {On the Global Convergence and Approximation Benefits of Policy Gradient Methods},
  publisher = {The Simons Institute for the Theory of Computing},
  year      = {2020},
  month     = {oct},
  language  = {en},
  url       = {https://simons.berkeley.edu/talks/global-convergence-and-approximation-benefits-policy-gradient-methods},
  note      = {See \url{https://scivideos.org/Simons-Institute/16707}}
}
Daniel Russo (Columbia University)
Source Repository: Simons Institute

Abstract

Policy gradient methods apply to complex, poorly understood control problems by performing stochastic gradient descent over a parameterized class of policies. Unfortunately, due to the multi-period nature of the objective, policy gradient algorithms face non-convex optimization problems and can get stuck in suboptimal local minima even for extremely simple problems. This talk will discuss structural properties, shared by several canonical control problems, that guarantee the policy gradient objective function has no suboptimal stationary points despite being non-convex. Time permitting, I'll then zoom in on the special case of state-aggregated policies and a proof showing that policy gradient converges to better policies than its relative, approximate policy iteration.
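
Illustrative Sketch

To make the setup in the abstract concrete, below is a minimal sketch of the kind of method it describes: a REINFORCE-style stochastic policy gradient with a softmax tabular parameterization on a toy two-state MDP. The MDP, horizon, step size, and parameterization are illustrative assumptions for this sketch, not material from the talk.

# Minimal REINFORCE-style policy gradient on a toy two-state MDP.
# Everything here (the MDP, horizon, step size, softmax tabular policy)
# is an illustrative assumption, not material from the talk.
import numpy as np

rng = np.random.default_rng(0)

# Toy MDP: 2 states, 2 actions. P[s, a] is the next-state distribution;
# R[s, a] is the immediate reward. Action 1 steers toward state 1,
# the only state that pays a reward.
P = np.array([[[0.9, 0.1],    # state 0, action 0
               [0.2, 0.8]],   # state 0, action 1
              [[0.6, 0.4],    # state 1, action 0
               [0.1, 0.9]]])  # state 1, action 1
R = np.array([[0.0, 0.0],
              [1.0, 1.0]])
H = 10  # episode horizon

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

def sample_episode(theta):
    """Roll out one episode under the softmax policy pi(a|s) ~ exp(theta[s, a])."""
    s, traj = 0, []
    for _ in range(H):
        a = rng.choice(2, p=softmax(theta[s]))
        traj.append((s, a, R[s, a]))
        s = rng.choice(2, p=P[s, a])
    return traj

def reinforce_gradient(theta, traj):
    """Monte Carlo estimate of the policy gradient:
    sum over t of (grad of log pi(a_t|s_t)) * (return to go from t)."""
    grad = np.zeros_like(theta)
    rewards = [r for (_, _, r) in traj]
    for t, (s, a, _) in enumerate(traj):
        G = sum(rewards[t:])            # undiscounted return to go
        score = -softmax(theta[s])      # d log pi(a|s) / d theta[s, :]
        score[a] += 1.0                 # = onehot(a) - pi(.|s)
        grad[s] += G * score
    return grad

theta = np.zeros((2, 2))  # one logit per (state, action) pair
for _ in range(2000):
    theta += 0.05 * reinforce_gradient(theta, sample_episode(theta))

print("pi(.|s=0) =", softmax(theta[0]))  # should favor action 1
print("pi(.|s=1) =", softmax(theta[1]))

Even in this tiny example the expected multi-period return is non-convex in theta; structural conditions of the kind the talk studies are what rule out convergence to suboptimal stationary points. A state-aggregated policy, the special case mentioned at the end of the abstract, would correspond here to forcing both states to share a single row of logits.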