Learning Deep ReLU Networks is Fixed-Parameter Tractable

APA

Chen, S. (2020). Learning Deep ReLU Networks is Fixed-Parameter Tractable. The Simons Institute for the Theory of Computing. https://simons.berkeley.edu/talks/learning-deep-relu-networks-fixed-parameter-tractable

MLA

Chen, Sitan. "Learning Deep ReLU Networks is Fixed-Parameter Tractable." The Simons Institute for the Theory of Computing, 16 Dec. 2020, https://simons.berkeley.edu/talks/learning-deep-relu-networks-fixed-parameter-tractable.

BibTeX

@misc{scivideos_16879,
  author    = {Chen, Sitan},
  title     = {Learning Deep ReLU Networks is Fixed-Parameter Tractable},
  publisher = {The Simons Institute for the Theory of Computing},
  year      = {2020},
  month     = {dec},
  language  = {en},
  url       = {https://simons.berkeley.edu/talks/learning-deep-relu-networks-fixed-parameter-tractable},
  note      = {Talk 16879; see \url{https://scivideos.org/Simons-Institute/16879}}
}
          
Sitan Chen (MIT)
Source Repository: Simons Institute

Abstract

We consider the problem of learning an unknown ReLU network with an arbitrary number of layers under Gaussian inputs and obtain the first nontrivial results for networks of depth more than two. We give an algorithm whose running time is a fixed polynomial in the ambient dimension and some (exponentially large) function of only the network's parameters. These results provably cannot be obtained using gradient-based methods and give the first example of a class of efficiently learnable neural networks that gradient descent will fail to learn. In contrast, prior work for learning networks of depth three or higher requires running time exponential in the ambient dimension, while prior work for the depth-two case requires well-conditioned weights and/or positive coefficients to obtain efficient running times. Our algorithm does not require these assumptions. Our main technical tool is a type of filtered PCA that can be used to iteratively recover an approximate basis for the subspace spanned by the hidden units in the first layer. Our analysis leverages new structural results on lattice polynomials from tropical geometry. Based on joint work with Adam Klivans and Raghu Meka.
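To make the subspace-recovery step concrete, below is a minimal, hypothetical sketch of why the span of the first-layer weights is the natural target: a deep ReLU network depends on its input only through the first-layer projection, so under Gaussian inputs its input gradients all lie in that subspace, and plain PCA on estimated gradient outer products recovers it. Everything here (the toy network f, the widths d and k, and the use of finite-difference gradient queries) is an illustrative assumption; this is not the talk's filtered-PCA algorithm, which works from labeled Gaussian samples rather than gradient access.

import numpy as np

rng = np.random.default_rng(0)
d, k = 20, 3  # ambient dimension, first-layer width (illustrative choices)

# Unknown depth-3 ReLU network f(x) = v^T relu(W2 relu(W1 x));
# it depends on x only through W1 x, so every gradient lies in the row span of W1.
W1 = rng.standard_normal((k, d))
W2 = rng.standard_normal((k, k))
v = rng.standard_normal(k)

def f(x):
    return v @ np.maximum(W2 @ np.maximum(W1 @ x, 0.0), 0.0)

def grad_f(x, eps=1e-5):
    # Finite-difference gradient, standing in for query access to the network.
    g = np.zeros(d)
    for i in range(d):
        e = np.zeros(d)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2.0 * eps)
    return g

# Average gradient outer products over Gaussian inputs; the top-k eigenvectors
# of this matrix estimate the subspace spanned by the rows of W1.
n = 2000
M = np.zeros((d, d))
for _ in range(n):
    g = grad_f(rng.standard_normal(d))
    M += np.outer(g, g)
M /= n

eigvals, eigvecs = np.linalg.eigh(M)   # eigenvalues in ascending order
U_hat = eigvecs[:, -k:]                # orthonormal basis for the estimated subspace

# Sanity check: the rows of W1 should be (nearly) unchanged by projecting onto
# the estimated subspace, so the relative error printed below should be small.
proj = U_hat @ U_hat.T
print(np.linalg.norm(W1 @ proj - W1) / np.linalg.norm(W1))

This gradient-based illustration relies on generic weights so that the gradient covariance has full rank k on the relevant subspace; the technical content of the talk lies in achieving the analogous recovery from samples alone, without conditioning or positivity assumptions.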