The Surprising Simplicity of the Early-Time Learning Dynamics of Neural Networks in High Dimension

APA

Hu, W. (2020). The Surprising Simplicity of the Early-Time Learning Dynamics of Neural Networks in High Dimension. The Simons Institute for the Theory of Computing. https://simons.berkeley.edu/talks/surprising-simplicity-early-time-learning-dynamics-neural-networks-high-dimension

MLA

Hu, Wei. "The Surprising Simplicity of the Early-Time Learning Dynamics of Neural Networks in High Dimension." The Simons Institute for the Theory of Computing, 16 Dec. 2020, https://simons.berkeley.edu/talks/surprising-simplicity-early-time-learning-dynamics-neural-networks-high-dimension

BibTeX

@misc{scivideos_16877,
  url       = {https://simons.berkeley.edu/talks/surprising-simplicity-early-time-learning-dynamics-neural-networks-high-dimension},
  author    = {Wei Hu},
  language  = {en},
  title     = {The Surprising Simplicity of the Early-Time Learning Dynamics of Neural Networks in High Dimension},
  publisher = {The Simons Institute for the Theory of Computing},
  year      = {2020},
  month     = {dec},
  note      = {See \url{https://scivideos.org/Simons-Institute/16877}}
}
Wei Hu (Princeton University)
Source Repository: Simons Institute

Abstract

Modern neural networks are often regarded as complex black-box functions whose behavior is difficult to understand owing to their nonlinear dependence on the data and the nonconvexity of their loss landscapes. In this work, we show that these common perceptions can be completely false in the early phase of learning. In particular, we formally prove that, for a class of well-behaved input distributions in high dimension, the early-time learning dynamics of a two-layer fully-connected neural network can be mimicked by training a simple linear model on the inputs. We additionally argue that this surprising simplicity can persist in networks with more layers and with convolutional architectures, which we verify empirically. Key to our analysis is bounding the spectral norm of the difference between the Neural Tangent Kernel (NTK) at initialization and an affine transform of the data kernel; however, unlike many previous results utilizing the NTK, we do not require the network to have disproportionately large width, and the network is allowed to escape the kernel regime later in training. Link to paper: https://arxiv.org/abs/2006.14599
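The phenomenon described in the abstract can be probed numerically. The sketch below is a hypothetical toy setup (not the paper's exact experiment, and all dimensions, learning rates, and the Gaussian data model are illustrative assumptions): it runs a few gradient-descent steps on a two-layer ReLU network with NTK-style 1/sqrt(m) output scaling, runs the same steps on a plain linear model over the raw inputs, and compares the early change in the network's predictions against the linear model's predictions.

```python
import numpy as np

# Hypothetical illustration of the abstract's claim: in the early phase of
# training on well-behaved high-dimensional inputs, a two-layer net's
# learning dynamics resemble those of a simple linear model on the inputs.
rng = np.random.default_rng(0)
d, n, m = 100, 200, 512                         # input dim (high), samples, width
X = rng.standard_normal((n, d)) / np.sqrt(d)    # inputs with norm ~ 1
beta = rng.standard_normal(d)
y = X @ beta + 0.1 * rng.standard_normal(n)     # targets

# Two-layer ReLU net f(x) = v^T relu(Wx) / sqrt(m); train the first layer.
W = rng.standard_normal((m, d))
W0 = W.copy()                                   # keep the initialization
v = rng.choice([-1.0, 1.0], size=m)
theta = np.zeros(d)                             # linear model on raw inputs

lr, steps = 0.5, 20
net_loss, lin_loss = [], []
for _ in range(steps):
    H = X @ W.T                                 # pre-activations, shape (n, m)
    p = np.maximum(H, 0.0) @ v / np.sqrt(m)     # net predictions
    r = p - y
    net_loss.append(0.5 * np.mean(r ** 2))
    # gradient of 0.5 * mean((p - y)^2) with respect to W
    G = ((r[:, None] * (H > 0)) * v[None, :]).T @ X / (np.sqrt(m) * n)
    W -= lr * G

    s = X @ theta - y                           # linear-model residual
    lin_loss.append(0.5 * np.mean(s ** 2))
    theta -= lr * X.T @ s / n

# The paper's claim concerns the *change* in the net's function: early on,
# f_t - f_0 should track the linear model's predictions.
p_init = np.maximum(X @ W0.T, 0.0) @ v / np.sqrt(m)
p_final = np.maximum(X @ W.T, 0.0) @ v / np.sqrt(m)
corr = np.corrcoef(p_final - p_init, X @ theta)[0, 1]
print("net loss:", net_loss[0], "->", net_loss[-1])
print("linear loss:", lin_loss[0], "->", lin_loss[-1])
print("correlation of early function change with linear model:", corr)
```

In this regime both losses shrink over the first steps, and the network's early function change is positively correlated with the linear model's output, which is the qualitative behavior the abstract's theorem formalizes.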