A longstanding goal in deep learning research has been to precisely
characterize training and generalization. However, the often complex loss
landscapes of neural networks have made a theory of learning dynamics elusive.
In this work, we show that for wide neural networks the learning dynamics
simplify considerably and that, in the infinite width limit, they are governed
by a linear model obtained from the first-order Taylor expansion of the network
around its initial parameters. Furthermore, mirroring the correspondence
between wide Bayesian neural networks and Gaussian processes, gradient-based
training of wide neural networks with a squared loss produces test set
predictions drawn from a Gaussian process with a particular compositional
kernel. While these theoretical results are only exact in the infinite width
limit, we nevertheless find excellent empirical agreement between the
predictions of the original network and those of the linearized version even
for finite practically-sized networks. This agreement is robust across
different architectures, optimization methods, and loss functions.
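The two claims in the abstract, that a wide network's training dynamics are governed by its first-order Taylor expansion around the initial parameters, and that squared-loss gradient training yields Gaussian-process-like predictions under the corresponding (neural tangent) kernel, can be illustrated with a short sketch. The paper's own implementation is the open-source neural-tangents library (linked in the BibTeX note below); the plain-JAX code here is only a minimal illustration under assumed toy settings: the small MLP, its widths, and the helper names `linearize` and `empirical_ntk` are invented for this sketch, not taken from the paper's code.

```python
# Minimal sketch in plain JAX; the toy MLP, widths, and helper names are
# illustrative assumptions, not the paper's implementation.
import jax
import jax.numpy as jnp
from jax.flatten_util import ravel_pytree


def init_mlp(key, widths):
    """Random fully-connected net; NTK parameterization (1/sqrt(n_in) applied at forward time)."""
    params = []
    for n_in, n_out in zip(widths[:-1], widths[1:]):
        key, sub = jax.random.split(key)
        params.append((jax.random.normal(sub, (n_in, n_out)), jnp.zeros(n_out)))
    return params


def apply_mlp(params, x):
    """Forward pass with a scalar output and tanh hidden nonlinearity."""
    for i, (W, b) in enumerate(params):
        x = x @ W / jnp.sqrt(W.shape[0]) + b
        if i < len(params) - 1:
            x = jnp.tanh(x)
    return x[..., 0]


def linearize(apply_fn, params0):
    """First-order Taylor expansion around the initial parameters:
    f_lin(theta, x) = f(theta0, x) + J_theta f(theta0, x) (theta - theta0)."""
    def f_lin(params, x):
        dparams = jax.tree_util.tree_map(jnp.subtract, params, params0)
        f0, df = jax.jvp(lambda p: apply_fn(p, x), (params0,), (dparams,))
        return f0 + df
    return f_lin


def empirical_ntk(apply_fn, params, x1, x2):
    """Empirical (finite-width) neural tangent kernel Theta(x1, x2) = J(x1) J(x2)^T."""
    flat, unravel = ravel_pytree(params)
    f = lambda theta, x: apply_fn(unravel(theta), x)
    j1 = jax.jacobian(f)(flat, x1)  # shape (n1, num_params)
    j2 = jax.jacobian(f)(flat, x2)  # shape (n2, num_params)
    return j1 @ j2.T


key = jax.random.PRNGKey(0)
params0 = init_mlp(key, [8, 512, 512, 1])  # two wide hidden layers
x_train = jax.random.normal(jax.random.PRNGKey(1), (20, 8))
y_train = jnp.sin(x_train[:, 0])
x_test = jax.random.normal(jax.random.PRNGKey(2), (5, 8))

# Claim 1: at initialization the linearized model agrees with the network exactly,
# and in the wide limit the two stay close throughout gradient descent.
f_lin = linearize(apply_mlp, params0)
assert jnp.allclose(apply_mlp(params0, x_train), f_lin(params0, x_train))

# Claim 2: for squared loss, the t -> infinity prediction of the linearized model
# has a closed form, the mean of the kernel (GP-style) predictor:
#   f(x*) = f0(x*) + Theta(x*, X) Theta(X, X)^{-1} (Y - f0(X)).
theta_train = empirical_ntk(apply_mlp, params0, x_train, x_train)
theta_test = empirical_ntk(apply_mlp, params0, x_test, x_train)
f0_train = apply_mlp(params0, x_train)
f0_test = apply_mlp(params0, x_test)
gp_mean = f0_test + theta_test @ jnp.linalg.solve(theta_train, y_train - f0_train)
print(gp_mean)
```

Here the kernel is the empirical NTK of a finite network at initialization; in the infinite-width limit it concentrates on a deterministic compositional kernel and the closed-form mean above becomes the exact description of the trained network's test predictions.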
@article{lee2019neural,
author = {Lee, Jaehoon and Xiao, Lechao and Schoenholz, Samuel S. and Bahri, Yasaman and Novak, Roman and Sohl-Dickstein, Jascha and Pennington, Jeffrey},
keywords = {deep-learning generalization kernels optimization readings theory},
note = {arXiv:1902.06720; 11+17 pages; open-source code available at https://github.com/google/neural-tangents},
title = {Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent},
url = {http://arxiv.org/abs/1902.06720},
year = 2019
}