Abstract
There is a previously identified equivalence between wide fully connected
neural networks (FCNs) and Gaussian processes (GPs). This equivalence enables,
for instance, test set predictions that would have resulted from a fully
Bayesian, infinitely wide trained FCN to be computed without ever instantiating
the FCN, but by instead evaluating the corresponding GP. In this work, we
derive an analogous equivalence for multi-layer convolutional neural networks
(CNNs) both with and without pooling layers, and achieve state-of-the-art
results on CIFAR10 for GPs without trainable kernels. We also introduce a Monte
Carlo method to estimate the GP corresponding to a given neural network
architecture, even in cases where the analytic form has too many terms to be
computationally feasible.
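
As a rough sketch of these two ideas (not the authors' code; the function names, the small ReLU FCN, and all hyperparameters below are illustrative assumptions), one can estimate the kernel of the GP corresponding to a network by Monte Carlo over random parameter draws, and then use that kernel for GP posterior-mean prediction without ever training the network:

```python
import numpy as np

def sample_fcn_outputs(X, widths=(256, 256), sigma_w=1.0, sigma_b=0.1, rng=None):
    """Scalar output of one randomly initialized ReLU FCN evaluated on X (n, d).

    Weights ~ N(0, sigma_w^2 / fan_in), biases ~ N(0, sigma_b^2): the
    parameterization under which the infinite-width GP limit is usually stated.
    """
    rng = np.random.default_rng() if rng is None else rng
    h = X
    for width in widths:
        W = rng.normal(0.0, sigma_w / np.sqrt(h.shape[1]), size=(h.shape[1], width))
        b = rng.normal(0.0, sigma_b, size=width)
        h = np.maximum(h @ W + b, 0.0)                       # ReLU hidden layer
    W_out = rng.normal(0.0, sigma_w / np.sqrt(h.shape[1]), size=(h.shape[1], 1))
    return (h @ W_out).ravel()                               # readout, no bias

def mc_nngp_kernel(X, n_samples=1000, **net_kwargs):
    """Monte Carlo estimate of K(x, x') = E_theta[f(x) f(x')] over random nets."""
    rng = np.random.default_rng(0)
    outs = np.stack([sample_fcn_outputs(X, rng=rng, **net_kwargs)
                     for _ in range(n_samples)])             # (n_samples, n)
    return outs.T @ outs / n_samples                         # empirical 2nd moment

def gp_posterior_mean(K_train, K_test_train, y_train, noise=1e-2):
    """Posterior mean of GP regression with the (estimated) kernel."""
    n = K_train.shape[0]
    alpha = np.linalg.solve(K_train + noise * np.eye(n), y_train)
    return K_test_train @ alpha

# Toy usage (illustrative only): predict 10 test points from 30 training points.
rng = np.random.default_rng(1)
X_train, X_test = rng.normal(size=(30, 5)), rng.normal(size=(10, 5))
y_train = np.sin(X_train.sum(axis=1))
K = mc_nngp_kernel(np.concatenate([X_train, X_test]))
y_pred = gp_posterior_mean(K[:30, :30], K[30:, :30], y_train)
```

The paper derives the kernel analytically for FCNs and CNNs; the Monte Carlo route sketched here is the fallback it proposes for architectures whose analytic form is computationally infeasible.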
Surprisingly, in the absence of pooling layers, the GPs corresponding to CNNs
with and without weight sharing are identical. As a consequence, translation
equivariance, beneficial in finite channel CNNs trained with stochastic
gradient descent (SGD), is guaranteed to play no role in the Bayesian treatment
of the infinite channel limit, a qualitative difference between the two
regimes that is not present in the FCN case. We confirm experimentally that
while in some scenarios the performance of SGD-trained finite CNNs approaches
that of the corresponding GPs as the channel count increases, with careful
tuning SGD-trained CNNs can significantly outperform their corresponding GPs,
suggesting advantages from SGD training compared to fully Bayesian parameter
estimation.
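
The weight-sharing claim can be illustrated numerically with a minimal sketch (assumptions, not the paper's experiment: a single random 1-D conv layer, ReLU, flattening into a dense readout, and no pooling). Comparing Monte Carlo kernel estimates for a layer with shared filters against a locally connected layer with independent per-position filters, the two kernel matrices agree up to sampling noise:

```python
import numpy as np

def random_conv_net(X, tie_weights, channels=64, k=3, sigma_w=1.0, rng=None):
    """One random 1-D conv (or locally connected) ReLU layer + dense readout.

    X: (n, length, c_in). tie_weights=True shares one filter bank across all
    spatial positions (an ordinary CNN); False draws an independent filter bank
    per position (locally connected, i.e. no weight sharing). No pooling:
    activations are flattened straight into the readout.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, length, c_in = X.shape
    out_len, fan_in = length - k + 1, k * c_in
    if tie_weights:
        W = rng.normal(0.0, sigma_w / np.sqrt(fan_in), size=(k, c_in, channels))
        W = np.broadcast_to(W, (out_len, k, c_in, channels))
    else:
        W = rng.normal(0.0, sigma_w / np.sqrt(fan_in),
                       size=(out_len, k, c_in, channels))
    patches = np.stack([X[:, p:p + k, :] for p in range(out_len)], axis=1)
    h = np.maximum(np.einsum('npkc,pkcf->npf', patches, W), 0.0)   # ReLU conv
    flat = h.reshape(n, -1)
    v = rng.normal(0.0, 1.0 / np.sqrt(flat.shape[1]), size=flat.shape[1])
    return flat @ v                                  # scalar output per input

def mc_kernel(X, tie_weights, n_samples=2000):
    rng = np.random.default_rng(0)
    outs = np.stack([random_conv_net(X, tie_weights, rng=rng)
                     for _ in range(n_samples)])
    return outs.T @ outs / n_samples

# Illustrative check, not the paper's experiment.
X = np.random.default_rng(1).normal(size=(8, 16, 3))     # 8 tiny 1-D "images"
K_cnn = mc_kernel(X, tie_weights=True)                   # with weight sharing
K_lcn = mc_kernel(X, tie_weights=False)                  # without weight sharing
# Up to Monte Carlo noise the two kernels agree, consistent with the no-pooling
# claim; a pooling layer would break this agreement.
print(np.max(np.abs(K_cnn - K_lcn)) / np.max(np.abs(K_cnn)))
```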
Description
[1810.05148] Bayesian Deep Convolutional Networks with Many Channels are Gaussian Processes