Bayesian Deep Learning and a Probabilistic Perspective of Generalization
A. G. Wilson and P. Izmailov (2020). arXiv:2002.08791. Comment: 27 pages, 17 figures.
Abstract
The key distinguishing property of a Bayesian approach is marginalization,
rather than using a single setting of weights. Bayesian marginalization can
particularly improve the accuracy and calibration of modern deep neural
networks, which are typically underspecified by the data, and can represent
many compelling but different solutions. We show that deep ensembles provide an
effective mechanism for approximate Bayesian marginalization, and propose a
related approach that further improves the predictive distribution by
marginalizing within basins of attraction, without significant overhead. We
also investigate the prior over functions implied by a vague distribution over
neural network weights, explaining the generalization properties of such models
from a probabilistic perspective. From this perspective, we explain results
that have been presented as mysterious and distinct to neural network
generalization, such as the ability to fit images with random labels, and show
that these results can be reproduced with Gaussian processes. Finally, we
provide a Bayesian perspective on tempering for calibrating predictive
distributions.
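Notes
The abstract's two central objects can be written out concretely. The Bayesian model average replaces a single weight setting with an integral over the posterior, and the tempering discussed at the end raises the likelihood to a power 1/T. In LaTeX notation (symbols are illustrative, not taken verbatim from the paper):

p(y \mid x, \mathcal{D}) = \int p(y \mid x, w)\, p(w \mid \mathcal{D})\, dw,
\qquad
p_T(w \mid \mathcal{D}) \propto p(\mathcal{D} \mid w)^{1/T}\, p(w).

Deep ensembles approximate this integral by averaging the predictive distributions of several networks trained independently from different random initializations, each settling in a different basin of attraction; the related approach mentioned in the abstract goes further by also averaging over weights within each basin. Below is a minimal, self-contained Python sketch of the ensemble-averaging step on a toy problem; the data, network size, and training loop are stand-ins chosen for brevity, not the paper's setup.

import numpy as np

rng = np.random.default_rng(0)

# Toy XOR-like binary classification data (a stand-in, not the paper's setup).
X = rng.normal(size=(200, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(float)

def train_mlp(seed, hidden=16, steps=2000, lr=0.1):
    # One-hidden-layer tanh MLP trained by full-batch gradient descent
    # on the cross-entropy loss, starting from a fresh random init so
    # that different seeds can land in different basins of attraction.
    r = np.random.default_rng(seed)
    W1, b1 = r.normal(size=(2, hidden)), np.zeros(hidden)
    W2, b2 = r.normal(size=hidden), 0.0
    for _ in range(steps):
        h = np.tanh(X @ W1 + b1)
        p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))
        g = (p - y) / len(y)                  # d(loss)/d(logit)
        gh = np.outer(g, W2) * (1.0 - h**2)   # backprop through tanh
        W2 -= lr * (h.T @ g);  b2 -= lr * g.sum()
        W1 -= lr * (X.T @ gh); b1 -= lr * gh.sum(axis=0)
    return W1, b1, W2, b2

def predict(params, Xq):
    W1, b1, W2, b2 = params
    h = np.tanh(Xq @ W1 + b1)
    return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))

# Deep ensemble: average predictive probabilities over independent runs,
# a simple Monte Carlo approximation of the Bayesian model average.
members = [train_mlp(seed=s) for s in range(5)]
bma = np.stack([predict(m, X) for m in members]).mean(axis=0)

Averaging probabilities (not logits, and not weights) is the point: the ensemble prediction is a uniform mixture over the members' predictive distributions, i.e. a crude Monte Carlo estimate of the integral above.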
@article{wilson2020bayesian,
abstract = {The key distinguishing property of a Bayesian approach is marginalization,
rather than using a single setting of weights. Bayesian marginalization can
particularly improve the accuracy and calibration of modern deep neural
networks, which are typically underspecified by the data, and can represent
many compelling but different solutions. We show that deep ensembles provide an
effective mechanism for approximate Bayesian marginalization, and propose a
related approach that further improves the predictive distribution by
marginalizing within basins of attraction, without significant overhead. We
also investigate the prior over functions implied by a vague distribution over
neural network weights, explaining the generalization properties of such models
from a probabilistic perspective. From this perspective, we explain results
that have been presented as mysterious and distinct to neural network
generalization, such as the ability to fit images with random labels, and show
that these results can be reproduced with Gaussian processes. Finally, we
provide a Bayesian perspective on tempering for calibrating predictive
distributions.},
author = {Wilson, Andrew Gordon and Izmailov, Pavel},
keywords = {bayesian generalization readings uncertainty},
note = {arXiv:2002.08791. Comment: 27 pages, 17 figures},
title = {Bayesian Deep Learning and a Probabilistic Perspective of Generalization},
url = {http://arxiv.org/abs/2002.08791},
year = 2020
}