Abstract
During the past five years the Bayesian deep learning community has developed
increasingly accurate and efficient approximate inference procedures that allow
for Bayesian inference in deep neural networks. However, despite this
algorithmic progress and the promise of improved uncertainty quantification and
sample efficiency there are---as of early 2020---no publicized deployments of
Bayesian neural networks in industrial practice. In this work we cast doubt on
the current understanding of Bayes posteriors in popular deep neural networks:
we demonstrate through careful MCMC sampling that the posterior predictive
induced by the Bayes posterior yields systematically worse predictions compared
to simpler methods including point estimates obtained from SGD. Furthermore, we
demonstrate that predictive performance is improved significantly through the
use of a "cold posterior" that overcounts evidence. Such cold posteriors
sharply deviate from the Bayesian paradigm but are commonly used as heuristic
in Bayesian deep learning papers. We put forward several hypotheses that could
explain cold posteriors and evaluate the hypotheses through experiments. Our
work questions the goal of accurate posterior approximations in Bayesian deep
learning: If the true Bayes posterior is poor, what is the use of more accurate
approximations? Instead, we argue that it is timely to focus on understanding
the origin of the improved performance of cold posteriors.
Users
Please
log in to take part in the discussion (add own reviews or comments).