Abstract
The posteriors over neural network weights are high-dimensional and
multimodal. Each mode typically characterizes a meaningfully different
representation of the data. We develop Cyclical Stochastic Gradient MCMC
(SG-MCMC) to automatically explore such distributions. In particular, we
propose a cyclical stepsize schedule, where larger steps discover new modes,
and smaller steps characterize each mode. We prove that our proposed learning
rate schedule provides faster convergence to samples from a stationary
distribution than SG-MCMC with standard decaying schedules. Moreover, we
provide extensive experimental results to demonstrate the effectiveness of
cyclical SG-MCMC in learning complex multimodal distributions, especially for
fully Bayesian inference with modern deep neural networks.
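To make the cyclical stepsize idea concrete, below is a minimal sketch of one possible schedule paired with an SGLD-style update. The abstract does not fix the schedule's functional form, so the cosine shape, the function names (cyclical_stepsize, sgld_step), and the parameters (alpha_0, num_cycles) are illustrative assumptions, not the authors' implementation.

```python
import math
import numpy as np

def cyclical_stepsize(k, total_iters, num_cycles, alpha_0):
    """Illustrative cosine-shaped cyclical schedule (an assumption, not
    necessarily the paper's exact form): the stepsize restarts at alpha_0 at
    the start of each cycle (large steps, mode discovery) and decays toward
    zero within the cycle (small steps, characterizing the current mode)."""
    cycle_len = math.ceil(total_iters / num_cycles)
    pos = (k % cycle_len) / cycle_len  # position within the current cycle, in [0, 1)
    return (alpha_0 / 2.0) * (math.cos(math.pi * pos) + 1.0)

def sgld_step(theta, stochastic_grad, stepsize, rng):
    """One SGLD-style update: a gradient step on the stochastic negative
    log-posterior plus Gaussian noise with variance 2 * stepsize."""
    noise = rng.normal(scale=math.sqrt(2.0 * stepsize), size=theta.shape)
    return theta - stepsize * stochastic_grad + noise

# Toy usage: sample from a standard Gaussian, whose negative log-density
# has gradient equal to theta itself.
rng = np.random.default_rng(0)
theta = np.zeros(2)
for k in range(1000):
    eps = cyclical_stepsize(k, total_iters=1000, num_cycles=4, alpha_0=0.05)
    theta = sgld_step(theta, stochastic_grad=theta, stepsize=eps, rng=rng)
```

The design intent is that restarting the stepsize at the top of each cycle gives the sampler enough energy to leave a mode it has already characterized, while the decaying tail of each cycle produces samples that resolve the local geometry of the current mode.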