The posteriors over neural network weights are high-dimensional and
multimodal. Each mode typically characterizes a meaningfully different
representation of the data. We develop Cyclical Stochastic Gradient MCMC
(SG-MCMC) to automatically explore such distributions. In particular, we
propose a cyclical stepsize schedule, where larger steps discover new modes,
and smaller steps characterize each mode. We prove that our proposed stepsize
schedule provides faster convergence to samples from a stationary
distribution than SG-MCMC with standard decaying schedules. Moreover, we
provide extensive experimental results to demonstrate the effectiveness of
cyclical SG-MCMC in learning complex multimodal distributions, especially for
fully Bayesian inference with modern deep neural networks.
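
The cyclical stepsize idea can be illustrated with a minimal sketch. The abstract does not give the exact form of the schedule, so the cosine shape with restarts used here is an assumption, and the function name `cyclical_stepsize` and its parameters (`total_iters`, `num_cycles`, `initial_stepsize`) are hypothetical names chosen for illustration. Each cycle starts at a large stepsize (exploration, to discover a new mode) and decays toward zero (to characterize the current mode) before restarting.

```python
import math


def cyclical_stepsize(k, total_iters, num_cycles, initial_stepsize):
    """Stepsize at 0-indexed iteration k (assumed cosine form with restarts)."""
    # Length of one cycle, rounded up so the cycles cover all iterations.
    cycle_len = math.ceil(total_iters / num_cycles)
    # Relative position inside the current cycle, in [0, 1).
    pos = (k % cycle_len) / cycle_len
    # Starts at initial_stepsize when pos = 0 and decays toward 0 as pos -> 1.
    return 0.5 * initial_stepsize * (math.cos(math.pi * pos) + 1.0)


# Illustrative values only: 12 iterations split into 3 cycles of 4 steps.
schedule = [cyclical_stepsize(k, 12, 3, 0.1) for k in range(12)]
```

In a sampler built on this sketch, the early, large-stepsize part of each cycle would act as exploration and samples would be collected only in the later, small-stepsize part; where to place that split is a tuning choice that the abstract does not specify.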