Information-Theoretic Generalization Bounds for SGLD via Data-Dependent Estimates
J. Negrea, M. Haghifam, G. Dziugaite, A. Khisti, and D. Roy. (2019). arXiv:1911.02151. Comment: 23 pages, 1 figure. To appear in Advances in Neural Information Processing Systems (33), 2019.
Abstract
In this work, we improve upon the stepwise analysis of noisy iterative
learning algorithms initiated by Pensia, Jog, and Loh (2018) and recently
extended by Bu, Zou, and Veeravalli (2019). Our main contributions are
significantly improved mutual information bounds for Stochastic Gradient
Langevin Dynamics via data-dependent estimates. Our approach is based on the
variational characterization of mutual information and the use of
data-dependent priors that forecast the mini-batch gradient based on a subset
of the training samples. Our approach is broadly applicable within the
information-theoretic framework of Russo and Zou (2015) and Xu and Raginsky
(2017). Our bound can be tied to a measure of flatness of the empirical risk
surface. Empirical investigations show that the terms in our bounds are
orders of magnitude smaller than those in other bounds that depend on the
squared norms of gradients.
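
For orientation, the "variational characterization of mutual information" invoked in the abstract refers to standard results of the following form (the notation here is illustrative and not taken from the paper):

    % Donsker-Varadhan representation of KL divergence, and the resulting
    % variational form of the mutual information between the learned
    % weights W and the training sample S.
    \[
      \mathrm{KL}(P \,\|\, Q)
        = \sup_{f}\, \mathbb{E}_{P}[f] - \log \mathbb{E}_{Q}\!\bigl[e^{f}\bigr]
    \]
    \[
      I(W; S)
        = \mathbb{E}_{S}\bigl[\mathrm{KL}\bigl(P_{W \mid S} \,\|\, P_{W}\bigr)\bigr]
        = \inf_{Q}\, \mathbb{E}_{S}\bigl[\mathrm{KL}\bigl(P_{W \mid S} \,\|\, Q\bigr)\bigr]
    \]

Because the infimum is attained at the intractable marginal P_W, any tractable prior Q, in particular one forecast from a subset of the training samples as in the paper, yields an upper bound on I(W; S).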
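To fix ideas, here is a minimal sketch of a generic SGLD update, the noisy iterative algorithm the bounds apply to. This is illustrative only, not the authors' code: the step size, batch size, and toy quadratic risk are assumptions made for the example.

    import numpy as np

    def sgld_step(w, grad, eta, rng):
        # One SGLD update: a gradient step plus Gaussian noise whose
        # variance (2 * eta) matches the step size, at inverse temperature 1.
        return w - eta * grad + np.sqrt(2.0 * eta) * rng.normal(size=w.shape)

    # Toy run: mini-batch gradients of the empirical risk 0.5*||w - mean(x)||^2.
    rng = np.random.default_rng(0)
    x = rng.normal(loc=1.0, size=(256, 10))      # synthetic training sample
    w = np.zeros(10)
    for t in range(500):
        batch = x[rng.choice(len(x), size=32, replace=False)]
        grad = w - batch.mean(axis=0)            # gradient of the mini-batch risk
        w = sgld_step(w, grad, eta=1e-2, rng=rng)

Roughly, the paper's data-dependent prior forecasts each step's mini-batch gradient from a subset of the sample, so the resulting bound depends on how well that forecast tracks the actual gradient rather than on the gradient's raw squared norm.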
Description
[1911.02151] Information-Theoretic Generalization Bounds for SGLD via Data-Dependent Estimates
%0 Journal Article
%1 negrea2019informationtheoretic
%A Negrea, Jeffrey
%A Haghifam, Mahdi
%A Dziugaite, Gintare Karolina
%A Khisti, Ashish
%A Roy, Daniel M.
%D 2019
%K bounds generalization information neurips2019 readings theory
%T Information-Theoretic Generalization Bounds for SGLD via Data-Dependent Estimates
%U http://arxiv.org/abs/1911.02151
%X In this work, we improve upon the stepwise analysis of noisy iterative
learning algorithms initiated by Pensia, Jog, and Loh (2018) and recently
extended by Bu, Zou, and Veeravalli (2019). Our main contributions are
significantly improved mutual information bounds for Stochastic Gradient
Langevin Dynamics via data-dependent estimates. Our approach is based on the
variational characterization of mutual information and the use of
data-dependent priors that forecast the mini-batch gradient based on a subset
of the training samples. Our approach is broadly applicable within the
information-theoretic framework of Russo and Zou (2015) and Xu and Raginsky
(2017). Our bound can be tied to a measure of flatness of the empirical risk
surface. Empirical investigations show that the terms in our bounds are
orders of magnitude smaller than those in other bounds that depend on the
squared norms of gradients.
@article{negrea2019informationtheoretic,
abstract = {In this work, we improve upon the stepwise analysis of noisy iterative
learning algorithms initiated by Pensia, Jog, and Loh (2018) and recently
extended by Bu, Zou, and Veeravalli (2019). Our main contributions are
significantly improved mutual information bounds for Stochastic Gradient
Langevin Dynamics via data-dependent estimates. Our approach is based on the
variational characterization of mutual information and the use of
data-dependent priors that forecast the mini-batch gradient based on a subset
of the training samples. Our approach is broadly applicable within the
information-theoretic framework of Russo and Zou (2015) and Xu and Raginsky
(2017). Our bound can be tied to a measure of flatness of the empirical risk
surface. Empirical investigations show that the terms in our bounds are
orders of magnitude smaller than those in other bounds that depend on the
squared norms of gradients.},
added-at = {2020-05-23T19:38:50.000+0200},
author = {Negrea, Jeffrey and Haghifam, Mahdi and Dziugaite, Gintare Karolina and Khisti, Ashish and Roy, Daniel M.},
biburl = {https://www.bibsonomy.org/bibtex/23d3d9e74a3c6dedf9f08231e2320bd53/kirk86},
description = {[1911.02151] Information-Theoretic Generalization Bounds for SGLD via Data-Dependent Estimates},
keywords = {bounds generalization information neurips2019 readings theory},
note = {arXiv:1911.02151. Comment: 23 pages, 1 figure. To appear in Advances in Neural Information Processing Systems (33), 2019},
timestamp = {2021-01-11T00:12:44.000+0100},
title = {Information-Theoretic Generalization Bounds for SGLD via Data-Dependent Estimates},
url = {http://arxiv.org/abs/1911.02151},
year = 2019
}