Existing Rademacher complexity bounds for neural networks rely only on norm
control of the weight matrices and depend exponentially on depth via a product
of the matrix norms. Lower bounds show that this exponential dependence on
depth is unavoidable when no additional properties of the training data are
considered. We suspect that this conundrum comes from the fact that these
bounds depend on the training data only through the margin. In practice, many
data-dependent techniques such as Batchnorm improve the generalization
performance. For feedforward neural nets as well as RNNs, we obtain tighter
Rademacher complexity bounds by considering additional data-dependent
properties of the network: the norms of the hidden layers of the network, and
the norms of the Jacobians of each layer with respect to the previous layers.
Our bounds scale polynomially in depth when these empirical quantities are
small, as is usually the case in practice. To obtain these bounds, we develop
general tools for augmenting a sequence of functions to make their composition
Lipschitz and then covering the augmented functions. Inspired by our theory, we
directly regularize the network's Jacobians during training and empirically
demonstrate that this improves test performance.
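For readers who want to experiment with the idea, below is a minimal sketch of Jacobian regularization in PyTorch. It penalizes a Hutchinson-style estimate of the Frobenius norm of the input-output Jacobian, a simpler stand-in for the paper's per-layer Jacobian quantities; the architecture, the penalty coefficient lam, and the data loader are illustrative assumptions, not the authors' exact setup.

import torch
import torch.nn as nn

# Illustrative two-layer classifier; sizes are placeholders.
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()
lam = 1e-3  # strength of the Jacobian penalty (assumed value, needs tuning)

def jacobian_penalty(x, out):
    # Hutchinson-style estimator: for v ~ N(0, I),
    # E_v ||v^T J||^2 = ||J||_F^2 with J = d(out)/d(x),
    # so one vector-Jacobian product estimates the squared
    # Frobenius norm without materializing the full Jacobian.
    v = torch.randn_like(out)
    (vjp,) = torch.autograd.grad(out, x, grad_outputs=v, create_graph=True)
    return vjp.pow(2).sum(dim=1).mean()

for x, y in loader:  # `loader` yields (inputs, labels); assumed to exist
    x = x.requires_grad_(True)  # track gradients w.r.t. the input
    out = model(x)
    loss = criterion(out, y) + lam * jacobian_penalty(x, out)
    opt.zero_grad()
    loss.backward()
    opt.step()

The regularizer described in the abstract targets the Jacobians of each layer with respect to previous layers; the single input-output penalty above is simply the cheapest variant of the same idea.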
@article{wei2019datadependent,
  author  = {Wei, Colin and Ma, Tengyu},
  title   = {Data-dependent Sample Complexity of Deep Neural Networks via Lipschitz Augmentation},
  journal = {arXiv preprint arXiv:1905.03684},
  url     = {http://arxiv.org/abs/1905.03684},
  year    = {2019}
}