Article

Information Dropout: Learning Optimal Representations Through Noisy Computation

A. Achille and S. Soatto.
(2016). arXiv:1611.01353.

Abstract

The cross-entropy loss commonly used in deep learning is closely related to the defining properties of optimal representations, but does not enforce some of the key properties. We show that this can be solved by adding a regularization term, which is in turn related to injecting multiplicative noise in the activations of a Deep Neural Network, a special case of which is the common practice of dropout. We show that our regularized loss function can be efficiently minimized using Information Dropout, a generalization of dropout rooted in information theoretic principles that automatically adapts to the data and can better exploit architectures of limited capacity. When the task is the reconstruction of the input, we show that our loss function yields a Variational Autoencoder as a special case, thus providing a link between representation learning, information theory and variational inference. Finally, we prove that we can promote the creation of disentangled representations simply by enforcing a factorized prior, a fact that has been observed empirically in recent work. Our experiments validate the theoretical intuitions behind our method, and we find that information dropout achieves a comparable or better generalization performance than binary dropout, especially on smaller models, since it can automatically adapt the noise to the structure of the network, as well as to the test sample.

BibTeX key: achille2016information
entry type: article
year: 2016
url: http://arxiv.org/abs/1611.01353
note: cite arxiv:1611.01353

Cite this publication

@article{achille2016information,
  abstract = {The cross-entropy loss commonly used in deep learning is closely related to the defining properties of optimal representations, but does not enforce some of the key properties. We show that this can be solved by adding a regularization term, which is in turn related to injecting multiplicative noise in the activations of a Deep Neural Network, a special case of which is the common practice of dropout. We show that our regularized loss function can be efficiently minimized using Information Dropout, a generalization of dropout rooted in information theoretic principles that automatically adapts to the data and can better exploit architectures of limited capacity. When the task is the reconstruction of the input, we show that our loss function yields a Variational Autoencoder as a special case, thus providing a link between representation learning, information theory and variational inference. Finally, we prove that we can promote the creation of disentangled representations simply by enforcing a factorized prior, a fact that has been observed empirically in recent work. Our experiments validate the theoretical intuitions behind our method, and we find that information dropout achieves a comparable or better generalization performance than binary dropout, especially on smaller models, since it can automatically adapt the noise to the structure of the network, as well as to the test sample.},
  added-at = {2019-09-25T04:53:30.000+0200},
  author = {Achille, Alessandro and Soatto, Stefano},
  biburl = {https://www.bibsonomy.org/bibtex/234697eaa9d992461d5d501c79ececf13/kirk86},
  description = {[1611.01353] Information Dropout: Learning Optimal Representations Through Noisy Computation},
  interhash = {b8215868c57a86c8aa173fee75f25704},
  intrahash = {34697eaa9d992461d5d501c79ececf13},
  keywords = {deep-learning readings theory regularisation},
  note = {cite arxiv:1611.01353},
  timestamp = {2019-09-26T16:00:39.000+0200},
  title = {Information Dropout: Learning Optimal Representations Through Noisy Computation},
  url = {http://arxiv.org/abs/1611.01353},
  year = 2016
}
