A Capacity Scaling Law for Artificial Neural Networks
G. Friedland and M. Krell (2017). arXiv:1708.06019. Comment: 13 pages, 4 figures, 2 listings of source code.
Abstract
We derive the calculation of two critical numbers that predict the behavior of
perceptron networks. First, we derive the calculation of what we call the
lossless memory (LM) dimension. The LM dimension is a generalization of the
Vapnik--Chervonenkis (VC) dimension that avoids structured data and therefore
provides an upper bound for perfectly fitting almost any training data. Second,
we derive what we call the MacKay (MK) dimension. This limit indicates a 50%
chance of not being able to train a given function. Our derivations are
performed by embedding a neural network into Shannon's communication model,
which allows us to interpret the two points as capacities measured in bits. We
present a proof and repeatable practical experiments that validate our upper
bounds using different network configurations, diverse implementations, varying
activation functions, and several learning algorithms. The bottom line is that
the two capacity points scale strictly linearly with the number of weights.
Among other practical applications, our result makes it possible to compare and
benchmark different neural network implementations independently of a concrete
learning task. Our results provide insight into the capabilities and limits of
neural networks and yield valuable know-how for experimental design decisions.
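
As a concrete illustration of the experiments the abstract describes, below is a
minimal, hypothetical sketch (not the paper's own code listings): it estimates
both capacity points for a single perceptron by measuring how often random
labelings of random points can be memorized perfectly. By Cover's
function-counting argument, the fit rate should stay near 1.0 up to about d + 1
samples for d inputs plus a bias (presumably the LM point) and drop to about 50%
near 2(d + 1) samples (presumably the MK point). The function names, the epoch
cap, and the use of a plain perceptron update are illustrative assumptions; the
training loop only approximates an exact linear-separability test.

# Hypothetical capacity-transition sketch; not taken from the paper's listings.
import numpy as np

def fits_perfectly(X, y, epochs=500):
    """Train a plain perceptron; return True if it reaches zero training errors.
    Approximation only: a run that does not converge within the epoch cap is
    counted as a failure even if the data happen to be separable."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias input
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(Xb, y):
            pred = 1 if xi @ w > 0 else -1
            if pred != yi:
                w += yi * xi                       # standard perceptron update
                errors += 1
        if errors == 0:
            return True
    return False

def memorization_rate(d, n, trials=200, rng=np.random.default_rng(0)):
    """Fraction of random labelings of n random points in R^d that a single
    perceptron (d weights plus a bias) can fit perfectly."""
    wins = 0
    for _ in range(trials):
        X = rng.standard_normal((n, d))            # points in general position
        y = rng.choice([-1, 1], size=n)            # random, unstructured labels
        wins += fits_perfectly(X, y)
    return wins / trials

if __name__ == "__main__":
    d = 5                                          # 5 inputs -> 6 parameters with bias
    for n in range(2, 20):
        print(f"n = {n:2d}  fit rate = {memorization_rate(d, n):.2f}")

With d = 5, the printed fit rate should, in principle, stay near 1.0 up to about
n = 6 and cross 0.5 somewhere around n = 12, which is the linear-in-weights
scaling the abstract refers to.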
Description
[1708.06019v2] A Capacity Scaling Law for Artificial Neural Networks
@article{friedland2017capacity,
author = {Friedland, Gerald and Krell, Mario},
keywords = {capacity deep-learning},
note = {arXiv:1708.06019. Comment: 13 pages, 4 figures, 2 listings of source code},
title = {A Capacity Scaling Law for Artificial Neural Networks},
url = {http://arxiv.org/abs/1708.06019},
year = 2017
}