Abstract
The lack of crisp mathematical models that capture the structure of
real-world data sets is a major obstacle to the detailed theoretical
understanding of deep neural networks. Here, we introduce a generative model
for data sets that we call the hidden manifold model (HMM). The idea is to have
high-dimensional inputs lie on a lower-dimensional manifold, with labels that
depend only on their position within this manifold, akin to a single-layer
decoder or generator in a generative adversarial network. We first demonstrate
the effect of structured data sets by experimentally comparing the dynamics and
the performance of two-layer neural networks trained on three different data
sets: (i) an unstructured synthetic data set containing random i.i.d. inputs,
(ii) a structured data set drawn from the HMM and (iii) a simple canonical data
set containing MNIST images. We pinpoint two phenomena related to the dynamics
of the networks and their ability to generalise that only appear when training
on structured data sets, and we experimentally demonstrate that training
networks on data sets drawn from the HMM reproduces both phenomena observed
during training on the real data set. Our main theoretical result is that the
learning dynamics in the hidden manifold model is amenable to an analytical
treatment: we prove a "Gaussian Equivalence Theorem", which opens the way to
further detailed theoretical studies. In particular, we show how the
dynamics of stochastic gradient descent for a two-layer network is captured by
a set of ordinary differential equations that track the generalisation error at
all times.
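
To make the data-generating process described in the abstract concrete, the following is a minimal sketch of sampling inputs and labels from a hidden-manifold-style model. The specific choices here (a fixed Gaussian feature matrix acting as the single-layer decoder, a tanh non-linearity, a thresholded linear teacher on the latent coordinates, and the dimensions N, D and P) are illustrative assumptions, not necessarily the exact construction used in the paper.

import numpy as np

rng = np.random.default_rng(0)

N = 784    # ambient (input) dimension
D = 10     # latent manifold dimension, D << N
P = 1000   # number of samples

# Fixed "decoder" mapping latent coordinates to high-dimensional inputs,
# akin to a single-layer generator (illustrative assumption).
F = rng.standard_normal((D, N))

# Latent coordinates of each sample on the manifold.
C = rng.standard_normal((P, D))

# High-dimensional inputs: a non-linearity applied to a linear map of the
# latent coordinates, so every input lies on a D-dimensional manifold.
X = np.tanh(C @ F / np.sqrt(D))

# Labels depend only on the position within the manifold; here a simple
# linear teacher on the latent coordinates, thresholded to +/-1.
w_teacher = rng.standard_normal(D)
y = np.sign(C @ w_teacher)

print(X.shape, y.shape)  # (1000, 784) (1000,)

A data set generated this way can be contrasted with an unstructured one of the same size, e.g. X_iid = rng.standard_normal((P, N)) with labels given by a teacher acting directly on X_iid, which corresponds to the i.i.d. baseline mentioned in the abstract.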