Abstract
While deep learning is successful in a number of applications, it is not yet
well understood theoretically. A satisfactory theoretical characterization of
deep learning, however, is beginning to emerge. It covers the following
questions: 1) the representation power of deep networks; 2) the optimization of
the empirical risk; 3) the generalization properties of gradient descent
techniques, that is, why the expected error does not suffer, despite the
absence of explicit regularization, when the networks are overparametrized. In
this review we discuss recent advances in these three areas. In approximation
theory, both shallow and deep networks have been shown to approximate any
continuous function on a bounded domain, at the expense of a number of
parameters that is exponential in the dimensionality of the function. However,
for a subset of compositional functions, deep networks of the convolutional
type can have a linear dependence on dimensionality, unlike shallow networks.
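To make compositionality concrete, here is a minimal Python sketch (the constituent function h, the binary-tree structure, and all constants are hypothetical illustrations, not drawn from the review): a target function of d = 8 variables in which every constituent depends on only two inputs, so a deep network mirroring this graph needs a number of units linear in d, while a generic shallow approximation pays the exponential price.

```python
import numpy as np

# Hypothetical compositional target on d = 8 variables with a binary-tree
# structure: every constituent function depends on only 2 of its inputs.
def h(a, b):
    # a generic smooth 2-ary constituent function (illustrative choice)
    return np.tanh(a + 2.0 * b)

def f_compositional(x):
    # f(x) = h(h(h(x1,x2), h(x3,x4)), h(h(x5,x6), h(x7,x8)))
    l1 = [h(x[..., 2 * i], x[..., 2 * i + 1]) for i in range(4)]
    l2 = [h(l1[0], l1[1]), h(l1[2], l1[3])]
    return h(l2[0], l2[1])

# A deep (convolutional-type) network matching this graph approximates each
# 2-ary node with a fixed number of units, so its parameter count grows
# linearly with d; a generic shallow network approximating f instead needs
# a number of parameters exponential in d.
x = np.random.randn(5, 8)
print(f_compositional(x))  # values of the target on 5 random inputs
```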
In optimization, we discuss the loss landscape for the exponential loss
function and show that stochastic gradient descent will, with high
probability, find global minima.
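As a toy illustration of this claim (not the review's actual analysis), the following sketch runs stochastic gradient descent on the exponential loss of a linear model over hypothetical separable data; on such data the infimum of the loss is zero, approached as the margins grow.

```python
import numpy as np

# Toy sketch: SGD on the exponential loss L(w) = (1/n) sum_i exp(-y_i w.x_i)
# for a linear model on linearly separable data (data and step size are
# hypothetical choices).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
w_star = rng.normal(size=10)
y = np.sign(X @ w_star)                 # labels separable by construction

w = np.zeros(10)
lr = 0.01
for t in range(5000):
    i = rng.integers(len(X))            # pick one sample: a stochastic step
    margin = y[i] * (X[i] @ w)
    w += lr * np.exp(-margin) * y[i] * X[i]   # -gradient of exp(-y_i w.x_i)

print("mean exponential loss:", np.exp(-y * (X @ w)).mean())  # driven toward 0
```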
To address the question of generalization for classification tasks, we use
classical uniform convergence results to justify minimizing a surrogate
exponential-type loss function under a unit-norm constraint on the weight
matrix at each layer, since the relevant variables for classification are the
weight directions rather than the weights themselves. Our approach, which is
supported by several independent new results, offers a solution to the puzzle
of the generalization performance of deep overparametrized ReLU networks,
uncovering the origin of the underlying hidden complexity control.
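A minimal sketch of this kind of constrained surrogate minimization, assuming a one-hidden-layer ReLU network on toy separable data (the architecture, data, step size, and iteration count are all hypothetical choices, not the review's experiments): each gradient step on the exponential loss is followed by projecting every layer's weight matrix back onto the unit Frobenius sphere, so that only the weight directions evolve.

```python
import numpy as np

# Exponential-loss training with a unit-norm constraint on each layer:
# after every gradient step, each weight matrix is rescaled to unit
# Frobenius norm, so the dynamics act on weight directions only.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 20))
y = np.sign(X[:, 0] + 0.5 * X[:, 1])          # toy separable labels

W1 = rng.normal(size=(20, 32)); W1 /= np.linalg.norm(W1)
w2 = rng.normal(size=32);       w2 /= np.linalg.norm(w2)
lr = 0.05

for t in range(500):
    h = np.maximum(X @ W1, 0.0)               # hidden ReLU activations
    f = h @ w2                                # network output
    g = -y * np.exp(-y * f)                   # dL/df for the exponential loss
    grad_w2 = h.T @ g
    grad_h = np.outer(g, w2) * (h > 0)        # backprop through the ReLU
    grad_W1 = X.T @ grad_h
    W1 -= lr * grad_W1
    w2 -= lr * grad_w2
    W1 /= np.linalg.norm(W1)                  # project back onto the
    w2 /= np.linalg.norm(w2)                  # unit (Frobenius) sphere

acc = np.mean(np.sign(np.maximum(X @ W1, 0.0) @ w2) == y)
print("training accuracy with unit-norm layers:", acc)
```

Since the predicted class sign(f) is invariant to positive rescaling of the layers, the constraint discards only the irrelevant scale of the weights; a similar effect can be obtained with weight-normalization reparametrizations.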