Abstract
While deep learning is successful in a number of applications, it is not yet
well understood theoretically. A satisfactory theoretical characterization of
deep learning, however, is beginning to emerge. It covers the following
questions: 1) the representation power of deep networks; 2) the optimization of
the empirical risk; 3) the generalization properties of gradient descent
techniques, that is, why the expected error does not suffer, despite the
absence of explicit regularization, when the networks are overparametrized. In
this review we discuss recent advances in these three areas. In approximation
theory, both shallow and deep networks have been shown to approximate any
continuous function on a bounded domain, at the expense of a number of
parameters that is exponential in the dimensionality of the function. However,
for a subset of compositional functions, deep networks of the convolutional
type can have a linear dependence on dimensionality, unlike shallow networks.
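To make compositionality concrete, here is a minimal Python sketch (the constituent function h, the binary-tree structure, and all constants are hypothetical illustrations, not drawn from the review): a target function of d = 8 variables in which every constituent depends on only two inputs, so a deep network mirroring this graph needs a number of units linear in d, while a generic shallow approximation pays the exponential price.

```python
import numpy as np

# Hypothetical compositional target on d = 8 variables with a binary-tree
# structure: every constituent function depends on only 2 of its inputs.
def h(a, b):
    # a generic smooth 2-ary constituent function (illustrative choice)
    return np.tanh(a + 2.0 * b)

def f_compositional(x):
    # f(x) = h(h(h(x1,x2), h(x3,x4)), h(h(x5,x6), h(x7,x8)))
    l1 = [h(x[..., 2 * i], x[..., 2 * i + 1]) for i in range(4)]
    l2 = [h(l1[0], l1[1]), h(l1[2], l1[3])]
    return h(l2[0], l2[1])

# A deep (convolutional-type) network matching this graph approximates each
# 2-ary node with a fixed number of units, so its parameter count grows
# linearly with d; a generic shallow network approximating f instead needs
# a number of parameters exponential in d.
x = np.random.randn(5, 8)
print(f_compositional(x))  # values of the target on 5 random inputs
```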
In optimization, we discuss the loss landscape for the exponential loss
function and show that stochastic gradient descent will, with high
probability, find global minima.
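As a toy illustration of this claim (not the review's actual analysis), the following sketch runs stochastic gradient descent on the exponential loss of a linear model over hypothetical separable data; on such data the infimum of the loss is zero, approached as the margins grow.

```python
import numpy as np

# Toy sketch: SGD on the exponential loss L(w) = (1/n) sum_i exp(-y_i w.x_i)
# for a linear model on linearly separable data (data and step size are
# hypothetical choices).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
w_star = rng.normal(size=10)
y = np.sign(X @ w_star)                 # labels separable by construction

w = np.zeros(10)
lr = 0.01
for t in range(5000):
    i = rng.integers(len(X))            # pick one sample: a stochastic step
    margin = y[i] * (X[i] @ w)
    w += lr * np.exp(-margin) * y[i] * X[i]   # -gradient of exp(-y_i w.x_i)

print("mean exponential loss:", np.exp(-y * (X @ w)).mean())  # driven toward 0
```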
To address the question of generalization for classification tasks, we use
classical uniform convergence results to justify minimizing a surrogate
exponential-type loss function under a unit-norm constraint on the weight
matrix at each layer, since the relevant variables for classification are the
weight directions rather than the weights themselves. Our approach, which is
supported by several independent new results, offers a solution to the puzzle
of the generalization performance of deep overparametrized ReLU networks,
uncovering the origin of the underlying hidden complexity control.
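A minimal sketch of this kind of constrained surrogate minimization, assuming a one-hidden-layer ReLU network on toy separable data (the architecture, data, step size, and iteration count are all hypothetical choices, not the review's experiments): each gradient step on the exponential loss is followed by projecting every layer's weight matrix back onto the unit Frobenius sphere, so that only the weight directions evolve.

```python
import numpy as np

# Exponential-loss training with a unit-norm constraint on each layer:
# after every gradient step, each weight matrix is rescaled to unit
# Frobenius norm, so the dynamics act on weight directions only.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 20))
y = np.sign(X[:, 0] + 0.5 * X[:, 1])          # toy separable labels

W1 = rng.normal(size=(20, 32)); W1 /= np.linalg.norm(W1)
w2 = rng.normal(size=32);       w2 /= np.linalg.norm(w2)
lr = 0.05

for t in range(500):
    h = np.maximum(X @ W1, 0.0)               # hidden ReLU activations
    f = h @ w2                                # network output
    g = -y * np.exp(-y * f)                   # dL/df for the exponential loss
    grad_w2 = h.T @ g
    grad_h = np.outer(g, w2) * (h > 0)        # backprop through the ReLU
    grad_W1 = X.T @ grad_h
    W1 -= lr * grad_W1
    w2 -= lr * grad_w2
    W1 /= np.linalg.norm(W1)                  # project back onto the
    w2 /= np.linalg.norm(w2)                  # unit (Frobenius) sphere

acc = np.mean(np.sign(np.maximum(X @ W1, 0.0) @ w2) == y)
print("training accuracy with unit-norm layers:", acc)
```

Since the predicted class sign(f) is invariant to positive rescaling of the layers, the constraint discards only the irrelevant scale of the weights; a similar effect can be obtained with weight-normalization reparametrizations.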