Abstract
Several recent papers have shown that backpropagation is able to find the
global minimum of the empirical risk on the training data when applied to
over-parametrized deep neural networks. In this paper a similar result is
shown for deep neural networks with the sigmoidal squasher activation function
in a regression setting. In addition, a lower bound is presented which proves
that these networks do not generalize well on new data, in the sense that they
do not achieve the optimal minimax rate of convergence for the estimation of
smooth regression functions.
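
The following is a minimal illustrative sketch (not the paper's construction): an over-parametrized network with sigmoid ("squasher") activations, trained by gradient descent via backpropagation, typically drives the empirical L2 risk on a small regression sample close to zero. The network width, optimizer, learning rate, and toy target function are arbitrary choices made for illustration only.

```python
import math
import torch

torch.manual_seed(0)

# Tiny regression sample: n = 20 noisy observations of a smooth function.
n = 20
x = torch.rand(n, 1)
y = torch.sin(2 * math.pi * x) + 0.1 * torch.randn(n, 1)

# Over-parametrized: far more weights than data points.
width = 500
model = torch.nn.Sequential(
    torch.nn.Linear(1, width),
    torch.nn.Sigmoid(),           # sigmoidal squasher activation
    torch.nn.Linear(width, width),
    torch.nn.Sigmoid(),
    torch.nn.Linear(width, 1),
)

opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(5000):
    opt.zero_grad()
    loss = torch.mean((model(x) - y) ** 2)  # empirical L2 risk
    loss.backward()                         # backpropagation
    opt.step()

# Typically near zero; yet an interpolating fit of this kind need not
# generalize at the minimax-optimal rate on new data.
print(f"final empirical risk: {loss.item():.2e}")
```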