Learning Curves for Stochastic Gradient Descent in Linear Feedforward Networks
J. Werfel, X. Xie, and H. Seung. In Advances in Neural Information Processing Systems 16, MIT Press, (2003)
Discussion of learning curves for stochastic gradient descent.
Besides gradient-based approaches, the paper briefly describes (with additional references) weight-perturbation and node-perturbation approaches.
Abstract
Gradient-following learning methods can encounter problems of implementation in many applications, and stochastic variants are frequently used to overcome these difficulties. We derive quantitative learning curves for three online training methods used with a linear perceptron: direct gradient descent, node perturbation, and weight perturbation. The maximum learning rate for the stochastic methods scales inversely with the first power of the dimensionality of the noise injected into the system; with sufficiently small learning rate, all three methods give identical learning curves. These results suggest guidelines for when these stochastic methods will be limited in their utility, and considerations for architectures in which they will be effective.
Description
Discussion of learning curves for stochastic gradient descent.
Besides gradient-based approaches, the paper briefly describes (with additional references) weight-perturbation and node-perturbation approaches.
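The abstract contrasts three online update rules for a linear perceptron. As a rough illustration only, here is a minimal NumPy sketch of how these rules are conventionally written for a network y = W x trained online against a random teacher; the layer sizes, learning rate eta, noise amplitude sigma, and step count are assumptions chosen for the demo, not values from the paper.

import numpy as np

rng = np.random.default_rng(0)
n_in, n_out = 20, 10          # input/output dimensions (assumed for the demo)
eta, sigma = 0.002, 1e-3      # learning rate and perturbation amplitude (assumed)
W_star = rng.standard_normal((n_out, n_in))  # random teacher defining the task

def loss(W, x, y_star):
    # Squared error E = 0.5 * ||W x - y*||^2 on a single example.
    err = W @ x - y_star
    return 0.5 * err @ err

def grad_step(W, x, y_star):
    # Direct gradient descent: dE/dW = (W x - y*) x^T.
    err = W @ x - y_star
    return W - eta * np.outer(err, x)

def node_pert_step(W, x, y_star):
    # Node perturbation: inject noise xi into the output units and correlate
    # the change in error with xi (noise dimensionality = n_out).
    E0 = loss(W, x, y_star)
    xi = sigma * rng.standard_normal(n_out)
    err = W @ x + xi - y_star
    E1 = 0.5 * err @ err
    return W - (eta / sigma**2) * (E1 - E0) * np.outer(xi, x)

def weight_pert_step(W, x, y_star):
    # Weight perturbation: perturb every weight independently and correlate
    # the change in error with the noise (noise dimensionality = n_in * n_out).
    E0 = loss(W, x, y_star)
    psi = sigma * rng.standard_normal((n_out, n_in))
    E1 = loss(W + psi, x, y_star)
    return W - (eta / sigma**2) * (E1 - E0) * psi

def train(step, n_steps=5000):
    # Online training: a fresh random input on every step.
    W = np.zeros((n_out, n_in))
    for _ in range(n_steps):
        x = rng.standard_normal(n_in)
        W = step(W, x, W_star @ x)
    # Report the error averaged over fresh inputs.
    xs = rng.standard_normal((1000, n_in))
    return np.mean([loss(W, x, W_star @ x) for x in xs])

for name, step in [("gradient", grad_step),
                   ("node pert.", node_pert_step),
                   ("weight pert.", weight_pert_step)]:
    print(f"{name:12s} final error: {train(step):.4g}")

With a learning rate this small, the three runs should reach comparably low error, consistent with the abstract's claim that the methods give identical learning curves in that regime; raising eta is expected to destabilize weight perturbation first (its injected noise is n_in * n_out-dimensional), then node perturbation (n_out-dimensional), in line with the stated inverse scaling of the maximum learning rate with noise dimensionality.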
%0 Conference Paper
%1 Werfel03learningcurves
%A Werfel, Justin
%A Xie, Xiaohui
%A Seung, H. Sebastian
%B Advances in Neural Information Processing Systems 16
%D 2003
%I MIT Press
%K 2008 descent gradient learning online similarity
%T Learning Curves for Stochastic Gradient Descent in Linear Feedforward Networks
%X Gradient-following learning methods can encounter problems of implementation in many applications, and stochastic variants are frequently used to overcome these difficulties. We derive quantitative learning curves for three online training methods used with a linear perceptron: direct gradient descent, node perturbation, and weight perturbation. The maximum learning rate for the stochastic methods scales inversely with the first power of the dimensionality of the noise injected into the system; with sufficiently small learning rate, all three methods give identical learning curves. These results suggest guidelines for when these stochastic methods will be limited in their utility, and considerations for architectures in which they will be effective.
@inproceedings{Werfel03learningcurves,
abstract = {Gradient-following learning methods can encounter problems of implementation in many applications, and stochastic variants are frequently used to overcome these difficulties. We derive quantitative learning curves for three online training methods used with a linear perceptron: direct gradient descent, node perturbation, and weight perturbation. The maximum learning rate for the stochastic methods scales inversely with the first power of the dimensionality of the noise injected into the system; with sufficiently small learning rate, all three methods give identical learning curves. These results suggest guidelines for when these stochastic methods will be limited in their utility, and considerations for architectures in which they will be effective.},
added-at = {2009-05-08T10:03:11.000+0200},
author = {Werfel, Justin and Xie, Xiaohui and Seung, H. Sebastian},
biburl = {https://www.bibsonomy.org/bibtex/279243799bb83784cad32bbb3c743a30f/mgrani},
booktitle = {Advances in Neural Information Processing Systems 16},
description = {Discussion of learning curves for stochastic gradient descent.
Besides gradient-based approaches, the paper briefly describes (with additional references) weight-perturbation and node-perturbation approaches.},
interhash = {260ce6f1c1b619ca882a2e3090578d2f},
intrahash = {79243799bb83784cad32bbb3c743a30f},
keywords = {2008 descent gradient learning online similarity},
note = {Discussion of learning curves for stochastic gradient descent.
Besides gradient-based approaches, the paper briefly describes (with additional references) weight-perturbation and node-perturbation approaches.},
publisher = {MIT Press},
timestamp = {2009-05-08T10:03:12.000+0200},
title = {Learning Curves for Stochastic Gradient Descent in Linear Feedforward Networks},
year = 2003
}