An overview of gradient descent optimization algorithms
S. Ruder (2016). arXiv:1609.04747. Comment: Added derivations of AdaMax and Nadam.
Abstract
Gradient descent optimization algorithms, while increasingly popular, are
often used as black-box optimizers, as practical explanations of their
strengths and weaknesses are hard to come by. This article aims to provide the
reader with intuitions with regard to the behaviour of different algorithms
that will allow her to put them to use. In the course of this overview, we look
at different variants of gradient descent, summarize challenges, introduce the
most common optimization algorithms, review architectures in a parallel and
distributed setting, and investigate additional strategies for optimizing
gradient descent.
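
As a brief, hedged illustration of the update rule these variants share (a sketch accompanying this summary, not code from the paper itself), the following Python snippet runs batch gradient descent, theta = theta - eta * grad_J(theta), on a toy least-squares problem; the data, loss, and learning rate are illustrative assumptions:

import numpy as np

def grad_J(theta, X, y):
    # Gradient of the mean squared error J(theta) = mean((X @ theta - y)**2).
    return 2.0 / len(y) * X.T @ (X @ theta - y)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))           # toy design matrix (assumed data)
true_theta = np.array([1.0, -2.0, 0.5])
y = X @ true_theta                      # noiseless targets, for simplicity

theta = np.zeros(3)
eta = 0.1                               # learning rate (illustrative choice)
for step in range(200):
    # Batch gradient descent uses the full dataset for every update; SGD
    # would use a single example, mini-batch gradient descent a small subset.
    theta -= eta * grad_J(theta, X, y)

print(theta)  # approaches true_theta as the loss is minimized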
Description
[1609.04747] An overview of gradient descent optimization algorithms
@misc{ruder2016overview,
  author = {Ruder, Sebastian},
  title = {An overview of gradient descent optimization algorithms},
  year = {2016},
  url = {http://arxiv.org/abs/1609.04747},
  keywords = {deep_learning gradient_descent optimization overview},
  note = {arXiv:1609.04747. Comment: Added derivations of AdaMax and Nadam},
}