Recent work in unsupervised feature learning and deep learning has shown that being able to train large models can dramatically improve performance. In this paper, we consider the problem of training a deep network with billions of parameters using tens of thousands of CPU cores. We have developed a software framework called DistBelief that can utilize computing clusters with thousands of machines to train large models. Within this framework, we have developed two algorithms for large-scale distributed training: (i) Downpour SGD, an asynchronous stochastic gradient descent procedure supporting a large number of model replicas, and (ii) Sandblaster, a framework that supports a variety of distributed batch optimization procedures, including a distributed implementation of L-BFGS. Downpour SGD and Sandblaster L-BFGS both increase the scale and speed of deep network training. We have successfully used our system to train a deep network 30x larger than previously reported in the literature, and achieves state-of-the-art performance on ImageNet, a visual object recognition task with 16 million images and 21k categories. We show that these same techniques dramatically accelerate the training of a more modestly sized deep network for a commercial speech recognition service. Although we focus on and report performance of these methods as applied to training large neural networks, the underlying algorithms are applicable to any gradient-based machine learning algorithm.
%0 Conference Paper
%1 Dean:2012:LSD:2999134.2999271
%A Dean, Jeffrey
%A Corrado, Greg S.
%A Monga, Rajat
%A Chen, Kai
%A Devin, Matthieu
%A Le, Quoc V.
%A Mao, Mark Z.
%A Ranzato, Marc'Aurelio
%A Senior, Andrew
%A Tucker, Paul
%A Yang, Ke
%A Ng, Andrew Y.
%B Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 1
%C USA
%D 2012
%I Curran Associates Inc.
%K deep-learning distributed-computing dl4j scalable-neural-network
%P 1223--1231
%T Large Scale Distributed Deep Networks
%U http://dl.acm.org/citation.cfm?id=2999134.2999271
%X Recent work in unsupervised feature learning and deep learning has shown that being able to train large models can dramatically improve performance. In this paper, we consider the problem of training a deep network with billions of parameters using tens of thousands of CPU cores. We have developed a software framework called DistBelief that can utilize computing clusters with thousands of machines to train large models. Within this framework, we have developed two algorithms for large-scale distributed training: (i) Downpour SGD, an asynchronous stochastic gradient descent procedure supporting a large number of model replicas, and (ii) Sandblaster, a framework that supports a variety of distributed batch optimization procedures, including a distributed implementation of L-BFGS. Downpour SGD and Sandblaster L-BFGS both increase the scale and speed of deep network training. We have successfully used our system to train a deep network 30x larger than previously reported in the literature, and achieves state-of-the-art performance on ImageNet, a visual object recognition task with 16 million images and 21k categories. We show that these same techniques dramatically accelerate the training of a more modestly sized deep network for a commercial speech recognition service. Although we focus on and report performance of these methods as applied to training large neural networks, the underlying algorithms are applicable to any gradient-based machine learning algorithm.
@inproceedings{Dean:2012:LSD:2999134.2999271,
  abstract    = {Recent work in unsupervised feature learning and deep learning has shown that being able to train large models can dramatically improve performance. In this paper, we consider the problem of training a deep network with billions of parameters using tens of thousands of CPU cores. We have developed a software framework called DistBelief that can utilize computing clusters with thousands of machines to train large models. Within this framework, we have developed two algorithms for large-scale distributed training: (i) Downpour SGD, an asynchronous stochastic gradient descent procedure supporting a large number of model replicas, and (ii) Sandblaster, a framework that supports a variety of distributed batch optimization procedures, including a distributed implementation of L-BFGS. Downpour SGD and Sandblaster L-BFGS both increase the scale and speed of deep network training. We have successfully used our system to train a deep network 30x larger than previously reported in the literature, and achieves state-of-the-art performance on ImageNet, a visual object recognition task with 16 million images and 21k categories. We show that these same techniques dramatically accelerate the training of a more modestly sized deep network for a commercial speech recognition service. Although we focus on and report performance of these methods as applied to training large neural networks, the underlying algorithms are applicable to any gradient-based machine learning algorithm.},
  acmid       = {2999271},
  added-at    = {2018-02-10T17:45:26.000+0100},
  address     = {USA},
  author      = {Dean, Jeffrey and Corrado, Greg S. and Monga, Rajat and Chen, Kai and Devin, Matthieu and Le, Quoc V. and Mao, Mark Z. and Ranzato, Marc'Aurelio and Senior, Andrew and Tucker, Paul and Yang, Ke and Ng, Andrew Y.},
  biburl      = {https://www.bibsonomy.org/bibtex/22e2f1089c083ce7e238e1423a4be139f/ven7u},
  booktitle   = {Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 1},
  description = {Large scale distributed deep networks},
  interhash   = {f32bac5178c09cbc0021852a4766c4cf},
  intrahash   = {2e2f1089c083ce7e238e1423a4be139f},
  keywords    = {deep-learning distributed-computing dl4j scalable-neural-network},
  location    = {Lake Tahoe, Nevada},
  numpages    = {9},
  pages       = {1223--1231},
  publisher   = {Curran Associates Inc.},
  series      = {NIPS'12},
  timestamp   = {2018-02-10T17:45:26.000+0100},
  title       = {Large Scale Distributed Deep Networks},
  url         = {http://dl.acm.org/citation.cfm?id=2999134.2999271},
  year        = {2012}
}