AdaComp : Adaptive Residual Gradient Compression for Data-Parallel Distributed Training

C. Chen, J. Choi, D. Brand, A. Agrawal, W. Zhang, and K. Gopalakrishnan.
(2017). arXiv:1712.02679. Comment: IBM Research AI, 9 pages, 7 figures, AAAI18 accepted.

Abstract

Highly distributed training of Deep Neural Networks (DNNs) on future compute platforms (offering 100s of TeraOps/s of computational capacity) is expected to be severely communication constrained. To overcome this limitation, new gradient compression techniques are needed that are computationally friendly, applicable to a wide variety of layers seen in Deep Neural Networks and adaptable to variations in network architectures as well as their hyper-parameters. In this paper we introduce a novel technique: the Adaptive Residual Gradient Compression (AdaComp) scheme. AdaComp is based on localized selection of gradient residues and automatically tunes the compression rate depending on local activity. We show excellent results on a wide spectrum of state-of-the-art Deep Learning models in multiple domains (vision, speech, language), datasets (MNIST, CIFAR10, ImageNet, BN50, Shakespeare), optimizers (SGD with momentum, Adam) and network parameters (number of learners, minibatch size, etc.). Exploiting both sparsity and quantization, we demonstrate end-to-end compression rates of ~200X for fully-connected and recurrent layers, and ~40X for convolutional layers, without any noticeable degradation in model accuracies.
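
The abstract describes AdaComp only at a high level: gradient residues are selected locally, and the compression rate follows local activity. The Python sketch below illustrates that general idea and is not the authors' implementation. Several details are assumptions that do not appear on this page: the function name `compress_step`, the fixed-size bins, and the selection rule (an element is sent when its accumulated residue plus the latest gradient reaches the largest accumulated magnitude in its bin). The quantization step mentioned in the abstract is left out.

```python
from typing import Dict, List, Tuple


def compress_step(
    residue: List[float], grad: List[float], bin_size: int
) -> Tuple[Dict[int, float], List[float]]:
    """One step of a residual, bin-local gradient sparsifier (illustrative only).

    Returns the sparse update to communicate (index -> value) and the
    residue carried over to the next step.
    """
    if len(residue) != len(grad):
        raise ValueError("residue and grad must have the same length")
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")

    # Add the new gradient to whatever was left unsent in earlier steps.
    acc = [r + g for r, g in zip(residue, grad)]

    sent: Dict[int, float] = {}
    new_residue = list(acc)

    for start in range(0, len(acc), bin_size):
        end = min(start + bin_size, len(acc))
        # Local threshold: largest accumulated magnitude inside this bin.
        local_max = max(abs(v) for v in acc[start:end])
        if local_max == 0.0:
            continue  # nothing to send from an all-zero bin
        for i in range(start, end):
            # Assumed rule: count the latest gradient once more, so elements
            # that are still growing can pass the bin-local threshold.
            if abs(acc[i] + grad[i]) >= local_max:
                sent[i] = acc[i]
                new_residue[i] = 0.0  # sent values leave the residue

    return sent, new_residue


if __name__ == "__main__":
    update, carry = compress_step([0.0] * 4, [0.5, -0.1, 0.2, 0.9], bin_size=2)
    print(update, carry)
```

Because the threshold is computed per bin rather than fixed globally, the number of elements sent from each bin varies with the values in that bin. This is the sense in which such a scheme adapts its compression rate. Unsent values remain in the residue and can be sent in a later step.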

BibTeX key: chen2017adacomp
entry type: misc
year: 2017
url: http://arxiv.org/abs/1712.02679
note: arXiv:1712.02679. Comment: IBM Research AI, 9 pages, 7 figures, AAAI18 accepted

Cite this publication

@misc{chen2017adacomp,
  abstract = {Highly distributed training of Deep Neural Networks (DNNs) on future compute platforms (offering 100s of TeraOps/s of computational capacity) is expected to be severely communication constrained. To overcome this limitation, new gradient compression techniques are needed that are computationally friendly, applicable to a wide variety of layers seen in Deep Neural Networks and adaptable to variations in network architectures as well as their hyper-parameters. In this paper we introduce a novel technique: the Adaptive Residual Gradient Compression (AdaComp) scheme. AdaComp is based on localized selection of gradient residues and automatically tunes the compression rate depending on local activity. We show excellent results on a wide spectrum of state-of-the-art Deep Learning models in multiple domains (vision, speech, language), datasets (MNIST, CIFAR10, ImageNet, BN50, Shakespeare), optimizers (SGD with momentum, Adam) and network parameters (number of learners, minibatch size, etc.). Exploiting both sparsity and quantization, we demonstrate end-to-end compression rates of ~200X for fully-connected and recurrent layers, and ~40X for convolutional layers, without any noticeable degradation in model accuracies.},
  added-at = {2019-06-04T15:53:22.000+0200},
  author = {Chen, Chia-Yu and Choi, Jungwook and Brand, Daniel and Agrawal, Ankur and Zhang, Wei and Gopalakrishnan, Kailash},
  biburl = {https://www.bibsonomy.org/bibtex/2d8f8723c11cc44fe0e660d3d653060fe/alrigazzi},
  description = {AdaComp : Adaptive Residual Gradient Compression for Data-Parallel Distributed Training},
  interhash = {5f15d1f3b001cf64a32ec0fdb0adca3b},
  intrahash = {d8f8723c11cc44fe0e660d3d653060fe},
  keywords = {deep dl large-scale networks neural training},
  note = {arXiv:1712.02679. Comment: IBM Research AI, 9 pages, 7 figures, AAAI18 accepted},
  timestamp = {2019-06-04T15:53:22.000+0200},
  title = {AdaComp : Adaptive Residual Gradient Compression for Data-Parallel Distributed Training},
  url = {http://arxiv.org/abs/1712.02679},
  year = 2017
}
