Abstract
Investigations into near-memory hardware accelerators for deep neural
networks have primarily focused on inference, while the potential of
accelerating training has received relatively little attention so far. Based on
an in-depth analysis of the key computational patterns in state-of-the-art
gradient-based training methods, we propose an efficient near-memory
acceleration engine called NTX that can be used to train state-of-the-art deep
convolutional neural networks at scale. Our main contributions are: (i)
identifying the requirements for efficient data address generation and
developing an accelerator offloading scheme that reduces overhead by 7x over
previously published results; and (ii) supporting a rich set of operations that
allows efficient calculation of the back-propagation phase. The low control
overhead allows up to 8 NTX engines to be controlled by a simple processor.
Evaluations in a near-memory computing scenario where the accelerator is placed
on the logic base die of a Hybrid Memory Cube demonstrate a 2.6x energy
efficiency improvement over contemporary GPUs at 4.4x less silicon area, and an
average compute performance of 1.01 Tflop/s for training large state-of-the-art
networks with full floating-point precision. The architecture is scalable and
paves the way towards efficient deep learning in a distributed near-memory
setting.