The DenseNet architecture is highly computationally efficient as a result of
feature reuse. However, a naive DenseNet implementation can require a
significant amount of GPU memory: If not properly managed, pre-activation batch
normalization and contiguous convolution operations can produce feature maps
that grow quadratically with network depth. In this technical report, we
introduce strategies to reduce the memory consumption of DenseNets during
training. By strategically using shared memory allocations, we reduce the
memory cost for storing feature maps from quadratic to linear. Without the GPU
memory bottleneck, it is now possible to train extremely deep DenseNets.
Networks with 14M parameters can be trained on a single GPU, up from 4M. A
264-layer DenseNet (73M parameters), which previously would have been
infeasible to train, can now be trained on a single workstation with 8 NVIDIA
Tesla M40 GPUs. On the ImageNet ILSVRC classification dataset, this large
DenseNet obtains a state-of-the-art single-crop top-1 error of 20.26%.
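The memory-saving idea described in the abstract is to avoid storing the concatenated inputs and pre-activation batch-normalization outputs of every layer, recomputing them on the fly during backpropagation instead. Below is a minimal PyTorch sketch of that recompute-on-backward trade-off using gradient checkpointing; it is an illustration only, not the authors' original shared-memory-allocation implementation, and the names MemoryEfficientDenseLayer and bottleneck_fn are invented for this example.

import torch
import torch.nn as nn
import torch.utils.checkpoint as cp


class MemoryEfficientDenseLayer(nn.Module):
    # One DenseNet layer: BN-ReLU-Conv(1x1) bottleneck, then BN-ReLU-Conv(3x3).
    # The concatenation and the first BN/ReLU/1x1-conv are wrapped in
    # torch.utils.checkpoint, so their outputs are recomputed in the backward
    # pass instead of being stored for every layer.
    def __init__(self, num_input_features, growth_rate, bn_size=4):
        super().__init__()
        self.norm1 = nn.BatchNorm2d(num_input_features)
        self.relu1 = nn.ReLU(inplace=True)
        self.conv1 = nn.Conv2d(num_input_features, bn_size * growth_rate,
                               kernel_size=1, bias=False)
        self.norm2 = nn.BatchNorm2d(bn_size * growth_rate)
        self.relu2 = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(bn_size * growth_rate, growth_rate,
                               kernel_size=3, padding=1, bias=False)

    def bottleneck_fn(self, *prev_features):
        # The memory hot spot: the concatenated inputs and pre-activation BN
        # outputs grow with the number of preceding layers.
        concat = torch.cat(prev_features, dim=1)
        return self.conv1(self.relu1(self.norm1(concat)))

    def forward(self, prev_features):
        if self.training and any(f.requires_grad for f in prev_features):
            # Do not keep the bottleneck intermediates; recompute them
            # during backpropagation (memory saved at some extra compute).
            out = cp.checkpoint(self.bottleneck_fn, *prev_features,
                                use_reentrant=False)
        else:
            out = self.bottleneck_fn(*prev_features)
        return self.conv2(self.relu2(self.norm2(out)))


# Usage: each layer receives the list of all earlier feature maps.
features = [torch.randn(2, 64, 32, 32, requires_grad=True)]
layer = MemoryEfficientDenseLayer(num_input_features=64, growth_rate=32)
layer.train()
features.append(layer(features))  # new feature map of shape (2, 32, 32, 32)

One caveat of this sketch: because the checkpointed function is re-run during the backward pass, BatchNorm running statistics are updated twice per step, and the extra recomputation is the price paid for not storing the quadratically growing intermediates.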
@misc{pleiss2017memoryefficient,
author = {Pleiss, Geoff and Chen, Danlu and Huang, Gao and Li, Tongcheng and van der Maaten, Laurens and Weinberger, Kilian Q.},
keywords = {densenet},
note = {arXiv:1707.06990. Technical report},
title = {Memory-Efficient Implementation of DenseNets},
url = {http://arxiv.org/abs/1707.06990},
year = 2017
}