X. Wang, R. Girshick, A. Gupta, and K. He. (2017). Non-local Neural Networks. arXiv:1711.07971. CVPR 2018; code is available at: https://github.com/facebookresearch/video-nonlocal-net.
Abstract
Both convolutional and recurrent operations are building blocks that process
one local neighborhood at a time. In this paper, we present non-local
operations as a generic family of building blocks for capturing long-range
dependencies. Inspired by the classical non-local means method in computer
vision, our non-local operation computes the response at a position as a
weighted sum of the features at all positions. This building block can be
plugged into many computer vision architectures. On the task of video
classification, even without any bells and whistles, our non-local models can
compete with or outperform current competition winners on both the Kinetics and
Charades datasets. In static image recognition, our non-local models improve
object detection/segmentation and pose estimation on the COCO suite of tasks.
Code is available at https://github.com/facebookresearch/video-nonlocal-net.
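The core operation the abstract describes — the response at each position computed as a weighted sum of the features at all positions — can be sketched in NumPy. This minimal version uses dot-product similarity with softmax normalization (one instantiation discussed in the paper); the learned embedding/projection layers of the full non-local block are omitted here for brevity, so this is an illustrative simplification, not the paper's implementation:

```python
import numpy as np

def non_local_response(x):
    """Dense non-local operation on a feature map flattened to
    (num_positions, channels): each output position attends to ALL
    input positions, not just a local neighborhood."""
    # pairwise similarity f(x_i, x_j) between every pair of positions
    sim = x @ x.T                                  # (N, N)
    # softmax over j so the weights for each position i sum to 1
    sim = sim - sim.max(axis=1, keepdims=True)     # numerical stability
    w = np.exp(sim)
    w = w / w.sum(axis=1, keepdims=True)
    # response at i = weighted sum of features at all positions j
    return w @ x                                   # (N, channels)

x = np.random.default_rng(0).normal(size=(6, 4))  # 6 positions, 4 channels
y = non_local_response(x)
print(y.shape)  # prints (6, 4)
```

Because the softmax weights form a convex combination over positions, each output feature stays within the range of the corresponding input channel — long-range context is mixed in without changing the feature dimensionality, which is what lets the block be dropped into existing architectures.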
%0 Generic
%1 wang2017nonlocal
%A Wang, Xiaolong
%A Girshick, Ross
%A Gupta, Abhinav
%A He, Kaiming
%D 2017
%K cs.CV
%T Non-local Neural Networks
%U http://arxiv.org/abs/1711.07971
@misc{wang2017nonlocal,
added-at = {2022-02-15T05:01:23.000+0100},
author = {Wang, Xiaolong and Girshick, Ross and Gupta, Abhinav and He, Kaiming},
biburl = {https://www.bibsonomy.org/bibtex/29b71715e36efc0ab4e554a7f279f5fd5/aerover},
description = {Non-local Neural Networks},
interhash = {8692d080bb46499bcd15a1a1de8cedd3},
intrahash = {9b71715e36efc0ab4e554a7f279f5fd5},
keywords = {cs.CV},
note = {arXiv:1711.07971. CVPR 2018; code is available at: https://github.com/facebookresearch/video-nonlocal-net},
timestamp = {2022-02-15T05:01:23.000+0100},
title = {Non-local Neural Networks},
url = {http://arxiv.org/abs/1711.07971},
year = 2017
}