ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language
D. Chen, A. Chang, and M. Nießner (2019). arXiv:1912.08830. Video: https://youtu.be/T9J5t-UEcNA.
Abstract
We introduce the new task of 3D object localization in RGB-D scans using
natural language descriptions. As input, we assume a point cloud of a scanned
3D scene along with a free-form description of a specified target object. To
address this task, we propose ScanRefer, where the core idea is to learn a
fused descriptor from 3D object proposals and encoded sentence embeddings. This
learned descriptor then correlates the language expressions with the underlying
geometric features of the 3D scan and facilitates the regression of the 3D
bounding box of the target object. In order to train and benchmark our method,
we introduce a new ScanRefer dataset, containing 46,173 descriptions of 9,943
objects from 703 ScanNet scenes. ScanRefer is the first large-scale effort to
perform object localization via natural language expression directly in 3D.
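
The core idea above (learning a fused descriptor from 3D object proposals and an encoded sentence embedding, then using it to localize the target box) can be sketched in a few lines. The following is a minimal, hypothetical PyTorch illustration, not the authors' implementation: the feature dimensions, the concatenation-plus-MLP fusion, and the module names are all assumptions; the paper builds on a 3D proposal backbone and a learned language encoder whose details differ.

import torch
import torch.nn as nn

class FusedDescriptorScorer(nn.Module):
    # Illustrative sketch (assumed dims/architecture), not the paper's exact model.
    def __init__(self, proposal_dim=128, lang_dim=256, fused_dim=128):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(proposal_dim + lang_dim, fused_dim), nn.ReLU(),
            nn.Linear(fused_dim, fused_dim), nn.ReLU(),
        )
        self.score = nn.Linear(fused_dim, 1)  # one confidence value per proposal

    def forward(self, proposal_feats, lang_feat):
        # proposal_feats: (B, K, proposal_dim) features of K 3D object proposals
        # lang_feat:      (B, lang_dim) sentence embedding of the description
        K = proposal_feats.size(1)
        lang = lang_feat.unsqueeze(1).expand(-1, K, -1)           # broadcast to each proposal
        fused = self.fuse(torch.cat([proposal_feats, lang], -1))  # fused descriptor per proposal
        return self.score(fused).squeeze(-1)                      # (B, K) localization scores

At inference time, the 3D bounding box attached to the highest-scoring proposal would be returned as the localized target.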
@misc{chen2019scanrefer,
  abstract    = {We introduce the new task of 3D object localization in RGB-D scans using
                 natural language descriptions. As input, we assume a point cloud of a scanned
                 3D scene along with a free-form description of a specified target object. To
                 address this task, we propose ScanRefer, where the core idea is to learn a
                 fused descriptor from 3D object proposals and encoded sentence embeddings. This
                 learned descriptor then correlates the language expressions with the underlying
                 geometric features of the 3D scan and facilitates the regression of the 3D
                 bounding box of the target object. In order to train and benchmark our method,
                 we introduce a new ScanRefer dataset, containing 46,173 descriptions of 9,943
                 objects from 703 ScanNet scenes. ScanRefer is the first large-scale effort to
                 perform object localization via natural language expression directly in 3D.},
  added-at    = {2020-01-20T09:59:50.000+0100},
  author      = {Chen, Dave Zhenyu and Chang, Angel X. and Nießner, Matthias},
  biburl      = {https://www.bibsonomy.org/bibtex/265acab3da100d9641cf102c19c7d261d/nosebrain},
  description = {[1912.08830] ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language},
  interhash   = {09eb153b52c64e65e48de1f058fe5dc2},
  intrahash   = {65acab3da100d9641cf102c19c7d261d},
  keywords    = {3d localisation nlp object paper-dzhi scanrefer},
  note        = {arXiv:1912.08830. Video: https://youtu.be/T9J5t-UEcNA},
  timestamp   = {2020-01-20T10:00:01.000+0100},
  title       = {ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language},
  url         = {http://arxiv.org/abs/1912.08830},
  year        = 2019
}