Hypersim: A Photorealistic Synthetic Dataset for Holistic Indoor Scene Understanding

M. Roberts, J. Ramapuram, A. Ranjan, A. Kumar, M. Bautista, N. Paczan, R. Webb, and J. Susskind. (2020). arXiv:2011.02523. Comment: Accepted for publication at the International Conference on Computer Vision (ICCV) 2021.

Abstract

For many fundamental scene understanding tasks, it is difficult or impossible to obtain per-pixel ground truth labels from real images. We address this challenge by introducing Hypersim, a photorealistic synthetic dataset for holistic indoor scene understanding. To create our dataset, we leverage a large repository of synthetic scenes created by professional artists, and we generate 77,400 images of 461 indoor scenes with detailed per-pixel labels and corresponding ground truth geometry. Our dataset: (1) relies exclusively on publicly available 3D assets; (2) includes complete scene geometry, material information, and lighting information for every scene; (3) includes dense per-pixel semantic instance segmentations and complete camera information for every image; and (4) factors every image into diffuse reflectance, diffuse illumination, and a non-diffuse residual term that captures view-dependent lighting effects. We analyze our dataset at the level of scenes, objects, and pixels, and we analyze costs in terms of money, computation time, and annotation effort. Remarkably, we find that it is possible to generate our entire dataset from scratch, for roughly half the cost of training a popular open-source natural language processing model. We also evaluate sim-to-real transfer performance on two real-world scene understanding tasks - semantic segmentation and 3D shape prediction - where we find that pre-training on our dataset significantly improves performance on both tasks, and achieves state-of-the-art performance on the most challenging Pix3D test set. All of our rendered image data, as well as all the code we used to generate our dataset and perform our experiments, is available online.
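Item (4) of the abstract describes a three-term factorization of every image. The following is a minimal sketch of how such terms could be recombined into a final pixel color, assuming the per-channel relation color = reflectance * illumination + residual. The abstract itself does not spell out this formula, and the function name `recompose_pixel` and the sample values are hypothetical, chosen only for illustration.

```python
def recompose_pixel(reflectance, illumination, residual):
    """Recombine one RGB pixel from its three factors.

    Assumption (not stated in the abstract): per channel,
    color = diffuse reflectance * diffuse illumination + non-diffuse residual.
    """
    return tuple(r * i + n for r, i, n in zip(reflectance, illumination, residual))


# Hypothetical values for a single pixel in linear RGB (not taken from the dataset).
reflectance = (0.5, 0.25, 0.125)   # diffuse reflectance
illumination = (2.0, 2.0, 4.0)     # diffuse illumination
residual = (0.0, 0.25, 0.5)        # non-diffuse, view-dependent term

pixel = recompose_pixel(reflectance, illumination, residual)
print(pixel)
```

Under this assumption, the residual term absorbs whatever the diffuse product cannot explain, such as specular highlights.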

Description

2011.02523.pdf

Links and resources

BibTeX key: roberts2020hypersim
entry type: misc
year: 2020
url: http://arxiv.org/abs/2011.02523
note: arXiv:2011.02523. Comment: Accepted for publication at the International Conference on Computer Vision (ICCV) 2021

Cite this publication

%0 Generic
%1 roberts2020hypersim
%A Roberts, Mike
%A Ramapuram, Jason
%A Ranjan, Anurag
%A Kumar, Atulit
%A Bautista, Miguel Angel
%A Paczan, Nathan
%A Webb, Russ
%A Susskind, Joshua M.
%D 2020
%K dataset
%T Hypersim: A Photorealistic Synthetic Dataset for Holistic Indoor Scene Understanding
%U http://arxiv.org/abs/2011.02523
%X For many fundamental scene understanding tasks, it is difficult or impossible to obtain per-pixel ground truth labels from real images. We address this challenge by introducing Hypersim, a photorealistic synthetic dataset for holistic indoor scene understanding. To create our dataset, we leverage a large repository of synthetic scenes created by professional artists, and we generate 77,400 images of 461 indoor scenes with detailed per-pixel labels and corresponding ground truth geometry. Our dataset: (1) relies exclusively on publicly available 3D assets; (2) includes complete scene geometry, material information, and lighting information for every scene; (3) includes dense per-pixel semantic instance segmentations and complete camera information for every image; and (4) factors every image into diffuse reflectance, diffuse illumination, and a non-diffuse residual term that captures view-dependent lighting effects. We analyze our dataset at the level of scenes, objects, and pixels, and we analyze costs in terms of money, computation time, and annotation effort. Remarkably, we find that it is possible to generate our entire dataset from scratch, for roughly half the cost of training a popular open-source natural language processing model. We also evaluate sim-to-real transfer performance on two real-world scene understanding tasks - semantic segmentation and 3D shape prediction - where we find that pre-training on our dataset significantly improves performance on both tasks, and achieves state-of-the-art performance on the most challenging Pix3D test set. All of our rendered image data, as well as all the code we used to generate our dataset and perform our experiments, is available online.

@misc{roberts2020hypersim,
  abstract = {For many fundamental scene understanding tasks, it is difficult or impossible to obtain per-pixel ground truth labels from real images. We address this challenge by introducing Hypersim, a photorealistic synthetic dataset for holistic indoor scene understanding. To create our dataset, we leverage a large repository of synthetic scenes created by professional artists, and we generate 77,400 images of 461 indoor scenes with detailed per-pixel labels and corresponding ground truth geometry. Our dataset: (1) relies exclusively on publicly available 3D assets; (2) includes complete scene geometry, material information, and lighting information for every scene; (3) includes dense per-pixel semantic instance segmentations and complete camera information for every image; and (4) factors every image into diffuse reflectance, diffuse illumination, and a non-diffuse residual term that captures view-dependent lighting effects. We analyze our dataset at the level of scenes, objects, and pixels, and we analyze costs in terms of money, computation time, and annotation effort. Remarkably, we find that it is possible to generate our entire dataset from scratch, for roughly half the cost of training a popular open-source natural language processing model. We also evaluate sim-to-real transfer performance on two real-world scene understanding tasks - semantic segmentation and 3D shape prediction - where we find that pre-training on our dataset significantly improves performance on both tasks, and achieves state-of-the-art performance on the most challenging Pix3D test set. All of our rendered image data, as well as all the code we used to generate our dataset and perform our experiments, is available online.},
  added-at = {2021-09-04T15:47:08.000+0200},
  author = {Roberts, Mike and Ramapuram, Jason and Ranjan, Anurag and Kumar, Atulit and Bautista, Miguel Angel and Paczan, Nathan and Webb, Russ and Susskind, Joshua M.},
  biburl = {https://www.bibsonomy.org/bibtex/246c89f212d83f04ca0c861947968e534/shuncheng.wu},
  description = {2011.02523.pdf},
  interhash = {c9229f49c08e6675c27082f882fa2981},
  intrahash = {46c89f212d83f04ca0c861947968e534},
  keywords = {dataset},
  note = {cite arxiv:2011.02523Comment: Accepted for publication at the International Conference on Computer Vision (ICCV) 2021},
  timestamp = {2021-09-04T15:47:08.000+0200},
  title = {Hypersim: A Photorealistic Synthetic Dataset for Holistic Indoor Scene Understanding},
  url = {http://arxiv.org/abs/2011.02523},
  year = 2020
}
