Inproceedings

Digging Into Self-Supervised Monocular Depth Estimation

C. Godard, O. Mac Aodha, M. Firman, and G. Brostow.
2019 IEEE/CVF International Conference on Computer Vision (ICCV), pages 3827-3837. (October 2019)
DOI: 10.1109/ICCV.2019.00393

Abstract

Per-pixel ground-truth depth data is challenging to acquire at scale. To overcome this limitation, self-supervised learning has emerged as a promising alternative for training models to perform monocular depth estimation. In this paper, we propose a set of improvements, which together result in both quantitatively and qualitatively improved depth maps compared to competing self-supervised methods. Research on self-supervised monocular training usually explores increasingly complex architectures, loss functions, and image formation models, all of which have recently helped to close the gap with fully-supervised methods. We show that a surprisingly simple model, and associated design choices, lead to superior predictions. In particular, we propose (i) a minimum reprojection loss, designed to robustly handle occlusions, (ii) a full-resolution multi-scale sampling method that reduces visual artifacts, and (iii) an auto-masking loss to ignore training pixels that violate camera motion assumptions. We demonstrate the effectiveness of each component in isolation, and show high quality, state-of-the-art results on the KITTI benchmark.
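The abstract names the minimum reprojection loss and auto-masking without spelling them out. As we read the paper, each source frame is warped into the target view and a per-pixel photometric error is computed. The loss keeps only the minimum error over source frames rather than averaging them, and auto-masking drops pixels whose minimum reprojection error is not lower than the minimum error of the unwarped source frames. The sketch below assumes per-pixel errors are already available as flat lists. The function names `photometric_l1` and `min_reprojection_loss` are hypothetical. The paper's photometric error combines SSIM and L1, and plain L1 here is a simplification.

```python
from typing import List, Tuple


def photometric_l1(target: List[float], image: List[float]) -> List[float]:
    """Hypothetical helper: per-pixel absolute difference.

    Simplified stand-in for the paper's SSIM + L1 photometric error.
    """
    return [abs(t - s) for t, s in zip(target, image)]


def min_reprojection_loss(
    reproj_errors: List[List[float]],
    identity_errors: List[List[float]],
) -> Tuple[float, List[bool]]:
    """Hypothetical helper: minimum reprojection loss with auto-masking.

    reproj_errors[k][p]: error between the target frame and source frame k
        warped into the target view, at pixel p.
    identity_errors[k][p]: error between the target frame and the unwarped
        source frame k, at pixel p.
    Returns the loss averaged over all pixels (masked pixels contribute zero)
    and the per-pixel auto-mask.
    """
    num_pixels = len(reproj_errors[0])
    total = 0.0
    mask: List[bool] = []
    for p in range(num_pixels):
        # Minimum over source frames: a pixel occluded in one source frame
        # can still be matched correctly in another.
        best_reproj = min(err[p] for err in reproj_errors)
        best_identity = min(err[p] for err in identity_errors)
        # Auto-mask: keep the pixel only if warping beats not warping at all,
        # which filters out static frames and objects moving with the camera.
        keep = best_reproj < best_identity
        mask.append(keep)
        if keep:
            total += best_reproj
    return total / num_pixels, mask


if __name__ == "__main__":
    # Two source frames, three pixels (made-up error values).
    reproj = [[0.1, 0.5, 0.3], [0.2, 0.4, 0.9]]
    identity = [[0.3, 0.2, 0.6], [0.5, 0.3, 0.7]]
    loss, mask = min_reprojection_loss(reproj, identity)
    print(loss, mask)
```

In this toy input the middle pixel is masked out, because the unwarped source frame already matches the target better than any warped one. Taking the per-pixel minimum instead of the mean means an occluded pixel is penalized only if it is poorly explained by every source frame.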

BibTeX key: 2019-godard
entry type: inproceedings
booktitle: 2019 IEEE/CVF International Conference on Computer Vision (ICCV)
year: 2019
month: oct
pages: 3827-3837
issn: 2380-7504
DOI: 10.1109/ICCV.2019.00393
url: https://ieeexplore.ieee.org/document/9009796/
