Abstract
We present an end-to-end deep learning architecture for depth map inference
from multi-view images. In the network, we first extract deep visual image
features, and then build the 3D cost volume upon the reference camera frustum
via the differentiable homography warping. Next, we apply 3D convolutions to
regularize and regress the initial depth map, which is then refined with the
reference image to generate the final output. Our framework flexibly adapts to
arbitrary N-view inputs using a variance-based cost metric that maps multiple
features into one cost feature. The proposed MVSNet is demonstrated on the
large-scale indoor DTU dataset. With simple post-processing, our method not
only significantly outperforms previous state-of-the-art methods, but is also
several times faster in runtime. We also evaluate MVSNet on the complex outdoor
Tanks and Temples dataset, where our method ranked first before April 18, 2018
without any fine-tuning, showing the strong generalization ability of MVSNet.
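
As an illustration of the variance-based cost metric mentioned above, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that each of the N views has already been warped into the reference frustum as a feature volume, and that the volumes are stacked in an array of shape (N, D, H, W, C); the function name `variance_cost` and this layout are hypothetical choices made for the example. Taking the element-wise variance across the view axis collapses any number of views into a single cost volume.

```python
import numpy as np


def variance_cost(feature_volumes):
    """Collapse N per-view feature volumes into one cost volume.

    feature_volumes: array-like of shape (N, D, H, W, C), one warped
    feature volume per view (this layout is an assumption of the sketch).
    Returns an array of shape (D, H, W, C) holding the element-wise
    population variance across the N views.
    """
    volumes = np.asarray(feature_volumes, dtype=np.float64)
    if volumes.ndim < 2 or volumes.shape[0] < 2:
        raise ValueError("need at least two views stacked along axis 0")
    # Mean feature volume over all views.
    mean_volume = volumes.mean(axis=0)
    # Average squared deviation from the mean, taken over the view axis.
    return ((volumes - mean_volume) ** 2).mean(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Three views, 4 depth hypotheses, 6x8 feature map, 16 channels.
    views = rng.normal(size=(3, 4, 6, 8, 16))
    cost = variance_cost(views)
    print(cost.shape)
```

The output shape does not depend on N, which is what allows the same downstream layers to be used for any number of input views. Where the views agree on a feature, the cost is low; where they disagree, it is high.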