Abstract
We present a novel online depth map fusion approach that learns depth map
aggregation in a latent feature space. While previous fusion methods use an
explicit scene representation like signed distance functions (SDFs), we propose
a learned feature representation for the fusion. The key idea is to separate
the scene representation used for fusion from the output scene representation,
linking the two via an additional translator network. Our neural network
architecture consists of two main parts: a depth and feature fusion
sub-network, which is followed by a translator sub-network to produce the final
surface representation (e.g., TSDF) for visualization or other tasks. Our
approach is an online process, handles high noise levels, and is particularly
well suited to handling the gross outliers common in photometric stereo-based
depth maps. Experiments on real and synthetic data demonstrate improved results
compared to the state of the art, especially in challenging scenarios with
large amounts of noise and outliers.
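
To make the two-stage design concrete, below is a minimal PyTorch sketch of the idea: a fusion network that incrementally updates a latent per-voxel feature grid from each incoming depth observation, followed by a translator network that decodes the latent grid into a TSDF volume. This is an illustrative assumption of the structure, not the authors' implementation; all module names (FusionNet, TranslatorNet), feature dimensions, and the grid-based observation encoding are hypothetical.

```python
# Sketch of latent-space depth fusion + translation (illustrative only).
import torch
import torch.nn as nn

class FusionNet(nn.Module):
    """Fuses a new depth observation into a latent per-voxel feature grid."""
    def __init__(self, feat_dim: int = 8):
        super().__init__()
        # Input: current latent features + 1 channel of depth-derived signal.
        self.update = nn.Sequential(
            nn.Conv3d(feat_dim + 1, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(32, feat_dim, kernel_size=3, padding=1),
        )

    def forward(self, feat_grid: torch.Tensor, obs_grid: torch.Tensor) -> torch.Tensor:
        # feat_grid: (B, F, D, H, W) latent scene state.
        # obs_grid:  (B, 1, D, H, W) depth map resampled into the grid
        #            (e.g., a local SDF-like signal along each ray).
        x = torch.cat([feat_grid, obs_grid], dim=1)
        return feat_grid + self.update(x)  # residual update keeps fusion online

class TranslatorNet(nn.Module):
    """Decodes the latent feature grid into an output TSDF volume."""
    def __init__(self, feat_dim: int = 8):
        super().__init__()
        self.decode = nn.Sequential(
            nn.Conv3d(feat_dim, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(32, 1, kernel_size=1),
            nn.Tanh(),  # bound output to [-1, 1], i.e., truncated SDF values
        )

    def forward(self, feat_grid: torch.Tensor) -> torch.Tensor:
        return self.decode(feat_grid)

# Online loop: integrate each incoming depth map as it arrives; the latent
# grid is only translated to a TSDF when a surface output is needed.
fusion, translator = FusionNet(), TranslatorNet()
feat_grid = torch.zeros(1, 8, 64, 64, 64)          # empty latent scene
for obs_grid in [torch.randn(1, 1, 64, 64, 64)]:   # stand-in depth observations
    feat_grid = fusion(feat_grid, obs_grid)
tsdf = translator(feat_grid)                        # (1, 1, 64, 64, 64)
```

Note the design point this sketch highlights: because fusion operates on latent features rather than directly on TSDF values, the network can learn to suppress outliers before the translator commits to an explicit surface representation.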