Abstract
Detecting objects and estimating their pose remains one of the major
challenges for the computer vision research community. There is a trade-off
between localizing the objects and estimating their viewpoints. The detector
ideally needs to be view-invariant, while the pose estimation process should be
able to generalize at the category level. This work explores the use of deep
learning models to solve both problems simultaneously. To this end, we propose
three novel deep learning architectures that perform joint detection and pose
estimation and in which the two tasks are gradually decoupled. We also
investigate whether pose estimation should be solved as a classification or a
regression problem, which is still an open question in the computer vision
community. We present a comparative analysis of
all our solutions and the methods that currently define the state of the art
for this problem. We present a thorough experimental evaluation on the
PASCAL3D+ and ObjectNet3D datasets. With the proposed models, we achieve
state-of-the-art performance on both datasets.