Abstract
Most existing studies on learning local features focus on the patch-based
descriptions of individual keypoints, while neglecting the spatial relations
established by their keypoint locations. In this paper, we go beyond the
local detail representation by introducing context awareness to augment
off-the-shelf local feature descriptors. Specifically, we propose a unified
learning framework that leverages and aggregates the cross-modality contextual
information, including (i) visual context from high-level image representation,
and (ii) geometric context from 2D keypoint distribution. Moreover, we propose
an effective N-pair loss that eschews the empirical hyper-parameter search and
improves convergence. The proposed augmentation scheme is lightweight
compared with the raw local feature description, yet yields remarkable
improvements on several large-scale benchmarks with diverse scenes,
demonstrating both strong practicality and generalization ability in
geometric matching applications.
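The abstract mentions an N-pair loss that avoids the empirical margin search required by triplet-style objectives. Below is a minimal, hedged sketch of a generic batch-wise N-pair loss for descriptor learning; the function name `n_pair_loss` and the batch setup are illustrative assumptions, not the authors' exact formulation.

```python
# Minimal sketch of a batch-wise N-pair loss (assumed formulation, not the
# paper's exact loss). Every non-matching positive in the batch serves as a
# negative, so no explicit margin hyper-parameter is needed.
import torch
import torch.nn.functional as F

def n_pair_loss(anchors, positives):
    """anchors, positives: (N, D) L2-normalized descriptors,
    where row i of `positives` corresponds to row i of `anchors`."""
    # (N, N) similarity matrix; diagonal entries are the matching pairs.
    sim = anchors @ positives.t()
    targets = torch.arange(anchors.size(0), device=anchors.device)
    # Softmax cross-entropy over each row pushes the matching pair's
    # similarity above all non-matching ones without an explicit margin.
    return F.cross_entropy(sim, targets)

# Usage: descriptors extracted from two views of the same keypoints.
a = F.normalize(torch.randn(128, 128), dim=1)
p = F.normalize(torch.randn(128, 128), dim=1)
loss = n_pair_loss(a, p)
```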