Abstract
Object detection methods fall into two categories, i.e., two-stage and
single-stage detectors. The former is characterized by high detection accuracy
while the latter usually offers high inference speed. Hence, it is
imperative to fuse their merits for a better accuracy vs. speed trade-off. To
this end, we propose a dual refinement network (Dual-RefineDet) to boost the
performance of the single-stage detector. Inheriting the advantages of the
two-stage approach (i.e., two-step regression and accurate features for
detection), the network performs anchor refinement and feature offset
refinement in anchor-offset detection, where the detection head is composed of deformable
convolutions. Moreover, to leverage contextual information for describing
objects, we design a multi-deformable head, in which multiple detection paths
with different receptive field sizes are devoted to detecting objects.
Extensive experiments on the PASCAL VOC datasets show that our method achieves
state-of-the-art results and a better accuracy vs. speed trade-off, i.e.,
$81.3\%$ mAP vs. $42.3$ FPS with $320 \times 320$ input images on the VOC2007
dataset. Code will be made publicly available.
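
The two-step regression mentioned above can be illustrated with a minimal sketch. It assumes the common center-size box encoding (offsets scaled by box width and height, log-scale size changes) used by many anchor-based detectors; the abstract does not state the exact parameterization, so the function names `decode` and `two_step_decode` and the encoding itself are illustrative assumptions rather than the authors' implementation.

```python
import math

def decode(box, delta):
    """Apply regression offsets (dx, dy, dw, dh) to a (cx, cy, w, h) box.

    Assumed encoding: center shifts are scaled by the box size,
    and size changes are predicted in log space.
    """
    cx, cy, w, h = box
    dx, dy, dw, dh = delta
    return (cx + dx * w, cy + dy * h, w * math.exp(dw), h * math.exp(dh))

def two_step_decode(anchor, delta_refine, delta_detect):
    """Two-step regression: refine the anchor, then regress from the refined anchor."""
    refined_anchor = decode(anchor, delta_refine)   # step 1: anchor refinement
    return decode(refined_anchor, delta_detect)     # step 2: final box regression

# Hypothetical values: shift the anchor right, then double its width.
anchor = (0.5, 0.5, 0.2, 0.2)
final_box = two_step_decode(anchor, (0.5, 0.0, 0.0, 0.0), (0.0, 0.0, math.log(2.0), 0.0))
```

Because the second set of offsets is applied relative to the refined anchor rather than the original one, the second step starts from a box that is already closer to the object, which is the intuition behind borrowing two-step regression from two-stage detectors.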