Abstract
Active learning (AL) attempts to maximize a model's performance gain while labeling the fewest possible samples. Deep learning (DL) is data-hungry: it requires a large supply of data to optimize its massive number of parameters and to learn how to extract high-quality features. In recent years, the rapid development of internet technology has placed us in an era of information abundance with massive amounts of data, and DL has consequently attracted strong interest from researchers and developed rapidly. Compared with DL, AL has received relatively little attention, mainly because, before the rise of DL, traditional machine learning required relatively few labeled samples, so early AL struggled to demonstrate its full value. Although DL has achieved breakthroughs in many fields, most of this success owes to the public availability of large annotated datasets. However, acquiring large numbers of high-quality annotations demands considerable manpower, which is infeasible in fields that require high expertise, such as speech recognition, information extraction, and medical imaging. AL has therefore gradually received the attention it deserves. A natural idea is to use AL to reduce the cost of sample annotation while retaining the powerful learning capabilities of DL; this combination has given rise to deep active learning (DAL). Although the related research is now quite abundant, a comprehensive survey of DAL has been lacking. This article fills that gap: we provide a formal classification of the existing work, together with a comprehensive and systematic overview. We also analyze and summarize the development of DAL from the perspective of applications. Finally, we discuss open problems and points of confusion in DAL and suggest possible directions for its future development.
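To make the setting concrete, the following is a minimal sketch of the pool-based query loop that much DAL work builds on, using least-confidence uncertainty sampling. It is an illustration, not the survey's method: the synthetic data, the scikit-learn LogisticRegression standing in for a deep model, and the budget sizes are all assumptions made for the example.

    # Minimal pool-based active learning loop (least-confidence sampling).
    # LogisticRegression stands in for a deep model; with a network, only
    # the fit/predict_proba calls would change.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.datasets import make_classification

    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    rng = np.random.default_rng(0)

    labeled = list(rng.choice(len(X), size=20, replace=False))  # small seed set
    pool = [i for i in range(len(X)) if i not in set(labeled)]

    model = LogisticRegression(max_iter=1000)
    for round_ in range(10):                   # annotation budget: 10 rounds
        model.fit(X[labeled], y[labeled])
        probs = model.predict_proba(X[pool])
        uncertainty = 1.0 - probs.max(axis=1)  # least-confidence score
        query = np.argsort(uncertainty)[-10:]  # 10 most uncertain pool samples
        newly = [pool[i] for i in query]
        labeled.extend(newly)                  # oracle supplies labels y[newly]
        pool = [i for i in pool if i not in set(newly)]
        print(f"round {round_}: accuracy={model.score(X, y):.3f}")

Each round retrains on the labeled set, scores the unlabeled pool, and sends the most uncertain samples to the oracle; DAL methods differ mainly in how the model is trained and how the query scores are computed.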