In the past few years, object detection has attracted a lot of attention in the context of human–robot collaboration and Industry 5.0 due to enormous quality improvements in deep learning technologies. In many applications, object detection models have to adapt quickly to a changing environment, i.e., to learn new objects. A crucial but challenging prerequisite for this is the automatic generation of new training data, which currently still limits the broad application of object detection methods in industrial manufacturing. In this work, we discuss how to adapt state-of-the-art object detection methods for the task of automatic bounding box annotation in a use case where the background is homogeneous and the object’s label is provided by a human. We compare an adapted version of Faster R-CNN and the Scaled-YOLOv4-p5 architecture and show that both can be trained to distinguish unknown objects from a complex but homogeneous background using only a small amount of training data. In contrast to most other state-of-the-art methods for bounding box labeling, our proposed method requires neither human verification, nor a predefined set of classes, nor a very large manually annotated dataset. Our method outperforms the state-of-the-art, transformer-based object discovery method LOST on our simple fruits dataset by a large margin.
Gromit-MPX is an on-screen annotation tool that works with any Unix desktop environment under X11 as well as Wayland (GitHub: bk138/gromit-mpx).
J. Davis, and D. Huttenlocher. CSCL '95: The first international conference on Computer support for collaborative learning, page 84--88. Mahwah, NJ, USA, Lawrence Erlbaum Associates, Inc., (1995)
P. Dmitriev, N. Eiron, M. Fontoura, and E. Shekita. Proceedings of the 15th International Conference on World Wide Web, page 811--817. New York, NY, USA, ACM, (2006)
C. Neuwirth, D. Kaufer, R. Chandhok, and J. Morris. Proceedings of the 1990 ACM conference on Computer-supported cooperative work, page 183--195. New York, NY, USA, ACM, (1990)
C. Liao, F. Guimbretière, and K. Hinckley. UIST '05: Proceedings of the 18th annual ACM symposium on User interface software and technology, page 241--244. New York, NY, USA, ACM, (2005)
S. Bao, G. Xue, X. Wu, Y. Yu, B. Fei, and Z. Su. Proceedings of the 16th International Conference on World Wide Web, page 501--510. New York, NY, USA, ACM, (2007)
R. Yan, A. Natsev, and M. Campbell. MS '07: Workshop on multimedia information retrieval on The many faces of multimedia semantics, page 13--20. New York, NY, USA, ACM Press, (2007)
P. Dmitriev, N. Eiron, M. Fontoura, and E. Shekita. WWW '06: Proceedings of the 15th international conference on World Wide Web, page 811--817. New York, NY, USA, ACM, (2006)
J. Kim, R. Farzan, and P. Brusilovsky. HT '08: Proceedings of the nineteenth ACM conference on Hypertext and hypermedia, page 233--234. New York, NY, USA, ACM, (2008)
J. Kim, R. Farzan, and P. Brusilovsky. BooksOnline '08: Proceeding of the 2008 ACM workshop on Research advances in large digital book repositories, page 25--28. New York, NY, USA, ACM, (2008)
P. Brusilovsky, H. Hsiao, and M. Yudelson. JCDL '08: Proceedings of the 8th ACM/IEEE-CS joint conference on Digital libraries, page 337--340. New York, NY, USA, ACM, (2008)
G. Buscher, A. Dengel, and L. van Elst. Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, page 387--394. New York, NY, USA, ACM, (2008)
A. Zhang, M. Igo, M. Facciotti, and D. Karger. Proceedings of the Fourth (2017) ACM Conference on Learning @ Scale, page 319--322. New York, NY, USA, ACM, (2017)
S. Zyto, D. Karger, M. Ackerman, and S. Mahajan. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, page 1883--1892. New York, NY, USA, ACM, (2012)
P. Pantel, M. Gamon, O. Alonso, and K. Haas. Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval, page 285--294. New York, NY, USA, ACM, (2012)
S. Buechel, and U. Hahn. Readers vs. Writers vs. Texts: Coping with Different Perspectives of Text Understanding in Emotion Annotation, page 1--12. Association for Computational Linguistics, (2017)
T. Tran, N. Tran, A. Teka Hadgu, and R. Jäschke. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics, (September 2015)
T. Tran, N. Tran, A. Hadgu, and R. Jäschke. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP), page 97--106. Association for Computational Linguistics, (September 2015)
J. Jeon, V. Lavrenko, and R. Manmatha. Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, page 119--126. New York, NY, USA, ACM, (2003)