Article in conference proceedings

Audio Set: An ontology and human-labeled dataset for audio events

J. Gemmeke, D. Ellis, D. Freedman, A. Jansen, W. Lawrence, R. Moore, M. Plakal, and M. Ritter.
2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 776-780. (March 2017)
DOI: 10.1109/ICASSP.2017.7952261

Abstract

Audio event recognition, the human-like ability to identify and relate sounds from audio, is a nascent problem in machine perception. Comparable problems such as object detection in images have reaped enormous benefits from comprehensive datasets - principally ImageNet. This paper describes the creation of Audio Set, a large-scale dataset of manually-annotated audio events that endeavors to bridge the gap in data availability between image and audio research. Using a carefully structured hierarchical ontology of 632 audio classes guided by the literature and manual curation, we collect data from human labelers to probe the presence of specific audio classes in 10 second segments of YouTube videos. Segments are proposed for labeling using searches based on metadata, context (e.g., links), and content analysis. The result is a dataset of unprecedented breadth and size that will, we hope, substantially stimulate the development of high-performance audio event recognizers.

BibTeX key: 7952261
Entry type: inproceedings
Booktitle: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Year: 2017
Month: March
Pages: 776-780
issn: 2379-190X
DOI: 10.1109/ICASSP.2017.7952261
URL: https://ieeexplore.ieee.org/document/7952261
