You Only Hear Once: A YOLO-like Algorithm for Audio Segmentation and
Sound Event Detection
S. Venkatesh, D. Moffat, and E. Miranda (2021). arXiv:2109.00962. Comment: 20 pages, 4 figures, 6 tables. Added more experimental validation.
Abstract
Audio segmentation and sound event detection are crucial topics in machine
listening that aim to detect acoustic classes and their respective boundaries.
They are useful for audio-content analysis, speech recognition, audio indexing,
and music information retrieval. In recent years, most research articles adopt
segmentation-by-classification. This technique divides audio into small frames
and individually performs classification on these frames. In this paper, we
present a novel approach called You Only Hear Once (YOHO), which is inspired by
the YOLO algorithm popularly adopted in Computer Vision. We convert the
detection of acoustic boundaries into a regression problem instead of
frame-based classification. This is done by having separate output neurons to
detect the presence of an audio class and predict its start and end points.
YOHO obtained a higher F-measure and lower error rate than the state-of-the-art
Convolutional Recurrent Neural Network on multiple datasets. As YOHO is purely
a convolutional neural network and has no recurrent layers, it is faster during
inference. In addition, as this approach is more end-to-end and predicts
acoustic boundaries directly, it is significantly quicker during
post-processing and smoothing.
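The regression idea in the abstract (separate output neurons for class presence plus start and end points, decoded directly into event boundaries) can be sketched as follows. The array shapes, field order, `bin_duration` default, and merging logic are illustrative assumptions, not the paper's exact parameterization:

```python
import numpy as np

def decode_events(output, bin_duration=0.5, threshold=0.5):
    """Turn a YOHO-style output grid into (class, onset, offset) events.

    `output` has shape (num_bins, num_classes, 3): for each time bin and
    acoustic class, [presence, rel_start, rel_end], where rel_start and
    rel_end are fractions of the bin duration. These shapes and defaults
    are assumptions for illustration only.
    """
    events = []
    num_bins, num_classes, _ = output.shape
    for b in range(num_bins):
        for c in range(num_classes):
            presence, rel_start, rel_end = output[b, c]
            if presence >= threshold:
                # Regress absolute boundaries from the bin index.
                onset = (b + rel_start) * bin_duration
                offset = (b + rel_end) * bin_duration
                events.append([c, onset, offset])
    # Merge same-class detections that touch or overlap -- a much cheaper
    # smoothing step than frame-wise post-processing.
    merged = []
    for ev in sorted(events, key=lambda e: (e[0], e[1])):
        if merged and merged[-1][0] == ev[0] and ev[1] <= merged[-1][2]:
            merged[-1][2] = max(merged[-1][2], ev[2])
        else:
            merged.append(ev)
    return [tuple(ev) for ev in merged]
```

Because boundaries come out of the network as numbers rather than per-frame labels, post-processing reduces to thresholding and merging, which is the speed advantage the abstract claims over frame-based classification.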
@misc{venkatesh2021yololike,
abstract = {Audio segmentation and sound event detection are crucial topics in machine
listening that aim to detect acoustic classes and their respective boundaries.
They are useful for audio-content analysis, speech recognition, audio indexing,
and music information retrieval. In recent years, most research articles adopt
segmentation-by-classification. This technique divides audio into small frames
and individually performs classification on these frames. In this paper, we
present a novel approach called You Only Hear Once (YOHO), which is inspired by
the YOLO algorithm popularly adopted in Computer Vision. We convert the
detection of acoustic boundaries into a regression problem instead of
frame-based classification. This is done by having separate output neurons to
detect the presence of an audio class and predict its start and end points.
YOHO obtained a higher F-measure and lower error rate than the state-of-the-art
Convolutional Recurrent Neural Network on multiple datasets. As YOHO is purely
a convolutional neural network and has no recurrent layers, it is faster during
inference. In addition, as this approach is more end-to-end and predicts
acoustic boundaries directly, it is significantly quicker during
post-processing and smoothing.},
added-at = {2022-03-08T09:49:54.000+0100},
author = {Venkatesh, Satvik and Moffat, David and Miranda, Eduardo Reck},
biburl = {https://www.bibsonomy.org/bibtex/29695a761267d44a7f7be1c13bfcac84d/annakrause},
description = {[2109.00962] You Only Hear Once: A YOLO-like Algorithm for Audio Segmentation and Sound Event Detection},
interhash = {e0dd5369a04a8b31e886d111f68708e3},
intrahash = {9695a761267d44a7f7be1c13bfcac84d},
keywords = {audio segmentation singleshot},
  note = {cite arxiv:2109.00962. Comment: 20 pages, 4 figures, 6 tables. Added more experimental validation},
timestamp = {2022-03-08T09:49:54.000+0100},
title = {You Only Hear Once: A YOLO-like Algorithm for Audio Segmentation and
Sound Event Detection},
url = {http://arxiv.org/abs/2109.00962},
year = 2021
}