Unsupervised Audio-Caption Aligning Learns Correspondences Between Individual Sound Events and Textual Phrases.

ICASSP, pages 8867-8871. IEEE, (2022)


Other publications of authors with the same name

Multi-label vs. combined single-label sound event detection with deep neural networks. EUSIPCO, pages 2551-2555. IEEE, (2015)

Stacked Convolutional and Recurrent Neural Networks for Bird Audio Detection. CoRR, (2017)

Stacked Convolutional and Recurrent Neural Networks for Music Emotion Recognition. CoRR, (2017)

Bayesian extensions to non-negative matrix factorisation for audio signal modelling. ICASSP, pages 1825-1828. IEEE, (2008)

Exemplar-Based Sparse Representation With Residual Compensation for Voice Conversion. IEEE ACM Trans. Audio Speech Lang. Process., 22 (10): 1506-1521 (2014)

A multi-device dataset for urban acoustic scene classification. DCASE, pages 9-13. (2018)

Differentiable Tracking-Based Training of Deep Learning Sound Source Localizers. WASPAA, pages 211-215. IEEE, (2021)

A Curated Dataset of Urban Scenes for Audio-Visual Scene Analysis. ICASSP, pages 626-630. IEEE, (2021)

Non-negative tensor factorization models for Bayesian audio processing. Digital Signal Processing, (March 2015)

Query by Example of Audio Signals using Euclidean Distance Between Gaussian Mixture Models. ICASSP (1), pages 225-228. IEEE, (2007)