Today, speech technology is only available for a small fraction of the thousands of languages spoken around the world because traditional systems need to be trained on large amounts of annotated speech audio with transcriptions. Obtaining that kind of data for every human language and dialect is almost impossible.
Wav2vec works around this limitation by requiring little to no transcribed data. The model uses self-supervision to learn useful speech representations directly from unlabeled audio. This enables speech recognition systems for many more languages and dialects, such as Kyrgyz and Swahili, for which large amounts of transcribed speech audio don't exist. Self-supervision is the key to leveraging unannotated data and building better systems.
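To make the idea concrete: in wav2vec 2.0, spans of the latent speech representation are masked during pre-training, and the model's context vector at each masked time step must identify the true quantized latent among distractors sampled from other time steps, via a contrastive (InfoNCE-style) objective. The following is a simplified NumPy sketch of that loss, not the actual implementation; the function names and the fixed temperature are illustrative assumptions.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def contrastive_loss(context, true_target, distractors, temperature=0.1):
    """Simplified InfoNCE-style contrastive loss (illustrative sketch).

    The context vector produced for a masked time step should identify
    the true quantized latent among a set of distractors. The true
    target sits at index 0 of the candidate list.
    """
    candidates = [true_target] + list(distractors)
    logits = np.array([cosine_sim(context, c) / temperature
                       for c in candidates])
    # numerically stable log-softmax
    m = logits.max()
    log_softmax = logits - (m + np.log(np.exp(logits - m).sum()))
    # negative log-probability assigned to the true target
    return -log_softmax[0]
```

When the context vector aligns with the true target and not with the distractors, the loss is near zero; when it aligns with a distractor instead, the loss is large. Minimizing this loss over many masked positions is what lets the model learn from audio alone, with no transcriptions.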