@amanshakya

Spoken Language Identification Using Convolutional Neural Network In Nepalese Context

Proceedings of the 26th International Conference of the ORIENTAL-COCOSDA (O-COCOSDA 2023), pages 1-6. IEEE, (December 2023)
DOI: 10.1109/O-COCOSDA60357.2023.10482929

Abstract

In this work we perform a language identification (LID) task that classifies a few regional languages spoken in Nepal. We use a Convolutional Neural Network (CNN) that operates on spectrograms of the provided audio utterances. Datasets for three languages, Nepali, Hindi and Sanskrit, were downloaded from VoxLingua107. Two additional languages, Newari and Maithili, were extracted from YouTube news channels, since no standard datasets were available for them. The Nepal Mandel TV and TV TODAY JANAKPUR channels were used for audio extraction. Classification accuracies for the three VoxLingua107 languages are 95%, 92% and 89% for Sanskrit, Nepali and Hindi respectively. The experiment combining the YouTube datasets with the VoxLingua107 datasets yields accuracies of 74% for Sanskrit, 72% for Nepali, 68% for Hindi, 66% for Newari and 63% for Maithili. The significant reduction in accuracy in this case is due to the quality of the non-standard datasets obtained from YouTube. With suitable preprocessing and filtering, performance can be enhanced further. Our experiments show that the model can classify a few Nepalese regional languages from short speech utterances and can be easily extended to include more groups of languages.
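As a rough illustration of the kind of input a spectrogram-based CNN consumes, the following sketch turns a waveform into a log-power spectrogram using only the Python standard library. The frame length, hop size, Hann window, log scaling, and the function names are all assumptions chosen for the example; the abstract does not state the settings used in the paper, and a real pipeline would normally rely on an optimized FFT library.

```python
import cmath
import math


def hann(n):
    """Symmetric Hann window of length n (an assumed choice of window)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def dft_magnitudes(frame):
    """Magnitudes of the non-negative frequency bins via a naive DFT."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(
            frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        mags.append(abs(acc))
    return mags


def log_spectrogram(samples, frame_len=64, hop=32, eps=1e-10):
    """Return a list of frames, each a list of log-power values per bin.

    frame_len, hop and eps are illustrative values, not the paper's.
    """
    window = hann(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + i] * window[i] for i in range(frame_len)]
        frames.append([math.log(m * m + eps) for m in dft_magnitudes(frame)])
    return frames


# Toy input standing in for a speech utterance: a 1 kHz tone at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(512)]
spec = log_spectrogram(tone)

# Strongest frequency bin of the first frame, converted back to Hz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64
```

The resulting time-by-frequency grid is what a CNN would treat as a single-channel image; each row of `spec` is one time frame and each column one frequency bin.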
