Article

Optimizing Similarity Threshold for Abstract Similarity Metric in Speech Diarization Systems: A Mathematical Formulation

J. Prabhala, V. K, and R. Ravi.
Applied Mathematics and Sciences: An International Journal (MathSJ), 10 (1/2): 01-10 (June 2023)
DOI: 10.5121/mathsj.2023.10201

Abstract

Speaker diarization is a critical task in speech processing that aims to identify "who spoke when?" in an audio or video recording that contains an unknown amount of speech from an unknown number of speakers. Diarization has numerous applications in speech recognition, speaker identification, and automatic captioning. Both supervised and unsupervised algorithms are used to address speaker diarization, but exhaustively labeling a training dataset can be costly in supervised learning, while accuracy can be compromised with unsupervised approaches. This paper presents a novel approach to speaker diarization that defines loosely labeled data and employs x-vector embeddings and a formalized threshold search for a given abstract similarity metric to cluster temporal segments into unique user segments. The proposed algorithm uses concepts from graph theory, matrix algebra, and a genetic algorithm to formulate and solve the optimization problem. The algorithm is applied to English, Spanish, and Chinese audio, and its performance is evaluated using well-known similarity metrics. The results demonstrate the robustness of the proposed approach. The findings of this research have significant implications for speech processing and speaker identification, including for languages with tonal differences. The proposed method offers a practical and efficient solution for speaker diarization in real-world scenarios with labeling time and cost constraints.
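The abstract describes clustering segment embeddings by thresholding a similarity metric and then searching for a good threshold. The following is a minimal sketch of that general idea, not the authors' method, and it rests on several assumptions that do not come from the paper. Cosine similarity stands in for the abstract similarity metric. Two segments are linked when their similarity is at or above a threshold `tau`, and speakers are read off as the connected components of the resulting graph. "Loose labels" are modeled as a few hypothetical same/different-speaker pairs. A plain grid search replaces the genetic algorithm the authors use. All function names are illustrative only.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical "loose label" (assumption): (segment i, segment j, same speaker?)
Pair = Tuple[int, int, bool]


def cosine_similarity_matrix(emb: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pairwise cosine similarity between segment embeddings (e.g. x-vectors)."""
    unit = []
    for v in emb:
        norm = math.sqrt(sum(x * x for x in v)) or 1e-12  # avoid division by zero
        unit.append([x / norm for x in v])
    return [[sum(a * b for a, b in zip(u, w)) for w in unit] for u in unit]


def cluster_by_threshold(sim: List[List[float]], tau: float) -> List[int]:
    """Label segments by connected components of the graph with edges sim >= tau."""
    n = len(sim)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue  # already assigned to an earlier component
        labels[start] = current
        stack = [start]
        while stack:  # depth-first traversal of one component
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and sim[i][j] >= tau:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


def pair_agreement(labels: List[int], pairs: Sequence[Pair]) -> float:
    """Fraction of loosely labeled pairs whose same/different status is reproduced."""
    hits = sum((labels[i] == labels[j]) == same for i, j, same in pairs)
    return hits / len(pairs)


def search_threshold(sim, pairs, candidates) -> Tuple[float, float]:
    """Grid search (stand-in for a genetic algorithm): first tau with the best agreement."""
    best_tau, best_score = candidates[0], -1.0
    for tau in candidates:
        score = pair_agreement(cluster_by_threshold(sim, tau), pairs)
        if score > best_score:
            best_tau, best_score = tau, score
    return best_tau, best_score


if __name__ == "__main__":
    # Toy 2-D "embeddings": segments 0 and 1 from one speaker, 2 and 3 from another.
    emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    pairs = [(0, 1, True), (2, 3, True), (0, 2, False), (1, 3, False)]
    sim = cosine_similarity_matrix(emb)
    print(search_threshold(sim, pairs, [k / 20 for k in range(21)]))  # → (0.25, 1.0)
```

In this toy case, a threshold that is too low merges every segment into one cluster, and a threshold that is too high leaves every segment on its own. The search selects the first value in between that reproduces all of the loose labels.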

BibTeX key: noauthororeditor
entry type: article
year: 2023
month: June
journal: Applied Mathematics and Sciences: An International Journal (MathSJ)
number: 1/2
pages: 01-10
volume: 10
language: English
issn: 2349-6223
DOI: 10.5121/mathsj.2023.10201
Document: https://www.airccse.com/mathsj/papers/10223mathsj01.pdf

Cite this publication

%0 Journal Article
%1 noauthororeditor
%A Prabhala, Jagat Chaitanya
%A K, Venkatnareshbabu
%A Ravi, Ragoju
%D 2023
%J Applied Mathematics and Sciences: An International Journal (MathSJ)
%K Speaker diarization graph matrix optimization processing signal similarity speech theory x-vector
%N 1/2
%P 01-10
%R 10.5121/mathsj.2023.10201
%T Optimizing Similarity Threshold for Abstract Similarity Metric in Speech Diarization Systems: A Mathematical Formulation
%U https://www.airccse.com/mathsj/papers/10223mathsj01.pdf
%V 10
%X Speaker diarization is a critical task in speech processing that aims to identify "who spoke when?" in an audio or video recording that contains an unknown amount of speech from an unknown number of speakers. Diarization has numerous applications in speech recognition, speaker identification, and automatic captioning. Both supervised and unsupervised algorithms are used to address speaker diarization, but exhaustively labeling a training dataset can be costly in supervised learning, while accuracy can be compromised with unsupervised approaches. This paper presents a novel approach to speaker diarization that defines loosely labeled data and employs x-vector embeddings and a formalized threshold search for a given abstract similarity metric to cluster temporal segments into unique user segments. The proposed algorithm uses concepts from graph theory, matrix algebra, and a genetic algorithm to formulate and solve the optimization problem. The algorithm is applied to English, Spanish, and Chinese audio, and its performance is evaluated using well-known similarity metrics. The results demonstrate the robustness of the proposed approach. The findings of this research have significant implications for speech processing and speaker identification, including for languages with tonal differences. The proposed method offers a practical and efficient solution for speaker diarization in real-world scenarios with labeling time and cost constraints.

@article{noauthororeditor,
  abstract = {Speaker diarization is a critical task in speech processing that aims to identify "who spoke when?" in an audio or video recording that contains an unknown amount of speech from an unknown number of speakers. Diarization has numerous applications in speech recognition, speaker identification, and automatic captioning. Both supervised and unsupervised algorithms are used to address speaker diarization, but exhaustively labeling a training dataset can be costly in supervised learning, while accuracy can be compromised with unsupervised approaches. This paper presents a novel approach to speaker diarization that defines loosely labeled data and employs x-vector embeddings and a formalized threshold search for a given abstract similarity metric to cluster temporal segments into unique user segments. The proposed algorithm uses concepts from graph theory, matrix algebra, and a genetic algorithm to formulate and solve the optimization problem. The algorithm is applied to English, Spanish, and Chinese audio, and its performance is evaluated using well-known similarity metrics. The results demonstrate the robustness of the proposed approach. The findings of this research have significant implications for speech processing and speaker identification, including for languages with tonal differences. The proposed method offers a practical and efficient solution for speaker diarization in real-world scenarios with labeling time and cost constraints.},
  added-at = {2023-06-28T08:52:43.000+0200},
  author = {Prabhala, Jagat Chaitanya and K, Venkatnareshbabu and Ravi, Ragoju},
  biburl = {https://www.bibsonomy.org/bibtex/2fdfa0df23b3227a4a002cf703bca9d23/journalmathsj},
  doi = {10.5121/mathsj.2023.10201},
  interhash = {9c6116eb34181b1a825ec412f4c90d2e},
  intrahash = {fdfa0df23b3227a4a002cf703bca9d23},
  issn = {2349-6223},
  journal = {Applied Mathematics and Sciences: An International Journal (MathSJ)},
  keywords = {Speaker diarization graph matrix optimization processing signal similarity speech theory x-vector},
  language = {English},
  month = {June},
  number = {1/2},
  pages = {01-10},
  timestamp = {2023-06-28T08:52:43.000+0200},
  title = {Optimizing Similarity Threshold for Abstract Similarity Metric in Speech Diarization Systems: A Mathematical Formulation},
  url = {https://www.airccse.com/mathsj/papers/10223mathsj01.pdf},
  volume = 10,
  year = 2023
}
