WueDevils at SemEval-2022 Task 8: Multilingual News Article Similarity via Pair-Wise Sentence Similarity Matrices
D. Wangsadirdja, F. Heinickel, S. Trapp, A. Zehe, K. Kobs, and A. Hotho. Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022), pp. 1235--1243. Seattle, United States, Association for Computational Linguistics, (July 2022)
Abstract
We present a system that creates pair-wise cosine and arccosine sentence similarity matrices using multilingual sentence embeddings obtained from pre-trained SBERT and Universal Sentence Encoder (USE) models respectively. For each news article sentence, it searches the most similar sentence from the other article and computes an average score. Further, a convolutional neural network calculates a total similarity score for the article pairs on these matrices. Finally, a random forest regressor merges the previous results to a final score that can optionally be extended with a publishing date score.
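The best-match averaging step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy 2-dimensional vectors standing in for SBERT/USE sentence embeddings, and the choice to average in one direction only (from the first article to the second) are all assumptions made for illustration. The arccosine variant, the convolutional network, and the random forest regressor are omitted because the abstract does not specify them in enough detail.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(article_a: Sequence[Vector],
                      article_b: Sequence[Vector]) -> List[List[float]]:
    """Pair-wise matrix: entry [i][j] compares sentence i of A with sentence j of B."""
    return [[cosine(u, v) for v in article_b] for u in article_a]


def best_match_average(matrix: Sequence[Sequence[float]]) -> float:
    """Average, over the rows, of each row's maximum (its best-matching sentence)."""
    if not matrix or any(len(row) == 0 for row in matrix):
        raise ValueError("both articles need at least one sentence")
    return sum(max(row) for row in matrix) / len(matrix)


# Hypothetical toy embeddings: two sentences per article, two dimensions each.
article_a = [[1.0, 0.0], [0.0, 1.0]]
article_b = [[1.0, 0.0], [1.0, 1.0]]

matrix = similarity_matrix(article_a, article_b)
score = best_match_average(matrix)
```

In this sketch, swapping the two arguments of similarity_matrix gives the score in the other direction; whether and how the two directions are combined is not stated in the abstract.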
%0 Conference Paper
%1 wangsadirdja2022wuedevils
%A Wangsadirdja, Dirk
%A Heinickel, Felix
%A Trapp, Simon
%A Zehe, Albin
%A Kobs, Konstantin
%A Hotho, Andreas
%B Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)
%C Seattle, United States
%D 2022
%I Association for Computational Linguistics
%K author:zehe from:albinzehe mlnlprjak multilingual myown news semeval similarity
%P 1235--1243
%T WueDevils at SemEval-2022 Task 8: Multilingual News Article Similarity via Pair-Wise Sentence Similarity Matrices
%U https://aclanthology.org/2022.semeval-1.175
%X We present a system that creates pair-wise cosine and arccosine sentence similarity matrices using multilingual sentence embeddings obtained from pre-trained SBERT and Universal Sentence Encoder (USE) models respectively. For each news article sentence, it searches the most similar sentence from the other article and computes an average score. Further, a convolutional neural network calculates a total similarity score for the article pairs on these matrices. Finally, a random forest regressor merges the previous results to a final score that can optionally be extended with a publishing date score.
@inproceedings{wangsadirdja2022wuedevils,
abstract = {We present a system that creates pair-wise cosine and arccosine sentence similarity matrices using multilingual sentence embeddings obtained from pre-trained SBERT and Universal Sentence Encoder (USE) models respectively. For each news article sentence, it searches the most similar sentence from the other article and computes an average score. Further, a convolutional neural network calculates a total similarity score for the article pairs on these matrices. Finally, a random forest regressor merges the previous results to a final score that can optionally be extended with a publishing date score.},
added-at = {2022-07-20T03:32:58.000+0200},
address = {Seattle, United States},
author = {Wangsadirdja, Dirk and Heinickel, Felix and Trapp, Simon and Zehe, Albin and Kobs, Konstantin and Hotho, Andreas},
biburl = {https://www.bibsonomy.org/bibtex/28feb9047b0e06aa3b40b74bb20b6c45b/dmir},
booktitle = {Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)},
interhash = {21ebbf0d6a7201da1580440d18885e68},
intrahash = {8feb9047b0e06aa3b40b74bb20b6c45b},
keywords = {author:zehe from:albinzehe mlnlprjak multilingual myown news semeval similarity},
month = jul,
pages = {1235--1243},
publisher = {Association for Computational Linguistics},
timestamp = {2024-01-18T10:31:52.000+0100},
title = {{W}ue{D}evils at {S}em{E}val-2022 Task 8: Multilingual News Article Similarity via Pair-Wise Sentence Similarity Matrices},
url = {https://aclanthology.org/2022.semeval-1.175},
year = 2022
}