Automatic Spanish Translation of the SQuAD Dataset for Multilingual Question Answering

C. Carrino, M. Costa-jussà, and J. Fonollosa (2019). arXiv:1912.05200. Comment: Submitted to LREC 2020.

Abstract

Recently, multilingual question answering has become a crucial research topic, and it is receiving increased interest in the NLP community. However, the unavailability of large-scale datasets makes it challenging to train multilingual QA systems with performance comparable to the English ones. In this work, we develop the Translate Align Retrieve (TAR) method to automatically translate the Stanford Question Answering Dataset (SQuAD) v1.1 to Spanish. We then use this dataset to train Spanish QA systems by fine-tuning a Multilingual-BERT model. Finally, we evaluate our QA models on the recently proposed MLQA and XQuAD benchmarks for cross-lingual extractive QA. Experimental results show that our models outperform the previous Multilingual-BERT baselines, achieving a new state-of-the-art value of 68.1 F1 points on the Spanish MLQA corpus and 77.6 F1 and 61.8 Exact Match points on the Spanish XQuAD corpus. The resulting synthetically generated SQuAD-es v1.1 corpus, which retains almost 100% of the data contained in the original English version, is, to the best of our knowledge, the first large-scale QA training resource for Spanish.

Description

[1912.05200] Automatic Spanish Translation of the SQuAD Dataset for Multilingual Question Answering

Links and resources

BibTeX key: carrino2019automatic
entry type: misc
year: 2019
url: http://arxiv.org/abs/1912.05200
note: cite arxiv:1912.05200. Comment: Submitted to LREC 2020
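
For convenience, the fields listed above could be assembled into a BibTeX entry roughly as follows. This is only a sketch built from what this page shows: the author names use the initials given here rather than full names, and the `eprint` and `archivePrefix` fields are an assumption about how the arXiv identifier might be recorded, not necessarily what the site's own BibTeX export produces.

```bibtex
% Sketch assembled from the fields on this page; not the site's official export.
@misc{carrino2019automatic,
  title  = {Automatic Spanish Translation of the {SQuAD} Dataset for Multilingual Question Answering},
  author = {Carrino, C. and Costa-juss{\`a}, M. and Fonollosa, J.},  % initials only, as shown on this page
  year   = {2019},
  url    = {http://arxiv.org/abs/1912.05200},
  note   = {Submitted to LREC 2020},
  eprint = {1912.05200},        % assumption: arXiv identifier stored as eprint
  archivePrefix = {arXiv}       % assumption: not listed among the fields above
}
```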

Meta data

Last update 4 years ago
Created 4 years ago
