An open diachronic corpus of historical Spanish: annotation criteria and automatic modernisation of spelling

F. Sánchez-Martínez, I. Martínez-Sempere, X. Ivars-Ribes, and R. Carrasco.
(2013). arXiv:1306.3692. Comment: The part of this paper describing the IMPACT-es corpus has been accepted for publication in the journal Language Resources and Evaluation (http://link.springer.com/article/10.1007/s10579-013-9239-y).

Abstract

The IMPACT-es diachronic corpus of historical Spanish compiles over one hundred books --containing approximately 8 million words-- in addition to a complementary lexicon which links more than 10 thousand lemmas with attestations of the different variants found in the documents. This textual corpus and the accompanying lexicon have been released under an open license (Creative Commons by-nc-sa) in order to permit their intensive exploitation in linguistic research. Approximately 7% of the words in the corpus (a selection aimed at enhancing the coverage of the most frequent word forms) have been annotated with their lemma, part of speech, and modern equivalent. This paper describes the annotation criteria followed and the standards, based on the Text Encoding Initiative recommendations, used to represent the texts in digital form. As an illustration of the possible synergies between diachronic textual resources and linguistic research, we describe the application of statistical machine translation techniques to infer probabilistic context-sensitive rules for the automatic modernisation of spelling. The automatic modernisation with this type of statistical methods leads to very low character error rates when the output is compared with the supervised modern version of the text.
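The abstract reports results as character error rates but does not define the metric on this page. A minimal sketch follows, assuming the common definition: the Levenshtein (edit) distance between the system output and the reference, divided by the length of the reference. The function names and the example word pair are illustrative assumptions and are not taken from the paper.

```python
def edit_distance(hypothesis: str, reference: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions each cost 1."""
    # prev[j] holds the distance between the hypothesis prefix processed so far
    # and the first j characters of the reference.
    prev = list(range(len(reference) + 1))
    for i, h in enumerate(hypothesis, start=1):
        curr = [i]  # distance from a non-empty hypothesis prefix to an empty reference
        for j, r in enumerate(reference, start=1):
            curr.append(min(
                prev[j] + 1,             # delete h from the hypothesis
                curr[j - 1] + 1,         # insert r into the hypothesis
                prev[j - 1] + (h != r),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def character_error_rate(hypothesis: str, reference: str) -> float:
    """Edit distance normalised by reference length (assumed definition of CER)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(hypothesis, reference) / len(reference)


# Illustrative pair: an old spelling left unmodernised vs. the modern form.
print(character_error_rate("dixo", "dijo"))  # 0.25
```

Under this definition, a score of 0 means the automatically modernised text matches the supervised modern version character for character.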

BibTeX key: sanchezmartinez2013diachronic
Entry type: misc
Year: 2013
URL: http://arxiv.org/abs/1306.3692
Note: arXiv:1306.3692. Comment: The part of this paper describing the IMPACT-es corpus has been accepted for publication in the journal Language Resources and Evaluation (http://link.springer.com/article/10.1007/s10579-013-9239-y)

Cite this publication

@misc{sanchezmartinez2013diachronic,
  abstract = {The IMPACT-es diachronic corpus of historical Spanish compiles over one hundred books --containing approximately 8 million words-- in addition to a complementary lexicon which links more than 10 thousand lemmas with attestations of the different variants found in the documents. This textual corpus and the accompanying lexicon have been released under an open license (Creative Commons by-nc-sa) in order to permit their intensive exploitation in linguistic research. Approximately 7% of the words in the corpus (a selection aimed at enhancing the coverage of the most frequent word forms) have been annotated with their lemma, part of speech, and modern equivalent. This paper describes the annotation criteria followed and the standards, based on the Text Encoding Initiative recommendations, used to represent the texts in digital form. As an illustration of the possible synergies between diachronic textual resources and linguistic research, we describe the application of statistical machine translation techniques to infer probabilistic context-sensitive rules for the automatic modernisation of spelling. The automatic modernisation with this type of statistical methods leads to very low character error rates when the output is compared with the supervised modern version of the text.},
  added-at = {2013-10-27T17:10:45.000+0100},
  author = {Sánchez-Martínez, Felipe and Martínez-Sempere, Isabel and Ivars-Ribes, Xavier and Carrasco, Rafael C.},
  biburl = {https://www.bibsonomy.org/bibtex/24d5a4e732e2730fedc6661baa9f4d00a/filologanoga},
  description = {[1306.3692] An open diachronic corpus of historical Spanish: annotation criteria and automatic modernisation of spelling},
  interhash = {1b3348d482543506885481376903b693},
  intrahash = {4d5a4e732e2730fedc6661baa9f4d00a},
  keywords = {ClassicsNLP},
  note = {cite arxiv:1306.3692. Comment: The part of this paper describing the IMPACT-es corpus has been accepted for publication in the journal Language Resources and Evaluation (http://link.springer.com/article/10.1007/s10579-013-9239-y)},
  timestamp = {2013-10-27T17:10:45.000+0100},
  title = {An open diachronic corpus of historical Spanish: annotation criteria and automatic modernisation of spelling},
  url = {http://arxiv.org/abs/1306.3692},
  year = 2013
}
