unhammer > Sinhala | BibSonomy

bookmarks (hide)1
display
all
bookmarks only
bookmarks per page
5
10
20
50
100
sort by
added at
title
RSS
BibTeX
XML

1The EMILLE Corpus
monolingual, parallel and annotated corpora. There are fourteen monolingual corpora, including both written and (for some languages) spoken data for fourteen South Asian languages: Assamese, Bengali, Gujarati, Hindi, Kannada, Kashmiri, Malayalam, Marathi, Oriya, Punjabi, Sinhala, Tamil, Telegu and Urdu. The EMILLE monolingual corpora contain approximately 92,799,000 words (including 2,627,000 words of transcribed spoken data for Bengali, Gujarati, Hindi, Punjabi and Urdu). The parallel corpus consists of 200,000 words of text in English and its accompanying translations in Hindi, Bengali, Punjabi, Gujarati and Urdu. The annotated component includes the Urdu monolingual and parallel corpora annotated for parts-of-speech, together with twenty written Hindi corpus files annotated to show the nature of demonstrative use. The corpus is marked up using CES-compliant SGML, and encoded using Unicode.
16 years ago by @unhammer
show all tags
Assamese
Bengali
Gujarati
Hindi
Kannada
Kashmiri
Malayalam
Marathi
Oriya
Punjabi
Sinhala
Tamil
Telegu
Urdu
corpus
parallel
AssameseBengaliGujaratiHindiKannadaKashmiriMalayalamMarathiOriyaPunjabiSinhalaTamilTeleguUrducorpusparallel
(0)
copydelete
- community post
- history of this post

⟨⟨
⟨
1
⟩
⟩⟩

publications (hide)
display
all
publications only
publications per page
5
10
20
50
100
sort by
added at
title
author
publication date
entry type
help for advanced sorting...
RSS
BibTeX
RDF
more...

No matching posts.

⟨⟨
⟨
⟩
⟩⟩