Although term extraction has been researched for more than 20 years, only a few studies focus on under-resourced languages. Moreover, bilingual term mapping from comparable corpora for these languages has attracted researchers only recently. This paper presents methods for term extraction, term tagging in documents, and bilingual term mapping from comparable corpora for four under-resourced languages: Croatian, Latvian, Lithuanian, and Romanian. Methods described in this paper are language independent as long as language specific parameter data is provided by the user and the user has access to a part of speech or a morpho-syntactic tagger.
The cb2Bib is a free, open source, and multiplatform application for rapidly extracting unformatted, or unstandardized bibliographic references from email alerts, journal Web pages, and PDF files. The cb2Bib facilitates the capture of single references from unformatted and non standard sources. Output references are written in BibTeX. Article files can be easily linked and renamed by dragging them onto the cb2Bib window. Additionally, it permits editing and browsing BibTeX files, citing references, searching references and the full contents of the referenced documents, inserting bibliographic metadata to documents, and writing short notes that interrelate several references.
Text mining and web scraping involves chunk parsing and recognition of named entities (institutions, dates, titles)...The extraction of named entities is mostly based on a strategy that combines look up in gazetteers (lists of companies, cities, etc.) wit
The main task of the GenIELex project is the development of a biochemistry specific lexicon as well as of an annotated corpus for the evaluation of the system. The need for the construction of such a lexicon is illustrated by the following figures, based
Todays feature of the week post will point you to one of the hidden features of the system. As most of you certainly know one way to acquire the meta data of a publication is to use the screen scraping facility of BibSonomy.
Y. Ohsawa, N. Benson, und M. Yachida. ADL '98: Proceedings of the Advances in Digital Libraries Conference, Seite 12. Washington, DC, USA, IEEE Computer Society, (1998)
M. Ozdil, und F. Vural. Document Analysis and Recognition, 1997., Proceedings of the Fourth
International Conference on, 2, Seite 483--486. IEEE Computer Society, (1997)