Although term extraction has been researched for more than 20 years, only a few studies focus on under-resourced languages. Moreover, bilingual term mapping from comparable corpora for these languages has attracted research attention only recently. This paper presents methods for term extraction, term tagging in documents, and bilingual term mapping from comparable corpora for four under-resourced languages: Croatian, Latvian, Lithuanian, and Romanian. The methods are language-independent, provided that the user supplies language-specific parameter data and has access to a part-of-speech or morpho-syntactic tagger.
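To illustrate how a method can be language-independent while relying on language-specific parameter data and a tagger, the following is a minimal sketch, not the paper's actual implementation. It assumes the parameter data takes the form of part-of-speech tag patterns and that the input has already been tagged by an external tagger. The names `PATTERNS` and `extract_term_candidates`, the `"demo"` key, and the tag labels are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Token = Tuple[str, str]  # (word form, part-of-speech tag) as produced by some tagger

# Hypothetical language-specific parameter data: tag sequences that may form a term.
# A real system would supply one such pattern set per language.
PATTERNS: Dict[str, List[Tuple[str, ...]]] = {
    "demo": [("ADJ", "NOUN"), ("NOUN", "NOUN"), ("NOUN",)],
}


def extract_term_candidates(
    tagged: Sequence[Token], patterns: Sequence[Tuple[str, ...]]
) -> List[str]:
    """Return every token span whose tag sequence equals one of the patterns."""
    tags = [tag for _, tag in tagged]
    candidates = []
    for start in range(len(tagged)):
        for pattern in patterns:
            end = start + len(pattern)
            # A slice that runs past the end is shorter than the pattern,
            # so the comparison fails and no out-of-range access occurs.
            if tuple(tags[start:end]) == tuple(pattern):
                words = [word for word, _ in tagged[start:end]]
                candidates.append(" ".join(words))
    return candidates


sentence = [("bilingual", "ADJ"), ("term", "NOUN"), ("mapping", "NOUN")]
candidates = extract_term_candidates(sentence, PATTERNS["demo"])
```

The extraction function itself contains nothing language-specific. In this sketch, switching languages means swapping the pattern set and the tagger that produces the input.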
Y. Kim, K. Stratos, and D. Kim. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), page 643-653. Vancouver, Canada, Association for Computational Linguistics, (July 2017)
M. Müller-Prove. Good Tags – Bad Tags. Social Tagging in der Wissensorganisation, volume 47 of Medien in der Wissenschaft, page 15-22. Münster, New York, München, Berlin, Waxmann, (2008)
K. Tso-Sutter, L. Marinho, and L. Schmidt-Thieme. Proceedings of 23rd Annual ACM Symposium on Applied Computing (SAC'08), Fortaleza, Brazil, page 1995-1999. New York, NY, USA, ACM, (2008)
R. Jäschke, L. Marinho, A. Hotho, L. Schmidt-Thieme, and G. Stumme. Knowledge Discovery in Databases: PKDD 2007, 11th European Conference on Principles and Practice of Knowledge Discovery in Databases, volume 4702 of Lecture Notes in Computer Science, page 506-514. Berlin, Heidelberg, Springer, (2007)
M. Blank, T. Bopp, T. Hampel, and J. Schulte. Good Tags – Bad Tags. Social Tagging in der Wissensorganisation, 47, page 85-97. Münster, New York, München, Berlin, Waxmann Verlag GmbH, (2008)