Abstract
In this paper we present a rule-based method for multi-word term (MWT) extraction and for the
lemmatization of the extracted multi-word terms. The extracted and lemmatized MWT candidates are
post-processed using data-driven and heuristic approaches in order to reject falsely proposed lemmas
(“parasite lemmas”), and are then ranked by calculating various measures before being passed to
human evaluators. For accepted terms, dictionary entries are automatically produced that enable
the generation of all inflected forms of the terms. All subtasks of this process are integrated into
LeXimir, a tool for the development and management of lexical resources (Stanković et al., 2016).
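The abstract does not name the ranking measures. As one illustration of how lemmatized MWT candidates can be ranked before human evaluation, the sketch below uses C-value (Frantzi et al.), a well-known termhood measure that discounts candidates occurring mostly nested inside longer candidates. The choice of C-value, the tuple-of-lemmas representation, and the frequencies in `candidates` are all assumptions for illustration, not the method or data of this paper.

```python
import math
from typing import Dict, List, Tuple

# Assumption: a candidate is a tuple of lemmas, e.g. ("lexical", "resource").
Term = Tuple[str, ...]


def is_nested_in(a: Term, b: Term) -> bool:
    """True if a occurs as a contiguous sub-sequence of a longer candidate b."""
    n, m = len(a), len(b)
    if n >= m:
        return False
    return any(b[i:i + n] == a for i in range(m - n + 1))


def c_value(freqs: Dict[Term, int]) -> Dict[Term, float]:
    """C-value for every multi-word candidate in a frequency table."""
    scores: Dict[Term, float] = {}
    for a, f_a in freqs.items():
        # Longer candidates that contain a.
        longer = [b for b in freqs if is_nested_in(a, b)]
        weight = math.log2(len(a))  # favours longer terms; 0 for single words
        if not longer:
            scores[a] = weight * f_a
        else:
            # Subtract the mean frequency of the longer candidates containing a.
            mean_nested = sum(freqs[b] for b in longer) / len(longer)
            scores[a] = weight * (f_a - mean_nested)
    return scores


def rank(freqs: Dict[Term, int]) -> List[Tuple[Term, float]]:
    """Candidates sorted by descending C-value (ties broken alphabetically)."""
    scores = c_value(freqs)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical lemmatized candidates with hypothetical corpus frequencies.
candidates: Dict[Term, int] = {
    ("lexical", "resource"): 10,
    ("electronic", "lexical", "resource"): 4,
    ("development", "of", "lexical", "resource"): 2,
}
ranking = rank(candidates)
for term, score in ranking:
    print(" ".join(term), round(score, 2))
```

With these made-up counts, "lexical resource" scores 1 · (10 − 3) = 7.0 because part of its frequency is attributed to the two longer candidates that contain it. In a pipeline like the one described, such scores would only order the list shown to evaluators, not replace their judgement.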