Article

Automatic term recognition based on statistics of compound nouns

H. Nakagawa.
Terminology, 6 (2): 195--210 (2001)
DOI: 10.1075/term.6.1.05nak

Abstract

The NTCIR1 TMREC group called for participation of the term recognition task which is a part of NTCIR1 held in 1999. As an activity of TMREC, they have provided us with the test collection of the term recognition task. The goal of this task is to automatically recognize and extract terms from the text corpus which consists of 1,870 abstracts gathered from the NACSIS Academic Conference Database. This article describes the term extraction method we have proposed to extract terms consisting of simple and compound nouns and the experimental evaluation of the proposed method with this NTCIR TMREC test collection. The basic idea of scoring a simple noun N of our term extraction method is to count how many nouns are conjoined with N to make compound nouns. Then we extend this score to measure the score of compound nouns because most of technical terms are compound nouns. Our method has a parameter to tune the degree of preference either for longer compound nouns or for shorter compound nouns. As for term candidates, in addition to noun sequences, we may add variations such as patterns of 'A no B' that roughly means 'B of A' or 'A's B' and/or 'A na B' where 'A na' is an adjective. Experimental results of our method are promising, namely recall of 0.83, precision of 0.46 and F-value of 0.59 for exactly matched extracted terms when we take into account top scoring 16,000 extracted terms.
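The scoring idea in the abstract can be illustrated with a minimal sketch. It counts, for each simple noun, how many distinct nouns appear directly to its left and right inside compound nouns, then combines those counts into a score for a compound. The abstract does not state the formula, so the combination used here (a product of counts normalized by a length-dependent exponent) is an assumption, and the names `adjacency_counts`, `score`, and `length_pref` are hypothetical. Term candidates are assumed to be already segmented into tuples of simple nouns.

```python
from collections import defaultdict
from math import prod


def adjacency_counts(compounds):
    """Map each simple noun to the distinct nouns seen directly left/right of it."""
    left, right = defaultdict(set), defaultdict(set)
    for nouns in compounds:
        for a, b in zip(nouns, nouns[1:]):
            right[a].add(b)  # b follows a inside some compound
            left[b].add(a)   # a precedes b inside some compound
    return left, right


def score(nouns, left, right, length_pref=1.0):
    """Score a simple or compound noun given as a tuple of simple nouns.

    ASSUMPTION: per-noun counts are combined as a product of
    (left + 1) * (right + 1), normalized by an exponent that depends on the
    number of nouns. The abstract does not give the actual formula.
    length_pref = 1.0 yields a geometric mean; smaller values favor longer
    compounds, larger values favor shorter ones.
    """
    product = prod(
        (len(left.get(n, ())) + 1) * (len(right.get(n, ())) + 1) for n in nouns
    )
    return product ** (1.0 / (2 * len(nouns) ** length_pref))


# Toy candidates (illustrative only, not taken from the NTCIR test collection).
compounds = [
    ("term", "extraction"),
    ("term", "recognition"),
    ("information", "extraction"),
]
left, right = adjacency_counts(compounds)
ranked = sorted(set(compounds), key=lambda c: score(c, left, right), reverse=True)
```

In this sketch, nouns that combine with many different neighbors raise the score of every compound that contains them, which matches the intuition described in the abstract. The `length_pref` exponent stands in for the tuning parameter the abstract mentions, without claiming to reproduce it.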

BibTeX key: nakagawa_automatic_2001
entry type: article
year: 2001
journal: Terminology
number: 2
pages: 195--210
volume: 6
DOI: 10.1075/term.6.1.05nak
url: http://dx.doi.org/10.1075/term.6.1.05nak


Cite this publication

@article{nakagawa_automatic_2001,
  abstract = {The NTCIR1 TMREC group called for participation of the term recognition task which is a part of NTCIR1 held in 1999. As an activity of TMREC, they have provided us with the test collection of the term recognition task. The goal of this task is to automatically recognize and extract terms from the text corpus which consists of 1,870 abstracts gathered from the NACSIS Academic Conference Database. This article describes the term extraction method we have proposed to extract terms consisting of simple and compound nouns and the experimental evaluation of the proposed method with this NTCIR TMREC test collection. The basic idea of scoring a simple noun N of our term extraction method is to count how many nouns are conjoined with N to make compound nouns. Then we extend this score to measure the score of compound nouns because most of technical terms are compound nouns. Our method has a parameter to tune the degree of preference either for longer compound nouns or for shorter compound nouns. As for term candidates, in addition to noun sequences, we may add variations such as patterns of 'A no B' that roughly means 'B of A' or 'A's B' and/or 'A na B' where 'A na' is an adjective. Experimental results of our method are promising, namely recall of 0.83, precision of 0.46 and F-value of 0.59 for exactly matched extracted terms when we take into account top scoring 16,000 extracted terms.},
  added-at = {2018-11-04T17:02:36.000+0100},
  author = {Nakagawa, H},
  biburl = {https://www.bibsonomy.org/bibtex/2c7c1b2c93b6b3f81f58b4d4f3abe5047/lepsky},
  doi = {10.1075/term.6.1.05nak},
  interhash = {f14c673b2924283441516790d58fccd6},
  intrahash = {c7c1b2c93b6b3f81f58b4d4f3abe5047},
  journal = {Terminology},
  keywords = {terminologieextraktion},
  number = 2,
  pages = {195--210},
  timestamp = {2018-11-04T17:02:36.000+0100},
  title = {Automatic term recognition based on statistics of compound nouns},
  url = {http://dx.doi.org/10.1075/term.6.1.05nak},
  volume = 6,
  year = 2001
}
