Article,

Measuring mono-word termhood by rank difference via corpus comparison

, and .
Terminology, 14 (2): 204--230 (2008)

Abstract

Terminology as a set of concept carriers crystallizes our special knowledge about a subject. Automatic term recognition (ATR) plays a critical role in the processing and management of various kinds of information, knowledge and documents, e.g., knowledge acquis ition via text mining. Measuring termhood properly is one of the core issues involved in ATR. This article presents a novel approach to termhood measurement for mono-word terms via corpus comparison, which quantifies the termhood of a term candidate as its rank difference in a domain and a background corpus. Our ATR experiments to identify legal terms in Hong Kong (HK) legal texts with the British National Corpus (BNC) as background corpus provide evidence to confirm the validity and effectiveness of this approach. Without any prior knowledge and ad hoc heuristics, it achieves a precision of 97.0\% on the top 1000 candidates and a precision of 96.1\% on the top 10\% candidates that are most highly ranked by the termhood measure, illustrating a state-of-the-art performance on mono-word ATR in the field.

Tags

Users

  • @sofiagruiz92

Comments and Reviews