
A Formal Concept Analysis-Based Domain-Specific Thesaurus and Its Application in Document Representation

Computational Science and Its Applications – ICCSA 2010 (2010)

Abstract

Many techniques in the process of document retrieval and clustering, based on the vector space model, represent documents by vectors. They ignore the conceptual relationships of terms such as synonyms, hypernyms and hyponyms and, especially, treat terms as a bag of terms. The application of conceptual relationships of terms has been proved by generating improved results for document clustering in previous studies. For those studies, thesauri like WordNet were used to provide the information of relationships between terms. However, some domain-specific terms like "query expansion" and "document clustering" cannot be found in these thesauri. These terms are thought of as important features in domain-specific documents. In this paper, we propose an automatic domain-specific thesaurus building approach based on Formal Concept Analysis (FCA) dealing with the problem with general thesauri. We also apply the domain-specific thesaurus as background knowledge to represent documents by concept dimension vectors. In the evaluation, an improved result by our method compared to traditional approaches is shown.
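
The abstract does not give the details of the thesaurus-building procedure, so the following is only a minimal sketch of the general idea behind FCA and concept dimension vectors, not the authors' method. It assumes a toy formal context of three invented documents and their terms, enumerates formal concepts by brute force, and uses a simple binary weighting. The names `context`, `formal_concepts` and `concept_vector` are hypothetical.

```python
from itertools import combinations

# Toy formal context (invented for illustration): document -> set of terms.
context = {
    "d1": {"query expansion", "retrieval"},
    "d2": {"document clustering", "retrieval"},
    "d3": {"query expansion", "document clustering", "retrieval"},
}


def common_terms(docs):
    """Terms shared by every document in docs (all terms if docs is empty)."""
    sets = [context[d] for d in docs]
    if not sets:
        return set().union(*context.values())
    return set.intersection(*sets)


def docs_having(terms):
    """Documents that contain every term in terms."""
    return {d for d, doc_terms in context.items() if terms <= doc_terms}


def formal_concepts():
    """Enumerate (extent, intent) pairs by closing every subset of documents.

    Brute force and exponential in the number of documents, so only
    suitable for a toy context like this one.
    """
    concepts = set()
    docs = sorted(context)
    for size in range(len(docs) + 1):
        for subset in combinations(docs, size):
            intent = common_terms(subset)
            extent = docs_having(intent)
            concepts.add((frozenset(extent), frozenset(intent)))
    # Sort for a stable dimension order: by intent size, then intent terms.
    return sorted(concepts, key=lambda c: (len(c[1]), sorted(c[1])))


def concept_vector(doc_terms, concepts):
    """Binary vector: 1 where the document contains a concept's whole intent.

    Binary weighting is an assumption made for this sketch.
    """
    return [1 if intent <= doc_terms else 0 for _, intent in concepts]


concepts = formal_concepts()
for name, terms in sorted(context.items()):
    print(name, concept_vector(terms, concepts))
```

Each dimension here corresponds to one formal concept rather than one term, so two documents that share the terms of a concept's intent are placed close together in the vector space.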

Description

SpringerLink - Book Chapter
