RAID : robust algorithm for stemming text Document

Abstract

In this work, we propose a robust algorithm for the automatic indexing of unstructured documents. It detects the most relevant words in an unstructured document. The algorithm is based on two main modules: the first handles compound words, and the second detects word endings that the approaches presented in the literature have not taken into account. The proposed algorithm detects and removes suffixes, and it enriches the suffix base by eliminating the suffixes of compound words. We tested our algorithm on two word collections: a standard collection of terms and a medical corpus. The results show that our algorithm is markedly more effective than the others presented in related work.
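To make the general idea of suffix-stripping stemming concrete, here is a minimal sketch. It is not the RAID algorithm. The abstract gives neither its suffix base nor its compound-word rules, so the suffix list (`SUFFIXES`), the minimum stem length (`MIN_STEM`), the hyphen-based compound split, and the helper names `strip_suffix` and `stem` are all hypothetical choices made for illustration. The sketch strips the longest matching suffix as long as a minimum-length stem remains, and it treats a hyphenated term as a compound whose parts are stemmed separately.

```python
# Hypothetical suffix base for illustration only; RAID's actual suffix base
# is not given in the abstract.
SUFFIXES = ("ization", "ational", "fulness", "ness", "ment", "ing", "ed", "es", "s")

# Hypothetical minimum stem length, to avoid over-stripping short words.
MIN_STEM = 3


def strip_suffix(word: str, suffixes=SUFFIXES, min_stem: int = MIN_STEM) -> str:
    """Remove the longest matching suffix, keeping at least `min_stem` characters."""
    # Try longer suffixes first so that "ness" wins over "s".
    for suf in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suf) and len(word) - len(suf) >= min_stem:
            return word[: -len(suf)]
    return word


def stem(term: str) -> str:
    """Stem a term, assuming (hypothetically) that compounds are hyphenated.

    Each hyphen-separated part is lowercased and stemmed on its own, and the
    parts are then rejoined with hyphens.
    """
    parts = term.lower().split("-")
    return "-".join(strip_suffix(part) for part in parts)


if __name__ == "__main__":
    for t in ["Walking", "darkness", "blood-vessels", "is"]:
        print(t, "->", stem(t))
```

Checking for the longest suffix first, together with a minimum stem length, is a common way to keep a simple stripper from removing too much from short words. For example, "is" is left unchanged rather than reduced to "i". A real system would also need rules for irregular forms and for compounds that are not hyphenated.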

BibTeX key: boukhari_raid_2016
entry type: article
year: 2016
journal: International Journal of Computer Information Systems and Industrial Management Applications
pages: 235--246
volume: 8
Document: http://www.mirlabs.net/ijcisim/regular_papers_2016/IJCISIM_24.pdf
