Article

The impact of indexing approaches on Arabic text classification

A. Al-Badarneh, E. Al-Shawakfa, B. Bani-Ismail, K. Al-Rababah, and S. Shatnawi.
Journal of Information Science, 43 (2): 159--173 (April 2017)
DOI: 10.1177/0165551515625030

Abstract

This paper investigates the impact of using different indexing approaches (full-word, stem, and root) when classifying Arabic text. In this study, the naïve Bayes classifier is used to construct the multinomial classification models and is evaluated using stratified k-fold cross-validation (k ranges from 2 to 10). It also uses a corpus that consists of 1000 normalized Arabic documents. The results of one experiment in this study show that significant accuracy improvements occurred when the full-word form was used in most k-folds. Further experiments show that the classifier achieved the highest accuracy in the eight-fold setting, using a 7/8–1/8 train–test ratio, regardless of the indexing approach used. The overall results of this study show that the classifier achieved a maximum micro-average accuracy of 99.36%, using either the full-word form or the stem form. This proves that the stem is a better choice to use when classifying Arabic text, because it makes the corpus dataset smaller, which improves both processing time and storage utilization while still achieving the highest level of accuracy.
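
The evaluation setup named in the abstract (a multinomial naïve Bayes classifier scored with stratified k-fold cross-validation) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names, the add-one smoothing, the round-robin fold assignment, and the toy English tokens are all assumptions made for the example. Each document is a list of index terms, which could be full words, stems, or roots depending on the indexing approach. The score returned is the pooled fraction of correct predictions over all folds, which may not match the paper's exact definition of micro-average accuracy.

```python
import math
from collections import Counter, defaultdict


def stratified_folds(labels, k):
    """Split document indices into k folds, round-robin within each class,
    so every fold keeps roughly the overall class proportions."""
    folds = [[] for _ in range(k)]
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    for indices in by_class.values():
        for j, i in enumerate(indices):
            folds[j % k].append(i)
    return folds


def train_nb(docs, labels):
    """Collect the counts a multinomial naive Bayes model needs."""
    class_docs = Counter(labels)          # documents per class
    term_counts = defaultdict(Counter)    # term frequencies per class
    vocab = set()
    for terms, y in zip(docs, labels):
        term_counts[y].update(terms)
        vocab.update(terms)
    return class_docs, term_counts, vocab


def predict_nb(model, terms):
    """Return the class with the highest log-posterior (add-one smoothing)."""
    class_docs, term_counts, vocab = model
    n_docs = sum(class_docs.values())
    best, best_score = None, -math.inf
    for y, n_y in class_docs.items():
        total = sum(term_counts[y].values())
        score = math.log(n_y / n_docs)
        for t in terms:
            if t in vocab:  # terms unseen in training are ignored
                score += math.log((term_counts[y][t] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = y, score
    return best


def cross_validate(docs, labels, k):
    """Train on k-1 folds, test on the held-out fold, pool the results."""
    correct = 0
    for test_idx in stratified_folds(labels, k):
        held_out = set(test_idx)
        train_docs = [d for i, d in enumerate(docs) if i not in held_out]
        train_labels = [y for i, y in enumerate(labels) if i not in held_out]
        model = train_nb(train_docs, train_labels)
        correct += sum(predict_nb(model, docs[i]) == labels[i] for i in test_idx)
    return correct / len(docs)


# Toy corpus (hypothetical English tokens standing in for Arabic index terms).
docs = [
    ["goal", "team", "match"], ["team", "win", "goal"],
    ["match", "score", "team"], ["goal", "score", "win"],
    ["market", "bank", "price"], ["bank", "loan", "price"],
    ["market", "trade", "loan"], ["price", "trade", "bank"],
]
labels = ["sports"] * 4 + ["economy"] * 4

for k in (2, 4):
    print(k, cross_validate(docs, labels, k))
```

With k folds, each round trains on (k-1)/k of the corpus and tests on the remaining 1/k, which is where the 7/8–1/8 ratio for the eight-fold case comes from. Switching the indexing approach only changes the term lists passed in as `docs`; the rest of the sketch stays the same.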

BibTeX key: al-badarneh_impact_2017
Entry type: article
Year: 2017
Month: apr
Journal: Journal of Information Science
Number: 2
Pages: 159--173
Volume: 43
ISSN: 01655515
DOI: 10.1177/0165551515625030
