Identifying ISI-indexed articles by their lexical usage : a text analysis approach

Journal of the Association for Information Science & Technology, 66 (3): 501--511 (March 2015)
DOI: 10.1002/asi.23194

Abstract

This research creates an architecture for investigating the existence of probable lexical divergences between articles, categorized as Institute for Scientific Information (ISI) and non-ISI, and consequently, if such a difference is discovered, to propose the best available classification method. Based on a collection of ISI- and non-ISI-indexed articles in the areas of business and computer science, three classification models are trained. A sensitivity analysis is applied to demonstrate the impact of words in different syntactical forms on the classification decision. The results demonstrate that the lexical domains of ISI and non-ISI articles are distinguishable by machine learning techniques. Our findings indicate that the support vector machine identifies ISI-indexed articles in both disciplines with higher precision than do the Naïve Bayesian and K-Nearest Neighbors techniques.
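
To make the idea of classifying articles by their word usage concrete, the following is a minimal sketch of one of the three techniques the abstract names, a multinomial Naïve Bayes classifier with add-one smoothing, written with the Python standard library only. It is not the authors' implementation: the training sentences, the labels `isi` and `non_isi`, and the helper names `tokenize`, `train_naive_bayes`, and `predict` are all hypothetical and chosen for illustration.

```python
import math
from collections import Counter, defaultdict

# Toy, invented training texts (hypothetical; not taken from the study's corpus).
TRAIN = [
    ("we propose a novel framework and evaluate robustness", "isi"),
    ("results demonstrate significant empirical robustness", "isi"),
    ("this paper talks about some ideas on software", "non_isi"),
    ("some ideas about business are discussed in this paper", "non_isi"),
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def train_naive_bayes(examples):
    """Count documents per class and word occurrences per class."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_docs[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_docs, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log-posterior under add-one smoothing."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in class_docs:
        # Log prior: share of training documents in this class.
        score = math.log(class_docs[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # ignore words never seen in training
            count = word_counts[label][tok]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_naive_bayes(TRAIN)
print(predict(model, "we evaluate a novel framework"))
print(predict(model, "some ideas about this paper"))
```

The abstract reports that the support vector machine achieved higher precision than Naïve Bayes and K-Nearest Neighbors; Naïve Bayes is used here only because it fits in a few dependency-free lines, and the toy data says nothing about the study's actual results.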
