Document Author Classification using Generalized Discriminant Analysis

(2006)

Abstract

Classification of documents by authorship based on statistical analysis (stylometry) is considered here, using feature vectors obtained from counts of all words in the intersecting sets of the training data. This differs from some previous stylometry, which used only selected “noncontextual” words with the highest counts, and also from conventional text search techniques, where noncontextual words are frequently left out when the term-by-document matrices are formed. The dimensionality of the resulting vectors is reduced using generalized discriminant analysis (GDA). The method is tested on three sets of documents that have previously been subjected to statistical analysis. Results show that the method succeeds at identifying author differences and at classifying documents of unknown authorship, consistent with previous techniques.
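
As a rough illustration of the feature-extraction step only, the following Python sketch builds word-count vectors over a shared vocabulary. It is not the authors' implementation. The tokenizer, the function names, and the toy sentences are hypothetical, and it assumes that "intersecting sets" means the words that occur in every training document. The GDA dimensionality reduction is not shown.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase alphabetic tokens; the paper's tokenizer is not given here.
    return re.findall(r"[a-z']+", text.lower())


def shared_vocabulary(training_docs):
    # Assumed reading of "intersecting sets": words present in every training document.
    vocab = set(tokenize(training_docs[0]))
    for doc in training_docs[1:]:
        vocab &= set(tokenize(doc))
    return sorted(vocab)


def count_vector(text, vocab):
    # Raw count of each shared-vocabulary word, in vocabulary order.
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocab]


# Toy training data (hypothetical, for illustration only).
training_docs = [
    "The cat sat on the mat and the dog slept.",
    "A dog and a cat ran to the gate.",
    "The dog saw the cat and barked.",
]
vocab = shared_vocabulary(training_docs)
vectors = [count_vector(doc, vocab) for doc in training_docs]
print(vocab)
print(vectors)
```

Note that common function words such as "the" and "and" are kept rather than filtered out, in line with the abstract's point that all shared words contribute to the feature vector.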
