Inproceedings,

Automatic Clustering Assessment through a Social Tagging System

Computational Science and Engineering (CSE), 2012 IEEE 15th International Conference on, pages 74-81. (December 2012)
DOI: 10.1109/ICCSE.2012.20

Abstract

Assessing the quality of the clustering process is fundamental in unsupervised clustering. The literature describes three kinds of clustering validity techniques: external criteria, internal criteria, and relative criteria. In this paper, we focus on external criteria and present an algorithm that makes it possible to apply external measures to assess clustering quality when the structure of the data set is unknown. To obtain an automatic partition of a data set that reflects how documents should be grouped according to human intuition, we use information already present in the data, such as the descriptions users provide as tags and the distance between documents. The results show a clear correlation between manual and automatic classes, indicating that it is acceptable to use an automatic partition. In addition to presenting an alternative way of finding the structure of the data set using metadata such as tags, we also test the impact of integrating tags into the k-means++ algorithm and examine how this influences the quality of the resulting clusters, suggesting a model of integration based on the occurrence of tags in document content. The experimental results indicate a positive impact when external measures are calculated, although there was no apparent correlation between the weight assigned to the tags and the quality of the obtained clusters.
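
As an illustration of what an external criterion computes, the following minimal sketch compares a clustering against a reference partition using two standard external measures, purity and the Rand index. It is not the algorithm from the paper: the function names, the toy labels, and the choice of these two measures are assumptions made for this example, and the reference partition here simply stands in for whatever manual or automatic classes are available.

```python
from collections import Counter
from itertools import combinations


def purity(reference, clusters):
    """Fraction of documents that belong to the majority reference class of their cluster."""
    if len(reference) != len(clusters) or not reference:
        raise ValueError("label lists must be non-empty and of equal length")
    # Count reference classes inside each cluster.
    by_cluster = {}
    for ref, clu in zip(reference, clusters):
        by_cluster.setdefault(clu, Counter())[ref] += 1
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(reference)


def rand_index(reference, clusters):
    """Fraction of document pairs on which the two partitions agree."""
    if len(reference) != len(clusters) or len(reference) < 2:
        raise ValueError("need two equal-length label lists with at least two items")
    pairs = list(combinations(range(len(reference)), 2))
    # A pair agrees when both partitions put it together, or both keep it apart.
    agree = sum(
        (reference[i] == reference[j]) == (clusters[i] == clusters[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical toy data: six documents, two reference classes, two clusters.
reference_classes = ["a", "a", "a", "b", "b", "b"]
cluster_labels = [0, 0, 1, 1, 1, 1]

purity_score = purity(reference_classes, cluster_labels)
rand_score = rand_index(reference_classes, cluster_labels)
print(f"purity={purity_score:.3f} rand_index={rand_score:.3f}")
```

Both measures need a reference partition, which is why external criteria are hard to apply when the structure of the data set is unknown.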
