Abstract
Spectral clustering methods are common graph-based
approaches to clustering of data. Spectral clustering
algorithms typically start from local information
encoded in a weighted graph on the data and cluster
according to the global eigenvectors of the
corresponding (normalized) similarity matrix. One
contribution of this paper is to present fundamental
limitations of this general local to global approach.
We show that based only on local information, the
underlying normalized cut functional is not a suitable
measure for the quality of clustering. Further, even
with a suitable similarity measure, we show that the
first few eigenvectors of such adjacency matrices
cannot successfully cluster datasets that contain
structures at different scales of size and density.
These findings provide theoretical explanations to some
empirically observed characteristics of these methods.
Based on our theoretical analysis, a second
contribution of this paper is a novel diffusion based
measure to evaluate the coherence of individual
clusters. Our measure can be used in conjunction with
any graph-based clustering method (global, top-down,
bottom-up), it is scale-free and can determine coherent
clusters at all scales. We present simple examples
where various spectral clustering algorithms fail,
whereas clustering utilizing this coherence measure
finds the correct clusters at all scales.
Users
Please
log in to take part in the discussion (add own reviews or comments).