Fundamental Limitations of Spectral Clustering Methods
B. Nadler, und M. Galun. Advances in Neural Information Processing Systems 19, MIT Press, Cambridge, MA, (2007)
Zusammenfassung
Spectral clustering methods are common graph-based approaches to clustering of data. Spectral clustering algorithms typically start from local information encoded in a weighted graph on the data and cluster according to the global eigenvectors of the corresponding (normalized) similarity matrix. One contribution of this paper is to present fundamental limitations of this general local to global approach. We show that based only on local information, the underlying normalized cut functional is not a suitable measure for the quality of clustering. Further, even with a suitable similarity measure, we show that the first few eigenvectors of such adjacency matrices cannot successfully cluster datasets that contain structures at different scales of size and density. These findings provide theoretical explanations to some empirically observed characteristics of these methods. Based on our theoretical analysis, a second contribution of this paper is a novel diffusion based measure to evaluate the coherence of individual clusters. Our measure can be used in conjunction with any graph-based clustering method (global, top-down, bottom-up), it is scale-free and can determine coherent clusters at all scales. We present simple examples where various spectral clustering algorithms fail, whereas clustering utilizing this coherence measure finds the correct clusters at all scales.
%0 Book Section
%1 Nadler:Galun:07
%A Nadler, Boaz
%A Galun, Meirav
%B Advances in Neural Information Processing Systems 19
%C Cambridge, MA
%D 2007
%E Schölkopf, B.
%E Platt, J.
%E Hoffman, T.
%I MIT Press
%K 2006 clustering nips spectral
%T Fundamental Limitations of Spectral Clustering Methods
%U http://books.nips.cc/papers/files/nips19/NIPS2006_0134.pdf
%X Spectral clustering methods are common graph-based approaches to clustering of data. Spectral clustering algorithms typically start from local information encoded in a weighted graph on the data and cluster according to the global eigenvectors of the corresponding (normalized) similarity matrix. One contribution of this paper is to present fundamental limitations of this general local to global approach. We show that based only on local information, the underlying normalized cut functional is not a suitable measure for the quality of clustering. Further, even with a suitable similarity measure, we show that the first few eigenvectors of such adjacency matrices cannot successfully cluster datasets that contain structures at different scales of size and density. These findings provide theoretical explanations to some empirically observed characteristics of these methods. Based on our theoretical analysis, a second contribution of this paper is a novel diffusion based measure to evaluate the coherence of individual clusters. Our measure can be used in conjunction with any graph-based clustering method (global, top-down, bottom-up), it is scale-free and can determine coherent clusters at all scales. We present simple examples where various spectral clustering algorithms fail, whereas clustering utilizing this coherence measure finds the correct clusters at all scales.
@incollection{Nadler:Galun:07,
abstract = {Spectral clustering methods are common graph-based approaches to clustering of data. Spectral clustering algorithms typically start from local information encoded in a weighted graph on the data and cluster according to the global eigenvectors of the corresponding (normalized) similarity matrix. One contribution of this paper is to present fundamental limitations of this general local to global approach. We show that based only on local information, the underlying normalized cut functional is not a suitable measure for the quality of clustering. Further, even with a suitable similarity measure, we show that the first few eigenvectors of such adjacency matrices cannot successfully cluster datasets that contain structures at different scales of size and density. These findings provide theoretical explanations to some empirically observed characteristics of these methods. Based on our theoretical analysis, a second contribution of this paper is a novel diffusion based measure to evaluate the coherence of individual clusters. Our measure can be used in conjunction with any graph-based clustering method (global, top-down, bottom-up), it is scale-free and can determine coherent clusters at all scales. We present simple examples where various spectral clustering algorithms fail, whereas clustering utilizing this coherence measure finds the correct clusters at all scales.},
added-at = {2008-01-16T16:16:06.000+0100},
address = {Cambridge, MA},
author = {Nadler, Boaz and Galun, Meirav},
biburl = {https://www.bibsonomy.org/bibtex/20e4ac98b3da886340bc0a8808045dcc2/seandalai},
booktitle = {Advances in Neural Information Processing Systems 19},
editor = {Sch\"{o}lkopf, B. and Platt, J. and Hoffman, T.},
interhash = {b4e892304da2207b7a9856a2dcf53169},
intrahash = {0e4ac98b3da886340bc0a8808045dcc2},
keywords = {2006 clustering nips spectral},
publisher = {MIT Press},
timestamp = {2008-01-16T16:16:07.000+0100},
title = {Fundamental Limitations of Spectral Clustering Methods},
url = {http://books.nips.cc/papers/files/nips19/NIPS2006_0134.pdf},
year = 2007
}