Article

On the Performance of Clustering in Hilbert Spaces

G. Biau, L. Devroye, and G. Lugosi.
Information Theory, IEEE Transactions on, 54 (2): 781--790 (February 2008)
DOI: 10.1109/tit.2007.913516

Abstract

Based on randomly drawn vectors in a separable Hilbert space, one may construct a k-means clustering scheme by minimizing an empirical squared error. We investigate the risk of such a clustering scheme, defined as the expected squared distance of a random vector X from the set of cluster centers. Our main result states that, for an almost surely bounded X, the expected excess clustering risk is O(√(1/n)). Since clustering in high (or even infinite)-dimensional spaces may lead to severe computational problems, we examine the properties of a dimension reduction strategy for clustering based on Johnson-Lindenstrauss-type random projections. Our results reflect a tradeoff between accuracy and computational complexity when one uses k-means clustering after random projection of the data to a low-dimensional space. We argue that random projections work better than other simplistic dimension reduction schemes.
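The quantities in the abstract can be illustrated with a small sketch. For centers c_1, ..., c_k, the clustering risk is E min_j ||X − c_j||², and its empirical version averages min_j ||X_i − c_j||² over the n sample points. The sketch below makes several assumptions that are not taken from the paper: R^d stands in for the Hilbert space; Lloyd's iteration is swapped in as a standard heuristic, since it only reaches a local minimizer, whereas the paper analyzes the exact empirical risk minimizer; the projection uses a Gaussian matrix scaled by 1/√m as one common Johnson-Lindenstrauss-type choice; and the "lifting" step that averages the original vectors of each projected cluster is a hypothetical way to obtain centers in the full space. All function names are hypothetical.

```python
import math
import random


def squared_distance(x, c):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def empirical_risk(data, centers):
    """Average squared distance from each sample to its nearest center."""
    return sum(min(squared_distance(x, c) for c in centers) for x in data) / len(data)


def mean_of(points):
    """Coordinate-wise mean of a non-empty list of vectors."""
    return [sum(coords) / len(points) for coords in zip(*points)]


def lloyd_kmeans(data, k, iters=100, seed=0):
    """Lloyd's heuristic (an assumption, not the paper's estimator): each step
    never increases the empirical risk, but it may stop at a local minimizer."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(data, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda j: squared_distance(x, centers[j]))
            clusters[nearest].append(x)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        new_centers = [mean_of(pts) if pts else centers[j] for j, pts in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def gaussian_projection(data, m, seed=0):
    """JL-type map x -> G x / sqrt(m), G an m-by-d matrix of i.i.d. N(0, 1) entries
    (one common choice; assumed here, not necessarily the paper's construction)."""
    rng = random.Random(seed)
    d = len(data[0])
    G = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]
    scale = 1.0 / math.sqrt(m)
    return [[scale * sum(g * xi for g, xi in zip(row, x)) for row in G] for x in data]


def cluster_after_projection(data, k, m, seed=0):
    """Cluster the projected data, then (hypothetical lifting step) average the
    ORIGINAL vectors in each projected cluster to get centers in the full space."""
    projected = gaussian_projection(data, m, seed)
    proj_centers = lloyd_kmeans(projected, k, seed=seed)
    groups = [[] for _ in range(k)]
    for x, y in zip(data, projected):
        nearest = min(range(k), key=lambda j: squared_distance(y, proj_centers[j]))
        groups[nearest].append(x)
    return [mean_of(g) for g in groups if g]  # drop empty groups


# Synthetic data: two well-separated Gaussian clouds in d = 200 dimensions.
rng = random.Random(1)
d = 200
data = [[rng.gauss(5.0 if n % 2 else -5.0, 1.0) for _ in range(d)] for n in range(100)]
full = lloyd_kmeans(data, 2)
reduced = cluster_after_projection(data, 2, m=20)
print("risk, k-means in R^200:        ", empirical_risk(data, full))
print("risk, k-means after projection:", empirical_risk(data, reduced))
```

Both risks are measured in the original 200-dimensional space, so they can be compared directly. The clustering step after projection works with 20 coordinates instead of 200, which is the kind of accuracy-versus-computation tradeoff the abstract describes; how much accuracy is lost in general is what the paper's bounds quantify, not this toy run.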

BibTeX key: citeulike:11151499
entry type: article
year: 2008
month: feb
institution: LSTA & LPMA, Univ. Pierre et Marie Curie-Paris VI, Paris, France
journal: Information Theory, IEEE Transactions on
number: 2
pages: 781--790
publisher: IEEE
volume: 54
citeulike-attachment-1: biau-devroye-lugosi-clustering2007.pdf; /pdf/user/gdmcbain/article/11151499/943107/biau-devroye-lugosi-clustering2007.pdf; 63f69ffd64583c289668adb3c274ce1d7efef97b
citeulike-article-id: 11151499
file: biau-devroye-lugosi-clustering2007.pdf
issn: 0018-9448
citeulike-linkout-0: http://dx.doi.org/10.1109/tit.2007.913516
citeulike-linkout-1: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4439834
priority: 0
posted-at: 2014-01-23 22:34:12
url: http://dx.doi.org/10.1109/tit.2007.913516
