One of the consequences of fast computers, the Internet and inexpensive storage is the widespread collection of data from a variety of sources and of a variety of types. Sources of data include web click streams, financial transactions, and observational science data. Data types include categorical vs. numerical, static vs. dynamic, points in a metric space vs. vertices in a graph. The nagging question often posed about these data sets is: Can we find something interesting that we did not already know? The first answer to this question is often: Let's try clustering the data! Indeed, clustering is one of the most widely used tools for analyzing data sets. Some modern applications of clustering include clustering the web, clustering search results, clustering click streams, customer segmentation, and community discovery in social networks.
Because clustering has recently become so widely applicable, the field has undergone a major revolution over the last few decades, characterized by advances in approximation and randomized algorithms, novel formulations of the clustering problem, algorithms for clustering massive data sets, algorithms for clustering data streams, and dimension reduction techniques.