The %CLUSTERGROUPS macro creates a custom template that combines a dendrogram and a blockplot to highlight each of the specified number of clusters with a different color.
The %CLUSTERGROUPS macro enhances dendrograms produced in SAS by adding color to highlight the clusters. You specify the number of clusters desired as input to the macro.
Abstract:
In this paper, an algorithm for cluster generation using a tabu search approach with simulated annealing is proposed. The main idea of this algorithm is to use the tabu search approach to generate non-local moves for the clusters and to apply the simulated annealing technique to select a suitable current best solution, so as to speed up cluster generation. Experimental results demonstrate that the proposed tabu search approach with simulated annealing is superior to the tabu search approach with the Generalised Lloyd algorithm.
1 Clustering
Clustering is the process of grouping patterns into a number of clusters, each of which contains the patterns that are similar to each other in some way. The existing clustering algorithms can be simply classified into the following two categories: hierarchical clustering and partitional clustering [1]. The hierarchical clustering operates by partitioning the patterns into successively fewer structures. This method gives rise to a d...
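The simulated annealing selection step mentioned in this abstract can be illustrated with the standard Metropolis acceptance rule. This is a minimal sketch, not the authors' implementation: the abstract does not give their cooling schedule or cost function, so the function name `accept` and its parameters are assumptions made for illustration.

```python
import math
import random


def accept(current_cost, candidate_cost, temperature, rng=random):
    """Standard Metropolis rule (assumed here, not taken from the paper).

    Improvements are always accepted; a worse candidate is accepted
    with probability exp(-delta / temperature).
    """
    delta = candidate_cost - current_cost
    if delta <= 0:
        return True
    if temperature <= 0:
        return False
    return rng.random() < math.exp(-delta / temperature)
```

At a high temperature almost every candidate move is accepted, which lets the search leave local minima; as the temperature falls, the rule approaches plain greedy selection.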
CiteSeerX - Document Details (Isaac Councill, Lee Giles): Clustering is a hard combinatorial problem and is defined as the unsupervised classification of patterns. The formation of clusters is based on the principle of maximizing the similarity between objects of the same cluster while simultaneously minimizing the similarity between objects belonging to distinct clusters. This paper presents a tool for database clustering using a rule-based genetic algorithm (RBCGA). RBCGA evolves individuals consisting of a fixed set of clustering rules, where each rule includes d non-binary intervals, one for each feature. The investigations attempt to alleviate certain drawbacks related to the classical minimization of the square-error criterion by suggesting a flexible fitness function which takes into consideration cluster asymmetry, density, coverage and homogeny.
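One way to read "each rule includes d non-binary intervals, one for each feature" is that a rule is a list of d closed intervals and a pattern satisfies the rule when every feature value falls inside its interval. The sketch below follows that reading only; the names `matches` and `assign`, the closed-interval convention, and the first-match tie-breaking are assumptions, since the abstract does not specify them.

```python
def matches(rule, point):
    """Return True if every feature of `point` lies in its interval.

    `rule` is a sequence of (low, high) pairs, one per feature
    (closed intervals are an assumption of this sketch).
    """
    return all(low <= x <= high for (low, high), x in zip(rule, point))


def assign(rules, point):
    """Return the index of the first rule that covers `point`, or None."""
    for index, rule in enumerate(rules):
        if matches(rule, point):
            return index
    return None
```

For example, with the rules `[[(0, 1), (0, 1)], [(2, 3), (2, 3)]]`, the point `(0.5, 0.5)` falls under rule 0 and the point `(1.5, 1.5)` is covered by no rule.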
Background
The problem of inferring the evolutionary history and constructing the phylogenetic tree with high performance has become one of the major problems in computational biology.
Results
A new phylogenetic tree construction method from a given set of objects (proteins, species, etc.) is presented. As an extension of ant colony optimization, this method proposes an adaptive phylogenetic clustering algorithm based on a digraph to find a tree structure that defines the ancestral relationships among the given objects.
Conclusion
Our phylogenetic tree construction method is tested to compare its results with those of the genetic algorithm (GA). Experimental results show that our algorithm converges much faster and also achieves higher quality than GA.
Handcock, M.S., Raftery, A.E. and Tantrum, J. (2005).
Model-Based Clustering for Social Networks.
Working Paper no. 46, Center for Statistics and the Social Sciences,
University of Washington.
Agglomerative Hierarchical Clustering with Constraints: Theoretical and Empirical Results
Ian Davidson (1) and S.S. Ravi (1)
(1) Department of Computer Science, University at Albany - State University of New York, Albany, NY 12222,
CiteSeerX - Document Details (Isaac Councill, Lee Giles): Clustering categorical databases presents special difficulties due to the absence of natural dissimilarities between objects. We present a solution that overcomes these difficulties that is based on an information-theoretical definition of dissimilarities between partitions of finite sets (applied to partitions of the set of objects to be clustered which are determined by categorical attributes) and makes use of genetic algorithms for finding an acceptable approximative clustering. We tested our method on databases for which the clustering of the rows is known in advance and we show that our proposed method finds the natural clustering of the data with a good classification rate, better than that of the classical algorithm k-means.
CiteSeerX - Document Details (Isaac Councill, Lee Giles): In a database with categorical attributes each attribute defines a partition whose classes can be regarded as natural clusters of rows. The main theme of this paper is finding a partition of the rows of the database that is as close as possible to the partitions associated to each attribute. The classes of this partition will then be treated as clusters of rows. We evaluate the closeness of two partitions by using certain generalizations of the classical conditional entropy. From this perspective, we wish to construct a partition (referred to as the median partition) such that the sum of the dissimilarities between this partition and all the partitions determined by the attributes of the database is minimal. Then, the problem of finding the median partition is an optimization problem over the space of all partitions of the rows of the database for which we give an approximative solution. To search more efficiently the space of possible partitions, which can be very large, we are using a genetic algorithm. Partitions are represented by chromosomes and we tested both the classical techniques of mutation and crossover and certain special mutation and crossover methods that contain specific knowledge of the problem domain. Keywords: median partition, Shannon entropy, Gini index, mutations, crossover operations
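The median-partition objective described in this abstract can be sketched for the classical Shannon case. The paper uses generalizations of conditional entropy that the abstract does not define, so the code below substitutes the plain Shannon form and the symmetric dissimilarity d(P, Q) = H(P | Q) + H(Q | P); the function names and the representation of a partition as a list of row labels are assumptions made for illustration.

```python
import math
from collections import Counter


def conditional_entropy(p, q):
    """Shannon conditional entropy H(P | Q) in bits.

    `p` and `q` are equal-length label lists: row i belongs to
    block p[i] of partition P and block q[i] of partition Q.
    """
    n = len(p)
    joint = Counter(zip(p, q))
    q_counts = Counter(q)
    return -sum(
        (count / n) * math.log2(count / q_counts[b])
        for (_, b), count in joint.items()
    )


def dissimilarity(p, q):
    """Symmetric dissimilarity H(P | Q) + H(Q | P) (assumed form)."""
    return conditional_entropy(p, q) + conditional_entropy(q, p)


def total_dissimilarity(candidate, attribute_partitions):
    """Objective to minimize: sum of dissimilarities to all attributes."""
    return sum(dissimilarity(candidate, a) for a in attribute_partitions)
```

A genetic algorithm of the kind the abstract describes would use a quantity like `total_dissimilarity` as the fitness to minimize. Two partitions that differ only in block labels have dissimilarity zero, so the measure depends on the grouping of rows rather than on the label values.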
"Here's a preliminary data mining analysis of musical social networking service Last.fm. An automated classification into clusters or sub populations with related musical genres reveals the structure of musical preferences among the users in a relatively large sample population. Musical tag clouds are adopted to characterise users and populations, which adds a highly descriptive value and aids with the interpretation of the results."
S. Chu, J. Roddick, and A. Australia. In Data Mining II-Proceedings of Second International Conference on Data Mining Methods and Databases, pp. 515--523. (2000)
R. Almeida and V. Almeida. WWW '04: Proceedings of the 13th international conference on World Wide Web, pp. 413--421. New York, NY, USA, ACM Press, (2004)