Abstract
We formalize clustering as a partitioning problem with a user-defined internal clustering criterion and present SINICC, an unbiased, empirical method for comparing internal clustering criteria. An application to multi-sensor fusion is described, where the data set is composed of inexact sensor “reports” pertaining to “objects” in an environment. Given these reports, the objective is to produce a representation of the environment, where each entity in the representation is the result of “fusing” sensor reports. Before one can perform fusion, however, the reports must be “associated” into homogeneous clusters. Simulated annealing is used to find a near-optimal partitioning with respect to each of several clustering criteria for a variety of simulated data sets. This method can then be used to determine the “best” clustering criterion for the multi-sensor fusion problem with a given fusion operator.
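As a rough illustration of the optimization step described above (not the SINICC implementation itself), the sketch below anneals cluster assignments against an arbitrary user-supplied internal criterion; the function names, the single-point move, and the geometric cooling schedule are assumptions made for this example.

import math
import random

def anneal_partition(points, k, criterion, n_iter=10000, t0=1.0, alpha=0.999):
    # Simulated annealing over cluster assignments. `criterion(points, labels, k)`
    # is any user-defined internal clustering criterion to be minimized
    # (an assumption for this sketch; SINICC compares several such criteria).
    labels = [random.randrange(k) for _ in points]
    cur = best = criterion(points, labels, k)
    best_labels = labels[:]
    t = t0
    for _ in range(n_iter):
        i = random.randrange(len(points))      # pick one report/point
        old = labels[i]
        labels[i] = random.randrange(k)        # try moving it to a random cluster
        new = criterion(points, labels, k)
        # accept improvements always, worsening moves with Boltzmann probability
        if new <= cur or random.random() < math.exp((cur - new) / max(t, 1e-12)):
            cur = new
            if new < best:
                best, best_labels = new, labels[:]
        else:
            labels[i] = old                    # undo the rejected move
        t *= alpha                             # geometric cooling
    return best_labels, best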
Abstract
A novel evolution strategy implementing variable-length genomes is developed to address the problem of dynamic partitional clustering. As opposed to static clustering, dynamic partitional clustering does not require the a priori specification of the number of clusters. Results of the algorithm are presented and discussed for 2-D touching and non-touching cluster test cases.
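One way such a variable-length representation can look, assuming the genome is simply a list of candidate cluster centers whose length may change under mutation; the operator probabilities and names below are illustrative, not the paper's.

import random

def mutate(genome, sigma=0.1, p_insert=0.1, p_delete=0.1, dim=2):
    # Genome = variable-length list of candidate cluster centers.
    # Gaussian perturbation plus occasional insertion/deletion lets the
    # number of clusters evolve instead of being fixed a priori.
    child = [[c + random.gauss(0.0, sigma) for c in center] for center in genome]
    if random.random() < p_insert:
        child.append([random.uniform(0.0, 1.0) for _ in range(dim)])  # grow: add a center
    if len(child) > 1 and random.random() < p_delete:
        child.pop(random.randrange(len(child)))                       # shrink: drop a center
    return child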
Abstract
In this paper we consider the problem of clustering m objects into c clusters. The objects are represented by points in an n-dimensional Euclidean space, and the objective is to classify these m points into c clusters such that the distance between the points within a cluster and its center (which is to be found) is minimized. The problem is a nonconvex program that has many local minima. It has been studied by many researchers, and the best-known algorithm for solving it is the k-means algorithm. In this paper, we develop a new algorithm for solving this problem based on a tabu search technique. Preliminary computational experience with the developed algorithm is encouraging and compares favorably with both the k-means and the simulated annealing algorithms.
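The objective referred to here is the classical within-cluster sum of squared distances; a minimal sketch, assuming points, centers, and labels are plain Python lists (the names are illustrative):

def within_cluster_ss(points, centers, labels):
    # Sum of squared Euclidean distances from each point to its assigned
    # cluster center: the nonconvex objective minimized by k-means and by
    # the tabu search and simulated annealing approaches compared above.
    total = 0.0
    for point, lab in zip(points, labels):
        total += sum((x - c) ** 2 for x, c in zip(point, centers[lab]))
    return total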
Abstract:
In this paper, an algorithm for cluster generation using a tabu search approach with simulated annealing is proposed. The main idea of this algorithm is to use the tabu search approach to generate non-local moves for the clusters and to apply the simulated annealing technique to select a suitable current best solution, thereby speeding up cluster generation. Experimental results demonstrate that the proposed tabu search approach with simulated annealing is superior to the tabu search approach with the Generalised Lloyd algorithm.
1 Clustering
Clustering is the process of grouping patterns into a number of clusters, each of which contains the patterns that are similar to each other in some way. Existing clustering algorithms can be broadly classified into two categories: hierarchical clustering and partitional clustering [1]. Hierarchical clustering operates by partitioning the patterns into successively fewer structures. This method gives rise to a d...
Abstract
This paper describes a genetically guided approach to optimizing the hard (J1) and fuzzy (Jm) c-means functionals used in cluster analysis. Our experiments show that a genetic algorithm ameliorates the difficulty of choosing an initialization for the c-means clustering algorithms. Experiments use six data sets, including the Iris data, magnetic resonance and color images. The genetic algorithm approach is generally able to find the lowest known Jm value or a Jm associated with a partition very similar to that associated with the lowest Jm value. On data sets with several local extrema, the GA approach always avoids the less desirable solutions. Degenerate partitions are always avoided by the GA approach, which provides an effective method for optimizing clustering models whose objective function can be represented in terms of cluster centers. The time cost of genetic guided clustering is shown to make a series of random initializations of fuzzy/hard c-means, where the partition a...
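For reference, the fuzzy functional Jm mentioned above is the sum of squared distances between data points and cluster centers, weighted by memberships raised to the fuzzifier m; a short sketch follows, where the membership layout memberships[i][k] and the argument names are assumptions.

def fuzzy_jm(points, centers, memberships, m=2.0):
    # J_m = sum_i sum_k (u_ik ** m) * ||x_k - v_i||^2.
    # With crisp 0/1 memberships this reduces to the hard c-means functional J_1.
    total = 0.0
    for k, x in enumerate(points):
        for i, v in enumerate(centers):
            dist_sq = sum((a - b) ** 2 for a, b in zip(x, v))
            total += (memberships[i][k] ** m) * dist_sq
    return total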
Abstract
Clustering is a hard combinatorial problem and is defined as the unsupervised classification of patterns. The formation of clusters is based on the principle of maximizing the similarity between objects of the same cluster while simultaneously minimizing the similarity between objects belonging to distinct clusters. This paper presents a tool for database clustering using a rule-based genetic algorithm (RBCGA). RBCGA evolves individuals consisting of a fixed set of clustering rules, where each rule includes d non-binary intervals, one for each feature. The investigation attempts to alleviate certain drawbacks of the classical square-error minimization criterion by suggesting a flexible fitness function that takes cluster asymmetry, density, coverage and homogeneity into consideration.
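A rough sketch of the kind of individual described above, assuming each rule is just a list of per-feature (low, high) intervals and a point is covered by a rule when it falls inside every interval; the helper names are hypothetical.

import random

def random_individual(n_rules, n_features, lo=0.0, hi=1.0):
    # An individual = a fixed number of clustering rules, each rule holding
    # one (low, high) interval per feature.
    def random_interval():
        a, b = sorted(random.uniform(lo, hi) for _ in range(2))
        return (a, b)
    return [[random_interval() for _ in range(n_features)] for _ in range(n_rules)]

def rule_covers(rule, point):
    # A point matches a rule if every feature lies inside that feature's interval.
    return all(low <= x <= high for (low, high), x in zip(rule, point))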
Abstract
This paper proposes a new evolutionary algorithm for subspace clustering in very large and high-dimensional databases. The design includes task-specific coding and genetic operators, along with a non-random initialization procedure. Reported experimental results show the algorithm scales almost linearly with the size and dimensionality of the database as well as the dimensionality of the hidden clusters.
Abstract
Clustering categorical databases presents special difficulties due to the absence of natural dissimilarities between objects. We present a solution that overcomes these difficulties; it is based on an information-theoretical definition of dissimilarities between partitions of finite sets (applied to partitions of the set of objects to be clustered that are determined by categorical attributes) and makes use of genetic algorithms to find an acceptable approximate clustering. We tested our method on databases for which the clustering of the rows is known in advance and show that the proposed method finds the natural clustering of the data with a good classification rate, better than that of the classical k-means algorithm.
Abstract
In a database with categorical attributes, each attribute defines a partition whose classes can be regarded as natural clusters of rows. The main theme of this paper is finding a partition of the rows of the database that is as close as possible to the partitions associated with each attribute. The classes of this partition will then be treated as clusters of rows. We evaluate the closeness of two partitions by using certain generalizations of the classical conditional entropy. From this perspective, we wish to construct a partition (referred to as the median partition) such that the sum of the dissimilarities between this partition and all the partitions determined by the attributes of the database is minimal. The problem of finding the median partition is then an optimization problem over the space of all partitions of the rows of the database, for which we give an approximate solution. To search the space of possible partitions, which can be very large, more efficiently, we use a genetic algorithm. Partitions are represented by chromosomes, and we tested both the classical mutation and crossover techniques and certain special mutation and crossover methods that incorporate specific knowledge of the problem domain.
Keywords: median partition, Shannon entropy, Gini index, mutations, crossover operations
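As a concrete special case of the dissimilarity described above, the code below computes the classical Shannon conditional entropy between two partitions given as per-row labels, plus the symmetric sum H(A|B) + H(B|A); the paper's generalizations (e.g. Gini-based) follow the same contingency-table pattern. Function names are illustrative.

import math
from collections import Counter

def conditional_entropy(labels_a, labels_b):
    # H(B | A): uncertainty about partition B's class once A's class is known,
    # computed from the contingency table of the two labelings.
    n = len(labels_a)
    joint = Counter(zip(labels_a, labels_b))
    marginal_a = Counter(labels_a)
    h = 0.0
    for (a, _b), n_ab in joint.items():
        p_ab = n_ab / n
        p_b_given_a = n_ab / marginal_a[a]
        h -= p_ab * math.log2(p_b_given_a)
    return h

def partition_dissimilarity(labels_a, labels_b):
    # Symmetric entropy-based distance d(A, B) = H(A | B) + H(B | A).
    return conditional_entropy(labels_b, labels_a) + conditional_entropy(labels_a, labels_b)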
Abstract
The applicability of evolution strategies (ESs), population-based stochastic optimization techniques, to optimizing clustering objective functions is explored. Clustering objective functions are categorized into centroid and non-centroid types. Optimization of centroid-type objective functions is accomplished by formulating them as functions of real-valued parameters and applying ESs. Both hard and fuzzy clustering objective functions are considered in this study. The applicability of ESs to discrete optimization problems is extended to optimize the non-centroid type of objective functions. As ESs are amenable to parallelization, a parallel (master/slave) model is described in the context of the clustering problem. Results obtained for selected data sets substantiate the utility of ESs in clustering.
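As a toy illustration of the centroid-type formulation (a bare (1+1) ES rather than the parallel master/slave model discussed above), the genome below is the flat vector of all center coordinates and the clustering criterion is treated as a black-box function of it; every name and parameter here is an assumption.

import random

def es_cluster_centers(points, k, objective, n_gens=500, sigma=0.1):
    # (1+1) evolution strategy over real-valued centroid coordinates.
    # `objective(points, centers)` is the clustering criterion to minimize,
    # e.g. the within-cluster sum of squares (an assumption for this sketch).
    dim = len(points[0])
    parent = [coord for p in random.sample(points, k) for coord in p]  # seed from k data points
    unflatten = lambda g: [g[i * dim:(i + 1) * dim] for i in range(k)]
    best = objective(points, unflatten(parent))
    for _ in range(n_gens):
        child = [g + random.gauss(0.0, sigma) for g in parent]   # Gaussian mutation
        f = objective(points, unflatten(child))
        if f <= best:                                            # plus-selection
            parent, best = child, f
    return unflatten(parent), best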
I. Sarafis, A. Zalzala, and P. Trinder. In Proceedings of the 2002 Congress on Evolutionary Computation (CEC2002), pages 1238-1243. IEEE Academic Press, 2002.
S. Chu, J. Roddick, and A. Australia. In Data Mining II - Proceedings of the Second International Conference on Data Mining Methods and Databases, pages 515-523, 2000.