
Fitting semiparametric clustering models to dissimilarity data

Advances in Data Analysis and Classification, 2 (2): 121--161 (2008)

Abstract

The cluster analysis problem of partitioning a set of objects from dissimilarity data is here handled with the statistical model-based approach of fitting the “closest” classification matrix to the observed dissimilarities. A classification matrix represents a clustering structure expressed in terms of dissimilarities. In cluster analysis there is a lack of methodologies widely used to directly partition a set of objects from dissimilarity data. In real applications, a hierarchical clustering algorithm is applied on dissimilarities and subsequently a partition is chosen by visual inspection of the dendrogram. Alternatively, a “tandem analysis” is used by first applying a Multidimensional Scaling (MDS) algorithm and then by using a partitioning algorithm such as k-means applied on the dimensions specified by the MDS. However, neither the hierarchical clustering algorithms nor the tandem analysis is specifically defined to solve the statistical problem of fitting the closest partition to the observed dissimilarities. This lack of appropriate methodologies motivates this paper, in particular, the introduction and the study of three new object partitioning models for dissimilarity data, their estimation via least-squares and the introduction of three new fast algorithms.

Description

SpringerLink - Journal Article
