Fitting semiparametric clustering models to dissimilarity data
M. Vichi. Advances in Data Analysis and Classification, 2 (2): 121--161 (2008)
Abstract
The cluster analysis problem of partitioning a set of objects from dissimilarity data is here handled with the statistical
model-based approach of fitting the “closest” classification matrix to the observed dissimilarities. A classification matrix represents a clustering structure expressed in terms of dissimilarities. In cluster analysis there is a lack of methodologies widely used to directly partition a set of objects from dissimilarity data. In real applications, a hierarchical clustering algorithm is applied on dissimilarities and subsequently a partition is chosen by visual inspection of the dendrogram. Alternatively, a “tandem analysis” is used by first applying a Multidimensional Scaling (MDS) algorithm and then by using a partitioning algorithm such as k-means applied on the dimensions specified by the MDS. However, neither the hierarchical clustering algorithms nor the tandem analysis is specifically defined to solve the statistical problem of fitting the closest partition to the observed dissimilarities. This lack of appropriate methodologies motivates this paper, in particular, the introduction and the study of three new object partitioning models for dissimilarity data, their estimation via least-squares and the introduction of three new fast algorithms.
%0 Journal Article
%1 vichi08
%A Vichi, Maurizio
%D 2008
%J Advances in Data Analysis and Classification
%K clustering
%N 2
%P 121--161
%T Fitting semiparametric clustering models to dissimilarity data
%U http://dx.doi.org/10.1007/s11634-008-0025-4
%V 2
%X The cluster analysis problem of partitioning a set of objects from dissimilarity data is here handled with the statistical
model-based approach of fitting the “closest” classification matrix to the observed dissimilarities. A classification matrix represents a clustering structure expressed in terms of dissimilarities. In cluster analysis there is a lack of methodologies widely used to directly partition a set of objects from dissimilarity data. In real applications, a hierarchical clustering algorithm is applied on dissimilarities and subsequently a partition is chosen by visual inspection of the dendrogram. Alternatively, a “tandem analysis” is used by first applying a Multidimensional Scaling (MDS) algorithm and then by using a partitioning algorithm such as k-means applied on the dimensions specified by the MDS. However, neither the hierarchical clustering algorithms nor the tandem analysis is specifically defined to solve the statistical problem of fitting the closest partition to the observed dissimilarities. This lack of appropriate methodologies motivates this paper, in particular, the introduction and the study of three new object partitioning models for dissimilarity data, their estimation via least-squares and the introduction of three new fast algorithms.
@article{vichi08,
abstract = {The cluster analysis problem of partitioning a set of objects from dissimilarity data is here handled with the statistical
model-based approach of fitting the “closest” classification matrix to the observed dissimilarities. A classification matrix represents a clustering structure expressed in terms of dissimilarities. In cluster analysis there is a lack of methodologies widely used to directly partition a set of objects from dissimilarity data. In real applications, a hierarchical clustering algorithm is applied on dissimilarities and subsequently a partition is chosen by visual inspection of the dendrogram. Alternatively, a “tandem analysis” is used by first applying a Multidimensional Scaling (MDS) algorithm and then by using a partitioning algorithm such as k-means applied on the dimensions specified by the MDS. However, neither the hierarchical clustering algorithms nor the tandem analysis is specifically defined to solve the statistical problem of fitting the closest partition to the observed dissimilarities. This lack of appropriate methodologies motivates this paper, in particular, the introduction and the study of three new object partitioning models for dissimilarity data, their estimation via least-squares and the introduction of three new fast algorithms.},
added-at = {2009-07-29T23:16:43.000+0200},
author = {Vichi, Maurizio},
biburl = {https://www.bibsonomy.org/bibtex/2ca4ff976007ae6250592d071fad0b67e/pitman},
description = {SpringerLink - Journal Article},
interhash = {893a48c19e106c43625858744fc2ca85},
intrahash = {ca4ff976007ae6250592d071fad0b67e},
journal = {Advances in Data Analysis and Classification},
keywords = {clustering},
number = 2,
pages = {121--161},
timestamp = {2009-07-29T23:16:43.000+0200},
title = {Fitting semiparametric clustering models to dissimilarity data},
url = {http://dx.doi.org/10.1007/s11634-008-0025-4},
volume = 2,
year = 2008
}