Abstract
BACKGROUND: Data clustering analysis has been
extensively applied to extract information from gene
expression profiles obtained with DNA microarrays. To
this aim, existing clustering approaches, mainly
developed in computer science, have been adapted to
microarray data analysis. However, previous studies
revealed that microarray datasets have very diverse
structures, some of which may not be correctly captured
by current clustering methods. We therefore approached
the problem from a new starting point, and developed a
clustering algorithm designed to capture
dataset-specific structures at the beginning of the
process.

RESULTS: The clustering algorithm is named Fuzzy
clustering by Local Approximation of MEmbership
(FLAME). Distinctive elements of FLAME are: (i)
definition of the neighborhood of each object (gene or
sample) and identification of objects with
"archetypal" features named Cluster Supporting
Objects, around which to construct the clusters; (ii)
assignment to each object of a fuzzy membership vector
approximated from the memberships of its neighboring
objects, by an iterative converging process in which
membership spreads from the Cluster Supporting Objects
through their neighbors. Comparative analysis with
K-means, hierarchical, fuzzy C-means and fuzzy
self-organizing maps (SOM) showed that data partitions
generated by FLAME are not superimposable on those of
other methods and, although different types of datasets
are better partitioned by different algorithms, FLAME
displays the best overall performance. FLAME is
implemented, together with all the above-mentioned
algorithms, in a C++ software package with a graphical interface
for Linux and Windows, capable of handling very large
datasets, named Gene Expression Data Analysis Studio
(GEDAS), freely available under the GNU General Public
License.

CONCLUSION: The FLAME algorithm has intrinsic
advantages, such as the ability to capture non-linear
relationships and non-globular clusters, the automated
definition of the number of clusters, and the
identification of cluster outliers, i.e. genes that are
not assigned to any cluster. As a result, clusters are
more internally homogeneous and more distinct from each
other, and provide better partitioning of biological
functions. The clustering algorithm can be easily
extended to applications other than gene expression
analysis.
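
The two elements listed under RESULTS, neighborhood definition with Cluster Supporting Objects and membership spreading through neighbors, can be illustrated with a minimal Python sketch. It is not the GEDAS implementation, and the abstract does not give the exact formulas: the density estimate (inverse mean neighbor distance), the inverse-distance neighbor weights, the simplified outlier rule, and the names `flame_sketch`, `k` and `iterations` are all assumptions made for illustration.

```python
import math


def flame_sketch(points, k=3, iterations=100):
    """Simplified FLAME-style clustering; returns (labels, memberships)."""
    n = len(points)

    # Step (i): k nearest neighbors of each object as (distance, index) pairs.
    neighbors = []
    for i, p in enumerate(points):
        ranked = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbors.append(ranked[:k])

    # Assumed density estimate: inverse of the mean distance to the neighbors.
    density = [k / (sum(d for d, _ in nb) + 1e-12) for nb in neighbors]

    # Cluster Supporting Objects: denser than every one of their neighbors.
    csos = [i for i in range(n)
            if all(density[i] > density[j] for _, j in neighbors[i])]
    # Assumed outlier rule: sparser than every one of their neighbors.
    outliers = [i for i in range(n)
                if all(density[i] < density[j] for _, j in neighbors[i])]

    # Step (ii): fuzzy membership vectors, one slot per CSO plus an outlier slot.
    groups = len(csos) + 1
    membership = [[1.0 / groups] * groups for _ in range(n)]
    fixed = set()
    for g, i in enumerate(csos):
        membership[i] = [1.0 if s == g else 0.0 for s in range(groups)]
        fixed.add(i)
    for i in outliers:
        membership[i] = [0.0] * (groups - 1) + [1.0]
        fixed.add(i)

    # Assumed neighbor weights: inverse distance, normalized to sum to 1.
    weights = []
    for nb in neighbors:
        raw = [1.0 / (d + 1e-12) for d, _ in nb]
        total = sum(raw)
        weights.append([w / total for w in raw])

    # Each free object takes the weighted mean of its neighbors' memberships,
    # so membership spreads outward from the fixed CSOs and outliers.
    for _ in range(iterations):
        updated = [row[:] for row in membership]
        for i in range(n):
            if i in fixed:
                continue
            updated[i] = [
                sum(w * membership[j][s]
                    for w, (_, j) in zip(weights[i], neighbors[i]))
                for s in range(groups)
            ]
        membership = updated

    # Hard assignment: strongest membership; -1 marks the outlier group.
    labels = []
    for row in membership:
        best = max(range(groups), key=row.__getitem__)
        labels.append(-1 if best == groups - 1 else best)
    return labels, membership


points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0), (50, 0)]
labels, _ = flame_sketch(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

In this toy input the middle point of each group is denser than its neighbors and becomes a Cluster Supporting Object, so the number of clusters follows from the data rather than from a parameter, and the distant point ends up in the outlier group.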