A simple and fast method to determine the parameters for fuzzy c-means cluster analysis.
MedLine Citation:
PMID:  20880957     Owner:  NLM     Status:  MEDLINE    
MOTIVATION: Fuzzy c-means clustering is widely used to identify cluster structures in high-dimensional datasets, such as those obtained in DNA microarray and quantitative proteomics experiments. One of its main limitations is the lack of a computationally fast method to set optimal values of the algorithm parameters. Poorly chosen parameter values may either cause purely random fluctuations to be included in the results or cause potentially important data to be ignored. The optimal parameter values are those for which the clustering yields no results for a purely random dataset but still detects cluster formation with maximum resolution on the edge of randomness.
RESULTS: Estimation of the optimal parameter values is achieved by evaluating the results of the clustering procedure applied to randomized datasets. In this case, the optimal value of the fuzzifier follows common rules that depend only on the main properties of the dataset. Taking the dimension of the set and the number of objects as input values, instead of evaluating the entire dataset, allows us to propose a functional relationship that determines the fuzzifier directly. This result speaks strongly against using a predefined fuzzifier, as is typically done in many previous studies. Validation indices are generally used to estimate the optimal number of clusters. A comparison shows that the minimum distance between the centroids provides results that are equivalent to or better than those obtained by other, computationally more expensive indices.
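ILLUSTRATION (not part of the original abstract): The two quantities the abstract refers to, the fuzzifier and the minimum distance between centroids, can be made concrete with a minimal sketch of textbook fuzzy c-means in plain Python. This is not the authors' implementation, and it does not reproduce their functional relationship for the fuzzifier, which the abstract does not state. The function names, the fuzzifier argument `m`, the random initialisation, and the stopping rule are all assumptions chosen for illustration.

```python
import math
import random


def fuzzy_c_means(points, c, m, n_iter=100, tol=1e-6, seed=0):
    """Textbook fuzzy c-means (illustrative sketch, not the paper's code).

    points: list of equal-length coordinate lists
    c:      number of clusters
    m:      fuzzifier, must be > 1
    Returns (centroids, memberships); memberships[i][k] is the degree to
    which point i belongs to cluster k.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    # Assumption: initialise the centroids with c randomly chosen data points.
    centroids = [list(p) for p in rng.sample(points, c)]
    memberships = []
    power = 2.0 / (m - 1.0)
    for _ in range(n_iter):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2 / (m - 1))
        memberships = []
        for p in points:
            d = [math.dist(p, ctr) for ctr in centroids]
            if min(d) == 0.0:
                # Point coincides with a centroid: give it full membership there.
                row = [1.0 if dk == 0.0 else 0.0 for dk in d]
                total = sum(row)
                row = [r / total for r in row]
            else:
                row = [1.0 / sum((d[k] / d[j]) ** power for j in range(c))
                       for k in range(c)]
            memberships.append(row)
        # Centroid update: mean of the points weighted by u_ik^m
        new_centroids = []
        for k in range(c):
            weights = [u[k] ** m for u in memberships]
            total = sum(weights)
            new_centroids.append(
                [sum(w * p[a] for w, p in zip(weights, points)) / total
                 for a in range(dim)]
            )
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < tol:
            break
    return centroids, memberships


def min_centroid_distance(centroids):
    """Smallest pairwise Euclidean distance between centroids."""
    return min(
        math.dist(a, b)
        for i, a in enumerate(centroids)
        for b in centroids[i + 1:]
    )
```

In this sketch, a larger `m` makes the memberships more uniform and pulls the centroids toward each other, so the minimum centroid distance shrinks when the algorithm fails to separate clusters. Running `fuzzy_c_means` over a range of `c` values and comparing `min_centroid_distance` for each is one way such an index might be used to choose the number of clusters.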
Veit Schwämmle; Ole Nørregaard Jensen
Publication Detail:
Type:  Journal Article     Date:  2010-09-29
Journal Detail:
Title:  Bioinformatics (Oxford, England)     Volume:  26     ISSN:  1367-4811     ISO Abbreviation:  Bioinformatics     Publication Date:  2010 Nov 
Date Detail:
Created Date:  2010-11-04     Completed Date:  2011-02-16     Revised Date:  2013-05-20    
Medline Journal Info:
Nlm Unique ID:  9808944     Medline TA:  Bioinformatics     Country:  England    
Other Details:
Languages:  eng     Pagination:  2841-8     Citation Subset:  IM    
Department of Biochemistry and Molecular Biology, University of Southern Denmark, Campusvej 55, DK-5230 Odense M, Denmark.
MeSH Terms
Cluster Analysis*
Fuzzy Logic*
Oligonucleotide Array Sequence Analysis / methods
Pattern Recognition, Automated / methods

From MEDLINE®/PubMed®, a database of the U.S. National Library of Medicine