Document Detail


A comparative analysis of biclustering algorithms for gene expression data.
MedLine Citation:
PMID:  22772837     Owner:  NLM     Status:  Publisher    
Abstract/OtherAbstract:
The need to analyze high-dimensional biological data is driving the development of new data mining methods. Biclustering algorithms have been successfully applied to gene expression data to discover local patterns, in which a subset of genes exhibits similar expression levels over a subset of conditions. However, it is not clear which algorithms are best suited for this task. Many algorithms have been published in the past decade, most of which have been compared only to a small number of other algorithms. Surveys and comparisons exist in the literature, but because of the large number and variety of biclustering algorithms, they quickly become outdated. In this article we partially address the problem of evaluating the strengths and weaknesses of existing biclustering methods. We used the BiBench package to compare 12 algorithms, many of which were recently published or have not been extensively studied. The algorithms were tested on a suite of synthetic data sets to measure their performance on data with varying conditions, such as different bicluster models, varying noise, varying numbers of biclusters and overlapping biclusters. The algorithms were also tested on eight large gene expression data sets obtained from the Gene Expression Omnibus. Gene Ontology enrichment analysis was performed on the resulting biclusters, and the best enrichment terms are reported. Our analyses show that the biclustering method and its parameters should be selected based on the desired model, whether that model allows overlapping biclusters, and its robustness to noise. In addition, we observe that the biclustering algorithms capable of finding more than one model are more successful at capturing biologically relevant clusters.
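To make the idea of a "local pattern" concrete, the following is a minimal sketch in plain Python. It is not taken from the article and does not use BiBench: the function names, matrix sizes, and noise levels are all illustrative assumptions. It plants one constant bicluster in a random background matrix, roughly in the spirit of the synthetic data sets the abstract describes, and scores submatrices with the mean squared residue of Cheng and Church, one well-known coherence measure that is not necessarily the one used in the article's evaluation.

```python
import random


def make_matrix(n_rows, n_cols, bic_rows, bic_cols, level=5.0, noise=0.1, seed=0):
    """Return an n_rows x n_cols matrix of N(0, 1) background values with a
    constant bicluster (value `level` plus Gaussian noise) planted on
    bic_rows x bic_cols. All parameters are illustrative assumptions."""
    rng = random.Random(seed)
    data = [[rng.gauss(0.0, 1.0) for _ in range(n_cols)] for _ in range(n_rows)]
    for i in bic_rows:
        for j in bic_cols:
            data[i][j] = level + rng.gauss(0.0, noise)
    return data


def mean_squared_residue(data, rows, cols):
    """Mean squared residue of the submatrix rows x cols.

    Each entry's residue is a_ij - rowmean_i - colmean_j + overall mean;
    a perfectly constant or purely additive submatrix scores 0."""
    rows, cols = list(rows), list(cols)
    row_mean = {i: sum(data[i][j] for j in cols) / len(cols) for i in rows}
    col_mean = {j: sum(data[i][j] for i in rows) / len(rows) for j in cols}
    total_mean = sum(row_mean.values()) / len(rows)
    squared = sum(
        (data[i][j] - row_mean[i] - col_mean[j] + total_mean) ** 2
        for i in rows
        for j in cols
    )
    return squared / (len(rows) * len(cols))


# Hypothetical example: 20 "genes" x 20 "conditions", bicluster on 5 x 4 of them.
planted_rows, planted_cols = range(0, 5), range(0, 4)
matrix = make_matrix(20, 20, planted_rows, planted_cols)

planted_score = mean_squared_residue(matrix, planted_rows, planted_cols)
background_score = mean_squared_residue(matrix, range(10, 15), range(10, 14))
print("planted bicluster MSR:", planted_score)
print("background submatrix MSR:", background_score)
```

In this sketch the planted submatrix should score far lower than a background submatrix of the same size, and raising `noise` narrows that gap, which is one simple way to see why robustness to noise matters when choosing a method.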
Authors:
Kemal Eren; Mehmet Deveci; Onur Küçüktunç; Umit V Catalyürek
Related Documents :
22824967 - Swift block-updating em and pseudo-em procedures for bayesian shrinkage analysis of qua...
23155767 - Agent-based model of macrophage action on endocrine pancreas.
23365667 - Random parameter sampling of a generic three-tier mapk cascade model reveals major fact...
22745057 - A probabilistic method for species sensitivity distributions taking into account the in...
20487397 - Ignorance is not probability.
11329767 - Envirometrics. part i: modeling of water salinity and air quality data.
Publication Detail:
Type:  JOURNAL ARTICLE     Date:  2012-7-6
Journal Detail:
Title:  Briefings in bioinformatics     Volume:  -     ISSN:  1477-4054     ISO Abbreviation:  -     Publication Date:  2012 Jul 
Date Detail:
Created Date:  2012-7-9     Completed Date:  -     Revised Date:  -    
Medline Journal Info:
Nlm Unique ID:  100912837     Medline TA:  Brief Bioinform     Country:  -    
Other Details:
Languages:  ENG     Pagination:  -     Citation Subset:  -    
From MEDLINE®/PubMed®, a database of the U.S. National Library of Medicine

