Should we abandon the t-test in the analysis of gene expression microarray data: a comparison of variance modeling strategies.
MedLine Citation:
PMID:  20838429     Owner:  NLM     Status:  MEDLINE    
High-throughput post-genomic studies are now routine and promising tools in biological and biomedical research. The main statistical approach for selecting genes that are differentially expressed between two groups is the t-test, which has been criticized in the literature. Numerous alternatives have been developed, based on a range of innovative variance modeling strategies. However, a critical issue is that choosing a different test usually yields a different gene list. In this context, and given the current tendency to apply the t-test, identifying the most efficient approach in practice remains crucial. To help answer this question, we compare eight tests representative of the variance modeling strategies used for gene expression data: Welch's t-test, ANOVA [1], Wilcoxon's test, SAM [2], RVM [3], limma [4], VarMixt [5] and SMVar [6]. Our comparison relies on four steps (gene list analysis, simulations, spike-in data and re-sampling) to draw comprehensive and robust conclusions about test performance in terms of statistical power, false-positive rate, execution time and ease of use. Our results raise concerns about the ability of some methods to control the expected number of false positives at a desirable level. Two tests (limma and VarMixt) show significant improvement over the t-test, particularly with small sample sizes. Limma also offers several practical advantages, so we advocate its use for analyzing gene expression data.
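The abstract contrasts Welch's t-test, which estimates each gene's variance from that gene alone, with variance modeling approaches such as limma. The sketch below illustrates that difference for a single gene. It is not the authors' code and not limma's implementation. The moderated statistic shrinks the pooled gene-wise variance toward a prior variance `s0_sq` carrying `d0` prior degrees of freedom. Here both values are hand-picked, hypothetical numbers; limma estimates them across all genes by empirical Bayes (in R, via `lmFit` and `eBayes`). The function names and the expression values are likewise hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(x, y):
    """Welch's t-statistic and Welch-Satterthwaite degrees of freedom for one gene."""
    nx, ny = len(x), len(y)
    # Squared standard errors of each group mean (sample variances, unequal allowed)
    vx, vy = variance(x) / nx, variance(y) / ny
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df


def moderated_t(x, y, d0, s0_sq):
    """Moderated t-statistic in the spirit of limma (simplified, hypothetical helper).

    The pooled variance s^2 (d = nx + ny - 2 degrees of freedom) is shrunk toward
    s0_sq, which carries d0 prior degrees of freedom. d0 and s0_sq are supplied by
    hand here; limma estimates them from all genes by empirical Bayes.
    """
    nx, ny = len(x), len(y)
    d = nx + ny - 2
    s_sq = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / d
    s_tilde_sq = (d0 * s0_sq + d * s_sq) / (d0 + d)  # shrunken variance
    t = (mean(x) - mean(y)) / math.sqrt(s_tilde_sq * (1 / nx + 1 / ny))
    return t, d0 + d  # the prior degrees of freedom add to the residual ones


x = [8.1, 8.2, 8.15]  # hypothetical log-expression values, group 1
y = [8.0, 8.05, 8.1]  # hypothetical log-expression values, group 2
t_w, df_w = welch_t(x, y)
t_m, df_m = moderated_t(x, y, d0=4, s0_sq=0.04)
print(f"Welch t = {t_w:.2f} (df = {df_w:.1f})")
print(f"moderated t = {t_m:.2f} (df = {df_m})")
```

With three replicates per group, this gene's small observed variance gives a Welch t of about 2.45 on 4 degrees of freedom. Shrinking toward the assumed prior variance lowers the moderated t to about 0.84, now on 8 degrees of freedom. This illustrates how borrowing information across genes can temper a small, chance variance estimate, which is one reason variance shrinkage may help when sample sizes are small.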
Marine Jeanmougin; Aurelien de Reynies; Laetitia Marisa; Caroline Paccard; Gregory Nuel; Mickael Guedj
Publication Detail:
Type:  Journal Article; Research Support, Non-U.S. Gov't     Date:  2010-09-03
Journal Detail:
Title:  PloS one     Volume:  5     ISSN:  1932-6203     ISO Abbreviation:  PLoS ONE     Publication Date:  2010  
Date Detail:
Created Date:  2010-09-14     Completed Date:  2011-02-18     Revised Date:  2013-05-28    
Medline Journal Info:
Nlm Unique ID:  101285081     Medline TA:  PLoS One     Country:  United States    
Other Details:
Languages:  eng     Pagination:  e12336     Citation Subset:  IM    
Programme Cartes d'Identité des Tumeurs, Ligue Nationale Contre le Cancer, Paris, France.
MeSH Terms
Analysis of Variance
Computer Simulation
Gene Expression Profiling / statistics & numerical data*
Models, Statistical
Oligonucleotide Array Sequence Analysis / statistics & numerical data*

From MEDLINE®/PubMed®, a database of the U.S. National Library of Medicine
