Document Detail

A class comparison method with filtering-enhanced variable selection for high-dimensional data sets.
MedLine Citation:
PMID:  18781559     Owner:  NLM     Status:  MEDLINE    
High-throughput molecular analysis technologies can produce thousands of measurements for each of the assayed samples. A common scientific question is to identify the variables whose distributions differ between some pre-specified classes (i.e. are differentially expressed). The statistical cost of examining thousands of variables is related to the risk of identifying many variables that truly are not differentially expressed, and many different multiple testing strategies have been used for the analysis of high-dimensional data sets to control the number of these false positives. An approach that is often used in practice to reduce the multiple comparisons problem is to lessen the number of comparisons being performed by filtering out variables that are considered non-informative 'before' the analysis. However, deciding which and how many variables should be filtered out can be highly arbitrary, and different filtering strategies can result in different variables being identified as differentially expressed. We propose the filtering-enhanced variable selection (FEVS) method, a new multiple testing strategy for identifying differentially expressed variables. This method identifies differentially expressed variables by combining the results obtained using a variety of filtering methods, instead of using a pre-specified filtering method or trying to identify an optimal filtering of the variables prior to class comparison analysis. We prove that the FEVS method probabilistically controls the number of false discoveries, and we show with a set of simulations and an example from the literature that FEVS can be useful for gaining sensitivity for the detection of truly differentially expressed variables.
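The abstract's point that different filtering strategies can flag different variables can be seen in a small toy sketch. This is not the FEVS procedure, whose combination rule the abstract does not specify. It only illustrates the filter-dependence that motivates it. The p-values, the filter statistic (for example, overall variance), the kept fractions, the use of a Bonferroni threshold, and the function name `bonferroni_after_filter` are all assumptions made for illustration.

```python
def bonferroni_after_filter(p_values, filter_stat, keep_fraction, alpha=0.05):
    """Keep the top `keep_fraction` of variables ranked by `filter_stat`,
    then apply a Bonferroni threshold over the retained variables only.
    Returns the sorted indices of variables called differentially expressed."""
    n = len(p_values)
    n_keep = max(1, round(keep_fraction * n))
    # Rank variables by decreasing filter statistic (hypothetical, e.g. variance).
    ranked = sorted(range(n), key=lambda i: filter_stat[i], reverse=True)
    kept = ranked[:n_keep]
    # Fewer retained variables means a less stringent per-test threshold.
    threshold = alpha / n_keep
    return sorted(i for i in kept if p_values[i] <= threshold)


# Made-up toy data for 10 variables (not taken from the paper).
p_values = [0.0004, 0.004, 0.009, 0.20, 0.50, 0.70, 0.03, 0.90, 0.60, 0.011]
filter_stat = [2.0, 5.0, 1.0, 4.0, 3.0, 0.5, 0.4, 0.3, 0.2, 0.1]

for frac in (1.0, 0.5, 0.2):
    selected = bonferroni_after_filter(p_values, filter_stat, frac)
    print(f"keep {frac:.0%} of variables -> selected indices {selected}")
```

With these toy numbers, no filtering selects variables 0 and 1, keeping half selects 0, 1 and 2, and keeping one fifth selects only variable 1, so the reported list depends on an arbitrary filtering choice.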
Lara Lusa; Edward L Korn; Lisa M McShane
Publication Detail:
Type:  Journal Article; Research Support, Non-U.S. Gov't    
Journal Detail:
Title:  Statistics in medicine     Volume:  27     ISSN:  0277-6715     ISO Abbreviation:  Stat Med     Publication Date:  2008 Dec 
Date Detail:
Created Date:  2008-11-04     Completed Date:  2009-02-26     Revised Date:  -    
Medline Journal Info:
Nlm Unique ID:  8215016     Medline TA:  Stat Med     Country:  England    
Other Details:
Languages:  eng     Pagination:  5834-49     Citation Subset:  IM    
Department of Medical Informatics, University of Ljubljana, Slovenia.
MeSH Terms
Data Interpretation, Statistical*
Gene Expression
Microarray Analysis / statistics & numerical data*

From MEDLINE®/PubMed®, a database of the U.S. National Library of Medicine