Using cross-validation to evaluate predictive accuracy of survival risk classifiers based on high-dimensional data.
MedLine Citation:
PMID:  21324971     Owner:  NLM     Status:  MEDLINE    
Abstract/OtherAbstract:
Developments in whole genome biotechnology have stimulated statistical focus on prediction methods. We review here methodology for classifying patients into survival risk groups and for using cross-validation to evaluate such classifications. Measures of discrimination for survival risk models include separation of survival curves, time-dependent ROC curves and Harrell's concordance index. For high-dimensional data applications, however, computing these measures as re-substitution statistics on the same data used for model development results in highly biased estimates. Most developments in methodology for survival risk modeling with high-dimensional data have utilized separate test data sets for model evaluation. Cross-validation has sometimes been used for optimization of tuning parameters. In many applications, however, the data available are too limited for effective division into training and test sets and consequently authors have often either reported re-substitution statistics or analyzed their data using binary classification methods in order to utilize familiar cross-validation. In this article we have tried to indicate how to utilize cross-validation for the evaluation of survival risk models; specifically how to compute cross-validated estimates of survival distributions for predicted risk groups and how to compute cross-validated time-dependent ROC curves. We have also discussed evaluation of the statistical significance of a survival risk model and evaluation of whether high-dimensional genomic data adds predictive accuracy to a model based on standard covariates alone.
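The cross-validated risk-group idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names (`harrell_c`, `fit_best_single_feature`, `cv_risk_groups`), the toy single-feature model, and the median cutoff are all assumptions chosen for brevity. The point it shows is that every modeling step, including feature selection and the choice of cutoff, is repeated inside each training fold, and each held-out case is assigned to a risk group by a model that never saw it.

```python
import random
from statistics import median


def harrell_c(times, events, scores):
    """Harrell's concordance index; a higher score means higher predicted risk.

    Simplification: pairs with tied survival times are skipped.
    """
    concordant, usable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is usable when subject i has an observed event
            # strictly before subject j's follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                usable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


def fit_best_single_feature(train_x, train_t, train_e):
    """Hypothetical toy model: keep the one feature most associated with
    survival on the training data and return a scoring function."""
    best = None
    for j in range(len(train_x[0])):
        column = [row[j] for row in train_x]
        c = harrell_c(train_t, train_e, column)
        sign = 1.0 if c >= 0.5 else -1.0
        strength = abs(c - 0.5)
        if best is None or strength > best[0]:
            best = (strength, j, sign)
    _, j, sign = best
    return lambda row: sign * row[j]


def cv_risk_groups(features, times, events, fit_score_fn, k=5, seed=0):
    """Assign each case to 'high' or 'low' risk using a model fitted
    without that case (k-fold cross-validation)."""
    n = len(times)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    groups = [None] * n
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        # Model fitting and feature selection use training cases only.
        score = fit_score_fn(
            [features[i] for i in train],
            [times[i] for i in train],
            [events[i] for i in train],
        )
        # The cutoff is also derived from the training cases only.
        cutoff = median(score(features[i]) for i in train)
        for i in held_out:
            groups[i] = "high" if score(features[i]) > cutoff else "low"
    return groups
```

Once every case carries a cross-validated label, survival curves (for example Kaplan-Meier estimates) can be computed separately for the pooled "high" and "low" groups. Computing them from labels fitted on the full data set would instead give the re-substitution estimates that the abstract describes as highly biased.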
Authors:
Richard M Simon; Jyothi Subramanian; Ming-Chung Li; Supriya Menezes
Publication Detail:
Type:  Journal Article; Validation Studies     Date:  2011-02-15
Journal Detail:
Title:  Briefings in bioinformatics     Volume:  12     ISSN:  1477-4054     ISO Abbreviation:  Brief. Bioinformatics     Publication Date:  2011 May 
Date Detail:
Created Date:  2011-05-06     Completed Date:  2011-08-02     Revised Date:  2013-05-26    
Medline Journal Info:
Nlm Unique ID:  100912837     Medline TA:  Brief Bioinform     Country:  England    
Other Details:
Languages:  eng     Pagination:  203-14     Citation Subset:  IM    
Affiliation:
Biometric Research Branch, US National Cancer Institute, Bethesda, MD 20892-7434, USA. rsimon@nih.gov
MeSH Terms
Descriptor/Qualifier:
Databases, Factual
Kaplan-Meier Estimate*
Models, Statistical*
ROC Curve
Research Design
Risk
From MEDLINE®/PubMed®, a database of the U.S. National Library of Medicine

