Document Detail

How Accurate are the Extremely Small P-values Used in Genomic Research: An Evaluation of Numerical Libraries.
MedLine Citation:
PMID:  20161126     Owner:  NLM     Status:  Publisher    
In genomics and high-dimensional biology (HDB), massive multiple testing prompts the use of extremely small significance levels. Because hypothesis testing relies on the tail areas of statistical distributions, the accuracy of these areas matters for making scientific judgments with confidence. Previous work on accuracy focused primarily on evaluating professionally written statistical software, such as SAS, against the Statistical Reference Datasets (StRD) provided by the National Institute of Standards and Technology (NIST), and on the accuracy of tail areas in statistical distributions. The goal of this paper is to provide guidance to investigators who are developing their own custom scientific software built upon numerical libraries written by others. Specifically, we evaluate the accuracy of small tail areas from the cumulative distribution functions (CDFs) of the chi-square and t-distributions by comparing several open-source, free, or commercially licensed numerical libraries in Java, C, and R to widely accepted standards of comparison such as ELV and DCDFLIB. In our evaluation, the C libraries and R functions are consistently accurate up to six significant digits. Among the evaluated Java libraries, Colt is the most accurate. These languages and libraries are popular choices among programmers developing scientific software, so the results can help programmers choose libraries on the basis of CDF accuracy.
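To illustrate why small tail areas are numerically delicate, the following is a minimal Python sketch, not taken from the paper and not using any of the libraries it evaluates. It assumes the standard closed form for the upper tail of a chi-square distribution with an even number of degrees of freedom, and compares it with the same quantity obtained by subtracting the CDF from 1. The function names (`chi2_upper_tail_even_df`, `chi2_upper_tail_naive`, `matching_digits`) are hypothetical and chosen only for this example.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """Upper tail P(X > x) of a chi-square variable with even df.

    Uses the closed form exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!,
    which holds only when df is a positive even integer.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for j in range(1, df // 2):
        term *= half / j   # build (x/2)^j / j! incrementally
        total += term
    return math.exp(-half) * total


def chi2_upper_tail_naive(x, df):
    """The same tail area computed as 1 - CDF in double precision.

    The CDF is rounded to a double before the subtraction, so any tail
    far below machine epsilon (about 1e-16) is lost.
    """
    cdf = 1.0 - chi2_upper_tail_even_df(x, df)
    return 1.0 - cdf


def matching_digits(estimate, reference):
    """Approximate number of significant digits on which two values agree."""
    if estimate == reference:
        return math.inf
    relative_error = abs(estimate - reference) / abs(reference)
    return -math.log10(relative_error)


if __name__ == "__main__":
    for x in (30.0, 100.0):
        direct = chi2_upper_tail_even_df(x, 2)
        naive = chi2_upper_tail_naive(x, 2)
        print(x, direct, naive, matching_digits(naive, direct))
```

With 2 degrees of freedom and x = 100, the direct tail equals exp(-50), whereas the subtraction route returns exactly 0.0 because the CDF rounds to 1.0. A library that exposes only the lower-tail CDF therefore cannot deliver p-values at the scale used in massive multiple testing, whatever the accuracy of the CDF itself.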
Sai Santosh Bangalore; Jelai Wang; David B Allison
Publication Detail:
Journal Detail:
Title:  Computational statistics & data analysis     Volume:  53     ISSN:  0167-9473     ISO Abbreviation:  -     Publication Date:  2009 May 
Date Detail:
Created Date:  2010-7-13     Completed Date:  -     Revised Date:  -    
Medline Journal Info:
Nlm Unique ID:  100960938     Medline TA:  Comput Stat Data Anal     Country:  -    
Other Details:
Languages:  ENG     Pagination:  2446-2452     Citation Subset:  -    
The University of Alabama at Birmingham, Section on Statistical Genetics, Department of Biostatistics, RPHB 327, 1665 University Boulevard, Birmingham, AL-35294-0022, USA.
Grant Support
R01 GM077490-02//NIGMS NIH HHS; U54 CA100949-05S1//NCI NIH HHS

From MEDLINE®/PubMed®, a database of the U.S. National Library of Medicine
