DupChecker: a Bioconductor package for checking high-throughput genomic data redundancy in meta-analysis.
MedLine Citation:
PMID:  25267467     Owner:  NLM     Status:  Publisher    
Abstract/OtherAbstract:
BACKGROUND: Meta-analysis has become a popular approach for high-throughput genomic data analysis because it can often significantly increase the power to detect biological signals or patterns in datasets. However, when publicly available databases are used for meta-analysis, duplication of samples is a frequently encountered problem, especially for gene expression data. Failing to remove duplicates can lead to false positive findings, misleading clustering patterns, or model over-fitting in the subsequent data analysis.
RESULTS: We developed a Bioconductor package, DupChecker, that efficiently identifies duplicated samples by generating MD5 fingerprints for raw data. A real data example demonstrates the usage and output of the package.
CONCLUSIONS: Researchers may not pay enough attention to checking for and removing duplicated samples, and the resulting data contamination could make the results or conclusions of a meta-analysis questionable. We suggest applying DupChecker to examine all gene expression data sets before any data analysis step.
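ILLUSTRATION: The MD5-fingerprint idea described in the abstract can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not DupChecker's implementation or its R interface: the function names `md5_fingerprint` and `find_duplicates` are hypothetical, and the sketch assumes that duplicated samples are byte-identical raw data files (for example, the same file deposited in two series).

```python
import hashlib
from collections import defaultdict


def md5_fingerprint(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths):
    """Group files by MD5 digest and keep only digests shared by 2+ files.

    Hypothetical helper for illustration: returns {digest: sorted file list}.
    """
    groups = defaultdict(list)
    for path in paths:
        groups[md5_fingerprint(path)].append(str(path))
    return {digest: sorted(files) for digest, files in groups.items() if len(files) > 1}
```

Calling `find_duplicates` on the raw files collected from several data sets would return one entry per group of identical files. Files that hold the same sample but differ in any byte (for example, after re-export or recompression) would not be matched by this sketch.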
Authors:
Quanhu Sheng; Yu Shyr; Xi Chen
Publication Detail:
Type:  JOURNAL ARTICLE     Date:  2014-9-30
Journal Detail:
Title:  BMC bioinformatics     Volume:  15     ISSN:  1471-2105     ISO Abbreviation:  BMC Bioinformatics     Publication Date:  2014 Sep 
Date Detail:
Created Date:  2014-9-30     Completed Date:  -     Revised Date:  2014-10-1    
Medline Journal Info:
Nlm Unique ID:  100965194     Medline TA:  BMC Bioinformatics     Country:  -    
Other Details:
Languages:  ENG     Pagination:  323     Citation Subset:  -    
From MEDLINE®/PubMed®, a database of the U.S. National Library of Medicine

