A high-performance computing toolset for relatedness and principal component analysis of SNP data.
MedLine Citation:
PMID:  23060615     Owner:  NLM     Status:  MEDLINE    
Abstract/OtherAbstract:
Genome-wide association studies are widely used to investigate the genetic basis of diseases and traits, but they pose many computational challenges. We developed gdsfmt and SNPRelate (R packages for multi-core symmetric multiprocessing computer architectures) to accelerate two key computations on SNP data: principal component analysis (PCA) and relatedness analysis using identity-by-descent measures. The kernels of our algorithms are written in C/C++ and highly optimized. Benchmarks show that the uniprocessor implementations of PCA and identity-by-descent are approximately 8 to 50 times faster than the implementations provided in the popular EIGENSTRAT (v3.0) and PLINK (v1.07) programs, respectively, and that using eight cores raises the speed-up to 30- to 300-fold. SNPRelate can analyse tens of thousands of samples with millions of SNPs. For example, our package was used to perform PCA on 55 324 subjects from the 'Gene-Environment Association Studies' consortium studies.
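As a rough illustration of the PCA computation the abstract refers to, the sketch below builds a sample-by-sample genetic covariance matrix from 0/1/2 allele counts and extracts its leading principal component by power iteration. It is not SNPRelate's optimized C/C++ kernel or its R interface. The function names, the toy genotype matrix, the allele-frequency scaling and the use of power iteration are all assumptions chosen for illustration, and missing genotypes are not handled.

```python
import math

def genetic_covariance(genotypes):
    """Sample-by-sample covariance from 0/1/2 allele counts (hypothetical helper).

    genotypes: one list per sample, one entry per SNP, no missing values.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    normalized = [[] for _ in range(n)]
    for j in range(m):
        column = [row[j] for row in genotypes]
        p = sum(column) / (2.0 * n)        # estimated allele frequency
        if p <= 0.0 or p >= 1.0:
            continue                        # skip monomorphic SNPs
        scale = math.sqrt(p * (1.0 - p))   # assumed scaling, one common choice
        for i in range(n):
            normalized[i].append((column[i] - 2.0 * p) / scale)
    used = len(normalized[0])               # number of SNPs actually kept
    return [[sum(a * b for a, b in zip(normalized[i], normalized[k])) / used
             for k in range(n)] for i in range(n)]

def leading_component(cov, iterations=200):
    """Leading eigenvector of a symmetric matrix by power iteration."""
    n = len(cov)
    vec = [float(i + 1) for i in range(n)]  # fixed, non-constant start vector
    for _ in range(iterations):
        nxt = [sum(cov[i][k] * vec[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    return vec

# Toy data (made up): samples 0-1 and samples 2-3 come from two distinct groups.
genotypes = [
    [0, 0, 1, 2, 2, 0],
    [0, 1, 0, 2, 2, 0],
    [2, 2, 1, 0, 0, 1],
    [2, 1, 2, 0, 1, 1],
]
cov = genetic_covariance(genotypes)
pc = leading_component(cov)
print(pc)
```

On this toy input the first component assigns one sign to samples 0 and 1 and the opposite sign to samples 2 and 3, which is the kind of group separation PCA is used to reveal in SNP data. A production analysis would rely on an optimized eigensolver and a multi-threaded covariance step instead of these nested loops.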
Authors:
Xiuwen Zheng; David Levine; Jess Shen; Stephanie M Gogarten; Cathy Laurie; Bruce S Weir
Publication Detail:
Type:  Journal Article; Research Support, N.I.H., Extramural     Date:  2012-10-11
Journal Detail:
Title:  Bioinformatics (Oxford, England)     Volume:  28     ISSN:  1367-4811     ISO Abbreviation:  Bioinformatics     Publication Date:  2012 Dec 
Date Detail:
Created Date:  2012-12-10     Completed Date:  2013-07-29     Revised Date:  2013-12-04    
Medline Journal Info:
Nlm Unique ID:  9808944     Medline TA:  Bioinformatics     Country:  England    
Other Details:
Languages:  eng     Pagination:  3326-8     Citation Subset:  IM    
MeSH Terms
Descriptor/Qualifier:
Algorithms
Genome-Wide Association Study*
Humans
Polymorphism, Single Nucleotide*
Principal Component Analysis*
Software*
Grant Support
ID/Acronym/Agency:
U01 HG 004446/HG/NHGRI NIH HHS
From MEDLINE®/PubMed®, a database of the U.S. National Library of Medicine

