Efficient Genomewide Selection of PCA-Correlated tSNPs for Genotype Imputation.
MedLine Citation:
PMID:  21902678     Owner:  NLM     Status:  Publisher    
Abstract/OtherAbstract:
The linkage disequilibrium structure of the human genome allows identification of small sets of single nucleotide polymorphisms (SNPs), known as tagging SNPs (tSNPs), that efficiently represent dense sets of markers. This structure can be translated into linear algebraic terms, as evidenced by the well-documented principal components analysis (PCA)-based methods. Here we apply, for the first time, PCA-based methodology for efficient genome-wide tSNP selection, and explore the linear algebraic structure of the human genome. Our algorithm divides the genome into contiguous nonoverlapping windows of high linear structure. Coupling this novel window definition with a PCA-based tSNP selection method, we analyze 2.5 million SNPs from the HapMap phase 2 dataset. We show that 10-25% of these SNPs suffice to predict the remaining genotypes with over 95% accuracy. A comparison with other popular methods in the ENCODE regions indicates significant genotyping savings. We evaluate the portability of genome-wide tSNPs across a diverse set of populations (HapMap phase 3 dataset). Interestingly, African populations are good reference populations for the rest of the world. Finally, we demonstrate the applicability of our approach in a real genome-wide disease association study. The chosen tSNP panels can be used for genotype imputation with either a simple regression-based algorithm or more sophisticated genotype imputation methods.
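To make the idea of PCA-correlated tSNPs and regression-based imputation concrete, the following is a minimal sketch and not the authors' algorithm. It assumes genotypes coded as 0/1/2 in a samples-by-SNPs matrix for a single window, picks tSNPs by a leverage-score heuristic on the top right singular vectors (an illustrative stand-in for the paper's selection method), and imputes the remaining SNPs by ordinary least squares. The function names `select_tsnps` and `impute` and the toy data are hypothetical.

```python
import numpy as np

def select_tsnps(G, k):
    """Pick k SNP columns with the largest leverage scores.

    Assumption: leverage is the squared column norm of the top-k right
    singular vectors of the mean-centered genotype matrix. This is an
    illustrative heuristic, not the selection rule used in the paper.
    """
    Gc = G - G.mean(axis=0)
    _, _, Vt = np.linalg.svd(Gc, full_matrices=False)
    leverage = (Vt[:k] ** 2).sum(axis=0)
    return np.sort(np.argsort(leverage)[::-1][:k])

def impute(G_ref, tsnps, G_new_tags):
    """Predict all SNPs of new samples from their tSNP genotypes.

    Fits one least-squares regression per SNP on the reference panel
    (with an intercept), then rounds predictions to the 0/1/2 coding.
    """
    X = np.column_stack([np.ones(len(G_ref)), G_ref[:, tsnps]])
    coef, *_ = np.linalg.lstsq(X, G_ref, rcond=None)
    Xn = np.column_stack([np.ones(len(G_new_tags)), G_new_tags])
    return np.clip(np.rint(Xn @ coef), 0, 2).astype(int)

# Toy window: 20 SNPs that are copies of 4 underlying markers (synthetic data).
rng = np.random.default_rng(0)
base = rng.integers(0, 3, size=(60, 4))
G = base[:, rng.integers(0, 4, size=20)]
ref, new = G[:40], G[40:]

tsnps = select_tsnps(ref, k=4)
pred = impute(ref, tsnps, new[:, tsnps])
accuracy = (pred == new).mean()  # fraction of genotypes predicted correctly
```

On real data one would apply such a procedure window by window and choose the number of tSNPs per window from the desired prediction accuracy; both choices are left open in this sketch.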
Authors:
Asif Javed; Petros Drineas; Michael W Mahoney; Peristera Paschou
Publication Detail:
Type:  JOURNAL ARTICLE     Date:  2011-9-8
Journal Detail:
Title:  Annals of human genetics     Volume:  -     ISSN:  1469-1809     ISO Abbreviation:  -     Publication Date:  2011 Sep 
Date Detail:
Created Date:  2011-9-9     Completed Date:  -     Revised Date:  -    
Medline Journal Info:
Nlm Unique ID:  0416661     Medline TA:  Ann Hum Genet     Country:  -    
Other Details:
Languages:  ENG     Pagination:  -     Citation Subset:  -    
Copyright Information:
© 2011 The Authors Annals of Human Genetics © 2011 Blackwell Publishing Ltd/University College London.
Affiliation:
Computational Biology Center, IBM T. J. Watson Research, Yorktown Heights, NY 10598, USA; Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY 12180, USA; Department of Mathematics, Stanford University, Palo Alto, CA 94305, USA; Department of Molecular Biology and Genetics, Democritus University of Thrace, Alexandroupoli 68100, Greece.
From MEDLINE®/PubMed®, a database of the U.S. National Library of Medicine