Simulating realistic genomic data with rare variants.
MedLine Citation:
PMID:  23161487     Owner:  NLM     Status:  MEDLINE    
Abstract/OtherAbstract:
Increasing evidence suggests that rare, generally deleterious genetic variants may strongly affect the risk of not only Mendelian diseases but also many common diseases. However, identifying such rare variants remains challenging, and novel statistical methods and bioinformatic software must be developed. These methods must then be evaluated extensively under reasonable genetic models. Although genomic data are abundant, they are of limited use for evaluating the methods because the underlying disease mechanism is unknown. It is therefore imperative to simulate genomic data that mimic real data containing rare variants and that allow a known disease penetrance model to be imposed. Resampling simulation methods have shown advantages in computational efficiency and in preserving important properties such as linkage disequilibrium (LD) and allele frequency, but, as we demonstrate, they still have limitations. We propose an algorithm that combines regression-based imputation with resampling to simulate genetic data with both rare and common variants. A logistic regression model was fitted to the relationship between each rare variant and its nearby common variants in the 1000 Genomes Project data; the fitted model was then applied to the real data to fill in one rare variant at a time on the basis of the common variants. Individuals were then simulated from the real data with the imputed rare variants. We compared our method with existing simulators and demonstrated that it performed well, qualitatively, in retaining properties of the real sample such as LD and minor allele frequency.
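The two-stage idea in the abstract (fit a logistic model for a rare variant on nearby common variants in a reference panel, impute that variant into the real data, then resample individuals) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the gradient-ascent optimizer, the small ridge penalty, and the plain bootstrap of whole haplotypes are all assumptions made for illustration; the abstract does not specify how the model was fitted or how resampling was carried out.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_prob(w, x):
    """P(rare allele = 1 | common alleles x); w[0] is the intercept."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X, y, lr=0.5, n_iter=500, l2=1e-3):
    """Fit a logistic model by batch gradient ascent (illustrative optimizer).

    X: reference haplotypes at nearby common variants (lists of 0/1).
    y: rare-variant allele (0/1) carried by each reference haplotype.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            err = yi - predict_prob(w, xi)
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        for j in range(p + 1):
            penalty = l2 * w[j] if j > 0 else 0.0  # intercept is not penalized
            w[j] += lr * (grad[j] / n - penalty)
    return w


def impute_rare(w, target_haps, rng):
    """Draw one rare allele per target haplotype from the fitted model."""
    return [1 if rng.random() < predict_prob(w, h) else 0 for h in target_haps]


def resample_individuals(haps, n_individuals, rng):
    """Simulate genotypes by pairing haplotypes drawn with replacement.

    Assumption: plain bootstrap of whole haplotypes, with no recombination.
    """
    individuals = []
    for _ in range(n_individuals):
        h1, h2 = rng.choice(haps), rng.choice(haps)
        individuals.append([a + b for a, b in zip(h1, h2)])
    return individuals
```

In this sketch, one would call `fit_logistic` once per rare variant on the reference panel, append the output of `impute_rare` to each real-data haplotype, and pass the extended haplotypes to `resample_individuals`. Drawing the imputed allele at random from the fitted probability, rather than thresholding it, is a design choice here intended to keep the imputed allele frequency close to the reference frequency.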
Authors:
Yaji Xu; Yinghua Wu; Chi Song; Heping Zhang
Publication Detail:
Type:  Journal Article; Research Support, N.I.H., Extramural     Date:  2012-11-17
Journal Detail:
Title:  Genetic epidemiology     Volume:  37     ISSN:  1098-2272     ISO Abbreviation:  Genet. Epidemiol.     Publication Date:  2013 Feb 
Date Detail:
Created Date:  2013-01-11     Completed Date:  2013-09-16     Revised Date:  2014-02-04    
Medline Journal Info:
Nlm Unique ID:  8411723     Medline TA:  Genet Epidemiol     Country:  United States    
Other Details:
Languages:  eng     Pagination:  163-72     Citation Subset:  IM    
Copyright Information:
© 2012 WILEY PERIODICALS, INC.
MeSH Terms
Descriptor/Qualifier:
Algorithms*
Chromosomes, Human, Pair 22
Computer Simulation
Gene Frequency
Genetic Variation*
HapMap Project
Humans
Linkage Disequilibrium
Logistic Models
Models, Genetic*
Polymorphism, Single Nucleotide
Grant Support
ID/Acronym/Agency:
R01 DA016750/DA/NIDA NIH HHS; R01 GM088566/GM/NIGMS NIH HHS

From MEDLINE®/PubMed®, a database of the U.S. National Library of Medicine

