Document Detail


Optimal algorithms for haplotype assembly from whole-genome sequence data.
MedLine Citation:
PMID:  20529904     Owner:  NLM     Status:  MEDLINE    
Abstract/OtherAbstract:
MOTIVATION: Haplotype inference is an important step for many types of analyses of genetic variation in the human genome. Traditional approaches for obtaining haplotypes involve collecting genotype information from a population of individuals and then applying a haplotype inference algorithm. The development of high-throughput sequencing technologies allows for an alternative strategy to obtain haplotypes by combining sequence fragments. The problem of 'haplotype assembly' is the problem of assembling the two haplotypes for a chromosome given the collection of such fragments, or reads, and their locations in the haplotypes, which are pre-determined by mapping the reads to a reference genome. Errors in reads significantly increase the difficulty of the problem and it has been shown that the problem is NP-hard even for reads of length 2. Existing greedy and stochastic algorithms are not guaranteed to find the optimal solutions for the haplotype assembly problem. RESULTS: In this article, we proposed a dynamic programming algorithm that is able to assemble the haplotypes optimally with time complexity O(m x 2(k) x n), where m is the number of reads, k is the length of the longest read and n is the total number of SNPs in the haplotypes. We also reduce the haplotype assembly problem into the maximum satisfiability problem that can often be solved optimally even when k is large. Taking advantage of the efficiency of our algorithm, we perform simulation experiments demonstrating that the assembly of haplotypes using reads of length typical of the current sequencing technologies is not practical. However, we demonstrate that the combination of this approach and the traditional haplotype phasing approaches allow us to practically construct haplotypes containing both common and rare variants.
Authors:
Dan He; Arthur Choi; Knot Pipatsrisawat; Adnan Darwiche; Eleazar Eskin
Related Documents :
15619934 - A hybrid hopfield network-genetic algorithm approach for the terminal assignment problem.
17265714 - Gaining weights... and feeling good about it!
20076194 - Two approaches to the star mapping problem for space vehicle attitude determination.
10976144 - Lambda-opt neural approaches to quadratic assignment problems.
20515424 - A comparison of cic and bte hearing aids for three-dimensional localization of speech.
16680534 - A community practice model for community psychologists and some examples of the applica...
Publication Detail:
Type:  Journal Article; Research Support, N.I.H., Extramural; Research Support, U.S. Gov't, Non-P.H.S.    
Journal Detail:
Title:  Bioinformatics (Oxford, England)     Volume:  26     ISSN:  1367-4811     ISO Abbreviation:  Bioinformatics     Publication Date:  2010 Jun 
Date Detail:
Created Date:  2010-06-09     Completed Date:  2010-10-21     Revised Date:  -    
Medline Journal Info:
Nlm Unique ID:  9808944     Medline TA:  Bioinformatics     Country:  England    
Other Details:
Languages:  eng     Pagination:  i183-90     Citation Subset:  IM    
Affiliation:
Department of Computer Science, University of California Los Angeles, Los Angeles, CA 90095, USA. danhe@cs.ucla.edu
Export Citation:
APA/MLA Format     Download EndNote     Download BibTex
MeSH Terms
Descriptor/Qualifier:
Algorithms*
Base Sequence
Genome*
Genomics / methods*
Haplotypes*
Grant Support
ID/Acronym/Agency:
K25-HL080079/HL/NHLBI NIH HHS; N01-ES-45530/ES/NIEHS NIH HHS; U01-DA024417/DA/NIDA NIH HHS
Comments/Corrections

From MEDLINE®/PubMed®, a database of the U.S. National Library of Medicine


Previous Document:  Estimating genome-wide IBD sharing from SNP data via an efficient hidden Markov model of LD with app...
Next Document:  Efficient identification of identical-by-descent status in pedigrees with many untyped individuals.