| Optimal algorithms for haplotype assembly from whole-genome sequence data. | |
| | |
MedLine Citation:
|
PMID: 20529904 Owner: NLM Status: MEDLINE |
Abstract/OtherAbstract:
|
MOTIVATION: Haplotype inference is an important step for many types of analyses of genetic variation in the human genome. Traditional approaches for obtaining haplotypes involve collecting genotype information from a population of individuals and then applying a haplotype inference algorithm. The development of high-throughput sequencing technologies allows for an alternative strategy to obtain haplotypes by combining sequence fragments. The problem of 'haplotype assembly' is the problem of assembling the two haplotypes for a chromosome given the collection of such fragments, or reads, and their locations in the haplotypes, which are pre-determined by mapping the reads to a reference genome. Errors in reads significantly increase the difficulty of the problem and it has been shown that the problem is NP-hard even for reads of length 2. Existing greedy and stochastic algorithms are not guaranteed to find the optimal solutions for the haplotype assembly problem. RESULTS: In this article, we proposed a dynamic programming algorithm that is able to assemble the haplotypes optimally with time complexity O(m x 2(k) x n), where m is the number of reads, k is the length of the longest read and n is the total number of SNPs in the haplotypes. We also reduce the haplotype assembly problem into the maximum satisfiability problem that can often be solved optimally even when k is large. Taking advantage of the efficiency of our algorithm, we perform simulation experiments demonstrating that the assembly of haplotypes using reads of length typical of the current sequencing technologies is not practical. However, we demonstrate that the combination of this approach and the traditional haplotype phasing approaches allow us to practically construct haplotypes containing both common and rare variants. |
| | |
Authors:
|
Dan He; Arthur Choi; Knot Pipatsrisawat; Adnan Darwiche; Eleazar Eskin |
Related Documents
:
|
15619934 - A hybrid hopfield network-genetic algorithm approach for the terminal assignment problem. 17265714 - Gaining weights... and feeling good about it! 20076194 - Two approaches to the star mapping problem for space vehicle attitude determination. 10976144 - Lambda-opt neural approaches to quadratic assignment problems. 20515424 - A comparison of cic and bte hearing aids for three-dimensional localization of speech. 16680534 - A community practice model for community psychologists and some examples of the applica... |
Publication Detail:
|
Type: Journal Article; Research Support, N.I.H., Extramural; Research Support, U.S. Gov't, Non-P.H.S. |
Journal Detail:
|
Title: Bioinformatics (Oxford, England) Volume: 26 ISSN: 1367-4811 ISO Abbreviation: Bioinformatics Publication Date: 2010 Jun |
Date Detail:
|
Created Date: 2010-06-09 Completed Date: 2010-10-21 Revised Date: - |
Medline Journal Info:
|
Nlm Unique ID: 9808944 Medline TA: Bioinformatics Country: England |
Other Details:
|
Languages: eng Pagination: i183-90 Citation Subset: IM |
Affiliation:
|
Department of Computer Science, University of California Los Angeles, Los Angeles, CA 90095, USA. danhe@cs.ucla.edu |
Export Citation:
|
APA/MLA Format Download EndNote Download BibTex |
| MeSH Terms | |
Descriptor/Qualifier:
|
Algorithms* Base Sequence Genome* Genomics / methods* Haplotypes* |
| Grant Support | |
ID/Acronym/Agency:
|
K25-HL080079/HL/NHLBI NIH HHS; N01-ES-45530/ES/NIEHS NIH HHS; U01-DA024417/DA/NIDA NIH HHS |
| Comments/Corrections | |
From MEDLINE®/PubMed®, a database of the U.S. National Library of Medicine
Previous Document: Estimating genome-wide IBD sharing from SNP data via an efficient hidden Markov model of LD with app...
Next Document: Efficient identification of identical-by-descent status in pedigrees with many untyped individuals.