Document Detail


Multiple sequence assembly from reads alignable to a common reference genome.
MedLine Citation:
PMID:  21778524     Owner:  NLM     Status:  In-Data-Review    
Abstract/OtherAbstract:
We describe a set of computational problems motivated by certain analysis tasks in genome resequencing. These are assembly problems for which multiple distinct sequences must be assembled, but where the relative positions of reads to be assembled are already known. This information is obtained from a common reference genome and is characteristic of resequencing experiments. The simplest variant of the problem aims at determining a minimum set of superstrings such that each sequenced read matches at least one superstring. We give an algorithm with time complexity O(N), where N is the sum of the lengths of reads, substantially improving on previous algorithms for solving the same problem. We also examine the problem of finding the smallest number of reads to remove such that the remaining reads are consistent with k superstrings. By exploiting a surprising relationship with the minimum cost flow problem, we show that this problem can be solved in polynomial time when nested reads are excluded. If nested reads are permitted, this problem of removing the minimum number of reads becomes NP-hard. We show that permitting mismatches between reads and their nearest superstrings generally renders these problems NP-hard.
Authors:
Qian Peng; Andrew D Smith
Related Documents :
144434 - The industrial back problem: role of the industrial hygienist and ergonomics.
20804384 - Rademacher chaos complexities for learning the kernel problem.
15676384 - The introduction of ergonomics: a problem of industrial practice.
21668134 - Stochastic matching problem.
8206044 - The place of epidemiology in environmental decisions: needed support for the developmen...
16115714 - Lies, damned lies and statistics? reliability and personal accounts of smoking among yo...
Publication Detail:
Type:  Journal Article    
Journal Detail:
Title:  IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM     Volume:  8     ISSN:  1557-9964     ISO Abbreviation:  IEEE/ACM Trans Comput Biol Bioinform     Publication Date:    2011 Sep-Oct
Date Detail:
Created Date:  2011-07-22     Completed Date:  -     Revised Date:  -    
Medline Journal Info:
Nlm Unique ID:  101196755     Medline TA:  IEEE/ACM Trans Comput Biol Bioinform     Country:  United States    
Other Details:
Languages:  eng     Pagination:  1283-95     Citation Subset:  IM    
Affiliation:
University of California, San Diego, La Jolla.
Export Citation:
APA/MLA Format     Download EndNote     Download BibTex
MeSH Terms
Descriptor/Qualifier:

From MEDLINE®/PubMed®, a database of the U.S. National Library of Medicine


Previous Document:  Fast flexible modeling of RNA structure using internal coordinates.
Next Document:  Probabilistic models for semisupervised discriminative motif discovery in DNA sequences.