Document Detail


An ILP solution for the gene duplication problem.
MedLine Citation:
PMID:  21342543     Owner:  NLM     Status:  MEDLINE    
Abstract/OtherAbstract:
BACKGROUND: The gene duplication (GD) problem seeks a species tree that implies the fewest gene duplication events across a given collection of gene trees. Solving this problem makes it possible to use large gene families with complex histories of duplication and loss to infer phylogenetic trees. However, the GD problem is NP-hard, and therefore, most analyses use heuristics that lack any performance guarantee.
RESULTS: We describe the first integer linear programming (ILP) formulation to solve instances of the gene duplication problem exactly. With simulations, we demonstrate that the ILP solution can solve problem instances with up to 14 taxa. Furthermore, we apply the new ILP solution to solve the gene duplication problem for the seed plant phylogeny using a 12-taxon, 6,084-gene data set. The unique, optimal solution, which places Gnetales sister to the conifers, represents a new, large-scale genomic perspective on one of the most puzzling questions in plant systematics.
CONCLUSIONS: Although the GD problem is NP-hard, our novel ILP solution for it can solve instances with data sets consisting of as many as 14 taxa and 1,000 genes in a few hours. These are the largest instances that have been solved to optimally to date. Thus, this work can provide large-scale genomic perspectives on phylogenetic questions that previously could only be addressed by heuristic estimates.
Authors:
Wen-Chieh Chang; Gordon J Burleigh; David F Fernández-Baca; Oliver Eulenstein
Related Documents :
21358903 - The effects of variable-time delivery of food items and praise on problem behavior rein...
20086443 - Obesity epidemic: time to swallow the frog.
16592283 - On complexifications of real manifolds.
Publication Detail:
Type:  Journal Article; Research Support, U.S. Gov't, Non-P.H.S.     Date:  2011-02-15
Journal Detail:
Title:  BMC bioinformatics     Volume:  12 Suppl 1     ISSN:  1471-2105     ISO Abbreviation:  BMC Bioinformatics     Publication Date:  2011  
Date Detail:
Created Date:  2011-02-23     Completed Date:  2011-04-08     Revised Date:  2011-07-26    
Medline Journal Info:
Nlm Unique ID:  100965194     Medline TA:  BMC Bioinformatics     Country:  England    
Other Details:
Languages:  eng     Pagination:  S14     Citation Subset:  IM    
Affiliation:
Department of Computer Science, Iowa State University, Ames 50011, USA. wcchang@iastate.edu
Export Citation:
APA/MLA Format     Download EndNote     Download BibTex
MeSH Terms
Descriptor/Qualifier:
Algorithms
Computer Simulation
Gene Duplication*
Genome, Plant
Genomics / methods
Phylogeny*
Plants / genetics*
Programming, Linear*
Comments/Corrections

From MEDLINE®/PubMed®, a database of the U.S. National Library of Medicine


Previous Document:  A clique-based method for the edit distance between unordered trees and its application to analysis ...
Next Document:  Maximum likelihood models and algorithms for gene tree evolution with duplications and losses.