Document Detail

Is multiple-sequence alignment required for accurate inference of phylogeny?
MedLine Citation:
PMID:  17454975     Owner:  NLM     Status:  MEDLINE    
The process of inferring phylogenetic trees from molecular sequences almost always starts with a multiple alignment of these sequences but can also be based on methods that do not involve multiple sequence alignment. Very little is known about the accuracy with which such alignment-free methods recover the correct phylogeny or about the potential for increasing their accuracy. We conducted a large-scale comparison of ten alignment-free methods, among them one new approach that does not calculate distances and a faster variant of our pattern-based approach; all distance-based alignment-free methods are freely available (as the Python package decaf+py). We show that most methods exhibit a higher overall reconstruction accuracy in the presence of high among-site rate variation. Under all conditions that we considered, variants of the pattern-based approach were significantly better than the other alignment-free methods. The new pattern-based variant achieved a speed-up of an order of magnitude in the distance calculation step, accompanied by a small loss of tree reconstruction accuracy. A method of Bayesian inference from k-mers did not improve on classical alignment-free (and distance-based) methods but may still offer other advantages due to its Bayesian nature. We found the optimal word length k of word-based methods to be stable across various data sets, and we provide parameter ranges for two different alphabets. The influence of these alphabets was analyzed to reveal a trade-off in reconstruction accuracy between long and short branches. We have mapped the phylogenetic accuracy for many alignment-free methods, among them several recently introduced ones, and increased our understanding of their behavior in response to biologically important parameters. In all experiments, the pattern-based approach emerged as superior, at the expense of higher resource consumption. Nonetheless, no alignment-free method that we examined recovers the correct phylogeny as accurately as does an approach based on maximum-likelihood distance estimates of multiply aligned sequences.
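
As an illustration of what a word-based, alignment-free distance can look like, the following minimal Python sketch counts overlapping words of length k in each sequence and compares the resulting word-frequency vectors with a squared Euclidean distance. It is not the decaf+py API and not necessarily one of the ten methods evaluated in the study; the function names (kmer_counts, kmer_distance, distance_matrix) and the choice of distance measure are assumptions made for this example only.

```python
from collections import Counter
from itertools import combinations


def kmer_counts(seq, k):
    """Count the overlapping words of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a, b, k):
    """Squared Euclidean distance between the k-mer frequency vectors
    of sequences a and b (hypothetical helper, illustration only)."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    na, nb = sum(ca.values()), sum(cb.values())
    if na == 0 or nb == 0:
        raise ValueError("both sequences must contain at least k symbols")
    # Words absent from one sequence contribute a frequency of zero there.
    words = set(ca) | set(cb)
    return sum((ca[w] / na - cb[w] / nb) ** 2 for w in words)


def distance_matrix(seqs, k):
    """Pairwise distances for a dict mapping names to unaligned sequences."""
    names = list(seqs)
    dist = {(n, n): 0.0 for n in names}
    for x, y in combinations(names, 2):
        dist[(x, y)] = dist[(y, x)] = kmer_distance(seqs[x], seqs[y], k)
    return dist


if __name__ == "__main__":
    toy = {"a": "ACGTAC", "b": "ACGTTC", "c": "TTTTTT"}
    for pair, d in sorted(distance_matrix(toy, k=2).items()):
        print(pair, round(d, 3))
```

A matrix of this kind could then be passed to a distance-based tree-building method such as neighbor joining. The word length k is the parameter whose optimal range the abstract reports as stable across data sets; the value used in the toy call above is arbitrary.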
Authors:  Michael Höhl; Mark A Ragan
Publication Detail:
Type:  Evaluation Studies; Journal Article; Research Support, Non-U.S. Gov't    
Journal Detail:
Title:  Systematic biology     Volume:  56     ISSN:  1063-5157     ISO Abbreviation:  Syst. Biol.     Publication Date:  2007 Apr 
Date Detail:
Created Date:  2007-04-24     Completed Date:  2007-07-12     Revised Date:  -    
Medline Journal Info:
Nlm Unique ID:  9302532     Medline TA:  Syst Biol     Country:  England    
Other Details:
Languages:  eng     Pagination:  206-21     Citation Subset:  IM    
Affiliation:  Australian Research Council Centre in Bioinformatics, and Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD 4072, Australia.
MeSH Terms
Bayes Theorem
Computational Biology / methods
Likelihood Functions
Models, Genetic
Sequence Alignment*
Sequence Analysis / methods*

From MEDLINE®/PubMed®, a database of the U.S. National Library of Medicine
