Is multiple-sequence alignment required for accurate inference of phylogeny?
MedLine Citation:
PMID: 17454975
Owner: NLM
Status: MEDLINE
Abstract/OtherAbstract:
The process of inferring phylogenetic trees from molecular sequences almost always starts with a multiple alignment of these sequences but can also be based on methods that do not involve multiple sequence alignment. Very little is known about the accuracy with which such alignment-free methods recover the correct phylogeny or about the potential for increasing their accuracy. We conducted a large-scale comparison of ten alignment-free methods, among them one new approach that does not calculate distances and a faster variant of our pattern-based approach; all distance-based alignment-free methods are freely available from http://www.bioinformatics.org.au (as Python package decaf+py). We show that most methods exhibit a higher overall reconstruction accuracy in the presence of high among-site rate variation. Under all conditions that we considered, variants of the pattern-based approach were significantly better than the other alignment-free methods. The new pattern-based variant achieved a speed-up of an order of magnitude in the distance calculation step, accompanied by a small loss of tree reconstruction accuracy. A method of Bayesian inference from k-mers did not improve on classical alignment-free (and distance-based) methods but may still offer other advantages due to its Bayesian nature. We found the optimal word length k of word-based methods to be stable across various data sets, and we provide parameter ranges for two different alphabets. The influence of these alphabets was analyzed to reveal a trade-off in reconstruction accuracy between long and short branches. We have mapped the phylogenetic accuracy for many alignment-free methods, among them several recently introduced ones, and increased our understanding of their behavior in response to biologically important parameters. In all experiments, the pattern-based approach emerged as superior, at the expense of higher resource consumption. Nonetheless, no alignment-free method that we examined recovers the correct phylogeny as accurately as does an approach based on maximum-likelihood distance estimates of multiply aligned sequences.
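To make the idea of a word-based, alignment-free distance concrete, the sketch below counts overlapping words of length k in each sequence and takes the squared Euclidean distance between the two count vectors. It is only an illustration of the general technique: it is not the decaf+py API, it is not claimed to be one of the ten methods compared in the study, and the function names and the toy sequences are hypothetical.

```python
from collections import Counter
from itertools import combinations


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping word of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a: str, b: str, k: int) -> int:
    """Squared Euclidean distance between the k-mer count vectors of a and b."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    # A Counter returns 0 for words it has not seen, so the union of the
    # observed words covers every non-zero coordinate of either vector.
    return sum((ca[w] - cb[w]) ** 2 for w in ca.keys() | cb.keys())


def distance_matrix(seqs: dict[str, str], k: int) -> dict[tuple[str, str], int]:
    """Pairwise distances for all unordered pairs of sequence names."""
    return {
        (x, y): kmer_distance(seqs[x], seqs[y], k)
        for x, y in combinations(sorted(seqs), 2)
    }


if __name__ == "__main__":
    # Hypothetical toy sequences, for illustration only.
    toy = {"s1": "ACGTACGTAC", "s2": "ACGTTCGTAC", "s3": "TTTTGGGGCC"}
    for pair, d in distance_matrix(toy, k=3).items():
        print(pair, d)
```

A pairwise matrix of this kind could then be passed to a distance-based tree-building method such as neighbor joining. No multiple alignment is computed at any point, and the word length k is the only parameter that needs tuning.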
Authors:
Michael Höhl; Mark A Ragan
Publication Detail:
Type: Evaluation Studies; Journal Article; Research Support, Non-U.S. Gov't
Journal Detail:
Title: Systematic biology
Volume: 56
ISSN: 1063-5157
ISO Abbreviation: Syst. Biol.
Publication Date: 2007 Apr
Date Detail:
Created Date: 2007-04-24
Completed Date: 2007-07-12
Revised Date: -
Medline Journal Info:
Nlm Unique ID: 9302532
Medline TA: Syst Biol
Country: England
Other Details:
Languages: eng
Pagination: 206-21
Citation Subset: IM
Affiliation:
Australian Research Council Centre in Bioinformatics, and Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD 4072, Australia.
MeSH Terms
Descriptor/Qualifier:
Bayes Theorem; Computational Biology / methods; Likelihood Functions; Models, Genetic; Phylogeny*; Sequence Alignment*; Sequence Analysis / methods*
From MEDLINE®/PubMed®, a database of the U.S. National Library of Medicine