Document Detail

MC-Net: a method for the construction of phylogenetic networks based on the Monte-Carlo method.
Jump to Full Text
MedLine Citation:
PMID:  20727135     Owner:  NLM     Status:  MEDLINE    
Abstract/OtherAbstract:
BACKGROUND: A phylogenetic network is a generalization of phylogenetic trees that allows the representation of conflicting signals or alternative evolutionary histories in a single diagram. There are several methods for constructing these networks. Some of these methods are based on distances among taxa. In practice, the methods which are based on distance perform faster in comparison with other methods. The Neighbor-Net (N-Net) is a distance-based method. The N-Net produces a circular ordering from a distance matrix, then constructs a collection of weighted splits using circular ordering. The SplitsTree which is a program using these weighted splits makes a phylogenetic network. In general, finding an optimal circular ordering is an NP-hard problem. The N-Net is a heuristic algorithm to find the optimal circular ordering which is based on neighbor-joining algorithm.
RESULTS: In this paper, we present a heuristic algorithm to find an optimal circular ordering based on the Monte-Carlo method, called MC-Net algorithm. In order to show that MC-Net performs better than N-Net, we apply both algorithms on different data sets. Then we draw phylogenetic networks corresponding to outputs of these algorithms using SplitsTree and compare the results.
CONCLUSIONS: We find that the circular ordering produced by the MC-Net is closer to optimal circular ordering than the N-Net. Furthermore, the networks corresponding to outputs of MC-Net made by SplitsTree are simpler than N-Net.
Authors:
Changiz Eslahchi; Mahnaz Habibi; Reza Hassanzadeh; Ehsan Mottaghi
Related Documents :
9797405 - Molecular evolution modeled as a fractal renewal point process in agreement with the di...
15465695 - Detection of tree roots and determination of root diameters by ground penetrating radar...
22212205 - Indigenous bali cattle is most suitable for sustainable small farming in indonesia.
22722735 - Body adiposity index assess body fat with high accuracy in nondialyzed chronic kidney d...
24062635 - Individual differences in social information gathering revealed through bayesian hierar...
23562915 - Mathematics for biomathics.
Publication Detail:
Type:  Journal Article; Research Support, Non-U.S. Gov't     Date:  2010-08-20
Journal Detail:
Title:  BMC evolutionary biology     Volume:  10     ISSN:  1471-2148     ISO Abbreviation:  BMC Evol. Biol.     Publication Date:  2010  
Date Detail:
Created Date:  2010-09-14     Completed Date:  2010-12-22     Revised Date:  2013-05-28    
Medline Journal Info:
Nlm Unique ID:  100966975     Medline TA:  BMC Evol Biol     Country:  England    
Other Details:
Languages:  eng     Pagination:  254     Citation Subset:  IM    
Affiliation:
Faculty of Mathematics, Shahid Beheshti University, GC, Tehran, Iran. ch-eslahchi@sbu.ac.ir
Export Citation:
APA/MLA Format     Download EndNote     Download BibTex
MeSH Terms
Descriptor/Qualifier:
Algorithms
Computational Biology / methods
Models, Theoretical
Monte Carlo Method*
Phylogeny*
Comments/Corrections

From MEDLINE®/PubMed®, a database of the U.S. National Library of Medicine

Full Text
Journal Information
Journal ID (nlm-ta): BMC Evol Biol
ISSN: 1471-2148
Publisher: BioMed Central
Article Information
Copyright ©2010 Eslahchi et al; licensee BioMed Central Ltd.
open-access:
Received Day: 9 Month: 9 Year: 2009
Accepted Day: 20 Month: 8 Year: 2010
collection publication date: Year: 2010
Electronic publication date: Day: 20 Month: 8 Year: 2010
Volume: 10First Page: 254 Last Page: 254
Publisher Id: 1471-2148-10-254
PubMed Id: 20727135
DOI: 10.1186/1471-2148-10-254

MC-Net: a method for the construction of phylogenetic networks based on the Monte-Carlo method
Changiz Eslahchi1 Email: ch-eslahchi@sbu.ac.ir
Mahnaz Habibi1 Email: mhabibi@ipm.ir
Reza Hassanzadeh12 Email: re.hassanzadeh@mail.sbu.ac.ir
Ehsan Mottaghi1 Email: mottaghi.ehsan@mail.sbu.ac.ir
1Faculty of Mathematics, Shahid Beheshti University, G.C., Tehran, Iran
2School of Computer Science, Institute for Studies in Theoretical Physics and Mathematics (IPM), Tehran, Iran

Background

Phylogenetics is concerned with the construction and analysis of phylogenetic trees or networks to understand the evolution of species, populations, and individuals. Evolutionary processes such as hybridization between species, lateral transfer of genes, recombination within a population, and convergent evolution can all lead to evolutionary histories that are distinctly non-treelike. Moreover, even when the underlying evolution is treelike, the presence of conflicting or ambiguous signals can make a single tree representation inappropriate. In these situations, phylogenetic network methods can be particularly useful.

Phylogenetic network is a generalization of phylogenetic trees that can represent several trees simultaneously. For any network construction method, the conflicting signals should be represented in the network but it is vital that the network does not depict more conflict than is found in the data. At the same time, when the data fits well to a tree, the method should return a network that is close to a tree. Recently, in addition to biology, the phylogenetic networks methods are widely used for classifying different types of data such as those finding in linguistics, music, etc. There are many different methods to construct phylogenetic trees or networks which are based on distance matrix such as ME (minimum evolution) [1], LS (least squares) [2,3], NJ (neighbor-joining) [4], AddTree [5], N-Net (neighbor-net) [6] and Q-Net [7]. All these methods are called distance-based methods.

ME is one of the most well-known methods. It was first introduced by Kidd and Sgamarella-Zonta [1]. Given a distance matrix, the ME principle consists of selecting the tree whose length (sum of its branch lengths) is minimal among all tree topologies for taxa. Comparative studies of tree-building methods show that ME generally is an accurate criterion for selecting a true tree. Nei and Rzhetsky have shown that ME principle is statistically consistent when branch lengths are assigned by ordinary least-squares (OLS) fitting [8]. In the OLS framework, we simply minimize


∑i,j∈X(dij −δi,j)2,

where δij is an estimation of input dij and X is the set of taxa. In fact, the main goal is to find a tree whose induced metric is closer to dij. The LS was first introduced in [2] and [3].

Nearly 20 years have passed by since the landmark paper in Molecular Biology and Evolution introducing NJ [4]. The method has become the most widely used method for building phylogenetic trees from distances. Steel and Gascuel showed that NJ is a greedy algorithm for ME principle [9]. The N-Net is a hybrid of NJ and split decomposition [10]. It is applicable to data sets containing hundreds of taxa. The N-Net is an algorithm for constructing phylogenetic networks.

Split decomposition, implemented in SplitsTree [11], decomposes the distance matrix into simple components based on weighted splits. These splits are then represented using a special type of phylogenetic network called split network. The N-Net works in a similar way: it first produces a circular ordering from distance matrix and then constructs a collection of weighted splits. Dan Levy and Lior Patcher showed that the N-Net is a greedy algorithm for the traveling salesman problem that minimizes the balanced length of the split system at every step and it is optimal for circular distance matrices [12]. Balanced minimum evolution (BME) is designed under the ME principle [13]. The BME is a special version of the ME principle where tree length is estimated by the weighted least squares [13].

In this work, we introduce MC-Net algorithm (Monte-Carlo Network algorithm) which works in a similar way: First, it finds a circular ordering for taxa, based on Monte-Carlo with simulated annealing, it then extracts splits from the circular ordering and uses non-negative least squares for weighting splits. We compare the results of the N-Net and the MC-Net for several data sets.

Preliminaries

A split of a given set X of taxa is a bipartition of the set X into two non-empty subsets of X. A split is called trivial if one of the two subsets contains only one taxon. Let T be a non-empty tree. Let the leaves of the T are labeled by the set of taxa, X ={x1,...,xn}. Every edge e of T defines a split S = A|B, where A and B are two sets of taxa contained in the two components of T - e. For example, Figure 1 shows an eight-leaf tree. Removing the edge e from the tree produces two sets of leaves


A={t3,t4,t5}  and  B={t1,t2,t6,t7,t8}.

In an edge-weighted tree, the weight of each edge is assigned to its corresponding split. The Phyletic distance between any two taxa x and y in an edge-weighted tree is the sum of the weights of the edges along the path from x to y. Hence, the phyletic distance between x and y equals the sum of split weights for all those splits in which x and y belong to separate components.

Two different splits S1 = A1|B1, and S2 = A2|B2, are compatible, if one of the following conditions holds:


A1⊆A2,A1⊆B2,B1⊆A2  or  B1⊆B2.

A collection of splits is called compatible, if all possible pairing of splits are compatible. A compatible collection of splits is represented by a phylogenetic tree [14,15]. Dress and Huson introduced SplitsTree to display more complex evolutionary patterns [16]. For a set of incompatible splits, SplitsTree outputs the split network using bands of parallel edges.

Circular collection of splits is a mathematical generalization of compatible collections of splits. Formally, a collection of splits of X is circular if there exists an ordering x1,⋯,xn of X such that every split is of the form {xi, xi+1,⋯,xj}|X - {xi,⋯,xj} for some i and j, 1 ≤ i j n. A Compatible collection of splits are always circular [10]. On the other hand, the class of circular collection of splits contains the class of the collection of splits corresponding to a tree. Andreas Dress and Daniel Huson proved that circular collections of splits always have a planar splits graph representation [16]. A distance matrix is circular (also called Kalmanson) if it is the phyletic distances for a circular collection of splits with positive weights. Because compatible splits are circular, treelike distances are circular too [6].

As mentioned above, the ME principle consists of selecting a tree whose length is minimal. In fact, the ME principle is equivalent to finding a circular ordering σ = {xσ(1),...,xσ(n)} in order to find the minimum of the function η

[Formula ID: bmcM1]
(1) 
η:∑→ℜη(σ)=d(xσ(1),xσ(n))+∑k=1n−1d(xσ(k),xσ(k+1))

Where Σ is the set of all circular orderings of taxa x1,...,xn. We call function η the energy function, and any circular ordering that minimizes η is called the optimal circular ordering.


Methods

There are a number of different methods for constructing various kinds of phylogenetic networks. A phylogenetic network can be constructed from a collection of weighted splits. N-Net uses circular ordering to construct a collection of weighted splits. Since finding an optimal circular ordering is an NP-hard problem, so we introduce a heuristic algorithm based on the Monte-Carlo method to find optimal circular ordering. The MC-Net seeks to find an optimal circular ordering from the distance matrix and then extracts a collection of weighted splits based on that ordering.

Algorithms

In this section, a new algorithm called the MC-Net, is presented to construct a set of weighted splits for taxa set X = {x1,...,xn}with a given distance matrix. The MC-Net consists of two steps. In the first step, we find a circular ordering. In the second step, the splits which are obtained from the circular ordering are weighted. The core of the first step contains two procedures, namely, INITIAL and the Monte-Carlo. The INITIAL is a greedy algorithm to obtain a circular ordering, namely, the initial circular ordering. The INITIAL works in the following way:

Suppose xσ(1),...,xσ(k) are ordered and let be an element of S = X - {xσ(1),...,xσ(k)} such that


d(x¯,r)=min{d(x,r)|x∈S,r∈{xσ(1),xσ(k)}}.

If r = xσ(1), we consider the new ordering x¯,xσ(1),…,xσ(k). Otherwise the ordering xσ(1),…,xσ(k),x¯ is considered. This process stops when all taxa are ordered.

The second procedure, or the Monte-Carlo procedure, relies on random iteration to find the optimal circular ordering. The Monte-Carlo algorithm starts its movement from the initial circular ordering, σ0 For each circular ordering σ, we define the neighborhood of σ, N (σ), by:


N(σ)={σ~∈∑|∃k;  2≤k≤n−1;                                 σ~={xσ(1),…,xσ(k−1),xσ(k+1),…,xσ(n),xσ(k)}},

where Σ is the set of all circular orderings.

We choose σ1 N (σ0) randomly. if η (σ1) ≤ η (σ0), then the system moves into ordering σ1. However we allow non-greedy movements for the system in order to avoid having the system trapped in local minima. In other words, if η (σ1) >η (σ0), then the system moves into ordering σ1 with a small probability e−η(σ1)+η(σ0)T, where T is a constant temperature. For each temperature, these movements are carried out t times. To compute the minimum energy we allow this process to continue until the temperature drops to zero (see the appendix for more details). Pseudo code of the Monte-Carlo algorithm is shown in Table 1. It is notticeable that the second procedure can start from any circular ordering other than the one obtained by the INITIAL procedure.

In the final step, we use the least squares algorithm to weight the splits of obtained circular ordering. Let A be the matrix with rows indexed by pairs of taxa and columns indexed by splits. Then for each pair of texa i and j and for each split k, Aij,k is defined by:


Aij,k={1if i and j are on opposite of split k;0otherwise.

The matrix A = [Aij,k] is full rank [17].

Let d = (d12, d13,...,d(n-1)n) be an n(n - 1)/2 dimensional vector corresponding to rows of A where dij is obtained by distance matrix. Let b be the weight vector of splits, then the phyletic distance vector is p = Ab.

Now, the ordinary least squares(OLS) is used to estimate b by the following standard formula

[Formula ID: bmcM2]
(2) 
b=(A′A)−1A′d.

If we discard splits with negative weights and leave the remaining splits unchanged, the weight of the remaining splits are often grossly overestimated. Similar to the N-Net algorithm, we compute the optimal least square estimates with a non-negative constraint. In this paper, we use the FNNLS algorithm [18].


Results and Discussion

In this section, we compare the results of the MC-Net and the N-Net on some data sets. We use SplitsTree4 program [19] for drawing phylogenetic networks. Due to the limitation of space, we insert only six figures in this article.

Data sets

One of the data sets, a collection of 110 Salmonella MLST Data, was obtained from authors of the N-Net. The other data sets presented as the examples in SplitsTree4 program (version 4.10): Its(46 taxa), Jsa (46 taxa), Mammals (30 taxa), Primates (12 taxa), Rubber (23 taxa), Dolphins (36 taxa) and Myosin (143 taxa).

Optimal threshold for cooling coefficient and Tlow

There are two parameters, Tlow and cooling coefficient, in the Monte-Carlo procedure. We first adjust Tlow between 105 and 0.2 to obtain the best cooling coefficient. The value of energy function and running time of algorithm for each Tlow for JSA data are given in Figure 2 (for the other data sets, the figures are the same as JSA). According to Figure 2, when cooling coefficient is 0.95, running time of the algorithm compared to other coefficients increases considerably. On the other hand, the value of energy function for 0.95 or 0.9 as a cooling coefficient is significantly better than the other cooling coefficients. Hence, we conclude that the best value of energy function with respect to running time of the algorithm is achieved when cooling coefficient is 0.9 and Tlow < 10-3.


Results

The initial test for performance of our method is done by calculating the value of energy function for circular orderings obtained by the MC-Net and the N-Net (Table 2). The first two rows of Table 2 show that in all data sets except Salmonella, the value of energy function for the MC-Net is less than those obtained from the N-Net. The interesting feature of the MC-Net algorithm is in finding different circular orderings by changing initial ordering. So, the MC-Net algorithm could take the circular ordering obtained by the N-Net as initial ordering. The third row of Table 2 shows the values of energy function for circular orderings achieved by the MC-Net with the circular ordering obtained by the N-Net as an initial ordering. For four data sets, Its, Rubber, Salmonella, Myosin, the third row indicates better results than the first row. But for the other data sets, the conclusions mentioned above are the vice versa.

Another test for the performance of our method is comparing the number of splits obtained by both the algorithms. In Table 3, the number of splits of circular orderings obtained by the MC-Net and the N-Net on different data sets are shown. In all data sets the number of splits obtained by the MC-Net is less than the N-Net except Primates. In this case, these two numbers are equal.

Let d be the input distance vector and P and P' are the phyletic distance vector of weighted splits obtained by the MC-Net and the N-Net, respectively. In Table 4, the value of norm of P - d and P' - d for each data set are shown. The norm of P - d is less than P' - d in all data sets even in Primates. It means that the results of the MC-Net algorithm give better approximation for input distance vector.

To illustrate difference between two algorithms, we present some examples of networks obtained by both the MC-Net and the N-Net using SplitsTree4 (Figures 3,4,5,6, 7 and 8). It is obvious that both algorithms give the same classification of taxa and exhibit the same major splits. For example, in Figures 5 and 6, we highlight some edges such that by removing the same-colored edges, the same clustering of taxa is obtained. But according to what we see in Tables 3 and 4, split networks obtained by the MC-Net are less complicated than split networks obtained by the N-Net. It means that the networks obtained by the MC-Net have less noise than the networks obtained by the N-Net. According to Corollary 1 (see Appendix), when t approaches to 1, the MC-Net finds optimal circular ordering with the probability 1. We examined our algorithm on several treelike distance matrices and it returned corresponding trees quickly. The MC-Net has been implemented in Matlab and is available for download at http://bioinf.cs.ipm.ac.ir/softwares/mc.net.


Conclusions

In this work, we propose an algorithm, MC-Net, which is a distance based method for constructing phylogenetic networks. The MC-Net scales well and can quickly produce detailed and informative networks for large number of taxa. We compare the performance of the MC-Net with the N-Net on eight different data sets. We have shown (Tables 2, 3 and 4) that the MC-Net performs better than the N-Net for almost test cases and the networks obtained by the MC-Net are simpler than the N-Net with the same major splits. The N-Net is a part of SplitsTree program. So, the results of the MC-Net could be used in SplitsTree program too.


Authors' contributions

CE, RH and EM performed initial studies. MH designed the algorithm. RH and EM analysis the data sets. All authors participated in the writing of the manuscript. All authors read and approved the final manuscript.


Appendix

Let S = {E1,...,Es} be a finite set of states, and consider a physical process having these discrete states at time t. A Markov chain is a stochastic model of this system, such that the state of system at time t + 1 depends only on the state of system at time t.

Consider X0, X1,..., be a collection of Markov random variables, such that Xn is the state of the system at time n. Let pij be the probability that the system enters into the state Ej from the state Ei, where i, j ∈ {1,...,s} The matrix P = (pij)1≤i, js is called transition matrix. A probability distribution q = (q1,...,qs) such that qi is the probability that system starts its movement from the state Ei , is called initial probability distribution. A Markov chain is a stochastic model X0, X1,..., such that Xt is the state of the system at time t. For each i and j in {1,2,...,s};


prob(X0=Ei)=qi,prob(Xt+1=Ej|Xt=Ei)=pij.

The Markov chain is irreducible, if for all i, j ∈ {1,...,s} there exists n > 0 such that pij(n)>0, where


∀α   pij(n)=prob(Xn+α=Ej|Xα=Ei).

In other words, the Markov chain is irreducible, if there exist n such that the probability that the system enters into the state Ej from the state Ei after n times is positive. The irreducible Markov chain is called aperiodic, if for some n ≥ 0 and some state Ej,


prob(Xn=Ej|X0=Ej)>0&prob(Xn+1=Ej|X0=Ej)>0.

Theorem 1(Convergence to stationary Markov chain, [20])

If the Markov chain is irreducible and aperiodic then


limt→∞prob(Xt=Ej)=πj      j=1,…,s

such that π = (π1,...,πs) is a unique probability distribution and πj=∑i=1sπipij.

The probability distribution is π is called stationary probability of the Markov chain.

It means that if P is the transition matrix and P(t) is the tth power of P, when t → ∞ the jth column of transition matrix is approximately equal to πj. In the Monte-Carlo algorithm, a special kind of Markov chain is used. Let Σ be the finite set of states and q=(1|Σ|,…,1|Σ|) is the initial probability distribution. For each state i the neighborhood of i, N(i), is defined as the set of all the states that are reachable from i by one movement. In this system the set of neighborhoods have to satisfy the following properties:

1. i, ∉ N(i).

2. i N(j) ⇔ j N(i)

3. if i ≠ j, then there exit i1,i2,...,i1 ∈ Σ such that


i∈N(i1),i1∈N(i2),…,il∈N(j).

The matrix PT=(pijT)i,j∈Σ is defined as the transition matrix by


pijT={1|N(i)|if j∈N(i) and η(j)≤η(i),e−(η(j)−η(i))/T|N(i)|if j∈N(i) and η(j)>η(i),1−∑k∈Σ,k≠ipikTif i=j,0otherwise,

where T is a positive constant number (constant temperature). The third property of the neighborhood shows that this Markov chain is irreducible. Also, if PiiT>0 and PT contains non-negative entries then (PT)ii(t)>0 for all t ≥ 0. So, it is a finite, aperiodic and irreducible Markov chain. The theorem 1 shows that for each constant temperature T and i ∈ Σ, there exists a stationary probability distribution πiT such that:


limt→∞prob(Xt=i)=πiT,

Where πiT=e−η(i)T∑j∈∑e−η(j)T (see page 45 in [20]).

Proposition 1. Let (πiT)i∈Σ be a probability distribution such that:


πiT=e−η(i)T∑j∈∑e−η(j)T

and suppose that m0 = min{η(i) | i ∈ Σ} and, η0 = {i ∈ Σ | η(i) = m0} then for each i∈Σ,limT→0+πiT=πi0, where


πi0={1|η0|if i∈η0;0otherwise.

Proof: The proof is presented in [20] (claim 2.8 and claim 2.9).

Corollary 1. Let Σ be the finite set of states, then for each i ∈ Σ we have


limT→0+limt→∞prob(Xt=i)=πi0.

The corollary 1 illustrates that by cooling temperature (T → 0+), system enters into one of the states of η0 with the probability 1 after t (t → ∞) time. In this article, we define the set of all circular orderings of taxa as the finite set of states. Our definition of neighborhood in the MC-Net satisfies in three properties of neighborhood and every elements of η0 is an optimal circular ordering. Therefore, the MC-Net yields a circular ordering with approximately minimal energy function.


Acknowledgements

We are grateful to the faculty of mathematics of Shahid Beheshti University. This work is supported in part by IPM(cs-1385-02). The authors would like to thank Prof. Hamid Pezeshk for many useful comments.


References
Kidd KK,Sgamarella-Zonta LA,Phylogenetic analysis: concepts and methodsAm J Human GeneticsYear: 197123235252
Cavalli-Sforza LL,Edwards AWF,Phylogenetic analysis: models and estimating proceduresAm J Hum GenetYear: 196719233257 (1967). 6026583
Fitch WM,Margoliash E,Construction of phylogenetic treesScienceYear: 196715527928410.1126/science.155.3760.2795334057
Saitou N,Nei N,The neighbor joining method: a new method for reconstructing phylogenetic treesMolecular Biology and EvolutionYear: 198744064253447015
Sattath S,Tversky A,Phylogenetic similarity treesPsychometrikaYear: 19774231934510.1007/BF02293654
Bryant D,Moulton V,NeighborNet: An agglomerative method for the construction of planar phylogenetic networksMolecular Biology And EvolutionYear: 20042125526510.1093/molbev/msh01814660700
Grunewald S,Forslund K,Dress A,Moulton V,QNet: An agglomerative method for the construction of phylogenetic networks from weighted quartetsMolecular Biology and EvolutionYear: 20072453253810.1093/molbev/msl18017119010
Rzhetsky A,Nei M,Theoretical foundation of the minimum-evolution method of phylogenetic inferenceMol Biol EvolYear: 199310107310958412650
Gascuel O,Steel M,Neighbor-joining revealedMolecular Biology and EvolutionYear: 2006231997200010.1093/molbev/msl07216877499
Bandelt HJ,Dress AWM,Split decomposition: A new and useful approach to phylogenetic analysis of distance dataMol Phyl EvolYear: 1992124225210.1016/1055-7903(92)90021-8
Huson DH,SplitsTree: A program for analyzing and visualizing evolutionary dataBioinformaticsYear: 19981410687310.1093/bioinformatics/14.1.689520503
Levy D,Patcher L,The Neighbor-Net AlgorithmAdvances in Applied Mathematics in press . 20161590
Desper R,Gascuel O,Gascuel OThe Minimum-Evolution Distance Based Approach to Phylogenetic InferenceMath Evolution and PhylogenyYear: 2005Oxford Univ. Press
Semple C,Steel M,Cyclic permutations and evolutionary treesAdv Appl MathYear: 200432466968010.1016/S0196-8858(03)00098-8
Semple C,Steel M,PhylogeneticsYear: 2003Oxford, UK: Oxford University Press
Dress A,Huson DH,Constructing splits graphsIEEE/ACM Transactions in Computational Biology and BioinformaticsYear: 2004110911510.1109/TCBB.2004.27
Bandelt H-J,Dress A,A canonical decomposition theory for metrics on a finite setAdv MathYear: 1992924710510.1016/0001-8708(92)90061-O
Bro R,Jong SD,A Fast Non-negativity-constrained Least Squares AlgorithmJournal of ChemometricsYear: 199711539340110.1002/(SICI)1099-128X(199709/10)11:5<393::AID-CEM483>3.0.CO;2-L
Huson D,Bryant D,Application of phylogenetic networks in evolutionary studiesMolecular Biology and EvolutionYear: 20052325426710.1093/molbev/msj03016221896
Clote P,Backofen R,Computational molecular biologyYear: 2000New York, WILEY

Figures

[Figure ID: F1]
Figure 1 

The split S = A|B is obtained by removing the edge e of T.



[Figure ID: F2]
Figure 2 

The value of energy function (b) and running time of algorithm (a) for each Tlow for JSA data.



[Figure ID: F3]
Figure 3 

The N-Net network for the Rubber data set.



[Figure ID: F4]
Figure 4 

The MC-Net network for the Rubber data set.



[Figure ID: F5]
Figure 5 

The N-Net network for the Mammal data set.



[Figure ID: F6]
Figure 6 

The MC-Net network for the Mammal data set.



[Figure ID: F7]
Figure 7 

The N-Net network for the Salmonella data set. Group A includes the isolates Sty54, Sty54*, Sty2, She9, Sty87, Snp40*, Sty13, Snp41*, Sen5, Sha160, Sha141, Sty20*, Sha58, Sse18, Sha71, Sty31. Group B includes the isolates Sty61, Sha148, Smb-17, Sag75, Sha124. Group C includes the isolates UND3, Sha150, Sha173, Sen23*, Sha153, Sha140, San96, Sen30*, Sen24*, Sha138, Sha176, Sha130, Sha164, Sha157, Sen29*, Sca93, Sha122, Sht20, Sha186. Group D includes the isolates She3, Sha50, Sse95, Sha56, Sen24, Sen34, Sha177, Sty13*, Swo44, Sty86, Ste41, Sha77, UND80. Group E includes the isolates Ssc40, Sse28, Sty89, Sty15*, Ske69, UND110, Sha49, Sen4, Sha48, Sha165, Sty92, Snp33*, Sty52, UND109, Sha131, Sha102, Sty6, Sha175.



[Figure ID: F8]
Figure 8 

The MC-Net network for the Salmonella data set. Group A includes the isolates Sty54, Sty54*, Sty2, She9, Sty87, Snp40*, Sty13, Snp41*, Sen5, Sha160, Sha141, Sty20*, Sha58, Sse18, Sha71, Sty31. Group B includes the isolates Sty61, Sha148, Smb-17, Sag75, Sha124. Group C includes the isolates UND3, Sha150, Sha173, Sen23*, Sha153, Sha140, San96, Sen30*, Sen24*, Sha138, Sha176, Sha130, Sha164, Sha157, Sen29*, Sca93, Sha122, Sht20, Sha186. Group D includes the isolates She3, Sha50, Sse95, Sha56, Sen24, Sen34, Sha177, Sty13*, Swo44, Sty86, Ste41, Sha77, UND80. Group E includes the isolates Ssc40, Sse28, Sty89, Sty15*, Ske69, UND110, Sha49, Sen4, Sha48, Sha165, Sty92, Snp33*, Sty52, UND109, Sha131, Sha102, Sty6, Sha175.



Tables
[TableWrap ID: T1] Table 1 

Pseudo code of the Monte-Carlo algorithm with simulated annealing


Input: T initial temperature
  σ0 initial ordering
  Tlow low temperature
  t constant number

σ =σ0
While T > Tlow
Repeat t time
  choose random σ~∈N(σ)
  If η(σ~)≤η(σ)
   σ=σ~
  Else
   x = random(0, 1)
   If x<e−η(σ~)+(σ)T
   σ=σ~
T = T * 0.9
Return σ and η(σ)

[TableWrap ID: T2] Table 2 

Values of energy function: the values of energy function for circular orderings obtained by the N-Net, the MC-Net and the MC-Net with initial ordering of the N-Net.


Data set Its Jsa Mammals Primates
N-Net 0.4096 0.2808 4.4275 2.1465
MC-Net 0.4079 0.2728 4.4172 2.1410
start N-Net 0.3979 0.2767 4.4202 2.1410


Data set Rubber Dolphins Salmonella Myosin

N-Net 0.7723 2.2 0.2546 43.8199
MC-Net 0.7596 2.1667 0.2575 43.8019
start N-Net 0.7547 2.2 0.2515 43.6935

[TableWrap ID: T3] Table 3 

The number of splits obtained by the MC-Net and the N-Net for all data sets.


Data set Its Jsa Mammals Primates
N-Net 110 83 103 34
MC-Net 105 78 99 34


Data set Rubber Dolphins Salmonella Myosin

N-Net 55 67 107 520
MC-Net 53 62 90 507

[TableWrap ID: T4] Table 4 

The value of norm for all data sets.


Data set Its Jsa Mammals Primates
N-Net 0.0444 0.0329 0.0717 0.0385
MC-Net 0.0358 0.0292 0.0648 0.0358


Data set Rubber Dolphins Salmonella Myosin

N-Net 0.0362 0.1068 0.0487 0.0291
MC-Net 0.0316 0.1019 0.0405 0.0207


Article Categories:
  • Research Article


Previous Document:  Kafka, paranoic doubles and the brain: hypnagogic vs. hyper-reflexive models of disrupted self in ne...
Next Document:  Risk factors and correlates for anemia in HIV treatment-naïve infected patients: a cross-sectional ...