Document Detail

Comparison of gene regulatory networks via steady-state trajectories.
Jump to Full Text
MedLine Citation:
PMID:  18309365     Owner:  NLM     Status:  PubMed-not-MEDLINE    
Abstract/OtherAbstract:
The modeling of genetic regulatory networks is becoming increasingly widespread in the study of biological systems. In the abstract, one would prefer quantitatively comprehensive models, such as a differential-equation model, to coarse models; however, in practice, detailed models require more accurate measurements for inference and more computational power to analyze than coarse-scale models. It is crucial to address the issue of model complexity in the framework of a basic scientific paradigm: the model should be of minimal complexity to provide the necessary predictive power. Addressing this issue requires a metric by which to compare networks. This paper proposes the use of a classical measure of difference between amplitude distributions for periodic signals to compare two networks according to the differences of their trajectories in the steady state. The metric is applicable to networks with both continuous and discrete values for both time and state, and it possesses the critical property that it allows the comparison of networks of different natures. We demonstrate application of the metric by comparing a continuous-valued reference network against simplified versions obtained via quantization.
Authors:
Marcel Brun; Seungchan Kim; Woonjung Choi; Edward R Dougherty
Related Documents :
19213135 - Dense graphlet statistics of protein interaction and random networks.
1862075 - Olfactory computation and object perception.
7922685 - A comparison of radial basis function and backpropagation neural networks for identific...
22060175 - Neural network modelling and dynamical system theory: are they relevant to study the go...
18263105 - Managing search complexity in linguistic geometry.
643935 - Characteristics of physical therapy role models.
Publication Detail:
Type:  Journal Article    
Journal Detail:
Title:  EURASIP journal on bioinformatics & systems biology     Volume:  -     ISSN:  1687-4145     ISO Abbreviation:  EURASIP J Bioinform Syst Biol     Publication Date:  2007  
Date Detail:
Created Date:  2008-02-29     Completed Date:  2010-06-28     Revised Date:  2012-04-26    
Medline Journal Info:
Nlm Unique ID:  101263720     Medline TA:  EURASIP J Bioinform Syst Biol     Country:  United States    
Other Details:
Languages:  eng     Pagination:  82702     Citation Subset:  -    
Affiliation:
Computational Biology Division, Translational Genomics Research Institute, Phoenix, AZ 85004, USA.
Export Citation:
APA/MLA Format     Download EndNote     Download BibTex
MeSH Terms
Descriptor/Qualifier:
Comments/Corrections

From MEDLINE®/PubMed®, a database of the U.S. National Library of Medicine

Full Text
Journal Information
Journal ID (nlm-ta): EURASIP J Bioinform Syst Biol
Journal ID (publisher-id): BSB
ISSN: 1687-4145
ISSN: 1687-4153
Publisher: Hindawi Publishing Corporation
Article Information
Download PDF
Copyright ? 2007 Marcel Brun et al.
open-access: This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Received Day: 31 Month: 7 Year: 2006
Accepted Day: 24 Month: 2 Year: 2007
Print publication date: Year: 2007
Electronic publication date: Day: 23 Month: 5 Year: 2007
Volume: 2007E-location ID: 82702
ID: 1950252
PubMed Id: 18309365
DOI: 10.1155/2007/82702

Comparison of Gene Regulatory Networks via Steady-State Trajectories
Marcel Brun*1
Seungchan Kim1, 2a2
Woonjung Choi3
Edward R. Dougherty1, 4, 5a4a5
1Computational Biology Division, Translational Genomics Research Institute, Phoenix, AZ 85004, USA
2School of Computing and Informatics, Ira A. Fulton School of Engineering, Arizona State University, Tempe, AZ 85287, USA
3Department of Mathematics and Statistics, College of Liberal Arts and Sciences, Arizona State University, Tempe, AZ 85287, USA
4Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA
5Cancer Genomics Laboratory, Department of Pathology, University of Texas M.D. Anderson Cancer Center, Houston, TX 77030, USA
Correspondence: *Marcel Brun: marcelbrun@yahoo.com
[other] Recommended by Ahmed H. Tewfik

1. INTRODUCTION

The modeling of genetic regulatory networks (GRNs) is becoming increasingly widespread for gaining insight into the underlying processes of living systems. The computational biology literature abounds in various network modeling approaches, all of which have particular goals, along with their strengths and weaknesses [1, 2]. They may be deterministic or stochastic. Network models have been studied to gain insight into various cellular properties, such as cellular state dynamics and transcriptional regulation [3?8], and to derive intervention strategies based on state-space dynamics [9, 10].

Complexity is a critical issue in the synthesis, analysis, and application of GRNs. In principle, one would prefer the construction and analysis of a quantitatively comprehensive model such as a differential equation-based model to a coarsely quantized discrete model; however, in practice, the situation does not always suffice to support such a model. Quantitatively detailed (fine-scale) models require significantly more complex mathematics and computational power for analysis and more accurate measurements for inference than coarse-scale models. The network complexity issue has similarities with the issue of classifier complexity [11]. One must decide whether to use a fine-scale or coarse-scale model [12]. The issue should be addressed in the framework of the standard engineering paradigm: the model should be of minimal complexity to solve the problem at hand.

To quantify network approximation and reduction, one would like a metric to compare networks. For instance, it may be beneficial for computational or inferential purposes to approximate a system by a discrete model instead of a continuous model. The goodness of the approximation is measured by a metric and the precise formulation of the properties will depend on the chosen metric.

Comparison of GRN models needs to be based on salient aspects of the models. One study used the L1 norm between the steady-state distributions of different networks in the context of the reduction of probabilistic Boolean networks [13]. Another study compared networks based on their topologies, that is, connectivity graphs [14]. This method suffers from the fact that networks with the same topology may possess very different dynamic behaviors. A third study involved a comprehensive comparison of continuous models based on their inferential power, prediction power, robustness, and consistency in the framework of simulations, where a network is used to generate gene expression data, which is then used to reconstruct the network [15]. A key drawback of most approaches is that the comparison is applicable only to networks with similar representations; it is difficult to compare networks of different natures, for instance, a differential-equation model to a Boolean model. A salient property of the metric proposed in this study is that it can compare networks of different natures in both value and time.

We propose a metric to compare deterministic GRNs via their steady-state behaviors. This is a reasonable approach because in the absence of external intervention, a cell operates mainly in its steady state, which characterizes its phenotype, that is, cell cycle, disease, cell differentiation, and so forth. [16?19]. A cell's phenotypic status is maintained through a variety of regulatory mechanisms. Disruption of this tight steady-state regulation may lead to an abnormal cellular status, for example, cancer. Studying steady-state behavior of a cellular system and its disruption can provide significant insight into cellular regulatory mechanisms underlying disease development.

We first introduce a metric to compare GRNs based on their steady-state behaviors, discuss its characteristics, and treat the empirical estimation of the metric. Then we provide a detailed application to quantization utilizing the mathematical framework of reference and projected networks. We close with some remarks on the efficacy of the proposed metric.


2. METRIC BETWEEN NETWORKS

In this section, we construct the distance metric between networks using a bottom-up approach. Following a description of how trajectories are decomposed into their transient and steady-state parts, we define a metric between two periodic or constant functions and then extend this definition to a more general family of functions that can be decomposed between transient and steady-state parts.

2.1. Steady-state trajectory

Given the understanding that biological networks exhibit steady-state behavior, we confine ourselves to networks exhibiting steady-state behavior. Moreover, since a cell uses nutrients such as amino acids and nucleotides in cytoplasm to synthesize various molecular components, that is, RNAs and proteins [18], and since there are only limited supplies of nutrients available, the amount of molecules present in a cell is bounded. Thus, the existence of steady-state behavior implies that each individual gene trajectory can be modeled as a bounded function f (t) that can be decomposed into a transient trajectory plus a steady-state trajectory:

[Formula ID: Eq1]
(1) 
f(t)=ftran(t)+fss(t),
where limt??ftran (t) = 0 and fss (t) is either a periodic function or a constant function.

The limit condition on the transient part of the trajectory indicates that for large values of t, the trajectory is very close to its steady-state part. This can be expressed in the following manner: for any ? > 0, there exists a time tss such that | f (t) ? fss (t) | < ? for t > tss. This property is useful to identify fss (t) from simulated data by finding an instant tss such that f (t) is almost periodical or constant for t > tss.

A deterministic gene regulatory network, whether it is represented by a set of differential equations or state transition equations, produces different dynamic behaviors, depending on the starting point. If ? is a network with N genes and x0 is an initial state, then its trajectory,

[Formula ID: Eq2]
(2) 
f(?,x0)(t)={f(?,x0)(1)(t),?,f(?,x0)(N)(t)},
where
f(?,x0)(i)(t)
is a trajectory for an individual gene (denoted by f(i) (t) or f (t) where there is no ambiguity) generated by the dynamic behavior of the network ? when starting at x0. For a differential-equation model, the trajectory f(?, x0) (t) can be obtained as a solution of a system of differential equations; for a discrete model, it can be obtained by iterating the system's transition equations. Trajectories may be continuous-time functions or discrete-time functions, depending on the model.

The decomposition of (1) applies to f(?, x0) (t) via its application to the individual trajectories

f(?,x0)(i)(t)
. In the case of discrete-valued networks (with bounded values), the system must enter an attractor cycle or an attractor state at some time point tss. In the first case f(?, x0),ss (t) is periodical, and in the second case it is constant. In both cases, f(?, x0),tran (t) = 0 for t ? tss.

2.2. Distance based on the amplitude cumulative distribution

Different metrics have been proposed to compare two real-valued trajectories f (t) and g (t), including the correlation

?f,g?
, the cross-correlation ?f,g (?), the cross-spectral density pf,g (?), the difference between their amplitude cumulative distributions F (x) = pf (x) and G (x) = pg (x), and the difference between their statistical moments [20]. Each has its benefits and drawbacks depending on one's purpose. In this paper, we propose using the difference between the amplitude cumulative distributions of the steady-state trajectories.

Let fss (t) and gss (t) be two measurable functions that are either periodical or constant, representing the steady-state parts of two functions, f (t) and g (t), respectively. Our goal is to define a metric (distance) between them by using the amplitude cumulative distribution (ACD), which measures the probability density of a function [20].

If fss (t) is periodic with period tp > 0, its cumulative density function F (x) over ? is defined by

[Formula ID: Eq3]
(3) 
F(x)=?(?(x)tp),
where ? (A) is the Lebesgue measure of the set A and
[Formula ID: Eq4]
(4) 
?(x)={ts?t<te|fss(t)?x},
where te = ts + tp, for any point ts.

If fss is constant, given by fss (t) = a for any t, then we define F (x) as a unit step function located at x = a. Figure 1 shows an example of some periodical functions and their amplitude cumulative distributions.

Given two steady-state trajectories, fss (t) and gss (t), and their respective amplitude cumulative distributions, F (x) and G (x), we define the distance between fss and gss as the distance between the distributions

[Formula ID: Eq5]
(5) 
dss(fss,gss)=?F?G?
for some suitable norm
???
. Examples of norms include L?, definedby the supremum of their differences,
[Formula ID: Eq6]
(6) 
dL?(f,g)=sup?0?x??|F(x)?G(x)|,
and L1 defined by the area of the absolute value of their difference,
[Formula ID: Eq7]
(7) 
dL1(f,g)=?0?x<?|F(x)?G(x)|dx.
In both cases, we apply the biological constraint that the amplitudes are nonnegative.

The L1 norm is well suited to the steady-state behavior because in the case of constant functions f (t) = a and g (t) = b, their distributions are unit steps functions at x = a and x = b, respectively, so that

dL1
(f, g) = | a ? b |, the distance, in amplitude, between the two functions. Hence, we can interpret the distance
dL1
(f, g) as an extension of the distance, in amplitude, between two constant signals, to the general case of periodic functions, taking into consideration the differences in their shapes.

2.3. Network metric

Once a distance between their steady-state trajectories is defined, we can extend this distance to two trajectories f (t) and g (t) by

[Formula ID: Eq8]
(8) 
dtr(f,g)=dss(fss,gss),
where dss is defined by (5).

The next step is to define the distance between two multivariate trajectories f (t) and g (t) by

[Formula ID: Eq9]
(9) 
dtr(f,g)=1N?i=1Ndtr(f(i),g(i)),
where f(i) (t) and g(i) (t) are the component trajectories of f (t) and g (t), respectively. Owing to the manner in which a norm is used to define dss, in conjunction with the manner in which dtr is constructed from dss, the triangle inequality
[Formula ID: Eq10]
(10) 
dtr(f,h)?dtr(f,g)+dtr(g,h)
holds, and dtr is a metric.

The last step is to define the metric between two networks as the expected distance between the trajectories over all possible initial states. For networks ?1 and ?2, we define

[Formula ID: Eq11]
(11) 
d(?1,?2)=ES[dtr(f(?1,x0),f(?2,x0))],
where the expectation is taken with respect to the space S of initial states.

The use of a metric, in particular, the triangle inequality, is essential for the problem of estimating complex networks by using simpler models. This is akin to the pattern recognition problem of estimating a complex classifier via a constrained classifier to mitigate the data requirement. In this situation, there is a complex model that represents a broad family of networks and a simpler model that represents a smaller class of networks. Given a reference network from the complex model and a sampled trajectory from it, we want to estimate the optimal constrained network. We can identify the optimal constrained network, that is, projected network, as the one that best approximates the complex one, and the goal of the inference process should be to obtain a network close to the optimal constrained network. Let ? be a reference network (e.g., a continuous-valued ODE-based network), let P (?) be the optimal constrained network (e.g., a discrete-valued network), and let

??
be an estimator of P (?) estimated from data sampled from ?. Then
[Formula ID: Eq12]
(12) 
d(??,?)?d(??,P(?))+d(P(?),?),
where the following distances have natural interpretations:
  1. d(??,?)
    is the overall distance and quantifies the approximation of the reference network by the estimated optimal constrained network;
  2. d(??,p(?))
    is the estimation distance for the constrained network and quantifies the inference of the optimal constrained network;
  3. d (P (?), ?) is the projection distance and quantifies how well the optimal constrained network approximates the reference network.

This structure is analogous to the classical constrained regression problem, where constraints are used to facilitate better inference via reduction of the estimation error (so long as this reduction exceeds the projection error) [11]. In the case of networks, the constraint problem becomes one of finding a projection mapping for models representing biological processes for which the loss defined by d (P (?), ?) may be maintained within manageable bounds so that with good inference techniques, the estimation error defined by

d(??,P(?))
will be minimized.

2.4. Estimation of the amplitude cumulative distribution

The amplitude cumulative distribution of a trajectory can be estimated by simulating the trajectory and then estimating the ACD from the trajectory. Assuming that the steady-state trajectory fss (t) is periodic with period tp, we can analyze fss (t) between two points, ts and te = ts + tp. For a continuous function fss (t), we assume that any amplitude value x is visited only a finite number of times by fss (t) in a period ts ? t < te. In accordance with (3), we define the cumulative distribution

[Formula ID: Eq13]
(13) 
F(x)=?({ts?t?te|fss(t)?x})tp.
To calculate F (x) from a sampled trajectory, for each value x, let Sx be the set of points where fss (t) = x:
[Formula ID: Eq14]
(14) 
Sx={ts?t?te|fss(t)=x}?{ts,te}.
The set Sx is finite. Let
n=|Sx|
denote the number of elements t0, ? , tn ? 1. These can be sorted so that ts = t0 < t1 < t2 < ? < tn ? 1 = te. Now we define the set mi, ? i = 0, ? , n ? 2, of intermediate values between two consecutive points where fss (t) crosses x (see Figure 2) by
[Formula ID: Eq15]
(15) 
mi=fss(ti+ti+12).

Let Ix be a set of the indices of points ti such that the function f (t) is below x in the interval [ti, ti + 1],

[Formula ID: Eq16]
(16) 
Ix={0?i?n?2|mi?x}.
Finally, the cumulative distribution F (x), defined by the measure of the set {ts ? t ? te ? f (t) ? x}, can be computed as the sum of the lengths of the intervals where f (t) ? x:
[Formula ID: Eq17]
(17) 
F(x)=?i?Ix(ti+1?ti)tp.
The estimation of F (x) from a finite set {a1, ? , am} representing the function f (t) at points t1, ? , tm reduces to estimating the values in (17):
[Formula ID: Eq18]
(18) 
F?(x)=|{1?i?m|ai?x}|m
at the points ai, i = 1, ? , m.

In the case of computing the distance between two functions f (t) and g (t), where the only information available consists of two samples, {a1, ? , am} and {b1, ? , br}, for f and g, respectively, both cumulative distributions

F?(x)
and
G?(x)
need only be defined at the points in the set
[Formula ID: Eq19]
(19) 
S={a1,?,am}?{b1,?,br}.
In this case, if we sort the set S so that 0 = s0 < s2 < ? < sk = T (with T being the upper limit for the amplitude values, and k ? r + m), then (6) can be approximated by
[Formula ID: Eq20]
(20) 
d?L?(f,g)=max?0?i?k|F?(si)?G?(si)|
and (7) can be approximated by
[Formula ID: Eq21]
(21) 
d?L1(f,g)=?0?i?k?1(si+1?si)|F?(si)?G?(si)|.


3. APPLICATION TO QUANTIZATION

To illustrate application of the network metric, we will analyze how different degrees of quantization affect model accuracy. Quantization is an important issue in network modeling because it is imperative to balance the desire for fine description against the need for reduced complexity for both inference and computation. Since it is difficult, if not impossible, to directly evaluate the goodness of a model against a real biological system, we will study the problem using a standard engineering approach. First, an in numero reference network model or system is formulated. Then, a second network model with a different level of abstraction is introduced to approximate the reference system. The objective is to investigate how different levels of abstraction, quantization levels in this study, impact the accuracy of the model prediction. The first model is called the reference model. From it, reference networks will be instantiated with appropriate sets of model parameters. The model will be continuous-valued to approximate the reference system at its fullest closeness. The second model is called a projected model, and projected networks will be instantiated from it. This model will be a discrete-valued model at a given different level of quantization.

The ability of a projected network, an instance of the projected model, to approximate a reference network, an instance of the reference model, can be evaluated by comparing the trajectories generated from each network with different initial states and computing the distances between the networks as given by (11).

3.1. Reference model

The origin of our reference model is a differential-equation model that quantitatively represents transcription, translation, cis-regulation and chemical reactions [7, 15, 21]. Specifically, we consider a differential-equation model that approximates the process of transcription and translation for a set of genes and their associated proteins (as illustrated in Figure 3) [7].The model comprises the following differential equations:

[Formula ID: Eq22]
(22) 
dpi(t)dt=?iri(t??p,i)??ipi(t),????i??,dri(t)dt=?ici(t??r,i)??iri(t),????i??,ci(t)=?i[pj(t??c,j),j??i],????i??,
where ri and pi are the concentrations of mRNA and proteins induced by gene i, respectively, ci (t) is the fraction of DNA fragments committed to transcription of gene i, ?i is the transcription rate of gene i, and ?p,i, ?r,i and ?c,i are the time delays for each process to start when the conditions are given. The most general form for the function ?i is a real-valued (usually nonlinear) function with domain in
?|?i|
and range in
?,??i:?|?i|??
. The functions are defined by the equations
[Formula ID: Eq23]
(23) 
?i[pj,j??i]=[1??j??i+?(pj,Sij,?ij)]??j??i??(pj,Sij,?ij),?(p,S,?)=1(1+?p)S,
where the parameters ? are the affinity constants and the parameters Sij are the distinct sites for gene i where promoter j can bind. The functions depend on the discrete parameter Sij, the number of binding sites for protein j on gene i, and ?ij, the affinity constant between gene i and protein j.

A discrete-time model results from the preceding continuous-time model by discretizing the time t on intervals n?t, and the assumption that the fraction of DNA fragments committed to transcription and concentration of mRNA remains constant in the time interval [t ? ?t, t) [7]. In place of the differential equations for ri, pi, and ci, at time t = n?t, we have the equations

[Formula ID: Eq24]
(24) 
ri(n)=e??i?tri(n?1)+?is(?i,?t)ci(n?nr,i?1),pi(n)=e??i?tpi(n?1)+?is(?i,?t)ri(n?np,i?1),ci(n)=?i[pj(n?nc,j),j??i],????i??,
where nr,i = ?r,i/?t, np,i = ?p,i/?t, nc,j = ?c,j/?t, and
[Formula ID: Eq25]
(25) 
s(x,y)=1?e?xyx.
This model, which will serve as our reference model, is called a (discrete) transcriptional regulatory system (tRS).

We generate networks using this model and a fixed set ? of parameters. We call these networks reference networks. A reference network is identified by its set ? of parameters,

[Formula ID: Eq26]
(26) 
?=(?1,?1,?1,?1,?1,?p,1,?r,1,?c,1,?1,?1,?,?N,?N,?N,?N,?N,?p,N,?r,N,?c,N,?N,?N).

3.2. Projected model

The next step is to reduce the reference network model to a projected network model. This is accomplished by applying constraints in the reference model. The application of constraints modifies the original model, thereby obtaining a simpler one. We focus on quantization of the gene expression levels (which are continuous-valued in the reference model) via uniform quantization, which is defined by a finite or denumerable set ? of intervals,

L1=[0,?x),L2=[?x,2?x),?,Li=[(i?1)?x,i?x),?,
and a mapping
??:????
such that
?(x)=ai
for some collection of points ai ? Li.

The equations for ri, pi, and ci(24) are replaced by

[Formula ID: Eq27]
(27) 
r?i(n)=?(e??i?tr?i(n?1)+?is(?i,?t)c?i(n?nr,i?1)),
[Formula ID: Eq28]
(28) 
p?i(n)=?(e??i?tp?i(n?1)+?is(?i,?t)r?i(n?np,i?1)),
[Formula ID: Eq29]
(29) 
c?i(n)=?i[p?j(n?nc,j),?j??i],????i??.

Issues to be investigated include (1) how different quantization techniques (specification of the partition ?) affect the quality of the model; (2) which quantization technique (mapping ?) is the best for the model; and (3) the similarity of the attractors of the dynamical system defined by (27) and (28) to the steady state of the original system, as a function of ?x. We consider the first issue.

3.3. A hypothetical metabolic pathway

To illustrate the proposed metric in the framework of the reference and projected models, we compare two networks based on a hypothetical metabolic pathway. We first briefly describe the hypothetical metabolic pathway with necessary biochemical parameters to set up a reference system. Then, the simulation study shows the impacts of various quantization levels in both time and trajectory based on the proposed metric.

We consider a gene regulatory network consisting of four genes. A graphical representation of the system is depicted in Figure 4, where ? denotes an activator and ? denotes a repressor. We assume that the GRN regulates a hypothetical pathway, which metabolizes an input substrate to an output product. This is done by means of enzymes whose transcriptional control is regulated by the protein produced from gene 3. Moreover, we assume that the effect of a higher input substrate concentration is to increase the transcription rate ?1, whereas the effect of a lower substrate concentration is to reduce ?1. Unless otherwise specified, the parameters are assumed to be gene-independent. These parameters are summarized in Table 1.

We assume that each cis-regulator is controlled by one module with four binding sites, and set S = 4, ? = 108 M?1, ?2 = ?3 = ?4 = 0.05 pMs?1, and ? = 0.05 s?1. The value of the affinity constant ? corresponds to a binding free energy of ?U = ?11.35 kcal/mol at temperature T = 310.15?K (or 37?C). The values of the transcription rates ?2, ?3, and ?4 correspond to transcriptional machinery that, on the average, produces one mRNA molecule every 8 seconds. This value turns out to be typical for yeast cells [22]. We also assume that on the average, the volume of each cell in 𝒞 equals 4 pL [18]. The translation rate ? is taken to be 10-fold larger than the rate of 0.3/minute for translation initiation observed in vitro using a semipurified rabbit reticulocyte system [23].

The degradation parameters ? and ? are specified by means of the mRNA and protein half-life parameters ? and ?, respectively, which satisfy

[Formula ID: Eq30]
(30) 
e???=12,??????e???=12.
In this case,
[Formula ID: Eq31]
(31) 
?=ln?2?,???????=ln?2?.

3.4. Results and discussion

It is expected that the finer the quantization is (smaller values of ?x), the more similar will be the projected networks to the reference networks. This similarity should be reflected by the trajectories as measured by the proposed metric. A straightforward simulation consists of the design of a reference network, the design of a projected network (for some value of ?x), the generation of several trajectories for both networks from randomly selected starting points, and the computation of the average distance between trajectories, using (9) and (21). Each process is repeated for different time intervals ?t to study how the time intervals used in the simulation affect the analysis.

The first simulation is based on the same 4-gene model presented in [7]. We use 6 different quantization levels, ?x = 0, 0.001, 0.01, 0.1, 1, and 10, where ?x = 0 means no quantization, and designates the reference network. For each quantization level ?x and starting point x0, we generate the simulated time series expression and compare it to the time-series generated with ?x = 0 (the reference network), estimating the proposed metric using (21). The process is repeated using a total of 10 different time intervals, ?t = 1 second, 5 seconds, 10 seconds, 30 seconds, 1 minute, 2 minutes, 5 minutes, 10 minutes, 30 minutes, and 1 hour. The simulation is repeated and the distances are averaged for 30 different starting points x0.

Figures 5 and 6 show the trajectories and empirical cumulative density functions estimated from the simulated system as illustrated in the previous section. Several quantization levels are used in the simulation. The last graph in Figure 5 shows the mRNA concentration for the forth gene, over the 10 000 first seconds (transient) and over the last 10 000 seconds (steady-state). We can see that for quantizations 0 and 0.001, the steady-state solutions are periodic, and for quantizations 0.001 and 0.1, the solutions are constant. This is reflected by the associated plot of F (x) in Figure 6.

Figure 7 shows how strong quantization (high values of ?x) yields high distance, with the distance decreasing again when the time interval (?t) increases. The z-axis in the figure represents the distance

d?L1(f(?x,?t),f(?x=0,?t))
.

In our second simulation, we use a different connectivity (all other kinetic parameters are unchanged), and we again use 10 different time intervals, ?t = 1 second, 5 seconds, 10 seconds, 30 seconds, 1 minute, 2 minutes, 5 minutes, 10 minutes, 30 minutes and 1 hour, and 6 different quantization levels, ?x = 0, 0.001, 0.01, 0.1, 1, and 10. (?x = 0 meaning no quantization). The simulation is repeated and the distances are averaged for 30 different starting points. Analogous to the first simulation, Figure 9 shows how strong quantization (high values of ?x) yields high distance, which decreases when the time interval (?t) increases.

An important observation regarding Figures 8 and 10 is that the error decreases as ?t increases. This is due to the fact that the coarser the amplitude quantization is, the more difficult it is for small time intervals to capture the dynamics of slowly changing sequences.


4. CONCLUSION

This study has proposed a metric to quantitatively compare two networks and has demonstrated the utility of the metric via a simulation study involving different quantizations of the reference network. A key property of the proposed metric is that it allows comparison of networks of different natures. It also takes into consideration differences in the steady-state behavior and is invariant under time shifting and scaling. The metric can be used for various purposes besides quantization issues. Possibilities include the generation of a projected network from a reference network by removing proteins from the equations and connectivity reduction by removing edges in the connectivity matrix.

The metric facilitates systematic study of the ability of discrete dynamical models, such as Boolean networks, to approximately represent more complex models, such as differential-equation models. This can be particularly important in the framework of network inference, where the parameters for projected models can be inferred from the reference model, either analytically or via synthetic data generated via simulation of the reference model. Then, given the reference and projected models, the metric can be used to determine the level of abstraction that provides the best inference; given the amount of observations available, this approach corresponds to classification-rule constraint for classifier inference in pattern recognition.


NOMENCLATURE
Trajectory: A function f (t)
Distance Function: The proposed distance between networks

NOTATIONS
t: Time
?: Network
x0: Starting Point
f (t), g (t), h (t): Trajectories
fss, gss: Steady-State trajectories
f?,x0 (t): Trajectory
ftran: Transient part of the trajectory
fss: Steady-state part of the trajectory
F (x), G (x), H (x): Cumulative distribution functions
dtr(?,?): Distance between two trajectories
dss(?,?): Distance between two periodic or constant trajectories
? (A): Lebesgue measure of set A
f (t): Multivariate trajectory

ACKNOWLEDGMENTS

We would like to thank the National Science Foundation (CCF-0514644) and the National Cancer Institute (R01 CA-104620) for sponsoring in part this research.


References
1. De Jong H. Modeling and simulation of genetic regulatory systems: a literature reviewJournal of Computational Biology 2002;9(1):67–103. [pmid: 11911796]
2. Srivastava R,You L,Summers J,Yin J. Stochastic vs. deterministic modeling of intracellular viral kineticsJournal of Theoretical Biology 2002;218(3):309–321. [pmid: 12381432]
3. Albert R,Barab?si A-L. Statistical mechanics of complex networksReviews of Modern Physics 2002;74(1):47–97.
4. Kim S,Li H,Dougherty ER,et al. Can Markov chain models mimic biological regulation?Journal of Biological Systems 2002;10(4):337–357.
5. Albert R,Othmer HG. The topology of the regulatory interactions predicts the expression pattern of the segment polarity genes in Drosophila melanogasterJournal of Theoretical Biology 2003;223(1):1–18. [pmid: 12782112]
6. Aburatani S,Tashiro K,Savoie CJ,et al. Discovery of novel transcription control relationships with gene regulatory networks generated from multiple-disruption full genome expression librariesDNA Research 2003;10(1):1–8. [pmid: 12693549]
7. Goutsias J,Kim S. A nonlinear discrete dynamical model for transcriptional regulation: construction and propertiesBiophysical Journal 2004;86(4):1922–1945. [pmid: 15041638]
8. Li H,Zhan M. Systematic intervention of transcription for identifying network response to disease and cellular phenotypesBioinformatics 2006;22(1):96–102. [pmid: 16278241]
9. Datta A,Choudhary A,Bittner ML,Dougherty ER. External control in Markovian genetic regulatory networksMachine Learning 2003;52(1-2):169–191.
10. Choudhary A,Datta A,Bittner ML,Dougherty ER. Control in a family of boolean networksIn: IEEE International Workshop on Genomic Signal Processing and Statistics (GENSIPS '06)May, 2006College Station, Tex, USA
11. Devroye L,Gy?rffi L,Lugosi G. A Probabilistic Theory of Pattern Recognition. 1996New York, NY, USA: Springer;
12. Ivanov I,Dougherty ER. Modeling genetic regulatory networks: continuous or discrete?Journal of Biological Systems 2006;14(2):219–229.
13. Ivanov I,Dougherty ER. Reduction mappings between probabilistic boolean networksEURASIP Journal on Applied Signal Processing 2004;2004(1):125–131.
14. Ott S,Imoto S,Miyano S. Finding optimal models for small gene networksIn: Proceedings of the Pacific Symposium on Biocomputing (PSB '04)January, 2004Big Island, Hawaii, USA:557–567.
15. Wessels LF,van Someren EP,Reinders MJ. A comparison of genetic network modelsIn: Proceedings of the Pacific Symposium on Biocomputing (PSB '01)January, 2001Lihue, Hawaii, USA:508–519.
16. Elowitz MB,Levine AJ,Siggia ED,Swain PS. Stochastic gene expression in a single cellScience 2002;297(5584):1183–1186. [pmid: 12183631]
17. Kauffman SA. The Origins of Order: Self-Organization and Selection in Evolution. 1993New York, NY, USA: Oxford University Press;
18. Alberts B,Johnson A,Lewis J,Raff M,Roberts K,Walter P. Molecular Biology of the Cell (4th ed). 20024th ed. New York, NY, USA: Garland Science;
19. Kauffman SA. Metabolic stability and epigenesis in randomly constructed genetic netsJournal of Theoretical Biology 1969;22(3):437–467. [pmid: 5803332]
20. Lynn PA. An Introduction to the Analysis and Processing of Signals. 1973New York, NY, USA: John Wiley & Sons;
21. Arkin A,Ross J,McAdams HH. Stochastic kinetic analysis of developmental pathway bifurcation in phage ?-infected Escherichia coli cellsGenetics 1998;149(4):1633–1648. [pmid: 9691025]
22. Iyer V,Struhl K. Absolute mRNA levels and transcriptional initiation rates in Saccharomyces cerevisiaeProceedings of the National Academy of Sciences of the United States of America 1996;93(11):5208–5212. [pmid: 8643554]
23. Lorsch JR,Herschlag D. Kinetic dissection of fundamental processes of eukaryotic translation initiation in vitroEMBO Journal 1999;18(23):6705–6717. [pmid: 10581244]

Article Categories:
  • Research Article


Previous Document:  Uncovering gene regulatory networks from time-series microarray data with variational Bayesian struc...
Next Document:  The implicit function as squashing time model: a novel parallel nonlinear EEG analysis technique dis...