Recovering ensembles of chromatin conformations from contact probabilities.  
Jump to Full Text  
MedLine Citation:

PMID: 23143266 Owner: NLM Status: MEDLINE 
Abstract/OtherAbstract:

The 3D higher order organization of chromatin within the nucleus of eukaryotic cells has so far remained elusive. A wealth of relevant information, however, is increasingly becoming available from chromosome conformation capture (3C) and related experimental techniques, which measure the probabilities of contact between large numbers of genomic sites in fixed cells. Such contact probabilities (CPs) can in principle be used to deduce the 3D spatial organization of chromatin. Here, we propose a computational method to recover an ensemble of chromatin conformations consistent with a set of given CPs. Compared with existing alternatives, this method does not require conversion of CPs to mean spatial distances. Instead, we estimate CPs by simulating a physically realistic, beadchain polymer model of the 30nm chromatin fiber. We then use an approach from adaptive filter theory to iteratively adjust the parameters of this polymer model until the estimated CPs match the given CPs. We have validated this method against reference data sets obtained from simulations of test systems with up to 45 beads and 4 loops. With additional testing against experiments and with further algorithmic refinements, our approach could become a valuable tool for researchers examining the higher order organization of chromatin. 
Authors:

Dario Meluzzi; Gaurav Arya 
Related Documents
:

23345916  Lyotropic ion channel current model compared with ising model. 23591376  Modeling the listeria innocua micropopulation lag phase and its variability. 23082136  The first occurrence in the fossil record of an aquatic avian twignest with phoenicopt... 24378396  Multivariate sar and qsar of cucurbitacin derivatives as cytotoxic compounds in a human... 22692726  Modelbased identification of motion sensor placement for tracking retraction and elong... 18270116  Change detection in multisensor sar images using bivariate gamma distributions. 
Publication Detail:

Type: Journal Article; Research Support, NonU.S. Gov't; Validation Studies Date: 20121111 
Journal Detail:

Title: Nucleic acids research Volume: 41 ISSN: 13624962 ISO Abbreviation: Nucleic Acids Res. Publication Date: 2013 Jan 
Date Detail:

Created Date: 20121228 Completed Date: 20130322 Revised Date: 20130711 
Medline Journal Info:

Nlm Unique ID: 0411011 Medline TA: Nucleic Acids Res Country: England 
Other Details:

Languages: eng Pagination: 6375 Citation Subset: IM 
Affiliation:

Department of NanoEngineering, University of California San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA. 
Export Citation:

APA/MLA Format Download EndNote Download BibTex 
MeSH Terms  
Descriptor/Qualifier:

Algorithms Chromatin / chemistry* Computational Biology / methods Models, Molecular* 
Chemical  
Reg. No./Substance:

0/Chromatin 
Comments/Corrections 
Full Text  
Journal Information Journal ID (nlmta): Nucleic Acids Res Journal ID (isoabbrev): Nucleic Acids Res Journal ID (publisherid): nar Journal ID (hwp): nar ISSN: 03051048 ISSN: 13624962 Publisher: Oxford University Press 
Article Information © The Author(s) 2012. Published by Oxford University Press. creativecommons: Received Day: 26 Month: 7 Year: 2012 Revision Received Day: 2 Month: 10 Year: 2012 Accepted Day: 4 Month: 10 Year: 2012 Print publication date: Month: 1 Year: 2013 Electronic publication date: Day: 10 Month: 11 Year: 2012 pmcrelease publication date: Day: 10 Month: 11 Year: 2012 Volume: 41 Issue: 1 First Page: 63 Last Page: 75 PubMed Id: 23143266 ID: 3592477 DOI: 10.1093/nar/gks1029 Publisher Id: gks1029 
Recovering ensembles of chromatin conformations from contact probabilities  
Dario Meluzzi^{1}^{2}  
Gaurav Arya^{1}*  
^{1}Department of NanoEngineering and ^{2}Department of Chemistry and Biochemistry, University of California San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA 

Correspondence: *To whom correspondence should be addressed. Tel: +1 858 822 5542; Fax: +1 858 534 9553; Email: garya@ucsd.edu 
Eukaryotic cells need to accommodate their long genomic DNA within a relatively small nucleus. This remarkable feat is accomplished through several levels of 3D spatial organization (^{1}). The first level consists of wrapping the DNA duplex around octamers of histone proteins to form nucleosomes. The resulting string of nucleosomes is then folded into a thicker fiber known as chromatin. Subsequent levels of folding ultimately lead to the territorial arrangement of chromosomes within the nucleus. These additional levels of folding, referred to as higher order organization of chromatin, are not only essential for efficient DNA packaging but are also believed to play a role in several other biological processes. For example, the formation of chromatin loops facilitates interactions between distant portions of DNA and these interactions are essential for regulating transcription and recombination (^{2–5}). Also, the transcriptional activity of genes tends to be inversely correlated with the spatial density of chromatin fibers (^{6–9}). Furthermore, growing evidence suggests that spatially proximal regions of the genome are more likely to be functionally correlated, leading to the concepts of ‘factories’, ‘globules’ and ‘territories’ (^{10–14}). Unfortunately, owing to the limitations of current experimental methods in visualizing chromatin in vivo, the 3D higher order organization of chromatin is not well understood. In particular, stateoftheart microscopy approaches, such as fluorescence in situ hybridization (FISH) (^{15}) and superresolution fluorescence microscopy (^{16}), do not simultaneously provide the spatial resolutions and the measurement throughput necessary to discern and locate individual chromatin fibers within the nucleus.
During the past decade, however, increasingly higher resolution and throughput have been achieved by a number of sophisticated experimental techniques—including 4C (^{17},^{18}), 5C (^{19}), GCC (^{20}) and HiC (^{21})—that are based on the original method of chromosome conformation capture (3C) (^{22}). These techniques do not directly capture the 3D spatial organization of chromatin. Instead, they measure the frequency of interactions between different fragments of genomic DNA in fixed cells (^{23}). To detect such interactions, spatially proximal segments of DNA are covalently crosslinked by treating millions of intact nuclei with chemical agents, such as formaldehyde. The DNA is then cleaved into small fragments by digestion with appropriate restriction enzymes. Next, the resulting pairs of crosslinked fragments are enzymatically ligated and the crosslinks are chemically removed. Finally, the ligation products are amplified by polymerase chain reaction and sequenced by highthroughput methods. Analysis of the sequences allows one to identify the pairs of fragments that were originally crosslinked. Counting the number of times that each pair was identified from the sequences yields a 2D map of contact probabilities (CPs) for the examined pairs of fragments.
Although CP maps provide abundant information to help researchers infer the higher order organization of chromatin through theoretical and computational models (^{24},^{25}), such a task is rather challenging. To tackle this problem, several approaches have already been proposed. Dekker et al. (^{22}) presented the first such approach to deduce a coarse 3D structure for the 320kb chromosome III in NKY2997 cells. To obtain the structure, 78 CPs were measured by 3C and converted to spatial distances through a theoretical expression for wormlike chains (^{26}). The resulting distances were presumably used to solve a molecular distance geometry problem. Later, Fraser et al. (^{27}) assumed the inverse proportionality relation to calculate spatial distances d from hundreds of CPs p, obtained by 5C experiments on the HoxA gene cluster in THP1 leukemia cells. The resulting distances d were then used as targets to optimize a piecewise linear curve representing the gene cluster under study. The same relation was used by Duan et al. (^{28}) to infer the 3D structure of the budding yeast genome from over 65 000 CPs obtained by 4C. In addition, they modeled chromatin as a chain of beads, each representing 10 kb of DNA, and defined various constraints to enforce known geometric and topological features of yeast chromatin. Nonlinear constrained optimization methods were then used to find an optimal structure. Another full genome, that of fission yeast, was studied by Tanizawa et al. (^{29}) using a HiC variant with enrichment of ligation products. To determine the 3D structure of this genome, the authors used a beadchain model and a method similar to that of (^{28}). This time, however, spatial distances were calculated from CPs through a calibration curve obtained by fitting a double exponential decay function to distance measurements obtained by FISH.
A beadchain model of chromatin was also employed by Baù et al. (^{12}), who used 5C to analyze the 500kb ENm008 domain ( globin gene) on human chromosome 16 in K562 cells and in GM12878 cells. In this case, though, each bead represented a DNA restriction fragment, with bead radius proportional to fragment length. The beads interacted through harmonic restraints with strengths and equilibrium distances derived from experimental CPs. A combination of optimization and clustering algorithms was then used to determine a conformation ensemble and corresponding centroid structure for the ENm008 domain in each cell type. Again seeking conformation ensembles, Rousseau et al. (^{30}) used a probabilistic approach to analyze 5C data on the 142kb HoxA cluster in THP1 and HB1119 cell lines, and HiC data from (^{21}) on the 88.4Mbp long arm of human chromosome 14. In particular, they applied a Markov chain Monte Carlo sampling method to generate ensembles of structures consistent with a posterior distribution of spatial distances between restriction fragment midpoints, where the distances were again obtained from experimental CPs by assuming an inverse power law relation. Another effort to obtain chromatin conformation ensembles, but without using a distanceCP relationship, was recently reported by Gehlen et al. (^{31}). To generate such ensembles for the entire S. cerevisiae genome, the authors performed multiple molecular dynamics simulations of a beadchain polymer model and included within each simulation a randomly selected subset of intra and interchromosomal interactions experimentally determined through GCC.
Although the above computational approaches are remarkable in their ability to handle large numbers of interacting fragments, almost all of them rely on converting the measured CPs to spatial distances between interacting fragments. Such conversion is achieved by assuming a functional relation that describes the behavior of free linear chains. For example, polymer theory predicts that for ideal random walk chains (^{32}), whereas more elaborate relations have been derived for wormlike chains (^{33}). These relations, however, may not be valid for polymers subjected to looping and other external constraints. Also, several of the above approaches ignore the mechanical properties of the chromatin fiber or determine only a single average structure from a given set of CPs, which are in fact the result of crosslinking events over an ensemble of chromatin conformations sampled from millions of cells. Finally, none of the above studies validate their proposed computational methods against known chromatin conformation ensembles.
Here, we describe and validate a computational approach to obtain ensembles of chromatin conformation consistent with a given set of reference CPs. This approach does not require assuming a functional relation between spatial distances and CPs. Instead, we estimate new CPs by simulating a coarsegrained polymer model that approximates the physical behavior of a 30nm chromatin fiber. We then iteratively adjust the parameters of this polymer model until a good match is achieved between the CPs estimated from the simulations and those in the given reference set. The result is an ‘optimal’ ensemble of conformations that is most consistent with the given reference CPs. Our initial validation of this approach against several simulated test systems produced good agreement of average spatial properties between reference and recovered conformation ensembles.
Our goal is to generate an ensemble of conformations consistent with a given set of probabilities of contact between different segments of a chromatin fiber. To achieve this goal, we propose a computational method that consists of three main components (Figure 1): (i) a coarsegrained polymer model approximating the physical properties of chromatin; (ii) a procedure to generate an ensemble of conformations for the polymer model and (iii) a procedure to refine the parameters of the polymer model in such a way that the generated conformation ensemble is consistent with the given set of CPs.
We assume that chromatin exists as a fiber with an average diameter of 30 nm, and that the conformation of this fiber is determined primarily by its stretching resistance, bending stiffness and excluded volume. To approximate these physical properties, we use a beadchain model, with each bead representing a chromatin segment of 3–6 kb (^{34}). A similar model was proposed by Rosa et al. (^{35}) and was recently used to simulate the entire genome of budding yeast (^{36}). Following Baù et al. (^{12}), we also assume that the chromatin fiber is subjected to unknown external constraints, e.g. due to looping interactions or confinement, and that the average effects of these constraints can be approximated by additional harmonic restraints connecting particular beads in the chain (Figure 2a).
Thus, the potential energy U of the bead chain can be expressed as the sum of four terms,
The first term accounts for the chain’s resistance to stretching and results from connecting adjacent beads with harmonic springs,
where is the distance between beads i and j, N is the number of beads in the chain, is the spring constant, is the position vector for bead i, is the equilibrium bond length and = 30 nm is the unit of length used in our simulations (Table 1).The second potential energy term accounts for the chain’s resistance to bending and results from subjecting each triplet of adjacent beads to a harmonic bending potential (^{37}),
where is the angle between the displacement vectors and is an angular ‘spring constant’, is the persistence length of the chain (^{38}), is the Boltzmann constant and T is the absolute temperature.The third potential energy term, , accounts for the excluded volume, or effective thickness, of the chain and is treated using the repulsive part of the Lennard–Jones potential,
where is the unit of energy in our simulations, = 30 nm is the effective thickness of the fiber (Table 1) and is the Heaviside step function, which equals 1 when and 0 otherwise.The last potential energy term, , accounts for the presence of external forces or constraints that affect the shape of the chromatin fiber as well as the probability of contact between different segments of the fiber. In particular, we assume that the average effects of these constraints can be reproduced reasonably well by including a sufficient number of harmonic restraints that connect a subset of the beads in the chain,
where R is the set of pairs of beads connected by harmonic restraints and ( ) is the spring constant (equilibrium distance) for the restraint connecting beads i and j. The actual members (i, j) of the set R and the corresponding values of and are adjustable parameters in this model of restrained chromatin.To obtain optimal values for these adjustable parameters, we compare a reference set of probabilities of contact between the beads in the chain with a set of corresponding probabilities estimated from an ensemble comprising a large number of beadchain conformations.
To obtain such an ensemble, we start by minimizing the potential energy of an initial conformation. To this end, we use the Polak–Ribiere modification of the conjugate gradient algorithm (^{39}). Next, we equilibrate the energyminimized chain by performing 10^{6} steps of Brownian dynamics (BD) simulation. We then perform an additional BD simulation during which we collect one conformation every 100 integration steps. The set of conformations collected from a single simulation trajectory constitute a conformation ensemble.
To perform the BD simulations, we apply a secondorder algorithm (^{40},^{41}), which we simplify to neglect the effects of hydrodynamic interactions. Specifically, for each bead i, we calculate a tentative new position at time using the position of the bead and the force on the bead at time t,
where is the damping constant, is the selfdiffusion coefficient, is the integration time step, m is the mass of each bead and N(t) is a random displacement vector whose components are normally distributed with mean 0 and variance 1. Next, we use the tentative bead positions to calculate a tentative new force for each bead i. Then, for each bead i, we calculate a more accurate position using the tentative position and tentative force at time and the force at time t,Finally, these latter bead positions are used to calculate more accurate forces at time
To estimate the bead CPs from an ensemble of bead chain conformations, we analyze each member of the ensemble and check for the occurrence of contacts within all possible pairs of beads in the chain. Following Rosa et al. (^{35}), a contact between two beads is defined to occur whenever the distance between the beads is less than a predefined ‘contact’ distance, (Table 1). Hence, we estimate the probability of contact between beads i and j by calculating the proportion of conformations in which a contact occurs between those beads,
where is the total number of conformations in the ensemble and is the distance between beads i and j in conformation l of the ensemble.In this work, we assume that a set of reference CPs, denoted , is available for , and the problem is to find an ensemble of beadchain conformations consistent with those CPs. To this end, we need to optimize the adjustable parameters of the beadchain model so that simulating such a model yields a conformation ensemble whose estimated CPs match as closely as possible the corresponding reference CPs for .
The adjustable parameters to be optimized are the pairs of indexes and the values of and for each (i, j). To begin to tackle this complex problem, we choose to reduce the number of adjustable parameters by fixing the members (i, j) of the set R at the start of the optimization procedure and by using zero as the equilibrium distance for the harmonic restraints, i.e. . Although , excluded volume interactions (Equation 4) prevent beads from overlapping.
To determine the pairs of beads that must be connected by harmonic restraints, we analyze the given set of reference CPs , for , by using a peak detection algorithm. Specifically, we construct a smooth surface z = g(x,y) such that for . Each peak on this CP ‘surface’ corresponds to a pair of beads that interact more frequently than their neighbors. Thus, we find the pair of integers (i, j) closest to the location of each peak in the CP surface, and we add (i, j) to the set R of pairs of beads connected by harmonic restraints. To find the location of each peak, we slice the surface at every point (i,j), for , using the four vertical planes x = i, y = j, x + y = i + j and x – y = i – j. Next, we find the local maxima of the curve generated by each slice. If the curves on all four slices have a local maximum close to , then we deem to be the location of a peak on the CP surface.
The remaining group of adjustable parameters are the spring constants that determine the strength of the harmonic restraints on bead pairs . To predict these spring constants from the known reference CPs, we use the general linear model
Here, k* is a vector containing n predicted spring constants of the harmonic restraints, where n is the number of bead pairs in R; W is an matrix of model parameters and p* is a vector containing n + 1 elements, where the first n elements are the reference CPs for the pairs of beads connected by the n harmonic restraints, and the last element is a nonzero constant c that allows W to map the background CPs of an unrestrained chain to zero spring constants. As c is multiplied by appropriate weights, its value can be arbitrary. To minimize roundoff errors, however, we useNow the problem of finding optimal values for the spring constants becomes a problem of determining the optimal elements of the matrix W. This is not a trivial problem, because each spring constant may in general affect not only the CP for the pair (i, j) of beads connected by that spring but also the CPs for other pairs of beads in the chain, including those connected by other restraints. Also, because Equation 9 is an approximation, the optimal W will not necessarily yield valid spring constants when p* changes. Thus, in general, an optimal W must be determined for each given p*.
One could argue that predicting the s through Equation 9 is an unnecessary complication, because optimal s could be found more simply by using a standard optimization algorithm that adjusts the s to minimize the sum of squared differences . We did not pursue such a blind approach, however, because we suspected that it would be less efficient than alternative methods that take advantage of additional information about the underlying physical system. Such information, in the proposed approach, is the hypothesis that there exists an ‘inverse’ system that converts the CPs to spring constants according to the general linear model of Equation 9.
To find optimal elements for W in Equation 9, we apply the least mean squares (LMS) algorithm developed by Widrow and colleagues (^{45–47}) (see the Appendix). This simple yet powerful algorithm has been extensively used in the field of adaptive signal processing to optimize a digital filter structure known as adaptive linear combiner (ALC). An ALC performs a dot product between a timevarying weight vector and a timevarying input vector , thus obtaining a scalar output y_{k} = w_{k}^{T}x_{k}, which is required to approximate a given desired signal at each discrete time step k. To meet this requirement, the LMS algorithm uses a steepest descent scheme that iteratively adjusts the elements of the weight vector at each time step k using
where is the error at time step k and is a gain factor that affects the speed of convergence and the stability of the algorithm.To apply the LMS algorithm toward the optimization of the parameter matrix W in Equation 9, we allow this matrix, the CPs and the predicted spring constants to vary with iteration index k, i.e. k_{k} = W_{k}p_{k}. We then treat the elements of and the rows of as the outputs and transposed weight vectors, respectively, of n ALCs,
To complete this application of the LMS algorithm, we must provide appropriate inputs to the ALCs and obtain appropriate errors, which are necessary to adjust the weight vectors. To obtain an input vector for all ALCs, we first predict a set of restraint spring constants using the parameter matrix available at iteration k and the constant vector of reference CPs (first block in Figure 2b), i.e. . Next, we use this set of spring constants to generate, through BD simulations, an ensemble of beadchain conformations, and we use this ensemble to estimate the CPs for the restrained bead pairs (second block in Figure 2b). The resulting vector of estimated CPs is now used as input for all n ALCs, which produce a corresponding output vector = W_{k} at iteration k (third block in Figure 2b). If the weights of the ALCs were optimal, then the ALC outputs in at iteration k would be very close to the spring constants in , which were used to generate the ensemble of conformations from which was estimated. Therefore, the ALC errors are the elements of the vector ε_{k} = − , which we can finally use to compute a better estimate of the parameter matrix for the next iteration,
To ensure stability of the LMS algorithm, we need a gain factor , where tr[R] is the trace of the input correlation matrix (^{47}). Thus, to calculate a safe value for we let and we use the approximation
where we account for both current and reference CPs in order to decrease when is large and to bound from above when is small. To determine the first vector of predicted restraint spring constants we assume a linear relationship between each and the corresponding reference CP . Specifically, we set for , where a_{0} = 2 − 2 /( − ), a_{1} = 2/( − ) and ( ) is the minimum (maximum) value of the reference CPs for . This choice yields initial spring constants ranging from 20 to 40% of the maximum value of 10 that we allow to take. Then, to begin the restraint optimization procedure with the first vector of estimated CPs we set the diagonal elements of equal to 1 and all other elements equal to 0.After a sufficient number of iterations, the restraint optimization procedure described above should yield a set of predicted spring constants that produce a good match between the CPs estimated for the ‘restrained’ bead pairs and the corresponding reference CPs, i.e. for . Our goal, however, is to generate an optimal ensemble of beadchain conformations such that the CPs estimated for ‘all’ bead pairs, not just the restrained ones, closely match the corresponding reference CPs, i.e. we want for . To quantify the goodness of match between estimated and reference CPs, we calculate the root meansquared deviation (RMSD) between the two sets of probabilities,
To find the set of restraint spring constants that minimize , we perform 40 iterations of the LMS algorithm (Equation 13) during the restraint optimization procedure described above. To accelerate these iterations, we perform only steps during the BD simulations from which the CPs are estimated at each iteration. In general, the conformation ensembles produced by such short simulations will depend on the initial beadchain conformation used for the BD simulations. Therefore, to find the ensemble that minimizes , we perform several trials of the restraint optimization procedure. In each trial, we use a different initial conformation for the simulations, and we identify the set of restraint spring constants that minimize among all iterations performed. Next, these optimal spring constants and the corresponding initial conformation are used to generate a larger conformation ensemble, this time by performing 10^{8} steps of BD simulation. Among the larger conformation ensembles obtained from all trials, we select the one that yields the smallest . This final ensemble is the one we deem to be optimal, i.e. most consistent with the reference CPs.
To obtain the different initial beadchain conformations used for each trial of the restraint optimization procedure, one could simply generate a number of random conformations. We choose, however, a more deterministic approach aimed at generating conformations with different relative orientations of loops. Specifically, we design each initial conformation in the shape of a tight cylindrical bundle (Figure 6). To generate the bundle, all the beads connected by harmonic restraints are arranged on a circle whose circumference is just large enough to prevent overlapping those beads. Next, the intervening fragments that contain the other beads of the chain are used to connect the beads on the circle. As they join the beads on the circle, these fragments are forced to run perpendicular to the plane of the circle. Hence, there are two ways in which each fragment can connect two adjacent beads on the circle: on the same side of the plane of the circle, or on opposite sides. By connecting the beads on the circle with the intervening fragments in all possible ways, we can generate up to distinct conformations, where is the number of beads on the circle and where we omit those conformations that result from reflecting other conformations about the plane of the circle. In the present study, we selected up to 32 different bundle conformations to perform the trials of the restraint optimization procedure (Table 2).
To validate our computational method, we considered six test systems of increasing complexity. Each test system consisted of the same beadchain model that we used to recover an optimal conformation ensemble from reference CPs. In each such system, however, we induced the formation of specific loops by connecting appropriate beads with up to eight harmonic restraints (Supplementary Table S1). To vary the complexity of these test systems, we varied the number of beads in the chain and the number of induced loops (Table 2). In particular, we simulated chains of 25, 35 and 45 beads with 2, 3 and 4 ‘free’ loops, respectively. To mimic the effects of confinement constraints, we also simulated the same chains with additional restraints connecting the middle beads of free loops across such loops, as shown schematically in Supplementary Table S1, thus giving rise to ‘tied’ loops.
We used the same value of for the spring constants of all restrained bead pairs (i, j) in all test systems. The conformations of these test systems obtained after minimizing their potential energy are shown in Figure 3.
To obtain reference sets of estimated bead CPs , for , in each of the six test systems, we generated corresponding ensembles of beadchain conformations by performing BD simulations following the same protocol described above for the ensemble recovery procedure. In particular, for each test system, we constructed an initial beadchain conformation by threading the appropriate number of beads into the path of a 3D Hilbert curve (^{21}). We then minimized the potential energy of the initial conformation (Figure 3), equilibrated the system with 10^{6} simulation steps and performed 10^{8} additional steps, during which we collected one beadchain conformation every 100 steps. From the collected conformations, we estimated using Equation 8. The CPs estimated for a chain of 45 beads with four free loops and four tied loops are represented as heat maps in Figure 4a. Also highlighted are the locations of bead pairs (i, j) that were connected by harmonic restraints to induce the formation of loops or to tie the loops.
These heat maps qualitatively confirm the intuition that for beads connected by restraints and for nearby beads along the chain should be greater than the background CP. These maps, however, also reveal enhanced CPs for pairs of beads that were not directly connected by harmonic restraints and that were relatively distant along the chain from other restrained beads. A similar phenomenon was observed for the chain with 35 beads, but not for the chain with 25 beads (data not shown). Thus, an enhanced probability of contact between two beads in the chain is not always due to an external force directly pulling those beads toward each other. These results underscore the complexity of interactions that can arise even for a chain of only 35 beads, when such a chain is subjected to looping constraints.
Simulating the above test systems to validate our computational method also provided an opportunity to investigate the behavior of chromatin assumed in previous related works. In particular, to infer the 3D conformation of chromatin from experimentally measured CPs, previous studies have assumed that the mean spatial distance d between two DNA fragments can be deduced from their CP p through a simple functional relation. For example, the power law has been used with exponents (^{27},^{28}) and (^{30}). Alternatively, exponential decay (^{29}) and logarithmic types of relations (^{12}) have also been used. To assess whether a simple relationship between mean interbead distance,
and corresponding CP does hold for our test systems, we obtained from the same conformation ensembles that were used to determine .First, however, we analyzed the results from simulations of bead chains that lacked harmonic restraints. A plot of against , for , obtained from simulating an unrestrained chain of 45 beads, shows that, in the absence of constraints inducing loop formation, the CPs follow a clear trend with a peak at (Figure 5a). We observed similar trends for chains with 35 and 25 beads (data not shown).
As does not vary significantly among pairs of beads separated by similar mean spatial distances or by similar loop lengths j – i (Figure 5a), it is reasonable to estimate looping probabilities by averaging over constant values of j – i (Figure 5a, inset). We found that such looping probabilities for an unrestrained chain of 45 beads approximately follow the trend predicted by theory for wormlike chains with nonzero persistence length and nonzero contact distance (^{26}), thus confirming that our simulations can reproduce the behavior of such chains. Furthermore, noting a monotonic relation between and for , we also fitted the power law to our simulation data for the unrestrained chain, and we obtained , which approximately agrees with the value reported in (^{30}). These results suggest that, in the absence of harmonic restraints and for sufficiently large, it may be appropriate to assume a simple monotonic relation between and and to use such relation for predicting approximate values of from known or measured values of .
We next analyzed the results from the simulations of the test systems, where specific beads were connected by restraints as described above. In this case, the plots of against indicate that the addition of harmonic restraints complicates the relation between and (Figure 5b and c) far beyond the clear trend obtained from the simulations of the unrestrained chains. In particular, when harmonic restraints are present, the CPs are overall greater than the corresponding values observed in the absence of restraints, and there appears to exist no simple law that relates and . In fact, different pairs of beads separated by similar mean spatial distances or by similar loop lengths yield CPs that differ significantly by up to four orders of magnitude. The observed variation of with for the chain with four free loops is bounded by power laws with exponents as different as −3 and −8 (Figure 5b and c, dashed lines). Hence, for the test systems considered in the present study, assuming a simple functional relation and using such relation to calculate from would introduce large fractional errors in the predicted values of , and those errors would increase with decreasing bead CPs. These results thus motivate the development of computational approaches that do not rely on calculating from but directly compare estimated CPs with reference CPs to infer a configuration ensemble.
After obtaining the reference set of bead CPs for each test system, we applied our computational method to recover an ensemble of conformations whose estimated CPs match the corresponding reference CPs. We began by selecting a few pairs of beads to be connected with harmonic restraints. This selection was performed in an automated fashion by analyzing the reference CP maps with the peak detection algorithm described above. The algorithm successfully identified all of the bead pairs that were connected by harmonic restraints in the test systems used to generate (Supplementary Table S1). Moreover, for chains with 35 and 45 beads, the algorithm found 2, 3 or 5 additional bead pairs that were not restrained in the test systems (Supplementary Table S1) but nevertheless gave rise to CPs enhanced above the background (Figure 4).
We next adjusted the spring constants of the guessed restraints by performing up to 32 trials of our iterative restraint optimization procedure. Each trial used a different initial chain conformation (Figure 6) to start the BD simulations performed to estimate the CPs for the restrained bead chain.
From each trial, we obtained a different set of restraint spring constants together with the corresponding ensemble of chain conformations. For each such conformation ensemble, we used Equation 15 to calculate , the RMSD between the CPs estimated for that ensemble and the corresponding reference CPs previously obtained for the test system under study. We found that varies among the trials of the ensemble recovery procedure for a given test system and that this variation increases with the complexity of the test system (Figure 7), indicating that, within the simulated time intervals, the restrained bead chain tends to get trapped into local energy minima that depend on the initial chain conformation.
For each recovered conformation ensemble, we also calculated the mean interbead distances for . Then, to compare quantitatively these mean interbead distances with the corresponding reference quantities we calculated the RMSD between the two sets of distances (^{30}),
We found that minimizing over the trials for a given test system yields the smallest, or a relatively small value for the RMSD of the mean interbead distances, (Figure 7). These results indicate that minimizing relative to a set of reference CPs is an effective strategy for identifying a conformation ensemble that closely matches the mean interbead distances of the original conformation ensemble from which the were estimated or measured.
Therefore, to conclude our ensemble recovery procedure, for each test system, we selected the set of restraint spring constants and the corresponding conformation ensemble that minimized among all trials. Comparing the heat maps of the CPs estimated from recovered and reference ensembles for chains with 45 beads (Figure 4a and b) shows a good qualitative agreement between the two ensembles. A similar good agreement was also observed for the simpler test systems (data not shown). This agreement is also apparent in plots of against (Figure 8a).
Furthermore, the mean and standard deviation of the interbead distances in the recovered conformation ensembles are in excellent agreement with the corresponding quantities calculated for the reference ensembles (Figure 8b and c), confirming that our procedure successfully recovered not only the average frequency of the various interbead interactions but also the average interbead distances and the extent to which these distances fluctuate about the mean.
To visualize the reference and recovered conformation ensembles, we uniformly extracted 100 conformations from each such ensemble and aligned those conformations on the beads that were restrained in the test system used to generated the reference ensemble. The resulting 3D representations of the reference and recovered conformation ensembles reveal large fluctuations in the positions of the loops (Figure 9). The same regions of space, however, tend to be occupied by corresponding loops in the reference and recovered ensembles, thus providing a visual confirmation of the similarity between the average spatial arrangements of the two ensembles.
The good agreement that we observed between recovered and reference ensembles is in fact a consequence of successfully optimizing the general linear model of Equation 9 with the LMS algorithm. This optimization resulted in a good prediction of restraint spring constants from the reference CPs associated with the restrained bead pairs. In fact, as noted above, some bead pairs were chosen by the peak detection algorithm for restraining, even though they were not restrained in the test systems. During the ensemble recovery procedure, however, the spring constants for the restraints on these bead pairs decreased to small values relative to the spring constants restraining the other bead pairs (Supplementary Table S1). These results indicate that the ensemble recovery procedure correctly distinguished the pairs of beads that were directly connected by harmonic restraints in the reference conformation ensemble from those pairs that were not. In particular, the RMSD,
between the spring constants predicted during the ensemble recovery procedure and the corresponding value used for the restraints in the test systems was <6% of for all such systems (Table 2). Thus, the procedure successfully deduced approximate values of the underlying spring constants using only the knowledge of reference CPs.We asked whether a similarly good prediction of each restraint spring constant could be achieved with fewer than n + 1 nonzero elements per row in the matrix W in Equation 9, i.e. with fewer than n + 1 parameters per spring constant. To answer this question, we repeated the ensemble recovery procedure on all test systems, this time forcing all of the offdiagonal elements of W—except those in the last column—to be zero, thus effectively using only two parameters to predict each spring constant. This choice corresponds to assuming that each restraint spring constant is linearly related only to the CP estimated for the bead pair (i, j) restrained by that spring constant, i.e. k_{i}_{,}_{j} = w_{i}_{,}_{j} + c_{i}_{,}_{j}. We found that the resulting RMSDs of the spring constants, CPs and mean interbead distances did not differ appreciably from the corresponding values obtained by using n + 1 parameters per spring constant (Table 2). Therefore, for the test systems considered in this study, it appears that the CP associated with each restrained bead pair depends primarily on the spring constant restraining that bead pair. This conclusion, however, may not hold for more complex systems, where the restraints might be less uniformly distributed among the beads and might have more variable spring constants than the restraints we used in this study to generate the reference CPs. For more complex systems, using all parameters in the general linear model of Equation 9 may be necessary to achieve adequate accuracy in the prediction of spring constants from CPs.
The majority of the computation time required by the proposed ensemble recovery procedure is consumed by the BD simulations. These simulations are needed to estimate the CPs either for adjusting the spring constants through the LMS algorithm or for selecting the optimal conformation ensemble through a comparison of values among the ensembles obtained from different initial conformations. The simulations must be sufficiently long to ensure that the variance of the CPs estimated using Equation 8 does not outweigh the variation in CP due to differences in spring constants and initial conformations. In our work with the test systems, obtaining CPs sufficiently precise to ensure rapid convergence of the LMS algorithm in all trials of the restraint optimization procedure required steps per simulation. On the other hand, ensuring that the optimal ensemble selected from several trials of the procedure matches the diversity of conformations present in the corresponding reference ensemble required 10^{8} simulation steps, i.e. the same number that was used to obtain each reference ensemble. The average computation time per trial was found to increase linearly with the number of beads (Table 2).
We have developed a computational approach to recover chromatin conformation ensembles from a set of reference CPs. The overall strategy of this approach consists of comparing the given set of reference CPs to a set of CPs obtained from simulations of a restrained beadchain polymer model of chromatin. The results of this comparison are used iteratively to adjust the parameters of the polymer model so that, after a sufficient number of iterations and trials, an optimal conformation ensemble is obtained whose CPs closely match the corresponding reference probabilities. We have validated this procedure by using reference data sets obtained from simulations of six test systems of increasing complexity. For all such systems, the procedure yielded a conformation ensemble whose CPs, mean interbead distances and standard deviation of interbead distances all agree very closely with the corresponding reference quantities. The most complex test system that we considered was a chain of 45 beads, equivalent to roughly 135–270 kb, containing four tied loops (Figure 3f). Although this system is much smaller than the genomic loci typically investigated in 3Cbased experiments, it does provide initial support to the validity of the proposed computational approach, which can already be used to investigate the spatial organization of small genomic regions.
To enable efficient and accurate analysis of experimental data sets obtained from large genomic loci, entire chromosomes or even entire genomes, the proposed computational approach will require additional improvement and validation. For example, whereas the present approach estimates CPs for beads representing fragments of equal lengths, 3Cbased experiments typically provide reference CPs for fragments of various lengths. This mismatch could be overcome by mapping the experimental fragments onto the beadchain contour and by estimating CPs for pairs of mapped fragments, rather than for pairs of beads. Another issue is computational effort. Although the procedure we described lends itself to parallelization, with each trial executing on a separate processor core, it may nevertheless become too demanding for large genomic loci. Computational effort could be lowered by improving the efficiency of conformation sampling, for example through Monte Carlo simulations, and by avoiding Equation 8 in the estimation of CPs, for example by inferring interbead distance distributions from sample means and higher moments of . Finally, further validation of the procedure will require not only the simulation of larger and more complex test systems but also the availability of experimental data sets that include both 3C and FISH measurements on the same genomic region. Applications of our method to the analysis of experimental data and to the study of specific phenomena, such as gene clustering (^{31}), are important issues that will be addressed in the future.
Supplementary Data are available at NAR Online: Supplementary Table 1.
American Cancer Society [Instructional Research 70002 provided to G.A. through the Moores Cancer Center, University of California, San Diego] and ARCS Foundation, San Diego Chapter [scholarship to D.M.]. Funding for open access charge: American Cancer Society.
Conflict of interest statement. None declared.
REFERENCES
1.  Felsenfeld G,Groudine M. Controlling the double helixNatureYear: 200342144845312540921 
2.  Blackwood EM,Kadonaga JT. Going the distance: a current view of enhancer actionScienceYear: 199828160639679020 
3.  Adhya S. Multipartite genetic control elements: communication by DNA loopAnnu. Rev. Genet.Year: 1989232272502694932 
4.  Carter D,Chakalova L,Osborne CS,Dai Yf,Fraser P. Longrange chromatin regulatory interactions in vivoNat. Genet.Year: 20023262362612426570 
5.  Schleif R. DNA loopingAnnu. Rev. Biochem.Year: 1992611992231497310 
6.  Gheldof N,Tabuchi TM,Dekker J. The active FMR1 promoter is associated with a large domain of altered chromatin conformation with embedded local histone modificationsProc. Natl Acad. Sci. USAYear: 2006103124631246816891414 
7.  Janicki SM,Tsukamoto T,Salghetti SE,Tansey WP,Sachidanandam R,Prasanth KV,Ried T,ShavTal Y,Bertrand E,Singer RH,et al. From silencing to gene expression: realtime analysis in single cellsCellYear: 200411668369815006351 
8.  Müller WG,Walker D,Hager GL,McNally JG. Largescale chromatin decondensation and recondensation regulated by transcription from a natural promoterJ. Cell Biol.Year: 2001154334811448988 
9.  Tumbar T,Sudlow G,Belmont AS. Largescale chromatin unfolding and remodeling induced by VP16 acidic activation domainJ. Cell Biol.Year: 19991451341135410385516 
10.  Cook PR. The organization of replication and transcriptionScienceYear: 19992841790179510364545 
11.  Parada LA,Misteli T. Chromosome positioning in the interphase nucleusTrends Cell Biol.Year: 20021242543212220863 
12.  Baù D,Sanyal A,Lajoie BR,Capriotti E,Byron M,Lawrence JB,Dekker J,MartiRenom MA. The threedimensional folding of the αglobin gene domain reveals formation of chromatin globulesNat. Struct. Mol. Biol.Year: 20111810711421131981 
13.  Lucas JS,Bossen C,Murre C. Transcription and recombination factories: common features? CurrOpin. Cell Biol.Year: 201123318324 
14.  Cremer T,Cremer C. Chromosome territories, nuclear architecture and gene regulation in mammalian cellsNat. Rev. Genet.Year: 2001229230111283701 
15.  Levsky JM,Singer RH. Fluorescence in situ hybridization: past, present and futureJ. Cell Sci.Year: 20031162833283812808017 
16.  Huang B,Babcock H,Zhuang X. Breaking the diffraction barrier: superresolution imaging of cellsCellYear: 20101431047105821168201 
17.  Zhao Z,Tavoosidana G,Sjolinder M,Gondor A,Mariano P,Wang S,Kanduri C,Lezcano M,Singh Sandhu K,Singh U,et al. Circular chromosome conformation capture (4C) uncovers extensive networks of epigenetically regulated intra and interchromosomal interactionsNat. Genet.Year: 2006381341134717033624 
18.  Simonis M,Klous P,Splinter E,Moshkin Y,Willemsen R,de Wit E,van Steensel B,de Laat W. Nuclear organization of active and inactive chromatin domains uncovered by chromosome conformation captureonchip (4C)Nat. Genet.Year: 2006381348135417033623 
19.  Dostie J,Richmond TA,Arnaout RA,Selzer RR,Lee WL,Honan TA,Rubio ED,Krumm A,Lamb J,Nusbaum C,et al. Chromosome conformation capture carbon copy (5C): a massively parallel solution for mapping interactions between genomic elementsGenome Res.Year: 2006161299130916954542 
20.  Rodley CDM,Bertels F,Jones B,O’Sullivan JM. Global identification of yeast chromosome interactions using genome conformation captureFungal Genet. Biol.Year: 20094687988619628047 
21.  LiebermanAiden E,van Berkum NL,Williams L,Imakaev M,Ragoczy T,Telling A,Amit I,Lajoie BR,Sabo PJ,Dorschner MO,et al. Comprehensive mapping of longrange interactions reveals folding principles of the human genomeScienceYear: 200932628929319815776 
22.  Dekker J,Rippe K,Dekker M,Kleckner N. Capturing chromosome conformationScienceYear: 20022951306131111847345 
23.  de Wit E,de Laat W. A decade of 3C technologies: insights into nuclear organizationGenes Dev.Year: 201226112422215806 
24.  Iyer B,Kenward M,Arya G. Hierarchies in eukaryotic genome organization: insights from polymer theory and simulationsBMC Biophys.Year: 20114821595865 
25.  MartiRenom MA,Mirny LA. Bridging the resolution gap in structural modeling of 3D genome organizationPLoS Comput. Biol.Year: 20117e100212521779160 
26.  Rippe K. Making contacts on a nucleic acid polymerTrends Biochem. Sci.Year: 200126, 73374011738597 
27.  Fraser J,Rousseau M,Shenker S,Ferraiuolo M,Hayashizaki Y,Blanchette M,Dostie J. Chromatin conformation signatures of cellular differentiationGenome Biol.Year: 200910R3719374771 
28.  Duan Z,Andronescu M,Schutz K,McIlwain S,Kim YJ,Lee C,Shendure J,Fields S,Blau CA,Noble WS. A threedimensional model of the yeast genomeNatureYear: 201046536336720436457 
29.  Tanizawa H,Iwasaki O,Tanaka A,Capizzi JR,Wickramasinghe P,Lee M,Fu Z,Noma K. Mapping of longrange associations throughout the fission yeast genome reveals global genome organization linked to transcriptional regulationNucleic Acids Res.Year: 2010388164817721030438 
30.  Rousseau M,Fraser J,Ferraiuolo M,Dostie J,Blanchette M. Threedimensional modeling of chromatin structure from interaction frequency data using Markov chain Monte Carlo samplingBMC BioinformaticsYear: 20111241422026390 
31.  Gehlen LR,Gruenert G,Jones MB,Rodley CD,Langowski J,O’Sullivan JM. Chromosome positioning and the clustering of functionally related loci in yeast is driven by chromosomal interactionsNucleusYear: 2012337038322688649 
32.  de Gennes PG. Scaling Concepts in Polymer PhysicsYear: 1979New YorkCornell University Press 
33.  Shimada J,Yamakawa H. Ringclosure probabilities for twisted wormlike chains. Application to DNAMacromoleculesYear: 198417689698 
34.  Robinson PJJ,Fairall L,Huynh VAT,Rhodes D. EM measurements define the dimensions of the “30nm” chromatin fiber: evidence for a compact, interdigitated structureProc. Natl Acad. Sci. USAYear: 20061036506651116617109 
35.  Rosa A,Becker NB,Everaers R. Looping probabilities in model interphase chromosomesBiophys. J.Year: 2010982410241920513384 
36.  Tokuda N,Terada TP,Sasai M. Dynamical modeling of threedimensional genome organization in interphase budding yeastBiophys. J.Year: 201210229630422339866 
37.  Allen MP,Tildesley DJ. Computer Simulation of LiquidsYear: 1987New YorkOxford University Press 
38.  Langowski J,Heermann DW. Computational modeling of the chromatin fiberSemin. Cell Dev. Biol.Year: 20071865966717936653 
39.  Press WH,Teukolsky SA,Vetterling WT,Flannery BP. Numerical Recipes in CYear: 19922nd ednCambridge, UKCambridge University Press 
40.  Iniesta A,de la Torre JG. A secondorder algorithm for the simulation of the Brownian dynamics of macromolecular modelsJ. Chem. Phys.Year: 19909220152018 
41.  Klenin K,Merlitz H,Langowski J. A Brownian dynamics program for the simulation of linear and circular DNA and other wormlike chain polyelectrolytesBiophys. J.Year: 1998747807889533691 
42.  Ghirlando R,Litt MD,Prioleau MN,RecillasTarga F,Felsenfeld G. Physical properties of a genomic condensed chromatin fragmentJ. Mol. Biol.Year: 200433659760515095975 
43.  Lavelle C. Forces and torques in the nucleus: chromatin under mechanical constraintsBiochem. Cell Biol.Year: 20098730732219234543 
44.  Haynes WMCRC Handbook of Chemistry and PhysicsYear: 201192nd ednBoca Raton, FL, USACRC Press 
45.  Widrow B,Glover JRJ,McCool J,Kaunitz J,Williams C,Hearn R,Zeidler J,Eugene Dong J,Goodlin R. Adaptive noise cancelling: principles and applicationsProceedings of the IEEEYear: 1975Vol 6316921716 
46.  Widrow B. Kalman RE,De Claris NAdaptive filtersAspects of Network and System TheoryYear: 1970New YorkHolt, Rinehart and Winston503587 
47.  Widrow B,Stearns SD. Adaptive Signal ProcessingYear: 1985Englewood Cliffs, NJPrentice Hall 
48.  Pettersen EF,Goddard TD,Huang CC,Couch GS,Greenblatt DM,Meng EC,Ferrin TE. UCSF chimera—a visualization system for exploratory research and analysisJ. Comput. Chem.Year: 2004251605161215264254 
The LMS algorithm (^{45},^{46}) has found numerous applications in the field of adaptive signal processing, including adaptive system identification, adaptive inverse modeling, adaptive control and adaptive interference canceling (^{47}). This algorithm was developed to optimize iteratively and dynamically the weights of a digital filter structure, known as ALC, that performs the dot product y_{k} = w_{k}^{T}x_{k}, where is the ALC output, is a vector containing n + 1 adjustable weights, is a vector containing n + 1 inputs and k is the current time step for the inputs, weights and output. The choice of the inputs in and the role of the output depend on the specific application of the ALC. All applications, however, include a desired signal and require the adjustment of at each time step k so that the output is, on average, as close as possible to or, equivalently, so that the magnitude of the error
averaged over a long interval of k, is as small as possible. The degree to which the ALC meets this requirement can be quantified, as a function of , by defining the quadratic performance surface χ = E[ε_{k}^{2}], where the expected value is taken over the time step k. Hence, the requirement to achieve optimality of the ALC is that the weight vector be adjusted at each time step k to minimize . The LMS algorithm addresses this requirement by using a variant of the steepest descent algorithm. This variant replaces the gradient of the quadratic performance surface with a simpler estimate obtained at time step k directly from , i.e. where is the gradient operator with respect to the components of the weight vector . The gradient estimate is then used to calculate an improved weight vector from the current one, where is a gain factor that determines the size of the step along the negative gradient estimate. A small value of causes slow convergence, whereas too large a value of causes instability of the algorithm. It has been shown (^{47}) that the LMS algorithm is stable for , where is the input correlation matrix. The strengths of the LMS algorithm are its simplicity, robustness and relatively rapid convergence despite the presence of noise in the input and desired signal .Figures
[Figure ID: gks1029F1] 
Figure 1.
Main components of the proposed computational approach to recover a conformation ensemble from a given set of reference CPs. 
[Figure ID: gks1029F2] 
Figure 2.
Schematic representations of (a) restrained beadchain polymer model used for BD simulations of a 30nm chromatin fiber subjected to looping constraints and (b) application of the LMS algorithm to the optimization of the parameters in the general linear model (Equation 9) used to predict restraint spring constants from reference CPs. 
[Figure ID: gks1029F3] 
Figure 3.
Energyminimized conformations of the test systems used to generate reference CPs for validating the proposed computational method. The systems are labeled as in Table 2. Images were generated using UCSF Chimera (^{48}). 
[Figure ID: gks1029F4] 
Figure 4.
Heat maps representing (a) reference and (b) recovered CPs for a chain of 45 beads with (left) four free loops or (right) four tied loops. Free loops result from connecting loop endbeads with harmonic restraints (gray arcs in topleft schematic), while tied loops result from connecting middle beads across free loops (dotted arcs in topright schematic). Blue circles on the maps identify pairs of beads that were restrained (a) when generating reference CPs and (b) when performing the ensemble recovery procedure. Test systems with two and three loops (Table 2) yielded a similarly good visual match between reference and recovered CP maps (data not shown). 
[Figure ID: gks1029F5] 
Figure 5.
Variation of bead CPs with mean interbead distance in reference ensembles for chain (a) without restraints, (b) with four free loops and (c) with four tied loops. Each point represents one of the possible bead pairs in the chain. Error bars are standard deviations over 10 independent simulations. The dashed line in (a) is a fit of the power law , giving . Inset in (a): looping probability versus loop length in ensemble for chain without restraints. The curve in this inset is a fit of Equation 3 from (^{26}). The dashed curves in (b) and (c) are power laws with exponents −8 and −3. Distances are in units of . 
[Figure ID: gks1029F6] 
Figure 6.
Initial conformations used in eight trials of the ensemble recovery procedure for a chain with 35 beads and 6 restraints (third row in Table 2), shown before (top) and after (bottom) minimization of the potential energy, Equation 1. Images were generated using UCSF Chimera (^{48}). 
[Figure ID: gks1029F7] 
Figure 7.
Plots of RMSD of mean interbead distances (Equation 17) against RMSD of CPs (Equation 15) for all trials of the ensemble recovery procedure and for all tested systems. Each point represents RMSD values obtained from an ensemble of 10^{6} conformations at the end of a particular trial. Inset: enlarged view of boxed area. Distances are in units of . 
[Figure ID: gks1029F8] 
Figure 8.
Comparison of (a,b) bead CPs, (c,d) mean interbead distances and (e,f) standard deviation of interbead distance determined from optimal recovered ensemble to respective quantities determined from reference ensemble for a chain with 45 beads and (a,c,e) 4 free loops or (b,d,f) 4 tied loops. Each point represents one of the possible bead pairs in the chain. The dashed lines are plots of y = x, not linear fits. Distances are in units of . Better correlations were observed for test systems with three and two loops (data not shown). 
[Figure ID: gks1029F9] 
Figure 9.
Spatial representation of reference (left) and recovered (right) conformation ensembles for bead chains of 45 beads with (a) four free loops and (b) four tied loops. Each depicted ensemble consists of 100 conformations extracted at equal intervals from 10^{8} steps of BD simulation and aligned on the beads that were connected by harmonic restraints in the simulations used to generate the reference ensembles. The coloring order along each chain is red, yellow, green, cyan and blue. Images were generated using UCSF Chimera (^{48}). 
Tables
Parameter values used to simulate the restrained beadchain polymer model of chromatin and to provide a physically realistic approximation of the mechanical properties of chromatin, as currently known from experiments
Parameter  Symbol  Reduced units  SI units 

Thermal energy^{a}  1.0  
Bead mass^{b}  m  1.0  
Lennard–Jones size parameter  1.0  30 nm  
Lennard–Jones energy parameter  
Bead separation  30 nm  
Contact distance^{c}  45 nm  
Bond spring constant^{d}  
Persistence length^{e}  120 nm  
Bending energy constant  
Time step/damping constant^{f} 
gks1029TF1^{a}Energy per bead per degree of freedom at T = 300 K.
gks1029TF2^{b}Representative value based on the experimental measurement of 23.3 MDa for a 15.5kb fragment of 30nm chromatin upstream of the chicken globin locus (^{42}).
gks1029TF3^{c}Following Rosa et al. (^{35}), equivalent to assuming that contacts between chromatin fibers are mediated by proteins of 15nm diameter.
gks1029TF4^{d}From experiments, the stretching modulus is 5–150 pN (^{43}), hence ranges from to .
gks1029TF5^{e}From experiments, 30 – 200 nm (^{43}).
^{f}To maximize conformation sampling efficiency, we used the largest value of found to maintain stability of the BD simulations. A lower bound for can be estimated by considering a chromatin sphere of radius r = 15 nm and using with the viscosity of water = 890 µPa s at 25°C and 1 bar (^{44}). Then, 18 ns.
Validation of the conformation ensemble recovery procedure using reference CPs estimated by simulating test systems of increasing complexity
gks1029TF6^{a}Characteristics of test systems used to generate conformation ensembles from which reference CPs were estimated.
gks1029TF7^{b}Results of ensemble recovery procedure applied to reference CPs.
gks1029TF8^{c}RMSD between recovered and reference values of restraint spring constants (k), CPs (p) and mean interbead distances ( ), achieved when using a general linear model (Equation 9) with the specified number of parameters per spring constant.
gks1029TF9^{d}Label used to identify test system in Figure 3.
gks1029TF10^{e}Number of beads in the chain.
gks1029TF11^{f}Number of restraints used to induce the loops in the bead chain.
gks1029TF12^{g}Number and type of induced loops.
gks1029TF13^{h}Number of restraints found by peak detection algorithm.
gks1029TF14^{i}Number of trials performed to select the optimal ensemble.
gks1029TF15^{j}Average computation time per trial in hours when performing each trial with n + 1 parameters on one core of a 2.2GHz AMD Opteron Processor 2427.
Article Categories:

Previous Document: Cloning, expression, crystallization and preliminary Xray crystallographic analysis of aspartyl ami...
Next Document: The RNA helicase DDX5/p68 is a key factor promoting cfos expression at different levels from transc...