Using synthetic bacterial enhancers to reveal a looping-based mechanism for quenching-like repression

We explore a model for ‘quenching-like' repression by studying synthetic bacterial enhancers, each characterized by a different binding site architecture. To do so, we take a three-pronged approach: first, we compute the probability that a protein-bound dsDNA molecule will loop. Second, we use hundreds of synthetic enhancers to test the model's predictions in bacteria. Finally, we verify the mechanism bioinformatically in native genomes. Here we show that excluded volume effects generated by DNA-bound proteins can generate substantial quenching. Moreover, the type and extent of the regulatory effect depend strongly on the relative arrangement of the binding sites. The implications of these results are that enhancers should be insensitive to 10–11 bp insertions or deletions (INDELs) and sensitive to 5–6 bp INDELs. We test this prediction on 61 σ54-regulated qrr genes from the Vibrio genus and confirm the tolerance of these enhancers' sequences to the DNA's helical repeat.

T F = 4l (red sphere) bending the chain by 30 ¶ . (E) Illustration (not to scale) of driver-polymerase interaction criteria. Both proteins are bound to the DNA (light-blue cylinders) and directed alongû from the links' centers. The driver (purple circle) is bound to the last link and the polymerase (red circle) to the first link, which is also identified as the location of the ‡ 54 factor. The centers of the driver and the polymerase are denoted by r drv and r pol , respectively, and the center of the ‡ 54 factor is denoted by r 0 = r ‡ 54 . The interaction volume "r in which r drv can reside is shown as a light-green wedge. The possible directions forû N are shown as a light-blue cone with its base in r drv .   sequences generated randomly with equal probability for the 4 nucleotides at each position (blue), and for the candidate promoters from the loop sequences (red). All candidate promoters are more than 2 ‡ from the mean of the gaussian fit (cyan) to the random sample. (B) Histogram of inter-site distances for all candidate binding sites (blue) and for the tandem binding sites (red). Range limited to [0, 50] for clarity.  fluorescence measurements that are subsequently used for the experimental determination of the expression level ratio. The panels plot inducer concentration on the x-axis and raw fluorescence measurements on the y-axis for the synthetic enhancers with looping length N with a single TF binding site displaced k bp away from the driver. All datasets were fit by Hill-1 functions to determine expression level ratio values with error-bars. (A) TetR-repression. (B) TetR-activation. (C) TraRrepression. (D) TraR-activation. (E) TraR -no regulation. (F) LacI-repression. The expression level ratio values determined for these data sets were: (A) 77% ± 5% , (B) 120% ± 5% , (C) 67% ± 6%, (D) 133% ± 3%, (E) 95% ± 5% and (F) 57% ± 3%. Note that increasing inducer levels has di erent e ects for the di erent proteins: IPTG and aTc remove LacI and TetR, respectively, from DNA, while 3OC8 enables binding of TraR to DNA. (G) Comparison of dose response functions for LacI and LacI-GST for the single LacI binding site synthetic enhancer set at k=95 bp. LacI-bound synthetic enhancers (blue) and LacI-GST-bound synthetic enhancers (red). Tables   nt  1  2  3  4  5  6  7  8  9  10  11  12   3 Supplementary Note 1: looping in the context of the wormlike chain (WLC) model

The discrete WLC model
To evaluate the probability ratioR n (N, k, s, ...) theoretically we need to model the looping probabilities P looped,n (N, k, s, ...) of the di erent enhancer configurations. We begin with the simple case of DNA looping without any proteins present. DNA is typically modeled as a discrete semi-flexible chain made of individual links of length l, such that the deviation of one link from its adjacent counterpart depends solely on some elastic bending energy. The chain links do not interact with each other. This class of polymer models is based on the original work of Kratky and Porod [1] and is referred to as the class of wormlike chain models (WLC). This class includes both discrete (i.e., a chain consists of a finite number of links with a certain length) and continuous models.
A chain is described by the locations of its links, and a local coordinate system defined by three orthonormal vectorsû,v,t at each link. The vectort points along the direction of the chain links. For the continuous WLC these vectors are defined continuously along the chain contour. For the discrete WLC, a chain is defined by its joint locations r i , along with the local coordinate systems of all links. An example of a discrete WLC of five links, with unit link length is shown in Supp. Fig.  1A.
The elastic energy of the entire chain can be broken into a sum of contributions from individual chain links. The elastic energy of a single ith link in the chain consists of two contributions. The first contribution is the the elastic energy associated with bending link i oe {2, ..., N} relative to link i ≠ 1 with angles ◊ i , " i (zenith and azimuthal angles in local spherical coordinates of the previous link). This is the conventional bending energy, which can be written as: assuming azimuthal symmetry, where -= (k b T ) ≠1 , T is the temperature, k b is the Boltzmann factor, and a is the bending constant of the DNA chain. The second contribution accounts for the energy associated with twisting each link in the DNA double helix by i (twist angle) with respect to the previous link (i.e. rotatingû i relative toû i≠1 at an angle of i ). This term is modeled similarly to the term used to describe the twisting energy of a torsion spring [2]: where c is the twisting rigidity constant, and where we denote the relaxed twisting angle of the DNA chain (the native twist of ¥ 1.81 radian per nm) by 0 . Consequently, the resulting elastic energy is Note that all the angles ◊ i , " i , i are given in the local coordinate system of the (i ≠ 1)th link. We number the links of a chain in the range 1..N for a chain of N links. For a specific configuration of the chain we introduce a notation {◊ n , " n , n } to denote the set of all the links' angles of the chain, from link 1 to link n. It follows then that the energy of the entire chain consisting of N links is given by The configurational partition function for the model DNA chain consisting of N links immediately follows: This model has been extensively studied in the past [3,4,2].

The discrete WLC Monte-Carlo algorithm
The probability of looping for a given choice of parameters P looped,n (N, k, s, ...) can be calculated using a Monte-Carlo algorithm based on the importance sampling method [5] (as described in [6]). The algorithm generates a faithful statistical ensembles consisting of N c ¥ 10 9 DNA chains. Any physical observable can then be computed from the generated ensemble by The probability of looping is computed by This provides the basis for the self-avoiding wormlike chain model presented in the next section.
4 Supplementary Note 2: looping in the context of the self-avoiding wormlike chain (SAWLC) model

The discrete SAWLC model
Except for a few notable exceptions [7,8], the WLC model does not take into account energetic and entropic e ects that emerge from the cross-section or "thickness" of the DNA double helix. In order to model the e ects of the e ective DNA cross-section size, we must take into account an additional contribution to the elastic energy. We engulf each end-point of a link (a joint in the chain) by a "hard-wall" spherical shell of diameter w. An example of such a chain with three links (four joints), and w = 2.5l is shown in Supp. Fig. 1B. This allows us to model the final contribution to the elastic energy as a set of hard-wall potentials. We denote the end-point of link i as joint i. As previously mentioned, the chain links are numbered 1..N . Joint 0 is the beginning terminus of the chain. For the simple case in which the chain link length l is larger than the chain diameter (l Ø w) and therefore no two neighboring hard-wall spheres overlap, the hard-wall potential energy for the ith chain link is defined as: This allows us to write an expression for the total elastic energy associated with the chain of spheres as follows: We term this model the self-avoiding wormlike chain model (SAWLC). Similarly, Eq. (5) holds for the SAWLC, with the substitution of (9) for (4). Performing the substitutions (9), (8) into (5) and opening the sums yields: which in turn can be written as: In the case in which the chain link length l is smaller than the chain diameter (l < w), two or more consecutive spheres overlap. This is resolved by introducing i such that there can be an interaction between links j, k only if |j ≠ k| Ø i Ø w l . This changes (11) to: Recently, we made a significant contribution to the understanding of polymer cyclization or looping by providing a detailed analysis of this phenomenon in the context of the SAWLC [6]. In this work, we first confirmed numerically previous renormalization group predictions [7,8,9] and scaling theory [10] models, and subsequently provided new numerical analysis that was applicable to resolving the hyper-bendability e ect observed in DNA cyclization experiments for short dsDNA [11,12]. The algorithm used for the polymer cyclization analysis forms the basis for the algorithm that we describe below.

The discrete SAWLC Monte-Carlo algorithm
To faithfully sample the configurational space of the SAWLC model we utilized a Monte-Carlo algorithm based on the weighted-biased sampling method [13] to generate statistical ensembles consisting of N c ¥ 10 7 ≠ 10 9 DNA chains. In brief, the algorithm generates chains by growing them one link at a time, while systematically checking that the new link does not cross any previous links in the chain. When such a crossing occurs, this fact is taken into account by updating the weighting factor of the chain W . This counter-weight balances the over/under-representation of that chain in the ensemble. For a more detailed description of the method see [6].
Any physical observable f can then be estimated from the generated ensemble using the following formula: which includes the probability of looping defined by Eq. 7.

Modeling a DNA chain with a bound protein
In order to expand our basic self-avoiding simulation of polymer configurations to model a nucleoprotein structure made of dsDNA and proteins, we first consider some broad geometric characteristics of dsDNA structure. dsDNA is composed of two interwound helical strands, which form a double-helix backbone linked by base-pairs. The backbone exhibits major and minor grooves. The major groove is the wider groove between the two, while the minor groove corresponds to the situation where strands are closer together on the relevant side of the double helix than on the other. Since most DNA binding proteins bind onto the major groove of the DNA, we must continuously keep track of the location of the major groove along the DNA to properly model protein-DNA binding. To that end, we use one of the vectors defining the local coordinate system of each link:û i (Supp. Fig. 1A). On each link,û i points from the center of the chain's cross-section to the center of the major groove. As a result, we model a protein bound to a binding site of n basepairs, with m being the index of the first basepair of the binding site, by a protrusion from the chain shaped like a sphere with some diameter w protein representative of the protein's volume (Supp. Fig. 1C): where we defined the center link of the binding site as Note that r c is the joint at the end of link c.

Monte-Carlo simulation of a DNA chain with a bound protein
In order to adapt our algorithm to the case of protein-bound DNA, we have to take into account not only the growing chain but also the location of the protrusion as defined by (Eq. 15). During the application of the Monte-Carlo algorithm, upon reaching link (m + n/2) in a specific chain, the simulation attempts to add a hard-wall sphere with diameter w protein at the location r protein . If the sphere overlaps any of the previously generated objects (chain links or other proteins) the chain is discarded, and a new chain is grown from the beginning. The excluded volume of the new sphere is taken into account, when subsequent links are generated. Only completed nucleoprotein chains are properly weighted and counted within the new nucleoprotein configurational distribution.

Modeling bending and sti ening of a thick chain
The excluded volume of the protein bound to a binding site of n basepairs, with m being the index of the first basepair of the binding site, is introduced via a spherical protrusion, as described in section 4.3.1 for the general case. Local DNA bending and sti ening e ects induced by the bound protein are simulated via changes applied to the [m, m + n ≠ 1] chain links as they are being generated.
Local sti ening of the DNA is simulated by using a di erent bending constant a Õ > a (Eq. 1). Local bending of the DNA by an angle of Ÿ is simulated by introduction of a small rotation of the local coordinates system at each link in the protrusion binding site by Ÿ/n aroundv c (the unit vectorv of the center link). This results in a cumulative bending of the chain "around" the bound protein (Supp. Fig. 1D). The energy associated with this additional rotation is zero. To properly generate the bend using our Monte-Carlo approach, the simulation estimates the value ofv c of the center link of the binding site for all binding- where P is the helical repeat of the chain (P ¥ 10.5 for DNA) and R i . This approximation is valid for the simulation of DNA since n/2 is smaller than the bend and twist persistence lengths of ≥50 ≠ 100 nm. by at least an order of magnitude (see Simulation Parameters in Materials and Methods). Thus, at these length scales the DNA is expected to be fairly straightand each successive link i is rigidly twisted by ≥ 2fi P aroundt i relative to link i ≠ 1.

Looping criterion
After numerically generating the configurational ensemble, the subset of "looped" configurations needs to be properly defined numerically. In the basic cyclization model, looping was defined [6] by having both ends of the polymer within some distance ". However, unlike the DNA cyclization experiments in which the DNA chain segment simply closes on itself, ‡ 54 transcription initiation requires the interaction of two proteins bound to the chain's ends. In particular, the driver interacts with the ‡ 54 factor located directly on the DNA beneath the polymerase, thus requiring the driver to access the polymerase complex from the bottom [14]. Therefore, to faithfully model this process via looping, we must add at least two proteins to the ends of the chain -the driver and the polymerase. For simplicity, we neglect the volume of the ‡ 54 . We define a looped state as one in which the driver protein interacts with the ‡ 54 factor in only a subset of the possible solid angles, corresponding to the ‡ 54 acceptance cone and the extent of the GAFTGA loop of NtrC [14]. Thus, the criteria for a looped driver-promoter interaction as they are shown in Supp. Fig. 1E, are as follows: 1. The driver and the ‡ 54 factor must be in close proximity, defined by maximal separation of The direction from the DNA to ‡ 54 factor is collinear with the the direction from the ‡ 54 factor to the driver, within the range "Ê. I.e. (r drv ≠ r ‡ 54 ) is collinear with ≠û 1 within the range "Ê. (Conditions 1-2 define a volume "r in which r drv must be located, illustrated as a green cone in Supp. Fig. 1E).
3. The direction from the DNA to the driver is collinear with the the direction from the driver to the ‡ 54 factor, within the range "Ê. I.e. (r ‡ 54 ≠ r drv ) is collinear withû N within the range "Ê (Illustrated as a light blue cone with its base in r drv centered aroundû N in Supp. Fig. 1E). Using this definition, our algorithm can compute P looped,0 (N ) for a bacterial enhancer-promoter system of looping length N .

Modeling looping of a thick chain with a bending or sti ening protrusion
Any configuration of bound proteins can be simulated in this way, allowing us to compute P looped,n (N , k, s, ...) in a straightforward fashion. For example, we consider a single TetR transcription factor bound to an enhancer of length 100 bp. The TetR transcription factor is roughly spherical and can be modeled well by a spherical protrusion with diameter of w T etR = 5.44 nm, and with a binding site size of n = 18 bp, on an enhancer-promoter chain that is N = 500 bp long. Consequently, such a protein bound to a binding site spanning links 25 ≠ 42 is modeled by a sphere located at r 34 , according to Eq. (15). Additional sti ening of the DNA by a factor of two is simulated by using a Õ = 2a during the generation of the links 25 ≠ 42 . Additional bending of the DNA by Ÿ = 10 ¶ is simulated by 18 instantaneous rotations of the local coordinate system by 10 ¶ 18 during generation of the links 25 ≠ 42 around the orientation ofv 34 , approximated at each link 25 AE i AE 42 as described by Eq. (17). This then allows us to generate the properly weighted nucleoprotein ensemble, and compute P looped,n=1 (N = 500, k = 34) in a straightforward fashion as defined above.

Thermodynamic modeling of looping-initiated transcription
Our proposed numerical model is capable of calculating the probability of looping for a given experimental setup. The output of the in vivo looping experiments, however, is measured indirectly by monitoring the resultant reporter protein fluorescence level when the cells reach steady-state. We need to derive a model that connects the experimental readout to the probability of looping. Since we measure the average fluorescence for a population of cells in our experiments, we only need to provide a prediction for the mean expression levels in our simulations. While a full stochastic model (e.g., [15]) can provide a prediction for the second and higher moments of the distribution, it is has been shown that the stochastic model's expression for the first moment is equivalent to the mean expression level obtained from a thermodynamic equilibrium model, that is constructed using the same assumptions [16].
To model a minimal enhancer, which consists of only the driver binding site and the promoter, we make the following assumptions: • We assume that the glnAp1 promoter is only active when the concentration of NRI ≥ P is vanishingly small (for a justification of this assumption see [17] ). When a small amount of NRI ≥ P accumulates, the hexameric complex assembles, which simultaneously strongly represses the glnAp1 promoter while activating glnAp2. This means that all transcription in our system is a result of the glnAp2 promoter.
• There is always a bound "poised" polymerase at the promoter awaiting an activation signal.
While it was recently shown that the RNAP releases from and rebinds to the promoter frequently [18], it was also previously shown that ‡ 54 promoters are mostly occupied by poised polymerases both in vivo and in vitro [19,20,21]. This means that we do not need to include states that lack bound RNAP in our model.
• For the NRI ≥ P hexamerization, a cooperative process, the appropriate expression for equilibrium binding is given by: where [NRI ] is the concentration of phosphorylated NRI ≥ P dimers, K NRI is the NRI ≥ P dissociation constant that incorporates the cooperativity of the binding interaction, and m > 1 is some coe cient that signifies the multimerization of NRI ≥ P at the two NRI sites. One can expect m to be as high as 6, but it could also be lower since NRI ≥ P is a dimer in solution. Hence, we expect 3 < m < 6 [22]. The subsequent constant production of NRI ≥ P in our experiments and the cooperative binding allows us to assume that: which allows us to posit that a driver complex [i.e. (NRI ≥ P ) m ] is always bound at the driver binding sites.
Given these assumption, we only need to model two states: the driver-and-RNAP-occupied enhancerpromoter non-looped state, and the transcriptionally active looped state.
• Finally, we assume that the rates of driver binding, looping, and unlooping are much faster than the subsequent rates involved in transcription. This means that before ATP can be hydrolyzed and an open complex be formed at the promoter, the driver-DNA- ‡ 54 complex has su cient time to equilibrate. This in turn means that the DNA-bound driver complex has su cient time to explore its conformational space. This assumption is supported by kinetic data recently obtained by [18]. This means that the two relevant states are in thermodynamic equilibrium, and can be modeled accordingly.
Given these assumptions, we begin by writing the rate equation, which describes the kinetics of looping-initiated transcription for a single bacterial enhancer: where P int (N ) is the probability of the driver complex bound N links from the polymerase to interact with the polymerase, [mRN A] is the mRNA concentration, -is the rate for transcription per unit volume after the looped structure between the polymerase and driver has formed, andis an mRNA degradation rate constant. In steady state the fluorescence reporter level is proportional to the number of mRNA transcripts, resulting in: where F l corresponds to reporter protein concentration (or fluorescence level readout). Often the exact values of -, -and the proportion of F l to [mRN A] are not known. The probability of the driver and the polymerase to be in close enough proximity to interact is given by: where › i are the N coordinates that define a conformation, E conf is the energy of a configuration, and -= (k b T ) ≠1 , where T is the temperature and k b is the Boltzmann constant. What actually constitutes a "looped configuration" is defined in 4.3.4. In order to write down a simpler expression for the probability of looping, we denote for convenience a general expression for a partial partition function constrained by some condition: where the integral is taken only over those configurations that satisfy the specific condition. This allows us to define the looping probability functions as follows: To develop an expression for the interaction probability, we add a configuration-independent driverpolymerase interaction energy denoted by E nr to the energy of a looped configuration E conf . This allows us to define the interaction partition functions as follows: leading to the following expression for the probability of interactions: where the index 0 corresponds to the number of TF binding sites in the enhancer. Since Z non≠looped (N ) ¥ Z all (N ), by dividing the numerator and denominator by Z all (N ) we arrive at: where K nr © exp (≠-E nr ) is the dissociation constant of the protein-protein interaction in the looped conformation . For convenience we define the "looping capacity" as: resulting in a compact result for the minimal enhancer interaction probability: Note, that Amit et al. [17] demonstrated experimentally that the model described above and summarized in Eq. (29) adequately describes the transcriptional kinetics of our (NRI ≥ P ) 6 ≠ ‡ 54 system. In addition, Friedman et al. [18] used single molecule kinetics experiments to show that assumptions above are valid via careful measurements of every rate constant in the enhancer-poised promoter complex formation and subsequent transcription initiation.

Connecting thermodynamic model and experiment
By normalizing the results of one experiment by those of another experiment (see Eq. (21)) we can derive an experimentally measurable expression that both eliminates the need to experimentally determine --and corresponds to the ratio of the driver-polymerase interaction probabilities P (i) int for the experiments i = 1, 2: Below, we demonstrate that under certain assumptions that hold for our experimental setup, can be replaced by the ratio of the looping probabilities for the two experiments i = 1, 2.
Based on the kinetic measurements that were made by [18], we can now estimate the magnitude of P int,0 (N ). We begin with the rate equations described in their model [18, Fig.7]: d dt d dt where [ ‡ 54 ] is the concentration of available ‡ 54 , RP 1 and RP 2 are two types of closed complexes in which DNA remains base-paired, and RP 0 is the open complex in the looped configuration (interacting state). Solving for the fraction of DNA in interacting state, and plugging in the values found by [18], we can extract the probability for finding the driver-polymerase in a bound-looped state RP 0 as follows: This implies (see Eq. (29)) that P int,0 (N ) ¥ ‰ 0 (N ) 1+‰ 0 (N ) π 1, and therefore Using Eqs. (30), (33) and (28), we obtain for the experiments i=1,2. This expression enables direct comparison of our experimental measurements to the numerical simulations. Finally, we note that since [18] used DNA segments of comparable lengths to the ones modeled by us, we will assume that Eq. (33) and the assumption hold for all looping capacities (all n) considered in this work.

Enhancer with transcription factor (TF) binding site
When the enhancer contains one TF binding site (n = 1) we can write the probability of interaction in a similar fashion to Eq. (27): where k < N is the number of base pairs between the center of the driver binding site and the center of the TF binding site and K T D is the binding constant of the TF to its binding site. Again, by dividing the numerator and the denominator by Z all (N ): where ‰ 0 and ‰ 1 denote the looping capacities for an enhancer without a bound TF (equal to the looping capacity of the minimal enhancer) and the looping capacity for an enhancer with a TF bound to it, respectively. To quantify the e ect of TF binding on transcription, we divide the reporter concentration with available TF by the reporter concentration obtained without TF . Following Eqs. (30), (29) and (37) we obtain: By performing the experiment with saturating concentrations of the TF we can quantify the maximal regulatory e ect: Further simplifying with Eqs. (33) and (35) yields: where P looped,n is the probability of looping with n TFs bound. Eq. (40) thus provides the connection between the experimental expression level ratio of saturating and zero TF concentrations R 1 (N , k) and the theoretical looping probability ratioR 1 (N, k). The latter can be obtained directly from the looping probabilities P looped,1 (N, k) and P looped,0 (N ), which are computed using the numerical Monte-Carlo simulation.

Enhancer with two TF binding sites
In this case (n = 2), we define an additional parameter s to denote the number of base pairs between the center of the first TF binding site and the center of the second TF binding site. In a similar fashion to the one binding site case, we obtain: Here In order to provide an explanation for the non-monotonic increase of the regulatory e ect as k approaches N/2 observed both for our model (Fig. 1) and experiments with TraR (Fig. 2B, 3C), LacI-GST (Fig. 4A), and TetR ( Figure 2D), we chose to analyze the most-likely looped configuration ' 0 predicted by our model. For simplicity, we neglected the volume of the chain, i.e., we used the WLC model. Discrete minimal-energy looped configurations were found for N = 10, 20, 50 as the numerical solutions that minimize the total configuration energy of Eq. (4), with boundary conditions that satisfy the ideal looping criterion, namely that the chain ends coincide. For comparison, solutions were approximated to continuous curvesr N (uN l) with u oe [0, 1] and then renormalized to unit length r N (u) =r N (uN l) /N l. The curves for the di erent N values were found to collapse onto a single curve r ' 0 (u), indicating that the minimal energy loop ' 0 is independent of N . Supp. Fig. 2A shows ' 0 on a unit length curve, parametrized by u oe [0, 1], in the loop plane. Note that the minimal energy loop for the 3D problem is in fact two-dimensional. The figure shows that the minimal-energy (or most-likely-to-occur) loop is not a circle as might conventionally be thought, but rather shaped like a teardrop. The implications of this geometry can be seen in Supp. Fig. 2B, where we plot the numerically-calculated squared curvature 0 as a function of the normalized length u. The figure shows that the region of highest curvature is at u = 1 2 , precisely half-way along the contour, whereas the curvature near both ends is very small. This indicates that it is energetically preferential for loops to be more curved near the center of the loop than close to the ends, and loops whose shapes are the same as or closely resemble the minimal loop ' 0 will be most probable. We now calculate the curvature-dependence of the energy of link j, for a particular configuration '. This is certainly valid in the elastic regime, i.e. for chains whose length is N . b. From Eq. (1): In the limit of N ∫ 1, the discrete curve along the points {r 0 , ..., r N } can be approximated with a continuous curver (s) parametrized by s oe [0, Nl], which yields: After renormalizingr (s) to a unit length curve r ' (u) =r (s) /N l parametrized by u oe [0, 1], we obtain: where ---is the curvature, which depends only on the geometric shape of the specific configuration '. Thus the energy contribution at the jth link is approximated by: We now ask where to expect the maximal regulatory e ect for a bound TF, based on the shape of ' 0 and on Eq. (47). Since the energy depends on the squared curvature Ÿ 2 ' 1 j N 2 and the curvature was found numerically to be maximal at j = N/2, we expect a TF at the N/2 position to produce the greatest change in energy, and thus to produce the strongest regulatory e ect. We consider the separate e ects of the bound TF. Sti ening (increasing a) hinders curving in all directions (including the loop-forming direction), which increases the energy of the ' 0 -like configurations and results in a down-regulatory e ect. Bending either assists curving in the loop-forming direction (reduces energy) or hinders curving (increasing energy), depending on the direction of the bending, resulting in an upor down-regulatory e ect, respectively. A protrusion hinders curving in the loop-forming direction if it is positioned "within the loop" since curvature is limited by the presence of the protrusion. Similarly, a protrusion assists curving in the loop-forming direction if it is positioned "outside the loop" by limiting the curvature in the non-loop-forming direction.

Supplementary model data for Figure 3: a thick chain with two protrusions
In order to model the regulatory e ect of two binding sites there is an additional control parameter added into the problem: the inter-site spacing, s. This implies that instead of the 2D heat map for the possible regulatory responses (Fig. 1C), we need to explore a 3D map of potential responses (N, s, k). In Supp. Fig. 3A-B we show two 2D cross-sections of the 3D heat maps: N vs. s at a constant value of k=42 links, and N vs. k at a constant value of s=21 links. In the 2D cross-sections, we notice the following: first, protrusions that are positioned "in phase" to the driver and themselves (Supp. Fig. 3A -blue patches) generate a strong repression signature as expected. Second, when the protrusions are positioned "out-of-phase" with respect to each other (e.g. Supp. Fig. 3A, s=26 links -green-yellow patches), only one protrusion is "in-phase" with the driver, while the other counteracts its e ect. As a result, the number of looping configurations excluded by the "in-phase" protrusion is roughly equal to the number excluded in the case of a single protrusion, but the total number of available configurations is reduced by the "out-of-phase" protrusion, leading to a reduced regulatory e ect. A closer examination of the in-phase tandems reveals (Supp. Fig. 3B) a similar heat map to the 2D heat map presented in Fig. 1C, but with a significantly greater amplitude of oscillations between the down-regulation minima and up-regulation maxima.

Supplementary experimental data for Figure 3: tandem TraR synthetic enhancers with varying looping length and constant s
We carried out additional measurements for tandem TraR synthetic enhancers, keeping the binding site spacing (s) and distance to the driver (k) constant while varying the looping length (N ), as follows: in-phase -s=23 bp spacing and k=54 bp, and out-of-phases = 28 bp and k = 54 bp. The expression level ratio data and probability ratio model predictions are plotted in Supp. Fig. 4A and 4C, respectively, with a schematic of the two synthetic enhancer designs shown in Supp. Fig. 4B.
Supp. Fig. 4A shows that both experimental data sets exhibit similar trends. The in-phase synthetic enhancers (s = 23 bp, blue dots) exhibit strong repression at small looping lengths (~30%), and for larger looping lengths a fluctuating regulatory response that converges on a repression value of 80% independent of the looping length. Despite the strong fluctuations, we also seem to be observing a faint periodic signature characterized by 4 cycles with a periodicity of~11 bp centered on peaks in expression level ratio for looping lengths N = 148, 160, 171, 181, 192. For the out-of-phase data set (s = 28 bp, red squares), the expression level ratio shows a similar trend, which is characterized by a weaker over-all regulatory response. The repression at short looping lengths is significantly weaker (70%) as compared with the in-phase case (30%), the convergence to a constant repression value seems to occur at a shorter looping length (N = 120 bp vs N = 140 bp), and the overall long looping length regulatory response seems to hover around the barely detectible expression level ratio value of 95%. Thus, the out-of-phase arrangement of the TraR binding site produces an expression level ratio behavior which is weaker than the in-phase arrangement, and overall is distinguishable from the in-phase expression level ratio response. In order to see if our model captures this behavior, we compared two constant k cross-sections across heat maps that were computed with s = 21 bp (Supp. Fig. 3B, dashed line) and s = 26 bp (Fig. 5A). Here, the model once again captures the experimental trends quantitatively. As in the experiment, the thick chain with "out-of-phase" protrusions produces a significantly di erent probability ratio level response as compared with the in-phase arrangement. The model shows (Supp. Fig. 5C) that the out-of-phase probability ratio level (s = 26 bp, red) produces a weaker response at lower looping lengths (60% vs~5-10%), converges to a steady state repression level at a lower looping length (150 bp vs 230 bp), and settles on an average weaker repression level (90% vs 80%) than for the in-phase arrangement (s = 21 bp, blue). Interestingly, the in-phase variable N model cross-section shows a sharp oscillatory function for all looping lengths, while the out-of-phase and the single binding site variable N cross-sections (Fig. 1E) do not exhibit a detectible 11 bp periodic signal across all looping lengths. Figure 4

Combined excluded volume and sti ening
In order to simulate the e ects of sti ening using the SAWLC model, we modified the simulation by allowing the protrusion to also sti en the thick chain locally by increasing the bending constant from a to 2a at the links attributed to a bound protein . Supp. Fig. 5A shows the e ect of a purely-sti ening region on a thick chain in a 2D heat map, while Supp. Fig. 5B shows the e ect of a single protrusion that also sti ens the chain (see schematic at the top). Both heat-maps show that a bias towards the quenching domain forms and persists for N >300 bp. The addition of the excluded volume e ect (Supp. Fig. 5B) generates a similar checker-board pattern to the ones shown in Figure 1, but this time with oscillations (see also Fig. 4C) between quenching states only in the region a ected by the sti ening of the chain.

Combined excluded volume and bending
Likewise, in order to simulate the e ects of bending using the SAWLC model, we modified the simulation by allowing the protrusion to also bend the thick chain locally by introducing small cumulative bends at the links attributed to a bound protein (see section 4.3.3). Supp. Fig. 5C shows the e ect of a purely-bending region by an angle of 10º on a thick chain in a 2D heat map, while Supp. Fig. 5D shows the e ect of a single protrusion that also bends the chain by an angle of 10º over 16 bp. In Supp. Fig. 5C the data show a similar checker-board pattern to the ones shown in Figure 1, however the magnitude of the regulatory e ect is significantly higher than the one produced by excluded volume e ect alone. In addition, the locations of the maxima and minima are opposite to those in Figure 1, due to the geometry of bending. A protrusion bound on the same side of the polymer as the activator impedes the chain's ability to bend in the direction of the protrusion, thus making formation of a loop less probable. In contrast, a bending protrusion facilitates the chain to bend in the direction of the protrusion (Supp. Fig. 1D). As a result, bending and excluded-volume have opposite e ects. Addition of the protrusion excluded-volume to the bending e ect (Supp. Fig.  5D) diminishes the magnitude of the regulatory e ect, due to the competition between the two opposing e ects. The locations of the minima and maxima of the combined e ect depend on the size of the protrusion. For a small-sized protrusion, the bending e ect dominates, and the locations of the minima and maxima correspond to Supp. Fig. 5C-inset. As the size of the protrusion is increased, the locations of the minima and maxima are shifted towards Figure 1, until the original order of Figure 1 is restored with the use of a large enough protrusion (Supp. Fig. 5D-inset). Figure 5

Experimental results and modeling: LacI-GST out-of-phase tandems
We carried out measurements of the regulatory e ect of synthetic enhancers bound by tandem of LacI-GST proteins in out-of-phase orientation (s=38 bp). The expression level ratio data is shown in Supp. Fig. 6A, together with the expression level data obtained for the synthetic enhancers bound by out-of-phase native LacI proteins previously shown in Fig. 5C. As for the case of the synthetic enhancers with tandem vs. single binding sites, the e ect of LacI-GST is to increase the overall repression e ect due to its larger size. This bias of the regulatory function towards a lower repression value is observed for all synthetic enhancers, as was shown in Fig. 5. Interestingly, however, while the overall trend of repression as a function of displacement (k) is conserved for both the native and fusion protein, the LacI-GST tandem does not seem to exhibit the expected 6 bp oscillations observed with LacI. Rather, the regulatory response seems to be constant with only the longer range looping length dependencies observed.
We reasoned that the loss of detectible oscillations could be due to the structure of LacI-GST, which presumably partially wraps the DNA. To crudely model this structure we simulated the thick chain with two out-of-phase protrusions made of three tangentially-attached spheres of the same size (radii of 2.72 nm), generating a partially wrapping structure when bound to DNA (Supp. Fig. 6B). Specifically, we compared two types of protrusions: a "triangle"-shaped protrusion (Supp. Fig. 6Bleft) with additional spheres at 45 ¶ relative toû i , and partially wrapping protrusion with additional spheres at 45 ¶ relative to ≠û i (Supp. 6B-right). We reasoned that as an out-of-phase tandem, the two aspherical protrusions should form partial arcs around the thick chain, whose excluded volume sensitivity to orientation should be diminished as compared with a spherically symmetric protrusion.
In Supp. Fig 6C, we show the model prediction for both types of protrusions when positioned in outof-phase tandems on a thick chain. The model shows that the amplitude for the partially wrapping protrusion is significantly reduced as compared with the "triangle" shaped protrusion, which exhibits a more pronounced 6 bp periodicity. Thus, we can conclude that the partially-wrapping protrusion generates a more consistent probability ratio prediction with the expression level ratio measurements, and is more likely to reflect the underlying structure of LacI-GST.

Experimental results: gel shift assay for TetR
In order to test the dumbbell binding structure of TetR(B) to dsDNA, we carried out gel-shift experiments with purified His-tagged TetR and TetR-GST. We reasoned that purified TetR that binds to DNA as a dimer of dimers should do so not only as a homodimer of dimers (TetR-TetR -TetR-TetR), but as a heterodimer of dimers (TetR-TetR -TetR-GST-TetR-GST) as well. We assumed that both TetR and TetR-GST would already be in dimer form before mixing. Thus, a 200 bp piece of DNA containing the TetR binding site and the suspected secondary cryptic site should exhibit three discrete shifted gel bands for TetR-TetR, TetR-TetR-GST, and TetR-GST -TetR-GST complexes. In Supp. Fig. 6D we show the results for the gel shift experiment carried out with a mix of His-TetR-GST and His-TetR after properly calibrating for a 1:1 binding ratio for the two TetR protein species (data not included). The data show only two shifted bands with respect to the non-bound DNA (242 bp): the shifted band that appears for purified TetR (440 bp -middle band, presumably TetR dimer), and the band that appears for purified TetR-GST (850 bp -top band, presumably TetR-GST dimer). Thus, while we have strong evidence for the formation of a dimer-of-dimer structure in vivo, the gel shift experiment at the very least seems to rule out the formation of a heterodimer-of-dimer structure in vitro, and does not provide additional support for the structural interpretation of our in vivo data.

Supplementary bioinformatics data for Figure 6: pAstC/pAruC promoters
To provide further support to the observations described in Fig. 6, we carried out a comparative study on the annotated pAstC/AruC promoter in E. coli, S. typhimurium, and P. aeruginosa. In all three cases a ‡ 54 promoter is driven by NtrC or its P. aeruginosa homologue CbrB to generate expression of the ast/aru genes. In Supp. Table 2 we depict the structure of these bacterial enhancers as a function of their location on the genomes, and values of control parameters. By examining this table it is clear that once again structure is remarkably conserved. The enhancers for E. coli and S. typhimurium are almost precisely conserved, in terms of looping length and positioning of the ArgR sites. The only di erence seems to be the shifting of the site by roughly a helical periodicity upstream. It is important to note that ArgR binds a tandem of sites to form a dimer-of-trimers, which is capable of bending DNA by roughly 70 . This implies that the four sites really should be viewed as a tandem of binding sites . For S. typhimurium and P. aeruginosa this architecture is particularly conserved, with each tandem separated by approximately three helical periodicities. In E. coli this symmetry is broken by having three half-sites closely positioned, and an additional half-site separated from the arg432 cassette. The significance of this is unclear. However, the conservation of argR arrangement between P. aeruginosa and S. typhimurium, and the overall conservation of enhancer size and binding site locations between S. typhimurium and E. coli (except for the shift in the binding site Arg2) are highly consistent with the previous observations for the qrr genes. each). The remaining set contained 114 hit sequences with unique 500 bp sequences upstream of the hit start position (see Supplementary Data 2).
After identifying the relevant genes, we extracted the sequence that was located immediately upstream of the qrr genes in order to annotate the putative ‡ 54 promoters. ‡ 54 promoters are located within 30 bp of the transcriptional start site (TSS), and exhibit strongly conserved sequences, especially around the -24/-12 positions. We based our search on a consensus sequence derived from [27], a large-scale analysis of 186 -24/-12 promoter elements from 47 di erent bacterial species. Interestingly, unlike ‡ 70 promoters, the consensus sequence for the ‡ 54 promoter is highly conserved across many bacterial species, which makes annotating these promoters a straightforward exercise. Consequently, to narrow our search, we examined the first 50 bp upstream of the BLASTN hit start position. Each position in the the 50 bp window was scored with the likelihood of the promoter to start at its site based on 16-bp ‡ 54 consensus matrix taken from [27], where we divided the non-consensus weight equally among the non-consensus nucleotides at each position. Scores were then normalized to a 0-1 range by subtracting the minimum possible score and dividing by the maximum possible score. Only promoters with normalized scores greater than 0.744 were accepted, based on the distribution of scores for 4000 randomly-sampled sequences (see Supp. Fig. 7A). This allowed the identification of a single putative ‡ 54 promoter for all 114 hits. We continued analysis for the 112 of these sequences that had unique 500 bp windows upstream of the promoter.

Extracting the LuxO binding sites
After identifying the promoters, we next proceeded to identify the LuxO-driver binding sites. Since LuxO is considered to be homologous to members of the NtrC family of response regulators [28], we hypothesized that as for the V. cholerae case, LuxO should drive expression from a tandem of binding sites that are spaced approximately two helical repeats (22 bp) apart. Finally, as with other NtrC-family members, we expected the tandems to be located several tens to a couple of hundred basepairs upstream of the putative ‡ 54 promoter.
To find the LuxO binding site tandems, we searched the 500 bp upstream of the respective putative ‡ 54 promoters. These upstream sequences were first scanned for 13 bp sequences that were candidate LuxO binding sites based on the only annotated LuxO sites in V. cholerae: TTGCATTTTGCAA and TTGCAATTTGCAA [24]. We chose a threshold of 9 of the 13 bases that had to match one of the two V. cholerae LuxO sites. We then computed the separation between the centers of any two candidate sites in the same upstream region. The center-to-center separation distribution is plotted in Supp. Fig. 7B. In the plot we observe three distinct peaks: at 7 bp, 13 bp, and 21 bp. The 7 bp peak corresponds overlapping binding sites, which are highly probable due to the fact that each binding site is almost an exact double repeat of TTGCAA, and thus this peak can be discounted. The peak at 13 bp is likely also an artifact due to the repetitive nature of the binding sites, with the first of the two sites overlapping the first TTGCAA of the real binding site, and the second binding site overlapping the second TTGCAA. This peak may additionally corresponds to adjacent binding sites without separation, but these are not of interest due to the low likelihood of physical binding of proteins next to one another. Finally, The peak at 19-25 bp corresponds to the desired tandem separation, and these pairs of sites were chosen to be the putative LuxO binding site tandems. Some of the upstream sequences had more than one potential tandem. In these cases, we chose the tandem that maximized the following criterion: score(binding site 1) + score(binding site 2) ≠ 1.01 ◊ (spacing ≠ 22), where score of a binding sites is the number of base pairs matching the LuxO binding site (maximized over both LuxO binding sites), spacing is the distance between the binding sites in bp, and where the 1.01 factor gives a slight advantage to optimal spacing when everything is equal. Using this analysis, we found 61 of 112 sequences with putative LuxO binding tandems and unique putative loop sequences, where we define loop sequences to be all bases between the last base of the putative tandem binding site and the first base of the putative promoter. This also allowed us to determine a putative consensus LuxO binding sequence to find bases at particular positions (depicted pictorially with the binding site logo graph in the inset of Fig. 6A): T T G C A A/T T/A T T G C A A.

AT/GC variation within the qrr enhancer sequences
The average relative identity values for both enhancer and upstream-enhancer sequences, averaged over all displacements (see horizontal lines in Figure 6C and  Figure 6C, we plot the AT and GC content as function of position within the loop sequences. The AT content is enriched at positions that are integer multiples of 10.6 bp, consistent with the helical periodicity of 10.45 bp found for the loop sequences using Fourier transform (FFT). While it is not clear what the significance of this enrichment is (e.g., sequences that are amenable for bending, binding site of some sort, etc.), it is interesting to note that we found no significant AT enrichment either at odd half-integer multiples of the helical periodicity, or for the non-looping enhancer upstream sequences.

Testing functionality of fusion proteins
To test additional protein size-related predictions of our model, we constructed a small set of fusion proteins, where a GST epitope was added at either the N-terminus or C-terminus of each of the TFs used in the experiment. All proteins were tested for functionality before a larger-scale synthetic enhancer experiment was performed.

TraR fusions
Fusion protein activity was tested by transforming cells with each of the pAct-Tra fusions as well as a synthetic enhancer bearing a TraR binding site. Both C-terminus and N-terminus fusions showed no e ect upon 3OC8 induction. We concluded that the TraR-GST fusions lost their DNA binding ability and thus were not useful for our purposes.

LacI fusions
We chose to construct a C-terminus GST fusion, since it was reported that N-terminus fusions disrupt DNA binding of LacI and are inactive [29]. The C-terminus LacI-GST fusion were sequence verified, and its activity was tested in by transforming cells each of the pAct-LacI-GST fusions as well as a synthetic enhancer bearing a LacI binding site. In Supp. Fig. 8G, we compare the dose response functions for LacI-GST and LacI, for the LacI synthetic enhancer with the binding site set at k = 95 bp. Both dose response functions show a specific interaction with the LacI binding site, with the transition to unbound state occuring at nearly the same concentration of IPTG. The figure also shows that the expression level ratio is significantly smaller for LacI-GST as compared with LacI.