Allostery through protein-induced DNA bubbles

Allostery through DNA is increasingly recognized as an important modulator of DNA functions. Here, we show that the coalescence of protein-induced DNA bubbles can mediate allosteric interactions that drive protein aggregation. We propose that such allostery may regulate DNA's flexibility and the assembly of the transcription machinery. Mitochondrial transcription factor A (TFAM), a dual-function protein involved in mitochondrial DNA (mtDNA) packaging and transcription initiation, is an ideal candidate to test such a hypothesis owing to its ability to locally unwind the double helix. Numerical simulations demonstrate that the coalescence of TFAM-induced bubbles can explain experimentally observed TFAM oligomerization. The resulting melted DNA segments, approximately 10 base pairs long, around the joints of the oligomers act as flexible hinges, which explains the efficiency of TFAM in compacting DNA. Since mitochondrial polymerase (mitoRNAP) is involved in melting the transcription bubble, TFAM may use the same allosteric interaction to both recruit mitoRNAP and initiate transcription.

Precise communication between DNA-binding proteins is critical for many life processes, including the transcription, replication, and organization of DNA. In all of these cases, appropriate proteins form clusters, required to either initiate or execute the entire process. Although the origin of such protein assemblies is unclear, they are often assumed to be driven by direct protein-protein interactions. This assumption limits the role of DNA to simply facilitating the presence of proteins through protein-DNA interactions. Very recently, however, it has been shown that DNA may play a more active role in its own functions 1. It has been demonstrated both experimentally 1,2 and computationally 3,4 that DNA deformations induced by binding proteins affect the affinity of other nearby proteins. In other words, allosteric signaling through DNA is also possible 5. Most related studies have been restricted to two types of conformational DNA changes: stretching and bending 1,2,4,6,7. In this work, we examine how local protein-induced unwinding of the double strand (bubbles) can also facilitate a different type of allosteric signaling. Since the local melting of DNA increases its flexibility 8 and also exposes the genetic code to RNA polymerase, such an allosteric signal potentially regulates both transcription and gene compaction.
Mitochondrial transcription factor A (TFAM) is an excellent example to test such a hypothesis since there is strong evidence that it locally unwinds mtDNA [9][10][11] . Structurally, TFAM consists of two high mobility group (HMG) box domains A and B connected with a linker and ending with a C-terminal tail attached to Box B 12 (Fig.  1a). TFAM binds specifically close to the light strand promoter (LSP) and heavy strand promoter (HSP1) to form the transcriptional machinery by recruiting transcription factor B2 (TFB2M) and mitochondrial polymerase (mitoRNAP) 9,[13][14][15][16] . TFAM also binds nonspecifically and plays a critical role in mtDNA compaction 17,18 . The physical mechanism behind the dual function of TFAM is still unclear. Very recent experimental studies have shown that both specific and non-specific binding introduce a sharp U-turn in the mtDNA [18][19][20][21][22][23] , which, although seemingly vital in forming and appropriately orienting the transcription machinery, does not explain the high efficiency of DNA compaction in the presence of TFAM 18 . Instead, the ability of TFAM to slide rapidly on mtDNA and, upon colliding, to form stable and immobile oligomers seems to be directly related to mtDNA compaction 17 . It has been proposed that such an aggregation of two TFAM proteins melts a region of two to three base pairs (bp) at the point of contact, thus creating fixed flexible hinges that enhance mtDNA flexibility 17 . However, very recent high-resolution experiments have revealed that TFAM oligomers are neither stable nor immobile 24 . It is thus unclear how such highly diffusive hinges of limited lifetime could effectively compact mtDNA molecules.
The ability of TFAM to unwind mtDNA at the end of each HMG box (see Fig. 1b) has two critical consequences. First, it creates two flexible hinges that can potentially increase the flexibility of the DNA. Second, it effectively generates an attractive interaction that drives TFAM oligomerization. We show that the mechanism underlying this allosteric interaction can be an unbalanced force created by the coalescence of two TFAM-induced bubbles. The role of thermally induced local openings of the double strand appears to be critical, since it affects both the transmission of the allosteric signal and the stability of the aggregations. The main result of TFAM oligomerization is excitation of a considerably larger bubble (hinge) at the point of contact of two TFAMs, which increases mtDNA flexibility even further and regulates compaction. Interestingly, TFAM binds specifically about 20 bp away from the transcription starting point, which, as we show below, is within the range of the allosteric attraction of two bubbles. Since both TFB2M and mitoRNAP are involved in exciting the transcription bubble 25 , TFAM can help the two proteins excite the transcription bubble and then a coalescence of the transcription bubble and a TFAM bubble could stabilize the transcription machinery.

Results
To avoid computationally expensive atomistic molecular dynamics simulations, we use the extended Peyrard-Bishop-Dauxois (EPBD) model to describe the local melting dynamics of DNA. EPBD is a one-dimensional (1D) mathematical model with a demonstrated capability for reproducing experimental results on both the mechanical and thermal denaturation of DNA [26][27][28][29][30][31]. The potential energy of the EPBD model is:

$$V_{\mathrm{DNA}} = \sum_{i} \left[ D_i \left( e^{-a_i y_i} - 1 \right)^2 + \frac{k_{i,i-1}}{2} \left( 1 + \rho\, e^{-b (y_i + y_{i-1})} \right) \left( y_i - y_{i-1} \right)^2 \right] \qquad (1)$$

where $y_i$ describes the distortion of the $i$th base pair from its equilibrium position. The hydrogen bonds of a base pair are modeled by Morse potentials (first term in Eq. (1)), while the stacking interactions are described by nonlinear springs (second term in Eq. (1)). The model, although simple, takes into account the sequence specificity that is reflected in the parameters $D_i$, $a_i$, $k_{i,i-1}$, $\rho$, and $b$. In this study, we use the parameter values of Ref. 27, which have been adjusted to reproduce a variety of experimental observations. The sliding of TFAMs on DNA is assumed to be purely 1D. The interaction between the protein and DNA, equation (2), has two parts, where $R_{ij} = r_j - ia$ is the distance of the center of the $j$th protein from the $i$th base pair and $a$ is the distance between two consecutive base pairs. The first part of the equation, [9][10][11]. The coefficient $C = \tanh[c y_i]$, where $c = 1$ Å$^{-1}$, controls the strength of the interaction. It increases linearly until the base pair opens ($y_i \geq 2$ Å) and then it plateaus. A schematic representation of the interaction potential is presented in Figure 1c.
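As a rough illustration of how the two terms of the EPBD potential combine, the following minimal sketch evaluates the Morse and nonlinear stacking energies for a short chain. The numerical parameter values are illustrative placeholders (not the fitted values of Ref. 27), a single stacking constant is used for every base-pair step instead of the sequence-dependent $k_{i,i-1}$, and the function name `epbd_energy` is hypothetical.

```python
import math

# Illustrative placeholder parameters (assumed; NOT the fitted values of Ref. 27).
MORSE = {"AT": (0.05, 4.2), "GC": (0.075, 6.9)}  # (D in eV, a in 1/Angstrom)
K_STACK = 0.025  # eV/Angstrom^2; one constant for all steps (simplifying assumption)
RHO = 2.0        # dimensionless anharmonicity of the stacking term
B = 0.35         # 1/Angstrom, decay of the stacking anharmonicity


def epbd_energy(y, pairs, periodic=True):
    """Potential energy (eV) for displacements y (Angstrom) of base pairs `pairs`."""
    energy = 0.0
    for i in range(len(y)):
        depth, width = MORSE[pairs[i]]
        # Morse term: hydrogen bonds of base pair i.
        energy += depth * (math.exp(-width * y[i]) - 1.0) ** 2
        if i == 0 and not periodic:
            continue  # an open chain has no neighbor before the first base pair
        j = i - 1  # for i == 0 this is index -1, i.e. the last base pair (periodic)
        # Stacking term: nonlinear spring between neighbors i and i-1.
        coupling = 1.0 + RHO * math.exp(-B * (y[i] + y[j]))
        energy += 0.5 * K_STACK * coupling * (y[i] - y[j]) ** 2
    return energy
```

With these placeholder values, a chain at rest has zero energy, and opening a single base pair costs more in a GC chain than in an AT chain because of the deeper Morse well.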
Since experiments indicate that TFAM proteins do not form oligomers in the absence of DNA (see Ref. 18, for instance), we neglect any possible direct attraction and use a Weeks-Chandler-Andersen (WCA) potential 33 to describe the repulsion (soft sphere) between two TFAMs:

$$V_{\mathrm{WCA}}(r_{ij}) = \begin{cases} 4\varepsilon \left[ \left( \dfrac{\sigma}{r_{ij}} \right)^{12} - \left( \dfrac{\sigma}{r_{ij}} \right)^{6} \right] + \varepsilon, & r_{ij} \le 2^{1/6}\sigma \\ 0, & \text{otherwise} \end{cases} \qquad (3)$$

where $r_{ij}$ is the distance between the centers of the $i$th and $j$th proteins, and $\varepsilon$ is the interaction strength. The total direct protein-protein interaction energy of a system of multiple proteins is

$$V_{\mathrm{prot}} = \sum_{i<j} V_{\mathrm{WCA}}(r_{ij}) \qquad (4)$$

As explained below, the parameters $A_1$, $A_2$, $c_1$, $c_2$, and $\varepsilon$ of equations (2) and (3) have been adjusted to reproduce the experimentally observed cooperative binding of TFAM 17 (see Methods). The protein size is assumed to be $\sigma = 28$ bp, an estimate that is in good agreement with most experimental observations 17,18. To study the behavior of this TFAM-DNA model, we perform Langevin dynamics simulations at a temperature $T = 300$ K (see Methods). The potential energy of equation (2) melts a segment three to four bp long at the end of each HMG box.
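The purely repulsive character of the WCA interaction can be checked with a minimal sketch such as the one below. It uses the standard textbook WCA form, assumes that the protein size of 28 bp plays the role of the WCA length parameter, and takes the interaction strength of 0.125 eV quoted in Methods. The function names are hypothetical.

```python
import math


def wca(r, sigma=28.0, eps=0.125):
    """WCA repulsion (eV) between two proteins whose centers are r bp apart."""
    if r >= 2.0 ** (1.0 / 6.0) * sigma:
        return 0.0  # beyond the Lennard-Jones minimum the potential is zero
    sr6 = (sigma / r) ** 6
    # Lennard-Jones truncated at its minimum and shifted up by eps.
    return 4.0 * eps * (sr6 * sr6 - sr6) + eps


def total_repulsion(centers, sigma=28.0, eps=0.125):
    """Sum of the pairwise WCA energies over all distinct protein pairs."""
    total = 0.0
    for i in range(len(centers)):
        for j in range(i + 1, len(centers)):
            total += wca(abs(centers[i] - centers[j]), sigma, eps)
    return total
```

At a center-to-center distance equal to the protein size the energy equals the interaction strength, and it vanishes continuously at the cutoff, so the proteins feel no direct attraction at any distance.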
Bubble-mediated allosteric protein-protein interaction. To test our hypothesis that the coalescence of bubbles drives protein aggregation, we perform standard potential of mean force (PMF) calculations (Methods). Figure 2 presents the PMF between two TFAMs in homogeneous AT and GC molecules, which, as predicted, has an attractive structure. Protein aggregation is triggered by spontaneous thermal openings in the double strand. These openings (or thermal bubbles) exist even at temperatures well below the melting transition and are a result of the interplay between entropy, nonlinearity, and sequence specificity 26,28,34. The communication between two proteins begins when they diffuse to positions where a spontaneous bubble nucleation of length approximately equal to their surface-to-surface distance is possible (Fig. 2 ii). This local thermal melting reduces the system's total free energy and creates an unbalanced force that pushes the two proteins toward each other (Fig. 2 iii). This represents a new type of allostery initiated by protein-induced bubbles and transmitted through thermal bubbles. The depth (~4.23 $k_B T$), the average surface-to-surface distance (~10 bp), and the range (~20 bp) are three of the main characteristics of the allosteric potential presented in Figure 2. The parameters of equations (2) and (3) were tuned so that the depth provides a cooperative factor of ~70, as estimated in the experimental work of Ref. 17. The coalescence of two small TFAM-induced DNA bubbles can be viewed as the elimination of two half-bubbles, i.e. two forks, from the system. The activation energy of such small forks is associated with the energy cost to unzip a base pair. Thus, the elimination of two forks lowers the free energy of the system by approximately the depth of the interaction potential presented in Figure 2.
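The link between the quoted well depth and the cooperative factor can be checked with a short calculation. The sketch below assumes the usual Boltzmann relation between a contact free energy and a cooperativity parameter (the factor equals the exponential of the depth in units of $k_B T$); this relation is an assumption made for illustration, not a formula stated in the text, and the function names are hypothetical.

```python
import math

K_B_EV = 8.617333e-5  # Boltzmann constant in eV/K


def cooperativity(depth_kbt):
    """Cooperative factor for a well depth given in units of k_B T (assumed relation)."""
    return math.exp(depth_kbt)


def depth_in_ev(depth_kbt, temperature=300.0):
    """Convert a depth in k_B T to eV at the given temperature (K)."""
    return depth_kbt * K_B_EV * temperature


print(round(cooperativity(4.23), 1))  # about 68.7, i.e. close to the quoted ~70
print(round(depth_in_ev(4.23), 4))    # about 0.109 eV at 300 K
```

Under this assumption, a depth of 4.23 $k_B T$ corresponds to a factor of about 69, which agrees with the cooperative factor of ~70 that the parameters were tuned to reproduce.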
The region spanning the average surface-to-surface distance includes approximately 10 melted bp, which indicates that TFAM oligomerization provides an additional and significantly larger flexible hinge than the flexible hinge of a monomer would. Thus, TFAM oligomerization could potentially increase the flexibility of a DNA molecule and consequently regulate DNA compaction 17,24,35,36. In the limit of maximum coverage of DNA by TFAM, the energetically most favorable hinge is 3 bp, i.e. equal to the surface-to-surface distance, $d_0$, that corresponds to the minimum of the PMF (see Fig. 2). Interestingly, this 3 bp melted segment in the limit of high TFAM concentration was also predicted by the authors of Ref. 17 using an independent calculation based on the contour length of DNA. Based on our analysis, the effective size of TFAM is $\sigma_{\mathrm{eff}} = \sigma + d_0 = 31$ bp and the maximum number of TFAMs a DNA molecule can host is $L_{\mathrm{DNA}}/\sigma_{\mathrm{eff}}$, where $L_{\mathrm{DNA}}$ is the length of the DNA. According to Figure 2, a TFAM can attract another TFAM or other proteins from a distance of approximately 20 bp. This result is particularly important when we discuss below the role of TFAM in transcription initiation. The PMF is also sequence dependent. We see that, in homogeneous AT DNA molecules, the range of the potential is longer than in homogeneous GC molecules; however, GC regions support more stable dimerization. Thus, in a realistic DNA molecule, AT-rich regions can facilitate the long-distance transmission of allosteric signals, while GC regions provide a more stable aggregation.
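The occupancy estimate above amounts to simple arithmetic, sketched here for a 1000 bp molecule. Rounding down to a whole number of proteins is an assumption added for the example, and the function names are hypothetical.

```python
SIGMA_BP = 28  # protein size quoted in the text (bp)
D0_BP = 3      # surface-to-surface distance at the PMF minimum (bp)


def effective_size():
    """Effective footprint of one TFAM at maximum coverage (bp)."""
    return SIGMA_BP + D0_BP


def max_tfam(length_bp):
    """Largest whole number of TFAMs that fit on a DNA of the given length."""
    return length_bp // effective_size()


print(effective_size())  # 31
print(max_tfam(1000))    # 32
```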
Reversibly assembled protein aggregates. Due to the finite depth of the interaction potential, the picture of multiple TFAMs sliding on mtDNA is expected to be a typical example of 1D reversible particle-particle aggregation. In such systems, one expects oligomerization and dissociation events, as well as a reduction of mobility due to oligomerization, crowding, or even dynamically arrested states 37. In general, large bubbles induced by TFAM oligomerization are expected to contribute more to DNA compaction than small hinges. However, their excitation, lifetime, and mobility ultimately determine their effectiveness. A large bubble with a short lifetime or high diffusivity, for instance, would have a very small probability of fully developing and melting DNA at a certain position. The question is, however, to what extent does our model agree with recent experimental observations and, in particular, the data presented in Refs. 17, 18, 24? To obtain a qualitative picture of the dynamics of the system, we perform a standard Langevin simulation of 10 TFAMs in a 1000 bp (~0.33 μm) long mtDNA sequence. Figure 3a shows that the position of all TFAMs as a function of time is qualitatively similar to that in the experimental work of Ref. 24. In agreement with these authors, protein oligomerization and dissociation events, oligomer/monomer diffusion, and entrapment due to sequence specificity are also present in our numerical simulations. However, even the Langevin dynamics of a 1D model cannot access scales similar to the experimental ones (~s and ~10 μm). To overcome this limitation, we implement standard Monte Carlo (MC) simulations of multiple TFAMs interacting with the average PMF shown in Figure 2 (Methods). Figure 3b shows the MC time evolution of a similar system to that in Figure 3a, but for scales directly related to the experimental ones.
We emphasize, however, that MC simulations do not take into account the sequence of the DNA; thus, entrapment due to sequence specificity is not observed. Figure 4a shows the distribution of the oligomer size, $n$, for different values of the coverage, $c$, of DNA by TFAM. The number of large flexible hinges in an oligomer is simply $n - 1$. In Figure 4b we show the mean square displacement (MSD) of the TFAMs as a function of time for the same values of $c$. We observe three distinct regions. The first region (I) describes the cluster diffusivity prior to collisions. It is purely linear and the slope determines the diffusivity of the system, which scales inversely with the oligomer size appropriately weighted by the distribution of sizes presented in Figure 4a. For intermediate times (region II) the system considerably slows down due to caging effects, i.e. clusters are arrested by nearby clusters and thus only the dynamics within the cage is described. The long time limit (region III) also shows linear behavior that is due to crowding effects 37. A similar transition from the ballistic regime of individual proteins to region I is also observed but not shown in this plot. All three regimes affect DNA packaging. The slower a TFAM oligomer is, the more stable and long-lived are the developed large hinges.
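To make the counting of oligomers and large hinges concrete, the sketch below groups protein centers on a linear DNA into oligomers and counts one large hinge per contact. The contact criterion (a surface-to-surface gap no larger than the ~20 bp range of the PMF) is an assumption chosen for illustration, since the text does not specify how oligomers were identified. The function names are hypothetical.

```python
def oligomer_sizes(centers, sigma=28.0, contact=20.0):
    """Oligomer sizes for protein centers (bp) on a linear DNA.

    Two neighbors belong to the same oligomer when their surface-to-surface
    gap is at most `contact` bp (assumed criterion).
    """
    ordered = sorted(centers)
    if not ordered:
        return []
    sizes = [1]
    for left, right in zip(ordered, ordered[1:]):
        if right - left - sigma <= contact:
            sizes[-1] += 1   # neighbor joins the current oligomer
        else:
            sizes.append(1)  # gap too large: start a new oligomer
    return sizes


def large_hinges(sizes):
    """Total number of large hinges: n - 1 for each oligomer of size n."""
    return sum(n - 1 for n in sizes)


sizes = oligomer_sizes([0, 31, 62, 200, 231, 500])
print(sizes)                # [3, 2, 1]
print(large_hinges(sizes))  # 3
```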

Discussion
In this work, we showed that protein-induced local melting of DNA is an alternative allosteric mechanism to drive protein assembly. Below, we discuss how TFAM may use such a mechanism to regulate DNA compaction and transcription initiation. Although we focus on TFAM, we believe that other proteins of the HMG family may also use the same allosteric signaling to control DNA functions [in preparation].
Our simulations support the hypothesis that the mechanism of flexible hinges induced by TFAM oligomerization underlies the compaction of mtDNA by TFAM (see Fig. 5a). More accurately, as we show in this work, the concept of spontaneously generated diffusive flexible hinges with finite lifetimes is closer to the experimental picture of Ref. 24. Although both small and large hinges contribute to DNA compaction, large hinges are expected to have a more significant impact, since they are energetically more favorable and diffuse much more slowly. Small hinges are more effective in specific binding or entrapment due to sequence specificity. Assuming that the compaction of DNA is primarily regulated by large hinges, one can use the distribution of hinges presented in Figure 4a to estimate the persistence length of mtDNA for different concentrations of TFAM. If $P_0$ and $P_p$ are the persistence lengths of the mtDNA in the absence of TFAM and fully covered by TFAMs, respectively, then the persistence length for any value of coverage $c$ can be estimated by equation (5) 38, where $q$ is the number of hinges for a given $c$ normalized by the maximum number of hinges, $L_{\mathrm{DNA}}/\sigma_{\mathrm{eff}}$, in a DNA molecule. According to Ref. 17, $P_0 = 45$ nm and $P_p = 3.9$ nm. The value of $q$ can be calculated from the distribution of hinges presented in Figure 4a. Figure 5a compares the persistence length estimated by equation (5) (2) and (3) can also describe the effect of TFAM mutants on compaction efficiency 18. TFAM mutants missing either Box A or Box B exhibit significantly lower compaction efficiency. Based on our model, with such mutants, which can be described by eliminating one of the two terms of $V_2$, only dimerization is possible. This leads to a significantly lower number of flexible hinges, which ultimately reduces the flexibility of the DNA. Mutants with a modified linker (L6) present similar behavior and reduce the efficiency of compaction by approximately the same amount.
It appears, as we explain in more detail below, that the L6 mutant reduces only the ability of Box A to unwind the DNA molecule, which can be interpreted in our model by using an asymmetric strength in the expression of $V_2$. This modification also provides only dimers, thus reducing the mutant's ability to compact the DNA in a way similar to that of a mutant that is missing Box A. Finally, the dimer mutants presented in Ref. 18 also show a significant reduction in compaction efficiency. Dimer mutants do not interact strongly with each other because their surface has been modified. Their ability, however, to locally unwind the DNA on both sides of the TFAM is preserved. In our model, dimer mutants can be interpreted by increasing the repulsive potential of equation (3). The resulting PMF will have a smaller depth, which will reduce the lifetime of TFAM oligomers and, as a result, the bendability of the DNA. However, even in the case of a perfect dimer mutant, which is the limit of hard spheres, caging effects can also provide some large flexible hinges that could contribute to DNA compaction 37.
According to our hypothesis, in an intermediate step, TFAM-induced bubbles first assist TFB2M and mitoRNAP to excite the transcription bubble and then a coalescence of the two bubbles stabilizes the transcription machinery. The size of the resulting bubble (~10 bp) is consistent with the typical size of transcription bubbles 25,[40][41][42]. Since TFAM can melt DNA in both Boxes A and B, the same mechanism can be used to activate transcription at LSP and HSP1, as presented in Figures 5c and 5d, respectively. This leads to the creation of a large hinge on the promoter's side and a small hinge on the other side of TFAM. These two hinges in combination with the strong interaction between the TFAM tail and TFB2M can also explain why a U-turn is present in LSP and not necessarily present in HSP1 (see Figs. 5c and 5d) 18. It is worth noting that dimer mutants do not affect transcriptional activity, which is in accordance with our hypothesis, mentioned above, that dimer mutants preserve the ability to locally melt the double strand. Additionally, it indicates that a dimer mutant modifies the repulsive interaction between TFAMs but not necessarily the repulsion between a TFAM and other proteins. In Ref. 18, the L6 mutant appears unable to activate transcription in LSP. According to our hypothesis, this observation implies that L6 does not melt the DNA in Box A and consequently cannot recruit TFB2M and mitoRNAP by using the mechanism described above. That L6 can unwind mtDNA only at Box B is further supported by the fact that L6 activates HSP1. It can also activate LSP only upon interchanging the box domains of TFAM 18.

Methods
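Langevin Dynamics. The Langevin equations of motion for the base pairs and proteins are, respectively,

$$m \ddot{y}_i = -\frac{\partial \left( V_{\mathrm{DNA}} + V_{\mathrm{int}} \right)}{\partial y_i} - m \gamma \dot{y}_i + \xi_i(t)$$

and

$$m_p \ddot{r}_j = -\frac{\partial \left( V_{\mathrm{int}} + V_{\mathrm{prot}} \right)}{\partial r_j} - m_p \gamma_p \dot{r}_j + \Xi_j(t)$$

where $i = 1, 2, \ldots, N_b$ and $j = 1, 2, \ldots, N_p$. Here, $N_b$ represents the total number of base pairs and $N_p$ the number of proteins sliding on the DNA. The molecular weight of a base pair is $m = 600$ Da and that of a TFAM is $m_p = 29$ kDa. The potential energies $V_{\mathrm{DNA}}$, $V_{\mathrm{int}}$, and $V_{\mathrm{prot}}$ are given by equations (1)-(4). The parameters of $V_{\mathrm{int}}$ and $V_{\mathrm{prot}}$ were fitted to reproduce the experimentally observed cooperativity factor of TFAM binding affinity. Specifically, $A_1 = 0.025$ eV, $A_2 = 0.13$ eV, $c_1 = 2$ Å$^{-1}$, $c_2 = 0.225$ Å$^{-2}$, and $\varepsilon = 0.125$ eV. The phenomenological Langevin friction coefficients are $\gamma = 0.1$ ps$^{-1}$ (for the base pairs) and $\gamma_p = 0.1$ ps$^{-1}$ (for the proteins). The stochastic forces $\xi_n(t)$ and $\Xi_n(t)$ are modeled as Gaussian random noise with zero mean and correlations $\langle \xi_n(t) \xi_{n'}(t') \rangle = 2 m \gamma k_B T \delta_{nn'} \delta(t - t')$ and $\langle \Xi_n(t) \Xi_{n'}(t') \rangle = 2 m_p \gamma_p k_B T \delta_{nn'} \delta(t - t')$, respectively, where $T$ is the temperature and $k_B$ is the Boltzmann constant. The equations of motion were integrated numerically using a second order Runge-Kutta method 43 with periodic boundary conditions. The time step $dt = 0.001$ ps ensured stable and accurate simulations. For each simulation the system was initially thermalized for 50 ns before starting to monitor the trajectories.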
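A minimal sketch of one way to integrate a Langevin equation of this type is given below. It uses the stochastic Heun scheme, which is one second-order Runge-Kutta variant; the exact scheme of Ref. 43 is not specified in the text, so this choice is an assumption. The sketch treats a single degree of freedom in reduced units, with the thermal energy passed as `kT`, and the function name is hypothetical.

```python
import math
import random


def heun_langevin_step(x, v, force, m, gamma, kT, dt, rng):
    """One stochastic Heun step of m dv/dt = force(x) - m*gamma*v + noise.

    The noise is a Gaussian random force with variance 2*m*gamma*kT/dt, the
    discrete counterpart of a delta-correlated force of strength 2*m*gamma*kT.
    """
    # The same noise realization is used in the predictor and the corrector.
    noise = math.sqrt(2.0 * m * gamma * kT / dt) * rng.gauss(0.0, 1.0)
    a1 = (force(x) - m * gamma * v + noise) / m
    x_pred = x + v * dt        # predictor (Euler) position
    v_pred = v + a1 * dt       # predictor (Euler) velocity
    a2 = (force(x_pred) - m * gamma * v_pred + noise) / m
    x_new = x + 0.5 * (v + v_pred) * dt  # corrector: average of both slopes
    v_new = v + 0.5 * (a1 + a2) * dt
    return x_new, v_new


# Usage: a particle in a harmonic well, in reduced units (m = gamma = kT = 1).
rng = random.Random(0)
x, v = 0.0, 0.0
for _ in range(1000):
    x, v = heun_langevin_step(x, v, lambda q: -q, 1.0, 1.0, 1.0, 0.01, rng)
```

In the full model the same update would be applied to every base-pair displacement and every protein position, with the force obtained from the gradients of the potentials in equations (1)-(4).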
Potential of Mean Force. The effective force between two proteins was probed through a harmonic spring of strength $k$ connecting the centers of the two proteins. The equilibrium length $L_0$ of the spring varied between $\sigma - 3$ bp and $\sigma + 100$ bp. For each $L_0$, we performed 100 independent Langevin simulations of 1 μs duration to compute the average inter-protein distance $\langle r_{ij} \rangle$. The mean force between the two proteins was estimated by $F = k(L_0 - \langle r_{ij} \rangle)$. The potential of mean force (PMF) was then calculated by a simple integration of the computed force in space.
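The integration step can be sketched as follows. The sign convention is an assumption: the spring force $k(L_0 - \langle r_{ij} \rangle)$ is treated as the slope of the PMF at $\langle r_{ij} \rangle$ (positive when the proteins attract each other), and the PMF is set to zero at the largest sampled separation. Trapezoidal integration and the function name are also choices made for this example.

```python
def pmf_from_spring_data(k, l0_values, mean_r_values):
    """Integrate the mean force to obtain a PMF on the sampled separations.

    Returns (separations, pmf) sorted by separation, with pmf = 0 at the
    largest separation (assumed reference point).
    """
    # Pair each average separation with F = k * (L0 - <r>) and sort by separation.
    points = sorted((r, k * (l0 - r)) for l0, r in zip(l0_values, mean_r_values))
    rs = [p[0] for p in points]
    fs = [p[1] for p in points]
    pmf = [0.0] * len(rs)
    # Trapezoidal rule, integrating inward from the largest separation.
    for i in range(len(rs) - 2, -1, -1):
        pmf[i] = pmf[i + 1] - 0.5 * (fs[i] + fs[i + 1]) * (rs[i + 1] - rs[i])
    return rs, pmf
```

With this convention an attractive interaction, for which the measured average separation is shorter than the spring rest length, yields a negative well relative to large separations.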
Monte Carlo Simulations. TFAM proteins interact through the average PMF presented in Figure 2 (green line). In each MC step all proteins are moved by $dx = \sqrt{2 D_0\, dt}$, where $D_0 = 0.08$ μm$^2$/s is the reference diffusivity as calculated in Ref. 24 and $dt = 0.003$ s is the time step of an MC step. Each trial move of a protein is accepted/rejected based on the standard Metropolis algorithm. Before each MC sampling the system was thermalized for 1 s. All results were obtained by averaging 100 independent MC simulations.
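The Metropolis procedure can be sketched as below. Several ingredients are assumptions made only for illustration: the PMF of Figure 2 is replaced by a hypothetical hard core plus a square well of depth 4.23 $k_B T$ and range 20 bp, each trial move has a fixed length with a random sign, proteins are updated one at a time on a periodic DNA, and the step length is converted to base pairs using 0.33 nm per bp (from 1000 bp ≈ 0.33 μm). All function names are hypothetical.

```python
import math
import random

D0 = 0.08     # reference diffusivity, um^2/s (from the text)
DT = 0.003    # MC time step, s (from the text)
BP_NM = 0.33  # nm per base pair (assumed from 1000 bp ~ 0.33 um)


def step_length_bp():
    """Trial displacement sqrt(2*D0*DT), converted from um to bp."""
    return math.sqrt(2.0 * D0 * DT) * 1000.0 / BP_NM


def toy_pmf(gap):
    """Hypothetical stand-in for the Fig. 2 PMF, in k_B T, vs surface gap (bp)."""
    if gap < 0.0:
        return math.inf  # proteins cannot overlap
    return -4.23 if gap <= 20.0 else 0.0


def energy_of(i, x, centers, sigma, length):
    """Energy (k_B T) of protein i placed at x, interacting with all others."""
    total = 0.0
    for j, xj in enumerate(centers):
        if j != i:
            d = abs(x - xj)
            d = min(d, length - d)  # minimum-image distance (periodic DNA)
            total += toy_pmf(d - sigma)
    return total


def mc_sweep(centers, dx, rng, sigma=28.0, length=30000.0):
    """One Metropolis sweep over all proteins; returns the number of accepted moves."""
    accepted = 0
    for i in range(len(centers)):
        old = centers[i]
        new = (old + rng.choice((-dx, dx))) % length
        d_e = energy_of(i, new, centers, sigma, length) - energy_of(i, old, centers, sigma, length)
        # Metropolis rule: always accept downhill, uphill with probability exp(-dE).
        if d_e <= 0.0 or rng.random() < math.exp(-d_e):
            centers[i] = new
            accepted += 1
    return accepted
```

With the quoted diffusivity and time step, the trial displacement is about 22 nm, or roughly 66 bp under the assumed conversion.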