A single-molecule view of transcription reveals convoys of RNA polymerases and multi-scale bursting

Live-cell imaging has revealed unexpected features of gene expression. Here, using improved single-molecule RNA microscopy, we show that HIV-1 RNA is synthesized by groups of closely spaced polymerases, termed convoys, rather than by single isolated enzymes. Convoys arise through a Mediator-dependent reinitiation mechanism, which generates a transient but rapid succession of polymerases initiating and escaping the promoter. During elongation, polymerases are spaced by a few hundred nucleotides, and physical modelling suggests that DNA torsional stress may maintain this spacing. We additionally observe that the HIV-1 promoter displays stochastic fluctuations on two time scales, which we refer to as multi-scale bursting. Each time scale is regulated independently: Mediator controls minute-scale fluctuations (convoys), while the TBP-TATA-box interaction controls sub-hour fluctuations (long permissive and non-permissive periods). A cellular promoter also produces polymerase convoys and displays multi-scale bursting. We propose that slow, TBP-dependent fluctuations are important for the phenotypic variability of single cells.

G-Efficiency of photobleaching correction. Individual mRNA molecules were detected in a time-series experiment. Left: number of single pre-mRNAs detected before (blue) and after (red) bleaching correction. Middle left: total intensities of single pre-mRNA molecules before (blue) and after (red) photobleaching correction. Middle right and right panels: estimated parameters of the 3D Gaussians fitted to single RNA molecules at the different time points, without (blue) or with (red) bleaching correction.
H-Comparison of the intensity of single RNAs in cells expressing variable levels of MCP-GFP.
Each data point summarizes the results of the 3D Gaussian fit applied to all spots detected in one cell (199 calibration stacks were used). The median of the estimated amplitude (y-axis) is plotted as a function of the median of the estimated background of nucleoplasmic MCP-GFP (x-axis).
I-Accuracy of live-cell calibration. Violin plots compare the quantification of active TSs by smFISH (left plot for each reporter; N > 500) with the calibrated TS values from all time points of the MS2 movies (right plot for each reporter; N > 60,000). Reporters with WT or mutant TATA boxes were used (1T2G and 4G, see Figure 3). Calibrated MS2 values larger than the maximum value obtained by smFISH were removed from the analysis.
A-Cumulative distribution of non-permissive periods in the 8 h movies (black), with mono-exponential (blue) and bi-exponential (red) fits, for the WT, 1T2G and 4G reporters.
B-Best-fitting parameters of the cumulative distribution functions of non-permissive period durations in the 8 h movies. CDF: cumulative distribution function.
C-Distributions of permissive periods in the 8 h movies, for the WT, 1T2G and 4G reporters.
Permissive periods are not expected to follow an exponential distribution because each pre-mRNA remains at the TS for about two minutes after its synthesis, so the TS stays visible after the promoter turns OFF. The promoter can thus switch back ON before all its pre-mRNAs have been released, in which case several individual ON periods are merged and scored as a single event, resulting in complex distributions of permissive-period durations. This is especially visible for the WT and 1T2G reporters, because of their low frequency of non-permissive periods (see panel A, insets, and Figure 4).
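The merging effect described in panel C can be illustrated with a short Python sketch. `observed_permissive_periods` is a hypothetical helper: it takes true promoter ON intervals (in seconds) and assumes a TS residence time of 120 s (the "about two minutes" above), then returns the permissive periods an observer would score from the TS signal.

```python
def observed_permissive_periods(on_periods, residence=120.0):
    """Return (start, end) of permissive periods as seen at the TS, in seconds.

    on_periods: true promoter ON intervals (start, end), hypothetical input.
    residence: time a pre-mRNA stays at the TS after the promoter turns OFF.
    """
    merged = []
    for start, end in sorted(on_periods):
        # If the promoter turns back ON before the last pre-mRNA has left,
        # no non-permissive period is visible and the two ON periods fuse.
        if merged and start - merged[-1][1] <= residence:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    # The TS signal persists `residence` seconds after the last ON period ends.
    return [(s, e + residence) for s, e in merged]


periods = observed_permissive_periods([(0, 100), (150, 300), (600, 700)])
print(periods)  # [(0, 420.0), (600, 820.0)]
```

Here the first two ON periods are separated by a 50 s OFF period, shorter than the residence time, so they are scored as a single 420 s permissive period.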

Supplementary Figure 6. Replicate analysis of HIV-1 smFISH in High Tat cells and parameter values for distribution fits
A-B-Comparison of independent replicates of smFISH data. Histograms show the distribution of the number of released pre-mRNAs per cell in the cell population, for each of the indicated reporters (> 400 cells for each replicate). Red, replicate 1; green, replicate 2. smFISH was performed with a set of 40 probes against the MS2 repeat, and images were acquired on an OMX microscope using SIM reconstruction to obtain super-resolution images. Released (A) and nascent (B) pre-mRNAs were counted with FISH-quant.
C-Comparison of the parameter values of the best-fitting simulations for the two replicates of the experimental distributions. Values are averages of the 10 best-fitting simulations using the model of Figure 4A, with standard deviations in parentheses. Underlined: the value of k_on2a was fixed (see Methods).
… and mono-exponential (blue) and bi-exponential (red) fits, for the POLR2A reporter.

Linearity of UP and DOWN ramps indicates key properties of polymerase convoys
The UP and DOWN ramps of many isolated transcription cycles appeared linear (Supplementary Figure 3A). To quantify this linearity, each ramp was fitted with a linear model, and the coefficient of determination (R²) and the normality of the residuals, evaluated with a Shapiro-Wilk test, were recorded for each ramp. We observed a median determination coefficient higher than 0.
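The per-ramp linearity check can be sketched in a few lines. This is a minimal, dependency-free Python sketch (the original analysis was done in R); `linear_fit_r2` is a hypothetical helper that fits a least-squares line to one ramp and returns the slope, intercept, R² and residuals.

```python
def linear_fit_r2(t, y):
    """Least-squares line through one ramp; returns slope, intercept, R^2, residuals."""
    n = len(t)
    mt, my = sum(t) / n, sum(y) / n
    sxx = sum((a - mt) ** 2 for a in t)
    sxy = sum((a - mt) * (b - my) for a, b in zip(t, y))
    slope = sxy / sxx
    intercept = my - slope * mt
    resid = [b - (intercept + slope * a) for a, b in zip(t, y)]
    ss_res = sum(r * r for r in resid)
    ss_tot = sum((b - my) ** 2 for b in y)
    # Coefficient of determination; a perfectly flat ramp is treated as fully explained.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return slope, intercept, r2, resid


# A perfectly linear UP ramp (2 RNA/s) and a noisy one.
print(linear_fit_r2([0, 1, 2, 3], [1, 3, 5, 7])[:3])  # (2.0, 1.0, 1.0)
print(round(linear_fit_r2([0, 1, 2, 3], [0, 1, 0, 1])[2], 3))  # 0.2
```

The returned residuals can then be passed to a normality test such as `scipy.stats.shapiro`, which implements the Shapiro-Wilk test and requires at least three data points.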

RampFinder
The calibrated time traces measuring the number of RNA molecules at the TS over time were analyzed with dedicated tools to identify UP and DOWN ramps, as well as isolated transcription cycles. RampFinder was written in R and used user-defined criteria to recognize the elements of transcription cycles: plateaus, UP ramps and DOWN ramps. In practice, the raw data were first smoothed by local polynomial averaging over 20 data points. Plateaus were defined as series of at least 2 points with a mean slope between -0.033 and +0.033 RNA/s and a maximal slope lower than 0.06 RNA/s. UP ramps had to be preceded by a plateau and additionally had to contain a series of at least 6 points with a positive slope greater than 0.06 RNA/s. DOWN ramps were defined by at least 6 points with a slope lower than -0.06 RNA/s. Manual examination of the results indicated that these parameters allowed reliable identification of ramps. Isolated transcription cycles were then defined as a succession of either plateau-UP ramp-DOWN ramp or plateau-UP ramp-plateau-DOWN ramp, with the mean value of the first plateau corresponding to fewer than 6 polymerases. The raw calibrated data corresponding to isolated transcription cycles were then displayed, manually inspected, and stored in a file for later use with the fitting routine (RampFitter, see below). To measure the time between two transcription cycles, we also recorded the start of the next UP ramp.
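The RampFinder rules can be sketched in Python as below, assuming the trace has already been smoothed and is sampled every `dt` seconds. All names (`Segment`, `find_segments`, `isolated_cycles`) are hypothetical; slopes are taken as finite differences between consecutive points and the point counts in the thresholds are applied to slope values, which is an assumption rather than a transcription of the original R code.

```python
from dataclasses import dataclass
from itertools import groupby

# Thresholds quoted in the RampFinder description (slopes in RNA/s).
PLATEAU_MEAN_SLOPE = 0.033
RAMP_SLOPE = 0.06
MIN_PLATEAU_PTS = 2
MIN_RAMP_PTS = 6
MAX_FIRST_PLATEAU = 6  # mean TS level of the first plateau, in polymerases


@dataclass
class Segment:
    kind: str     # "PLATEAU", "UP" or "DOWN"
    start: int    # index of the first slope in the run
    end: int      # index of the last slope in the run (inclusive)
    level: float  # mean calibrated TS value over the run


def find_segments(values, dt):
    """Split an already-smoothed calibrated trace into plateaus and ramps."""
    s = [(b - a) / dt for a, b in zip(values, values[1:])]
    # Steep rise, steep fall, or flat (|slope| <= RAMP_SLOPE).
    cls = ["UP" if x > RAMP_SLOPE else "DOWN" if x < -RAMP_SLOPE else "FLAT" for x in s]
    segments, i = [], 0
    for kind, run in groupby(cls):
        n = len(list(run))
        j = i + n - 1
        level = sum(values[i:j + 2]) / (n + 1)
        mean_slope = sum(s[i:j + 1]) / n
        if kind == "FLAT" and n >= MIN_PLATEAU_PTS and abs(mean_slope) <= PLATEAU_MEAN_SLOPE:
            segments.append(Segment("PLATEAU", i, j, level))
        elif kind != "FLAT" and n >= MIN_RAMP_PTS:
            segments.append(Segment(kind, i, j, level))
        i = j + 1
    # An UP ramp only counts if the preceding retained segment is a plateau.
    return [seg for k, seg in enumerate(segments)
            if seg.kind != "UP" or (k > 0 and segments[k - 1].kind == "PLATEAU")]


def isolated_cycles(segments):
    """(start, end) slope indices of PLATEAU-UP-DOWN or PLATEAU-UP-PLATEAU-DOWN runs."""
    kinds = [seg.kind for seg in segments]
    cycles = []
    for k, seg in enumerate(segments):
        if seg.kind != "PLATEAU" or seg.level >= MAX_FIRST_PLATEAU:
            continue
        for pattern in (["PLATEAU", "UP", "DOWN"], ["PLATEAU", "UP", "PLATEAU", "DOWN"]):
            if kinds[k:k + len(pattern)] == pattern:
                cycles.append((seg.start, segments[k + len(pattern) - 1].end))
                break
    return cycles


trace = ([1.0] * 5 + [1.0 + 0.5 * k for k in range(1, 9)]
         + [5.0 - 0.5 * k for k in range(1, 9)] + [1.0] * 5)
segs = find_segments(trace, dt=1.0)
print([seg.kind for seg in segs])  # ['PLATEAU', 'UP', 'DOWN', 'PLATEAU']
print(isolated_cycles(segs))       # [(0, 19)]
```

The synthetic trace rises and falls at 0.5 RNA/s from a plateau of 1 RNA, so it is recognized as one isolated plateau-UP-DOWN cycle.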

RampFitter
To extract quantitative parameters of transcription, isolated transcription cycles were fitted with the polymerase convoy model using a dedicated R script (RampFitter). The model predicts the TS intensity as a function of the progression of a polymerase convoy across the reporter gene, and it has four variables (see Figure 2C, first polymerase exits the repeat
Fitting the function I(t) to the data was done using a non-linear minimization routine in R. Models 1 and 2 were both included in the minimized function and were selected according to convoy length, such that a single solution was obtained. Note that the equations are correct only for L_2 ≤ L_MS2.
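The convoy model equations are not reproduced in this copy, so the Python below is only a hypothetical forward model in the same spirit. It assumes that polymerase k enters the MS2 repeat at k·t_space, moves at v_el, contributes a signal proportional to the fraction of the repeat it has transcribed, and is released t_proc seconds after reaching the end of a downstream region of length l_post. The four fitted variables (n_pol, t_space, v_el, t_proc), the helper names, and the brute-force grid search standing in for the non-linear minimization are all assumptions.

```python
from itertools import product


def ts_intensity(t, n_pol, t_space, v_el, t_proc, l_ms2, l_post):
    """Hypothetical convoy model: TS signal at time t, in units of full-length RNAs."""
    total = 0.0
    for k in range(n_pol):
        dt = t - k * t_space          # time since polymerase k entered the MS2 repeat
        if dt <= 0:
            continue                  # not yet in the repeat
        if dt >= (l_ms2 + l_post) / v_el + t_proc:
            continue                  # transcript already processed and released
        total += min(v_el * dt, l_ms2) / l_ms2  # fraction of the repeat transcribed
    return total


def sse(params, times, data, l_ms2, l_post):
    """Sum of squared residuals between model and data for one parameter tuple."""
    n_pol, t_space, v_el, t_proc = params
    return sum((ts_intensity(t, n_pol, t_space, v_el, t_proc, l_ms2, l_post) - y) ** 2
               for t, y in zip(times, data))


def grid_fit(times, data, l_ms2, l_post, grid):
    """Brute-force stand-in for a non-linear minimizer: best parameter tuple."""
    return min(grid, key=lambda p: sse(p, times, data, l_ms2, l_post))


L_MS2, L_POST = 1000.0, 2000.0  # bp, illustrative lengths
times = [2.0 * i for i in range(151)]
data = [ts_intensity(t, 10, 4.0, 50.0, 100.0, L_MS2, L_POST) for t in times]
grid = list(product([8, 10, 12], [3.0, 4.0, 5.0], [40.0, 50.0], [100.0]))
print(grid_fit(times, data, L_MS2, L_POST, grid))  # (10, 4.0, 50.0, 100.0)
```

With these illustrative values the convoy (10 polymerases spread over 36 s) is longer than the 20 s needed to cross the repeat, and between t = 22 s and t = 30 s the signal rises from 3.5 to 5.5 RNAs, a slope of 0.25 RNA/s = 1/t_space, matching the Model 2 behaviour discussed in the next section.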

Confidence intervals and limits in the measurements of v_el
In the case of Model 2 (L_T > L_MS2), the slope of the linear part of the UP ramp is 1/t_space, and v_el affects only the non-linear phases of the UP ramp, i.e. the shoulders before and after the linear phase. If the convoy is long compared with the MS2 repeat, most data points lie in the linear phase, and only a small fraction of the data depends on v_el. In this case, v_el cannot be determined accurately. In practice, this occurred in about 60% of the cases, for which only a minimal value was estimated. This value was determined by systematically decreasing v_el from its best-fit value and using a t-test to determine whether the fit became significantly worse, taking into account the experimental variability.
Fitting the individual transcription cycles indicated that 82% corresponded to Model 2, 15% to Model 1, and less than 3% to Model 3. This agreed with the analysis of the slopes of the UP and DOWN ramps, which in most cases were equal, as expected from Model 2 (see Supplementary Figure 3D-F). t-statistics were used to evaluate how well the parameters were determined by the fit: t_space was well determined in 95% of the cases, t_proc in 60% and v_el in only 40%. In the remaining 60% of the cases, a minimal v_el value was computed as described above.
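The lower-bound procedure for v_el can be sketched in Python. The exact t-test is not specified here, so this sketch assumes a paired t statistic on per-point squared residuals and a hypothetical critical value `t_crit = 2.0`; `sq_residuals` is a caller-supplied, hypothetical callback returning the squared residuals of the fit for a given v_el, and `toy_sq_residuals` is invented purely for illustration.

```python
import math
import statistics


def paired_t(base, trial):
    """Paired t statistic of (trial - base); positive when the trial fit is worse."""
    d = [b - a for a, b in zip(base, trial)]
    m, sd = statistics.fmean(d), statistics.stdev(d)
    if sd == 0:
        return 0.0 if m == 0 else math.copysign(math.inf, m)
    return m / (sd / math.sqrt(len(d)))


def minimal_v_el(sq_residuals, v_best, step, t_crit=2.0):
    """Lower v_el from its best-fit value until the fit becomes significantly worse.

    sq_residuals(v) returns per-point squared residuals for elongation rate v
    (hypothetical callback). Returns the smallest v_el that was not rejected.
    """
    base = sq_residuals(v_best)
    v = v_best
    while v - step > 0:
        if paired_t(base, sq_residuals(v - step)) > t_crit:
            break
        v -= step
    return v


def toy_sq_residuals(v):
    # Hypothetical fit quality: unchanged down to 30 bp/s, clearly worse below.
    return [0.0] * 10 if v >= 30 else [0.1 * (30 - v) + 0.01 * i for i in range(10)]


print(minimal_v_el(toy_sq_residuals, v_best=50.0, step=5.0))  # 30.0
```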

Simulation and spacing variability
In order to test the ability of our model to correctly estimate the parameters of polymerase convoys, we performed simulations. We simulated polymerases initiating and moving through We first simulated uniform polymerase spacing with a mean value of 4 s, using an elongation rate of 4 kb/min and a 3'-end processing time of 100 s. As shown in Supplementary Table I, the analysis accurately recovered these input values. Next, we compared convoys with uniform polymerase spacing to convoys with exponentially distributed spacing, and varied the convoy parameters. In all cases, the mean spacing was correctly identified, indicating that the model estimates the correct mean parameter values for convoys with either uniform or exponential distributions of spacing times. We also simulated convoys with variable numbers of polymerases (not shown). We found no correlation between polymerase spacing and the number of polymerases within convoys, as observed with the real data (Figure 2G). This is thus a property of the biological system and not an artefact of the analysis pipeline.
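The spacing part of these simulations can be sketched in Python. This hypothetical generator only draws polymerase entry times (uniform or exponential spacing with a 4 s mean) and checks that the mean spacing is recovered and uncorrelated with convoy size; it does not reproduce the elongation (4 kb/min) and 3'-end processing steps, nor the fitting pipeline. The convoy-size range of 3 to 30 polymerases is an illustrative assumption.

```python
import random
import statistics


def convoy_entry_times(n_pol, mean_space, spacing, rng):
    """Entry times (s) of n_pol polymerases, spaced uniformly or exponentially."""
    t, times = 0.0, []
    for _ in range(n_pol):
        times.append(t)
        t += mean_space if spacing == "uniform" else rng.expovariate(1.0 / mean_space)
    return times


def mean_spacing(times):
    """Mean time between successive polymerases of one convoy."""
    return (times[-1] - times[0]) / (len(times) - 1)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(1)
print(mean_spacing(convoy_entry_times(10, 4.0, "uniform", rng)))  # 4.0
sizes, spacings = [], []
for _ in range(5000):
    n = rng.randint(3, 30)  # convoys of variable size
    sizes.append(n)
    spacings.append(mean_spacing(convoy_entry_times(n, 4.0, "exponential", rng)))
print(round(statistics.fmean(spacings), 2), round(pearson(sizes, spacings), 3))
```

Because spacing and convoy size are drawn independently here, the correlation stays close to zero, which is the null behaviour against which the real data can be compared.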

Supplementary Note 4: Mechanical modelling of DNA torsional forces during progression of a polymerase convoy
Main results: (1) in a convoy of active polymerases, any local desynchronization (pausing or slowing down of one polymerase locally affecting the spacing to neighboring polymerases) produces DNA supercoiling and a local increase of torsional energy, which in turn generates an apparent force sufficiently strong to restore the initial distance between the polymerases. (2) The ensuing collective behavior of the convoy enhances the stall force (that is, the hindering force capable of stopping polymerase activity) of polymerases in the convoy. Note that these results are valid whether polymerases are evenly spaced or not, but that the model does not explain how spacing is set.
Quantitative statements: (1) if the polymerase (denoted hereafter) is lagging and its distance to the following polymerase decreases, supercoiling appears in the DNA segment separating them. The increase in torsional energy generates an effective force on which is equal to where is the Boltzmann constant, the temperature, the twist persistence length of DNA at this temperature, and the pitch of the (relaxed) DNA double helix (that is, the distance along the DNA axis corresponding to one turn, i.e. to ). If the distance between the polymerases and decreases by 1%, and the force exerted on is .
(2) Denoting the stall torque hindering the activity of an isolated polymerase (related to the stall force according to ), the stall torque of the polymerase within a convoy increases linearly with the convoy length, e.g. it equals for the central polymerase of a convoy of 3 polymerases.

Detailed calculations (1): Effective forces generated by torsional coupling.
We consider a convoy of elongating polymerases. Convoys are observed on actively transcribed genes, and presumably, chromatin in such regions is decondensed; no end constraints are considered. Distances are measured along the DNA axis. Polymerases enter DNA in (TSs) and leave DNA in (termination site). The initial distances between successive polymerases are fixed by the promoter activity (or anything controlling the loading of polymerases onto DNA at the TSs). We label polymerases in the order of their loading onto DNA.
We assume that polymerases actively perform a translation along the DNA axis, due to a strong is simply the relative variation in the distance separating and . We denote (resp. ) the position of (resp. ) so that (see Figure 6E). The potential energy of torsion stored in this DNA stretch is thus (2) Plugging into the expression of yields The force exerted on the polymerase by the one thus writes (4) Now and, because , we may approximate by so that Numerically, the prefactor of equals about (with , and bp/s), a relative variation of 1% of their distance amounts to as few as 3 bp. This means that the distance between neighboring is exquisitely regulated by supercoiling constraints.
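For orientation, the standard harmonic-twist estimate below follows the structure of this argument. It assumes torsionally constrained polymerases separated by a distance d along the DNA axis, a twist persistence length C and a helical pitch p, and works to first order in σ; it is not necessarily the exact expression lost from this copy, and the numerical values (k_BT ≈ 4.1 pN·nm, C ≈ 100 nm, p ≈ 3.4 nm) are illustrative assumptions.

```latex
% Assumptions: torsionally constrained polymerases, harmonic twist elasticity,
% separation d, change \Delta d, pitch p, twist persistence length C.
\begin{aligned}
\Delta\theta &= \frac{2\pi\,\Delta d}{p}, \qquad \sigma = \frac{\Delta d}{d},\\
E &= \frac{k_B T\,C}{2}\,\frac{(\Delta\theta)^2}{d}
   = \frac{2\pi^2 k_B T\,C}{p^2}\,\sigma^2 d,\\
|F| &= \left|\frac{\partial E}{\partial(\Delta d)}\right|
     = \frac{4\pi^2 k_B T\,C}{p^2}\,|\sigma|
     \approx \frac{4\pi^2\,(4.1\ \mathrm{pN\,nm})(100\ \mathrm{nm})}{(3.4\ \mathrm{nm})^2}\,|\sigma|
     \approx 1.4\times10^{3}\ \mathrm{pN}\times|\sigma|.
\end{aligned}
```

Under these assumptions a 1% change in separation (|σ| = 0.01) gives a force of about 14 pN, and for polymerases about 300 bp apart, 1% corresponds to the 3 bp mentioned above.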

Detailed calculations (2): Collective effects and convoy propagation as a whole.
What remains to be solved now is the complete problem of transport of harmonically coupled polymerases (the effective force is analogous to that exerted by a spring since the supercoiling equals the relative increase of the initial distance). This problem is reminiscent of some issues in traffic theory, e.g. the motion of an array of harmonically coupled vehicles (Wagner, 2001).
We first notice that the force exerted on is equivalent to a torque .
Actually the force and the torque form a "screw" (more precisely a "wrench").
We  Therefore the resulting force acting on is . At the same time each flanking polymerase also experiences an opposite force , namely is pulled backwards and is pushed backwards (see Figure 6E). The blocked polymerase will resume its motion as soon as , i.e. when the constraints have accumulated enough so that reaches a threshold value Obviously must be smaller than , which amounts to . Otherwise both flanking polymerases would also stop before has resumed its motion.
(iii) If , both and stop before has resumed its motion, so that the disturbance propagates and the remaining part of the convoy enters the scene: starts pulling while starts pushing , both with the same force of increasing magnitude as and continue their motion.
The same scenario as above applies and we deduce that the constraints keep on accumulating on until either resumes its motion (if ) or both and stop (if ).
(iv) We conclude that the RNAP stall torque within a convoy of N polymerases may be as large as NK.
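The stall-propagation argument can be checked with a toy simulation. This Python sketch is hypothetical: neighbouring polymerases are coupled by linear springs whose tension is k_eff times the relative change in their separation (the harmonic coupling described above), each motor follows a linear force-velocity law with stall force f_stall, capped at its free speed v0, and forces are used instead of torques since the two are linked as described above. A polymerase blocked by a roadblock escapes only once its own motor force plus the net spring force reaches f_block; all parameter values are illustrative.

```python
def simulate_roadblock(n_pol, blocked, f_block, k_eff=10.0, f_stall=1.0,
                       v0=1.0, d0=1.0, dt=1e-3, t_max=20.0):
    """Return the time at which polymerase `blocked` escapes its roadblock, or None.

    Polymerases are indexed from upstream (0) to downstream (n_pol - 1).
    """
    x = [j * d0 for j in range(n_pol)]  # evenly spaced convoy
    for step in range(int(t_max / dt)):
        # Spring tension between j and j+1: > 0 stretched, < 0 compressed.
        tension = [k_eff * (x[j + 1] - x[j] - d0) / d0 for j in range(n_pol - 1)]
        # Net forward force on j: pulled ahead by a stretched downstream link,
        # held back by a stretched upstream link (and the reverse when compressed).
        force = [(tension[j] if j < n_pol - 1 else 0.0)
                 - (tension[j - 1] if j > 0 else 0.0) for j in range(n_pol)]
        if f_stall + force[blocked] >= f_block:
            return step * dt  # roadblock overcome
        # Linear force-velocity law: stalls at an opposing force of f_stall.
        v = [v0 * min(1.0, max(0.0, 1.0 + f / f_stall)) for f in force]
        v[blocked] = 0.0
        x = [xi + vi * dt for xi, vi in zip(x, v)]
    return None


for n_pol, blocked in ((3, 1), (5, 2)):
    print(n_pol, [simulate_roadblock(n_pol, blocked, n_pol + delta) is not None
                  for delta in (-0.5, 0.5)])
```

For a central polymerase in a convoy of 3, a roadblock of 2.5 stall forces is overcome while one of 3.5 is not; for a convoy of 5 the threshold lies between 4.5 and 5.5, consistent with a combined resistance approaching N·f_stall.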
The linear velocity V of the polymerase along the DNA axis is equal to V = hω/(2π), where ω is the angular velocity of the polymerase around the DNA axis and h is the local pitch of the DNA. The pitch h is hardly affected by supercoiling because, as shown before, relevant values of supercoiling during elongation never exceed 1%. Pitch variations therefore have a marginal effect on the velocity V compared with the variations of ω due to pauses, roadblocks or opposing forces. Note that pitch variations also have a marginal effect on the torque that results from the force .
It is worth noting that traffic on biological filaments is closely related to the motion of molecular motors. The basic model of out-of-equilibrium transport, the asymmetric simple exclusion process (ASEP), was introduced to account for the motion of ribosomes on mRNA 5 . The theoretical literature on polymerase motion simply applies this model to polymerases on DNA; in particular, only steric interactions are considered. However, a basic difference is that DNA experiences torsional constraints that do not affect single-stranded mRNA. Thus, while ASEP is a valid model for ribosome processivity, it does not apply to polymerases. Accordingly, we do not expect to observe synchronization and coherent motion for ribosomes (no ribosome convoys).

Supplementary Note 5: Cloning of the MS2 repeats and HIV-1 reporter plasmids
Two variants of 16xMS2 were cloned, but one had two accidental mutations expected to inactivate binding to two of the stem-loops. Repeating these cloning steps then generated the MS2x64 and MS2x128 repeats. Unlike the original MS2x24 sequence, these repeats were stable in bacteria and in cells.
HIV-1 plasmids were derived from pExo 6 . This plasmid contains an HIV-1 vector flanked by human genomic DNA. The vector contains two complete long terminal repeats (LTRs) and lacks the HIV-1 sequences between the viral packaging site and the RRE element. A cassette containing the FRT recombination sequence, followed by a hygromycin resistance gene lacking its AUG initiator codon and by a polyA site, was PCR-amplified from pcDNA5 and cloned into the HIV-1 vector to generate the Flp-In plasmid pIntro. The original MS2x24 repeat of pExo was removed, and a 45 bp linker containing five unique cloning sites was inserted into the NotI site of the HIV-1 intron, between the packaging site and the RRE element. The MS2x128 repeat was then excised from pMK-MS2x128-XbaI with BamHI and EcoRV and cloned into the HIV-1 intron of the modified pIntro, using the BamHI and SnaBI sites, to generate the plasmid pIntro-MS2x128.
