Enhancement of gene expression noise from transcription factor binding to genomic decoy sites

The genome contains several high-affinity non-functional binding sites for transcription factors (TFs) creating a hidden and unexplored layer of gene regulation. We investigate the role of such “decoy sites” in controlling noise (random fluctuations) in the level of a TF that is synthesized in stochastic bursts. Prior studies have assumed that decoy-bound TFs are protected from degradation, and in this case decoys function to buffer noise. Relaxing this assumption to consider arbitrary degradation rates for both bound/unbound TF states, we find rich noise behaviors. For low-affinity decoys, noise in the level of unbound TF always monotonically decreases to the Poisson limit with increasing decoy numbers. In contrast, for high-affinity decoys, noise levels first increase with increasing decoy numbers, before decreasing back to the Poisson limit. Interestingly, while protection of bound TFs from degradation slows the time-scale of fluctuations in the unbound TF levels, the decay of bound TFs leads to faster fluctuations and smaller noise propagation to downstream target proteins. In summary, our analysis reveals stochastic dynamics emerging from nonspecific binding of TFs and highlights the dual role of decoys as attenuators or amplifiers of gene expression noise depending on their binding affinity and stability of the bound TF.

www.nature.com/scientificreports www.nature.com/scientificreports/ TF 42,[46][47][48][49][50][51][52][53][54][55] . However, these results are based on the assumption that sequestration of TF at a decoy site protects the TF from degradation. Relaxing this assumption to consider an arbitrary decay rate of the bound TF, we uncover a novel role of decoys as both noise amplifiers and buffers. We systematically characterize the parameter regimes leading to these contrasting roles in terms of the bound vs. free-TF stability, number and affinity of decoy sites. Finally, we study noise transmission from the TF to a downstream target protein reporting counterintuitive effects in some cases, for example, decoys amplify noise in the level of a TF but reduces noise in the level of the TF's downstream target protein.

Results
Model formulation. To study the role of decoy binding sites, we consider a TF that is synthesized in stochastic bursts (Fig. 1). Such bursty expression of gene products has been experimentally observed in diverse cell types [56][57][58][59][60][61][62][63] , and corresponds to distinct mechanisms at the transcriptional and translation level. For example, a promoter randomly switches from a transcriptionally inactive to an active state, and quickly turns off after producing a bursts of mRNAs 59,[64][65][66] . Moreover, assuming short-lived mRNAs, each synthesized mRNA is rapidly degraded after translating a burst of proteins [67][68][69] . We phenomenologically model the combined effect of transcriptional and translational bursts by considering a Poisson arrival of burst events with rate k x , and each event creates B x molecules, where B x is an independent and identically distributed non-negative random variable following an arbitrary distribution [70][71][72][73][74][75] . More specifically, B x = i with probability α x (i) where i ∈ {0, 1, 2, …}. We consider a general form for the burst size distribution α x (i) throughout the paper, except for plotting and simulation purposes where α x (i) follows a geometrical distribution 69,[76][77][78] .
Consider N decoy binding sites in the genome, with the TF binding/unbinding to each decoy site with rates are k b and k u , respectively. As motivated earlier, we allow for both the free (unbound) and bound TF to decay with rates γ f and γ b , respectively. Let x f (t) denote the number of free TF molecules, and x b (t) the number of bound TFs at time t inside a single cell. Then, the stochastic evolution of x f (t) and x b (t) is governed by the following probabilities x t x t dt x t k x t dt TF unbinding: Probability{ ( ) ( ) 1, x t x t dt Free TF degradation: Probability{ ( ) ( ) 1} ( ) , (1d) Each equation here defines a stochastic event that occurs with a certain probability in the small time interval (t, t + dt], and whenever the event occurs, the population counts change by discrete integer amounts. Based on the underlying stochastic chemical kinetics, these occurrence probabilities are either independent (as in 1a), or linearly/nonlinearly dependent on the molecular counts. For readers convenience, we provide a list of model parameters along with their description in Table 1. Expressing γ b in terms of γ f as γ b = βγ f , β = 0 corresponds to no decay of bound TF, and β = 1 corresponds to equal degradation rates for both the free and bound TF. The key focus of this work is to understand the stochastic dynamics of x f (t) in different regimes of β. www.nature.com/scientificreports www.nature.com/scientificreports/ Based on the above discrete-state continuous-time Markov model, one can write a corresponding Chemical Master Equation (CME) that provides the time evolution of the joint probability density function p( for observing x f free TF, and x b bound TF molecules at time t 79,80 As the CME is analytically intractable, it is typically solved numerically through either the Finite State Projection algorithm [81][82][83] or various Monte Carlo simulation techniques [84][85][86][87][88] . Taking an alternative approach, we focus on the statistical moments of the molecular counts and use the well-known Linear Noise Approximation [89][90][91][92][93][94] to obtain closed-form formulas for the mean and noise levels. In addition to the LNA, we assume that the binding/unbinding reactions are very fast compared to protein synthesis and degradations. This fast binding/unbinding limit, which is also known as quasi-static equilibrium or adiabatic limit, is common in the gene expression modelling 29,95,96 and is supported by recent experiments 97 . This assumption simplifies our calculations to obtain the analytical expression for the Fano factors. As this limit implies negligible fluctuations due to binding kinetics, one may expect relaxing this limit can give rise to more variation in gene expression 95,97 . We have numerically verified that for our system, the qualitative behavior of results does not depend on this limit (see Fig. S1). noise in free tf counts in the absence of decoy sites. When there are no decoys (N = 0), the moments of the free TF count can be solved exactly from the CME. In particular, the steady-state mean level x f ,0 , and the Fano factor F x ,0 f (variance/mean) of x f (t) are given by 26,98 , ,0 2 f respectively, where 〈B x 〉 and B x 2 are the first and second-order moments of the burst size B x . Throughout the paper we use angular brackets to denote the expected value operation, and to represent the steady-state expected value. Note that the Fano factor is completely determined by the burst size distribution and is independent of the burst arrival rate and the protein decay rate. As expected, we recover the Poisson limit ( = F 1 x ,0 f ) for non-bursty production B x = 1 with probability one, and noise is super-Poissonian ( > F 1 x ,0 f ) for bursty production. In the special case where B x follows a geometric distribution Bound tf's degradation titrates the regulating activity of tf. In the presence of decoy (N > 0), exact solutions to the mean and noise levels are unavailable, and one has to resort to approximation techniques. Recall from (1), that the probability of the TF binding event is nonlinearly related to the molecular counts via the product term x f (t)x b (t). This nonlinearity results in unclosed moment dynamics -the time evolution of lower-order moments depends on higher-order moments 99,100 . For example, the dynamics of the means 〈x f 〉, 〈x b 〉 depends on the second order moment 〈x f x b 〉 (see SI, section 1). Typically approximate closure schemes are employed to solve for moments in such cases [101][102][103][104][105][106][107][108][109][110][111] . Here, we use the Linear Noise Approximation (LNA) method, where assuming small fluctuations in x f (t) and x b (t) around their respective mean values 〈x f 〉 and 〈x b 〉, the nonlinear term is lin- Exploiting this linearization, the probability of all events in (1) become linear with respect to the molecular counts, resulting in closed moment dynamics (see SI, section 1). A

Parameter Description
Parameter Description www.nature.com/scientificreports www.nature.com/scientificreports/ direct result of using the LNA is that the time evolution of the means is identical to the deterministic chemical rate equations Solving the above equations at steady state, and further considering fast binding/unbinding rates (k b → ∞, k u → ∞) for a given dissociation constant k d = k u /k b , yields the following mean levels for the unbound and bound TF respectively. Here, 〈 〉 x f ,0 given by (3) is the mean TF count in the absence of decoy sites. When bound TFs are protected from degradation β = 0, the mean free TF count becomes independent of decoy numbers 42,52 . In contrast, with bound TF degradation β > 0, the mean free TF count monotonically decreases with increasing decoy numbers, with the decrease being faster for stronger binding affinity (or lower dissociation constant). This point is exemplified in Fig. 2 x f as a function of N for different dissociation constants. In the limit of a small number of decoys, (6) can be approximated as ,0 ,0 and the rate of decrease of 〈 〉 x f with increasing N is inversely proportional to k d . In the limit of large N and β > 0,  www.nature.com/scientificreports www.nature.com/scientificreports/ Bound tf's degradation enhances noise in the free tf count. Next, LNA is used to derive F x f , the steady-state Fano factor of the free TF level in the presence of decoys. Assuming fast binding and unbinding of TFs to decoy sites (k b → ∞, k u → ∞, fixed k d = k u /k b ) we obtain (see SI, section 1 for details) , with x is the Fano factor in the absence of decoy binding sites (3), and f is the fraction of bound decoys. As expected, in the limit of no decoys (N → 0) or weakly binding decoys ( f ,0 and the fraction of bound decoy f becomes independent of N. In this scenario, (9) simplifies to ,0 ,0 As f is independent of N, it is clear from (10) that F x f monotonically decreases from F x ,0 f to 1 with increasing N (Fig. 2(B)). Thus, when the bound TFs are protected from degradation, the decoy sites function as a noise buffer 52 . An interesting finding that emerges form (9) is that when β > 0, F x f can vary non-monotonically with N. This point is illustrated in Fig. 2 as a function of decoy abundance for β = 1.
While F x f monotonically decreases to 1 for weak binding affinities (similar to the case of β = 0), for strong binding affinities F x f first increases with increasing N to reach a maxima, before decreasing back to the Poisson limit for large N. In essence, with degradation of bound TFs, decoys can function as a noise amplifier / 0 x f in the limit N → 0 leads to an analytical condition for noise enhancement -if the dissociation constant is below a critical threshold . It turns out that this condition for noise enhancement implies that the fraction of bound decoys β > + f 1/(1 ) when a small number of decoys are present. In spite of the noise amplification, it is important to point out that as N → ∞ irrespective of the value of β. Thus, strongly-binding decoys that function as a noise amplifier for small N, become a noise buffer for large N.
Overall these results suggest that decoys buffer noise in a variety of scenarios: β = 0 (irrespective of N and k d ), or for a large number of decoys (irrespective of β and k d ) or if decoys have sufficiently weak binding affinities (irrespective of β and N). In contrast, decoys function as a noise amplifier when β > 0 provided two other conditions hold -decoys have sufficiently strong binding affinities as per (11) and are not present in large numbers.
To check the validity of the linearization and fast binding/unbinding approximations, we perform kinetic Monte Carlo simulations using the Gillespie algorithm 112 to obtain numerically exact results. In Fig. 2, along with the analytical results (lines) we also plot the simulation results (symbols). The match between analytical and simulations results are quite well, especially for large and intermediate k d values. For small k d values with β ≠ 0, there is a clear deviation from the analytical results. In this regime, being fluctuations very large, the linearization approximation based on absence of large fluctuations may not be justified. However, as can be seen, the qualitative features do not depend on these approximations.
As illustrated in Fig. 2(A), bound TFs degradation reduces the number of available TFs with increasing decoy abundance. This naturally leads to the question: is the noise behavior reported in Fig. 2(C) also seen when the mean free TF count is held constant at given desired level? In Fig. 3, we plot the noise as a function N and k d for a given mean free TF count 〈 〉 x f by simultaneously enhancing the production rate k x as per (6). Our results show similar qualitative behaviors with decoys functioning as a noise buffer for β = 0 ( Fig. 3(C)), becoming a noise amplifier when β = 1 for sufficiently small N and k d (Fig. 3(B)). Interestingly, the region of noise amplification is greatly enhanced when bound TFs become more unstable compared to their free counterparts (Fig. 3(A)). noise in free tf counts in a mixture of strong and weak decoy binding sites. Inside cells, TFs bind to various decoy sites with different affinities 33 . How do fluctuations in the TF count gets affected by a mixture of nonidentical decoys? To address this question, we study a system two decoy types D1 and D2 that are present in numbers N 1 and N 2 , respectively. We assume that each free TF molecule can bind to both decoy types with the same (diffusion-limited) rate k b , but unbinds with different rates k u1 and k u2 , respectively, leading to two different dissociation constants k d1 = k u1 /k b and k d2 = k u2 /k b . As before, TFs are synthesized in random bursts, the free and bound TFs decay with rates γ f and γ b (the decay rates of TFs bound to D1 and D2 are assumed to be equal). Together with (1a) and (1d), the stochastic model is now governed by the following probabilities representing jumps in the population counts in the infinitesimal time interval (t, t + dt] www.nature.com/scientificreports www.nature.com/scientificreports/ Here x b1 and x b2 denote the number TFs bound to decoys D1 and D2, respectively. We apply the linear noise approximation to obtain time evolution of statistical moments (presented in the SI, section 2), and solve the resulting moment equations at the steady to compute the Fano factor numerically as the analytical formula for the noise level becomes quite involved.
Noise in the free TF counts is investigated in two complementary ways. In Fig. 4A, we fix the number of decoys D1 and their binding affinity such that D1 itself functions as a noise amplifier. The noise in the free TF counts is then plotted as a function of the number of decoys D2 and its binding affinity. Recall that the dashed line repre- , the noise with decoys is the same as the noise in the absence of decoys. Having a large number of decoys D2 makes the overall decoy mixture a noise buffer, with the dashed line showing how large of a pool N 2 is needed as a function of k d2 . In Fig. 4B we fix the binding affinity of both decoys and plot the noise level as a function of decoys abundances. Here the noise enhancement is only observed when both decoys are present at small numbers, and the decoy mixture is a noise buffer even if one of the decoys is present in sufficiently large numbers. It is interesting to note that in Fig. 4B, D1 by itself is a noise amplifier (gray line intersection with the x-axis), D2 by itself is a noise buffer (gray line intersection with the y-axis), but their combined presence (star on the dashed line) mitigates each other's effect, and the noise level is similar to when there were no decoys.
Quantifying noise propagation from tf to downstream target proteins. Having quantified the magnitude of fluctuations in the free TFs counts, we next focus our attention on the time-scale of fluctuations. Given that the available TF pool activates downstream target proteins, the time scale at which fluctuations relax back to mean levels is key in understanding downstream noise propagation 113,114 . For example, for a given noise level, more prolonged fluctuations in the free TF counts will lead to higher noise in the expression of the target protein.
The time-scale of fluctuations is characterized using the autocorrelation function defined as, www.nature.com/scientificreports www.nature.com/scientificreports/ where time t is sufficiently large for the system to reach the steady-state. In the absence of any decoy binding sites, the decay in the autocorrelation function is given by exp(−γ f τ) 52 . Note that this function only depends on the decay rate and independent of bursting parameters. Thus, the magnitude and time-scale of fluctuations can be independently tuned using (3) and (13) via the burst size and the decay rate. In the presence of decoy binding sites, we compute R(τ) numerically from stochastic realizations of x f (t) obtained via kinetic Monte Carlo simulations 112 . Figure 5 plots the decay of R(τ) as a function of lag-time τ, in the absence and presence of bound TF degradation. When the bound TFs are protected from degradations, the autocorrelation function shifts to the right leading to slower and more prolonged fluctuations in the free TF count. In contrast, when bound TFs are unstable, R(τ) shifts to the left, resulting in faster fluctuations, and the shift is more pronounced for larger values of β.
To understand how fluctuations in the free TF propagate downstream, we consider a case where the stochastic synthesis of target protein Y is activated by available TFs (see Fig. 1). Instead of incorporating the direct binding of TFs to the promoter of the target gene, we model the activation via a linear dose-response by the TF -the synthesis rate of Y is proportional to the number of available TF. We note that this linearity assumption is more appropriate when the binding affinity of TFs to the target gene is relatively small. This limit provides a theoretical understanding of noise propagation in the target protein as the mathematical formula for the noise can be derived.
The target protein is assumed to be produced in bursts B y with an arbitrary burst size distribution y y and burst frequency x f k y that increases linearly with the free TF count. The probabilistic events governing the production and decay of target proteins are given by where y(t) is the population count of protein Y at time t. Applying LNA to the stochastic model (1) and (15) yields the following expression for the steady-state Fano factor of y(t) in the limit of fast binding/unbinding (k b → ∞, www.nature.com/scientificreports www.nature.com/scientificreports/ x k B / f y y y is the steady-state mean level of the target protein, 〈B y 〉 is the average burst size, and B y 2 is the second-order moment of the burst size. Note that this noise level is made up of two components -the first component on the right-hand-side of (16) represents the noise from bursting of the target protein, and the second component is the noise in Y due to upstream fluctuations in the free TF count. In the absence of any decoy, (16) reduces to Our analysis shows that when bound TFs are stable (β → 0), the second noise component in F y monotonically decreases to zero with increasing decoy numbers. For β > 0, similar to the free TF count, noise enhancement in the target protein occurs for a small number of high-affinity decoys (see Fig. S2). However, both the magnitude (F y /F y,0 )and the region of the noise enhancement are smaller compared to that of the free TF count (see Fig. S2 and compare with Fig. 3).
Noise propagation can be measured by the ratio of the target protein noise to the free TF noise, F F / y x f . Figure 5C shows the noise propagation as a function of decoy numbers for different decay rates of bound TF. Here the y-axis intercept quantifies noise propagation in the absence of decoys. When the bound TFs are protected from degradation, noise propagation is first enhanced as expected from the right-shift of the autocorrelation function (Fig. 5A), but them sharply decreases at higher decoy abundances. Note that the increase in noise propagation seen at intermediate decoy numbers does not imply an increase in the target protein noise level, as at the same time the free TF noise level decreases with increasing N for β = 0 (Fig. 3). When bound TFs are unstable, noise propagation is reduced (Fig. 5C) due to faster time-scale of fluctuations of the free TF count (Fig. 5B). This implies that the noise increase seen in the free TF level when β > 0 (Fig. 3) is buffered by a decreased noise propagation, and hence, the region of noise enhancement for the target protein is reduced compared to that of the free TF count. The qualitative feature of noise propagation agrees with the stochastic simulations results. However, the quantitative match between stochastic simulations and the LNA results is poor for β > 0 (Fig. 5C).

Discussion
The sequestration of transcription factors by genomic decoy sites can either protect it from degradation 43,44 or make it more facile for degradation 45 . Here, we have systematically investigated how the magnitude and time-scale of TF copy-number fluctuations are impacted by the stability of the bound TF, the number and affinity of decoy sites. While this contribution focuses on TFs binding to genomic decoy sites, these results are applicable to other classes of proteins, for example, RNA-binding proteins binding to sites on RNA 115-118 . Our results show that the degradation of decoy-bound TFs critically impacts both the mean and noise levels of the free TF pool. More specifically, while the average number of free (available) TFs monotonically decreases with increasing decoy abundance, the noise levels can sharply increase at low/intermediate decoy numbers before . Lines are the plotted using the analytical formula of F y and F x f , given by (16) and (9)  www.nature.com/scientificreports www.nature.com/scientificreports/ attenuating to Poisson levels as N → ∞ (Fig. 2). This behavior can be contrasted to when bound TFs are protected from degradation, in which case the mean free TF count becomes invariant of N, and decoys always buffer noise (Figs. 2 and 3). When β > 0, high-affinity decoys can transition from being noise amplifiers to noise buffers as their numbers are increased (Fig. 2). This point is exemplified in Fig. 6 where high-affinity low-abundance decoys and low-affinity high-abundance decoys result in the same average number of freely available TFs, but with much higher noise in the former case. Moreover, a mixture of both decoy types can mitigate each other's effect leading to no noise enhancement or buffer (Fig. 4).
Closed-form formulas for the noise levels derived using the Linear Noise Approximation led to precise conditions for noise enhancement -the number of decoys is not large, and their binding affinity is strong enough such that the fraction of bound decoys is higher than 1/(β +1 ). For example, β = 1 for a stable TF whose turnover is primarily governed by dilution from cell growth, and this condition implies that more than 50% decoy sites must be occupied. The decoy-mediated noise enhancement is higher for large values of β and for more "sticky" or higher affinity decoys (Fig. 3). It is interesting to point out that the peak noise enhancement generally occurs when the number of decoy sites is comparable to the TF counts when both bound and free TFs decay with the same rate. For example, in Fig. 2C there is an average of 200 TF molecules in the absence of decoys and the noise peak is also seen around N ≈ 200. Note that our results are restricted to TF production in stochastic bursts, and it remains to be seen if these results generalize to other noise mechanisms in gene expression, such as, extrinsic noise that arises from intercellular differences in cell size and the abundance of expression machinery 98,[119][120][121][122] .
The speed of fluctuation in free TF counts also shows distinct features in the presence and absence of bound TF degradation. While the fluctuations become faster in the presence of bound TF degradation, as indicated by the left-shift of the autocorrelation function (Fig. 5), we see an opposite result when bound TFs do not degrade. These results have important implications on how noise propagates from the TF to downstream target proteins. For example, when β > 0 decoys can amplify noise in the free TF levels, while at the same time make fluctuations in TF counts relax back faster attenuating downstream noise transmission. As a result of these two opposing forces, the noise in the target protein may not increase even though the noise in the free TF level has increased. This can be seen for β = 5 in Fig. 3 where for N = 100 and k d = 10 the decoy is a noise amplifier, but as seen in Fig. S2, for the exact same parameter values the noise in the target protein has reduced compared to the no-decoy case.
The analytical formulas for the noise give theoretical insights into the role decoys on the noise enhancement. These are derived by linearizing the propensity for the binding event assuming small copy number fluctuations around the statistical mean and then taking fast bind/binding limit. Using numerical exact stochastic simulations, we have shown that the main results are good agreement with the theory. The quantitative match between theory and simulations can be poor where the fluctuations are large. This happens for higher decoy binding affinities and degradation rates for bound TFs. We numerically have found that results are not much sensitive to fast binding/ unbinding limit (Fig. S1). The objective of obtaining more accurate theoretical results is a matter of future work.
Our novel finding of the dual role of decoys as noise amplifiers/buffers encourages more investigation into the regulatory function of decoys in complex gene networks. In our study, we have considered target gene dose-response is linear. In future, we want to investigate the role of decoy sites in gene networks that contain nonlinearity. For example, in genetic/signalling circuits with oscillatory dynamics decoys can tune the oscillation frequency 48 , enhance coherence 47 , and it will be interesting to see if decoys can make biological clocks more robust to molecular noise. An exciting avenue of experimental research is to use decoys as manipulations of phenotypic Figure 6. High-affinity low-abundance decoys, and low-affinity high-abundance decoys result in the same mean free TF counts with contrasting fluctuation dynamics. Stochastic realizations of the number of free TFs for high-affinity low-abundance decoys (red), and low-affinity high-abundance decoys (blue) when β = 1, along with their corresponding probability distributions on the right. Both scenarios yield the same average number of free TF molecules, but high-affinity decoys drive significantly enhanced noise levels. www.nature.com/scientificreports www.nature.com/scientificreports/ heterogeneity. For example, aberrant expression of resistance markers in individual melanoma cells have been shown to be a driver of cancer drug resistance 5 , and decoy-mediated noise buffering can play a therapeutic role in reducing such outlier cells. In the context of the Human Immunodeficiency Virus (HIV), noisy expression of a key viral protein, Tat, controls the cell-fate outcome between active viral replication and latency 16,[123][124][125][126] . Latency is a dormant state of the virus and considered the biggest challenge in eradicating the virus from a patient since latent cells are long-lived and resistant to drug therapy 127 . Recently, small-molecule compounds have been identified that enhance noise in Tat expression for efficient reactivation of latent cells 128 , and here the role of decoys as noise amplifiers may allow for a Tat-specific compound-free approach for latency reversal.