Stochastic fluctuations can reveal the feedback signs of gene regulatory networks at the single-molecule level

Understanding the relationship between spontaneous stochastic fluctuations and the topology of the underlying gene regulatory network is of fundamental importance for the study of single-cell stochastic gene expression. Here by solving the analytical steady-state distribution of the protein copy number in a general kinetic model of stochastic gene expression with nonlinear feedback regulation, we reveal the relationship between stochastic fluctuations and feedback topology at the single-molecule level, which provides novel insights into how and to what extent a feedback loop can enhance or suppress molecular fluctuations. Based on such relationship, we also develop an effective method to extract the topological information of a gene regulatory network from single-cell gene expression data. The theory is demonstrated by numerical simulations and, more importantly, validated quantitatively by single-cell data analysis of a synthetic gene circuit integrated in human kidney cells.

has feedback regulation, the protein copy number n will directly or indirectly affect the switching rates a n and b n of the promoter between the active and inactive forms. Since many genes have complex epigenetic controls including dissociation of repressors, association of activators, or chromatin remodeling, we do not impose any restrictions on the specific functional forms of a n and b n . In 15 , the authors considered the case of linear feedback regulation with a a un n = + and = b b n , where a is the spontaneous contribution and un is the feedback contribution with u measuring the feedback strength. However, recent single-cell experiments on transcription of mammalian cells 22 have suggested that a n and b n are often saturated when  n 1 and thus are highly nonlinear. In the present work, we consider a more general case by allowing arbitrary nonlinearity.
In most applications, the switching rates of the promoter are fast 10,17 and the effective transcription rate of the gene is given by c a s b r a b ( ) /( ) n n n n n = + + . It is critical to note that the information of network topology is implicitly characterized by c n . If the network has a positive-feedback (negative-feedback) loop, then c n is an increasing (decreasing) function of n. If the network has no feedback, c n is independent of n. Let p n denote the steady-state probability of having n protein molecules. Experimentally, the lifetime of the mRNA is usually much shorter compared to that of its protein counterpart 12 . Once an mRNA is synthesized, it can either produce a protein with probability p u u v /( ) = + or be degraded with probability q v u v /( ) = + . Let λ = v d / denote the ratio of the protein and mRNA lifetimes. When  1 λ , the original Markov model can be simplified to a reduced model with geometrically distributed translation bursts 23 and the steady-state distribution of the protein copy number can be calculated analytically (Supplementary Information): when q p  . This shows that the steady-state probability p n decays exponentially with respect to the protein copy number n when n 1  with q being the exponentially decaying rate of the steady-state protein distribution. Here  q p is justified because = p q u v / / is the average number of proteins synthesized per mRNA lifetime, which is relatively large in living cells and typically on the order of 100 for an E. coli gene 24 . To identify q as an experimentally accessible quantity is of basic importance, as will be shown later.
Decomposition of the protein fluctuations. Experimentally, spontaneous stochastic fluctuations, often referred to as noise, in the protein abundance are usually measured by the squared relative standard deviation η σ = 〈 〉 n / 2 2 , where 〈 〉 n is the mean and 2 σ is the variance 25 . With the analytical steady-state protein distribution, it can be shown that the noise η can be decomposed into three different terms or two different terms as f where 〈 〉 n 1/ is the Poisson noise from individual births and deaths of the protein, 〈 〉 d v m / is the noise due to fluctuations in the mRNA abundance, and η = 〈〉〈 〉 n c n c Cov( , )/ f n n is the relative covariance between n and c n , which characterizes the strength of feedback regulation. We stress here that when the promoter switching rates are fast, the above decomposition formula and the expression of η f hold exactly without any approximation, even when the nonlinearity of feedback regulation is very high.
If the network has no feedback, then c n is a constant and 0 f η = . It is well known that the covariance between a random variable and an increasing (decreasing) function of this random variable must be positive (negative). Therefore, if the network has a positive-feedback loop, then c n is an increasing function of n and 0 f η > . Conversely, if the network has a negative-feedback loop, then c n is a decreasing function of n and η < 0 f . As a result, the sign of η f is completely determined by the network topology and we shall name η f as feedback coefficient. The above analysis clearly explains previous experimental observations that positive feedback generally amplifies noise 26 and negative feedback generally reduces noise 27 .
In the previous literature, there are confusing or even contradictory statements about the feedback-noise relationship. Some studies claimed that positive feedback reduces noise 28 , while negative feedback amplifies noise 29 . The reason for these seemingly contradictory results has been analyzed in 15,20 and here we shall use our noise decomposition formula to provide an clearer explanation. For a positive-feedback (negative-feedback) network, η is the total noise and η ± f is the noise amplified (reduced). Therefore, q n 1/ f η η − = 〈 〉 can be thought of as the feedback-free noise. In general, if all other rate constants remain unchanged, then positive (negative) feedback will lead to an increase (decrease) in the protein mean n 〈 〉 10 and thus lead to a decrease (increase) in the feedback-free noise q n 1/ 〈 〉. This decrease (increase) in the feedback-free noise may counteract the positive (negative) contribution of the feedback coefficient η f and give rise to an anomalous decrease (increase) in the total noise η. This explains why some experiments have observed anomalous noise suppression (amplification) in networks with positive (negative) feedback.
However, from the physical perspective, the feedback-free noise and feedback coefficient have completely different origins: the former characterizes fluctuations from individual births and deaths of the protein and mRNA, while the latter reflects the contribution of feedback regulation. Therefore, it seems logically insufficient to study the effect of feedback regulation on the feedback-free noise by fixing the underlying biochemical rate constants.
Scientific RepoRts | 7: 16037 | DOI:10.1038/s41598-017-15464-9 In fact, what positive (negative) feedback actually amplifies (reduces) is the very part of fluctuations that cannot be explained by the feedback-free noise.
Bounds for the protein noise. Negative feedback proves to be most interesting because it is responsible for the stability of a cell 27 . Since negative feedback reduces noise, it is natural to ask to what extent the noise is inevitable and whether the feedback coefficient η f could be strong enough such that the noise η is approaching zero 5,30 . In fact, for the three-stage model, the upper and lower bounds of the noise η are given by q n p dq qn | ′ | > is the steepness of the regulatory function c x ( ) obtained from c n by replacing n with a positive real number x and the term p dq / α is of the order of one for a wide range of biologically relevant parameters (Supplementary Information). These bounds provide the limits on the ability for a negative-feedback loop to suppress protein fluctuations. We stress here that this lower bound is new and is different from the one derived in 30  η η − is the feedback-free noise, f η − is the noise reduced, and η is the total noise. Then the efficiency of the negative-feedback network, as a noise filter, can be defined as γ f . The lower bound in Eq. (4) reveals a general biophysical principle: The efficiency of a negative-feedback network must satisfy . This fact is similar to Carnot's theorem in classical thermodynamics, which claims that the theoretical maximum efficiency of any heat engine must be smaller than 1.
If all other cellular factors are constant, the protein will display a small-number Poisson noise 24 . When α > d, the lower bound in Eq. (4) is smaller than n 1/〈 〉, which shows that η may be even smaller than the Poisson noise in the negative-feedback case (see Fig. 2b). Recent experiments have shown that although the variance of expression levels is larger than the mean for most genes, there are still some genes whose variance is less than the mean 31 . This fact is well explained by our theory. From Eq. (3), if the network has no feedback or a positive-feedback loop, η is always larger than the Poisson noise (see Fig. 2a). In the positive-feedback case, similar upper and lower bounds for the noise η can also be obtained (Supplementary Information), which provide the limits on the ability for a positive-feedback loop to enhance protein fluctuations.
Inference of feedback topology using single-cell data. When a network has nonlinear feedback regulation, the mean and variance are not enough to determine the steady-state protein distribution and the information of higher-order moments will play a crucial role. In fact, Eq. (3) can be rewritten in a more illuminating form as  This equation is of crucial importance because it bridges the feedback topology of a gene circuit and experimentally accessible measurements. In particular, it reveals a quantitative relation between the feedback coefficient f η , whose sign is fully determined by the network topology, and the digital features of the steady-state protein distribution, characterized by the mean n 〈 〉, variance 2 σ , and decaying rate q, which reflects the overall effect of higher-order moments. This provides an effective method to extract the topological information of a gene regulatory network from single-cell gene expression data. From single-cell data, the three digital features, and thus the feedback coefficient f η , can be estimated robustly (Supplementary Information). If f η is significantly larger (smaller) than zero, one has good reasons to believe that there is a positive-feedback (negative-feedback) loop regulating this gene.
In single-cell experiments such as flow cytometry and fluorescence microscopy, one usually obtains data of protein concentrations, instead of protein copy numbers. Let = x n V / be a continuous variable representing the protein concentration, where V is a constant compatible with the macroscopic scale. It is easy to see that the noise n / 2 2 η σ = 〈 〉 will not be affected by the scaling constant V and thus is dimensionless. In terms of the protein concentration, the mean will become 〈 〉 n V / and the decaying rate will become qV (Supplementary Information). Therefore, the product of these two terms is also dimensionless. This indicates that the above method not only applies to single-molecule data of protein copy numbers, but also applies to single-cell data of protein concentrations. The above analysis also suggests a crucial difference between the two decomposition formulas (2) and (3): The former only applies to data of protein copy numbers, while the latter also applies to data of protein concentrations.

Experimental validation
To validate our theory, we apply it to a synthetic gene circuit (orthogonal property of a synthetic network can minimize "extrinsic" noise) stably integrated in human kidney cells, as illustrated in Fig. 3) 32 . In this circuit, a bidirectional promoter is designed to control the expression of two fluorescent proteins: zsGreen and dsRed. The activity of the promoter can be activated in the presence of Doxycycline (Dox). The green fluorescent protein, zsGreen, is fused upstream from the transcriptional repressor LacI. The LacI protein binds to its own gene and inhibits the transcription of its own mRNA, forming a negative-feedback loop. The negative-feedback strength can be tuned by induction of Isopropyl β-D-1-thiogalactopyranoside (IPTG). As the control architecture, the red fluorescent protein, dsRed, is not regulated by IPTG induction, forming a network with no feedback. The steady-state levels of the zsGreen and dsRed fluorescence are measured under a wide range of IPTG concentrations and two Dox concentrations (low and high) by using flow cytometry.
For each fixed IPTG and Dox concentrations, we can estimate the mean n 〈 〉, variance 2 σ , and decaying rate q for the steady-state distribution of the zsGreen or dsRed fluorescence. Then the feedback coefficient f η can be estimated from Eq. (5). In the high Dox case, Fig. 4a,b illustrate the noise η, feedback-free noise η η − f , and feedback coefficient η f of the zsGreen and dsRed proteins under different IPTG concentrations, respectively. For the zsGreen protein, the feedback coefficient f η is negative under all IPTG concentrations. With the increase of the IPTG concentration, the negative-feedback strength becomes increasingly weaker and the feedback coefficient f η tends to zero. In contrast, for the dsRed protein, the feedback coefficient η f fluctuates around zero in a narrow range under different IPTG concentrations. These results are in full agreement with our theory with high accuracy. As a result, our method correctly extracts the topological information of the synthetic gene circuit in both qualitative and quantitative ways. In the low Dox case, the noise η, feedback-free noise η η − f , and feedback coefficient η f of the zsGreen and dsRed proteins are illustrated in Fig. 4c,d, respectively, and similar conclusions can be drawn.
Although it has been observed that negative feedback suppresses molecular fluctuations 32 , it remains difficult to quantify the corresponding effect 5 . Our theory provides a quantitative characterization of such effect. In the high Dox case, the negative-feedback effect is the strongest when the IPTG concentration is zero. In this situation, the feedback-free noise is  One of the potential applications of our theory is to provide a mechanism-driven method to identify the differentially expressed genes (DEGs) of two different cell populations such as tumor and non-tumor tissues. Most of the existing methods searched the DEGs by identifying the difference in the mean levels of the two cell populations under some a priori assumptions on the protein or mRNA distribution such as the negative binomial distribution 31 . However, the effect of noise amplification or suppression caused by feedback loops is not addressed by these methods, which may result in incorrect predictions (Supplementary Information). Our theory indicates that even if the means and variances of the two cell populations are both very close, one is still able to find the DEGs by detecting the difference in feedback topology. If the signs of the estimated feedback coefficients f η of the two cell populations are different, one has good reasons to believe that there is a change in the topological structure of the underlying gene regulatory network when a non-tumor tissue becomes a tumor one.

Discussion and Conclusions
Here we present a comprehensive analysis of the three-stage model of stochastic gene expression with nonlinear feedback regulation. By taking the limit of a large ratio of protein to mRNA lifetimes, we derive the analytical steady-state distribution of the protein copy number. Furthermore, we decompose the protein noise according to different biophysical origins. The resulting decomposition formula reveals a quantitative relation between stochastic fluctuations and feedback topology at the single-molecule level. In particular, we show that the protein noise η can be decomposed into the sum of two parts: the feedback-free noise 〈 〉 q n 1/ and feedback coefficient η f , whose sign is totally determined by the network topology. Both the two parts can be estimated robustly from single-cell gene expression data via three experimentally accessible quantities: the mean 〈 〉 n , variance σ 2 , and decaying rate q. Such relation not only enables us to quantify the effects of noise amplification or suppression caused by feedback loops, but also allows us to extract the topological information of the underlying gene regulatory network from single-cell gene expression data. The feasibility of this approach is validated quantitatively by single-cell data analysis of a synthetic gene circuit integrated in human kidney cells.
We stress that our results depend nothing on the specific functional forms of the effective transcription rate c n except for its monotonicity, which makes our theory highly general. One of the most powerful parts of our theory is that it can be applied to gene regulatory networks with highly nonlinear feedback. In the present paper, all the derivations are based on the assumption of rapid promoter switching, under which the fluctuations due to promoter switching are averaged out. Intuitively, in the regime of slow promoter switching, our noise decomposition formula (3) should be amended as where η > 0 s is the noise due to promoter switching. Because of the contribution of η s , the difference between the total noise η and feedback-free noise q n 1/ 〈 〉 must be positive in positive-feedback networks and may be either positive or negative in negative-feedback networks due to the competition of η < 0 f and η > 0 s . The above analysis is in full agreement with our numerical simulations in Fig. 5. The ignorance of s η in the present paper is the cost for deriving an analytical protein distribution in networks with nonlinear feedback regulation.
In fact, the idea of noise decomposition in terms of different biophysical origins was first proposed by Paulsson in his pioneering work 25 . However, this work was focused on the decomposition of the local noise around the fixed point of the underlying biochemical reaction system, instead of the global noise of the entire probability distribution, by using the fluctuation-dissipation theorem, also called first-order van Kampen's expansion 24 . In networks with no feedback, a decomposition of noise into the feedback-free noise q n 1/ 〈 〉 and promoter switching noise η s can be found in 12,33 . In the present work, we obtain a noise decomposition in networks with feedback regulation, albeit in the regime of fast promoter switching. There are two major advantages of our decomposition formula (3). First, it can be applied to the situation when the nonlinearity of feedback regulation is very high. Second, all the three contributing terms in the decomposition formula can be estimated robustly from single-cell gene expression data.
In the regime of slow promoter switching, it is difficult to give an intrinsic definition of the promoter switching noise η s since the promoter switching rates a n and b n could be both nonlinear functions of the protein copy number n. In fact, an alternative definition of η s has been proposed in 20 with the aid of the macroscopic limit of a piecewise-deterministic Markov process. By assuming linear feedback regulation and ignoring the mRNA kinetics, the authors decomposed the protein noise into the superposition of the protein birth-death noise, promoter switching noise, and correlation noise. Although their correlation noise is similar to our feedback coefficient (see the green curves in Figs. 2 and 3 of Ref. 20 ), their protein birth-death noise is a constant independent of feedback regulation and thus is very different from our feedback-free noise. Finally, we would like to point out that the lower bound of the protein noise in negative-feedback networks was first derived in 5 by using concepts in information theory. However, this work is based on the diffusion approximation, with approximated Gaussian fluctuations, of the underlying discrete Markov model. A lower bound of the protein noise without diffusion approximation was derived recently in 30 . Our lower bound (4) is more explicit than the one obtained in 30 and is tighter in the regime of strong noise suppression.
Although we have shown how single-cell measurements may be used to reveal the feedback sign of a gene regulatory network, it is conceivable that in the near future, further advances in live-cell imaging with single-molecule resolution could allow the theory to be tested at the single-molecule level.

Methods
The numerical simulations in Figs. 2 and 5 are based on the Gillespie algorithm. The single-cell gene expression data of the synthetic gene circuit analyzed during this study are included in the published article 32 .