Abstract
The invention of the Fourier integral in the 19th century laid the foundation for modern spectral analysis methods. This integral decomposes a temporal signal into its frequency components, providing deep insights into its generating process. While this idea has precipitated several scientific and technological advances, its impact has been fairly limited in cell biology, largely due to the difficulties in connecting the underlying noisy intracellular networks to the frequency content of observed single-cell trajectories. Here we develop a spectral theory and computational methodologies tailored specifically to the computation and analysis of frequency spectra of noisy intracellular networks. Specifically, we develop a method to compute the frequency spectrum for general nonlinear networks, and for linear networks we present a decomposition that expresses the frequency spectrum in terms of its sources. Several examples are presented to illustrate how our results provide frequency-based methods for the design and analysis of noisy intracellular networks.
Introduction
Modern microscopy and the advent of a wide array of fluorescent proteins^{1} have afforded scientists the unprecedented ability to monitor the dynamics of living biological cells^{2}. The rapid pace of development in imaging technology coupled with advanced image processing techniques has made it viable to obtain high-resolution time-lapse live-cell data for a multitude of cell-types and biological processes. Recent innovations in microfluidics make it possible to quantitatively measure single-cell dynamics for long periods of time over multiple generations^{3,4,5}. These trends underscore the need for developing theoretical and computational tools that are specifically geared towards quantitatively extracting information about intracellular networks from live single-cell imaging data. One of the main reasons why the development of such tools is mathematically challenging is that the dynamics of single cells is inherently noisy due to randomness in the molecular interactions that constitute intracellular processes, and hence single-cell dynamics must be described with stochastic models that are more difficult to analyse than their deterministic counterparts^{6}. These stochastic models usually represent the reaction dynamics as a continuous-time Markov chain (CTMC) and the existing methods for analysing them have mostly focussed on solving the chemical master equation (CME) that governs the evolution of the probability distribution of the random state^{7}. While these methods have been successfully applied in several significant biological studies^{8,9}, they typically do not account for temporal correlations in time-traces of living cells. Rather, they are designed to connect network models to flow-cytometry data^{10}, where temporal correlations are lost anyway because the measured cells are discarded.
Temporal correlations are a feature of single-cell trajectories that contain valuable information about the underlying network, and in order to access this information we need computational methods that can efficiently deduce the temporal correlation profile from a given stochastic reaction network model.
As is well known in the engineering and physics communities, among many others, frequency-domain analysis is a powerful way to analyse random signals and systematically study temporal correlations. In particular, a signal’s power spectral density (PSD) measures the power content at each frequency, and it is related to the signal’s temporal autocovariance function via the Fourier Transform (see Box 1). The PSD of a single-cell trajectory is intimately related to the underlying network’s architecture and parametrisation within the observed cell^{11}. There exist many studies that have successfully unravelled this relationship and discovered mechanistic principles for specific examples of reaction networks. For example, in ref. ^{12} the role of feedback-induced delay in generating stochastic oscillations is explored and in ref. ^{13} a stochastic amplification mechanism for oscillations is found. Notably, the exact PSD for linear reaction networks was derived in ref. ^{14} and this was used to show how, in gene expression networks, a post-translational modification reaction reduces the noise by serving as a low-pass filter.
Other works in this direction have relied on approximating the CTMC with a stochastic differential equation (SDE) such as the linear noise approximation (LNA)^{15} or the chemical Langevin equation (CLE)^{16}. With these SDE-based approaches the protein PSD for gene-regulatory networks was investigated in refs. ^{17,18,19}, the relationship between input and output PSD for a single-input single-output system was computed in ref. ^{20}, the single-cell PSD for a general biomolecular network in the vicinity of a deterministic Hopf bifurcation was determined in ref. ^{21} and corrections to the LNA-based PSD estimates were systematically derived in ref. ^{22}. Even though SDE approximations make the problem of computing the PSD analytically tractable, their accuracy is severely compromised if any of the species are in low copy-numbers, as is the case for many synthetic networks where low copy-numbers are desired in order to reduce metabolic load on the host cell^{23}. Moreover, even when the species copy-numbers are uniformly large, the accuracy of SDE approximations can only be guaranteed over finite time-intervals^{24}, and hence the PSD, which is estimated at steady-state, could have an error (see the example presented in Fig. 4). It must be noted that for linear networks these approximations yield the exact PSD but if the network has nonlinear propensities then the error in the derived PSD expression can be significant^{25}. In order to address these issues, we need PSD estimation methods that work reliably for CTMC models, especially in the low copy-number regime, without requiring any dynamical approximations. The aim of this paper is to develop such a method.
In a recent paper^{26}, the analytical relationship between the PSDs of the output species and its time-dependent production rate was derived for CTMC models of certain reaction networks including birth-death and simple gene expression. While this analysis enables investigation of the dynamics of the protein creation process from experimentally measured protein time-traces, it does not extend to nonlinear networks, such as gene expression networks with transcriptional feedback, for which some analytical results exist for simplified models^{27}.
A recurring theme in the existing literature is that typically the autocovariance function is well-approximated by the sum of a few exponential functions^{18,20,26}, and consequently the PSD is a rational function of a special form. This low-dimensional feature can be theoretically explained by appealing to the compactness of the resolvent operator^{28} associated with the CTMC, which, as we prove, is connected to the PSD. Exploiting this connection we develop the multipoint Padé approximation^{29} technique for estimating the PSD for a general nonlinear stochastic reaction network. This method, which we refer to as Padé PSD, computes the PSD expression based on certain stationary expectations. We design efficient Monte Carlo estimators to estimate the required expectations by generating a handful of simulations of an augmented CTMC, constructed by adding certain state-components and reactions to the original CTMC. We show how this augmented CTMC construction not only facilitates PSD estimation but also its empirical validation.
Our PSD estimation approach is semi-analytic, in the sense that analytical expressions for the PSD are found by first estimating certain quantities with simulation. Such approaches have become increasingly popular in recent years, as they provide viable solutions to nonlinear problems which are otherwise analytically intractable^{30}. Analytical expressions for the PSD are known in the special case of linear reaction networks^{14}, where all reaction propensity functions are affine functions of the state variables. We show how this expression can be alternatively derived via the resolvent connection and we also generalise this result to allow for arbitrary time-varying inputs. This generalisation yields a PSD decomposition result that is similar to what was found in previous SDE-based studies^{20} and it extends the recent results in ref. ^{26}.
Given a stochastic reaction network model, the single-cell PSD is commonly estimated with nonparametric methods by first simulating a trajectory, and then sampling it at finitely many time-points to obtain a discrete time-series whose PSD can be straightforwardly computed with the Discrete Fourier Transform (DFT)^{31}. One can either apply the DFT directly to the time-series to estimate the PSD, or first estimate the autocovariance function and then compute its DFT (see Box 1 for more details). While the latter approach is computationally very expensive due to the autocovariance function computation, the former approach yields an inconsistent estimator for the PSD, which means that the estimator variance does not vanish, even as the time-series length tends to infinity. To mitigate this inconsistency issue, PSDs from several independent trajectories are averaged, at the cost of significant computational burden as trajectory simulations are time-consuming. More importantly, the averaged PSD may still not be accurate because it is based on discrete sampling of continuous signals, which can cause the problem of aliasing that distorts the estimated PSD by introducing frequency components corresponding to the sampling operation (see Chapter 1 in ref. ^{32}). According to the Nyquist Sampling Theorem^{33}, we can mitigate this aliasing effect by choosing a time-step that is smaller than half of the reciprocal of the maximum frequency represented in the signal. However, for stochastic dynamics this criterion is unusable as the range of frequencies in the signal is very wide and picking a very small time-step can lead to computational intractability. These issues motivated us to devise Padé PSD, which is not based on discrete sampling and provides a parametric approach for estimating the PSD that, rather than relying on only the output signal, uses the full information contained in the stochastic model of the dynamics.
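The inconsistency of the raw DFT-based estimator can be seen in a small numerical sketch. The example below is only an illustration under stated assumptions: it uses Gaussian white noise in place of a simulated single-cell trajectory, and the helper names `periodogram`, `averaged_periodogram` and `relative_spread` are hypothetical and not part of the method developed in this paper. The normalisation (dt / n)·|DFT|² is one common convention for a PSD defined as the Fourier transform of the autocovariance function.

```python
import numpy as np

def periodogram(x, dt):
    """Raw periodogram of a uniformly sampled, mean-subtracted time-series.

    Returns angular frequencies and (dt / n) * |DFT|^2, a discrete analogue
    of the Fourier transform of the autocovariance function.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    dft = np.fft.rfft(x - x.mean())
    omegas = 2.0 * np.pi * np.fft.rfftfreq(n, d=dt)
    return omegas, (dt / n) * np.abs(dft) ** 2

def averaged_periodogram(trajectories, dt):
    """Average the raw periodograms of several independent trajectories."""
    omegas = periodogram(trajectories[0], dt)[0]
    psds = [periodogram(x, dt)[1] for x in trajectories]
    return omegas, np.mean(psds, axis=0)

def relative_spread(psd):
    """Std/mean over interior frequency bins: a crude measure of estimator noise."""
    interior = psd[1:-1]
    return interior.std() / interior.mean()

# Illustration on unit-variance white noise, whose PSD is flat and equal to dt
# under this normalisation (an assumption of this toy example).
rng = np.random.default_rng(0)
dt, n = 0.5, 4096
single = periodogram(rng.standard_normal(n), dt)[1]
_, averaged = averaged_periodogram(rng.standard_normal((200, n)), dt)
print(relative_spread(single), relative_spread(averaged))
```

In this toy setting the spread of a single periodogram stays of the same order as its mean however long the series is, whereas averaging over independent trajectories reduces it, at the price of simulating many trajectories.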
We illustrate our results with applications of relevance to both systems and synthetic biology. Using our PSD decomposition result for linear networks, we demonstrate how PSDs enable differentiation between two fundamental types of adapting circuit topologies, viz. Incoherent Feedforward (IFF) and Negative Feedback (NFB)^{34}, in the presence of dynamical intrinsic noise. We also present an example where the phenomenon of single-cell entrainment is examined in the stochastic setting using our PSD decomposition result. Employing Padé PSD we illustrate how the performance of certain synthetic circuits, with noisy dynamics, can be optimised. Specifically, we examine the problem of optimising the oscillation strength of a well-known synthetic oscillator (called the repressilator^{35}) and the problem of reducing single-cell oscillations which can arise when an intracellular network is controlled with the antithetic integral feedback (AIF) controller^{36} that has the important property of ensuring robust perfect adaptation despite randomness in the dynamics and other environmental uncertainties. Lastly, we present examples to highlight how our Padé PSD method helps in the study of oscillations caused by cell-division cycles and facilitates parameter inference from experimentally measured single-cell trajectories, by providing clean and accurate estimates of the PSD. Interestingly, inferring a parameter with the PSD does not require explicit knowledge of the proportionality constant that relates the measured signal to the copy-number of the output species^{37}.
Results
The stochastic model
We first describe the CTMC model for a reaction network and define the resolvent operator associated with it. We then connect this operator to the PSD. This connection shall be exploited later to develop our analytical and computational results.
Consider a reaction network with d species, called X_{1}, …, X_{d}, and K reactions. In the classical stochastic reaction network model, the dynamics is described as a continuous-time Markov chain (CTMC)^{7} whose states represent the copy-numbers of the d network species. If the state is x = (x_{1}, …, x_{d}) and reaction k fires, then the state is displaced by the integer stoichiometric vector ζ_{k}. The rate of firing for reaction k at state x is governed by the propensity function λ_{k}(x). Under the mass-action hypothesis^{7}
\[\lambda_{k}(x)=\theta_{k}\prod_{j=1}^{d}\frac{x_{j}(x_{j}-1)\cdots (x_{j}-\nu_{jk}+1)}{\nu_{jk}!},\]
where θ_{k} is the rate constant and ν_{jk} is the number of molecules of X_{j} consumed by the kth reaction. Formally, the CTMC (X(t))_{t ≥ 0} representing the reaction kinetics can be defined by its generator \({\mathbb{A}}\), which is an operator that specifies the rate of change of the probability distribution of the process (see Chapter 4 in ref. ^{38}). It is defined by
\[{\mathbb{A}}f(x)=\sum_{k=1}^{K}\lambda_{k}(x)\left(f(x+\zeta_{k})-f(x)\right) \tag{3}\]
for any real-valued bounded function f on the state-space which consists of all accessible states in the d-dimensional non-negative integer lattice.
For each state x, let p(t, x) be the probability that the CTMC (X(t))_{t ≥ 0} is in state x at time t. Then these probabilities evolve according to a system of ordinary differential equations, called the chemical master equation (CME)^{7}, which typically cannot be solved analytically. Hence its solutions are often estimated with Monte Carlo simulations of the CTMC, using methods such as Gillespie’s stochastic simulation algorithm (SSA)^{39}. If the CME has a unique, globally attracting fixed point π then the CTMC is called ergodic with π as the stationary distribution. If the convergence of p(t) to π is exponentially fast in t, then the CTMC is called exponentially ergodic. We shall work under the assumption of exponential ergodicity, which is computationally verifiable using techniques in ref. ^{40} and in ref. ^{41}, wherein it is also demonstrated that this assumption is satisfied by networks typically encountered in systems and synthetic biology. It is important to note that for an ergodic network, all stochastic trajectories, despite being different, have the same PSD.
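As a rough illustration of how trajectories of such a CTMC are generated, the following is a minimal sketch of Gillespie's direct method. The function names `ssa` and `time_average`, and the birth-death example with rates 10 and 1, are assumptions made for this sketch only; they are not taken from the implementation used in this paper.

```python
import numpy as np

def ssa(x0, stoich, propensities, t_final, rng):
    """Simulate one CTMC trajectory on [0, t_final] with Gillespie's direct method.

    stoich is a d x K array whose k-th column is the stoichiometric vector,
    propensities(x) returns the K reaction rates at state x.
    Returns the jump times (starting with 0) and the states held from each time.
    """
    x = np.array(x0, dtype=np.int64)
    t = 0.0
    times, states = [t], [x.copy()]
    while True:
        rates = np.asarray(propensities(x), dtype=float)
        total = rates.sum()
        if total <= 0.0:                      # absorbing state: nothing can fire
            break
        t += rng.exponential(1.0 / total)     # waiting time to the next reaction
        if t >= t_final:
            break
        k = rng.choice(rates.size, p=rates / total)   # which reaction fires
        x = x + stoich[:, k]
        times.append(t)
        states.append(x.copy())
    return np.array(times), np.array(states)

def time_average(times, states, t_final):
    """Time-average of the piecewise-constant trajectory over [0, t_final]."""
    holding = np.diff(np.append(times, t_final))
    return (holding[:, None] * states).sum(axis=0) / t_final

# Hypothetical birth-death example: 0 -> X at rate 10, X -> 0 at rate 1 * x.
stoich = np.array([[1, -1]])
birth_death = lambda x: [10.0, 1.0 * x[0]]
times, states = ssa([10], stoich, birth_death, 500.0, np.random.default_rng(1))
print(time_average(times, states, 500.0))
```

For an ergodic network the time-average along a single long trajectory approaches the stationary mean, which for this birth-death example is the ratio of the birth and death rate constants.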
Even though we primarily work with the CTMC model with generator (3), the PSD estimation method that we develop in this paper can also be applied to a more general CTMC model whose generator is given by
\[{\mathbb{A}}f(x)=\sum_{k=1}^{K}\lambda_{k}(x)\sum_{\zeta }\left(f(x+\zeta )-f(x)\right)\mu_{k}(x,\zeta ) \tag{4}\]
where μ_{k}(x, ⋅ ) is a state-dependent probability distribution that governs the displacement upon firing of reaction k, i.e. if the state is x and reaction k fires, the process would jump to (x + ζ), where ζ is randomly drawn from the probability distribution μ_{k}(x, ⋅ ). Notice that by setting μ_{k}(x, ⋅ ) to be the probability distribution that puts all the mass at the fixed vector ζ_{k}, irrespective of the state x, we recover the standard CTMC model with generator (3). The generality introduced by allowing the displacement to be random and state-dependent is useful in capturing cell-wide mechanisms, like cell-division, that impact the whole molecular population within a cell (see the example presented in Fig. 7).
The resolvent operator and its connection to the PSD
Let (X(t))_{t ≥ 0} be a CTMC with generator \({\mathbb{A}}\). For such a Markov process, we define the transition semigroup \({\mathbb{T}}(t)\) as the operator which maps any real-valued function g on the state-space, to the function specified by the conditional expectation
\[{\mathbb{T}}(t)g(x)={\mathbb{E}}\left(g(X(t))\mid X(0)=x\right). \tag{5}\]
We now define the resolvent operator which plays a central role in the development of our method for PSD estimation. For any complex number s, the resolvent operator maps the function g to the Laplace transform of the map \(t\mapsto {\mathbb{T}}(t)g\)
\[{\mathbb{R}}(s)g(x)=\int_{0}^{\infty }{e}^{-st}\,{\mathbb{T}}(t)g(x)\,dt. \tag{6}\]
It can be shown that the map \(s\mapsto {\mathbb{R}}(s)g(x)\) is complex-analytic.
Assuming that the observed single-cell trajectory \({({X}_{n}(t))}_{t\ge 0}\) is the copy-number dynamics of the output species X_{n}, we now establish a relation between the PSD \({S}_{{X}_{n}}(\omega )\) (see Box 1) and the resolvent operator. Let \({{\mathbb{E}}}_{\pi }({X}_{n})\) denote the stationary expectation of the copy-number of species X_{n} and let f be the function
\[f(x)={x}_{n}-{{\mathbb{E}}}_{\pi }({X}_{n}). \tag{7}\]
Defining
\[G(s)={{\mathbb{E}}}_{\pi }\left(f(x)\,{\mathbb{R}}(s)f(x)\right), \tag{8}\]
the PSD \({S}_{{X}_{n}}(\omega )\) is given by
\[{S}_{{X}_{n}}(\omega )=2\,{\rm{Re}}\left(G(i\omega )\right), \tag{9}\]
where \(i=\sqrt{-1}\). This relation is proved in Section S2.2 of the Supplement. In this result we view the function \(x\mapsto f(x){\mathbb{R}}(s)f(x)\) as a random variable on the probability space whose sample-space is the state-space of the CTMC and the probability distribution is given by the stationary distribution π. The expectation of this random variable is denoted by G(s) and in the PSD estimation method we develop, we first estimate G(s) and then obtain the PSD using (9).
The eigendecomposition of the resolvent operator allows us to express G(s) as an infinite sum
\[G(s)=\sum_{j=1}^{\infty }\frac{{\alpha }_{j}}{s-{\sigma }_{j}}, \tag{10}\]
where σ_{1}, σ_{2}, … are the non-zero eigenvalues of \({\mathbb{A}}\), assumed to be distinct and arranged in descending order of their real parts (which are negative due to ergodicity). Each coefficient α_{j} captures the power in the signal corresponding to eigenmode σ_{j}, and their sum is equal to the total signal power which is also the stationary variance Var_{π}(X_{n}) of the output species copy-number
\[\sum_{j=1}^{\infty }{\alpha }_{j}={{\rm{Var}}}_{\pi }({X}_{n}).\]
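Relation (10) is equivalent to the following representation of the autocovariance function at stationarity
\[{\rm{Cov}}\left({X}_{n}(t+\tau ),{X}_{n}(t)\right)=\sum_{j=1}^{\infty }{\alpha }_{j}{e}^{{\sigma }_{j}\tau }\quad {\rm{for}}\ \tau \ge 0. \tag{11}\]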
In the case of linear networks, G(s) can be exactly computed and (9) yields an analytical expression for the PSD which is already known in the literature^{14}. However, for such networks stimulated by external inputs it is not known how the output PSD is related to the PSDs of the input signals. We derive this relation by exploiting the resolvent connection and this yields a practically useful PSD decomposition result (see Theorem 2.1). For general nonlinear networks, we apply the theory of Padé approximations to find an accurate rational function representation of G(s) which is then used to estimate the PSD (9).
A PSD decomposition result for linear networks
In this section, we present a PSD decomposition result for linear networks with generator (3), that extends a similar result recently reported in ref. ^{26}. A reaction network is called linear if all its propensity functions are affine functions of the state variables. Under mass-action kinetics, linear networks are necessarily unimolecular, i.e. all reactions have at most one reactant and are of the form \({{\emptyset}}\longrightarrow \star\) or X_{j} ⟶ ⋆, where ⋆ represents any linear combination of species. Assuming d species and K reactions, for linear networks we can express the vector of propensity functions λ(x) = (λ_{1}(x), …, λ_{K}(x)) as an affine map on the state-space
\[\lambda (x)={{\Lambda }}x+\tilde{b},\]
where Λ is some K × d matrix and \(\tilde{b}\) is a K × 1 vector. Let S be the d × K matrix whose columns are the stoichiometric vectors ζ_{1}, …, ζ_{K} for the reactions. We define
\[A=S{{\Lambda }}\quad {\rm{and}}\quad b=S\tilde{b},\]
and under the assumption of ergodicity, the d × d matrix A is Hurwitz-stable, i.e. all its eigenvalues have strictly negative real parts. It can be easily shown (e.g. ref. ^{40}) that the dynamics of the expected state \(x(t)={\mathbb{E}}(X(t))\) is given by
\[\frac{dx}{dt}=Ax+b,\]
and as t → ∞, x(t) converges to \(\bar{x}\) which is the state expectation under the stationary distribution π
\[\bar{x}=-{A}^{-1}b.\]
Moreover, the stationary covariance matrix Σ for the state can be computed by solving the following Lyapunov equation
\[A{{\Sigma }}+{{\Sigma }}{A}^{T}+D{D}^{T}=0,\]
where D is the positive semidefinite matrix satisfying \(D{D}^{T}=S\,{\rm{diag}}({{\Lambda }}\bar{x}+\tilde{b})\,{S}^{T}\). In this setting, we can show that the resolvent operator maps the class of affine functions to itself, and this allows us to apply formula (9) to prove (see the Supplement, Section S2.3) that the PSD is given by
where I is the d × d identity matrix and e_{n} denotes its nth column. This expression is equivalent to the PSD formula for linear networks proved in ref. ^{14} using Gardiner’s regression theorem^{42} and it can also be derived using the LNA approximation.
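The linear-network computation described above can be sketched numerically. The code below is a hedged illustration, not the paper's implementation: it solves the Lyapunov equation with a Kronecker-product linear solve and evaluates the PSD in the standard resolvent form 2 Re(e_n^T (iωI − A)^{-1} Σ e_n), which follows from (9) when the stationary autocovariance is e_n^T e^{At} Σ e_n; this form is assumed to be equivalent to, but is not claimed to be written identically to, the displayed expression. The function names and the two-species gene expression example (with made-up rate constants) are assumptions of this sketch, and the output index n is zero-based in the code.

```python
import numpy as np

def stationary_moments(S, Lam, b_tilde):
    """Stationary mean and covariance of a linear network with propensities Lam x + b_tilde."""
    A = S @ Lam
    b = S @ b_tilde
    x_bar = -np.linalg.solve(A, b)                       # stationary mean
    DDt = S @ np.diag(Lam @ x_bar + b_tilde) @ S.T       # diffusion matrix D D^T
    d = A.shape[0]
    eye = np.eye(d)
    # vec(A Sigma + Sigma A^T) = (I kron A + A kron I) vec(Sigma), column-major vec
    lyap = np.kron(eye, A) + np.kron(A, eye)
    Sigma = np.linalg.solve(lyap, -DDt.reshape(-1, order="F")).reshape(d, d, order="F")
    return A, x_bar, Sigma, DDt

def linear_psd(A, Sigma, n, omegas):
    """PSD of species n (zero-based) as 2 Re e_n^T (i w I - A)^{-1} Sigma e_n."""
    d = A.shape[0]
    e_n = np.zeros(d)
    e_n[n] = 1.0
    values = []
    for w in omegas:
        G = e_n @ np.linalg.solve(1j * w * np.eye(d) - A, Sigma @ e_n)
        values.append(2.0 * G.real)
    return np.array(values)

# Hypothetical gene expression network: mRNA M and protein P with reactions
# 0 -> M (rate 2), M -> 0 (rate 1 * m), M -> M + P (rate 5 * m), P -> 0 (rate 0.5 * p).
S = np.array([[1.0, -1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, -1.0]])
Lam = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0], [0.0, 0.5]])
b_tilde = np.array([2.0, 0.0, 0.0, 0.0])
A, x_bar, Sigma, DDt = stationary_moments(S, Lam, b_tilde)
omegas = np.linspace(0.0, 5.0, 6)
print(x_bar, Sigma[1, 1], linear_psd(A, Sigma, 1, omegas))
```

For larger networks a dedicated Lyapunov solver would be preferable to the Kronecker-product system, whose size grows as d² × d².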
Now consider the situation where such a linear network is being driven by external signals. These signals could be generated by different sources, e.g. upstream interconnected networks, environmental stimuli, or by engineered inputs introduced to probe the dynamics (see Fig. 1). A fundamentally important question is to understand how the internal noise and each of these inputs (deterministic or stochastic) conspire to make up the full power spectrum of an output of interest. Indeed it would be of considerable conceptual and practical significance to be able to decompose the output power spectrum in a way that allows the quantification of the specific contributions to the spectrum of the internal noise and of each of the external inputs. Although approximate decompositions of this sort have been reported in specific example networks modelled by CLEs^{19,20}, to the best of our knowledge no spectral decomposition results exist for general biochemical networks modelled by CLE, nor for those modelled by discrete stochastic CTMC models.
We consider m independent time-varying signals \({({Y}_{1}(t))}_{t\ge 0},\ldots ,{({Y}_{m}(t))}_{t\ge 0}\). We assume that these signals stimulate the network through m zeroth-order reactions of the form
\[{{\emptyset}}\ \mathop{\longrightarrow }\limits^{{\theta }_{k}{Y}_{k}(t)}\ \sum_{i=1}^{d}{c}_{ik}{X}_{i} \tag{14}\]
for k = 1, …, m. Each reaction follows mass-action kinetics and for reaction k, θ_{k} is a positive constant and c_{k} = (c_{1k}, …, c_{dk}) is the vector representing the number of molecules of each species X_{1}, …, X_{d} created by this reaction. We shall assume that the process (Y(t))_{t ≥ 0}, which includes all the stimulating signals, is an exponentially ergodic Markov process with stationary expectation \(\bar{y}=({\bar{y}}_{1},\ldots ,{\bar{y}}_{m})\). Let \(\bar{{{\Sigma }}}\) be the stationary variance-covariance matrix for the process (X(t))_{t ≥ 0} when each stimulating signal is deterministic and fixed to its stationary mean at all times, i.e. \(Y(t)=\bar{y}\) for all t ≥ 0. We now present our main result for linear networks which provides an analytic relationship between the PSD \({S}_{{X}_{n}}(\omega )\) of our output species X_{n} and the PSDs \({S}_{{Y}_{j}}(\omega )\) for j = 1, …, m.
Theorem 2.1
(PSD Decomposition) Consider a linear reaction network comprising species X_{1}, …, X_{d}, stimulated by independent time-varying signals \({({Y}_{1}(t))}_{t\ge 0},\ldots ,{({Y}_{m}(t))}_{t\ge 0}\), through zeroth-order reactions of the form (14). We assume that each Y_{j} is an exponentially ergodic Markov process with PSD \({S}_{{Y}_{j}}(\omega )\). The PSD of the output species X_{n} is given by
The proof of this result is provided in Section S2.3 in the Supplement and it shows that the output spectrum is the sum of the intrinsic contribution and the external contributions from all stimulating signals. The external contribution due to signal Y_{j} is modulated by the frequency-dependent gain \({\theta }_{j}^{2}\,| {e}_{n}^{T}{(A+i\omega {\bf{I}})}^{-1}{c}_{j}{| }^{2}\).
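The frequency-dependent gain can be evaluated directly from A, c_j and θ_j. The sketch below is an illustration under stated assumptions: the helper names `input_gain` and `combine_contributions` are hypothetical, the intrinsic contribution is passed in as a precomputed array rather than derived here, and the one-species example with degradation rate 2 is made up. It only encodes the additive structure stated above, namely intrinsic contribution plus gain-weighted input PSDs.

```python
import numpy as np

def input_gain(A, c_j, theta_j, n, omegas):
    """Gain theta_j^2 |e_n^T (A + i w I)^{-1} c_j|^2 for output species n (zero-based)."""
    d = A.shape[0]
    e_n = np.zeros(d)
    e_n[n] = 1.0
    gains = []
    for w in omegas:
        response = e_n @ np.linalg.solve(A + 1j * w * np.eye(d), c_j)
        gains.append(theta_j ** 2 * abs(response) ** 2)
    return np.array(gains)

def combine_contributions(intrinsic_psd, gains, input_psds):
    """Sum the intrinsic contribution and the gain-weighted PSDs of all inputs."""
    total = np.array(intrinsic_psd, dtype=float)
    for gain, psd in zip(gains, input_psds):
        total = total + gain * psd
    return total

# Hypothetical example: one species degraded at rate 2 and produced at rate 3 * Y(t).
A = np.array([[-2.0]])
c = np.array([1.0])
omegas = np.array([0.0, 2.0, 20.0])
gain = input_gain(A, c, 3.0, 0, omegas)
print(gain)
```

In this one-species example the gain equals θ²/(γ² + ω²) with γ = 2, so the downstream species acts as a low-pass filter on the input spectrum, and the gain halves at ω = γ.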
Padé PSD
In this section we develop our method, called Padé PSD, for estimating the PSD for a general nonlinear network with generator (4). For this, we apply Padé approximation theory which is known to be immensely useful in computing accurate rational function approximations for analytic functions. Recall representation (11) of the autocovariance function which is equivalent to representation (10) for the function G(s). Previous studies have established that usually the autocovariance function is well-approximated by only the first few terms in this infinite series. This fact can be justified by appealing to the compactness of the resolvent operator which ensures that it is close to a finite-rank operator (see Section S2.1 in the Supplement). If we only keep the first p terms in the infinite sum (10), then we obtain a rational function of the form
\[{G}_{p}(s)=\frac{{\kappa }_{0}+{\kappa }_{1}s+\cdots +{\kappa }_{p-1}{s}^{p-1}}{{\beta }_{0}+{\beta }_{1}s+\cdots +{\beta }_{p-1}{s}^{p-1}+{s}^{p}}, \tag{15}\]
where the degree of the numerator polynomial is (p − 1) while the degree of the denominator polynomial is p. Based on this rational Ansatz, we shall employ the method of multipoint Padé approximation for identifying the 2p coefficients (viz. κ_{0}, …, κ_{p−1}, β_{0}, …, β_{p−1}) such that G_{p}(s) serves as an accurate approximant for the function G(s) given by (8), which then provides the PSD due to (9). The theory of multipoint Padé approximations^{29} (also called Newton-Padé approximations^{43}) is quite rich and many works have analysed their accuracy and convergence properties (see Chapter 3 in ref. ^{44}). In such an approximation, the rational Padé approximant is constructed by matching its power series expansions at several arbitrarily chosen points s_{1}, …, s_{L}, up to a certain number of terms ρ_{1}, …, ρ_{L}, to the corresponding power series expansions of the function being approximated (i.e. G(s) in our case). In our application we allow each s_{ℓ} to belong to the extended positive real line (0, ∞] (i.e. ∞ is included). The power series expansion of G(s) at s = s_{ℓ} can be written as
We show in Section S2.4.1 of the Supplement that each \({a}_{m}^{(\ell )}\) can be identified as the mth Padé derivative at s = s_{ℓ} defined by
where f is the output function (7), \({\mathbb{T}}(t)\) denotes the transition semigroup operator (5) with generator \({\mathbb{A}}\), and \({{\mathbb{A}}}^{m}\) denotes the mth iterate of \({\mathbb{A}}\) with \({{\mathbb{A}}}^{0}={\bf{I}}\) (the identity operator).
Suppose for now that these Padé derivatives have been estimated. Then it can be shown (see Section S2.4.2 in the Supplement) that for the Padé approximant G_{p}(s) to have a power series expansion at s = s_{ℓ} that agrees with the first ρ_{ℓ} terms in (16), the 2p-dimensional vector of unknown coefficients x = (κ_{0}, …, κ_{p−1}, β_{0}, …, β_{p−1}) must satisfy the linear system
\[{A}^{(\ell )}x={b}^{(\ell )}, \tag{18}\]
where A^{(ℓ)} is a ρ_{ℓ} × 2p matrix and b^{(ℓ)} is a ρ_{ℓ}-dimensional vector whose components in the case s_{ℓ} < ∞ are given by
In the case s_{ℓ} = ∞ these components become
Aggregating these linear systems (18) for all ℓ = 1, …, L we arrive at the cumulative linear system
\[Ax=b, \tag{21}\]
where A and b are obtained by vertically stacking the matrices A^{(ℓ)} and the vectors b^{(ℓ)}. Note that the dimensions of A and b are ρ_{sum} × 2p and ρ_{sum} × 1, respectively, with \({\rho }_{{\rm{sum}}}=\sum_{\ell = 1}^{L}{\rho }_{\ell }\). Hence this linear system can be underdetermined if ρ_{sum} < 2p or overdetermined if ρ_{sum} > 2p. To handle both these possibilities in a unified way, we solve the linear system Ax = b in the sense of least-squares, by minimising the residual norm \(\parallel Ax-b{\parallel }_{2}^{2}\). This provides us with the vector of unknown coefficients x to construct the rational Padé approximant G_{p}(s).
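The least-squares step can be illustrated in the simplest special case where only the zeroth-order term is matched at each finite point (ρ_ℓ = 1), so that the condition reduces to G_p(s_ℓ) = G(s_ℓ). Multiplying through by the monic denominator gives one linear equation per point in the unknowns (κ, β). This is a hedged sketch of that special case only; it does not reproduce the general matrices A^{(ℓ)} and b^{(ℓ)} with higher-order Padé derivatives, and the function names and test function are assumptions of the example.

```python
import numpy as np

def fit_rational(points, values, p):
    """Least-squares fit of G_p(s) = (k_0 + ... + k_{p-1} s^{p-1}) / (b_0 + ... + b_{p-1} s^{p-1} + s^p).

    Each point contributes the linear equation
        sum_j k_j s^j - G(s) sum_j b_j s^j = G(s) s^p.
    """
    s = np.asarray(points, dtype=float)
    g = np.asarray(values, dtype=float)
    powers = s[:, None] ** np.arange(p)                # columns 1, s, ..., s^(p-1)
    system = np.hstack([powers, -g[:, None] * powers])
    rhs = g * s ** p
    x, *_ = np.linalg.lstsq(system, rhs, rcond=None)   # handles under/over-determined cases
    return x[:p], x[p:]

def psd_from_rational(kappa, beta, omegas):
    """Evaluate 2 Re G_p(i w) for the fitted coefficients."""
    s = 1j * np.asarray(omegas, dtype=float)
    numerator = np.polyval(kappa[::-1], s)
    denominator = np.polyval(np.append(beta, 1.0)[::-1], s)
    return 2.0 * (numerator / denominator).real

# Toy target with two exponential modes: G(s) = 3/(s+1) + 1/(s+4) = (13 + 4s)/(4 + 5s + s^2).
points = np.array([0.5, 1.0, 2.0, 3.0, 5.0, 8.0])
values = 3.0 / (points + 1.0) + 1.0 / (points + 4.0)
kappa, beta = fit_rational(points, values, p=2)
print(kappa, beta, psd_from_rational(kappa, beta, [0.0, 1.0]))
```

With exact values the fit recovers the coefficients of the target; with Monte Carlo estimates in place of `values`, using more matching conditions than unknowns lets the least-squares solution average out part of the statistical error.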
Consider the scenario of Theorem 2.1 where the output trajectory comes from a downstream network that is driven by a stochastic external signal that emanates from an upstream network. The denominator B(s) of the function G(s) that characterises the PSD of the external signal can be viewed as the product of the factors (s − σ_{j}) over the significant eigenvalues σ_{j} of the generator of the upstream network (see (10)), and one can show that these are also eigenvalues for the generator of the full network that includes both the upstream and the downstream networks (see Remark S2.2 in the Supplement). Hence we can reasonably expect B(s) to appear as a factor in the denominator for the function G(s) that characterises the PSD of the output signal and this factor can be independently estimated from the upstream network. This suggests a more general rational Ansatz than (15), which is of the form
where B(s) = B_{0} + B_{1}s + ⋯ + B_{q−1}s^{q−1} + s^{q} is some known polynomial with degree q ≤ p. In this case, the linear system for the unknown coefficients x = (κ_{0}, …, κ_{p−1}, β_{0}, …, β_{p−q−1}) changes from (21) to
where A and b are the same as before, I_{p} is the p × p identity matrix, \(\hat{B}=({B}_{0},\ldots ,{B}_{q-1})\) is the q-dimensional vector of coefficients of B(s) and C is the p × (p − q) convolution matrix whose entries are given by
For our approach to work, the main challenge is to develop a method for reliable estimation of the Padé derivatives from a handful of trajectory simulations. We describe such a method in the next section and in the subsequent sections we discuss how the resulting Padé approximant can be validated and also provide more details on the computational implementation of our Padé PSD method.
Estimation of the Padé derivatives
We first consider the case s_{ℓ} < ∞. Appealing to the ergodicity of the CTMC we can express the Padé derivative \({D}_{m}^{({s}_{\ell })}\) as
where \({\tau }_{{s}_{\ell }}^{(m)}\) is an independent random variable with Erlang distribution with shape parameter (m + 1) and rate parameter s_{ℓ}. In other words, the probability density function of \({\tau }_{{s}_{\ell }}^{(m)}\) is given by
\[p(t)=\frac{{s}_{\ell }^{m+1}{t}^{m}{e}^{-{s}_{\ell }t}}{m!}\quad {\rm{for}}\ t\ge 0,\]
and we can view \({\tau }_{{s}_{\ell }}^{(m)}\) as the sum of (m + 1) independent and identically distributed exponential random variables with rate parameter s_{ℓ}. Noting that X_{n}(T) and \({X}_{n}(T-{\tau }_{{s}_{\ell }}^{(m)})\) shall have the same mean and variance at stationarity we can rewrite (24) as
where Var_{π}(X_{n}) is the stationary variance of the output species copy-number and \({\delta }_{m}^{({s}_{\ell })}\) is the steady-state expectation of the squared change in the output state in a time-period of length \({\tau }_{{s}_{\ell }}^{(m)}\), i.e.
\[{\delta }_{m}^{({s}_{\ell })}=\mathop{\lim }\limits_{T\to \infty }{\mathbb{E}}\left[{\left({X}_{n}(T)-{X}_{n}(T-{\tau }_{{s}_{\ell }}^{(m)})\right)}^{2}\right].\]
We now discuss how we can simultaneously estimate the steady-state expectation (25) for each m = 0, 1, …, (ρ_{ℓ} − 1). For this, we augment the CTMC state with ρ_{ℓ} additional state components, denoted by \({Y}_{1}(t),\ldots ,{Y}_{{\rho }_{\ell }}(t)\), and an extra reaction, called \({{\mathcal{R}}}_{{s}_{\ell }}\), that fires at the constant rate of s_{ℓ}. If this reaction fires at time t, then we reset these additional state components as
\[{Y}_{1}(t)={X}_{n}(t-)\quad {\rm{and}}\quad {Y}_{j}(t)={Y}_{j-1}(t-)\quad {\rm{for}}\ j=2,\ldots ,{\rho }_{\ell },\]
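where X_{n}(t−) is the copy-number of the output species X_{n}, just before the reaction firing time. Similarly for j ≥ 2, Y_{j}(t) assumes the value of the previous state component before the jump time, which is Y_{j−1}(t − ). Letting \({\tau }_{{s}_{\ell }}^{(m)}\) be the Erlang-distributed random variable mentioned above, for any T ≫ 1 the random variables
\[{Y}_{m+1}(T)\quad {\rm{and}}\quad {X}_{n}(T-{\tau }_{{s}_{\ell }}^{(m)})\]
have the same distribution jointly with X_{n}(T), and we can express \({\delta }_{m}^{({s}_{\ell })}\) as
\[{\delta }_{m}^{({s}_{\ell })}=\mathop{\lim }\limits_{T\to \infty }{\mathbb{E}}\left[{\left({X}_{n}(T)-{Y}_{m+1}(T)\right)}^{2}\right].\]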
Suppose we have Q simulated trajectories of the augmented CTMC denoted by \({({X}^{(q)}(t),{Y}^{(q)}(t))}_{t\ge 0}\) for q = 1, …, Q. Then we can simultaneously estimate each \({\delta }_{m}^{({s}_{\ell })}\) with the Monte Carlo (MC) estimator
where T_{c} ≪ T_{f} is the cut-off time at which stationarity is assumed to be reached and the initial part of each trajectory in the time-interval [0, T_{c}] is discarded. Observe that if T_{f} is large enough then even a single trajectory (i.e. Q = 1) is sufficient for this estimation due to Birkhoff’s Ergodic Theorem^{45}. However, using multiple trajectories enhances the MC estimator’s statistical accuracy which can be measured by estimating its sample variance. Based on Q CTMC trajectories the output variance Var_{π}(X_{n}) can be estimated as
Plugging this estimate along with \({\hat{\delta }}_{m}^{({s}_{\ell })}\) in (25), we obtain estimates of the Padé derivatives \({D}_{m}^{({s}_{\ell })}\) for each m = 0, …, (ρ_{ℓ} − 1).
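The augmented-CTMC construction for finite s_ℓ can be sketched on a birth-death process, for which the quantities δ_m are known in closed form. Everything specific in this example is an assumption made for illustration: the function names, the rate constants, and the use of a single trajectory (Q = 1) with a plain time-average over [T_c, T_f]. The reference value 2 Var (1 − (s/(s + γ))^{m+1}) used for comparison follows from the exponential autocovariance Var·e^{−γt} of the birth-death process and the Erlang moment generating function.

```python
import numpy as np

def estimate_deltas(k, gamma, s, rho, t_cut, t_final, rng):
    """Time-averaged estimates of delta_0, ..., delta_{rho-1} for a birth-death process.

    The state is augmented with rho components y[0..rho-1] (y[j-1] plays the role of Y_j)
    and an extra reaction firing at constant rate s that shifts the history:
    Y_1 <- X, Y_j <- Y_{j-1}.
    """
    x = int(round(k / gamma))                 # start near the stationary mean
    y = np.full(rho, x, dtype=np.int64)
    acc = np.zeros(rho)
    t = 0.0
    while t < t_final:
        birth, death = k, gamma * x
        total = birth + death + s
        t_next = min(t + rng.exponential(1.0 / total), t_final)
        overlap = t_next - max(t, t_cut)      # part of the holding interval after the cut-off
        if overlap > 0.0:
            acc += overlap * (x - y) ** 2
        t = t_next
        if t >= t_final:
            break
        u = rng.random() * total
        if u < birth:
            x += 1
        elif u < birth + death:
            x -= 1
        else:                                 # reset reaction: shift the stored history
            y = np.roll(y, 1)
            y[0] = x
    return acc / (t_final - t_cut)

def exact_deltas(k, gamma, s, rho):
    """Closed-form delta_m for the birth-death process (used only as a reference)."""
    m = np.arange(rho)
    return 2.0 * (k / gamma) * (1.0 - (s / (s + gamma)) ** (m + 1))

estimates = estimate_deltas(10.0, 1.0, 1.0, 2, 50.0, 2000.0, np.random.default_rng(0))
print(estimates, exact_deltas(10.0, 1.0, 1.0, 2))
```

Because all ρ_ℓ history components are carried along in the same simulation, a single trajectory yields estimates for every m at once, which is the point of the augmentation.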
We now come to the case s_{ℓ} = ∞. As before by simulating Q CTMC trajectories we can estimate \({D}_{m}^{(\infty )}\), for each m = 0, 1, …, (ρ_{ℓ} − 1), using the MC estimator
However, we generally find that the estimator (28) has a very large variance unless the simulation time-period [0, T_{f}] is very large. To mitigate this issue we design suitable covariates that can be added to the integrands in (28) in order to aid convergence with respect to T_{f} (see Section S2.4.3 in the Supplement). The resulting integrand is given by
Here the function γ_{jl}(x) is defined as
It can be shown that \({D}_{m}^{(\infty )}={{\mathbb{E}}}_{\pi }({{{\Psi }}}_{m}^{(c)})\) and hence we can estimate it from Q CTMC trajectories as
In practice, we find that this covariatebased MC estimator (31) typically has much lower variance than the simpler MC estimator (28).
Validation of the Padé approximant
Once the required Padé derivatives have been estimated, we can compute the Padé approximant G_{p}(s) and then use this approximant to compute the PSD. For this PSD estimation procedure to work well, it is crucial that the Padé approximant G_{p}(s) is an accurate surrogate for the function G(s). This depends on many factors, such as the order of approximation p, the number of Padé derivatives that are estimated and their statistical precision. To test whether a computed Padé approximant is accurate, we can validate it against direct statistical estimates (i.e. without rational approximation) of the function G(s) at multiple values of s, denoted by \({\bar{s}}_{1},\ldots ,{\bar{s}}_{R}\). These values are all positive real numbers and, as for the Padé derivatives, the direct estimates can be obtained by augmenting the CTMC state with R additional state components, denoted by Z_{1}(t), …, Z_{R}(t), which keep track of the copy-number history of the output species X_{n} at random exponential times in the past. Assume that there are R additional reactions \({{{{{{{{\mathcal{R}}}}}}}}}_{{\bar{s}}_{1}},\ldots ,{{{{{{{{\mathcal{R}}}}}}}}}_{{\bar{s}}_{R}}\) that fire independently at constant rates \({\bar{s}}_{1},\ldots ,{\bar{s}}_{R}\), respectively. If reaction \({{{{{{{{\mathcal{R}}}}}}}}}_{{\bar{s}}_{r}}\) fires at time t, then we set \({Z}_{r}(t)={X}_{n}(t-)\),
where X_{n}(t−) is the copynumber of the output species X_{n}, just before the reaction firing time. As before we can conclude that for each r = 1, …, R the value \(G({\bar{s}}_{r})\) can be estimated with Q augmented CTMC trajectories, denoted by \({({X}^{(q)}(t),{Z}^{(q)}(t))}_{t\ge 0}\) for q = 1, …, Q
where \({\widehat{{{{{{{{\rm{Var}}}}}}}}}}_{\pi }({X}_{n})\) is the estimator (27) for the output variance.
If the estimated Padé approximant G_{p}(s) is accurate, each \(\hat{G}({\bar{s}}_{r})\) would be close to the value \({G}_{p}({\bar{s}}_{r})\), even though both these estimates would have some inaccuracies due to finite sampling and the finiteness of the simulation timeperiod. Upon comparing the graphs \(\{({\bar{s}}_{r},\hat{G}({\bar{s}}_{r})):r=1,\ldots ,R\}\) and \(\{({\bar{s}}_{r},{G}_{p}({\bar{s}}_{r})):r=1,\ldots ,R\}\), the Padé approximant can be validated.
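The state-augmentation mechanism used for these direct estimates can be illustrated with a small simulation. This sketch does not reproduce the estimator for G(s), whose exact normalisation is not restated here. It only shows, for a hypothetical birth-death output with rates `k` and `gamma`, that a component Z reset to X(t−) by an independent clock of rate `s_bar` behaves like X observed an exponentially distributed time in the past. For this stand-in model the stationary covariance of X and Z equals (k/γ)·s̄/(s̄+γ), which is s̄ times the Laplace transform of the autocovariance (k/γ)e^{−γt}.

```python
import random

def cov_with_delayed_copy(k, gamma, s_bar, t_cut, t_final, seed=0):
    """Time-averaged covariance between X(t) and the augmented component Z(t).

    Z is overwritten with X(t-) whenever the extra 'clock' reaction fires
    (constant rate s_bar). The interval [0, t_cut] is discarded as burn-in.
    """
    rng = random.Random(seed)
    t, x, z = 0.0, 0, 0
    sx = sz = sxz = 0.0
    while t < t_final:
        total = k + gamma * x + s_bar          # network reactions + clock
        dt = rng.expovariate(total)
        t_next = min(t + dt, t_final)
        w = t_next - max(t, t_cut)             # time in the averaging window
        if w > 0:
            sx += x * w
            sz += z * w
            sxz += x * z * w
        t += dt
        if t >= t_final:
            break
        u = rng.random() * total
        if u < k:
            x += 1                             # production
        elif u < k + gamma * x:
            x -= 1                             # degradation
        else:
            z = x                              # clock fires: Z takes X(t-)
    span = t_final - t_cut
    return sxz / span - (sx / span) * (sz / span)
```

With `k = 10`, `gamma = 1` and `s_bar = 1` the returned value should be near 5 up to Monte Carlo error. Running several clocks with different rates in the same simulation gives one such estimate per validation point.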
We now present several biological examples to illustrate applications of the Padé PSD method and of the PSD decomposition result for linear networks (Theorem 2.1). We start by considering some simple linear networks for which analytical expressions for the exact PSDs are known, and we show that Padé PSD provides very accurate approximations to the PSD (see Fig. 2). Next we discuss how our PSD decomposition result allows us to identify a key criterion that enables differentiation between adapting circuit topologies^{34}. We then provide two case studies to illustrate the usefulness of our PSD estimation method for synthetic biology applications. We first examine the problem of optimising the oscillation strength of the repressilator^{35} (see Fig. 3) and then we consider the problem of reducing the single-cell oscillations that typically arise due to the recently proposed antithetic integral feedback (AIF) controller^{36} (see Fig. 4), which has the important property of ensuring robust perfect adaptation for arbitrary intracellular networks with stochastic dynamics. Next, we examine how the PSD decomposition result can help us in studying the phenomenon of single-cell entrainment in the stochastic setting (see Fig. 5) and then we present an example to show how Padé PSD facilitates parameter inference with experimental single-cell trajectories that measure the copy-numbers of the output species up to an unknown constant of proportionality (see Fig. 6). Lastly, we consider an example with a cell-division cycle, and demonstrate that our Padé PSD method can be used for accurately estimating the PSDs and quantitatively examining oscillations induced by the cell-cycle (see Fig. 7).
Detailed descriptions of the networks considered in the paper and their PSD analysis can be found in Section S4 of the Supplement. Unless otherwise stated, all reaction networks are assumed to follow CTMC dynamics with generator (3) and all propensity functions are assumed to be of the massaction form (2).
Validation of Padé PSD with linear networks
We now provide analytical expressions for the PSD of certain simple networks, such as the birth-death network, the classical gene expression network^{46} and the recently proposed RNA splicing network^{47}. We then show that Padé PSD is able to approximate the PSD quite accurately.
Gene transcription
Consider a simple model of constitutive gene transcription and mRNA degradation, given by a singlespecies birthdeath network with rate of production k and the rate of degradation γ
The stationary distribution for this network is Poisson with parameter k/γ. Hence the stationary mean and variance are both equal to k/γ, and applying formula (13) we can compute the PSD as
This shows that the PSD (normalised by the total area under its curve) has the form of the fat-tailed Cauchy density, whose mean is undefined and whose variance is infinite, showing that even for such a simple network the stochastic output trajectory contains a very wide range of frequencies.
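The Cauchy shape can be checked directly. The sketch below assumes the Lorentzian form S(ω) = 2k/(ω² + γ²), which follows from the birth-death autocovariance (k/γ)e^{−γ|t|} under the convention S(ω) = ∫R(t)e^{−iωt}dt. The prefactor depends on the Fourier convention, but the normalised shape does not. Function names are illustrative.

```python
import math

def birth_death_psd(omega, k, gamma):
    """Lorentzian PSD of a birth-death process (assumed Fourier convention)."""
    return 2.0 * k / (omega ** 2 + gamma ** 2)

def normalised_psd(omega, k, gamma):
    """PSD divided by its total area over the whole real line."""
    total_area = 2.0 * math.pi * k / gamma   # closed-form integral of the Lorentzian
    return birth_death_psd(omega, k, gamma) / total_area

def cauchy_pdf(omega, scale):
    """Density of a Cauchy distribution centred at zero."""
    return scale / (math.pi * (omega ** 2 + scale ** 2))

# The normalised PSD coincides with a Cauchy density whose scale is gamma,
# so the production rate k drops out of the normalised spectrum entirely.
for w in (0.0, 0.5, 3.0):
    assert math.isclose(normalised_psd(w, 4.0, 0.7), cauchy_pdf(w, 0.7))
```

Because the scale parameter equals γ, half of the signal power lies at frequencies |ω| ≤ γ, and the remaining half is spread over the slowly decaying tails.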
Gene expression network
We now analyse the gene expression model shown in Fig. 2A that consists of two species—the mRNA (X_{1}) and the protein (X_{2}). There are four reactions corresponding to mRNA transcription, protein translation and the firstorder degradation of both the species. Observe that the mRNA dynamics is birthdeath and hence we can compute its PSD using (34) with (k, γ) ↦ (k_{r}, γ_{r}). Since mRNA stimulates the creation of protein via a reaction of the form (14) we can apply our PSD decomposition result (Theorem 2.1) to express the protein PSD as a sum of two components corresponding to translation and transcription, respectively:
The translation term is computed by setting the mRNA level to its stationary mean \({\bar{x}}_{1}:={k}_{r}/{\gamma }_{r}\) and then viewing the protein dynamics as a birthdeath process with production rate \({k}_{p}{\bar{x}}_{1}\) and degradation rate γ_{p}. The transcription term is simply the PSD of mRNA modulated by the frequencydependent factor given by Theorem 2.1.
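A quick numerical consistency check of this two-term decomposition is sketched below. It assumes the frequency-dependent factor from Theorem 2.1 takes the standard linear-filter form k_p²/(ω² + γ_p²) and that the PSD integrates to 2π times the variance. Both are assumptions of this sketch, since the theorem and the Fourier convention are not restated in this section. Under them, the integrated PSD should reproduce the textbook protein variance x̄₂(1 + k_p/(γ_r + γ_p)).

```python
import math

def integrate_real_line(f, n=20000):
    """Integrate f over the real line with the midpoint rule after the
    substitution omega = tan(u), u in (-pi/2, pi/2)."""
    h = math.pi / n
    total = 0.0
    for i in range(n):
        w = math.tan(-math.pi / 2 + (i + 0.5) * h)
        total += f(w) * (1.0 + w * w) * h      # d(omega) = (1 + tan^2 u) du
    return total

def mrna_psd(w, k_r, g_r):
    """Birth-death PSD of the mRNA (assumed Lorentzian form)."""
    return 2.0 * k_r / (w * w + g_r * g_r)

def protein_psd_terms(w, k_r, g_r, k_p, g_p):
    """Translation and transcription contributions to the protein PSD."""
    x1_bar = k_r / g_r                          # stationary mRNA mean
    translation = 2.0 * k_p * x1_bar / (w * w + g_p * g_p)
    transcription = k_p ** 2 / (w * w + g_p * g_p) * mrna_psd(w, k_r, g_r)
    return translation, transcription

def protein_variance_from_psd(k_r, g_r, k_p, g_p):
    """Variance recovered as (1 / 2 pi) times the area under the PSD."""
    area = integrate_real_line(
        lambda w: sum(protein_psd_terms(w, k_r, g_r, k_p, g_p)))
    return area / (2.0 * math.pi)
```

The translation term alone integrates to the Poisson-like part x̄₂ of the variance, and the transcription term supplies the excess x̄₂k_p/(γ_r + γ_p) inherited from mRNA fluctuations.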
RNA splicing network
The recently proposed RNA Splicing network (see Fig. 2B) was used to model the concept of RNA velocity that can help in understanding cellular differentiation from singlecell RNAsequencing data^{47}. Here a single genetranscript can randomly switch between active (X_{1}) and inactive (X_{2}) states with different rates of transcription of unspliced mRNA (X_{3}). The splicing process converts these unspliced mRNAs into spliced mRNAs (X_{4}). Both spliced and unspliced mRNAs undergo firstorder degradation. Applying formula (13) we can write the PSD of the dynamics of active gene count as
Note that when the active gene count is X_{1} ∈ {0, 1} the transcription rate is α_{off} + (α_{on} − α_{off})X_{1}. We can view transcription as a superposition of two reactions: a constitutive reaction with rate α_{off} and a reaction of the form (14) where the stimulant is the active gene X_{1}. Applying Theorem 2.1 we can decompose the PSD of the spliced mRNA count as
where \({S}_{{X}_{1}}(\omega )\) is given by (36).
Observe that for both gene expression and RNA splicing networks we can find an analytical expression for the PSD by directly applying formula (13) for the full network. However, using our PSD decomposition result we not only simplify the computation but also identify the contribution of the network mechanisms to the PSD.
For a specific parameterisation of these two networks, we compare the PSDs obtained analytically with those obtained by our Padé PSD method and the standard periodogram estimator for PSD that is based on discretesampling and DFT (see Box 1). The results are presented in Fig. 2A, B and they show good agreement, despite the noisy nature of the DFT estimate. The analytical expressions for the PSD along with the PSD estimates produced by Padé PSD are given in Table 1. One can see that the PSD estimated by our method is quite “close” to the analytical PSD for the gene expression network. The same holds for the RNA splicing network (see the PSD plots in Fig. 2B) even though it is not apparent from the expressions in Table 1.
The PSD enables discrimination between regulatory topologies
We consider simple three-node incoherent feedforward (IFF) and negative feedback (NFB) topologies depicted in Fig. 2C, D with stochastic kinetics. We provide analytical expressions for the PSDs under the assumption of linearised propensity functions for the repression mechanisms. These expressions inform us about qualitative structural differences between the PSDs obtained from IFF and NFB topologies, regardless of the choice of reaction rate parameters. This shows that in the stochastic setting, the PSD of single-cell trajectories serves as a key “response signature” that can differentiate between adapting circuit topologies. We demonstrate this finding with our Padé PSD method for a specific parametrisation of these networks and we argue why this result holds for arbitrarily-sized IFF and NFB networks.
We begin by analysing the IFF topology, where the controller species C catalytically produces the output species O at rate F_{f}(x_{c}) which is a monotonically decreasing function of the controller species copynumber x_{c} and it represents the repression of O by C. We linearise the function F_{f}(x_{c}) as
where β_{0} and β_{ff} are positive constants denoting the basal production rate and the strength of the incoherent feedforward mechanism, respectively. With this linearisation, all propensity functions become affine and hence we can apply the results for linear networks. Specifically, the steadystate means \({\bar{x}}_{c}:={{\mathbb{E}}}_{\pi }(C)\) and \({\bar{x}}_{o}:={{\mathbb{E}}}_{\pi }(O)\) are given by
and it is immediate that if β_{ff} ≈ k_{o}γ_{c}/k_{c}, then the mean output value \({\bar{x}}_{o}\approx {\beta }_{0}/{\gamma }_{o}\) becomes insensitive to the input abundance level I_{0}. This shows the adaptation property of the IFF network.
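The adaptation condition can be verified with a short calculation. What follows is a reconstruction under assumptions rather than the displayed equations themselves: it takes the linearised feedforward function to be F_f(x_c) = β_0 − β_ff x_c, and it assumes that the input also drives production of O directly at rate k_o I_0 and that O degrades at rate γ_o. With these assumptions, the stationary means of the affine network satisfy

```latex
\[
\bar{x}_c = \frac{k_c I_0}{\gamma_c},
\qquad
\bar{x}_o = \frac{\beta_0 + k_o I_0 - \beta_{\mathrm{ff}}\,\bar{x}_c}{\gamma_o}
          = \frac{\beta_0}{\gamma_o}
            + \frac{I_0}{\gamma_o}\left(k_o - \frac{\beta_{\mathrm{ff}}\,k_c}{\gamma_c}\right).
\]
```

The I_0-dependent term vanishes exactly when β_ff = k_oγ_c/k_c, which recovers the stated value of β_0/γ_o for the mean output.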
As the dynamics of C is simply birthdeath with production rate k_{c}I_{0} and degradation rate γ_{c}, its PSD is given by
Under the assumption of linearity of the feedforward function F_{f} the stimulation of O by C can be viewed as zerothorder degradation. Applying Theorem 2.1 we can evaluate the output PSD as
Since this is a sum of two nonnegative monotonically decreasing functions of ω, we can conclude that S_{O}(ω) is also monotonically decreasing. Hence output trajectories cannot show oscillations regardless of the IFF network parameters. The same argument can be extended to IFF networks with an arbitrary number of nodes (see the Supplement, Section S4.1.3).
In the NFB topology, the production of the controller species C is repressed by the output species O, and we model the production rate by a monotonically decreasing function F_{b}(x_{o}) of the output species copynumber x_{o}. As before, we linearise this function as
where β_{0} is the basal production rate and β_{fb} is the feedback strength. Under this linearisation, the steadystate means \({\bar{x}}_{c}:={{\mathbb{E}}}_{\pi }(C)\) and \({\bar{x}}_{o}:={{\mathbb{E}}}_{\pi }(O)\) are given by
Observe that if the input abundance level I_{0} is high, then the mean output value \({\bar{x}}_{o}\approx {\beta }_{0}/{\beta }_{{{{{{{{\rm{fb}}}}}}}}}\) depends only on the feedback function F_{b} and is insensitive to I_{0}, thereby demonstrating the adaptation property. Applying formula (13) we arrive at the following expression for the PSD of the output trajectory
Proposition S4.1 in the Supplement proves that the mapping ω ↦ S_{O}(ω) has a positive local maximum (which is also the global maximum) if and only if
where \({{\Gamma }}({\gamma }_{c},{\gamma }_{o},{k}_{o}):={({\gamma }_{c}{\gamma }_{o}+{\gamma }_{c}^{2}+{k}_{o}{\gamma }_{c})}^{2}+{\gamma }_{c}^{4}+{\gamma }_{c}^{3}{k}_{o}+{\gamma }_{o}^{2}{\gamma }_{c}{k}_{o}\). This condition shows that regardless of the choice of NFB network parameters, the output trajectories will exhibit oscillation if the input abundance level I_{0} is high enough. Using the standard rootlocus argument^{48} we can draw the same conclusion for arbitrarilysized NFB networks (see the Supplement, Section S4.1.3). This shows that the existence of oscillations and nonmonotonicity of the PSD is a differentiator between the NFB and the IFF networks as the latter never exhibits oscillations. Note that high I_{0} is precisely the condition for NFB to show adaptation and hence imposing this requirement is not very restrictive. The role of negative feedback in causing stable stochastic oscillations was explored theoretically in ref. ^{27} with CLE, and it has also been demonstrated experimentally.
For a specific parameterisation of the threenode IFF and NFB networks, we compare the PSD produced by our method with the analytical PSD and the DFTbased estimator. The results are shown in Fig. 2C, D and one can see that Padé PSD is quite accurate in estimating the PSD, which is also evident from the PSD expressions provided in Table 1. Since negative propensities cannot be allowed, we perform simulations with the positive part of the linear feedforward (see (38)) and feedback (see (39)) functions. Hence the analytical PSD expressions are not exact but they are still close because the dynamics rarely enters the states for which these linear functions become negative.
Using the PSD for enhanced oscillator design
The repressilator^{35} is the first synthetic genetic oscillator and it consists of three genes repressing each other in a cyclic fashion (see Fig. 3A). These three genes are tetR from the Tn10 transposon, cI from bacteriophage λ and lacI from the lactose operon. These three genes create three repressor proteins which are TetR, cI and LacI, respectively, and the cyclic repression mechanism can be represented as
Due to intrinsic noise in the dynamics, the repressilator loses oscillations at the bulk or the populationaverage level after a few generations. At the singlecell level this intrinsic noise broadens the output PSD peak, making the oscillations less regular in both amplitude and phase. In other words, intrinsic noise compromises the ability of the circuit to keep track of time. This issue was addressed in a recent paper^{49} which elaborately studied the various sources of noise in the original circuit and eliminated them to construct a modified repressilator circuit that showed regular oscillations over several generations. It was found that most of the noise was generated when TetR protein levels were low and the derepression of the TetR controlled promoter occurred at a low threshold. To raise this threshold a sponge plasmid was introduced and this had the remarkable effect of regularising the oscillations and sharpening the singlecell PSD peak.
It is also known that increasing the cooperativity of the repression mechanism improves the regularity of the oscillations^{35}. A fundamental question then arises: does the PSD-sharpening effect of the sponge plasmid persist when the repression cooperativity is increased? If it does, then one can regularise oscillations even further by designing cooperative promoters in addition to employing the sponge device. We study this question using an adaptation of the stochastic model given in ref. ^{49}, which is detailed in Section S4.2.1 of the Supplement. The repression mechanism is encoded with a nonlinear Hill function whose coefficient H represents the degree of cooperativity among the promoter binding sites. The sponge plasmid, if present, can competitively bind the free TetR molecules, reducing the number of these molecules available for repressing the cI gene.
We demonstrate that our method is able to accurately estimate the singlecell PSD and exhibit the sharpening of the PSD in the presence of the sponge plasmid when the cooperativity is set to H = 1.5. Surprisingly when the cooperativity is increased to H = 2, the sponge loses its effect of sharpening the PSD. This shows that in certain parameter regimes, the oscillationregularising effects of the sponge plasmid and the repressor binding cooperativity are not additive, possibly due to the fact that increased cooperativity makes the repression mechanism more ultrasensitive^{50}.
With our method, we estimate the PSD for the dynamics of the copy-numbers of the cI protein, whose expression is directly repressed by TetR. For a promoter cooperativity (i.e. Hill coefficient) of H = 1.5, the PSD indeed exhibits a sharper peak in the presence of the sponge plasmid, at a peak frequency of around \({\omega }_{\max }\approx 1.35\,{{{{{{{\rm{rad./gen.}}}}}}}}\) (see Fig. 3B). This sharpness in the PSD suggests more regularity in the oscillations, which is also evident from the single-cell trajectories plotted in Fig. 3C. We compare our PSD estimation method with the DFT method in both cases (with and without sponge) and the results are shown in Fig. 3C. The same analysis is repeated for a promoter cooperativity of H = 2 and the results are shown in Fig. 3B and D. From Fig. 3B it is immediate that for H = 2, the PSD-sharpening effect of the sponge plasmid is lost.
Biocontroller design with PSD: suppressing singlecell oscillations
In recent years genetic engineering has allowed researchers to implement biomolecular control systems within living cells (see refs. ^{36,51,52,53,54,55,56,57,58,59,60}). This area of research, popularly known as Cybergenetics^{51}, offers promise in enabling control of living cells for applications in biotechnology^{61,62} and therapeutics^{63}. A particularly important challenge in Cybergenetics is to engineer an intracellular controller that facilitates cellular homoeostasis by achieving robust perfect adaptation (RPA) for an output statevariable in an arbitrary intracellular stochastic reaction network. This challenge was theoretically addressed in ref. ^{36} which introduced the antithetic integral feedback (AIF) controller and demonstrated its ability to achieve RPA for the populationmean of output species. This controller has been synthetically implemented in vivo in bacterial cells, and it has been shown that any biomolecular controller that achieves RPA for arbitrary reaction networks with noisy dynamics, must embed this controller^{60}.
Computational analysis has revealed that the AIF controller can cause high-amplitude oscillations in the single-cell dynamics in certain parameter regimes^{36,64}, which could be undesirable. Hence it is important to find ways to augment the AIF controller so that single-cell oscillations are attenuated but the RPA property is preserved. It is known that adding an extra negative feedback (like proportional action) from the output species to the actuated species maintains the RPA property, while decreasing both the output variance and the settling-time for the mean dynamics^{65}. Using the PSD estimation method developed in this paper we now demonstrate how adding such a negative feedback also helps in diminishing single-cell oscillations.
The AIF controller is depicted in Fig. 4A and it is acting on the gene expression model considered in Fig. 2A. The AIF controller robustly steers the mean copynumber level of the protein X_{2} to the desired setpoint μ/θ, where μ is the production rate of Z_{1} and θ is the reaction rate constant for the output sensing reaction. The AIF affects the output by actuating the production of mRNA X_{1} and the feedback loop is closed by the annihilation reaction between Z_{1} and Z_{2}. This annihilation reaction can be viewed as mutual inactivation or sequestration and it can be realised using biomolecular pairs such as sigma/antisigma factors^{54,66,67}, scaffold/antiscaffold proteins^{68} or toxin/antitoxin proteins^{69}.
It is known from ref. ^{36} that the combined closedloop dynamics is ergodic and mean steadystate protein copynumber is μ/θ
As discussed in ref. ^{65}, this ergodicity is preserved under certain conditions when an extra negative feedback from protein X_{2} to the production of mRNA X_{1} is added. Letting z_{1} and x_{2} denote the copy-numbers of Z_{1} and X_{2}, respectively, we add the extra feedback by changing the rate of the actuation reaction from kz_{1} to (kz_{1} + F_{b}(x_{2})) where F_{b} is a monotonically decreasing feedback function which takes nonnegative values. As in ref. ^{65}, we consider two types of feedback. Letting \(\hat{\mu }\) be the reference point, the first is Hill feedback of the form
which is based on the actual output copynumber x_{2}, while the second is the proportional feedback that is essentially the linearisation of the Hill feedback at the reference point \(\hat{\mu }\)
One can easily see that at the reference point, the values of this feedback function \({F}_{b}(\hat{\mu })\) and its derivative \({F}_{b}^{\prime}(\hat{\mu })\) (equal to −k_{fb}) are the same for both types of feedback. We can view k_{fb} as the feedback gain parameter. The Hill feedback is biologically more realisable, while the proportional feedback captures the classical controller where the feedback strength depends linearly on the deviation of the output x_{2} from the reference point \(\hat{\mu }\), in the output range \([0,3\hat{\mu }]\). In our analysis, we set the reference point \(\hat{\mu }\) as the setpoint μ/θ.
For a particular network parametrisation, we use our method to estimate the PSD for the single-cell protein dynamics in the AIF-regulated gene expression network, and the results are displayed in Fig. 4. When the extra negative feedback is absent (i.e. k_{fb} = 0) the single-cell trajectory has high-amplitude oscillations, which is also evident from the estimated PSD (see Fig. 4B). In Fig. 4C, we apply our Padé PSD method to examine how the PSD changes when extra feedback of Hill type is added with varying strengths given by the parameter k_{fb}. Observe that as the feedback strength increases, the PSD peak declines and the oscillations become almost nonexistent for \({k}_{{{{{{{{\rm{fb}}}}}}}}}=0.5\,{\min }^{-1}\). The same holds true for the proportional feedback (see Fig. 4D). These results suggest that both feedback mechanisms are more or less equally effective in reducing oscillations. This is further corroborated by the single-cell trajectories plotted in Fig. 4C, D, which also show that the addition of feedback decreases the stationary output variance, which is equal to the signal power (see Box 1). In a recent paper, the decrease in oscillations upon addition of proportional feedback has been experimentally validated in yeast cells^{70}.
In ref. ^{36}, it is reported that the deterministic model of the AIF-regulated gene expression network can exhibit both convergence to a fixed point and sustained oscillations. Keeping all other parameters fixed and setting k_{fb} = 0, we simulate the deterministic model for four values of the actuation rate constant k and plot the output protein trajectories in Fig. 4F. One can see that for lower values of k, the deterministic trajectories converge to a fixed point, which is equal to the set-point μ/θ, while for higher values of k, the trajectories oscillate around the set-point. Estimating the PSDs for the stochastic model with our Padé PSD method we find that for all four k values, the PSDs have a peak at a nonzero frequency of around \(1\,{{{{{{{\rm{rad}}}}}}}}/\min\) (see Fig. 4G). This shows that the oscillatory tendency of the stochastic model persists, albeit at lower PSD peak values, for values of k beyond the critical value where the deterministic system transitions from a limit cycle to a fixed point. For the lower values of k, oscillations are noise-induced in the sense that they only emerge in the presence of randomness in the dynamics and they disappear (at steady state) if the noise-free deterministic model is considered. For two values of k, we plot the PSD obtained by our method and compare it with the PSD estimated with DFT, and one can see from Fig. 4H that there is good agreement. The details on all the computations for the AIF-regulated gene expression network can be found in Section S4.2.2 of the Supplement.
This example with noiseinduced oscillations also shows that the LNA would yield a very inaccurate PSD estimate as it essentially adds a Gaussian term to the deterministic dynamics. Hence if the deterministic dynamics converge to a fixed point, the LNAbased PSD estimator cannot have a peak at a nonzero frequency value.
Exploiting the PSD for studying stochastic entrainment
The phenomenon of entrainment occurs when an oscillator, upon stimulation by a periodic input, loses its natural frequency and adopts the frequency of the input. This phenomenon has several applications in physical, engineering and biological systems^{71}. The most wellknown biological example of this phenomenon is the entrainment of the circadian clock oscillator by daynight cycles. The circadian clock is an organism’s timekeeping device and its entrainment is necessary to robustly maintain its periodic rhythm^{72}. The circadian clock is one example among several intracellular oscillators that have been found and their functional roles have been identified^{73}. Often these oscillators provide entrainment cues to other networks within cells^{74} and hence it is important to study entrainment at the singlecell level, where the dynamics is intrinsically noisy due to low copynumber effects.
We now illustrate how our PSD decomposition result (Theorem 2.1) can be used to study single-cell entrainment in the stochastic setting where the dynamics is described by CTMCs. We consider the example of the repressilator stimulating a gene expression system, as shown in Fig. 5A. This gene expression network is the same as in Fig. 2A but we include transcriptional feedback from the protein molecules, so the mRNA transcription rate is given by a monotonically decreasing function F_{b}(x_{2}) of the protein copy-number x_{2}. We shall linearise F_{b}(x_{2}) as
where k_{r} is the basal transcription rate and k_{fb} is the feedback strength. When this gene expression network is connected to the repressilator (see Fig. 5) the transcription rate changes from F_{b}(x_{2}) to
where p_{2} is the molecular count of protein cI in the repressilator and the parameter θ captures the “strength” of the interconnection. In other words, cI acts as an activating transcription factor in our example. The parameters of the repressilator are chosen as in Fig. 3 in the “no sponge” and Hill coefficient H = 1.5 case, but the time-units are changed to minutes. We can view the gene expression network as simply the negative feedback (NFB) network in Fig. 2 with the controller species C as mRNA X_{1} and the output species O as protein X_{2}. Using the same parameters as the NFB network, we study how the PSD of the protein output varies as a function of θ. In order for the gene expression network to be entrained to the repressilator, the global maximum of this protein PSD should be near the repressilator’s natural (or peak) frequency of about 1.35 rad/min (see Fig. 3C).
To compute the PSD of the combined network we shall apply Theorem 2.1. For this, we first consider the gene expression network in isolation with p_{2} in the transcription rate (42) replaced by the constant steadystate mean of p_{2} (denoted by \({{\mathbb{E}}}_{\pi }({P}_{2})\)). Hence using (40) we can estimate the protein dynamics PSD \({S}_{{X}_{2}}^{{{{{{{{\rm{iso}}}}}}}}}(\omega )\) as
Irrespective of the value of θ, the PSD \({S}_{{X}_{2}}^{{{{{{{{\rm{iso}}}}}}}}}(\omega )\) has a global maximum at \({\omega }_{\max }\approx 0.85\) rad/min, which is the natural frequency of the gene expression circuit in isolation.
When the repressilator is connected to the gene expression network, we can apply Theorem 2.1 to compute the PSD of the protein output as
We call this method composite Padé PSD as it estimates the PSD for the full network by combining two PSDs: one obtained with Padé PSD for the nonlinear subnetwork (repressilator) and the other obtained analytically for the linear subnetwork (gene expression). Notably, this method does not require simulations of the combined process, making it easier to obtain PSDs for multiple values of θ without incurring any additional simulation burden. In Fig. 5B we plot the normalised PSD (area under the PSD curve is normalised to 1) for six values of θ and we also validate this composite method with the DFT method for \(\theta =0.4\,{\min }^{-1}\). One can clearly see that as θ gets higher, the gene expression network gives up its natural frequency upon stimulation and adopts a frequency which is close to the repressilator frequency. This exemplifies the phenomenon of single-cell entrainment in the stochastic setting.
In order to investigate this entrainment phenomenon further we define an entrainment score as
where [ω_{l}, ω_{r}] = [0.9ω_{0}, 1.1ω_{0}] represents an interval of relative length 10% on either side of the repressilator’s natural frequency ω_{0}. In Fig. 5C, we plot a heatmap for the entrainment score as a function of the feedback strength parameter k_{fb} and the connection strength parameter θ. One can see that the entrainment score increases monotonically with θ, which is to be expected as the first term on the r.h.s. of (43) scales linearly with θ while the second term scales quadratically. Similarly, by computing the ratio of the two terms we can conclude that the entrainment score is also a monotonically increasing function of k_{fb}. However, as the heatmap clearly indicates, the entrainment score is more sensitive to k_{fb} than to θ, thereby suggesting that transcriptional feedback could be a critical mechanism for facilitating entrainment of gene expression networks.
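A band-power score of this kind can be computed as sketched below. The exact definition of the entrainment score is not restated here, so the sketch assumes one natural reading: the fraction of the total (one-sided) signal power that falls inside the band [0.9ω_0, 1.1ω_0]. The function names and the `rel_width` parameter are illustrative, and `psd` can be any callable returning S(ω).

```python
import math

def integrate_tan(f, u_lo, u_hi, n=20000):
    """Midpoint rule for the integral of f over [tan(u_lo), tan(u_hi)],
    using the substitution omega = tan(u) so that u_hi = pi/2 reaches infinity."""
    h = (u_hi - u_lo) / n
    total = 0.0
    for i in range(n):
        w = math.tan(u_lo + (i + 0.5) * h)
        total += f(w) * (1.0 + w * w) * h
    return total

def entrainment_score(psd, omega_0, rel_width=0.1):
    """Assumed definition: power in [(1 - rel_width) * omega_0,
    (1 + rel_width) * omega_0] divided by the total power on [0, infinity)."""
    w_l = (1.0 - rel_width) * omega_0
    w_r = (1.0 + rel_width) * omega_0
    band = integrate_tan(psd, math.atan(w_l), math.atan(w_r))
    total = integrate_tan(psd, 0.0, math.pi / 2)
    return band / total
```

Because the score is a ratio of areas, it is unchanged if the PSD is rescaled by a constant, so it can be evaluated on either raw or normalised PSDs.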
Now suppose that the transcriptional feedback is given by a nonlinear Hill function F_{b}(x_{2}). In this case, the gene expression subnetwork becomes nonlinear and Theorem 2.1 cannot be used for PSD estimation. However, we can still employ the Padé PSD method on the combined network using a rational Ansatz of the form (22) with B(s) being the denominator for the Padé approximant estimated by our method in estimating the PSD of the stimulating repressilator network. As shown in Fig. 5D, the PSDs estimated with Padé PSD show good agreement with the DFTbased estimates.
PSD as a tool for parameter inference
Consider a selfregulatory gene expression system (see Fig. 6A) modelled as a simple birthdeath network where the production rate is given by the repressing Hill function
of the output copynumber x and the degradation rate is γ. Fixing all other parameters, our goal is to use the experimental PSD to infer the degree of cooperativity H. This experimental PSD is generated via simulations with H = 1 and we average the PSDs over 100 singlecell trajectories in order to reduce the variance in the DFTbased PSD estimate. We assume that the experimental singlecell trajectories are proportional to the output copynumber but the constant of proportionality is unknown as is often the case in timelapse microscopy experiments. We also assume that there is no measurement noise—if the measurement noise appears as an independent process then its PSD simply appears as an additive term in the output PSD, which can be easily removed to recover the output PSD without the measurement noise.
Observe that the unknown constant of proportionality drops out when we compute the normalised PSD (i.e. area under the PSD curve is normalised to 1). Hence we can infer the unknown parameter H by estimating the normalised PSD and comparing it with the experimentally obtained normalised PSD, as was previously demonstrated in ref. ^{37}. We estimate the normalised PSD with our Padé PSD method and provide a comparison for various values of H in Fig. 6B, and it is evident that the experimental traces come from the network with H = 1. Note that the clean estimates for the normalised PSD produced by our Padé PSD method greatly facilitate the inference of H. If the same estimates were obtained with DFT then the estimator noise would obfuscate the dependence of the PSD on H and make the inference task difficult.
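The cancellation of the proportionality constant is easy to confirm numerically. The sketch below uses a plain periodogram (a direct DFT of the mean-subtracted, uniformly sampled trajectory) purely for illustration; it is not the Padé PSD estimator. The sampling interval `dt` and the function names are assumptions of the sketch.

```python
import cmath
import math

def periodogram(x, dt):
    """Periodogram of a uniformly sampled trajectory at the positive
    DFT frequencies omega_j = 2 * pi * j / (n * dt), j = 1, ..., n/2 - 1."""
    n = len(x)
    mean = sum(x) / n
    centred = [v - mean for v in x]
    psd = []
    for j in range(1, n // 2):
        coeff = sum(c * cmath.exp(-2j * math.pi * j * i / n)
                    for i, c in enumerate(centred))
        psd.append(abs(coeff) ** 2 * dt / n)
    return psd

def normalised_periodogram(x, dt):
    """Periodogram rescaled so that its Riemann sum over frequency equals 1."""
    psd = periodogram(x, dt)
    d_omega = 2.0 * math.pi / (len(x) * dt)
    area = sum(psd) * d_omega
    return [p / area for p in psd]
```

Multiplying a trajectory by a constant c multiplies every periodogram value by c², so `normalised_periodogram(x, dt)` and `normalised_periodogram([c * v for v in x], dt)` agree up to rounding for any nonzero c.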
Exploring cell-cycle induced oscillations in gene expression
In all the examples considered so far, we have ignored that reaction networks reside within cells that are undergoing their own division cycles. Overlooking the cell cycle is only reasonable when the dynamics of the network being analysed occurs at a timescale which is much faster than the timescale of cell division. If this assumption does not hold, as is often the case in prokaryotic cells, the cell-cycle process should not be neglected while estimating the frequency spectrum of an output trajectory within a cell lineage. Tracking trajectories of output fluorescent proteins across a cell lineage over multiple generations is now increasingly possible due to advanced time-lapse microscopy techniques^{70,75} and microfluidic platforms such as the mother machine^{4}. As these trajectories can be obtained over very long time horizons, a steady-state property like the PSD can be reliably estimated with experimental data, and by comparing it with theoretically estimated PSDs one may gain insights into the underlying network and the role of the cell cycle in inducing oscillations.
Inspired by ref. ^{37}, we consider the cell-cycle evolution as an N-stage Markov process with a constant rate α of transitioning from one stage to the next. Hence each transition will occur after a random time-interval which is exponentially distributed with rate α. Observe that the expected time to complete one cycle would be N/α, implying that the cell-cycle frequency is f_{r} = α/N. At the start of each new cell cycle, when the cell-cycle process goes from stage N to stage 1, the mother cell undergoes division into two daughter cells and only one of these two cells is tracked and measured, providing us with an output trajectory over a single lineage. The cell division entails a partition of all mother cell molecules into two components, one for each daughter cell. We consider two partitioning mechanisms: symmetric binomial, where each mother cell molecule is randomly assigned to each daughter cell with an equal probability, and strict binary, where each daughter cell procures exactly half of the mother cell molecules for each network species (see Fig. 7A, C). Observe that partitioning at cell division forces the displacement in the vector of molecular counts to be state-dependent, i.e. the difference between the state x of the mother cell pre-partition and the state \(x^{\prime}\) of the (tracked) daughter cell post-partition will depend on x. Hence, instead of a CTMC with generator (3) we need to model the dynamics with a more general CTMC with generator (4). The explicit form of the generator along with all the computational details on this example can be found in Section S4.2.5 of the Supplement.
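The cell-cycle model and the two partitioning rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the value f_r = 0.25 per minute, the sample size and the rounding rule for odd counts in strict binary partitioning are hypothetical choices, not taken from the paper. The sum of N exponential stage times is Erlang distributed, with mean N/α and coefficient of variation 1/√N.

```python
import math
import random

def cycle_duration(n_stages, alpha, rng):
    """Time to traverse N stages, each exponentially distributed with rate alpha."""
    return sum(rng.expovariate(alpha) for _ in range(n_stages))

def partition_binomial(count, rng):
    """Symmetric binomial: each molecule reaches the tracked daughter with probability 1/2."""
    return sum(1 for _ in range(count) if rng.random() < 0.5)

def partition_binary(count):
    """Strict binary: the tracked daughter gets half (rounding down is an assumption)."""
    return count // 2

rng = random.Random(0)
n_stages, f_r = 20, 0.25          # f_r in cycles per minute (hypothetical value)
alpha = n_stages * f_r            # keeps the mean cycle time N/alpha = 1/f_r fixed

durations = [cycle_duration(n_stages, alpha, rng) for _ in range(20000)]
mean = sum(durations) / len(durations)
variance = sum((d - mean) ** 2 for d in durations) / len(durations)
cv = math.sqrt(variance) / mean

print(f"mean cycle time {mean:.3f} (theory {n_stages / alpha:.3f})")
print(f"coefficient of variation {cv:.3f} (theory {1 / math.sqrt(n_stages):.3f})")
print("binomial split of 100 molecules:", partition_binomial(100, rng))
print("strict binary split of 100 molecules:", partition_binary(100))
```

Increasing N while adjusting α to hold f_r fixed leaves the mean cycle time unchanged and shrinks its relative spread, so the division times become more regular.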
Suppose that this dividing cell comprises the gene expression network shown in Fig. 2A, which operates at the same timescale as the cell-cycle process. Notice that if we ignore the cell-division cycle, the protein count trajectory does not show any oscillations, as seen from the monotonically decreasing PSD plot in Fig. 2A. We now include the cell cycle and examine how the PSD for the protein counts changes with the number of cell-cycle stages N. As we vary N we keep the frequency f_{r} constant by adjusting α. The cell-cycle process can be viewed as an external signal that stimulates the gene expression network by inducing cell division. Hence we estimate the PSD with our Padé PSD method using a rational Ansatz of the form (22), with B(s) = ∣σ∣^{2} − 2Real(σ)s + s^{2} where \(\sigma =\alpha (1-\exp (2\pi i/N))\) is the eigenvalue of the cell-cycle evolution generator with the least magnitude of the real part. The estimated PSDs show good agreement with the PSDs estimated via DFT, for both types of partitioning mechanisms (see Fig. 7B, D), and one can see that the type of partitioning mechanism has little effect on the PSD. Moreover, as N increases, the relative noise in the cell-cycle process goes down, causing an increase in the off-zero peak of the PSD at the cell-cycle frequency of roughly \(1.57\,{\rm{rad/min}}\). This observation is consistent with the results reported in ref. ^{37} for a single-species bursty gene expression network but with a much richer cell-division model than what we consider. The analytical computations presented in ref. ^{37} are quite elegant and the authors employ generating function techniques to obtain closed-form expressions for the PSD under the assumption of binomial partitioning. However, this analytical approach may become infeasible when other partitioning mechanisms are considered (e.g. strict binary) or when the output trajectories come from a high-dimensional nonlinear network. Our numerical Padé PSD method should still perform reliably in these cases as long as one can feasibly simulate the stochastic trajectories of the process.
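To see how the denominator B(s) encodes the cell-cycle frequency, the sketch below evaluates σ and the coefficients of B(s) for several values of N. It reads σ as α(1 − exp(2πi/N)), which is a sign convention assumed here, and it picks f_r = 0.25 per minute purely so that 2πf_r is close to the 1.57 rad/min quoted above; that value is not stated in the text.

```python
import cmath
import math

def cell_cycle_sigma(n_stages, alpha):
    """sigma = alpha * (1 - exp(2*pi*i/N)); the sign convention is an assumption."""
    return alpha * (1.0 - cmath.exp(2j * math.pi / n_stages))

def denominator_coefficients(sigma):
    """Coefficients (c0, c1, c2) of B(s) = |sigma|^2 - 2 Re(sigma) s + s^2."""
    return abs(sigma) ** 2, -2.0 * sigma.real, 1.0

f_r = 0.25  # assumed cycles per minute, so that 2*pi*f_r is about 1.57 rad/min
for n_stages in (5, 10, 20, 50):
    alpha = n_stages * f_r  # adjust alpha so that f_r = alpha / N stays constant
    sigma = cell_cycle_sigma(n_stages, alpha)
    c0, c1, _ = denominator_coefficients(sigma)
    print(f"N={n_stages:3d}  Re(sigma)={sigma.real:.4f}  |Im(sigma)|={abs(sigma.imag):.4f}  "
          f"B(s) = {c0:.4f} + ({c1:.4f}) s + s^2")
```

B(s) factors as (s − σ)(s − σ̄), so its roots are σ and its complex conjugate. As N grows with f_r fixed, the real part of σ shrinks towards zero while the magnitude of its imaginary part approaches 2πf_r, which matches the sharpening peak at the cell-cycle frequency described above.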
Discussion
Recent advances in microscopic imaging and fluorescent reporter technologies have enabled high-resolution monitoring of processes within living cells^{5}. As the accessibility of this time-course data rapidly increases, there is an urgent need to design theoretical and computational approaches that make use of the full scope of such data, in order to understand intracellular processes and design effective synthetic circuits. An important feature of time-course measurements, which is lacking in the data generated by the more common experimental technique of flow cytometry, is that they capture temporal correlations at the single-cell level which are rich in information about the underlying dynamical model. Frequency-domain analysis provides a viable approach to extract this information, if we have an efficient framework to connect network models to the frequency spectrum or the power spectral density (PSD) of the single-cell trajectories measured with time-lapse microscopy^{18,20}. The dynamics within cells is invariably stochastic, owing to the presence of many low abundance biomolecular species, and it is commonly described as a continuous-time Markov chain (CTMC). In this context, the aim of this paper is to develop a computational method for reliably estimating the PSD for single-cell trajectories from CTMC models. Existing approaches for PSD estimation for stochastic network models are either applicable to a particular class of networks^{17,26}, or they are based on dynamical approximations that are known to be inaccurate over large time-intervals and in situations where low abundance species are present^{19,20}. The method we develop in this paper, called Padé PSD, especially pertains to the low abundance regime. It applies generically to any stable network and it yields an accurate PSD expression using a small number of CTMC trajectory simulations. Moreover, for networks with affine propensity functions, we provide a PSD decomposition result that expresses the output PSD in terms of its constituent parts.
The tools we develop in this paper are of significance to both systems and synthetic biology. We demonstrate that in the presence of intrinsic noise, PSD estimation can successfully differentiate between adapting Incoherent Feedforward (IFF) and Negative Feedback (NFB) topologies^{34}, and it can facilitate performance optimisation of synthetic oscillators^{35} as well as synthetic in vivo controllers^{36}. Moreover, it can also aid the study of stochastic entrainment at the single-cell level. This is of particular relevance for applications such as designing pulsatile dynamics of transcription factors, which is known to enable graded multi-gene regulation^{76}. We present a simple nonlinear network to illustrate that PSDs enable parameter inference from experimental single-cell trajectory data without requiring the explicit knowledge of the constant of proportionality that links the output species copy-number to the observed signal. Lastly, we consider an example with cell-division cycles and show that our Padé PSD method provides accurate PSD estimates for stochastic trajectories from a single lineage, thereby assisting in precise quantification of the oscillations induced by the cell-cycle process.
The main contribution of this paper is to show how the theory of Padé approximations can be effectively applied to the PSD estimation problem for reaction networks with stochastic CTMC dynamics. In Padé PSD, a low-dimensional approximation of the PSD is computed based on estimates of Padé derivatives that are expressible as certain stationary expectations, for which efficient Monte Carlo estimators were developed. As our method requires simulations of stochastic trajectories, it naturally inherits the associated drawbacks: these simulations can be computationally expensive, especially if the network possesses multiple reaction timescales. Fortunately, the problem of reliably estimating expectations under the CTMC model has received a lot of attention in recent years^{77}, and various methods designed for this problem, like τ-leaping^{78} and/or multilevel schemes^{79}, can be easily integrated with Padé PSD, in order to speed up the estimation process and also to reduce the variance of the Monte Carlo estimators. Moreover, model reductions^{80,81} and simulation tools^{82,83} for multiscale networks can be readily applied to simplify the estimation of Padé derivatives. Such extensions would greatly expand the scope of applicability of our method and pave the way for frequency-based analysis and design of stochastic biomolecular reaction networks.
Methods
We now discuss the computational implementation of our Padé PSD method. The detailed algorithms for this method are provided in Section S3 of the Supplement and its full Python implementation is available on GitHub: https://github.com/ankitgupta83/PadePSD_python.git^{84}.
The inputs to our method are as follows:

A positive integer p which specifies the order of the rational Padé approximant G_{p}(s) given by (15).

A vector of distinct points s = (s_{1}, …, s_{L}) on the extended positive real line (0, ∞] along with a vector of positive integers ρ = (ρ_{1}, …, ρ_{L}). The Padé approximant is constructed by matching between G(s) and G_{p}(s) the first ρ_{ℓ} terms in the power series expansion around s = s_{ℓ} for each ℓ = 1, …, L. Without loss of generality we may assume that s_{1}, …, s_{L−1} are all finite and s_{L} = ∞.

A vector of distinct positive real test values \(\bar{{\bf{s}}}=({\bar{s}}_{1},\ldots ,{\bar{s}}_{R})\) for validating the Padé approximant.
Given these inputs, the main computational tasks that Padé PSD performs are:

1. Estimate the required Padé derivatives: Quantities \({D}_{m}^{({s}_{\ell })}\) are estimated for each m = 0, 1, …, (ρ_{ℓ} − 1) and each ℓ = 1, …, L.

2. Obtain direct estimates for validation: Quantities \((G({\bar{s}}_{1}),\ldots ,G({\bar{s}}_{R}))\) are directly estimated.
Upon completing these tasks, the linear system (21) for the 2p coefficients for the Padé approximant G_{p}(s) is constructed and solved. This provides us with G_{p}(s) which is then validated with the direct estimates \((G({\bar{s}}_{1}),\ldots ,G({\bar{s}}_{R}))\), and if the validation is successful, the PSD \({S}_{{X}_{n}}(\omega )\) is obtained by applying formula (9) with G(z) = G_{p}(z).
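The fit-then-validate workflow can be illustrated with a deliberately simplified Python sketch. It is not the linear system (21), which is not reproduced here. It assumes that G_p(s) has a numerator of degree p − 1 and a monic denominator of degree p (2p unknown coefficients), it matches function values only (ρ_ℓ = 1 at 2p finite points, with no matching at s = ∞), and it uses a hypothetical exactly known target G(s) in place of Monte Carlo estimates. All function names are illustrative.

```python
from fractions import Fraction

def solve_linear_system(matrix, rhs):
    """Gauss-Jordan elimination with exact rational arithmetic."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [aug[r][n] for r in range(n)]

def fit_rational(points, values, p):
    """Match G_p(s_i) = values[i] at 2p points; returns (numerator, denominator) coefficients."""
    rows, rhs = [], []
    for s, g in zip(points, values):
        # a_0 + ... + a_{p-1} s^{p-1} - g * (b_0 + ... + b_{p-1} s^{p-1}) = g * s^p
        rows.append([s ** j for j in range(p)] + [-g * s ** j for j in range(p)])
        rhs.append(g * s ** p)
    coeffs = solve_linear_system(rows, rhs)
    return coeffs[:p], coeffs[p:] + [Fraction(1)]

def evaluate(numerator, denominator, s):
    """Evaluate the rational function given lowest-degree-first coefficient lists."""
    num = sum(c * s ** j for j, c in enumerate(numerator))
    den = sum(c * s ** j for j, c in enumerate(denominator))
    return num / den

def G(s):
    """Hypothetical target: 1/(s+1) + 2/(s+3) = (3s + 5)/(s^2 + 4s + 3)."""
    return 1 / (s + 1) + 2 / (s + 3)

points = [Fraction(k) for k in (1, 2, 3, 4)]            # matching points (p = 2)
num, den = fit_rational(points, [G(s) for s in points], p=2)

test_points = [Fraction(1, 2), Fraction(5), Fraction(10)]  # validation points
errors = [abs(evaluate(num, den, s) - G(s)) for s in test_points]
print("numerator:", num, "denominator:", den, "validation errors:", errors)
```

Because the target here is itself a rational function of order 2, the fit recovers it exactly and the validation errors vanish. With noisy Monte Carlo estimates the validation step instead indicates whether the chosen order p is adequate.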
All the required quantities are simultaneously estimated with Q trajectories of the augmented CTMC \({({\mathcal{X}}(t))}_{t\ge 0}\) with
where

X(t) = (X_{1}(t), …, X_{d}(t)) is the vector of species copynumbers.

\(Y(t)=({Y}_{1}(t),\ldots ,{Y}_{{\vartheta }_{1}}(t),{Y}_{{\vartheta }_{1}+1}(t),\ldots ,{Y}_{{\vartheta }_{2}}(t),\ldots ,{Y}_{{\vartheta }_{L-2}+1}(t),\ldots ,{Y}_{{\vartheta }_{L-1}}(t))\) is the vector of additional state components used for estimating the Padé derivatives \({D}_{m}^{({s}_{\ell })}\) for each m = 0, 1, …, (ρ_{ℓ} − 1) and each ℓ = 1, …, (L − 1). Here \({\vartheta }_{\ell }=\mathop{\sum }\nolimits_{j = 1}^{\ell }{\rho }_{j}\) with ϑ_{0} = 0. Note that the estimation of the Padé derivatives at s_{L} = ∞ does not require these additional state components.

Z(t) = (Z_{1}(t), …, Z_{R}(t)) is the vector of additional state components used for estimating \((G({\bar{s}}_{1}),\ldots ,G({\bar{s}}_{R}))\).
The augmented process has
reactions. Note that each reaction \({{\mathcal{R}}}_{s}\) has the constant propensity of s. Our Padé PSD method simulates such a reaction network over the time-interval [0, T_{f}], by extending Gillespie’s classical Stochastic Simulation Algorithm^{39}, and then estimates the Padé derivatives and the direct estimates \((G({\bar{s}}_{1}),\ldots ,G({\bar{s}}_{R}))\). Under this extension, when the firing reaction is one of the original reactions k = 1, …, K, the state (x, y, z) moves to (x + ζ_{k}, y, z) as in the original CTMC. However, when the firing reaction is \({{\mathcal{R}}}_{{s}_{\ell }}\) for some ℓ = 1, …, (L − 1) then the state (x, y, z) moves to \((x,y^{\prime} ,z)\) where
Similarly, if the firing reaction is \({{\mathcal{R}}}_{{\bar{s}}_{r}}\) for some r = 1, …, R then the state (x, y, z) moves to \((x,y,z^{\prime} )\) where
Estimation of the Padé derivatives at ∞ (i.e. \({D}_{m}^{(\infty )}\) for m = 0, …, (ρ_{L} − 1)) requires several evaluations of functions of the form \({{\mathbb{A}}}^{m}f(x)\). This can be done recursively but it is computationally very intensive. In order to minimise these evaluations we exploit the fact that ergodic Markov chains visit the same set of states again and again. Therefore if we can intelligently store the values \({{\mathbb{A}}}^{m}f(x)\) generated by this function, and quickly retrieve them as needed, then it provides a way to leverage the vast memory resources in modern computers in order to gain computational efficiency. Fortunately, Python provides an ideal data structure, called a dictionary, for this purpose and we use it in our computational implementation to boost the efficiency of Padé PSD.
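The dictionary-based caching idea can be sketched as follows. This is a minimal illustration rather than the code in the repository: it assumes the standard generator form (𝔸g)(x) = Σ_k λ_k(x)(g(x + ζ_k) − g(x)), and the birth-death network, its rate constants and the function names are hypothetical.

```python
def make_generator_power(propensities, stoichiometries, f):
    """Return a function computing (A^m f)(x), with results cached in a dictionary."""
    cache = {}

    def apply(m, x):
        key = (m, x)
        if key in cache:              # reuse the value if this (order, state) was seen before
            return cache[key]
        if m == 0:
            value = f(x)
        else:
            # (A g)(x) = sum_k lambda_k(x) * (g(x + zeta_k) - g(x)), with g = A^(m-1) f
            base = apply(m - 1, x)
            value = 0.0
            for rate, zeta in zip(propensities, stoichiometries):
                lam = rate(x)
                if lam > 0.0:
                    shifted = tuple(xi + zi for xi, zi in zip(x, zeta))
                    value += lam * (apply(m - 1, shifted) - base)
        cache[key] = value
        return value

    return apply, cache

# Hypothetical birth-death network: production at rate k, degradation at rate gamma * x.
k, gamma = 10.0, 0.5
propensities = [lambda x: k, lambda x: gamma * x[0]]
stoichiometries = [(1,), (-1,)]
A_pow, cache = make_generator_power(propensities, stoichiometries, lambda x: float(x[0]))

print(A_pow(1, (4,)))   # k - gamma * x = 8.0
print(A_pow(2, (4,)))   # -gamma * (k - gamma * x) = -4.0
print("cached entries:", len(cache))
```

States are stored as tuples so that they can serve as dictionary keys. Since an ergodic chain keeps returning to the same states, most evaluations along a long trajectory become dictionary lookups instead of fresh recursions.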
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Data availability
Custom code was written in Python for data generation. This code is publicly available at the indicated GitHub repository^{84}.
Code availability
The Python code for data generation and analysis can be downloaded from the GitHub repository: https://github.com/ankitgupta83/PadePSD_python.git^{84}.
References
Shaner, N. C., Steinbach, P. A. & Tsien, R. Y. A guide to choosing fluorescent proteins. Nat. Methods 2, 905–909 (2005).
Mullassery, D., Horton, C. A., Wood, C. D. & White, M. R. Single live cell imaging for systems biology. Essays Biochem. 45, 121 (2008).
Norman, T. M., Lord, N. D., Paulsson, J. & Losick, R. Memory and modularity in cell-fate decision making. Nature 503, 481–486 (2013).
Kaiser, M. et al. Monitoring single-cell gene regulation under dynamically controllable conditions with integrated microfluidics and software. Nat. Commun. 9, 1–16 (2018).
Potvin-Trottier, L., Luro, S. & Paulsson, J. Microfluidics and single-cell microscopy to study stochastic processes in bacteria. Curr. Opin. Microbiol. 43, 186–192 (2018).
Goutsias, J. Classical versus stochastic kinetics modeling of biochemical reaction systems. Biophys. J. 92, 2350–2365 (2007).
Anderson, D. & Kurtz, T. in Design and Analysis of Biomolecular Circuits (eds Koeppl, H., Setti, G., di Bernardo, M. & Densmore, D.) (Springer-Verlag, 2011).
McAdams, H. H. & Arkin, A. Stochastic mechanisms in gene expression. Proc. Natl. Acad. Sci., Biochem. 94, 814–819 (1997).
Arkin, A. P., Rao, C. V. & Wolf, D. M. Control, exploitation and tolerance of intracellular noise. Nature 420, 231–237 (2002).
Fraker, P. J., King, L. E., Lill-Elghanian, D. & Telford, W. G. in Methods in Cell Biology Vol. 46, 57–76 (Elsevier, 1995).
Geva-Zatorsky, N., Dekel, E., Batchelor, E., Lahav, G. & Alon, U. Fourier analysis and systems identification of the p53 feedback loop. Proc. Natl Acad. Sci. USA 107, 13550–13555 (2010).
Bratsun, D., Volfson, D., Tsimring, L. S. & Hasty, J. Delay-induced stochastic oscillations in gene regulation. Proc. Natl Acad. Sci. USA 102, 14593–14598 (2005).
McKane, A. J., Nagy, J. D., Newman, T. J. & Stefanini, M. O. Amplified biochemical oscillations in cellular systems. J. Stat. Phys. 128, 165–191 (2007).
Warren, P. B., Tănase-Nicola, S. & ten Wolde, P. R. Exact results for noise power spectra in linear biochemical reaction networks. J. Chem. Phys. 125, 144904 (2006).
van Kampen, N. G. A power series expansion of the master equation. Can. J. Phys. 39, 551–567 (1961).
Gillespie, D. T. The chemical Langevin equation. J. Chem. Phys. 113, 297–306 (2000).
Simpson, M. L., Cox, C. D. & Sayler, G. S. Frequency domain analysis of noise in autoregulated gene circuits. Proc. Natl Acad. Sci. USA 100, 4551–4556 (2003).
Cox, C. D. et al. Frequency domain analysis of noise in simple gene circuits. Chaos: Interdisciplinary J. Nonlinear Sci. 16, 026102 (2006).
Simpson, M. L., Cox, C. D. & Sayler, G. S. Frequency domain chemical Langevin analysis of stochasticity in gene transcriptional regulation. J. Theoret. Biol. 229, 383–394 (2004).
Tănase-Nicola, S., Warren, P. B. & Ten Wolde, P. R. Signal detection, modularity, and the correlation between extrinsic and intrinsic noise in biochemical networks. Phys. Rev. Lett. 97, 068102 (2006).
Thomas, P., Straube, A. V., Timmer, J., Fleck, C. & Grima, R. Signatures of nonlinearity in single cell noise-induced oscillations. J. Theoret. Biol. 335, 222–234 (2013).
Thomas, P., Fleck, C., Grima, R. & Popović, N. System size expansion using Feynman rules and diagrams. J. Phys. A: Math. Theoret. 47, 455007 (2014).
Borkowski, O., Ceroni, F., Stan, G.-B. & Ellis, T. Overloaded and stressed: whole-cell considerations for bacterial synthetic biology. Curr. Opin. Microbiol. 33, 123–130 (2016).
Kurtz, T. G. Strong approximation theorems for density dependent Markov chains. Stoch. Process. Appl. 6, 223–240 (1978).
Thomas, P., Matuschek, H. & Grima, R. How reliable is the linear noise approximation of gene regulatory networks? BMC Genom. 14, 1–15 (2013).
Song, S. et al. Frequency spectrum of chemical fluctuation: a probe of reaction mechanism and dynamics. PLoS Comput. Biol. 15, e1007356 (2019).
Jia, C., Zhang, M. Q. & Qian, H. Analytic theory of stochastic oscillations in single-cell gene expression. Preprint at https://arxiv.org/abs/1909.09769 (2019).
Kato, T. Perturbation Theory for Linear Operators Vol. 132 (Springer Science & Business Media, 2013).
Marano, M. & Cuenya, H. Progress in Approximation Theory 693–701 (Academic Press, 1991).
Cao, Z. & Grima, R. Linear mapping approximation of gene regulatory networks with stochastic dynamics. Nat. Commun. 9, 1–15 (2018).
Cooley, J. W. & Tukey, J. W. An algorithm for the machine calculation of complex Fourier series. Math. Comput. 19, 297–301 (1965).
Engelberg, S. Digital Signal Processing: an Experimental Approach (Springer Science & Business Media, 2008).
Nyquist, H. Certain topics in telegraph transmission theory. Trans. Am. Inst. Elect. Eng. 47, 617–644 (1928).
Ma, W., Trusina, A., El-Samad, H., Lim, W. A. & Tang, C. Defining network topologies that can achieve biochemical adaptation. Cell 138, 760–773 (2009).
Elowitz, M. B. & Leibler, S. A synthetic oscillatory network of transcriptional regulators. Nature 403, 335–338 (2000).
Briat, C., Gupta, A. & Khammash, M. Antithetic integral feedback ensures robust perfect adaptation in noisy biomolecular networks. Cell Sys. 2, 15–26 (2016).
Jia, C. & Grima, R. Frequency domain analysis of fluctuations of mRNA and protein copy numbers within a cell lineage: theory and experimental validation. Physical Review X 11, 021032 (2021).
Ethier, S. N. & Kurtz, T. G. Markov Processes: Characterization and Convergence (John Wiley & Sons Inc., 1986).
Gillespie, D. T. Exact stochastic simulation of coupled chemical reactions. J. Phys. Chem. 81, 2340–2361 (1977).
Gupta, A., Briat, C. & Khammash, M. A scalable computational framework for establishing long-term behavior of stochastic reaction networks. PLoS Comput. Biol. 10, e1003669 (2014).
Gupta, A. & Khammash, M. Computational identification of irreducible state-spaces for stochastic reaction networks. SIAM J. Appl. Dyn. Syst. 17, 1213–1266 (2018).
Gardiner, C. W. et al. Handbook of Stochastic Methods Vol. 3 (Springer Berlin, 1985).
Claessens, G. On the Newton-Padé approximation problem. J. Approx. Theory 22, 150–160 (1978).
Brezinski, C. Computational Aspects of Linear Control Vol. 1 (Springer Science & Business Media, 2002).
Norris, J. R. Markov Chains (Cambridge University Press, 1998).
Thattai, M. & Van Oudenaarden, A. Intrinsic noise in gene regulatory networks. Proc. Natl Acad. Sci. USA 98, 8614–8619 (2001).
Bergen, V., Lange, M., Peidli, S., Wolf, F. A. & Theis, F. J. Generalizing RNA velocity to transient cell states through dynamical modeling. Nat. Biotechnol. 38, 1408–1414 (2020).
Franklin, G. F., Powell, J. D., Emami-Naeini, A. & Powell, J. D. Feedback Control of Dynamic Systems Vol. 4 (Prentice hall Upper Saddle River, 2002).
Potvin-Trottier, L., Lord, N. D., Vinnicombe, G. & Paulsson, J. Synchronous long-term oscillations in a synthetic gene circuit. Nature 538, 514–517 (2016).
Goldbeter, A. & Koshland, D. E. An amplified sensitivity arising from covalent modification in biological systems. Proc. Natl Acad. Sci. USA 78, 6840–6844 (1981).
Briat, C., Zechner, C. & Khammash, M. Design of a synthetic integral feedback circuit: dynamic analysis and DNA implementation. ACS Synth. Biol. 5, 1108–1116 (2016).
Qian, Y. & Del Vecchio, D. Realizing ‘integral control’ in living cells: how to overcome leaky integration due to dilution? J. R. Soc. Interface 15, 20170902 (2018).
Samaniego, C. C. & Franco, E. An ultrasensitive biomolecular network for robust feedback control. IFAC-PapersOnLine 50, 10950–10956 (2017).
Annunziata, F. et al. An orthogonal multi-input integration system to control gene expression in Escherichia coli. ACS Synth. Biol. 6, 1816–1824 (2017).
Kelly, C. L. et al. Synthetic negative feedback circuits using engineered small RNAs. Nucleic Acids Res. 46, 9875–9889 (2018).
Hsiao, V., Swaminathan, A. & Murray, R. M. Control theory for synthetic biology: recent advances in system characterization, control design, and controller implementation for synthetic biology. IEEE Control Syst. Magazine 38, 32–62 (2018).
Ceroni, F. et al. Burden-driven feedback control of gene expression. Nat. Methods 15, 387 (2018).
Huang, H.-H., Qian, Y. & Del Vecchio, D. A quasi-integral controller for adaptation of genetic modules to variable ribosome demand. Nat. Commun. 9, 5415 (2018).
Agrawal, D. K., Marshall, R., Noireaux, V. & Sontag, E. D. In vitro implementation of robust gene regulation in a synthetic biomolecular integral controller. Nat. Commun. 10, 5760 (2019).
Aoki, S. K. et al. A universal biomolecular integral feedback controller for robust perfect adaptation. Nature 570, 533–537 (2019).
Venayak, N., Anesiadis, N., Cluett, W. R. & Mahadevan, R. Engineering metabolism through dynamic control. Curr. Opin. Biotechnol. 34, 142–152 (2015).
Cress, B. F., Trantas, E. A., Ververidis, F., Linhardt, R. J. & Koffas, M. A. Sensitive cells: enabling tools for static and dynamic control of microbial metabolic pathways. Curr. Opin. Biotechnol. 36, 205–214 (2015).
Ye, H. & Fussenegger, M. Synthetic therapeutic gene circuits in mammalian cells. FEBS Lett. 588, 2537–2544 (2014).
Olsman, N., Xiao, F. & Doyle, J. C. Architectural principles for characterizing the performance of antithetic integral feedback networks. iScience 14, 277–291 (2019).
Briat, C., Gupta, A. & Khammash, M. Antithetic proportional-integral feedback for reduced variance and improved control performance of stochastic reaction networks. J. R. Soc. Interface 15, 20180079 (2018).
Chen, D. & Arkin, A. P. Sequestration-based bistability enables tuning of the switching boundaries and design of a latch. Mol. Syst. Biol. 8, 620 (2012).
Lillacci, G., Aoki, S. K., Schweingruber, D. & Khammash, M. A synthetic integral feedback controller for robust tunable regulation in bacteria. Preprint at BioRxiv https://doi.org/10.1101/170951 (2017).
Hsiao, V., De Los Santos, E. L., Whitaker, W. R., Dueber, J. E. & Murray, R. M. Design and implementation of a biomolecular concentration tracker. ACS Synth. Biol. 4, 150–161 (2014).
De Jonge, N. et al. Rejuvenation of CcdB-poisoned gyrase by an intrinsically disordered protein domain. Mol. Cell 35, 154–163 (2009).
Kumar, S., Rullan, M. & Khammash, M. Rapid prototyping and design of cybergenetic single-cell controllers. Nat. Commun. 12, 1–13 (2021).
Pikovsky, A., Kurths, J., Rosenblum, M. & Kurths, J. Synchronization: a Universal Concept in Nonlinear Sciences Vol. 12 (Cambridge University Press, 2003).
Bagheri, N., Taylor, S. R., Meeker, K., Petzold, L. R. & Doyle III, F. J. Synchrony and entrainment properties of robust circadian oscillators. J. R. Soc. Interface 5, S17–S28 (2008).
Beta, C. & Kruse, K. Intracellular oscillations and waves. Ann. Rev. Condens. Matter Phys. 8, 239–264 (2017).
Purvis, J. E. & Lahav, G. Encoding and decoding cellular information through signaling dynamics. Cell 152, 945–956 (2013).
Rullan, M., Benzinger, D., Schmidt, G. W., Milias-Argeitis, A. & Khammash, M. An optogenetic platform for real-time, single-cell interrogation of stochastic transcriptional regulation. Mol. Cell 70, 745–756 (2018).
Benzinger, D. & Khammash, M. Pulsatile inputs achieve tunable attenuation of gene expression variability and graded multi-gene regulation. Nat. Commun. 9, 1–10 (2018).
Warne, D. J., Baker, R. E. & Simpson, M. J. Simulation and inference algorithms for stochastic biochemical reaction networks: from basic concepts to state-of-the-art. J. R. Soc. Interface 16, 20180943 (2019).
Cao, Y., Gillespie, D. T. & Petzold, L. R. Efficient step size selection for the tau-leaping simulation method. J. Chem. Phys. 124, 044109 (2006).
Anderson, D. F. & Higham, D. J. Multilevel Monte Carlo for continuous time Markov chains, with applications in biochemical kinetics. Multiscale Model. Simul. 10, 146–179 (2012).
Kang, H.-W. & Kurtz, T. G. Separation of time-scales and model reduction for stochastic reaction networks. Ann. Appl. Probab. 23, 529–583 (2013).
Hepp, B., Gupta, A. & Khammash, M. Adaptive hybrid simulations for multiscale stochastic reaction networks. J. Chem. Phys. 142, 034118 (2015).
Cao, Y., Gillespie, D. & Petzold, L. The slow-scale stochastic simulation algorithm. J. Chem. Phys. 122, 1–18 (2005).
E, W., Liu, D. & Vanden-Eijnden, E. Nested stochastic simulation algorithms for chemical kinetic systems with multiple time scales. J. Comput. Phys. 221, 158–180 (2007).
Gupta, A. Frequency spectra and the color of cellular noise. GitHub Repository, https://doi.org/10.5281/zenodo.6598550 (2022).
Khintchine, A. Korrelationstheorie der stationären stochastischen prozesse. Math. Ann. 109, 604–615 (1934).
Acknowledgements
This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme grant agreement no. 743269 (CyberGenetics project), and from the Swiss National Science Foundation under grant number 182653.
Author information
Contributions
A.G. and M.K. conceived the project; A.G. performed the theoretical and computational analysis with inputs from M.K.; A.G. and M.K. wrote the paper; M.K. secured the funding.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Gupta, A., Khammash, M. Frequency spectra and the color of cellular noise. Nat Commun 13, 4305 (2022). https://doi.org/10.1038/s41467-022-31263-x
DOI: https://doi.org/10.1038/s41467-022-31263-x