## Abstract

Gene-gene interactions are crucial to the control of sub-cellular processes but our understanding of their stochastic dynamics is hindered by the lack of simulation methods that can accurately and efficiently predict how the distributions of gene product numbers vary across parameter space. To overcome these difficulties, here we present Holimap (high-order linear-mapping approximation), an approach that approximates the protein or mRNA number distributions of a complex gene regulatory network by the distributions of a much simpler reaction system. We demonstrate Holimap’s computational advantages over conventional methods by applying it to predict the stochastic time-dependent dynamics of various gene networks, including transcriptional networks ranging from simple autoregulatory loops to complex randomly connected networks, post-transcriptional networks, and post-translational networks. Holimap is ideally suited to study how the intricate network of gene-gene interactions results in precise coordination and control of gene expression.

## Introduction

Genetic regulation occurs through intricate interactions between a number of genes^{1,2,3,4}. A gene “X” may express a protein which acts as a transcription factor (TF), promoting or inhibiting RNA polymerase assembly on another target gene “Y” (or on itself) and thus regulating the extent to which the latter is expressed^{5}. These gene-gene interactions can be simply visualized as a directed graph with the genes being the nodes (vertices) and the directed edges (links) representing the interactions^{6,7}. Networks inferred from gene expression data, commonly called gene regulatory networks^{8}, have been reconstructed by several methods^{9,10,11,12,13}. The complex connectivity of these networks makes intuitive understanding of their dynamics challenging. Consequently, the construction, mathematical analysis, and simulation of models of gene regulatory networks are indispensable tools in a quantitative biologist’s arsenal.

Several formalisms have been employed to predict gene regulatory network dynamics, including Boolean networks, ordinary differential equations (ODEs), and chemical master equations (CMEs)—for reviews covering these approaches and more, please see refs. ^{14,15}. These approaches have various advantages and disadvantages. In Boolean networks, the expression of each gene is tracked by a binary variable and hence large networks can be examined in a computationally efficient way. A more refined description is provided by the use of ODEs, where the time-dependent concentrations of RNAs, proteins, and other molecules are predicted as a function of the rate constants of the reactions in the network^{16,17}. An even more realistic description makes use of the CME approach where one predicts not only the mean expression levels of various genes but also the distributions of the discrete numbers of mRNAs and/or proteins measured across a population of cells^{18}. This stochasticity has various sources (biological intrinsic and extrinsic noise, and technical noise introduced by experimental protocols), all of which lead to the large differences in gene expression observed from one cell to another^{19,20,21}.

Unfortunately, with an increasing level of sophistication and predictive power, simulations also rapidly become computationally expensive. Unraveling the stochastic dynamics of gene networks requires solving a set of coupled CMEs for the probability of the system being in each possible state. Since the number of states of a gene network is typically infinite, direct solution of these equations is impossible. The finite-state projection algorithm (FSP)^{22} truncates the infinite state space to a finite one; this renders numerical solutions possible because we only need to solve a finite-dimensional CME. However, the immense number of states limits its applications to very small networks with one or two interacting genes. For larger networks with multiple interacting genes, Monte Carlo simulations based on the stochastic simulation algorithm (SSA)^{23} become more practical. Specifically, given the current state of the system, the SSA generates two random numbers to determine when the next reaction event occurs and which reaction it is. The output is a number of statistically correct trajectories (molecule number versus time data), one for each cell, from which the copy number distributions of all biochemical species can be calculated. However, the issue remains that a large sample size is typically required to obtain smooth distributions and hence the computational time can still be considerable. For an introduction to simulation methods in stochastic biology, we refer the reader to refs. ^{24,25,26}.
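The two-random-number step of the SSA described above can be sketched as follows. This is a minimal illustration of the direct method on a simple birth-death process, not the implementation used in this work; the function name `gillespie`, the example rates `rho` and `d`, and the end time are all illustrative assumptions.

```python
import numpy as np

def gillespie(x0, stoich, propensities, t_end, rng):
    """Direct-method SSA: returns event times and the state after each event."""
    t, x = 0.0, np.array(x0, dtype=int)
    times, states = [t], [x.copy()]
    while True:
        a = propensities(x)          # propensity of each reaction in state x
        a0 = a.sum()
        if a0 <= 0:                  # no reaction can fire any more
            break
        # First random number: exponential waiting time to the next event.
        t += rng.exponential(1.0 / a0)
        if t > t_end:
            break
        # Second random number: which reaction fires, chosen with
        # probability proportional to its propensity.
        j = rng.choice(len(a), p=a / a0)
        x = x + stoich[j]
        times.append(t)
        states.append(x.copy())
    return np.array(times), np.array(states)

# Hypothetical example: protein synthesis at rate rho, decay at rate d per molecule.
rho, d = 20.0, 1.0
stoich = np.array([[+1], [-1]])                       # one row per reaction
prop = lambda x: np.array([rho, d * x[0]])
times, states = gillespie([0], stoich, prop, t_end=50.0,
                          rng=np.random.default_rng(0))
```

Each call produces one trajectory, i.e., one simulated cell; a copy number distribution is obtained by repeating the call many times and histogramming the states at the time of interest.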

In this paper, we overcome the difficulties of conventional stochastic simulation methods for gene networks by devising an efficient approach—the high-order linear-mapping approximation (Holimap). The basic idea is to map the dynamics of a complex gene network with second or higher-order interactions (a system with nonlinear propensities and hence a nonlinear network) to the dynamics of a much simpler system where all reactions are first-order (a linear network). The reaction rates of this system are generally time-dependent and complex functions of the reaction rates of the original gene network and they are found by conditional moment-matching. The linear network has a much smaller state space than the nonlinear network which means that now simulation using FSP becomes feasible, leading to smooth distributions of protein numbers in a fraction of the time taken by SSA simulations. For an illustration of Holimap see Fig. 1.

The paper is structured as follows. The Holimap method is introduced by means of a simple autoregulatory feedback loop example where we show step-by-step how the approximation is constructed when second or higher-order interactions are only between a protein and a gene. The method is then extended to show the application to more complex networks with multiple protein-gene interactions and also to networks with gene product interactions such as those with RNA-RNA, RNA-protein, and protein-protein high-order reactions. By comparison with the SSA or FSP, we show that independent of the type of interactions in a gene network, Holimap provides highly accurate time-dependent distributions of protein or mRNA numbers over large swathes of parameter space including those regions where the system displays oscillatory or multistable dynamics. Finally, we show that the computation time of Holimap can be significantly reduced while maintaining its accuracy by devising a hybrid method which combines both Holimap and the SSA.

## Results

### Fundamental principles of Holimap illustrated by an autoregulation example

Consider a simple autoregulatory feedback loop^{27,28}, whereby protein expressed from a gene regulates its own transcription (Fig. 2a). Feedback is mediated by cooperative binding of *h* protein copies to the gene^{29,30,31,32}. In agreement with experiments^{33}, protein synthesis is assumed to occur in bursts of random size *k* sampled from a geometric distribution with parameter *p*, i.e., \({\mathbb{P}}(k=n)={p}^{n}(1-p)\). Here *σ*_{b} is the binding rate of protein to the gene; *σ*_{u} is the unbinding rate; *ρ*_{b} and *ρ*_{u} are the burst frequencies of protein, i.e., the frequencies with which bursts are produced, when the gene is in the bound and unbound states, respectively; *d* is the rate of protein degradation and dilution (due to cell division). The reaction system describes a positive feedback loop when *ρ*_{b} > *ρ*_{u} (since in this case, binding of a protein increases its own expression) and describes a negative feedback loop when *ρ*_{b} < *ρ*_{u} (binding of a protein decreases its own expression).

Let *p*_{i,n} denote the probability of having *n* protein copies in an individual cell when the gene is in state *i* with *i* = 0, 1 corresponding to the unbound and bound states, respectively. To proceed, let \({g}_{i}={\sum }_{n=0}^{\infty }{p}_{i,n}\) be the probability of observing the gene in state *i* and let \({\mu }_{m,i}=\mathop{\sum }_{n=0}^{\infty }n(n-1)\cdots (n-m+1){p}_{i,n}\) be the *m*th factorial moment of protein numbers when the gene is in this state. For simplicity, we first focus on the case of non-cooperative binding (*h* = 1). From the CME, it is straightforward to obtain the following time evolution equations for the moments:

where *g*_{1} = 1 − *g*_{0} and *B* = 〈*k*〉 = *p*/(1 − *p*) is the mean protein burst size, i.e., the mean number of protein molecules produced in a single burst. For clarity, we have suppressed the explicit time dependence of all moments. Note that this system of equations is not closed, i.e., the equation for a moment of a certain order depends on moments of higher orders, and hence an exact solution is generally impossible. This difficulty stems from the nonlinear dependence on molecule numbers of the bimolecular propensity modeling protein-gene interactions^{34}.
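As a quick check of the burst-size statistics, the sketch below evaluates the geometric distribution \({\mathbb{P}}(k=n)={p}^{n}(1-p)\) and confirms numerically that its mean equals *B* = *p*/(1 − *p*). The function names and the value `p = 0.6` are illustrative assumptions. Note that NumPy's geometric sampler counts trials up to the first success (support 1, 2, …), so it has to be called with success probability 1 − *p* and shifted by one to match the convention used here.

```python
import numpy as np

def burst_pmf(p, n_max):
    """P(k = n) = p**n * (1 - p) for n = 0, 1, ..., n_max."""
    n = np.arange(n_max + 1)
    return (1.0 - p) * p ** n

def sample_bursts(p, size, rng):
    """Draw burst sizes on {0, 1, 2, ...} with P(k = n) = p**n * (1 - p)."""
    # rng.geometric(q) has support {1, 2, ...}; use q = 1 - p and shift by one.
    return rng.geometric(1.0 - p, size=size) - 1

p = 0.6                                   # illustrative value
pmf = burst_pmf(p, 400)                   # tail beyond 400 is negligible here
B_numeric = np.sum(np.arange(401) * pmf)  # mean from the (truncated) pmf
B_formula = p / (1.0 - p)                 # mean burst size B from the text
samples = sample_bursts(p, 10_000, np.random.default_rng(0))
```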

In contrast, a linear gene network (one composed of only first-order reactions, i.e., the propensity of each reaction has a linear dependence on molecule numbers) is much easier to solve both analytically and numerically than a gene network with nonlinear propensities; for example, the moment equations are closed and thus can be solved exactly in this case. The basic idea of the linear-mapping approximation (LMA) developed in ref. ^{35} is to transform a complex nonlinear network into a linear one by replacing all second or higher-order reactions between proteins and genes by effective first-order reactions. Specifically, for the network in Fig. 2a, we replace the reactions *G* + *h**P* *⇌* *G*^{*} by *G* *⇌* *G*^{*}. The LMA maps the nonlinear network to the linear one shown in Fig. 2b, where the binding rate *σ*_{b} for the former is replaced by the effective gene switching rate \({\hat{\sigma }}_{b}\) for the latter, while the other parameters remain unchanged. In the LMA, \({\hat{\sigma }}_{b}\) is chosen to be *σ*_{b} multiplied by the conditional mean of protein numbers in the unbound gene state, i.e.,

\[{\hat{\sigma }}_{b}={\sigma }_{b}\,\frac{{\mu }_{1,0}}{{g}_{0}},\qquad (2)\]

where *g*_{0} and *μ*_{1,0} can be calculated by a natural moment-closure method (“Methods”)^{35}. There are two approximations involved in the LMA: (i) in reality, the effective parameter \({\hat{\sigma }}_{b}\) should be stochastic rather than deterministic since it is proportional to the instantaneous protein number in the unbound state; (ii) any moment-closure method inevitably leads to some errors^{36}.

Next we propose an efficient method—Holimap, which we will show to perform much better than the LMA. There are two types of Holimaps. The first type is the 2-parameter Holimap (2-HM) which transforms the nonlinear gene network into the linear one illustrated in Fig. 2c, where both the binding and unbinding rates *σ*_{b} and *σ*_{u} for the former are replaced by the effective gene switching rates \({\widetilde{\sigma }}_{b}\) and \({\widetilde{\sigma }}_{u}\) for the latter. The remaining question is how to determine \({\widetilde{\sigma }}_{b}\) and \({\widetilde{\sigma }}_{u}\) so that the solution of the linear network accurately approximates that of the nonlinear one. For the linear network, the evolution of moments is governed by

The effective rates \({\widetilde{\sigma }}_{b}\) and \({\widetilde{\sigma }}_{u}\) are chosen so that the two systems have the same zeroth and first-order moment equations (for the latter, we mean the first-order moment when the gene is in the bound state). Matching the first and third identities in Eqs. (1) and (3), we find that \({\widetilde{\sigma }}_{b}\) and \({\widetilde{\sigma }}_{u}\) should satisfy

It remains to turn these equations into formulae for the effective rates. This can be done as follows: we first solve Eq. (4) for \({\widetilde{\sigma }}_{b}\) and \({\widetilde{\sigma }}_{u}\) as functions of the moments, and then substitute these expressions into Eq. (3) to obtain a closed set of moment equations. These equations can be solved for the values of all zeroth, first, and second-order moments, i.e., *g*_{i}, *μ*_{1,i}, and *μ*_{2,i}. Finally, substituting these moments back into Eq. (4) gives the values of the effective parameters \({\widetilde{\sigma }}_{b}\) and \({\widetilde{\sigma }}_{u}\) for the linear network. See Supplementary Note 2 for a more detailed explanation of the Holimap algorithm.
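To make the matching step concrete, the following sketch writes it out for *h* = 1 under an assumption that is not stated in this section: protein binding is taken to leave the free protein count unchanged, so that the binding propensity in the unbound state is *σ*_{b}*n*. The identities below are derived from the CME under that assumption and may differ in detail from the numbered equations referred to in the text.

```latex
% Nonlinear network (assumed: binding does not consume free protein)
\begin{aligned}
\dot g_0 &= \sigma_u g_1 - \sigma_b \mu_{1,0}, \\
\dot \mu_{1,1} &= \rho_b B g_1 + \sigma_b(\mu_{1,0} + \mu_{2,0}) - (\sigma_u + d)\,\mu_{1,1}.
\end{aligned}

% Linear network with effective switching rates
\begin{aligned}
\dot g_0 &= \widetilde{\sigma}_u g_1 - \widetilde{\sigma}_b g_0, \\
\dot \mu_{1,1} &= \rho_b B g_1 + \widetilde{\sigma}_b \mu_{1,0} - (\widetilde{\sigma}_u + d)\,\mu_{1,1}.
\end{aligned}

% Equating the right-hand sides gives a 2x2 linear system for the effective rates
\begin{aligned}
\widetilde{\sigma}_b g_0 - \widetilde{\sigma}_u g_1 &= r_1 := \sigma_b \mu_{1,0} - \sigma_u g_1, \\
\widetilde{\sigma}_b \mu_{1,0} - \widetilde{\sigma}_u \mu_{1,1} &= r_2 := \sigma_b(\mu_{1,0} + \mu_{2,0}) - \sigma_u \mu_{1,1},
\end{aligned}

% whose solution, with \Delta = g_1 \mu_{1,0} - g_0 \mu_{1,1} \neq 0, is
\widetilde{\sigma}_b = \frac{g_1 r_2 - \mu_{1,1} r_1}{\Delta}, \qquad
\widetilde{\sigma}_u = \frac{g_0 r_2 - \mu_{1,0} r_1}{\Delta}.
```

Because \(\Delta = g_0 g_1(\langle n\,|\,0\rangle - \langle n\,|\,1\rangle)\), the effective rates are uniquely determined only when the conditional mean protein numbers in the two gene states differ. The second-order moment *μ*_{2,0} enters through *r*_{2}, which is why the effective rates depend on moments up to second order when *h* = 1.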

In steady-state, the values of \({\widetilde{\sigma }}_{b}\) and \({\widetilde{\sigma }}_{u}\) are constants independent of time, and hence we can use the steady-state protein distribution of the linear network to approximate that of the nonlinear one—this can be computed analytically^{37} or using FSP. When the system has not reached steady-state, the values of \({\widetilde{\sigma }}_{b}\) and \({\widetilde{\sigma }}_{u}\) depend on time *t*. In this case, we can use the time evolution of the linear network with time-dependent rates to predict that of the nonlinear one—while analytical solutions are not generally available in this case, the distributions can be efficiently computed using FSP.
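The steady-state computation for the linear network can be sketched as follows. This is a simplified illustration rather than the FSP implementation used in this work: the state space is truncated at `n_max` protein copies, bursts that would exceed the truncation are discarded so that probability is conserved, and the steady state is found by solving the resulting finite linear system. The function name, argument names, and parameter values are assumptions made for the example.

```python
import numpy as np

def fsp_steady_state(rho_u, rho_b, sigma_b_eff, sigma_u_eff, d, p, n_max=200):
    """Steady state of the linear network on a truncated state space.

    Returns P of shape (2, n_max + 1), with P[i, n] approximating p_{i,n}
    (i = 0 unbound, i = 1 bound).
    """
    size = n_max + 1
    dim = 2 * size
    Q = np.zeros((dim, dim))                    # convention: dP/dt = Q @ P
    rho = (rho_u, rho_b)
    switch_out = (sigma_b_eff, sigma_u_eff)     # rate of leaving state 0, 1
    burst = (1.0 - p) * p ** np.arange(size)    # P(k = 0), P(k = 1), ...

    for i in (0, 1):
        off = i * size                          # offset of gene state i
        for n in range(size):
            src = off + n
            # Bursts n -> n + k with k >= 1, kept inside the truncated space.
            k_max = n_max - n
            if k_max >= 1:
                rates = rho[i] * burst[1:k_max + 1]
                Q[src + 1:src + k_max + 1, src] += rates
                Q[src, src] -= rates.sum()
            # Degradation n -> n - 1.
            if n > 0:
                Q[src - 1, src] += d * n
                Q[src, src] -= d * n
            # Gene switching with effective first-order rates.
            dst = (1 - i) * size + n
            Q[dst, src] += switch_out[i]
            Q[src, src] -= switch_out[i]

    # Solve Q P = 0 together with the normalisation sum(P) = 1.
    A = Q.copy()
    A[-1, :] = 1.0
    b = np.zeros(dim)
    b[-1] = 1.0
    return np.linalg.solve(A, b).reshape(2, size)

P = fsp_steady_state(rho_u=2.0, rho_b=10.0, sigma_b_eff=1.0,
                     sigma_u_eff=1.0, d=1.0, p=0.5)
protein_dist = P.sum(axis=0)                    # marginal protein distribution
mean_fsp = np.sum(np.arange(P.shape[1]) * protein_dist)
```

A useful sanity check is that the mean of the linear network satisfies ⟨*n*⟩ = *B*(*ρ*_{u}*g*_{0} + *ρ*_{b}*g*_{1})/*d*, which follows from summing its two first-moment equations at steady state. For time-dependent effective rates, the same matrix would be rebuilt at each time step and the truncated CME integrated in time instead.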

In some regions of parameter space, the 2-HM may still not be accurate enough. To solve this problem, we devise a second type of Holimap—the 4-parameter Holimap (4-HM), which transforms the nonlinear network into the linear one illustrated in Fig. 2d. Here the binding rate *σ*_{b}, unbinding rate *σ*_{u}, and the protein burst frequencies *ρ*_{b} and *ρ*_{u} for the former are replaced by four effective parameters \({\bar{\sigma }}_{b},{\bar{\sigma }}_{u},{\bar{\rho }}_{b}\), and \({\bar{\rho }}_{u}\) for the latter, which can be determined by matching the moment equations for the two networks (“Methods”). Note that while for the 2-HM, we matched only the zeroth and first-order moments, for the 4-HM, we matched these and also the second-order moments. The 2-HM and 4-HM will be collectively referred to as Holimaps in what follows.

Thus far, we have only considered the case of *h* = 1. For the case of cooperative binding (*h* ≥ 2), the Holimap approximation procedure can be similarly performed, except that higher-order moment equations need to be solved (Supplementary Note 2)—the algorithm for finding the effective parameters requires the solution of (*h* + 1)-order moment equations. For example, when *h* = 2, third-order moment equations need to be solved and the effective parameters depend on the values of zeroth, first, second, and third-order moments. We emphasize that the computational cost of Holimap is mainly determined by the number of moment equations, *L*, to be solved. For autoregulatory loops, *L* = 1 + 2*h* for the LMA and *L* = 3 + 2*h* for Holimap. Note that the 2-HM and 4-HM have the same *L*.

The principles used to construct Holimaps for autoregulated networks can be used to obtain Holimaps for an arbitrarily complex network consisting of a system of interacting genes that regulate each other via positive or negative feedback. A flow chart of the Holimap algorithm for a general regulatory network can be found in Supplementary Fig. 1. The computational time of Holimap depends on the complexity of the network—an increased number of nodes (genes) or edges (regulatory reactions) results in an increased number of moment equations *L* to be solved. In Supplementary Note 3, we prove that for a general network, *L* scales polynomially with the cooperativity *h* and scales exponentially with respect to the network size *M* (number of genes).

### Applications to one-node (autoregulatory) networks

We now assess the performance of Holimap based on the Hellinger distance (HD) between the steady-state protein distribution obtained by applying FSP to the nonlinear network and the approximate distribution computed using the LMA and the two types of Holimaps. Note that while the direct application of FSP also leads to an approximate distribution, in effect, it can be considered exact since the error is very small provided the state space is truncated to a large enough value^{22}. Here we choose the HD because it is bounded between 0 and 1; a visually accurate approximation is obtained when the HD ≪ 0.1.
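For reference, the Hellinger distance between two discrete distributions can be computed as in the sketch below. The standard definition HD = \(\sqrt{\tfrac{1}{2}\sum_n (\sqrt{p_n}-\sqrt{q_n})^2}\) is assumed here, since the formula is not written out in this section; the function name and the zero-padding of the shorter distribution are illustrative choices.

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two discrete distributions on {0, 1, 2, ...}."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Pad the shorter distribution with zeros so both share the same support.
    n = max(len(p), len(q))
    p = np.pad(p, (0, n - len(p)))
    q = np.pad(q, (0, n - len(q)))
    # Guard against tiny negative entries produced by numerical solvers.
    p = np.clip(p, 0.0, None)
    q = np.clip(q, 0.0, None)
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

hd_identical = hellinger([0.2, 0.5, 0.3], [0.2, 0.5, 0.3])   # identical inputs
hd_disjoint = hellinger([1.0, 0.0], [0.0, 1.0])              # disjoint supports
```

With this definition the distance is 0 for identical distributions and 1 for distributions with disjoint supports, which is the bounded range referred to above.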

Figure 2e illustrates the HD for the LMA as a function of *ρ*_{u} and *ρ*_{b}. Clearly, the LMA performs well when *ρ*_{u} and *ρ*_{b} are not very different from each other. However, it results in larger deviations from FSP when the protein burst frequency in one gene state is significantly larger than that in the other. We also find that the LMA is much more accurate for negative feedback loops (*ρ*_{u} > *ρ*_{b}) than for positive feedback loops (*ρ*_{b} > *ρ*_{u}). In the LMA, the effective stochastic parameter \({\hat{\sigma }}_{b}\) is approximated by *σ*_{b} multiplied by the conditional mean of protein numbers in the unbound state. Hence it must give rise to inaccurate approximations when protein noise in the unbound gene state is large. This is exactly what happens in the positive feedback case where the low synthesis rate in the unbound state results in a small conditional mean and thus large protein noise.

We next examine whether Holimap outperforms the LMA when it is applied to positive feedback loops. Figure 2f shows the HD against *σ*_{u}/*d* and *σ*_{b}/*d* for the LMA, 2-HM, and 4-HM when *ρ*_{b} ≫ *ρ*_{u}. It is clear that the LMA (Fig. 2f, left) performs well when *σ*_{u} and *σ*_{b} are both small, but it becomes highly inaccurate when *σ*_{u} and *σ*_{b} are larger. The protein distribution can be unimodal or bimodal. The bimodal one is of particular interest because it indicates the separation of isogenic cells into two different phenotypes. In particular, we find that the LMA results in poor approximations when *σ*_{u} ≥ *d* and when the distribution is bimodal. This can be explained as follows. Recall that the LMA transforms a nonlinear network into a linear one with unchanged *σ*_{u}, which is commonly known as the telegraph model of stochastic gene expression^{38}. In ref. ^{39}, it has been proved that the telegraph model can produce a bimodal steady-state distribution only when both gene switching rates are smaller than the protein decay rate (\({\sigma }_{u},{\hat{\sigma }}_{b} \, < \, d\)). When *σ*_{u} ≥ *d*, the linear network can never exhibit bimodality, while the bimodality in the nonlinear network can be apparent.

We emphasize that *σ*_{u} ≥ *d* is biologically relevant since in naturally occurring systems, protein is usually very stable^{40} and hence its decay rate is often smaller than the rates of gene state switching. For example, in mouse fibroblasts, it has been measured^{41} that the median protein half-life is 46 h and the mean cell cycle duration is 27.5 h; hence the mean decay rate of protein is estimated to be \(d=[(\log 2)/46+(\log 2)/27.5]\,{\rm{h}}^{-1}=6.7\times 10^{-4}\,{\min }^{-1}\). In the same cell type, the mean activation and inactivation rates for thousands of genes are estimated to be 0.002 min^{−1} and 0.24 min^{−1}^{42}. In another study, the mean activation and inactivation rates are estimated to be 0.014 min^{−1} and 0.17 min^{−1}^{43}. Hence *σ*_{u} ≥ *d* is indeed satisfied for most genes.
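The arithmetic behind this estimate can be reproduced in a few lines. The sketch below only restates the numbers quoted above (half-life, cell cycle duration, and the four switching rates); the variable names are illustrative.

```python
import math

half_life_h = 46.0      # median protein half-life (hours)
cell_cycle_h = 27.5     # mean cell cycle duration (hours)

# Effective decay rate = degradation + dilution, converted from per hour to per minute.
d_per_h = math.log(2) / half_life_h + math.log(2) / cell_cycle_h
d_per_min = d_per_h / 60.0

# Gene switching rates quoted in the text (per minute), relative to d.
switching_rates = [0.002, 0.24, 0.014, 0.17]
ratios = [rate / d_per_min for rate in switching_rates]
all_faster_than_decay = all(r > 1.0 for r in ratios)
```

All four quoted switching rates exceed the estimated decay rate, the slowest of them by roughly a factor of three.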

In contrast to the LMA, both the 2-HM and 4-HM markedly reduce the HD values (Fig. 2f, center and right). The LMA has a maximum HD of 0.7, while for the two types of Holimaps, the maximum HDs are only 0.2 and 0.16. The 4-HM performs marginally better than the 2-HM in capturing steady-state protein distributions. We also compare the region of parameter space where bimodality is predicted to exist (region enclosed by the orange curves) with the actual region where bimodality manifests according to FSP (region enclosed by the red curves). We note that while the LMA fails to capture the bimodal region of the protein distribution, especially when *σ*_{u} ≥ *d*, both the 2-HM and 4-HM capture the vast majority of the bimodal region. In summary, the deficiencies of the LMA for positive feedback loops are remedied by the use of Holimaps (Fig. 2g).

Finally, we examine how the cooperativity in protein binding affects the accuracy of various approximation methods. Figure 2h shows the maximum HD as a function of *h* for the LMA, 2-HM, and 4-HM, where the maximum HD is computed when *σ*_{u} and *σ*_{b} vary over large ranges and other parameters remain fixed. Clearly, for the LMA, the maximum HD increases approximately linearly with respect to *h* when *h* ≤ 4; for Holimaps, the maximum HD is insensitive to *h*. Since TF cooperativity is the norm rather than the exception^{5}, our results suggest Holimap’s accuracy remains high over the physiologically meaningful range of parameter values.

The results that we have presented assume steady-state conditions. However, the 2-HM can also accurately reproduce the time evolution of the protein distribution for nonlinear gene networks (Supplementary Fig. 2). The 4-HM is also accurate; however, depending on parameter values, it may lead to numerical instability at short times, which usually occurs when *σ*_{u} and *σ*_{b} are large for negative feedback loops (Supplementary Fig. 3). We did not observe numerical instability for the 2-HM. As a result, the 2-HM might be the preferable choice when dynamics is of major interest. In steady-state, the improvement in accuracy of the 4-HM may be marginal; nevertheless, since the two types of Holimaps require the solution of the same number of moment equations, the 4-HM is the more advantageous choice when dynamics is not of interest.

### Applications to two-node networks with deterministic mono- and bistability

We next evaluate the performance of Holimaps when applied to study the steady-state behavior of two-node gene networks, where two genes regulate each other (Fig. 3a, left). Feedback is mediated by cooperative binding of *h*_{1} copies of protein *P*_{1} to gene *G*_{2} and cooperative binding of *h*_{2} copies of protein *P*_{2} to gene *G*_{1}. Here *σ*_{bi} and *σ*_{ui} are the binding and unbinding rates for gene *G*_{i}, respectively; *ρ*_{bi} and *ρ*_{ui} are the synthesis rates of protein *P*_{i} when the gene is in the bound and unbound states, respectively; *d*_{i} is the degradation rate of protein *P*_{i}. For simplicity, we do not take protein bursting into account, although it can be included easily. Depending on whether *ρ*_{ui} < *ρ*_{bi} or *ρ*_{ui} > *ρ*_{bi} for *i* = 1, 2, there are four different types of effective system dynamics that constitute either a positive feedback or a negative feedback loop (Fig. 3b). For example, a toggle switch (two negative regulations)^{44} corresponds to the case of *ρ*_{u1} > *ρ*_{b1} and *ρ*_{u2} > *ρ*_{b2}. For two-node networks, Holimaps can be performed in a similar way as we have previously shown for autoregulatory loops, i.e., by replacing all protein-gene binding reactions by effective first-order reactions with new parameters and also allowing some of the other reactions to have different rate constants than those in the original network (Fig. 3a, center and right).

We first focus on a negative feedback loop without cooperative binding (Fig. 3c). Since the LMA performs well when the unbinding rate *σ*_{ui} is much smaller than the degradation rate *d*_{i}, here we consider the case of *σ*_{ui} ≫ *d*_{i}. We use the HD between the actual and approximate steady-state distributions of protein *P*_{1} to test the accuracy of Holimap. Figure 3d illustrates the HDs for the LMA and 4-HM as functions of *σ*_{b1} and *σ*_{b2}. We find that the network displays bimodality when *σ*_{b1} is large and *σ*_{b2} is small. This is surprising because in the literature there are two well-accepted origins for bimodality: (i) a positive feedback loop with ultra-sensitivity (type-I)^{44} and (ii) slow switching between gene states (type-II), independent of the type of feedback loop^{37}. Here the network is a negative feedback loop without cooperative binding, and thus there is neither a positive feedback loop nor ultra-sensitivity. Moreover, since both *σ*_{u1} and *σ*_{b1} are large, gene *G*_{1} switches rapidly between the two states. Hence the bimodality observed is neither type-I nor type-II, and in the following, we refer to it as type-III bimodality.

From Fig. 3d, it is clear that the LMA performs poorly in this bimodal region. Again, the LMA cannot capture type-III bimodality since it transforms the nonlinear network into a linear one with unchanged *σ*_{ui}, which is unable to produce a bimodal distribution when *σ*_{ui} ≥ *d*_{i}^{39}. On the other hand, the 4-HM significantly reduces the HD values and performs exceptionally well in capturing the bimodal region (Fig. 3e). Here we do not show the 2-HM because it leads to results similar to those of the 4-HM, except for a slightly larger HD value.

We next consider a toggle switch with cooperative binding, where two genes repress each other (Fig. 3f). Note that this is a positive feedback loop with ultra-sensitivity and hence it can produce deterministic bistability (type-I bimodality), which means that the corresponding system of deterministic rate equations (Supplementary Note 4) is capable of having two stable fixed points and one unstable fixed point. Again, we only focus on the situation of *σ*_{ui} ≫ *d*_{i}. Figure 3g illustrates the HDs for the LMA and 4-HM against *σ*_{b1} and *σ*_{b2}. The yellow curve encloses the region of deterministic bistability, which is markedly smaller than the true bimodal region enclosed by the red curve. According to simulations, bimodality can be observed when both *σ*_{b1} and *σ*_{b2} are large. The LMA fails to reproduce the bimodal distribution since *σ*_{ui} ≥ *d*_{i}, as expected. The 4-HM not only successfully captures the bimodal region (enclosed by the orange curve), but also yields small HD values. The maximum HD for the LMA is as large as 0.7, while it is only 0.13 for the 4-HM. In particular, in the deterministically bistable region, both the 2-HM and 4-HM accurately predict the protein distribution while the LMA completely fails (Fig. 3h).

### Applications to three-node networks with deterministic oscillations

We now focus on three-node networks, where three genes regulate each other in a cyclic manner (Fig. 4a, left). Feedback is mediated by cooperative binding of *h*_{i} copies of protein *P*_{i} to gene *G*_{i+1} for *i* = 1, 2, 3, where *G*_{4} = *G*_{1}. Again, depending on whether *ρ*_{ui} < *ρ*_{bi} or *ρ*_{ui} > *ρ*_{bi} for *i* = 1, 2, 3, the network can be a repressilator (three negative regulations)^{45}, a Goodwin model (one negative regulation and two positive regulations)^{46}, or a positive feedback loop^{47}.

As for previous examples, Holimap transforms the nonlinear network into a linear one (Fig. 4a, right). We now focus on the repressilator illustrated in Fig. 4b, where the cooperativities are chosen as *h*_{1} = *h*_{2} = *h*_{3} = 3. Here high cooperativities are chosen since we require the corresponding deterministic system of rate equations (Supplementary Note 4) to produce sustained oscillations. According to simulations, deterministic oscillations are not observed when *h*_{i} ≤ 2. Figure 4c illustrates the oscillatory time evolution of the mean and Fano factor (the variance divided by the mean) of fluctuations in the number of protein *P*_{1} computed using the SSA, LMA, and 2-HM. Note that here we do not consider the 4-HM because, as previously mentioned, it may cause numerical instability when computing time-dependent distributions. The LMA fails to reproduce damped oscillations in the time evolution of the mean and Fano factor, while Holimap excellently captures these oscillations. Note also that the LMA significantly underestimates the variance of fluctuations and hence leads to a much smaller Fano factor in the limit of long times.

Figure 4d compares the time-dependent protein distributions computed using the SSA, LMA, and 2-HM. Interestingly, both the LMA and 2-HM accurately reproduce the protein distribution at small times (*t* ≤ 3). However, the LMA fails to reproduce bimodality at intermediate and large times since it underestimates noise. In contrast, Holimap performs remarkably well in predicting the complete time evolution of the protein distribution.

Thus far, we have considered regulatory networks where each gene is regulated by one TF; however, many genes are regulated by a multitude of TFs which are often shared between multiple genes^{48}. In Supplementary Note 5, we investigate gene networks with two TF binding sites. We show that Holimap performs excellently in capturing the protein distributions as well as the bimodal region, independent of the type of network topology and the type of TF binding (independent, positive cooperative, and negative cooperative binding^{49}).

### A hybrid combination of SSA and Holimap provides highly efficient computation of complex gene network dynamics

The FSP and SSA are two widely used methods for solving the dynamics of stochastic chemical reaction systems. While FSP yields an accurate distribution, from a practical point of view, it is only applicable to small networks where protein numbers are not very large; for large networks, the size of the state space leads to an enormous computational cost^{22}. The SSA can also be computationally expensive, particularly when the network has multiple reaction time scales^{23}. When fluctuations are large, it can yield a non-smooth distribution, from which it is sometimes even difficult to determine the number of modes. To overcome this, a huge number of stochastic trajectories may be needed to obtain statistically accurate results. Holimap provides an accurate and smooth approximation of the protein distributions; however, it becomes computationally slow when the network is complex or the cooperativity is large since in these cases we have to solve a large number of moment equations. This raises an important question: is it possible to develop a highly efficient and accurate computation method of stochastic gene network dynamics that yields a smooth distribution?

To address this question, we propose a hybrid method that combines the SSA and Holimap. This method consists of three steps (Fig. 5a). First we use the SSA to generate a small number of trajectories (usually a few thousand trajectories are enough) from which we compute the steady-state or time-dependent sample moments of protein numbers. We then use the latter to compute the approximate effective parameters of the linear network. Finally, we use FSP to compute the protein distribution of the linear network with effective parameters to approximate that of the nonlinear one. For example, for the autoregulatory circuit illustrated in Fig. 2a, we substitute the sample moments obtained from the SSA into Eq. (4) to compute the approximate values of \({\widetilde{\sigma }}_{u}\) and \({\widetilde{\sigma }}_{b}\), and then use the marginal protein distribution of the linear network to construct the 2-HM of the nonlinear network. In other words, for Holimap, the determination of the effective parameters can be done independently of other computational methods while the hybrid method requires the running of the SSA.
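The second step of the hybrid method, i.e., turning SSA sample moments into effective rates, can be sketched as below for the autoregulatory circuit with *h* = 1. The sketch rests on an assumption not stated in this section: binding is taken to leave the free protein count unchanged, in which case matching the equations for *g*_{0} and *μ*_{1,1} gives the two linear conditions written in the code comments. These conditions stand in for the matching equations referred to in the text and may differ from them in detail; all function and variable names are hypothetical.

```python
import numpy as np

def sample_moments(gene_state, protein):
    """Joint sample moments from SSA snapshots (one entry per trajectory).

    gene_state: 0 (unbound) or 1 (bound); protein: copy numbers.
    Returns g0, g1, mu10, mu11, mu20, all normalised by the total sample size.
    """
    s = np.asarray(gene_state)
    n = np.asarray(protein, dtype=float)
    total = len(n)
    unbound, bound = (s == 0), (s == 1)
    g0 = unbound.sum() / total
    g1 = bound.sum() / total
    mu10 = n[unbound].sum() / total
    mu11 = n[bound].sum() / total
    mu20 = (n[unbound] * (n[unbound] - 1)).sum() / total   # factorial moment
    return g0, g1, mu10, mu11, mu20

def effective_rates_2hm(g0, g1, mu10, mu11, mu20, sigma_b, sigma_u):
    """Solve the assumed matching conditions for the effective switching rates:

        sb*g0   - su*g1   = sigma_b*mu10          - sigma_u*g1
        sb*mu10 - su*mu11 = sigma_b*(mu10 + mu20) - sigma_u*mu11
    """
    A = np.array([[g0, -g1],
                  [mu10, -mu11]])
    rhs = np.array([sigma_b * mu10 - sigma_u * g1,
                    sigma_b * (mu10 + mu20) - sigma_u * mu11])
    sb_eff, su_eff = np.linalg.solve(A, rhs)
    return sb_eff, su_eff

# Toy snapshot data (four hypothetical cells) and made-up moments for illustration.
moments = sample_moments([0, 0, 1, 1], [1, 3, 2, 4])
sb_eff, su_eff = effective_rates_2hm(g0=0.5, g1=0.5, mu10=1.0, mu11=3.0,
                                     mu20=2.0, sigma_b=1.0, sigma_u=2.0)
```

The resulting effective rates would then be passed to an FSP solver for the linear network to obtain the smooth approximate distribution. The linear system becomes ill-conditioned when the conditional mean protein numbers in the two gene states are nearly equal, so in practice the sample moments need to be estimated accurately enough to resolve that difference.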

This hybrid SSA + Holimap method is computationally much faster than the SSA because the number of trajectories needed to obtain good approximations to low-order moments is much smaller than that needed to obtain smooth protein distributions. It is also computationally less expensive than Holimap since there is no need to solve a large number of moment equations. To test this hybrid method, we compare the time-dependent distributions for the repressilator calculated using the SSA, LMA, 2-HM, and SSA + 2-HM (Fig. 4d). In Fig. 5b, we also compare the CPU times and accuracy of these methods. The number of SSA trajectories *N* needed for SSA + 2-HM is chosen such that the distributions obtained from *N* trajectories and those obtained from 3*N* trajectories have an HD (averaged over all time points) less than 0.02, i.e., increasing the sample size will not substantially improve the approximation accuracy; a sample size of *N* = 2000 is sufficient to satisfy this criterion. Notably, with almost the same CPU time, SSA + 2-HM yields distributions that are significantly more accurate than those obtained from only the SSA with the same number of trajectories: the HD for the former is only 0.04–0.06, while for the latter it is 0.11–0.13; here the distributions obtained from the SSA with 10^{5} trajectories are used as a proxy of ground truth when computing the HDs. We also note that SSA + 2-HM yields distributions that are practically as accurate as those of the 2-HM while being over 16 times faster (28 s vs 7 min 39 s).

To further test the accuracy of SSA + Holimap, we apply it to a random *M*-node gene network (Fig. 5c), where any pair of nodes has a probability of 2/*M* of being connected. This guarantees that each gene regulates two genes on average. When connected, each directed edge has an equal probability of representing positive or negative regulation; autoregulation is also allowed. The details of the stochastic model are described in Methods. We apply the 2-HM to transform the nonlinear random network into a linear one and use 2000 SSA trajectories to estimate the effective parameters of the linear network. Figure 5d illustrates the CPU times and HDs against the number of nodes *M* for SSA + 2-HM and the SSA with the same number of trajectories. Again, an SSA with 10^{5} trajectories is used to generate a proxy for the ground truth distribution when computing the HDs. Clearly, the two methods yield almost the same CPU times, which scale approximately linearly with *M*. This is because for SSA + 2-HM, almost all the time is spent on simulating the SSA trajectories, while solving the linear network consumes very little time. However, compared with an SSA with 2000 trajectories, SSA + 2-HM gives rise to markedly lower HD values, which are insensitive to *M*.
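A random network of this kind could be generated as in the sketch below. It assumes that "any pair of nodes" means every ordered pair (*i*, *j*), including *i* = *j* for autoregulation, so that the expected number of outgoing edges per gene is *M* × 2/*M* = 2. The function name and the edge representation are hypothetical.

```python
import random


def random_gene_network(M, seed=None):
    """Return a dict mapping (regulator i, target j) to +1 or -1.

    +1 denotes positive regulation and -1 negative regulation.
    Each ordered pair is connected independently with probability 2/M.
    """
    rng = random.Random(seed)
    p_connect = min(1.0, 2.0 / M)
    edges = {}
    for i in range(M):
        for j in range(M):  # j == i is allowed (autoregulation)
            if rng.random() < p_connect:
                # Positive and negative regulation are equally likely
                edges[(i, j)] = rng.choice((+1, -1))
    return edges
```

For example, `random_gene_network(20, seed=1)` returns a signed edge list whose average out-degree fluctuates around two from one realization to the next.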

### Generalization to networks with post-translational or post-transcriptional regulation

Thus far, we have showcased Holimap in transcriptional networks with protein-gene interactions. A crucial question is whether Holimap can also be applied to solve the dynamics of post-translational and post-transcriptional networks with complex protein-protein, protein-RNA, and RNA-RNA interactions. To see this, we first focus on two post-translational networks (Fig. 6a, b).

Figure 6a shows a two-node synthetic network with autoregulation and protein sequestration^{50}. Here protein *P*_{i} produced from gene *G*_{i} regulates its own expression; the two proteins *P*_{1} and *P*_{2} can bind to each other and form an inactive complex *C*. We then devise a three-parameter Holimap (3-HM) which transforms the nonlinear gene network into the linear one shown in Fig. 6c. In principle, Holimap replaces all high-order interactions between genes, proteins, and RNAs by effective first-order reactions. We first replace the protein-gene binding reactions \({G}_{i}+{h}_{i}{P}_{i} \, \rightleftharpoons \, {G}_{i}^{*}\) by \({G}_{i} \, \rightleftharpoons \, {G}_{i}^{*}\) with effective parameters \({\widetilde{\sigma }}_{ui}\) and \({\widetilde{\sigma }}_{bi}\), and then we replace the protein-protein binding reaction *P*_{1} + *P*_{2} → *C* by \({P}_{i}\to \varnothing\) with effective parameter \({\widetilde{d}}_{i}\). Again, using moment-matching, the three effective parameters \({\widetilde{\sigma }}_{ui},{\widetilde{\sigma }}_{bi}\), and \({\widetilde{d}}_{i}\) can be represented by low-order moments of the nonlinear network (Supplementary Note 7) and hence can be computed approximately using the SSA with a small number of trajectories. In this way, the hybrid SSA + Holimap can be applied to predict the dynamics of the nonlinear network.

Note that since Holimap replaces the binding reaction *P*_{1} + *P*_{2} → *C* by \({P}_{1}\to \varnothing\) with a new parameter \(\widetilde{d}\), intuitively, one may deduce that this approximation is only valid when protein *P*_{2} is very abundant compared to protein *P*_{1} so that noise in protein *P*_{2} number can be ignored. However, unexpectedly, we find that Holimap makes accurate predictions not only in this special case but also in scenarios where the two proteins interact at comparable concentrations and where *P*_{2} is very scarce compared to *P*_{1} (Supplementary Fig. 5). This again confirms the high accuracy of Holimap over large regions of parameter space.

As another example of post-translational regulation, we consider a gene network with autoregulation and protein phosphorylation (Fig. 6b), which has been used to account for circadian oscillations in *Drosophila* and *Neurospora*^{51}. Here the free protein *P* can be reversibly phosphorylated into the forms *P*_{1} and *P*_{2}, successively. The latter form *P*_{2} can bind to the gene and regulate its expression. Both phosphorylation and dephosphorylation are enzyme-catalyzed and are described using Michaelis-Menten kinetics. Holimap can also be applied to this network, where protein-gene interactions are replaced by the switching reactions *G* \(\rightleftharpoons\) *G*^{*} with effective parameters \({\widetilde{\sigma }}_{u}\) and \({\widetilde{\sigma }}_{b}\), and the complex post-translational regulation is replaced by the degradation reaction \(P\to \varnothing\) with effective parameter \({\widetilde{d}}\) (Fig. 6c and Supplementary Note 7).

Furthermore, we apply Holimap to two post-transcriptional networks (Fig. 6d, e). Figure 6d illustrates a gene network with autoregulation and mRNA degradation control^{52}. Here the enzyme can convert between an inactive form *E* and an active form *E*^{*}. The degradation of the mRNA of interest can occur spontaneously with rate *d* and can be catalyzed by the active form of the enzyme with rate *α*. Holimap transforms the nonlinear network into the linear one shown in Fig. 6f by removing all high-order interactions between molecules. In particular, the enzyme-catalyzed degradation reaction *M* + *E*^{*} → *E*^{*} is replaced by the effective degradation reaction \(M\to \varnothing\) with new parameter \(\widetilde{d}\) (Supplementary Note 7).
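To make the replacement of the catalyzed degradation reaction concrete, the following sketch shows how an effective degradation rate could arise from matching first moments. It is an illustration under assumptions, not the derivation of Supplementary Note 7, which is not reproduced here: \(m\) and \(e^{*}\) denote the copy numbers of the mRNA and of the active enzyme (notation introduced here), mass-action kinetics is assumed for the catalyzed reaction, and the synthesis terms are taken to be matched separately.

```latex
% Sketch only: first-moment matching for the effective degradation rate.
\begin{align}
\text{nonlinear network:}\quad
  \frac{d\langle m\rangle}{dt} &= (\text{synthesis terms})
      - d\,\langle m\rangle - \alpha\,\langle m\,e^{*}\rangle,\\
\text{linear network:}\quad
  \frac{d\langle m\rangle}{dt} &= (\text{synthesis terms})
      - \widetilde{d}\,\langle m\rangle,\\
\text{equating the loss terms:}\quad
  \widetilde{d} &= d + \alpha\,\frac{\langle m\,e^{*}\rangle}{\langle m\rangle}.
\end{align}
```

Under these assumptions, the effective rate depends only on low-order moments of the nonlinear network, which is what allows it to be estimated from a small number of SSA trajectories.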

Figure 6e illustrates another network with microRNA-mRNA interactions, which has been shown to be capable of producing complex emergent behaviors such as bistability and sustained oscillations^{53}. Here the mRNA of interest, expressed from gene *G*_{1}, has two microRNA binding sites. The microRNA, produced from gene *G*_{2}, can bind to its mRNA target and form two inactive complexes *C*_{1} (only one binding site is occupied) and *C*_{2} (both binding sites are occupied). The free mRNA and microRNA are degraded with rates *d*_{1} and *d*_{2}, respectively. Once the complex *C*_{1} (*C*_{2}) is formed, the mRNA and microRNA are degraded with rates *a*_{1} (*b*_{1}) and *a*_{2} (*b*_{2}), respectively. The mRNA dynamics for this network can also be predicted by Holimap, which replaces the complex post-transcriptional regulation by the effective reaction \(M\to \varnothing\) with new parameter \(\widetilde{d}\) (Fig. 6f and Supplementary Note 7).

Note that for transcriptional networks, Holimap does not change the degradation rate; however, for post-transcriptional and post-translational networks, both the binding/unbinding rates and the degradation rate need to be modified. To test the accuracy of the three-parameter Holimap, we compare the time-dependent distributions for the above four gene networks computed using the SSA with 10^{5} trajectories, the SSA with 2000 trajectories, and the hybrid SSA + Holimap with 2000 trajectories (Fig. 6g). Clearly, SSA + Holimap is accurate for all networks. In particular, the distributions predicted by SSA + Holimap with a small number of trajectories have almost the same accuracy as those predicted by the SSA with a huge number of trajectories (HD < 0.03), while the CPU time is reduced more than 60-fold (Fig. 6h).

## Discussion

In this paper, we have constructed a computational method, Holimap, for the accurate and efficient prediction of the protein/mRNA number distributions of a general gene regulatory network. We have showcased the method by applying it to a variety of networks including transcriptional networks with protein-gene interactions, post-translational networks with protein-protein interactions, and post-transcriptional networks with protein-RNA or RNA-RNA interactions. For transcriptional networks, we have tested Holimap in simple autoregulatory loops where a gene influences its own expression, two-gene systems such as the toggle switch, three-gene systems such as the repressilator, and complex randomly connected networks with numerous interacting genes. Notably, we have demonstrated that a hybrid method that uses both Holimap and the SSA leads to much more accurate distributions than solely using the SSA, with practically no increase in the CPU time and high accuracy that is independent of the number of interacting genes in the network.

We devised three types of Holimaps (the 2-HM, 3-HM, and 4-HM), all of which decouple gene-gene interactions in a nonlinear regulatory network and transform it into a linear one with multiple effective parameters. The 2-HM and 4-HM apply to transcriptional networks, while the 3-HM is only applicable to post-translational and post-transcriptional networks. The 4-HM is more accurate than the 2-HM, although the improvement in accuracy is marginal. Depending on parameters, the 4-HM may lead to numerical instability at short times. Hence the 4-HM is preferred if the aim is to compute the steady-state distribution, and the 2-HM is preferable if the aim is to compute the time-dependent distribution. The 2-HM and 4-HM require the solution of the same number of moment equations and hence give rise to similar CPU times. Since the number of equations to be solved increases exponentially with the network size, the standard Holimap is only recommended when the scale of the network is not too large. For medium and large-scale networks, the hybrid SSA + Holimap approach is more advantageous since it significantly reduces the CPU time while maintaining high accuracy.

Some of the advantages of our method over other common approximations in the literature are as follows: (i) Holimap does not sacrifice the discrete nature of molecular reactions since the approximate distributions are solutions of the CME of the effective linear network. This is unlike many common methods that achieve a speed increase by making use of a continuum approximation of the CME such as the Fokker-Planck / Langevin equations^{54,55} or partial integrodifferential equations^{56,57}; (ii) Holimap does not assume the protein number distribution to be of a simple type such as the Gaussian, Poisson, Lognormal or Gamma distributions, as commonly assumed by conventional moment-closure methods^{58,59}—the solution of the linear network that Holimap utilizes is very flexible and spans a very large number of possible distribution shapes including those with multiple modes and significant skewness. For example, if each gene in a complex regulatory network switches between a number of states for which only one is active, then Holimap approximates the protein distribution for each gene by that of a multi-state gene expression model with no regulatory interactions (Supplementary Note 5) for which the analytical steady-state solution is known to be a generalized hypergeometric function^{60,61}, which includes a large number of special functions as special cases.

Our hybrid SSA + Holimap method shares some similarities with neural network-based approaches^{62}, which can also be used to predict complex gene network dynamics. The former uses the SSA to generate the sample moments, which are then used to compute the values of the effective parameters, while the latter uses the SSA to train the surrogate neural network model. While both methods can accurately capture the protein/mRNA distribution, our method outperforms the neural network-based ones in three aspects: (i) while neural network models perform well in the parameter ranges that are used to train the surrogate model, their extrapolation ability is usually weak. Our method is mechanism-based and provides accurate results over wide parameter ranges; (ii) neural network-based methods require a very long time to train the surrogate model. When the network is complex, training may take tens of hours to several days and may also require multiple rounds of hyperparameter tuning. In contrast, Holimap avoids the long training time; (iii) neural network models have good predictive ability but their learned approximation does not typically have a clear biophysical interpretation. Holimap transforms the complex nonlinear network into a linear one, which not only has a clear physical meaning but also allows an approximate analytical solution. In addition, SSA + Holimap can be combined with neural network-based methods to increase the speed and accuracy of the latter. Since SSA + Holimap can be used to generate distributions comparable in accuracy to those from the SSA with a much larger number of trajectories, it follows that SSA + Holimap can reduce the time to generate an accurate training dataset as input to the neural network.

The main limitation of the present method is that there are no analytical guarantees that the effective parameters of the linear network are positive for all times. Nevertheless, for all examples using the 2-HM and 3-HM in this paper, we have numerically found this to be the case, and hence we are confident that the linear network obtained by the 2-HM or 3-HM procedure is generally physically interpretable. In contrast, we observed that the 4-HM procedure can occasionally give rise to negative parameter values (typically when the binding and unbinding rates are large) and hence should be used more cautiously. Ongoing work aims to extend the method to predict both mRNA and protein dynamics, including their joint distribution for pairs of genes.

In conclusion, we have devised a method that overcomes many of the known difficulties encountered when simulating complex stochastic gene network dynamics. We anticipate that Holimap will be useful for investigating noisy dynamical phenomena in complex regulatory networks where intuitive understanding is challenging to attain and simulations using the SSA become computationally prohibitive.

## Methods

### Determining the effective parameter for the LMA

For the linear network in Fig. 2b, the evolution of moments is governed by

Inserting Eq. (2) into Eq. (5) gives a closed set of moment equations, from which the values of *g*_{0}, *μ*_{1,1}, and *μ*_{1,0} can be computed approximately. Finally, using these values, the effective parameter \({\hat{\sigma }}_{b}\) can be obtained from Eq. (2). The remaining steps for the LMA are the same as for the 2-HM.

### Determining the effective parameters for the 4-HM

For the autoregulatory circuit, it follows from Eq. (1) that

For the linear network in Fig. 2d, the evolution of moments is governed by

From these equations, it is easy to show that

Matching Eqs. (6) and (8), we find that \({\bar{\rho }}_{b}\) and \({\bar{\rho }}_{u}\) should satisfy the following system of linear equations:

Matching the first and third identities in Eqs. (1) and (7), it is clear that \({\bar{\sigma }}_{b}\) and \({\bar{\sigma }}_{u}\) should satisfy the following system of linear equations:

where \({\bar{\rho }}_{b}\) has been determined by solving Eq. (9). Compared with Eq. (4), Eq. (10) has an additional term \(({\bar{\rho }}_{b}-{\rho }_{b})B{g}_{1}\). This is because *ρ*_{b} remains unchanged for the 2-HM but is changed for the 4-HM.

Finally, inserting Eqs. (9) and (10) into Eq. (7) gives a system of closed moment equations and hence the values of all zeroth, first, and second-order moments can be approximately calculated. Substituting these moments into Eqs. (9) and (10), one can finally solve for the four effective parameters \({\bar{\rho }}_{u},{\bar{\rho }}_{b},{\bar{\sigma }}_{u}\), and \({\bar{\sigma }}_{b}\) of the linear network. The 4-HM predicts the protein distribution of the nonlinear network by solving the CME of the linear one in Fig. 2d with the values of the effective parameters found above.

### Stochastic model for complex gene networks

Here we consider the stochastic model of an arbitrary gene regulatory network involving protein synthesis, protein degradation, gene state switching, and complex gene regulation mechanisms^{63}. Specifically, we assume that the network involves *M* distinct genes, each of which can be in two states: an inactive state *G*_{j} and an active state \({G}_{j}^{*}\). The protein associated with gene *G*_{j} is denoted by *P*_{j}. The network can be described by the following reactions:

where the reactions in the first row describe spontaneous gene state switching, the reactions in the second row describe gene regulation, the reactions in the third row describe protein synthesis in the two gene states, and the last reaction describes protein degradation or dilution. Since *G*_{j} is the inactive state and \({G}_{j}^{*}\) is the active state, we have \({\rho }_{j}^{1} \, > \, {\rho }_{j}^{0}\). Due to complex gene regulation, each gene *G*_{j} may be regulated by all genes. If gene *G*_{i} activates gene *G*_{j}, then \({\sigma }_{ji}^{0} \, > \, 0\) and \({\sigma }_{ji}^{1}=0\) since the binding of protein *P*_{i} induces the switching from *G*_{j} to \({G}_{j}^{*}\); conversely, if gene *G*_{i} inhibits gene *G*_{j}, then \({\sigma }_{ji}^{0}=0\) and \({\sigma }_{ji}^{1} \, > \, 0\) since the binding of protein *P*_{i} induces the switching from \({G}_{j}^{*}\) to *G*_{j}. When performing simulations (SSA and SSA + Holimap), the parameters are chosen as \({d}_{i}=1,\, {h}_{ji}=1,\, {\rho }_{j}^{1}=81,\, {\rho }_{j}^{0}=5.4,\, {\alpha }_{j}^{0}={\alpha }_{j}^{1}=0.5,\, {\sigma }_{ji}^{0}=0.01,\, {\sigma }_{ji}^{1}=0\) when *G*_{i} activates *G*_{j}, and \({\sigma }_{ji}^{0}=0,\, {\sigma }_{ji}^{1}=0.01\) when *G*_{i} inhibits *G*_{j}. The presence or absence of a gene-gene interaction and its type are determined randomly. Here we assume that protein-gene interactions are non-cooperative (*h*_{ji} = 1). The spontaneous switching rates between *G*_{j} and \({G}_{j}^{*}\) are chosen to be \({\sigma }_{j}^{0}={\sigma }_{j}^{1}=0.5\). Since each gene is on average regulated by two genes (one positive regulation and one negative regulation), the switching rates due to gene regulation are roughly equal to \({\sigma }_{ji}^{0}={\sigma }_{ji}^{1}=0.01\) multiplied by the copy number of the regulator *P*_{i}, which is ~50. Hence the total switching rates due to spontaneous contributions and gene regulation are roughly 0.5 + 0.01 × 50 = 1, i.e., they are comparable with the degradation rate *d*_{i} = 1.
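The model and parameter values described above can be explored with a minimal direct-method SSA, sketched here under several assumptions that are not confirmed by the text: protein binding is taken not to consume the regulator, interactions are non-cooperative, and all genes share the same rates. The function name, argument names, and edge representation are hypothetical, and this is not the authors' MATLAB code.

```python
import random


def ssa_network(M, edges, t_end, seed=None,
                rho0=5.4, rho1=81.0, d=1.0, spont=0.5, sigma=0.01):
    """Time-averaged protein numbers of an M-gene network (direct-method SSA).

    edges[(i, j)] = +1 if gene i activates gene j, -1 if it inhibits gene j.
    Assumption: regulator proteins are not consumed when they bind a gene.
    """
    rng = random.Random(seed)
    activators = [[i for (i, j), s in edges.items() if j == k and s > 0] for k in range(M)]
    inhibitors = [[i for (i, j), s in edges.items() if j == k and s < 0] for k in range(M)]
    active = [False] * M   # gene states: False = inactive, True = active
    n = [0] * M            # protein copy numbers
    area = [0.0] * M       # time integral of each protein number
    t = 0.0
    while True:
        props = []
        for j in range(M):
            # Activators switch an inactive gene on; inhibitors switch an active gene off
            regs = inhibitors[j] if active[j] else activators[j]
            props.append(spont + sigma * sum(n[i] for i in regs))  # gene switching
            props.append(rho1 if active[j] else rho0)              # protein synthesis
            props.append(d * n[j])                                 # protein degradation
        dt = rng.expovariate(sum(props))
        step = min(dt, t_end - t)
        for j in range(M):
            area[j] += n[j] * step
        if t + dt >= t_end:
            break
        t += dt
        # Pick one reaction with probability proportional to its propensity
        r = rng.choices(range(3 * M), weights=props)[0]
        j, kind = divmod(r, 3)
        if kind == 0:
            active[j] = not active[j]
        elif kind == 1:
            n[j] += 1
        else:
            n[j] -= 1
    return [a / t_end for a in area]
```

With no regulation, a single gene spends half its time in each state, so its mean protein number should be close to (81 + 5.4)/2 = 43.2. This is consistent with the regulator copy number of ~50 used in the rate estimate above.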

### Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

## Data availability

MATLAB R2019a was used to analyze the data. Source data are provided with this paper.

## Code availability

The MATLAB code for Holimap and SSA + Holimap can be found in the GitHub repository^{64}.

## References

1. Shen-Orr, S. S., Milo, R., Mangan, S. & Alon, U. Network motifs in the transcriptional regulation network of Escherichia coli. *Nat. Genet.* **31**, 64–68 (2002).
2. Davidson, E. H. & Erwin, D. H. Gene regulatory networks and the evolution of animal body plans. *Science* **311**, 796–800 (2006).
3. Olson, E. N. Gene regulatory networks in the evolution and development of the heart. *Science* **313**, 1922–1927 (2006).
4. Gerstein, M. B. et al. Architecture of the human regulatory network derived from ENCODE data. *Nature* **489**, 91–100 (2012).
5. Spitz, F. & Furlong, E. E. Transcription factors: from enhancer binding to developmental control. *Nat. Rev. Genet.* **13**, 613–626 (2012).
6. Pavlopoulos, G. A. et al. Using graph theory to analyze biological networks. *BioData Min.* **4**, 1–27 (2011).
7. Koutrouli, M., Karatzas, E., Paez-Espino, D. & Pavlopoulos, G. A. A guide to conquer the biological network era using graph theory. *Front. Bioeng. Biotech.* **8**, 34 (2020).
8. Emmert-Streib, F., Dehmer, M. & Haibe-Kains, B. Gene regulatory networks and their applications: understanding biological and medical problems in terms of networks. *Front. Cell Dev. Biol.* **2**, 38 (2014).
9. Margolin, A. A. et al. ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. *BMC Bioinform.* **7**, 1–15 (2006).
10. Stolovitzky, G., Prill, R. J. & Califano, A. Lessons from the DREAM2 challenges: a community effort to assess biological network inference. *Ann. N. Y. Acad. Sci.* **1158**, 159–195 (2009).
11. Emmert-Streib, F., Glazko, G. V., Altay, G. & de Matos Simoes, R. Statistical inference and reverse engineering of gene regulatory networks from observational expression data. *Front. Genet.* **3**, 8 (2012).
12. Chan, T. E., Stumpf, M. P. & Babtie, A. C. Gene regulatory network inference from single-cell data using multivariate information measures. *Cell Syst.* **5**, 251–267 (2017).
13. Badia-i Mompel, P. et al. Gene regulatory network inference in the era of single-cell multi-omics. *Nat. Rev. Genet.* **24**, 1–16 (2023).
14. De Jong, H. Modeling and simulation of genetic regulatory systems: a literature review. *J. Comput. Biol.* **9**, 67–103 (2002).
15. Karlebach, G. & Shamir, R. Modelling and analysis of gene regulatory networks. *Nat. Rev. Mol. Cell Biol.* **9**, 770–780 (2008).
16. Edelstein-Keshet, L. *Mathematical models in biology* (SIAM, 2005).
17. Ingalls, B. P. *Mathematical modeling in systems biology: an introduction* (MIT Press, 2013).
18. Schnoerr, D., Sanguinetti, G. & Grima, R. Approximation and inference methods for stochastic biochemical kinetics—a tutorial review. *J. Phys. A Math. Theor.* **50**, 093001 (2017).
19. Elowitz, M. B., Levine, A. J., Siggia, E. D. & Swain, P. S. Stochastic gene expression in a single cell. *Science* **297**, 1183 (2002).
20. Ozbudak, E. M., Thattai, M., Kurtser, I., Grossman, A. D. & van Oudenaarden, A. Regulation of noise in the expression of a single gene. *Nat. Genet.* **31**, 69–73 (2002).
21. Munsky, B., Neuert, G. & Van Oudenaarden, A. Using gene expression noise to understand gene regulation. *Science* **336**, 183–187 (2012).
22. Munsky, B. & Khammash, M. The finite state projection algorithm for the solution of the chemical master equation. *J. Chem. Phys.* **124**, 044104 (2006).
23. Gillespie, D. T. A general method for numerically simulating the stochastic time evolution of coupled chemical reactions. *J. Comput. Phys.* **22**, 403–434 (1976).
24. Székely Jr, T. & Burrage, K. Stochastic simulation in systems biology. *Comput. Struct. Biotechnol. J.* **12**, 14–25 (2014).
25. Klipp, E., Liebermeister, W., Wierling, C. & Kowald, A. *Systems biology: a textbook* (John Wiley & Sons, 2016).
26. Munsky, B., Hlavacek, W. S. & Tsimring, L. S. *Quantitative biology: theory, computational methods, and models* (MIT Press, 2018).
27. Bateman, E. Autoregulation of eukaryotic transcription factors. *Prog. Nucleic Acid Res. Mol. Biol.* **60**, 133–168 (1998).
28. Becskei, A. & Serrano, L. Engineering stability in gene networks by autoregulation. *Nature* **405**, 590–593 (2000).
29. Crews, S. T. & Pearson, J. C. Transcriptional autoregulation in development. *Curr. Biol.* **19**, R241–R246 (2009).
30. Hermsen, R., Ursem, B. & Ten Wolde, P. R. Combinatorial gene regulation using auto-regulation. *PLoS Comput. Biol.* **6**, e1000813 (2010).
31. Nie, Y., Shu, C. & Sun, X. Cooperative binding of transcription factors in the human genome. *Genomics* **112**, 3427–3434 (2020).
32. Jia, C. & Grima, R. Dynamical phase diagram of an auto-regulating gene in fast switching conditions. *J. Chem. Phys.* **152**, 174110 (2020).
33. Cai, L., Friedman, N. & Xie, X. S. Stochastic protein expression in individual cells at the single molecule level. *Nature* **440**, 358–362 (2006).
34. Singh, A. & Hespanha, J. P. Lognormal moment closures for biochemical reactions. In *Proc. of the 45th IEEE Conference on Decision and Control*, 2063–2068 (IEEE, 2006).
35. Cao, Z. & Grima, R. Linear mapping approximation of gene regulatory networks with stochastic dynamics. *Nat. Commun.* **9**, 3305 (2018).
36. Grima, R. A study of the accuracy of moment-closure approximations for stochastic chemical kinetics. *J. Chem. Phys.* **136**, 154105 (2012).
37. Jia, C. & Grima, R. Small protein number effects in stochastic models of autoregulated bursty gene expression. *J. Chem. Phys.* **152**, 084115 (2020).
38. Ko, M. S. A stochastic model for gene induction. *J. Theor. Biol.* **153**, 181–194 (1991).
39. Jiao, F., Sun, Q., Tang, M., Yu, J. & Zheng, B. Distribution modes and their corresponding parameter regions in stochastic gene transcription. *SIAM J. Appl. Math.* **75**, 2396–2420 (2015).
40. Jia, C. & Grima, R. Frequency domain analysis of fluctuations of mRNA and protein copy numbers within a cell lineage: theory and experimental validation. *Phys. Rev. X* **11**, 021032 (2021).
41. Schwanhäusser, B. et al. Global quantification of mammalian gene expression control. *Nature* **473**, 337 (2011).
42. Larsson, A. J. et al. Genomic encoding of transcriptional burst kinetics. *Nature* **565**, 251–254 (2019).
43. Suter, D. M. et al. Mammalian genes are transcribed with widely different bursting kinetics. *Science* **332**, 472–474 (2011).
44. Gardner, T. S., Cantor, C. R. & Collins, J. J. Construction of a genetic toggle switch in Escherichia coli. *Nature* **403**, 339 (2000).
45. Elowitz, M. B. & Leibler, S. A synthetic oscillatory network of transcriptional regulators. *Nature* **403**, 335–338 (2000).
46. Goodwin, B. C. Oscillatory behavior in enzymatic control processes. *Adv. Enzym. Regul.* **3**, 425–437 (1965).
47. Bragdon, M. D. et al. Cooperative assembly confers regulatory specificity and long-term genetic circuit stability. *Cell* **186**, 3810–3825 (2023).
48. Lammers, N. C., Kim, Y. J., Zhao, J. & Garcia, H. G. A matter of time: using dynamics and theory to uncover mechanisms of transcriptional bursting. *Curr. Opin. Cell Biol.* **67**, 147–157 (2020).
49. Goentoro, L., Shoval, O., Kirschner, M. W. & Alon, U. The incoherent feedforward loop can provide fold-change detection in gene regulation. *Mol. Cell* **36**, 894–899 (2009).
50. Zhu, R., del Rio-Salgado, J. M., Garcia-Ojalvo, J. & Elowitz, M. B. Synthetic multistability in mammalian cells. *Science* **375**, eabg9765 (2022).
51. Gonze, D., Halloy, J. & Goldbeter, A. Robustness of circadian rhythms with respect to molecular noise. *Proc. Natl Acad. Sci. USA* **99**, 673–678 (2002).
52. Kuwahara, H. & Schwartz, R. Stochastic steady state gain in a gene expression process with mRNA degradation control. *J. R. Soc. Interface* **9**, 1589–1598 (2012).
53. Nordick, B., Yu, P. Y., Liao, G. & Hong, T. Nonmodular oscillator and switch based on RNA decay drive regeneration of multimodal gene expression. *Nucleic Acids Res.* **50**, 3693–3708 (2022).
54. Tian, T., Burrage, K., Burrage, P. M. & Carletti, M. Stochastic delay differential equations for genetic regulatory networks. *J. Comput. Appl. Math.* **205**, 696–707 (2007).
55. Tomioka, R., Kimura, H., Kobayashi, T. J. & Aihara, K. Multivariate analysis of noise in genetic regulatory networks. *J. Theor. Biol.* **229**, 501–521 (2004).
56. Friedman, N., Cai, L. & Xie, X. S. Linking stochastic dynamics to population distribution: an analytical framework of gene expression. *Phys. Rev. Lett.* **97**, 168302 (2006).
57. Bokes, P. & Singh, A. Protein copy number distributions for a self-regulating gene in the presence of decoy binding sites. *PLoS One* **10**, e0120555 (2015).
58. Schnoerr, D., Sanguinetti, G. & Grima, R. Comparison of different moment-closure approximations for stochastic chemical kinetics. *J. Chem. Phys.* **143**, 185101 (2015).
59. Lakatos, E., Ale, A., Kirk, P. D. & Stumpf, M. P. Multivariate moment closure techniques for stochastic kinetic models. *J. Chem. Phys.* **143**, 094107 (2015).
60. Zhou, T. & Zhang, J. Analytical results for a multistate gene model. *SIAM J. Appl. Math.* **72**, 789–818 (2012).
61. Jia, C. & Li, Y. Analytical time-dependent distributions for gene expression models with complex promoter switching mechanisms. *SIAM J. Appl. Math.* **83**, 1572–1602 (2023).
62. Sukys, A., Öcal, K. & Grima, R. Approximating solutions of the chemical master equation using neural networks. *iScience* **25** (2022).
63. Wang, X., Li, Y. & Jia, C. Poisson representation: a bridge between discrete and continuous models of stochastic gene regulatory networks. *J. R. Soc. Interface* **20**, 20230467 (2023).
64. Jia, C. & Grima, R. Holimap: an accurate and efficient method for solving stochastic gene network dynamics. *chenjiacsrc/Holimap* https://doi.org/10.5281/zenodo.12725485 (2024).

## Acknowledgements

We thank Augustinas Sukys for comments on the manuscript. C.J. acknowledges support from the National Natural Science Foundation of China with grant Nos. U2230402 and 12271020. R.G. acknowledges support from the Leverhulme Trust (RPG-2020-327).

## Author information

### Contributions

R.G. conceived the original idea. C.J. performed the theoretical derivations and numerical simulations. C.J. and R.G. interpreted the theoretical results and jointly wrote the manuscript.

## Ethics declarations

### Competing interests

The authors declare no competing interests.

## Additional information

**Publisher’s note** Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## Rights and permissions

**Open Access** This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

## About this article

### Cite this article

Jia, C., Grima, R. Holimap: an accurate and efficient method for solving stochastic gene network dynamics.
*Nat Commun* **15**, 6557 (2024). https://doi.org/10.1038/s41467-024-50716-z

DOI: https://doi.org/10.1038/s41467-024-50716-z