## Abstract

Long-term behaviors of biochemical reaction networks (BRNs) are described by steady states in deterministic models and stationary distributions in stochastic models. Unlike deterministic steady states, stationary distributions capturing inherent fluctuations of reactions are extremely difficult to derive analytically due to the curse of dimensionality. Here, we develop a method to derive analytic stationary distributions from deterministic steady states by transforming BRNs to have a special dynamic property, called complex balancing. Specifically, we merge nodes and edges of BRNs to match in- and out-flows of each node. This allows us to derive the stationary distributions of a large class of BRNs, including autophosphorylation networks of EGFR, PAK1, and Aurora B kinase and a genetic toggle switch. This reveals the unique properties of their stochastic dynamics such as robustness, sensitivity, and multi-modality. Importantly, we provide a user-friendly computational package, CASTANET, that automatically derives symbolic expressions of the stationary distributions of BRNs to understand their long-term stochasticity.

## Introduction

A standard approach to mathematical modeling of biochemical reaction networks (BRNs) is to use ordinary differential equations (ODEs), whose variables represent concentrations of molecules^{1}. However, this deterministic description, while convenient for computation, by its nature cannot capture the inherent randomness of BRNs. In particular, the long-term behavior of ODE systems is characterized by steady states or other attractors, rather than by the stationary distributions statistically observed in real biological systems. As cell biology moves away from bulk averages to single-cell measurements, a focus has shifted to the study of such stationary distributions^{2,3}. They can be described by various stochastic approaches^{1,4}. In particular, stationary distributions can be described as steady-state solutions of the chemical master equation (CME), which has been widely used to describe the time evolution of the probabilities for the numbers of chemical species in BRNs such as gene regulatory networks and signaling pathways^{5}.

Since the CME is a differential equation with infinitely many variables, its steady-state solution (i.e., the stationary distribution) can be found analytically only for simple cases, such as linear reaction networks^{6} or birth-death processes^{7}. Unlike the CME, its deterministic counterpart is a finite dimensional ODE, whose steady-state solutions are relatively easier to calculate. An interesting question, therefore, is whether there is a systematic way of using these deterministic steady states for characterizing the stationary distribution of the stochastic counterpart. There is a positive answer to this question for special networks, called complex balanced networks.

A result from queuing theory^{8}, reinterpreted in the context of BRNs^{9} through the connection between Petri nets and BRNs^{10}, shows that for complex balanced networks whose kinetics are described by mass action reactions, stationary distributions can be characterized in terms of jointly distributed Poisson random variables with parameters corresponding to deterministic steady states. An independent proof of this result, together with deep applications to CMEs, was developed by Anderson, Craciun, and Kurtz^{11}. Complex balancing is difficult to check and depends on rate constant values. However, beautiful work by Horn, Jackson, and Feinberg^{12,13,14} has shown that all networks that have the special structural properties of weak reversibility and zero deficiency are complex balanced, independently of rate constants. Weak reversibility of a network means that the network is a union of closed reaction cycles, and the deficiency of a network is the number of dependent closed reaction cycles, which can be easily checked. Satisfying these two structural properties is a simple condition to derive the stationary distribution of network under mass action reactions with the method in ref.^{11}.

As various BRNs such as networks of several reversible reactions (e.g., *A* + *B* ↔ *C* ↔ 0) or cyclic reactions (e.g., *A* → *B* → *C* → *A*) are weakly reversible and deficiency zero, their stationary distributions can be analytically derived^{15,16,17,18,19,20}. These have been used to characterize the stochasticity of various systems, including a genetic oscillator^{21} and a competitive inhibition enzyme kinetics model^{22}. Unfortunately, the majority of BRNs do not have the special network structure. For instance, only ~0.36% of the Erdös-Rényi random networks of two species with up to bimolecular reactions have a deficiency of zero when the edge probability is 0.5, and the fraction decreases to zero as the number of species increases^{23}. Moreover, from a biological standpoint, even simple networks are unlikely to be weakly reversible if they include a bimolecular reaction whose reverse reaction is unimolecular (e.g., autophosphorylation and dephosphorylation).

Here, we develop a framework to derive stationary distributions for a class of networks which do not have the special structure (i.e., weakly reversibility and zero deficiency) by modifying their structures via *network translation*^{24,25}. Specifically, by simply merging reactions with a common stoichiometric vector and shifting reactions in the networks, we are able to change their structure to be weakly reversible and deficiency zero while preserving their stochastic dynamics. This allows us to derive the stationary distributions of a large class of BRNs including autophosphorylation networks of EGFR, PAK1, and Aurora B kinase. This derivation reveals key reactions determining the autophosphorylation status, which can seldom be done with a purely numerical approach. Furthermore, we describe how the stochastic dynamics of more complex BRNs can be tracked when our method is applicable for only their subnetworks. Importantly, we provide a user-friendly computational package CASTANET (Computational package for deriving Analytical STAtionary distributions of biochemical reaction networks with NEtwork Translation) that automatically derives the stationary distributions of submitted BRNs via our method. This will provide an effective tool to analyze the stochasticity of BRNs.

## Results

### Obtaining the desired network structure via network translation

As mentioned in the introduction, the stationary distributions of the stochastic mass action models for BRNs can be derived with any choice of rate constants using the previous method^{11} if and only if the networks have two structural properties: *weak reversibility* and *zero deficiency*. However, even very simple networks such as the one shown in Fig. 1a-left fail to satisfy the two properties. Weak reversibility means that if there exists a path from a *complex* (i.e., a node in the reaction graph) to another complex, then there is a reverse path from the second one back to the first one. Because there is no path from *A* + *A* to *A* while there is a path from *A* to *A* + *A*, the network in Fig. 1a-left is not weakly reversible. The deficiency of a network is a non-negative integer index calculated by subtracting both the number of linkage classes (i.e., connected components in the reaction graph) and the dimension of the subspace spanned by the stoichiometric vectors from the number of complexes. The deficiency of the network in Fig. 1a-left is one. Therefore, the previous method^{11} cannot be used to derive its stationary distribution.

Two different reactions, 0 → *A* and *A* → *A* + *A*, have the same stoichiometric vector (1, 0) because both reactions produce one molecule of *A* (Fig. 1a-left). Thus, these two reactions can be merged by unifying the source complexes 0 and *A* into 0 and summing the propensities of both reactions (Fig. 1a). This procedure is known as *network translation*^{24,25}, which was proposed to investigate deterministic systems. This procedure is also applicable to stochastic systems as it preserves the stochastic dynamics (see Supplementary Note 1 for details). For instance, the propensities of the production of *A* are *α*_{1} + *α*_{4}*n*_{A} in both the original (Fig. 1a-left) and the translated network (Fig. 1a-right). Although the network translation is simple, it can effectively change the structure of the network to be a weakly reversible deficiency zero network.

### Propensity factorization is required

Even though the translated network is weakly reversible and of zero deficiency, the new model no longer follows mass action kinetics since the propensity of the reaction 0 → *A* is not constant (Fig. 1a-right). In this case, previously, it was known that the method in refs.^{11,26} is still applicable if the non-mass action propensity functions can be factorized as a certain form. However, the propensity functions of this translated network do not have the certain form. Thus, we generalize the previous factorization form so that stationary distributions can be derived for a larger class of BRNs. Specifically, we show that all the propensities of the translated network \({\tilde{\lambda }}_{k}({\bf{n}})\) need to be factorized as

for some constants *κ*_{k} > 0 and functions *θ*(**n**) > 0 and *ω*(**n**) ≥ 0 on a set Γ = {**n**∣**n** ≥ **b**} where the \({\nu }_{k}\) is the source complex vector of the *k*th reaction, the inequality is coordinate-wise, and the **b** needs to be chosen so that \({\tilde{\lambda }}_{k}({\bf{n}})\,> \, 0\) if and only if there are sufficient reactants (i.e., **n** ≥ \({\nu }_{k}\) + **b**) in Γ. For the translated network (Fig. 1a-right), \({\tilde{\lambda }}_{k}({\bf{n}})\,> \, 0\) if and only if **n** ≥ \({\nu }_{k}\) like mass action kinetics, and thus **b** = (0, 0) and \({{\Gamma }}={{\mathbb{Z}}}_{\ge 0}^{2}\).

Propensity functions satisfying the factorization condition (Eq. (8)) include a generalized mass action kinetics (Eq. (7)). For instance, if a source complex is 0, *A*, *A* + *A*, or *A* + *B*, propensity functions following the generalized mass action kinetics are proportional to 1, *f*_{A}(*n*_{A}), *f*_{A}(*n*_{A})*f*_{A}(*n*_{A} − 1), or *f*_{A}(*n*_{A})*f*_{B}(*n*_{B}), respectively. Note that if the *f*_{i}’s are identity functions then the propensities follow standard mass action kinetics (Eq. (6))^{11,27}. The propensity functions following the generalized mass action kinetics can be easily factorized with \(\theta ({\bf{n}})=\omega {({\bf{n}})}^{-1}=\mathop{\prod }\nolimits_{i = 1}^{d}\mathop{\prod }\nolimits_{j = {b}_{i}+1}^{{n}_{i}}{f}_{i}(j)\), where *d* is the number of the constitutive chemical species (see Eq. (9) for details). However, the translated network (Fig. 1a-right) does not follow the generalized mass action kinetics (Eq. (7)) because the propensity function of the reaction 0 → *A*, *α*_{1} + *α*_{4}*n*_{A}, is not proportional to 1 (i.e., it is not constant). Thus, we need to solve recurrence relations as described in Supplementary Note 2 to identify the propensity factorization (Fig. 1b):

### Derivation of stationary distribution

After identifying *κ*_{k}, *θ*(**n**), and *ω*(**n**) via the propensity factorization, we need to find a complex balanced equilibrium (CBE) of the deterministic mass action model with rate constants {*κ*_{k}} for the translated network (Fig. 1c). The CBE is a steady state at which for each complex *ν*, the in-flow rate to *ν* is equal to the out-flow rate from *ν*^{12}. For instance, based on the deterministic model in Fig. 1c, the complex balance conditions for the complexes 0, *A*, and *A* + *B* are *κ*_{3}*c*_{A}*c*_{B} = *κ*_{1}, *κ*_{1} = *κ*_{2}*c*_{A}, and *κ*_{2}*c*_{A} = *κ*_{3}*c*_{A}*c*_{B}, respectively. By solving these equations, we can obtain the CBE, (*c*_{A}, *c*_{B}) = (*κ*_{1}/*κ*_{2}, *κ*_{2}/*κ*_{3}). Note that the existence of a CBE is guaranteed because we translate a network to be weakly reversible and deficiency zero^{13}.

Finally, using the function *θ*(**n**) (Fig. 1b) and the CBE (*c*_{A}, *c*_{B}) (Fig. 1c), we can derive the stationary distribution of the stochastic model for the translated network, which is the same as that of the original network, as follows:

for *n*_{A} ≥ 0, *n*_{B} ≥ 0 where *M* is the normalizing constant so that the sum of the stationary distribution is one (see Methods for details). In this example, the distribution *π*(**n**) is obtained on \({{\Gamma }}={{\mathbb{Z}}}_{\ge 0}^{2}\). This state space is closed as proved in Supplementary information, and it is irreducible (i.e., every state is reachable from every other state; see Supplementary Note 3 for details). On the other hand, if an irreducible state space is a proper subset of Γ, possibly due to a conservation law, then the normalizing constant *M* is chosen so that the sum of *π*(**n**) over the subset is one.

### Computational package, CASTANET

Applying our theoretical framework (Fig. 1) has two practical difficulties. Translating a given network to a weakly reversible deficiency zero network (Fig. 1a) is not straightforward as prohibitively many candidates of translated networks often exist. Furthermore, it is challenging to check whether the factorization condition holds (Fig. 1b) as it requires to solve associated recurrence relations. Thus, we have developed a user-friendly, open-source, and publicly available computational package, “CASTANET (https://github.com/Mathbiomed/CASTANET),” that automatically performs network translation and propensity factorization and derives stationary distributions (Fig. 2a). With this package, we were able to easily identify hundreds of BRNs and derive analytic forms of their stationary distributions. We have provided some of them in Fig. 2b and Supplementary Figs. 3 and 4. To use this package, users only need to enter the source complexes, product complexes, and propensity functions of reactions.

### Stationary distributions of autophosphorylation networks

Our theoretical framework and especially CASTANET extend the class of BRNs whose stationary distributions can be analytically derived using CBEs (Figs. 1 and 2). This class includes various autophosphorylations networks (Fig. 3) that are not weakly reversible due to autophosphorylation reactions, which occur in intermolecular (trans), intramolecular (cis), or mixed manners^{28,29}.

Asymmetric trans-autophosphorylation occurs if two monomers form a homodimer and one of them acts as an ‘enzyme’ and phosphorylates the other. This type of autophosphorylation occurs in the epidermal growth factor receptor (EGFR), which triggers signal transduction for cell proliferation^{30}. The key regulatory reactions for EGFR include its synthesis, trans-autophosphorylation, dephosphorylation, and degradation (Fig. 3a-left). The asymmetric trans-autophosphorylation is a reaction that transforms the complex *A* + *A* to *A* + *A*_{P}. The dephosphorylation reaction is not the reverse of the previous reaction; instead, it occurs from the complex *A*_{P} to the complex *A*. Thus, the network is not weakly reversible. However, CASTANET automatically identifies a weakly reversible deficiency zero translated network and its propensity factorization (Fig. 3a-right) and then derives the analytic form of stationary distribution (Fig. 3b) that matches what is calculated with stochastic simulations (Fig. 3c).

In addition, having the formula (Fig. 3b) allows us to easily understand the long-term behavior of the system, something that is not possible with a purely computational approach. For instance, \(\pi ({n}_{{A}_{P}})\) in Fig. 3c is the Poisson distribution with rate *α*_{1}/*α*_{3}. This indicates that the synthesis (*α*_{1}) and degradation rates (*α*_{3}) of *A* are the determinants of the long-term status of \({n}_{{A}_{P}}\), which is surprisingly robust to the changes of phosphorylation (*α*_{2}) and dephosphorylation rates (*α*_{4}). Furthermore, *π*(*n*_{A}) is solely determined by \(\frac{{\alpha }_{1}}{{\alpha }_{2}}(1+\frac{{\alpha }_{4}}{{\alpha }_{3}})\), and its moments can be calculated with the modified Bessel functions (see Supplementary Note 3 for details). This allows us to identify that the stationary distribution of *n*_{A} is sub-Poissonian, and its coefficient of variation attains the maximum at \(\frac{{\alpha }_{1}}{{\alpha }_{2}}(1+\frac{{\alpha }_{4}}{{\alpha }_{3}})\approx 1.8\) (Supplementary Fig. 1).

Trans- and cis-autophosphorylation can occur sequentially. For example, p21-activated kinase 1 (PAK1), which regulates cell motility and morphology, phosphorylates a threonine residue in the kinase domain in a trans manner asymmetrically (*A* + *A* → *A* + *A*_{P})^{28,31} and then phosphorylates a serine residue in the regulatory domain of itself in a cis manner (*A*_{P} → *A*_{PP})^{32,33} (Fig. 3d-left). While the original network of PAK1 is not weakly reversible (Fig. 3d-left), CASTANET identifies a weakly reversible deficiency zero translated network (Fig. 3d-right) and derives the analytic form of stationary distribution (Fig. 3e) that matches the simulation result (Fig. 3f).

Both trans- and cis-autophosphorylation can occur simultaneously as in Aurora B kinase, which controls mitotic progression^{34}. In an Aurora B kinase network, cis-autophosphorylation (*A* → *A*_{P}) promotes rapid trans-autophosphorylation (*A* + *A*_{P} → *A*_{P} + *A*_{P}), which forms a positive feedback in the system^{34}. For this network, CASTANET successfully applies our method to derive the analytic form of stationary distribution (Fig. 3g–i).

While mass action kinetics are commonly used to describe autophosphorylations^{34,35} as in our examples, the Michaelis–Menten function and Hill function are also often used^{36,37}. Moreover, they are also used to describe the effects of phosphatases on dephosphorylation and proteasomes on degradation^{38,39}, which EGFR, PAK1, and Aurora B kinase undergo^{40,41,42,43,44,45}. Even when the mass action propensities in the networks (Fig. 3a, d, and g-left) are replaced with the Michaelis-Menten or Hill functions, their stationary distributions can still be derived with the same approach (Supplementary Fig. 5).

When the presented networks are extended by adding reactions, our methods might not be applicable. For instance, if an additional trans-autophosphorylation (*A* + *A*_{P} → *A*_{P} + *A*_{P}) is added to the example in Fig. 3a, although it can be translated to a weakly reversible deficiency zero network, their propensities cannot be factorized as in Eq. (1). Thus, the stationary distribution of the extended network cannot be derived by our method. However, it can be approximated by the stationary distribution of the original network if the rate constant of the added reaction is small enough (see Supplementary Fig. 6 for details). Such approximation works for the extended networks of the other networks (Fig. 3d, g) as well. This indicates that if the stationary distributions of core subnetworks, which consist of dominant reactions, can be derived by our method, then it could be used to approximate the stationary distributions of their more complex parent networks.

### Translation of fast subnetworks reveals both the fast and slow dynamics of a multi-timescale system

As the number of nodes (i.e., complexes) of networks increases, the networks are less likely to be a weakly reversible deficiency zero network even after network translation, and thus our method is less likely to be applicable. However, such large networks commonly consist of reactions occurring at different time scales^{46}. In this case, if we can derive the conditional stationary distributions of only fast subnetworks with our method, both the fast and slow dynamics of the full network can be accurately captured.

For gene regulatory networks, if the promoter kinetics (i.e., binding and unbinding of transcription factors to promoters) are fast, the fast subnetwork is a simple reversible binding network (i.e., weakly reversible and of zero deficiency), and thus its stationary distribution can be easily calculated^{21}. On the other hand, when the promoter kinetics are slow, the fast subnetwork includes a complex protein reaction network whose stationary distribution is challenging to derive. This can occur for a variety of reasons, e.g., the presence of nucleosomes in eukaryotic cells usually slows down the binding and unbinding of transcription factors^{20}.

A genetic toggle switch with the slow promoter kinetics, which consists of a pair of genes *G*^{A} and *G*^{B}, is an example of such multi-timescale system (Fig. 4a). The genes *G*^{A} and *G*^{B} express proteins *A* and *B*, respectively. Subsequently, these proteins undergo asymmetric trans-autophosphorylation, and they mutually repress gene expression by binding to the promoter region of each other’s gene. The binding and unbinding of the phosphorylated proteins occur at a slower time scale. Therefore, the entire network can be divided into the fast subnetwork consisting of gene expression, phosphorylation, dephosphorylation, and degradation and the slow subnetwork consisting of binding and unbinding (Fig. 4b). While the fast subnetworks for *A* and *B* are neither weakly reversible nor deficiency zero, they can be translated to weakly reversible deficiency zero networks (Fig. 4c-top). The propensity functions of these translated networks can be factorized as in Eq. (8), so their stationary distributions conditioned on the slow variables can be derived with CASTANET (Fig. 4c). Depending on the slowly changing gene states, the distributions of the proteins *A*, *A*_{P}, *B*, and *B*_{P} can dramatically change.

When the gene states slowly change, the fast variables rapidly equilibrate to the conditional stationary distributions determined by the current gene states (Fig. 4c). Thus, the weighted average of the conditional stationary distributions with the probabilities of the corresponding gene states accurately approximates the full (i.e., unconditional) stationary distribution under timescale separation (i.e., *ε* ≪ 1)^{20}. For instance, the full stationary distribution of *A*_{P} can be approximated as

where *ρ*_{act} and *ρ*_{rep} are the probabilities that the gene *G*^{A} is active and repressed, respectively. The *ρ*_{act} becomes larger as the dissociation constant between *G*^{A} and its repressor *B*_{P} is larger, and the number of repressor *B*_{P} is smaller. The *ρ*_{act} can be calculated by identifying the eigenvector of the matrix consisting of the dissociation constant, and the conditional stationary moments of the repressors obtained from Fig. 4c^{20}, and *ρ*_{rep} = 1 − *ρ*_{act} (see Supplementary Note 5 for details). Thus, using Eq. (4), we can accurately capture the bimodal stationary distribution of the protein *A*_{P} (Fig. 4d), leading to phenotypic heterogeneity in isogenic populations. Similarly, the full bimodal stationary distributions of the other fast variables *A*, *B*, and *B*_{P} can also be accurately captured (Supplementary Fig. 7). Note that these bimodalities cannot be captured by the corresponding deterministic model, which predicted monostability. Such mismatches between the stochastic and deterministic model have been frequently observed in the presence of timescale separation^{1,20,47}.

The conditional stationary distributions of the fast variables, obtained by using our approach (Fig. 4c), allow us to capture the slow dynamics of the full system as well. On the slow time scale, the slow variables are unlikely to be changed, but the fast variables rapidly equilibrate to their conditional stationary distributions for the given slow variables. Thus, by replacing the fast variables in the propensity functions of the slow reactions with their quasi-steady states (QSSs): conditional stationary moments, we can obtain the reduced model^{21,48}. For the toggle switch system, the QSSs of the fast variables *A*_{P} and *B*_{P} can be computed from their conditional stationary distributions (Fig. 4c). Then by replacing the fast variables *A*_{P} and *B*_{P} with their QSSs, we can obtain the reduced model with only the slow variables, the active and repressed genes (Fig. 4e). This reduced model accurately captures the slow dynamics of the full model: the binding and unbinding of the repressors to the genes. Both the full and the reduced models yield nearly identical distributions of the residence time of the repressor *B*_{P}, which quantifies how long the repressor maintains its binding to the gene *G*^{A} (Fig. 4e). Because the reduced model does not simulate the fast reactions, which incur a large computational cost in the full model simulation, computation time decreases by 99.9998%.

## Discussion

In this study, we have developed a framework and its computational package that analytically derive stationary distributions of a large class of BRNs. Specifically, we showed that the stationary distribution of a BRN can be derived if two conditions are satisfied: the network can be transformed to a weakly reversible deficiency zero network via network translation (Fig. 1a) and the propensity functions of the translated network satisfies the generalized factorization property of mass action kinetics, identified in this study (Fig. 1b). We found that these conditions are satisfied in numerous BRNs including various autophosphoryaltion networks by using CASTANET (Fig. 2, Supplementary Figs. 3 and 4). Furthermore, even when only a subnetwork of more complex BRNs satisfies the conditions, the stochastic dynamics can often be captured. That is, the stationary distribution of the subnetwork consisting of dominant reactions, derived with our method, can accurately approximate the stationary distribution of its parent network (Supplementary Fig. 6). Furthermore, the derivation of the stationary distribution of a fast subnetwork is enough to capture both the slow and fast stochastic dynamics of its multi-timescale parent network (Fig. 4). With these analytically derived stationary distributions of BRNs, their long-term stochastic behaviors such as their dependence on rate constants can be effectively investigated, and the likelihood function of parameters for Bayesian inference can also be derived^{2}.

Our work focused on the derivation of steady-state solutions of the CME using the underlying network structure following previous studies^{11,26}. However, the CME is not usually used to capture cell division, which should be taken into account to describe single cell behavior in general. Thus, it would be interesting in future work to extend our method to the population balance equation^{49,50}, which describes stochastic cell population dynamics (e.g., cell division) as well as intracellular dynamics. This extension could be accomplished by averaging stationary distributions from cell populations after a stationary distribution of each cell is derived by our method.

We have translated a network to have the desired structural properties (i.e., weak reversibility and zero deficiency) by merging reactions with a common stoichiometric vector (Figs. 1a and 3g) and shifting a reaction preserving its stoichiometric vector (Fig. 3a, d). While the idea underlying this procedure is simple, it greatly extends the class of networks whose structure can be changed to the desired one. For instance, when the edge probability is \(\frac{1}{2}\), the fraction of deficiency zero networks among Erdös-Rényi random networks with two species and at most bimolecular reactions increases more than six times after network translation. The identification of such translation, which is not simple, can be done automatically by the provided computational package, CASTANET. In particular, to efficiently search translated networks, in CASTANET, we use the necessary conditions for network translation toward weakly reversible and deficiency zero networks, derived in this study (see Supplementary Note 4 for details).

Furthermore, CASTNET performs the propensity factorization of translated networks, which is required to derive the stationary distributions of networks with non-mass action kinetics. In this study, by extending the previous factorization condition^{11,26} to ours (Eq. (8)), we have been able to derive stationary distributions of various BRNs (Figs. 1, 2, and 3, Supplementary Figs. 3 and 4). Although the factorization condition with non-mass action kinetics have been rarely investigated^{11,26} due to its complexity and lack of motivation, our work motivates studies on it as translated networks typically follow non-mass action kinetics. To cover more weakly reversible deficiency zero translated networks, we aim to further generalize our factorization conditions, and accordingly, we will update our computational package CASTANET.

By changing the network structure while preserving the stochastic dynamics via network translation, we have been able to use the theory, applicable to weakly reversible deficiency zero networks^{11}, to understand the stochastic dynamics of a larger class of networks with non-zero deficiencies. Similarly, by translating a network to have a deficiency of one, it would be possible to show that the networks have the properties of a network with a deficiency of one, such as absolute concentration robustness: the steady state value of a species is invariant to the overall input of the system^{51,52,53}. Furthermore, network translation of stochastic BRNs can also be used to identify stochastic properties of networks based on their structures, such as positive recurrence^{54} and extinction^{52,55,56}.

## Methods

### Biochemical reaction network

BRN is a graphical representation of a given biochemical system^{12,14,57,58}. It consists of the triple \(\{{\mathcal{S}},{\mathcal{C}},{\mathcal{R}}\}\) where \({\mathcal{S}}=\{{S}_{1},\ldots ,{S}_{d}\}\) is the set of interacting *species*, \({\mathcal{C}}=\{{C}_{1},\ldots ,{C}_{m}\}\) is the set of *complexes*, and \({\mathcal{R}}=\{{\nu }_{1}\to {\nu }_{1}^{\prime},\ldots ,{\nu }_{r}\to {\nu }_{r}^{\prime}\}\) is the set of *reactions*. A complex is a non-negative linear combination of species (i.e., *C*_{i} = *a*_{i1}*S*_{1} + ⋯ + *a*_{id}*S*_{d}), which is also represented as a *d*-dimensional non-negative integer-valued vector (*a*_{i1}, …, *a*_{id}). A reaction is an ordered pair of complexes. This allows the BRN to be represented as a directed graph (\({\mathcal{C}},{\mathcal{R}}\)), where complexes are nodes and reactions are directed edges. Hence, a reaction \({R}_{j}:= {\nu }_{j}\to {\nu }_{j}^{\prime}\), where \({\nu }_{j}\) and \({\nu }_{j}^{\prime}\) are the *source* and *product* complexes of the *j*th reaction, respectively. The vector \({\nu }_{j}^{\prime}-{\nu }_{j}\) is called a *stoichiometric vector* of the *j*th reaction, which describes the relative change in the number of molecules of reactants and products between the sides of each reaction. A *linkage class* is a connected component of the network when all reactions are regarded as undirected edges. Weak reversibility means that if there is a sequence of reactions from a complex *C*_{i} to another complex *C*_{j} then there must be a sequence of reactions from *C*_{j} to *C*_{i}. The deficiency *δ* is the integer index defined as \(| {\mathcal{C}}| -l-s\), where \(| {\mathcal{C}}|\) is the number of complexes, *l* is the number of linkage classes, and *s* is the dimension of the subspace spanned by all stoichiometric vectors (Fig. 1a).

#### Complex balanced equilibrium

CBE of the deterministic mass action model for a BRN with rate constants {*κ*_{k}} is the steady state \({\bf{c}}\in {{\mathbb{R}}}_{ \,{> }\,0}^{d}\) which satisfies the following equality for each complex \(z\in {\mathcal{C}}\) (Fig. 1c):

where the \({\kappa }_{k}{{\bf{c}}}^{{\nu }_{k}}={\kappa }_{k}{c}_{1}^{{\nu }_{k1}}{c}_{2}^{{\nu }_{k2}}\cdot \cdots \cdot {c}_{d}^{{\nu }_{kd}}\) is the rate function of the *k*th reaction following the deterministic mass action kinetics, and \({\nu }_{ki}\) is the *i*th entry of \({\nu }_{k}\)^{12}. The LHS is the sum of rate functions over reactions whose product complex is *z* (i.e., \({\nu }_{k}^{\prime}=z\)), and the RHS is the sum of rate functions over reactions whose source complex is *z* (i.e., \({\nu }_{k}\) = *z*). In other words, at CBE, the in-and out-flows create a balance for each complex. The deterministic mass action model for a BRN possesses a CBE regardless of rate constants if and only if the BRN is weakly reversible and deficiency zero^{13}. Furthermore, even when a BRN has non-zero deficiency and is weakly reversible, the deterministic mass action model for the BRN possesses a CBE with specific choice of rate constants^{13}.

#### Stochastic model of biochemical reaction networks

We model a BRN as a continuous-time Markov chain (CTMC) for an isothermal well-stirred system with a constant volume. The state of the CTMC at time *t*, \({\bf{n}}(t)=({n}_{1},\ldots ,{n}_{d})\in {{\mathbb{Z}}}_{\ge 0}^{d}\), represents the copy number of each species. Each reaction is associated with a *propensity function*:

Specifically, *λ*_{k}(**n**) is the probability that the *k*th reaction occurs in a short interval of length *d**t* if the state at the beginning of the interval was **n**. Using the propensity functions, we can derive the CME, which describes the time evolution of the probability of the model:

for \({\bf{n}}\in {{\mathbb{Z}}}_{\ge 0}^{d}\), where *p*_{n}(*t*) denotes the probability that the state of the system equals \({\bf{n}}\in {{\mathbb{Z}}}_{\ge 0}^{d}\) at time *t*. A *stationary distribution **π*(**n**) of a given CTMC is the steady-state solution of the CME that satisfies the following infinite equation:

It means that if the CTMC is initialized with its stationary distribution, the vector of probabilities *p*(*t*) will stay constant for all time *t* > 0.

The stochastic mass action propensity functions are assumed to be proportional to the number of ways in which species can combine to form the source complex. Hence, the *k*th propensity function with the rate constant *α*_{k} can be written as:

Additionally, the propensity functions can have a more generalized form as follows:

where functions \({f}_{i}:{{\mathbb{Z}}}_{\ge 0}\to {{\mathbb{R}}}_{\ge 0}\). For instance, the translated network in Fig. 3a-right follows this form as \({f}_{A}({n}_{A})={n}_{A}({n}_{A}-1)\) and \({f}_{{A}_{P}}({n}_{{A}_{P}})={n}_{{A}_{P}}\). This is called ‛generalized’ stochastic mass action kinetics since if *f*_{i}’s are identity functions then it is equivalent to the stochastic mass action kinetics (Eq. (6)).

### Network translation

Network translation is a procedure to transform a BRN \(\{{\mathcal{S}},{\mathcal{C}},{\mathcal{R}}\}\) to another one \(\{{\mathcal{S}},\tilde{{\mathcal{C}}},\tilde{{\mathcal{R}}}\}\) that satisfies the condition: the sum of propensities of a set of reactions sharing the same stoichiometric vector remains identical (Fig. 1a). That is, for each vector \(\gamma \in {{\mathbb{Z}}}^{d}\), the propensity functions of the original and the translated networks, *λ*_{k}(**n**) and \({\tilde{\lambda }}_{\tilde{k}}({\bf{n}})\), satisfy the following:

for all \({\bf{n}}\in {{\mathbb{Z}}}_{\ge 0}^{d}\), where \({\nu }_{k}^{\prime}-{\nu }_{k}\) and \({\tilde{\nu }}_{\tilde{k}}^{\prime}-{\tilde{\nu }}_{\tilde{k}}\) are the stoichiometric vectors of the *k*th and \(\tilde{k}\)th reactions of the first and second models, respectively. For example, merging several reactions sharing a common stoichiometric vector into one reaction is network translation. Similarly, shifting reactions preserving their stoichiometric vectors is also an instance of network translation (e.g., *A* + *B* → 2*B* to *A* → *B*). Network translation can change the structural properties of BRNs, such as weak reversibility and the deficiency, but it preserves the associated CME, i.e., stochastic dynamics (see Supplementary Note 1 for details).

### Propensity factorization

To derive the stationary distribution with our approach (Fig. 1), all the propensities \({\tilde{\lambda }}_{k}({\bf{n}})\) should be factorized as

for some constants *κ*_{k} > 0 and functions *θ*(**n**) > 0 and *ω*(**n**) ≥ 0 on a set \({{\Gamma }}=\{{\bf{n}}\,| \,{\bf{n}}\ge {\bf{b}}\}\subset {{\mathbb{Z}}}_{\ge 0}^{d}\) at which \({\tilde{\lambda }}_{k}({\bf{n}})\,> \, 0\) if **n** ≥ \({\nu }_{k}\) + **b** and \({\tilde{\lambda }}_{k}({\bf{n}})=0\) otherwise. For the stochastic mass action kinetics (Eq. (6)), **b** = **0** as \({\tilde{\lambda }}_{k}({\bf{n}})\,> \, 0\) if and only if **n** ≥ \({\nu }_{k}\) . For the translated network in Fig. 3a, **b** = (1, 0) because each propensity is positive if and only if **n** ≥ \({\nu }_{k}\) + (1, 0) in \({{\Gamma }}=\{{\bf{n}}| ({n}_{A},{n}_{{A}_{P}})\,\ge\, (1,0)\}\).

While the propensity factorization can be calculated by solving recurrence relations (see Supplementary Note 2 for details), it can be obtained without solving recurrence relations if all the propensities of a given network follow the generalized mass action kinetics (Eq. (7)). In this case, the factorization can be easily obtained as

where \(\mathop{\prod }\nolimits_{j = 0}^{-1}{a}_{j}=1\) for any {*a*_{j}} (see Supplementary Note 2 for details).

### Derivation of stationary distribution

If a network is weakly reversible and deficiency zero so that it has a CBE \({\bf{c}}\in {{\mathbb{R}}}_{ \,{> }\,0}^{d}\)^{13} and propensity function *λ*_{k}(**n**) can be factorized as in Eq. (8) on Γ, a stationary measure of the network can be derived as

Supplementary Note 3 provides the proof and detailed illustration. By scaling this stationary measure with the normalizing constant, which is the reciprocal of the summation of *π*(**n**) over the irreducible state space, the stationary distribution on the irreducible state space can be obtained. For instance, the normalizing constant for the stationary distribution (Fig. 3e) is calculated by summing *π*(**n**) over the irreducible state space \(\{({n}_{A},{n}_{{A}_{P}},{n}_{{A}_{PP}})\,| \,{n}_{A}\ge 1,{n}_{A}+{n}_{{A}_{P}}+{n}_{{A}_{PP}}={T}_{0}\}\). While computing the normalizing constants is sometimes challenging, a symbolic computation approach using Wilf-Zeilberger theory can be used for the stochastic mass action model^{59}.

### Computational package, CASTANET

We have developed a user-friendly, open-source, and publicly available computational package, CASTANET, that performs the network translation and propensity factorization automatically (Fig. 2a). The package checks two conditions: whether a given BRN can be made weakly reversible and of zero deficiency after network translation, and whether the propensities of the translated network can be factorized as in Eq. (8). If these two conditions are satisfied, CASTANET then calculates the analytic formula for a stationary distribution.

To efficiently search weakly reversible deficiency zero translated networks, we derived their necessary conditions (see Supplementary Note 4 for details) and incorporated them in the package. Furthermore, CASTANET constructs a candidate for the factorization function *θ*(**n**) in symbolic expression, which allows us to check propensity factorization condition without checking infinite combinations (see Supplementary Note 4).

### Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

## Code availability

The MATLAB (version R2020b) code performing network translation, propensity factorization, and CBE calculation to derive stationary distribution (schematically shown in Figs. 1 and 2) can be found at https://github.com/Mathbiomed/CASTANET. The detailed description and step-by-step manual are provided in Supplementary information.

## References

Hahl, S. K. & Kremling, A. A comparison of deterministic and stochastic modeling approaches for biochemical reaction systems: on fixed points, means, and modes.

*Front. Genet.***7**, 157 (2016).Kim, J. K. & Marioni, J. C. Inferring the kinetics of stochastic gene expression from single-cell RNA-sequencing data.

*Genome Biol.***14**, R7 (2013).Stegle, O., Teichmann, S. A. & Marioni, J. C. Computational and analytical challenges in single-cell transcriptomics.

*Nat. Rev. Genet.***16**, 133–145 (2015).Schnoerr, D., Sanguinetti, G. & Grima, R. Approximation and inference methods for stochastic biochemical kinetics—a tutorial review.

*J. Phys. A: Math. Theor.***50**, 093001 (2017).Yang, J.-M. et al. Integrating chemical and mechanical signals through dynamic coupling between cellular protrusions and pulsed erk activation.

*Nat. Commun.***9**, 4673 (2018).Gadgil, C., Lee, C. H. & Othmer, H. G. A stochastic analysis of first-order reaction networks.

*Bull. Math. Biol.***67**, 901–946 (2005).Allen, L. An Introduction to Stochastic Processes with Applications to Biology (CRC Press, 2010).

Kelly, F.

*Reversibility and Stochastic Networks*. (Wiley, New York, 1979).Mairesse, J. & Nguyen, H.-T. Deficiency zero Petri nets and product form. In Franceschinis, G. & Wolf, K. (eds.)

*Applications and Theory of Petri Nets*, 103-122 (Springer-Verlag, 2009).Angeli, D., de Leenheer, P. & Sontag, E. A Petri net approach to the study of persistence in chemical reaction networks.

*Math. Biosci.***210**, 598–618 (2007).Anderson, D. F., Craciun, G. & Kurtz, T. G. Product-form stationary distributions for deficiency zero chemical reaction networks.

*Bull. Math. Biol.***72**, 1947–1970 (2010).Horn, F. & Jackson, R. General mass action kinetics.

*Arch. Rat. Mech. Anal.***47**, 81–116 (1972).Horn, F. Necessary and sufficient conditions for complex balancing in chemical kinetics.

*Arch. Rat. Mech. Anal.***49**, 172–186 (1972).Feinberg, M. Complex balancing in general kinetic systems.

*Arch. Rat. Mech. Anal.***49**, 187–194 (1972).Wu, S., Fu, J., Li, H. & Petzold, L. Automatic identification of model reductions for discrete stochastic simulation.

*J. Chem. Phys.***137**, 034106 (2012).Ghaemi, R. & Del Vecchio, D. Stochastic analysis of retroactivity in transcriptional networks through singular perturbation. In

*2012 American Control Conference (ACC)*, 2731–2736 (2012).Mélykúti, B., Hespanha, J. P. & Khammash, M. Equilibrium distributions of simple biochemical reaction systems for time-scale separation in stochastic reaction networks.

*J. R. Soc. Interface***11**, 20140054 (2014).Hepp, B., Gupta, A. & Khammash, M. Adaptive hybrid simulations for multiscale stochastic reaction networks.

*J. Chem. Phys.***142**, 034118 (2015).Hwang, H. J. & Velázquez, J. J. L. Bistable stochastic biochemical networks: highly specific systems with few chemicals.

*J. Math. Chem***51**, 1343–1375 (2013).Ali Al-Radhawi, M., Del Vecchio, D. & Sontag, E. D. Multi-modality in gene regulatory networks with slow promoter kinetics.

*PLoS Comput. Biol.***15**, 1–27 (2019).Kim, J. K. & Sontag, E. D. Reduction of multiscale stochastic biochemical reaction networks using exact moment derivation.

*PLoS Comput. Biol.***13**, 1–24 (2017).Kan, X., Lee, C. H. & Othmer, H. G. A multi-time-scale analysis of chemical reaction networks: Ii. stochastic systems.

*J. Math. Biol.***73**, 1081–1129 (2016).Anderson, D. F. & Nguyen, T. D. Prevalence of deficiency zero reaction networks in an erdos-renyi framework (2019). 1910.12723.

Johnston, M. D. Translated chemical reaction networks.

*Bull. Math. Biol.***76**, 1081–1116 (2014).Johnston, M. D. & Burton, E. Computing weakly reversible deficiency zero network translations using elementary flux modes.

*Bull. Math. Biol.***81**, 1613–1644 (2019).Anderson, D. F. & Cotter, S. L. Product-form stationary distributions for deficiency zero networks with non-mass action kinetics.

*Bull. Math. Biol.***78**, 2390–2407 (2016).Sontag, E. D. Structure and stability of certain chemical networks and applications to the kinetic proofreading model of t-cell receptor signal transduction.

*IEEE Trans. Autom. Control***46**, 1028–1047 (2001).Beenstock, J., Mooshayef, N. & Engelberg, D. How do protein kinases take a selfie (autophosphorylate)?

*Trends Biochem. Sci.***41**, 938–953 (2016).Dodson, C. A., Yeoh, S., Haq, T. & Bayliss, R. A kinetic test characterizes kinase intramolecular and intermolecular autophosphorylation mechanisms.

*Sci. Signal.***6**, ra54–ra54 (2013).Oda, K., Matsuoka, Y., Funahashi, A. & Kitano, H. A comprehensive pathway map of epidermal growth factor receptor signaling.

*Mol. Syst. Biol.***1**, 2005.0010 (2005).Wang, J., Wu, J.-W. & Wang, Z.-X. Structural insights into the autoactivation mechanism of p21-activated protein kinase.

*Structure***19**, 1752–1761 (2011).Dammann, K., Khare, V. & Gasche, C. Tracing paks from gi inflammation to cancer.

*Gut***63**, 1173–1184 (2014).Parrini, M. C., Lei, M., Harrison, S. C. & Mayer, B. J. Pak1 kinase homodimers are autoinhibited in trans and dissociated upon activation by cdc42 and rac1.

*Mol. Cell***9**, 73–83 (2002).Zaytsev, A. V. et al. Bistability of a coupled aurora b kinase-phosphatase system in cell division.

*eLife***5**, e10644 (2016).Doherty, K., Meere, M. & Piiroinen, P. T. Some mathematical models of intermolecular autophosphorylation.

*J. Theor. Biol.***370**, 27–38 (2015).Wang, Z.-X. & Wu, J.-W. Autophosphorylation kinetics of protein kinases.

*Biochem. J.***368**, 947–952 (2002).Mouri, K., Nacher, J. C. & Akutsu, T. A mathematical model for the detection mechanism of dna double-strand breaks depending on autophosphorylation of atm.

*PLoS ONE***4**, 1–14 (2009).Nguyen, L. K., Kolch, W. & Kholodenko, B. N. When ubiquitination meets phosphorylation: a systems biology perspective of egfr/mapk signalling.

*Cell Commun. Signal.***11**, 52 (2013).Luciani, F., Keşmir, C., Mishto, M., Or-Guil, M. & de Boer, R. J. A mathematical model of protein degradation by the proteasome.

*Biophys. J.***88**, 2422–2432 (2005).Tiganis, T. Protein tyrosine phosphatases: dephosphorylating the epidermal growth factor receptor.

*IUBMB Life***53**, 3–14 (2002).King, C. C. et al. p21-activated kinase (pak1) is phosphorylated and activated by 3-phosphoinositide-dependent kinase-1 (pdk1).

*J. Biol. Chem.***275**, 41201–41209 (2000).Sessa, F. & Villa, F. Structure of Aurora B–INCENP in complex with barasertib reveals a potential transinhibitory mechanism.

*Acta Crystallogr. Sect. F***70**, 294–298 (2014).Chen, C.-Y., Yu, Z.-Y., Chuang, Y.-S., Huang, R.-M. & Wang, T.-C. V. Sulforaphane attenuates egfr signaling in nsclc cells.

*J. Biomed. Sci.***22**, 38 (2015).Gully, C. P. et al. Antineoplastic effects of an aurora b kinase inhibitor in breast cancer.

*Mol. Cancer***9**, 42 (2010).Weisz Hubsman, M., Volinsky, N., Manser, E., Yablonski, D. & Aronheim, A. Autophosphorylation-dependent degradation of Pak1, triggered by the Rho-family GTPase, Chp.

*Biochem. J.***404**, 487–497 (2007).Shamir, M., Bar-On, Y., Phillips, R. & Milo, R. Snapshot: timescales in cell biology.

*Cell***164**, 1302–1302.e1 (2016).Bibbona, E., Kim, J. & Wiuf, C. Stationary distributions of systems with discreteness-induced transitions.

*J. R. Soc. Interface***17**, 20200243 (2020).Rao, C. V. & Arkin, A. P. Stochastic chemical kinetics and the quasi-steady-state assumption: application to the gillespie algorithm.

*J. Chem. Phys.***118**, 4999–5010 (2003).Waldherr, S. Estimation methods for heterogeneous cell population models in systems biology.

*J. R. Soc. Interface***15**, 20180530 (2018).Kremling, A. Systems biology: mathematical modeling and model analysis (CRC Press, 2013).

Shinar, G. & Feinberg, M. Structural sources of robustness in biochemical reaction networks.

*Science***327**, 1389–1391 (2010).Anderson, D. F., Enciso, G. A. & Johnston, M. D. Stochastic analysis of biochemical reaction networks with absolute concentration robustness.

*J. R. Soc. Interface***11**, 20130943 (2014).Enciso, G. A. Transient absolute robustness in stochastic biochemical networks.

*J. R. Soc. Interface***13**, 20160475 (2016).Anderson, D. F. & Kim, J. Some network conditions for positive recurrence of stochastically modeled reaction networks.

*SIAM J. Appl. Math.***78**, 2692–2713 (2018).Johnston, M. D. A computational approach to extinction events in chemical reaction networks with discrete state spaces.

*Math. Biosci.***294**, 130–142 (2017).Johnston, M. D., Anderson, D. F., Craciun, G. & Brijder, R. Conditions for extinction events in chemical reaction networks with discrete state spaces.

*J. Math. Biol.***76**, 1535–1558 (2018).Feinberg, M. Foundations of Chemical Reaction Network Theory (Springer, 2019).

Anderson, D. F. & Kurtz, T. G. Continuous time markov chain models for chemical reaction networks. In

*Design and analysis of biomolecular circuits*, 3-42 (Springer, 2011).Sontag, E. D. & Zeilberger, D. A symbolic computation approach to a problem involving multivariate poisson distributions.

*Adv. Appl. Math.***44**, 359–377 (2010).Gillespie, D. T. Exact stochastic simulation of coupled chemical reactions.

*J. Phys. Chem.***81**, 2340–2361 (1977).

## Acknowledgements

This work is supported by the Institute for Basic Science (IBS-R029-C3)(J.K.K.), the Samsung Science and Technology Foundation (SSTF-BA1902-01) (J.K.K.), and the National Research Foundation of Korea (Global Ph. D. Fellowship Program 2019H1A2A1075303) (H.H.), as well as US NSF grants 1716623 and 1849588 (E.D.S.). J.K. thanks German Enciso for travel support via NSF grant DMS1616233. H.H. and J.K. thank the 2019 Chemical Reaction Networks summer school and workshop at Politecnico di Torino for providing valuable feedbacks on this work.

## Author information

### Authors and Affiliations

### Contributions

H.H., E.D.S., and J.K.K. designed research. H.H., J.K., and J.K.K. developed the theory and algorithm. H.H. performed computation. H.H., J.K., M.A.A., E.D.S., and J.K.K. discussed and analyzed results. H.H., J.K., and J.K.K. drafted the manuscript and designed the figures, and H.H., J.K., M.A.A., E.D.S., and J.K.K. revised the paper.

### Corresponding author

## Ethics declarations

### Competing interests

The authors declare no competing interests.

## Additional information

**Publisher’s note** Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## Rights and permissions

**Open Access** This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

## About this article

### Cite this article

Hong, H., Kim, J., Ali Al-Radhawi, M. *et al.* Derivation of stationary distributions of biochemical reaction networks via structure transformation.
*Commun Biol* **4**, 620 (2021). https://doi.org/10.1038/s42003-021-02117-x

Received:

Accepted:

Published:

DOI: https://doi.org/10.1038/s42003-021-02117-x

## Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.