Introduction

How nonlinear is the regulation of the components of biological networks? That is, to what extent do the biochemical components of these networks non-independently interact (Fig. 1) in influencing downstream processes and network behavior. Research on the “nonlinearity” front has hitherto focused on its various manifestations in the dynamics of biological systems, such as chaos, bifurcation, multistability, synchronization, patterning, dissipation, etc.1, but a characterization of regulatory nonlinearity among the components of the underlying systems that give rise to those phenomena is lacking. A more complete understanding of biological regulatory nonlinearity would not only yield insights into their design principles2 but also have theoretical implications ranging from canalization to control3,4 and practical implications for biomedical therapy, synthetic biology, etc.1,5. A good example of this concerns the mapping between molecular or genetic information and the resulting system-level anatomical structure and function of an organism. Advances in regenerative medicine and synthetic morphology require rational control of physiological and anatomical outcomes6, but progress in genetics and molecular biology produce methods and knowledge targeting the lowest-level cellular hardware. There is no one-to-one mapping from genetic information to tissue- and organ-level structure; similarly, ion channels open and close post-translationally, driving physiological dynamics that are not readily inferred from proteomic or transcriptomic data. System-level properties in biology are often highly emergent, with gene-regulatory or bioelectric circuit dynamics connecting initial state information and transition rules to large-scale structure and function. Thus, the difficult inverse problem7 of inferring outcomes and desirable interventions across scales of biology illustrates some of the fundamental questions about the directness or nonlinearity of encodings of information, as well as the importance of this question for practical advances in biomedicine and bioengineering that exploit the plasticity and robustness of cellular collectives. Many deep questions remain about the potential limitations and best strategies to bridge scales for prediction and control in developmental, evolutionary, and cell biology. To that end, we introduce here a formal characterization of the nonlinearity of models of biological regulatory networks, such as those often used to describe relationships between regulatory genes. Specifically, we consider a class of discrete models of biological regulatory systems called “Boolean models” that are known for their relative simplicity and tractability compared to continuous ordinary differential equation-based (ODE) models8.

Fig. 1: An illustration of the concept of regulatory nonlinearity.
figure 1

Each black circle represents a generic biochemical component such as a gene, transcription factor, enzyme, etc., regulated by a set of inputs (also biochemical components) and generates a generic output such as concentration level, strength, etc. Non-zero interactions among the inputs are represented by red circles connected by red lines, with the total number of possible interactions for a node with k inputs equal to \(\sum_{l=1}^k {{k}\choose{l}}\). The size of the red circles and the width of the connecting lines represents the weight of the interactions. Independent inputs are represented by unconnected red circles. The degree of nonlinearity would thus be expected to increase, though not necessarily linearly, from left to right, as the numbers and the strengths of the regulatory interactions increase. One could also visualize these local interactions in a broader network-context as “hypergraphs”46.

A Boolean network is a discrete network model characterized by the following features. Each node in a Boolean network can only be in one of two states, ON or OFF, which represents the expression or activity of that node. The state of a node depends on the states of other input nodes which are represented as a Boolean rule of these input nodes. Many of the available Boolean network models were created via literature search of the regulatory mechanisms and subsequently validated via experiments9. Some of the publicly available models were generated via network inference methods from time course data3.

Previous studies have found that certain characteristic features of the biological Boolean models, such as the mean in-degree, output bias, sensitivity and canalization, tend to assume an optimal range of values that support optimal function10,11. Here we study a new but generic feature of complex systems termed “regulatory nonlinearity” that we broadly define as the degree to which the inputs to the components of a complex interact. To characterize the regulatory nonlinearity of Boolean networks we formalize an approach to generalizing Boolean logic by casting it as a form of probability, which was originally proposed by George Boole himself12. We leverage the continuous nature of the resulting polynomials to decompose a Boolean function using Taylor-series and reveal the distinct layers of its regulatory nonlinearity (Fig. 5). Various other methods, both discrete and continuous, of decomposing Boolean functions exist, such as Reed-Muller, Walsh spectrum, Fourier, discrete Taylor and fuzzy logic13,14,15,16. Our continuous Taylor decomposition method is distinct in that it offers a clear and systematic way to characterize nonlinearity.

Is the regulatory nonlinearity of biological systems special? To that end we specifically ask: (1) how well could biological Boolean models be approximated, that is, faithfully represented with only partial information containing lower levels of nonlinearity relative to that of the original?; (2) is there an optimal level of regulatory nonlinearity, characterized by maximum approximability, that these models may have been selected for by evolution?; and (3) do different classes of biological networks show characteristically different optimal levels of regulatory nonlinearity? To answer these questions, we first approximate the biological models by systematically composing the various nonlinear layers resulting in a sequence of model-approximations with increasing levels of nonlinearity. We then estimate the accuracy of these approximations by comparing the outputs of their simulations with that of the original unapproximated model. We then construct an appropriate random ensemble for each biological model and compare their mean accuracies for fixed levels of approximation. The main idea is that a biological model that is more approximable than expected for a particular level of nonlinearity would mean that the network may have been optimized for that level nonlinearity. Finally, we classify the biological networks into various categories and compare their approximabilities to identify any category-dependent effects.

Results

The regulation of biological models tends to be less nonlinear than expected by chance

The mean approximation error (MAE; see “Methods”) of biological models tends to be less than both the constrained and the unconstrained random ensembles across all approximation orders (Fig. 2a). At the linear order, for example, the mean MAE of biological models is about 0.025, whereas the mean MAE of the constrained ensembles is twice as large at about 0.05 (p < 10−6) and even larger for the unconstrained ensembles at about 0.07 (p < 10−23). This ordering of MAE extends to all approximation orders (Fig. 2a). Even though the mean MAE of the unconstrained ensembles tends to be remarkably low (0.07 at worst), the percentage change in MAE (PMAE; see “Methods”) convey a more accurate sense of approximability. For example, the MAE of the biological models and the constrained ensembles are respectively about 50 and 25% lower than the baseline expectation for the linear order (Fig. 2b). This means that the biological models are about 25% more linearly approximable than expected by chance, an effect that amplifies with higher approximation orders going up to 50% for order 4, since the PMAE of biological models shrinks faster than the constrained ensembles (Fig. 2b).

Fig. 2: Biological models are more approximable than expected by chance.
figure 2

a The MAE of the biological models and the associated constrained and unconstrained ensembles for approximation orders 1 to 5; MAE values for orders 6 and above are negligible and not shown. b Percentage change in MAE for the biological models and the associated constrained ensembles computed with respect to the MAE of the corresponding unconstrained ensembles. Every point in the distributions corresponding to the random ensembles represents the average MAE of an ensemble of 100 random networks associated with each biological model. The p values indicate the statistical significance of the difference in mean MAEs between sets of random ensembles and the biological models for a given order of nonlinearity. Statistical analysis by Welch’s unequal variances t-test.

A possible explanation for the observed trends is that the Taylor spectrums of the biological models tend to be lopsided and more clustered at the linear ends (see Fig. 4 for an example), dramatically reducing the load on the higher orders and thus resulting in a faster growth of approximability with higher orders. This effect may be less pronounced in constrained ensembles, and even less in unconstrained ensembles, as the apportioning in the corresponding Taylor spectrums may be less lopsided and more uniform in comparison. Based on these observations we hypothesize, but do not conclude, that biological systems may have been subjected to a slight but discernible selection pressure for developing less nonlinear regulatory rules. This hypothesis is further supported by the fact that the approximability systematically increases as more biological constraints are imposed, as one goes from unconstrained to constrained to the biological models (Fig. 2). This has implications not only for the feasibility of biomedical approaches to control emergent somatic complexity or guided self-assembly of novel forms17, but also for models of anatomical homeostasis and evolvability: linearity implies easier control of its own complex processes by any biological system, and more efficient credit assignment during evolution.

Random models with minimal biological constraints tend to be naturally approximable

The MAE of the unconstrained random ensembles are remarkably low with values averaging around 0.07 and not exceeding 0.17 at the linear order; these values only decrease further with higher approximation orders (Fig. 2a). These observations suggest that even random Boolean models with minimal biological constraints tend to be considerably approximable. In other words, approximability may be a natural property of random Boolean ensembles to some extent and not necessarily a special property of biological models. Furthermore, the MAE has its inflection point at order 2 (Fig. 2a), meaning that the approximability starts saturating at that order. We hypothesize based on these observations that regulatory information beyond the second order may be inconsequential with regards to the statistics of network dynamics in general.

The above hypothesis has an analog in the realm of Boolean Ising models, where it was found that maximum entropy (MaxEnt) models with only pairwise interactions were sufficient to fit random multivariate Ising networks that are densely connected and whose state spaces satisfy certain entropy constraints, features that many biological systems share18. There are however important differences between our results and theirs: (1) even though the Taylor expansions resemble MaxEnt expressions, the latter fit global state-space distributions, whereas our Taylor polynomials are local formulations; and (2) whereas our Taylor polynomials are built on derivatives of the states, MaxEnt models are based on the raw values. Notwithstanding these potentially superficial differences, the analogous results are striking and calls for further research on whether the respective explanations are similar if not fundamentally the same. Besides, the above observations are reminiscent of the concept of model degeneracy or “sloppiness”, where several models (defined by unique sets of parameters) explain the same biological phenomenon due to redundant parameters19. In our case, it’s not the parameters but the higher-order relationships among them that are redundant, as they contribute minimally to the MAE beyond a certain order of approximation (Fig. 2a). Future research will determine whether there’s a deep connection between the sloppiness of the parameters and the orders of relationships among them.

The approximability of a biological model depends on its class, with the cancer family displaying the most variability

Even though the nonlinearity of biological networks is less than expected on average, individual and category-dependent variations were observed. In the following, we focus on the “linear approximability” (PMAE corresponding to the linear order) of the biological models since it’s hypothesized to be the cause of the observed approximabilities in the higher order; values are listed in Supplementary Table 1. First, there are a few networks in almost every category that are more nonlinear than expected, as evidenced by the negative linear approximability values (Fig. 3). Second, the disease networks in C1 and the cancer networks in C2 are the ones with the most linear approximability, with values ≥75% (Fig. 3). In other words, the cancer or disease pathways tend to be more optimized for linearity compared to the other categories. This makes sense since a more linear pathway is more amenable to control, which presumably works in favor of the agents of disease. On the other hand, the disease and cancer networks also display the highest variability in their linear approximability compared to other categories (p < 0.05 in all comparisons for the cancer models), with the PMAE values ranging between −92 and 80% (Fig. 3). In other words, disease and cancer models could be either relatively extremely more linear or nonlinear than expected, depending on the specific cases. Note that about 56% of the disease category comprises of non-cancer networks (31/55), which suggests that the effect is not significantly biased by cancer networks. These observations suggest that regulatory nonlinearity may offer an effective “entry point” to the agents of disease by virtue of its natural heterogeneity that they could leverage to their advantage perhaps as a means to evade treatment since there’s no single level of nonlinearity to target. This heterogeneity may also have a connection to one of the hallmarks of cancer, namely genetic heterogeneity20 where the cancerous cells within an individual display heterogeneous gene expression compared to the homogeneous expression in the healthy cells. In the case of nonlinearity, the heterogeneity manifests at the population level, raising the question of whether it may also be observed at the level of single cells within an individual. In other words, could the heterogeneity of nonlinearity be yet another hallmark of cancer?

Fig. 3: The linear approximabilities of various categories of biological models.
figure 3

a Classification 1 (C1); b Classification 2 (C2). The categories in either classification are displayed in increasing order of variance. Each box represents the distribution of the linear approximabilities of the corresponding category. The p values indicate the statistical significance of the difference in the variance between pairs of categories; only the p values of significantly different pairs are shown. Statistical analysis by F-test of equality of variances.

The shape of the Taylor spectrum explains the extreme opposite characters of linear approximability of a pair of cancer models

Why are some models more linearly approximable and others less? The answer lies in the organization of the corresponding Taylor decompositions, as described above. To illustrate this in detail, we compared the Taylor decompositions of a pair of models chosen from among the most extreme outliers of linear approximability in either direction (Fig. 3). Those models are respectively the following: a linear model describing the role of the protein p53 in the regulation of cell-cycle arrest in breast cancer21, henceforth referred to as the “P53” model; and a nonlinear model describing the role of mutations in the regulation of metastasis in lung cancer22, henceforth referred to as the “Metastasis” model. P53 has a linear approximability of about 72%, and it consists of 16 nodes with a mean in-degree of 3.8 ± 2.4 and a mean output bias of 0.38 ± 0.14; and, Metastasis has a linear approximability of about −79%, and it consists of 32 nodes with a mean in-degree of 4.9 ± 2.5 and a mean output bias of 0.27 ± 0.26. Thus, while P53 is smaller and sparser than Metastasis, its nodes exhibit more output-uncertainty compared to Metastasis. According to the mean field theory of random Boolean networks23, the opposite characters of the mean in-degree and the output bias of these models means that their dynamical behaviors could be expected to be similar (although with the caution that the theory was originally developed for infinite-sized and homogeneously connected networks, which is not the case here). However, we know that their linear approximabilities, which is another expression of dynamical behavior, are opposites. One explanation for this discrepancy lies in the distinct apportioning of nonlinearity in their respective Taylor decompositions (Fig. 4). Specifically, while the magnitude of nonlinearity, defined as the mean absolute value of the Taylor derivatives for a given order and normalized appropriately (see text of Fig. 4), tends to be clustered around the linear order for P53, they are relatively more spread out for Metastasis. Moreover, while the magnitude of the linear order for P53 is more than twice as large the next largest magnitude at lower orders the corresponding ratios for Metastasis are relatively smaller, thus explaining why P53 is more linearly approximable than Metastasis. This result is consistent with predictions based on a model of scaling of cellular control policies24. A more controllable (linear) network (P53) is optimal for cooperation with other cells toward collective (normal morphogenetic) goals. In contrast, a cell defecting from the collective and reverting to a more unicellular lifestyle (Metastasis) should exhibit a less predictable, controllable network due to pressures from parasites and competitors that independent unicellular organisms face. Methods for calculating controllability (e.g., linearity) are an important addition to recent efforts to solve the conundrum of interpretability of information structures in contexts ranging from machine learning to evolutionary developmental biology25,26,27.

Fig. 4: A comparison of the spectrums of the magnitudes of nonlinearity of the models corresponding to the two extreme cases of approximability.
figure 4

The bars represent the mean absolute value of all the Taylor derivatives of the corresponding order, averaged over all the nodes of the corresponding model containing those terms. The maximum orders (in-degrees) of P53 and Metastasis are respectively 10 and 8. To compare the derivatives from different models they were normalized with respect to the maximum possible absolute value of a Taylor derivative of order α (Eq. (6)) of a Boolean function with output bias p, given by \((\min (p,0.5)-\max (p-0.5,0)){2}^{| \alpha | }\) (proof provided in the “Methods” section). The error bars represent the standard deviation and not the confidence intervals of the means since they are not estimates. The errors appear large whenever the corresponding distribution of the magnitudes are bimodal with many values clustered close to 0.

Discussion

This paper introduces the concept of regulatory nonlinearity as a characteristic of Boolean networks. There are several other related characterizations of Boolean networks such as canalization28, effective connectivity11, symmetry29 and controllability30. It has been previously reported that the levels of canalization (a measure of the extent to which a subset of inputs actually influence the outputs of a Boolean function) and the mean effective connectivity (a measure of collective canalization) are high in biological networks3,11. It has also been found that biological networks need few inputs to reprogram31 and are relatively easier to control4. Our formulation of regulatory nonlinearity is related to these other measures in that more linearity implies more apportioning of influence to individual inputs rather than collective sets of inputs (Fig. 4). Hence, we hypothesize that regulatory nonlinearity may serve the purpose of controllability and epigenetic stability32.

Our results further moot the possibility that regulatory nonlinearity may be a factor underlying more powerful dynamical phenomena such as memory33 and computation, defined as the capacity for adaptive information-processing34. Even though there’s increasing consensus that biological systems contain memory and perform computation, clarity is lacking as to what features of those systems enable it and what general principles underlie it34,35. Our framework of regulatory nonlinearity offers an approach to answering these questions. For example, one could consider a known dynamical model with a capacity for memory33 or universal computation such as the elementary cellular automaton (ECA) driven by rule 11036 and ask if there are unique properties of their Taylor spectrums that confer their respective capabilities. Present approaches to answering this question typically consists of characterization of the dynamical behavior and not the rules33,37,38. A characterization of the rules especially makes sense for ECA39 since the structure is the same (lattice) and the only feature that distinguishes one ECA from the other is the rule.

Looking at such questions from an even broader perspective it becomes evident that they are only instances of the ultimate puzzle of complex systems, namely what connects the structure and the function of a system. Even though recent work has attempted to answer this question from the perspective of the rules or dynamical laws that govern the system11,39,40, more tools are needed41. In that regard, our framework of regulatory nonlinearity could be a novel addition to this burgeoning toolkit in that it could also be applied to continuous differentiable models of biological networks such as those based on differential equations.

The main limitation of our formulation of approximability is that the approximation accuracy will necessarily increase with higher orders of approximation since the Taylor terms would be accumulated with each higher order (the highest order of approximation is exact). However, this does not affect the falsifiability of our framework since it’s possible, for example, to construct networks with XOR-like functions that would be clearly less linearly approximable than the associated ensembles. The Metastasis model is another example in that regard (Fig. 4). Furthermore, though our experiments control for regulatory nonlinearity they don’t offer insights into how it dynamically interacts with other network features such as connectivity in generating the observed approximability. For example, the network structure may be expected to determine which states the network enters into at any given point that in turn determines which parts of the Taylor compositions are actually utilized and how. Moreover, it’s known that certain characteristics of the local regulatory rules, such as “effective connectivity”, dynamically modulate the network structure itself42. Thus, more research is required to investigate the extent to which the structure and output bias facilitates the high approximability of biological networks and vice versa, for which we already have produced preliminary results by way of the constrained models (Fig. 2). Lastly, our conclusions about the linearity of biological regulatory networks may be a reflection of a hidden bias built in the inference methods that produced the models in the first place. We leave it to future work to explore these realms.

Methods

Probabilistic generalization of Boolean logic

Here we provide a continuous-variable formulation of a Boolean function by casting Boolean values as probabilities, thus transforming it into a pseudo-Boolean function43. Consider random variables Xi : {0, 1} → [0, 1], i = 1, …, n, with Bernoulli distributions. That is, pi = Pr(Xi = 1) = 1 − Pr(Xi = 0) = 1 − qi, for i = 1, …, n. Let X = X1 ×  × Xn be the product of random variables and f : X → {0, 1} a Boolean function. Let \({R}_{0}^{f}=\{x\in X:f(x)=0\}\) and \({R}_{1}^{f}=\{x\in X:f(x)=1\}\). Note that X is a disjoint union of \({R}_{0}^{f}\) and \({R}_{1}^{f}\). Then:

$$Pr(f=1)=Pr({R}_{1}^{f})=\mathop{\sum}\limits_{x\in {R}_{1}^{f}}Pr(x)=\mathop{\sum}\limits_{x\in {R}_{1}^{f}}\mathop{\prod }\limits_{i=1}^{n}{\hat{p}}_{i}$$
(1)

where \({\hat{p}}_{i}={p}_{i}\) if xi = 1 and \({\hat{p}}_{i}=1-{p}_{i}\) if xi = 0. Let \(\hat{f}({p}_{1},\ldots ,{p}_{n})={\sum }_{x\in {R}_{1}^{f}}\mathop{\prod }\nolimits_{i = 1}^{n}{\hat{p}}_{i}\). Thus, \(\hat{f}:{[0,1]}^{n}\to [0,1]\) is a continuous-variable function. The following theorem shows that \(\hat{f}\) is a generalization of f in the sense that \(\hat{f}(x)=f(x)\) for all x {0, 1}n.

Theorem. For discrete values of xi {0, 1}, i = 1, …, n, we have \(\hat{f}({x}_{1},\ldots ,{x}_{n})=f({x}_{1},\ldots ,{x}_{n})\).

Proof. Let z = (z1, …, zn)  {0, 1}n. Since each zi is either 0 or 1, we have that pi = 1 if zi = 1 or pi = 0 if zi = 0 for i = 1, …, n. We want to show that \(\hat{f}({p}_{1},\ldots ,{p}_{n})=f({z}_{1},\ldots ,{z}_{n})\). Since \(X={R}_{0}^{f}\cup {R}_{1}^{f}\), we have that either \(z\in {R}_{0}^{f}\) or \(z\in {R}_{1}^{f}\). If \(z\in {R}_{1}^{f}\), then f(z) = 1 and \(Pr(z)=\mathop{\prod }\nolimits_{i = 1}^{n}{\hat{p}}_{i}=1\). Moreover, for any other \(x\in {R}_{1}^{f}\) with x ≠ z we have that Pr(x) = 0. Thus, \(\hat{f}(z)={\sum }_{x\in {R}_{1}^{f}}Pr(x)=Pr(z)=1\). Now if \(z\in {R}_{0}^{f}\), then \(\hat{f}(z)=0\) because \({\sum }_{{{\emptyset}}}=0\). Thus, \(\hat{f}(x)=f(x)\) for all x {0, 1}n.□

Corollary. If pi = 1/2 for all i = 1, …, n, then \(\hat{f}({p}_{1},\ldots ,{p}_{n})\) is the output bias of f.

Proof If pi = 1/2, the qi = 1/2. Then

$$Pr(f=1)=Pr({R}_{1})=\mathop{\sum}\limits_{x\in {R}_{1}^{f}}Pr(x)=\mathop{\sum}\limits_{x\in {R}_{1}^{f}}\mathop{\prod }\nolimits_{i = 1}^{n}{\hat{p}}_{i}$$
(2)

Example. Consider the AND, OR, XOR, and NOT Boolean functions given in Table 1. The continuous-variable generalization of f1, f2, f3, and f4 are: \({\hat{f}}_{1}={x}_{1}{x}_{2}\), \({\hat{f}}_{2}=(1-{x}_{1}){x}_{2}+{x}_{1}(1-{x}_{2})+{x}_{1}{x}_{2}={x}_{1}+{x}_{2}-{x}_{1}{x}_{2}\), \({\hat{f}}_{3}=(1-{x}_{1}){x}_{2}+{x}_{1}(1-{x}_{2})={x}_{1}+{x}_{2}-2{x}_{1}{x}_{2}\), and \({\hat{f}}_{4}=1-x\).

Table 1 Truth tables of basic Boolean functions.

Note that the above expressions have previously been derived via other (not probability-based) means15,16.

Taylor decomposition of Boolean functions

Since \(\hat{f}\) is a continuous-variable function, we can calculate its Taylor expansion. And since \(\hat{f}\) is a square-free polynomial, its Taylor expansion is finite and simplified (any term containing multiple derivatives of the same variable is zeroed out), as described in Proposition using the standard multi-index notation. Let α = (α1, …, αn) where αi {0, 1}. We define:

$$| \alpha | ={\alpha }_{1}+\cdots +{\alpha }_{n},$$
(3)
$${x}^{\alpha }={x}_{1}^{{\alpha }_{1}}{x}_{2}^{{\alpha }_{2}}\cdots {x}_{n}^{{\alpha }_{n}},$$
(4)

and

$${\partial }^{\alpha }f={\partial }_{1}^{{\alpha }_{1}}{\partial }_{2}^{{\alpha }_{2}}\cdots {\partial }_{n}^{{\alpha }_{n}}f=\frac{{\partial }^{| \alpha | }f}{{\partial }_{1}^{{\alpha }_{1}}{\partial }_{2}^{{\alpha }_{2}}\cdots {\partial }_{n}^{{\alpha }_{n}}}.$$
(5)

Proposition. For p [0, 1]n, we have:

$$\hat{f}(x)=\mathop{\sum}\limits_{| \alpha | \le n}\frac{{\partial }^{\alpha }\hat{f}(p)}{\alpha !}{(x-p)}^{\alpha }.$$
(6)

Note that \(\hat{f}(p)\) in Eq. (6) is the output bias of f as was seen in Corollary. A natural choice for p is p = (1/2, …, 1/2) as it represents an unbiased selection for each variable and it also gives the output bias of the function. Such unbiased choices are not available for the discrete case. Our continuous formulation thus offers such unique advantages over the discrete Taylor decomposition, as it’s a natural generalization of the latter. The Taylor decomposition can be used to approximate a Boolean function by considering a subset of the terms. For example, a linear approximation consists of terms only up to α ≤ 1, a bilinear approximation up to α ≤ 2, etc., up until α ≤ n where it ceases to be an approximation and provides an exact decomposition of \(\hat{f}\). A visual illustration is provided in Fig. 5. The approximation order of a Boolean network could therefore vary between its minimum and maximum in-degrees (number of inputs per node).

Fig. 5: The various approximations of a Boolean function in increasing order of nonlinearity.
figure 5

The logical OR function is represented as a 2D hypercube (top left) with the coordinate values representing input combinations and the color of the circles representing the corresponding outputs (white = 0, black = 1) and is approximated using Taylor decomposition as the 0th order approximation (top right) showing only the first term, the mean output bias; the 1st order approximation (bottom left) including the linear terms; and finally the 2nd order exact form (bottom right) including all the terms.

Example. Consider the continuous generalizations of the AND, OR, XOR and NOT functions given in Example The corresponding Taylor expansions using Eq. (6) and using the derivatives shown in Table 2with p = (1/2, 1/2) are: \({\hat{f}}_{1}=0.25+0.5({x}_{1}-0.5)+0.5({x}_{2}-0.5)+({x}_{1}-0.5)({x}_{2}-0.5)\), \({\hat{f}}_{2}=0.75+0.5({x}_{1}-0.5)+0.5({x}_{2}-0.5)-({x}_{1}-0.5)({x}_{2}-0.5)\), \({\hat{f}}_{3}=0.5-2({x}_{1}-0.5)({x}_{2}-0.5)\), and \({\hat{f}}_{4}=0.5-(x-0.5)=1-x\).

Table 2 Values of partial the derivatives in the Taylor decompositions of the generalizations of basic Boolean functions.

Note that \({\hat{f}}_{1}(1/2,1/2)=0.25,{\hat{f}}_{2}(1/2,1/2)=0.75,{\hat{f}}_{3}(1/2,1/2)=0.5\), and \({\hat{f}}_{4}(1/2)=0.5\) in the above equations are the output biases of the AND, OR, XOR, and NOT functions respectively. Also note that both the AND and OR functions contain the linear and the second order terms in their Taylor decomposition while the XOR function only contains the second order term. This difference is because both the AND and OR functions are monotone while XOR is not since it requires both inputs to be known.

Approximability of a model

We considered a suite of Boolean network models of biochemical regulation from two sources, namely the cell collective9 and reference3. This suite consists of 137 networks with the number of nodes ranging from 5 to 321. The mean in-degree of these models ranges from 1.1818 to 4.9375 with the variances ranging between 0.1636 and 9.2941, while the mean output bias is limited to the range [0.1625, 0.65625] with the variances between 0.0070 and 0.0933. For each biological model an ensemble of 100 randomized models was generated each of whose connectivity and output biases were set to be the same as the former, with only the logic rules randomized. This set, referred to as the “constrained ensemble”, facilitates benchmarking the role of regulatory nonlinearity in the biological rules. To accurately assess the differences between the biological models and the constrained ensembles, we considered a baseline set known as the “unconstrained ensemble” relative to which those differences were computed. This ensemble preserved neither the connectivity nor the output bias but had them bootstrap-sampled from the corresponding distributions characterizing the associated biological models. Taylor decomposition was then applied to both the biological models and the random ensembles and all possible nonlinear approximations were computed by considering terms starting from the linear order accumulating up to the maximum possible nonlinear order. Both the biological models and the random ensembles were then simulated using a set of 1000 randomly chosen initial states iterated through 500 update steps for all orders of approximation; the same initial conditions were used for a given biological model and the random ensemble. The states of the variables were restricted to the interval [0,1] at every step during the simulations by simply resetting them to the nearest boundary of the interval whenever they crossed it. The mean approximation error (MAE) of each model is defined as the mean squared error between the exact Boolean states and the approximated probabilistic states at the end of the simulations; for the random ensembles a single average MAE was computed. It varies between 0.0 and 1.0. The “percentage change in MAE” (PMAE) of the biological models and the constrained ensembles is defined as the percentage difference between the respective MAE and the corresponding unconstrained MAE; it can vary between −100.0 and positive infinity. The “approximability” of a biological model is represented by the difference between the PMAE of the corresponding constrained ensemble and that of itself; it can vary between negative infinity and positive infinity. Hence, the more negative the PMAE is for biological models compared to that of constrained ensembles the more approximable they are deemed to be and the more unique their regulatory nonlinearity is. An illustration of how approximability is computed is provided in Fig. 6.

Fig. 6: An illustration of the relationship between nonlinearity and approximability using a simple 3-node Boolean network.
figure 6

The higher the order of nonlinearity of the network the more approximable it is with respect to the exact dynamics. Left: An example 3-node network utilizing all three 2-input Boolean functions. Right: The network is simulated from a single initial state of (1,1,1). Here the dynamics of the 0th order network (blue) least matches the exact Boolean trajectory (gray), while 1st order network (red) is a slightly better match. The 2nd order network (gray) is exact since it includes all the Taylor terms up to the maximum possible order of 2 (the maximum in-degree of any node). The closer the approximated average dynamics to the exact dynamics, the lower the MAE and the higher the approximability of the former. Notice also that while for node “a” the 0th order network is almost as good as the 1st order network, it is the opposite for node “c” in that its 1st order approximation is as poor as the 0th order; this is because “c” implements the XOR function whose 1st order derivatives are zero.

Classification of biological models

To identify any differences among the approximabilities of different types of biological networks we sought to classify them. Since there are multiple ways to classify biological networks, we chose two classifications so that: (1) they are as orthogonal as possible to each other; and (2) each classification has an appropriate number of (neither too few nor too many) categories. Classification 1 (C1) follows the “pathway ontology” (PW)44 where the networks are grouped into five categories (Fig. 3a), namely biochemical (n = 13), signaling (n = 22), disease (n = 55), metabolic (n = 14) and regulatory (n = 33). According the definitions used in the PW ontology, a “signaling” network comprises mainly of extracellular signal transduction components such as growth factors, kinases, etc. A “regulatory” network, on the other hand, comprises intracellular transcriptional components such as genes, transcription factors, etc. The term “biochemical” here refers to networks that comprises a mix of signaling and regulatory components. “Metabolic” networks consist of components involved in the synthesis and conversion of biomolecules such as enzymes and lipids. Finally, “disease” networks consist of components involved in diseases such as cancer, anemia, pathogenic ailments and disorders such as cell cycle malfunction. Classification 2 was suggested by in-house expertise, where the networks are grouped into four categories (Fig. 3b), namely metazoan (n = 85), cancer (n = 24), primitive (n = 19) and plants (n = 9). The “metazonan” category refers to multicellular organisms and “primitive” refers to unicellular organisms. A given model could naturally belong in multiple categories within a classification but is assigned a unique category for the purpose of simplicity; we chose the categories according to the emphasis laid in the abstracts of the corresponding publications. More details are provided in the Supplementary Material (Supplementary Table 1).

Maximum absolute value of a Taylor derivative

Here we show that \(\max (| {\partial }^{\alpha }\hat{f}| )=(\min (p,0.5)-\max (p-0.5,0)){2}^{| \alpha | }\).

We begin with the definition of the derivative given by:

$$\frac{\partial \hat{f}({x}_{1},\ldots ,{x}_{i},\ldots ,{x}_{k})}{\partial {x}_{i}}=\mathop{\lim }\limits_{h\to 0}\frac{\hat{f}(0.5,\ldots ,h,\ldots ,0.5)-\hat{f}(0.5,\ldots ,0,\ldots ,0.5)}{h}$$
(7)

Since \(\hat{f}\) is a pseudo-Boolean function and hence a multilinear polynomial45, we can compute this as a finite difference; the idea being that the derivative taken over any point on a line is the line itself:

$$\frac{\partial \hat{f}({x}_{1},\ldots ,{x}_{i},\ldots ,{x}_{k})}{\partial {x}_{i}}=\hat{f}(0.5,\ldots ,1,\ldots ,0.5)-\hat{f}(0.5,\ldots ,0,\ldots ,0.5)$$
(8)

Since multilinear interpolation can be formulated as weighted averaging43, we can further rewrite it as follows (the weights are equal here since the non-binary values are all set to 0.5):

$$\frac{\partial \hat{f}({x}_{1},\ldots ,{x}_{i},\ldots ,{x}_{k})}{\partial {x}_{i}}=\mathop{\sum}\limits_{{x}_{1},\ldots ,{x}_{i-1},{x}_{i+1},\ldots ,{x}_{k}\in \{0,1\}}\frac{\hat{f}({x}_{1},\ldots ,{x}_{i-1},1,{x}_{i+1},\ldots ,{x}_{k})-\hat{f}({x}_{1},\ldots ,{x}_{i-1},0,{x}_{i+1},\ldots ,{x}_{k})}{{2}^{k-1}}$$
(9)

As can be seen, there are a total of 2k terms, with half of them positively signed and half negative. This form of expression generalizes to derivatives taken over two or more variables. For example, the derivative taken over two variables, xi and xj, looks as follows:

$$\begin{array}{rcl}&&\frac{{\partial }^{2}\hat{f}(\ldots ,{x}_{i},\ldots ,{x}_{j},\ldots )}{\partial {x}_{i}\partial {x}_{j}}=\mathop{\sum}\limits_{{x}_{1},\ldots ,{x}_{i-1},{x}_{i+1},\ldots ,{x}_{k}\in \{0,1\}}\mathop{\sum}\limits_{{x}_{1},\ldots ,{x}_{j-1},{x}_{j+1},\ldots ,{x}_{k}\in \{0,1\}}\\ &&\left[\frac{\hat{f}(\ldots ,{x}_{i-1},1,{x}_{i+1},\ldots ,{x}_{j-1},1,{x}_{j+1},\ldots )-\hat{f}(\ldots ,{x}_{i-1},1,{x}_{i+1},\ldots ,{x}_{j-1},0,{x}_{j+1},\ldots )}{{2}^{k-2}}\right]\\ &&-\left[\frac{\hat{f}(\ldots ,{x}_{i-1},0,{x}_{i+1},\ldots ,{x}_{j-1},1,{x}_{j+1},\ldots )-\hat{f}(\ldots ,{x}_{i-1},0,{x}_{i+1},\ldots ,{x}_{j-1},0,{x}_{j+1},\ldots )}{{2}^{k-2}}\right]\end{array}$$
(10)

Following rearrangement of terms it becomes evident that this expression also contains 2k−1 positive terms and 2k−1 negative terms, with the only difference in the power of the denominator term. It can thus be concluded that any derivative \({\partial }^{\alpha }\hat{f}\) (in multi-index notation) has 2k−1 positive terms and 2k−1 negative terms.

A straightforward way to maximize the value of a derivative expressed in this form is by assigning as many instances of 1 as possible to the positive terms and as few instances of 1 as possible to the negative terms. For a Boolean function with k inputs and output bias p, this can be accomplished by assigning \(\min ({2}^{k-1},p{2}^{k})\) ones and \(\max (p{2}^{k}-{2}^{k-1},0)\) ones respectively. Therefore:

$$\begin{array}{l}\max (| {\partial }^{\alpha }\hat{f}| )\,=\,\frac{\min ({2}^{k-1},p{2}^{k})-\max (p{2}^{k}-{2}^{k-1},0)}{{2}^{k-| \alpha | }}\\ \qquad\qquad\quad\,\,\, =\,(\min (p,0.5)-\max (p-0.5,0)){2}^{| \alpha | };1\le | \alpha | \le k.\end{array}$$
(11)

Note that this formula only applies to a specific order of nonlinearity α independent of the other orders within the same Boolean function. In actuality, there are dependencies between the various orders within a Boolean function. That is, if a Boolean function were to be constructed such that the derivative of a particular order α1 is maximized then there’s no guarantee that the derivative of another order α2 ≠ α1 could be simultaneously maximized. This is one of the limitations of the normalization for which the above formula is used.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.