The debate on how to define disease is shaped by the necessity of balancing formal definitions motivated by physiology and correspondence with societal norms and expectations1,2,3. At the core of early definitions of diseases is a dysfunction of an organismal subsystem (often on the systemic level of organs), which affects the evolutionary goals of the organism as a whole4,5. The challenge of defining disease is also reflected in the multitude of disease ontologies6 and the limited ability to create mappings among them7.

On a theoretical level, the multi-stage model of carcinogenesis8,9 is an early example of formalizing a disease at a high level of abstraction: it gives an explicit equation for the incidence as a function of age, based on the assumption that carcinogenesis is a multi-stage process9. The model has been extended by Rozhok and DeGregori10 to include environmental factors and in Ref.11 to incorporate additional levels of detail, leading to insight into disease mechanisms and, in particular, providing an explanation of the nearly universal age-dependent incidence patterns observed across many cancers. This evolutionary model of cancer considers oncogenic mutations as well as tumour microenvironment and tissue architecture. How the environment contributes to cancer risk has been addressed in Ref.12, which also points out the necessity of adopting an ecological perspective on diseases.

A recent review13 summarizing the application of network biology to human diseases illustrates how disease mutations can be thought of in a network and emphasizes the importance of considering biological networks embedded in an environmental context.

As an application of this avenue of research to one specific non-cancer disease, the authors of Ref.14 created a modular graph model to describe incidence curves for Crohn’s disease, a disease currently in the focus of interest of Systems Medicine15,16,17,18.

Most approaches in Systems Medicine fall into two categories, (1) employing mathematical or computational concepts to analyze medical data and (2) employing mathematical or computational concepts to model a specific disease or class of diseases. In contrast to these data-driven or single-disease approaches, we here strive for a model-driven understanding of some generic relationships between environment, genotype and disease phenotype. To this end, we distill the diverse concepts into a highly stylized model of an abstract genetic disease. A suitable framework is a complex system \({\mathscr {C}}\) receiving at each moment in time t an input vector \({\mathscr {I}}(t)\) (representing environmental stimuli) and generating an output vector \({\mathscr {O}}(t)\) indicating systemic function.

Our model allows us to simulate the interaction of the disease (represented as a loss of function of some network nodes) and a fluctuating environment (represented by inputs to the network). The simplicity of the model enables us to investigate in detail how the observed features (disease severity, incidence curves, etc.) depend on the topology of the network and on the characteristics of the disease.

An important component of the model is that it operates using Boolean logic: Binary inputs are processed via Boolean ANDs (representing complete dependence) and ORs (representing the possibility of choice) and yield a binary output vector. Due to its minimal character and design, we are able to map the concept of directed percolation19 onto our disease model. Our model, therefore, allows harnessing the extensive knowledge of statistical physics about percolation phenomena20,21 for the analysis of diseases.

Detailed analysis of our model suggests the following core properties: Chronic diseases occur predominantly when clusters of affected nodes are proximal to the output layer (representing network function or phenotype), and they are enhanced by network connectivity (higher branching). Acute diseases tend to be independent of the position of affected nodes in the network. Higher branching transforms acute diseases into chronic diseases, but in general also reduces the likelihood of disease.

We further find that for a high number of OR nodes high connectivity between pathways mitigates the severity of a disease. In contrast, for a high number of AND nodes, low connectivity mitigates the severity. Additionally, we find that the impact of the position of the disease-affected nodes increases with the connectivity and decreases with the fraction of AND nodes.


Our disease model is motivated in part by genome-scale metabolic models22,23 and flux-balance analysis24,25, where nutrient availability and the choice of the cellular objective function (e.g., maximization of growth or energy output) determine the steady-state pattern of metabolic fluxes. It also bears similarity to random Boolean networks as minimal models of gene-regulatory systems, where discrete time and the reduced state space allow for an analysis of attractors and their robustness26,27. As such, our model is in the tradition of minimal models (or ‘toy models’, ‘stylized models’) in statistical physics28,29.

Figure 1 summarizes the general scheme of our investigation (Fig. 1a), the formal definition of node states (Fig. 1b), the notion and dynamical effect of branching (Fig. 1c) and the layer structure of the model (Fig. 1d). In addition to obvious size parameters (number of input nodes, number of layers), our model depends on two parameters: the branching probability b and the fraction a of nodes that act as AND gates (the remaining nodes act as OR gates).

Disordered lattice model

Motivated by biological network maps30,31 we write the generic biological network representing genotype and interfacing environment (input layer) and phenotype (output layer) as a set of H parallel pathways (as depicted in Fig. 1d). Each idealized pathway is represented as a directed line graph with L nodes, where each node represents a functional unit of the network that can either be active or inactive. We characterize each node by its pathway index \(k \in \{1 \dots H\}\) and its position \(j \in \{1 \dots L\}\) in the pathway (number of steps from input towards output). Following the definition of a line graph, each node \((k,j)\) is hence connected to the following node within the same pathway, \((k,j+1)\), via a directed edge. To incorporate generic dependencies between different pathways (e.g. regulatory and discriminating mechanisms or regulatory overlap of metabolic pathways), a node \((k,j)\) is additionally connected, with a probability given by the branching parameter b, to the following node of a neighbouring pathway, \((k-1,j+1)\) or \((k+1,j+1)\). We do not employ any type of periodic boundary conditions; connections that would leave the lattice at its boundaries are omitted. Due to this branching, each node can have up to three inputs. For simplicity, we assume that the processing performed at each node is represented by one of two possible Boolean functions (logical AND or logical OR) determining the local input-output relation of this node. The parameter a is the fraction of nodes that act as ANDs (and consequently \(1-a\) is the fraction of ORs).

Figure 1

(a) Flow chart of our investigation: The outputs of a healthy and a defect (disease) network (receiving the same input vector) are compared. Depending on their difference they are assigned to a specific class: Class A, if both outputs are equal (no disease), class B, if the disease network has a lower but non-zero output (symptomatic disease), class C, if the disease network has zero output, while the healthy network has a non-zero output (lethal disease), class D, if both the healthy and the disease network have zero output. (b) Definition of node states and visual representation of the effect of a non-functional node. (c) Illustration of branching and visual representation of compensating a defect via interacting pathways. (d) Schematic representation of the full model and illustration of the layer structure.

The environment is represented by presences and absences of input components (‘stimuli’, ‘nutrients’) and hence by a binary vector. This input vector, together with the processing capabilities of each node, then creates a flux pattern of active nodes and links which finally results in an output vector.
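The construction and the propagation of an input vector through the lattice can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used for the results reported here: the names `build_network` and `propagate` are hypothetical, indices are 0-based, each of the two diagonal edges is drawn independently with probability b, and the first layer is assumed to read only its own input component.

```python
import random

def build_network(H, L, a, b, rng):
    """Return a list of L layers; each layer holds one (gate, sources) pair
    per pathway. `sources` lists the pathways of the previous layer that
    feed the node. Assumption: each diagonal edge is drawn independently."""
    network = []
    for j in range(L):
        layer = []
        for k in range(H):
            gate = "AND" if rng.random() < a else "OR"
            sources = [k]                    # in-pathway edge, always present
            if j > 0:                        # assumption: layer 0 reads only input k
                for nb in (k - 1, k + 1):
                    # no periodic boundaries: out-of-range neighbours are skipped
                    if 0 <= nb < H and rng.random() < b:
                        sources.append(nb)
            layer.append((gate, sources))
        network.append(layer)
    return network

def propagate(network, inputs, defects=frozenset()):
    """Push a binary input vector through the network, layer by layer.
    `defects` is a set of (k, j) nodes that are forced to be inactive."""
    state = list(inputs)
    for j, layer in enumerate(network):
        nxt = []
        for k, (gate, sources) in enumerate(layer):
            vals = [state[s] for s in sources]
            active = all(vals) if gate == "AND" else any(vals)
            nxt.append(int(active and (k, j) not in defects))
        state = nxt
    return state

rng = random.Random(0)
net = build_network(H=5, L=6, a=0.5, b=0.3, rng=rng)
print(propagate(net, [1, 0, 1, 1, 0]))
print(propagate(net, [1, 0, 1, 1, 0], defects={(2, 3)}))
```

Storing the gate type and the incoming edges per node keeps the propagation step a direct transcription of the Boolean rule; a defect is applied as a mask after the gate is evaluated, so the same network object serves as both the healthy and the defect system.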

This model of interdependent pathways is, of course, only a stylized approximation of a real-life biological network. In order to keep the interactions as simple as possible, the model is based on several idealizations: (1) Due to the enforced lattice structure, our model assumes that within the network only neighbouring pathways can be connected. This does not reflect the topology of real networks. However, for the sake of phenomenological insight into the local correlations between pathways, we decided to concentrate on this lattice structure. (2) As an acyclic graph our model does not allow for loops. This is at odds with the fact that regulatory elements heavily rely on direct or indirect feedback mechanisms. Feedback loops are often associated with rapid systemic responses to perturbations32,33. In this sense, our simplifying assumption is comparable to a steady-state approximation (e.g. employed in flux-balance analysis in metabolic investigations25,34). Also, note that the nodes in our network represent functional units that might internally rely on different feedback structures.

These idealizations do not allow for a one-to-one mapping onto real biological systems. Our disordered lattice functions as an effective network summarizing the joint action of a multitude of biological networks—from signalling pathways35,36 and protein interactions37,38 to metabolic networks39,40.

For biological systems, where the knowledge about interactions of biological entities is more complete than for human cells, e.g., bacteria, it has been shown that the precise interplay of genetic regulation and metabolism installs a balance between robustness to environmental fluctuations and sensitivity to genetic changes41,42,43. By adjusting the ratio between AND and OR gates we can continuously tune our model to such a robust or sensitive behaviour. For example, in the limit of only AND nodes (\(a=1\)), a single deactivated input will cause the deactivation of any connected downstream node. Likewise, in the limit of only OR nodes (\(a=0\)), a single activated input activates all connected downstream nodes. The first row of Fig. 2 illustrates this effect.

Figure 2

Schematic summaries of various aspects of the model. Red indicates non-functional nodes (i.e. the genetic predisposition for a disease, ‘disease nodes’). Green nodes and links indicate activity (‘flux’). Inactive components are indicated in yellow. All rows (except for the first one) use the same (intermediate) value of a. The first row shows how a affects the impact of non-functional nodes. The second row illustrates the effect of different clustering \(d\) for the same number of non-functional nodes \(D\). The third row shows how the position of the non-functional nodes can affect the phenotype. The fourth row is an example of synergistic effects between network and environment. The fifth row shows an example of a genetic disease being masked by environmental factors.

Disease generation

As with many other examples of minimal models or ‘toy models’ in biology (see Refs. 28,44,45), the stylized nature of our model allows us to formally represent many features of a real-life system. In the following, we seek to study basic disease characteristics.

Within our model, a genetic disease—a genetic predisposition for a disease phenotype—manifests as a loss of function of one or several network nodes as shown in Fig. 1b. We characterize such a disease by a set of properties: the number \(D\) of disease-associated (defect) nodes and their distribution, determined by the clustering parameter \(d\) and the resulting average location \(\lambda\) of the genetic damage, where a larger value of \(\lambda\) means that the average genetic damage is closer to the output layer. To define which nodes are affected, the first node is chosen randomly. Then, the selection of the \(D- 1\) remaining nodes relies on the Eden growth model46,47,48 with teleportation and proceeds as follows:

  • With probability \(d\) a node connected to the current cluster of disease-affected nodes gets deactivated (growth of the current cluster).

  • Otherwise (with probability \(1-d\)) a randomly selected node gets deactivated and serves as a nucleus for a new cluster.

The parameter \(d\) can hence be used to tune the model between a state in which the disease-associated nodes are distributed randomly (\(d= 0\)) and a state in which they are concentrated in one connected cluster (\(d= 1\)), as illustrated in the second row of Fig. 2.
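The selection procedure can be sketched as follows. This is an illustrative reading of the two rules, with several assumptions that are not fixed by the text: the names `select_defect_nodes` and `mean_location` are hypothetical, "connected to the current cluster" is interpreted via the potential lattice edges (in-pathway and diagonal, irrespective of whether a branch is realized), the "current cluster" is the most recently nucleated one, and \(\lambda\) is taken as the mean 1-based layer index.

```python
import random

# Potential lattice neighbours of (k, j): same or adjacent pathway,
# one layer up or down (assumption: realized branch edges are ignored).
NEIGHBOUR_OFFSETS = [(0, 1), (0, -1), (1, 1), (-1, 1), (1, -1), (-1, -1)]

def neighbours(node, H, L):
    k, j = node
    return [(k + dk, j + dj) for dk, dj in NEIGHBOUR_OFFSETS
            if 0 <= k + dk < H and 0 <= j + dj < L]

def select_defect_nodes(H, L, D, d, rng):
    """Choose D defect nodes (requires D <= H * L) by Eden growth
    with teleportation."""
    all_nodes = [(k, j) for k in range(H) for j in range(L)]
    seed = rng.choice(all_nodes)             # first node: chosen at random
    defects, cluster = {seed}, {seed}
    while len(defects) < D:
        frontier = sorted({n for c in cluster for n in neighbours(c, H, L)}
                          - defects)
        if frontier and rng.random() < d:
            new = rng.choice(frontier)       # growth of the current cluster
            cluster.add(new)
        else:
            healthy = [n for n in all_nodes if n not in defects]
            new = rng.choice(healthy)        # teleport: nucleus of a new cluster
            cluster = {new}
        defects.add(new)
    return defects

def mean_location(defects):
    """Average 1-based layer index; larger values lie closer to the output."""
    return sum(j + 1 for _, j in defects) / len(defects)

rng = random.Random(42)
clustered = select_defect_nodes(H=6, L=8, D=10, d=1.0, rng=rng)
scattered = select_defect_nodes(H=6, L=8, D=10, d=0.0, rng=rng)
print(sorted(clustered), mean_location(clustered))
print(sorted(scattered), mean_location(scattered))
```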


Incidence curves

We are now in a position to analyze how fluctuations of the environment affect an unperturbed (‘healthy’) network in contrast to a network with non-functional nodes representing a disease genotype.

Because only AND and OR gates are used, a node can become active only if at least one of its inputs is active. Since a disease appears as a deactivated node, a defect network has, for the same input, always the same or fewer active outputs than a healthy network: the active outputs of the disease-affected network are always a subset (not necessarily a proper one) of the active outputs of the healthy network.
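A brief sketch of why this ordering holds, using notation introduced only for this illustration (binary vectors \(x\), \(y\) compared componentwise, a node function \(f_v\) that is either AND or OR, and a mask \(m_v \in \{0,1\}\) that is 0 for a defect node): both gate types are monotone,

\[
x \le y \;\Rightarrow\; \mathrm{AND}(x) \le \mathrm{AND}(y) \quad \text{and} \quad \mathrm{OR}(x) \le \mathrm{OR}(y),
\]

and a node computes \(m_v \, f_v(x)\), so setting \(m_v = 0\) can only lower its value. By induction over the layers, the state of the defect network is componentwise less than or equal to the state of the healthy network at every layer, and in particular

\[
{\mathscr {O}}_{\text{defect}}(t) \le {\mathscr {O}}_{\text{healthy}}(t) .
\]

This argument would fail as soon as a non-monotone gate such as NOT were admitted.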

For our quantitative analysis, we first generate a random environmental condition (characterized by the probability \(I\) of active inputs). For every time step we then proceed as follows (compare Fig. 1a):

  • For the given environment we analyze the output vector of the healthy network. If the vector is zero, the environment is already lethal for a healthy and hence also for the defect network (case D in Fig. 1).

  • If the output of the healthy system is non-zero we compare its output to the output of a defect network (receiving the same input vector). There are three possible outcomes: (1) Both vectors are equal. This can be interpreted as no disease symptoms: Both networks display a healthy phenotype (case A in Fig. 1). (2) The vector of the defect network is non-zero but has fewer non-zero components than the healthy network. This case represents a disease phenotype (case B in Fig. 1). (3) The output vector of the defect network is zero. This indicates lethality due to the disease (case C in Fig. 1).

  • To simulate fluctuations in the environment at each time step each element of the input vector is preselected for change with a fixed probability of 20%. Then, each element within this preselected group is set to 1 with probability \(I\) and to 0 otherwise.
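The per-step procedure above can be condensed into two small helpers. The following is a sketch only; the function names `classify` and `update_environment` and the argument `p_change` are hypothetical, and the output vectors are assumed to be plain lists of 0s and 1s produced by whatever propagation routine is used.

```python
import random

def classify(healthy_out, defect_out):
    """Assign one time step to case A, B, C or D (as defined in Fig. 1a)."""
    if not any(healthy_out):
        return "D"        # environment is lethal even for the healthy network
    if not any(defect_out):
        return "C"        # lethal because of the disease
    if healthy_out == defect_out:
        return "A"        # no symptoms
    return "B"            # non-zero but reduced output: disease phenotype

def update_environment(inputs, I, rng, p_change=0.2):
    """One environmental fluctuation step: each input component is
    preselected with probability p_change (20% in the text) and, if
    preselected, redrawn as 1 with probability I and 0 otherwise."""
    updated = []
    for x in inputs:
        if rng.random() < p_change:
            x = 1 if rng.random() < I else 0
        updated.append(x)
    return updated

rng = random.Random(3)
env = [1, 1, 0, 1, 0, 1]
for _ in range(3):
    env = update_environment(env, I=0.7, rng=rng)
    print(env)
print(classify([1, 1, 0], [1, 0, 0]))
```

Note that a preselected component may be redrawn to its previous value, so the effective flip rate is lower than `p_change`.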

Figure 3

Top panel: time series of the number of active inputs (grey background) and the output strength for the healthy (green) and the disease (red) network. A difference between the red and the green curve (red below green) indicates visible symptoms. The type of the observed disease is indicated by the colour bar (colour code as defined in Fig. 1a). Further time series for different sets of parameters can be found in the supplement. Bottom panel: histograms of disease incidences for different values of \(d\).

As a result, we obtain time series as shown in Fig. 3 (top). Here, the green and red curves represent the output strength of the healthy and the disease network, respectively. Every difference between the green and the red curves thus indicates a disease phenotype for the respective environment. The corresponding case of the four possible (no detectable disease, disease phenotype, death due to disease, death due to environmental conditions) is indicated by the respective colour in the bars in the lower segment. For all time series without death, it is possible to analyze the distribution of time spans with and without symptoms, which results in incidence curves as e.g. shown in Fig. 3 (bottom). Further time series are shown in the supplement.
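One simple way to extract the distribution of time spans with and without symptoms from such a time series is a run-length analysis of the case labels. The sketch below assumes a series that contains only the cases A and B (no death); the helper name `episode_lengths` is hypothetical, and the exact binning behind Fig. 3 (bottom) may differ.

```python
from collections import Counter
from itertools import groupby

def episode_lengths(cases):
    """Split a case series such as 'AABBBABA' into consecutive runs and
    return (lengths of symptomatic runs, lengths of symptom-free runs)."""
    with_symptoms, without_symptoms = [], []
    for case, run in groupby(cases):
        length = sum(1 for _ in run)
        if case == "B":
            with_symptoms.append(length)
        else:
            without_symptoms.append(length)
    return with_symptoms, without_symptoms

sick, healthy = episode_lengths("AABBBABA")
print(sick)               # [3, 1]
print(healthy)            # [2, 1, 1]
print(Counter(sick))      # histogram of symptomatic episode durations
```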

One should note that the environment characteristic can in principle be used to scale the incidence curve in time for comparison with a specific disease, e.g. by decreasing the frequency of the fluctuations. However, as we are interested in the universal features of the model we do not pursue this line of investigation here. We have shown that our model, despite its simplicity, is capable of producing realistic incidence curves (Fig. 3 (bottom); showing a similar range of shapes as a function of the model parameters, as the incidence curves displayed, e.g., in Ref.14). In the following, we will now investigate how the different parameters influence the behaviour of the system.

Whether and how often the time series show a specific case depends on the choice of parameters. In Fig. 4 we vary the fraction of active inputs \(I\) and analyze for 1000 steps how often a specific case was reached. The same plots for other parameter combinations are presented in the supplement. For a high fraction of AND gates (\(a > 0.5\)) the healthy network is already very sensitive and often shows zero output if only a few input elements are deactivated (frequent occurrence of case D). These sensitive systems become more robust if the connectivity between the pathways is decreased, as indicated by a shrinking number of D-cases and a simultaneously growing number of Dx-cases. This dependence stems from the fact that low connectivity is likely to isolate the consequences of a genetic defect by restricting it to very few pathways, or even a single one. In the special case of \(b=0\), the system is just a collection of independent single pathways. In such a case the parameter a does not have an effect and consequently for \(D=0\) the output is always the same as the input. In contrast, for a large number of OR gates (\(a < 0.5\)), there is a high chance that the healthy as well as the defect system have non-zero output. Within this regime, for high branching (\(b > 0.5\)) the outputs of both systems are likely to be equal because a deactivated pathway gets healed by neighbouring pathways as depicted in Fig. 1c. For low branching (\(b < 0.5\)) the defect network often shows symptoms. Depending on the proportion of AND and OR gates, fewer disease phenotypes hence occur if the branching is either high or low. The figures also allow for another observation: For a high number of active inputs there are, depending on the disease, mainly two possibilities: Either the system stays in case A (the healthy and the disease network show the same output), or it stays in case B (the disease network shows lower output). If the fraction of active inputs is decreased, it is also possible (as e.g. observed in Fig. 3 (top)) that the time series switches between cases A and B. We can identify these outcomes with two different disease conditions: If the system stays in case B, this corresponds to a chronic disease where a lower output persists. Conversely, if we observe a switching between A and B, this corresponds to diseases observed as acute. In this regard it is instructive to relate some of the behaviours observed in Fig. 4 to the schematic effects listed in Fig. 2. Parameter settings resulting in large green regions in Fig. 4 correspond to input and damage masking (fifth row in Fig. 2), while parameter settings with small green regions provide evidence for input and damage (disruptive) synergy (fourth row in Fig. 2). Note that due to the design of our model (not allowing the case of no genetic damage) we cannot directly measure such input and damage synergy. Changes of the region sizes with increasing genetic damage can be assessed by comparing Fig. 4 with the additional figures presented in the supplement.

Figure 4

Dependence of the observed cases on the fraction of active inputs \(I\). For the upper figure, the clustering parameter was set to \(d= 1\) which means that all defective nodes form one cluster. Contrarily, in the lower figure, the parameter was set to \(d= 0\), leading to a broad distribution of the defective nodes. For each parameter combination, the system was simulated 100 times for 1000 time steps. The colour code (as defined in Fig. 1a) indicates, which cases occurred in the time line. Here, Cx and Dx denote that the time line also contained the cases (A, B) or (A, B, C), respectively. Curves for other sets of parameters are provided in the supplement.

Besides the general dependence on the choice of parameters, it is particularly interesting to analyze how the position of the disease nodes (within the network) affects the visibility of the disease. Figure 5 compares the possible outcomes of the network depending on the average location \(\lambda\) of the disease-associated nodes as well as on the fraction of ANDs a and the branching parameter b (further results for different sets of parameters are presented in the supplement).

For small a we observe a strong dependence on the average location of the genetic damage: If the average location is close to the input, the symptoms are often not visible and the outputs of the healthy and the disease-affected network are equal (green, case A). Conversely, if the average location is close to the output, the defect network often has less activity than the healthy network (the disease is visible; yellow, case B). For large a the model shows different behaviour. Here, in most cases, both the healthy and the defect network have zero output (both are dead; black, case D). Additionally, the behaviour is mostly independent of the position of the non-functional nodes. We can explain this behaviour with some simple arguments: Let us assume a single non-functional node at position j in the kth pathway. Without any crosstalk between the pathways, this defect affects all subsequent sites of that pathway at positions \(j+1\), \(j+2\), ..., L. Now, if we allow for branching, two mechanisms need to be taken into account: (1) A signal from a neighbouring pathway (\(k-1\) or \(k+1\)) can arrive and reactivate one of those affected nodes, which requires a logical OR. (2) Since the 1s normally transmitted and distributed by the now deactivated pathway are missing, the single deactivated node can deactivate neighbouring pathways in the case of a logical AND. Depending on the fraction of ANDs (determined by the parameter a), a disease-affected node can hence create longer “shadows” of deactivated pathways or it can be circumvented. The branching determines the speed of these two mechanisms.

Figure 5

Dependence of the observed cases on the average position \(\lambda\) of inactive nodes. The colour code (as defined in Fig. 1a) indicates, which cases occurred in the time line. For the upper figure, the clustering parameter was set to \(d= 1\) which means that all defective nodes form one cluster. Contrarily, in the lower figure, the parameter was set to \(d= 0\), leading to a random distribution of the defective nodes. Results for other sets of parameters are provided in the supplement.

State-space dynamics

The stylized nature of our model also allows for a more stringent and more comprehensive analysis, which relies less on numerical simulations and more on the formalisms of discrete systems. This direction is pursued in the present section.

In the previous section, we generated a fluctuating environment and observed the corresponding output of the network. As a result, we obtained samples of phenotypes for given input activity. For small systems, however, it is feasible to test all possible input vectors, which allows for a full characterization of the disease. The following analysis illustrates the funnelling of states, namely that different environments are mapped onto the same phenotype. Thus, it is generally not possible to infer the exact triggering environment for a given phenotype.

We consider the middle (bulk) segment of the network as an operator transforming the input vector (environment) into the output vector (phenotype). As the bulk is a set of consecutive layers, it is possible to trace the evolution of the input vector, step by step, all the way to the output. Using a state-space representation then allows us to analyze the evolution on the scale of the whole state space. With N input nodes (one per parallel pathway, i.e. \(N = H\) in the notation introduced above) we formally have \(2^N\) distinct input states. The corresponding N-digit binary numbers are then processed layer by layer. As this processing is deterministic, a single state at layer j cannot give rise to multiple states at layer \(j+1\). However, multiple states at layer j can lead to the same state at layer \(j+1\). Hence, the diversity of states cannot increase from one layer to the next; it can only stay constant or decrease. This ‘funnelling’ of states along the network is instrumental for the functionality of our model.
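For small systems the funnelling can be checked by brute force: enumerate all binary input states, apply the layers one after another, and count how many distinct states survive after each layer. The sketch below is illustrative only; `random_layer`, `apply_layer` and `distinct_states_per_layer` are hypothetical names, each diagonal edge is assumed to be drawn independently with probability b, and no defect nodes are included.

```python
import random
from itertools import product

def random_layer(H, a, b, rng):
    """One layer: a (gate, sources) pair per pathway."""
    layer = []
    for k in range(H):
        gate = "AND" if rng.random() < a else "OR"
        sources = [k] + [nb for nb in (k - 1, k + 1)
                         if 0 <= nb < H and rng.random() < b]
        layer.append((gate, sources))
    return layer

def apply_layer(state, layer):
    """Deterministic map from the state at one layer to the next."""
    out = []
    for gate, sources in layer:
        vals = [state[s] for s in sources]
        out.append(int(all(vals) if gate == "AND" else any(vals)))
    return tuple(out)

def distinct_states_per_layer(layers, H):
    """Number of distinct states reachable at the input and after each layer."""
    states = set(product((0, 1), repeat=H))   # all 2**H input states
    counts = [len(states)]
    for layer in layers:
        states = {apply_layer(s, layer) for s in states}
        counts.append(len(states))
    return counts

rng = random.Random(1)
layers = [random_layer(H=4, a=0.5, b=0.5, rng=rng) for _ in range(8)]
print(distinct_states_per_layer(layers, H=4))
```

Because each layer is a function on the state set, the printed sequence is non-increasing. The all-zero and all-one states are mapped onto themselves by any AND/OR layer, so at least two states always remain.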

Figure 6

(Top) Detailed view of the state space evolution. Input states (left; sorted according to the binary number they represent) are converted into output states (phenotypes; right) by the network. The number of phenotypes is usually much smaller than the number of possible input states. Green indicates a trajectory in the healthy network, while orange indicates a trajectory in the disease network. (Bottom) Stylized views of the state space evolution for different values of a.

The funnelling of binary states is illustrated in Fig. 6, where all possible initial states are arranged along the y-axis and layers are shown along the x-axis, thus allowing us to follow all possible input states through the network. As soon as two lines meet they merge, which decreases the number of possible states beyond this layer by one. Formally, in the limit of infinite time (infinite length of the network) this always leads to a system where the outputs are either all on or all off.

Relation to percolation theory

One design goal behind our model was to link it closely to existing models, since this allows existing theoretical insights to be transferred and applied easily. In this section, we explain how our model bears such helpful similarities to theoretical models in statistical physics.

Biological regulatory networks can be classified as complex dynamical systems, the analysis of which has a long tradition in statistical physics49. The design of our model allows a specific class of models from physics—directed percolation (DP)19,20—to be transferred to our system. More specifically, our model is similar to a subgroup of DP, namely Compact Directed Percolation50,51,52. Percolation models are simple models from statistical physics to analyze signal propagation through heterogeneous systems e.g. cells or neurons53,54. In most of these models, a single parameter p determines the probability that a signal is locally transmitted. If p is too small, no signal can reach the output. In this (inactive) phase, the probability that the signal reaches a specific layer decays exponentially with the number of layers. Contrarily, if p is large, there is almost always a connected path through the system and hence a finite probability that the signal reaches the output. The transition point between these two phases \(p = p_c\) is, mathematically speaking, a critical point. At the critical point, the probability that the signal can traverse the medium decays algebraically with the number of layers. We can use these results from statistical physics to understand and interpret the results observed in our model. An example is Fig. 5 where, from left to right, the fraction of ANDs was increased. At \(a \approx 0.5\), we observe a sudden change in the general behaviour of the system: For \(a \le 0.5\) most systems show either case A or case B. However, for \(a > 0.5\) most systems belong to case D where both systems show zero output. This transition can be identified as a phase transition within the genetic disease model.
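The three regimes described above can be summarized compactly. In notation introduced only for this illustration, let \(P(\ell )\) be the probability that a signal reaches layer \(\ell\), \(\xi _{\parallel }\) the correlation length along the propagation direction, and \(\delta\) the critical exponent of the survival probability. The standard scaling picture of directed percolation then reads

\[
P(\ell ) \sim {\left\{ \begin{array}{ll} e^{-\ell /\xi _{\parallel }}, &{} p < p_c \quad \text {(inactive phase)},\\ \ell ^{-\delta }, &{} p = p_c \quad \text {(critical point)},\\ P_{\infty } > 0 \text { for } \ell \rightarrow \infty , &{} p > p_c \quad \text {(active phase)}. \end{array}\right. }
\]

Commonly quoted literature values are \(\delta \approx 0.16\) for directed percolation in 1+1 dimensions and \(\delta = 1/2\) for compact directed percolation. Whether and how the parameters a and b of the disease model map onto p is not established here and would have to be determined separately.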

Another central topic in the analysis of percolation models is their dependence on small perturbations. The analysis of how a single perturbation (often also called damage) evolves over time is known as damage spreading55. If one introduces damage and compares the difference to the unmodified network there are two possible results: The damage can spread, which ultimately leads to a system that evolves very differently from the original network, or the damage might disappear. We envision that the analysis of damage spreading transitions can be an interesting direction for further research within our disease model.
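As a starting point for such an analysis, the damage can be quantified as the Hamming distance between the layer states of the healthy and the defect network. The following sketch uses hand-built networks in which every node has the same gate type and all branch edges are present; the helper names (`layer_states`, `damage_profile`, `uniform_network`) are hypothetical, and the first layer is assumed to read only its own input component.

```python
def layer_states(network, inputs, defects=frozenset()):
    """Return the state after every layer; defect nodes (k, j) stay inactive."""
    state, history = list(inputs), []
    for j, layer in enumerate(network):
        nxt = []
        for k, (gate, sources) in enumerate(layer):
            vals = [state[s] for s in sources]
            active = all(vals) if gate == "AND" else any(vals)
            nxt.append(int(active and (k, j) not in defects))
        state = nxt
        history.append(state)
    return history

def damage_profile(network, inputs, defects):
    """Hamming distance between healthy and defect network at each layer."""
    healthy = layer_states(network, inputs)
    damaged = layer_states(network, inputs, defects)
    return [sum(h != d for h, d in zip(hs, ds))
            for hs, ds in zip(healthy, damaged)]

def uniform_network(gate, H, L):
    """All nodes share one gate type; every branch edge is present (b = 1)."""
    def sources(k, j):
        if j == 0:
            return [k]                        # assumption: layer 0 reads input k
        return [s for s in (k - 1, k, k + 1) if 0 <= s < H]
    return [[(gate, sources(k, j)) for k in range(H)] for j in range(L)]

ones = [1, 1, 1, 1, 1]
defect = {(2, 0)}                             # one defect in the first layer
print(damage_profile(uniform_network("OR", 5, 4), ones, defect))   # [1, 0, 0, 0]
print(damage_profile(uniform_network("AND", 5, 4), ones, defect))  # [1, 3, 5, 5]
```

In the OR-only network the damage heals after one layer, whereas in the AND-only network it spreads until the whole output is deactivated, in line with the two outcomes described above.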

Discussion and conclusion

We presented a minimal model to study the interplay between network topology and disease nodes. Our model can be used to analyze how incidence curves and disease visibility depend on parameters like the clustering of disease genes or the cross-talk between pathways. The situation considered here is generic (i.e., not tailored to any specific set of biological processes). The model is motivated by the general properties of metabolic networks, where the most obvious type of environmental fluctuation is a change in nutrient availability. The output vector can be thought of as some type of cellular objective function (for example growth or energy production) as typically employed in genome-scale metabolic models, for example for flux-balance analysis25. A design concept of our model is that genetic predisposition manifests as a loss of function, which is a suitable description as long as the signal processing does not include a logical NOT. Following this choice, we only employ logical ORs and ANDs. However, if one relaxes these constraints, there are other possible choices for the logical gates, e.g. the functionally complete sets {AND, NOT} or {NOR}.

The conceptual foundation of our model is the basic fact that human diseases are rarely the consequence of a single defective gene, but the result of complex interactions within the cellular-molecular network30. The disease phenotype is hence a result of different and mutually dependent interactions.

We believe that minimal, generative models of typical data types, as well as stylized representations of typical medical scenarios, are necessary to organize the analysis of the intricate relationship between genetic risk factors, environmental stimuli and observed disease phenotype. Our model allows building such an understanding from a general point of view: By the variation of a few parameters, it is possible to compare the interplay of different network topologies and disease characteristics. We also show that the specific pattern of a genetic predisposition of a biological network can have a direct and systematic impact on the disease phenotype. Such relationships between genetic defect patterns and phenotype patterns are a direct consequence of the architecture of the underlying network. Our model suggests that the distribution of disease genes in biological networks has a concrete impact on disease properties like incidence statistics and chronicity. We are convinced that ultimately these predictions can be verified by analyzing disease-associated genes. Major obstacles are the appropriate choice of the biological network (with signaling networks and metabolic networks being obvious candidates due to their structure as an interface between environmental stimuli and phenotypic responses), the incompleteness of current inventories of disease-related genes (e.g., the GWAS catalog56, or DisGeNet57) and the intricate biological details behind mapping disease-associated SNPs to genes. As an example, chromosomal positions may link SNPs to diseases more indirectly than via specific genes (see e.g.58) and such aspects are not part of the current model.

Cellular systems can, to some extent, retain their functions despite changes and fluctuations of external conditions59,60. This ability is known as biological robustness. In practice it means that the initial diversity of potential input states is reduced, in the course of the interactions within the system, to a lower number of possible output states. Robustness in this sense requires at least two distinct input states that produce the same output state. Our model demonstrates how this might work on a microscopic, mechanistic level, where robustness results from the network architecture of the interlinked pathways.

Our model stratifies diseases according to four main model properties: (1) high or low clustering of affected nodes (representing genetic predisposition), (2) strong or weak network connectivity (branching), (3) high or low numbers of ORs (regulatory alternatives) vs. ANDs (regulatory interactions), and (4) the clustering of affected nodes either proximal to input layer (representing environment) or proximal to the output layer (representing network function or phenotype) and thus the average position of affected nodes.

Based on the detailed analysis of our model, we arrive at the following picture: High average position, high clustering and high branching facilitate chronic diseases. The average position of affected nodes does not strongly affect the probability of acute disease, in contrast to the clustering of these nodes, which disfavors acute diseases.

Employing mathematical modelling to leverage biological networks—and signalling networks in particular—for the purpose of understanding human diseases is a cornerstone of the emerging field of precision medicine61,62. Due to the simple structure of the network, our model allows for an in-depth and node-by-node analysis of the observed results. The model can hence be used to assess the robustness and vulnerability—common topics in Systems Biology63,64—of phenotypic states from a functional point of view. Specifically, the balance between AND and OR nodes (as given by the parameter a in our model) is a balance between sensitivity and robustness: While OR nodes lead to alternative paths through the system, AND nodes allow for more specific input-output relationships. We therefore believe that estimating the parameter a from real networks (as illustrated in the Appendix) is an informative strategy for better understanding biological robustness. Additionally, a quantitative comparison of the disease incidence curves in our model with data for various genetic diseases is an important direction for future research, as it offers an opportunity to discover mechanistic (though model-dependent) relationships between disease epidemiology and genetic risk factors.

On a more theoretical side, we can imagine different variants to extend and analyse our model. For example, it seems interesting to analyse an evolving version of our model to see how a selection for particular phenotypes (e.g. no chronic diseases) shapes and robustifies the network. To keep our model as simple as possible, we currently only allow connections to the nearest neighbours. It would therefore be worthwhile to investigate whether a few long-range links affect the general behaviour of the network.