Abstract
Percolation is an emblematic model to assess the robustness of interconnected systems when some of their components are corrupted. It is usually investigated in simple scenarios, such as the removal of the system’s units in random order, or sequentially ordered by specific topological descriptors. However, most empirical applications require dismantling the network following more sophisticated protocols, for instance, by combining topological properties and non-topological node metadata. We propose a novel mathematical framework to fill this gap: networks are enriched with features and their nodes are removed according to their importance in the feature space. We consider features of different nature, from ones related to the network construction to ones related to dynamical processes such as epidemic spreading. Our framework not only provides a natural generalization of percolation but, more importantly, offers an accurate way to test the robustness of networks in realistic scenarios.
Introduction
Classical percolation is a toy model in which one deletes nodes from a low-dimensional regular lattice and computes different properties, statistical and geometrical, of the remaining isolated clusters. Although first considered as a gelation problem in the context of macromolecules^{1}, it was not until the development of the theory of critical phenomena^{2} that percolation drew a great deal of attention from the physics community. The reasons were, at least, twofold. On the one hand, percolation provided stimulating theoretical challenges, being considered one of the simplest models displaying a phase transition, without any need to introduce dynamics or thermodynamic quantities, as occurs in the Ising model^{3}, for instance. On the other hand, percolation was flexible enough in its definition to be mapped to many diverse problems, such as the dielectric response of inhomogeneous materials^{4}, epidemiology^{5} or flows in porous media^{6}, among many others^{7,8}.
Interest in percolation was renewed with the advent of modern network theory^{9}. A network, broadly construed, is a set of nodes with arbitrary connections among them, in contrast to lattices, which are regular structures embedded in spaces of finite dimension, with all of their nodes having the same number of edges, i.e., the same degree. In this context, the removed nodes are usually thought of as failures or attacks, and the largest connected component after the perturbation has a functional interpretation: it is assumed to be the part of the network that is still operative. Therefore, percolation in this type of topology has brought a deeper understanding of the robustness and resilience of real-world networked systems^{10,11,12,13} and, from a fundamental perspective, it has provided new analytical techniques^{14,15} and interesting phenomenology from the standpoint of statistical physics^{16,17,18,19}.
Since in networks the degree of nodes is distributed heterogeneously, one can exploit this fact to devise new physically meaningful removal strategies, such as targeting nodes from higher to lower degree^{14,20}. These interventions on the networks are called attacks since they are intentionally performed using some a priori information. Studying percolation based on degree attacks elucidates the role played by large-degree nodes, the hubs, in the network robustness. For instance, graphs with long-tailed degree distributions are very fragile under hub removal, that is, removing a very small number of hubs breaks the network into many small components. This has catastrophic consequences for the security of real-world networks since many of them display such degree distributions^{21}.
The degree is the most basic centrality measure in complex networks. However, there exists a plethora of alternatives to assess the importance of a node within a graph^{22,23,24}, and accordingly, we can test the importance of these variables for the network robustness by performing attacks based on them^{25,26,27} and evaluate how far each attack strategy is from being optimal^{28}. Moreover, nodes could be characterized by non-topological properties as well, such as age^{29}, biomass^{30}, or bank credibility^{31}. Hence, similar network attacks can be implemented to test percolation properties of the system when a group of nodes with certain characteristics is removed, for instance, those users of online social platforms generating or spreading hateful content^{10}.
Taking into account that in many relevant situations singular information is accessible at the node level, it is desirable to have a method to quantify the impact of intervening in the network following alternative protocols based on these non-topological features. We therefore tackle the challenge of developing a percolation framework that accounts for both topological and non-topological information simultaneously. The latter element will be considered as node metadata, which we call the features. We generalize standard message-passing methods by introducing a joint degree-feature probability density function on the network. Several percolation quantities, such as the critical point or the size of the giant component, are computed. We check the validity of our theory by confronting the analytical estimates with synthetic and real-world networks, finding an excellent agreement.
The rest of the paper is organized as follows. We first motivate the usefulness of feature-enriched percolation by presenting some examples of real-world networks with different degree-feature correlations and proposing virtual dismantling experiments on them. Next, the model is presented and we work out the analytical expression for the size of the giant component in terms of a generic degree-feature joint probability function. We then confirm the analytical predictions in synthetic networks, discussing separately the cases of uncorrelated and correlated degree-feature distributions. We also test the validity of our theory in random geometric graphs (RGGs), which are known to be highly spatially correlated, so that message-passing methods may fail to accurately predict the percolation point. A final section is devoted to the very interesting case in which the features are related to variables coming from dynamical processes running on top of networks. We investigate these latter cases in both synthetic and real-world networks. We close the article by drawing conclusions.
Results
Empirical evidence of nontrivial feature distributions
In this section, we report different patterns of feature distributions in real-world complex networks. The chosen examples are arbitrarily selected based on our biases, but they are by no means an exception. Indeed, when collecting real data to construct network structures, most of the time the nodes have some associated properties, or metadata, that individually characterize them beyond their degree.
The first example corresponds to a board interlock network, i.e., a bipartite system of directors and companies. Links between them exist whenever a director holds a seat on the corporate board of a company, and nothing prevents a director from sitting on more than one board. We take the feature to be the age of the members of the board. We see that, in this case, the average age of the directors is uncorrelated with their degree, see Fig. 1a. Nevertheless, the spread of the distribution as a function of the degree is not constant. Hence, feature-enriched percolation in this example can help reveal the role played by directors of a certain age in the global connectivity structure of the system.
The second example belongs to the context of crowdsourced creation of cultural content. We take a snapshot of the current version of Wikipedia, in which nodes are articles and edges are the hyperlinks among them. Moreover, for each node we keep track of the number of revisions that it has undergone since its creation, and how many unique users have participated in these edits. We see that the correlation between the degree of a Wikipedia page and its number of unique editors is positive and heterogeneously distributed, see Fig. 1b. With the framework of feature-enriched percolation, we are able to remove nodes with a certain degree-feature pattern. For example, one may be interested in assessing how the navigability across this knowledge corpus is modified under the removal of well-connected articles that do not show enough collaborative editing, i.e., that have been written by too few users.
As a final example, we present a system that displays a negative correlation between degree and feature, see Fig. 1c. It corresponds to misinformation propagation in the online social platform Twitter. We take a sample of messages within a two-week time window during the COVID-19 pandemic and consider only messages that share a URL in the text. The nodes are users, where the degree is their number of followers, and the feature is the number of fake news items that they have posted. We see that the average number of fake news items tends to decrease as the visibility of Twitter accounts increases, varying broadly in this case as well. Feature-enriched percolation can be helpful, for example, to shed light on the problem of how to guarantee information spreading while cutting off users that systematically share dysfunctional and harmful content.
The model
Let us take the ensemble of networks generated via the configuration model^{9}, with a degree sequence obtained from a degree distribution p_{k}. We assume that each node is characterized by its degree k and by a set of features F = (F_{1}, F_{2}, …, F_{M}), with \(M\in {\mathbb{N}}\). The degree distribution is considered discrete and the feature distribution can be considered either discrete p_{F} or continuous p(F). Throughout the article, we shall assume that features are continuous, but reproducing the results for a discrete domain is straightforward. The joint probability is indicated by P(k, F). We define the occupation probability ϕ_{k,F} as the probability that a node with degree k and feature values within the interval [F, F + dF] has not been removed from the original network. Our goal is to find the position of the critical point and the size of the giant connected component S as a function of the parameters of P(k, F) and ϕ_{k,F}^{32}.
Let u be the average probability that a node is not connected to the giant component via a given neighbor. For a node with k connections and feature values F, the probability that it is not connected to the giant component via any of its k neighbors is u^{k}. Thus, the probability that the node belongs to the giant component due to a neighbor state is 1 − u^{k} and has to be multiplied by ϕ_{k,F}, which determines whether or not the node itself is present in the network. Averaging this quantity over degrees and features we obtain ∫dF∑_{k}ϕ_{k,F}P(k, F)(1 − u^{k}), where ∫dF is an M-dimensional definite integration over the elements of the feature vector. This is identified as the average probability of finding a node in the giant cluster, or equivalently, the fraction of nodes in the giant component. Therefore
$$S={g}_{0}(1)-{g}_{0}(u),\qquad (1)$$
where the generating function is
$${g}_{0}(z)=\int d{\bf{F}}\sum_{k}{\phi }_{k,{\bf{F}}}P(k,{\bf{F}}){z}^{k}.\qquad (2)$$
To solve Eq. (1) we need to obtain an expression for u. This can be achieved by writing a self-consistent equation that has two contributions. On the one hand, a neighbor might not be in the giant component because it has been deleted, which occurs with probability 1 − ϕ_{k,F}. On the other hand, if the neighbor has not been removed, it should not belong to the giant component via any of its other k_{e} neighbors, where k_{e} = k − 1 is the excess degree. This happens with probability \({\phi }_{k,{\bf{F}}}{u}^{{k}_{e}}\). Averaging over the distributions, we get
$$u=\int d{\bf{F}}\sum_{{k}_{e}}Q({k}_{e},{\bf{F}})\left[1-{\phi }_{{k}_{e}+1,{\bf{F}}}+{\phi }_{{k}_{e}+1,{\bf{F}}}{u}^{{k}_{e}}\right].\qquad (3)$$
Here Q(k_{e}, F) is the excess degree-feature distribution, which is normalized and verifies Q(k_{e}, F) = (k_{e} + 1)P(k_{e} + 1, F)/〈k〉, with the mean degree computed as 〈k〉 = ∫dF∑_{k}kP(k, F). Introducing the generating function as
$${g}_{1}(z)=\int d{\bf{F}}\sum_{{k}_{e}}{\phi }_{{k}_{e}+1,{\bf{F}}}Q({k}_{e},{\bf{F}}){z}^{{k}_{e}},\qquad (4)$$
Eq. (3) simplifies to
$$u=1-{g}_{1}(1)+{g}_{1}(u).\qquad (5)$$
We readily obtain the size of the giant component by plugging the solutions of Eq. (5) into Eq. (1). Notice that u = 1 is always a solution of Eq. (5), corresponding to S = 0. To observe the percolating structure we need to use the other solution, which appears when the condition \(1={g}_{1}^{\prime}(u){| }_{u = 1}\) is met. Whether an analytical expression exists depends on P(k, F) and ϕ_{k,F}; when it does not, the equations can always be solved numerically.
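The numerical route mentioned above can be sketched in a few lines. The snippet below is only an illustration under stated assumptions: a single discrete feature, a joint distribution supplied as a dictionary, and the relations S = g_{0}(1) − g_{0}(u) and u = 1 − g_{1}(1) + g_{1}(u), with g_{0}(z) = ∑ϕ_{k,F}P(k, F)z^{k} and g_{1}(z) = ∂_{z}g_{0}(z)/〈k〉, that follow from the averaging arguments in the text. The function name `giant_component` and the Poisson example are hypothetical and not taken from the paper.

```python
import math

def giant_component(P, phi, tol=1e-12, max_iter=100_000):
    """Return (S, u) for a joint distribution P[(k, F)] and occupation phi(k, F)."""
    mean_k = sum(k * p for (k, _), p in P.items())

    def g0(z):
        return sum(phi(k, F) * p * z ** k for (k, F), p in P.items())

    def g1(z):
        # g1(z) = g0'(z) / <k>, written in terms of the degree k = k_e + 1
        return sum(phi(k, F) * p * k * z ** (k - 1)
                   for (k, F), p in P.items() if k > 0) / mean_k

    u = 0.5  # start away from the trivial solution u = 1
    for _ in range(max_iter):
        u_next = 1.0 - g1(1.0) + g1(u)
        if abs(u_next - u) < tol:
            u = u_next
            break
        u = u_next
    return g0(1.0) - g0(u), u

# Illustrative input (an assumption, not data from the paper): Poisson degrees
# with mean 3 and an independent binary feature F in {0, 1} with equal weights.
c, k_max = 3.0, 60
pk = [math.exp(-c) * c ** k / math.factorial(k) for k in range(k_max + 1)]
norm = sum(pk)
P = {(k, F): 0.5 * pk[k] / norm for k in range(k_max + 1) for F in (0, 1)}

# Remove every node with F > F0 = 0.5, i.e. phi = theta(-(F - F0)).
S, u = giant_component(P, lambda k, F: 1.0 if F <= 0.5 else 0.0)
print(f"S = {S:.4f}, u = {u:.4f}")
```

Plain fixed-point iteration is used here for transparency. Close to the critical point it converges slowly, so a root finder may be preferable there.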
Notice that the relation between generating functions that holds in ordinary percolation is valid here as well, i.e., g_{1}(z) = ∂_{z}g_{0}(z)/〈k〉. It is also important to note that our generating functions, although including the M-dimensional integral in the definition, depend only on one variable since the feature enrichment does not add any new information in terms of connectivity. Therefore, our framework should not be taken as equivalent, for instance, to the study of percolation in graphs with colored edges^{33}, in general multilayers^{34}, in networks with multitype nodes^{35} or in interdependent systems^{36}, where multivariable generating functions are common.
The occupation probability ϕ_{k,F} allows us to understand the role played by certain values of degree and/or features in the connectivity of the network. Classical percolation, where nodes are removed in a uniformly random fashion, is recovered by selecting a constant function ϕ_{k,F} = ϕ ∈ [0, 1]. The case of removing the most connected nodes is recovered by setting ϕ_{k,F} = θ( − (k − k_{0})), where θ(⋅) is the Heaviside step function and k_{0} is a threshold such that all nodes with degree larger than it are removed. Similarly, one can apply the same arguments in the feature space, and study the case in which all nodes with a feature larger than a threshold are deleted, ϕ_{k,F} = θ( − (F − F_{0})). These three examples are sketched in Fig. 1.
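As a small illustration, the three occupation probabilities just described can be written as plain functions of (k, F). The names below are hypothetical, and the convention θ(0) = 1 is an assumption chosen so that nodes sitting exactly at the threshold are kept, in line with the statement that only nodes above the threshold are removed.

```python
def heaviside(x):
    """Heaviside step function with the convention theta(0) = 1 (an assumption)."""
    return 1.0 if x >= 0 else 0.0

def phi_uniform(p):
    """Classical percolation: every node is kept with the same probability p."""
    return lambda k, F: p

def phi_degree(k0):
    """Degree attack: nodes with degree larger than k0 are removed."""
    return lambda k, F: heaviside(-(k - k0))

def phi_feature(F0):
    """Feature attack: nodes with feature larger than F0 are removed."""
    return lambda k, F: heaviside(-(F - F0))

# Example: a node with degree 7 and feature 2.5 under each protocol.
node = (7, 2.5)
print(phi_uniform(0.8)(*node), phi_degree(5)(*node), phi_feature(3.0)(*node))
```

Mixed protocols, for instance removing nodes that are both well connected and low in feature value, can be expressed as products of such step functions.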
Applications
To illustrate and check the validity of the theory, we investigate several examples. For the sake of simplicity we focus on unidimensional feature vectors, i.e., a single scalar feature F. First we address the case of independent degree and feature, i.e., P(k, F) = p_{k}P(F). We then move on to consider joint distributions that are positively and negatively correlated. These latter cases leave the nature of the feature undetermined. In this section, though, we also address problems in which the features are related to the distance in a geometrical space and to dynamical processes evolving on top of the network.
Independent case
Let us consider a network with degree distribution and feature distribution
where k = 0, 1, 2,…, F ∈ [1, ∞) and α > 1. We take as occupation probability \({\phi }_{F}=\theta \left(-(F-{F}_{0})\right)\), that is, all nodes with feature F > F_{0} are removed. In this case, the generating functions are
No closed expression exists for the size of the giant component, but it is straightforward to obtain a numerical solution. In Fig. 2 we compare the numerical solutions with the actual process of percolation and we see that the agreement is excellent. Figure 2a displays S against the parameter characterizing the topology and we observe, as one could expect, that the network needs to be dense enough to observe the emergence of the giant component. In Fig. 2b we plot the dependence of S on the parameter characterizing the feature distribution. In this case, we see that if the feature distribution does not decay fast enough, no giant component is possible. Notice that in Fig. 2b the critical point α_{c} is below 2, i.e., the feature distribution has a diverging mean value. This case might seem extreme or unrealistic, but the critical point α_{c} can be located within the interval [2, 3], a much more common case, if a is small enough. This evinces the important role that the feature distribution may play in the robustness of a network to, for instance, feature-based attacks. For completeness, we give the values of the critical points, which in this case have a closed expression, namely
To show that the good agreement between theory and simulations extends to other topologies, the same analyses conducted above are presented in Supplementary Notes 1 and 2 for the Erdős–Rényi and scale-free network models.
We investigate the universality class of the percolation process for independent feature-degree distributions, in order to figure out whether the introduction of features modifies the critical properties of mean-field percolation. Here we understand mean-field as the behavior of classical percolation in regular lattices of dimension d ≥ d_{c} = 6. The critical properties of complex networks, which are infinite-dimensional entities, might not be the same as mean-field if, for instance, the underlying degree distribution is a power law^{16} or the removal process is modified^{37}. Note that in feature-enriched percolation we will necessarily have two parameters, one controlling the topology and another the features. Hence, for the size of the giant component, we write \(S(a) \sim {(a-{a}_{c})}^{{\beta }_{a}}\) and \(S(\alpha ) \sim {(\alpha -{\alpha }_{c})}^{{\beta }_{\alpha }}\), where β_{a} and β_{α} are the corresponding critical exponents. (To be accurate, S(a) also depends on α and S(α) depends on a too, although when studying the critical behavior they are taken as constants, hence we do not write them for the sake of clarity.) In Supplementary Note 3 we analytically show that β_{a} = β_{α} = 1, i.e., the exponent takes the mean-field value. To find other critical exponents, we can employ the finite-size scaling hypothesis
Thus, the critical exponent \(\overline{\nu }\) can be immediately found by fitting the resulting power law of the size of the largest connected component against the system size, at the critical point. Note that since networks are infinite-dimensional, \(\overline{\nu }={d}_{c}\nu\)^{29}, where d_{c} = 6 is the percolation upper critical dimension and ν the typical critical exponent associated with the correlation length. The results are shown in Fig. 2c, finding that \({\beta }_{a}/{\overline{\nu }}_{a}=0.347\pm 0.015\) and \({\beta }_{\alpha }/{\overline{\nu }}_{\alpha }=0.342\pm 0.021\). The values of \({\overline{\nu }}_{a}\) and \({\overline{\nu }}_{\alpha }\) agree well with the mean-field percolation exponent \(\overline{\nu }=3\). We arrive at the same conclusion by employing data collapse based on the finite-size scaling relations, see Supplementary Note 4. Therefore, we conclude that the critical properties are the same as those of the mean-field percolation process, even if the feature distribution is scale-free^{16}.
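The fitting step can be illustrated with a short least-squares sketch. It assumes the standard finite-size scaling form at the critical point, S(N) ∼ N^{−β/ν̄}, so that the slope of log S against log N estimates −β/ν̄. The helper name and the synthetic input below are hypothetical: the data are generated from an exact power law purely to exercise the code and are not results from the paper.

```python
import math

def fit_power_law_exponent(sizes, values):
    """Least-squares slope of log(values) against log(sizes)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(v) for v in values]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

# Synthetic check: S = 2 * N^(-1/3), the mean-field value beta / nu_bar = 1/3.
sizes = [10 ** 3, 10 ** 4, 10 ** 5, 10 ** 6]
synthetic_S = [2.0 * n ** (-1.0 / 3.0) for n in sizes]

slope = fit_power_law_exponent(sizes, synthetic_S)
beta_over_nu_bar = -slope
print(f"estimated beta / nu_bar = {beta_over_nu_bar:.4f}")
```

With simulation data, each value of S would be an average over many realizations at the critical point, and the uncertainty of the slope would have to be estimated as well, which this sketch omits.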
Positively correlated case
Let us consider now the more interesting and realistic case of a joint distribution in which feature and degree are not separable. We take one of the simplest scale-free distributions that are positively correlated,
with \(k\in {\mathbb{N}}\), F ∈ [1, ∞), α > 1, and normalization constant
ζ(⋅) is the Riemann zeta function^{38}. The correlation is positive because the nodes with a high degree tend to have larger values of the feature, a property that can be easily checked by computing the conditional average degree 〈k(F)〉. The distribution (10) has been considered before for instance in the context of transport properties on weighted networks^{39} or in temporal correlations of dynamical processes^{40}.
In practical terms, to assign the values of the degree and the feature in the simulations, we first construct the network from the configuration model with a degree sequence drawn from p_{k} = ∫dFP(k, F). Then, depending on the degree of each node, we draw a random variable from the conditional distribution P(F∣k). Randomized versions of the system are possible either by randomly shuffling the features of the nodes in the correlated network or by constructing a new network from p_{k} and assigning features from P(F).
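The assignment procedure can be sketched as follows. This is a minimal illustration using only the Python standard library. The two samplers are placeholders and do not reproduce the distributions used in the paper: `sample_degree` and `sample_feature_given_degree` are hypothetical stand-ins for p_{k} and P(F∣k), chosen only so that larger degrees tend to carry larger features.

```python
import random

def configuration_model(degrees, rng):
    """Pair stubs uniformly at random; self-loops and repeated edges are discarded."""
    stubs = [node for node, k in enumerate(degrees) for _ in range(k)]
    rng.shuffle(stubs)
    edges = set()
    # If the number of stubs is odd, zip silently leaves the last stub unpaired.
    for a, b in zip(stubs[0::2], stubs[1::2]):
        if a != b:
            edges.add((min(a, b), max(a, b)))
    return edges

def sample_degree(rng, n):
    # Placeholder for p_k: heavy-tailed integer degrees, capped at n - 1.
    return min(int(rng.paretovariate(2.5)), n - 1)

def sample_feature_given_degree(k, rng):
    # Placeholder for P(F | k): heavy-tailed feature whose scale grows with k.
    return k ** 0.5 * rng.paretovariate(1.5)

rng = random.Random(42)
n = 1000
degrees = [sample_degree(rng, n) for _ in range(n)]
edges = configuration_model(degrees, rng)

# Correlated assignment: each feature is drawn conditionally on the node degree.
features = [sample_feature_given_degree(k, rng) for k in degrees]

# Randomized counterpart: same marginal P(F), degree-feature correlation destroyed.
shuffled = features[:]
rng.shuffle(shuffled)
```

Discarding self-loops and multi-edges slightly lowers the realized degrees. For large sparse networks this effect is small, but it is a simplification of this sketch.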
Considering again the removal of all nodes with feature value above a threshold F_{0}, the generating functions are
where Φ( ⋅ , ⋅ , ⋅ ) is the Lerch transcendent function^{38}. A closed expression for the size of the largest connected component exists, although it is long and not very enlightening, so we do not report it here.
The theoretical predictions are compared with the simulations in Fig. 3a, finding an excellent agreement. Another curve is also displayed, corresponding to the randomized version of the correlated model Eq. (10), which serves to single out the role of the degree-feature correlations in the behavior of the size of the giant component. One immediately observes that the critical point and the size of the giant component are smaller for the correlated case than for the randomized case. This holds true for all feature thresholds, with the separation between both the critical points and the values of S becoming more accentuated as F_{0} decreases. The rationale behind this is the following: the marginal feature probability P(F) is the same for both cases, so for a given threshold F_{0} the same fraction of nodes is removed, although the chances of hitting a low-degree node are much larger in the randomized case than in the correlated case, due to, precisely, the degree-feature correlation. Since targeting hubs is a very fast way of dismantling a network, it is natural to observe a smaller critical point and a smaller size of the giant component. A quantitative way to measure the deviations caused by degree-feature correlations with respect to the uncorrelated scenario is to compute the area between both curves,
where we have split the dependency of the order parameter into two subsets of variables, for the sake of generality. The sign of Δ gives information on whether the correlations reduce the robustness (positive sign) or enhance it (negative sign). In the inset of Fig. 3a, we show Δ(F_{0}), indicating that the behavior depicted in the main panel, and discussed above, holds for any value of the feature threshold F_{0}. Surprisingly, Δ(F_{0}) is non-monotonic and presents an optimal value of the threshold for which the correlations induce the largest robustness reduction.
Figure 3a also shows a striking peculiarity: the non-monotonicity of the giant component and its approach to the non-percolating regime S = 0 when α → 1. In Supplementary Note 5, we give an explanation for this behavior and show that there are no critical properties in the vicinity of α = 1. This approach to the non-percolating phase is a genuine effect of the degree-feature correlation that would be missed if the correlation were disregarded. In other words, the robustness of this positively correlated system under an attack on its most prominent nodes in the feature space would be overestimated if correlations were ignored. In Supplementary Note 6 (see Supplementary Movies 1 and 2 as well) we also address the critical properties of this positively correlated network model, where we show that it shares the same critical exponents as mean-field percolation.
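A hedged sketch of how such an area could be evaluated from two sampled curves is given below. It assumes that Δ is the integral of the randomized curve minus the correlated one over the scanned parameter, which matches the sign convention stated above (positive when correlations reduce robustness), and it uses the trapezoidal rule on a shared grid. The function name and the sample values are hypothetical placeholders, not data from the paper.

```python
def area_between_curves(x, s_randomized, s_correlated):
    """Trapezoidal estimate of the integral of (S_rand - S_corr) over x."""
    diff = [r - c for r, c in zip(s_randomized, s_correlated)]
    area = 0.0
    for i in range(len(x) - 1):
        area += 0.5 * (diff[i] + diff[i + 1]) * (x[i + 1] - x[i])
    return area

# Placeholder curves sampled on the same parameter grid.
alpha_grid = [1.0, 2.0, 3.0]
S_rand = [0.0, 0.5, 1.0]
S_corr = [0.0, 0.25, 0.5]

delta = area_between_curves(alpha_grid, S_rand, S_corr)
print(delta)  # positive: the correlated network is less robust in this toy input
```

Repeating the calculation for several thresholds F_{0} would give a curve Δ(F_{0}) of the kind discussed in the text.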
Negatively correlated case
Contrary to the last section, there are certain situations in which peripheral, low-degree nodes might be the ones carrying the largest feature values. Again, we model this scenario with an ad hoc negatively correlated joint distribution, one of the simplest displaying power-law behavior^{39}:
with α > 1 and normalization constant
When the exponent α is an integer, \({\mathcal{Z}}\) can be expressed as a combination of polygamma functions, but we were not able to find a closed expression for a non-integer exponent. In any case, the sum is straightforward to evaluate with any numerical software. The anticorrelation can be appreciated in the decay of the conditional mean degree 〈k(F)〉 with the feature value.
The generating functions corresponding to Eq. (14), and for the same occupation probability as the other examples, are
The comparison between the theoretical predictions and the simulations for the negatively correlated distribution is given in Fig. 3b, both for the correlated distribution and its randomized version, finding again a very good agreement. We observe that in this case the behavior is reversed with respect to what is found in the positively correlated case. The critical point and the size of the giant component for the anticorrelated networks are slightly higher than those of the randomized counterparts. However, surprisingly, this behavior depends on the feature threshold employed. Depending on F_{0}, we might be in regimes in which disregarding the negative degree-feature correlations leads to an underestimation or to an overestimation of the robustness, see the inset of Fig. 3b. Similarly to the positively correlated case, Δ is non-monotonic and there is an intermediate threshold value for which the underestimation is maximum, but increasing F_{0} even more we find the point at which the overestimation is maximum. This evinces the non-trivial phenomenology that arises even in these simple cases of ad hoc correlations.
Features related to the network construction
Let us now consider a case where the degree and feature distributions are not imposed exogenously but emerge naturally from the network construction process. As an example, we consider RGGs, which are especially useful to model situations in which there is some kind of physical contact or proximity between the units of the system, and which find applications in areas as diverse as wireless sensor architectures^{41}, population dynamics^{42}, or consensus formation^{43}, to name but a few. Even though percolation has been widely studied in RGGs in the mathematical literature (see, e.g., Refs. ^{44,45}), critical points are known only up to some bounds^{46}, and because of the strong topological correlations of RGGs, tree-like approximations are, in general, not as accurate as in the case of infinite-dimensional networks. On this basis, we will need to quantify the discrepancy between theory and simulations and how it scales with the link density.
We consider RGGs composed of N nodes, placed uniformly at random on [0,1)^{2} with periodic boundary conditions. Two nodes are connected if they are within a distance r. We take the feature as the distance between a node and its closest neighbor d_{min}, i.e., F ≡ d_{min}, and we delete those nodes that have the nearest neighbor at a distance smaller than a certain threshold r_{0} > 0, i.e., ϕ_{F} = θ(F − r_{0}) (see Fig. 4a). This scenario can be relevant, for instance, in spatial ecological models in which there is a competition for resources^{47,48}. Note that this particular choice of the occupation probability is made for illustrative purposes, and without much effort, one could envision other relevant scenarios, for example, with ϕ_{F} = θ( − (F − r_{0})) or that take the distance to the furthest neighbor d_{max} as the feature. In all these mentioned cases, the generating functions can be calculated.
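The simulation side of this setup can be sketched with the standard library alone. The snippet places N points on the unit torus, links pairs within distance r, takes the feature as the distance to the nearest other node, applies ϕ_{F} = θ(F − r_{0}) and measures the largest surviving component. Function names, parameter values and the quadratic-time neighbor search are choices made for this illustration. Measuring d_{min} over all nodes rather than only over linked neighbors is also an assumption.

```python
import math
import random
from collections import deque

def torus_distance(p, q):
    """Euclidean distance on [0, 1)^2 with periodic boundary conditions."""
    dx = abs(p[0] - q[0])
    dy = abs(p[1] - q[1])
    return math.hypot(min(dx, 1.0 - dx), min(dy, 1.0 - dy))

def rgg_with_dmin(n, r, rng):
    """Return adjacency lists and the nearest-node distance of every node."""
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    adj = [[] for _ in range(n)]
    dmin = [math.inf] * n
    for i in range(n):
        for j in range(i + 1, n):
            d = torus_distance(pts[i], pts[j])
            dmin[i] = min(dmin[i], d)
            dmin[j] = min(dmin[j], d)
            if d <= r:
                adj[i].append(j)
                adj[j].append(i)
    return adj, dmin

def largest_component_fraction(adj, keep):
    """Size of the largest connected cluster of kept nodes, divided by N."""
    seen = [False] * len(adj)
    best = 0
    for start in range(len(adj)):
        if not keep[start] or seen[start]:
            continue
        seen[start] = True
        queue, size = deque([start]), 0
        while queue:
            v = queue.popleft()
            size += 1
            for w in adj[v]:
                if keep[w] and not seen[w]:
                    seen[w] = True
                    queue.append(w)
        best = max(best, size)
    return best / len(adj)

rng = random.Random(1)
adj, dmin = rgg_with_dmin(n=400, r=0.1, rng=rng)

def surviving_fraction(r0):
    # phi_F = theta(F - r0): nodes whose nearest node is closer than r0 are deleted.
    return largest_component_fraction(adj, [d >= r0 for d in dmin])

for r0 in (0.0, 0.01, 0.02, 0.04):
    print(r0, surviving_fraction(r0))
```

For larger systems a cell list or a k-d tree would replace the double loop. The quadratic search is kept here only for brevity.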
To compute the generating functions we first need the joint probability function. Its calculation is detailed in Supplementary Note 7. Setting a maximum bound k_{max} = N − 1 in the degree distributions, the generating functions are
The analytical calculations are compared with simulations in Fig. 4b. We keep the radius of interaction r constant and discuss the results as a function of the number of nodes in the network. First of all, we see that the position of the percolation point is inversely proportional to N. This behavior is somewhat expected: the larger the system size, the denser the network, so the probability that nodes have neighbors at a distance smaller than r_{0} becomes high, hence the rapid network dismantling. Moreover, we see that when the system size is small, the theoretical results systematically overestimate the value of the giant component. Note that this is not a finite-size effect as occurs in other systems displaying phase transitions, but an inherent limitation of the theory due to the topological correlations present in RGGs and not captured in their degree-feature distribution. We can quantify this discrepancy following the rationale of Eq. (13) but taking the absolute value in the integrand. Thus,
where S_{theor}(x, y) and S_{simu}(x, y) are the values of the order parameter given by the theory and obtained via simulations, respectively. Note that a similar expression has been used recently^{49}. In order to understand how the size of the network affects the accuracy of the predictions we compute ϵ(N), see Fig. 4c. We can observe that the discrepancy is reduced as the networks become larger, as already hinted in Fig. 4b. For the range of N explored, the decay trend does not change, suggesting that there are no separate regimes in which our framework works better or fails, but rather a single regime in which the effects of topological correlations are gradually washed out as the link density increases.
Features related to a dynamical process
As a final application of feature-enriched percolation, we investigate the case in which the features are coupled to dynamics running on top of the network. There are many situations in which the dynamical evolution of some process on the network generates some quantity, or attribute, that might not be evenly distributed across nodes. For instance, when studying the problem of synchronization, it is known that in the desynchronized phase, the synchronization error (the time-averaged distance in phase space between the state of a node and the average state of the system) displays a power-law decrease with the degree^{50}. In such a scenario, one might be interested in studying the network that survives after the removal of the most (or least) synchronized nodes with respect to the mean activity of the system. Another example is found in communication networks when modeling traffic, where nodes with a higher degree tend to be more congested on average^{51}.
Here we focus on the SIS model, well known in the context of epidemiology^{52}. It consists of nodes that can be in either of two states, susceptible or infected. Infected nodes transmit the disease to susceptible ones at a certain rate upon encounter, and infected nodes recover spontaneously at a different rate. The dynamical evolution of the model is written as
where A is the adjacency matrix of the network, τ_{1} and τ_{2} are constants that we set to 1 for convenience and x_{i} is the probability that node i is infected, hence x_{i} ∈ [0, 1]. Integrating these equations from a random initial condition, we see that the probability of finding an infected node in the stationary state depends on the node degree. This allows us to define the feature as the probability of infection at the stationary state, i.e., F_{i} ≡ x_{i}(t → ∞). This way, by using the framework of feature-enriched percolation we can study what proportion of the nodes with the highest probability of ending up infected needs to be removed to dismantle the network. That is, we use again an occupation probability ϕ_{F} = θ( − (F − F_{0})), but other choices are of course possible.
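To make the definition of the feature concrete, the sketch below integrates an individual-based mean-field SIS equation with a forward Euler scheme. The specific form dx_{i}/dt = −τ_{1}x_{i} + τ_{2}(1 − x_{i})∑_{j}A_{ij}x_{j}, including which constant multiplies which term, is an assumption of this example, as are the function name, the step size and the star-graph test case.

```python
import random

def integrate_sis(adj, tau1=1.0, tau2=1.0, dt=0.01, steps=5000, seed=0):
    """Forward-Euler integration of the assumed SIS mean-field equations."""
    rng = random.Random(seed)
    x = [rng.uniform(0.05, 0.95) for _ in adj]  # random initial condition
    for _ in range(steps):
        # Infection pressure on node i: sum of x_j over its neighbors j.
        pressure = [sum(x[j] for j in nbrs) for nbrs in adj]
        x = [xi + dt * (-tau1 * xi + tau2 * (1.0 - xi) * s)
             for xi, s in zip(x, pressure)]
    return x

# Toy network: a star with one hub (node 0) and ten leaves.
n_leaves = 10
adj = [list(range(1, n_leaves + 1))] + [[0] for _ in range(n_leaves)]

# Feature = stationary infection probability, F_i = x_i(t -> infinity).
features = integrate_sis(adj)

# Feature attack phi_F = theta(-(F - F0)): nodes with F > F0 are removed.
F0 = 0.6
keep = [F <= F0 for F in features]
print(features[0], features[1], keep[0], keep[1])
```

On this star graph the stationary values can be checked by hand under the assumed equations: the hub settles at 9/11 and each leaf at 9/20, so the hub carries the larger feature and is the first node removed by the attack.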
For certain dynamical processes on certain types of topologies, one can calculate the exact joint degree-feature distribution and plug it into Eqs. (1)–(5). However, these are marginal cases within the whole ensemble of dynamical models and network architectures, and for many applications the joint distribution will not be analytically available. To illustrate how one can proceed in this latter case, here we take an agnostic approach and use only information about the node degrees and their feature values to infer an approximate P(k, F). We first collect all the pairs (k_{i}, F_{i}) and compute a non-normalized two-dimensional histogram, see Fig. 5a. The collapse in the k − F plane tells us the type of correlation between degree and feature, which for the SIS model turns out to be positive. Note that for each value of the degree (recall that k is considered discrete), the distribution P(F∣k) is a bell-shaped curve of different height, mean value, and width; see the ridgeline plot of Fig. 5b for a better appreciation. The first strong approximation is to consider that each P(F∣k) is proportional to a normal form
To obtain the values of the mean feature at degree k, μ_{F}(k), and its standard deviation σ_{F}(k), we employ the Bayesian machine scientist (BMS)^{53}, a recent algorithm based on Bayesian probability and Markov chain Monte Carlo that is able to provide the most plausible closed-form expression given a dataset, see Fig. 5c. To incorporate the decaying behavior of the peak height as a function of the degree, we compute the relation between the degree k and the maxima of P(F∣k) and find the most plausible expression, h(k), again with the BMS, see Fig. 5d. Notice that here we are also assuming that the height of the probabilities does not depend on the feature F. In summary, we have an approximate degree-feature distribution
with k = k_{m}, k_{m} + 1, k_{m} + 2, …, and F ∈ [0, 1], where k_{m} is the minimum degree seen in the data and \({\mathcal{Z}}\) is the normalization constant. It is important to note that the joint distribution has been obtained in an unsupervised way and without any prior knowledge of the real degree distribution. For dynamical processes other than the SIS model, the functional expressions used here might not work, but one can always follow similar steps or even apply the BMS to the two-dimensional empirical data to directly compute an approximation to P(k, F).
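The construction of the approximate joint distribution can be sketched in a few lines of Python. This is a simplified stand-in, not the authors' pipeline: instead of closed-form expressions for μ_{F}(k), σ_{F}(k) and h(k) obtained with the BMS, it uses raw per-degree sample estimates, and it takes the peak height as n_k / (√(2π) σ_k), where n_k is the number of nodes of degree k. Both choices, the width floor, and the function names `fit_per_degree` and `joint_distribution` are assumptions made for illustration.

```python
import math
from collections import defaultdict


def fit_per_degree(pairs):
    """Estimate (mu, sigma, height) of the bell-shaped P(F|k) for each degree k
    from an iterable of (degree, feature) pairs."""
    groups = defaultdict(list)
    for k, f in pairs:
        groups[k].append(f)
    params = {}
    for k, fs in groups.items():
        if len(fs) < 2:
            continue  # a single point gives no width estimate; skip this degree
        mu = sum(fs) / len(fs)
        var = sum((f - mu) ** 2 for f in fs) / (len(fs) - 1)
        sigma = max(math.sqrt(var), 1e-2)  # floor avoids degenerate widths
        height = len(fs) / (math.sqrt(2.0 * math.pi) * sigma)  # assumed h(k)
        params[k] = (mu, sigma, height)
    return params


def joint_distribution(params, n_grid=200):
    """Approximate P(k, F) = h(k) * exp(-(F - mu)^2 / (2 sigma^2)) / Z on a
    midpoint grid over F in [0, 1]. Z is fixed by numerical normalization."""
    dF = 1.0 / n_grid
    grid = [(j + 0.5) * dF for j in range(n_grid)]
    raw = {
        k: [h * math.exp(-((f - mu) ** 2) / (2.0 * s * s)) for f in grid]
        for k, (mu, s, h) in params.items()
    }
    Z = sum(sum(row) for row in raw.values()) * dF
    P = {k: [v / Z for v in row] for k, row in raw.items()}
    return grid, dF, P
```

The returned dictionary maps each degree k to the values of P(k, F) on the grid, so that summing over k and integrating over F gives one by construction.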
In Fig. 5e we compare the results of the simulations with the curve obtained from the theory. We have employed the joint degree-feature distribution (21) to compute the generating functions, proceeding numerically because no closed expressions can be found. We see that the agreement is quite good, despite the strong approximations used during the process. A small discrepancy around the critical region can be appreciated, rooted in finite-size effects and probably in a systematic deviation that could potentially be reduced by employing more complicated functions in the construction of P(k, F), such as skewed Gaussian distributions. In any case, we have shown that with little information one can find a quite satisfactory description of the percolation properties of systems in which the features are related to a dynamical process. This works reasonably well for synthetic networks, and we proceed next to test the accuracy of the feature-enriched percolation in real networks, which are characterized by topological correlations not taken into account in the theory.
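A numerical evaluation of this kind can be sketched as follows. Since Eqs. (1)–(5) are not reproduced in this section, the sketch assumes the standard generating-function result for site percolation on uncorrelated random graphs, with an occupation probability ϕ(k, F) that depends on degree and feature: u = 1 − Σ_k ∫ dF (k P(k, F)/⟨k⟩) ϕ(k, F) (1 − u^{k−1}) and S = Σ_k ∫ dF P(k, F) ϕ(k, F) (1 − u^k). The function names and the Poisson test distribution are hypothetical and serve only as a sanity check.

```python
import math


def giant_component(P, grid, dF, phi, iters=500):
    """Relative size S of the giant component for a gridded P(k, F).
    P maps each degree k to the list of values P(k, F_j) on `grid`;
    phi(k, F) in [0, 1] is the occupation probability.
    Uses the ASSUMED standard self-consistent equations for uncorrelated
    random graphs, solved by fixed-point iteration."""
    mean_k = sum(k * sum(row) * dF for k, row in P.items())
    # Occupied weight per degree: integral over F of P(k, F) * phi(k, F).
    w = {k: sum(p * phi(k, f) for p, f in zip(row, grid)) * dF
         for k, row in P.items()}
    u = 0.5  # start away from the trivial fixed point u = 1
    for _ in range(iters):
        acc = sum(k * w[k] * (1.0 - u ** (k - 1)) for k in w if k > 0)
        u = 1.0 - acc / mean_k
    return sum(w[k] * (1.0 - u ** k) for k in w)


def poisson_uniform(c, kmax, n_grid):
    """Hypothetical test case: Poisson degrees with mean c and a feature that is
    uniform on [0, 1] and independent of the degree."""
    dF = 1.0 / n_grid
    grid = [(j + 0.5) * dF for j in range(n_grid)]
    P = {k: [math.exp(-c) * c ** k / math.factorial(k)] * n_grid
         for k in range(kmax + 1)}
    return grid, dF, P


if __name__ == "__main__":
    grid, dF, P = poisson_uniform(3.0, 30, 200)
    # Step occupation probability: nodes with F above F0 = 0.5 are removed.
    print(giant_component(P, grid, dF, lambda k, f: 1.0 if f < 0.5 else 0.0))
```

With ϕ = 1 and Poisson degrees of mean 3 the routine returns a value close to 0.94, the solution of S = 1 − e^{−3S}, which gives a quick consistency check before plugging in an empirically inferred P(k, F).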
To this end, we use three different dynamics on three different real networks and proceed as before, i.e., we find the approximate degree-feature joint distribution by means of the BMS. The first example is a mutualistic dynamics arising in symbiotic ecosystems, where the time dependence of the abundance x_{i} of species i is given by the equation^{54,55}
Here τ_{1} and τ_{2} are constants, as they will be in the subsequent models. The first term on the right-hand side models the logistic growth and the second term captures the mutualistic interaction that neighboring species have on species i. The network on which we run these dynamics is the one-mode projection^{9} of the plant-pollinator bipartite network reported in Ref. ^{56}. The projection is constructed in such a way that two plants are connected if they are pollinated by the same insect. We apply the feature-based interventions following the step occupation probability that we have been using throughout, i.e., we test the percolation properties of the network when the most abundant plants are removed. The results are shown in Fig. 6a. We observe that the theoretical expressions match very well the size of the largest connected component obtained from the simulations. Notice that the qualitative behavior of the functions used to construct the approximate degree-feature distribution (21) (see insets of Fig. 6a) is different from the previous case and strongly depends on the dynamics employed. Indeed, the feature values grow linearly with the degree while keeping a constant and very small standard deviation, and the peak height is not monotonic with the degree, with its noisiest part located where the peaks are largest.
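The one-mode projection rule stated above (two plants are linked if at least one insect pollinates both) can be written compactly. The following Python sketch assumes the bipartite data is available as a list of (plant, pollinator) pairs; the function name `one_mode_projection` and the toy species labels are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def one_mode_projection(visits):
    """Project a bipartite plant-pollinator network onto the plants.
    `visits` is an iterable of (plant, pollinator) pairs. Returns a dict
    mapping each plant to the set of plants sharing at least one pollinator."""
    plants_of = defaultdict(set)  # pollinator -> plants it visits
    adj = {}
    for plant, pollinator in visits:
        plants_of[pollinator].add(plant)
        adj.setdefault(plant, set())  # keep plants that end up isolated
    for plants in plants_of.values():
        for a, b in combinations(sorted(plants), 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


if __name__ == "__main__":
    toy = [("p1", "bee"), ("p2", "bee"), ("p2", "fly"), ("p3", "fly"), ("p4", "moth")]
    print(one_mode_projection(toy))
```

Swapping the two entries of each pair yields the pollinator projection instead, in which two pollinators are linked if they visit the same plant.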
The second example corresponds to birth-death processes^{57}. In the context of population dynamics, the temporal evolution of the population x_{i} in a site i can be described by
The values of the exponents 2 and 1 in the x_{i} and x_{j} terms on the right-hand side are arbitrary for the present study, and other choices are of course possible. These particular ones represent pairwise depletion and linear flow between interacting populations, respectively. The dynamics are implemented on top of the one-mode pollinator projection of the previous plant-pollinator network. This network is constructed by connecting the pollinators that pollinate the same plant. The results of the feature-enriched percolation are displayed in Fig. 6b. In this case, we observe that the theory offers estimates for the critical point and for the size of the giant component that are slightly lower than the values given by the simulations. Depending on the application, this discrepancy might or might not be tolerable, but in any case the theoretical calculations capture fairly well the response of the system to feature-based interventions.
As a final dynamics, we use the mass-action kinetics model often employed in biochemistry^{58}. The equation
gives the temporal evolution of the concentration x_{i} of protein i. The first term represents the rate at which the protein i is synthesized, the second term stands for its degradation, and the last term accounts for the interaction between molecules. The real topology where we test the feature-enriched percolation is the C. elegans interactome constructed considering the interolog interactions^{59}. The results are shown in Fig. 6c, where we find a good agreement between theory and simulations as well.
The dynamical processes presented here are merely illustrative examples of the potential and flexibility of the theory. For all the chosen examples the microscopic rules and the time evolution of the variable of interest were known, but we could have proceeded even without that information. There are two minimal ingredients to apply the feature-enriched percolation: the degree and the feature value for every node. If, for whatever reason, a relation between the feature and the degree cannot be obtained, one can always proceed by employing the BMS technique, or other methods whose goal is to provide closed-form equations from data^{60,61,62}.
Discussion
In recent years there has been an upsurge of contributions aiming at offering more realistic descriptions of the natural and socio-technical phenomena that networks encode, e.g., via multilayer^{63,64} or temporal^{65} networks, or via higher-order interactions^{66}. These topological generalizations induce nontrivial consequences in network dynamics, with important implications for the stability and proper functioning of the systems when subjected to perturbations. Beyond these more accurate topological characterizations, there is a dimension that has been frequently ignored in the study of robustness and resilience, namely node metadata. Its omission is by no means rooted in its irrelevance, but in the fact that node metadata is a type of information that is often lost or disregarded in the process of constructing the networked architecture from empirical observations.
Taking percolation as the paradigmatic model to assess the robustness of a network, in this article we propose a natural generalization of this phenomenon, flexible enough to include node removal protocols based on a combination of the degree and non-topological node metadata, which we call features. We have worked out the analytical expression for the size of the giant component and have checked its good agreement with simulations. We have discussed in some detail the phenomenology that appears in a set of examples displaying typical degree-feature relations of real systems, such as the critical exponents of the transition or the characterization of the non-monotonic response of the robustness induced by the correlations. In this first part of the article, the nature of the features has been left undetermined and, far from being a limitation, this is actually a strength of our model. Indeed, real node metadata can be an exogenous or endogenous property of the nodes, constant or mutable, numerical or non-numerical, etc. All these cases can be included in our model. In the second part of the article, we have dealt with two families of problems in which the features have a physical meaning: spatial networks and dynamical processes on networks. The latter case is the most challenging because very frequently one cannot analytically extract the main ingredient of the feature-enriched percolation framework, the joint degree-feature distribution. We have shown, however, how to overcome this limitation by employing a state-of-the-art probabilistic method that gives us an approximate distribution.
A groundbreaking discovery in network robustness assessments was the realization that the response to random failures and degree attacks is radically different when the variance of the degree distribution is much larger than the mean degree. We believe that, in a similar vein, conceptualizing the possibility of attacking networks with new feature-based protocols, and providing the mathematical framework to study this process, can help unravel unexpected responses and hidden fragilities. We hypothesize that it might be possible to choose a smart occupation probability ϕ_{k,F} that leads to a truly discontinuous percolation phase transition, thus identifying a crucial subset of nodes (that is, from a network ensemble perspective, identifying the range of degree and feature values) that, once removed, causes catastrophic consequences for the robustness of the system.
Because our model builds upon traditional, well-grounded message-passing techniques that have been successfully employed in a myriad of different problems, many generalizations not treated in the present work are still possible. One of them is the study of bond percolation based on features, a very relevant situation because many existing network datasets convey information about the link weight rather than node metadata. Generalizations are also possible in topologically correlated networks, i.e., those showing clustering or nontrivial assortative mixing. Percolation in multilayer networks has also attracted considerable attention in the last decade, and feature-based protocols can be devised in these layered structures as well. The feature dimension can be relevant too when studying optimal percolation, that is, finding the sets of nodes that, when removed, cause the largest possible reduction in the giant component. Indeed, one can devise attacks that combine feature and topological information, more complex than the one studied in this article, in order to eventually outperform purely topology-based interventions and move closer to the optimal dismantling. Finally, a conceptually similar problem that would nevertheless require a completely different mathematical approach is feature-based percolation in low-dimensional lattices, where one can ask which traits of the feature distribution modify the well-known properties of these systems.
Last, but not least, we would like to discuss some implications of our model for the robustness of complex interconnected systems. Firstly, we are opening the door to assessing the behavior of a network as a function of the combination of several traits, enabling the exploration of responses in large phase spaces. This can be particularly useful not only for scientific research problems but for policy making too, where it is often necessary to evaluate scenarios taking into account the optimization of a multitude of factors. Consider, for instance, the current COVID-19 pandemic, where policies have had to maintain a delicate balance between the protection of public health and the sustainability of the economic system. Our model represents a first step towards the non-topological multidimensional optimization of robustness. Secondly, depending on the nature of the features, different implementations can be devised. There are some situations in which the feature values can be tuned at will, e.g., resources are given to certain nodes; one can then explore how to modify the feature allocation, correlating it with topological information, in order to better protect the network. We learned, for instance, that the robustness of networks with positively correlated degree-feature distributions is considerably lower than that of networks with uncorrelated degree and feature, and that this is not always true for negatively correlated ones. This kind of information can be exploited, of course in combination with the attacking protocol. There are other situations in which the features remain fixed, e.g., the age of nodes, but the topology is flexible. Put otherwise, we can tune the strength of correlations at will by redirecting the edges in a convenient way. Here we can also take advantage of the relation between correlation and robustness to increase the system’s ability to sustain its function despite the attacks.
Finally, we can also use our formalism to infer the temporal evolution of the robustness in the case where the features evolve. For the sake of illustration, in the article we have presented several examples of dynamics with well-defined equations that display a steady state, but none of these characteristics is actually necessary. It suffices to have an accurate prediction of the future feature values, however it is obtained. Taking snapshots at different times, we can apply the BMS method at each of them to obtain a time-dependent size of the largest connected component. This gives us access to information on when a system will be most robust and most vulnerable, hence forestalling undesired behaviors. All these practical examples evince the potential of our framework, from which insightful lessons can be learned to better protect, or dismantle, real systems. Likewise, at a fundamental level, a very interesting new phenomenology can be obtained due to the inclusion of features. For all these reasons we hope our model becomes a stepping stone on the path towards a more realistic and useful description of network robustness.
Data availability
All datasets used in this article are publicly available on the Internet. The data used in Fig. 1a can be found at https://zenodo.org/record/3553442. The data used in Fig. 1b can be found at https://consonni.dev/datasets/. The data used in Fig. 1c can be found at https://covid19obs.fbk.eu/#/api. The data used in Fig. 6a, b can be found at https://iwdb.nceas.ucsb.edu/html/robertson_1929.html. The data used in Fig. 6c can be found at http://interactome.dfci.harvard.edu/C_elegans/index.php?page=download.
Code availability
The code for the Bayesian Machine Scientist is available at https://bitbucket.org/rguimera/machinescientist/. Other code relevant to the paper is available from the authors upon reasonable request.
References
Stauffer, D., Coniglio, A. & Adam, M. Gelation and critical phenomena. In Polymer Networks, 103–158 (Springer, 1982).
Stanley, H. E. Phase Transitions and Critical Phenomena. (Clarendon, Oxford, 1971).
Yeomans, J. M. Statistical Mechanics of Phase Transitions. (Clarendon Press, 1992).
Clerc, J., Giraud, G., Laugier, J. & Luck, J. The electrical conductivity of binary disordered systems, percolation clusters, fractals and related models. Adv. Phys. 39, 191–309 (1990).
Cardy, J. L. & Grassberger, P. Epidemic models and percolation. J. Phys. A 18, L267 (1985).
Hunt, A., Ewing, R. & Ghanbarian, B. Percolation Theory for Flow in Porous Media, vol. 880 (Springer, 2014).
Sahimi, M. Applications of Percolation Theory. (CRC Press, 1994).
Stauffer, D. & Aharony, A. Introduction to Percolation Theory. (Taylor & Francis, 2018).
Newman, M. Networks. (Oxford University Press, 2018).
Artime, O., d’Andrea, V., Gallotti, R., Sacco, P. L. & De Domenico, M. Effectiveness of dismantling strategies on moderated vs. unmoderated online social platforms. Sci. Rep. 10, 14392 (2020).
Allard, A., Althouse, B. M., Scarpino, S. V. & Hébert-Dufresne, L. Asymmetric percolation drives a double transition in sexual contact networks. Proc. Natl Acad. Sci. USA 114, 8969–8973 (2017).
Buldyrev, S. V., Parshani, R., Paul, G., Stanley, H. E. & Havlin, S. Catastrophic cascade of failures in interdependent networks. Nature 464, 1025 (2010).
Klosik, D. F., Grimbs, A., Bornholdt, S. & Hütt, M.-T. The interdependent network of gene regulation and metabolism is robust where it needs to be. Nat. Commun. 8, 534 (2017).
Callaway, D. S., Newman, M. E., Strogatz, S. H. & Watts, D. J. Network robustness and fragility: Percolation on random graphs. Phys. Rev. Lett. 85, 5468 (2000).
Cantwell, G. T. & Newman, M. Message passing on networks with loops. Proc. Natl Acad. Sci. USA 116, 23398–23403 (2019).
Cohen, R., Ben-Avraham, D. & Havlin, S. Percolation critical exponents in scale-free networks. Phys. Rev. E 66, 036113 (2002).
Cellai, D., Lawlor, A., Dawson, K. A. & Gleeson, J. P. Tricritical point in heterogeneous k-core percolation. Phys. Rev. Lett. 107, 175703 (2011).
Colomer-de-Simón, P. & Boguñá, M. Double percolation phase transition in clustered complex networks. Phys. Rev. X 4, 041020 (2014).
Radicchi, F. & Castellano, C. Breaking of the site-bond percolation universality in networks. Nat. Commun. 6, 10196 (2015).
Albert, R., Jeong, H. & Barabási, A.-L. Error and attack tolerance of complex networks. Nature 406, 378 (2000).
Broido, A. D. & Clauset, A. Scale-free networks are rare. Nat. Commun. 10, 1–10 (2019).
Borgatti, S. P. & Everett, M. G. A graph-theoretic perspective on centrality. Soc. Netw. 28, 466–484 (2006).
Bertagnolli, G., Agostinelli, C. & De Domenico, M. Network depth: identifying median and contours in complex networks. J. Complex Netw. 8, cnz041 (2020).
Morone, F. & Makse, H. A. Influence maximization in complex networks through optimal percolation. Nature 524, 65–68 (2015).
da Cunha, B. R., González-Avella, J. C. & Gonçalves, S. Fast fragmentation of networks using module-based attacks. PLoS ONE 10, e0142824 (2015).
Almeira, N., Billoni, O. V. & Perotti, J. I. Scaling of percolation transitions on Erdös–Rényi networks under centrality-based attacks. Phys. Rev. E 101, 012306 (2020).
Artime, O. & De Domenico, M. Abrupt transition due to nonlocal cascade propagation in multiplex systems. New J. Phys. 22, 093035 (2020).
de Abreu, C., Gonçalves, S. & da Cunha, B. R. Empirical determination of the optimal attack for fragmentation of modular networks. Physica A. 563, 125486 (2021).
Artime, O., Peralta, A. F., Toral, R., Ramasco, J. J. & San Miguel, M. Aging-induced continuous phase transition. Phys. Rev. E 98, 032104 (2018).
Woodson, C. B., Schramski, J. R. & Joye, S. B. A unifying theory for top-heavy ecosystem structure in the ocean. Nat. Commun. 9, 23 (2018).
Chemmanur, T. J. & Fulghieri, P. Investment bank reputation, information production, and financial intermediation. J. Financ. 49, 57–79 (1994).
Newman, M. E., Strogatz, S. H. & Watts, D. J. Random graphs with arbitrary degree distributions and their applications. Phys. Rev. E 64, 026118 (2001).
Söderberg, B. Properties of random graphs with hidden color. Phys. Rev. E 68, 026107 (2003).
Leicht, E. & D’Souza, R. M. Percolation on interacting networks. arXiv preprint arXiv:0907.0894 (2009).
Allard, A., Noël, P.-A., Dubé, L. J. & Pourbohloul, B. Heterogeneous bond percolation on multitype networks with an application to epidemic dynamics. Phys. Rev. E 79, 036113 (2009).
Baxter, G., Dorogovtsev, S., Goltsev, A. & Mendes, J. Avalanche collapse of interdependent networks. Phys. Rev. Lett. 109, 248701 (2012).
Achlioptas, D., D’Souza, R. M. & Spencer, J. Explosive percolation in random networks. Science 323, 1453–1455 (2009).
Olver, F. W., Lozier, D. W., Boisvert, R. F. & Clark, C. W. NIST Handbook of Mathematical Functions Hardback and CD-ROM. (Cambridge University Press, 2010).
Ramasco, J. J. & Gonçalves, B. Transport on weighted networks: when the correlations are independent of the degree. Phys. Rev. E 76, 066106 (2007).
Artime, O., Ramasco, J. J. & San Miguel, M. Dynamics on networks: competition of temporal and topological correlations. Sci. Rep. 7, 41627 (2017).
Pottie, G. J. & Kaiser, W. J. Wireless integrated network sensors. Commun. ACM 43, 51–58 (2000).
Grilli, J., Barabás, G. & Allesina, S. Metapopulation persistence in random fragmented landscapes. PLoS Comput. Biol. 11, e1004251 (2015).
Zhang, W., Lim, C. C., Korniss, G. & Szymanski, B. K. Opinion dynamics and influencing on random geometric graphs. Sci. Rep. 4, 1–9 (2014).
Penrose, M. Random Geometric Graphs, vol. 5 (Oxford University Press, 2003).
Balister, P., Sarkar, A. & Bollobás, B. Percolation, connectivity, coverage and colouring of random geometric graphs. In Handbook of Large-Scale Random Networks, 117–142 (Springer, 2008).
Balister, P., Bollobás, B. & Walters, M. Continuum percolation with steps in the square or the disc. Random Struct. Algor. 26, 392–403 (2005).
Martínez-García, R., Calabrese, J. M., Hernández-García, E. & López, C. Minimal mechanisms for vegetation patterns in semiarid regions. Philos. Trans. R. Soc. A 372, 20140068 (2014).
Kiziridis, D. A., Fowler, M. S. & Yuan, C. Modelling fungal competition for space: towards prediction of community dynamics. Discret. Continuous Dyn. Syst. 25, 4411 (2020).
Allard, A. & Hébert-Dufresne, L. Percolation and the effective structure of complex networks. Phys. Rev. X 9, 011023 (2019).
Zhou, C. & Kurths, J. Hierarchical synchronization in complex networks with heterogeneous degrees. Chaos 16, 015104 (2006).
Liu, Z., Ma, W., Zhang, H., Sun, Y. & Hui, P. M. An efficient approach of controlling traffic congestion in scale-free networks. Physica A 370, 843–853 (2006).
Pastor-Satorras, R., Castellano, C., Van Mieghem, P. & Vespignani, A. Epidemic processes in complex networks. Rev. Mod. Phys. 87, 925 (2015).
Guimerà, R. et al. A Bayesian machine scientist to aid in the solution of challenging scientific problems. Sci. Adv. 6, eaav6971 (2020).
Cantrell, R. S., Cosner, C. & Ruan, S. Spatial Ecology. (CRC Press, 2010).
Holling, C. S. Some characteristics of simple types of predation and parasitism. The Canadian Entomologist 91, 385–398 (1959).
Robertson, C. Flowers and Insects: Lists of Visitors of Four Hundred and Fifty-three Flowers. (The Science Press Printing Company, 1928). Data downloaded from https://iwdb.nceas.ucsb.edu/html/robertson_1929.html. Accessed 18 Feb 2021.
Novozhilov, A. S., Karev, G. P. & Koonin, E. V. Biological applications of the theory of birth-and-death processes. Brief. Bioinform. 7, 70–85 (2006).
Voit, E. O. Computational Analysis of Biochemical Systems: a Practical Guide for Biochemists and Molecular Biologists. (Cambridge University Press, 2000).
Simonis, N. et al. Empirically controlled mapping of the Caenorhabditis elegans protein–protein interactome network. Nat. Methods 6, 47–54 (2009).
Bongard, J. & Lipson, H. Automated reverse engineering of nonlinear dynamical systems. Proc. Natl Acad. Sci. USA 104, 9943–9948 (2007).
Yair, O., Talmon, R., Coifman, R. R. & Kevrekidis, I. G. Reconstruction of normal forms by learning informed observation geometries from data. Proc. Natl Acad. Sci. USA 114, E7865–E7874 (2017).
Champion, K., Lusch, B., Kutz, J. N. & Brunton, S. L. Data-driven discovery of coordinates and governing equations. Proc. Natl Acad. Sci. USA 116, 22445–22451 (2019).
De Domenico, M. et al. Mathematical formulation of multilayer networks. Phys. Rev. X 3, 041022 (2013).
Boccaletti, S. et al. The structure and dynamics of multilayer networks. Phys. Rep. 544, 1–122 (2014).
Masuda, N. & Lambiotte, R. A Guide to Temporal Networks, 2nd edn. (World Scientific, 2020).
Battiston, F. et al. Networks beyond pairwise interactions: structure and dynamics. Phys. Rep. 874, 1–92 (2020).
Evtushenko, A. & Gastner, M. T. Beyond fortune 500: women in a global network of directors. In Proc. International Conference on Complex Networks and Their Applications, 586–598 (Springer, 2019). Data downloaded from https://zenodo.org/record/3553442. Accessed 18 Feb 2021.
Consonni, C., Laniado, D. & Montresor, A. WikiLinkGraphs: a complete, longitudinal and multilanguage dataset of the Wikipedia link networks. In Proc. International AAAI Conference on Web and Social Media, vol. 13, 598–607 (2019). Data downloaded from https://consonni.dev/datasets/. Accessed 18 Feb 2021.
Gallotti, R., Valle, F., Castaldo, N., Sacco, P. & De Domenico, M. Assessing the risks of “infodemics” in response to COVID-19 epidemics. Nat. Hum. Behav. 4, 1285–1293 (2020).
Barabási, A.-L. & Albert, R. Emergence of scaling in random networks. Science 286, 509–512 (1999).
Author information
Contributions
O.A. performed the analytical computations and the simulations. O.A. and M.D.D. designed the research, discussed and interpreted the results, and wrote and revised the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Peer review information Nature Communications thanks Yanqing Hu and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Artime, O., De Domenico, M. Percolation on feature-enriched interconnected systems. Nat Commun 12, 2478 (2021). https://doi.org/10.1038/s41467-021-22721-z
DOI: https://doi.org/10.1038/s41467-021-22721-z