Optimal percolation on multiplex networks

Optimal percolation is the problem of finding the minimal set of nodes whose removal from a network fragments the system into non-extensive disconnected clusters. The solution to this problem is important for strategies of immunization in disease spreading, and influence maximization in opinion dynamics. Optimal percolation has received considerable attention in the context of isolated networks. However, its generalization to multiplex networks has not yet been considered. Here we show that approximating the solution of the optimal percolation problem on a multiplex network with solutions valid for single-layer networks extracted from the multiplex may have serious consequences in the characterization of the true robustness of the system. We reach this conclusion by extending many of the methods for finding approximate solutions of the optimal percolation problem from single-layer to multiplex networks, and performing a systematic analysis on synthetic and real-world multiplex networks.

Optimal percolation is the problem of finding the minimal set of nodes such that if the members of this set are removed from a network, the network is fragmented into non-extensive disconnected clusters. The solution of the optimal percolation problem has direct applicability in strategies of immunization in disease spreading processes, and influence maximization for certain classes of opinion dynamical models. In this paper, we consider the problem of optimal percolation on multiplex networks. The multiplex scenario serves to realistically model various technological, biological, and social networks. We find that the multilayer nature of these systems, and more precisely multiplex characteristics such as edge overlap and interlayer degree-degree correlation, profoundly changes the properties of the set of nodes identified as the solution of the optimal percolation problem.

I. INTRODUCTION
A multiplex is a network where nodes are connected through different types, or flavors, of pairwise edges [1-3]. A convenient way to think of a multiplex is as a collection of network layers, each representing a specific type of edge. Multiplex networks are faithful representations of several real-world systems, including social [4,5] and technological [6,7] systems. From a theoretical point of view, a common strategy to understand the role played by the co-existence of multiple network layers is based on a rather simple approach. Given a process and a multiplex network, one studies the process on the multiplex and on the single-layer projections of the multiplex (e.g., each of the individual layers, or the network obtained from the aggregation of the layers). Recent research has demonstrated that accounting for, or neglecting, the effective co-existence of different types of interactions may lead to the emergence of rather different features, with potentially dramatic consequences for the ability to model and predict properties of the system. Examples include dynamical processes, such as diffusion [8,9], epidemic spreading [10-13], synchronization [14], and controllability [15], as well as structural processes such as those typically framed in terms of percolation models [16-29].
The vast majority of the work on structural processes on multiplex networks has focused on ordinary percolation models, where nodes (or edges) are considered either in a functional or in a non-functional state with homogeneous probability [30]. In this paper, we shift the focus to the optimal version of the percolation process: we study the problem of identifying the smallest set of nodes in a multiplex network such that, if these nodes are removed, the network is fragmented into many disconnected clusters of non-extensive size. We refer to the nodes belonging to this minimal set as the Structural Nodes (SNs) of the multiplex network. The solution of the optimal percolation problem has direct applicability in the context of robustness, representing the cheapest way to dismantle a network [31-33]. The solution of the problem of optimal percolation is, however, important in other contexts as well, being equivalent to the best strategy of immunization against a spreading process, and also to the best strategy for seeding a network in some classes of opinion dynamical models [34-37]. Despite its importance, optimal percolation has been introduced and considered in the framework of single-layer networks only recently [35,36]. Optimal percolation is an NP-complete problem [32]. Hence, on large networks, we can only use heuristic methods to find approximate solutions. Most of the research activity on this topic has indeed focused on the development of greedy algorithms [31-33, 35]. The generalization of optimal percolation to multiplex networks that we consider here consists in the redefinition of the problem in terms of mutual connectedness [16]. To this end, we reframe several algorithms for optimal percolation from single-layer to multiplex networks. Essentially all the algorithms we use provide coherent solutions to the problem, finding sets of SNs that are almost identical.
Our main focus, however, is not on the development of new algorithms, but on answering the following question: What are the consequences of neglecting the multiplex nature of a network under an optimal percolation process? We compare the actual solution of the optimal percolation problem in a multiplex network with the solutions to the same problem for single-layer networks extracted from the multiplex system. We show that "forgetting" about the presence of multiple layers can be potentially dangerous, leading to the overestimation of the true robustness of the system mostly due to the identification of a very high number of false SNs. We reach this conclusion with a systematic analysis of both synthetic and real multiplex networks.

II. METHODS
We consider a multiplex network composed of N nodes arranged in two layers. Each layer is an undirected and unweighted network. Connections of the two layers are encoded in the adjacency matrices A and B. The generic element $A_{ij} = A_{ji} = 1$ if nodes i and j are connected in the first layer, whereas $A_{ij} = A_{ji} = 0$ otherwise. The same definition applies to the second layer, and thus to the matrix B. The aggregated network obtained from the superposition of the two layers is characterized by the adjacency matrix C, with generic elements $C_{ij} = 1 - (1 - A_{ij})(1 - B_{ij})$, so that two nodes are connected in the aggregated network if they are connected in at least one layer. The basic objects we look at are clusters of mutually connected nodes [16]: two nodes in a multiplex network are mutually connected, and thus part of the same cluster of mutually connected nodes, only if they are connected by at least one path, composed of nodes within the same cluster, in every layer of the system. In particular, we focus our attention on the largest among these clusters, usually referred to as the Giant Mutually Connected Cluster (GMCC). Our goal is to find the minimal set of nodes that, if removed from the multiplex, leads to a GMCC of size at most √N. This is a common prescription, yet not the only one possible, to ensure that all clusters have non-extensive sizes in systems with a finite number of elements [35]. Whenever we consider single-layer networks, the above prescription applies to the single-layer clusters in exactly the same way.
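As an illustration, the iterative intersection scheme commonly used to extract the GMCC of a two-layer multiplex can be sketched in Python with networkx (a minimal sketch; the function name and the use of networkx are our choices, not part of the paper):

```python
import networkx as nx

def gmcc(layer_a: nx.Graph, layer_b: nx.Graph) -> set:
    """Approximate the Giant Mutually Connected Cluster of a two-layer
    multiplex by iteratively intersecting the largest connected
    components of the two layers until a fixed point is reached."""
    active = set(layer_a) & set(layer_b)
    while active:
        comps_a = list(nx.connected_components(layer_a.subgraph(active)))
        comps_b = list(nx.connected_components(layer_b.subgraph(active)))
        if not comps_a or not comps_b:
            return set()
        # Nodes must sit in the giant component of BOTH restricted layers.
        new = max(comps_a, key=len) & max(comps_b, key=len)
        if new == active:
            return active
        active = new
    return set()
```

At the fixed point, the returned set is connected within each layer restricted to the set itself, which is exactly the mutual-connectivity requirement stated above.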
We generalize to multiplex networks most of the algorithms devised to find approximate solutions of the optimal percolation problem in single-layer networks [31-33, 35, 36]. Details on the implementation of the various methods are provided in the Supplementary Information (SI). We stress that the generalization of these methods is not trivial at all. For instance, most of the greedy methods use node degrees as crucial ingredients. In a multiplex network, however, a node has multiple degree values, one for every layer. In this respect, it is not clear what is the most effective way of combining these numbers to assign a single score to a node: they may be summed, thus obtaining a number approximately equal to the degree of the node in the aggregated network derived from the multiplex, but also multiplied, or combined in more complicated ways. We find that the results of the various algorithms are not particularly sensitive to this choice, provided that a simple post-processing technique is applied to the set of SNs found by a given method. In Figure 1, for example, we show the performance of several greedy algorithms when applied to a multiplex network composed of two layers generated independently according to the Erdős-Rényi (ER) model. Although the mere application of an algorithm may lead to different estimates of the size of the set of SNs, if we greedily remove from these sets the nodes whose reinsertion does not increase the size of the GMCC above the predefined sublinear threshold (√N) [33], the sets obtained after this post-processing have almost identical sizes.
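The greedy post-processing step can be sketched as follows for the single-layer case (our own minimal sketch; the multiplex version would track the GMCC instead of the largest connected component):

```python
import math
import networkx as nx

def prune_sn_set(g: nx.Graph, sn_set: list) -> list:
    """Greedy post-processing: try to reinsert each removed node and keep
    it removed only if putting it back pushes the largest cluster above
    the sqrt(N) threshold."""
    threshold = math.sqrt(g.number_of_nodes())
    kept = set(sn_set)
    for node in list(sn_set):
        trial = kept - {node}
        sub = g.subgraph(set(g) - trial)
        largest = max((len(c) for c in nx.connected_components(sub)), default=0)
        if largest <= threshold:
            kept.discard(node)   # node was superfluous: drop it from the set
    return sorted(kept)
```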
As Figure 1 clearly shows, the best results, in the sense that the size of the set of SNs is minimal, are found with a Simulated Annealing (SA) optimization strategy [32] (see details in the SI). The fact that the SA method outperforms score-based algorithms is not surprising: SA represents one of the best strategies that one can apply to hard optimization tasks. In our case, it provides us with a reasonable upper bound on the size of the set of SNs that can be identified in a multiplex network. The second advantage of SA in our context is that it does not rely on ambiguous definitions of ingredients (as, for example, the aforementioned issue of node degree). Despite its better performance, SA has a serious drawback in terms of computational speed. As a matter of fact, the algorithm can be applied only to multiplex networks of moderate size. As we are interested here in understanding the fundamental properties of the optimal percolation problem in multiplex networks, the analysis presented in the main text of the paper is entirely based on results obtained through SA optimization. This provides us with solid ground to support our statements. Extensions of the same analyses, relying on score-based algorithms, to larger multiplex networks are qualitatively similar (see SI).

Figure 1. Comparison among different algorithms to approximate solutions of the optimal percolation problem. We consider a multiplex network with N = 10,000 nodes. The multiplex is composed of two network layers generated independently according to the Erdős-Rényi model with average degree $\langle k \rangle = 5$. Each curve represents the relative size of the GMCC as a function of the number of nodes inserted in the set of SNs, and thus removed from the multiplex. Colored markers indicate the effective fraction of nodes left in the set of SNs after a greedy post-processing technique is applied to the set found by the corresponding algorithm. The red cross identifies instead the size of the set of SNs found through the Simulated Annealing optimization. Please note that the ordinate value of the markers has no meaning; in all cases, the size of the largest cluster is smaller than √N. Details on the implementation of the various algorithms are provided in the SI.

III. RESULTS
A. The size of the set of structural nodes

We consider the relative size of the set of SNs, denoted by $q$, for a multiplex composed of two independently generated ER network layers, as a function of their average degree $\langle k \rangle$. We compare the results obtained applying the SA algorithm to the multiplex, namely $q_M$, with those obtained using SA on the individual layers, i.e., $q_A$ and $q_B$, or on the aggregated network generated from the superposition of the two layers, i.e., $q_S$. By definition, we expect that $q_M \leq q_A, q_B \leq q_S$. What we do not know, however, is how good or bad the measures $q_A$, $q_B$ and $q_S$ are as predictors of the effective robustness $q_M$ of the multiplex. For ordinary random percolation on ER multiplex networks with negligible overlap, we know that $q_M \simeq 1 - 2.4554/\langle k \rangle$ [16], $q_A \simeq q_B \simeq 1 - 1/\langle k \rangle$, and $q_S \simeq 1 - 1/(2\langle k \rangle)$ [38]. The relative errors are therefore $\epsilon_A = \epsilon_B = (2.4554 - 1)/(\langle k \rangle - 2.4554)$ and $\epsilon_S = (2.4554 - 1/2)/(\langle k \rangle - 2.4554)$.

Figure 2. Optimal percolation problem in synthetic multiplex networks. A) We consider multiplex networks with N = 1,000 and layers generated independently according to the Erdős-Rényi model with average degree $\langle k \rangle$. We estimate the relative size of the set of SNs on the multiplex as a function of $\langle k \rangle$ (green circles), and compare it with the same quantity estimated on the individual layers (black squares, red down triangles) or on the aggregated network (orange right triangles). B) Relative errors of the single-layer estimates of the size of the structural set with respect to the ground-truth value provided by the multiplex estimate. Colors and symbols are the same as those used in panel A. The blue curves with no markers represent instead the results for an ordinary site percolation process [16].
We find that the relative error for optimal percolation behaves in much the same way as that of ordinary percolation (Figure 2B), noting that, as $\langle k \rangle$ is increased, the decrease in the relative error associated with the individual layers is slightly faster than expected for ordinary percolation. The relative error associated with the aggregated network is instead larger than the one expected from the theory of ordinary percolation. As shown in Figure 2A, for sufficiently large $\langle k \rangle$, dismantling the ER multiplex network is almost as hard as dismantling any of its constituent layers.
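The closed-form relative errors quoted above can be verified numerically (a short sketch under the stated ER approximations; names are ours):

```python
def relative_errors(k: float):
    """Relative errors of the layer and aggregated estimates of the
    structural-set size with respect to the multiplex value, for
    ordinary percolation on two independent ER layers [16, 38]."""
    q_m = 1 - 2.4554 / k        # multiplex (mutual percolation)
    q_layer = 1 - 1.0 / k       # single layer
    q_aggr = 1 - 1.0 / (2 * k)  # aggregated network
    return (q_layer - q_m) / q_m, (q_aggr - q_m) / q_m
```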

B. Edge overlap and degree correlations
Next, we test the role played by edge overlap and layer-to-layer degree correlation in the optimal percolation problem. These are ingredients that dramatically change the nature of the ordinary percolation transition in multiplex networks [26, 39-43]. In Figure 3, we report the results of a simple analysis. We take advantage of the model introduced in Ref. [44], where a multiplex is constructed from two initially identical layers. Nodes in one of the two layers are relabeled with a certain probability α. For α = 0, the multiplex, the aggregated network, and the single-layer graphs are all identical. For α = 1, the networks are analogous to those considered in the previous section.

Figure 3. The effect of reducing edge overlap and interlayer degree-degree correlation by partially relabeling nodes in multiplex networks with initially identical layers. Initially, both layers are a copy of a random network generated by the Erdős-Rényi model with N = 1,000 nodes and average degree $\langle k \rangle = 5$. Then, in one of the layers, each node is selected to switch its label with another randomly chosen node with a certain probability α. For each α, we determine the mean of the relative size of the set of SNs over 100 realizations of the SA algorithm on the multiplex network.
We note that this model does not allow us to disentangle the role played by the edge overlap between layers from that played by the correlation of node degrees. For α = 0, the edge overlap amounts to 100%, and there is a one-to-one match between the degree of a node in one layer and in the other. As α increases, both the edge overlap and the degree correlation decrease simultaneously. As is apparent from the results of Figure 3, the system reaches the multiplex regime for very small values of α, in the sense that the relative size of the set of SNs immediately deviates from its value at α = 0. This is in line with what was already found in the context of ordinary percolation processes in multiplex networks: as soon as there is a finite fraction of edges that are not shared by the two layers, the system behaves exactly as a multiplex [26, 39-43].
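The relabeling construction can be sketched as follows (a minimal Python/networkx sketch of the model of Ref. [44]; function names and the overlap measure are our additions):

```python
import random
import networkx as nx

def relabeled_multiplex(n: int, avg_k: float, alpha: float, seed: int = 0):
    """Two-layer model of Ref. [44]: both layers start as the same ER
    graph; in the second layer, each node swaps its label with a
    randomly chosen node with probability alpha."""
    rng = random.Random(seed)
    layer_a = nx.gnp_random_graph(n, avg_k / (n - 1), seed=seed)
    mapping = list(range(n))
    for i in range(n):
        if rng.random() < alpha:
            j = rng.randrange(n)
            mapping[i], mapping[j] = mapping[j], mapping[i]
    # A permutation of labels preserves each node's degree sequence.
    layer_b = nx.relabel_nodes(layer_a, dict(enumerate(mapping)), copy=True)
    return layer_a, layer_b

def edge_overlap(layer_a: nx.Graph, layer_b: nx.Graph) -> float:
    """Jaccard overlap between the edge sets of the two layers."""
    ea = {frozenset(e) for e in layer_a.edges()}
    eb = {frozenset(e) for e in layer_b.edges()}
    return len(ea & eb) / len(ea | eb)
```

For α = 0 the two layers coincide (overlap 1), while increasing α degrades both the overlap and the interlayer degree correlation, as described in the text.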

C. Accuracy and sensitivity
So far, we have focused our attention only on the size of the set of SNs, neglecting any analysis of the identity of the nodes that actually compose this set.
To proceed with such an analysis, we note that different runs of the SA algorithm (or of any algorithm with stochastic features) generally produce slightly different sets of SNs (even if they all have almost identical sizes). The issue is not related to the optimization technique, but rather to the existence of degenerate solutions to the problem. In this respect, we work with the quantities $p_i$, each of which describes the probability that node $i$ appears in the set of SNs in a realization of the detection method (here, the SA algorithm). This treatment takes into account the fact that a node may belong to the set of SNs in a number of realizations of the detection method and may be absent from this set in other realizations. Now, we define the self-consistency of a detection method as $S = \sum_i p_i^2 / \sum_i p_i$, i.e., the ratio of the expected overlap between two SN sets obtained from two independent realizations of the detection method to the expected size of an SN set. If the set of SNs is identical across different runs, then $S = 1$. On the other hand, the minimal value we can observe is $S = Q/N$, obtained assuming that the size of the structural set is equal to $Q$ in all runs, but the nodes belonging to this set change every time, so that for every node we have $p_i = Q/N$.
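Given the SN sets returned by repeated runs of a detection method, S can be computed directly (a small sketch; names are ours):

```python
def self_consistency(runs: list) -> float:
    """S = sum_i p_i^2 / sum_i p_i, where p_i is the fraction of runs in
    which node i appears in the set of structural nodes."""
    nodes = set().union(*runs)
    p = {i: sum(i in r for r in runs) / len(runs) for i in nodes}
    return sum(v * v for v in p.values()) / sum(p.values())
```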
As reported in Figure 4A, even for random multiplex networks, self-consistency is rather high for single-layer representations of the network. On the other hand, $S$ decreases significantly as the overlap and interlayer degree correlations are decreased (Figure 4A). The low $S$ values for multiplexes with small overlap and correlation, together with the small sizes of their sets of SNs (Figure 2), suggest that in such networks there can exist many relatively different sets of SNs, the removal of any of which dismantles the network.

Next, we turn our attention to quantifying how representative the sets of SNs identified in single-layer or aggregated networks are of the ground-truth sets found on the multiplex networks. Here, we denote by $p_i$ and $w_i$ the probability that node $i$ is found within the set of SNs of, respectively, the multiplex network (ground truth) and a specific single-layer representation of that multiplex. To compare the sets represented by $w_i$ to the ground-truth sets, we adopt three standard metrics in information retrieval [45,46], namely precision, recall, and Van Rijsbergen's $F_1$ score. Precision is defined as $P = \sum_i p_i w_i / \sum_i w_i$, i.e., the ratio of the (expected) number of correctly detected SNs to the (expected) total number of detected SNs. Recall is instead defined as $R = \sum_i p_i w_i / \sum_i p_i$, i.e., the ratio of the (expected) number of correctly detected SNs to the (expected) number of actual SNs of the multiplex. We note that the self-consistency we previously defined corresponds to the precision and recall of the ground-truth set with respect to itself, thus providing a baseline for the interpretation of the results. The $F_1$ score, defined as $F_1 = 2/(1/P + 1/R)$, provides a balanced measure in terms of $P$ and $R$.

Figure 4. The effect of reducing edge overlap and interlayer degree-degree correlation by partially relabeling nodes in multiplex networks with initially identical layers. We consider the multiplex networks described in Figure 2 and the sets of SNs found for the multiplex and single-layer based representations of these networks. A) As the sets of SNs found in different instances of the optimization algorithm differ from each other, we first quantify the self-consistency of those solutions across 100 independent runs of the SA algorithm. We then assume that the multiplex representation provides the ground-truth classification of the nodes. We compare the results of the other representations with the ground truth by measuring their precision (panel B), their sensitivity or recall (panel C), and their F1 score (panel D).
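The three metrics can be computed directly from the membership probabilities $p_i$ and $w_i$ (a small sketch; names and the zero-overlap guard are ours):

```python
def precision_recall_f1(p: dict, w: dict):
    """P = sum_i p_i*w_i / sum_i w_i,  R = sum_i p_i*w_i / sum_i p_i,
    F1 = 2 / (1/P + 1/R); p and w map node -> membership probability."""
    overlap = sum(p.get(i, 0.0) * wi for i, wi in w.items())
    precision = overlap / sum(w.values())
    recall = overlap / sum(p.values())
    if precision == 0 or recall == 0:
        return precision, recall, 0.0   # F1 undefined: report 0
    return precision, recall, 2.0 / (1.0 / precision + 1.0 / recall)
```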
As Figure 4B shows, precision deteriorates as the edge overlap and interlayer degree correlation decrease with increasing relabeling probability. In particular, when the overlap and correlation between the layers of the multiplex network are not large, the precision of the sets of SNs identified in the single layers or in the superposition of the layers is quite small (around 0.3), even smaller than the ratio of the $q_M$ of the multiplex to the $q$ of any of these sets (see Figure 3). This means that, when the multiplex nature of the system is neglected, not only are too many SNs identified, but also a significant number of the SNs of the multiplex are missed.

Recall, on the other hand, behaves differently for single-layer and aggregated networks (Figure 4C). In single layers, recall systematically decreases as the relabeling probability increases. The structural set obtained on the superposition of the layers instead yields large values of recall. This is not due to good performance, but rather to the fact that the set of SNs identified on the aggregated network is very large (see Figure 3). The results of Figure 4 demonstrate that even the larger recall values for the aggregated network do not lead to a better $F_1$ score: the $F_1$ score diminishes as the relabeling probability is increased.

D. Real-world multiplex networks
In Table I, we present the results of the analysis of the optimal percolation problem on several real-world multiplex networks generated from empirical data.

Table I. Optimal percolation in real multiplex networks. From left to right, we report the following information. The first three columns contain the name of the system, the identity of the layers, and the number of nodes N of the network. The fourth and fifth columns are results obtained from the optimal percolation problem studied on the multiplex network, and contain the relative size $q_M$ and the self-consistency metric $S$ of the set of SNs. Then, we report results obtained for the first single-layer network of the multiplex, namely the fraction $q_A$ of nodes in the structural set, the precision $P_A$, the recall $R_A$, and the $F_1$ score of the set of SNs of the first layer. The next columns are identical, but refer to the second layer. Finally, the rightmost columns contain the fraction $q_S$ of nodes in the structural set, the precision $P_S$, the recall $R_S$, and the $F_1$ score of the set of SNs for the aggregated network obtained from the superposition of the two layers. All results have been obtained with 100 independent instances of the SA optimization algorithm.

For most of these networks, the optimal percolation on the multiplex representation has a rather high self-consistency. This implies that there is a certain small group of nodes that have a major importance for the robustness of such real-world networks to the optimal percolation process. The $F_1$ score for most of the networks (not shown) is quite low, indicating that on real-world networks we lose essential information about the optimal percolation problem if the multiplex structure is not taken into account.
To provide a practical case study with a clear interpretation, we depict, in Figure 5, the results of optimal percolation on a multiplex network describing the air transportation operated by two of the major airlines in the United States. SA always identifies 10 airports in the set of SNs. There is a slight variability among different instances of the SA optimization, with a total of 14 distinct airports appearing in the structural set at least once over 100 SA instances. However, changes in the SN set from run to run mostly involve airports in the same geographical region. Overall, the airports in the structural set are scattered homogeneously across the country, suggesting that the GMCC of the network mostly relies on hubs serving specific geographical regions, rather than on global hubs of the entire transportation system. For instance, the probabilities describing the membership of the airports in the set of SNs do not strictly follow the same order as the recorded flight traffic [52]; nor is the number of connections of the airports (not shown) sufficient, by itself, to determine the structural nodes. This is consistent with the collective nature of optimal percolation on the complex network of air transportation.

IV. CONCLUSIONS
In this paper, we studied the optimal percolation problem on multiplex networks. The problem regards the detection of the minimal set of nodes (the set of structural nodes, SNs) such that, if its members are removed from the network, the network is dismantled. The solution to the problem provides important information on the microscopic parts that should be maintained in a functional state to keep the overall system functioning in a scenario of maximal stress. Our study focused mostly on the characterization of the SN sets of a given multiplex network in comparison with those found on the single-layer projections of the same multiplex, i.e., in a scenario where one "forgets" about the multiplex nature of the system. Our results demonstrate that, generally, multiplex networks have considerably smaller sets of SNs than their single-layer network representations. The error committed when relying on single-layer representations of the multiplex regards not only the size of the SN sets, but also the identity of the SNs. Both issues emerge in the analysis of synthetic network models, where edge overlap and/or interlayer degree-degree correlations seem to fully explain the discrepancy between the SN set of a multiplex and the SN sets of its single-layer representations. The issues are apparent also in many of the real-world multiplex networks we analyzed. Overall, we conclude that neglecting the multiplex structure of a network system subjected to maximal structural stress may result in significant inaccuracies about its robustness.

SUPPLEMENTAL MATERIAL

In this section, we briefly discuss some of the most effective dismantling algorithms for monoplex networks and their generalization to multiplex networks with two layers. The aim of these algorithms is to approximate the set of structural nodes, i.e., the minimal set of nodes whose removal dismantles the network into vanishingly small (non-extensive) clusters.
We first introduce some of the score-based algorithms. In such algorithms, at each step, a score is calculated for each node and the node with the highest score is removed (when several nodes have the same score, one of them is removed at random). We discuss four different algorithms of this type: High Degree (HD), High Degree Adaptive (HDA), Collective Influence (CI), and Explosive Immunization (EI). These methods are largely deterministic in nature, i.e., nearly the same set of structural nodes is discovered in each realization of the algorithm. Besides score-based algorithms, we present the Simulated Annealing (SA) algorithm, a stochastic optimization method that searches the solution space of the dismantling problem to find the best approximation to the structural set. The SA method takes into account the collective nature of the dismantling problem and may provide different dismantling sets in different realizations of the algorithm on the same network structure.

A. High Degree (HD)
In a monoplex network, the easiest way to dismantle the network is a degree-based attack. After sorting the nodes with respect to their degrees, the nodes with the highest degree are removed one by one until the network is dismantled. As this algorithm is deterministic (except for the randomness in choosing among nodes with the same degree), the set of nodes removed to dismantle the network is almost unique.
In multiplex networks, the degree of a node can be defined in various ways. We consider four different cases: the score of a node is defined as (i) its degree in layer A, (ii) its degree in layer B, (iii) the sum of its degrees across all the layers, and (iv) the product of its degrees across all the layers. It is worth mentioning that, when using the HD, HDA, and CI methods, a node is removed simultaneously from each layer of the multiplex.
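The four degree-based scores can be written compactly (a minimal sketch assuming networkx graphs; names are ours):

```python
import networkx as nx

def hd_score(layer_a: nx.Graph, layer_b: nx.Graph, node,
             variant: str = "product") -> int:
    """Degree-based score of a node in a two-layer multiplex:
    layer A degree, layer B degree, their sum, or their product."""
    ka = layer_a.degree(node) if node in layer_a else 0
    kb = layer_b.degree(node) if node in layer_b else 0
    return {"layer_a": ka, "layer_b": kb,
            "sum": ka + kb, "product": ka * kb}[variant]
```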
As Figure S-1 illustrates, to dismantle a multiplex, the two scores defined as combinations of the degrees in different layers are more effective than those based on the degrees in only one of the layers. In the main text and in the rest of the Supplemental Material (SM), when we refer to the HD method, we mean the one in which the score of a node is the product of its degrees across all the layers.

B. High Degree Adaptive (HDA)
If, in the HD algorithm, we take into account the history of the process and recalculate the degrees of the nodes at each step, the method is referred to as the HDA algorithm. Since the HDA algorithm is adaptive, it is expected to work better than the HD method. As in the HD case, we can define at least four ways to compute the degree-based score of a node.
Please notice that, in the multiplex version, when we update the degrees, we exclude those neighbors that are not in the Giant Mutually Connected Component (GMCC) of the network. Figure S-2 shows the effectiveness of the HDA algorithm for the different definitions of node degrees. Similar to the results for HD, it is more effective to combine the scores of different layers than to consider the layers as isolated networks. In all the subsequent sections and in the main text, when we refer to HDA, we mean the version in which the score of a node is defined as the product of its degrees across all the layers.

C. Collective Influence (CI)
In the monoplex version of the CI algorithm [S1], the score $CI_i(l)$ of node $i$ is equal to the excess degree of $i$ multiplied by the sum of the excess degrees of its neighbors at a specific distance $l$ from $i$:

$CI_i(l) = (k_i - 1) \sum_{j \in \partial \mathrm{Ball}(i,l)} (k_j - 1) \,, \qquad (S.1)$

where $\partial \mathrm{Ball}(i,l)$ denotes the set of neighbors of $i$ at distance $l$ (i.e., the nodes that have a geodesic distance $l$ from $i$), and $k_i$ is the degree of node $i$. At each step, the CI score is adaptively recalculated for all the nodes; then the nodes with the highest score are removed from the network, until the network is dismantled. It was shown [S1, S2] that the performance of the CI method increases with $l$ up to $l = 4$; for $l > 4$ the performance does not improve appreciably as $l$ is increased.
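The single-layer CI score just described can be sketched as follows (our implementation, assuming a networkx graph):

```python
import networkx as nx

def collective_influence(g: nx.Graph, i, l: int) -> int:
    """CI score of node i: its excess degree times the summed excess
    degrees of the nodes at geodesic distance exactly l from i."""
    dist = nx.single_source_shortest_path_length(g, i, cutoff=l)
    frontier = (j for j, d in dist.items() if d == l)
    return (g.degree(i) - 1) * sum(g.degree(j) - 1 for j in frontier)
```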
To adapt the CI algorithm to multiplex networks with two layers, we considered several possible definitions of the CI score in the multiplex: (i) the CI obtained based only on the structure of layer A, (ii) the CI based only on the structure of layer B, (iii) the sum of the CIs of a node in layer A and layer B, and (iv) the product of these two CI scores. Figure S-3 illustrates that the generalizations of the CI method considered here are not as effective as those derived from the HD (Figure S-1) and HDA (Figure S-2) methods. Thus, methods based on the CI measures of the layers do not provide an effective algorithm for the optimal percolation problem.

D. Explosive Immunization (EI)
The EI algorithm is based on a method referred to as explosive percolation, originally introduced by Achlioptas et al. [S3]. For a monoplex network, we implement the EI algorithm as follows. At each step, we select $N^{(C)} = 1000$ candidate nodes from the set of absent nodes, and calculate the score $\sigma_i$ of each of them using the following kernel:

$\sigma_i = k_i + \sum_{j \in \mathcal{N}_i} C_j \,, \qquad (S.2)$

where $\mathcal{N}_i$ represents the set of all connected components (CCs) linked to node $i$, each of which has size $C_j$, and $k_i$ is the degree of node $i$; components belonging to the GCC should be ignored in the score of a node. Among the candidates, the node with the lowest score is then reinserted into the network, in the spirit of explosive percolation.
In our extension of the EI method to multiplex networks, we consider a different kernel but otherwise perform the exact same procedure as the one described above. The new kernel (Eq. (S.3)) is based on the sizes of the mutually connected components (MCCs) rather than on the sizes of the CCs:

$\sigma_i = k_i + \sum_{j \in \partial_i} M_j \,, \qquad (S.3)$

where $\partial_i$ is the set of neighbors of node $i$ in the two layers, $M_j$ is the size of the MCC to which node $j$ belongs, and $k_i$ is the degree of node $i$. The procedure, including the number of candidate nodes $N^{(C)}$, is otherwise unchanged. In the simulations of Figure 1 of the main text, we used $N^{(C)} = 1000$.

E. Simulated Annealing (SA)
The simulated annealing (SA) method has been used for the dismantling problem in monoplex networks [S5]. Generally, an SA algorithm defines an energy function that attributes an energy value to each configuration of the system. The phase space of the system is searched for the optimal configuration (the one with the minimum energy) by Markov chain Monte Carlo moves that switch the system from one configuration to another. In the dismantling of multiplex networks, the algorithm should find the minimal set of nodes whose deletion makes the size of the GMCC non-extensive. Each configuration of the multiplex network is represented by {R, g}, where R and g are, respectively, the number of removed nodes (each node and all its corresponding replica nodes are counted as one node) and the relative size of the GMCC. The energy of a configuration is defined as

ε(R, g) = v R/N + g,   (S.4)

where v is the cost of removing a node from the multiplex network; in the simulations presented in this paper it is set to v = 0.6. At each step t of the algorithm, one node, present or removed, is selected at random; then one of the following sets of operations is performed:
• If the node is present and it belongs to the GMCC, it is removed (thus R_t = R_{t−1} + 1) and the new size of the GMCC (g_t) is calculated.
• If the node is present but does not belong to the GMCC, it is removed (thus R_t = R_{t−1} + 1); since it did not belong to the GMCC, g_t = g_{t−1}.
• If the node is in the set of removed nodes, it is added back to the network and R_t = R_{t−1} − 1. Then M_i (the size of the MCC formed after inserting i) is calculated and g_t is updated accordingly.
Afterwards, the energy of the new configuration ε_t is calculated, and the set of operations is accepted with a probability equal to min[1, e^{−β(ε_t − ε_{t−1})}]. If it is accepted, the new configuration {R_t, g_t} is retained; otherwise, the operations are undone and the old configuration is restored. Here, β is interpreted as the inverse of the temperature of the annealing process. The SA algorithm starts with a β_min and, at each step, β is increased by a small amount δβ. A smaller δβ means a slower decrease in the temperature, which allows the SA method to better search for the optimal configurations, at the expense of increasing the running time of the algorithm.
In this paper, we vary β from 0.5 to 20.0 with δβ = 10^{−6}.
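The elementary update of this procedure can be sketched as follows, assuming an energy of the form ε = vR/N + g (removal cost plus relative GMCC size; this explicit linear form is our assumption and should be checked against Ref. [S5]):

```python
import math
import random

def energy(R, g, v=0.6, N=1000):
    """Energy of a configuration {R, g}: removal cost plus relative
    GMCC size. The linear form eps = v * R / N + g is an assumption."""
    return v * R / N + g

def accept_move(eps_old, eps_new, beta, rng=random.random):
    """Metropolis rule: accept a proposed move with probability
    min(1, exp(-beta * (eps_new - eps_old)))."""
    return rng() < min(1.0, math.exp(-beta * (eps_new - eps_old)))
```

Moves that lower the energy are always accepted; energy-raising moves are accepted with an exponentially suppressed probability, and raising β during the run progressively freezes the search into low-energy configurations.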
In Figure 1 of the main text, we show that the SA method outperforms the four score-based algorithms; thus, for the analysis of the optimal percolation problem, we mostly use the SA method (see the main text). In Sec. S.3, we also provide results for the second-best algorithm, i.e., the HDA method, and show that they are qualitatively similar to those of the SA method.

S.2. GREEDY REINSERTING (GR)
After a network (either isolated or multiplex) is dismantled using a greedy or score-based algorithm, some of the removed nodes could be added back to the network without increasing the size of the GMCC substantially, i.e., no cluster of extensive size is created if they are reinserted. Such nodes may have been removed because the greedy or score-based algorithms are not exact, in the sense that they do not take into account the collective nature of the dismantling problem. An approach that addresses this issue is the greedy reinserting (GR) method [S2, S5]. In the GR method, after a set of structural nodes is detected using another algorithm, at each step a randomly chosen node from the set is reinserted into the network; if its reinsertion increases the size of the GMCC above a threshold √N, it is removed again. This process is continued until practically none of the nodes remaining in the set can be added back to the network while keeping the size of the GMCC non-extensive.
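The reinsertion loop can be sketched as below. The helper `gmcc_if_reinserted`, returning the GMCC size that would result from putting node i back given the nodes already kept, is an assumed oracle (not part of the original description); for determinism, this sketch scans nodes in sorted order rather than at random.

```python
def greedy_reinsert(removed, gmcc_if_reinserted, threshold):
    """Greedy reinsertion (illustrative sketch): put a removed node
    back only if the resulting GMCC stays at or below `threshold`
    (of order sqrt(N)); repeat passes until no further node can be
    reinserted without creating an extensive cluster."""
    kept = set()
    changed = True
    while changed:
        changed = False
        for i in sorted(removed):
            if gmcc_if_reinserted(i, kept) <= threshold:
                removed.discard(i)
                kept.add(i)
                changed = True
    return removed, kept
```

The set returned in `removed` is the pruned set of structural nodes; every node left in it would, if reinserted, push the GMCC above the threshold.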
As shown in Figures S-1-S-5 and Figure 1 of the main text, the GR method effectively boosts the performance of every one of the score-based dismantling algorithms and returns sets of structural nodes with almost identical sizes, irrespective of the initial algorithm used.
Moreover, the result of each of the score-based algorithms combined with the GR method is nearly as good as that of the SA method (the SA method itself is not appreciably improved by applying the GR method afterwards). These results suggest that the sets obtained by the SA method and by any of the score-based algorithms combined with GR are, to a considerable extent, similar to each other. In Sec. S.3, we show that the results of the score-based algorithms combined with GR are indeed similar to those found by the SA algorithm.