## Abstract

The quality of a link prediction method for multiplex networks is reflected in its ability to reconstruct them accurately. Although link prediction methods perform well at estimating the existence probability of each potential link in monoplex networks from a set of partially observed links, we lack a mathematical tool to reconstruct a multiplex network from its observed aggregate topology and partially observed links. Here, we fill this gap by developing a theoretical and computational framework that builds a probability space containing possible structures with a maximum likelihood estimation. We then discover that the discrimination, an indicator quantifying differences between layers from an entropy perspective, determines the reconstructability, i.e., the accuracy of such reconstruction. This finding enables us to design an optimal strategy for allocating the set of observed links across layers to achieve the optimal reconstruction of multiplex networks. Finally, the theoretical analyses are corroborated by empirical results from biological, social, and engineered systems, as well as a large volume of synthetic networks.

## Introduction

Multiplex networks, composed of a collection of layers sharing the same set of nodes, describe multiple types of interactions between nodes in different layers more precisely than the corresponding aggregate networks^{1,2,3,4,5,6}. Neglecting the layer topology of the multiplex structure may lead to significantly inaccurate predictions^{7,8,9,10}. For example, removing a tiny fraction of nodes in one layer may cause cascading failures between layers and a catastrophic collapse of the entire multiplex network^{11} (Fig. 1a), whereas the corresponding aggregate network may be predicted to remain unscathed under the same set of node removals (Fig. 1b). This is corroborated by recent results showing that an interdependent network is significantly more vulnerable to random failures than both its isolated layers and the aggregate network^{12,13,14}. Another example is the random walk process over a multiplex transportation network, whose navigability is determined by the cost of switching between layers^{15}. The walker may need to pay an extra cost to switch from one layer to another (Fig. 1c), while one can walk along any link without extra cost within the aggregate network (Fig. 1d). Such an extra cost leads to a lower coverage of the nodes visited by the walker in a multiplex transportation network. A further example is the dynamics of a spreading process in a temporal network, modeling rumor circulation in a social network or a disease outbreak (e.g., COVID-19) in a susceptible population network^{16,17,18,19}. The topology of a temporal network may change at each time step (Fig. 1e), while it is regarded as static in the aggregate network (Fig. 1f), leading to a lower infected fraction in the temporal network than in the aggregate one.

However, it is practically difficult to obtain data that accurately represent the detailed layers of a full multiplex topology, since it is costly, time-consuming, and sometimes impossible to measure all types of interactions and the heterogeneity of nodes, especially in a large-scale complex system. By contrast, the aggregate network of a multiplex network is relatively feasible to obtain, as it aggregates all interactions without distinguishing their specific types. Researchers, for example, can construct the entire connectome of Caenorhabditis elegans’ neural system^{20,21} (Supplementary Fig. 1), potentially offering a better understanding of the brain’s functionality. However, this information is of limited use, as discussed above, because the multiplex layer topology remains unidentified: the type of interaction (e.g., gap junction or synapse) between any two connected neurons cannot be determined without extensive experiments^{22}. Analogous cases appear widely in many aspects of life, including social networks^{23} and transportation networks^{24}. Since dynamical phenomena on a multiplex network differ significantly from those on its aggregate counterpart, there is a pressing need for effective tools that can leverage limited information (the aggregate network) to accurately and efficiently predict unobserved links in each layer.

Recent studies have attempted to predict links in multiplex networks based on structural features (e.g., multiplex links and multilayer neighborhoods) when partial observations of a subnetwork in each layer are available^{25,26}. These methods are efficient at retrieving missing links in different layers; however, they ignore the useful information contained in the aggregate network. Building on the results of link prediction, a more exciting and vital issue in multiplex networks is to develop an efficient reconstruction method that goes beyond link prediction by exploiting both the aggregate topology and partial observations of the layers. Recent research demonstrates that one can reconstruct a multiplex network when a generating model is given (e.g., the stochastic block model)^{27,28,29} or when specific dynamics data (e.g., a random walk or a spreading process) are available^{24,30}. Furthermore, efforts have been devoted to estimating the number of multiplex layers from knowledge of the aggregate network or of a dynamic process^{24,31}. However, to the best of our knowledge, a framework is still lacking for reconstructing multiplex layer structures and displaying the specific topology of each hidden layer when the number of layers is known. Thus, the following fundamental questions remain open.

The same aggregate network can be generated by different combinations of single-layer networks, making the reconstruction a combinatorial optimization problem: there is an enormous number of possible mappings from potential multiplex layer structures to the observed aggregate topology. Specifically, the probability space composed of potential multiplex structures contains an exponential number (\({({2}^{L}-1)}^{|{A}^{{{{{{{{\mathcal{O}}}}}}}}}|}\)) of possibilities, where *L* is the number of layers and \(| {A}^{{{{{{{{\mathcal{O}}}}}}}}}|\) is the number of links in the aggregate network (see Supplementary Fig. 2 for more details). Q1. Can one conceive a low-complexity framework to reconstruct multiplex layer structures, avoiding the enormous cost of ergodic methods^{32}? Various characteristics of a multiplex structure affect the reconstructability. For example, the average degrees and the overlap of edges in different layers have different impacts on the reconstruction accuracy. Q2. Is there an indicator that can quantify the fundamental relation between the reconstructability and diverse network characteristics? More known links yield higher reconstruction accuracy but are more costly. Furthermore, there is a huge number of ways to allocate a limited budget across layers, which is also a combinatorial optimization problem. Q3. Is there an optimal strategy for allocating a limited budget that yields the highest reconstruction accuracy for various multiplex networks using the above indicator?
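The size of this probability space can be computed directly: every aggregate link must be assigned to at least one of the *L* layers, i.e., to one of \(2^L-1\) nonempty layer subsets. A minimal sketch (the function name is illustrative):

```python
def multiplex_space_size(L, n_aggregate_links):
    """Number of multiplex structures consistent with an aggregate topology:
    each aggregate link may be placed in any nonempty subset of the L layers,
    giving 2**L - 1 choices per link."""
    return (2 ** L - 1) ** n_aggregate_links

# Even a toy case is far beyond exhaustive enumeration:
# 2 layers and 30 aggregate links already give 3**30 (about 2 * 10**14) structures.
print(multiplex_space_size(2, 30))
```

The exponential growth in \(|A^{\mathcal{O}}|\) is what rules out ergodic enumeration and motivates the likelihood-based framework below.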

In this article, we propose a mathematical and computational framework that can reconstruct a multiplex network and predict dynamic processes on it. We find a discrimination indicator derived from information entropy, integrating multiple network characteristics, that linearly determines the reconstruction accuracy of multiplex networks. This discovery enables us to design an optimal strategy for allocating a fixed budget of partial observations across layers, promoting the optimal reconstruction of multiplex networks under the considered network model. Empirical results based on nine real-world multiplex networks and a series of synthetic networks corroborate our analytical results (see Supplementary Table 1 for details of the real-world datasets). Therefore, our answers to the three open questions above are all “yes”.

## Results

### Framework for reconstructing multiplex layer structures

We first introduce the notation: the adjacency matrix of layer *α* in a multiplex network **M** is denoted by *M*^{α} (*α* = 1, 2, ⋯ , *L*), where \({M}_{ij}^{\alpha }=1\) if there is an edge between nodes *i* and *j* in layer *α*, and \({M}_{ij}^{\alpha }=0\) otherwise (*i*, *j* = 1, 2, ⋯ , *N*). It is hypothesized that each multiplex network **M** is generated by some process such that the probability of generating a multiplex network with adjacency matrices *M*^{α} is ∏_{α}*P*(*M*^{α}∣*θ*), where *θ* represents the parameters of such a process. An aggregate mechanism is a mapping from a multiplex network **M** to a monoplex (single-layer) topology \({A}^{{{{{{{{\mathcal{O}}}}}}}}}\), where \({A}^{{{{{{{{\mathcal{O}}}}}}}}}\in {{\mathbb{R}}}^{N\times N}\) is the adjacency matrix. In this article, for illustration, we describe the framework using multiplex networks aggregated by the OR mechanism, which is the most common case, ranging from biological networks to social networks (see Supplementary Note 1 for other aggregate mechanisms). Then, we have

$$
{A}_{ij}^{{{{{{{{\mathcal{O}}}}}}}}}=1-\mathop{\prod }\limits_{\alpha =1}^{L}\left(1-{M}_{ij}^{\alpha }\right),
$$

i.e., a link appears in the aggregate topology if and only if it exists in at least one layer.

The partial observation Γ is the set of observed edges in the multiplex network, \({{\Gamma }}=\{(i,j,\alpha )| {A}_{ij}^{{{{{{{{\mathcal{O}}}}}}}}}=1,{{{{{{{\rm{and}}}}}}}}\,{M}_{ij}^{\alpha }{{\,{{{{{\rm{is}}}}}}\,{{{{{\rm{observed}}}}}}}}\}\). Supposing that we have the aggregate topology \({A}^{{{{{{{{\mathcal{O}}}}}}}}}\) and the partial observation Γ from a multiplex network **M** with *L* layers, our goal is to predict whether there is a link between any two nodes *i*, *j* in layer *α* when (*i*, *j*, *α*) ∉ Γ (Fig. 2a). Given an aggregate topology \({A}^{{{{{{{{\mathcal{O}}}}}}}}}\), there are in total \(L\cdot | {A}^{{{{{{{{\mathcal{O}}}}}}}}}|\) links to predict, where \(| {A}^{{{{{{{{\mathcal{O}}}}}}}}}|\) is the number of edges in the aggregate topology \({A}^{{{{{{{{\mathcal{O}}}}}}}}}\). Among these \(L\cdot | {A}^{{{{{{{{\mathcal{O}}}}}}}}}|\) potential links, we have already observed *card*(Γ) links, where *card*(⋅) denotes the number of elements of a set. Thus, we define the fraction of partial observations *c* (0 ≤ *c* < 1) as the ratio between the number of observed edges in the layers and the number of all edges in the multiplex network, i.e., \(c=\frac{{card}({{\Gamma }})}{L\cdot | {A}^{{{{{{{{\mathcal{O}}}}}}}}}| }\).
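In code, these quantities amount to simple set arithmetic. A minimal sketch, with Γ stored as a set of (*i*, *j*, *α*) triples (all names are illustrative):

```python
def observation_fraction(gamma, aggregate_links, L):
    """c = card(Γ) / (L * |A^O|): the fraction of layer-resolved potential
    links that have been directly observed."""
    return len(gamma) / (L * len(aggregate_links))

# Toy aggregate topology with 3 links and L = 2 layers
aggregate = {(0, 1), (1, 2), (2, 3)}
# Γ: observed placements (i, j, layer)
gamma = {(0, 1, 0), (1, 2, 1), (2, 3, 0)}
print(observation_fraction(gamma, aggregate, L=2))  # 3 / (2 * 3) = 0.5
```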

The first step of the reconstruction is to find the most probable value of *θ* by maximizing the posterior probability \(P(\theta | {A}^{{{{{{{{\mathcal{O}}}}}}}}},{{\Gamma }})\), where *θ* is the network model parameter. Notice that the probability here provides a general description for any form of the network parameter *θ* (see Supplementary Note 2 for specific forms under a certain network model). Since there is no prior knowledge about the parameter *θ*, we assume it to have a uniform distribution, i.e., *P*(*θ*) = *const*.^{33}. In this case, by the Bayesian rule, the maximum posterior estimate is equivalent to maximizing the log-likelihood function

$$
l(\theta )=\ln P({A}^{{{{{{{{\mathcal{O}}}}}}}}},{{\Gamma }}| \theta ),
$$

which performs the maximum likelihood estimation (MLE). The second step is to reconstruct the multiplex structure **M** by inferring the probability of each possible structure. Since many potential layer structures can produce the same aggregate topology, we denote the probability distribution over all multiplex structures by *Q*(**M**), where ∑_{M}*Q*(**M**) = 1. Then, the estimated parameter *θ* can reconstruct the multiplex structure via the posterior distribution

$$
Q({{{{{{{\bf{M}}}}}}}})=P({{{{{{{\bf{M}}}}}}}}| {A}^{{{{{{{{\mathcal{O}}}}}}}}},{{\Gamma }},\theta )=\frac{P({A}^{{{{{{{{\mathcal{O}}}}}}}}},{{\Gamma }},{{{{{{{\bf{M}}}}}}}}| \theta )}{{\sum }_{{{{{{{{{\bf{M}}}}}}}}}^{\prime}}P({A}^{{{{{{{{\mathcal{O}}}}}}}}},{{\Gamma }},{{{{{{{{\bf{M}}}}}}}}}^{\prime}| \theta )}.
$$

However, there is a gap between the observations and the network model parameters *θ*, since the multiplex structure **M** is a hidden variable in *l*(*θ*), where \(l(\theta )=\ln {\sum }_{{{{{{{{\bf{M}}}}}}}}}P({A}^{{{{{{{{\mathcal{O}}}}}}}}},{{\Gamma }},{{{{{{{\bf{M}}}}}}}}| \theta )\). Notice that the sums over **M** here run only over structures **M** that are consistent with \({A}^{{{{{{{{\mathcal{O}}}}}}}}}\) and Γ. For any distribution *Q*(**M**), by employing Jensen’s inequality, we have

$$
l(\theta )=\ln {\sum }_{{{{{{{{\bf{M}}}}}}}}}Q({{{{{{{\bf{M}}}}}}}})\frac{P({A}^{{{{{{{{\mathcal{O}}}}}}}}},{{\Gamma }},{{{{{{{\bf{M}}}}}}}}| \theta )}{Q({{{{{{{\bf{M}}}}}}}})}\ \ge \ {\sum }_{{{{{{{{\bf{M}}}}}}}}}Q({{{{{{{\bf{M}}}}}}}})\ln \frac{P({A}^{{{{{{{{\mathcal{O}}}}}}}}},{{\Gamma }},{{{{{{{\bf{M}}}}}}}}| \theta )}{Q({{{{{{{\bf{M}}}}}}}})},
$$

where the equality holds if and only if Eq. (3) is satisfied. Thus, *l*(*θ*) and *Q*(**M**) are interdependent, and we perform an iterative process to obtain the MLE of the parameter *θ* and the posterior distribution *Q*(**M**) as follows. Given an arbitrary initial value *θ*^{(0)}, at iteration *k* we find the optimized posterior distribution *Q*^{(k)}(**M**) from *θ*^{(k−1)} by Eq. (3), and then update the parameters *θ*^{(k)} to maximize the right-hand side of Eq. (4) with the posterior distribution *Q*^{(k)}(**M**) fixed, which performs a coordinate ascent on the log-likelihood function (Fig. 2b). These iterations constitute an expectation-maximization (EM) algorithm^{34} (details in the “Methods” section), and a toy example is shown in Fig. 2c. Note that if there is any prior on the parameters *θ*, the proposed framework can be extended by maximizing the product of the likelihood function \(P({A}^{{{{{{{{\mathcal{O}}}}}}}}},{{\Gamma }}| \theta )\) and the prior *P*(*θ*), i.e., the so-called maximum a posteriori (MAP) estimation.
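To make the iteration concrete, the following toy EM mirrors the structure of the framework for *L* = 2 layers under a deliberately simplified model in which every link exists independently in layer *α* with probability *θ*_{α} (a stand-in for the generic parameters *θ*, not the configuration model used later). The E-step computes the posterior over the three events consistent with an aggregate link; the M-step re-estimates *θ* from the observed placements plus the expected hidden ones. All names are illustrative:

```python
def em_two_layer(observed, n_hidden, iters=200, theta=(0.5, 0.5)):
    """Toy EM for a two-layer OR-aggregated model with independent links.

    observed -- list of (m1, m2) placements for aggregate links in Γ
    n_hidden -- number of aggregate links whose placement is unknown
    """
    t1, t2 = theta
    for _ in range(iters):
        # E-step: condition on the aggregate link existing (OR mechanism),
        # i.e., on one of the events (only 1), (only 2), (both)
        z = t1 * (1 - t2) + (1 - t1) * t2 + t1 * t2  # P(link in aggregate)
        e1 = t1 / z                                  # E[M^1 | aggregate link]
        e2 = t2 / z                                  # E[M^2 | aggregate link]
        # M-step: MLE of θ from observed placements plus expected hidden ones
        n = len(observed) + n_hidden
        t1 = (sum(m1 for m1, _ in observed) + n_hidden * e1) / n
        t2 = (sum(m2 for _, m2 in observed) + n_hidden * e2) / n
    return t1, t2
```

With no hidden links, the M-step recovers the empirical layer frequencies in one pass; hidden links pull *θ* toward values consistent with the aggregate constraint.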

In estimation and statistics theory, an unbiased estimator is called efficient if its variance reaches the Cramér-Rao lower bound (CRLB)^{35}. Fortunately, the proposed framework yields a maximum likelihood estimate, which is an unbiased estimator and exhibits asymptotic normality, meaning that the estimator converges in distribution to a normal distribution^{36}. With this, we prove that the variance of the estimator designed in our framework decreases as the fraction of partial observations *c* increases, and reaches the CRLB when the network size *N* approaches infinity (see Supplementary Note 3 and Supplementary Fig. 4 for more details).

### Evaluations for the performance of the reconstruction

We now analyze the performance of the reconstruction on various real-world multiplex networks. Notice that the framework works for any given analytical generative model, such as the Erdős-Rényi random network model or the stochastic block model. For illustration, we employ the configuration model as our network model, whose parameters are collected in the set *D*. For a multiplex network composed of *L* layers, specifically, the parameter set *D* contains *L* vectors \(\overrightarrow{{d}^{\alpha }}\in {{\mathbb{R}}}^{N}\) (*α* = 1, 2, ⋯ , *L*) encoding the degree sequences of the layers, where the component *d*^{α}(*i*) is the degree of node *i* in layer *α*. By exploiting the independence of each link, the configuration model reduces the complexity from exponential to polynomial, and it has been widely applied to analyze the relationship between the structure and function of complex networks^{37,38,39}.

As mentioned in the last section, once a certain network model is determined, we can derive the most probable values of all parameters in *D* by maximizing the likelihood \(P({A}^{{{{{{{{\mathcal{O}}}}}}}}},{{\Gamma }}| D)\) (see Supplementary Note 2 for detailed derivations). After estimating the degree sequences *D*, the posterior probability \({Q}_{ij}^{\alpha }\) can be calculated by

which we call the link reliability, measuring the probability that a link exists between node *i* and node *j* in layer *α* (see Supplementary Note 2 for the complete algorithm and complexity analysis in this case). We examine the reliability of all links in the testing set *E*^{T}, consisting of all potential links except the partial observations, i.e., \({E}^{T}=\{(i,j,\alpha )\notin {{\Gamma }}| {A}_{ij}^{{{{{{{{\mathcal{O}}}}}}}}}=1\}\) (see Supplementary Fig. 5 for a schematic illustration). For this purpose, we calculate the TP (true positive rate) \(P({M}_{ij}^{\alpha }=1| {Q}_{ij}^{\alpha } \, > \, q)\), FP (false positive rate) \(P({M}_{ij}^{\alpha }=0| {Q}_{ij}^{\alpha } \, > \, q)\), TN (true negative rate) \(P({M}_{ij}^{\alpha }=0| {Q}_{ij}^{\alpha } \, < \, q)\), and FN (false negative rate) \(P({M}_{ij}^{\alpha }=1| {Q}_{ij}^{\alpha } \, < \, q)\) in *E*^{T}, where *q* is the threshold that determines the classifier boundary between the classes.
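A minimal sketch of these rates exactly as defined above, conditioning on the classifier output (so TP + FP = 1 and TN + FN = 1); `reliab` maps (*i*, *j*, *α*) to \(Q_{ij}^{\alpha}\) and `truth` to the ground-truth \(M_{ij}^{\alpha}\) (names are illustrative):

```python
def confusion_rates(reliab, truth, q=0.5):
    """TP = P(M=1 | Q > q), FP = P(M=0 | Q > q),
    TN = P(M=0 | Q < q), FN = P(M=1 | Q < q), estimated on the testing set."""
    pos = [k for k, v in reliab.items() if v > q]
    neg = [k for k, v in reliab.items() if v < q]
    tp = sum(truth[k] for k in pos) / len(pos) if pos else 0.0
    fp = 1.0 - tp if pos else 0.0
    fn = sum(truth[k] for k in neg) / len(neg) if neg else 0.0
    tn = 1.0 - fn if neg else 0.0
    return tp, fp, tn, fn

reliab = {'a': 0.9, 'b': 0.8, 'c': 0.2, 'd': 0.1}
truth = {'a': 1, 'b': 0, 'c': 0, 'd': 1}
print(confusion_rates(reliab, truth))  # (0.5, 0.5, 0.5, 0.5)
```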

We first vary the threshold *q* from 0 to 1 and calculate the AUC (area under the receiver operating characteristic (ROC) curve) for two real-world datasets (the C. elegans neural network and the London transportation network) against different values of *c*. We also compare our results with three link prediction methods that have performed well on inference tasks from partial observations. The first baseline is from De Bacco et al., who proposed a generative model with an efficient expectation-maximization algorithm that performs inference tasks such as community detection and link prediction^{40}; it works for multiplex networks with groups, but it may fail on networks without group-based structures. The second is from Tarres-Deulofeu et al., who introduced a stochastic block model that can take full advantage of the information contained in the whole network^{41}. The third baseline applies a single-layer link prediction method, layer by layer, to the multiplex network (Layered model). Notice that none of these methods takes the aggregate topology into consideration, whereas our method provides valuable insight into how aggregate topology information helps reconstruction. The AUCs of the four methods for the C. elegans neural network and the London transportation network are shown in Fig. 3a, displaying that the aggregate topology indeed provides important information (see Supplementary Fig. 13 for more empirical results).

Further, we display the ROC curves in the ROC space for *c* = 0.01, 0.05, 0.25, 0.5, respectively (Fig. 3b, c). Here, the ROC space is defined by plotting the false positive rate on the *x*-axis and the true positive rate on the *y*-axis, displaying the relative trade-off between false positives (costs) and true positives (benefits). Notice that the ROC curve describes the performance as a continuous function of the threshold *q*, quantifying the true positive rate and the false positive rate for any given threshold. The AUC is thus a scalar that quantifies the performance independently of the choice of threshold. The results of the ROC analysis show that our proposed framework is effective for any threshold in the interval (0, 1), and it is therefore stable across different thresholds in various real-world networks. Further, the true positive rate is positively correlated with the false positive rate, and there exists a threshold above which the false positive rate increases faster than the true positive rate; beyond that point, it is no longer justified to improve the true positive rate at the cost of the false positive rate.
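The AUC also has a convenient rank interpretation: it equals the probability that a randomly chosen true link receives a higher reliability than a randomly chosen non-link. A minimal sketch (quadratic in the number of links, which is fine for illustration):

```python
def auc(scores_pos, scores_neg):
    """Rank-based AUC: P(Q_true > Q_false), with ties counted as 1/2;
    equivalent to the area under the ROC curve."""
    wins = sum((p > n) + 0.5 * (p == n)
               for p in scores_pos
               for n in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))

print(auc([0.9, 0.8], [0.1, 0.2]))  # perfect separation -> 1.0
```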

Next, we set specific thresholds for further analysis. Since we do not have any prior knowledge about the threshold *q* in this task, we set *q* = 0.5 to avoid an arbitrary choice, i.e., when the link reliability is larger than 0.5, an edge is considered to exist, and otherwise it is not. Then, we calculate the four metrics to evaluate the performance of the multiplex reconstruction on the nine real-world datasets. As a result, the fraction of partial observations *c*, indicating the portion of observed edges, exhibits a positive correlation with the accuracy of the reconstruction, showing good performance even for quite small *c* (Fig. 3f) (see the “Methods” section and Supplementary Fig. 6 for more details on the evaluations). In practice, we sample the partial observations by vertex sampling and repeat the reconstruction 100 times for each value of *c* (see the “Methods” section for more details about the sampling of partial observations).

Besides link reliability, network characteristics such as the average degree, the degree distribution, and the length distribution of the shortest paths are also significant for network reconstruction. One prominent advantage of the reconstruction framework is that we can simultaneously obtain such micro- and mesoscale network properties. For example, the expectation of the degree distribution in layer *α* can be obtained by *E*(*p*_{α}) = ∑_{M}*Q*(**M**) ⋅ *p*_{α}(**M**), where *p*_{α}(**M**) is the degree distribution in layer *α* for a specific multiplex structure **M**. The reconstructed degree distributions in two layers with different values of *c* are compared to the ground truth (*c* = 1), demonstrating that the degree distributions in all layers can also be well reconstructed as *c* increases (Fig. 3d, e). Generally, the expectation of a property *X* is the first raw moment, *E*(*X*) = ∑_{M}*Q*(**M**) ⋅ *X*(**M**), while the corresponding variance is the second central moment, \(D(X)={\sum }_{{{{{{{{\bf{M}}}}}}}}}Q({{{{{{{\bf{M}}}}}}}})\cdot {\left[X({{{{{{{\bf{M}}}}}}}})-E(X)\right]}^{2}\). Moreover, we can obtain the skewness, kurtosis, and higher moments of a property *X*. We discuss how different network characteristics (e.g., the average degrees and the overlap of edges in different layers) impact the performance of the reconstruction in the next section.
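Given structures sampled from the posterior *Q*(**M**), such moments reduce to weighted sums. A minimal sketch (all names are illustrative):

```python
def moments(structures, weights, prop):
    """First raw moment E(X) = Σ_M Q(M) X(M) and second central moment
    D(X) = Σ_M Q(M) (X(M) - E(X))**2 of a network property X, where
    `weights` are the posterior probabilities Q(M) of the sampled structures."""
    xs = [prop(m) for m in structures]
    e = sum(w * x for w, x in zip(weights, xs))
    d = sum(w * (x - e) ** 2 for w, x in zip(weights, xs))
    return e, d

# Toy posterior over two structures whose property X takes values 0 and 1
e, d = moments([0, 1], [0.5, 0.5], prop=lambda m: m)
print(e, d)  # 0.5 0.25
```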

### Reconstructability determined by the discrimination indicator

It is of great significance to investigate how various characteristics of multiplex networks affect the reconstructability, i.e., the reconstruction accuracy we measured. Without loss of generality, we conduct an analysis of two-layer multiplex networks to illustrate the method in detail. Once a link is observed in the aggregate network, the probability space of the potential multiplex structure contains three events: the potential link exists (i) only in layer 1, (ii) only in layer 2, or (iii) in both layers. Then, the uncertainty of all links in the multiplex network **M** can be quantified by the information entropy

Generally speaking, the smaller the information entropy \({{{{{{{\mathcal{H}}}}}}}}\) is, the more certain the potential multiplex structure is, and vice versa.

To study how different characteristics of multiplex networks impact the information entropy, we first introduce the ratio of the average degrees of the two layers, \(r=\frac{\left\langle {k}_{1}\right\rangle }{\left\langle {k}_{2}\right\rangle }\), where \(\left\langle {k}_{1}\right\rangle\) and \(\left\langle {k}_{2}\right\rangle\) are the average degrees in layer 1 and layer 2, respectively. We assume, without loss of generality, that \(\left\langle {k}_{1}\right\rangle \le \left\langle {k}_{2}\right\rangle\), such that 0 < *r* ≤ 1. We then consider the overlap of edges *v* between the two layers. A high overlap indicates that a link is more likely to exist in one layer if the corresponding link exists in the other layer, i.e., a low uncertainty. To measure the overlap *v* of a multiplex network, we use the Jaccard index of the edge sets \({{{{{{{{\mathcal{E}}}}}}}}}_{1}\) and \({{{{{{{{\mathcal{E}}}}}}}}}_{2}\) of the two layers, i.e., \(v=| {{{{{{{{\mathcal{E}}}}}}}}}_{1}\cap {{{{{{{{\mathcal{E}}}}}}}}}_{2}| /| {{{{{{{{\mathcal{E}}}}}}}}}_{1}\cup {{{{{{{{\mathcal{E}}}}}}}}}_{2}|\) (see Supplementary Fig. 9 for more details about multiplex network characteristics).
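The per-link uncertainty and the overlap can be sketched as follows, assuming (for illustration only; the exact mean-field form used in the derivation may differ) that a link exists unconditionally in layer *α* with probability *p*_{α}, conditioned on its appearance in the OR-aggregated topology:

```python
from math import log

def link_entropy(p1, p2):
    """Entropy (in nats) of the three events consistent with an observed
    aggregate link: only layer 1, only layer 2, or both layers."""
    probs = [p1 * (1 - p2), (1 - p1) * p2, p1 * p2]
    z = sum(probs)  # condition on the link appearing in the aggregate (OR)
    return -sum((p / z) * log(p / z) for p in probs if p > 0)

def jaccard_overlap(edges1, edges2):
    """Edge overlap v between two layers, measured by the Jaccard index."""
    return len(edges1 & edges2) / len(edges1 | edges2)

print(link_entropy(0.5, 0.5))  # maximal uncertainty: ln 3
print(jaccard_overlap({(0, 1), (1, 2)}, {(1, 2), (2, 3)}))  # 1/3
```

Unbalanced layers (e.g., *p*₁ ≫ *p*₂, i.e., small *r*) concentrate the posterior on one event and lower the entropy, matching the monotonicity discussed below.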

We next explore how these factors impact the information entropy of a multiplex network and further determine the performance of reconstruction. By Eq. (6), we can calculate the information entropy \({{{{{{{\mathcal{H}}}}}}}}\) with a mean-field approximation, and obtain

where

and

indicating the average probability of the existence of any link in layer 1 and layer 2, respectively. Notice that in Eqs. (8) and (9), \(\hat{v}\) and \(\hat{r}\) are the estimates of *v* and *r* when we only have the partial observations Γ, and we approximate them empirically by *c* ⋅ *v* and *r*^{c} (see the “Methods” section for more details). Thus, the information entropy of a given multiplex network is highly related to the fraction of partial observations *c*, the ratio of average degrees *r*, and the overlap *v*. The uncertainty of the probability space clearly decreases as *c* and *v* increase, so the information entropy \({{{{{{{\mathcal{H}}}}}}}}\) is a monotonically decreasing function of *c* and *v* over the domain. For *r*, however, the information entropy is a monotonically increasing function as *r* increases from 0 to 1 (Fig. 4a). Clearly, \({{{{{{{\mathcal{H}}}}}}}}\) describes the microscale discrimination between layers of a multiplex network, since a high discrimination (e.g., *r* tending to 0) leads to a low information entropy.

Generally, the reconstruction accuracy is expected to be determined by the information entropy \({{{{{{{\mathcal{H}}}}}}}}\), since \({{{{{{{\mathcal{H}}}}}}}}\) is the primary variable that affects the uncertainty of the distribution of potential multiplex structures. Empirically, we find that the accuracy is linearly determined by the indicator \(\rho \cdot {{{{{{{\mathcal{H}}}}}}}}\) (Fig. 4b), i.e.,

$$
{{{{{{{\rm{Accuracy}}}}}}}}=1-\rho \cdot {{{{{{{\mathcal{H}}}}}}}},
$$

where *ρ* is a scaling factor satisfying

In Eq. (11) above, *s* = *s*_{(M)} is a constant related to the topology of the multiplex network **M** (see Supplementary Table 2 for approximate values of *s*_{(M)}). The term (1 − *v*)/(1 + *v*) in Eq. (11) indicates that the uncertainty of links in the testing set can be reduced by partial observations, and the parameter *s* describes the scale at which partial observations reduce this uncertainty (details in the “Methods” section). We further find that *s*_{(M)} is closely proportional to the cosine similarity of the degree sequences of the two layers, i.e., \(s\propto \cos \langle \overrightarrow{{d}^{1}},\overrightarrow{{d}^{2}}\rangle\) (Fig. 4c). Clearly, \(\cos \langle \overrightarrow{{d}^{1}},\overrightarrow{{d}^{2}}\rangle\) describes the similarity between the degree sequences of the two layers, indicating the mesoscale discrimination, which is generally unrelated to the microscale discrimination captured by *r*, *v*, and \({{{{{{{\mathcal{H}}}}}}}}\) (Fig. 4d).
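The mesoscale term is simply the cosine similarity between the two layers’ degree sequences. A minimal sketch:

```python
from math import sqrt

def degree_cosine(d1, d2):
    """Cosine similarity between the degree sequences of two layers,
    the quantity the scaling constant s is proportional to."""
    dot = sum(a * b for a, b in zip(d1, d2))
    norm = sqrt(sum(a * a for a in d1)) * sqrt(sum(b * b for b in d2))
    return dot / norm

print(degree_cosine([2, 1, 1], [4, 2, 2]))  # parallel sequences -> 1.0
```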

Thus, the reconstructability is determined by the discrimination indicator (\(1-\rho \cdot {{{{{{{\mathcal{H}}}}}}}}\)) from both the microscale and mesoscale views. This discovery indicates that the reconstruction accuracy can be predicted by the discrimination indicator: the accuracy is high when either *ρ* or \({{{{{{{\mathcal{H}}}}}}}}\) is small. For example, the reconstructability is enhanced when the difference in average degrees between layers is vast (*r* tends to 0). Notice that, since *s* is highly related to the cosine similarity, we can approximate *s* by the cosine similarity when its exact value is not available empirically. We next discuss how to allocate the partial observations across layers when a specific budget \(\bar{c}\) is given.

### Allocating limited budget for partial observations

In practice, we usually have a limited budget for conducting observations. It is therefore necessary to investigate how to allocate the budget (partial observations Γ) across layers to optimize the performance of the reconstruction (e.g., the accuracy) as far as possible. We denote the average fraction of partial observations per layer by \(\bar{c}\), i.e., \(\bar{c}={\sum }_{\alpha }{c}_{\alpha }/L\), where *c*_{α} indicates the fraction of partial observations in layer *α*, and we denote \({{{{{{{\mathcal{H}}}}}}}}({M}^{\alpha }| {A}^{{{{{{{{\mathcal{O}}}}}}}}})\) by \({{{{{{{{\mathcal{H}}}}}}}}}_{\alpha }\) for simplicity. Similarly, employing the mean-field approximation (details in the “Methods” section), we can predict the accuracy by the function *F* defined as

where

We next explore how the performance of the reconstruction is impacted by the ratio *c*_{1}/*c*_{2} when a certain budget is given for partial observations. Once \(\bar{c}\) is given, we regard the function *F* as a unary function of *c*_{1}, i.e., *F* = *F*(*c*_{1}), since \({c}_{2}=2\bar{c}-{c}_{1}\). Then, the domain of *F*(*c*_{1}) is \([0,2\bar{c}]\) if \(\bar{c}\le 0.5\), and \([2\bar{c}-1,1]\) if \(\bar{c} > 0.5\).

We notice that the function *F*(*c*_{1}) is monotonically increasing over the domain if \(\bar{c}\) is small, but first decreases and then increases if \(\bar{c}\) is large. Theoretical analysis shows that \(F(0)\ge F(2\bar{c})\) if \(\bar{c}\le 0.5\), and \(F(2\bar{c}-1)\ge F(1)\) if \(\bar{c} > 0.5\) (details in the “Methods” section). This result indicates that it is always better to allocate as much of the budget as possible to the layer whose average degree is lower, which yields the optimal strategy with the highest accuracy for the given network model. Moreover, there exists a threshold \(0 < {\bar{c}}_{0}({{{{{{{\bf{M}}}}}}}}) < 1\) for each multiplex network **M**, where \({\bar{c}}_{0}\) is the solution to the equation

If the budget \(\bar{c}\) is less than \({\bar{c}}_{0}\), the accuracy increases as *c*_{1}/*c*_{2} increases and reaches its maximum as *c*_{1}/*c*_{2} tends to *∞*. If the budget is large (\(\bar{c} > {\bar{c}}_{0}\)), however, the accuracy increases when *c*_{1}/*c*_{2} tends to 0 or *∞*, and reaches its maximum as *c*_{1}/*c*_{2} tends to *∞* (Fig. 5a), indicating that the multiplex network can be reconstructed when the aggregate topology and either of the two layers are observed (Fig. 5b). The reason is as follows. The partial observations in different layers capture the maximal characteristics of each layer when *c*_{1}/*c*_{2} = 1. However, a high overlap between the observations in different layers leads to more redundancy and lower accuracy, making the performance even worse when *c*_{1}/*c*_{2} = 1 and \(\bar{c}\) is large. These theoretical results enable us to choose the best strategy for allocating the budget and thus obtain the optimal reconstruction of multiplex networks in this case. Furthermore, results from real-world datasets verify our theoretical analysis (Fig. 5a, c). In the next section, we discuss how different multiplex network characteristics impact the performance of the reconstruction from a dynamical behavior point of view.
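Operationally, picking the best split reduces to a one-dimensional search over *c*_{1} on the domain derived above, with \(c_2 = 2\bar{c} - c_1\). The sketch below takes the accuracy predictor as a callable, since we do not reproduce the mean-field function *F* here; all names are illustrative:

```python
def best_allocation(accuracy, c_bar, steps=100):
    """Return the c1 on the feasible domain that maximizes a supplied
    accuracy estimate F(c1, c2), where c2 = 2 * c_bar - c1."""
    lo, hi = (0.0, 2 * c_bar) if c_bar <= 0.5 else (2 * c_bar - 1, 1.0)
    grid = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    return max(grid, key=lambda c1: accuracy(c1, 2 * c_bar - c1))

# With a toy predictor that always favors layer 1, the whole budget
# goes to layer 1 (c1 = 2 * c_bar):
print(best_allocation(lambda c1, c2: c1, c_bar=0.3))
```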

### Predicting dynamic processes in multiplex networks

We proceed to investigate the performance of the reconstructed multiplex networks in predicting dynamic processes, which is critical to network functionality. First, we study a percolation process on a two-layer interdependent multiplex network. In such a network, once a set of nodes is removed in one layer (e.g., by attack or random failure), the nodes disconnected from the GCC (giant connected component) in that layer, together with the counterparts of the removed nodes, also fail and are removed. The newly removed nodes trigger further removals, and this repetitive process leads to a catastrophic cascade of failures. For the reconstructed multiplex network encoded by the expectation *E*[*Q*(**M**)], we binarize the matrix *E*[*Q*(**M**)] and randomly remove nodes in one layer with probability 1 − *p* (see Supplementary Note 4 for more details of the process). We calculate the size of the GMCC (giant mutually connected component) as a function of *p*, and the critical threshold *p*_{c} above which the GMCC exists. We compare the average size of the GMCC in the reconstructed network (over 100 repetitions) to that of the real C. elegans neural network over a range of *c* (Fig. 6a). The reconstruction performs well as measured by the size of the GMCC, even when *c* is small, and the estimates of the GMCC size and the critical probability *p*_{c} approach those of the real network as *c* increases to 1. However, the proposed method slightly underestimates both the size of the GMCC and the critical threshold *p*_{c} for the C. elegans neural network. Further simulations on synthetic networks reveal that the method underestimates the robustness and *p*_{c} of the interdependent networks much more strongly when *r* is small (Fig. 6b). When *r* is small and close to 0, the edges of the multiplex network are concentrated in one layer, leading to extreme vulnerability. The reconstruction method cannot capture this imbalance, especially for a small fraction of partial observations.
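The cascading-failure mechanism described above can be sketched in a few lines. The helper below is a minimal illustration, not the paper's code: it assumes Erdős-Rényi layers and uniform random node removal, and computes the GMCC by repeatedly restricting the surviving node set to the giant component of each layer.

```python
import random

def giant_component(nodes, adj):
    """Largest connected component of `nodes` under the edges in `adj`."""
    best, seen = set(), set()
    for s in nodes:
        if s in seen:
            continue
        comp, stack = {s}, [s]
        while stack:
            for v in adj[stack.pop()]:
                if v in nodes and v not in comp:
                    comp.add(v)
                    stack.append(v)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    return best

def gmcc(adj1, adj2, alive):
    """Giant mutually connected component: repeatedly restrict the node
    set to the giant component of each layer until it stabilizes."""
    nodes = set(alive)
    while True:
        shrunk = giant_component(giant_component(nodes, adj1), adj2)
        if shrunk == nodes:
            return nodes
        nodes = shrunk

def er_layer(n, k_avg, rng):
    """Erdős-Rényi layer with average degree k_avg, as an adjacency dict."""
    adj = {i: set() for i in range(n)}
    p = k_avg / (n - 1)
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj

rng = random.Random(42)
N = 200
layer1, layer2 = er_layer(N, 6, rng), er_layer(N, 6, rng)
alive = {i for i in range(N) if rng.random() < 0.8}  # remove nodes w.p. 1 - p
gmcc_size = len(gmcc(layer1, layer2, alive)) / N
```

The same routine applied to the binarized *E*[*Q*(**M**)] and to the real layers gives the two GMCC curves being compared.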

Second, we consider a random walk process on interconnected multiplex networks, where interlayer links exist only between counterpart nodes. We suppose that a number of walkers start from randomly chosen nodes and move along intralayer links with probability *p*_{intra} and along interlayer links with probability *p*_{inter} (see Supplementary Note 5 for more details). We employ the coverage *ϕ*(*t*) as the performance metric, i.e., the fraction of nodes that have been visited by the walkers before time *t*. Comparing the coverage at each time on the reconstructed multiplex networks to that on the real one (the London multiplex transportation network) for different *c* shows an outstandingly good prediction as *c* increases (Fig. 6c). Simulations on synthetic networks show that the coverage is overestimated whether *r* is small or large (Fig. 6d). Moreover, we conduct further simulations on real-world multiplex networks with more than two layers. For example, we simulated random walk processes on the European air transportation network, which has three layers (Ryanair, Lufthansa, EasyJet), and on the U.S. air transportation network, which has four layers (SkyWest, Southwest, American Eagle, American Airlines) (see Supplementary Fig. 10c, d for more results on real-world networks).
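The coverage metric can be simulated directly. The sketch below is illustrative only (the toy ring layers and the parameter values are our own assumptions, not the paper's setup): each walker hops to a counterpart in another layer with probability *p*_{inter} and otherwise follows a random intralayer link.

```python
import random

def coverage_trace(layers, T, p_inter=0.2, n_walkers=10, seed=0):
    """Coverage phi(t): fraction of nodes visited by the walkers up to
    time t on an interconnected multiplex (interlayer links join a node
    only to its counterparts in the other layers)."""
    rng = random.Random(seed)
    L, N = len(layers), len(layers[0])
    walkers = [(rng.randrange(L), rng.randrange(N)) for _ in range(n_walkers)]
    visited = {u for _, u in walkers}
    trace = [len(visited) / N]
    for _ in range(T):
        moved = []
        for l, u in walkers:
            if L > 1 and rng.random() < p_inter:
                # interlayer step: hop to the counterpart in another layer
                l = rng.choice([x for x in range(L) if x != l])
            elif layers[l][u]:
                # intralayer step: follow a random link in the current layer
                u = rng.choice(sorted(layers[l][u]))
            moved.append((l, u))
            visited.add(u)
        walkers = moved
        trace.append(len(visited) / N)
    return trace

# two ring-shaped layers with different connectivity (hypothetical toy input)
N = 30
ring1 = {i: {(i - 1) % N, (i + 1) % N} for i in range(N)}
ring2 = {i: {(i - 2) % N, (i + 2) % N} for i in range(N)}
phi = coverage_trace([ring1, ring2], T=100)
```

Running the same trace on the reconstructed and the real layers, and comparing the two *ϕ*(*t*) curves, is the experiment reported in Fig. 6c.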

Last, we investigate a spreading process based on the SI (susceptible-infected) model on temporal networks, where interlayer links exist only between counterparts at two adjacent time slots^{23}. In the SI model, each node has only two states, “susceptible” (S) or “infected” (I); in our simulations, 5% of the nodes are randomly chosen as sources (infected) at the initial time *t* = 0. At each time step (which corresponds to one layer of the temporal network), the infected nodes infect their susceptible neighbors with a specific infection rate *λ* (see Supplementary Note 6 for more details). The fraction of infected nodes *I*(*T*) at time *t* = *T* on the reconstructed networks is calculated and compared to that of the real network of social interactions at the SFHH (La Société française d’Hygiène Hospitalière) conference (Fig. 6e). Simulations on synthetic networks reveal that the method overestimates *I*(*T*) when *r* is close to 1 (Fig. 6f). We also compare these results with existing methods (see Supplementary Fig. 11 for more details). Further, we simulated spreading processes on the Wiki-talk network, which has 14 layers (14 weeks), and the CollegeMsg network, which has 22 layers (22 days) (see Supplementary Fig. 10e, f for more results on real-world networks). Moreover, we have studied how the reconstruction performance for dynamics is influenced by further network characteristics, including the overlap of edges and the ratio of heterogeneity (see Supplementary Fig. 12 for more results on synthetic networks with different characteristics).
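The temporal SI process reduces to a short loop over snapshots. The following is a minimal sketch under our own simplifying assumptions (random toy contact lists; transmission in either direction along an active contact), not the paper's simulation code.

```python
import random

def si_temporal(snapshots, n_nodes, lam=0.5, seed_frac=0.05, seed=0):
    """SI spreading on a temporal network given as a list of per-time
    edge lists; returns the infected fraction after each time slot."""
    rng = random.Random(seed)
    n_seeds = max(1, int(seed_frac * n_nodes))
    infected = set(rng.sample(range(n_nodes), n_seeds))
    history = [len(infected) / n_nodes]
    for edges in snapshots:
        newly = set()
        for u, v in edges:
            # an active contact transmits in either direction with rate lam
            if (u in infected) != (v in infected) and rng.random() < lam:
                newly.add(v if u in infected else u)
        infected |= newly
        history.append(len(infected) / n_nodes)
    return history

# toy temporal network: a different random contact list at each time slot
rng = random.Random(1)
snapshots = [[(rng.randrange(50), rng.randrange(50)) for _ in range(40)]
             for _ in range(14)]
frac = si_temporal(snapshots, n_nodes=50)
```

Comparing `frac[-1]` computed on the reconstructed snapshots against the real ones corresponds to the *I*(*T*) comparison in Fig. 6e.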

## Discussion

Network reconstruction has attracted much research attention recently^{24,31,42}, owing to its wide applications such as link prediction, community detection, and vulnerability analysis of systems. Most previous studies focused on monoplex networks, so there is a pressing need to develop a reconstruction framework for multiplex networks. Existing work has succeeded in determining whether an observed monoplex network is the outcome of a hidden multilayer process by assuming generative models for each layer of the multiplex network. However, once a hidden multiplex structure is verified, it is necessary to further explore the layer structure and predict the dynamics. Our primary goal is to reconstruct the multiplex structure from the knowledge of both an observed aggregate monoplex network and partial observations.

However, many challenges prevent us from building a framework for the reconstruction of multiplex networks. Given the aggregation mechanism (e.g., the OR mechanism), there is clearly a large number of potential structures consistent with the same aggregate network. To avoid ergodic enumeration, we propose a framework that builds a probability space *Q*(**M**) and reduces the complexity from exponential to polynomial by employing the configuration model, which allows an arbitrary degree sequence. Generally, priors on generative models or other dynamic information are almost unavailable, while local information (referred to as partial observations in this article) in some specific layers is more accessible. We therefore need to estimate the network model parameter *θ* from very limited partial observations; unfortunately, this estimate is interdependent with the posterior probability distribution *Q*(**M**). We design here an efficient mathematical framework based on the Expectation-Maximization method that performs maximum likelihood estimation, and prove that the variance of the estimate reaches the CRLB as the network size *N* approaches infinity. We evaluate the performance of the reconstruction on various empirical multiplex networks, ranging from the microscale (e.g., the accuracy of link reliability) to the mesoscale (e.g., degree distributions). Empirical results demonstrate that the reconstruction performance improves quickly with even a small fraction of partial observations, exhibiting the power of the proposed framework. Applying a mean-field approximation, we find that a discrimination indicator integrating all relevant factors determines the accuracy of the reconstruction, which theoretically deepens our understanding of the relation between reconstructability and network characteristics. Thus, the indicator enables us to devise the best strategy for allocating a limited budget and to obtain the highest accuracy for the given network model, i.e., the optimal reconstruction. Notice that although our analysis was illustrated with the configuration model, which is based on the degree sequence, we found that the empirical results did not rely on the degree sequence. Finally, we investigate the performance from a dynamical view, finding that our proposed framework can accurately predict dynamic processes taking place on multiplex networks, and we analyze the impact of network characteristics (e.g., average degree, heterogeneity, and overlap of edges in different layers) on performance.

To the best of our knowledge, we provide here a comprehensive mathematical framework for reconstructing the layer structure of a multiplex network, although several good related works exist. For example, Bianconi et al. studied an ensemble consisting of all multiplex networks subject to a given degree sequence, which is a constraint in their case^{43}. In our work, however, the degree sequence is the parameter (of the configuration model) to be estimated, a fundamental distinction between our work and Bianconi’s. Thus, our framework provides insight into the relation between the structure and the reconstructability of multiplex networks, and our findings reveal essential features of multiplex network reconstruction. In future work, we will focus on improving the reconstruction performance using data other than node-pair interactions that can be helpful for reconstructing multiplex networks. We took the configuration model as an illustration for our framework, since it is a relatively general model that allows an arbitrary degree sequence. However, it remains necessary to study how well the framework performs with group-based or centrality-based models. It would also be interesting to reconstruct multiplex networks through the dynamics and steady state of the system^{44,45}. We believe that our proposed framework will have an impact on many applications, including link prediction, missing-link recovery, and spurious-link detection in the biological, social, and transportation domains.

## Methods

### Expectation maximization framework

In this section, we present the details of obtaining the maximum likelihood estimate of the network model parameter *θ*. The primary objective is to find the *θ* that maximizes the likelihood function \(P({A}^{{{{{{{{\mathcal{O}}}}}}}}},{{\Gamma }}| \theta )\). In practice, we maximize its logarithm \(l(\theta )=\ln P({A}^{{{{{{{{\mathcal{O}}}}}}}}},{{\Gamma }}| \theta )\) rather than \(P({A}^{{{{{{{{\mathcal{O}}}}}}}}},{{\Gamma }}| \theta )\) itself for computational convenience. By the law of total probability, we have

which is a summation over all possible potential multiplex structures **M**. Then, employing Jensen’s inequality, we have

where the inequality holds as long as *Q*(**M**) is a distribution of the multiplex structure **M** satisfying ∑_{M}*Q*(**M**) = 1. For simplicity, we denote the right-hand side of equation (16) by *J*(*Q*, *θ*), i.e.,

Notice that *J* is a function of the distribution *Q*(**M**) and the parameter *θ* simultaneously, and it is obviously a lower-bound function of the log-likelihood function *l*(*θ*).

In the expectation-maximization (EM) algorithm^{34}, we maximize the lower-bound function *J* by recursively executing two steps: the E-step and the M-step. In the E-step, we maximize *J*(*Q*, *θ*) over *Q* while keeping *θ* fixed. It is easy to see that the equality in Eq. (16) holds if and only if *Q*(**M**) is the posterior distribution of the multiplex structure **M**, i.e.,

In the M-step, we differentiate Eq. (17) with respect to *θ* while fixing *Q*(**M**), and find the solution to the following equation

Notice that the term \({\sum }_{{{{{{{{\bf{M}}}}}}}}}Q({{{{{{{\bf{M}}}}}}}})\ln P({A}^{{{{{{{{\mathcal{O}}}}}}}}},{{\Gamma }},{{{{{{{\bf{M}}}}}}}}| \theta )\) is the posterior expectation of the log-likelihood function \(\ln P({A}^{{{{{{{{\mathcal{O}}}}}}}}},{{\Gamma }},{{{{{{{\bf{M}}}}}}}}| \theta )\) with respect to the distribution *Q*(**M**). Thus, given an arbitrary initial value denoted by *θ*^{(0)}, we can iteratively update the distribution *Q*(**M**) and the parameter *θ* until convergence. The two steps can be written as the iteration scheme

Next, we briefly prove that the series {*θ*^{(k)}} converges to a value that maximizes the log-likelihood. On the one hand,

We can see that the sequence {*l*(*θ*^{(k)})} monotonically increases as *k* grows. On the other hand, the log-likelihood sequence {*l*(*θ*^{(k)})} obviously has an upper bound, i.e., \(\max \ln P({A}^{{{{{{{{\mathcal{O}}}}}}}}},{{\Gamma }}| \theta )\). Then {*θ*^{(k)}} converges to a value that locally maximizes the log-likelihood function *l*(*θ*)^{46}. In practice, however, the log-likelihood function may have more than one local maximum in more complex situations, and the EM algorithm is not guaranteed to converge to the global one. To overcome this problem, we repeatedly try different random initial values for the parameter, and take the converged solution with the largest likelihood as the global maximum^{47}.
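The E-step/M-step iteration and the random-restart strategy can be made concrete with a classic toy problem unrelated to the paper's network model: two coins with unknown biases, where each trial of *m* flips comes from one coin chosen uniformly at random. The sketch below is purely illustrative.

```python
import math
import random

def em_two_coins(heads, m, iters=300, restarts=10, seed=0):
    """EM for a uniform mixture of two coins; each trial reports the
    number of heads in m flips. Random restarts guard against
    convergence to a local maximum, as discussed in the text."""
    rng = random.Random(seed)
    def lik(h, th):  # binomial likelihood up to a constant factor
        th = min(max(th, 1e-9), 1 - 1e-9)
        return th ** h * (1 - th) ** (m - h)
    best_ll, best = -math.inf, None
    for _ in range(restarts):
        ta, tb = rng.random(), rng.random()   # random initial guesses
        for _ in range(iters):
            # E-step: posterior responsibility of coin A for each trial
            w = [lik(h, ta) / (lik(h, ta) + lik(h, tb)) for h in heads]
            # M-step: weighted maximum-likelihood update of both biases
            ta = sum(wi * h for wi, h in zip(w, heads)) / (m * sum(w) + 1e-12)
            tb = sum((1 - wi) * h for wi, h in zip(w, heads)) \
                / (m * sum(1 - wi for wi in w) + 1e-12)
        ll = sum(math.log(0.5 * lik(h, ta) + 0.5 * lik(h, tb)) for h in heads)
        if ll > best_ll:
            best_ll, best = ll, sorted((ta, tb))
    return best

# heads counts drawn from coins with biases near 0.15 and 0.85
est = em_two_coins([2, 1, 9, 8, 2, 9, 1, 8], m=10)
```

Each restart produces a monotonically increasing likelihood sequence, exactly as in the proof above; keeping the restart with the largest final likelihood implements the global-maximum heuristic.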

### Evaluation indices for reconstruction

Accuracy, precision, recall, and the AUC (area under the receiver operating characteristic curve) have been widely adopted to evaluate classification methods^{48}. Accuracy is the fraction of true results (both true positives and true negatives) among the total number of tests, i.e., accuracy = (TP + TN)/(TP + TN + FP + FN). Precision gives the probability that a link exists in the real network when its reliability \({Q}_{ij}^{\alpha } > q\), i.e., precision = TP/(TP + FP). Recall is the proportion of true positives among all real links, i.e., recall = TP/(TP + FN). In addition, links whose reliability \({Q}_{ij}^{\alpha }=q\) contribute half to these proportions. The AUC quantifies the probability that the proposed method ranks a positive instance higher than a negative one: all tested links are ranked in decreasing order of reliability, and the probability that a real link has a higher reliability than a nonexistent link is calculated. For imbalanced data sets (networks tend to be sparse), the area under the precision-recall curve (AUPRC) is also a meaningful metric. Moreover, the Matthews correlation coefficient (MCC) is another popular choice, producing a more informative and truthful score for evaluating binary classifications^{49}. These six metrics are used to evaluate the proposed framework on the nine real-world data sets (Supplementary Figs. 6 and 7).
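These definitions translate directly into code. A minimal sketch (AUPRC omitted; the AUC counts ties as half a win, matching the convention above):

```python
import math

def binary_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and MCC from confusion counts,
    following the definitions in the text."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, prec, rec, mcc

def auc(pos_scores, neg_scores):
    """Probability that a real link gets a higher reliability score than
    a nonexistent one, counting ties as half a win."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))
```

Here `pos_scores` and `neg_scores` are the reliability values \({Q}_{ij}^{\alpha }\) of real and nonexistent test links, respectively.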

### Partial observation sampling

In this section, we clarify how partial observations are sampled in our simulations. We sample them by vertex sampling for a given fraction *c*. Specifically, a large undirected network with *N* nodes has approximately \(\frac{{N}^{2}}{2}\) pairs of nodes. Note that once all of these \(\frac{{N}^{2}}{2}\) pairs are observed, the network is fully determined and no edge probability needs to be inferred. We then randomly choose \(n=\sqrt{c}\cdot N\) nodes, so that the subgraph induced by these nodes contains \(\frac{{(\sqrt{c}\cdot N)}^{2}}{2}=c\cdot \frac{{N}^{2}}{2}\) node pairs of the original network. For each value of *c*, we repeated the reconstruction 100 times to average out randomness, and drew error bars in the relevant figures of the manuscript.
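The counting argument above can be checked with a short sketch (function name and interface are our own, for illustration):

```python
import math
import random

def vertex_sample(n_nodes, c, seed=0):
    """Vertex sampling for a fraction c of observed node pairs: choose
    sqrt(c)*N nodes; the pairs among them give about c * N^2/2
    observations, matching the counting argument in the text."""
    rng = random.Random(seed)
    n = round(math.sqrt(c) * n_nodes)
    chosen = rng.sample(range(n_nodes), n)
    pairs = [(u, v) for i, u in enumerate(chosen) for v in chosen[i + 1:]]
    return chosen, pairs

chosen, pairs = vertex_sample(100, 0.25)  # 50 nodes, 50*49/2 = 1225 pairs
```

For *N* = 100 and *c* = 0.25, the 1225 induced pairs are close to *c* · *N*²/2 = 1250, with the small gap coming from the diagonal terms that vanish for large *N*.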

### Mean-field approximation

Our goal in this section is to clarify, by mean-field approximation, the discrimination indicator introduced in the Results section. First, we calculate the information entropy \({{{{{{{\mathcal{H}}}}}}}}({{{{{{{\bf{M}}}}}}}}| {A}^{{{{{{{{\mathcal{O}}}}}}}}})\), which is determined by the ratio of the average degrees *r*, the overlap of edges *v*, and the fraction of partial observations *c*.

Notice that *r* is the ratio of the average degrees of the two layers and *v* is the overlap of their edge sets; neither is available when we only have partial observations, so they are estimated by \(\hat{r}(r,c)\) and \(\hat{v}(v,c)\). Since the average degree \(\left\langle {k}_{\alpha }\right\rangle\) is a mesoscale property, we estimate \(\hat{\left\langle {k}_{\alpha }\right\rangle }\) by \(E[\left\langle {k}_{\alpha }\right\rangle ]={\sum }_{{{{{{{{\bf{M}}}}}}}}}[Q({{{{{{{\bf{M}}}}}}}})\cdot \left\langle {k}_{\alpha }\right\rangle ({{{{{{{\bf{M}}}}}}}})]\), i.e., the expectation of \(\left\langle {k}_{\alpha }\right\rangle\) over all potential multiplex structures. The results shown in Supplementary Fig. 8 allow us to approximate \(\hat{\left\langle {k}_{\alpha }\right\rangle }\) by

Thus, we have

Notice that we can approximate \(\hat{r}\) by \(\hat{r}\approx {r}^{c}\) and \(\hat{v}\) by \(\hat{v}\approx c\cdot v\) for simplicity in practice. We also designed nonlinear approximations with higher precision; however, approximations that are more precise but more complex only slightly improve the final results (see Supplementary Note 7 for evaluations of the approximations).
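The two linear approximations can be written down directly; note the limits they imply (a sketch of the text's simple formulas only, not the nonlinear variants):

```python
def estimate_r_v(r, v, c):
    """Mean-field estimates from the text: r_hat ≈ r**c, v_hat ≈ c*v,
    where c is the fraction of observed node pairs."""
    return r ** c, c * v

# with full observation (c = 1) the estimates coincide with r and v;
# with no observation (c = 0) the layers are indistinguishable:
# r_hat = 1 (equal apparent average degrees) and v_hat = 0 (no observed overlap)
```

These limits are consistent with the role of the estimates in the entropy calculation: with fewer observations, the apparent discrimination between layers shrinks.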

Next, we explore the expression of \({{{{{{{\mathcal{H}}}}}}}}\) with parameters *c*, *r*, and *v*. Supposing that links between different nodes are independent, we obtain

For the OR-aggregation mechanism, \({{{{{{{\mathcal{H}}}}}}}}({M}_{ij}^{1}| {A}_{ij}^{{{{{{{{\mathcal{O}}}}}}}}})\) and \({{{{{{{\mathcal{H}}}}}}}}({M}_{ij}^{2}| {A}_{ij}^{{{{{{{{\mathcal{O}}}}}}}}})\) are both equal to 0 if \({A}_{ij}^{{{{{{{{\mathcal{O}}}}}}}}}=0\). When \({A}_{ij}^{{{{{{{{\mathcal{O}}}}}}}}}=1\), we have

where

and

Thus, we obtain

We empirically find that the accuracy of the reconstruction is linearly determined by the product of \({{{{{{{\mathcal{H}}}}}}}}\) and a scaling factor *ρ*, i.e.,

where

Notice that

and

When observing an edge in layer *α*, the probability that the edge exists in the other layer satisfies

Thus, the term (1 − *v*)/(1 + *v*) in Eq. (33) indicates the fraction of uncertainty that can be reduced by partial observations, and *s* describes the scale at which partial observations reduce the uncertainty of links in the testing set.

### Budget allocation

We consider the case when we have different fractions of partial observations in each layer denoted by *c*_{1} and *c*_{2}. We denote \({{{{{{{\mathcal{H}}}}}}}}({M}^{1}| {A}^{{{{{{{{\mathcal{O}}}}}}}}})\) by \({{{{{{{{\mathcal{H}}}}}}}}}_{1}\) for simplicity, and use \(\bar{c}=({c}_{1}+{c}_{2})/2\) to denote the given budget. Empirically, we can predict the accuracy by the function *F* defined by

where

and

Once a certain budget (\(\bar{c}\)) is given, we regard the function *F* as a unary function of *c*_{1}, i.e.,

We next study how *F* varies with *c*_{1}, and we first prove that \({{{{{{{{\mathcal{H}}}}}}}}}_{1}\ge {{{{{{{{\mathcal{H}}}}}}}}}_{2}\). According to the definition,

where

Notice that the function

is a monotone increasing function when \(0\ < \ x\ \le\ \frac{1}{2}\), and a monotone decreasing function when \(\frac{1}{2}\,\le\, x \, < \, 1\). For \({\bar{p}}_{2}\), we have

When \({\bar{p}}_{1}\le \frac{1}{2}\),

indicating \(1/2\le 1-{\bar{p}}_{1}\le {\bar{p}}_{2}\). Thus, \(f(1-{\bar{p}}_{1})\ge f({\bar{p}}_{2})\), i.e., \({{{{{{{{\mathcal{H}}}}}}}}}_{1}\ge {{{{{{{{\mathcal{H}}}}}}}}}_{2}\). When \({\bar{p}}_{1}\ge \frac{1}{2}\),

indicating \(1/2\le {\bar{p}}_{1}\le {\bar{p}}_{2}\). Thus, \(f({\bar{p}}_{1})\ge f({\bar{p}}_{2})\), i.e., \({{{{{{{{\mathcal{H}}}}}}}}}_{1}\ge {{{{{{{{\mathcal{H}}}}}}}}}_{2}\).

We then consider the maxima of the function *F*. When \(\bar{c}\le 1/2\) (\({c}_{1}\in [0,2\bar{c}]\)), we have

and

Thus, we have

indicating \(F(0)\le F(2\bar{c})\).

When \(\bar{c}\ > \ 1/2\) (\({c}_{1}\in [2\bar{c}-1,1]\)), we have

and

Thus, we have

indicating \(F(2\bar{c}-1)\le F(1)\).

### Synthetic networks generation

To explore the framework proposed in this article in depth, we generate several synthetic networks with various network characteristics for performance evaluation. Here we mainly focus on two-layer networks with different *r*, *v*, and \(\cos \langle \overrightarrow{{d}^{1}},\overrightarrow{{d}^{2}}\rangle\), as defined in the Results section.

As shown in Fig. 4b, we test synthetic networks over the ranges 0 < *r* ≤ 1 and 0 ≤ *v* ≤ 1/2. We first clarify how to generate a multiplex network with given *r*^{*} and *v*^{*}. Given *r*^{*} and *v*^{*}, we generate the adjacency matrix *M*^{1} (the first layer of the multiplex network) with the Erdős-Rényi model, in which the edge \({M}_{ij}^{1}\) between any two nodes *i* and *j* follows the Bernoulli distribution

Without loss of generality, we take \(p=\frac{5}{N-1}\) so that the average degree \(\left\langle {k}_{1}\right\rangle =5\), avoiding overly sparse layers. Then, we generate the adjacency matrix *M*^{2} (the second layer of the multiplex network) in a specific way, where \({M}_{ij}^{2}\) follows the distribution

and

Next we prove that the parameters *r* and *v* of the generated multiplex network **M** satisfy *r* = *r*^{*} and *v* = *v*^{*}.

Obviously, the expectation for average degree \(\left\langle {k}_{1}\right\rangle\) satisfies

and the expectation for average degree \(\left\langle {k}_{2}\right\rangle\) satisfies

Thus,

Then, the expectation of \(| {{{{{{{{\mathcal{E}}}}}}}}}_{1}\cap {{{{{{{{\mathcal{E}}}}}}}}}_{2}|\) satisfies

and the expectation of \(| {{{{{{{{\mathcal{E}}}}}}}}}_{1}\cup {{{{{{{{\mathcal{E}}}}}}}}}_{2}|\) satisfies

Thus,

As shown in Fig. 4c, we test a number of synthetic multiplex networks ranging over \(0 < \cos \langle \overrightarrow{{d}^{1}},\overrightarrow{{d}^{2}}\rangle < 1\). Since \(\cos \langle \overrightarrow{{d}^{1}},\overrightarrow{{d}^{2}}\rangle\) is determined by the degree sequences of the two layers, we generate multiplex networks with a given expectation of the degree sequences. Specifically, we first randomly generate positive vectors \(\overrightarrow{{d}^{1}}\) and \(\overrightarrow{{d}^{2}}\) such that \(\cos \langle \overrightarrow{{d}^{1}},\overrightarrow{{d}^{2}}\rangle\) ranges from 0 to 1. Then, we generate a number of multiplex networks with each layer generated from the given degree sequence \({\overrightarrow{d}}^{\alpha }\). For the adjacency matrix *M*^{α}, the element \({M}_{ij}^{\alpha }\) follows a Bernoulli distribution, i.e., \(P({M}_{ij}^{\alpha }=1)=\frac{{d}^{\alpha }(i)\cdot {d}^{\alpha }(j)}{|| {\overrightarrow{d}}^{\alpha }|| { }_{1}-1}\). This generating process can also provide multiplex networks with different 0 < *r*_{h} ≤ 1, as shown in Supplementary Fig. 12a, b, c, since we can also generate the degree sequences with a given variance.
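The stated edge probability yields a Chung-Lu-style generator. A minimal sketch (the degree vectors below are hypothetical toy inputs, and probabilities are capped at 1 for safety):

```python
import math
import random

def layer_from_degrees(d, seed=0):
    """One layer sampled with P(M_ij = 1) = d(i)*d(j)/(||d||_1 - 1),
    the Bernoulli rule stated in the text."""
    rng = random.Random(seed)
    n, s = len(d), sum(d) - 1
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < min(1.0, d[i] * d[j] / s):
                adj[i][j] = adj[j][i] = 1
    return adj

def cosine(d1, d2):
    """Cosine similarity between two degree sequences."""
    dot = sum(a * b for a, b in zip(d1, d2))
    return dot / (math.sqrt(sum(a * a for a in d1))
                  * math.sqrt(sum(b * b for b in d2)))

d1 = [4, 3, 3, 2, 2, 2, 1, 1]   # hypothetical expected degree sequences
d2 = [1, 1, 2, 2, 3, 3, 3, 4]
M1, M2 = layer_from_degrees(d1, seed=1), layer_from_degrees(d2, seed=2)
sim = cosine(d1, d2)
```

Sweeping the target degree sequences while holding their norms fixed varies `sim` between 0 and 1, which is how networks with prescribed \(\cos \langle \overrightarrow{{d}^{1}},\overrightarrow{{d}^{2}}\rangle\) can be produced.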

## Data availability

All data needed to evaluate the conclusions in the paper are available online as follows. The C. elegans multiplex connectome dataset used in this study is available at https://comunelab.fbk.eu/data.php. The London multiplex transportation network is available at https://comunelab.fbk.eu/data.php. The temporal social interactions at the SFHH (La Société française d’Hygiène Hospitalière) conference are available at www.sociopatterns.org/datasets/sfhh-conference-data-set/. The multiplex GPI (genetic and protein interactions) network of Saccharomyces Pombe is available at https://comunelab.fbk.eu/data.php. The Yeast landscape multiplex interaction networks of genes are available at https://comunelab.fbk.eu/data.php. The multiplex air transportation network of Europe is available at http://complex.unizar.es/~atnmultiplex/. The multiplex air transportation network of the U.S.A. is available at http://stat-computing.org/dataexpo/2009/the-data.html. The temporal network of Wikipedia users editing each other’s Talk page is available at http://snap.stanford.edu/data/wiki-talk-temporal.html. The CollegeMsg temporal social network is available at http://snap.stanford.edu/data/CollegeMsg.html.

## Code availability

Code reproducing the key results of this paper is available from the corresponding author upon reasonable request.

## References

1. Gao, J., Buldyrev, S. V., Stanley, H. E. & Havlin, S. Networks formed from interdependent networks. *Nat. Phys.* **8**, 40–48 (2012).
2. Boccaletti, S. et al. The structure and dynamics of multilayer networks. *Phys. Rep.* **544**, 1–122 (2014).
3. Kivelä, M. et al. Multilayer networks. *J. Complex Netw.* **2**, 203–271 (2014).
4. Bianconi, G. *Multilayer Networks: Structure and Function* (Oxford University Press, 2018).
5. Moreno, Y. & Perc, M. Focus on multilayer networks. *N. J. Phys.* **22**, 010201 (2019).
6. Cozzo, E., de Arruda, G. F., Rodrigues, F. A. & Moreno, Y. *Multiplex Networks: Basic Formalism and Structural Properties*, 7–20 (Springer, 2018).
7. De Domenico, M., Solé-Ribalta, A., Omodei, E., Gómez, S. & Arenas, A. Ranking in interconnected multilayer networks reveals versatile nodes. *Nat. Commun.* **6**, 6868 (2015).
8. Zeng, G. et al. Switch between critical percolation modes in city traffic dynamics. *Proc. Natl Acad. Sci. USA* **116**, 23–28 (2019).
9. Cardillo, A. et al. Emergence of network features from multiplexity. *Sci. Rep.* **3**, 1344 (2013).
10. Wu, M. et al. A tensor-based framework for studying eigenvector multicentrality in multilayer networks. *Proc. Natl Acad. Sci. USA* **116**, 15407–15413 (2019).
11. Buldyrev, S. V., Parshani, R., Paul, G., Stanley, H. E. & Havlin, S. Catastrophic cascade of failures in interdependent networks. *Nature* **464**, 1025–1028 (2010).
12. Liu, X., Eugene Stanley, H. & Gao, J. Breakdown of interdependent directed networks. *Proc. Natl Acad. Sci. USA* **113**, 1138–1143 (2016).
13. Gao, J., Buldyrev, S. V., Havlin, S. & Stanley, H. E. Robustness of a network of networks. *Phys. Rev. Lett.* **107**, 195701 (2011).
14. Danziger, M. M., Bonamassa, I., Boccaletti, S. & Havlin, S. Dynamic interdependence and competition in multilayer networks. *Nat. Phys.* **15**, 178 (2019).
15. De Domenico, M., Solé-Ribalta, A., Gómez, S. & Arenas, A. Navigability of interconnected networks under random failures. *Proc. Natl Acad. Sci. USA* **111**, 8351–8356 (2014).
16. Pastor-Satorras, R., Castellano, C., Van Mieghem, P. & Vespignani, A. Epidemic processes in complex networks. *Rev. Mod. Phys.* **87**, 925 (2015).
17. Li, W., Tian, L., Gao, X. & Pan, B. Impacts of information diffusion on green behavior spreading in multiplex networks. *J. Clean. Prod.* **222**, 488–498 (2019).
18. Alvarez-Zuzek, L., Di Muro, M., Havlin, S. & Braunstein, L. Dynamic vaccination in partially overlapped multiplex network. *Phys. Rev. E* **99**, 012302 (2019).
19. Chinazzi, M. et al. The effect of travel restrictions on the spread of the 2019 novel coronavirus (COVID-19) outbreak. *Science* **368**, 395–400 (2020).
20. Yan, G. et al. Network control principles predict neuron function in the Caenorhabditis elegans connectome. *Nature* **550**, 519 (2017).
21. Cook, S. J. et al. Whole-animal connectomes of both Caenorhabditis elegans sexes. *Nature* **571**, 63–71 (2019).
22. White, J. G., Southgate, E., Thomson, J. N. & Brenner, S. The structure of the nervous system of the nematode Caenorhabditis elegans. *Philos. Trans. R. Soc. Lond. B Biol. Sci.* **314**, 1–340 (1986).
23. Gauvin, L., Panisson, A., Cattuto, C. & Barrat, A. Activity clocks: spreading dynamics on temporal networks of human contact. *Sci. Rep.* **3**, 3099 (2013).
24. Lacasa, L. et al. Multiplex decomposition of non-Markovian dynamics and the hidden layer reconstruction problem. *Phys. Rev. X* **8**, 031038 (2018).
25. Jalili, M., Orouskhani, Y., Asgari, M., Alipourfard, N. & Perc, M. Link prediction in multiplex online social networks. *R. Soc. Open Sci.* **4**, 160863 (2017).
26. Pujari, M. & Kanawati, R. Link prediction in multiplex networks. *Netw. Heterog. Media* **10**, 17 (2015).
27. Peel, L., Larremore, D. B. & Clauset, A. The ground truth about metadata and community detection in networks. *Sci. Adv.* **3**, e1602548 (2017).
28. Massucci, F. A. et al. Inferring propagation paths for sparsely observed perturbations on complex networks. *Sci. Adv.* **2**, e1501638 (2016).
29. Taylor, D., Caceres, R. S. & Mucha, P. J. Super-resolution community detection for layer-aggregated multilayer networks. *Phys. Rev. X* **7**, 031056 (2017).
30. Ma, C., Chen, H.-S., Li, X., Lai, Y.-C. & Zhang, H.-F. Data based reconstruction of duplex networks. *SIAM J. Appl. Dyn. Syst.* **19**, 124–150 (2020).
31. Valles-Catala, T., Massucci, F. A., Guimera, R. & Sales-Pardo, M. Multilayer stochastic block models reveal the multilayer structure of complex networks. *Phys. Rev. X* **6**, 011036 (2016).
32. Santoro, A. & Nicosia, V. Algorithmic complexity of multiplex networks. *Phys. Rev. X* **10**, 021069 (2020).
33. Peixoto, T. P. Parsimonious module inference in large networks. *Phys. Rev. Lett.* **110**, 148701 (2013).
34. Dempster, A. P., Laird, N. M. & Rubin, D. B. Maximum likelihood from incomplete data via the EM algorithm. *J. R. Stat. Soc. Ser. B (Methodol.)* **39**, 1–22 (1977).
35. Lehmann, E. L. *Elements of Large-Sample Theory* (Springer Science & Business Media, 2004).
36. Newey, W. K. & McFadden, D. Large sample estimation and hypothesis testing. *Handb. Econom.* **4**, 2111–2245 (1994).
37. Newman, M. E. The structure and function of complex networks. *SIAM Rev.* **45**, 167–256 (2003).
38. Courtney, O. T. & Bianconi, G. Generalized network structures: the configuration model and the canonical ensemble of simplicial complexes. *Phys. Rev. E* **93**, 062311 (2016).
39. Newman, M. *Networks* (Oxford University Press, 2018).
40. De Bacco, C., Power, E. A., Larremore, D. B. & Moore, C. Community detection, link prediction, and layer interdependence in multilayer networks. *Phys. Rev. E* **95**, 042317 (2017).
41. Tarrés-Deulofeu, M., Godoy-Lorite, A., Guimera, R. & Sales-Pardo, M. Tensorial and bipartite block models for link prediction in layered networks and temporal networks. *Phys. Rev. E* **99**, 032307 (2019).
42. Newman, M. Network structure from rich but noisy data. *Nat. Phys.* **14**, 542 (2018).
43. Bianconi, G. Statistical mechanics of multiplex networks: entropy and overlap. *Phys. Rev. E* **87**, 062806 (2013).
44. Gao, J., Barzel, B. & Barabási, A.-L. Universal resilience patterns in complex networks. *Nature* **530**, 307–312 (2016).
45. Jiang, C., Gao, J. & Malik, M.-I. True nonlinear dynamics from incomplete networks. *Proc. AAAI Conf. Artif. Intell.* **34**, 131–138 (2020).
46. Moon, T. K. The expectation-maximization algorithm. *IEEE Signal Process. Mag.* **13**, 47–60 (1996).
47. Newman, M. E. & Clauset, A. Structure and inference in annotated networks. *Nat. Commun.* **7**, 11863 (2016).
48. Storey, J. D. et al. The positive false discovery rate: a Bayesian interpretation and the q-value. *Ann. Stat.* **31**, 2013–2035 (2003).
49. Chicco, D. & Jurman, G. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. *BMC Genomics* **21**, 1–13 (2020).

## Acknowledgements

We thank Prof. H. Eugene Stanley for enlightening discussions and constructive suggestions. We would like to additionally thank the anonymous reviewers for their comprehensive feedback. This work was supported by NSFC 62088101 Autonomous Intelligent Unmanned Systems, NSFC under grant No. U1909207, and Key Research and Development Program of Zhejiang Province (Grant No. 2021C03037). Shlomo H. thanks the Israel Science Foundation (Grant No. 189/19), the NSF-BSF (Grant No. 2019740), the EU H2020 project RISE (Project No. 821115), the EU H2020 DIT4TRAM, and DTRA (Grant No. HDTRA-1-19-1-0016) for financial support. J.G. acknowledges the support of the U.S. National Science Foundation under Grant No. 2047488 and the Rensselaer-IBM AI Research Collaboration.

## Author information

### Authors and Affiliations

### Contributions

J.C., Shibo H., and J.G. conceived and designed the project. M.W. performed theoretical derivation and numerical simulation. J.C., Shibo H., J.G., and Shlomo H. contributed to the theoretical interpretation of the empirical data. All authors analyzed the results. M.W., J.C., Shibo H., and J.G. wrote the manuscript. Y.S. and Shlomo H. edited the manuscript.

### Corresponding authors

## Ethics declarations

### Competing interests

The authors declare no competing interests.

## Peer review

### Peer review information

*Communications Physics* thanks Xiangyun Gao, Nikita Frolov and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

## Additional information

**Publisher’s note** Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## Supplementary information

## Rights and permissions

**Open Access** This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

## About this article

### Cite this article

Wu, M., Chen, J., He, S. *et al.* Discrimination reveals reconstructability of multiplex networks from partial observations.
*Commun Phys* **5**, 163 (2022). https://doi.org/10.1038/s42005-022-00928-w

Received:

Accepted:

Published:

DOI: https://doi.org/10.1038/s42005-022-00928-w

## Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.