Discrimination reveals reconstructability of multiplex networks from partial observations

The quality of a link prediction method for multiplex networks is reflected in its ability to reconstruct them accurately. Although link prediction methods perform well at estimating the existence probability of each potential link in monoplex networks from a set of partially observed links, we lack a mathematical tool to reconstruct a multiplex network from the observed aggregate topology and partially observed links. Here, we fill this gap by developing a theoretical and computational framework that builds a probability space over possible structures via maximum likelihood estimation. We then discover that the discrimination, an indicator quantifying differences between layers from an entropy perspective, determines the reconstructability, i.e., the accuracy of such reconstruction. This finding enables us to design the optimal strategy for allocating the set of observed links across layers to achieve the optimal reconstruction of multiplex networks. Finally, the theoretical analyses are corroborated by empirical results from biological, social, and engineered systems, and by a large volume of synthetic networks.

Multiplex networks, composed of a collection of layers sharing the same set of nodes, can describe multiple types of interactions between nodes in different layers more precisely than the corresponding aggregate networks [1][2][3][4][5][6] . Neglecting the layer topology of the multiplex structure may lead to significantly inaccurate predictions [7][8][9][10] . For example, removing a tiny fraction of nodes in one layer may cause cascading failures between layers and a catastrophic collapse of the entire multiplex network 11 (Fig. 1a). However, the corresponding aggregate network may be predicted to remain unscathed for the same set of node removals (Fig. 1b), which is corroborated by recent results showing that an interdependent network is significantly more vulnerable to random failures than its isolated layers and than the aggregate network [12][13][14] . Another example is the random walk process over a multiplex transportation network. The cost of switching between two layers determines the navigability of a multiplex transportation network 15 . The walker may need an extra cost to switch from one layer to another (Fig. 1c), while one can walk along any link without extra cost within the aggregate network (Fig. 1d). Such an extra cost causes a lower coverage of the nodes visited by the walker in a multiplex transportation network. A further example is the dynamics of a spreading process in a temporal network, modeling rumor circulation in a social network or a disease outbreak in a susceptible population [16][17][18][19] . The topology of a temporal network may change at each time step (Fig. 1e), while it is regarded as static in the aggregate network (Fig. 1f), leading to a lower infected fraction in the temporal network compared to the aggregate one.
However, it is practically difficult to obtain data representing the detailed layers of a full multiplex topology accurately, since it is costly, time-consuming, and sometimes impossible to measure all types of interactions and the heterogeneity of nodes, especially in a large-scale complex system. By contrast, the aggregate network of a multiplex network is relatively easy to obtain, since it aggregates all interactions without distinguishing their specific types. Researchers, for example, can construct the entire connectome of the Caenorhabditis elegans neural system 20,21 (Supplementary Fig. 1), potentially offering a better understanding of the brain's functionality. However, this information is of limited use, as discussed above, because the multiplex layer topology, which reflects the type of interaction (e.g., gap junction or synapse) between any two connected neurons, cannot be identified without extensive experiments 22 . Analogous cases widely appear in various aspects of life, including social networks 23 and transportation networks 24 . Since dynamical phenomena on a multiplex network differ significantly from those on its aggregate counterpart, there is a pressing need for effective tools that can leverage limited information (the aggregate network) to accurately and efficiently predict unobserved links in each layer.
Recent studies have attempted to predict links in multiplex networks based on the structures (e.g., multiplex links and multilayer neighborhoods) when partial observations of a subnetwork in each layer are available 25,26 . These methods are efficient at retrieving missing links in different layers; however, they ignore the useful information of the aggregate network. Building on the results of link prediction, a more exciting and vital issue in multiplex networks is developing an efficient reconstruction method based on knowledge of both the aggregate topology and partial observations of the layers, going beyond link prediction. Recent research demonstrates that one can reconstruct a multiplex network given a generating model (e.g., the stochastic block model) [27][28][29] or when specific dynamics data (e.g., a random walk or a spreading process) are available 24,30 . Furthermore, efforts have been devoted to estimating the number of multiplex layers from knowledge of the aggregate network or a dynamic process 24,31 . However, to the best of our knowledge, a framework that reconstructs multiplex layer structures and displays the specific topology of each hidden layer, given the number of layers, is still lacking. Thus, the following fundamental questions remain open.
The same aggregate network can be generated by different combinations of single-layer networks, which is a combinatorial optimization problem. There is an enormous number of possible mappings from the potential multiplex layer structures to the observed aggregate topology. Specifically, the probability space composed of potential multiplex structures contains an exponential number, (2^L − 1)^{|A^O|}, of possibilities, where L is the number of layers and |A^O| the number of links in the aggregate network (see Supplementary Fig. 2 for more details). Q1. Can one conceive a low-complexity framework to reconstruct multiplex layer structures, avoiding the enormous cost of ergodic methods 32 ? Various characteristics of a multiplex structure affect the reconstructability. For example, the average degrees and the overlap of edges in different layers have different impacts on the reconstruction accuracy. Q2. Is there an indicator that can quantify the fundamental relation between the reconstructability and diverse network characteristics? More known links yield higher reconstruction accuracy but are more costly. Furthermore, there is a huge number of possibilities for allocating a limited budget across layers, which is also a combinatorial optimization problem. Q3. Is there an optimal strategy to allocate a limited budget that enables the highest reconstruction accuracy for various multiplex networks using the above indicator?
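To make the combinatorial blow-up concrete, the count of candidate structures can be computed directly; under the OR aggregation each aggregate edge can be assigned to any non-empty subset of the L layers (a minimal sketch; the function name is ours):

```python
def multiplex_space_size(L, n_links):
    """Number of candidate multiplex structures consistent with an
    aggregate network of n_links edges under the OR mechanism: each
    aggregate edge must exist in at least one of the L layers, giving
    2**L - 1 admissible layer assignments per edge."""
    return (2 ** L - 1) ** n_links

# Even small systems are hopeless to enumerate exhaustively:
# two layers and 30 aggregate links give 3**30, roughly 2 x 10**14 candidates.
print(multiplex_space_size(2, 30))
```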
In this article, we propose a mathematical and computational framework that can reconstruct a multiplex network and predict dynamic processes on it. We find a discrimination indicator derived from information entropy, integrating multiple network characteristics, that linearly determines the reconstruction accuracy of multiplex networks. This discovery enables us to design an optimal strategy for allocating a fixed budget of partial observations across layers, promoting the optimal reconstruction of the layers of multiplex networks for the considered network model. Empirical results based on nine real-world multiplex networks and several synthetic networks corroborate our analytical results (see Supplementary Table 1 for details of the real-world datasets). Therefore, our answer to each of the open questions above is "yes".

Results
Framework for reconstructing multiplex layer structures. We first introduce the notation by denoting the adjacency matrix of layer α in a multiplex network M by M^α (α = 1, 2, ⋯, L), with M^α_ij = 1 if there is an edge between nodes i, j in layer α and M^α_ij = 0 otherwise (i, j = 1, 2, ⋯, N). It is hypothesized that each multiplex network M is generated by some process such that the probability of generating a multiplex network with adjacency matrices M^α is ∏ α P(M^α|θ), where θ represents the parameters of the process. An aggregate mechanism is a mapping from a multiplex network M to a monoplex (single-layer) topology A^O, where A^O ∈ R^{N×N} is the adjacency matrix. In this article, for illustration, we describe the framework using multiplex networks aggregated by the OR mechanism, the most common case ranging from biological to social networks (see Supplementary Note 1 for other aggregate mechanisms). Then, we have A^O_ij = 1 − ∏_{α=1}^{L}(1 − M^α_ij). The partial observation Γ is a set containing the observed edges in the multiplex network, i.e., Γ = {(i, j, α) | A^O_ij = 1 and M^α_ij is observed}. Supposing that we have the aggregate topology A^O and the partial observation Γ from a multiplex network M with L layers, our goal is to predict whether there is a link between any two nodes i, j in layer α when (i, j, α) ∉ Γ (Fig. 2a). Given an aggregate topology A^O, there are in total L·|A^O| potential links to predict, where |A^O| is the number of edges in the aggregate topology. Among these L·|A^O| potential links, we have already observed card(Γ) links, where card(⋅) denotes the number of elements of a set. Thus, we denote the fraction of partial observations by c (0 ≤ c < 1), the ratio between the number of observed edges in the layers and all edges in the multiplex network, i.e., c = card(Γ)/(L·|A^O|).
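As a minimal illustration of these definitions, the OR aggregation and the observation fraction c can be sketched as follows (the toy edges and helper names are our own):

```python
def aggregate_or(layers):
    """OR aggregation: an aggregate edge exists iff it exists in at
    least one layer.  Each layer is a set of frozenset({i, j}) edges."""
    agg = set()
    for layer in layers:
        agg |= layer
    return agg

def observation_fraction(gamma, agg, L):
    """Fraction c = card(Gamma) / (L * |A^O|) of observed entries."""
    return len(gamma) / (L * len(agg))

# Toy two-layer example on four nodes (edges are hypothetical).
M1 = {frozenset(e) for e in [(0, 1), (1, 2)]}
M2 = {frozenset(e) for e in [(1, 2), (2, 3)]}
A_O = aggregate_or([M1, M2])               # three aggregate edges
gamma = {(0, 1, 1), (1, 2, 2)}             # two observed (i, j, layer) entries
c = observation_fraction(gamma, A_O, L=2)  # 2 / (2 * 3) = 1/3
```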
The first step of the reconstruction is to find the most probable value of θ by maximizing the posterior probability P(θ|A^O, Γ), where θ is the network model parameter. Note that this probability provides a general description for any form of the network parameter θ (see Supplementary Note 2 for specific forms given a certain network model). Since there is no prior knowledge about the parameter θ, we assume it has a uniform distribution, i.e., P(θ) = const. 33 . In this case, by the Bayesian rule, the maximum posterior estimate is equivalent to maximizing the log-likelihood function, i.e., performing maximum likelihood estimation (MLE). The second step is to reconstruct the multiplex structure M by inferring the probability of each possible structure. Since many potential layer structures can produce the same aggregate topology, we denote the probability distribution over all multiplex structures by Q(M), where ∑ M Q(M) = 1. The estimated parameter θ can then reconstruct the multiplex structure through the posterior distribution. However, there is a gap between the observations and the network model parameters θ, since the multiplex structure M is a hidden variable in l(θ) = ln ∑ M P(A^O, Γ, M|θ). Note that the sums over M here are restricted to structures consistent with A^O and Γ. For any distribution Q(M), Jensen's inequality yields a lower bound on l(θ) (Eq. (4)), where equality holds if and only if Eq. (3) is satisfied. Thus, l(θ) and Q(M) are interdependent, and we perform an iterative process to obtain the MLE of the parameter θ and the posterior distribution Q(M) as follows. Given an arbitrary initial value θ (0) , we find the optimized posterior distribution Q (k) (M) by Eq. (3). Then, we update the parameters θ (k) that maximize the right-hand side of Eq. (4) under the posterior distribution Q (k−1) (M), performing coordinate ascent on the log-likelihood function (Fig. 2b).
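The alternating scheme can be illustrated on a deliberately tiny model of our own: two layers with i.i.d. Bernoulli edges, OR-aggregated, and no partial observations, where both EM steps have closed forms. This is a miniature sketch, not the paper's full algorithm:

```python
def em_two_layer(n_pairs, n_agg, theta=(0.6, 0.3), iters=100):
    """Toy EM for a two-layer OR-aggregated network with i.i.d.
    Bernoulli edges (our own miniature).  theta[a] is the per-pair edge
    probability of layer a; n_agg of the n_pairs node pairs carry an
    aggregate edge.

    E-step: for each aggregate edge, the posterior probability that the
            edge is present in layer a is theta_a / z, where
            z = t1 + t2 - t1*t2 is the probability that the OR fires.
    M-step: theta_a becomes the expected number of layer-a edges
            divided by the number of node pairs."""
    t1, t2 = theta
    for _ in range(iters):
        z = t1 + t2 - t1 * t2
        t1, t2 = n_agg * (t1 / z) / n_pairs, n_agg * (t2 / z) / n_pairs
    return t1, t2

t1, t2 = em_two_layer(n_pairs=6, n_agg=3)   # converges to about (0.382, 0.191)
```

At the fixed point the fitted OR probability t1 + t2 − t1·t2 matches the observed aggregate density n_agg/n_pairs, as expected of a maximum likelihood estimate.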
The iterations above are derived from the expectation-maximization (EM) algorithm 34 (details in the "Methods" section), and a toy example is shown in Fig. 2c. Note that if there is any prior on the parameters θ, the proposed framework can be improved by maximizing the product of the likelihood function P(A^O, Γ|θ) and the prior P(θ), i.e., the so-called maximum a posteriori estimation (MAP). In estimation and statistics theory, an unbiased estimator is called efficient if its variance reaches the Cramér-Rao lower bound (CRLB) 35 . Fortunately, the proposed framework yields a maximum likelihood estimate, which is an unbiased estimator and exhibits asymptotic normality, indicating that the estimator converges in distribution to a normal distribution 36 . With this, we prove that the variance of the estimator designed in our framework decreases as the fraction of partial observations c increases, and reaches the CRLB when the network size N approaches infinity (see Supplementary Note 3 and Supplementary Fig. 4 for more details).
Evaluations for the performance of the reconstruction. We now analyze the performance of the reconstruction on various real-world multiplex networks. Note that the framework works for any given analytical generative model, such as the Erdős-Rényi random network model and the stochastic block model. For illustration, we employ the configuration model as our network model, which uses the parameter set D to describe the model parameters. For a multiplex network composed of L layers, the parameter set D contains L vectors d⃗^α ∈ R^N (α = 1, 2, ⋯, L) encoding the degree sequences of the layers, where the component d^α(i) is the degree of node i in layer α. The configuration model significantly reduces the complexity from exponential to polynomial by exploiting the independence of each link, and it has been widely applied to analyze the relationship between the structure and function of complex networks [37][38][39] .
As mentioned in the last section, once a certain network model is chosen, we can derive the most probable values of all parameters in D by maximizing the likelihood P(A^O, Γ|D) (see Supplementary Note 2 for detailed derivations). After estimating the degree sequences D, the posterior probability Q^α_ij can be calculated; it is called here the link reliability, measuring the probability that a link exists between nodes i and j in layer α (see Supplementary Note 2 for the complete algorithm and complexity analysis in this case). We examine the reliability of all links in the testing set E_T, consisting of all potential links except the partial observations, i.e., E_T = {(i, j, α) | A^O_ij = 1, (i, j, α) ∉ Γ} (see Supplementary Fig. 5 for a schematic illustration). For this purpose, we calculate the TP (true positive rate) P(M^α_ij = 1 | Q^α_ij > q), FP (false positive rate) P(M^α_ij = 0 | Q^α_ij > q), TN (true negative rate) P(M^α_ij = 0 | Q^α_ij < q) and FN (false negative rate) P(M^α_ij = 1 | Q^α_ij < q) in E_T, where q is the threshold that determines the classifier boundary between classes.
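A sketch of this threshold-based evaluation (here the four outcomes are tallied as plain fractions of the testing set, a simplification of the conditional rates above; names are ours):

```python
def confusion_rates(truth, reliability, q=0.5):
    """Threshold each link reliability Q at q and tally the outcome
    against the ground truth.  Returned as fractions of the testing
    set (a simplification of the conditional rates in the text)."""
    tp = fp = tn = fn = 0
    for t, r in zip(truth, reliability):
        if r > q:
            tp += t == 1
            fp += t == 0
        else:
            tn += t == 0
            fn += t == 1
    n = len(truth)
    return tp / n, fp / n, tn / n, fn / n

# Five candidate links: reliabilities vs. whether the link truly exists.
rates = confusion_rates([1, 1, 0, 0, 1], [0.9, 0.4, 0.7, 0.1, 0.8])
# rates == (0.4, 0.2, 0.2, 0.2)
```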
We first vary the threshold q from 0 to 1 and calculate the AUC (area under the receiver operating characteristic (ROC) curve) for two real-world datasets (the C. elegans neural network and the London transportation network) against different values of c. We compare our results with three link prediction methods that have performed well on inference tasks with partial observations. The first is by De Bacco et al., who proposed a generative model and an efficient expectation-maximization algorithm that performs inference tasks such as community detection and link prediction 40 ; it works for multiplex networks with groups, but may fail on networks without group-based structures. The second is by Tarres-Deulofeu et al., who introduced a stochastic block model that can take full advantage of the information contained in the whole network 41 . The third baseline applies a single-layer link prediction method, layer by layer, to the multiplex network (Layered model). Note that none of these methods takes the aggregate topology into consideration, whereas our method provides valuable insight into how aggregate topology information helps reconstruction. The AUCs of the four methods for the C. elegans neural network and the London transportation network are shown in Fig. 3a, displaying that the aggregate topology indeed provides important information (see Supplementary Fig. 13 for more empirical results). Further, we display the ROC curves in the ROC space for c = 0.01, 0.05, 0.25, 0.5, respectively (Fig. 3b, c). Here, the ROC space is defined by plotting the false positive rate on the x-axis and the true positive rate on the y-axis, displaying the relative tradeoff between false positives (costs) and true positives (benefits).
Since the ROC curve describes the performance as a continuous function of the threshold q, it quantifies the true positive rate and the false positive rate for any given threshold. The AUC is thus a scalar that quantifies the performance independently of the choice of threshold. The results of the ROC analysis show that our proposed framework is effective for any threshold in the interval (0, 1), and it is thus stable across thresholds in various real-world networks. Further, the true positive rate is positively correlated with the false positive rate, and there exists a threshold above which the false positive rate increases faster than the true positive rate. Beyond this threshold, it is no longer justified to improve the true positive rate at the expense of the false positive rate.
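The threshold independence of the AUC follows from its rank interpretation: it equals the probability that a randomly chosen true link is assigned a higher reliability than a randomly chosen non-link. A minimal sketch (our own helper name):

```python
from itertools import product

def auc_rank(truth, reliability):
    """AUC via its rank interpretation: the probability that a randomly
    chosen true link outranks a randomly chosen non-link (ties count
    1/2).  Equivalent to the area under the ROC curve, so no threshold
    q enters the computation."""
    pos = [r for t, r in zip(truth, reliability) if t == 1]
    neg = [r for t, r in zip(truth, reliability) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))

auc_rank([1, 1, 0, 0, 1], [0.9, 0.4, 0.7, 0.1, 0.8])   # 5/6: one misranked pair
```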
Next, we set specific thresholds for further analysis. Since we do not have any prior knowledge about the threshold q for this task, we set q = 0.5 to avoid an arbitrary choice, i.e., when the link reliability is larger than 0.5, an edge is considered to exist, and vice versa. We then calculate the four metrics to evaluate the performance of multiplex reconstruction for the nine real-world datasets. As a result, the fraction of partial observations c, indicating the portion of observed edges, exhibits a positive correlation with the reconstruction accuracy, showing good performance even for quite small c (Fig. 3f) (see the "Methods" section and Supplementary Fig. 6 for more details on the evaluations). In practice, we sample the partial observations by vertex sampling, and repeat the reconstruction 100 times for each value of c (see the "Methods" section for more details about the sampling of partial observations).
COMMUNICATIONS PHYSICS | (2022) 5:163 | https://doi.org/10.1038/s42005-022-00928-w | www.nature.com/commsphys

Besides link reliability, network characteristics also include the average degree, the degree distribution, and the length distribution of shortest paths, which are significant for network reconstruction. One prominent advantage of the reconstruction framework is that we can simultaneously obtain other micro- or mesoscale network properties. For example, the expectation of the degree distribution in layer α is E(p^α) = ∑ M Q(M) ⋅ p^α(M), where p^α(M) is the degree distribution in layer α for a specific multiplex structure M. The reconstructed degree distributions in two layers for different values of c are compared to the ground truth (c = 1), demonstrating that the degree distributions in all layers can also be well reconstructed as c increases (Fig. 3d, e). Generally, the expectation of a property X is the first raw moment E(X) = ∑ M Q(M) ⋅ X(M), while the corresponding variance is the second central moment; moreover, we can obtain the skewness, kurtosis, and higher moments of a property X. We discuss how different network characteristics (e.g., average degrees and overlap of edges in different layers) impact the performance of reconstruction in the next section.

Reconstructability determined by the discrimination indicator.
It is of great significance to investigate how various characteristics of multiplex networks affect the reconstructability, i.e., the measured reconstruction accuracy. Without loss of generality, we analyze two-layer multiplex networks to illustrate the method in detail. Once a link is observed in the aggregate network, the probability space of the potential multiplex structure contains three events: the potential link exists (i) only in layer 1, (ii) only in layer 2, or (iii) in both layers. The uncertainty of all links in the multiplex network M can then be quantified by the information entropy H (Eq. (6)). Generally speaking, the smaller the information entropy H, the more certain the potential multiplex structure, and vice versa.
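The three-event entropy of a single aggregate link can be sketched directly (in nats; `link_entropy` is our own helper name):

```python
from math import log

def link_entropy(p1, p2, p12):
    """Entropy (in nats) of the three mutually exclusive events for one
    aggregate link: present only in layer 1, only in layer 2, or in
    both layers."""
    assert abs(p1 + p2 + p12 - 1.0) < 1e-9
    return -sum(p * log(p) for p in (p1, p2, p12) if p > 0)

link_entropy(1 / 3, 1 / 3, 1 / 3)   # maximal uncertainty, ln(3) ~ 1.0986
link_entropy(1.0, 0.0, 0.0)         # fully determined link, entropy 0
```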
To study how different characteristics of multiplex networks impact the information entropy, we first introduce the ratio of the average degrees of the two layers, r = k̄₁/k̄₂, where k̄₁ and k̄₂ are the average degrees of layer 1 and layer 2, respectively. We assume, without loss of generality, that k̄₁ ≤ k̄₂, such that 0 < r ≤ 1. We then consider the overlap of edges, denoted by v, between the two layers. A high overlap indicates that a link is more likely to exist in one layer if the corresponding link exists in the other layer, i.e., a low uncertainty. To measure the overlap v of a multiplex network, we use the Jaccard index of the edge sets E₁ and E₂ of the two layers, i.e., v = |E₁ ∩ E₂|/|E₁ ∪ E₂| (see Supplementary Fig. 9 for more details about multiplex network characteristics).
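Both characteristics are straightforward to compute from the two edge sets (a sketch with our own helper name and toy edges):

```python
def degree_ratio_and_overlap(E1, E2, N):
    """Average-degree ratio r = k1/k2 (with k1 <= k2) and Jaccard edge
    overlap v = |E1 & E2| / |E1 | E2| for two layers given as sets of
    frozenset({i, j}) edges on N shared nodes."""
    k1, k2 = sorted((2 * len(E1) / N, 2 * len(E2) / N))
    return k1 / k2, len(E1 & E2) / len(E1 | E2)

E1 = {frozenset(e) for e in [(0, 1), (1, 2)]}
E2 = {frozenset(e) for e in [(1, 2), (2, 3), (0, 3), (0, 2)]}
r, v = degree_ratio_and_overlap(E1, E2, N=4)   # r = 0.5, v = 1/5
```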
We next explore how these factors impact the information entropy of a multiplex network and, in turn, determine the performance of reconstruction. From Eq. (6), we calculate the information entropy H with a mean-field approximation, obtaining Eq. (7), where the quantities defined in Eqs. (8) and (9) indicate the average probability of the existence of any link in layer 1 and layer 2, respectively. Note that in Eqs. (8) and (9), v̂ and r̂ are the estimates of v and r when we only have the partial observations Γ, and we approximate them empirically by c ⋅ v and r^c (see the "Methods" section for more details). Thus, the information entropy of a given multiplex network is highly related to the fraction of partial observations c, the ratio of average degrees r, and the overlap v. The uncertainty of the probability space clearly decreases as c and v increase; hence, the information entropy H is a monotonically decreasing function of c and v over the domain. For r, however, the information entropy is a monotonically increasing function as r increases from 0 to 1 (Fig. 4a). Clearly, H describes the microscale discrimination between layers of a multiplex network, since a high discrimination (e.g., r tending to 0) leads to a low information entropy. Generally, the reconstruction accuracy is expected to be determined by the information entropy H, since H is the primary variable affecting the uncertainty of the distribution of potential multiplex structures. Empirically, we find that the accuracy is linearly determined by the indicator ρ ⋅ H (Fig. 4b, Eq. (10)), where ρ is a scaling factor satisfying Eq. (11). In Eq. (11), s = s^(M) is a constant related to the topology of the multiplex network M (see Supplementary Table 2 for approximate values of s^(M)). The term (1 − v)/(1 + v) in Eq. (11) indicates that the uncertainty of links in the testing set can be reduced by partial observations.
The parameter s describes the scale at which partial observations can reduce the uncertainty of links in the testing set (details in the "Methods" section). We further find that s^(M) is closely proportional to the cosine similarity of the two degree sequences, i.e., s ∝ cos⟨d⃗₁, d⃗₂⟩ (Fig. 4c). Clearly, cos⟨d⃗₁, d⃗₂⟩ describes the similarity between the degree sequences of the two layers of a multiplex network, indicating the mesoscale discrimination, which is generally not related to the microscale discrimination quantities r, v, and H (Fig. 4d).
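The cosine similarity between the two degree sequences can be computed directly (a minimal sketch, with our own helper name):

```python
from math import sqrt

def degree_cosine(d1, d2):
    """Cosine similarity between two degree sequences, the proxy used
    here for the mesoscale discrimination between layers."""
    dot = sum(a * b for a, b in zip(d1, d2))
    return dot / (sqrt(sum(a * a for a in d1)) * sqrt(sum(b * b for b in d2)))

degree_cosine([2, 1, 1], [4, 2, 2])   # proportional sequences -> 1
degree_cosine([1, 0, 0], [0, 1, 0])   # hubs in different places -> 0
```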
Thus, the reconstructability is determined by the discrimination indicator (1 − ρ ⋅ H) from both the microscale and mesoscale views. This discovery indicates that the reconstruction accuracy can be predicted by the discrimination indicator, with high accuracy obtained where either ρ or H is small. For example, the reconstructability is enhanced when the difference in average degrees between layers is large (r tends to 0). Note that we can approximate s by the cosine similarity if the exact value of s is not available empirically, since s is highly related to the cosine similarity. We next discuss how to allocate the partial observations across layers when a specific budget c is given.
Allocating limited budget for partial observations. Usually, we have a limited budget for conducting observations in practice. It is therefore necessary to investigate how to allocate the budget (partial observations Γ) across layers to optimize the performance of reconstruction (e.g., the accuracy) as far as possible. We denote the average fraction of partial observations per layer by c̄, i.e., c̄ = ∑ α c_α / L, where c_α is the fraction of partial observations in layer α, and we denote H(M^α|A^O) by H_α for simplicity. Similarly employing the mean-field approximation (details in the "Methods" section), we can predict the accuracy by a function F (Eqs. (12) and (13)). We next explore how the performance of reconstruction is impacted by the ratio c₁/c₂ when a certain budget is given for partial observations. Once c̄ is given, we regard F as a function of c₁ alone, i.e., F = F(c₁), since c₂ = 2c̄ − c₁. The domain of F(c₁) is [0, 2c̄] if c̄ ≤ 0.5, and [2c̄ − 1, 1] if c̄ > 0.5. We observe that F(c₁) is monotonically increasing over the domain if c̄ is small, but first decreases and then increases if c̄ is large. Theoretical analysis shows that F(2c̄) ≥ F(0) if c̄ ≤ 0.5, and F(1) ≥ F(2c̄ − 1) if c̄ > 0.5 (details in the "Methods" section). This result indicates that it is always better to allocate as much of the budget as possible to the layer with the lower average degree, which yields the optimal strategy and the highest accuracy for the given network model. Moreover, there exists a threshold 0 < c̄₀(M) < 1 for each multiplex network M, where c̄₀ is the solution to Eq. (14). If the budget c̄ is less than c̄₀, the accuracy increases as c₁/c₂ increases, and reaches its maximum as c₁/c₂ tends to ∞. If the budget is large (c̄ > c̄₀), however, the accuracy increases as c₁/c₂ tends to 0 or ∞, and reaches its maximum as c₁/c₂ tends to ∞ (Fig.
5a), indicating that the multiplex network can be reconstructed when the aggregate topology and either of the two layers is observed (Fig. 5b). The reason is as follows. The partial observations in different layers capture the most layer-specific characteristics when c₁/c₂ = 1. However, a high overlap between the observations in different layers leads to more redundancy and lower accuracy, making the performance even worse when c₁/c₂ = 1 and c̄ is large. These theoretical results enable us to devise the best strategy for allocating the budget and thus obtain the optimal reconstruction of multiplex networks in this case. Furthermore, results from real-world datasets verify our theoretical analysis (Fig. 5a, c). We discuss how different multiplex network characteristics impact the performance of reconstruction from a dynamical point of view in the next section.
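The resulting allocation rule, putting as much of the budget as possible on the sparser layer, can be sketched as follows (our own function name, assuming the total number of observed entries scales with c₁ + c₂ = 2c̄ and each per-layer fraction is capped at 1):

```python
def allocate_budget(c_bar, k1, k2):
    """Split an average observation budget c_bar between two layers
    following the rule above: fill the layer with the lower average
    degree (here layer 1, i.e. k1 <= k2) first, capped at fraction 1,
    and give the remainder to the denser layer.  A sketch under the
    assumption that observed entries scale with c1 + c2 = 2 * c_bar."""
    assert k1 <= k2 and 0.0 <= c_bar <= 1.0
    c1 = min(2 * c_bar, 1.0)
    c2 = 2 * c_bar - c1
    return c1, c2

allocate_budget(0.3, k1=2, k2=5)   # (0.6, 0.0): everything fits in layer 1
allocate_budget(0.7, k1=2, k2=5)   # (1.0, 0.4): layer 1 saturated first
```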
Predicting dynamic processes in multiplex networks. We proceed to investigate the performance of the reconstructed multiplex networks in predicting dynamic processes, which are critical to network functionality. First, we study a percolation process on a two-layer interdependent multiplex network. In such a multiplex network, once a set of nodes is removed (e.g., by attack or random failure) in one layer, nodes disconnected from the GCC (giant connected component) of the same layer, together with the counterparts of the removed nodes, also fail and are thus removed. The newly removed nodes trigger further removals, and this repetitive process leads to a catastrophic cascade of failures. For the reconstructed multiplex network encoded by the expectation E[Q(M)], we binarize the matrix E[Q(M)] and randomly remove nodes in one layer with probability 1 − p (see Supplementary Note 4 for more details of the process). We calculate the size of the GMCC (giant mutually connected component) as a function of p and the critical threshold p_c, above which the GMCC exists. We compare the average size of the GMCC in the reconstructed network (repeated 100 times) to that of the real C. elegans neural network for a range of c (Fig. 6a). The reconstruction performs well in terms of the size of the GMCC, even when c is small, and the estimates of the GMCC size and the critical probability p_c approach those of the real network (c = 1) as c increases toward 1. However, the proposed method slightly underestimates both the size of the GMCC and the critical threshold p_c for the C. elegans neural network. Further, simulations on synthetic networks reveal that the method underestimates the robustness and p_c of the interdependent networks much more severely when r is small (Fig. 6b). When r is close to 0, the edges of the multiplex network are concentrated in one layer, leading to extreme vulnerability.
However, the reconstruction method cannot fully capture this imbalance, especially for a small fraction of partial observations.
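The cascading computation of the GMCC can be sketched in a few lines of pure Python: iteratively restrict the active node set to the largest connected component of each layer until it stabilizes (an illustrative implementation with our own names, not the paper's exact code):

```python
def components(active, edges):
    """Connected components of the subgraph induced by the active set."""
    adj = {u: set() for u in active}
    for u, v in edges:
        if u in adj and v in adj:
            adj[u].add(v)
            adj[v].add(u)
    seen, comps = set(), []
    for s in active:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            u = stack.pop()
            if u not in comp:
                comp.add(u)
                stack.extend(adj[u] - comp)
        seen |= comp
        comps.append(comp)
    return comps

def gmcc(nodes, edges1, edges2):
    """Giant mutually connected component of a two-layer interdependent
    network: repeatedly restrict the active set to the largest component
    of each layer until no node is lost (the standard cascade picture)."""
    active = set(nodes)
    while active:
        before = len(active)
        for edges in (edges1, edges2):
            active = max(components(active, edges), key=len)
        if len(active) == before:
            return active
    return set()

# Chain in layer 1; layer 2 splits {0..3} from {4, 5}: the cascade
# discards nodes 4 and 5, which detach from the giant component of layer 2.
e1 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
e2 = [(0, 1), (1, 2), (2, 3), (4, 5)]
gmcc(range(6), e1, e2)   # {0, 1, 2, 3}
```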
Second, we consider a random walk process on interconnected multiplex networks, where interlayer links exist only between counterpart nodes. We suppose that a number of walkers start from randomly chosen nodes and move along intralayer links with probability p_intra and along interlayer links with probability p_inter (see Supplementary Note 5 for more details). We employ the coverage ϕ(t) as the performance metric, i.e., the fraction of nodes visited by the walkers up to time t. The coverage at each time on the reconstructed multiplex networks is compared to that on the real one (the London multiplex transportation network) for different c, showing increasingly good predictions as c increases (Fig. 6c). Simulations on synthetic networks show that the coverage is overestimated regardless of whether r is small or large (Fig. 6d). Moreover, we conduct further simulations on real-world multiplex networks with more than two layers. For example, we simulate random walk processes on the European air transportation network, which has three layers (Ryanair, Lufthansa, EasyJet), and on the U.S. air transportation network, which has four layers (SkyWest, Southwest, American Eagle, American Airlines) (see Supplementary Fig. 10c, d for more results on real-world networks). Last, we investigate a spreading process based on the SI (susceptible-infected) model on temporal networks, where interlayer links exist only between the counterparts of two adjacent time slots 23 . In the SI model, each node has two states, "susceptible" (S) or "infected" (I), and 5% of nodes are randomly chosen as sources (infected) at the initial time t = 0 in our simulations. At each time step (corresponding to one layer of the temporal network), infected nodes infect their susceptible neighbors with a specific infection rate λ (see Supplementary Note 6 for more details).
For the spreading process, the fraction of infected nodes I(T) at time t = T on the reconstructed networks is calculated and compared to that on the real network of social interactions at the SFHH (La Société française d'Hygiène Hospitalière) conference (Fig. 6e). Simulations on synthetic networks reveal that the method overestimates I(T) when r is close to 1 (Fig. 6f). We also compare these results with the existing methods (see Supplementary Fig. 11 for more details). Further, we simulate the spreading processes on the Wiki-talk network, which has 14 layers (14 weeks), and the CollegeMsg network, which has 22 layers (22 days) (see Supplementary Fig. 10e, f for more results on real-world networks).
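The SI dynamics on a temporal network can be sketched as follows (one edge list per time slot; an illustrative implementation with our own parameter names, not the paper's exact simulation):

```python
import random

def si_temporal(snapshots, n_nodes, lam=0.5, init_frac=0.05, seed=0):
    """SI spreading on a temporal network: snapshots[t] is the edge
    list active at time slot t (one layer per slot).  Infections
    persist across slots; each S-I contact transmits with probability
    lam.  Returns the infected fraction I(t) after every slot."""
    rng = random.Random(seed)
    n_seeds = max(1, int(init_frac * n_nodes))
    infected = set(rng.sample(range(n_nodes), n_seeds))
    history = []
    for edges in snapshots:
        newly = set()
        for u, v in edges:
            if (u in infected) != (v in infected) and rng.random() < lam:
                newly.add(v if u in infected else u)
        infected |= newly
        history.append(len(infected) / n_nodes)
    return history

# Three time slots on ten nodes; with lam = 1.0 every contact transmits.
snaps = [[(0, 1), (1, 2)], [(2, 3)], [(3, 4), (4, 5)]]
h = si_temporal(snaps, n_nodes=10, lam=1.0, seed=1)
```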
Moreover, we have studied how the reconstruction performance for dynamics is influenced by further network characteristics, including the overlap of edges and the ratio of heterogeneity (see Supplementary Fig. 12 for more results on synthetic networks with different characteristics).

Discussion
Network reconstruction has attracted much research attention recently 24,31,42 , since it has wide applications such as link prediction, community detection, and vulnerability analysis of systems. Most previous studies focused on monoplex networks, so there is a pressing need to develop a reconstruction framework for multiplex networks. Existing work has succeeded in determining whether an observed monoplex network is the outcome of a hidden multilayer process by assuming generative models for each layer of the multiplex network. However, once a hidden multiplex structure is verified, it is necessary to further explore the layer structure and predict the dynamics. Our primary goal is to reconstruct the multiplex structure from the knowledge of both an observed aggregate monoplex network and partial observations. However, many challenges prevent us from building such a framework. Given the aggregation mechanism (e.g., the OR mechanism), there is clearly a large number of potential structures consistent with the same aggregate network. To avoid ergodic enumeration, we propose a framework that builds a probability space Q(M) and reduces the complexity from exponential to polynomial by employing the configuration model, which allows an arbitrary degree sequence. Generally, priors on generative models or other dynamic information are almost unavailable, while local information (referred to as partial observations in this article) in some specific layers is more accessible. We therefore need to estimate the network model parameter θ based on very limited partial observations, which is, unfortunately, interdependent with the posterior probability distribution Q(M).
We design here an efficient mathematical framework based on the Expectation-Maximization method performing a maximum likelihood estimation, and prove that the variance of the estimation reaches the Cramér-Rao lower bound (CRLB) as the network size N approaches infinity. We evaluate the reconstruction performance on various empirical multiplex networks, ranging from the microscale (e.g., the accuracy of link reliability) to the mesoscale (e.g., degree distributions). Empirical results demonstrate that the reconstruction performance climbs quickly with a small fraction of partial observations, exhibiting the power of the proposed reconstruction framework. Applying a mean-field approximation, we find that a discrimination indicator integrating all relevant factors determines the accuracy of the reconstruction, which theoretically deepens our understanding of the relation between the reconstructability and the network characteristics. The indicator thus enables us to design the best strategy for allocating a limited budget and obtaining the highest accuracy for a given network model, i.e., the optimal reconstruction. Notice that although our analysis was illustrated with the configuration model, which is based on the degree sequence, we found that the empirical results did not rely on the degree sequence. Finally, we investigate the performance from a dynamical viewpoint, finding that our proposed framework can well predict dynamic processes taking place on multiplex networks, and we analyze the impact of network characteristics (e.g., average degree, heterogeneity, and overlap of edges in different layers) on the performance.
To the best of our knowledge, although several good related works exist, we provide here a comprehensive mathematical framework for reconstructing the layer structures of a multiplex network. For example, Bianconi et al. studied an ensemble consisting of all multiplex networks subject to a given degree sequence, which is a constraint in their case 43 . In our work, however, the degree sequence is the parameter (of the configuration model) to be estimated, a fundamental distinction between our work and Bianconi's. Thus, our framework provides insight into the relation between the structure and the reconstructability of multiplex networks, and our findings reveal essential features of multiplex network reconstruction. In future work we will focus on improving the reconstruction performance with data beyond node-pair interactions that can be helpful for reconstructing multiplex networks. We took the configuration model as an illustration for our framework, since it is a relatively general model that allows an arbitrary degree sequence. However, it remains necessary to study how well the framework performs with group-based or centrality-based models. Note that it is also interesting to reconstruct multiplex networks through the dynamics and steady state of the system 44,45 . We believe that our proposed framework will have an impact on many different applications, including link prediction, missing-link recovery, and spurious-link localization, in problems drawn from the biological, social, and transportation domains.

Methods
Expectation maximization framework. In this section, we will present details on how to obtain the maximum likelihood estimation of network model parameter θ.
The primary objective is to find θ that maximizes the likelihood function $P(A^O, \Gamma\,|\,\theta)$. In practice, we maximize its logarithm $l(\theta) = \ln P(A^O, \Gamma\,|\,\theta)$ rather than $P(A^O, \Gamma\,|\,\theta)$ for computational convenience. By the law of total probability, we have
$$l(\theta) = \ln P(A^O, \Gamma\,|\,\theta) = \ln \sum_{M} P(A^O, \Gamma, M\,|\,\theta),$$
which is a summation over all possible potential multiplex structures M. Employing Jensen's inequality, we then have
$$l(\theta) = \ln \sum_{M} Q(M)\,\frac{P(A^O, \Gamma, M\,|\,\theta)}{Q(M)} \ \geq\ \sum_{M} Q(M) \ln \frac{P(A^O, \Gamma, M\,|\,\theta)}{Q(M)}, \qquad (16)$$
where the inequality holds as long as Q(M) is a distribution over the multiplex structures M satisfying ∑ M Q(M) = 1. For simplicity, we denote the right-hand side of equation (16) by J(Q, θ). Notice that J is a function of the distribution Q(M) and the parameter θ simultaneously, and it is obviously a lower-bound function of the log-likelihood function l(θ).
In the expectation-maximization (EM) algorithm 34 , we maximize the lower-bound function J by recursively executing two steps: the E-step and the M-step. In the E-step, we maximize J(Q, θ) over Q while keeping θ constant; it is easy to see that the equality in Eq. (16) holds when Q(M) equals the posterior distribution $P(M\,|\,A^O, \Gamma, \theta)$. In the M-step, we maximize J(Q, θ) over θ while keeping Q fixed.
We can see that the sequence {l(θ (k) )} increases monotonically as k grows. On the other hand, the log-likelihood sequence {l(θ (k) )} obviously has an upper bound, i.e., $\max_\theta \ln P(A^O, \Gamma\,|\,\theta)$. Then {θ (k) } converges to a value that locally maximizes the log-likelihood function l(θ) 46 . In practice, however, the log-likelihood function may have more than one local maximum in more complex situations, and the EM algorithm is not guaranteed to converge to the global one. To overcome this problem, we repeatedly try different random initial values for the parameter and take the global maximum among the converged likelihood values 47 .
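The E-step/M-step alternation above can be sketched on a textbook stand-in problem. The two-coin mixture below is not the paper's network model; it only illustrates the EM mechanics: posterior responsibilities in the E-step, responsibility-weighted re-estimation in the M-step, and monotone improvement of the likelihood.

```python
import random

def em_two_coins(flips, iters=200, seed=0):
    """EM for a mixture of two biased coins.  Each row of `flips` is a
    sequence of 0/1 tosses from one of two coins with unknown biases.
    E-step: posterior responsibility of each coin for each row.
    M-step: re-estimate each bias from responsibility-weighted counts."""
    rng = random.Random(seed)
    p = [rng.random(), rng.random()]              # random initial biases
    for _ in range(iters):
        num, den = [0.0, 0.0], [0.0, 0.0]
        for row in flips:
            h = sum(row)                          # heads in this row
            t = len(row) - h                      # tails in this row
            like = [p[k] ** h * (1 - p[k]) ** t for k in range(2)]
            z = like[0] + like[1]
            for k in range(2):
                resp = like[k] / z                # E-step: responsibility
                num[k] += resp * h
                den[k] += resp * len(row)
        p = [num[k] / den[k] for k in range(2)]   # M-step: new biases
    return sorted(p)

# synthetic data: 30 rows from a 0.2-coin, 30 rows from a 0.8-coin
rng = random.Random(1)
data = [[1 if rng.random() < b else 0 for _ in range(50)]
        for b in [0.2] * 30 + [0.8] * 30]
p_low, p_high = em_two_coins(data)
```

As in the text, a single run may land in a local maximum; trying several seeds and keeping the highest-likelihood result mirrors the restart strategy described above.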
Evaluation indices for reconstruction. Accuracy, precision, recall, and AUC (area under the receiver operating characteristic curve) have been widely adopted to evaluate classification methods 48 . Accuracy is the fraction of true results (both true positives and true negatives) among the total number of tests, i.e., accuracy = (TP + TN)/(TP + TN + FP + FN). Precision gives the probability that a link exists in the real network when its reliability $Q^\alpha_{ij} > q$, i.e., precision = TP/(TP + FP). Recall is the proportion of true positives among true positives and false negatives, i.e., recall = TP/(TP + FN). In addition, links whose reliability $Q^\alpha_{ij} = q$ contribute one half to these counts. The AUC quantifies the probability that the proposed method ranks a positive instance higher than a negative one: all tested links are ranked in decreasing order of reliability, and the probability that a real link has a higher reliability than a nonexistent link is calculated. For imbalanced data sets (networks tend to be sparse), the area under the precision-recall curve (AUPRC) is also a meaningful metric. Moreover, the Matthews correlation coefficient (MCC) is another popular choice, which produces a more informative and truthful score for binary classification 49 . These six metrics are used to evaluate the proposed framework on the nine real-world data sets (Supplementary Figs. 6 and 7).
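The closed-form metrics above follow directly from the confusion-matrix counts; a minimal sketch (the example counts are arbitrary, not from the paper's data sets):

```python
import math

def binary_metrics(tp, tn, fp, fn):
    """Confusion-matrix metrics used to evaluate link reliability as a
    binary classifier: accuracy, precision, recall, and the Matthews
    correlation coefficient (MCC)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, precision, recall, mcc

acc, prec, rec, mcc = binary_metrics(tp=80, tn=90, fp=10, fn=20)
```

AUC and AUPRC are ranking-based rather than count-based, so they need the full list of reliability scores instead of a single confusion matrix.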
Partial observation sampling. In this section, we clarify the sampling of partial observations in our simulations. We sample them by vertex sampling for a given fraction c. Specifically, for a large undirected network there are approximately $N^2/2$ pairs of nodes, where N is the number of nodes. Note that once we have observed all of these $N^2/2$ pairs, the network is certain and we no longer need to infer edge probabilities. We then randomly choose $n = \sqrt{c} \cdot N$ nodes, and the subgraph consisting of these nodes covers $(\sqrt{c} \cdot N)^2 / 2 = c \cdot N^2 / 2$ node pairs of the original network. For each value of c, we repeated the reconstruction 100 times to average out randomness, and drew error bars in the relevant figures of the manuscript.
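The vertex-sampling rule above is simple enough to check numerically. The sketch below (an illustration, not the paper's code) draws $\sqrt{c} \cdot N$ nodes and confirms that the induced subgraph covers roughly a fraction c of all node pairs:

```python
import math
import random

def vertex_sample(n_nodes, c, seed=0):
    """Vertex sampling for partial observations: choose n = sqrt(c) * N
    nodes; the node pairs inside the induced subgraph then make up
    roughly a fraction c of all pairs."""
    rng = random.Random(seed)
    n = round(math.sqrt(c) * n_nodes)
    return rng.sample(range(n_nodes), n)

N, c = 1000, 0.25
sampled = vertex_sample(N, c)
# fraction of observed node pairs vs. all pairs
observed_pairs = len(sampled) * (len(sampled) - 1) / 2
total_pairs = N * (N - 1) / 2
```

For finite N the exact pair count uses n(n-1)/2 rather than n^2/2, so the observed fraction matches c only up to O(1/N) corrections.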
Mean-field approximation. Our goal in this section is to clarify the discrimination indicator introduced in the Results section by a mean-field approximation. First we calculate the information entropy $H(M\,|\,A^O)$, which is determined by the ratio of average degrees r, the overlap of edges v, and the fraction of partial observations c.
Notice that r, the ratio of the average degrees of the two layers, and v, the overlap of the edge sets of the two layers, are not available when we only have partial observations; they are therefore estimated by $\hat r(r, c)$ and $\hat v(v, c)$. Since the average degree $k_\alpha$ is a mesoscale property, we estimate $\hat k_\alpha$ by $E[k_\alpha] = \sum_M Q(M) \cdot k_\alpha(M)$, the expectation of $k_\alpha$ over all potential multiplex structures. The results shown in Supplementary Fig. 8 allow us to approximate $\hat r \approx r^c$ and $\hat v \approx c \cdot v$ for simplicity in practice. Moreover, we also designed nonlinear approximations with higher precision; however, the more precise but more complex approximations would only slightly improve the final results (see Supplementary Note 7 for the evaluations of the approximations).
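The two stated approximations are one-liners; the tiny sketch below only encodes them for quick sanity checks at the boundary values of c (full observation, c = 1, and no observation, c = 0). The function name is ours, not the paper's.

```python
def estimate_r_v(r, v, c):
    """Simple approximations of the estimated ratio of average degrees
    and edge overlap under a fraction c of partial observations, as
    stated in the text: r_hat ~ r**c and v_hat ~ c * v."""
    return r ** c, c * v

# full observation recovers the true values; no observation gives the
# uninformative defaults (equal-density layers, zero observed overlap)
r_hat_full, v_hat_full = estimate_r_v(r=0.5, v=0.4, c=1.0)
r_hat_none, v_hat_none = estimate_r_v(r=0.5, v=0.4, c=0.0)
```

Both limits behave as expected: at c = 1 the estimates equal (r, v), while at c = 0 they degenerate to (1, 0).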
Next, we explore the expression of H in terms of the parameters c, r, and v. Supposing that links between different node pairs are independent, the entropy decomposes as $H(M\,|\,A^O) = \sum_{i<j} H(M_{ij}\,|\,A^O_{ij})$. For the OR-aggregation mechanism, $H(M_{ij}\,|\,A^O_{ij})$ is determined by the conditional entropies $H(M^1_{ij}\,|\,A^O_{ij})$ and $H(M^2_{ij}\,|\,A^O_{ij})$ of the link states in the two layers. We empirically find that the accuracy of the reconstruction is linearly determined by the product of H and a scaling factor ρ. When an edge is observed in layer α, the probability that it also exists in the other layer depends on the overlap v (Eq. (36)). Thus, the term (1 − v)/(1 + v) in Eq. (33) indicates the fraction of uncertainty that can be reduced by partial observations, and s describes the scale at which partial observations reduce the uncertainty of the links in the testing set.
Synthetic networks generation. For a deeper exploration of the framework proposed in this article, we generate several synthetic networks with various network characteristics for performance evaluation. Here we mainly focus on two-layer networks with different r, v, and $\cos\langle \vec{d}_1, \vec{d}_2 \rangle$ as defined in the Results section.
As shown in Fig. 4b, we test synthetic networks ranging over 0 < r ≤ 1 and 0 ≤ v ≤ 1/2. We first clarify how to generate a multiplex network with given r * and v * . We generate the adjacency matrix M 1 (the first layer of the multiplex network) by the Erdős-Rényi model, in which the edge variable $M^1_{ij}$ between any two nodes i and j follows a Bernoulli distribution with parameter p. Without loss of generality, we take $p = \frac{5}{N-1}$ so that the average degree $\langle k_1 \rangle = 5$, avoiding networks that are too sparse. Then, we generate the adjacency matrix M 2 (the second layer of the multiplex network) by letting $M^2_{ij}$ follow a Bernoulli distribution whose parameter is conditioned on $M^1_{ij}$ and chosen as a function of r * and v * . Next we prove that the parameters r and v of the generated multiplex network M satisfy r = r * and v = v * .
Obviously, the expectation of the average degree satisfies $E[k_1] = (N-1)p$, and evaluating $E[k_2]$ from the conditional Bernoulli parameters above yields $E[k_2]/E[k_1] = r^*$. Similarly, the expectations of $|E_1 \cap E_2|$ and $|E_1 \cup E_2|$ yield an overlap of exactly $v^*$. As shown in Fig. 4c and Supplementary Fig. 12a, b, c, this generating process can also provide multiplex networks with different 0 < r h ≤ 1, since we can also generate degree sequences with a given variance.
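The generation procedure can be sketched as follows. Note the hedges: the conditional probabilities `q1` and `q2` are illustrative knobs (the text derives them from the target r * and v * ), and the overlap is measured here as $|E_1 \cap E_2| / (|E_1| + |E_2|)$, an assumed form consistent with the range 0 ≤ v ≤ 1/2 but not stated explicitly in this section.

```python
import random

def two_layer_er(n, p, q1, q2, seed=0):
    """Generate a two-layer multiplex: layer 1 is Erdos-Renyi G(n, p);
    each layer-2 edge is Bernoulli with probability q1 where layer 1
    has an edge and q2 where it does not."""
    rng = random.Random(seed)
    e1, e2 = set(), set()
    for i in range(n):
        for j in range(i + 1, n):
            in_layer1 = rng.random() < p
            if in_layer1:
                e1.add((i, j))
            if rng.random() < (q1 if in_layer1 else q2):
                e2.add((i, j))
    return e1, e2

n = 400
# p = 5/(N-1) gives average degree 5 in layer 1, as in the text
e1, e2 = two_layer_er(n, p=5 / (n - 1), q1=0.6, q2=0.0)
r = len(e2) / len(e1)                   # ratio of average degrees
v = len(e1 & e2) / (len(e1) + len(e2))  # edge overlap (assumed form)
```

With q2 = 0 every layer-2 edge also lies in layer 1, so the measured r concentrates near q1 and v near q1/(1 + q1), illustrating how the conditional parameters control r and v.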

Data availability
All data needed to evaluate the conclusions in the paper are available online as follows.