Layer aggregation and reducibility of multilayer interconnected networks

Many complex systems can be represented as networks composed of distinct layers, interacting with and depending on each other. For example, in biology, a good description of the full protein-protein interactome requires, for some organisms, up to seven distinct network layers, with thousands of protein-protein interactions each. A fundamental open question is then how much information is really necessary to accurately represent the structure of a multilayer complex system, and if and when some of the layers can indeed be aggregated. Here we introduce a method, based on information theory, to reduce the number of layers in multilayer networks while minimizing information loss. We validate our approach on a set of synthetic benchmarks, and demonstrate its applicability to an extended data set of protein-genetic interactions, showing cases where a strong reduction is possible and cases where it is not. Using this method we can describe complex systems with an optimal trade-off between accuracy and complexity.

Network science has shown that characterising the topology of a complex system is fundamental when it comes to understanding its dynamical properties [1,3,21]. However, in most cases, the basic units of real-world systems are connected by different types of interactions occurring at comparable time scales. For instance, this is the case of social systems, in which the same set of people might have political or financial relationships [25], or might be interacting using different platforms like e-mail, Twitter, Facebook, phone calls, etc. [18,30]. Similarly, in biological systems, basic constituents such as proteins can have physical, co-localization, genetic or many other types of interactions. Recently, it has been shown that retaining the whole multi-dimensional information [8] in the modeling of interdependent [6,13] and multilayer systems [9,17,20,22] leads to new non-trivial structural properties [2,11,24] and unexpected levels of dynamical complexity [10,14,15,23,26]. On the other hand, nothing is known about the inverse problem, that is, under which circumstances a more compact aggregated representation of a system can give the same information as a fully multilayer representation. Inspired by quantum physics, where a similar question emerges to quantify the distance between mixed quantum states [19], we propose a method to aggregate some of the layers of a multilayer system without an appreciable loss of information. Our procedure is based on the application of information theory to graphs, and allows one to construct a reduced representation of a multilayer network which provides a good trade-off between accuracy and compactness.

QUANTIFYING THE INFORMATION CONTENT OF A MULTIPLEX NETWORK
In quantum mechanics, there are pure states, describing the system by means of a single vector in the Hilbert space, and mixed states, arising from composite quantum systems described by a statistical ensemble of pure states. The most general quantum system can then be described by the so-called density operator ρ, a positive semidefinite matrix with eigenvalues summing up to 1, which encodes all the information about the statistical ensemble of pure states of the system [12]. A widely adopted descriptor to measure the mixedness of a quantum system is given by the Von Neumann entropy, the natural extension of Shannon information entropy to quantum operators, although other definitions, satisfying extensivity or nonextensivity paradigms, have been lately introduced and studied [28]. The Von Neumann entropy is defined for any density operator ρ. In particular, if the Von Neumann entropy is zero, then the system is in a pure state, otherwise it is in a mixed state.
The quantum mechanics formalism can also be used to describe a complex system, if we imagine that each layer of a multilayer network represents one possible state of the system, so that the entire network is described by an ensemble of states. It has already been shown that the amount of information carried by a single-layer network can be quantified by the Von Neumann entropy of the graph [4], represented by a matrix which resembles quantum operators. Such a matrix can be obtained from the Laplacian associated with the graph, after a proper normalisation (see Appendix for details). The resulting number effectively summarises the complexity of the wiring patterns of a network. Here we propose to use the Von Neumann entropy to quantify the information gained or lost by aggregating some of the layers of a multiplex network in a single graph. This problem is surprisingly related to the separability of mixed states in quantum systems [7,16,27], where, for instance, the entanglement of pure states is quantified by considering the minimum information loss due to a complete local measurement, in terms of the corresponding density operator ρ.
If the existence of inter-layer connections among the different replicas of the same node at the various layers of a multilayer network is implicitly assumed, while their weights cannot be defined or are not taken into account, the network is a multiplex network, i.e. an edge-colored multi-graph [17] that can be represented [22] by a set $\mathcal{A} = \{A^{[1]}, A^{[2]}, \ldots, A^{[M]}\}$, whose elements are the adjacency matrices of the M layers. Placing each adjacency matrix of $\mathcal{A}$ on the diagonal of an $(N \times M) \times (N \times M)$ block matrix, while setting to zero the entries in off-diagonal blocks, the above representation is cast into a supra-adjacency matrix [14] $A$, a special flattening of the rank-4 adjacency tensor, a more general representation of multilayer networks [9]. Exploiting this mathematical representation, the Von Neumann entropy $h_A$ of the interconnected multilayer network is computed using Eq. (3) (see Appendix), as a function of the $N \times M$ eigenvalues of the normalised Laplacian supra-matrix associated with $A$ [9]. In the specific case of an edge-colored multigraph this entropy reduces to the sum of the Von Neumann entropies of its layers, i.e.,

$$h_A = \sum_{\alpha=1}^{M} h_{A^{[\alpha]}}, \qquad h_{A^{[\alpha]}} = -\sum_{i=1}^{N} \lambda_i^{[\alpha]} \log_2 \lambda_i^{[\alpha]},$$

and $\lambda_i^{[\alpha]}$ are the eigenvalues of the corresponding Laplacian matrix of $A^{[\alpha]}$.
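As a minimal illustrative sketch (not the authors' implementation), the entropy of a layer and of an edge-colored multigraph can be computed as follows. The sketch assumes that the "proper normalisation" rescales the combinatorial Laplacian L = D − A to unit trace, that base-2 logarithms are used, and that each layer has at least one edge; the function names are hypothetical.

```python
import numpy as np

def density_matrix(adj):
    """Rescale the combinatorial Laplacian L = D - A to unit trace (assumed normalisation)."""
    adj = np.asarray(adj, dtype=float)
    lap = np.diag(adj.sum(axis=1)) - adj
    return lap / np.trace(lap)  # trace(L) is the sum of the degrees

def von_neumann_entropy(adj):
    """h = -sum_i lambda_i log2(lambda_i), with the convention 0 log 0 = 0."""
    lam = np.linalg.eigvalsh(density_matrix(adj))
    lam = lam[lam > 1e-12]  # drop numerically zero eigenvalues
    return float(-(lam * np.log2(lam)).sum())

def multiplex_entropy(layers):
    """Entropy of an edge-colored multigraph as the sum of its layer entropies."""
    return sum(von_neumann_entropy(a) for a in layers)

# A single edge behaves like a pure state (entropy close to 0),
# while a triangle has eigenvalues {0, 1/2, 1/2} and entropy close to 1 bit.
edge = np.array([[0, 1], [1, 0]])
triangle = np.ones((3, 3)) - np.eye(3)
h_edge = von_neumann_entropy(edge)
h_triangle = von_neumann_entropy(triangle)
h_multiplex = multiplex_entropy([triangle, triangle])
```

Dense eigendecomposition is used here only for clarity; for networks with thousands of nodes a sparse or approximate spectral method would likely be needed.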

QUANTIFYING THE INFORMATION LOSS IN (PARTIALLY) AGGREGATED MULTIPLEX NETWORKS
Storing, handling and manipulating multiplex networks requires an amount of space and computational power which increases at least linearly with the number of layers of the system. It is therefore natural to ask whether the additional information obtained by explicitly considering the M available layers of a system as separate levels is indeed necessary to characterise it, or if instead the dimensionality of the network can be reduced without an appreciable loss of information by aggregating some of the layers which carry redundant information.
The Von Neumann entropy of a multiplex explicitly depends on the actual number and structure of layers of which it consists, its value being larger than the Von Neumann entropy of the corresponding aggregated graph, by construction. To measure the information loss due to the aggregation of an M-layer multiplex in a single-layer graph, we use the relative entropy q(M) of Eq. (1), where $A^{[1]} \otimes A^{[2]} \otimes \ldots \otimes A^{[M]}$ is the multiplex in which all the M layers are kept separate, $A^{[1]} \oplus A^{[2]} \oplus \ldots \oplus A^{[M]}$ is the associated single-layer aggregated graph, and $h_\otimes$ and $h_\oplus$ are the corresponding Von Neumann entropies. The rescaling factor $M^{-1}$ is necessary for a correct comparison between $h_\oplus$ and $h_\otimes$. The quantity q(M) measures the additional information obtained by considering an M-layer multiplex representation of the system instead of a single-layer aggregated graph. In particular, if all the layers of the multiplex are identical then q(M) = 0, since no layer adds new information to that already encoded in the corresponding aggregated graph. Conversely, higher values of q(M) indicate that the M-layer representation is more informative than a single-layer aggregation. We notice that it is possible to obtain higher values of the relative entropy in Eq. (1) by considering a multiplex with fewer layers, in which each layer corresponds either to one of the original layers or to the aggregation of some of them.
In general, the optimal configuration of aggregated layers is the one which maximises q(•), but finding such a configuration would in general require the enumeration of all the possible partitions of a set of M objects (the layers), which is computationally prohibitive, since the number of such partitions (the Bell number $B_M$) grows faster than exponentially with M. To overcome this problem, we employ a different approach, similar in spirit to the one adopted in quantum physics to quantify the distance between mixed quantum states [19]. More specifically, capitalizing on the concept of Von Neumann entropy of a graph, we use the quantum Jensen-Shannon divergence to quantify the (dis-)similarity between all pairs of layers of a multiplex (see Eq. (5) and Appendix). This choice is justified by the peculiar mathematical properties of this measure, which allows one to define a metric distance and can be used to perform a hierarchical clustering of the layers. The result of this procedure is a dendrogram (see Fig. 1), i.e., a hierarchical diagram in which each of the M leaves is associated with one of the original layers of the system, each internal node indicates the aggregation of (clusters of) layers into a single network and the root corresponds to the fully aggregated graph. After the m-th step of the algorithm, we obtain a new multiplex network consisting of M − m layers, for which we can compute the associated value of relative entropy q(M − m). The cut of the dendrogram with maximal value of q(•) corresponds to the (sub-)optimal configuration of layers in terms of relative information gain with respect to the aggregated graph.
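The pairwise layer distance can be sketched as follows. This is an illustrative example under stated assumptions, not the authors' code: it takes the quantum Jensen-Shannon divergence in its standard form, h((ρ+σ)/2) − [h(ρ)+h(σ)]/2, uses its square root as the distance, and normalises each layer's Laplacian to unit trace; the function names are hypothetical.

```python
import numpy as np

def density_matrix(adj):
    """Unit-trace rescaling of the combinatorial Laplacian (assumed normalisation)."""
    adj = np.asarray(adj, dtype=float)
    lap = np.diag(adj.sum(axis=1)) - adj
    return lap / np.trace(lap)

def entropy(rho):
    """Von Neumann entropy in bits, with the convention 0 log 0 = 0."""
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > 1e-12]
    return float(-(lam * np.log2(lam)).sum())

def js_distance(adj_a, adj_b):
    """Square root of the quantum Jensen-Shannon divergence between two layers."""
    rho, sigma = density_matrix(adj_a), density_matrix(adj_b)
    jsd = entropy(0.5 * (rho + sigma)) - 0.5 * (entropy(rho) + entropy(sigma))
    return float(np.sqrt(max(jsd, 0.0)))  # clip tiny negative round-off

def distance_matrix(layers):
    """Symmetric M x M matrix of pairwise Jensen-Shannon distances."""
    m = len(layers)
    dist = np.zeros((m, m))
    for a in range(m):
        for b in range(a + 1, m):
            dist[a, b] = dist[b, a] = js_distance(layers[a], layers[b])
    return dist

# Two layers on four nodes, each holding a single edge on disjoint node pairs:
# identical layers are at distance 0, these two are maximally distant.
layer_a = np.zeros((4, 4)); layer_a[0, 1] = layer_a[1, 0] = 1
layer_b = np.zeros((4, 4)); layer_b[2, 3] = layer_b[3, 2] = 1
dist = distance_matrix([layer_a, layer_a.copy(), layer_b])
```

With base-2 logarithms the distance is bounded between 0 and 1, which makes the resulting matrix directly usable as input to a standard agglomerative clustering routine.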
The whole procedure proposed here is sketched in Fig. 1 and can be summarised as follows: i) compute the quantum Jensen-Shannon distance matrix between all pairs of layers; ii) perform hierarchical clustering of layers using such a distance matrix and use the relative change of Von Neumann entropy as the quality function for the resulting partition; iii) finally, choose the partition which maximises the relative information gain.
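Steps i) to iii) can be sketched end to end as below. This is a simplified illustration rather than the authors' implementation, and it rests on several assumptions: distances are recomputed after every merge (a greedy variant chosen here for brevity, since the linkage criterion is not specified in this section); layers are aggregated by summing adjacency matrices; and the quality function is taken as q = 1 − (mean layer entropy)/h_⊕, a form chosen only because it uses the rescaling by the number of layers and vanishes for identical layers, as the text requires. Eq. (1) should be consulted for the definition actually used. All names are hypothetical.

```python
import itertools
import numpy as np

def density(adj):
    lap = np.diag(adj.sum(axis=1)) - adj
    return lap / np.trace(lap)  # assumed unit-trace normalisation

def entropy(rho):
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > 1e-12]
    return float(-(lam * np.log2(lam)).sum())

def js_distance(a, b):
    ra, rb = density(a), density(b)
    jsd = entropy(0.5 * (ra + rb)) - 0.5 * (entropy(ra) + entropy(rb))
    return float(np.sqrt(max(jsd, 0.0)))

def quality(layers, h_aggregate):
    """ASSUMED form of q: 1 - (mean layer entropy) / (entropy of aggregate)."""
    mean_h = sum(entropy(density(a)) for a in layers) / len(layers)
    return 1.0 - mean_h / h_aggregate

def reduce_layers(layers):
    """Greedily merge the closest pair of layers; return all cuts and the best one."""
    layers = [np.asarray(a, dtype=float) for a in layers]
    groups = [[i] for i in range(len(layers))]
    h_agg = entropy(density(sum(layers)))  # must be non-zero for q to be defined
    history = [([tuple(g) for g in groups], quality(layers, h_agg))]
    while len(layers) > 1:
        # step i): closest pair under the Jensen-Shannon distance
        i, j = min(itertools.combinations(range(len(layers)), 2),
                   key=lambda p: js_distance(layers[p[0]], layers[p[1]]))
        # step ii): aggregate the pair and record q for the new partition
        layers[i] = layers[i] + layers[j]
        groups[i] = groups[i] + groups[j]
        del layers[j], groups[j]
        history.append(([tuple(g) for g in groups], quality(layers, h_agg)))
    # step iii): keep the partition with maximal q
    best = max(history, key=lambda item: item[1])
    return history, best

# Toy multiplex: two identical 4-cycles plus one layer with two "diagonal" edges.
cycle = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]], float)
diag = np.array([[0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, 0]], float)
history, best = reduce_layers([cycle, cycle.copy(), diag])
```

In this toy case the two identical layers are merged first, and the last entry of `history` corresponds to the fully aggregated graph, for which the assumed q is zero by construction.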

Layer aggregation of synthetic multiplex networks
To shed light on the impact of the layer aggregation procedure proposed here on the structural properties of a multiplex network, we considered different benchmarking scenarios. Each benchmark consists of several layers characterised by specific features or a given amount of correlation. In Fig. 2 we report the case of a multiplex network in which the layers are obtained by rewiring different percentages of the edges of the first layer. The layers of the resulting multiplex network are characterised by a varying amount of edge overlap. As shown in the Figure, the hierarchical clustering procedure first aggregates layers which are more similar to each other (namely, the layers which correspond to an amount of edge rewiring smaller than 50%) and then merges the layers characterised by higher rewiring. The monotonically decreasing behaviour of the relative entropy q(•), shown in Fig. 2(c), confirms that in this case the best representation of the system is the one in which all the layers are kept distinct. In fact, independently of the fraction of edges actually rewired, on average a pair of layers exhibits a relatively small redundancy, since each of the rewired layers carries some information which is not included in the other layers (this multiplex has an overall overlap smaller than 5%).
The results obtained from synthetic multiplex networks suggest that layers with high overlap and similar topology tend to be aggregated first. This corroborates our procedure, showing that the principle of minimum information loss is satisfied.
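A benchmark of this kind can be generated along the following lines. The sketch is illustrative only and makes simplifying assumptions that differ from the setup described in the text: the first layer is a small ring lattice rather than a scale-free graph, and "rewiring" is implemented as replacing a fraction of edges with uniformly random new ones (degrees are not preserved). The function names and parameters are hypothetical.

```python
import random

def ring(n_nodes):
    """Ring lattice used as a stand-in first layer (ASSUMPTION: not scale-free)."""
    return {(min(i, (i + 1) % n_nodes), max(i, (i + 1) % n_nodes))
            for i in range(n_nodes)}

def rewire(edges, n_nodes, fraction, rng):
    """Replace a fraction of the edges with random edges absent from the original."""
    n_rewire = round(fraction * len(edges))
    removed = rng.sample(sorted(edges), n_rewire)
    new_edges = set(edges) - set(removed)
    while len(new_edges) < len(edges):
        u, v = rng.sample(range(n_nodes), 2)
        candidate = (min(u, v), max(u, v))
        if candidate not in edges:  # never re-create an original edge
            new_edges.add(candidate)
    return new_edges

def edge_overlap(layer_a, layer_b):
    """Fraction of edges shared by two layers (intersection over union)."""
    return len(layer_a & layer_b) / len(layer_a | layer_b)

rng = random.Random(0)
base = ring(50)
# One layer per rewiring percentage: 5%, 10%, ..., 95%.
layers = [base] + [rewire(base, 50, pct / 100, rng) for pct in range(5, 100, 5)]
overlaps = [edge_overlap(base, layer) for layer in layers]
```

Because rewired edges are drawn outside the original edge set, the overlap of each layer with the first one decreases as the rewiring percentage grows, which is the property the benchmark relies on.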

Layer aggregation of multilayer biological networks
To test the usefulness of our proposal on real-world networks, we consider here the multiplex networks obtained by taking into account different types of genetic interactions in 13 organisms of the Biological General Repository for Interaction Datasets (BioGRID). This is a public database that stores and disseminates genetic and protein interaction information from model organisms and humans (thebiogrid.org), and currently holds over 720,000 interactions obtained from both high-throughput data sets and individual focused studies, as derived from over 41,000 publications in the primary literature. We use BioGRID 3.2.108 (updated to 1 Jan 2014) [29]. In this data set, the networks represent protein-protein interactions and the layers correspond to interactions of different nature, i.e., physical (labelled "Phys" in the following), direct ("Dir"), co-localization ("Col"), association ("Ass"), suppressive ("GSup"), additive ("GAdd") or synthetic genetic ("GSyn") interaction. The number of layers identified for each organism ranges from 3 to 7.
In Fig. 3 we show the results obtained on three organisms (C. elegans, Mus and Candida). Although the multiplex networks corresponding to these organisms have a similar number of layers (six for C. elegans, seven for Mus and Candida), each of them is characterised by a peculiar level of reducibility. In particular, in the case of C. elegans no layer aggregation is advisable at all, since the maximum value of q(•) is obtained for the multiplex in which all the six layers are kept distinct. Conversely, in the case of Mus and Candida some pairs of layers carry redundant information and can be aggregated without an appreciable loss of information.
In Fig. 4 we summarize the results obtained by applying the proposed layer aggregation procedure to multilayer genetic interaction networks in the BioGRID data set. This particular visualization allows one to compare the reducibility of all organisms simultaneously. Not all multiplex networks can be reduced to a smaller number of layers, suggesting that for some organisms layer aggregation should be avoided. For instance, this is the case of Caenorhabditis elegans (nematode), Arabidopsis thaliana (cress) and Bos taurus (mammal), where no global maximum is present, except for m = 0, i.e. the initial multiplex. In other cases, reducibility might take place as, for instance, in Saccharomyces cerevisiae (yeast) and Drosophila melanogaster (common fruit fly), where a global maximum of q(•) is present at m = 2.
We have proposed a practical procedure to aggregate layers of a multilayer network and we have presented an application of our method to the case of biological data sets. Nevertheless, our method is not limited to this kind of data and can be applied to other multilayer networks. For instance, we have applied it to the edge-colored multi-graph of European airports [8], where each layer indicates an airline, and we have found that this transportation system cannot be reduced to a smaller number of layers. This result indicates that the connectivity between airports is not redundant for any airline, as expected in the case of a large-scale transport infrastructure.
this procedure M − 1 times we obtain a dendrogram, i.e. a hierarchical diagram whose M leaves are associated to the original layers of the system, internal nodes indicate merges of (clusters of) layers and the root corresponds to the aggregated graph.

Figure 3. Layer aggregation of protein-genetic interaction networks. The multiplex protein-genetic networks of different species have different levels of reducibility. We show the heat map of the Jensen-Shannon divergence, together with the dendrogram resulting from hierarchical clustering and the corresponding values of q(•), in three of the 13 species considered in this study. The dashed red lines identify the maximum of the global quality function q(•). For some organisms (like C. elegans, reported in panel (A)), such maximum is obtained by leaving all the layers separate and no aggregation is possible, while for some other species a few layers carry redundant information, e.g. in (B) Mus and in (C) Candida, and can be safely compressed without appreciable loss of information.

Figure 1. Layer aggregation of multilayer networks. Given a multiplex network (A), we compute the Jensen-Shannon distance between each pair of its layers (B), which is a proxy for layer redundancy. The resulting distance matrix allows one to perform a hierarchical clustering, whose output is a hierarchical diagram (a dendrogram) whose leaves represent the initial layers and internal nodes denote layer merging (C). At each step, two layers (or groups of layers) are merged and the information gain (or loss) is quantified by the global quality function q(•), shown by the curve on the left-hand side of panel C. The merging procedure is stopped when q(•) is maximum, obtaining a reduced version of the original multiplex network (D).

Figure 2. Multiplex benchmark. We considered a benchmark multilayer network with N = 5000 nodes and M = 20 layers. The first layer is a scale-free graph with $P(k) \sim k^{-3}$, while the other layers are obtained by rewiring an increasing percentage of the edges of the first layer, from 5% up to 95%. By doing so, each pair of layers is characterised by a different amount of edge redundancy (the total overlap of the multiplex is < 5%). (a) The heat map shows the Jensen-Shannon distance between the twenty layers, where each layer is identified by the corresponding percentage of rewiring. (b) The hierarchical clustering procedure successively merges layers with a decreasing percentage of redundant edges. (c) In this case q(M − m) is a decreasing function of m, since each layer has some unique edges which are not present in the others. Consequently, the best representation of the multiplex is that in which all the layers are kept separate.
A.A. and M.D.D. are supported by MINECO through Grant FIS2012-38266; by the EC FET-Proactive Project PLEXMATH (grant 317614) and the Generalitat de Catalunya 2009-SGR-838. A.A. also acknowledges partial financial support from the ICREA Academia and the James S. McDonnell Foundation. V.L. and V.N. acknowledge support from the Project LASAGNE, Contract No. 318132 (STREP), funded by the European Commission.

Figure 4. Reducibility of protein-genetic networks in the BioGRID data set. The global quality function q(•) versus the number of merges in the hierarchical clustering procedure for the protein-genetic interaction multiplex networks of all the 13 organisms considered in this study (the plots are vertically rescaled to avoid overlaps). The values of q(•) are not reported on the y-axis because only the existence of a global maximum, and the corresponding value of m on the x-axis, is meaningful for the analysis. For each organism, q(•) has a maximum corresponding to the partition of the layers which minimises layer redundancy at the cost of a small loss of information.