The multiplex network of human diseases

Untangling the complex interplay between phenotype and genotype is crucial to the effective characterization and subtyping of diseases. Here we build and analyze the multiplex network of 779 human diseases, which consists of a genotype-based layer and a phenotype-based layer. We show that diseases with common genetic constituents tend to share symptoms, and uncover how phenotype information helps boost genotype information. Moreover, we offer a flexible classification of diseases that considers their molecular underpinnings alongside their clinical manifestations. We detect cohesive groups of diseases that have high intra-group similarity at both the molecular and the phenotypic level. Inspecting these disease communities, we demonstrate the underlying pathways that connect diseases mechanistically. We observe monogenic disorders grouped together with complex diseases for which they increase the risk factor. We propose potentially new disease associations that arise as a unique feature of the information flow within and across the two layers.

: The observed overlap is statistically highly significant against a background of randomized network realizations.
the clustering coefficient distribution. We find that while both layers have heavy tailed degree distributions, the maximum degree in the phenotype layer is much higher than the genotype layer (Fig. S3). This is due to the criterion for forming a link between diseases for the phenotype layer (i.e. if they share a symptom) than the respective criterion for the genotype layer (i.e. if they share a disease gene), following from the fact that many symptoms are rather non-specific, resulting in potentially many connections for a given disease.
Another important aspect underlying the structural difference between the two layers is the distinct levels of heterogeneity of the two networks. In order to study this aspect and learn more about the local granularity of each network, we plot their local clustering coefficient distribution P (C i ). We find that the distribution of local clustering coefficients is bimodal in both cases, owing to the presence of both densely clustered regions with C i close to 1 (due to diseases with wide-ranging symptoms, such as multi-system diseases, in the phenotype layer; and complex diseases involving numerous gene mutations, such as cancers, in the genotype layer) and sparsely connected regions C i close to 0 in both networks. However, the two networks are significantly different (P = 5.25E − 13, two-sided Mann-Whitney U test) from each other in which type (C i close to 0 or C i close to 1) is more predominant in the network (Fig. S4. In particular, we note that the phenotype layer has more of the clique-like group of diseases (also the root of its bimodal degree distribution), which results in a higher portion of nodes with C i = 1.0. The opposite is true for the genotype layer where we see a higher portion of nodes with C i = 0, indicating that, overall, the clustering is weaker compared to the phenotype layer.
Following from the presence of heterogeneous dense regions in each layer, we observe that the overlapping links tend to be localized around what can be called "overlap hubs" rather than placed homogeneously. Indeed, the heavy-tailed degree distribution of the overlap links confirms this observation (Fig. S5). A closer look at the diseases that have a high number of edges in the overlap network reveals that these are either groups of diseases (such as retinis pigmentosa, k = 8) or multi-system diseases (such as lamb-girdle muscular dystrophy, k = 8) that affect a number of organs and have wide ranging symptoms as well as common genetic factors with other diseases, resulting in their hub status in the overlap network.  3 Notes on the use of multiplex Infomap and the choice of relax rate In the case of single-layer networks, different community detection methods provide, in general, different results, unless the community structure is extremely strong. However, while in the single-layer case there are many algorithms that can be used and compared, for instance to obtain an overall consensus, in the case of multiplex networks this is not possible. There are only a couple of grounded methods, attempting to generalize single-layer approaches. One of them is the multislice modularity maximization introduced by Mucha et al [1], which depends on a resolution parameter and on the weight (uniformly) assigned to inter-layer connections.
The main drawback of this method is that in our case inter-layer links are missing and not even theoretically reasonable to explain. Instead, the multiplex version of the InfoMap algorithm has been designed ad hoc to cope with the absence of inter-layer links and depends only on one parameter, the relax rate. The relax rate might seem arbitrary, therefore, in absence of further information, we invoke the maximum-entropy principle and choose the relax rate that equally gives importance to local exploration and global teleportation, i.e. r = 0.45. To demonstrate the robustness of the results with respect to changes in the relax parameter, we compare the results we obtained from the data against an ensemble of M = 199 Monte Carlo realizations representing a specific null model. In our null model, the degree distribution of each layer is preserved but intra-layer correlations are destroyed by rewiring the links, whereas inter-layer correlations are preserved, because each node keeps the original number of links (ie, its degree) in each layer separately. We therefore apply our community detection algorithm to the ensemble and perform a non-parametric test to quantify the significance of our findings. We use the Wilcoxon-Mann-Whitney test, where the value s of the statistics obtained from the data (in this case the code-length and the number of modules) is compared against the values s i (i = 1, 2, ..., M ) obtained from the random realizations of the null model. The two-sided test rejects the null hypothesis that the data are compatible with the null model either if s < s i or s > s i for all i. The statistical significance of the test is α = 2/(M + 1), and in our case it is 1%. Figure S6 shows that our statistical test rejects the null hypothesis either using code-length (top panels) or number of modules (bottom panels) as test statistics, regardless of the relax rate used (here we show the results for r = 0.45, 0.50 and 0.55). We can therefore assess that the clusters of diseases found by our algorithm are significant with 99% confidence level with respect to a null model where the networks in the two layers preserve the underlying degree distribution and inter-layer correlations. The significance of the result is further reduced if inter-layer correlations are destroyed as well, by randomly relabeling the nodes.

Counts
Relax rate: 0.55 Figure S6: The Wilcoxon-Mann-Whitney statistical test rejects the null hypothesis either using code-length (top panels) or number of modules (bottom panels) as test statistics, regardless of the relax rate used (here we show the results for r = 0.45, 0.5 and 0.55). The dashed lines indicate the respective test statistic s obtained from data whereas the data points indicate the text statistics s i (red for code length and blue for number of modules) obtained from the random realizations.

Specificity of single-layer disease communities
We first detect communities for the genotype and the phenotype layer separately using the Infomap algorithm. Then, to find out whether there is significant disease overlap of communities found separately in each layer, we calculate the average Jaccard index for the disease overlap between all possible genotype-phenotype community pairings. This gives us an observed average Jaccard index J = 0.00286. To check if this indeed indicates a low overlap, we then follow the same procedure on randomized ensembles for a total number of 5000 realizations. Since the topological aspects of each network are crucial in the resulting community structure, we randomize the network in each layer by keeping the degree distribution constant, using degreepreserving randomization. The mean of the average Jaccard distribution of this randomized ensemble is J = 0.00261 ± 0.000289, resulting in a picture where the actual disease overlap is indistinguishable from the random case with a z-score = 0.882 (Fig. S7). Figure S7: The communities found by the Infomap algorithm separately on the two layers have a low overlap comparable to random expectation.

Size distribution of multiplex disease communities
Overall the 779 diseases are divided into 128 multiplex communities. Out of the 128 disease groups, the largest one has 91 diseases, whereas the smallest one has 2 diseases. The number of unique diseases classified into groups based on Multiplex Infomap reaches a plateau rapidly, with the disease communities of size greater than or equal to 10 comprising 520 diseases (Fig.  S8).

Bridge diseases
As a result of the multiplex community detection method, many diseases are classified into two different communities, acting as "bridges" between the two communities that they belong to. Among 779 diseases, 215 are in this category. For the 29 largest disease communities that have a size greater than 10, the connections between communities are shown in Fig. S9. We note that some smaller disease communities are more abundant in terms of the number of bridge diseases they have, such as disease community 23, 26 and 28. We compare the 215 bridge diseases with the remaining 564 non-bridge diseases to find out whether there are any discerning topological properties of bridge diseases. In particular, we compare the degree (sum of the number of connections, clustering coefficient, centrality, and strength (sum of weights) distributions of the bridge and non-bridge diseases. Our results show mostly no statistically significant difference between the two groups of diseases. For degree distributions, the Mann-Whitney U p-values are 0.40 and 0.11 for the genotype and phenotype layer, respectively. For clustering coefficients, the Mann-Whitney U p-values are 0.03 and 0.44 for the genotype and phenotype layer, respectively. The mean clustering coefficient of bridge and non-bridge diseases in the genotype layer are 0.48 and 0.41, respectively, which means that the bridge diseases have significantly higher clustering coefficient in the genotype layer than the non-bridge diseases. For eigenvector centrality, the bridge diseases have significantly lower eigenvector centrality than non-bridge diseases in both the genotype layer (0.012 vs 0.014, Mann-Whitney p-value 0.04) and the phenotype layer (0.0150 vs 0.0152, Mann-Whitney p-value 0.004). For betweenness centrality, the bridge diseases have significantly higher eigenvector centrality than non-bridge diseases in the phenotype layer (0.0022 vs 0.0017, Mann-Whitney p-value 0.006) but not the the genotype layer (0.0027 vs 0.0039, Mann-Whitney p-value 0.14). Finally, for strength distributions, the difference is not significant for either the genotype or the phenotype layer (with Mann-Whitney U p-values 0.38 and 0.07, respectively). These results are summarized in Fig. S10.

Pairwise disease similarity heatmaps
To assess the intra-similarity of multiplex disease communities, we calculate the pairwise similarity of all disease pairs in the communities to use as the background distribution when calculating the significance of intra-similarity. The whole similarity matrices for GO: Biological Process, comorbidity as measured by relative risk, gene overlap Jaccard index, and semantic similarity are shown in Figs. S11, S12, S13, S14, respectively.

Selected community
All disease communities P-value by Mann-Whitney U Figure S15: Similarity assessment of multiplex disease communities. Each disease community is evaluated for significance of intra-community similarity according to the four similarity measures. For each similarity measure, the distribution of similarity values for the disease community is compared against the background of all disease communities and its respective P-value is calculated by two-sided Mann-Whitney U test. Communities that have an average similarity higher than the background and a P-value smaller than 0.01 are considered as significantly similar for the given similarity measure.

Validation of potential disease associations using Disease-Connect
As an unbiased, global, quantitative comparison of all multiplex and individual layer disease communities, we next measured the disease communities' propensity to capture disease associations missing from both the genotype and the phenotype layers, thereby highlighting potentially novel disease associations that can be derived from our data that might be of interest to clinicians and biomedical researchers. We consider all multiplex and single-layer disease communities (Figure S16a), and construct the complete graph containing all possible edges between community diseases ( Figure S16b). We then subtract the edges already in the genotype and phenotype layers, which leaves us with diseases that do not have a shared gene or symptom, but are potentially associated with each other (Figure S16c and S16d). To evaluate these potential associations, we turn to an independent and comprehensive disease association database (Disease-Connect), which assigns "strengths" to edges based on multiple sources of molecular evidence [2]. We measure the overlap of potentially novel edges within our multiplex communities with the edges that are captured by Disease-Connect ( Figure S16e). As such, this validation analysis is simply a cross-check with an external "ground truth" rather than a direct inference of unknown edges, which is a separate domain of research outside the scope of this study. Following this procedure, we found that in terms of the number of potentially novel edges evidenced by Disease-Connect, multiplex disease communities far outperform genotype and phenotype communities, with overall 146 overlapping edges for multiplex communities versus 61 and 15 for genotype and phenotype communities, respectively ( Figure S16f). This result demonstrates that computing communities in the multiplex allows the uncovering of novel or underappreciated molecular disease associations, owing to the bridging of different communities that are separately represented in single-layers. We also measured the strength of the potentially novel edges in terms of the negative log of their association p-values. To adjust for the confounding factor that the number of diseases and, therefore, the possible edges between them may be higher for multiplex communities than single-layer communities, we calculated the meanlog(p-value) of all new edges for each case. Once again, multiplex communities have a higher mean disease association strength compared to both phenotype-and genotype-layer communities ( Figure S16g). This result suggests that, even after normalizing for the number of edges where multiplex communities have an advantage, the mean association strength is higher for the multiplex communities than for both of the single-layer communities.

Multiplex disease communities: Comparison with randomized communities
In order to establish that the multiplex disease communities we found are biologically distinct from random communities, we randomize the pairwise similarity matrices by keeping the size of communities fixed, and compare the four similarity measures between the actual and the randomized case. Once again, we compare the similarity distribution of the each community with the background of all disease pairs using two-sided Mann-Whitney U test. We find that, within the top 30 communities with the largest size, there are no communities with mean RR comorbidity less than that of the background (i.e. of all diseases), and only three communities are not significant, whereas for the randomized case there are two communities with mean RR comorbidity smaller than background, and nine are not significant. Furthermore, the intra-community similarity -log(p-values) for RR comorbidity are significantly higher than those of randomized communities (mean(-logP): 30.72 vs. 6.54, p = 0.00034 two-sided Whitney-Mann U test). Similarly fo MimMiner semantic similarity, there are no communities with mean MimMiner similarity less than that of the background and only two communities are not significant, whereas for the randomized case there are seven communities with mean MimMiner similarity smaller than background, and ten are not significant. The intra-community similarity -log(p-values) for Mim-Miner similarity are significantly higher than those of randomized communities (mean(-logP): 33.40 vs 3.73, p = 1.008E −08, two-sided Whitney-Mann U test). For Gene Ontology: Biological Process (GO:BP) similarity, there are six communities with mean GO:BP similarity less than that of the background and only three communities are not significant, whereas for the randomized case there are four communities with mean GO:BP similarity smaller than background, and ten communities are not significant. The intra-community similarity -log(p-values) for GO:BP similarity are significantly higher than those of randomized communities (mean(-logP): 28.85 vs 5.41, p = 0.00024, two-sided Whitney-Mann U test). The reason that multiplex communities perform relatively closer to random for GO:BP similarity compared to other similarity measures is that the overall similarity of background is relatively higher for this measure, which could be due to many common endophenotypes and pathobiological pathways that loosely connect many diseases that are in different disease communities. Finally, for gene overlap (Jaccard index), there are once again no communities with mean gene overlap less than that of the background and only two communities are not significant, whereas for the randomized case, while there are no communities with mean gene overlap smaller than background, eighteen communities are not significant. The intra-community similarity -log(p-values) for gene overlap are significantly higher than those of randomized communities (mean(-logP): 10.10 vs 1.37 , p = 5.98E − 09. two-sided Whitney-Mann U test).

Biological cohesiveness of single-layer disease communitiesa comparison with multiplex communities
It is crucial to assess how well the single-layer disease communities perform in terms of the four similarity measures, in order to be able to comment on the level of cohesiveness of the multiplex disease communities. To do this, we repeat the intra-community similarity measurements for the genotype and phenotype communities and present the results here. We first examine the genotype layer-specific communities. To be consistent with the analogous analysis on multiplex disease communities, we take the largest communities with size 10 and greater and consider the top 20 communities. For RR comorbidity, we find that all top-20 communities have higher average RR comorbidity similarity than the background. For the multiplex case, all top-30 communities had higher RR comorbidity than the background as well. Therefore, as a second level of quantification, we compare how significantly higher than the background each case is, measured by −log(P ) values of the Mann-Whitney U test for each disease community. We find that RR comorbidity intra-community similarity significances (measured in terms of -log(P )) are lower for genotype layer-specific disease communities than multiplex disease communities (23.60 vs 30.72). For MimMiner semantic similarity, we have the same situation where all top-20 communities have higher average MimMiner semantic similarity than the background. For the multiplex case, all top-30 communities also had higher MimMiner semantic similarity than the background. Turning to p-values, we find that MimMiner intra-community similarity significances are lower for genotype layer-specific disease communities than multiplex disease communities (26.79 vs 33.40). For GO:BP similarity, all top-20 communities have higher average GO:BP similarity than the background whereas for the multiplex case, six of the top-30 communities had lower GO:BP similarity than the background. Finally, for gene overlap Jaccard similarity, all top-20 communities have higher average gene overlap Jaccard than the background. For the multiplex case, all top-30 communities had higher gene overlap Jaccard similarity than the background as well. Comparing the p-values, we find that gene overlap Jaccard intra-community similarity significances are significantly higher (two-sided Mann-Whitney U test P=9.47E-05) for genotype layer-specific disease communities than multiplex disease communities (20.28 vs 10.10). Taken together, these results indicate that in terms of phenotypic similarity measures, multiplex disease communities do better than genotype layer-specific disease communities. Meanwhile for molecular similarity measures, the genotype layer communities do better than multiplex communities. We note that the better performance of the genotype layer communities in this case is to be expected since GO:BP similarity and gene overlap are both molecular similarity measures from which the links in this layer are derived.
Second, we examine the phenotype layer-specific communities. We investigate the top-16 disease communities of size 10 and greater. For RR comorbidity, we find that all top-16 communities have higher average RR comorbidity similarity than the background. For the multiplex case, all top-30 communities had higher RR comorbidity than the background as well. The RR comorbidity intra-community similarity significances (measured in terms of -log(P )) are higher for phenotype layer-specific disease communities than multiplex disease communities (35.60 vs 30.72), although this difference is not significant (two-sided Wilcoxon Rank Sum test (P=0.10). For MimMiner semantic similarity, once again, all top-16 phenotype layer-specific communities have higher average MimMiner semantic similarity than the background. For the multiplex case, all top-30 communities also had higher MimMiner semantic similarity than the background. The MimMiner intra-community similarity significances are lower for phenotype layer-specific disease communities than multiplex disease communities (28.55 vs 33.40). For GO:BP similarity, five top-16 phenotype layer-specific communities have lower average GO:BP similarity than the background whereas for the multiplex case, six of the top-30 communities had lower GO:BP similarity than the background. The GO:BP intra-community similarity significances are significantly lower (two-sided Wilcoxon Rank Sum test P=0.02) for pheotype layer-specific disease communities than multiplex disease communities (11.06 vs 28.85). For gene overlap Jaccard similarity, all top-16 communities have higher average gene overlap Jaccard than the background. For the multiplex case, all top-30 communities had higher gene overlap Jaccard similarity than the background as well. Comparing the p-values, we find that gene overlap Jaccard intracommunity similarity significances are significantly lower (two-sided Wilcoxon Rank Sum test P=0.00055) for phenotype layer-specific disease communities than multiplex disease communities (2.57 vs 10.10). Taken together, these results indicate that multiplex disease communities do comparably with or better than phenotype layer-specific communities for phenotypic similarity measures, and significantly better than phenotype layer-specific communities for molecular similarity measures. The results of this section are summarized in Fig. S17. Overall, these analyses point towards the cooperative effect of the multiplex disease network where the multiplex disease communities compensate for the lack of molecular similarity in the phenotype layer communities and the lack of phenotypic similarity in the genotype layer communities, offering a good compromise between the two aspects. Figure S17: The comparison of multiplex disease communities with genotype and phenotype layer-specific disease communities according to the four similarity measures. Green values indicate the measures for which multiplex communities have higher intra-community similarity than the respective single layer communities whereas red values indicate measures for which multiplex communities have lower intra-community similarity than the respective single layer communities. 11 Assessing the clarity of boundaries of multiplex disease communities As a means to demonstrate the lack of edges between communities, we turn to PubAtlas publication co-occurrence and test whether or not the high publication co-occurrence of intra-layer disease pairs (see Figure 4 in the main text and Supplementary Figures S18 and S24) is accompanied by a lack of publication co-occurrence for inter-layer disease pairs. As proof of concept, we first compare the two communities presented in the main text, disease community 8 and 23, and plot the publication co-occurrence heatmap between these two communities. As expected, we find that there are few publications between the diseases of these two communities (see Fig. S19), where the publication co-occurrence is measured by the Jaccard index J.
Overall, the distributions of log(J) of intra-community links for disease community 8 and disease community 23 (mean= −11.45 and −11.87, respectively) are significantly higher than the inter-community links between disease community 8 and disease community 23 (mean= −16.32) (p-value= 3.92E − 12 and 2.47E − 07, respectively). In order to generalize this observation to all disease communities discussed, we plot the pairwise publication co-occurrence heatmap for all communities (Fig. S20). By inspecting Fig. S20, we see that the multiplex communities (along the diagonal) have higher publication co-occurrence similarity within themselves than with other communities, the only exceptions (lines "repeated" in other rows/columns) being the communities connected together with "bridge diseases," which belong to more than one community. When we plot the intra-and inter-community publication co-occurrence Jaccard indices for the top 30 largest communities, we see that there are only two instances where the inter-community co-occurrence was higher than intra-community (diagonal values), i.e. the diagonal is the highest of its respective row (Fig. S21). Among these top 30 largest disease communities, all have higher mean intra-community publication co-occurrence compared to the background and 25 of them are significant (two-sided Mann-Whitney U test). We note that the few "dark" off-diagonal values, even though they have a similar shade of color to the diagonal values, are still of lower co-occurrence by several folds due to the logarithmic scale of the heatmap. Furthermore, we performed the same analysis for layer-specific communities in order to assess their community boundaries. Phenotype communities, which are based on shared symptoms and therefore expected to be more well-defined and to have clearer boundaries in terms of publication co-occurrence. We found that overall, the publication co-occurrence is higher for off-diagonal elements for phenotype layer communities compared to multiplex communities (Fig. S22a), indicating that multiplex communities offer a more well-defined boundary between disease groups measured in terms of publication co-occurrence. For the genotypespecific communities, the intra-vs. inter-community co-occurrences are similar, however the number of strong off-diagonal similarities similar to the diagonal ones is higher in genotype specific communities, indicating that there are more instances where the boundaries between different disease communities are blurred, compared to the multiplex case (Fig. S22b). Finally, to probe further into which communities are relatively more convoluted, we compare, for each community, its intra-community publication co-occurrence with its inter-community publication co-occurrence with the other 29 communities, and plot the -log(P) values (Fig.  S23). We find that although almost all intra-community co-occurrences are higher than intercommunity ones (see Fig. S21), a few of them are not significant for some communities (especially communities 3, 7, 17 and 27). This could be due to the low number of publications for these diseases and therefore insufficient statistics.    Figure S22: (a) The logarithm of the publication co-occurrence Jaccard indices for the intra-and inter-community diseases in the top 30 phenotype layer disease communities. (b) The logarithm of the publication co-occurrence Jaccard indices for the intra-and inter-community diseases in the top 30 genotype layer disease communities. Darker color indicates higher publication co-occurrence. Figure S23: Comparison of intra-and inter-community publication co-occurrences in terms of significance (-log(P) values). In each row, the publication co-occurrence of the community denoted by the row index is compared against all other communities. Any color other than black denotes insignificant (P > 0.05, two-sided Kolmogorov-Smirnov test).

Preparation of the GWAS dataset
To build the GWAS multiplex, we follow the same procedure as in the main text, with the only difference being the disease-gene bipartite network, i.e. the genotype layer. For the GWAS multiplex, we use the Disease-Connect database [2], and limit our use to the GWAS evidence disease-gene associations. This results in a multiplex network consisting of 113 diseases, divided into 16 multiplex disease communities (Supplementary Table 2

Average gene similarity of disease pairs
To compare the average gene overlap between the genotype, phenotype and overlap network, we calculate the average Jaccard index of overlap over all the edges in these networks. This results in J = 0.007 for the phenotype network, : J = 0.208 for the genotype network, and J = 0.252 for the overlap network (Fig. S25, blue circles). Then to compare these values to random expectation, we generate ensembles of networks with the same degree distribution in each layer separately and calculate the resulting randomized J values. J rand = 0.007 ± 0.001 for the phenotype network, J rand = 0.011±0.002 for the genotype network and J rand = 0.011±0.006 for the overlap network (Fig. S25, boxplots). Figure S25: (Left) The average gene similarity of disease pairs in the genotype and phenotype layers, and the overlap network (red squares) against random expectation (boxplots) demonstrates the combined effect of the two sources of information (gene and symptom) on the multiplex construct of the edge overlap layer. (Right) Zoom-in detail of random expectation boxplots. Boxplots represent minimum (Q1 -1.5 IQR), first quartile (Q1), median, third quartile (Q3), and maximum (Q3 + 1.5 IQR)