Article | Open

Disentangling genetic and environmental risk factors for individual diseases from multiplex comorbidity networks

  • Scientific Reports 6, Article number: 39658 (2016)
  • doi:10.1038/srep39658
  • Download Citation
Published online:


Most disorders are caused by a combination of multiple genetic and/or environmental factors. If two diseases are caused by the same molecular mechanism, they tend to co-occur in patients. Here we provide a quantitative method to disentangle how much genetic or environmental risk factors contribute to the pathogenesis of 358 individual diseases, respectively. We pool data on genetic, pathway-based, and toxicogenomic disease-causing mechanisms with disease co-occurrence data obtained from almost two million patients. From this data we construct a multiplex network where nodes represent disorders that are connected by links that either represent phenotypic comorbidity of the patients or the involvement of a certain molecular mechanism. From the similarity of phenotypic and mechanism-based networks for each disorder we derive measure that allows us to quantify the relative importance of various molecular mechanisms for a given disease. We find that most diseases are dominated by genetic risk factors, while environmental influences prevail for disorders such as depressions, cancers, or dermatitis. Almost never we find that more than one type of mechanisms is involved in the pathogenesis of diseases.


Multifactorial diseases are disorders that involve multiple disease-causing mechanisms, such as genes acting in concert with environmental factors. They represent one of the most significant challenges that medical research faces today1. Disease-causing mechanisms may be (and typically are) involved in more than one disorder2. If two diseases are related to the same mechanism (say, a single point mutation, SNP, or an altered metabolic pathway), they have a tendency to co-occur in the same patients3,4. Here we develop a novel network-medicine approach to quantify the relative contributions of genetic and environmental risk factors for diseases. The central idea of the approach is illustrated in Fig. 1. We consider three diseases i, j, k (circles) and assume that diseases i and j co-occur very frequently in patients (thick line), whereas diseases i and k rarely coincide within patients (thin line). Assume further that i can arise through two different disease-causing mechanisms, A and B, where mechanism A is also responsible for (or involved in) disease k and mechanism B for disease j. Obviously, mechanism B explains the observed disease phenotype i (the frequent co-occurrence with disease j) much better than mechanism A and is therefore a more probable causes for disease i. Using this idea we are able to identify the most likely causes and are able to disentangle genetic and environmental disease-causing mechanisms for individual disease phenotypes.

Figure 1: Consider three diseases i, j, k (blue circles) and assume that disease i co-occurs very frequently with j (thick line) but only in rare cases with k (thin line).
Figure 1

Further, assume that there are two different disease-causing mechanisms for i, A and B, where mechanism A (B) is also known to be involved in disease k (j). Since i is very often observed together with j, but not with k, mechanism B explains the disease phenotype i much better than A.

Here we construct a multiplex comorbidity network that combines phenotypic comorbidity networks with those given by different types of shared disease-causing mechanisms (genes, pathways, or exposure to chemicals), the human disease multiplex network (HDMN) (see Fig. 2). Multiplex networks are given by a set of nodes connected by multiple sets of links5,6. One set of links in the HDMN corresponds to phenotypic comorbidity relations, whereas the other sets of links represent different classes of genetic or environmental mechanisms. We quantify how similar the phenotypic links of a particular disease are to its links in other layers in the HDMN. This allows us to derive scores for each disease of how well its phenotypic comorbidities can be explained by genetic, pathway-based, or toxicogenomic mechanisms. In this sense the derived scores quantify “how genetic” or how strong environmental influences are for a given disease.

Figure 2: Illustration of the HDMN for a disease i.
Figure 2

In the HDMN, nodes correspond to disease phenotypes that are connected by four different types of links which can be visualized as network layers. The first layer, , encodes phenotypic comorbidity relations. The link-weights in this layer are given by the comorbidity strengths ϕij that measure how often two diseases i and j co-occur within the same patients, i.e. the numbers of patients with either disease i (red individuals) or j (blue) are compared to the numbers of patients with both diseases (green). The second layer, , contains genetic comorbidities (blue links) where two different phenotypes (illustrated as blue and red individuals) are related to the same genetic defect or alteration. The third type of links are pathway-based comorbidities (green links), layer . Here, two different alterations occur in a pathway that is involved in two or more different diseases. Finally, the fourth layer, , is given by toxicogenomic comorbidities (red links), where a chemical substance is known to trigger different disease-causing mechanisms. Disorder i is shown as a red node in the HDMN, together with other phenotypes (blue nodes) that are in i’s neighborhood in at least one of the layers. The relative comorbidity risks measure to which extent shared disease-causing mechanisms between two diseases lead to their phenotypic comorbidity. is the average comorbidity strength of all neighbors of i in layer α, normalized to the average comorbidity strength over all phenotypes that share no disease-causing mechanism of any type with i. In the above example the greatest similarity to the phenotype network ϕij has obviously the genetic one, , and disease i is most likely of genetic origin.

The construction and analysis of networks of diseases that are connected by different comorbidity relations has recently lead to substantial progress in our understanding of the etiologies of various diseases2,7,8. For instance, gene-disease associations collected in the Online Mendelian Inheritance in Man (OMIM) database9 can be used to construct a network where diseases are linked if they are related to the same mutations in one or several genes10. This network allowed for the identification of clusters of diseases, such as cancers, which are held together by a small number of genes11. Another approach is to connect diseases if they are both associated with enzymes that catalyze reactions in the same pathway4. Protein-protein interaction data can be integrated with toxicogenomics data to construct a network where two diseases are linked if they are both caused by exposure to the same chemical, which has led to the successful identification of novel chemical-protein associations12. It has recently been shown that diseases that are comorbid in the population tend to be related with clusters of proteins that are close to each other in the human protein-protein interaction network13. Different types of genomic, metabolomic, and proteomic disease-disease relations have also been combined to form an “integrated disease network”14,15. In phenotypic comorbidity networks, nodes correspond to disease phenotypes that are linked if the two diseases tend to co-occur in the same patients16. Chronic, multifactorial disorders often assume the role of hubs in such networks (i.e. nodes that are strongly connected with a large number of other diseases)17.

We consider the three most important classes of disease-causing mechanisms. (i) Genetic mechanisms relate a disease to a specific defect or alteration in the genome. If one such defect is related to two or more pathologies, then those diseases share a genetic comorbidity. For example, it was shown that the phenotypic comorbidity between schizophrenia and Parkinson’s disease is almost entirely accounted for by SNPs in loci near NT5C2 and HLA-DRA18. (ii) Pathway-based mechanisms are given by a defective pathway (e.g. metabolic or signal transduction pathway) that is involved in the etiology of the disease. Pathway-based comorbidities indicate that two diseases are related to different defects in the same pathway. For instance, it is known that the Pi3K/AKT pathway up-regulates anti-inflammatory cytokines and inhibits proinflammatory cytokines such as IL-1b, IL-6, TNF-α, and IFN-γ that show increased levels in patients with major depressive disorder19. Also, inactivation of the Pi3K/AKT pathway through the suppression of insulin receptor substrates (IRS) may act as the underlying mechanism for the metabolic syndrome (i.e. the frequent concurrence of metabolic disorders such as hypertension, obesity, or diabetes)20. Indeed, depression has been identified as an important comorbidity of the metabolic syndrome in various cross-sectional surveys21,22. Finally, (iii) toxicogenomic mechanisms characterize diseases caused by exposure to chemical substances that change the activity of certain genes. Two diseases share a toxicogenomic comorbidity if each of them is related to a gene that interact with the same chemical. For example, the immunosuppressive chemical methoxychlor is used as pesticide and can cause atopic dermatitis, possibly by expressing IL-13 in the skin23. Methoxychlor also promotes the epigenetic transgenerational inheritance of kidney disease. Upon prenatal exposure to methoxychlor during fetal gonadal development, offspring show increased incidence of adult-onset kidney disease that was related to differentially DNA methylated regions24. Kidney disease and atopic dermatitis are therefore, both, related to methoxychlor and connected in the toxicogenomic comorbidity network. Atopic dermatitis is indeed associated with the nephritic syndrome25.

Data and Methods


Phenotypic disease-disease associations were obtained from a database of the Main Association of Austrian Social Security Institutions that contains pseudonymized claims data of all persons receiving inpatient care in Austria between January 1st, 2006 and December 31st, 200717,26. The data contains age, sex, main- and side-diagnoses (ICD10 codes)27 for each hospital stay from N = 1, 862, 258 patients. Not all ICD codes represent disorders, they may also indicate general examinations, injuries, collections of unspecific symptoms or disorders that are not classified elsewhere. Unspecific codes are excluded and we work with the remaining 1,252 diagnoses on the three-digit ICD levels in chapters (i.e. first-digit-levels) A-Q, labeled by the capital index I. We use the words disease, disorder and diagnosis interchangeably whenever referring to an ICD entry.

Molecular disease-disease associations were obtained from molecular data of three types, namely purely genetic associations and two different types of environmental associations. (i) Genetic disease associations were extracted from the OMIM dataset9, which provides a collection of gene-phenotype relationships. It contains for instance currently more than 30 genes that are known to play a role in type 2 diabetes, e.g. the aforementioned IRS 2 gene. (ii) Pathway-based disease associations we took from the UniProtKB database28,29. The UniProtKB database contains protein sequence and functional information that is cross-referenced with pathways in which the proteins play a role and the protein’s involvement in diseases. For instance, an UniProt entry for the PI3-kinase protein cross-references about 40 different pathways, including the PI3K/AKT activation pathway, in addition to three different disease phenotypes from the OMIM dataset. (iii) Toxicogenomic disease associations were obtained from the Comparative Toxicogenomic Database (CTD)30. Entries in the CTD correspond to chemicals that are linked to diseases caused by exposure to the substance and with disease genes that are differentially expressed under exposure to it. For instance, according to this data the chemical methoxychlor is involved in more than ten different diseases, including atopic dermatitis where its influence is mediated by eight different genes, including IL-13. To link the molecular to the phenotypic data, a mapping between ICD10 and OMIM disease identifiers had to be established. To obtain such mappings we compiled three different data sources, namely the Human Disease Ontology database31, OrphaNet32, and Wikipedia33. Note that from these definitions it follows that we only focus on disorders that have a heritable component. For more information on data extraction and the construction of the ICD10-OMIM mappings see supporting information, Text S1. Each of the three molecular datasets can be represented by a bipartite network , where α labels the classes of mechanisms, i.e. genetic (α = G), pathway-based (α = P), or toxicogenomic (α = T), index i labels disorders (ICD10 codes) and j labels unique genes (if α = G), pathways (if α = P), or chemicals (if α = T). We set , if there exists is at least one relation between disease i and gene/pathway/chemical j, , otherwise.

Heritability and drug approvals

Information on the broad-sense heritability (see supporting information, Text S2) of individual diseases i, , was taken from the SNPedia database34. As a source for drug approvals we used the Drugs@FDA database35 from which we obtained FDA-approved brand names and approval dates for all drug products approved since 1939. These drugs were mapped via known molecular targets to diseases36 to obtain the number of newly approved drug products of the last twenty years for the specific disease i, Di.

Construction of the HDMN

We constructed a multiplex network that encodes disease-disease associations of four different types, the HDMN, . This network contains one phenotypic layer, α = ϕ, and three layers that encode molecular disease-disease associations, α {G, P, T}. The layer of phenotypic disease associations, , is given by the contingency coefficient, ϕij, between diseases i and j: Here Ni is the number of patients with disease i and N is the total number of patients. For each pair of diseases (i, j) we counted the number of patients that have both diseases (Nij), only disease i or j ( or , respectively), or neither disease (). Here, the bar denotes “not”. Entries in the phenotypic disease network, , are then given by the contingency coefficient,

Values of ϕij are within the range [−1, +1] and measure the phenotypic comorbidity strength between diseases i and j. The higher (lower) ϕij, the higher (lower) the probability that a patient with disease i also suffers disease j. ϕij = 0 indicates that occurrences of i and j are independent from each other. We set , whenever the patient numbers are too low to allow for a reliable estimate of ϕij, i.e. whenever one of the possible outcomes for Nij, , , or was below 5. An age-dependent version of the phenotypic disease network for a given age interval t is denoted by . Patients fall within one of 11 age groups, 0y–7y, 8y–15y, …, 80y–87y.

The layers 2, 3, and 4 of the HDMN encode three different types of molecular associations, α {G, P, T}. Each of these layers, , is obtained from the bipartite network as follows,

Note that this definition ensures that associations between pathologies i and j in the pathway, , and toxicogenomic, , layers are indeed due to shared pathways or exposure to the same chemical that can not be explained by direct genetic causes (i.e. ). It is therefore guaranteed that comorbidity relations in the toxicogenomic or pathway-based layers are due to gene-by-environment interactions.

The numbers of non-isolated nodes, Nα, and links, Lα, for each layer α are shown in the SI, Table S1. Diseases are not included in the HDMN if they are isolated in every molecular layer α = G, P, or T. This constraint reduces the number of nodes in the phenotypic network from about 900–1000 (depending on patient age) to 358 disorders. Links in the phenotypic layer Mϕ are weighted and typically close to zero16,17. Numbers for Nα are between 200 and 300 for the molecular layers. Note that disease codes in the (phenotypic) ICD10 classification are typically coarser than, for instance, the OMIM disease phenotype classification that has about 1,800 entries. Many of these OMIM codes, however, map to the same ICD10 entry which leads to the substantial reduction of nodes in the molecular layers, see the supporting information, Text S1.

Disease risks from shared pathophysiological mechanisms

We introduce a relative risk indicator that measures how similar the phenotypic comorbidities of disease i are to its genetic, pathway-based, or toxicogenomic comorbidities. In this sense quantifies how much a specific class of disease-causing mechanisms contributes to the phenotype i. is the quotient of the average comorbidity strengths, , of all diseases that are linked to i in layer , and the comorbidity strengths of those diseases that are linked to i in none of the pathophysiological layers, i.e.,

Here is the degree of disease i in layer α given by and is a control set of links for disease i that contains all links j, i ≠ j, for which .

Let us illustrate the relative risk indicator proposed in equation 3 by considering disease i in the example shown in Fig. 1. In this case we have two different pathophysiological processes that are represented in layers, A and B. Each layer contains the respective mechanism only; for the degrees of i follows . We further assume that there are two other diseases, m and n (not shown in Fig. 1), that are not connected to i by any mechanism. These diseases are therefore contained in the control set for disease i, i.e. and . For the layer with disease mechanism A we get the relative risk . Similarly, for disease mechanism B we have . Since we assumed that ϕik > ϕij it follows that and therefore mechanism A explains the observed comorbidities of disease i better than B. The relative risk indicators in equation 3 also covers cases with more than one mechanism in a given layer, i.e. , as it will be typically the case for the pathopyhsiological layers considered in this work. Further, since re-scales the observed comorbidity strengths for a given layer by the typical strengths observed in the control set for disease i, it is meaningful to compare these indicator values for different diseases, even in the presence of statistical biases that in ϕij that may occur when very rare and frequent diseases are compared16,17.

For convenience we also defined the logarithmic relative comorbidity risk, . A value of close to zero indicates that the presence of pathophysiological comorbidities of type α have no relation whatsoever to the actual, phenotypic comorbidities of i. With increasingly positive values of , the probability increases that the pathophysiological comorbidities of i are indeed observed in the population.

Note that the relative comorbidity risk can be large due to a single comorbidity j of type α with a very high phenotypic comorbidity strength , or because there are a large number of comorbidities with only moderately increased comorbidity strengths. In particular, might favor diseases that have a large number of connections of type α to diseases that are physiologically very similar and that have similar ICD10 diagnosis codes, see Text S1. To adjust for these biases we rescaled by the node degree to obtain a measure that favors diseases with a smaller number of highly relevant disease-causing mechanisms. The re-scaled comorbidity risk, , is given by .

We performed two different statistical tests to evaluate whether is significantly greater than zero. First, a Wilcoxon rank sum test for equal medians of two samples was performed. The samples were given by the set of comorbidity strengths of all diseases j that share a link of type α with i, , and the set . The p-value for , , was obtained from the one-sided Wilcoxon rank sum test against the alternative hypothesis that the median of S1 is smaller than the median of S2. A Benjamini-Hochberg multiple hypothesis testing correction was applied on each layer using an exploratory threshold for the false discovery rate of α = 0.25 (which corresponds to thresholds for the adjusted p-values in the range between 0.1 and 0.05). Second, we performed a randomization test for where we replace by a random permutation of its elements, denoted by . The randomized was computed from equation 3 where was replaced by . For a given α, has the same number of nodes and links as , but is otherwise completely randomized.

Results and Discussion

The estimates of the most probable disease causes can be visualized in a three-dimensional representation where the axes show the genetic (G), pathway-based (P), and toxicogenomic comorbidity risks (T). Each disease corresponds to a point with coordinates , see Fig. 3(a) and its projections onto the (b) G − P, (c) G − T, and (d) P − T planes. The size of each marker is proportional to the frequency Ni/N of disease i. For this visualization we do not include diseases that are only present in one of the molecular layers, G, P, or T. For the remaining 254 pathologies we set for all diseases where is not significantly different from zero after the multiple hypothesis testing correction. The majority of disorders are clearly dominated by genetic risk factors (many points are close to the G-axis). Some disorders cluster around the P and T axes indicating purely pathway-based and toxicogenomic origins. Intriguingly, there is precisely no disease that has a significant pathway-based and toxicogenomic comorbidity risk at the same time, see Fig. 3(d). However, a number of disorders with significant pathway-based or toxicogenomic risks have also significant genetic contributions, see Fig. 3(b) and (c). This can also be seen in Table 1, where for instance the chronic nephritic syndrome ranks high in genetic and toxicogenomic comorbidity risks.

Figure 3: Classification of diseases (circles) according to the dominant causes of their phenotypic comorbidities.
Figure 3

Results are shown for (ad) the relative comorbidity risks and (eh) their re-scaled versions, . Circle size is proportional to the number of disease occurrences. Re-scaling the risks by the degrees leads to almost perfect clustering of the diseases around one of the axes. The per-link contribution to the relative comorbidity risk is always dominated by one specific mechanism. Only a comparably small number of diseases cluster around the toxicogenomic axis. The comorbidity risks for most pathologies are dominated by genetic disease-causing mechanisms.

Table 1: Top 10 diseases in every class of disease-causing mechanisms, α, and their relative comorbidity risks , ranked by the significance of its overlap with the phenotypic disease layer, .

The per-link contributions, , of three types of pathophysiological mechanisms are shown in Fig. 3(e)–(h). Almost all disorders show one dominant comorbidity risk contribution, i.e. they cluster around a single axis. As we have excluded here all diseases for which only one type of data exists, this clustering can not be trivially explained by incomplete or missing data. Our results are particularly relevant for “complex diseases” where we focus on disorders that have not only a genetic component as defined by OMIM9, but also pathway-based and/or toxicogenomic contributions. It can be shown that the observation that disorders cluster around a single axis in Fig. 3(d) also holds for the 120 diseases that are present in each of the layers. Again, most diseases show large genetic risks, while some cluster around the P and T axes. In the supporting information, SI Fig. 1, we show results for where we allow comorbidities that are at the same time genetic and pathway-based/toxicogenomic (i.e. we drop the second condition for in equation 2). We also include diseases that are only present in one of the molecular layers and therefore fall by construction on one of the axis. There are now disorders with, both, significant pathway-based and toxicogenomic comorbidity risks. For these comorbidities, however, there exists also a direct genetic mechanism that may account for the phenotypic comorbidities.

Table 1 shows the diseases with the largest genetic, pathway-based, or toxicogenomic comorbidity risks, ranked by statistical significance. The top genetic diseases include schizo-affective and delusional disorders, as well as schizophrenia. Different forms of osteoarthritis and chronic bronchitis, as well as nephrotic and nephritic syndromes also show high genetic comorbidity risks. The top pathway-based diseases are major depressive disorders, endocrine disorders such as obesity and amyloidosis, diseases of the nervous systems including epilepsy and extrapyramidal and movement disorders, as well as disorders of bone density and multiple myeloma. The top toxicogenomic diseases include various forms of dermatitis and other skin diseases such as lichen simplex chronicus and prurigo, but also aortic aneurysms, and the chronic nephritic syndrome.

Schizophrenia is indeed a highly heritable disorder that is associated with more than hundred gene loci37. The large pathway-based risk for depressions is corroborated by strong and supposedly bi-directional associations between the metabolic syndrome and depression, which have been a long-standing puzzle in epidemiological studies38. Depressions also exhibit strongly significant genetic comorbidity risks (, ) in consistency with the finding of a gene-by-environment interaction where individuals with a functional polymorphism in the promoter region of the serotonin transporter (5-HT T) gene exhibited more depressive symptoms in relation to stressful life events39. The high toxicogenomic risks for aortic aneurysms are in line with the effects of chemicals such as nicotine and prostaglandin on related disease-genes40. In summary, for most of the top ranking diseases for each layer there are indeed known and highly relevant pathobiological mechanisms of the given type, which validates our approach.

We next answer the question if there is a relation between pairs of diseases that tend to be mutually exclusive in individual patients, i.e. ϕij < 0, and the pathophysiological layers in the HDMN. To do so one can define an “anti-comorbidity” network, η, as ηij = −ϕij iff ϕij < 0 and ηij = 0 otherwise. The relative comorbidity risks that are obtained using η instead of ϕ, , are not significantly different from zero in all but two cases (pathway-based risk for D86, sarcoidosis, and , and the toxicogenomic risk for G30, Alzheimer’s disease, and ). An overlap between anti-correlations of diseases and shared mechanisms is therefore not a significant feature of the data for the vast majority of disorders.

Since phenotypic disease networks are known to undergo large changes in their topology as a function of the age of the underlying patient cohorts17, we first clarified how the relative comorbidity risks depend on patient age. The age-dependent relative risks, , were computed using equation 3 and by replacing with its age-dependent counterpart, . Results for the average relative comorbidity risks over all diseases i, denoted by , are shown in Fig. 4(a). Note that this average is also taken over diseases with comorbidity risks that are not significantly different from zero. The genetic comorbidity risk averaged over all diseases i, , is substantially higher than the pathway-based or toxicogenomic risks and assumes values above 1 for ages between 30 and 90. Effects are considerably smaller for the average pathway-based (toxicogenomic) comorbidity risks that reach values around 0.5 at ages around 30 (50). These age differences in the peaks of the environmental comorbidity risks are driven by the age-dependence in the prevalences of the diseases that provide the most dominant contributions to . In all cases, results for clearly exceed the expectation values from the randomized risks , obtained from . Note that we have confirmed that the dominance of genetic disorders can not be a simple consequence of the exclusion of genetic comorbidities in the other molecular layers in equation 2. Removing this constraint would increase the average environmental contributions by a factor of about 1.5, while the genetic comorbidity risks exceed them by a factor between four and five. From now on we consider only the time-independent HDMN.

Figure 4: Contributions of genetic, pathway-based, and toxicogenomic comorbidity risks.
Figure 4

(a) The genetic risks, , clearly exceed the pathway-based, , and toxicogenomic, , risks across all ages of patients. The results for all three types of mechanisms exceed their expectations from the randomization test (markers connected by dotted lines, error bars show the standard deviation over 5,000 randomizations). (b) Averages of the relative risks are shown for the chapters of the ICD10 classification, the solid vertical lines show the values of genetic (blue), pathway-based (green) and toxicogenomic (red) risks averaged over all diseases. Diseases of the digestive system, mental disorders, and infections show the highest genetically caused comorbidity risk, whereas cancers, diseases of the skin, eye, and ear show the lowest genetic risks. Pathway-based contributions are also highest for mental disorders and toxicogenomic contributions assume their maximum for diseases of the genitourinary system.

Figure 4(b) shows how much genetic, pathway-based, and toxicogenomic risks contribute to the observed comorbidities for subgroups of diseases that are given by the chapters of the ICD10 classification, the disease groups I. Clear differences between groups of diseases are revealed. Genetically caused comorbidities include mental disorders, disorders of the digestive system, but also susceptibility to infections. Genetic mechanisms are least relevant for disorders of the eye, ear, skin, and for cancers. Pathway-based comorbidity risks are largest for, again, mental disorders and diseases of the genitourinary system. This shows that the group of mental disorders comprises heterogeneous phenotypes that have either genetically caused or pathway-based comorbidities. Toxicogenomic comorbidity risks are largest for diseases of the skin, the genitourinary and the respiratory system, as well as for congenital malformations.

The “nurture index”, Ii, quantifies to which extent comorbidities of phenotype i are caused by environmental, i.e. pathway-based or toxicogenomic, mechanisms,

Figure 5 shows results for (a) the heritability and (b) the number of new drug approvals Di as a function of Ii. Each circle in Fig. 5 corresponds to a disease phenotype, labeled by its ICD10 code. The colors of the circles refer to their chapter in the ICD classification. The highest values of Ii are found for diseases of the genitourinary system (N03 and N05 nephritic syndrome, N02 hematuria, N08 glomerular disorders), depressions (F32, F33), several cancers (C84 T/NK-cell lymphoma, C74 adrenal gland, C61 prostate), as well as bronchiectasis (J47). Figure 5(a) shows that there is a significant negative correlation between the nurture index, Ii, and the broad-sense heritability, , of disorder i. This corroborates that Ii is indeed related to the plasticity of phenotype i, i.e. Ii increases with the influence of environmental risk factors. There is also a strong significant negative correlation between the logarithms of Ii and Di shown in Fig. 5(b). We found this result to be very robust for a large variety of choices of this time span, ranging from five years upwards. Note that and show no significant correlation among them (ρ = 0.19, p = 0.17). This indicates a significant bias in pharmaceutical R&D that favors market placements of drugs that target disorders with low environmental risk factors. It has indeed been shown that the success rates for drug development vary dramatically among disease areas41. These rates have been found to increase with the existence of direct genetic evidence, which in particular applies to diseases of the musculoskeletal system and infections, which we also identified as predominantly genetic in Fig. 3(b).

Figure 5
Figure 5

Heritability , (a) and the number of newly developed drugs Di (b) are negatively correlated with the relevance of environmental risk factors for diseases. Each circle corresponds to one disease phenotype, labeled by its three-digit ICD10 code. Both, and Di are shown as a function of the nurture index. Colors indicate the main ICD chapter to which the diseases belong. We observe particularly high Ii values for diseases of the genitourinary system, various cancers, depression, and bronchiectasis.


We developed a novel approach to quantitatively disentangle the most relevant genetic or environmental disease-causing mechanisms for a large number of particular disorders. This has become possible through recent advances in observing networks of phenotypic comorbidity relations with unprecedented precision16,17. We considered three different classes of mechanisms that can be at the core of these observed comorbidities, namely genetic, pathway-based, and toxicogenomic mechanisms that cause more than one disorder. By constructing the HDMN we have been able to identify the most probable causes for 358 different phenotypes by measuring the overlap between phenotypic and pathophysiological comorbidities, the relative comorbidity risks . We find that the different environmental disease-causing mechanisms do not mix; we found no pathologies that have significant pathway-based and toxicogenomic comorbidity risk contributions at the same time. By considering only diseases for which at least two different types of molecular comorbidities are known, we can rule out that this result is due to missing data. While for most of the studied diseases genetic risk factors dominate, we identify a number of disorders with significant environmental contributions which typically coincides with low heritability and lower rates of successful market placements of drugs.

Our approach cross-validates pathophysiological mechanisms by whether their predicted comorbidities are indeed directly observed in the population. Moreover we can rule out certain types of disease-causing mechanisms when the comorbidities that they predict are not observed. The methodology developed here can be extended to decide on a quantitative basis if the comorbidities predicted by a particular individual pathophysiological mechanism are also phenotypically relevant. The new technology can be used as a novel and data-driven way to validate potential drug targets.

Additional Information

How to cite this article: Klimek, P. et al. Disentangling genetic and environmental risk factors for individual diseases from multiplex comorbidity networks. Sci. Rep. 6, 39658; doi: 10.1038/srep39658 (2016).

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


  1. 1.

    et al. A comparative risk assessment of burden of disease and injury attributable to 67 risk factors and risk factor clusters in 21 regions, 1990–2010: a systematic analysis for the Global Burden of Disease Study 2010. The Lancet 380(9859), 2224–60 (2012).

  2. 2.

    , & Network medicine: A network-based approach to human disease. Nat Rev Genet 12(1), 56–68 (2011).

  3. 3.

    , , & Probing genetic overlap among complex human phenotypes PNAS 104, 11694–9 (2007).

  4. 4.

    et al. The implications of human metabolic network topology for disease comorbidity. PNAS 105, 9880–5 (2008).

  5. 5.

    et al. The structure and dynamics of multilayer networks. Physics Reports 544, 1–122 (2014).

  6. 6.

    et al. Multilayer networks. Journal of Complex Networks 3(2), 203–271 (2014).

  7. 7.

    & Network medicine. FEBS Lett 582, 1266–70 (2008).

  8. 8.

    , & A network medicine approach to human disease. FEBS Lett 583, 1759–65 (2009).

  9. 9.

    McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University, Online Mendelian Inheritance in Man, OMIM, , (Date of access: 30/04/2015).

  10. 10.

    et al. The human disease network. PNAS 104, 8685–90 (2007).

  11. 11.

    , & Network properties of genes harboring inherited disease mutations. PNAS 105, 4323–8 (2008).

  12. 12.

    et al. Deciphering diseases and biological targets for environmental chemicals using toxicogenomics networks. PLoS Comput Biol 6(5), e1000788 (2010).

  13. 13.

    et al. Uncovering disease-disease relationships through the incomplete interactome. Science 347(6224), 1257601-1-8 (2015).

  14. 14.

    , , & The integrated disease network. Integr. Biol. 6, 1069–79 (2014).

  15. 15.

    , , & Predicting disease associations via biological network analysis. BMC Bioinformatics 15, 304–316 (2014).

  16. 16.

    , , & A dynamic network approach for the study of human phenotypes. PLoS Comput. Biol. 5, 1–11 (2009).

  17. 17.

    , & Spreading of diseases through comorbidity networks across life and gender. New Journal of Physics 16, 115013 (2014).

  18. 18.

    et al. Genetic comorbidities in Parkinson’s disease. Hum. Mol. Genet. 23(3), 831–41 (2014).

  19. 19.

    , , & Roles of PI3K/AKT/mTOR pathway in cell signaling of mental illnesses. Depression Research and Treatment 2012, 752563 (2012).

  20. 20.

    Insulin signaling, resistance, and metabolic syndrome: insights from mouse models into disease mechanisms. J Endocrinol. 22, T1–23 (2014).

  21. 21.

    et al. Depression: an important comorbidity with metabolic syndrome in a general population. Diabetes Care 31(12), 2368–73 (2009).

  22. 22.

    , , , & Quantification of diabetes comorbidity risks across life using nation-wide big claims data. PLoS Comput. Biol. 11(4), e1004125 (2015).

  23. 23.

    , , , & The role of TSLP in IL-13-induced atopic march. Sci Rep 1, 23 (2011).

  24. 24.

    , , , & , Pesticide methoxychlor promotes the epigenetic transgenerational inheritance of adult-onset disease through the female germline. PLoS ONE 9(7), e102091 (2014).

  25. 25.

    , , & Atopic dermatitis as a systemic disease. Clinics in dermatology 32(3), 409–13 (2014).

  26. 26.

    et al. Quantification of excess-risk for diabetes when born in times of hunger, in an entire popuation of a nation, across a century. PNAS 110(12), 4703–7 (2013).

  27. 27.

    WHO, ICD-10 Version: 2010, , (Date of access: 18/01/2016).

  28. 28.

    The UniProt Consortium, Activities at the Universal Protein Resource. Nucleic Acids Research 42, D191–8 (2014).

  29. 29.

    et al. The reactome pathway knowledgebase. Nucleic Acids Research 42, D472–7 (2014).

  30. 30.

    et al. The comparative toxicogenomics database’s 10th year anniversary: update 2015. Nucleic Acids Research, D914–20 (2014).

  31. 31.

    et al. Annotating the human genome with disease ontology. BMC Genomics 10 (Suppl 1), S6 (2009).

  32. 32.

    & Networking for rare diseases: a necessity for Europe. Bundesgesundheitsblatt Gesundheitsforschung Gesundheitsschutz 50(12), 1477–83 (2007).

  33. 33.

    Wikipedia, ICD-10, , (Date of access: 30/04/2015).

  34. 34.

    & SNPedia: a wiki supporting personal genome annotation, interpretation and analysis. Nucleic Acids Res 40, D13008–12 (2012).

  35. 35.

    , 2016 (Date of access: 07/01/2016).

  36. 36.

    , , , & Drug-target network. Nat. Biotechnol. 25(10), 1119–26 (2007).

  37. 37.

    et al. Biological insights form 108 schizophrenia-associated genetic loci. Nature 511(7510), 421–7 (2014).

  38. 38.

    et al. Bidirectional association between depression and metabolic syndrome. Diabetes Care 35(5), 1171–80 (2012).

  39. 39.

    et al. Influence of life stress on depression: moderation by a polymporphism in the 5-HTT gene. Science 301(5631), 386–9 (2003).

  40. 40.

    , & Abdominal aortic aneurysm. The Lancet 365(9470), 1577–89 (2005).

  41. 41.

    et al. The support of human genetic evidence for approved drug indications. Nature Genetics 47, 856–60 (2015).

Download references


We are very grateful to Jörg Menche for stimulating discussions and acknowledge financial support from the European Commission, FP7 project MULTIPLEX No. 317532.

Author information


  1. Section for Science of Complex Systems, CeMSIIS, Medical University of Vienna, Spitalgasse 23, A-1090, Austria

    • Peter Klimek
    • , Silke Aichberger
    •  & Stefan Thurner
  2. Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, USA

    • Stefan Thurner
  3. IIASA, Schlossplatz 1, A 2361 Laxenburg, Austria

    • Stefan Thurner


  1. Search for Peter Klimek in:

  2. Search for Silke Aichberger in:

  3. Search for Stefan Thurner in:


P.K. and S.T. conceived the paper, P.K. and S.A. researched the data, P.K. and S.T. wrote the manuscript. All authors reviewed the manuscript.

Competing interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to Stefan Thurner.

Supplementary information


By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.

Creative Commons BYThis work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit