Disentangling genetic and environmental risk factors for individual diseases from multiplex comorbidity networks

Klimek, Peter; Aichberger, Silke; Thurner, Stefan

doi:10.1038/srep39658

Article
Open access
Published: 23 December 2016

Peter Klimek¹,
Silke Aichberger¹ &
Stefan Thurner^1,2,3

Scientific Reports volume 6, Article number: 39658 (2016) Cite this article

Abstract

Most disorders are caused by a combination of multiple genetic and/or environmental factors. If two diseases are caused by the same molecular mechanism, they tend to co-occur in patients. Here we provide a quantitative method to disentangle how much genetic and environmental risk factors each contribute to the pathogenesis of 358 individual diseases. We pool data on genetic, pathway-based, and toxicogenomic disease-causing mechanisms with disease co-occurrence data obtained from almost two million patients. From these data we construct a multiplex network in which nodes represent disorders and links represent either phenotypic comorbidity in patients or the involvement of a shared molecular mechanism. From the similarity of the phenotypic and mechanism-based networks for each disorder we derive a measure that allows us to quantify the relative importance of various molecular mechanisms for a given disease. We find that most diseases are dominated by genetic risk factors, while environmental influences prevail for disorders such as depressions, cancers, or dermatitis. We almost never find that more than one type of mechanism is involved in the pathogenesis of a disease.

Introduction

Multifactorial diseases are disorders that involve multiple disease-causing mechanisms, such as genes acting in concert with environmental factors. They represent one of the most significant challenges that medical research faces today¹. Disease-causing mechanisms may be (and typically are) involved in more than one disorder². If two diseases are related to the same mechanism (say, a single point mutation, SNP, or an altered metabolic pathway), they have a tendency to co-occur in the same patients^3,4. Here we develop a novel network-medicine approach to quantify the relative contributions of genetic and environmental risk factors for diseases. The central idea of the approach is illustrated in Fig. 1. We consider three diseases i, j, k (circles) and assume that diseases i and j co-occur very frequently in patients (thick line), whereas diseases i and k rarely coincide within patients (thin line). Assume further that i can arise through two different disease-causing mechanisms, A and B, where mechanism A is also responsible for (or involved in) disease k and mechanism B for disease j. Mechanism B then explains the observed disease phenotype i (the frequent co-occurrence with disease j) much better than mechanism A and is therefore a more probable cause of disease i. Using this idea we can identify the most likely causes and disentangle genetic and environmental disease-causing mechanisms for individual disease phenotypes.

Here we construct the human disease multiplex network (HDMN), a multiplex comorbidity network that combines phenotypic comorbidity networks with those given by different types of shared disease-causing mechanisms (genes, pathways, or exposure to chemicals); see Fig. 2. Multiplex networks are given by a set of nodes connected by multiple sets of links^5,6. One set of links in the HDMN corresponds to phenotypic comorbidity relations, whereas the other sets of links represent different classes of genetic or environmental mechanisms. We quantify how similar the phenotypic links of a particular disease are to its links in the other layers of the HDMN. This allows us to derive scores for each disease of how well its phenotypic comorbidities can be explained by genetic, pathway-based, or toxicogenomic mechanisms. In this sense the derived scores quantify “how genetic” a given disease is, or how strong its environmental influences are.

The construction and analysis of networks of diseases that are connected by different comorbidity relations has recently led to substantial progress in our understanding of the etiologies of various diseases^2,7,8. For instance, gene-disease associations collected in the Online Mendelian Inheritance in Man (OMIM) database⁹ can be used to construct a network where diseases are linked if they are related to the same mutations in one or several genes¹⁰. This network allowed for the identification of clusters of diseases, such as cancers, which are held together by a small number of genes¹¹. Another approach is to connect diseases if they are both associated with enzymes that catalyze reactions in the same pathway⁴. Protein-protein interaction data can be integrated with toxicogenomics data to construct a network where two diseases are linked if they are both caused by exposure to the same chemical, which has led to the successful identification of novel chemical-protein associations¹². It has recently been shown that diseases that are comorbid in the population tend to be related to clusters of proteins that are close to each other in the human protein-protein interaction network¹³. Different types of genomic, metabolomic, and proteomic disease-disease relations have also been combined to form an “integrated disease network”^14,15. In phenotypic comorbidity networks, nodes correspond to disease phenotypes that are linked if the two diseases tend to co-occur in the same patients¹⁶. Chronic, multifactorial disorders often assume the role of hubs in such networks (i.e. nodes that are strongly connected with a large number of other diseases)¹⁷.

We consider the three most important classes of disease-causing mechanisms. (i) Genetic mechanisms relate a disease to a specific defect or alteration in the genome. If one such defect is related to two or more pathologies, then those diseases share a genetic comorbidity. For example, it was shown that the phenotypic comorbidity between schizophrenia and Parkinson’s disease is almost entirely accounted for by SNPs in loci near NT5C2 and HLA-DRA¹⁸. (ii) Pathway-based mechanisms are given by a defective pathway (e.g. a metabolic or signal transduction pathway) that is involved in the etiology of the disease. Pathway-based comorbidities indicate that two diseases are related to different defects in the same pathway. For instance, it is known that the PI3K/AKT pathway up-regulates anti-inflammatory cytokines and inhibits proinflammatory cytokines such as IL-1β, IL-6, TNF-α, and IFN-γ that show increased levels in patients with major depressive disorder¹⁹. Also, inactivation of the PI3K/AKT pathway through the suppression of insulin receptor substrates (IRS) may act as the underlying mechanism for the metabolic syndrome (i.e. the frequent concurrence of metabolic disorders such as hypertension, obesity, or diabetes)²⁰. Indeed, depression has been identified as an important comorbidity of the metabolic syndrome in various cross-sectional surveys^21,22. Finally, (iii) toxicogenomic mechanisms characterize diseases caused by exposure to chemical substances that change the activity of certain genes. Two diseases share a toxicogenomic comorbidity if each of them is related to a gene that interacts with the same chemical. For example, the immunosuppressive chemical methoxychlor is used as a pesticide and can cause atopic dermatitis, possibly through the expression of IL-13 in the skin²³. Methoxychlor also promotes the epigenetic transgenerational inheritance of kidney disease. Upon prenatal exposure to methoxychlor during fetal gonadal development, offspring show an increased incidence of adult-onset kidney disease that was related to differentially methylated DNA regions²⁴. Kidney disease and atopic dermatitis are therefore both related to methoxychlor and connected in the toxicogenomic comorbidity network. Atopic dermatitis is indeed associated with the nephritic syndrome²⁵.

Data and Methods

Data

Phenotypic disease-disease associations were obtained from a database of the Main Association of Austrian Social Security Institutions that contains pseudonymized claims data of all persons receiving inpatient care in Austria between January 1st, 2006 and December 31st, 2007^17,26. The data contain age, sex, and main and side diagnoses (ICD10 codes)²⁷ for each hospital stay of N = 1,862,258 patients. Not all ICD codes represent disorders; they may also indicate general examinations, injuries, collections of unspecific symptoms, or disorders that are not classified elsewhere. Unspecific codes are excluded and we work with the remaining 1,252 diagnoses on the three-digit ICD levels in chapters (i.e. first-digit levels) A-Q, labeled by the capital index I. We use the words disease, disorder and diagnosis interchangeably whenever referring to an ICD entry.

Molecular disease-disease associations were obtained from molecular data of three types, namely purely genetic associations and two different types of environmental associations. (i) Genetic disease associations were extracted from the OMIM dataset⁹, which provides a collection of gene-phenotype relationships. For instance, it currently contains more than 30 genes that are known to play a role in type 2 diabetes, e.g. the aforementioned IRS2 gene. (ii) Pathway-based disease associations were taken from the UniProtKB database^28,29. The UniProtKB database contains protein sequence and functional information that is cross-referenced with the pathways in which the proteins play a role and with the proteins’ involvement in diseases. For instance, a UniProt entry for the PI3-kinase protein cross-references about 40 different pathways, including the PI3K/AKT activation pathway, in addition to three different disease phenotypes from the OMIM dataset. (iii) Toxicogenomic disease associations were obtained from the Comparative Toxicogenomics Database (CTD)³⁰. Entries in the CTD correspond to chemicals that are linked to diseases caused by exposure to the substance and to disease genes that are differentially expressed under exposure to it. For instance, according to this data the chemical methoxychlor is involved in more than ten different diseases, including atopic dermatitis, where its influence is mediated by eight different genes, including IL-13. To link the molecular to the phenotypic data, a mapping between ICD10 and OMIM disease identifiers had to be established. To obtain such mappings we compiled three different data sources, namely the Human Disease Ontology database³¹, OrphaNet³², and Wikipedia³³. Note that from these definitions it follows that we only focus on disorders that have a heritable component. For more information on data extraction and the construction of the ICD10-OMIM mappings see supporting information, Text S1. Each of the three molecular datasets can be represented by a bipartite network B^α_ij, where α labels the classes of mechanisms, i.e. genetic (α = G), pathway-based (α = P), or toxicogenomic (α = T), index i labels disorders (ICD10 codes) and j labels unique genes (if α = G), pathways (if α = P), or chemicals (if α = T). We set B^α_ij = 1 if there exists at least one relation between disease i and gene/pathway/chemical j, and B^α_ij = 0 otherwise.

Heritability and drug approvals

Information on the broad-sense heritability (see supporting information, Text S2) of individual diseases i was taken from the SNPedia database³⁴. As a source for drug approvals we used the Drugs@FDA database³⁵, from which we obtained FDA-approved brand names and approval dates for all drug products approved since 1939. These drugs were mapped via known molecular targets to diseases³⁶ to obtain the number of drug products newly approved within the last twenty years for the specific disease i, D_i.

Construction of the HDMN

We constructed a multiplex network that encodes disease-disease associations of four different types, the HDMN, M^α_ij. This network contains one phenotypic layer, α = ϕ, and three layers that encode molecular disease-disease associations, α ∈ {G, P, T}. The layer of phenotypic disease associations, M^ϕ_ij, is given by the contingency coefficient, ϕ_ij, between diseases i and j. Here N_i is the number of patients with disease i and N is the total number of patients. For each pair of diseases (i, j) we counted the number of patients that have both diseases (N_ij), only disease i or j ($N_{i\bar{j}}$ or $N_{\bar{i}j}$, respectively), or neither disease ($N_{\bar{i}\bar{j}}$). Here, the bar denotes “not”. Entries in the phenotypic disease network, M^ϕ_ij, are then given by the contingency coefficient,

$$\phi_{ij} = \frac{N_{ij} N_{\bar{i}\bar{j}} - N_{i\bar{j}} N_{\bar{i}j}}{\sqrt{N_i N_j (N - N_i)(N - N_j)}}. \qquad (1)$$

Values of ϕ_ij are within the range [−1, +1] and measure the phenotypic comorbidity strength between diseases i and j. The higher (lower) ϕ_ij, the higher (lower) the probability that a patient with disease i also suffers from disease j. ϕ_ij = 0 indicates that occurrences of i and j are independent of each other. We set M^ϕ_ij = 0 whenever the patient numbers are too low to allow for a reliable estimate of ϕ_ij, i.e. whenever one of the possible outcomes N_ij, $N_{i\bar{j}}$, $N_{\bar{i}j}$, or $N_{\bar{i}\bar{j}}$ was below 5. An age-dependent version of the phenotypic disease network for a given age interval t is denoted by M^ϕ_ij(t). Patients fall within one of 11 age groups, 0y–7y, 8y–15y, …, 80y–87y.
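As a rough illustration, the contingency coefficient and the low-count rule described above can be computed from patient counts as sketched below. This assumes that the contingency coefficient is the standard phi coefficient of a 2×2 table; the function name `phi_coefficient`, its arguments, and the example counts are hypothetical and are not taken from the Austrian claims data.

```python
import math


def phi_coefficient(n_ij, n_i, n_j, n_total, min_count=5):
    """Phi coefficient between diseases i and j from patient counts.

    n_ij    -- patients diagnosed with both i and j
    n_i     -- patients diagnosed with i
    n_j     -- patients diagnosed with j
    n_total -- all patients in the cohort
    Returns 0.0 if any cell of the 2x2 table is below min_count
    (assumed reading of the "below 5" rule in the text).
    """
    n_i_only = n_i - n_ij                    # i but not j
    n_j_only = n_j - n_ij                    # j but not i
    n_neither = n_total - n_i - n_j + n_ij   # neither disease
    if min(n_ij, n_i_only, n_j_only, n_neither) < min_count:
        return 0.0
    numerator = n_ij * n_neither - n_i_only * n_j_only
    denominator = math.sqrt(n_i * n_j * (n_total - n_i) * (n_total - n_j))
    return numerator / denominator


# Hypothetical counts: 100 patients each with i and j, 50 with both.
print(phi_coefficient(n_ij=50, n_i=100, n_j=100, n_total=1000))
```

When the number of joint patients equals the value expected under independence (here N_i N_j / N = 10), the function returns zero, in line with the interpretation of ϕ_ij = 0 given in the text.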

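Layers 2, 3, and 4 of the HDMN encode three different types of molecular associations, α ∈ {G, P, T}. Each of these layers, M^α_ij, is obtained from the bipartite network B^α_ij as follows,

$$M^{G}_{ij} = \begin{cases} 1 & \text{if } \sum_k B^{G}_{ik} B^{G}_{jk} > 0, \\ 0 & \text{otherwise,} \end{cases} \qquad M^{\alpha}_{ij} = \begin{cases} 1 & \text{if } \sum_k B^{\alpha}_{ik} B^{\alpha}_{jk} > 0 \text{ and } M^{G}_{ij} = 0, \\ 0 & \text{otherwise,} \end{cases} \quad \alpha \in \{P, T\}. \qquad (2)$$

Note that this definition ensures that associations between pathologies i and j in the pathway, M^P_ij, and toxicogenomic, M^T_ij, layers are indeed due to shared pathways or exposure to the same chemical that cannot be explained by direct genetic causes (i.e. M^G_ij = 0). It is therefore guaranteed that comorbidity relations in the toxicogenomic or pathway-based layers are due to gene-by-environment interactions.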
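The construction of the molecular layers can be sketched as a projection of each bipartite network onto the diseases. The sketch below assumes that two diseases are linked in a layer if they share at least one gene, pathway, or chemical, and that pathway-based and toxicogenomic links are dropped whenever a genetic link exists, as the note above describes. The function names, the dictionary layout, and the toy identifiers are hypothetical.

```python
from itertools import combinations


def project_layer(associations):
    """Link two diseases if they share at least one mechanism.

    associations -- dict mapping a disease code to the set of
                    genes, pathways, or chemicals related to it.
    Returns a set of unordered disease pairs (frozensets).
    """
    links = set()
    for a, b in combinations(sorted(associations), 2):
        if associations[a] & associations[b]:
            links.add(frozenset((a, b)))
    return links


def build_molecular_layers(genes, pathways, chemicals):
    """Build the G, P, and T layers; P and T exclude genetic links."""
    genetic = project_layer(genes)
    return {
        "G": genetic,
        "P": project_layer(pathways) - genetic,
        "T": project_layer(chemicals) - genetic,
    }


# Toy input with made-up identifiers.
layers = build_molecular_layers(
    genes={"d1": {"g1"}, "d2": {"g1"}, "d3": {"g2"}},
    pathways={"d1": {"p1"}, "d2": {"p1"}, "d3": {"p1"}},
    chemicals={"d2": {"c1"}, "d3": {"c1"}},
)
print(layers)
```

In this toy input, d1 and d2 share both a gene and a pathway, so their link appears only in the genetic layer.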

The numbers of non-isolated nodes, N^α, and links, L^α, for each layer α are shown in the SI, Table S1. Diseases are not included in the HDMN if they are isolated in every molecular layer α = G, P, or T. This constraint reduces the number of nodes in the phenotypic network from about 900–1000 (depending on patient age) to 358 disorders. Links in the phenotypic layer M^ϕ are weighted and typically close to zero^16,17. Numbers for N^α are between 200 and 300 for the molecular layers. Note that disease codes in the (phenotypic) ICD10 classification are typically coarser than, for instance, the OMIM disease phenotype classification that has about 1,800 entries. Many of these OMIM codes, however, map to the same ICD10 entry which leads to the substantial reduction of nodes in the molecular layers, see the supporting information, Text S1.

Disease risks from shared pathophysiological mechanisms

We introduce a relative risk indicator, RR^α_i, that measures how similar the phenotypic comorbidities of disease i are to its genetic, pathway-based, or toxicogenomic comorbidities. In this sense RR^α_i quantifies how much a specific class of disease-causing mechanisms contributes to the phenotype i. RR^α_i is the quotient of the average comorbidity strengths, ϕ_ij, of all diseases that are linked to i in layer α, and the average comorbidity strengths of those diseases that are linked to i in none of the pathophysiological layers, i.e.,

$$RR^{\alpha}_{i} = \frac{\frac{1}{k^{\alpha}_{i}} \sum_{j} M^{\alpha}_{ij}\, \phi_{ij}}{\frac{1}{|C_i|} \sum_{j \in C_i} \phi_{ij}}. \qquad (3)$$

Here k^α_i is the degree of disease i in layer α, given by k^α_i = Σ_j M^α_ij, and C_i is a control set of links for disease i that contains all links j, i ≠ j, for which M^α_ij = 0 for all α ∈ {G, P, T}.

Let us illustrate the relative risk indicator proposed in equation 3 by considering disease i in the example shown in Fig. 1. In this case we have two different pathophysiological processes that are represented in two layers, A and B. Each layer contains the respective mechanism only; for the degrees of i it follows that k^A_i = k^B_i = 1. We further assume that there are two other diseases, m and n (not shown in Fig. 1), that are not connected to i by any mechanism. These diseases are therefore contained in the control set for disease i, i.e. m ∈ C_i and n ∈ C_i. For the layer with disease mechanism A, which links i to k, we get the relative risk RR^A_i = 2ϕ_ik/(ϕ_im + ϕ_in). Similarly, for disease mechanism B, which links i to j, we have RR^B_i = 2ϕ_ij/(ϕ_im + ϕ_in). Since we assumed that i co-occurs more frequently with j than with k, i.e. ϕ_ij > ϕ_ik, it follows that RR^B_i > RR^A_i and therefore mechanism B explains the observed comorbidities of disease i better than A. The relative risk indicator in equation 3 also covers cases with more than one mechanism in a given layer, i.e. k^α_i > 1, as will typically be the case for the pathophysiological layers considered in this work. Further, since RR^α_i re-scales the observed comorbidity strengths for a given layer by the typical strengths observed in the control set for disease i, it is meaningful to compare these indicator values for different diseases, even in the presence of statistical biases in ϕ_ij that may occur when very rare and very frequent diseases are compared^16,17.
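The toy example of Fig. 1 can be reproduced in a few lines of code. The sketch below assumes that equation 3 is the ratio of the mean comorbidity strength over the diseases linked to i in a given layer to the mean over the control set of diseases linked to i in no layer. The function name `relative_risk`, the nested-dictionary layout, and all numerical values are hypothetical.

```python
from statistics import mean


def relative_risk(i, layer, phi, neighbors):
    """Relative risk of disease i for one mechanism layer.

    phi       -- phi[i][j] is the comorbidity strength of i and j
    neighbors -- neighbors[layer][i] is the set of diseases linked
                 to i in that layer
    Returns None if the ratio is undefined (no links, empty control
    set, or a control mean of zero).
    """
    linked = neighbors[layer].get(i, set())
    linked_anywhere = set()
    for layer_links in neighbors.values():
        linked_anywhere |= layer_links.get(i, set())
    # Control set: diseases not linked to i in any layer.
    control = [j for j in phi[i] if j != i and j not in linked_anywhere]
    if not linked or not control:
        return None
    control_mean = mean([phi[i][j] for j in control])
    if control_mean == 0:
        return None
    return mean([phi[i][j] for j in linked]) / control_mean


# Made-up strengths: i co-occurs often with j and rarely with k.
phi = {"i": {"j": 0.30, "k": 0.02, "m": 0.01, "n": 0.03}}
neighbors = {"A": {"i": {"k"}}, "B": {"i": {"j"}}}
rr_a = relative_risk("i", "A", phi, neighbors)
rr_b = relative_risk("i", "B", phi, neighbors)
print(rr_a, rr_b, rr_b > rr_a)
```

With these values the risk for mechanism B exceeds that for mechanism A, which matches the intuition of Fig. 1 that the mechanism shared with the frequently co-occurring disease is the more probable cause.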

For convenience we also defined the logarithmic relative comorbidity risk, log RR^α_i. A value of log RR^α_i close to zero indicates that the pathophysiological comorbidities of type α bear no relation to the actual, phenotypic comorbidities of i. With increasingly positive values of log RR^α_i, the probability increases that the pathophysiological comorbidities of i are indeed observed in the population.

Note that the relative comorbidity risk RR^α_i can be large due to a single comorbidity j of type α with a very high phenotypic comorbidity strength ϕ_ij, or because there are a large number of comorbidities with only moderately increased comorbidity strengths. In particular, RR^α_i might favor diseases that have a large number of connections of type α to diseases that are physiologically very similar and that have similar ICD10 diagnosis codes, see Text S1. To adjust for these biases we rescaled the comorbidity risk by the node degree k^α_i to obtain a per-link measure that favors diseases with a smaller number of highly relevant disease-causing mechanisms.

We performed two different statistical tests to evaluate whether log RR^α_i is significantly greater than zero. First, a Wilcoxon rank sum test for equal medians of two samples was performed. The samples were given by the set of comorbidity strengths ϕ_ij of all diseases j that share a link of type α with i, and the set of comorbidity strengths of the diseases in the control set C_i. The p-value for log RR^α_i was obtained from the one-sided Wilcoxon rank sum test against the alternative hypothesis that the median of the control sample is smaller than the median of the sample of linked diseases. A Benjamini-Hochberg multiple hypothesis testing correction was applied on each layer using an exploratory false discovery rate threshold of 0.25 (which corresponds to thresholds for the adjusted p-values in the range between 0.1 and 0.05). Second, we performed a randomization test for RR^α_i in which M^α is replaced by a random permutation of its elements. The randomized relative risk was computed from equation 3 with M^α replaced by its permuted version. For a given α, the permuted network has the same number of nodes and links as M^α, but is otherwise completely randomized.
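Two ingredients of this testing procedure can be sketched with the standard library alone. The first function applies the Benjamini-Hochberg step-up rule to a list of p-values; the second draws a randomized layer with the same nodes and the same number of links. Both are minimal sketches: the function names are hypothetical, and treating the permutation of a layer as a uniform redistribution of its links over all disease pairs is an assumption about the randomization, not a detail stated in the text.

```python
import random
from itertools import combinations


def benjamini_hochberg(p_values, fdr=0.25):
    """Return a list of booleans marking the rejected hypotheses."""
    m = len(p_values)
    order = sorted(range(m), key=lambda idx: p_values[idx])
    # Largest rank whose p-value lies below its step-up threshold.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            max_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


def randomize_layer(links, nodes, seed=None):
    """Place the same number of links on randomly chosen node pairs."""
    rng = random.Random(seed)
    all_pairs = [frozenset(pair) for pair in combinations(sorted(nodes), 2)]
    return set(rng.sample(all_pairs, len(links)))


# Hypothetical p-values for four diseases in one layer.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```

Recomputing the relative risk on many such randomized layers gives a null distribution against which the observed value can be compared.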

Results and Discussion

The estimates of the most probable disease causes can be visualized in a three-dimensional representation where the axes show the genetic (G), pathway-based (P), and toxicogenomic (T) comorbidity risks. Each disease corresponds to a point with coordinates (log RR^G_i, log RR^P_i, log RR^T_i), see Fig. 3(a) and its projections onto the (b) G − P, (c) G − T, and (d) P − T planes. The size of each marker is proportional to the frequency N_i/N of disease i. For this visualization we do not include diseases that are only present in one of the molecular layers, G, P, or T. For the remaining 254 pathologies we set log RR^α_i = 0 for all diseases where log RR^α_i is not significantly different from zero after the multiple hypothesis testing correction. The majority of disorders are clearly dominated by genetic risk factors (many points are close to the G-axis). Some disorders cluster around the P and T axes, indicating purely pathway-based and toxicogenomic origins. Intriguingly, there is not a single disease that has a significant pathway-based and a significant toxicogenomic comorbidity risk at the same time, see Fig. 3(d). However, a number of disorders with significant pathway-based or toxicogenomic risks also have significant genetic contributions, see Fig. 3(b) and (c). This can also be seen in Table 1, where for instance the chronic nephritic syndrome ranks high in genetic and toxicogenomic comorbidity risks.

**Table 1 Top 10 diseases in every class of disease-causing mechanisms, α, and their relative comorbidity risks, ranked by the significance of their overlap with the phenotypic disease layer.**

The per-link contributions of the three types of pathophysiological mechanisms are shown in Fig. 3(e)–(h). Almost all disorders show one dominant comorbidity risk contribution, i.e. they cluster around a single axis. As we have excluded here all diseases for which only one type of data exists, this clustering cannot be trivially explained by incomplete or missing data. Our results are particularly relevant for “complex diseases”, where we focus on disorders that have not only a genetic component as defined by OMIM⁹, but also pathway-based and/or toxicogenomic contributions. It can be shown that the observation that disorders cluster around a single axis in Fig. 3(d) also holds for the 120 diseases that are present in each of the layers. Again, most diseases show large genetic risks, while some cluster around the P and T axes. In the supporting information, SI Fig. 1, we show results for the case where we allow comorbidities that are at the same time genetic and pathway-based/toxicogenomic (i.e. we drop the second condition for the pathway-based and toxicogenomic layers in equation 2). We also include diseases that are only present in one of the molecular layers and therefore fall by construction on one of the axes. There are now disorders with both significant pathway-based and significant toxicogenomic comorbidity risks. For these comorbidities, however, there also exists a direct genetic mechanism that may account for the phenotypic comorbidities.

Table 1 shows the diseases with the largest genetic, pathway-based, or toxicogenomic comorbidity risks, ranked by statistical significance. The top genetic diseases include schizo-affective and delusional disorders, as well as schizophrenia. Different forms of osteoarthritis and chronic bronchitis, as well as nephrotic and nephritic syndromes, also show high genetic comorbidity risks. The top pathway-based diseases are major depressive disorders, endocrine disorders such as obesity and amyloidosis, diseases of the nervous system including epilepsy and extrapyramidal and movement disorders, as well as disorders of bone density and multiple myeloma. The top toxicogenomic diseases include various forms of dermatitis and other skin diseases such as lichen simplex chronicus and prurigo, but also aortic aneurysms and the chronic nephritic syndrome.

Schizophrenia is indeed a highly heritable disorder that is associated with more than a hundred gene loci³⁷. The large pathway-based risk for depressions is corroborated by strong and supposedly bi-directional associations between the metabolic syndrome and depression, which have been a long-standing puzzle in epidemiological studies³⁸. Depressions also exhibit highly significant genetic comorbidity risks, consistent with the finding of a gene-by-environment interaction in which individuals with a functional polymorphism in the promoter region of the serotonin transporter (5-HTT) gene exhibited more depressive symptoms in relation to stressful life events³⁹. The high toxicogenomic risks for aortic aneurysms are in line with the effects of chemicals such as nicotine and prostaglandin on related disease genes⁴⁰. In summary, for most of the top-ranking diseases in each layer there are indeed known and highly relevant pathobiological mechanisms of the given type, which validates our approach.

We next ask whether there is a relation between pairs of diseases that tend to be mutually exclusive in individual patients, i.e. ϕ_ij < 0, and the pathophysiological layers in the HDMN. To do so one can define an “anti-comorbidity” network, η, as η_ij = −ϕ_ij if ϕ_ij < 0 and η_ij = 0 otherwise. The relative comorbidity risks that are obtained using η instead of ϕ are not significantly different from zero in all but two cases (the pathway-based risk for D86, sarcoidosis, and the toxicogenomic risk for G30, Alzheimer’s disease). An overlap between anti-correlations of diseases and shared mechanisms is therefore not a significant feature of the data for the vast majority of disorders.
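The definition of the anti-comorbidity network translates directly into code. The following is a minimal sketch assuming the comorbidity strengths are stored as a nested dictionary; the function name `anti_comorbidity` and the example values are hypothetical.

```python
def anti_comorbidity(phi):
    """Keep only negative comorbidity strengths, with flipped sign.

    phi -- phi[i][j] is the comorbidity strength of diseases i and j
    Returns eta with eta[i][j] = -phi[i][j] if phi[i][j] < 0, else 0.
    """
    return {
        i: {j: (-value if value < 0 else 0.0) for j, value in row.items()}
        for i, row in phi.items()
    }


# Made-up strengths for one disease.
print(anti_comorbidity({"a": {"b": -0.2, "c": 0.1}}))
```

The resulting network can then take the place of ϕ in the relative risk computation of equation 3.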

Since phenotypic disease networks are known to undergo large changes in their topology as a function of the age of the underlying patient cohorts¹⁷, we first clarified how the relative comorbidity risks depend on patient age. The age-dependent relative risks were computed using equation 3, replacing the phenotypic layer with its age-dependent counterpart. Results for the average relative comorbidity risks over all diseases i are shown in Fig. 4(a). Note that this average is also taken over diseases with comorbidity risks that are not significantly different from zero. The genetic comorbidity risk averaged over all diseases i is substantially higher than the pathway-based or toxicogenomic risks and assumes values above 1 for ages between 30 and 90. Effects are considerably smaller for the average pathway-based (toxicogenomic) comorbidity risks, which reach values around 0.5 at ages around 30 (50). These age differences in the peaks of the environmental comorbidity risks are driven by the age-dependence in the prevalences of the diseases that provide the most dominant contributions to the average risks. In all cases, the results clearly exceed the expectation values of the randomized risks obtained from the permuted networks. Note that we have confirmed that the dominance of genetic disorders cannot be a simple consequence of the exclusion of genetic comorbidities in the other molecular layers in equation 2. Removing this constraint would increase the average environmental contributions by a factor of about 1.5, while the genetic comorbidity risks exceed them by a factor between four and five. From now on we consider only the time-independent HDMN.

Figure 4(b) shows how much genetic, pathway-based, and toxicogenomic risks contribute to the observed comorbidities for subgroups of diseases that are given by the chapters of the ICD10 classification, the disease groups I. Clear differences between groups of diseases are revealed. Genetically caused comorbidities include mental disorders, disorders of the digestive system, but also susceptibility to infections. Genetic mechanisms are least relevant for disorders of the eye, ear, skin, and for cancers. Pathway-based comorbidity risks are largest for, again, mental disorders and diseases of the genitourinary system. This shows that the group of mental disorders comprises heterogeneous phenotypes that have either genetically caused or pathway-based comorbidities. Toxicogenomic comorbidity risks are largest for diseases of the skin, the genitourinary and the respiratory system, as well as for congenital malformations.

The “nurture index”, I_i, quantifies to which extent comorbidities of phenotype i are caused by environmental, i.e. pathway-based or toxicogenomic, mechanisms,

Figure 5 shows results for (a) the heritability and (b) the number of new drug approvals D_i as a function of I_i. Each circle in Fig. 5 corresponds to a disease phenotype, labeled by its ICD10 code. The colors of the circles refer to their chapter in the ICD classification. The highest values of I_i are found for diseases of the genitourinary system (N03 and N05 nephritic syndrome, N02 hematuria, N08 glomerular disorders), depressions (F32, F33), several cancers (C84 T/NK-cell lymphoma, C74 adrenal gland, C61 prostate), as well as bronchiectasis (J47). Figure 5(a) shows that there is a significant negative correlation between the nurture index, I_i, and the broad-sense heritability of disorder i. This corroborates that I_i is indeed related to the plasticity of phenotype i, i.e. I_i increases with the influence of environmental risk factors. There is also a strong significant negative correlation between the logarithms of I_i and D_i, shown in Fig. 5(b). We found this result to be very robust for a large variety of choices of the time span over which drug approvals are counted, ranging from five years upwards. Note that the heritability and D_i show no significant correlation with each other (ρ = 0.19, p = 0.17). This indicates a significant bias in pharmaceutical R&D that favors market placements of drugs that target disorders with low environmental risk factors. It has indeed been shown that the success rates for drug development vary dramatically among disease areas⁴¹. These rates have been found to increase with the existence of direct genetic evidence, which in particular applies to diseases of the musculoskeletal system and infections, which we also identified as predominantly genetic in Fig. 3(b).

Conclusions

We developed a novel approach to quantitatively disentangle the most relevant genetic or environmental disease-causing mechanisms for a large number of particular disorders. This has become possible through recent advances in observing networks of phenotypic comorbidity relations with unprecedented precision^16,17. We considered three different classes of mechanisms that can be at the core of these observed comorbidities, namely genetic, pathway-based, and toxicogenomic mechanisms that cause more than one disorder. By constructing the HDMN we have been able to identify the most probable causes for 358 different phenotypes by measuring the overlap between phenotypic and pathophysiological comorbidities, the relative comorbidity risks. We find that the different environmental disease-causing mechanisms do not mix; we found no pathologies that have significant pathway-based and toxicogenomic comorbidity risk contributions at the same time. By considering only diseases for which at least two different types of molecular comorbidities are known, we can rule out that this result is due to missing data. While genetic risk factors dominate for most of the studied diseases, we identify a number of disorders with significant environmental contributions, which typically coincide with low heritability and lower rates of successful market placements of drugs.

Our approach cross-validates pathophysiological mechanisms by whether their predicted comorbidities are indeed directly observed in the population. Moreover we can rule out certain types of disease-causing mechanisms when the comorbidities that they predict are not observed. The methodology developed here can be extended to decide on a quantitative basis if the comorbidities predicted by a particular individual pathophysiological mechanism are also phenotypically relevant. The new technology can be used as a novel and data-driven way to validate potential drug targets.

Additional Information

How to cite this article: Klimek, P. et al. Disentangling genetic and environmental risk factors for individual diseases from multiplex comorbidity networks. Sci. Rep. 6, 39658; doi: 10.1038/srep39658 (2016).

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

Lim, S. S. et al. A comparative risk assessment of burden of disease and injury attributable to 67 risk factors and risk factor clusters in 21 regions, 1990–2010: a systematic analysis for the Global Burden of Disease Study 2010. The Lancet 380(9859), 2224–60 (2012).
Barabási, A.-L., Gulbahce, N. & Loscalzo, J. Network medicine: A network-based approach to human disease. Nat Rev Genet 12(1), 56–68 (2011).
Rzhetsky, A., Wajngurt, D., Park, N. & Zheng, T. Probing genetic overlap among complex human phenotypes. PNAS 104, 11694–9 (2007).
Lee, D.-S. et al. The implications of human metabolic network topology for disease comorbidity. PNAS 105, 9880–5 (2008).
Boccaletti, S. et al. The structure and dynamics of multilayer networks. Physics Reports 544, 1–122 (2014).
Kivelä, M. et al. Multilayer networks. Journal of Complex Networks 3(2), 203–271 (2014).
Pawson, T. & Linding, R. Network medicine. FEBS Lett 582, 1266–70 (2008).
Zanzoni, A., Soler-López, M. & Aloy, P. A network medicine approach to human disease. FEBS Lett 583, 1759–65 (2009).
McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University, Online Mendelian Inheritance in Man, OMIM, http://omim.org/, (Date of access: 30/04/2015).
Goh, K.-I. et al. The human disease network. PNAS 104, 8685–90 (2007).
Feldman, I., Rzhetsky, A. & Vitkup, D. Network properties of genes harboring inherited disease mutations. PNAS 105, 4323–8 (2008).
Audouze, K. et al. Deciphering diseases and biological targets for environmental chemicals using toxicogenomics networks. PLoS Comput Biol 6(5), e1000788 (2010).
Menche, J. et al. Uncovering disease-disease relationships through the incomplete interactome. Science 347(6224), 1257601-1-8 (2015).
Sun, K., Buchan, N., Larminie, C. & Pržulj, N. The integrated disease network. Integr. Biol. 6, 1069–79 (2014).
Sun, K., Goncalves, J. P., Larminie, C. & Pržulj, N. Predicting disease associations via biological network analysis. BMC Bioinformatics 15, 304–316 (2014).
Hidalgo, C. A., Blumm, N., Barabási, A.-L. & Christakis, N. A. A dynamic network approach for the study of human phenotypes. PLoS Comput. Biol. 5, 1–11 (2009).
Chmiel, A., Klimek, P. & Thurner, S. Spreading of diseases through comorbidity networks across life and gender. New Journal of Physics 16, 115013 (2014).
Nalls, M. A. et al. Genetic comorbidities in Parkinson’s disease. Hum. Mol. Genet. 23(3), 831–41 (2014).
Kitagishi, Y., Kobayashi, M., Kikuta, K. & Matsuda, S. Roles of PI3K/AKT/mTOR pathway in cell signaling of mental illnesses. Depression Research and Treatment 2012, 752563 (2012).
Guo, S. Insulin signaling, resistance, and metabolic syndrome: insights from mouse models into disease mechanisms. J Endocrinol. 22, T1–23 (2014).
Dunbar, J. A. et al. Depression: an important comorbidity with metabolic syndrome in a general population. Diabetes Care 31(12), 2368–73 (2009).
Klimek, P., Kautzky-Willer, A., Chmiel, A., Schiller-Frühwirth, I. & Thurner, S. Quantification of diabetes comorbidity risks across life using nation-wide big claims data. PLoS Comput. Biol. 11(4), e1004125 (2015).
Zhu, Z., Oh, M. H., Yu, J., Liu, Y. J. & Zheng, T. The role of TSLP in IL-13-induced atopic march. Sci Rep 1, 23 (2011).
Manikkam, M., Haque, M., Guerrero-Bosagna, C., Nilsson, E. E. & Skinner, M. K. Pesticide methoxychlor promotes the epigenetic transgenerational inheritance of adult-onset disease through the female germline. PLoS ONE 9(7), e102091 (2014).
Darlenski, R., Kazandjieva, J., Hristakieva, E. & Fluhr, J. Atopic dermatitis as a systemic disease. Clinics in dermatology 32(3), 409–13 (2014).
Thurner, S. et al. Quantification of excess-risk for diabetes when born in times of hunger, in an entire population of a nation, across a century. PNAS 110(12), 4703–7 (2013).
WHO, ICD-10 Version: 2010, http://apps.who.int/classifications/icd10/browse/2010/en, (Date of access: 18/01/2016).
The UniProt Consortium, Activities at the Universal Protein Resource. Nucleic Acids Research 42, D191–8 (2014).
Croft, D. et al. The reactome pathway knowledgebase. Nucleic Acids Research 42, D472–7 (2014).
Davis, A. P. et al. The comparative toxicogenomics database’s 10th year anniversary: update 2015. Nucleic Acids Research, D914–20 (2014).
Osborne, J. D. et al. Annotating the human genome with disease ontology. BMC Genomics 10 (Suppl 1), S6 (2009).
Aymé, S. & Schmidtke, J. Networking for rare diseases: a necessity for Europe. Bundesgesundheitsblatt Gesundheitsforschung Gesundheitsschutz 50(12), 1477–83 (2007).
Wikipedia, ICD-10, https://en.wikipedia.org/wiki/ICD-10, (Date of access: 30/04/2015).
Cariaso, M. & Lennon, G. SNPedia: a wiki supporting personal genome annotation, interpretation and analysis. Nucleic Acids Res 40, D13008–12 (2012).
http://www.fda.gov/drugsatfda, 2016 (Date of access: 07/01/2016).
Yildirim, M. A., Goh, K.-I., Cusick, M. E., Barabási, A. L. & Vidal, M. Drug-target network. Nat. Biotechnol. 25(10), 1119–26 (2007).
Ripke, S. et al. Biological insights from 108 schizophrenia-associated genetic loci. Nature 511(7510), 421–7 (2014).
Pan, A. et al. Bidirectional association between depression and metabolic syndrome. Diabetes Care 35(5), 1171–80 (2012).
Caspi, A. et al. Influence of life stress on depression: moderation by a polymorphism in the 5-HTT gene. Science 301(5631), 386–9 (2003).
Sakalihasan, N., Limet, R. & Defawe, O. D. Abdominal aortic aneurysm. The Lancet 365(9470), 1577–89 (2005).
Nelson, M. R. et al. The support of human genetic evidence for approved drug indications. Nature Genetics 47, 856–60 (2015).


Acknowledgements

We are very grateful to Jörg Menche for stimulating discussions and acknowledge financial support from the European Commission, FP7 project MULTIPLEX No. 317532.

Author information

Authors and Affiliations

Section for Science of Complex Systems, CeMSIIS, Medical University of Vienna, Spitalgasse 23, A-1090, Austria
Peter Klimek, Silke Aichberger & Stefan Thurner
Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, 87501, NM, USA
Stefan Thurner
IIASA, Schlossplatz 1, Laxenburg, A 2361, Austria
Stefan Thurner

Authors

Peter Klimek
Silke Aichberger
Stefan Thurner

Contributions

P.K. and S.T. conceived the paper, P.K. and S.A. researched the data, P.K. and S.T. wrote the manuscript. All authors reviewed the manuscript.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Electronic supplementary material

Supplementary Information

Rights and permissions

This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/


About this article

Cite this article

Klimek, P., Aichberger, S. & Thurner, S. Disentangling genetic and environmental risk factors for individual diseases from multiplex comorbidity networks. Sci Rep 6, 39658 (2016). https://doi.org/10.1038/srep39658


Received: 03 October 2016
Accepted: 24 November 2016
Published: 23 December 2016
DOI: https://doi.org/10.1038/srep39658
