Interpreting molecular similarity between patients as a determinant of disease comorbidity relationships

Comorbidity is a medical condition attracting increasing attention in healthcare and biomedical research. Little is known about the involvement of potential molecular factors leading to the emergence of a specific disease in patients affected by other conditions. We present here a disease interaction network inferred from similarities between patients’ molecular profiles, which significantly recapitulates epidemiologically documented comorbidities. Furthermore, we identify disease patient-subgroups that present different molecular similarities with other diseases, some of them opposing the general tendencies observed at the disease level. Analyzing the generated patient-subgroup network, we identify genes involved in such relations, together with drugs whose effects are potentially associated with the observed comorbidities. All the obtained associations are available at the disease PERCEPTION portal (http://disease-perception.bsc.es).


-Filtering the Stratified Comorbidity Network looking for shared drugs allows detecting potential biological processes involved in comorbidity relations
As previously done with genes, the Stratified Comorbidity Network can be used to look for drugs potentially associated with comorbidity relations between pairs of patient-subgroups. Focusing on the AD-NSCLC inverse comorbidity relation, we detected 5 negative Relative Molecular Similarity (nRMS) interactions between AD and NSCLC patient-subgroups associated with at least one drug (i.e., at least one drug is positively associated with all the patients of one patient-subgroup and negatively associated with all the patients of the other subgroup). Interestingly, several drugs targeting different molecular mechanisms were detected in those nRMS interactions, including MST-312 (a telomerase inhibitor), tryptophan (an amino acid used in protein biosynthesis), and antimycin A (an antibiotic), suggesting that different molecular mechanisms (derived from the patient-drug associations extracted using LINCS) might explain the same comorbidity relationship between diseases. Regarding MST-312, telomerase inhibition has been described to have a strong antiproliferative effect on lung cancer 6 and, at the same time, a significantly accelerated rate of telomere shortening has been described in AD patients 7, pointing to the idea that telomere shortening in AD might be driving the protection against NSCLC development. Tryptophan metabolism is altered in AD, influencing the balance of pro- and anti-inflammatory cytokines within the central nervous system, and has been proposed as a novel druggable target 8. On the other hand, higher tryptophan transport and metabolism have been described in tumors with higher proliferation rates 9, and biomarkers of tryptophan metabolism have also been associated with lung cancer risk 10.
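The subgroup-level drug criterion used here (a drug positively associated with every patient of one subgroup and negatively associated with every patient of the other) can be sketched as a set intersection. The Python below is a minimal illustration assuming a hypothetical encoding in which each patient maps drug names to +1 or -1 associations (for example, as derived from LINCS); the patient and drug identifiers are made up and do not reproduce the actual data or pipeline.

```python
from typing import Dict, List, Set

# Hypothetical encoding: patient -> {drug: +1 (positive) or -1 (negative association)}.
# Drugs missing from a patient's map are treated as not associated.
PatientDrugs = Dict[str, Dict[str, int]]


def consistent_drugs(patients: List[str], assoc: PatientDrugs, sign: int) -> Set[str]:
    """Drugs associated with the given sign in *every* patient of the subgroup."""
    drug_sets = [{drug for drug, s in assoc[p].items() if s == sign} for p in patients]
    return set.intersection(*drug_sets) if drug_sets else set()


def opposing_drugs(group_a: List[str], group_b: List[str], assoc: PatientDrugs) -> Set[str]:
    """Drugs positive in all patients of one subgroup and negative in all of the other."""
    a_pos_b_neg = consistent_drugs(group_a, assoc, +1) & consistent_drugs(group_b, assoc, -1)
    a_neg_b_pos = consistent_drugs(group_a, assoc, -1) & consistent_drugs(group_b, assoc, +1)
    return a_pos_b_neg | a_neg_b_pos


# Made-up toy associations for two small subgroups.
assoc = {
    "ad_1": {"drug_A": +1, "drug_B": +1},
    "ad_2": {"drug_A": +1, "drug_B": -1},
    "nsclc_1": {"drug_A": -1, "drug_B": -1},
    "nsclc_2": {"drug_A": -1},
}
print(sorted(opposing_drugs(["ad_1", "ad_2"], ["nsclc_1", "nsclc_2"], assoc)))  # ['drug_A']
```

Only drug_A qualifies: drug_B is not consistently signed within the first subgroup, so it cannot support the relation even though it is negative in one NSCLC patient.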
In summary, the use of the Stratified Comorbidity Network filtered by shared drugs and small molecules allows the analysis of the specific molecular processes potentially involved in comorbidities. This approach is especially interesting in cases in which a set of patients presents comorbidity relations opposite to those observed at the disease level.

-Biological insights into the Relative Molecular Similarity relations between Alzheimer's disease, asthma, diabetes, COPD and schizophrenia
Keeping our focus on Alzheimer's disease, it has been described as directly comorbid with asthma, COPD, schizophrenia, and diabetes 11,12,13,14, which are also known to be comorbid with each other, so that the five diseases form a clique of size 5. Among the ten comorbidity relations described at an epidemiological level, DMSN recovers only the pRMS interactions of Alzheimer's disease with schizophrenia and with diabetes. Additionally, three interactions contradict epidemiological tendencies (asthma with COPD, asthma with diabetes, and diabetes with schizophrenia).
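The counts in this paragraph follow from simple set bookkeeping: five mutually comorbid diseases give C(5, 2) = 10 epidemiological relations. The Python sketch below encodes only the disease-level outcomes stated in the text and splits the ten relations into those recovered as pRMS, those contradicted, and those neither recovered nor contradicted; it is an illustration of the counting, not a reconstruction of DMSN.

```python
from itertools import combinations

diseases = ["Alzheimer's disease", "asthma", "COPD", "diabetes", "schizophrenia"]
# Every pair inside the epidemiological clique: C(5, 2) = 10 comorbidity relations.
epidemiological = {frozenset(pair) for pair in combinations(diseases, 2)}

# Disease-level outcomes as stated in the text.
recovered_prms = {frozenset(p) for p in [("Alzheimer's disease", "schizophrenia"),
                                         ("Alzheimer's disease", "diabetes")]}
contradicting = {frozenset(p) for p in [("asthma", "COPD"),
                                        ("asthma", "diabetes"),
                                        ("diabetes", "schizophrenia")]}

recovered = epidemiological & recovered_prms
contradicted = epidemiological & contradicting
neither = epidemiological - recovered_prms - contradicting
print(len(epidemiological), len(recovered), len(contradicted), len(neither))  # 10 2 3 5
```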
Going down to the patient-subgroup level (as done for the Alzheimer's disease-NSCLC case) and selecting subgroups with at least 4 patients and at least one gene deregulated in the same direction in all the patients composing the subgroup, we still recover interactions between Alzheimer's disease subgroups and diabetes and schizophrenia subgroups.
Additionally, we recover pRMS interactions of COPD with schizophrenia, asthma and Alzheimer's disease, and between asthma and diabetes. Interestingly, we still detect nRMS interactions of asthma with COPD and with diabetes, the opposite of the pRMS expected from epidemiological studies.
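The subgroup selection criteria used above (at least 4 patients, and at least one gene deregulated in the same direction in every patient of the subgroup) can be expressed as a simple filter. This Python sketch assumes hypothetical discretized profiles coded as -1 (down), 0 (unchanged) and +1 (up); the function names and toy profiles are illustrative, not the authors' implementation.

```python
from typing import Dict, List

# Hypothetical discretized profiles: patient -> {gene: -1 (down), 0 (unchanged), +1 (up)}.
Profiles = Dict[str, Dict[str, int]]


def consistent_genes(patients: List[str], profiles: Profiles) -> Dict[str, int]:
    """Genes deregulated in the same direction in every patient (non-empty list)."""
    first, *rest = patients
    shared = {g: s for g, s in profiles[first].items() if s != 0}
    for p in rest:
        # Keep a gene only if this patient has the same non-zero direction.
        shared = {g: s for g, s in shared.items() if profiles[p].get(g, 0) == s}
    return shared


def keep_subgroup(patients: List[str], profiles: Profiles, min_size: int = 4) -> bool:
    """Both criteria: enough patients and at least one consistently deregulated gene."""
    return len(patients) >= min_size and bool(consistent_genes(patients, profiles))


profiles = {
    "p1": {"G1": -1, "G2": +1},
    "p2": {"G1": -1, "G2": 0},
    "p3": {"G1": -1, "G2": +1},
    "p4": {"G1": -1},
    "p5": {"G1": +1, "G2": +1},
}
print(consistent_genes(["p1", "p2", "p3", "p4"], profiles))  # {'G1': -1}
print(keep_subgroup(["p1", "p2", "p3", "p4"], profiles))     # True
```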
Looking for potential molecular explanations of such relations, we identified the gene ACTL6A, which plays a role in proteolysis, as up-regulated in type II diabetes subgroup 4 and down-regulated in asthma subgroup 40, two subgroups with a significant nRMS. Protein degradation has been described to be increased in both insulin-deficient and insulin-resistant humans 15. At the same time, differential proteolytic activity has been described between eosinophilic and neutrophilic asthma 16, which could potentially explain this unexpected nRMS. On the other hand, asthma subgroup 32 and type II diabetes subgroup 4 present a pRMS, sharing the down-regulated gene MYLPF, which has previously been described as associated with both asthma 17 and diabetes 18.
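The reasoning in this paragraph relies on the direction of deregulation of genes shared by two subgroups: a gene deregulated in opposite directions (ACTL6A) is compatible with an nRMS, while one deregulated in the same direction (MYLPF) is compatible with a pRMS. The sketch below only separates shared genes by direction; it does not compute the RMS itself, and the toy signatures keep just the genes discussed, with the directions reported in the text. Signatures are assumed to contain only +1/-1 entries.

```python
from typing import Dict, List, Tuple


def split_shared_genes(sig_a: Dict[str, int],
                       sig_b: Dict[str, int]) -> Tuple[List[str], List[str]]:
    """Return (concordant, discordant) genes deregulated in both subgroups.

    Signatures map gene -> +1 (up) or -1 (down), consistent across each subgroup.
    """
    shared = sorted(set(sig_a) & set(sig_b))
    concordant = [g for g in shared if sig_a[g] == sig_b[g]]   # same direction
    discordant = [g for g in shared if sig_a[g] == -sig_b[g]]  # opposite direction
    return concordant, discordant


# Toy signatures restricted to the genes named in the text.
diabetes_4 = {"ACTL6A": +1, "MYLPF": -1}
asthma_40 = {"ACTL6A": -1}
asthma_32 = {"MYLPF": -1}
print(split_shared_genes(diabetes_4, asthma_40))  # ([], ['ACTL6A'])
print(split_shared_genes(diabetes_4, asthma_32))  # (['MYLPF'], [])
```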
Finally, schizophrenia is known to be directly comorbid with COPD and Alzheimer's disease. We detect a pRMS between schizophrenia subgroup 28 and COPD subgroup 39. Exploring the molecular drivers of this pRMS, we found the gene ADRB3 (an adrenergic receptor) to be down-regulated in both subgroups. Interestingly, polymorphisms affecting adrenergic receptors have been associated with COPD in adults 19, and several papers provide indirect evidence that adrenergic receptors may play an important role in schizophrenia 20.
Moreover, we detect a pRMS between schizophrenia subgroup 14 and Alzheimer's disease subgroup 7, sharing the down-regulated gene CNTN6. Supporting our results, deletions of the entire CNTN6 gene have been previously described in neuropsychiatric disorders (including schizophrenia) 21, and at the same time, duplications at 3p26.3 disrupting the CNTN6 gene have been described in Alzheimer's disease 22.

-Correlation-based similarities
As an alternative approach, patients were also connected based on correlation analyses.
In summary, we calculated pairwise Pearson correlations on the complete list of DEGs
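A minimal sketch of pairwise Pearson correlations between patients, assuming each patient is represented by a vector of differential expression values over the same ordered list of DEGs. How correlations are then turned into network links is not specified here, so the sketch stops at the pairwise coefficients; the toy values are made up. For two profiles x and y, r = cov(x, y) / (sd(x) sd(y)).

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length vectors (raises on zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def pairwise_correlations(profiles: Dict[str, List[float]]) -> Dict[Tuple[str, str], float]:
    """Correlate every unordered pair of patients over the same ordered DEG list."""
    return {(p, q): pearson(profiles[p], profiles[q])
            for p, q in combinations(sorted(profiles), 2)}


# Made-up differential expression values for three patients over three DEGs.
profiles = {"p1": [1.0, 2.0, 3.0], "p2": [2.0, 4.0, 6.0], "p3": [3.0, 2.0, 1.0]}
for pair, r in pairwise_correlations(profiles).items():
    print(pair, round(r, 3))
```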

-Patient classification
We have seen that comorbidity relationships can be understood when subdividing diseases into patient-subgroups, suggesting different underlying molecular mechanisms. Therefore, we set out to predict the comorbidity propensity of new patients by locating them in our networks, as a proof of principle of a personalized framework. To this end, we associated each patient with the most similar patient according to the Euclidean distance between their discretized differential expression profiles (see Supplementary Methods), and then assigned them to that patient's disease and patient-subgroup. Using this approach, 92% of the patients were correctly assigned to their corresponding disease. We correctly assigned 183 of the 189 patients with AD, of whom 132 were also correctly classified into their corresponding subgroups, and 293 of the 302 patients with NSCLC, of whom 211 were also correctly classified into their corresponding subgroups (see details of the leave-one-out procedure in Supplementary Fig. 17).
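The nearest-patient assignment and its leave-one-out evaluation can be sketched as a 1-nearest-neighbour classifier. In this Python illustration, discretized profiles are assumed to be integer vectors (-1, 0, +1) over a common gene list, each patient carries a hypothetical (disease, subgroup) label, and ties are broken by dictionary order; the actual discretization and tie handling are described in the Supplementary Methods and may differ. A subgroup hit is counted only when the disease is also correct, mirroring how the results above are reported.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical inputs: discretized profiles over one ordered gene list,
# and a (disease, subgroup) label per patient.
Profile = List[int]
Label = Tuple[str, str]


def nearest_neighbour(query: str, profiles: Dict[str, Profile]) -> str:
    """Most similar *other* patient by Euclidean distance (leave-one-out)."""
    candidates = [p for p in profiles if p != query]
    return min(candidates, key=lambda p: math.dist(profiles[query], profiles[p]))


def leave_one_out(profiles: Dict[str, Profile],
                  labels: Dict[str, Label]) -> Tuple[float, float]:
    """Fractions of patients whose neighbour matches their disease / disease and subgroup."""
    disease_hits = subgroup_hits = 0
    for patient in profiles:
        predicted = labels[nearest_neighbour(patient, profiles)]
        if predicted[0] == labels[patient][0]:
            disease_hits += 1
            if predicted[1] == labels[patient][1]:
                subgroup_hits += 1
    n = len(profiles)
    return disease_hits / n, subgroup_hits / n


profiles = {"a1": [1, 1, 0], "a2": [1, 1, -1], "b1": [-1, 0, 1], "b2": [-1, -1, 1]}
labels = {"a1": ("AD", "s1"), "a2": ("AD", "s2"),
          "b1": ("NSCLC", "s1"), "b2": ("NSCLC", "s1")}
print(leave_one_out(profiles, labels))  # (1.0, 0.5)
```

Applied to the reported figures, the disease-level rates are 183/189 ≈ 96.8% for AD and 293/302 ≈ 97.0% for NSCLC.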
To test the performance of our method on an independent external dataset not previously used, we downloaded an additional NSCLC expression dataset (see Supplementary Methods) and tested whether these new patients could be correctly associated with their disease. Of these patients, 68% were classified as NSCLC, and the remaining 32% were classified as lung cancer (without specification of the lung cancer type).
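For an external dataset, no patient needs to be left out: each new patient is compared against all reference patients, and the reported percentages correspond to the fraction of new patients whose nearest reference patient carries each disease label. A minimal Python sketch, assuming hypothetical integer-coded discretized profiles over a shared gene list and made-up toy data:

```python
import math
from collections import Counter
from typing import Dict, List


def assign_external(new_profiles: Dict[str, List[int]],
                    ref_profiles: Dict[str, List[int]],
                    ref_disease: Dict[str, str]) -> Dict[str, float]:
    """Fraction of external patients assigned to each disease by 1-nearest neighbour."""
    counts: Counter = Counter()
    for profile in new_profiles.values():
        # Euclidean distance from this new patient to every reference patient.
        distances = {p: math.dist(profile, ref) for p, ref in ref_profiles.items()}
        nearest = min(distances, key=distances.get)
        counts[ref_disease[nearest]] += 1
    total = sum(counts.values())
    return {disease: n / total for disease, n in counts.items()}


ref_profiles = {"r1": [1, 0], "r2": [-1, 0]}
ref_disease = {"r1": "NSCLC", "r2": "lung cancer"}
new_profiles = {"n1": [1, 1], "n2": [1, 0], "n3": [-1, 1], "n4": [1, -1]}
print(assign_external(new_profiles, ref_profiles, ref_disease))
```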
In conclusion, by estimating Euclidean distances between patients' discretized differential expression profiles, we are able to assign patients to their corresponding diseases and, importantly, to the newly defined subgroups, suggesting the potential for building a general system to predict patient-specific comorbidity risks.

Subgroups' sample sizes