Investigation of COVID-19 comorbidities reveals genes and pathways coincident with the SARS-CoV-2 viral disease

The emergence of the SARS-CoV-2 virus and subsequent COVID-19 pandemic initiated intense research into the mechanisms of action for this virus. It was quickly noted that COVID-19 presents more seriously in conjunction with other human disease conditions such as hypertension, diabetes, and lung diseases. We conducted a bioinformatics analysis of COVID-19 comorbidity-associated gene sets, identifying genes and pathways shared among the comorbidities, and evaluated current knowledge about these genes and pathways as related to current information about SARS-CoV-2 infection. We performed our analysis using GeneWeaver (GW), Reactome, and several biomedical ontologies to represent and compare common COVID-19 comorbidities. Phenotypic analysis of shared genes revealed significant enrichment for immune system phenotypes and for cardiovascular-related phenotypes, which might point to alleles and phenotypes in mouse models that could be evaluated for clues to COVID-19 severity. Through pathway analysis, we identified enriched pathways shared by comorbidity datasets and datasets associated with SARS-CoV-2 infection.

COVID-19 emerged as a global pandemic through the first half of 2020 1 . The severity of the disease varies from asymptomatic to lethal with a case mortality rate in the 20 most affected countries ranging between 1 and 15.3% (https ://coron aviru s.jhu.edu/data/morta lity; retrieved 24 July 2020). Severe disease shows manifestations of both acute respiratory distress syndrome (ARDS) and cytokine release syndrome (CRS) 2,3 . In pediatric patients, a blood vessel inflammatory pathology similar to Kawasaki disease is sometimes present 4 . All of these presentations have common elements of abnormality of inflammatory responses and manifestations of vascular defects such as thrombosis, which may be causally related [5][6][7][8][9] .
Since the emergence and global transmission of the SARS-CoV-2 virus, many studies have reported that patients with certain underlying medical conditions have especially severe responses to the coronavirus infection 10 . Some of the identified comorbidities that lead to severe disease are Cardiovascular Disease, Diabetes, Hepatitis, Lung Disease, and Kidney Disease 11-17 . Understanding what makes some patients suffer from severe COVID-19 is an ongoing puzzle that is being investigated from both the virus and host perspectives 18 . We hypothesize that by exploring the underlying genetic basis of comorbidities associated with severe disease, we can identify putative host genes and pathways that are responsible for or contribute to the severity. Identification of these genes and pathways can serve as a gateway for further investigation into understanding how the host responds to the virus and for potential therapeutic strategies to interfere with a severe outcome.
We interrogated gene sets that are associated with the five previously mentioned underlying comorbidities to determine gene products that are shared among them. We identified several pathways and phenotypes in common, including those that are associated with severe COVID-19 pathology. All of the comorbid diseases have been and continue to be actively studied, now in the additional context of response to SARS-CoV-2 infection [19][20][21][22][23] . In particular, the laboratory mouse has been extensively utilized as an animal model to study these conditions 24 . As a result, mouse strains carrying mutations in shared genes or genes in shared pathways, and engineered to be capable of being infected by the virus, can present useful starting points for investigating the biological basis of disease severity 25 . Mammalian phenotype enrichment analysis. To investigate details of the phenotypes associated with the shared genes, we took advantage of the mammalian phenotype data available from the Mouse Genome Informatics site (www.infor matic s.jax.org). The Mouse Genome Database (MGD) group captures phenotypic data using the Mammalian Phenotype Ontology (MP), a computable ontological structure, that can be queried and used for phenotypic enrichment analysis 32 . MGD also integrates these murine data into the context of human disease data based on orthology and gene expression 24,32 . This integrated resource allowed us to exploit the associations of mouse genes and their phenotypes for enrichment purposes using VLAD, and gives us an entry into identification of potential mouse models for future study 33 .
Mouse orthologs for the shared human genes were identified using data available from the Alliance of Genome Resources (Alliance) (www.allia ncege nome.org/) 34 using the Alliance release 3.1 stringent mouse-human orthology set. If a human gene symbol matched more than one mouse marker, that gene was not included in the analysis. This resulted in the following fourteen human genes being excluded from the analysis: AGTR1, CCL2,  CFH, CYP2D6, CYP3A4, GSTM1, GSTP1, HAMP, HLA-B, HLA-DRB1, IFNA1, MMP1, SERPINA1, and TIMP2. We did not identify mouse orthologs for two human genes (CXCL8, HLA-DQB1). Excluding these 16 from the initial 123 genes that were shared among four of five comorbidities, left us with 107 remaining mouse orthologs www.nature.com/scientificreports/ which we used in the VLAD analysis. All but one of the 107 mouse genes (H2-Ea, a polymorphic pseudogene; i.e. a coding gene in some strains and a pseudogene in others) had annotations to MP. The VLAD phenotype analysis was run on 20 August 2020 using annotation data from 20 August 2020, ontology data from 9 July 2020 and default parameters.
Reactome pathway enrichment analysis. For pathway enrichment, we submitted the gene lists to the Reactome ' Analyze Gene List' enrichment tool (https ://react ome.org/Pathw ayBro wser/#TOOL=AT) based on Reactome version 72. The analysis was performed on 18 May 2020. Results were downloaded using the 'Pathway Analysis Results' and ' Analysis Report' functionality at Reactome. Pathway enrichment analysis was also performed for six COVID-19-related gene sets shown in Table 2. We selected up-regulated (log 2 fold change > = 2) genes in host transcriptional response to SARS-CoV-2 in three cell cultures: human A549 lung alveolar cells (102 genes), Calu3 human lung adenocarcinoma epithelial cells (333 genes), normal human bronchial epithelium (NHBE) cells (14 genes); genes that are up-regulated in postmortem lung samples from COVID-19-positive patients relative to biopsied healthy lung tissue from uninfected individuals (586 genes) 28 ; immune-response and angiogenesis-related genes that are up-regulated in lungs from patients who died from COVID-19 (114 genes) 5 ; genes that are overexpressed in severe compared to mild cases of COVID-19 (4 genes) 29 . We corrected for any symbols that were out of date and again used the Reactome Pathway analysis tool. The analysis was performed on 10 August 2020.

Results
COVID-19 comorbidities share associated genes. To test our hypothesis that comorbidities associated with COVID-19 severity have common underlying molecular bases, we chose five comorbidities that have been reported in the literature as closely associated with poorer disease outcome: Kidney Disease, Liver Disease, Diabetes, Lung Disease and Cardiovascular Disease. We searched the GeneWeaver Data repository for gene sets associated with these comorbidities and identified gene sets from MeSH and HPO that we used in our analyses (Table 1).
To identify genes that were shared among the five comorbidity gene sets, we used the 'Combine GeneSets' tool to create a matrix of genes and sets in which they were contained. We tabulated the number of gene sets that This gene set describes genes that are up-regulated in lungs from patients who died from COVID-19. COVID-19 is the disease caused by SARS-CoV-2 virus. Note that this expression analysis includes only the angiogenesis-associated and inflammation-associated genes available on NanoString panels. The authors define up-regulated as those genes that show a (FDR) of < = 0.05. These data are from the publication (angiogenesis) and supplementary (inflammation) materials associated with the publication. PMID:32437596 Genes that are overexpressed in severe compared to mild cases of Covid-19 People with severe cases of Covid-19 express these proteins at significantly higher levels than people with mild cases of Covid-19. Data from www.nature.com/scientificreports/ contained each gene and determined that eight genes were present in all five sets: APOA1, APOE, B2M, CTLA4, F2, F5, HMOX1 and STAT3; 123 genes were common to at least four out of five comorbidity sets (Table 3).
Genes shared among COVID-19 comorbidities are enriched for phenotypes corresponding to immune system processes and circulatory system biology. We tested the functional significance of the genes shared among the five comorbidities by performing two different kinds of enrichment analysis on our gene sets. First, we identified the mouse orthologs of the human genes and performed a phenotype enrichment using the VisuaL Annotation Display tool (VLAD) (http://proto .infor matic s.jax.org/proto types / vlad/). For genes shared among all five comorbidities, VLAD enrichment analysis identified 762 significantly enriched (p = < 0.05) mammalian phenotypes (Supplemental Table 1). The most significantly enriched terms fall into three general categories: T-cell related phenotypes, inflammation or infection related phenotypes, and cardiovascular phenotypes including blood clotting. Table 4 shows that of the eight shared genes, several were annotated to each of the significantly enriched phenotypes. We repeated our phenotype enrichment analysis using genes that are co-annotated to four of the five comorbidities associated with COVID-19. When we examined the shared genes among comorbidity sets, we found that 123 genes were shared among four out of five comorbidities. Phenotype enrichment analysis performed with the 107 one-to-one mouse orthologs of these human genes was consistent with our analysis of the eight genes that were conserved in all five comorbidities. The increase in gene number resulted in an increase in the number of Table 3. Genes shared by COVID-19 comorbidities. This table shows the genes that were annotated to four out of five comorbidities that are associated with COVID-19 severity. Genes in bold are annotated to all five comorbidities.  Table 2). VLAD analysis showed that the major areas of the ontology with the most highly significant enrichment were, as in the analysis for the eight genes shared by all the comorbidities, in inflammatory response and infection, leukocyte biology and blood vessel morphology. Abnormal blood coagulation was no longer in the most highly significant group of phenotypes, but was significantly enriched (p = 1.52 × 10e-11). Similar to our results for the eight genes shared among all five comorbidities, the mouse orthologs of the 123 genes shared in four out of five comorbidities showed many genes associated with each of the most significant phenotypes (Table 5).
Pathway analysis enrichment includes cytokine signaling pathways, blood coagulation and plasma lipoprotein metabolism. In addition to our phenotype analysis, we were also interested in investigating whether the genes shared among comorbidities were enriched for specific biological pathways. To answer this question, we used our human gene sets and the Reactome Knowledgebase in an enrichment analysis for biological pathways. Reactome is a manually curated resource that captures information about reactions, their relationships and the genes and chemicals that play a role in those reactions 35 .
We interrogated the Reactome Knowledgebase using the eight genes that were shared among all five comorbidities and identified 103 pathways/subpathways that were significantly enriched (FDR = < 0.05, Supplemental Table 5). Reactome pathways are organized into a hierarchical format where grouping pathways are subcategorized into more specific pathways which in turn are eventually represented by individual reactions. Reactome captures information about not only the genes and molecules that act in a pathway but also those that are acted upon, thus casting a wide net for genes that are included in an analysis. The 25 most significantly enriched pathways grouped under several parent pathways are shown in Fig. 1. Two genes, APOA1 and APOE, are shared among several plasma lipoprotein assembly, remodeling and clearance pathways. Three genes, APOA1, F2 and F5, are found in the hemostasis pathway, all are included in platelet activation and the latter two in blood clotting. Five of the eight genes shared among the comorbidities were involved in immune system pathways: B2M, HMOX1, CTLA4, STAT3 and F2. Of these five genes, three are included in cytokine signaling: B2M, HMOX1 and STAT3. Other informative pathways showed that APOA1, APOE, F2 are in GPCR downstream signaling, and F5, APOA1, APOE are in vesicle-mediated transport.
We repeated the pathway enrichment analysis with the 123 genes that were shared by four out of five comorbidities. We identified 172 pathways that were significantly enriched (FDR = < 0.05, Supplemental table 6). These results supported and confirmed the results we obtained with the eight genes that were shared among all five comorbidities. Although with lower significance, enriched pathways include 'common pathway of fibrin clot formation' (FDR = 5.9 × 10e−3; four genes), 'platelet degranulation' (FDR = 6.0 × 10e−6; thirteen genes) and 'plasma lipoprotein assembly remodeling and clearance' (FDR = 0.034; five genes). Immune signaling pathways and particularly interleukin signaling pathways were frequent in our enrichment results (Fig. 2). The downstream GPCR signaling pathway and the retinoid/vitamin pathways were no longer significantly enriched.
We compared our pathway enrichment results with our phenotype enrichment results for the eight genes conserved among all five comorbidities to determine if there was consistency between the results obtained from independently curated resources: MGI phenotype and Reactome. Like the pathway analysis, our phenotype enrichment analysis also revealed lipoprotein phenotypes for significant enrichment, for example 'abnormal circulating lipoprotein level' (p = 1.81 × 10e−2). Phenotype analysis also revealed 'abnormal blood coagulation' and 'decreased platelet aggregation' (p = 5.81 × 10e−8 and p = 2.64 × 10e−2 respectively) in common with the pathway analysis. Our results from the analyses of the 123 genes conserved in four out of five comorbidities were also consistent. The pathway analysis revealed that the 25 most significant pathways were pathways related to the immune system, in particular there was concordance with the results from the eight genes, identifying pathways www.nature.com/scientificreports/ related to inflammatory responses, interferon and interleukin signaling. In total, the results of all of our analyses show that five comorbidities associated with severe COVID-19 share common physiological aspects including cytokine signaling, blood clotting, and plasma lipoprotein biochemistry.

STAT3 is shared among interleukin signaling pathways that are enriched in COVID-19 comorbidities.
To further investigate whether there is a common molecular basis for the interleukin signaling pathway enrichment we examined the Reactome enrichment results using the 123 genes conserved in four out of five comorbidities for interleukin signaling pathways (Supplemental Table 3). We created gene sets of the shared genes that were in Reactome interleukin pathways that are significantly enriched (FDR < 0.05). We used the HiSim graph tool in the GeneWeaver resource to create a graphical view of the genes that are found in the sets. The gene conserved among the largest number of sets is STAT3 which is found in nine of the eleven significantly enriched interleukin signaling pathways. IL12B is shared among four signaling pathways (data not shown).
Genes associated with SARS-CoV-2 infection response and genes shared among COVID-19 comorbidities identify common cytokine signaling pathways and hemostasis. Since we had identified pathways that were common to COVID-19-associated comorbidities, we investigated whether these pathways were also associated with COVID-19 itself. To answer this question, we created gene sets directly associated with SARS-CoV-2 infection from published literature ( Table 2). GeneWeaver Gene Set GS398287, represents four plasma cytokines that are significantly elevated in patients with severe disease versus patients with mild disease 29 . GS398329 represents 114 genes associated with angiogenesis or inflammation that were upregulated in COVID-19 postmortem samples 5 . Two gene sets, GS398539 and GS398534, of 119 and 333 genes respectively represent genes that are upregulated in two distinct lung adenocarcinoma epithelial cells infected with SARS-CoV-2; GS398533 with 14 genes represents genes that are upregulated in normal human bronchial epithelium cells infected with SARS-CoV-2; GS398334 represents 587 genes upregulated in post-mortem COVID-19 samples 28 . We ran Reactome pathway enrichment analysis on each of these sets and determined the enriched pathways that were shared with those identified in the comorbidity analyses (Supplemental tables 7-12). Unsurprisingly, GS398329 and GS398287, which were preselected for genes involved in the immune response, were enriched for immune response pathways. GS398329, preselected to be associated with angiogenesis, also showed significant enrichment for the comorbidity pathways associated with platelet biology. All of the data sets showed significant enrichment for signaling mediated by interleukin-4, -10 and -13 ( Table 6).

Identification of potential mouse models to study comorbidities and COVID-19 severities.
The results of our phenotype analysis using mouse orthologs of shared human genes shows that phenotypic enrichment is consistent with the pathway enrichment using the human genes and is also consistent with pathologies associated with severe COVID-19: blood coagulation, inflammation and cardiovascular pathologies [36][37][38][39][40][41] . Since mice provide an attractive genetic system for disease modeling, we investigated the phenotypes associated with each of these genes in further detail. Figure 3 shows each of the eight genes shared by all five comorbidities and the phenotype categories that were enriched in this set. Yellow highlighting indicates that mutations in the mouse gene have been annotated to a phenotype of a category that is enriched in the eight shared genes. For example homozygous mice of the genotype Ctla4 tm1Shr /Ctla4 tm1Shr display multiple phenotypes that are shared with severe COVID-19: abnormal lung inflammation, abnormal cytokine secretion (interferon secretion) and autoimmune response 42 . Hmox1 tm1Mlee /Hmox1 tm1Mlee mice are another example with cardiovascular, immune and liver system phenotypes (http://www.infor matic s.jax.org/allel e/MGI:24297 84).

Discussion and summary
COVID-19 is a global health concern. The disease is complex and varies in severity from asymptomatic to lethal 43 . As our understanding of the disease has progressed, a number of comorbidities associated with the disease have been identified that lead to greater severity. The goal of our work is to identify underlying genetic factors that might explain the mechanism of why certain comorbidities lead to more severe disease. To this end, we studied genetic features of five comorbidities that are associated with severe COVID-19: Cardiovascular Disease, Diabetes, Hepatitis, Lung Disease, and Kidney Disease. We identified common genes that were associated with each of the comorbidities and the pathways and phenotypes with which they are associated. We compared the results of the comorbidity analysis with genes that were directly associated with SARS-CoV-2 and showed that they shared common pathways involved in the immune response and platelet biology. Our results are encouraging in that these areas of physiology have also been correlated with severe disease. Here we discuss our results in the context of COVID-19 severity. Our analysis of genes shared among both comorbidities and SARS-CoV-2 infection identified several interleukin signaling pathways that were enriched in both categories. Interleukin-4/-13 and interleukin-10 signaling shared enrichment among the largest number of gene sets we examined, and interleukin-6, interleukin-12 and interleukin-2 shared enrichment between at least one comorbidity set and a set of genes upregulated in patients who died from COVID-19. STAT3 positively regulates the transcription of IL-6, which controls inflammation 44 and is a downstream signaling player in the IL-6 pathway through the IL6ST protein 45,46 . IL-12 is produced in response to infection and signals through the JAK-STAT pathway, including STAT3, to induce the proliferation of NK cells and T cells. These cells in turn trigger cytokine signaling including interferon gamma 47 .
One mechanism proposed for the severity of COVID-19 is the coincidence of severe acute respiratory distress triggered by a cytokine related syndrome triggered by the angiotensin signaling pathway 48 . An interesting aspect of this proposal is its action through STAT3, one of the genes that we also found conserved in the comorbidities we studied. Targeting the JAK-STAT pathway has been proposed as a therapeutic approach to COVID-19 2  www.nature.com/scientificreports/ results show that STAT3 is conserved in all five comorbidities we analyzed. This supports the hypothesis that a promising therapeutic strategy to combat severe COVID-19 compounded by preexisting comorbidities may be to target common JAK-STAT pathways. In addition to immune signaling pathways, we also saw shared enrichment for pathways involved in platelet biology. Platelets are the cells that are responsible for blood clotting 49 . Abnormal clotting has been observed in severe COVID-19 patients and it has been suggested as a complication that leads to more severe disease 50,51 . Magro et al. reported that the abnormal clotting observed in severe COVID-19 patients correlated with activation of the complement pathway 52 . Our results show that HMOX1, APOA1, APOE and two members of the coagulation cascade, F2 and F5, are shared among all five comorbidities we examined. In mice, Hmox1 deficiency leads to coagulation defects and results in arterial damage due to oxidative stress 53 . APOA1 is released during platelet degranulation as part of the platelet secretory granule 54 (Reactome:R-HSA-482770). APOA1 levels have also been shown to be significantly decreased in severe COVID-19 patients 55 . APOE is a gene that encodes a lipid binding protein involved in cholesterol metabolism 56 . Preliminary evidence suggests that the e4 allele of APOE may lead to a higher risk of deep vein thrombosis and the same allele also predicts severe COVID-19 57,58 . F2 and F5 are both involved in the formation of a fibrin clot (Reactome:R-HSA-140877). The complement pathway and coagulation cascade have been shown to interact, tying together inflammation and hemostasis 59 . Additionally in a proteomic study of proteins differentially expressed in the serum of 28 severe COVID-19 patients compared with non-severe COVID-19 patients, Shen et al. report that 50 of 93 differentially regulated proteins fall into three categories one of which is platelet degranulation 55 . These results suggest that one of the factors contributing to severe disease in patients with any of the five comorbidities may be due to an underlying genetic mechanism that acts through the hemostatic pathway.
Our results show that genes that are shared among five comorbidities associated with severe COVID-19 identify pathways that are consistent with the pathologies associated with the disease. In our analysis we excluded mouse orthologs that did not correlate 1:1 with human genes to avoid potential skewing of the enrichment analysis by having multiple paralogs over-represented. Despite this, and the exclusion of several potentially important immune system genes such as some histocompatibility genes, our results show that analysis using www.nature.com/scientificreports/ mouse orthologs of the shared genes also identifies phenotypes that are consistent with disease pathology. The laboratory mouse provides a tractable system to study the effects of genetic foundations of the comorbidities and severe disease. As mentioned above mice carrying Ctla4 tm1Shr /Ctla4 tm1Shr and Hmox1 tm1Mlee /Hmox1 tm1Mlee homozygous mutations display multiple phenotypes consistent with severe COVID-19 pathology. Mice engineered for mutations in these genes crossed with mice engineered to carry the human ACE-2 SARS-CoV-2 viral receptor, ICR-Tg(Ace2-ACE2)1Cqin/J, would be a starting point to explore the underlying genetic variants related to comorbidities that interact with viral infection 60  In this study we have used a bioinformatics approach to interrogate genes associated with five COVID-19 comorbidities that correlate with severe disease. Using genes that have been annotated to these comorbidities in the MeSH or HPO resource we have shown that genes are shared among the comorbidities and that shared genes are enriched for pathways that could be the genetic basis for the pathologies observed with severe COVID-19, specifically our results suggest that the interrelated pathways of hemostasis and inflammation may be key players in understanding the severity of comorbidities with COVID-19 61,62 . Our studies provide a gateway to understand how host genetics interacts with and influences the consequences of viral infection. Note that this work will be limited by the sample types used in the original experimental analyses: Our analysis of samples isolated from specific tissues or cell types will be limited to pathways that are relevant to those tissues. For example, samples www.nature.com/scientificreports/ from blood may identify pathways active in blood cell types but miss pathways that are significant in cardiac or lung tissue. Our knowledge about COVID-19 continues to grow at a rapid rate and future work will entail the examination of additional comorbidities, more specific comorbidities, a wider survey of genes beyond our initial seed set from MeSH and HPO. As more expression data become available from a larger number of samples, we may be able to better discern which pathways are common to all tissues and which are affected only in some tissues but not others. As we learn more about correlations between individual comorbidities and disease pathologies, we may be able to identify specific pathway/comorbidity combinations that can be used to inform us about treatment decisions. Our work also provides an entry point into an experimental system using the laboratory mouse to manipulate host genetics and to study its subsequent effect on the pathology of viral infection.

Data availability
All gene sets generated during and analyzed during the current study are based on data published in peerreviewed papers, are available in the public GeneWeaver repository [www.genew eaver .org] and are accessible using the gene set identifiers given in the text (e.g. GS398287). Results data generated during this study are included in this published article and its supplementary files.