Introduction

Globally, Alzheimer’s disease (AD) and Parkinson’s disease (PD) are among the most common causes of severe and fatal dementia1. While these diseases’ pathological hallmarks are neuronal loss, extracellular senile plaques containing the peptide β amyloid, and neurofibrillary tangles for AD1,2, and the loss of neurons in the substantia nigra and elsewhere in association with the presence of Lewy bodies for PD1,3, recent data indicate that neuroinflammation is involved in the progression of both neurodegenerative disorders2,3. Neuroinflammation driven by activated microglial cells causes a vicious cycle of inflammatory reactions between microglia, astrocytes, and β-amyloid plaques, leading to neuronal death1. Investigators have long suspected that pathogenic agents contribute to the onset and progression of AD and PD4,5. However, only recently was it demonstrated that increased concentrations of proinflammatory cytokines occur in the initial stages of neurodegenerative diseases6. Moreover, certain genetic variants in the chromosome 6 region that encodes the immune response human leukocyte antigens (HLAs)7 are associated with PD.

Peripheral viral infections could elicit brain dysfunction through direct cytolytic effects on site or through whole-body circulating inflammatory reactions8. Neurotropic viruses, such as arboviruses, influenza viruses and herpes viruses, have developed mechanisms to escape host immune surveillance and gain access to the central nervous system (CNS), which can result in long-lasting subclinical infections (reviewed in9). The systemic and local responses of the immune system to viral infection potentially contribute to neuronal damage, even in the absence of cell death9. Viruses can elicit CNS inflammation by traversing a compromised blood-brain barrier (BBB), by infecting peripheral nerves, or by broadly over-activating the peripheral innate and adaptive host immune system10.

Multiple studies report the association between Herpes simplex virus type 1 (HSV1) and Alzheimer’s disease5,11,12. Other members of the Herpesviridae family have been implicated, particularly EBV, CMV and HHV613,14,15,16. Several viruses are capable of latent residency in the peripheral nervous system and target, in acute cases of encephalitis (for HSV1/2 and EBV), the same regions of the central nervous system (temporal and frontal cortex, and hippocampus) affected in AD17.

In this study, we hypothesize that comparisons of gene expression profiles from patients afflicted with either AD or PD with those profiles resulting from Herpesviridae infection, specifically CMV, EBV or HHV6, might reveal new associated neuro-inflammation pathways. We performed direct transcriptome profile comparisons between gene expression changes and enriched pathways in patients with Herpesviridae viral infection and AD/PD patients. To prioritize individual targets and pharmacological opportunities of intervention, we leveraged additional biological information such as human genetic disease associations and drug-repurposing analyses. We report here multiple human host genes and pathways that were significantly shared by human immune system responses to viral infections and neurodegenerative pathology.

Results

Data set selection

As described in detail in Methods, our computational analyses involved six steps (Fig. 1): (1) NCBI Gene Expression Omnibus (GEO) database querying and selection of AD and PD and CMV, EBV and HHV6 infection datasets; (2) stringent quality control and normalization for each dataset (Table S1); (3) differential expression analysis of individual datasets for healthy controls versus diseased individuals; (4) comparisons of differentially expressed genes (DEGs) and pathways enriched across AD, PD and viral infection profiles; (5) integration of genetic associations and tissue-specific gene expression data; and (6) generation of target repositioning hypotheses using the EMBL-EBI ChEMBL and Connectivity Map (CMAP) databases.

Figure 1

Flowchart of the cross-neurodegeneration (Alzheimer’s Disease and Parkinson’s Disease) and cross-viral infection (CMV, EBV and HHV6) transcriptome analysis pipeline. The computational analyses consist of six major steps, which are presented in the blue boxes. Detailed criteria for each major step are described in the Methods.

We found 35 GEO datasets with gene expression profiles by microarrays for AD, PD and human host response to CMV, EBV and HHV6 (Table S1). We recalculated fold changes for AD and PD, and results are comparable with previous studies18,19. Based on the filtering criteria described in the Methods section, six datasets were selected (Table 1): GSE636063 (132 whole blood samples from healthy vs 135 AD patients18), GSE99039 (whole blood samples from 232 healthy vs 204 PD patients19), GSE81246 (peripheral blood mononuclear cells (PBMC) from 24 patients with latent CMV infection vs 10 patients with active disease20), GSE2020021 and GSE4582922 (7 B cell independent samples vs 7 EBV infected samples), and GSE40396 (22 whole blood samples from seronegative patients vs 10 patients seropositive for HHV6 with fever23). Table S1 summarizes all retrieved datasets along with the reasons for their inclusion or exclusion from our analyses.

Table 1 List of patient blood and in vitro B cells infected with EBV gene expression datasets selected in this study, and the number of samples and DEGs in each dataset.

For each transcriptome dataset, we first determined the statistical significance of differentially expressed genes (DEGs) within each study by comparing disease to control samples. Subsequently, we compared individual study lists of significant DEGs to determine shared patterns in gene expression profiles for individual viruses compared to AD (Fig. 2A) and PD (Fig. 2B). This approach minimized any potential biases due to study differences in platforms or blood cell types.

Figure 2

Heatmaps of the subset of CMV, EBV and HHV6 DEGs shared with AD (A) and PD (B). For each DEG, the log2 fold change is indicated in the heatmap. The genes were clustered using the UPGMA method.

CMV infection and Alzheimer’s disease shared molecular markers

We identified 906 and 1,910 significant (False Discovery Rate adjusted [FDR-adj.] p-value < 0.05) DEGs in relation to AD and CMV human host response, respectively. Overall, 68 DEGs were shared in AD and CMV response signatures (Hypergeometric p-value (PHyper) = 1.5 × 10−7; Table S2). SESN3 was the most down-regulated gene in patients infected with CMV and in active disease status (CMV: Fold Change [FC] = −2.7, FDR-adj. p-value = 6.3 × 10−11; AD: FC = −1.3, FDR-adj. p-value = 5.0 × 10−4, Fig. 2A). Sestrin 3 controls intracellular response to reactive oxygen species24 and acts as a trans-acting genetic regulator of a pro-convulsant gene network in the human epileptic hippocampus25.
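The overlap significance (PHyper) reported above is an upper-tail hypergeometric probability. The sketch below shows that calculation with the Python standard library only. The function name, the toy list sizes and the gene universe are illustrative assumptions; the background gene universe used in the study is not stated in this section, so the sketch is not expected to reproduce the reported p-values.

```python
from math import comb


def hypergeom_overlap_pvalue(universe: int, list_a: int, list_b: int, overlap: int) -> float:
    """Upper-tail probability P(X >= overlap).

    X is the number of genes from list A (size list_a) found when list_b
    genes are drawn without replacement from `universe` genes.
    Same quantity as R's
    phyper(overlap - 1, list_a, universe - list_a, list_b, lower.tail = FALSE).
    """
    max_overlap = min(list_a, list_b)
    total = comb(universe, list_b)
    tail = sum(
        comb(list_a, k) * comb(universe - list_a, list_b - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Toy numbers (assumed, not taken from the study): 1,000 measured genes,
# DEG lists of 50 and 80 genes, 12 genes in common.
p_value = hypergeom_overlap_pvalue(universe=1000, list_a=50, list_b=80, overlap=12)
print(f"overlap p-value: {p_value:.3g}")
```

Exact integer binomial coefficients avoid floating-point underflow for long gene lists; the choice of universe (all genes versus all probes on the array) changes the result and has to match the design of the comparison.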

The full lists of significant DEGs associated with AD and CMV (906 and 1,910 DEGs, respectively) were analyzed for enriched functional pathways using MetaCore/MetaBase (GeneGo) v6.0 (Thomson Reuters, https://portal.genego.com/). In total, 5 human canonical pathways were significantly enriched in both AD and CMV DEG lists (Fig. 3A). The most significant pathway was “Role of Sirtuin1 and PGC1-alpha in activation of antioxidant defense system” (Fig. 4) (AD: FDR-adj. p-value = 0.01; CMV: FDR-adj. p-value = 2.1 × 10−3).

Figure 3

Statistically significant (adjusted FDR p-value ≤ 0.05) shared CMV, EBV and HHV6 human host response pathways and differentially expressed genes in AD (A,B) and PD (C,D), respectively.

Figure 4

Pathway map for “Role of Sirtuin1 and PGC1-alpha in activation of antioxidant defense system”. Significant up-regulation of genes is denoted by up-pointing bars colored in red, and significant down-regulation of genes is denoted by down-pointing bars colored in blue. The height of the colored bar represents the magnitude of the gene expression change (fold change) between cases and controls.

EBV infection and Alzheimer’s Disease shared molecular markers

We found 802 DEGs that were associated with EBV human host response. Of those, 36 genes were shared in AD and EBV host response signatures (PHyper = 2.0 × 10−3; Table S3). As in the comparison of CMV host response with AD transcriptional profiles, SESN3 ranked as one of the top 20 genes associated with EBV human host response, with the same direction of association as seen in CMV human host response (EBV: FC = −6.9, FDR-adj. p-value = 1.1 × 10−4) (Fig. 2A).

Five canonical signaling pathways were enriched in both AD and EBV (Fig. 3A). Similar to the results for CMV infection, the pathways “Role of Sirtuin1 and PGC1-alpha in activation of antioxidant defense system” (AD: FDR-adj. p-value = 0.01; EBV: FDR-adj. p-value = 0.049) and “Antigen presentation by MHC class II” (AD: FDR-adj. p-value = 6.8 × 10−3; EBV: FDR-adj. p-value = 0.04) were the most significantly enriched in AD and EBV infection.

HHV6 infection and Alzheimer’s Disease shared molecular markers

We identified 1,698 genes associated with HHV6 human host response. Comparisons of transcriptional profiles yielded 95 genes shared in AD and HHV6 host response (PHyper = 0.038; Table S4). IDO1 ranked as the top gene associated with HHV6 response (HHV6: FC = 4.5, FDR-adj. p-value = 5.3 × 10−5; AD: FC = −1.08, FDR-adj. p-value = 0.001, Fig. 2A). Indoleamine 2,3-dioxygenase 1 (IDO1) catalyzes the first and rate-limiting step in the kynurenine pathway26, which has been implicated in neuroinflammation and neurodegeneration27,28.

For AD and HHV6 (906 and 1,698 DEGs, respectively), there were 9 common significantly enriched canonical pathways, of which “Antigen presentation by MHC class II” had the highest significance (AD: FDR-adj. p-value = 6.8 × 10−3; HHV6: FDR-adj. p-value = 6.2 × 10−6) (Fig. 3A).

Common genes and pathways across multiple viruses and Alzheimer’s Disease

We identified 28 genes that were associated with AD and host response to at least two of the three viruses investigated in this study (CMV, EBV or HHV6) (Fig. 3B). Thioredoxin (TXN) was the most down-regulated gene in patients with AD (FC = −1.44, FDR-adj. p-value = 1.20 × 10−3), while it was up-regulated in patients with CMV (FC = 2.7, FDR-adj. p-value = 3.0 × 10−4) and HHV6 active infection (FC = 1.51, FDR-adj. p-value = 2.4 × 10−3) (Fig. 2A). Thioredoxin is crucial in maintaining a reducing intracellular environment and thus protects against oxidative stress29. For this reason, thioredoxin has been considered a promising early biomarker in the diagnosis of AD, suggesting the potential involvement of oxidative stress in the pathogenesis of the disease30.
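Selecting genes shared between a disease and at least two of the three viral signatures is a counting operation over gene sets. The sketch below illustrates it; the function name and the toy sets are assumptions. Only the memberships of SESN3, TXN and IDO1 follow the results described above, and the GENE_* entries are placeholders.

```python
from collections import Counter


def shared_with_min_viruses(disease_degs, virus_degs, min_viruses=2):
    """Return disease DEGs that are also DEGs for at least `min_viruses` viruses.

    disease_degs: set of gene symbols
    virus_degs:   dict mapping virus name -> set of gene symbols
    """
    counts = Counter()
    for genes in virus_degs.values():
        # Count each disease gene once per virus signature that contains it.
        counts.update(set(genes) & set(disease_degs))
    return sorted(gene for gene, n in counts.items() if n >= min_viruses)


# Toy gene sets (placeholders except SESN3, TXN and IDO1).
ad_degs = {"SESN3", "TXN", "IDO1", "GENE_X"}
virus_degs = {
    "CMV": {"SESN3", "TXN", "GENE_Y"},
    "EBV": {"SESN3", "GENE_Z"},
    "HHV6": {"TXN", "IDO1"},
}
print(shared_with_min_viruses(ad_degs, virus_degs))
```

Setting `min_viruses=3` restricts the result to genes common to all viral signatures.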

Three pathways were significantly enriched in all viral host response and AD datasets: “Antigen presentation by MHC class II”, “Role of IAP-proteins in apoptosis” and “Role of Sirtuin1 and PGC1-alpha in activation of antioxidant defense system” (Fig. 3A, Table S5). Two other pathways enriched in CMV and HHV6 host response as well as AD were “Induced oxidative stress and apoptosis in airway epithelial cells” and “Regulation of G1/S transition”.

CMV infection and Parkinson’s Disease shared molecular markers

There were 939 genes associated with PD and 1,910 genes associated with CMV host response. Of those, 152 DEGs were shared between PD and CMV host response (PHyper = 0.04; Table S6). Amyloid beta precursor like protein 2 (APLP2) was the most up-regulated gene in CMV host response (FC = 8.8, FDR-adj. p-value = 4.1 × 10−7) and was also associated with PD (FC = 1.13, FDR-adj. p-value = 3.6 × 10−3) (Fig. 2B). APLP2 belongs to the Alzheimer’s-associated amyloid beta-protein precursor gene family, which interacts with the synaptic release machinery, suggesting a role in neurotransmission31.

We found 28 canonical signaling pathways enriched in both PD and CMV using the full list of significant DEGs. The most significant pathways were “Integrin inside-out signaling in neutrophils” (PD: FDR-adj. p-value = 4.1 × 10−11; CMV: FDR-adj. p-value = 3.0 × 10−3) and “Inhibition of neutrophil migration by pro-resolving lipid mediators” (PD: FDR-adj. p-value = 4.1 × 10−11; CMV: FDR-adj. p-value = 3.4 × 10−3; Fig. 3C).

EBV infection and Parkinson’s Disease shared molecular markers

Comparisons of transcriptional profiles yielded 60 genes shared in PD and EBV host response (PHyper = 0.02; Table S7). As seen in the CMV host response comparison with PD transcriptional profiles, APLP2 ranked as one of the top 5 genes associated with EBV human host response (EBV: FC = −6.5, FDR-adj. p-value = 1.2 × 10−4; Fig. 2B).

From the list of genes associated with PD and EBV (939 and 802 DEGs respectively), there were 53 pathways significantly enriched in both diseases with “Reverse signaling by Ephrin-B” as the most significant one (PD: FDR-adj. p-value = 3.8 × 10−7; EBV: FDR-adj. p-value = 5.4 × 10−3; Fig. 3C).

HHV6 infection and Parkinson’s Disease shared molecular markers

Overall, 181 DEGs were shared in PD and HHV6 response signatures (PHyper = 6.4 × 10−9; Fig. 2B; Table S8). IL1RN was the most highly over-expressed gene in patients infected with HHV6 and was significantly differentially expressed in PD (HHV6: FC = 8.47, FDR-adj. p-value = 1.7 × 10−6; PD: FC = 0.27, p-value = 1.7 × 10−3). Interleukin 1 (IL-1) receptor antagonist (IL-1RN) is a naturally occurring anti-inflammatory agent that binds to the IL-1 receptor but lacks agonist activity and therefore functions as a competitive IL-1 inhibitor32.

Considering all genes associated with PD and HHV6 (939 and 1,698 DEGs, respectively), 253 human canonical pathways were significantly enriched in both DEG lists. As seen for CMV, the pathway entitled “Inhibition of neutrophil migration by pro-resolving lipid mediators” ranked as the most significant result (PD: FDR-adj. p-value = 4.3 × 10−11; HHV6: FDR-adj. p-value = 3.0 × 10−3).

Common genes and pathways across multiple viruses and Parkinson’s Disease

We identified 54 genes associated with PD and host response to at least two of the three viruses investigated in this study (CMV, EBV or HHV6) (Fig. 3D). BCL6, GYG1, RBCK1, TIMP2 and CIRBP were common DEGs across all viruses tested and associated with PD. Tissue inhibitors of metalloproteinases (TIMPs) are endogenous inhibitors of matrix metalloproteinases (MMPs), and aberrant expression of MMPs is strongly associated with neuroinflammation and neuronal cell death33.

In total, 15 human canonical pathways were significantly enriched from DEGs associated with PD and human host response to CMV, EBV and HHV6, many of which are involved in host immune response (Table S9). The most significant pathway was “Reverse signaling by Ephrin-B” (PD FDR-adj. p-value = 3.3 × 10−6; HHV6 FDR-adj. p-value = 3.7 × 10−2; EBV FDR-adj. p-value = 1.9 × 10−2; CMV FDR-adj. p-value = 1.5 × 10−3). Previous studies have indicated that the ephrin signaling pathway is involved in the inflammatory process following CNS injury by serving roles in the maintenance of endothelial junction integrity and cytoskeletal structure. Remarkably, there was significant enrichment for several pathways commonly associated with PD and neurodegenerative diseases in general. These include the leucine rich repeat kinase 2 (LRRK2) pathway (PD FDR-adj. p-value = 3.7 × 10−6; HHV6 FDR-adj. p-value = 4.2 × 10−2; EBV FDR-adj. p-value = 2.0 × 10−2; CMV FDR-adj. p-value = 1.8 × 10−3; Fig. 5). The G2019S mutation within the LRRK2 kinase domain is the most common causal mutation in PD patients34, and it results in a substantial increase in LRRK2 kinase activity35. The mechanism by which LRRK2-G2019S induces PD pathology remains unclear, although several studies have implicated this mutation in the dysregulation of autophagic function36.

Figure 5

Pathway map for “LRRK2 in neurons in Parkinson’s disease”. Significant up-regulation of genes is denoted by up-pointing bars colored in red, and significant down-regulation of genes is denoted by down-pointing bars colored in blue. The height of the colored bar represents the magnitude of the gene expression change (fold change) between cases and controls.

Control analyses for disease relevance of detected molecular signatures

To evaluate potential biases in our approach, we performed further analyses. First, we compared CMV, EBV and HHV6 host response DEGs to genes associated with Huntington’s Disease (HD) and Type 2 diabetes mellitus (T2DM; FDR adj. p-value ≤ 0.05). These served as control analyses with another neurodegenerative condition (HD) and a disease unrelated to neurodegenerative and infectious disease (T2DM), to check for spurious overlaps. We selected three publicly available datasets with peripheral blood gene expression samples: GSE900637 (11 children with T2DM and 23 healthy children), GSE6952838 (23 adults with T2DM and 27 healthy controls) and GSE3472139 (150 samples from patients with HD and 70 samples from healthy controls). In contrast to the comparisons with AD and PD, these control analyses showed no statistically significant enrichment with T2DM or HD in the 3 datasets tested (PHyper > 0.1, Fig. S1), which suggests that the common molecular signatures between AD or PD and Herpesviridae infections are robust and non-spurious.

Second, since the gene expression datasets used in our analyses were obtained from blood samples of AD and PD patients, we needed to evaluate whether the shared viral host response genes are also expressed in microglia, the most important immune cell type in the brain. We re-analyzed the gene expression profiles of 161 CMV, EBV or HHV6 host response genes shared with AD, and 329 genes shared with PD, in 37 human microglia post-mortem samples from donors with a history of normal cognitive function and no apparent neuropathological abnormalities40 (Fig. 6). We found that the majority of genes, 86.3% in AD (139 of 161 genes) and 93.6% in PD (308 of 329 genes), were actively expressed in human microglia (log2 TPM > 2) (Fig. 6). Of the DEGs shared by at least two of the three viruses, 82.1% (23 of 28 genes) in AD and 90.7% (49 of 54 genes) in PD were expressed in microglia. These findings lend further support for sampling of the blood as a surrogate for direct microglia gene expression profiling.
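The expression filter and the percentages above can be illustrated with a short sketch. It assumes that a gene counts as expressed when its mean log2 TPM across samples exceeds 2; how the per-sample values were summarized is not stated here, so the use of the mean, the function names and the toy values are assumptions.

```python
from statistics import mean


def expressed_genes(log2_tpm, threshold=2.0):
    """Genes whose mean log2 TPM across samples exceeds `threshold`.

    log2_tpm: dict mapping gene symbol -> list of per-sample log2 TPM values.
    Averaging across samples is an assumption made for this sketch.
    """
    return sorted(gene for gene, values in log2_tpm.items() if mean(values) > threshold)


def percent(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Toy expression values for placeholder genes.
toy = {"GENE_A": [3.1, 2.9], "GENE_B": [0.5, 1.0], "GENE_C": [2.0, 2.0]}
print(expressed_genes(toy))

# Arithmetic behind the reported proportions.
print(percent(139, 161), percent(308, 329))  # genes shared with AD, with PD
print(percent(23, 28), percent(49, 54))      # genes shared by >= 2 viruses
```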

Figure 6

Microglia gene expression levels (log2 transcripts per million reads (TPM)) of CMV, EBV and HHV6 DEGs common with AD (A) or PD (B).

Potential drug targets with human genetic evidence

It was previously reported that drug targets supported by robust human genetic evidence for disease pathology could boost success rates in clinical development41. Therefore, we surveyed the public genome-wide association studies in the GWAS catalog for genetic evidence for the 172 and 329 DEGs that were associated with host response to viruses (CMV, EBV or HHV6) and AD or PD, respectively. Each SNP was mapped to the linkage disequilibrium (LD) region within 1 kb upstream or downstream of the DEG coding region. A total of 19 genes were proximal to at least one SNP associated with neurodegenerative diseases (AD: 11 genes; PD: 8 genes) (Table 2). Rs6430538, the most significant variant (p-value = 8 × 10−24) associated with PD, is located near the gene HNMT on chromosome 2. HNMT encodes histamine N-methyltransferase, which has a key role in histamine metabolism in the central nervous system42 and has been proposed as a genomic biomarker of increased susceptibility to PD43. Rs2373115, an intronic variant near GRB2 associated binding protein 2 (GAB2), was the most significant variant (p-value = 1 × 10−10) associated with AD. Multiple genetic variants in the GAB2 region are associated with late-onset AD, and GAB2 could be involved in multiple pathways leading to the formation of neurofibrillary tangles44.
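The proximity rule described above (a SNP within 1 kb upstream or downstream of a gene's coding region) can be sketched as an interval check. The record types, the function name and all coordinates below are hypothetical placeholders, not values from the GWAS catalog, and the sketch ignores LD structure beyond the fixed window.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    symbol: str
    chrom: str
    start: int  # first base of the coding region
    end: int    # last base of the coding region


@dataclass(frozen=True)
class Snp:
    rsid: str
    chrom: str
    pos: int
    trait: str


def snps_near_genes(genes, snps, window=1000):
    """Return (gene, rsid, trait) for SNPs within `window` bp of a gene."""
    hits = []
    for gene in genes:
        for snp in snps:
            same_chrom = snp.chrom == gene.chrom
            in_window = gene.start - window <= snp.pos <= gene.end + window
            if same_chrom and in_window:
                hits.append((gene.symbol, snp.rsid, snp.trait))
    return hits


# Hypothetical genes and SNPs for illustration only.
genes = [Gene("GENE_A", "2", 10_000, 20_000), Gene("GENE_B", "11", 50_000, 60_000)]
snps = [
    Snp("rs_demo1", "2", 20_800, "PD"),     # 800 bp downstream of GENE_A
    Snp("rs_demo2", "11", 48_500, "AD"),    # 1,500 bp upstream of GENE_B
    Snp("rs_demo3", "11", 55_000, "T2DM"),  # inside GENE_B
]
print(snps_near_genes(genes, snps))
```

The nested loop is adequate for a few hundred genes; an interval index would be preferable for genome-wide queries.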

Table 2 List of 19 DEGs associated with viral host response and Alzheimer’s Disease or Parkinson’s Disease proximal to SNPs associated with neurodegenerative diseases in the GWAS catalog.

In addition, several SNPs in close proximity (<1 kb from the coding region) to the 172 and 329 DEGs showed genetic associations with conditions unrelated to neurodegenerative diseases. For example, 20 genes were proximal to SNPs associated with T2DM (Table S10).

Drug repurposing analysis

The 401 targets identified in this study (172 and 329 DEGs associated with host response to viruses and AD and PD, respectively) were mapped to public compounds by searching the EMBL-EBI ChEMBL database for approved and marketed drugs targeting these genes. Overall, we identified 55 drug-target pairs for 20 DEGs (Table 3). Most genes were associated with multiple drugs; for example, TUBB6 was targeted by 12 unique compounds with diverse therapeutic indications such as oncology and acute coronary syndrome. Alfacalcidol, a vitamin D receptor (VDR) agonist with a therapeutic indication for PD, is also included in our list as a potential repurposing opportunity to treat peripheral drivers of neurodegeneration.

Table 3 List of launched drugs targeting DEGs associated with viral host response and Alzheimer’s Disease or Parkinson’s Disease.

CMAP is another drug repurposing approach, which exploits anti-correlation between disease gene expression signatures and the signatures of pharmacological in vitro perturbations45. We performed separate analyses using CMV, EBV and HHV6 human host response gene expression signatures to assess anti-correlation with approximately 5000 small-molecule compounds and 300 reagents from the Broad Institute public library (www.broadinstitute.org/connectivity-map-cmap). Overall, 16, 24 and 16 compounds were significantly anti-correlated to the CMV, EBV and HHV6 host response signatures, respectively (p-value < 0.05, Specificity < 0.1; Table 4). Of those, 14 compounds (highlighted in Table 4) showed evidence in the literature of neuroprotection in Alzheimer’s or Parkinson’s Disease through multiple mechanisms, such as dopamine receptor agonism and monoamine oxidase and cholinesterase inhibition. In addition, multiple compounds identified showed anti-inflammatory properties, which were previously considered potential pharmacological options for AD prevention46.
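The idea of anti-correlation between a disease signature and a compound signature can be illustrated with a rank correlation. The sketch below is a simplified stand-in: CMAP computes its own connectivity score, whereas this example substitutes a plain Spearman correlation of fold changes over shared genes. Function names and the toy signatures are assumptions.

```python
def average_ranks(values):
    """1-based ranks, with tied values receiving their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for a constant signature")
    return cov / (vx * vy) ** 0.5


def signature_correlation(disease_fc, compound_fc):
    """Spearman correlation of log2 fold changes over genes in both signatures.

    Values near -1 mean the compound tends to reverse the disease signature.
    """
    genes = sorted(set(disease_fc) & set(compound_fc))
    if len(genes) < 2:
        raise ValueError("need at least two shared genes")
    x = [disease_fc[g] for g in genes]
    y = [compound_fc[g] for g in genes]
    return pearson(average_ranks(x), average_ranks(y))


# Toy signatures (placeholder genes and values).
disease = {"GENE_A": 2.0, "GENE_B": 1.0, "GENE_C": -1.0, "GENE_D": -2.0}
compound = {"GENE_A": -1.5, "GENE_B": -0.5, "GENE_C": 0.5, "GENE_D": 3.0}
print(signature_correlation(disease, compound))
```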

Table 4 List of the Broad Institute public library of compounds associated with gene targets in CMV, EBV or HHV6 human host response based on CMAP45 analysis of contrary gene expression profiles.

Discussion

Understanding the causal basis of neurodegenerative diseases is hampered by their extended preclinical stage and the infeasibility of routinely sampling brain tissue. Currently, it is known that neurodegenerative diseases, such as AD and PD, could result from multiple risk factors including genetic susceptibility47,48, age49, and toxins and inflammatory responses as environmental triggers for microglia and astrocyte activation50. In this context, pervasive viral infections could precipitate peripheral inflammatory reactions or immune dysregulation that are often associated with AD and PD9,36,51. We report here a systematic study of common molecular markers between viral perturbations to human immune response and clinical AD and PD. Our strategy was to examine multiple public transcriptome datasets from patients seropositive/seronegative for CMV, EBV or HHV6, and AD/PD patients, with the goal of identifying novel biological mechanisms suited for therapeutic modulation.

The concept of utilizing datasets with blood samples for the detection of disease-associated changes in gene expression relies on the natural role of peripheral blood cells in immune response to circulating pathogens. This enabled blood-to-blood gene expression comparisons between human host response to CMV, EBV or HHV6 infection and the profiles of AD and PD patients. In addition, recent studies demonstrated significant correlation in gene expression between multiple brain tissues and peripheral blood cells52,53,54,55,56. We confirmed that the majority of DEGs from the blood are also actively expressed in human microglia. Therefore, we consider our approach of inferring genes and pathways involved in AD/PD pathology through comparative blood differential gene expression analyses with host response to viral pathogens to be valid.

Our results provide evidence of the involvement of oxidative stress mechanisms in the pathologies of our representative viruses and AD through the activation of the Sirtuin and PGC1-alpha pathway. In addition, SESN3 and TXN, which play important roles in this pathway, ranked among the top genes associated with CMV and EBV, and CMV and HHV6 host responses, respectively. Further support is provided by genetic evidence from GWAS, which shows an association of the SNP rs3911569, located near the gene SESN3, with a 5-fold increased risk for AD. These findings support the emerging “mitochondrial cascade hypothesis” based on growing evidence for AD-related mitochondrial dysfunction57, and the potential impact of CMV, EBV and HHV6 host response on oxidative stress.

Our analyses also highlighted BCL6, GYG1, RBCK1, TIMP2 and CIRBP, which were DEGs shared between all viruses and PD. TIMP2 has been associated with neuroprotection through inhibition of matrix metalloproteinases33, which are involved in neuropathological processes such as inflammation, BBB damage and neuronal cell death, leading to multiple CNS disorders such as PD58. To our knowledge, none of the other genes have been previously linked to PD, neurodegeneration or neuroinflammation. BCL6, a sequence-specific transcriptional repressor that is a key player in B cell differentiation, has recently gained attention due to the association of EBV latent proteins with BCL6 down-regulation59. These findings have implications for emerging strategies targeting B cell differentiation, but how they could influence neurodegeneration still needs further investigation.

Recent studies show that LRRK2, a kinase mutated in PD clinical cases60,61,62, modulates inflammation in response to different pathological stimuli. LRRK2 plays a potential role in cytoskeleton remodeling and vesicle trafficking in microglial cells, driving them toward a pro-inflammatory state and, consequently, neurodegeneration63. The LRRK2 pathway was significantly enriched from DEGs associated with PD and human host response to CMV, EBV and HHV6. LRRK2 gene expression is regulated by IFN-γ and potentially mediates immune responses to pathogens64,65. Recently, we reported that this pathway was also linked to human host response to Mycobacterium tuberculosis66. LRRK2 knockout mouse models display phenotypes of hyperactive immune responses and increased risk of inflammatory bowel disease, through regulation of the transcriptional regulatory protein nuclear factor of activated T cells67. Our findings further support the potential roles of LRRK2 in host response to infection and neurodegeneration.

By mapping the 401 DEGs identified in this study to compounds listed in the ChEMBL database, we identified 55 drug-target pairs for 20 genes. Of those, 12 drug-target pairs showed primary therapeutic indication for auto-immune disease or chronic inflammatory conditions, such as psoriasis and rheumatoid arthritis. These results highlight the role of immune dysregulation in neurodegeneration, particularly, in AD and PD68. Thus, our findings suggest the use of immunomodulators as potential therapeutic strategies for AD and PD. Pro-inflammatory cytokines and chemokines as well as reactive oxygen and nitrogen species secreted by activated microglia can trigger a neurotoxic cascade leading to neuronal lesions and significant damage to the CNS. Therefore, therapies targeting neuroinflammation either directly or indirectly warrant further investigation68.

From our CMAP analysis, we identified several clinically used drugs that could potentially be repurposed for targeting human host factors in CMV, EBV and HHV6 infections. Overall, 14 of those compounds showed evidence in the literature of neuroprotection in AD or PD through multiple mechanisms, such as dopamine receptor agonism and monoamine oxidase and cholinesterase inhibition69,70,71. Moreover, other CMAP compounds identified showed anti-inflammatory properties, many of which have shown promising results in experimental models of the disease70,72. These findings suggest several relevant mechanisms pertinent to both viral infection and neurodegeneration that need to be further explored.

Multiple epidemiological reports have associated AD or PD with diverse bacterial and viral pathogens4,5,73. Most of them connect Herpesviridae to AD, particularly HSV-174,75,76, EBV, CMV, and HHV613,14,15,16. In aggregate, these studies are suggestive of a viral contribution to neurodegenerative diseases, although their findings offer little insight into potential mechanisms. Recently, Readhead et al.77 compared computational networks between AD and the RNA-Seq abundance of multiple viruses. Their findings implicate HHV6 and HHV7 in the development of neuropathology and AD. In contrast, our study provides a direct comparison of gene expression changes in patients with documented evidence of viral infection (and active disease) and in AD/PD patients. Arguably, using host gene expression signatures might be a more “agnostic” approach, which overcomes the limitations of “virus hunting” for specific pathogens and could reveal the participation of both known and unknown viruses (or other pathogens) in neurodegenerative disease pathology based on overall host response.

Our study has some limitations. Most importantly, we were limited by the sample size and quality of publicly available datasets. For instance, we wished to compare HSV-1 human host response in blood with the AD/PD DEGs and pathways, but none of the available datasets were generated from patients. Moreover, the results presented here are not sufficient to conclusively prove a causal relationship between viral host response and neurodegeneration. For that, further clinical trials and interventional studies are necessary. Lastly, drug repurposing compounds obtained from CMAP analyses are derived from Broad Institute gene expression data on fibroblasts and tumor cell lines, which may not be the most relevant tissue for this study. Validating these results on microglia or brain tissues or even specific blood cells would be ideal.

Our study adds to the growing evidence of the role of immune dysfunction in neurodegenerative diseases. Moreover, systematic gene expression comparisons between host response to EBV, CMV and HHV6 and AD/PD provide new insights into host genes and pathways important for neurodegeneration and suggest potential drug repurposing opportunities for promoting neuroprotection. Experimental validation of the pharmacologic interventions proposed here would constitute the next stage in drug development for the proposed targets and compounds. Further evolution of this paradigm shift, which views peripheral immune dysregulation as a potential driver of neurodegeneration, could lead to novel therapeutic approaches for the treatment of PD and AD.

Methods

Selection of gene expression datasets

Data analysis workflows broadly followed our previously published studies on human host response to various intracellular pathogens including bacteria78, viruses79 and tuberculosis66. The Gene Expression Omnibus (GEO) database (as of June 2018) was queried for human blood microarray gene expression datasets in response to Herpesviridae infection, Alzheimer’s and Parkinson’s Disease. The specific search terms used were: “HSV”, “EBV”, “HHV6”, “CMV”, “Alzheimer’s Disease”, “Parkinson’s Disease”, “Homo sapiens”, and “Whole blood”. The retrieved datasets were filtered to gene expression profiles from published studies that met the following criteria: 1) raw data were available and derived from human cells of AD/PD patients or patients infected with a single virus; 2) there was at least one control group (healthy subjects) and one diseased group; and 3) data originated from human array platforms. Additionally, type 2 diabetes mellitus (T2DM) and Huntington’s Disease (HD) datasets were included to allow gene expression comparisons with unrelated neurodegenerative and non-infectious diseases and serve as controls for spurious comparisons. Table S1 summarizes all retrieved datasets along with the reasons for their inclusion or exclusion from our analyses.

Raw (intra-slide normalized) gene expression data, the study design table and the annotation table of each dataset were obtained from the GEO/ArrayExpress databases and processed using ArrayStudio v10.0 (OmicSoft, USA). The retrieved microarray datasets came from the following platforms: Illumina Human HT, Affymetrix Human Genome U133 and Affymetrix Exon Array (Table S1). Several datasets (GSE42834, GSE56153, GSE31348 and GSE36238) were excluded from the primary analysis because they showed both a noisy kernel density plot and low within-group pairwise correlation (correlation cutoff 0.9, Table S1); they were instead used as independent datasets for validation purposes. After quality filtering, six microarray datasets (GSE636063, GSE99039, GSE81246, GSE202007, GSE458298, and GSE40396) from either whole blood or peripheral blood mononuclear cells (PBMCs) were retained for further analysis.

Data processing and statistical analysis

Quality control analyses were performed on all selected datasets66. The imported data had already been normalized within slides. Intra-slide normalization was assessed by: 1) kernel density; 2) Principal Component Analysis (PCA), to reveal divergent samples within groups; 3) Median Absolute Deviation (MAD) score; and 4) within-group pairwise correlation. Samples were considered outliers if they failed at least two of these assessments. Samples irrelevant to our study design (such as samples from the host response to bacteria and to viruses other than HHV6 in GSE40396) were also excluded. In total, 6 samples from the AD dataset and 2 samples from the PD dataset were excluded. All samples in the CMV, EBV, HHV6 and T2DM datasets passed QC.
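The outlier rule above (failure of at least two assessments) can be illustrated with a minimal Python sketch. It is not the ArrayStudio implementation: the function names, the use of the median correlation against the other samples in a group, and the reuse of the 0.9 cutoff at the sample level are all assumptions made for illustration.

```python
from statistics import mean, median

def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def median_pairwise_correlation(group):
    """For each sample, the median correlation with the other samples in its group.

    The median is an assumed summary; it keeps one aberrant sample from
    dragging down the score of every other sample in a small group.
    """
    return {
        name: median(pearson(values, other)
                     for other_name, other in group.items()
                     if other_name != name)
        for name, values in group.items()
    }

def flag_outliers(failed_checks, min_failures=2):
    """Return samples that failed at least `min_failures` QC assessments.

    `failed_checks` maps a sample name to the set of assessments it failed
    (hypothetical labels such as "kernel_density", "pca", "mad", "correlation").
    """
    return sorted(sample for sample, fails in failed_checks.items()
                  if len(fails) >= min_failures)

# Toy group of four samples; s4 does not track the others.
group = {
    "s1": [5.0, 6.0, 7.0, 8.0],
    "s2": [5.1, 6.2, 6.9, 8.1],
    "s3": [5.2, 5.9, 7.1, 7.9],
    "s4": [8.0, 5.0, 7.5, 5.5],
}
low_correlation = sorted(s for s, r in median_pairwise_correlation(group).items()
                         if r < 0.9)
```

A sample that fails only one assessment (for example, only the correlation check) would be kept under this rule; exclusion requires a second, independent failure.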

For each dataset, scale quantile inter-slide normalization (with the target median value fixed to 500), log2 transformation and probe-level differential expression analysis were performed in ArrayStudio v10.0 (OmicSoft, USA). When more than one probe mapped to a gene, the probe with the lowest p-value was used to represent that gene (the “aggregate” R function was applied). Genes were considered differentially expressed (DEGs) if they passed the false discovery rate-adjusted (FDR-adj.) p-value threshold of 0.05. The AD or PD DEG list was compared with the list of DEGs associated with the CMV, EBV or HHV6 host response to identify shared gene expression signatures. The statistical significance of the overlap between AD/PD DEGs and CMV, EBV or HHV6 DEGs was assessed with a hypergeometric test (using the “phyper” R function).
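As an illustration of the probe-to-gene collapsing and the overlap test described above, the following Python sketch mirrors the logic of the R calls using only the standard library. The function names, the tuple layout of the probe table and the choice of gene universe (for instance, genes measured in both datasets) are assumptions, not details taken from the original analysis.

```python
from math import comb

def collapse_probes(probe_rows):
    """Keep, for each gene, the probe with the smallest p-value.

    `probe_rows` is an iterable of (gene, probe_id, log2_fold_change, p_value).
    Returns a dict mapping gene -> (probe_id, log2_fold_change, p_value).
    """
    best = {}
    for gene, probe_id, lfc, p_value in probe_rows:
        if gene not in best or p_value < best[gene][2]:
            best[gene] = (probe_id, lfc, p_value)
    return best

def overlap_p_value(overlap, n_degs_a, n_degs_b, universe):
    """Upper-tail hypergeometric probability P(X >= overlap).

    Corresponds to R's
    phyper(overlap - 1, n_degs_a, universe - n_degs_a, n_degs_b, lower.tail = FALSE),
    where `universe` is the assumed number of genes eligible in both lists.
    """
    total = comb(universe, n_degs_b)
    upper = sum(
        comb(n_degs_a, k) * comb(universe - n_degs_a, n_degs_b - k)
        for k in range(overlap, min(n_degs_a, n_degs_b) + 1)
    )
    return upper / total

# Toy probe table: gene A has two probes, the second is more significant.
rows = [("A", "p1", 1.0, 0.04), ("A", "p2", 2.0, 0.001), ("B", "p3", -1.0, 0.2)]
genes = collapse_probes(rows)
```

The result of the overlap test depends on the universe size, so the background gene set should be stated alongside any reported p-value.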

Pathway enrichment analysis

Pathway enrichment analysis was performed for all DEGs from each dataset using MetaCore/MetaBase (GeneGo) v6.34 (Thomson Reuters, https://portal.genego.com/)66. The p-value for each of the 1480 human canonical pathways in MetaCore was computed with a hypergeometric test, and an FDR-adj. p-value cutoff of 0.01 was applied. The Compare Experiments Workflow tool was used to compare gene expression data across datasets (AD/PD DEGs with CMV, EBV or HHV6 DEGs) by analyzing their intersections in terms of their mappings onto MetaCore’s ontologies, including canonical pathway maps.
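The multiple-testing step can be sketched as follows. The source does not state which FDR procedure MetaCore applies, so the Benjamini-Hochberg adjustment used here is an assumption, as are the function names and the inclusive comparison against the 0.01 cutoff.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def enriched_pathways(pathway_p, cutoff=0.01):
    """Return {pathway: adjusted p-value} for pathways passing the cutoff.

    `pathway_p` maps a pathway name to its raw hypergeometric p-value.
    """
    names = list(pathway_p)
    adjusted = benjamini_hochberg([pathway_p[name] for name in names])
    return {name: q for name, q in zip(names, adjusted) if q <= cutoff}
```

For example, with four hypothetical pathways at raw p-values 0.001, 0.008, 0.039 and 0.041, only the first would pass a 0.01 cutoff after adjustment.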

Genetic variants enriched in candidate gene region

Based on the shared genes across AD or PD and CMV, EBV and HHV6 DEGs, we searched the Open Targets80 validation platform to identify genetic variants proximal to these candidate gene targets associated with AD or PD. Sources of genetic associations in Open Targets include the following: the GWAS catalog, Genomics England PanelApp, the PheWAS catalog, the European Variation Archive (EVA) and Gene2Phenotype80. Variant-gene assignment considered deleterious consequences within the gene coding region, and the variant location within introns or regulatory regions. Intergenic variants assigned to the promoter region of the nearest gene were also retrieved in this search.

Gene expression validation in publicly available human microglia datasets

To validate the expression profile of blood sample DEGs, publicly available human microglia gene expression datasets were identified in the GEO database. Raw gene expression files were downloaded from GSE9907440. An initial quality check of host RNA-Seq data was performed using FastQC81. Quality-filtered reads were mapped to the human reference genome GRCh38 (Ensembl 86) using STAR82, and quantified with featureCounts83. The data were annotated with BioMart84, and gene expression measurements were reported in log2 transcripts per million (TPM).
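For reference, the conversion from gene-level read counts to log2 TPM can be sketched as below. The pseudocount of 1 and the use of a single length per gene are assumptions; the source does not specify how zero counts or gene lengths were handled.

```python
from math import log2

def tpm(counts, lengths_bp):
    """Transcripts per million from raw counts and gene lengths in base pairs."""
    # Reads per kilobase: normalize each gene's count by its length.
    rpk = {gene: counts[gene] / (lengths_bp[gene] / 1000.0) for gene in counts}
    scale = sum(rpk.values()) / 1e6
    if scale == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: value / scale for gene, value in rpk.items()}

def log2_tpm(counts, lengths_bp, pseudocount=1.0):
    """log2(TPM + pseudocount); the pseudocount is an assumed choice."""
    return {gene: log2(value + pseudocount)
            for gene, value in tpm(counts, lengths_bp).items()}

# Two hypothetical genes with equal read density get equal TPM.
example = tpm({"geneA": 100, "geneB": 300}, {"geneA": 1000, "geneB": 3000})
```

Because TPM values sum to one million within each sample, they are comparable across genes within a sample; comparisons across samples still depend on the composition of each library.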

Drug-target prioritization

To link putative targets (DEGs) to public compounds, we obtained evidence on approved and marketed drugs associated with 11,538 targets from the EMBL-EBI ChEMBL database v2385. This analysis included data on drugs that have been approved for marketing by the U.S. Food and Drug Administration (FDA) and that have direct clinical evidence of interaction with the protein encoded by the DEG.

In addition, we performed drug repurposing analysis with the Connectivity Map45 (CMAP, https://www.broadinstitute.org/cmap/). For each gene expression profile of the host response to CMV, EBV and HHV6, the 500 genes ranked (based on FDR-adj. p-values) at the very top and bottom of each list were selected and compared against the gene expression profiles from the Broad CMAP compound library. Significant compounds were prioritized based on the anti-correlated compound enrichment score, which identifies compounds inversely matched to the disease signatures surveyed (score < 0; FDR-adj. p-value ≤ 0.05; compound specificity < 0.1). To aid interpretation of the results, ChEMBL data85 on target, mechanism of action, and drug indication were integrated. Compounds with an unknown mechanism of action, no clinical use, or an antibacterial effect were excluded from our results.
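The three thresholds used for compound prioritization can be expressed as a simple filter. The sketch below is illustrative only: the `CompoundHit` record, its field names and the compound labels are hypothetical and do not reflect the CMAP output format, and ordering by most negative enrichment score is an assumed ranking.

```python
from dataclasses import dataclass

@dataclass
class CompoundHit:
    """Hypothetical record for one compound scored against a disease signature."""
    name: str
    enrichment: float   # negative values indicate anti-correlation
    fdr: float          # FDR-adjusted p-value
    specificity: float  # compound specificity

def prioritize(hits):
    """Keep anti-correlated, significant, specific compounds.

    Thresholds follow the text: score < 0, FDR-adj. p-value <= 0.05,
    specificity < 0.1. Results are sorted by most negative enrichment first.
    """
    kept = [
        hit for hit in hits
        if hit.enrichment < 0 and hit.fdr <= 0.05 and hit.specificity < 0.1
    ]
    return sorted(kept, key=lambda hit: hit.enrichment)

# Placeholder compounds, not results from the study.
hits = [
    CompoundHit("compound_1", -0.80, 0.01, 0.02),
    CompoundHit("compound_2", 0.75, 0.01, 0.02),   # positively correlated
    CompoundHit("compound_3", -0.60, 0.20, 0.02),  # not significant
    CompoundHit("compound_4", -0.90, 0.05, 0.09),
    CompoundHit("compound_5", -0.70, 0.03, 0.10),  # specificity too high
]
selected = [hit.name for hit in prioritize(hits)]
```

The subsequent exclusion of compounds with unknown mechanism of action, no clinical use or antibacterial effect relies on ChEMBL annotations and is not represented in this filter.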