Introduction

Investigation into the role of viral infections on brain cancer have supported an influence of several common viruses on brain cancer risk and prognosis. For example, human cytomegalic virus (HHV5), human polyoma JC virus (JCV) and human papilloma virus (HPV) have been widely implicated in brain cancers and have been detected in brain tumors, albeit inconsistently1,2,3,4,5,6,7,8. Still other viruses, particularly other human herpes viruses (HHV) such as HHV1 and HHV2 (herpes simplex viruses), HHV4 (Epstein–Barr virus), and HHV6 (Roseolovirus) have been cited as possibly linked to brain cancers although research is relatively sparse and inconsistent1, 2, 4, 8, 9. Some of the inconsistencies with regard to the associations of viruses with brain cancer might be partially attributable to population10, 11 and individual12 variation in Class I human leukocyte antigen (HLA) genes, an essential component of the human immune response to viruses and other foreign antigens.

Class I HLA genes code for cell-surface proteins located on nucleated cells that bind and export small peptides derived from proteolytically degraded cytosolic viruses and other foreign antigens to the cell surface for presentation to CD8+ cytotoxic T cells, thereby signaling cell destruction. The success of this process rests on both the binding affinity of an HLA molecule with a foreign antigen epitope and immunogenicity, or the ability to mount an immune response. HLA is the most highly polymorphic region of the human genome with nearly all of the variability located in the binding groove12; consequently, subtle differences in the amino acid residues of the binding groove influence antigen elimination13. Exposure to viral or other foreign antigens in the absence of high affinity HLA-antigen binding or insufficient immunogenicity may result in antigen persistence and downstream deleterious health effects14, 15. Indeed, HLA has been implicated in a wide variety of conditions including cancers16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31 and, specifically, has been implicated in brain cancer risk and protection32,33,34,35,36,37,38,39.

Taken together, separate lines of research have linked several viruses and HLA to brain cancer. We have suggested that HLA moderates the influence of exposure to viruses on long-term health effects including cancer such that HLA-facilitated antigen elimination promotes health and, conversely, the inability to eliminate foreign antigens due to weak binding affinity and/or immunogenicity, contributes to cancer risk27,28,29,30,31. Here, we evaluated HLA-virus immunogenicity with respect to a population-based HLA-brain cancer profile. Specifically, we computed an HLA profile for brain cancer characterized by the covariance between the population frequency of HLA alleles and the population prevalence of brain cancer in 14 Continental Western European countries. Next, we used an in silico approach to evaluate the binding and immunogenicity of viral antigens linked to brain cancer with common HLA Class I alleles and to assess the association between those HLA-antigen epitope immunogenicity scores and the HLA-brain cancer profile.

Results

Brain Cancer-HLA protection/susceptibility covariance (BC-HLA PScov) scores

The BC-HLA PScov scores are given in Table 1. We rejected the null hypothesis that this set of PScov scores could be due to chance, since no cases were found where the ranks following a random pairing of BC-prevalence with allele frequency (out of 1,000,000 runs) matched the ranked observed BC-HLA PScov profile, thus rejecting the null hypothesis that the observed profile could be accounted for by chance (P < 1 × 10–6).

Table 1 Brain cancer PScov × 1000 scores for HLA Class I alleles.

Immunogenicity of virus proteins for HLA Class I alleles

Immunogenicity scores varied appreciably among HLA alleles (Fig. 1 and Table 2). Allele C*03:03 had the highest I score (12.81) and A*03:01 had the lowest score (3.45), a 3.7 × differential. An analysis of variance (ANOVA) was used to evaluate the effect of HLA Gene on the log-transformed immunogenicity score \(I^{\prime}\). We found that Gene had a highly statistically significant (F[2,754] = 51.9, P < 0.001). It can be seen in Fig. 2 that HLA allele C had the highest immunogenicity effect (P < 0.001 for both comparisons against genes A and B; two-sided t-test ANOVA), whereas gene B had a higher effect than gene A which, however, did not reach statistical significance (P = 0.057, two-sided t-test, ANOVA). Finally, given that an individual carries 6 classical HLA Class I alleles (2 for A, B, and C genes), the combination of alleles with highest average immunogenicity are shown in bold in Table 2.

Figure 1
figure 1

Mean immunogenicity scores I (± 95% CI) of the 69 HLA alleles used, ranked from high to low. The number in the abscissa correspond to the ranks in Table 3 (column 1). See text for details.

Table 2 Average immunogenicity score I for the 69 HLA Class I alleles, sorted by gene and ranked from high to low (N = 11 viruses).
Figure 2
figure 2

Mean (± 95% CI) log-transformed immunogenicity score (\(I^{\prime}\)) for each one of the 3 HLA genes. See text for detail.

Association between BC-HLA PScov score and virus proteins immunogenicity

A main objective of this study was to assess in silico the possible association between BC-HLA PScov scores and the HLA-immunogenicity of the proteins of various viruses that have been implicated in brain cancer. Our rationale was that viruses with proteins of high HLA immunogenicity would be eliminated more effectively from the body and the brain, and hence less likely to cause cancer: this hypothesis would predict that BC-HLA PScov scores would be negatively correlated with HLA virus immunogenicity. Indeed, the immunogenicity scores \({I}^{\prime}\) of virus proteins were, overall, negatively and highly significantly associated with the BC-HLA PScov scores (r = − 0.126, P < 0.001, N = 11 proteins × 69 alleles = 759).

With respect to specific viruses, the key measure was the percent of the BC-HLA PScov scores explained by the virus immunogenicity (Percent Variance Explained, PVE). The results are shown in Table 3, ranked from high to low PVE, and illustrated in Figs. 3, 4, 5, 6, 7, 8. The following can be seen. Ten of the 11 virus immunogenicities (all but HPV) were negatively associated with the corresponding (per HLA allele) BC-HLA PScov scores. JCV had the highest PVE (column 3 in Table 3, Fig. 3A), followed by HHV6A (Fig. 3B), HHV5 (Fig. 4A), HHV3 (Fig. 4B), HHV8, and HHV7, whereas HPV, HHV6B, HHV4, and HHV2 had negligible PVEs (all < 1%). These groupings of BC-HLA and virus immunogenicity associations are illustrated in Fig. 5 which plots the PVE (Eq. 4) for each virus. A formal assessment of the PVE using MDS differentiated the groupings more clearly. It can be seen in Fig. 6 that (a) JCV is alone in the upper right quadrant, (b) HHV6A, HH5 and HHV3 are in the lower right quadrant, (c) HHV7 and HHV8 are in a separate cluster also in the lower right quadrant, and (d) all the remaining viruses (HHV1, HHV2, HHV4, HHV6B, and HPV) are tightly clustered together in the upper left quadrant. This segregation of PVEs is further exemplified in the dendrogram of Fig. 7, as follows. There are two main branches: (i) branch 1 comprises the 6 viruses with non-negligible association to BC-HLA scores, corresponding to the points on the positive (right) side of the MDS plot of Fig. 6 (clusters a, b, c); and (ii) branch 2 comprises the 5 viruses with negligible contributions (PVE) to BC-HLA scores (corresponding to cluster d in Fig. 6). Furthermore, branch 1 gives rise to two smaller branches (1a and 1b). Branch 1a comprises two branches (1a1 and 1a2), of which branch 1a1 contains JCV alone (corresponding to cluster a in Fig. 6) and branch 1a2 contains HHV3, HHV5 and HHV6A (corresponding to cluster b in Fig. 6); branch 1b comprises HHV7 and HHV8 (cluster c in Fig. 6).

Table 3 Pearson correlations (and associated statistics) between BC PScov and the log-transformed immunogenicity of the 11 virus proteins studied.
Figure 3
figure 3

(A) BC-HLAcov scores are plotted against \(I^{\prime}\) for JCV virus. (B) Same for HHV6A virus. See Table 3 for statistics.

Figure 4
figure 4

(A) BC-HLAcov scores are plotted against \(I^{\prime}\) for HHV5 virus. (B) same for HHV3 virus. See Table 3 for statistics.

Figure 5
figure 5

Percent of variance of BC PScov scores explained by immunogenicities of the 11 viruses studied. Colors reflect gradation in PVE. (Data from Table 3).

Figure 6
figure 6

Multidimensional scaling plot of percent variance explained by the 11 viruses studied. See text for details.

Figure 7
figure 7

Dendrogram derived by hierarchical tree clustering of percent variance explained by the 11 viruses studied. See text for details.

Figure 8
figure 8

Schematic diagrams to illustrate the contributions of the 11 viruses studied to the BC-HLA association. The filled black circle in the left panel denotes the percentage of the BC-HLA variance explained by the sum of contributions by immunogenicities of individual viruses (column 3 in Table 3). The area of the color-coded viral sectors in the right panel is proportional to individual viral contributions shown in column 3 of Table 3. The sectors of HHV4 and HHV2 (in black) were almost totally overlapping.

The total, combined PVE of the 11 viruses was 44.52%, which means that of the BC-HLA PScov variance, 44.52% was accounted for by the immunogenicities of the 11 virus proteins studied (Fig. 8, left panel). The relative contributions to this total by each virus are given in column 4 of Table 3 and illustrated in the pie graph on the right panel in Fig. 8. Specifically, PVEs of JCV, HHV6A, HHV5, HHV3, HHV8, and HHV7 contributed 95% of PVE, each virus contributing > 10%. In contrast, the remainder viruses (HPV, HHV6B, HHV4, HHV2) contributed only 5%, each virus contributing < 2%. Finally, this sharp separation of the two groups of viruses was also reflected in the significance levels of the correlation coefficients. (One-sided P-values were used because we were testing explicitly only a negative association between virus immunogenicity and BC-HLA covariance, and no correction for multiple comparisons was applied because these were planned comparisons and not the results of a dredging search for significant correlations.)

Additional HHV5 proteins

The immunogenicity scores of all 5 additional HHV5 proteins were negatively correlated with the BC-HLA PScov scores but at strengths smaller than that of HHV5 gB (Table 3, lower panel). The PVE was < 4% for all 5 proteins, well below the 7.56% PVE value for HHV5 gB.

Discussion

Methodological considerations

Here we employed the iNeo-Epp algorithm40 (see “Materials and methods”) that uses a combination of diverse factors to evaluate the T-cell immunogenicity of a peptide (epitope), namely T-cell epitope prediction of the peptide-HLA Class I allele complex, factors including foreignness, accessibility, molecular weight, molecular structure, molecular conformation, chemical properties, and physical properties of target peptides. As a result, the outcome (for specified epitope/HLA Class I pairs T-cell receptor prediction, i.e. immunogenicity) includes both estimated binding affinities and immunogenicity scores; remarkably, high binding affinity is not necessarily associated with high immunogenicity, such that, e.g. an epitope with high binding affinity to a HLA Class I allele (indicated by the percentile rank value) may have a low immunogenicity (indicated by a low immunogenicity score). This is in contrast to other epitope prediction algorithms (e.g. ref41) which focus on the epitope-HLA allele binding affinity, which is a necessary but not sufficient step for immunogenicity.

Brain cancer and HLA

Previous findings regarding the influence of specific viruses on brain cancer have been somewhat inconclusive. Here, we evaluated the association of 11 viruses and the population risk of brain cancer with respect to Class I HLA, a system that is critically involved in the human immune response to viruses and other foreign antigens. Specifically, we evaluated correspondence between HLA-viral antigen immunogenicity and the brain cancer HLA protection/susceptibility profile based on the premise that variability in HLA binding and immunogenicity with viral antigens influences cancer risk via differential ability to eliminate potentially oncogenic viruses. The findings demonstrated a negative association between brain cancer-HLA protection/susceptibility profile and all 11 HLA-viral antigen immunogenicities, with substantial effects (> 4% PVE) by 6 viruses (JCV, HHV6A, HHV5/CMV, HHV3/VZV, HHV8, HHV7) and negligible effects (< 1% PVE) by 5 viruses (HPV, HHV6B, HHV4, HHV2). This suggest that Class I HLA alleles that bind with, and eliminate, the former 6 viruses in particular may protect against brain cancer risk; conversely, in the absence of antigen elimination due to weak binding and/or low immunogenicity, these viruses may be involved in brain cancer risk. Since both BC-HLA and immunogenicity scores refer to the same HLA Class I alleles, and since the potential presence (and associated effects) of the various viruses are not mutually exclusive, the PVE contributions of the 6 viruses above can be added and their sum would be an estimate of how much of the variance in BC-HLA scores could be accounted for by PVE contributions of these viruses. This sum amounts to 44.5%, which leaves 55.5% of the variance of BC-HLA scores to be accounted for. It is reasonable to suppose that this would be the contribution of HLA Class II alleles, involved in the formation of antibodies against those viruses. This hypothesis remains to be investigated. Finally, it should be noted that only negligible contributions were found by HHV1, HHV2, HHV4, HHV6B, and HPV.

All four viruses identified here as most relevant to brain cancer risk, reflecting variation of their immunogenicity with common Class I HLA alleles, share several features as follows: (1) they are practically ubiquitous; (2) initial infection typically occurs during childhood; (3) following initial infection, the viruses establish lifelong latency in the host; (4) the latent viruses can be reactivated by various triggers and/or waning immunity; and (5) they are all neurotropic. Moreover, the presence of JCV, HHV5/CMV, and HHV6 DNA and/or proteins have been fairly commonly detected in brain tumors42. Although not indicative of causality in and of itself, the co-localization of viral components and malignancy provides compelling evidence supporting an association. The case for VZV/HHV3, however, is not quite as clear. The majority of studies investigating VZV/HHV3 in relation to brain cancer have been serological studies in which the presence of VZV antibodies and self-reported history of varicella (i.e., chicken pox) have been associated with decreased risk of brain cancer43,44,45. Research evaluating the presence of VZV/HHV3 viral proteins in brain tumors is scant42; therefore, additional research is warranted regarding VZV/HHV3 viral presence in relation to brain cancer. Finally, with respect to CMV/HHV5, it is worth noting that the gB glycoprotein we used (see “Materials and methods”) is “an essential glycoprotein for both in vivo and in vitro viral replication…. Furthermore, without exception, serological responses to gB can be detected in patients infected with HCMV.”46 In addition to gB glycoprotein, we also evaluated 5 abundant CMV proteins, including three strains of pp65, gM and gN (Table 7), all of which were negatively associated with the BC HLA immunogenetic profile but at lower strength than gB (Table 3). In fact, the combination of gB and pp65 in CMV vaccines has been found to possess an additive protective effect47.

Mechanisms of virus-induced carcinogenesis

With mounting research supporting a role of viruses in cancer, several viral oncogenic mechanisms have been identified48. These include direct mechanisms such as those affecting genomic instability, rate of cell proliferation, resistance to apoptosis and alteration in DNA repair mechanisms as well as indirect mechanisms including immunosuppression, chronic inflammation, and chronic antigenic stimulation. The mechanisms of carcinogenesis are fairly well characterized for JCV and CMV/HHV5 given the wealth of research on their role in oncogenesis and oncomodulation. CMV appears to indirectly promote oncogenesies via chronic inflammation, interference with signaling pathways, and influencing cell motility and proliferation factors whereas JCV directly promotes oncogenesis via expression of viral T-antigen which inhibits tumor suppression, prevention of DNA repair, and chromosomal alterations2, 8, 49. Considerably less research has evaluated the oncogenic mechanisms of HHV6 and HHV3/VZV. Several potential oncogenic and oncomodulataroy mechanisms have been proposed or preliminarily supported for HHV6 including disruption of apoptosis, cellular transformation, immunomodulatory mechanisms, and transactivation of other viruses8, 50. A role of indirect oncogenic mechanisms including immune evasion and immunosuppression have been implicated for VZV2. Particularly for HHV6 and HHV3/VZV, additional studies are warranted to elucidate the mechanisms that may permit or promote brain cancer.

Of the 11 viruses studied, the immunogenicity of 4 of the herpes viruses and HPV were unrelated to the brain cancer-HLA protection/susceptibility profile. Interestingly HHV6A immunogenicity was clearly associated with the brain cancer-HLA protection/susceptibility profile but HHV6B was not. Despite high molecular homology, pathomechanisms (including carcinogenic mechanisms) of HHV6A and HHV6B differ50. The absence of findings for HHV6b and other herpes viruses studied here or HPV do not suggest that they are not implicated in brain cancer. Indeed, HPV has been shown to be associated with brain tumors3 and some evidence has linked HSV-1/HHV1, HHSV-2, and EBV with brain cancer1, 4, 8. Rather, the current findings suggest that their potential roles in brain cancer are likely unrelated to immunogenicity with common Class I HLA alleles. Whether that is potentially due to viral immune escape mechanisms including downregulation of Class I HLA molecules2 or the influence of other mechanisms remains to be seen.

Here we focused on Class I HLA due to its early immune response to viral infection. However, Class II HLA, which is involved in long-term immunity including antibody production and immunological memory, may also be associated with brain cancer risk and protection. HLA Class I (HLA-A, B, C) and HLA Class II (HLA-DR, DQ, DP) are complementary classes of cell surface proteins that are instrumental in binding and elimination of intracellular and extracellular foreign antigens (e.g., viruses, bacteria, cancer neoantigens), respectively. Widespread identification of CMV DNA in the peripheral blood of patients with CMV DNA-positive brain tumors has been suggested to reflect either systemic reactivation or viral shedding from tumor cells to the periphery51, 52 in which case Class II HLA may be particularly relevant.

In summary, across the 69 Class I HLA alleles investigated here, the immunogenicity of all 11 viruses studied were negatively associated with the BC-HLA profile such that HLA allele-epitope complexes with high immunogenicity were associated with protective (i.e., negative) population brain cancer-HLA risk scores whereas those with low immunogenicity were associated with susceptibility (i.e., positive) population brain cancer risk, accounting for 44.52% of BC-HLA variance explained. Of the 11 viruses tested, 6 (JCV, HHV6A, HHV5, HHV3, HHV8, HHV7) had a substantial effect on BC-HLA scores, contributing 95% to the BC-HLA variance explained above, whereas the remainder 5 viruses (HPV, HHV6B, HHV1, HHV4, HHV2) contributed only 5%. Although the association between brain cancer and JCV, HHV6A, HHV5 and HHV3 has been more investigated, the presence of genomic material from HHV753, 54 and HHV855 has been reported in the brain. A major issue has to do with the large variety of brain tumors and the reasonable possibility that different viruses might be involved in different tumors, thus making the generalization of virus(es)-BC association difficult.

Our findings speak to the interaction of viral exposure and host immunogenetics on brain cancer and may partially account for some of the conflicting findings linking these particular viruses to brain cancer risk. However, these novel finding must be considered within the context of study limitations. First, it is unclear if the current findings extend to other geographic locations outside of Continental Western Europe. Evolutionary pressure aimed at maximizing geographically-relevant pathogen elimination has shaped the distribution of HLA alleles at the population level10, 11. Thus, the brain cancer-HLA profile used for analyses here may differ in other countries. Second, it is uncertain to what extent these population-based findings apply to individuals. The HLA region is the most highly polymorphic region of the human genome12. Each individual possesses 12 classical HLA alleles (two of each gene) that determine the repertoire of foreign antigens that can be bound and subsequently eliminated. Here our analyses focused on the association between the frequencies of single HLA alleles and the population prevalence of brain cancer; the combined effects of the 12 alleles carried by an individual with regard to viral immunogenicity and brain cancer occurrence remain to be investigated. Third, the brain cancer-HLA profile was determined here for overall brain cancer risk and may vary by brain cancer subtypes. Finally, we evaluated immunogenicity with respect to representative proteins of the 11 viruses and acknowledge that other proteins may differentially relate to the brain cancer-HLA profile. Despite these limitations, HLA binding and elimination of viral antigens that have been linked to brain cancer is an interesting avenue for continued investigation, particularly in light of ongoing research on viruses as candidates for oncolytic virotherapy4, 56,57,58.

Clinical relevance

The main focus of this study is on HLA, the common denominator for the epidemiologically derived BC HLA risk scores and the in silico derived immunogenicity scores for 11 viruses that have been implicated in nonmetastatic brain cancer. The main function of HLA is the elimination of foreign antigens, by killing the infected cell (HLA Class I, CD8+ mediated and assisted by Class II CD4+ assisted) and/or neutralizing the offending antigen by specific antibodies against it (HLA Class II, CD4+ mediated). The key and necessary step in this process is the formation of a complex between the HLA molecule and a peptide fragment (epitope) of the foreign antigen: it is this complex that is transported to the cell surface and engages the CD8+ and CD4+ lymphocytes. Therefore, the successful formation of a HLA molecule–antigen complex, and, hence, the successful lymphocyte engagement depends critically on the availability of HLA molecules which would bind to the foreign antigen. Indeed, the outcome of immune checkpoint blockade (ICB) therapy in melanoma depends on the HLA makeup of the patient59, in keeping with epidemiological evidence that higher population frequencies of HLA alleles found beneficial in ICB therapy are also associated with lower melanoma prevalence in those populations27. The results of this study form the basis of predictions with respect to outcomes of interventions against diseases caused by a virus among those we investigated (e.g. vaccines against HHV5/CMV48) or tumors where viral products have been found (e.g. glioblastoma60). It should be noted that the crucial role of HLA extends to any immunotherapy directed against tumor specific antigens (TSA; neoantigens)61, 62. Given this fundamental, permissive role of HLA in enabling cancer immunotherapies, it would be ideal if the requisite HLA molecule for a particular tumor TSA or virus could be present. Indeed, we proposed recently such a procedure, namely the administration of mRNA of specific HLA Class I molecules with high binding affinities to specific TSA(s)63. Such an approach may be particularly useful when coupled with immune checkpoint blockade for glioblastoma64.

Materials and methods

Prevalence of brain cancer

The population prevalences of brain cancer (BC) were computed for 14 countries in Continental Western Europe (Table 4). For each country we identified the total number of people with BC in 2019 from the Global Health Data Exchange65, a publicly available catalog of data from the Global Burden of Disease study, and divided those values by the total population of each country in 201965. Life expectancy was not included in the current analyses since it is virtually identical for these countries17.

Table 4 Prevalence of brain cancer in 14 CWE countries.

HLA alleles

We obtained the population frequency in 2019 of 69 common HLA Class I alleles from 14 Continental Western European Countries (Austria, Belgium, Denmark, Finland, France, Germany, Greece, Italy, Netherlands, Portugal, Norway, Spain, Sweden, and Switzerland)66. There was a total of 2746 entries of alleles from these countries, comprising 844 distinct alleles. Of those, 127 alleles (69 Class I and 58 Class II) occurred in 9 or more countries, with a minimum frequency (in any country) of 0.01. In this study we focused on the 69 HLA Class I alleles whose mean frequencies (over the contributing countries) are given in Table 5.

Table 5 The 69 HLA Class I alleles used and their mean frequencies.

BC-HLA protection/susceptibility (PS) scores

We estimated the association between brain prevalence and allele frequency by computing the covariance between the prevalence of brain cancer and the population frequency of the 69 HLA Class I alleles of Table 5. The covariance can be negative or positive, indicating a protective or susceptibility (PS) association, respectively. We call these covariance values PScov scores. Thus the HLA-BC profile is a vector of 69 PScov scores. The equation for the PScov scores is:

$$\mathrm{PScov }=\frac{1}{N-1}\sum_{i}^{i=1,N}({f}_{i}-\overline{f })({p}_{i}-\overline{p })$$
(1)

where \({f}_{i}, {p}_{i}\) denote allele frequency and BC prevalence for the ith country, respectively, and

\(\overline{f },\overline{p }\) are their means. As mentioned above, cancer prevalence was available for each one of the 14 CWE countries but availability of allele frequencies varied across countries. Hence, we used only the 69 alleles (Table 5) whose frequencies were available in at least 9 countries.

Random permutations test for assessing the statistical significance of the BC-HLA PScov profile

In this analysis, we tested the null hypothesis that the BC-HLA PScov profiles (vectors) may be due to chance by performing the permutation test below. Let \(H\) be the observed Disease-HLA profile and \(H^{\prime}\) be the profile obtained by random permutation, as follows. (1) For each allele, its observed frequencies were paired to randomly permuted prevalences of BC in the 14 CWE countries above. (2) The covariance between allele frequency and corresponding (to the permuted country) disease prevalence was computed to yield a “permuted” Disease-HLA profile, \(H^{\prime}\), consisting of 69 PScov scores. (3) The observed (H) and permuted (H’) PScov profiles were ranked and the absolute differences between the H and H’ ranks computed and summed: a sum of zero would indicate that the two profiles are the same. (4) Finally, we carried out this procedure 1,000,000 times and counted the number of times M that that sum was equal to zero, indicating that the ranks of the observed and permuted PScov scores are the same. (5) Then, the ratio \(w=\frac{M}{\mathrm{1,000,000}}\) is the probability that the observed profile \(H\) could be due to chance.

Virus antigens

We estimated the immunogenicity (for each one of the 69 Class I alleles) of proteins of 11 viruses that have been implicated in brain cancer, namely 9 human herpes virus species (HHV1-HHV9), human polyoma JC virus (JCV), and human papilloma virus (HPV). Details of the proteins analyzed are given in Table 6 and their amino acid (AA) sequences are given in Appendix 1. Given the extensive research on the role HHV5/CMV in glioblastoma and the use of its protein pp65 in vaccines against glioblastoma60, we also estimated the HLA-related immunogenicity of the following 5 HHV5/CMV proteins (Table 7): pp65 (strain AD169), pp65 (strain Merlin), pp65 (strain Towne), gM_HCMVA (strain AD169), and gN_HCMVA (strain AD169).

Table 6 Viral proteins used.
Table 7 The 5 additional HHV5/CMV proteins used.

Determination of immunogenicity of HLA class I alleles

The INeo-Epp method40 was used for T-cell receptor (TCR) epitope prediction using the INeo-Epp web tool via the INeo-Epp web form interface67. For that purpose, we split a given virus antigen (Table 6) to all possible 9-mer (nonamer) AA residue epitopes using a sliding window approach68,69,70 (Fig. 9) and submitted each epitope to the web-application together with a specific HLA allele (Table 5). More specifically, we paired all epitopes with all alleles and obtained for each pair its percentile rank, a measure of binding affinity of the epitope-HLA allele complex; smaller percentile ranks indicate higher binding affinity. The web-application gave as an outcome a TCR predictive score for pairs with high binding affinities (percentile rank < 2); scores > 0.4 indicated positive immunogenicity and were analyzed further. We computed the following as a comprehensive measure of immunogenicity for quantitative analyses. Let K be the number of nonamers that showed positive immunogenicity (score > 0.4); then, K weighted by their average score \(\overline{w }\), would serve as a good estimate of the overall effectiveness of a given allele, I, to induce immunogenicity for a given protein:

Figure 9
figure 9

The sliding nonamer window approach used to determine exhaustively in silico the immunogenicity of all possible consecutive nonamers in a protein, illustrated here for human polyoma major capsid protein VP1.

$$I=\overline{w }K$$
(2)

The distribution of I was skewed to the right (Fig. 10A), deviating substantially from a normal distribution (Fig. 10B). A logarithmic transformation of I made the distribution more symmetric (Fig. 11A) and closer to normal (Fig. 11B). Therefore, this transformation was applied to I for further analyses:

Figure 10
figure 10

(A) frequency distribution of raw immunogenicity score I (N = 11 virus proteins × 69 HLA alleles = 759). (B) Probability-probability (P-P) plot to illustrate deviation from normal distribution.

Figure 11
figure 11

(A) frequency distribution of the log-transformed immunogenicity score \(I^{\prime}\). (B) Probability-probability (P-P) plot to illustrate closeness to normal distribution.

$${I}^{\prime}={{\text{log}}}_{{\text{e}}}\left(I\right)={\text{ln}}(I)$$
(3)

Association of immunogenicity with BC-HLA PScov score

We evaluated the association between BC-HLA PScov scores (Eq. 1) and virus protein immunogenicity scores \(I^{\prime}\) (Eq. 3) by computing the Pearson correlation between them for each HLA allele. The correlation coefficient obtained for each virus was squared and multiplied × 100 to provide the percent of BC-HLA PScov explained (PVE) by the virus protein immunogenicity:

$$PVE=100{r}^{2}$$
(4)

The potential groupings of the 11 virus PVEs were explored using multidimensional scaling (MDS) and hierarchical tree clustering.

Implementation of analysis procedures

The IBM-SPSS statistical package (version 28) was used for implementing standard statistical analyses, including descriptive statistics and regression analysis. Reported P-values are 2-sided, unless noted otherwise. For MDS, the PROXSCAL procedure was used (model: ratio, initial configuration: simplex, stress convergence: 0.0001, minimum stress: 0.0001, maximum number of iterations: 100). For hierarchical tree clustering, the average between-groups linkage was used as the method and squared Euclidean distance as the interval, implemented by the Hierarchical Cluster function. Finally, the permutations test was implemented using FORTRAN (Geany, version 1.38, built on or after 2021-10-09) and 64-bit Mersenne Twister random number generator with a large random double-precision odd-number seed.