Introduction

Cancer, a leading cause of death worldwide1, is associated with environmental risk factors coupled with genetic predisposition2. It is now well established that genetic factors not only contribute significantly to cancer risk but also contribute to shared heritability among different cancers3,4,5,6,7,8. For instance, one study evaluating the genetic correlations among six cancer types in individuals of European ancestry found significant genetic correlations of colorectal cancer with pancreatic, breast, and lung cancers, as well as between lung and breast cancers6; in contrast, it found no evidence of shared heritability between prostate cancer and the other cancers investigated. A more recent study from the same group reported statistically significant genetic correlations of breast cancer with ovarian, lung, and colorectal cancers, as well as between lung cancer and head/neck cancer7. A study evaluating 13 cancer types found the strongest genetic correlations between kidney and testicular cancers, between diffuse large B-cell lymphoma and both osteosarcoma and chronic lymphocytic leukemia, and between bladder and lung cancers5. Finally, a study of 18 cancer types in two large cohorts of European ancestry found several genetically correlated cancer pairs, including positive correlations between colon and rectal cancers; between esophageal/stomach cancer and Non-Hodgkin’s lymphoma, breast, lung, and rectal cancers; between bladder and breast cancers; between melanoma and testicular cancer; and between prostate and thyroid cancers8. In addition, they reported four negative correlations: endometrial and testicular cancers; esophageal/stomach cancer and melanoma; lung cancer and melanoma; and Non-Hodgkin’s lymphoma and prostate cancer8. Moreover, there was evidence of widespread pleiotropy, including 25 regions associated with more than one cancer type, 14 of which involved the Human Leukocyte Antigen (HLA) region8.

The HLA region on chromosome 6 encodes cell-surface proteins involved in immunosurveillance and T-cell activation aimed at the elimination of tumor cells and pathogens. Specifically, HLA Class I molecules (HLA-A, -B, -C) present intracellular antigen peptides to CD8+ cytotoxic T cells, signaling the destruction of infected or malignant cells, whereas HLA Class II molecules (HLA-DR, -DQ, and -DP) present endocytosed extracellular antigen peptides to CD4+ T cells, promoting B-cell-mediated antibody production and adaptive immunity. HLA has been increasingly implicated in various types of cancer8,9,10,11,12,13. Moreover, HLA has recently been implicated in the shared heritability across cancers, with some loci showing associations in the same direction across cancer types and others showing associations in opposite directions, highlighting the varying associations between HLA and different types of cancer8.

Taken together, prior studies have identified shared genetic associations between some cancer types and implicate HLA as a common genetic mechanism. HLA genes are the most polymorphic genes in the human genome. Here we take advantage of the population heterogeneity of HLA and extend prior lines of research by evaluating similarities between population-level Cancer-HLA associations involving 30 types of cancer and 127 HLA alleles. This approach captures the complex associations between HLA and cancer and permits clustering of cancers with similar HLA profiles, potentially enabling identification of common genetic mechanisms underlying clusters of cancers.

Materials and methods

Prevalence of 30 cancers

The population prevalence of the 30 cancers (Table 1) in 2016 was computed for each of the following 14 countries in Continental Western Europe (CWE): Austria, Belgium, Denmark, Finland, France, Germany, Greece, Italy, Netherlands, Portugal, Norway, Spain, Sweden, and Switzerland. Specifically, the total number of people with each cancer in each of the 14 CWE countries was identified from the Global Health Data Exchange14, a publicly available catalog of data from the Global Burden of Disease study, the most comprehensive worldwide epidemiological study of more than 350 diseases. The number of people with each cancer in each country was divided by the total population of that country in 201615 and expressed as a percentage. We have previously shown that life expectancy is virtually identical across these countries16; therefore, life expectancy was not included in the current analyses.

Table 1 The 30 cancers studied in alphabetical order.

HLA

The frequencies of all reported HLA alleles of classical genes of Class I (A, B, C) and Class II (DPB1, DQB1, DRB1) for each of the 14 CWE countries were retrieved from the website allelefrequencies.net (Estimation of Global Allele Frequencies)17,18 on October 20, 2020. As we reported previously16, there were 2746 allele entries in total from the 14 CWE countries, comprising 844 distinct alleles (i.e., alleles that occurred in at least one country). Of those, 127 alleles occurred in 9 or more countries and were used in further analyses. This criterion is somewhat arbitrary but reasonable; it was partially validated in a previous study19.
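The allele-selection rule above can be sketched as follows. The input layout, a flat list of hypothetical (country, allele, frequency) records, is an assumption made for illustration and is not the format of the allelefrequencies.net site; the threshold of 9 of the 14 countries is the one stated in the text, and repeated entries of one allele within a country are counted once.

```python
from collections import defaultdict


def select_common_alleles(records, min_countries=9):
    """Return alleles reported in at least `min_countries` distinct countries.

    `records` is an iterable of (country, allele, frequency) tuples, a
    hypothetical flattened export of allele-frequency entries.
    """
    countries_per_allele = defaultdict(set)
    for country, allele, _frequency in records:
        # a set, so repeated entries from one country are counted once
        countries_per_allele[allele].add(country)
    return sorted(
        allele
        for allele, countries in countries_per_allele.items()
        if len(countries) >= min_countries
    )


# Toy example (made-up records): A*01:01 in 9 countries, B*07:02 in only 8.
toy = [(f"country{i}", "A*01:01", 0.15) for i in range(9)]
toy += [(f"country{i}", "B*07:02", 0.10) for i in range(8)]
toy.append(("country0", "B*07:02", 0.12))  # duplicate entry for one country
print(select_common_alleles(toy))
```

With the toy records only A*01:01 passes the 9-country threshold, because the duplicate B*07:02 entry does not add a ninth country.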

The 127 selected alleles are listed in Table 2, and their distribution across HLA classes and genes is given in Table 3.

Table 2 The 127 HLA alleles used and their Class and gene assignments.
Table 3 Distribution of the 127 analyzed HLA alleles across Classes and genes.

Data analysis

Statistical analyses were performed using the IBM-SPSS package (IBM SPSS Statistics for Windows, Version 27.0, 64-bit edition. Armonk, NY: IBM Corp; 2019) and Intel FORTRAN (Microsoft Visual Studio Community 2019, Version 16.7.5; Intel FORTRAN Compiler 2021).

Cancer-HLA profiles and protection/susceptibility estimates

HLA profiles for each cancer were derived by first computing the Pearson correlation coefficient, \(r\), between the prevalence of the cancer and the population frequency of an allele across the 14 countries, and then its Fisher z-transform, \(r'\), to normalize its distribution:

$${\text{HLA-Cancer}}\;{\text{Protection/Susceptibility}}\;\left( {P/S} \right)\;{\text{estimate}}:r^{\prime} = {\text{atanh}}\left( r \right)$$
(1)

Negative P/S estimates indicate a protective association (“protective” alleles), whereas positive P/S estimates indicate a susceptibility association (“susceptibility” alleles). Thus, 30 Cancer-HLA profiles were computed, each consisting of 127 values of \(r'\). These data were tabulated in a 127 allele (rows) × 30 cancer (columns) matrix (the “Cancer-HLA” matrix).
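The construction of the Cancer-HLA matrix from Eq. 1 can be sketched as below: one Pearson \(r\) per allele-cancer pair across countries, followed by \(r' = \text{atanh}(r)\). This is a minimal illustration with made-up numbers, not the authors' FORTRAN/SPSS code. Because an allele may be unreported in up to 5 of the 14 countries, the sketch drops missing (None) values pairwise; that handling is an assumption, since the text does not say how absent alleles were treated.

```python
import math


def pearson_r(x, y):
    """Pearson r over the countries where both values are available.

    Missing values (None) are dropped pairwise; this handling is an
    assumption, since the text does not state how absent alleles were treated.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy)


def cancer_hla_matrix(allele_freq, cancer_prev):
    """Allele x cancer matrix of P/S estimates r' = atanh(r) (Eq. 1).

    allele_freq: {allele: [frequency per country]}
    cancer_prev: {cancer: [prevalence (%) per country]}, same country order.
    Negative r' = protective, positive r' = susceptibility.
    math.atanh raises ValueError if |r| == 1 exactly.
    """
    return {
        allele: {cancer: math.atanh(pearson_r(freqs, prev))
                 for cancer, prev in cancer_prev.items()}
        for allele, freqs in allele_freq.items()
    }


# Toy data for 4 countries (made-up numbers, not study data)
alleles = {"A*01:01": [0.10, 0.12, 0.15, 0.20],
           "B*08:01": [0.20, 0.15, 0.12, None]}
prevalence = {"melanoma": [0.30, 0.35, 0.40, 0.50]}
ps = cancer_hla_matrix(alleles, prevalence)
print(ps)
```

In the toy data A*01:01 rises with prevalence (a "susceptibility" sign) and B*08:01 falls with it (a "protective" sign).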

Multidimensional scaling (MDS)

An MDS analysis of the Cancer-HLA matrix was performed using the ALSCAL procedure of the IBM-SPSS statistical package (version 27). Specifically, we used (a) the individual differences (weighted) Euclidean distance model, in which each HLA allele contributed separately to the solution, (b) Euclidean distance as the distance measure, (c) an ordinal level of measurement, (d) a 2-dimensional solution, and (e) the default convergence criteria: S-stress convergence = 0.001, minimum S-stress value = 0.005, and maximum iterations = 30.
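ALSCAL's weighted, ordinal individual-differences model is not easily reproduced outside SPSS. As a rough illustration of the underlying idea only, the sketch below substitutes classical (Torgerson) metric MDS: it takes Euclidean distances between cancer profiles (for example, the 30 columns of the Cancer-HLA matrix), double-centres the squared distances, and uses the two leading eigenvectors, found by power iteration, as map coordinates. It is not the authors' procedure and would not reproduce Fig. 1 exactly.

```python
import math
import random


def double_centre(dist):
    """B = -1/2 * J D^2 J: the Gram matrix implied by a distance matrix."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # D^2 is symmetric, so column means equal row means
    return [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
             for j in range(n)] for i in range(n)]


def leading_eigenpair(b, iters=2000, seed=0):
    """Largest eigenvalue/eigenvector of a symmetric PSD matrix (power iteration)."""
    n = len(b)
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:          # nothing left to extract
            return 0.0, v
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
    return eigenvalue, v


def classical_mds(profiles, dims=2):
    """Embed profiles (e.g. one 127-value P/S vector per cancer) in `dims` dimensions."""
    n = len(profiles)
    dist = [[math.dist(profiles[i], profiles[j]) for j in range(n)] for i in range(n)]
    b = double_centre(dist)
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        eigenvalue, v = leading_eigenpair(b, seed=k)
        scale = math.sqrt(max(eigenvalue, 0.0))
        for i in range(n):
            coords[i][k] = scale * v[i]
        # deflate so the next pass finds the next-largest component
        b = [[b[i][j] - eigenvalue * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords
```

A quick check of the sketch is that points which already lie in a plane are mapped back with their pairwise distances preserved (up to rotation and reflection), e.g. `classical_mds([(0, 0), (3, 0), (0, 1), (3, 1), (1.5, 0.5)])`.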

Spatial analysis of the MDS map: test for pure randomness

The MDS analysis yielded a 4 × 4 (arbitrary units; area = 16) 2-dimensional cancer configuration map, where each cancer type occupied a point in the map based on its X–Y (Dimension 1 and 2) coordinates. The initial step in the spatial analysis of the distribution of cancer types in this map was to test the null hypothesis that the spatial distribution of the 30 cancer types was random. For that purpose, we used the distance to nearest neighbor measure of Clark and Evans20, as follows. Given N = 30 points (cancer types), we computed 30 distances to nearest neighbor, \({\text{g}}_{{\text{A}}}^{k}\), one for each \(k\)th X–Y point, and obtained its average \({\overline{\text{g}}}_{{\text{A}}}\):

$${\overline{\text{g}}}_{{\text{A}}} = \frac{1}{N}\sum_{k=1}^{N} {\text{g}}_{{\text{A}}}^{k}$$
(2)

Next, we computed the density, \(\rho\), of the observed distribution expressed as the number of points (N = 30) per unit of area (total area = 4 × 4 = 16):

$$\rho = \frac{N}{{{\text{area}} }}$$
(3)

The mean distance to nearest neighbor expected in an infinitely large random distribution of density \(\rho\) is given by

$${\overline{\text{g}}}_{{\text{E}}} = \frac{1}{2\sqrt \rho }$$
(4)

with a standard error of

$$\sigma_{{{\overline{\text{g}}}_{{\text{E}}} }} = \frac{0.26136}{{\sqrt {N\rho } }}$$
(5)

The ratio \(G\)

$$G = \frac{{{\overline{\text{g}}}_{A} }}{{{\overline{\text{g}}}_{E} }}$$
(6)

can be used as a measure of the degree to which the observed distribution approaches or departs from random expectation. In a random distribution, \(G = 1\), whereas under conditions of maximum aggregation, \(G = 0\). Finally, the key test statistic \(c\) for testing the null hypothesis of pure randomness is given by

$$c = \frac{{{\overline{\text{g}}}_{{{A}}} - {\overline{\text{g}}}_{{{E}}} }}{{\sigma_{{{\overline{\text{g}}}_{{{E}}} }} }}$$
(7)

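where \(c\) is a standard normal deviate. Because \({\overline{\text{g}}}_{A} < {\overline{\text{g}}}_{E}\) under aggregation, negative values of \(c\) indicate clustering, whereas positive values indicate a more regular (dispersed) pattern than expected by chance.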
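Eqs. 2 to 7 translate directly into a short function. The sketch below is an illustrative re-implementation, not the authors' FORTRAN code; like the equations, it applies no edge correction for points near the map boundary. The function name and return order are choices made for this example.

```python
import math


def clark_evans(points, area):
    """Clark-Evans nearest-neighbour test (Eqs. 2-7), no edge correction.

    points: sequence of (x, y) coordinates; area: area of the map region.
    Returns g_A, g_E, G, sigma and the normal deviate c.
    """
    n = len(points)
    # Eq. 2: mean distance from each point to its nearest neighbour
    g_a = sum(
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ) / n
    rho = n / area                          # Eq. 3: points per unit area
    g_e = 1.0 / (2.0 * math.sqrt(rho))      # Eq. 4: expected mean under randomness
    sigma = 0.26136 / math.sqrt(n * rho)    # Eq. 5: standard error of g_E
    ratio_g = g_a / g_e                     # Eq. 6: 1 random, 0 fully aggregated
    c = (g_a - g_e) / sigma                 # Eq. 7: standard normal deviate
    return g_a, g_e, ratio_g, sigma, c


# A regular 4 x 4 grid in a 4 x 4 region is more dispersed than random (G = 2)
grid = [(x + 0.5, y + 0.5) for x in range(4) for y in range(4)]
print(clark_evans(grid, area=16.0))
```

For the regular grid every nearest-neighbour distance is 1 while the random expectation is 0.5, so \(G = 2\) and \(c\) is positive; a tight clump of points gives \(G\) near 0 and a negative \(c\).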

Spatial analysis of the MDS map: clustering of cancer types

We performed a K-means clustering analysis of the location of cancer types in the MDS map in order to identify possible clusters using the iterate-and-classify method of the K-means Cluster Analysis procedure of the IBM-SPSS statistical package (version 27).
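The iterate-and-classify idea (assign each point to its nearest centre, then move each centre to the mean of its members, and repeat until assignments stop changing) can be sketched as Lloyd's K-means. The deterministic farthest-point seeding used here is an assumption for illustration; SPSS selects its initial centres and handles updates in its own way, so this is not a reproduction of the SPSS procedure.

```python
import math


def kmeans(points, k, max_iter=100):
    """Lloyd's K-means on (x, y) points; returns (labels, centres)."""
    # Deterministic farthest-point seeding: start at the first point, then
    # repeatedly add the point farthest from all centres chosen so far.
    centres = [list(points[0])]
    while len(centres) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centres))
        centres.append(list(farthest))
    labels = None
    for _ in range(max_iter):
        # classify: assign every point to its nearest centre
        new_labels = [min(range(k), key=lambda j: math.dist(p, centres[j]))
                      for p in points]
        if new_labels == labels:    # assignments stable -> converged
            break
        labels = new_labels
        # iterate: move each centre to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:             # keep the old centre if a cluster empties
                centres[j] = [sum(axis) / len(members) for axis in zip(*members)]
    return labels, centres


toy = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
       (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
       (10.0, 0.0), (10.1, 0.0), (10.0, 0.1)]
labels, centres = kmeans(toy, k=3)
print(labels)
```

Applied to the 30 MDS coordinates of Table 4 with `k=3`, the same function would return one cluster label per cancer type; whether it matches the SPSS assignment depends on the seeding.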

Ethical approval

This article does not contain any studies with human participants performed by any of the authors.

Results

The MDS cancer type configuration map is shown in Fig. 1 and the X–Y coordinates of the 30 cancer types are given in Table 4.

Figure 1

Multidimensional scaling (MDS) configuration map of the 30 cancer types with color-coded clusters. Each point denotes the location of a cancer type within the 2-dimensional MDS configuration space. The clusters were derived by applying a K-means clustering algorithm to the x–y MDS coordinates of the cancer types. The distance between two points (cancer types) reflects the dissimilarity of their immunogenetic profiles (the closer the points, the more similar the profiles), where each profile consisted of 127 protection/susceptibility (P/S) HLA estimates.

Table 4 The X–Y coordinates of the 30 cancers in the MDS cancer types configuration map of Fig. 1.

Analysis of spatial distribution of the MDS map of 30 cancer types: test for pure randomness

The following values were obtained:

$${\overline{\text{g}}}_{A} = 0.009456$$
(9)
$$\rho = \frac{30 }{{16}} = 1.875$$
(10)
$${\overline{\text{g}}}_{{\text{E}}} = \frac{1}{2\sqrt \rho } = 0.36515$$
(11)
$$G = \frac{{{\overline{\text{g}}}_{A} }}{{{\overline{\text{g}}}_{E} }} = 0.0259$$
(12)
$$\sigma_{{{\overline{\text{g}}}_{{\text{E}}} }} = 0.03485$$
(13)
$$c = \frac{{{\overline{\text{g}}}_{{{A}}} - {\overline{\text{g}}}_{{{E}}} }}{{\sigma_{{{\overline{\text{g}}}_{{{E}}} }} }} = -10.21\;\left( {P < 0.001} \right)$$
(14)

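The large absolute value of the normal deviate \(c\) (Eq. 14) and the corresponding very low probability value reject the null hypothesis of pure randomness of the distribution of the 30 cancer types in the MDS configuration map (Fig. 1). The very low value of \(G\) (Eq. 12), far below the value of 1 expected under randomness, indicates that this departure from randomness reflects aggregation of the cancer types.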
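The values in Eqs. 10 to 13 can be re-derived from the reported mean nearest-neighbour distance, N = 30, and the map area of 16. The short calculation below does this and also evaluates Eq. 7; the sign of \(c\) comes out negative because \({\overline{\text{g}}}_{A} < {\overline{\text{g}}}_{E}\) (aggregation), and its magnitude, about 10.21, is what drives the rejection of randomness. The two-sided P uses the normal approximation implied by treating \(c\) as a standard normal deviate.

```python
import math

g_a = 0.009456                          # reported mean nearest-neighbour distance (Eq. 9)
n, area = 30, 16.0                      # 30 cancer types in a 4 x 4 map
rho = n / area                          # Eq. 10: density
g_e = 1 / (2 * math.sqrt(rho))          # Eq. 11: expected mean distance if random
big_g = g_a / g_e                       # Eq. 12: ratio G
sigma = 0.26136 / math.sqrt(n * rho)    # Eq. 13: standard error
c = (g_a - g_e) / sigma                 # Eq. 14 (definition of Eq. 7)
p_two_sided = math.erfc(abs(c) / math.sqrt(2))
print(f"rho={rho}, g_E={g_e:.5f}, G={big_g:.4f}, sigma={sigma:.5f}, c={c:.2f}")
```

Running it prints `rho=1.875, g_E=0.36515, G=0.0259, sigma=0.03485, c=-10.21`.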

Spatial clustering of cancer types

Inspection of Fig. 1 suggests the existence of 3 main clusters (interestingly, this is the same number of clusters predicted by the measure of Carlis and Bruso21). Therefore, we performed a K-means clustering analysis, which assigned the 30 cancer types to 3 clusters (red, green, and blue ellipses in Fig. 1). Cluster 1 (blue, Fig. 1) comprised 7 cancers, among which melanoma and prostate cancer were tightly clustered; brain cancer and non-Hodgkin’s lymphoma were also included in this cluster. Cluster 2 (red, Fig. 1) comprised 11 cancers, including several reproductive system cancers as well as some cancers of the endocrine system. Here, breast, kidney, and tracheal-bronchus-lung cancers were tightly clustered, as were pancreatic cancer, ovarian cancer, and neoplasms. Finally, cluster 3 (green, Fig. 1) comprised 12 cancers, including several cancers of the digestive system, cervical cancer, and skin cancers.

Immunogenetic associations between specific cancers

The results above are based on associations between the immunogenetic HLA-Cancer profiles of specific cancer types. It is important to cross-validate this approach by comparing these associations with known associations between cancer types occurring in the same individual. To that end, we selected 4 pairs of cancers with significant probabilities of co-occurrence in the same individuals and computed the correlations between their immunogenetic HLA-Cancer profiles, to test the prediction that these correlations would be positive, high, and statistically significant if the immunogenetic associations are congruent with the documented co-occurrence. The 4 pairs were (1) multiple myeloma and kidney cancer22,23,24,25 (Fig. 2), (2) pancreatic and ovarian cancers26,27,28 (Fig. 3), (3) prostate cancer and melanoma29 (Fig. 4), and (4) bladder and laryngeal cancers30 (Fig. 5). We also analyzed two additional cancer pairs for which common causes have been postulated but for which we could not find documented cases of co-occurrence: the gallbladder-colorectal cancer pair (Fig. 6), for which gallstones have been implicated31,32,33, and the mesothelioma-esophageal cancer pair (Fig. 7), for which asbestos exposure has been implicated34,35. The immunogenetic HLA profiles of all 6 cancer pairs were positively and highly significantly correlated (P < 0.001 for all correlations), consistent with existing data from other studies. Finally, the immunogenetic association between the P/S estimates of mesothelioma and melanoma is illustrated in Fig. 8.
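To give a sense of what "P < 0.001 with N = 127" requires, the sketch below computes an approximate two-sided P for a Pearson \(r\) between two 127-value P/S profiles, and the smallest \(r\) reaching P = 0.001. It uses the Fisher z normal approximation, which is an assumption: the text does not state which test produced the reported P-values. It also treats the 127 alleles as independent observations, which alleles of the same gene or in linkage disequilibrium may not strictly satisfy, so it is a rough guide only.

```python
import math
from statistics import NormalDist


def correlation_p_value(r, n=127):
    """Approximate two-sided P for Pearson r from n paired values.

    Fisher z approximation: atanh(r) * sqrt(n - 3) ~ N(0, 1) under H0: rho = 0,
    assuming independent pairs.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2.0))


def critical_r(alpha=0.001, n=127):
    """Smallest |r| reaching two-sided significance alpha under the same approximation."""
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return math.tanh(z_crit / math.sqrt(n - 3))


print(round(critical_r(), 3))
```

Under this approximation, any profile correlation above roughly 0.29 would reach P < 0.001 with 127 alleles.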

Figure 2

Immunogenetic P/S estimates for multiple myeloma are plotted against those for kidney cancer. r is the Pearson correlation coefficient; P < 0.001; N = 127.

Figure 3

Immunogenetic P/S estimates for pancreatic cancer are plotted against those for ovarian cancer. P < 0.001; N = 127.

Figure 4

Immunogenetic P/S estimates for prostate cancer are plotted against those for melanoma. P < 0.001; N = 127.

Figure 5

Immunogenetic P/S estimates for laryngeal cancer are plotted against those for bladder cancer. P < 0.001; N = 127.

Figure 6

Immunogenetic P/S estimates for colorectal cancer are plotted against those for gallbladder cancer. P < 0.001; N = 127.

Figure 7

Immunogenetic P/S estimates for mesothelioma are plotted against those for esophageal cancer. P < 0.001; N = 127.

Figure 8

Immunogenetic P/S estimates for mesothelioma are plotted against those for melanoma. P < 0.001; N = 127.

Discussion

Here we used multidimensional scaling36 to identify clusters of cancers based on the population-based HLA profile of each cancer in 14 countries in Continental Western Europe. The findings, which indicate the presence of three clusters derived from the immunogenetic HLA-Cancer associations, partially overlap with prior studies documenting shared genetic associations among different cancers3,4,5,6,7,8 and provide novel insights into potential shared mechanisms underlying different types of cancer. Notably, the immunogenetic associations derived here were congruent with the known co-occurrence (or significant probability of co-occurrence) of certain types of cancer, lending support to our immunogenetic approach.

The three clusters (Fig. 1) are based on similarities of the HLA profiles among cancer types: the more similar the HLA profiles of two cancers, the closer these cancers are in the map of Fig. 1. Given the role of HLA in antigen presentation and the elimination of antigen-bearing cells, a given cluster may reflect similar HLA binding to shared neoantigens37, epitopes common to different antigens, and/or viral oncoproteins38,39 associated with several types of cancer within that cluster. Additional research is warranted to identify the specific factors linking cancers within each cluster and those that distinguish different clusters.

Several associations documented here overlap with those of previous studies, and several others are novel. Consistent with the clustering observed in our study, previous studies have documented genetic associations between several types of cancer, including breast and lung cancers6,7, breast and ovarian cancers7, kidney and testicular cancers5, and esophageal cancer and Non-Hodgkin’s lymphoma8. The present findings provide additional support for shared genetic associations between these cancer types and point specifically to highly similar HLA profiles as partially driving those associations. Prior studies evaluating genetic associations between different types of cancer have been limited by the inclusion of relatively few cancer types, and very few studies have specifically evaluated HLA with regard to such associations. Here we included 30 different cancers and 127 different HLA alleles, permitting identification of overlapping Cancer-HLA associations that, to our knowledge, have not previously been reported. For example, we found that prostate cancer and melanoma are very tightly clustered, implicating HLA in their shared risk. Prior studies have documented a possible link between melanoma risk and prostate cancer risk, purportedly associated with androgen dependence40,41. Esophageal cancer and non-Hodgkin’s lymphoma, which clustered with melanoma and prostate cancer here, have also been linked to androgens42,43. Though intriguing, a mechanism linking HLA to androgen-associated cancers remains to be elucidated.

Overall, the findings of the present study document similarities and differences among several types of cancer based on HLA associations in Continental Western European countries. It should be noted, however, that the findings may not generalize to other populations, since HLA varies across populations44,45. For instance, the genetic architecture of lung cancer has been shown to vary by population and environmental exposures5. Furthermore, although environmental factors play a role in cancer risk2,3, we focused solely on the influence of HLA and did not evaluate environmental exposures in the present study. Nonetheless, the results provide evidence for an influence of HLA on the population clustering of different types of cancer and suggest that additional research evaluating Cancer-HLA associations is warranted. As discussed above, HLA relates to antigens; as such, HLA-based measures and associations are necessarily interpreted in the context of antigen processing and presentation by HLA. Given the diversity of proteins produced by tumor cells, the concomitant degradation of host proteins, and the potential addition of viral proteins in several cancers, it is not surprising that the HLA-Cancer immunogenetic profiles yield a broad spectrum of cancer protection/susceptibility estimates, reflecting the footprint, so to speak, of a specific cancer with regard to the antigens related to it. In that sense, the observed associations and clustering furnish a framework in which a 127-dimensional HLA allele space is reduced by multidimensional scaling to a 2-dimensional map, allowing a simplified investigation of immune-related associations among cancer types.
Since the antigens involved can have very diverse origins, as mentioned above, the associations found among the various cancer types are holistic in nature, and their interpretation rests on considerations regarding the antigens themselves. For example, much of the co-occurrence of pancreatic and ovarian cancers stems from inherited pathogenic variants in the BRCA genes in individuals susceptible to both cancers, but the significant immunogenetic association between them suggests that the spectrum of antigens related to these two cancers is very similar, even though they affect very different cells (pancreas vs. ovaries). Similar considerations hold for the more tenuous attribution of the mesothelioma/esophageal cancer association to asbestos and of the gallbladder/colorectal cancer association to gallstones. A recent study linking mesothelioma to melanoma46 via the BAP1 gene supports the predictive potential of our immunogenetic approach, which showed a high correlation between mesothelioma and melanoma HLA P/S estimates (Fig. 8).

In summary, the immunogenetic HLA-Cancer association approach is a promising new tool for identifying associations between cancer types that may arise from various causes (genetic, environmental, viral, etc.) but produce spectra of antigens that engage the HLA system in a similar way. Given that the success of immune checkpoint blockade immunotherapy for cancer depends partly on the HLA genetic makeup of the patient11,12,13, the HLA-based cancer associations reported here could be useful in informing immunotherapy across various cancer types.