Immunogenetic clustering of 30 cancers

Human leukocyte antigen (HLA) genes have been implicated in cancer risk and shared heritability of different types of cancer. In this immunogenetic epidemiological study we first computed a Cancer-HLA profile for 30 cancer types characterized by the correlation between the prevalence of each cancer and the population frequency of 127 HLA alleles, and then used multidimensional scaling to evaluate the possible clustering of those Cancer-HLA associations. The results indicated the presence of three clusters, broadly reflecting digestive-skin-cervical cancers, reproductive and endocrine systems cancers, and brain and androgen-associated cancers. The clustering of cancer types documented here is discussed in terms of mechanisms underlying shared Cancer-HLA associations.


Lisa M. James 1,2,3 & Apostolos P. Georgopoulos 1,2,3,4*
Cancer, a leading cause of death worldwide 1, is associated with environmental risk factors coupled with genetic predisposition 2. It is now well established that genetic factors not only contribute significantly to cancer risk but also contribute to shared heritability among different cancers [3][4][5][6][7][8]. For instance, one study evaluating the genetic correlation between six cancer types in individuals of European ancestry found significant genetic correlations of colorectal cancer with pancreatic cancer, breast cancer, and lung cancer, as well as a genetic correlation between lung cancer and breast cancer 6; in contrast, it found no evidence of shared heritability between prostate cancer and the other cancers investigated. A more recent study from the same group reported statistically significant genetic correlations of breast cancer with ovarian cancer, lung cancer, and colorectal cancer, as well as of lung cancer with head/neck cancer 7. A study evaluating 13 cancer types found the strongest genetic correlations between kidney and testicular cancer, between diffuse large B-cell lymphoma and both osteosarcoma and chronic lymphocytic leukemia, and between bladder and lung cancer 5. Finally, a study of 18 cancer types in two large cohorts of European ancestry found several genetically associated cancer pairs, including positive correlations between colon and rectal cancers; esophageal/stomach cancer and Non-Hodgkin's lymphoma, breast, lung, and rectal cancers; bladder and breast cancers; melanoma and testicular cancer; and prostate and thyroid cancers 8. In addition, that study reported four negative correlations: endometrial and testicular cancers; esophageal/stomach cancer and melanoma; lung cancer and melanoma; and Non-Hodgkin's lymphoma and prostate cancer 8. Moreover, there was evidence of widespread pleiotropy, including 25 regions that were associated with more than one cancer type, 14 of which involved the Human Leukocyte Antigen (HLA) region 8.
The HLA region of chromosome 6 codes for cell-surface proteins involved in immunosurveillance and T-cell activation aimed at elimination of tumor cells and pathogens. Specifically, Class I HLA molecules (HLA-A, -B, -C) present intracellular antigen peptides to CD8+ cytotoxic T cells to signal destruction of infected cells whereas HLA Class II molecules (HLA-DR, DQ, and DP genes) present endocytosed extracellular antigen peptides to CD4+ T cells to promote B-cell mediated antibody production and adaptive immunity. HLA has been increasingly implicated in various types of cancer [8][9][10][11][12][13] . Moreover, HLA has recently been implicated in the shared heritability across cancers with some loci evidencing unidirectional associations with cancer types and others demonstrating discordant associations with cancer types, highlighting varying associations between HLA and different types of cancer 8 .
Taken together, prior studies have identified shared genetic associations between some cancer types and implicate HLA as a common genetic mechanism. HLA genes are the most highly polymorphic of the human genome. Here we take advantage of the population heterogeneity of HLA and extend prior lines of research by evaluating similarities between population-level Cancer-HLA associations involving 30 types of cancer and 127 HLA alleles. This approach captures the complex associations between HLA and cancer and permits clustering of cancers with similar HLA profiles, potentially permitting identification of common genetic mechanisms underlying clusters of cancers.

… 16, there was a total of 2746 entries of alleles from the 14 CWE countries, comprising 844 distinct alleles, i.e. alleles that occurred in at least one country. Of those, 127 alleles occurred in 9 or more countries and were used in further analyses. This criterion is somewhat arbitrary but reasonable; it was partially validated in a previous study 19.
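The allele selection criterion (alleles occurring in 9 or more of the 14 countries) can be illustrated with a minimal sketch. The record layout `(country, allele, frequency)`, the function name, and the toy data are assumptions made for illustration; they do not reproduce the study's data files.

```python
from collections import defaultdict


def alleles_in_min_countries(entries, min_countries=9):
    """Return alleles reported in at least `min_countries` distinct countries.

    `entries` is an iterable of (country, allele, frequency) tuples.
    This layout is hypothetical, chosen only for the example.
    """
    countries_by_allele = defaultdict(set)
    for country, allele, _frequency in entries:
        countries_by_allele[allele].add(country)
    return sorted(
        allele
        for allele, countries in countries_by_allele.items()
        if len(countries) >= min_countries
    )


# Toy records (hypothetical): one allele seen in 10 countries, one in 3.
toy_entries = [(f"country_{i}", "A*01:01", 0.10) for i in range(10)]
toy_entries += [(f"country_{i}", "B*07:02", 0.05) for i in range(3)]

selected = alleles_in_min_countries(toy_entries)
```

In this toy input only the allele present in 10 countries passes the threshold; the allele present in 3 countries is dropped.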
The distribution of those alleles (Table 2) to the HLA classes and their genes is given in Table 3.

Cancer-HLA profiles and protection/susceptibility estimates.
HLA profiles for each cancer were derived by computing, first, the Pearson correlation coefficient, r, between the prevalence of a cancer and the population frequency of an allele, and then its Fisher z-transform, r′, to normalize its distribution:

$$ \text{HLA-Cancer Protection/Susceptibility (P/S) estimate:} \quad r' = \operatorname{atanh}(r) \tag{1} $$

Negative P/S estimates indicate a protective association ("protective" alleles), whereas positive P/S estimates indicate a susceptibility association ("susceptibility" alleles). Thus 30 Cancer-HLA profiles were computed, each consisting of 127 values of r′. These data were tabulated in a 127 alleles (rows) × 30 cancers (columns) matrix ("Cancer-HLA" matrix).

Multidimensional scaling (MDS).
An MDS analysis of the … 1 and 2) coordinates. The initial step in the spatial analysis of the distribution of cancer types in this map was to test the null hypothesis that the spatial distribution of the 30 cancer types was random. For that purpose, we used the distance to nearest neighbor measure of Clark and Evans 20, as follows. Given N = 30 points (cancer types), we computed 30 distances to nearest neighbor, $g_k^A$, one for each $k$th X-Y point, and obtained their average $\bar{g}_A$:

$$ \bar{g}_A = \frac{1}{N} \sum_{k=1}^{N} g_k^A $$

Next, we computed the density, $\rho$, of the observed distribution expressed as the number of points (N = 30) per unit of area (total area = 4 × 4 = 16):

$$ \rho = \frac{N}{\text{area}} = \frac{30}{16} = 1.875 $$

The mean distance to nearest neighbor expected in an infinitely large random distribution of density $\rho$ is given by

$$ \bar{g}_E = \frac{1}{2\sqrt{\rho}} $$
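The nearest-neighbor test can be sketched as follows. The observed mean distance, the density, and the expected mean distance follow the description above; the standard error, 0.26136/sqrt(N·ρ), and the normal deviate c are the standard Clark and Evans expressions and are assumed here to correspond to the steps used in the study. No edge correction is applied, and the function name and toy coordinates are hypothetical.

```python
import math


def clark_evans(points, area):
    """Clark and Evans nearest-neighbor test for a 2-D point pattern.

    Returns the observed and expected mean nearest-neighbor distances,
    the density, the normal deviate c, and a two-sided normal p-value.
    """
    n = len(points)
    # Distance from each point to its nearest neighbor
    nearest = [
        min(math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points) if j != i)
        for i, (xi, yi) in enumerate(points)
    ]
    g_obs = sum(nearest) / n                   # observed mean distance
    rho = n / area                             # points per unit area
    g_exp = 1.0 / (2.0 * math.sqrt(rho))       # expected under randomness
    se = 0.26136 / math.sqrt(n * rho)          # standard error of g_exp
    c = (g_obs - g_exp) / se                   # normal deviate
    p = math.erfc(abs(c) / math.sqrt(2.0))     # two-sided p-value
    return {"g_obs": g_obs, "g_exp": g_exp, "rho": rho, "c": c, "p": p}


# Toy pattern (hypothetical): 30 points in three tight clumps on a 4 x 4 map.
centers = [(-1.0, -1.0), (1.0, -1.0), (0.0, 1.0)]
toy_points = [
    (cx + 0.05 * math.cos(2 * math.pi * k / 10),
     cy + 0.05 * math.sin(2 * math.pi * k / 10))
    for cx, cy in centers for k in range(10)
]
result = clark_evans(toy_points, area=16.0)
```

With this convention a strongly negative c indicates clustering (points closer together than expected under randomness) and a strongly positive c indicates regular spacing.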

Results
The MDS cancer type configuration map is shown in Fig. 1 and the X-Y coordinates of the 30 cancer types are given in Table 4.
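How a two-dimensional configuration can be obtained from high-dimensional profiles may be illustrated with a minimal sketch of classical (Torgerson) MDS on Euclidean distances. This is a swapped-in technique for illustration: the MDS variant, dissimilarity measure, and software used in the study are not stated in the text above, and the function name and toy profiles are hypothetical.

```python
import math


def classical_mds_2d(profiles, iters=500):
    """Embed row vectors in 2-D by classical (Torgerson) MDS.

    Uses squared Euclidean distances, double centering, and power
    iteration with deflation to extract the two leading eigenvectors.
    """
    n = len(profiles)
    d2 = [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in profiles]
          for p in profiles]
    # Double centering: B = -1/2 * J * D^2 * J
    row_mean = [sum(r) / n for r in d2]
    grand_mean = sum(row_mean) / n
    b = [[-0.5 * (d2[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    axes = []
    for axis in range(2):
        # Deterministic, generic starting vector
        v = [math.sin(i + 1.0 + axis) for i in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the unit vector v
        lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n))
                  for i in range(n))
        axes.append([math.sqrt(max(lam, 0.0)) * x for x in v])
        # Deflate so the next pass finds the next eigenvector
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)]
             for i in range(n)]
    return list(zip(axes[0], axes[1]))


# Toy profiles (hypothetical): four 3-D points that lie in a plane.
toy_profiles = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0),
                (0.0, 1.0, 0.0), (4.0, 1.0, 0.0)]
embedding = classical_mds_2d(toy_profiles)
```

Because the toy points are planar, the two-dimensional embedding preserves their pairwise distances; for real 127-dimensional profiles the map is only an approximation of the full dissimilarity structure.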

Analysis of spatial distribution of the MDS map of 30 cancer types: test for pure randomness.
The following values were obtained: … The high value of normal deviate c (Eq. 14), and the corresponding very low probability value, reject the null hypothesis of pure randomness of the distribution of the 30 cancer types in the MDS configuration map (Fig. 1). Fig. 1 suggests the existence of 3 main clusters. (Interestingly, this is the same number of clusters predicted by the measure of Carlis and Bruso 21.) Therefore, we performed a K-means clustering analysis which assigned the 30 cancer types to 3 clusters (red, green and blue ellipses in Fig. 1). Cluster 1 (blue, Fig. 1) comprised 7 cancers, of which melanoma and prostate cancer were tightly clustered; brain cancer and non-Hodgkin's lymphoma were also included in this cluster. Cluster 2 (red, Fig. 1) comprised 11 cancers, including several reproductive system cancers as well as some cancers involving the endocrine system. Here, breast, kidney, and tracheal-bronchus-lung cancers are tightly clustered, as are pancreatic, ovarian, and neoplasms. Finally, cluster 3 (green, Fig. 1) comprised 12 cancers, including several cancers of the digestive system, cervical cancer, and skin cancers.

Immunogenetic associations between specific cancers.
The results above are based on associations between immunogenetic HLA-Cancer profiles of specific cancer types. It would be important to cross-validate this approach by comparing the associations above with known associations between cancer types occurring in the same individual. To that end, we used 4 pairs of cancers with significant probabilities of co-occurrence in the same individuals and computed the correlations between their immunogenetic HLA-Cancer profiles to test the prediction that those correlations would be positive, high, and statistically significant if the immunogenetic predictions are indeed congruent with the documented co-occurrence. The 4 pairs of cancers were (1) multiple myeloma and kidney cancer [22][23][24][25] (Fig. 2), (2) pancreatic and ovarian cancers [26][27][28] (Fig. 3), (3) prostate cancer and melanoma 29 (Fig. 4), and (4) cancer of the bladder and larynx 30 (Fig. 5). We also analyzed two additional cancer pairs for which common causes have been postulated but for which we could not find cases documenting their co-occurrence. One is the gallbladder-colorectal cancer pair (Fig. 6), for which gallstones have been implicated [31][32][33], and the other is the mesothelioma-esophageal cancer pair (Fig. 7), for which exposure to asbestos has been implicated 34,35. It can be seen that the immunogenetic HLA profiles of all 6 cancer pairs were positively and highly significantly associated (P < 0.001 for all correlations), attesting to their congruence with existing data from other studies. Finally, the immunogenetic association between mesothelioma and melanoma P/S estimates is illustrated in Fig. 8.
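The pairwise comparison of immunogenetic profiles can be sketched as below, assuming each profile is the vector of P/S estimates r′ = atanh(r) defined in Eq. 1 and that the association between two cancers is the Pearson correlation between their profiles. The function names and toy numbers are hypothetical. The sketch returns the correlation and its t statistic (n − 2 degrees of freedom); converting t to a P value requires a t-distribution routine that is left out here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ps_profile(prevalence, allele_freqs):
    """P/S estimates r' = atanh(r), one per allele, for one cancer.

    `prevalence` holds one value per country; each row of `allele_freqs`
    holds that allele's population frequency in the same countries.
    """
    return [math.atanh(pearson_r(prevalence, freqs)) for freqs in allele_freqs]


def profile_association(profile_a, profile_b):
    """Correlation between two profiles and its t statistic (df = n - 2).

    Assumes |r| < 1; a perfect correlation would divide by zero.
    """
    r = pearson_r(profile_a, profile_b)
    n = len(profile_a)
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return r, t


# Toy data (hypothetical): 5 countries, 4 alleles, 2 cancers.
prev_a = [1.0, 2.0, 3.0, 4.0, 5.0]
prev_b = [1.1, 2.3, 2.9, 4.2, 4.8]
freqs = [
    [0.10, 0.12, 0.15, 0.14, 0.20],   # rises with prevalence
    [0.30, 0.25, 0.27, 0.20, 0.18],   # falls with prevalence
    [0.05, 0.09, 0.04, 0.08, 0.06],   # weakly related
    [0.22, 0.20, 0.25, 0.21, 0.26],   # moderately related
]
profile_a = ps_profile(prev_a, freqs)
profile_b = ps_profile(prev_b, freqs)
r_ab, t_ab = profile_association(profile_a, profile_b)
```

In the toy data the two cancers have nearly parallel prevalences across countries, so their profiles are strongly and positively correlated; the first allele receives a positive (susceptibility) estimate and the second a negative (protective) one.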

Discussion
Here we used multidimensional scaling 36 to identify clusters of cancers based on the population-based HLA profile of each cancer in 14 countries in Continental Western Europe. The findings, which indicate the presence of three clusters derived from the immunogenetic HLA-cancer associations, partially overlap with prior studies documenting shared genetic associations for different cancers 3-8 and provide novel insights regarding potential shared mechanisms underlying different types of cancers. Remarkably, the immunogenetic associations derived here were congruent with known associations between the co-occurrence (or the significant probability of co-occurrence) of certain types of cancers, thus lending credit to our immunogenetic approach. The three clusters derived (Fig. 1) are based on similarities of the HLA profiles among cancer types: the more similar the HLA profile between two cancers, the closer these cancers will be in the map of Fig. 1. Given the role of HLA in antigen elimination, it can be inferred that a given cluster may reflect similar HLA binding to shared neoantigens 37, epitopes common to different antigens, and/or viral oncoproteins 38,39 associated with several types of cancer within a cluster. Additional research is warranted to identify the specific factors linking cancers within each cluster and those that distinguish different clusters. Several associations documented here overlap with those of previous studies and several others are novel. Consistent with the clustering observed in our study, previous studies have documented genetic associations between several types of cancer including breast cancer and lung cancer 6,7, breast cancer and ovarian cancer 7, kidney and testicular cancers 5, and esophageal cancer and Non-Hodgkin's lymphoma 8.
The present findings provide additional support regarding shared genetic associations between these cancer types and specifically point to highly similar HLA as partially driving those associations. Prior studies evaluating genetic associations between different types of cancer have been limited by inclusion of relatively few types of cancer and very few studies have specifically evaluated HLA with regard to genetic associations. Here we included 30 different cancers and 127 different HLA alleles permitting identification of overlapping cancer-HLA associations that have not previously been reported to our knowledge. For example, we found that prostate cancer and melanoma are very tightly clustered, implicating HLA in that shared risk. Prior studies have documented a possible link between melanoma risk and prostate cancer risk, purportedly associated with androgen-dependence 40,41 . Esophageal cancer and non-Hodgkin's lymphoma, which were clustered with melanoma and prostate cancer here, have also been linked to androgens 42,43 . Though intriguing, a mechanism linking HLA to androgen-associated cancers remains to be elucidated.
Overall, the findings of the present study document similarities and differences among several types of cancer based on HLA associations in Continental Western European countries. It should be noted, however, that the findings may not generalize to other populations since HLA varies across populations 44,45. For instance, the genetic architecture of lung cancer has been shown to vary by population and environmental exposures 5. Furthermore, although environmental factors play a role in risk of cancers 2,3, we solely focused on the influence of HLA and did not evaluate the influence of environmental exposures in the present study. Nonetheless, the results provide compelling evidence regarding the influence of HLA on the population clustering of different types of cancer and suggest that additional research evaluating Cancer-HLA associations is warranted.

As discussed above, HLA relates to antigens, and, as such, HLA-based measures and associations are necessarily discussed in the context of antigen presentation and processing by HLA. Given the diversity of proteins produced by tumor cells, the concomitant degradation of host proteins, and the added potential viral proteins implicated in several cancers, it is not surprising that the HLA-Cancer immunogenetic profiles provide a broad spectrum of cancer protection/susceptibility estimates reflecting the footprint, so to speak, of a specific cancer with regard to the antigens related to the cancer. In that sense, the observed associations and clustering described here furnish a framework within which a 127-dimensional HLA allele map is effectively reduced to a 2-dimensional map by multidimensional scaling that allows for a simplified investigation of immune-related associations among cancer types.
Since the origin of the various antigens involved can be very diverse, as mentioned above, the associations found among the various cancer types are more holistic in nature, and their interpretation rests on information stemming from considerations regarding the antigens themselves. For example, a major part of the co-occurrence of pancreatic and ovarian cancers stems from the presence of BRCA genes in individuals susceptible to both cancers, but their corresponding significant immunogenetic association indicates that the spectrum of antigens related to those cancers is very similar even though they affect very different cells (pancreas vs. ovaries). Similar considerations hold for the more tenuous attribution of the mesothelioma/esophageal cancer association to asbestos and of the gallbladder/colorectal cancer association to gallstones. A recent study linking mesothelioma to melanoma 46 via the BAP1 gene highlights the predictive power of our immunogenetic approach, which showed a high correlation between mesothelioma and melanoma HLA P/S estimates (Fig. 8).
In summary, the immunogenetic HLA-Cancer association approach is a promising new tool for identifying associations between cancer types which may be due to various causes (genetic, environmental, viral, etc.) but which produce a spectrum of antigens that engage the HLA system in a similar way. Given that the success of immune blockade immunotherapy for cancer partly depends on the HLA genetic makeup of the patient 11-13, the HLA-based cancer associations we report here could be useful in informing immunotherapy across various cancer types.

Data availability
All data used were retrieved from freely accessible websites, as mentioned in Methods, and, as such, are publicly and freely available.