Genetic immune escape landscape in primary and metastatic cancer

Martínez-Jiménez, Francisco; Priestley, Peter; Shale, Charles; Baber, Jonathan; Rozemuller, Erik; Cuppen, Edwin

doi:10.1038/s41588-023-01367-1

Download PDF

Article
Open access
Published: 10 May 2023

Genetic immune escape landscape in primary and metastatic cancer

Francisco Martínez-Jiménez ORCID: orcid.org/0000-0001-7804-1134^1,2^na1^nAff5,
Peter Priestley³^na1,
Charles Shale³,
Jonathan Baber³,
Erik Rozemuller⁴ &
…
Edwin Cuppen ORCID: orcid.org/0000-0002-0400-9542^1,2

Nature Genetics volume 55, pages 820–831 (2023)Cite this article

23k Accesses
15 Citations
180 Altmetric
Metrics details

Subjects

Abstract

Studies have characterized the immune escape landscape across primary tumors. However, whether late-stage metastatic tumors present differences in genetic immune escape (GIE) prevalence and dynamics remains unclear. We performed a pan-cancer characterization of GIE prevalence across six immune escape pathways in 6,319 uniformly processed tumor samples. To address the complexity of the HLA-I locus in the germline and in tumors, we developed LILAC, an open-source integrative framework. One in four tumors harbors GIE alterations, with high mechanistic and frequency variability across cancer types. GIE prevalence is generally consistent between primary and metastatic tumors. We reveal that GIE alterations are selected for in tumor evolution and focal loss of heterozygosity of HLA-I tends to eliminate the HLA allele, presenting the largest neoepitope repertoire. Finally, high mutational burden tumors showed a tendency toward focal loss of heterozygosity of HLA-I as the immune evasion mechanism, whereas, in hypermutated tumors, other immune evasion strategies prevail.

A single-cell atlas enables mapping of homeostatic cellular shifts in the adult human breast

Article Open access 28 March 2024

Austin D. Reed, Sara Pensa, … Walid T. Khaled

Human lung cancer harbors spatially organized stem-immunity hubs associated with response to immunotherapy

Article 19 March 2024

Jonathan H. Chen, Linda T. Nieman, … Nir Hacohen

SOX17 enables immune evasion of early colorectal adenomas and cancers

Article 28 February 2024

Norihiro Goto, Peter M. K. Westcott, … Ömer H. Yilmaz

Main

Cancer immune escape is the process whereby tumor cells prevent their elimination by the immune system^1,2. Tumors acquire this capacity as a response to the accumulation of tumor-specific alterations, which may be presented—in the form of neoepitopes—by the major histocompatibility complex class I (MHC-I). Escape from immune system recognition often involves tumor-specific genomic alterations in immune-related pathways, a process named genetic immune escape (GIE).

GIE alterations operate through different mechanisms, including partial or complete abrogation of neoepitope presentation³ or suppression of proapoptotic signals from the surrounding immune cells⁴. Therefore, identification of GIE events across human cancers is key to understanding the interplay between cancer cells and the immune system, as well as to enable effective precision medicine based on immunotherapy.

Previous studies have performed cancer type-specific molecular profiling of GIE events and their phenotypic implications in several cancer types, including non-small-cell lung cancer^5,6 (NSCLC) and colorectal carcinoma⁷, among others^8,9. Others have performed an extensive analysis of loss of heterozygosity (LOH) of HLA-I across thousands of tumor samples¹⁰. However, a pan-cancer analysis of the prevalence and impact of diverse GIE events is currently lacking. In addition, the focus of these studies was to portray GIE in early stage primary tumors, whereas the changes induced by exposure to treatment and by the metastatic bottleneck have not been comprehensively addressed.

One of the main challenges to perform such analyses lies in the extraordinary diversity of the HLA-I locus, with >15,000 different sequences of the HLA-A, HLA-B and HLA-C genes reported to date¹¹. This extensive polymorphism hampers the identification of tumor-specific somatic alterations, prompting the development of tools that specifically identify LOH of HLA-I (ref. ¹²) or HLA-I somatic mutations¹³ from whole-exome sequencing (WES) and whole-genome sequencing (WGS) data. However, none of these tools provides an integrative characterization of the HLA-I tumor status in both the germline and the tumor, which includes HLA-I typing, allelic imbalance, LOH of HLA-I and somatic mutation annotation.

In the present study, we present a pan-cancer landscape of the GIE prevalence in primary (represented by the PCAWG (pan-cancer analysis of whole genomes) cohort) and unmatched metastatic patients (represented by the Hartwig cohort). Furthermore, to address the complexity of the HLA-I locus, we developed LILAC, an open-source integrative framework that characterizes the HLA-I locus, including its tumor status from WGS data. We applied LILAC and a universal tumor-processing pipeline to establish a comprehensive portrait of GIE events and their positive selection landscape across six different pathways associated with an immune evasion phenotype: the HLA-I locus, the antigen presentation machinery, interferon (IFN)-γ signaling pathway, the programmed cell death ligand 1 (PD-L1) immune checkpoint, the costimulatory signaling by the CD58 receptor and epigenetic immune escape driven by SETDB1 (Fig. 1a and Supplementary Table 1). We also studied how the tumor mutational burden (TMB) and other genomic and environmental features influence the prevalence of GIE alterations, providing insights into tumorigenesis and its interplay with the immune system.

Results

Inference of HLA-I tumor status with LILAC

Inference of the correct HLA-I tumor status is fundamental to identifying GIE alterations (Fig. 1a), to estimate the neoepitope repertoire and burden and to predict the response to immune checkpoint inhibitors^14,15 (ICIs). We have developed LILAC, a framework that performs HLA-I typing for the germline of each patient, as well as determining the status of each of those alleles in the tumor using WGS data on tumor-normal paired samples as input. LILAC also allows for detection of novel human leukocyte antigen (HLA) alleles and provides allele-specific and sample-level, quality control measurements (Fig. 1b and Supplementary Note 1).

We first assessed LILAC’s HLA-I typing robustness by independently calculating the germline and tumor HLA-I two-field-calling agreement across 6,279 patients, including 4,439 patients from the Hartwig¹⁶ dataset and 1,839 from the PCAWG¹⁷ cohort. LILAC showed the highest agreement compared with two state-of-the-art HLA typing tools, Polysolver¹³ and xHLA¹⁸ (Fig. 1c, Extended Data Fig. 1a and Supplementary Data 1). The Hartwig dataset showed higher normal-tumor agreement for all tools, possibly due to the higher sequencing coverage and read quality of this dataset. In a three-way comparison, LILAC also displayed the highest overlap with the predictions from the other tools across both datasets (Extended Data Fig. 1b,c). Moreover, LILAC’s HLA-I typing performance on three family trios with diverse genetic ancestries showed a perfect agreement with previously reported HLA-I types (Fig. 1d). Next, we demonstrated WES applicability by running LILAC on the TRACERx100 lung cohort, where it showed a 98.16% agreement with the HLA-I types originally reported in the publication¹² (Fig. 1e). Finally, we evaluated LILAC HLA-I typing sensitivity in a set of 95 samples with challenging HLA-I types—including 10 from tumor biopsies—with an independent orthogonal and clinically validated HLA-I typing approach (Supplementary Note 1). LILAC showed a perfect 100% two-field agreement across the 564 alleles, higher than Polysolver (93.09%) and xHLA (98.94%) agreements (Fig. 1f and Supplementary Data 1). To conclude, LILAC reported nine somatic mutations in seven of the tumor biopsies evaluated. All of them were perfectly matched by the orthogonal approach (Supplementary Data 1).

HLA allele-specific, tumor copy number (CN) determination is key to identify LOH of HLA-I genes in tumors, a well-established mechanism of immune evasion^10,12. LILAC annotates allele-specific ploidy levels of each HLA-I allele based on the purity-corrected local tumor CN estimations and the number of fragments assigned to each allele (Supplementary Note 1). WGS data provide adequate resolution to annotate purity-adjusted minor and major allele ploidy in the HLA-I locus (Extended Data Fig. 1d,e). Moreover, we quantified LILAC’s agreement with LOHHLA¹² in the TRACERx100 lung WES cohort. LILAC and LOHHLA estimates displayed a global 90% agreement (Fig. 1g and Supplementary Data 1). Importantly, high tumor purity samples showed considerably better concordance than low-purity samples (96.08% in samples with tumor purity ≥0.3, 75.51% when tumor purity <0.3), reflecting increased challenges for genome-wide CN loss calling in low-purity WES samples. Finally, the three tumor samples harboring LOH of HLA-I, according to our framework and evaluated by the orthogonal approach, displayed a strong allelic imbalance in the experimental validation (Supplementary Data 1).

GIE prevalence across cancer types

We then combined LILAC with the Hartwig tumor analytical cancer WGS pipeline^16,19 to annotate GIE events across 6 pathways strongly associated with immune escape (Fig. 1a and Supplementary Table 1) across 6,319 uniformly processed WGS samples²⁰, including 1,880 primary patients from PCAWG and 4,439 patients with metastases from Hartwig (Fig. 2a, Extended Data Fig. 2a, Supplementary Table 2 and Supplementary Note 1). In total, these patients were classified into 58 cancer types, which included 30 tumor types with sufficiently high representativeness (that is, number of patients ≥15) in the metastatic cohort, 27 in the primary dataset and 20 cancer types with sufficient representation in both datasets (Fig. 2b, Extended Data Fig. 2b,c and Supplementary Table 2).

GIE prevalence showed high mechanistic and frequency variability across primary and metastatic cancer types (Fig. 2c,d, top panels and Supplementary Data 2). The median proportion of patients harboring GIE alterations per cancer type was 0.27 for the metastatic cohort and 0.20 for primary tumors, both showing highly dispersed distributions (±0.15 s.d. and ±0.19 s.d. in metastatic and primary tumors, respectively). In certain cancer types, such as pancreatic neuroendocrine (PANET, metastatic), diffuse large B-cell lymphoma (DLBCL, metastatic) and kidney chromophobe cancer (KICH, primary), GIE was present in >50% of patient samples (65%, 55% and 74%, respectively) whereas in others, such as lung neuroendocrine (LUNET, metastatic), GIE was an extremely rare event. Overall, one in four patients (26% in metastatic and 24% in primary) presented GIE alterations based on the six investigated pathways (Fig. 2c,d, bottom panels).

The most frequent GIE alteration was partial loss of the HLA-I locus (including both LOH of HLA-I and homozygous deletions of HLA-I genes that were grouped as LOH of HLA-I for simplicity), which was present in 783 (18%) of metastatic and 319 (17%) of primary cancer patients, followed by IFN-γ inactivation (4% in metastatic and 3% in primary) and alterations in the antigen presentation pathway (4% in metastatic and 3% in primary). CD58 inactivation was the least frequent immune escape event present in only 16 metastatic and 8 primary patients. The high GIE rates of KICH and PANET were exclusively due to LOH of HLA-I (Fig. 2e), whereas other cancer types displayed a wider range of GIE mechanisms (Fig. 2f). Of note, we did not observe a significant mutual exclusivity between LOH of HLA-I and other GIE events in cancer types with sufficient representation of multiple GIE mechanisms (Supplementary Data 2). This suggests that certain tumors may require complementary GIE alterations, such as concurrent alterations that disrupt HLA-I-mediated neoepitope presentation and CD58 loss²¹, to effectively escape immune surveillance.

High agreement between primary and metastatic GIE rates

We next sought to investigate whether there was a GIE prevalence difference between early stage primary and late-stage metastatic tumors. Comparison by tumor type across the 20 cancer types with sufficient representation showed a broad agreement between both stages (Fig. 3a). Although nine cancer types showed a certain degree of metastatic enrichment (log₂(odds ratio) (log₂(OR)) > 0.5; Fig. 3a,b), only in prostate carcinoma (PRAD) and thyroid cancer (THCA) was this difference statistically significant (Fisher’s exact test corrected P < 0.01). The significant enrichment in these two cancer types might be connected to the substantial genome transformation at the metastatic transition²⁰.

**Fig. 3: GIE prevalence in primary and metastatic tumors.**

Breaking down pathway-specific differences revealed that THCA metastatic enrichment is the result of increased LOH of HLA-I incidence, whereas the discrepancies in PRAD are the result of a widespread enrichment across several pathways (Fig. 3b). In general, LOH of HLA-I showed a nonsignificant trend toward metastatic enrichment across seven of the nine metastatic-enriched cancer types. None of the cancer types showed a significantly higher GIE incidence in primary tumors.

Positive selection of HLA-I alterations

We next examined to what extent somatic alterations in HLA-I genes (that is, HLA-A, HLA-B and HLA-C) were positively selected during tumorigenesis.

First, a pan-cancer-grouped HLA-I analysis revealed a nonsynonymous:synonymous substitution (dN:dS) ratio >1 for nonsense, splice site and truncating variants in both the metastatic and the primary datasets (Fig. 4a), indicating that these genes are subject to positive selection. Next, pan-cancer and gene-specific dN:dS ratios showed that HLA-A and HLA-B, but not HLC-C, are positively selected and are mostly enriched in truncating variants but not in missense mutations (Fig. 4b,c). Finally, gene and cancer type-specific analysis showed that HLA-A and HLA-B were deemed as drivers across several cancer types, including metastatic colorectal, NSCLC and DLBCL as well as the pan-cancer cohorts (Fig. 4d,e and Supplementary Data 3).

**Fig. 4: Positive selection of *HLA-I* genes.**

Somatic point mutations and small indels (insertions and deletions) of HLA-I genes were evenly distributed along their sequences (Fig. 4f and Extended Data Fig. 3a). The main exception was the recurrent HLA-A Lys210 frameshift indel (chromosome 6 at position 29911899), which was observed in six mismatch repair-deficient (MMRd) metastatic tumors. This genomic region overlaps with a (C)₇ homopolymer repeat, which probably explains its susceptibility for the observed base indel. No enrichment for mutations in amino acids involved in the peptide binding was observed. Such uniform distribution was in agreement with previous observations²² and with the expected profile in tumor-suppressor genes dominated by inactivating variants²³.

LOH of HLA-I trims the repertoire of HLA-I-presented epitopes in HLA-I heterozygous individuals. Therefore, to further shed light on the tumorigenic role of LOH of HLA-I, we developed a randomization strategy that pinpoints cancer types where the LOH of HLA-I rates were significantly higher than the expected, given their background LOH rates using three genomic resolutions (that is, nonfocal LOH including all LOH events spanning >75% of the chromosome arm length, focal LOH for those events <75% of the chromosome arm and highly focal LOH for LOH events <3 Mb). In spite of the global correlation with background genome-wide LOH rates (Extended Data Fig. 3b), our analyses revealed higher-than-expected rates of LOH of HLA-I across several cancer types in both the metastatic and the primary datasets (G-test goodness of fit q value <0.1; Fig. 4d,e and Supplementary Data 3). PANET (Fig. 4g) and KICH (Extended Data Fig. 3c) showed nonfocal LOH of HLA-I enrichment. Others, such as metastatic cervix carcinoma (Fig. 4h), metastatic colorectal cancer (Fig. 4i) or primary DLBCL, showed focal or highly focal LOH of HLA-I patterns. Furthermore, 33 patients with nonsynonymous mutations of HLA-I genes (20% of the total 159 patients with mutations in HLA-I genes) displayed the concurrent loss of the alternative allele by LOH, potentially leading to complete inactivation of the HLA gene.

Finally, we did not observe any biallelic deletion of the entire HLA-I locus (Supplementary Data 3), suggesting that homozygous deletions within the HLA-I might be constrained by purifying selection, featuring the importance of expressing a minimal amount of HLA-I molecules to avoid immune-alerter signals²⁴.

Differences between focal and nonfocal LOH of HLA-I

Our results suggest that LOH of HLA-I is a positively selected genomic event in certain tumor types. However, it remains unclear whether these losses target a specific allele and whether both focal and nonfocal LOH of HLA events display similar selective patterns. To address these questions, we assessed whether LOH of HLA-I tends to involve the allele(s) with the highest neoepitope ratio (that is, higher number of predicted neoepitopes compared with the alternative allele; Fig. 5a).

**Fig. 5: LOH of *HLA-I* and neoepitope load.**

We observed a positive association between the neoepitope ratio and the frequency of the allele with highest neoepitope repertoire to be lost in both the metastatic and the primary cohorts (Fig. 5b,c). This trend was significantly different from a neutral scenario where both alleles are equally likely to be lost independently of their neoepitope repertoire (Kolmogorov–Smirnov test metastatic P = 2.47 × 10⁻⁵ and primary P = 2.24 × 10⁻⁷). Remarkably, the association between neoepitope ratio and the loss frequency became stronger when selecting for focal LOH of HLA-I events (Fig. 5d,e; P = 1.71 × 10⁻⁹ and P = 1.17 × 10⁻¹⁰ for metastatic and primary, respectively). However, it was indistinguishable from a neutral scenario for nonfocal LOH of HLA-I (Fig. 5f,g; P = 0.32 and P = 0.99 for metastatic and primary, respectively), showing that nonfocal LOH of HLA-I does not select for the allele with the highest neoepitope repertoire and that its high recurrency in several cancer types may be associated with other selective forces operating on chromosome 6.

Furthermore, the majority of focal LOH of HLA-I events were CN neutral (81% in metastatic tumors and 70% in primary), which was considerably higher than for nonfocal events (65% in metastatic and 35% in primary), providing further support for the notion that the loss of neoepitope repertoire, and not gene dosage, is the main driving force behind focal LOH of HLA-I.

Positive selection of GIE alterations beyond HLA-I

Alterations in other pathways beyond the HLA-I locus may also lead to immune escape. Hence, we explored signals of positive selection across 18 genes associated with 5 immune escape pathways (pathways 2–6 in Fig. 1a).

Grouped pan-cancer analysis of the dN:dS ratio in these pathways (covering a total of 16 genes, excluding those with an oncogenic mechanism based on CN amplification; Methods) revealed a >1 ratio for nonsense, splice site and truncating variants in both the metastatic and the primary datasets (Fig. 6a), which was indicative of positive selection.

**Fig. 6: Positive selection of GIE events beyond the *HLA-I*.**

Refining the analysis for specific genes and cancer types revealed that two genes from the antigen presentation pathway (that is, B2M and CALR) displayed recurrent patterns of inactivating mutations and focal biallelic deletions across several tumor types, as well as in the pan-cancer cohorts (Fig. 6b–d). Moreover, higher-than-expected frequencies of focal biallelic deletions for several IFN-γ pathway genes, including JAK1, JAK2 and IRF2, were also observed. CD58 also harbored a higher-than-expected number of nonsynonymous mutations and homozygous deletions in DLBCL and the pan-cancer primary cohort. Finally, the chromatin modifier SETDB1 was recurrently focally amplified in multiple cancer types, including metastatic NSCLC (Fig. 6e) and primary breast cancer. Full results are available in Supplementary Data 3.

GIE association with cancer genomic features

We next investigated whether, aside from cancer-type intrinsic differences, there were other cancer genomic and environmental features associated with GIE prevalence. Thus, we performed a cancer type-specific univariate logistic regression of 99 tumor genomic features and 366 driver genes against the presence of GIE events (excluding nonfocal LOH of HLA-I) across 38 cancer types (Supplementary Note 3). Moreover, to control for associations that may be secondary to increased mutation and CN variant (CNV) background rates, we filtered out significant associations that were found in our GIE simulations (Supplementary Note 3).

Overall, 35 genomic features and 5 driver genes showed a statistically significant association with GIE in at least one cancer type (Fig. 7a and Extended Data Fig. 4a). Even after controlling for background mutation rates, TMB and patient’s neoepitope load were strongly associated with GIE events in DLBCL, pancreas carcinoma and skin melanoma (q value < 0.05, log₂(OR) > 0.0 and simulated GIE prevalence ≤2%; Fig. 7a and Extended Data Fig. 4a–c). It is interesting that clonal TMB and clonal neoepitope load showed a strong positive association with GIE, whereas subclonal TMB and neoepitope load showed a modest correlation (Fig. 7a–c), highlighting the relevance of mutation cellularity in triggering immune responses. Finally, fusion-derived neoepitopes were significantly associated with GIE in DLBCL and NSCLC (Fig. 7a), which emphasizes the importance of considering noncanonical sources of neoepitopes beyond small nonsynonymous variants in coding regions.

**Fig. 7: GIE association with cancer genomic features.**

Exposure to certain endogenous and exogenous mutational processes have been correlated with increased immunogenicity²⁵ and response to ICIs^26,27. After controlling for molecular age and excluding non-GIE exclusive associations (that is, associations also observed in the GIE simulations) several mutational processes showed significant association with GIE incidence (Fig. 7a and Extended Data Fig. 4a). First, MMRd mutational processes were broadly associated with increased GIE incidence. Similarly, exposure to the APOBEC family of cytidine deaminases was strongly associated with GIE in multiple cancer types, including breast carcinomas (Fig. 7d and Extended Data Fig. 4d). Last, the mutation burden associated with several exogenous mutational processes, such as ultraviolet light in skin melanoma (Fig. 7e and Extended Data Fig. 4e) and platinum treatment in NSCLC (Fig. 7f and Extended Data Fig. 4f), was also significantly linked to an increased incidence of GIE events in these cancer types.

We also identified other tumor genomic features that were correlated with GIE. For instance, in colorectal cancers, which also include some patients with anal cancer, human papillomavirus DNA integration was positively associated with GIE incidence (Fig. 7a). Moreover, high-immune infiltration, as determined by several RNA-sequencing-based deconvolution measurements (Supplementary Note 3), was significantly linked with higher GIE incidence in this cancer type (Fig. 7g and Extended Data Fig. 4g), which is in agreement with previous reports⁷.

Certain driver alterations, beyond the GIE pathways considered in the present study, also showed a strong association with GIE events. Specifically, CASP8, KMT2D, RPL22 and TGFBR2 alterations tended to co-occur with GIE in patients with colorectal cancer. Of note, CASP8 (ref. ¹³) and TGFBR2 (ref. ²⁸) alterations have previously been linked to immune surveillance escape.

Finally, other factors, such as the HLA-I supertype, the germline HLA-I divergence, patient chronological age or exposure to previous treatments, including immunotherapy, failed to attain significant association with GIE (or the association was also observed in the simulated GIE). All the screened molecular features alongside their cancer type-specific significance coefficients are available in Supplementary Data 4.

The selected immune evasion mechanisms depends on TMB

An increase in mutational load leads to the generation of neoepitopes susceptible to recognition as neoantigens by the adaptive immune system. Therefore, we investigated the relationship between the frequency of GIE alterations (excluding nonfocal LOH of HLA-I) and the TMB across 20 evenly distributed TMB buckets (Methods). We first observed that GIE frequency steadily increased with the TMB (Fig. 8a; observed GIE) and that this trend was not fully explained by an increased background mutation and CNV rate (Fig. 8a; simulated GIE). More specifically, as the TMB increases, the observed GIE frequency deviates from the expected frequency given by the GIE simulations. This is particularly noticeable for (ultra)hypermutated tumors, which showed a GIE incidence two- to threefold higher than the simulations. This trend was still consistent after controlling for the cancer type (Extended Data Fig. 5a) and mutation clonality (Extended Data Fig. 5b). Using the burden of predicted neoepitopes based on the germline HLA-I profile as baseline also revealed an almost uniformly increasing distribution across the neoepitope buckets, which becomes sharper and higher than expected after the 17th bucket (Fig. 8b).

**Fig. 8: Immune evasion mechanisms and TMB.**

It is interesting that, in the bucket grouping samples with ~10–13 mutations per Mb, which is the minimal threshold regularly used as a response to ICIs, we observed an average GIE frequency of 0.30 ± 0.03 s.d. Similarly, in the group of samples between 26 and 36 mutations per Mb, mostly including hypermutated tumors, the average frequency was 0.42 ± 0.06 s.d., whereas beyond ~95 mutations per Mb (considered to be ultra-hypermutated tumors²⁹) we identified GIE alterations in >70% of samples (0.72 ± 0.06 s.d.). Our results thus showed that an important fraction of patients eligible for ICIs harbored tumor alterations that may hinder recognition and/or elimination by the immune system.

We then analyzed the relationship between the TMB and the presence of specific GIE alterations across the six immune escape pathways included in the present study. Overall, the observed frequency distributions across these pathways were remarkably different (Fig. 8c and Extended Data Fig. 5c). In fact, different types of HLA-I alterations showed a distinctive frequency distribution along the TMB buckets. Nonfocal LOH of HLA-I was primarily present in low-TMB tumors, whereas focal LOH of HLA-I showed a clear enrichment for mid and high TMB tumors, peaking around ~10–20 mutations per Mb (average frequency of 0.22 ± 0.04 s.d.) and displaying an inverted U-shaped distribution. Finally, mutations in HLA-I genes were more frequent in hypermutated tumors (that is, from ~26 mutations per Mb to 36 mutations per Mb). Similarly, alterations in the antigen presentation machinery and the IFN-γ pathway were predominantly found in hypermutated tumors (Extended Data Fig. 5c). The remaining pathways did not show any clear TMB preference, probably due the lower prevalence of these alterations in our dataset. Finally, using the number of predicted neoepitopes as baseline revealed consistent distributions (Fig. 8d and Extended Data Fig. 5d).

Discussion

In the present study, we have characterized the prevalence and impact of GIE alterations involved in six major pathways across thousands of uniformly processed primary and metastatic tumors from fifty-eight cancer types. Moreover, we addressed the complexity of identifying tumor-specific HLA-I alterations by developing LILAC.

Our results revealed that, on average, one in four patients bears a GIE event, primarily as a result of LOH of HLA-I. However, GIE incidence and the targeted pathways showed high diversity across cancer types. Importantly, the fact that we did not observe mutual exclusivity between GIE alterations targeting different pathways suggests that multiple GIE alterations may concur to effectively avoid immune surveillance.

Remarkably, our analyses also showed that the frequency of GIE alterations in metastatic patients are comparable to their primary counterparts across most cancer types. This result is also supported by independent studies^6,30, denoting that early stages of tumorigenesis have already acquired the capacity to escape from immune system recognition.

Immune escape alterations were often positively selected during tumor evolution. Specifically, loss-of-function mutations in HLA-A and HLA-B, as well as multiple genes from other immune escape pathways, displayed higher-than-expected frequencies across several cancer types. Nevertheless, HLA-C did not show a significant enrichment in inactivating variants which may imply that its expression is needed to avoid natural killer-mediated immunity³¹ and that the neoepitope repertoire of this gene is generally lower compared with HLA-A and HLA-B. Finally, we also observed higher-than-expected LOH of HLA-I rates across multiple cancer types.

Related to this, focal and nonfocal LOH of HLA-I undergo divergent mechanisms of selection. Focal LOH of HLA-I was primarily a CN-neutral event that tended to target the HLA allele with the largest neoepitope repertoire, indicating an active role in immune evasion. On the contrary, we did not observe such allelic preference for nonfocal LOH of HLA-I, suggesting that alternative selective forces, such as DAXX haploinsufficiency³², are operating in these large-scale chromosome 6 events.

Multiple tumor intrinsic and extrinsic features displayed a significant association with increased GIE incidence. However, in our cohort, a patient’s exposure to previous cancer therapies, including immunotherapies, did not attain a significant association with GIE frequency, indicating that the efficacy of GIE alterations may be compromised when dealing with the strong immune pressure released by ICIs.

The tumor mutation and neoepitope burden influenced both the GIE frequency and the targeted GIE pathway. Although focal LOH of HLA-I was the most frequent mechanism in mid and high TMB tumors, the loss of certain HLA-I alleles was apparently not sufficient to cope with the neoepitope load of (ultra)hypermutated tumors, where a nontargeted GIE mechanism, such as antigen presentation abrogation, is probably needed. However, we cannot rule out the fact that such differences may also be partially shaped by mutation and CNV rate differences across cancer types. It is important to mention that the GIE escalation as the TMB increases was not entirely attributed to the underlying increase in background mutation rate, particularly in hypermutated tumors. Although the modeling of background GIE rates could be sensitive to the selected randomization strategy, our results are supported by independent studies based on orthogonal analytical approaches³⁰, evidencing the robustness of our conclusions.

The present study considered a collection of highly confident GIE alterations across six well-characterized, immune-related pathways. However, in our dataset, three of four patients did not harbor GIE events, highlighting the need to characterize other mechanisms of immune evasion. These may involve not only alternative molecular pathways such as the HLA-II (ref. ³³), but also other types of alterations such as germline variants³⁴ and epigenetic modifications^5,35. Finally, tumor extrinsic factors such as clonal hematopoiesis, tumor-associated microbiome or the tissue architecture may also play an important role in tumor immune evasion. We expect that the combination of cancer genomics with high-resolution characterization of the tumor microenvironment will aid in further understanding of the interplay between tumor evolution and the immune system.

Methods

Data collection and processing

The Hartwig Medical Foundation sequences and characterizes the genomic landscape for a large number of patients with metastases. A detailed description of the consortium and the whole patient cohort has been given in detail in Priestley et al.¹⁶. In the present study, the Hartwig cohort included 4,784 metastatic tumor samples from 4,468 patients.

The Hartwig patient samples have been processed using the Hartwig analytical pipeline5 (https://github.com/hartwigmedical/pipeline5) implemented in Platinum (v.1.0) (https://github.com/hartwigmedical/platinum). Briefly, Platinum is an open-source pipeline designed for analyzing WGS tumor data. It enables a comprehensive characterization of tumor WGS samples (for example, somatic point mutations and indels, structural variants, CN changes) in one single run.

Hartwig samples that failed to provide a successful pipeline output, potential nontumor samples, with purity <0.2, with TMB < 50 SNVs/indels, lacking sufficient informed consent for the present study or without enough read coverage to perform two-field HLA typing (Supplementary Note 1) were discarded. Similarly, for patients with multiple biopsies, we selected the tumor sample with the most recent biopsy date and, if this information did not exist, we selected the sample with the highest tumor purity. However, some Hartwig patients had biopsies from different primary tumor locations. In these cases, we kept at least one sample from each primary tumor location and, when there were multiple samples from the same primary tumor location, we applied the aforementioned biopsy date and tumor purity-filtering criteria. A total number of 4,439 Hartwig samples were whitelisted and used in the present study (Extended Data Fig. 2a and Supplementary Table 2).

Preprocessed RNA-sequencing (RNA-seq) data by ISOFOX (https://github.com/hartwigmedical/hmftools/tree/master/isofox) were available for 1,864 Hartwig samples and were consequently used in the immune infiltration deconvolution analysis.

Patient clinical data were obtained from the Hartwig database. Cancer-type labels were harmonized to maximize the number of samples that had tumor types comparable with the PCAWG dataset (Supplementary Table 2).

The PCAWG cohort consisted of 2,835 patient tumors and access for raw sequencing data for the PCAWG-US was approved by the National Institutes of Health (NIH) for the dataset General Research Use in The Cancer Genome Atlas (TCGA) and downloaded via the dbGAP download portal. Raw sequencing access to the non-US PCAWG samples was granted via the Data Access Compliance Office (DACO). A detailed description of the consortium and the whole patient cohort has been given in Campbell et al.¹⁷.

The samples were fully processed using the same cancer analytical pipeline applied to the Hartwig cohort (BWA³⁸ v.0.7.17, GATK³⁹ v.3.8.0, SAGE¹⁶ v.2.2, GRIDSS⁴⁰ v.2.9.3, PURPLE¹⁶ v.2.53 and LINX⁴¹ v.1.17). This enabled a harmonized analysis and eliminated the potential biases introduced by applying different methodological approaches. Samples that failed to provide a successful pipeline output, with a tumor purity <0.2, potential nontumor samples, blacklisted by the PCAWG original publication¹⁷ or without enough read coverage to perform two-field HLA typing were discarded. Similarly, for patients with multiple samples, we selected the first according to the aliquot ID alphabetical order. A total number of 1,880 were whitelisted and used in the present study (Extended Data Fig. 2a and Supplementary Table 2). For more details about the re-processing of the PCAWG dataset and the technical validation see Martínez-Jiménez et al.²⁰.

Preprocessed gene level expression data were downloaded for 1,118 samples from the International Cancer Genome Consortium (ICGC) portal (https://dcc.icgc.org/releases/PCAWG/transcriptome/gene_expression/tophat_star_fpkm_uq.v2_aliquot_gl.tsv.gz). ENSEMBL identifiers were mapped to HUGO symbols. Of these samples, 930 belonged to biopsies selected for the present study and were therefore used for the RNA analysis in PCAWG samples.

The most recent clinical data were downloaded from the PCAWG release page (https://dcc.icgc.org/releases/PCAWG) on August 2021. Cancer-type labels were harmonized to maximize the number of samples that had tumor types comparable with the Hartwig dataset (Supplementary Table 2).

LILAC

All information relative to LILAC’s algorithm, implementation and validation is described in Supplementary Note 1.

Definitions of GIE alterations

We searched in the literature for somatic genomic alterations that are robustly and recurrently associated with immune evasion. We stratified the reported alterations into six major pathways (Fig. 1a and Supplementary Table 1):

(1)
The HLA-I: somatic alterations in the HLA-A, HLA-B and HLA-C genes have been extensively reported as a mechanism for immune evasion across several cancer types^9,10,12,22. We considered LOH of HLA-I, homozygous deletions and somatic nonsynonymous mutations on these genes as immune evasion alterations. We defined LOH for HLA-A, HLA-B and HLA-C as those cases with a minor allele ploidy <0.3 and a major allele ploidy >0.7 according to LILAC annotation. We also relied on LILAC mapping of somatic mutations into HLA-A, HLA-B and HLA-C alleles to report samples with nonsynonymous alterations. Finally, we also used LILAC allele-specific tumor CN estimations to annotate samples with homozygous deletions of HLA-A, HLA-B and HLA-C genes. A gene was homozygous deleted in a sample if the estimated minimum tumor CN of the gene was <0.5.
(2)
The antigen presentation pathway: several studies have reported the immunomodulatory effect of somatic inactivation of genes involved in the antigen presentation machinery (see Supplementary Table 1 for gene-specific references). The most recurrent alteration is B2M inactivation, but there are other genes involved in antigen presentation and antigen presentation activation, the inactivation of which has been linked to increased immune evasion, including CALR, TAP1, TAP2, TAPBP, NLRC5, CIITA and RFX5. We defined inactivation events as monoallelic and biallelic clonal loss-of-function mutations (frameshift variant, stop gained, stop lost, splice acceptor variant, splice donor variant, splice region variant and start lost), biallelic clonal nonsynonymous mutations not included in the former group (for example, missense mutations) and homozygous deletions. A gene was homozygous deleted in a sample if the estimated minimum tumor CN of the gene was <0.5.
(3)
The IFN-γ pathway: IFN-γ is a cytokine with known proapoptotic and immune booster capacities. Hence, it has been reported that tumors frequently leverage somatic alterations targeting IFN-γ receptors and downstream effectors to evade immune system surveillance (see Supplementary Table 1 for gene-specific references). More specifically, we considered that inactivation events (see above for specifics of which type of alterations are included) in JAK1, JAK2, IRF2, IFNGR1, IFNGR2, APLNR and STAT1 have been probed to have the ability to provide an immune evasion phenotype.
(4)
The PD-L1 receptor: the PD-L1 receptor, encoded by the CD274 gene, plays a major role in suppressing the adaptive immune system. It has been reported how overexpression of PD-L1 in tumor cells leads to impaired recruitment of immune effectors⁴². We therefore considered CD274 CN amplification as a genetic mechanism of immune evasion. We defined a CD274 CN amplification event as samples with CD274 minimum tumor CN >3× the average sample ploidy.
(5)
The CD58 receptor: the CD58 receptor, encoded by CD58, plays an essential role in T-cell recognition and stimulation. It has been extensively reported that CD58 alterations in B-cell lymphomas lead to immune evasion²¹. Moreover, a recent study identified CD58 loss as one of the major effectors of impaired T-cell recognition⁴³. Consequently, we considered inactivation events (see above) in CD58 as alterations able to provide an immune escape phenotype.
(6)
Epigenetic driven immune escape: it has been recently reported how SETDB1 amplification leads to epigenetic silencing of tumor intrinsic immunogenicity⁴⁴. SETDB1 amplification was recurrently found across several cancer types and was therefore considered in the present study as a mechanism of immune evasion. We defined a SETDB1 CN amplification event as samples with SETDB1 minimum tumor CN >3× the mean sample ploidy.

A summary table with all 21 considered genes, their associated pathway, references and their type of somatic alterations is presented in Supplementary Table 1.

GIE mutual exclusivity

To assess whether LOH of HLA-I events were mutually exclusive with other GIE events, we performed two statistical tests. First, we performed a left-sided Fisher’s exact test comparing two groups of annotations (LOH of HLA-I and other GIE events) in a cancer type-specific manner. Second, for each cancer type, we compared the number of samples bearing both LOH of HLA-I and other GIE events with the expected given by 10,000 randomization, using the observed alteration frequency of both groups in the specific cancer type (LOH of HLA-I and other GIE alterations). The significance was computed using an empirical one-sided P value (that is, number of randomizations with co-occurring events lower than the real observed value divided by the total number of randomizations).

Primary and metastatic GIE prevalence

The prevalence of a pathway alteration for a particular cohort was calculated as the number of samples with at least one alteration in the pathway divided by the total number of cohort samples. The presence of a genetic immune alteration in a given sample was annotated if there was at least one pathway with an alteration in that sample.

For the primary versus metastatic comparison, we performed a tumor type-specific Fisher’s exact test comparing pathway-specific and global escaped status prevalence across the two cohorts. P values were adjusted with a multiple-testing correction using the Benjamini–Hochberg procedure (α = 0.05).

Positive selection: somatic point mutations and indels

Positive selection analysis based on somatic point mutation and small indels was performed using dNdScv and the hg19 reference genome. The analysis was performed in a cohort-specific, cancer-type and pan-cancer manner across the two datasets. The analysis was restricted to datasets with sufficient representativeness (that is, number of samples ≥15). Global grouped dN:dS ratios of the HLA-I (HLA-A, HLA-B and HLA-C) and the 16 non-HLA-I genes potentially targeted by mutations (that is, excluding SETDB1 and CD274 because their immune escape phenotype is associated with CN gains; Supplementary Table 1) were calculated in a pan-cancer manner using the gene_list attribute of the dndscv function.

We used the geneci() function of dNdScv to estimate the pan-cancer and gene-specific dN:dS ratios, which include confidence intervals (CIs), of the HLA-I genes.

Positive selection: CNAs

We devised a statistical test to assess positive selection in LOH, homozygous deletion (HD) and CN amplification (AMP) events. LOH was defined as those genomic regions where the minor allele ploidy of this gene was <0.3 and the major allele ploidy >0.7. HD was defined as those regions with estimated minimum CN < 0.5. Similarly, AMP events were defined as those genomic regions with the minimum tumor CN >3× the mean sample ploidy.

For a particular type of genomic event overlapping with a gene, this test compares the number of observed samples bearing the alteration with the expected number after whole-genome randomization. More specifically, these are the steps followed:

(1)
Let us first denote E as the type of query alteration (that is, LOH, HD or AMP), S as a group of samples (usually samples from the same cancer type and same dataset) and G_s as the genomic scale (that is, nonfocal for segment lengths >75% chromosome arm, focal for segments <75% of the chromosome arm and highly focal for segments <3 Mb).
(2)
For every sample S_i in {S₁,S₂,…S_T} we first gather the number and length of observed (O_i) segments targeted by E within that G_s. Only E events overlapping with autosomes are considered in the present study. Samples that do not harbor any event of type E within that G_s are ignored.
(3)
Next, for every sample S_i we performed 10 independent randomizations (R_i1, R_i2, … R_i10) of the O_i events, by randomly shuffling these events E along the autosomes. For this, we used the shuffle function from pybedtools⁴⁵ with the following parameters (genome=‘hg19’, noOverlapping=True, excl=‘sexual_chomosomes’, allowBeyondChromEnd=False). In certain samples, with an extremely high segment load (O_i > 10,000) or with mean ploidy of ~1 (that is, monoploid genome), the noOverlapping flag was set to False because the randomization would not converge.
(4)
We then binned the autosomes into 28,824 bins of 100 kb and counted for each bin k_j {k₁, … k_28,842} the total number of observed events O_Tj as the sum of observed events O_1k, … O_TK overlapping with that bin across all S samples.
(5)
Similarly, for each R_ith (R₁, … R₁₀) randomization and each bin k_j{k₁,… k_28,842}, we counted the total number of simulated events as the sum of events—in that ith randomization and overlapping with that bin across all samples in S.
(6)
We then performed a bin-specific comparison of the O_Tk with the average number of simulated events R_TK across the ten simulations and performed a statistical test of significance using a G-test goodness of fit. As chromosome starting bins were highly depleted in the simulated group (R_TK), we also computed the global simulated mean across all bins k_j{k₁, … k_28,842}, and used this as the expected number of events for the statistical significance assessment.
(7)
The P values were adjusted (that is, converted to q values) with a multiple-testing correction using the Benjamini–Hochberg procedure (α = 0.05).
(8)
For each gene, overlapping with one or with multiple k_j bins, we used the minimal adjusted P-value significance of the bin(s) overlapping with the genomic location of the specific gene-coding sequence. Therefore, by definition, two genes sharing the same bins would have a similar q value. We used ENSEMBL v.88 to perform the annotation of gene exonic regions to hg19 genomic coordinates.

We observed that LINE insertions near the HLA-I locus (LINE activation site at chr6:29,920,000) in some esophageal cancer samples had an incorrect CN estimation due to multiple insertions originating from almost the same site in the same sample. Consequently, these samples were not considered in the HLA-I homozygous deletion analysis.

Distribution of mutations in HLA-I genes

LILAC mapped the HLA-A, HLA-B and HLA-C somatic mutations detected by SAGE into the inferred HLA-I alleles (see LILAC section). LILAC provides the consequence type and coding sequence position of HLA-I alterations, which was used to display the distribution of mutations across the HLA-I coding sequence. The 34 amino acids involved in peptide presentation were gathered from our neoepitope prioritization pipeline (see below). Pfam HLA-A, HLA-B and HLA-C domains were manually downloaded from the Pfam³⁶ website.

Tumor-specific neoepitopes

The methodology for the identification and prioritization of neoepitopes is extensively described in Supplementary Note 2.

Calculation and randomization of neoepitope ratio

We wanted to evaluate whether LOH of HLA-I tends to select the HLA-I allele with highest neoepitope repertoire. Let us first introduce the neoepitope allele ratio (nr). Given an HLA-I gene, G, we defined nr as G_A1/2= G_A1/(G_A1 + G_A2), where G_A1/2 is the number of predicted neoepitopes of allele 1 and allele 2, respectively. For each patient tumor sample, the assignment of allele number (that is, allele 1 or allele 2) was randomly performed. Then, we followed the next steps:

(1)
For each patient sample with LOH of HLA-I we calculated the nr across the HLA-I genes targeted by the LOH. Homozygous HLA-I cases were not considered, because their nr is by definition 0.5.
(2)
We then grouped the nr into eight buckets: (0.0–0.35), (0.35, 0.4), (0.4, 0.45), (0.45, 0.5), (0.5, 0.55), (0.55, 0.6), (0.6, 0.65) and (0.65, 1.0). Consequently, each bucket included n allele pairs with an nr within the bound limits.
(3)
Next, we performed 100 bootstraps by randomly subsampling 75% of the total number of available allele pairs in the bucket.
(4)
For each bootstrap iteration ith (i \(\in\) 1, … 100) and each bucket we estimated the frequency of allele 1 loss (F_{A1_loss}) as the number of cases with allele 1 loss compared with the total number of cases in that bucket. Similarly, we computed the expected frequency (F_A1exp) by randomly assigning LOH events to the allele 1 (background probably of 0.5).
(5)
We then computed the bucket-specific average and s.d. of F_A1loss and F_A1exp values across the 100 bootstraps.
(6)
Finally, we performed a Kolmogorov–Smirnov test to compare the observed distribution with the expected given random distribution of the LOH events.

This test was applied to LOH of HLA-I, focal LOH of HLA-I and nonfocal LOH of HLA-I events across the metastatic (Hartwig) and primary (PCAWG) datasets.

GIE and tumor genomic features

Check Supplementary Note 3 for a full description of the methods for this section.

GIE and TMB association

We aggregated the two datasets, metastatic and primary, to increase the robustness of this analysis. We then defined 20 evenly arranged buckets (10 for the cancer type-specific analyses) of the log₁₀(TMB) scale, starting from the 1st percentile and ending in the 99th percentile values. Next, each sample with a log₁₀(TMB) = S_tmb, was allocated to the ith (i \(\in\) 1, … 20) bucket such as log₁₀(TMB)_i−1 < S_tmb ≤ log₁₀(TMB)_i. Samples with an S_tmb greater than the last bucket threshold (that is, log₁₀(TMB)₂₀) were allocated into the last bucket. The number of mutations in each bucket was displayed as the number of mutations per megabase by dividing the total number of mutations by 3,000 (that is, approximated number of human genome megabases). Finally, the GIE frequency (GIE_freq) of the ith bucket was defined as the number of GIE samples in the ith bucket divided by the total number of available samples in that bucket.

To enable calculation of the uniformity in GIE frequency among samples in the same TMB bucket, we performed n (where n = 1,000) bootstraps of the 50% of samples allocated to each bucket. We then calculated the average and s.d. of the GIE_freq across the bootstraps.

A similar approach was conducted to analyze the relationship between the predicted neoepitope load and GIE frequency. The number of neoantigens of each bucket was estimated as 1% (ref. ⁴⁶) and 5% (ref. ⁴⁷) of the total predicted neoepitopes assigned to that bucket threshold.

For the simulated GIE control, we estimated the average and s.d. across the 100 simulated GIE iterations for each TMB bucket.

Statistics and reproducibility

Sample sizes were determined by the availability of samples with sufficient quality from the two datasets included in the present study (PCAWG and Hartwig). Sample-exclusion criteria are thoroughly described in Methods, Supplementary Note 1 and the original publication describing the harmonized cohort²⁰.

The statistical tests and randomization strategies used in each specific analysis are described in Methods and the figure legends. Unless otherwise specified, the scipy⁴⁸ (v.1.5.3) library from python v.3.6.9 was used to carry out the statistical tests.

All the code and data to reproduce the analyses presented in the present article have been deposited in public repositories as described in Data availability and Code availability.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The Hartwig dataset used in the present study is freely available for academic use from the Hartwig Medical Foundation through standardized procedures and request forms that can be found at https://www.hartwigmedicalfoundation.nl/en/applying-for-data. This includes raw sequencing data (.bam files and unmapped reads for hg19 reference genome) as well as the processed data through the latest version of the Hartwig tumor-processing pipeline. The re-processed PCAWG data using the Hartwig Medical Foundation pipeline (for hg19 reference genome) have also been made available for academic purposes. The ICGC part of the PCAWG dataset¹⁷ can be accessed now through the ICGC platform (https://dcc.icgc.org/releases/PCAWG/Hartwig), following their standard access control mechanisms originally put in place. Similarly, users with authorized access can download the re-processed TCGA portion of the PCAWG dataset at https://icgc.bionimbus.org/files/5310a3ac-0344-458a-88ce-d55445540120. We refer to the accompanying publication²⁰, including the description of the entire primary and metastatic cohort, for further information about the technical aspects of the re-processing of the PCAWG dataset. Raw sequencing data of the high-resolution HLA typing performed by GenDx can also be downloaded via European Genome-phenome Archive (http://www.ebi.ac.uk/ega) under accession no. EGAD00001008643. HLA-I typing, sample-specific GIE events and processed data are now shared as Supplementary Data and Supplementary Tables.

Code availability

The Hartwig analytical processing pipeline is available at https://github.com/hartwigmedical/pipeline5 and implemented in Platinum (https://github.com/hartwigmedical/platinum). LILAC’s source code is available at https://github.com/hartwigmedical/hmftools/tree/master/lilac. The source code of the neoepitope prioritization pipeline is available at https://github.com/hartwigmedical/hmftools/tree/master/neo. The source code to reproduce the figures and analysis of the manuscript is available at https://github.com/UMCUGenetics/Genetic-Immune-Escape.

References

Vinay, D. S. et al. Immune evasion in cancer: mechanistic basis and therapeutic strategies. Semin. Cancer Biol. 35, S185–S198 (2015).
PubMed Google Scholar
Jhunjhunwala, S., Hammer, C. & Delamarre, L. Antigen presentation in cancer: insights into tumour immunogenicity and immune evasion. Nat. Rev. Cancer 21, 298–312 (2021).
CAS PubMed Google Scholar
Dhatchinamoorthy, K., Colbert, J. D. & Rock, K. L. Cancer immune evasion through loss of MHC class I antigen presentation. Front. Immunol. 12, 636568 (2021).
CAS PubMed PubMed Central Google Scholar
Castro, F., Cardoso, A. P., Gonçalves, R. M., Serre, K. & Oliveira, M. J. Interferon-gamma at the crossroads of tumor immune surveillance or evasion. Front. Immunol. 9, 847 (2018).
PubMed PubMed Central Google Scholar
Rosenthal, R. et al. Neoantigen-directed immune escape in lung cancer evolution. Nature 567, 479–485 (2019).
CAS PubMed PubMed Central Google Scholar
Pennycuick, A. et al. Immune surveillance in clinical regression of preinvasive squamous cell lung cancer. Cancer Discov. 10, 1489–1499 (2020).
CAS PubMed PubMed Central Google Scholar
Grasso, C. S. et al. Genetic mechanisms of immune evasion in colorectal cancer. Cancer Discov. 8, 730–749 (2018).
CAS PubMed PubMed Central Google Scholar
Restifo, N. P. et al. Loss of functional beta 2-microglobulin in metastatic melanomas from five patients receiving immunotherapy. J. Natl Cancer Inst. 88, 100–108 (1996).
CAS PubMed Google Scholar
Fangazio, M. et al. Genetic mechanisms of HLA-I loss and immune escape in diffuse large B cell lymphoma. Proc. Natl Acad. Sci. USA 118, e2104504118 (2021).
CAS PubMed PubMed Central Google Scholar
Montesion, M. et al. Somatic HLA class I loss Is a widespread mechanism of immune evasion which refines the use of tumor mutational burden as a biomarker of checkpoint inhibitor response. Cancer Discov. 11, 282–292 (2021).
CAS PubMed Google Scholar
Robinson, J. et al. IPD-IMGT/HLA database. Nucleic Acids Res. 48, D948–D955 (2020).
CAS PubMed Google Scholar
McGranahan, N. et al. Allele-specific HLA loss and immune escape in lung cancer evolution. Cell 171, 1259–1271.e11 (2017).
CAS PubMed PubMed Central Google Scholar
Shukla, S. A. et al. Comprehensive analysis of cancer-associated somatic mutations in class I HLA genes. Nat. Biotechnol. 33, 1152–1158 (2015).
CAS PubMed PubMed Central Google Scholar
Chowell, D. et al. Patient HLA class I genotype influences cancer response to checkpoint blockade immunotherapy. Science 359, 582–587 (2018).
CAS PubMed Google Scholar
Naranbhai, V. et al. HLA-A*03 and response to immune checkpoint blockade in cancer: an epidemiological biomarker study. Lancet Oncol. 23, 172–184 (2022).
CAS PubMed Google Scholar
Priestley, P. et al. Pan-cancer whole-genome analyses of metastatic solid tumours. Nature 575, 210–216 (2019).
CAS PubMed PubMed Central Google Scholar
Campbell, P. J. et al. Pan-cancer analysis of whole genomes. Nature 578, 82–93 (2020).
Google Scholar
Xie, C. et al. Fast and accurate HLA typing from short-read next-generation sequence data with xHLA. Proc. Natl Acad. Sci. USA 114, 8059–8064 (2017).
CAS PubMed PubMed Central Google Scholar
Roepman, P. et al. Clinical validation of whole genome sequencing for cancer diagnostics. J. Mol. Diagn. 23, 816–833 (2021).
CAS PubMed Google Scholar
Martínez-Jiménez, F. et al. Pan-cancer whole genome comparison of primary and metastatic solid tumors. Nature https://doi.org/10.1038/s41586-023-06054-z (2023).
Challa-Malladi, M. et al. Combined genetic inactivation of β₂-microglobulin and CD58 reveals frequent escape from immune recognition in diffuse large B cell lymphoma. Cancer Cell 20, 728–740 (2011).
CAS PubMed PubMed Central Google Scholar
Castro, A. et al. Elevated neoantigen levels in tumors with somatic mutations in the HLA-A, HLA-B, HLA-C and B2M genes. BMC Med. Genomics 12, 107 (2019).
PubMed PubMed Central Google Scholar
Martínez-Jiménez, F. et al. A compendium of mutational cancer driver genes. Nat. Rev. Cancer 20, 555–572 (2020).
PubMed Google Scholar
Waldhauer, I. & Steinle, A. NK cells and cancer immunosurveillance. Oncogene 27, 5932–5943 (2008).
CAS PubMed Google Scholar
Boichard, A. et al. APOBEC-related mutagenesis and neo-peptide hydrophobicity: implications for response to immunotherapy. OncoImmunology 8, 1550341 (2019).
PubMed Google Scholar
Rousseau, B. et al. The spectrum of benefit from checkpoint blockade in hypermutated tumors. N. Engl. J. Med. 384, 1168–1170 (2021).
PubMed PubMed Central Google Scholar
Litchfield, K. et al. Meta-analysis of tumor- and T cell-intrinsic mechanisms of sensitization to checkpoint inhibition. Cell 184, 596–614.e14 (2021).
CAS PubMed PubMed Central Google Scholar
Dhainaut, M. et al. Spatial CRISPR genomics identifies regulators of the tumor microenvironment. Cell 185, 1223–1239.e20 (2022).
CAS PubMed PubMed Central Google Scholar
Campbell, B. B. et al. Comprehensive analysis of hypermutation in human cancer. Cell 171, 1042–1056.e10 (2017).
CAS PubMed PubMed Central Google Scholar
Lakatos, E. et al. Evolutionary dynamics of neoantigens in growing tumors. Nat. Genet. 52, 1057–1066 (2020).
CAS PubMed PubMed Central Google Scholar
Blais, M.-E., Dong, T. & Rowland-Jones, S. HLA-C as a mediator of natural killer and T-cell activation: spectator or key player? Immunology 133, 1–7 (2011).
CAS PubMed PubMed Central Google Scholar
Jiao, Y. et al. DAXX/ATRX, MEN1, and mTOR pathway genes are frequently altered in pancreatic neuroendocrine tumors. Science 331, 1199–1203 (2011).
CAS PubMed PubMed Central Google Scholar
Nijland, M. et al. HLA dependent immune escape mechanisms in B-cell lymphomas: implications for immune checkpoint inhibitor therapy? Oncoimmunology 6, e1295202 (2017).
PubMed PubMed Central Google Scholar
Sayaman, R. W. et al. Germline genetic contribution to the immune landscape of cancer. Immunity 54, 367–386.e8 (2021).
CAS PubMed PubMed Central Google Scholar
Berglund, A. et al. Methylation of immune synapse genes modulates tumor immunogenicity. J. Clin. Invest. 130, 974–980 (2020).
CAS PubMed PubMed Central Google Scholar
Mistry, J. et al. Pfam: The protein families database in 2021. Nucleic Acids Res. 49, D412–D419 (2021).
CAS PubMed Google Scholar
Davoli, T., Uno, H., Wooten, E. C. & Elledge, S. J. Tumor aneuploidy correlates with markers of immune evasion and with reduced response to immunotherapy. Science 355, eaaf8399 (2017).
PubMed PubMed Central Google Scholar
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
CAS PubMed PubMed Central Google Scholar
McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
CAS PubMed PubMed Central Google Scholar
Cameron, D. L. et al. GRIDSS: sensitive and specific genomic rearrangement detection using positional de Bruijn graph assembly. Genome Res. 27, 2050–2060 (2017).
CAS PubMed PubMed Central Google Scholar
Shale, C. et al. Unscrambling cancer genomes via integrated analysis of structural variation and copy number. Cell Genomics 2, 100112 (2022).
CAS PubMed PubMed Central Google Scholar
Hamanishi, J. et al. Programmed cell death 1 ligand 1 and tumor-infiltrating CD8+ T lymphocytes are prognostic factors of human ovarian cancer. Proc. Natl Acad. Sci. USA 104, 3360–3365 (2007).
CAS PubMed PubMed Central Google Scholar
Frangieh, C. J. et al. Multimodal pooled Perturb-CITE-seq screens in patient models define mechanisms of cancer immune evasion. Nat. Genet. 53, 332–341 (2021).
CAS PubMed PubMed Central Google Scholar
Griffin, G. K. et al. Epigenetic silencing by SETDB1 suppresses tumour intrinsic immunogenicity. Nature 595, 309–314 (2021).
CAS PubMed PubMed Central Google Scholar
Dale, R. K., Pedersen, B. S. & Quinlan, A. R. Pybedtools: a flexible Python library for manipulating genomic datasets and annotations. Bioinforma 27, 3423–3424 (2011).
CAS Google Scholar
Gartner, J. J. et al. A machine learning model for ranking candidate HLA class I neoantigens based on known neoepitopes from multiple human tumor types. Nat. Cancer 2, 563–574 (2021).
CAS PubMed PubMed Central Google Scholar
Wells, D. K. et al. Key parameters of tumor epitope immunogenicity revealed through a consortium approach improve neoantigen prediction. Cell 183, 818–834.e13 (2020).
CAS PubMed PubMed Central Google Scholar
Virtanen, P. et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods 17, 261–272 (2020).
CAS PubMed PubMed Central Google Scholar

Download references

Acknowledgements

This publication and the underlying study have been made possible partly on the basis of the data that the Hartwig Medical Foundation and the Center of Personalised Cancer Treatment have made available to the study. We thank A. Van Hoeck and A. G. Perez for critical reading of the manuscript and their valuable input. We also thank A. Van Hoeck for assistance in the dataset collection and L. Nyguen for help in the mutational signature extraction. We thank H. W. Lee for technical assistance in the TRACERx dataset processing. The present study makes use of data generated by the TRAcking Non-small Cell Lung Cancer Evolution Through Therapy (TRACERx) Consortium and provided by the UCL Cancer Institute and the Francis Crick Institute. The TRACERx study is sponsored by University College London (UCL), funded by Cancer Research UK and coordinated through the Cancer Research UK and UCL Cancer Trials Centre. We also thank N. McGranahan, O. Pich and C. Puttick for sharing the original HLA-I typing and LOHHLA calls used in their original publication. Finally, we thank W. Mulder, M. Penning and H. Merkens for discussions on LILAC’s orthogonal validation.

Author information

Francisco Martínez-Jiménez
Present address: Vall d’Hebron Institute of Oncology, Barcelona, Spain
These authors contributed equally: Francisco Martínez-Jiménez, Peter Priestley.

Authors and Affiliations

Center for Molecular Medicine and Oncode Institute, University Medical Center Utrecht, Utrecht, the Netherlands
Francisco Martínez-Jiménez & Edwin Cuppen
Hartwig Medical Foundation, Amsterdam, the Netherlands
Francisco Martínez-Jiménez & Edwin Cuppen
Hartwig Medical Foundation Australia, Sydney, New South Wales, Australia
Peter Priestley, Charles Shale & Jonathan Baber
GenDx, Utrecht, the Netherlands
Erik Rozemuller

Authors

Francisco Martínez-Jiménez
View author publications
You can also search for this author in PubMed Google Scholar
Peter Priestley
View author publications
You can also search for this author in PubMed Google Scholar
Charles Shale
View author publications
You can also search for this author in PubMed Google Scholar
Jonathan Baber
View author publications
You can also search for this author in PubMed Google Scholar
Erik Rozemuller
View author publications
You can also search for this author in PubMed Google Scholar
Edwin Cuppen
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

F.M.J., E.C. and P.P. conceived the study. F.M.J., P.P., J.B. and C.S. provided the methodology. P.P., J.B. and C.S. provided the software. F.M.J., P.P. and E.R. validated the study. F.M.J. and P.P. performed the formal analysis. F.M.J. and P.P. carried out the investigations. E.C. and P.P. provided the resources. F.M.J., P.P., C.S. and E.R. curated the data. F.M.J. wrote the original draft of the MS. E.C. and P.P. reviewed and edited the MS. F.M.J. visualized the study. F.M.J. and E.C. supervised the study. F.M.J. and E.C. administered the project. E.C. acquired funding.

Corresponding authors

Correspondence to Francisco Martínez-Jiménez or Edwin Cuppen.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Genetics thanks Jimmy Van den Eynden, Teresa Davoliand and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 LILAC’s validation.

a) HLA-I typing tumor and germline agreement in PCAWG. Venn diagrams representing the overlap of germline samples with perfect 2-field HLA-I allotype match in b) Hartwig and c) PCAWG. d) Left, copy number of minor and major alleles of HLA-A, HLA-B and HLA-C in Hartwig dataset. Right, proportion of samples with integer minor and major copy number (copy number <=0.3 or >=0.7) of HLA-I genes (orange) compared to the average of 1,000 randomly selected genes (gray) in the Hartwig cohort (N=4,439). e) analogous for PCAWG samples (N=1,880). Box-plots: center line, median; box limits, first and third quartiles; whiskers, lowest/highest data points at first quartile minus/plus 1.5× IQR. N, number of samples.

Extended Data Fig. 2 GIE in primary and metastatic tumors.

a) Workflow of the sample processing pipeline used in this study in Hartwig (left) and PCAWG (right). Each rectangle represents a processing step. The resulting number of selected samples for this study are displayed at the bottom. Sample-exclusion criteria for LILAC is described in Supp. Note 1. b) number of metastatic (Hartwig) samples across cancer types that lack sufficient representation in the primary (PCAWG) dataset. c) Analogous representation for primary (PCAWG) samples. Vertical dashed lines (N>= 15 samples) represent the threshold of samples to consider a cohort as sufficiently populated.

Extended Data Fig. 3 Positive selection of HLA-I genes.

a) Needle plots representing the pan-cancer distribution of somatic mutations along the HLA-A, HLA-B and HLA-C protein sequences in the primary (PCAWG) dataset. Mutations are colored according to the consequence type. Rectangles represent the Pfam domains. b) Correlation of LOH of HLA-I rates (y-axis) with the mean genome-wide LOH rates (x-axis) across the metastatic (left) and primary (right) datasets. Each dot represents a cancer type. Horizontal lines represent genome-wide LOH standard deviation across samples from each cancer type. The regression and 95% confidence intervals of the linear regression are represented as a solid line and the adjacent shaded area, respectively. Confidence intervals are calculated using 1,000 bootstraps. P-values are obtained from a two-sided test of no-correlation. c) Distribution of non-focal LOH events along the autosomes in the primary kidney chromophobe cancer cohort. X-ticks represent the chromosomal starting position. Dashed horizontal lines represent the expected mean after randomization. Vertical dashed lines highlight the HLA-I genomic location. CDS pos, coding sequence position. Std., standard deviation.

Extended Data Fig. 4 GIE association with cancer genomic features.

a) Heatmap displaying the association of genomic features with GIE frequency across cancer types. Significant associations that can not be explained by higher background mutation rate are highlighted by a red border line. Significant associations found in >2% of the GIE simulations are highlighted by a black border line. b) Comparison of the TMB between samples bearing GIE alterations and non-GIE samples in diffuse large B-cell lymphoma (DLBCL). Left boxplot, using observed GIE events. Right using simulated GIE events. c) Similar comparison for neoepitope burden in skin melanoma. d) Comparison of the APOBEC mutational exposure between samples bearing simulated GIE alterations and wild-type (no simulated GIE) in breast cancer in one randomly selected simulation. e) Analogous for ultraviolet light (UV) associated double base substitutions (DBSs) in skin melanoma and f) for platinum treatment associated DBSs in non-small cell lung cancer (NSCLC). g) Comparison of immune infiltration estimates from Davoli et al.³⁷ between samples bearing simulated GIE alterations and wild-type (no simulated GIE) in colorectal cancer. Boxplots: center line, median; First section out from the centerline contains 50% of the data. The next sections contain half the remaining data until we are at the outlier level. Each level out is shaded lighter. N, number of samples. P-values of the boxplots are calculated using a two-sided Mann–Whitney U test. One of the 100 simulations was randomly selected for all the simulated GIE boxplots. SBS, single base substitution.

Extended Data Fig. 5 Immune evasion mechanisms and TMB.

a) Analogous to Fig. 7a, but restricted to (from left to right) non-small cell lung cancer, breast cancer and colorectal cancer patients. Top panels represent the total number of bucket-assigned (white bars with a black contouring line) and GIE-positive (pink bars with pink contouring lines) samples across the ten evenly arranged TMB buckets. bottom panel, representation of observed (pink) and simulated (gray) GIE frequency across these buckets. For the observed GIE values, average (represented as pink dots) and standard deviation (std, vertical error bars and shaded pink area) values are computed using 1,000 bootstraps from the total number of cancer type samples classified into each bucket (from the top panel). For the simulated GIE values, average (gray triangle) and standard deviation (std, vertical bars and shaded gray area) values are computed from 100 GIE simulations using the total number of cancer type samples assigned into each bucket. b) Analogous to panel a) but using the entire cohort (N=6,319) and twenty evenly arranged buckets as well as clonal TMB (left) and subclonal TMB (right) as baseline. c) Equivalent to panel a) but using the entire cohort (N=6,319) and twenty evenly arranged buckets and grouping GIE events according to the assigned GIE pathway. Label number represents the assigned immune escape pathway (from Fig. 1a) d) Similar to panel c) but using the number of predicted neoepitopes as baseline for the buckets. Bottom labels, number of estimated neoantigens as a relative percentage (1% and 5%) of the number of predicted neoepitopes in the bucket. N, number of samples. Muts/Mb, mutations per megabase. ICI, immune checkpoint inhibitor.

Supplementary information

Supplementary Information

Supplementary Tables 1 and 2 and Data 1–4 captions, and Supplementary Notes 1–3.

Reporting Summary

Peer Review File

Supplementary Tables 1 and 2

Supplementary Table 1 Pathways and genes involved in genetic immune escape. Table 2 Dataset metadata and cancer-type representation.

Supplementary Data 1

HLA-I typing for Hartwig and PCAWG patients and validation of LILAC.

Supplementary Data 2

Sample-specific GIE annotation and cohort-wise GIE frequency.

Supplementary Data 3

Positive selection in HLA-I and non-HLA-I genes.

Supplementary Data 4

Tumor genomics and clinical features and GIE association.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Martínez-Jiménez, F., Priestley, P., Shale, C. et al. Genetic immune escape landscape in primary and metastatic cancer. Nat Genet 55, 820–831 (2023). https://doi.org/10.1038/s41588-023-01367-1

Download citation

Received: 28 June 2022
Accepted: 10 March 2023
Published: 10 May 2023
Issue Date: May 2023
DOI: https://doi.org/10.1038/s41588-023-01367-1

This article is cited by

Optimized whole-genome sequencing workflow for tumor diagnostics in routine pathology practice
- Kris G. Samsom
- Linda J. W. Bosch
- Kim Monkhorst
Nature Protocols (2024)
Breaking the performance ceiling for neoantigen immunogenicity prediction
- Hugh O’Brien
- Max Salm
- Sergio A. Quezada
Nature Cancer (2023)
Pan-cancer whole-genome comparison of primary and metastatic solid tumours
- Francisco Martínez-Jiménez
- Ali Movasati
- Arne Van Hoeck
Nature (2023)
A systematic analysis of immune escape signals in the primary and metastatic tumor genome
- Jimmy Van den Eynden
Nature Genetics (2023)
Genetic differences between primary and metastatic cancer: a pan-cancer whole-genome comparison study
- Lei Zhong
- Zhipeng Zhao
- Xiaonan Zhang
Signal Transduction and Targeted Therapy (2023)