# Neoantigen-directed immune escape in lung cancer evolution

## Abstract

The interplay between an evolving cancer and a dynamic immune microenvironment remains unclear. Here we analyse 258 regions from 88 early-stage, untreated non-small-cell lung cancers using RNA sequencing and histopathology-assessed tumour-infiltrating lymphocyte estimates. Immune infiltration varied both between and within tumours, with different mechanisms of neoantigen presentation dysfunction enriched in distinct immune microenvironments. Sparsely infiltrated tumours exhibited a waning of neoantigen editing during tumour evolution, indicative of historical immune editing, or copy-number loss of previously clonal neoantigens. Immune-infiltrated tumour regions exhibited ongoing immunoediting, with either loss of heterozygosity in human leukocyte antigens or depletion of expressed neoantigens. We identified promoter hypermethylation of genes that contain neoantigenic mutations as an epigenetic mechanism of immunoediting. Our results suggest that the immune microenvironment exerts a strong selection pressure in early-stage, untreated non-small-cell lung cancers that produces multiple routes to immune evasion, which are clinically relevant and forecast poor disease-free survival.

## Main

Anti-tumour immune responses require the functional presentation of tumour antigens and a microenvironment that is replete with competent immune effectors1,2. However, the extent to which an active immune system sculpts the evolution of the tumour genome in untreated early-stage tumours has not been well characterized. Although associations between immune infiltration and tumour clonal diversity have previously been observed in some contexts3,4, whether the immune system acts as a dominant selective force in early-stage untreated cancer is unclear. Both genetic and transcriptomic heterogeneity might confound conclusions drawn from a single tumour sample, leading to inaccurate interpretations of mechanisms of immune evasion.

To determine immune infiltration in untreated non-small-cell lung cancer (NSCLC), assess how this infiltration varies between and within tumours, and characterize immune-evasion mechanisms and their associations with clinical outcome, we integrated 164 RNA-sequencing (RNA-seq) samples from 64 tumours and 234 tumour-infiltrating lymphocyte (TIL) histopathology estimates (hereafter, pathology TIL estimates) from 83 tumours to produce a combined cohort of 258 tumour regions from 88 prospectively acquired tumours within the ‘Tracking Non-Small-Cell Lung Cancer Evolution through Therapy’ (TRACERx) 100 cohort5. We explored how selection pressures from a diverse tumour microenvironment affect neoantigen presentation, as well as tumour-intrinsic mechanisms that lead to immune escape and their respective effects on clinical outcomes.

## Heterogeneity of immune infiltration

We benchmarked published in silico immune deconvolution tools (Methods) to estimate immune infiltration in the multi-region NSCLC TRACERx RNA-seq cohort. Compared to other transcriptomic approaches6,7,8,9,10, the Danaher et al.11 immune signature optimally estimated immune infiltrates in NSCLC (Extended Data Fig. 1).

Using this approach, we estimated RNA-seq-derived infiltrating immune cell populations for the 164 tumour regions from 64 patients in the TRACERx 100 cohort5, for which there was RNA of sufficient quality (Extended Data Fig. 2a, b, Supplementary Table 1).

A wide range in the extent of immune infiltration was observed between and within histologies (Extended Data Fig. 3), as well as between separate regions from the same tumour. Unsupervised hierarchical clustering revealed two distinct immune clusters for each histology, which corresponded to high and low levels of immune infiltration. Individual tumour regions were stratified as having either high or low levels of immune infiltrate (Fig. 1).

Consistent with our clustering approach, tumour regions with high levels of immune infiltration contained higher pathology TIL estimates as compared to regions with low levels of immune infiltration (P = 3 × 10−5) (Extended Data Fig. 4a). Owing to the strong correlation that we observed between RNA-seq-derived immune estimates and pathology TIL estimates (Extended Data Fig. 1e), we also used pathology TIL estimates to group tumour regions for which RNA-seq data were not available (Extended Data Fig. 4b, c, Methods). The predicted abundance of myeloid-derived suppressor cells and tumour-associated M2 macrophages12 negatively correlated with the immune activating-cell subsets (Extended Data Fig. 4d, e), which indicates that immunosuppressive cells may influence the immune microenvironment. Eleven per cent of tumour regions—mostly of lung adenocarcinoma—had pathology TIL estimates that were not reflected in the immune cluster to which they were assigned by RNA-seq, which potentially reflects a heterogeneity of sampling due to spatial variation in the mirrored tissue samples used to score TILs and extract RNA.

Overall, 63 patients had tumours with uniformly low (38 tumours (43%)) or high (25 tumours (28%)) levels of immune infiltration, and 25 patients had tumours with disparate levels of immune infiltration between tumour regions (28%) (Extended Data Fig. 4c). We also found that intratumour heterogeneity confounded genomic and transcriptomic biomarkers that are used for the prediction of response to immune-checkpoint blockade. For example, the TIDE12 classifier was heterogeneous in 17 out of 42 tumours (Extended Data Fig. 5a), and heterogeneously infiltrated tumours tended to exhibit a heterogeneous TIDE signature (P = 0.05) (Extended Data Fig. 5a). Likewise, a transcriptomic signature predicting innate resistance to PD-1 immune checkpoint blockade (IPRES)13 and an IFN-signalling score14 were also heterogeneous (Extended Data Fig. 5b–d).
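The tumour-level grouping described above can be sketched as follows. This is a minimal illustration that assumes each region has already been labelled `"high"` or `"low"`; the function name and the tumour identifiers are hypothetical and not taken from the study.

```python
def classify_tumour(region_labels):
    """Summarise per-region immune labels into a tumour-level class.

    region_labels: iterable of "high" / "low" strings, one per tumour region.
    Returns "high" or "low" when all regions agree, otherwise "heterogeneous".
    """
    labels = set(region_labels)
    if not labels:
        raise ValueError("at least one region is required")
    unexpected = labels - {"high", "low"}
    if unexpected:
        raise ValueError("unexpected labels: " + ", ".join(sorted(unexpected)))
    if len(labels) == 1:
        return labels.pop()
    return "heterogeneous"


# Hypothetical tumours, for illustration only.
example_tumours = {
    "tumour_A": ["low", "low", "low"],
    "tumour_B": ["high", "high"],
    "tumour_C": ["high", "low", "high"],
}
for name, regions in example_tumours.items():
    print(name, classify_tumour(regions))
```

A tumour sampled at a single region can never be called heterogeneous under this rule, which is one reason multi-region sampling matters for this kind of classification.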

In a recent prospective study, high mutation burden (>10 mutations per megabase) was found to be associated with an improved response to immunotherapy15. Twelve out of fifty-seven (21%) NSCLCs with a high mutation burden had at least one tumour region that contained a low mutation burden (Extended Data Fig. 5e). Heterogeneously infiltrated tumours were also more likely to exhibit a heterogeneous tumour mutation burden (P = 7 × 10−4) (Extended Data Fig. 5f). Among tumours with a heterogeneous tumour mutation burden, the regions with a low mutation burden had significantly lower tumour purity than regions with a high mutation burden, which indicates that it is important to consider tumour stromal content as a confounding factor to mutation burden assessment (paired t-test, P = 0.04) (Extended Data Fig. 4f).
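As a rough sketch of how a heterogeneous mutation burden could be flagged, the snippet below applies the threshold quoted above (more than 10 mutations per megabase) to each region. The function names and input values are hypothetical, and the text does not specify how the tumour-level burden was derived, so only the per-region comparison is shown.

```python
TMB_HIGH_THRESHOLD = 10.0  # mutations per megabase, threshold quoted in the text


def tmb_status(mutations_per_mb):
    """Label a single region as "high" or "low" mutation burden."""
    return "high" if mutations_per_mb > TMB_HIGH_THRESHOLD else "low"


def tmb_is_heterogeneous(region_tmbs):
    """Return True if regions of one tumour fall on both sides of the threshold."""
    statuses = {tmb_status(value) for value in region_tmbs}
    return len(statuses) > 1


# Hypothetical per-region values (mutations per megabase).
print(tmb_is_heterogeneous([14.2, 11.8, 7.5]))
print(tmb_is_heterogeneous([15.0, 12.1]))
```

Because low tumour purity can depress the apparent burden of a region, a region flagged as low by this rule is not necessarily biologically distinct from its neighbours.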

## Immune infiltration and tumour evolution

We calculated a distance measure in genomic and immune space for all pairwise combinations of tumour regions from the same tumour to explore the relationship between tumour genomic features and the immune microenvironment (Methods). We observed a significant correlation between the two pairwise distance measures (Fig. 2a, lung adenocarcinoma, P = 3.5 × 10−4; lung squamous cell carcinoma, P = 2 × 10−3). We observed a similar relationship when comparing the pairwise immune and copy-number alteration distances, which reached statistical significance among the lung adenocarcinoma cohort (Extended Data Fig. 6a). These results support an interplay between the immune and cancer genomic landscapes, and indicate that tumour regions that are distant in genomic space have distinct immune microenvironments.

To further explore this interplay, we considered the relationship between the clonal structure of each tumour region and its immune infiltrate. We compared CD8+ T cell infiltration estimated from RNA-seq data to within-region subclonal diversity (Shannon diversity; see Methods). A significant negative correlation was observed in lung adenocarcinoma but not squamous cell carcinoma; regions with high CD8+ T cell infiltration had lower subclonal diversity (lung adenocarcinoma, P = 0.035, ρ = −0.22; lung squamous cell carcinoma, P = 0.91, ρ = −0.02) (Extended Data Fig. 6b, c). Lung adenocarcinoma regions from tumours with uniformly low levels of immune infiltration exhibited greater subclonal diversity when compared to regions from tumours with high or heterogeneous levels of immune infiltration (Fig. 2b, c; lung adenocarcinoma, P = 0.01). Consistent with previous findings3, when pathology TIL estimates (which did not correlate with tumour purity, Extended Data Fig. 6d) were used to stratify patients, we again observed a reduction in tumour diversity in regions from tumours with a high or heterogeneous level of TILs present (Extended Data Fig. 6e, P = 0.02).
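For readers unfamiliar with the diversity measure, a minimal sketch of the Shannon index is given below. It assumes that the input is a list of non-negative subclone proportions for one region and that the natural logarithm is used; how the proportions were derived in the study is described in its Methods and is not reproduced here.

```python
import math


def shannon_diversity(proportions):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over subclone proportions.

    Proportions are renormalised to sum to 1; zero entries contribute nothing.
    """
    total = sum(proportions)
    if total <= 0:
        raise ValueError("proportions must sum to a positive value")
    diversity = 0.0
    for value in proportions:
        if value < 0:
            raise ValueError("proportions must be non-negative")
        if value > 0:
            p = value / total
            diversity -= p * math.log(p)
    return diversity


# A region dominated by one subclone is less diverse than an evenly mixed one.
print(shannon_diversity([0.9, 0.05, 0.05]))
print(shannon_diversity([1 / 3, 1 / 3, 1 / 3]))
```

A region made up of a single clone scores 0, and the index reaches its maximum, ln(k), when k subclones are equally abundant.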

## Immune editing and the immune microenvironment

If T-cell-mediated immune surveillance of neoantigens influences the evolution of cancer genomes, one would predict neoantigen depletion and/or disruption to the antigen-presenting machinery in tumours as mechanisms of immune escape16. Conceivably, neoantigen depletion may occur at the DNA level (through events such as copy-number loss), at the RNA level (through the suppression of transcripts that contain neoantigens), at the epigenetic level (through the silencing of genomic segments that encode neoantigens) or through post-translational mechanisms. Alternatively, tumour subclones that express neoantigens may be preferentially eliminated by the immune system through purifying selection.

We predicted neoantigens and their clonal status to investigate neoantigen depletion. Neoantigens were defined as peptides with a predicted binding affinity < 500 nM or a rank percentage score < 2%; strong neoantigens were defined as those with a predicted binding affinity < 50 nM or a rank percentage score < 0.5%17 (Methods). We used a published method to quantify the extent of DNA immunoediting in each tumour sample16. This method compares the observed and expected numbers of neoantigens present in a tumour, such that a score of <1 suggests DNA immunoediting has occurred. We found no significant difference between the observed-to-expected ratio of neoantigens in lung adenocarcinomas and lung squamous cell carcinomas (Extended Data Fig. 6f). But we note that this score depends on the number of heterozygous human leukocyte antigen (HLA) alleles in the patient germline (P = 2.1 × 10−5, ρ = 0.43) (Extended Data Fig. 6g): if fewer unique HLA types are present, there will be fewer observed neoantigens. To mitigate this bias, we investigated whether this measure changed during tumour evolution, from clonal to subclonal events within each tumour. Among tumours with a low level of infiltrate, a decrease in DNA immunoediting (that is, an increase in ratio of observed-to-expected neoantigens) was noted from clonal to subclonal mutations (P = 8.8 × 10−3, paired t-test) (Fig. 2d), which possibly reflects an ancestral immune-active microenvironment that has subsequently become cold.
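The two binding thresholds given above can be written as a small classifier. This is a sketch only: the function and argument names are hypothetical, and the affinity and rank values are assumed to come from an upstream peptide-HLA binding predictor that is not shown.

```python
def classify_peptide(affinity_nm, rank_pct):
    """Classify a mutant peptide from its predicted HLA binding.

    affinity_nm: predicted binding affinity in nM (lower means tighter binding).
    rank_pct:    rank percentage score (lower means stronger relative binding).
    Thresholds follow the definitions quoted in the text.
    """
    if affinity_nm < 50 or rank_pct < 0.5:
        return "strong neoantigen"
    if affinity_nm < 500 or rank_pct < 2:
        return "neoantigen"
    return "non-neoantigen"


# Hypothetical predictions for three peptides.
for affinity, rank in [(35.0, 1.2), (420.0, 3.0), (900.0, 4.5)]:
    print(affinity, rank, classify_peptide(affinity, rank))
```

Strong neoantigens are a subset of neoantigens, so any analysis of "all neoantigens" would pool the first two labels.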

Neoantigen depletion may also occur at the DNA level through copy-number loss18 (Fig. 2e). Across this cohort, 43 out of 88 tumours showed evidence for at least one historically clonal neoantigen being subclonally lost owing to subclonal copy-number events (Fig. 2f, range 0–42% of clonal neoantigens).

To determine whether the elimination of historically clonal neoantigens through copy-number loss occurred more frequently than expected by chance, we compared neoantigens with non-neoantigenic, non-synonymous mutations. In tumour regions with low levels of immune infiltration, non-synonymous mutations that were predicted to be neoantigens were more likely to occur on genomic segments that were subject to subclonal copy-number loss, as compared to their non-neoantigenic counterparts (P = 1.2 × 10−4) (Fig. 2g). In tumours with low levels of infiltration, reduced immunoediting of subclones was observed more frequently in tumours without evidence of neoantigen copy-number loss, which supports the role of copy-number loss as a mechanism of subclonal immunoediting (P = 0.88 versus P = 2.2 × 10−4) (Fig. 2h).

## Repression of neoantigenic transcripts

We next determined whether each neoantigen was identified at the transcript level to investigate alternative neoantigen-depletion mechanisms. Overall, only 33% of clonal neoantigens were expressed in every tumour region within a given tumour; we observed a significantly lower proportion of ubiquitously expressed clonal neoantigens among tumours with high (median, 29%) or heterogeneous (median, 35%) levels of immune infiltration, as compared to tumours with low levels of immune infiltration (median, 41%) (Fig. 3a, b) (P = 0.01). To further investigate whether the downregulation of neoantigenic transcripts reflects likely immune selection pressure, we considered whether neoantigens were preferentially subject to reduction in expression as compared to non-neoantigens, an approach that is not confounded by the influence of tumour purity.

At the cohort level, tumours with intact HLA alleles exhibited a significant reduction of expressed neoantigens, as compared to non-neoantigenic, non-synonymous mutations (Fig. 3c, P = 0.01). Moreover, when tumours were divided by immune classification, only tumours with intact HLA alleles and with high or heterogeneous levels of immune infiltration showed a depletion of expressed neoantigens, which suggests that subclones in immune-infiltrated tumours may be selected for either through HLA loss-of-heterozygosity (LOH) or through repression of neoantigen expression permitting immune evasion (Fig. 3c). Among tumours without HLA LOH and with high levels of immune infiltration, diminished neoantigen expression was more pronounced when we used the more-stringent definition of strongly binding neoantigens (Extended Data Fig. 6h).

Next we investigated whether there was evidence for negative selection of clones that contain expressed neoantigens. We observed an enrichment of neoantigens in genes that were expressed at a low level in the tumour sample (≤1 transcript per million), as compared to non-synonymous non-neoantigens (P = 5.5 × 10−10, odds ratio = 1.3) (Extended Data Fig. 6i). This enrichment was more pronounced when we considered only strong neoantigens (P = 6.8 × 10−13, odds ratio = 1.4) (Extended Data Fig. 6i). Neoantigens identified in TRACERx were also less likely to occur in genes that were consistently expressed across 1,019 NSCLC samples from The Cancer Genome Atlas (TCGA) (Fig. 3d), as compared to non-synonymous predicted non-neoantigens (Methods). In genes that are consistently expressed in TCGA samples, the generation of neoantigenic mutations in the TRACERx cohort was most reduced among tumours with high levels of immune infiltration (P = 2.1 × 10−4, odds ratio = 0.77); however, we also observe this reduction among tumours with low (P = 1.8 × 10−3, odds ratio = 0.82) or heterogeneous (P = 0.04, odds ratio = 0.88) levels of infiltration. This is consistent with tumours with low levels of immune infiltration having once been subject to the selective pressures of an active immune microenvironment (Fig. 3d).
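The odds ratios reported above compare neoantigens with non-neoantigenic, non-synonymous mutations across two categories of gene. A minimal sketch of that calculation from a 2 × 2 table is shown below; the counts are invented purely to illustrate the arithmetic and are not data from the study, and the significance test used to obtain the P values is not reproduced.

```python
def odds_ratio(neo_in, neo_out, non_neo_in, non_neo_out):
    """Odds ratio for a 2 x 2 table.

    neo_in / neo_out:         neoantigens inside / outside the gene category.
    non_neo_in / non_neo_out: non-neoantigenic mutations inside / outside it.
    Values above 1 mean neoantigens are enriched in the category.
    """
    if neo_out == 0 or non_neo_in == 0:
        raise ZeroDivisionError("odds ratio is undefined when a denominator cell is zero")
    return (neo_in * non_neo_out) / (neo_out * non_neo_in)


# Illustrative counts only: category = genes expressed at <= 1 transcript per million.
print(odds_ratio(neo_in=130, neo_out=1000, non_neo_in=100, non_neo_out=1000))
```

With these toy counts the ratio is above 1, the same direction as the enrichment of neoantigens in lowly expressed genes; a ratio below 1, as reported for consistently expressed genes, indicates depletion.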

To investigate the methylation status of neoantigens as a mechanism of neoantigen repression, we performed multi-region reduced-representation bisulfite sequencing (RRBS) (Methods) on 79 out of the 164 tumour samples (28 out of 64 patients) in the TRACERx RNA-seq cohort, in addition to adjacent normal tissue (Fig. 3e, Supplementary Table 2). Among genes that carry neoantigenic mutations, an 11.4-fold increase in promoter hypermethylation was observed for genes that were not expressed as compared to genes that were expressed (χ2 test, P = 1.6 × 10−4) (Fig. 3f). To determine whether the observed downregulation was neoantigen-specific, we compared promoter hypermethylation between all neoantigens and the same genes without the neoantigen, in samples matched for purity and ploidy. Overall, non-expressed neoantigens were more likely to exhibit promoter hypermethylation than the same genes without a neoantigen (χ2 test, P = 0.045, odds ratio = 2.3) (Fig. 3g, Supplementary Table 3). Among expressed neoantigens, we observed no difference in promoter hypermethylation state as compared to non-mutated samples matched for purity and ploidy (χ2 test, P = 0.67, odds ratio = 0.48) (Fig. 3h, Supplementary Table 4). These findings suggest that immune pressures may select for promoter hypermethylation and neoantigen silencing in evolving subclones.

## Pervasive disruption to antigen presentation

Defects in antigen presentation that interrupt tumour antigen recognition19,20 may provide another immune-evasion mechanism. To understand the importance of these avenues of immune escape in the treatment-naive setting, we mapped their occurrence region by region (Fig. 4a, b, Extended Data Fig. 7a, Methods).

Disruptions to antigen presentation—through HLA LOH or through mutations that affect the stability of the major histocompatibility complex (MHC), the HLA enhanceosome and peptide generation—were frequently observed in both lung histologies (56% of lung adenocarcinomas and 78% of lung squamous cell carcinomas). HLA LOH and alterations that affect other components of the antigen presentation machinery (Methods), including B2M mutations, had a tendency to be mutually exclusive (lung adenocarcinoma, P = 9.3 × 10−4; lung squamous cell carcinoma, P = 0.015), which supports the notion that dysfunction in antigen presentation can act as a potent immune-escape mechanism. Moreover, consistent with previous findings20, highly infiltrated regions of lung adenocarcinoma tumours were prone to HLA LOH (P = 3 × 10−3, odds ratio = 2.4).

Loss of HLA-C in particular may result in the loss of the killer-cell immunoglobulin-like receptor signal that inhibits elimination through the activity of natural killer cells21. There are two groups of HLA-C alleles (HLA-C1 and HLA-C2), each of which has a different specificity to the killer-cell immunoglobulin-like receptor22. Thus, tumour cells from heterozygous patients (HLA-C1 and HLA-C2) would be expected to be targeted for elimination mediated by natural killer cells following loss of either of the HLA-C alleles (Extended Data Fig. 7b). Conversely, tumour cells from patients with homozygous HLA-C alleles may avoid elimination mediated by natural killer cells. Consistent with this, infiltration by natural killer cells was increased in tumour regions that were heterozygous for HLA-C1 and HLA-C2 following HLA-C LOH (P = 6.2 × 10−7) (Extended Data Fig. 7c). Increased infiltration by natural killer cells was not observed among tumours that had not undergone HLA-C LOH (P = 0.12), which suggests that this change in the tumour microenvironment results from loss of the HLA-C inhibitory ‘self’ signal.

## Immune-evasion capacity is prognostic in NSCLC

Finally, we examined whether combining estimates of immune infiltration with immune-escape mechanisms that could be quantified at the patient level could provide prognostic power. Tumours were classified as exhibiting a low capacity for immune evasion (uniformly high levels of immune infiltration or no evidence of immune evasion (DNA immunoediting score > 1 and no disruption of antigen presentation)) or high capacity for immune evasion (at least one region with low levels of immune infiltration as well as defective antigen presentation or DNA immunoediting score < 1). Patients whose tumours had a low immune-evasion capacity had significantly longer disease-free survival (P = 9.0 × 10−4) (Fig. 4c).
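The classification rule in the preceding paragraph can be expressed as a short decision function. This is a sketch under stated assumptions: the names are hypothetical, region labels are taken to be `"high"` or `"low"`, and a DNA immunoediting score of exactly 1 (which the text leaves unspecified) is treated here as showing no editing.

```python
def immune_evasion_capacity(region_infiltration, editing_score, presentation_disrupted):
    """Return "low" or "high" immune-evasion capacity for one tumour.

    region_infiltration:    list of "high" / "low" labels, one per region.
    editing_score:          observed-to-expected neoantigen ratio (< 1 suggests editing).
    presentation_disrupted: True if antigen presentation is disrupted (for example HLA LOH).
    """
    if not region_infiltration:
        raise ValueError("at least one region is required")
    uniformly_high = all(label == "high" for label in region_infiltration)
    # Assumption: a score of exactly 1 is not counted as evidence of editing.
    evasion_evidence = presentation_disrupted or editing_score < 1
    if uniformly_high or not evasion_evidence:
        return "low"
    # Remaining tumours have at least one low region plus evidence of evasion.
    return "high"


# Hypothetical tumours.
print(immune_evasion_capacity(["high", "high"], 0.8, True))
print(immune_evasion_capacity(["low", "high"], 1.2, False))
print(immune_evasion_capacity(["low", "low"], 0.7, False))
```

Under this reading the two classes are exhaustive: every tumour that is not uniformly infiltrated and shows any evidence of evasion falls into the high-capacity group.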

To explore these results in the context of previous findings that relate to the importance of clonal neoantigens23, we also grouped patients into two groups (high versus low clonal neoantigen burden), defined as upper quartile of cohort versus lower quartiles, as done previously23. Consistent with previous results, a high clonal neoantigen burden was associated with increased disease-free survival among lung adenocarcinoma (P = 0.022) and also among lung squamous cell carcinoma (P = 0.025) (Extended Data Fig. 8a). The association that we observed between clonal neoantigens and disease-free survival was not dependent on the specific threshold used (Extended Data Fig. 8b), and clonal neoantigen burden remained significant in a multivariate model that included stage, histology, age, gender, pack years and adjuvant therapy (P = 0.02). Conversely, no significant relationship between subclonal neoantigen burden or total neoantigen burden and disease-free survival was observed (Extended Data Fig. 8c–e).
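The upper-quartile split used for clonal neoantigen burden can be sketched as below. The patient identifiers and counts are hypothetical, and the quantile convention (the default of Python's `statistics.quantiles`) is an assumption, as the text does not state which convention was used.

```python
import statistics


def split_by_upper_quartile(burden_by_patient):
    """Label each patient "high" if their burden exceeds the cohort's third quartile."""
    values = sorted(burden_by_patient.values())
    if len(values) < 2:
        raise ValueError("at least two patients are required")
    third_quartile = statistics.quantiles(values, n=4)[2]
    return {
        patient: ("high" if burden > third_quartile else "low")
        for patient, burden in burden_by_patient.items()
    }


# Hypothetical clonal neoantigen counts for eight patients.
cohort = {"P1": 10, "P2": 20, "P3": 30, "P4": 40, "P5": 50, "P6": 60, "P7": 70, "P8": 80}
groups = split_by_upper_quartile(cohort)
print(sorted(p for p, g in groups.items() if g == "high"))
```

Because the cut-off is defined relative to the cohort, the same tumour could change group if the cohort composition changed, which is why the text also checks that the association does not depend on the specific threshold.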

However, when we focused on tumours with a low clonal neoantigen load, the immune-evasion capacity of a tumour was still prognostic (P = 5.3 × 10−3), which indicates that in the absence of immune evasion even a low clonal neoantigen burden may be sufficient to elicit an effective immune response (Fig. 4d).

Furthermore, we observed that tumours with either a high clonal neoantigen load or low immune-evasion capacity exhibited significantly increased disease-free survival (P = 4.9 × 10−6) (Fig. 4e). This association remained significant in a multivariate model that included stage, histology, age, gender, pack years and adjuvant therapy (P < 0.001) (Extended Data Fig. 8f). These data suggest that it is important to consider the many facets of the interaction between the tumour and immune microenvironment when predicting clinical outcomes.

## Discussion

To capture the complex interplay between cancer genomic evolution and anti-tumour immunity in lung cancer, we integrated genomic, transcriptomic, epigenomic, and pathologic data to define how tumours are sculpted by the immune microenvironment, and to investigate which mechanisms of immune escape influence tumour evolution, as well as the relationships of active tumour–immune interactions with clinical outcome. Our results suggest the immune microenvironment is highly variable between and within patients and can be markedly different between distinct regions of the same tumour—nearly a third of the tumours we studied exhibited diverse levels of immune infiltration.

Our results provide evidence that tumour evolution is shaped through immunoediting mechanisms that affect either antigen presentation or neoantigenic mutations themselves, at both the DNA and RNA level.

Consistent with disruption to the antigen presentation machinery being subject to strong positive selection24, we found HLA LOH tended towards being mutually exclusive with other forms of disruption of antigen presentation such as mutations that affect MHC stability, the HLA enhanceosome or peptide generation. At the DNA level, sparsely infiltrated tumours were enriched for the elimination of clonal neoantigens, which indicates the importance of chromosomal instability in driving loss of neoantigens.

As a whole, tumours exhibited fewer neoantigens in expressed genes than expected, which potentially reflects an historical purifying selection of neoantigens. Tumours with high levels of immune infiltration and intact HLA alleles also displayed transcriptomic neoantigen depletion, which suggests that these tumours may evade immune elimination either through HLA LOH or by suppressing neoantigen expression—but seldom both. We identify promoter hypermethylation as a potential mechanism of transcriptomic neoantigen depletion that leads to the preferential repression of genes that contain neoantigenic mutations. Promoter hypermethylation affected neoantigen expression in about 23% of the neoantigens we studied, which indicates that additional mechanisms of neoantigen transcription repression await elucidation.

Through the combination of immune microenvironment and tumour immune-escape factors, we defined an estimate of the immune-evasion capacity of each tumour; high immune-evasion capacity was associated with poorer clinical outcome. As TRACERx is a prospective study of early-stage untreated NSCLC, it will be important to validate these findings in the extended longitudinal cohort as the study matures.

The observation that clonal neoantigens can be subject to copy-number loss and transcript repression, even in untreated early-stage disease, may have important implications for predicting response and resistance to immune-checkpoint blockade. It has previously been shown that, after checkpoint blockade therapy, clonal neoantigens have been eliminated from relapse samples, which reshapes the T cell receptor repertoire of these samples18. Clonal neoantigens that occur in expressed genes that are required for lung cancer cell fitness may make ideal targets for vaccine or adoptive cell therapies.

The extent to which neoantigen transcript depletion is dynamic in response to therapy and tumour dissemination, and whether such phenomena may be harnessed to improve responses to immunotherapy, is unknown. Epigenetic immune evasion supports the potential for epigenetic modulatory agents, in combination with immunotherapy, to restore or improve tumour immunogenicity25. One possibility is that epigenetic repression of a neoantigen in a lung-cancer-expressed gene may result in a fitness cost. This may shed light on recent phenomena observed in tumours with acquired resistance to checkpoint inhibitor therapy that respond a second time upon re-challenge with the same drug26.

Our results suggest that early-stage, untreated NSCLCs are frequently characterized by multiple independent mechanisms of immune evasion within individual tumours, which emphasizes the strong selection pressures that the immune system imposes upon tumour evolution. They also suggest that the beneficial role of successful immune surveillance, and the diversity of immune-evasion mechanisms, should be considered and harnessed in immunotherapeutic interventions.

## Methods

The experiments were not randomized and investigators were not blinded to allocation during experiments and outcome assessment.

### Patients and samples

The cohort evaluated in this study comes from the first 100 patients prospectively analysed by the lung TRACERx study (https://clinicaltrials.gov/ct2/show/NCT01888601, approved by an independent research ethics committee, 13/LO/1546) and mirrors the previously described prospective 100 patient cohort5.

Informed consent for entry into the TRACERx study was mandatory and obtained from every patient. There were 68 male and 32 female patients with NSCLC in the TRACERx study, with a median age of 68. The cohort is predominantly early-stage: Ia (26), Ib (36), IIa (13), IIb (11), IIIa (13) and IIIb (1). Seventy-two had no adjuvant treatment and 28 had adjuvant therapy. All patients were assigned a study identity number that was known to the patient. These were subsequently converted to linked study identities such that the patients could not identify themselves in study publications. All human samples (tissue and blood) were linked to the study identity number and barcoded such that they were anonymized and tracked on a centralized database, which was overseen by the study sponsor only.

### TRACERx 100 RNA-seq

RNA was extracted from the TRACERx 100 cohort using a modification of the AllPrep kit (Qiagen), as previously described5. RNA integrity was assessed by TapeStation (Agilent Technologies). Samples that had an RNA integrity score ≥ 5 were sent to the Oxford Genomics Centre for whole-RNA (RiboZero depleted) paired-end sequencing. The ribo-depleted fraction was selected from the total RNA provided, before conversion to cDNA. Second-strand cDNA synthesis incorporated dUTP. The cDNA was end-repaired, A-tailed and adaptor-ligated. Before amplification, samples underwent uridine digestion. The prepared libraries were size-selected, multiplexed and underwent quality control before paired-end sequencing. Reads were 75 base pairs in length. FASTQ data underwent quality control and were aligned to the hg19 genome using STAR28. Transcript quantification was performed using RSEM with default parameters29.

### TRACERx 100 RRBS

RRBS was obtained for approximately half of the NSCLC cohort with RNA-seq data (79 out of 164 tumour regions from 28 out of 64 patients, each with matched normal tissue). The NuGEN ovation RRBS methyl-seq system, adapted by the manufacturer for automation on an Agilent Bravo liquid handling robot, was used to generate sequencing libraries by enzymatically digesting 100 ng of gDNA using MspI, followed by adaptor ligation and the final repair step. Generated libraries were bisulfite-converted using the EpiTect Fast DNA Bisulfite Kit (Qiagen), PCR-amplified for 12 cycles and purified using Agencourt RNAClean XP magnetic beads. Purified libraries were quantified by Qubit double-strand DNA high-sensitivity assay (Invitrogen) and underwent quality control using Agilent Bioanalyzer high-sensitivity DNA Assay (Agilent Technologies). Eight samples were multiplexed per flow cell and sequenced on an Illumina HiSeq2500 system using HiSeq SBS kit v.4 in paired-end 100-bp runs for CRUK0062 and single-end 100-bp runs for the others, yielding on average 150 million raw sequencing reads per sample. Sequencing results were checked with FastQC v.0.11.2 (Babraham Institute, https://www.babraham.ac.uk/), adaptor sequences were trimmed with Trim Galore! v.0.3.7 (a wrapper around Cutadapt30) and NuGEN v.1.0 diversity trimming script (https://github.com/nugentechnologies/NuMetRRBS), and reads were aligned to the UCSC hg19 reference assembly using Bismark v.0.14.430. Read deduplication was carried out using NuDup (pre-release version, March 2015, https://github.com/nugentechnologies/nudup/), leveraging NuGEN’s molecular tagging technology that produces on average 100 million unique reads per sample.

### Statistical information

All statistical tests were performed in R. No statistical methods were used to predetermine sample size. Tests involving correlations were done using ‘cor.test’ with Spearman’s method. Tests involving comparisons of distributions were done using ‘wilcox.test’ or ‘t.test’ using the unpaired option, unless otherwise stated. Hazard ratios and P values were calculated with the ‘survival’ package. For all statistical tests, the number of data points included is plotted or annotated in the corresponding figure.

### Selection of immune-infiltration approach

Previously defined measures of immune infiltration and activity were used to classify the immune microenvironment of all tumours (and tumour regions) for which RNA-seq data were available6,7,8,12,26. The genes used in each one of the immune-estimation approaches were tested to see whether they fit two criteria: (1) having a negative relationship with tumour purity, as genes that define immune subtypes are expressed in infiltrating immune cells8 and (2) not showing a positive correlation with tumour copy number at the gene locus (a positive correlation may indicate that the gene is expressed by the tumour cell, which would confound immune estimates). The proportion of genes in each immune-estimation method that passed these two criteria was compared. Finally, for each method, the immune estimates themselves were compared against independent ground-truth measures (pathology TIL estimates, flow cytometry quantification and T cell receptor abundance). The RNA-seq immune estimation that performed best in the TRACERx cohort was chosen.
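The two gene-level criteria can be illustrated with a small rank-correlation filter. This is a sketch, not the study's implementation: it is written in plain Python rather than R, it uses only the sign of Spearman's ρ (the text does not say whether a significance cut-off was applied), and all names and values are hypothetical.

```python
import math


def _average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = average
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = _average_ranks(x), _average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return covariance / math.sqrt(var_x * var_y)


def gene_passes_criteria(expression, purity, copy_number):
    """Criterion 1: expression falls as purity rises. Criterion 2: no positive
    correlation between expression and tumour copy number at the gene locus."""
    return spearman_rho(expression, purity) < 0 and spearman_rho(expression, copy_number) <= 0


# Hypothetical gene measured across five tumour regions.
expression = [5.0, 4.0, 3.0, 2.0, 1.0]
purity = [0.1, 0.2, 0.3, 0.4, 0.5]
print(gene_passes_criteria(expression, purity, copy_number=[1, 2, 2, 3, 3]))
```

The proportion of signature genes for which such a check returns `True` would then give one way to compare immune-estimation methods, alongside the comparison with independent ground-truth measures.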

### Estimating immune cell populations

#### Estimations based on RNA-seq

The method of Danaher et al.11 was used to estimate immune cell populations for every tumour region for which RNA-seq data were available. The immune cell populations were: CD8+ T cells (CD8), exhausted CD8+ T cells (CD8 exhausted), CD4+ T cells (CD4), regulatory T cells (Treg), helper T cells (TH1), dendritic cells (dendritic), B cells (B cell), mast cells (mast), natural killer cells (NK), natural killer CD56 cells (NK CD56), neutrophils, macrophages, CD45+ cells (CD45), and measures for total T cells (T cells), total TILs (total TIL) and cytotoxic cells (cyto). Because the original paper did not identify any suitable genes for CD4+ T cell population estimation, and we found a poor relationship with ground-truth measures in the TRACERx cohort using the Danaher et al.11 CD4+ T cell estimates, the Davoli et al.6 CD4+ T cell estimates were used instead. The Davoli et al.6 estimate was chosen as, overall, this estimate matched the estimates of Danaher et al.11 closely, and performed nearly as well for the selection criteria.

The Jiang et al.12 immune measures were calculated using the TIDE web interface (http://tide.dfci.harvard.edu/).

#### Pathology TIL estimates

TILs were estimated from pathology slides using internationally established guidelines developed by the International Immuno-Oncology Biomarker Working Group27. In brief, the relative proportion of stromal area to tumour area was determined from the pathology slide of a given tumour region. TILs were reported for the stromal compartment (= per cent stromal TILs). The denominator used to determine the per cent stromal TILs was the area of stromal tissue (that is, the area occupied by mononuclear inflammatory cells over total intratumoral stromal area) rather than the number of stromal cells (that is, the fraction of total stromal nuclei that represent mononuclear inflammatory cell nuclei). This method has been demonstrated to be reproducible among trained pathologists31. An inter-observer concordance analysis was performed, and this demonstrated high reproducibility. The International Immuno-Oncology Biomarker Working Group has developed a freely available training tool to train pathologists for optimal TIL assessment on haematoxylin–eosin slides (www.tilsincancer.org).

#### Flow measurements

Tissue samples were collected and transported in RPMI-1640 (Sigma, cat. no. R0883-500ML). Single-cell suspensions were produced by enzymatic digestion using liberase, with subsequent cellular disaggregation using a Miltenyi gentleMACS Octo Dissociator. Lymphocytes were isolated from single-cell suspension by gradient centrifugation on Ficoll Paque Plus (GE Healthcare, cat. no. 17-1440-03) and stored in liquid nitrogen. Blood samples were collected in BD Vacutainer EDTA blood collection tubes (BD cat. no. 367525). Peripheral blood mononuclear cells were then isolated by gradient centrifugation on Ficoll Paque (GE Healthcare, cat. no. 17-1440-03), and stored in liquid nitrogen.

Fc receptors were blocked with human Fc receptor binding inhibitor (Thermo) before staining. Non-viable cells were stained using the eBioscience Fixable Viability Dye eFluor 780 (Thermo). Cells were stained in BD Brilliant stain buffer (BD cat. no. 563794) with the following monoclonal antibodies: anti-human CD3 (clone SK7, BD cat. no. 565511), anti-human CD4 (clone SK3, BD cat. no. 566003) and anti-human CD8 (clone RPA-T8, BD cat. no. 564804). Data were acquired on a BD Symphony flow cytometer and analysed in FlowJo. Cells were gated for size, single cells, live cells and CD3+CD8+ T cells.

#### T cell receptor abundance

A previously developed quantitative experimental and computational T cell receptor sequencing pipeline32 was used for the high-throughput sequencing of α and β T cell receptor chains. T cell receptor sequencing was performed on whole RNA extracted from multi-region tumour specimens. A distinct feature of this T cell receptor sequencing protocol is the use of a unique molecular identifier that enables correction for PCR and sequencing errors, thereby providing a quantitative and reproducible method of library preparation32,33.

### Classifying tumour regions as having high or low levels of immune infiltration

Tumours were split into either lung adenocarcinoma or lung squamous cell carcinoma. The Danaher et al.11 estimates for all tumour regions from each histological type were clustered together by hierarchical clustering using the ‘ward.D2’ method. The dendrogram was then cut into two. The samples that fell in the portion with higher estimates for levels of immune infiltrate were considered to be tumour regions with high levels of immune infiltration (immune high). Conversely, the samples that fell in the portion with lower estimates for levels of immune infiltrate were considered to be tumour regions with low levels of immune infiltration (immune low). If all tumour regions from a given tumour were classified as immune low, that tumour was designated as uniformly immune low; if all tumour regions from a given tumour were classified as immune high, that tumour was designated as uniformly immune high. If some tumour regions from the same tumour were immune high and others were immune low, the tumour was classified as ‘heterogeneous’.
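The tumour-level rule can be written as a short function over the per-region labels. This is a minimal sketch that assumes the regions have already been labelled by the clustering step; the label strings and the function name are illustrative and not taken from the study code.

```python
def classify_tumour(region_labels):
    """Summarise per-region immune labels ("high" or "low") for one tumour.

    Returns "uniformly immune high", "uniformly immune low" or "heterogeneous".
    The label strings are hypothetical placeholders.
    """
    labels = set(region_labels)
    if not labels:
        raise ValueError("at least one classified region is required")
    if not labels <= {"high", "low"}:
        raise ValueError("labels must be 'high' or 'low'")
    if labels == {"high"}:
        return "uniformly immune high"
    if labels == {"low"}:
        return "uniformly immune low"
    return "heterogeneous"
```

For example, a tumour with regions labelled `["high", "low", "high"]` would be returned as heterogeneous.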

If a tumour region had no RNA-seq data available, it could be included in the analysis using the pathology TIL estimates. A tumour region was classified on the basis of pathology TIL estimates by determining whether the pathology TIL estimate for the tumour region in question was closer to the median of the pathology TIL estimates from the immune-high or immune-low tumour regions with RNA-seq data that had been clustered. The RNA-seq cohort (164 tumour regions from 64 TRACERx patients) was expanded by rescuing tumour regions that lacked accompanying RNA-seq data (Extended Data Fig. 2a) with pathology TIL estimates (234 tumour regions from 83 TRACERx patients) (Extended Data Fig. 4e).
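The nearest-median assignment for regions without RNA-seq data might look as follows. The function name, the label strings and the handling of exact ties (assigned to immune low here) are assumptions; the text does not specify tie handling.

```python
from statistics import median


def classify_by_pathology_til(til_estimate, high_region_tils, low_region_tils):
    """Assign a region to the immune cluster whose median pathology TIL
    estimate is closer to the region's own estimate.

    high_region_tils / low_region_tils: pathology TIL estimates of regions
    that were already clustered as immune high / immune low using RNA-seq.
    Ties are assigned to "low" (an arbitrary choice for this sketch).
    """
    high_median = median(high_region_tils)
    low_median = median(low_region_tils)
    if abs(til_estimate - high_median) < abs(til_estimate - low_median):
        return "high"
    return "low"
```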

### Calculation of IPRES score

The calculation of the IPRES score was performed as previously described13.

### Distance measures

#### Immune distance

The immune distance was determined by taking the Euclidean distance of immune-infiltrate estimates between tumour regions.

#### Genomic distance

The genomic distance was calculated by taking the Euclidean distance of the mutations present between tumour regions. All mutations present in any region from a tumour were turned into a binary matrix, in which the rows were mutations and the columns were tumour regions. This matrix was clustered, and the pairwise distance between any two tumour regions was determined.
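As an illustration of the distance calculation, the sketch below builds the binary presence or absence vectors and takes pairwise Euclidean distances. The data layout (a dictionary from region name to a set of mutation identifiers) and all names are assumptions made for the example.

```python
import math


def binary_mutation_vectors(region_mutations):
    """region_mutations: dict mapping region name -> set of mutation ids.

    Returns one 0/1 vector per region over all mutations seen in any region
    (equivalent to the columns of a mutations x regions binary matrix).
    """
    all_mutations = sorted(set().union(*region_mutations.values()))
    return {
        region: [1 if m in muts else 0 for m in all_mutations]
        for region, muts in region_mutations.items()
    }


def pairwise_euclidean(vectors):
    """Euclidean distance between every pair of regions."""
    regions = sorted(vectors)
    return {
        (a, b): math.dist(vectors[a], vectors[b])
        for i, a in enumerate(regions)
        for b in regions[i + 1:]
    }
```

The same `pairwise_euclidean` helper could be applied to vectors of immune-infiltrate estimates to obtain the immune distance.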

### Calculation of Shannon diversity

For each tumour region, the Shannon diversity was estimated using the command ‘entropy.empirical’ from the ‘entropy’ R package. This was calculated on the basis of the number and prevalence of different tumour subclones found in that region, such that a tumour region that contained only one subclone was assigned a value of 0.

The Shannon diversity score, H, followed the formula $$H=-\sum_{i} p_{i}\log(p_{i})$$, in which $$p_{i}$$ is the probability of the ith clone appearing in the tumour cell population.
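A direct transcription of this formula is given below as a sketch. The natural logarithm is assumed, and the input is taken to be a list of non-negative subclone prevalences (counts or fractions) for one region; neither detail is confirmed by the text.

```python
import math


def shannon_diversity(subclone_prevalences):
    """H = -sum(p_i * log(p_i)) over the subclones present in a region.

    Prevalences are normalised to probabilities; zero entries contribute 0.
    """
    total = sum(subclone_prevalences)
    if total <= 0:
        raise ValueError("prevalences must sum to a positive value")
    h = 0.0
    for x in subclone_prevalences:
        if x > 0:
            p = x / total
            h -= p * math.log(p)
    return h
```

A region containing a single subclone gives H = 0, in agreement with the definition above, and two equally prevalent subclones give H = log 2.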

### Predicted neoantigen binders

Novel 9–11mer peptides that could arise from identified non-silent mutations present in the sample5 were determined. The predicted half-maximal inhibitory concentration binding affinities and rank percentage scores (representing the rank of the predicted affinity compared to a set of 400,000 random natural peptides) were calculated for all peptides that bound to each of the patient’s HLA alleles using netMHCpan-2.8 (refs. 17,34) and netMHC-4.0 (ref. 34). Using established thresholds, predicted binders were considered to be those peptides that had a predicted binding affinity < 500 nM or a rank percentage score < 2% by either tool. Strong predicted binders were those peptides that had a predicted binding affinity < 50 nM or a rank percentage score < 0.5%. Of the 28,489 non-synonymous mutations in this cohort, 24,494 were predicted to encode peptides that were capable of binding to at least one of the patient’s HLA class I alleles (binding affinity < 500 nM or rank percentage score < 2%) and 13,884 were predicted to strongly bind (binding affinity < 50 nM or rank percentage score < 0.5%)17.
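The binder thresholds can be expressed as a small classification function. In this sketch each prediction is an (affinity in nM, rank percentage) pair, one per prediction tool; the function name, the input layout and the returned labels are assumptions made for illustration.

```python
def classify_binder(predictions):
    """Classify one peptide-HLA pair from per-tool predictions.

    predictions: iterable of (affinity_nm, rank_pct) tuples, one per tool.
    A threshold met by either tool is sufficient.
    Returns "strong", "binder" or "non-binder"; strong binders are a
    subset of binders, so "strong" is checked first.
    """
    predictions = list(predictions)
    if any(aff < 50 or rank < 0.5 for aff, rank in predictions):
        return "strong"
    if any(aff < 500 or rank < 2 for aff, rank in predictions):
        return "binder"
    return "non-binder"
```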

When RNA-seq data were available, a neoantigen was considered to be expressed if at least five RNA-seq reads mapped to the mutation position, and at least three contained the mutated base.

### Neoantigen depletion

#### Transcriptional depletion

Transcriptional neoantigen depletion was identified by first dividing tumours into immune classifications and HLA LOH categories (loss or no loss). All non-synonymous mutations were annotated as expressed in the RNA-seq data or not, using the definitions above. Then, a test for enrichment was performed to determine whether non-synonymous mutations that were neoantigens were less likely to be expressed than non-synonymous mutations that were not predicted to be neoantigens.
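To make the input to the enrichment test concrete, the sketch below applies the expression rule defined earlier (at least five reads covering the position, at least three carrying the mutated base) and tabulates mutations into a 2 × 2 table. The tuple layout and function names are hypothetical; in the analysis, one such table would be built for each immune classification and HLA LOH category.

```python
from collections import Counter


def is_expressed(total_reads, mutant_reads):
    """Expression rule: >= 5 RNA-seq reads at the position, >= 3 mutated."""
    return total_reads >= 5 and mutant_reads >= 3


def expression_table(mutations):
    """mutations: iterable of (is_neoantigen, total_reads, mutant_reads).

    Returns [[neo_expressed, neo_not_expressed],
             [non_neo_expressed, non_neo_not_expressed]].
    """
    counts = Counter()
    for is_neo, total_reads, mutant_reads in mutations:
        counts[(bool(is_neo), is_expressed(total_reads, mutant_reads))] += 1
    return [
        [counts[(True, True)], counts[(True, False)]],
        [counts[(False, True)], counts[(False, False)]],
    ]
```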

#### Copy-number depletion

Copy-number neoantigen depletion was identified by first dividing tumours into immune classifications. All non-synonymous mutations were annotated as either in a region of subclonal copy-number loss or not as identified in a previous report5. Then, a test for enrichment was performed to determine whether non-synonymous mutations that were neoantigens were more likely to be in regions of subclonal copy-number loss than were non-synonymous mutations that were not predicted to be neoantigens.

#### Methylation

Neoantigens in genes that are consistently expressed across the TCGA NSCLC cohort were classified into two groups: expressed (in which the mutant is detected in at least 30 reads) and non-expressed (in which no mutant transcript is observed). Of the 375 non-expressed and 883 expressed neoantigens with matched RRBS data, 77 and 406 were unique, respectively (others were duplicates from different regions of the same patient). We down-sampled the expressed neoantigens list to match, as closely as possible, the gene expression and the variant allele frequency distributions that were observed for the non-expressed neoantigens. We then assessed differential methylation as follows: bulk and normal per-CpG methylation rates in promoters (2 kb up- and downstream of the transcription start site) were modelled as beta distributions, B(α + 1, β + 1), in which α represents the observed methylated read counts and β represents the unmethylated read counts. We compute $$P\left(B(\alpha,\beta)_{\rm{tum}} > B(\alpha,\beta)_{\rm{norm}}\right)$$ exactly, using

$$\Pr(\theta_{\rm{tum}} > \theta_{\rm{norm}}) = \sum_{i=0}^{\alpha_{\rm{tum}}-1} \frac{B(\alpha_{\rm{norm}}+i,\ \beta_{\rm{norm}}+\beta_{\rm{tum}})}{(\beta_{\rm{tum}}+i)\,B(1+i,\ \beta_{\rm{tum}})\,B(\alpha_{\rm{norm}},\ \beta_{\rm{norm}})}$$
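The closed-form sum can be evaluated numerically as sketched below. Here B in the sum is read as the beta function, the arguments are the parameters of the two beta distributions (for example, methylated read count + 1 and unmethylated read count + 1), and the tumour α parameter must be a positive integer for the finite sum to apply. The function names are illustrative, and working in log space is an implementation choice of this sketch, not a detail from the text.

```python
from math import exp, lgamma, log


def log_beta(a, b):
    """Natural log of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def prob_tumour_greater(alpha_tum, beta_tum, alpha_norm, beta_norm):
    """Pr(theta_tum > theta_norm) for independent Beta(alpha_tum, beta_tum)
    and Beta(alpha_norm, beta_norm) methylation rates."""
    if alpha_tum < 1 or int(alpha_tum) != alpha_tum:
        raise ValueError("alpha_tum must be a positive integer")
    total = 0.0
    for i in range(int(alpha_tum)):
        # One term of the sum, computed in log space to avoid overflow
        log_term = (
            log_beta(alpha_norm + i, beta_norm + beta_tum)
            - log(beta_tum + i)
            - log_beta(1 + i, beta_tum)
            - log_beta(alpha_norm, beta_norm)
        )
        total += exp(log_term)
    return total
```

As a sanity check, two identical distributions give a probability of 0.5, and a CpG with 18 methylated and 2 unmethylated reads in the tumour against 2 and 18 in the normal, `prob_tumour_greater(19, 3, 3, 19)`, gives a value close to 1.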

We then apply Hochberg family-wise error rate correction, and flag promoters as hypermethylated when ≥3 CpGs are significantly hypermethylated (q < 0.05). Promoter counts are tested in a 2 × 2 contingency table (methylation status versus expression status or mutation status) using a χ2 test.
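The correction and the promoter-level rule can be sketched as follows. The Hochberg step-up adjustment is standard; how the per-CpG P values are derived and which set of tests forms the correction family are not specified in the text, so this example simply adjusts whatever list of P values it is given. The function names are hypothetical.

```python
def hochberg_adjust(p_values):
    """Hochberg step-up adjusted P values, returned in the input order."""
    m = len(p_values)
    # Visit P values from largest to smallest
    order = sorted(range(m), key=lambda k: p_values[k], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    for rank_from_top, idx in enumerate(order):
        # The largest P value is multiplied by 1, the next largest by 2, ...
        candidate = p_values[idx] * (rank_from_top + 1)
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


def promoter_hypermethylated(cpg_p_values, alpha=0.05, min_cpgs=3):
    """Flag a promoter when at least min_cpgs CpGs remain significant
    after adjustment (family = the list passed in; an assumption)."""
    adjusted = hochberg_adjust(cpg_p_values)
    return sum(q < alpha for q in adjusted) >= min_cpgs
```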

#### Depletion of neoantigens in expressed genes

Every mutated gene was first classified as consistently expressed or not consistently expressed in TCGA NSCLC. Genes were considered consistently expressed if they were expressed at ≥1 transcript per million in 95% of the TCGA samples for each histology. The expression classification of non-synonymous mutations that result in neoantigens was compared against the expression classification of non-synonymous mutations that do not result in neoantigens. A contingency table comparing neoantigenic to non-neoantigenic mutations, and whether they occurred in expressed or non-expressed genes, was obtained. Finally, using this contingency table, the odds ratio detailing the extent of neoantigen depletion in expressed genes was calculated using a Fisher’s exact test. An odds ratio of <1 indicates that neoantigenic mutations were less likely to occur in consistently expressed genes than non-synonymous, non-neoantigenic mutations.
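The contingency-table step can be illustrated with the sketch below. It computes the sample (cross-product) odds ratio and a one-sided hypergeometric P value for depletion; statistical packages may instead report a conditional maximum-likelihood odds ratio and a two-sided P value, so the numbers need not match the published analysis exactly. The table layout and function names are assumptions.

```python
from math import comb


def sample_odds_ratio(table):
    """table = [[a, b], [c, d]] with rows neoantigenic / non-neoantigenic
    and columns expressed gene / non-expressed gene. Returns (a*d)/(b*c)."""
    (a, b), (c, d) = table
    if b * c == 0:
        return float("inf") if a * d > 0 else float("nan")
    return (a * d) / (b * c)


def fisher_depletion_p(table):
    """One-sided Fisher's exact P value: probability of observing `a` or
    fewer neoantigenic mutations in expressed genes, given fixed margins."""
    (a, b), (c, d) = table
    row1 = a + b          # neoantigenic mutations
    col1 = a + c          # mutations in expressed genes
    n = a + b + c + d
    lowest = max(0, col1 - (c + d))  # smallest value `a` can take
    numerator = sum(
        comb(row1, k) * comb(n - row1, col1 - k) for k in range(lowest, a + 1)
    )
    return numerator / comb(n, col1)
```

For the toy table `[[1, 3], [3, 1]]`, the odds ratio is 1/9, which is below 1 and therefore in the direction of depletion.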

### Identifying tumour regions with HLA LOH

Tumour regions that contain an HLA LOH event were identified using a previously described method20.

### Immune-evasion alterations

Antigen-presentation-pathway genes were compiled from a previous report35 and covered the HLA enhanceosome, peptide generation, chaperones and the MHC complex itself. Immune-evasion alterations were defined as disruptive events (non-silent mutations or copy-number loss defined relative to ploidy5) in the following genes: CIITA, IRF1, PSME1, PSME2, PSME3, ERAP1, ERAP2, HSPA (also known as PSMA7), HSPC (also known as HSPBP1), TAP1, TAP2, TAPBP, CALR, CNX (also known as CANX), PDIA3 and B2M.

### TCGA data

RNA-seq data were downloaded from the TCGA data portal. For each lung adenocarcinoma and lung squamous cell carcinoma, all available ‘Level_3’ gene-level data were obtained.

### Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this paper.

### Code availability

All code used for analyses was written in R version 3.3.1 and is available at: https://bitbucket.org/snippets/raerose01/5enKR5.

## Data availability

Sequence data used during the study have been deposited at the European Genome–phenome Archive (EGA), which is hosted by The European Bioinformatics Institute (EBI) and the Centre for Genomic Regulation (CRG), under the accession codes EGAS00001003458 (RNA-seq) and EGAS00001003484 (RRBS). Further information about EGA can be found at https://ega-archive.org. Any other relevant data can be obtained from the corresponding authors upon reasonable request.

## References

1. Galon, J. et al. Type, density, and location of immune cells within human colorectal tumors predict clinical outcome. Science 313, 1960–1964 (2006).
2. Charoentong, P. et al. Pan-cancer immunogenomic analyses reveal genotype-immunophenotype relationships and predictors of response to checkpoint blockade. Cell Reports 18, 248–262 (2017).
3. Zhang, A. W. et al. Interfaces of malignant and immunologic clonal dynamics in ovarian cancer. Cell 173, 1755–1769.e1722 (2018).
4. Milo, I. et al. The immune system profoundly restricts intratumor genetic heterogeneity. Sci. Immunol. 3, eaat1435 (2018).
5. Jamal-Hanjani, M. et al. Tracking the evolution of non-small-cell lung cancer. N. Engl. J. Med. 376, 2109–2121 (2017).
6. Davoli, T., Uno, H., Wooten, E. C. & Elledge, S. J. Tumor aneuploidy correlates with markers of immune evasion and with reduced response to immunotherapy. Science 355, eaaf8399 (2017).
7. Racle, J., de Jonge, K., Baumgaertner, P., Speiser, D. E. & Gfeller, D. Simultaneous enumeration of cancer and immune cell types from bulk tumor gene expression data. eLife 6, e26476 (2017).
8. Li, B. et al. Comprehensive analyses of tumor immunity: implications for cancer immunotherapy. Genome Biol. 17, 174 (2016).
9. Newman, A. M. et al. Robust enumeration of cell subsets from tissue expression profiles. Nat. Methods 12, 453–457 (2015).
10. Aran, D., Hu, Z. & Butte, A. J. xCell: digitally portraying the tissue cellular heterogeneity landscape. Genome Biol. 18, 220 (2017).
11. Danaher, P. et al. Gene expression markers of tumor infiltrating leukocytes. J. Immunother. Cancer 5, 18 (2017).
12. Jiang, P. et al. Signatures of T cell dysfunction and exclusion predict cancer immunotherapy response. Nat. Med. 24, 1550–1558 (2018).
13. Hugo, W. et al. Genomic and transcriptomic features of response to anti-PD-1 therapy in metastatic melanoma. Cell 165, 35–44 (2016).
14. Ayers, M. et al. IFN-γ-related mRNA profile predicts clinical response to PD-1 blockade. J. Clin. Invest. 127, 2930–2940 (2017).
15. Hellmann, M. D. et al. Tumor mutational burden and efficacy of nivolumab monotherapy and in combination with ipilimumab in small-cell lung cancer. Cancer Cell 33, 853–861.e854 (2018).
16. Rooney, M. S., Shukla, S. A., Wu, C. J., Getz, G. & Hacohen, N. Molecular and genetic properties of tumors associated with local immune cytolytic activity. Cell 160, 48–61 (2015).
17. Hoof, I. et al. NetMHCpan, a method for MHC class I binding prediction beyond humans. Immunogenetics 61, 1–13 (2009).
18. Anagnostou, V. et al. Evolution of neoantigen landscape during immune checkpoint blockade in non-small cell lung cancer. Cancer Discov. 7, 264–276 (2017).
19. Tran, E. et al. T-cell transfer therapy targeting mutant KRAS in cancer. N. Engl. J. Med. 375, 2255–2262 (2016).
20. McGranahan, N. et al. Allele-specific HLA loss and immune escape in lung cancer evolution. Cell 171, 1259–1271.e1211 (2017).
21. Thielens, A., Vivier, E. & Romagné, F. NK cell MHC class I specific receptors (KIR): from biology to clinical intervention. Curr. Opin. Immunol. 24, 239–245 (2012).
22. Fischer, J. C. et al. Relevance of C1 and C2 epitopes for hemopoietic stem cell transplantation: role for sequential acquisition of HLA-C-specific inhibitory killer Ig-like receptor. J. Immunol. 178, 3918–3923 (2007).
23. McGranahan, N. et al. Clonal neoantigens elicit T cell immunoreactivity and sensitivity to immune checkpoint blockade. Science 351, 1463–1469 (2016).
24. Garrido, F., Ruiz-Cabello, F. & Aptsiauri, N. Rejection versus escape: the tumor MHC dilemma. Cancer Immunol. Immunother. 66, 259–271 (2017).
25. Dunn, J. & Rao, S. Epigenetics and immunotherapy: the current state of play. Mol. Immunol. 87, 227–239 (2017).
26. Bernard-Tessier, A. et al. Outcomes of long-term responders to anti-programmed death 1 and anti-programmed death ligand 1 when being rechallenged with the same anti-programmed death 1 and anti-programmed death ligand 1 at progression. Eur. J. Cancer 101, 160–164 (2018).
27. Hendry, S. et al. Assessing tumor-infiltrating lymphocytes in solid tumors: a practical review for pathologists and proposal for a standardized method from the international immuno-oncology biomarkers working group: part 2: TILs in melanoma, gastrointestinal tract carcinomas, non-small cell lung carcinoma and mesothelioma, endometrial and ovarian carcinomas, squamous cell carcinoma of the head and neck, genitourinary carcinomas, and primary brain tumors. Adv. Anat. Pathol. 24, 311–335 (2017).
28. Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
29. Li, B. & Dewey, C. N. RSEM: accurate transcript quantification from RNA-seq data with or without a reference genome. BMC Bioinformatics 12, 323 (2011).
30.
31. Denkert, C. et al. Standardized evaluation of tumor-infiltrating lymphocytes in breast cancer: results of the ring studies of the international immuno-oncology biomarker working group. Mod. Pathol. 29, 1155–1164 (2016).
32. Oakes, T. et al. Quantitative characterization of the T cell receptor repertoire of naïve and memory subsets using an integrated experimental and computational pipeline which is robust, economical, and versatile. Front. Immunol. 8, 1267 (2017).
33. Best, K., Oakes, T., Heather, J. M., Shawe-Taylor, J. & Chain, B. Computational analysis of stochastic heterogeneity in PCR amplification efficiency revealed by single molecule barcoding. Sci. Rep. 5, 14629 (2015).
34. Andreatta, M. & Nielsen, M. Gapped sequence alignment using artificial neural networks: application to the MHC class I system. Bioinformatics 32, 511–517 (2016).
35. Arrieta, V. A. et al. The possibility of cancer immune editing in gliomas. A critical review. OncoImmunology 7, e1445458 (2018).

## Acknowledgements

We thank the members of the TRACERx consortium for participating in this study. C.S. is Royal Society Napier Research Professor. C.S. is supported by the Francis Crick Institute, which receives its core funding from the Medical Research Council (FC001169), the Wellcome Trust (FC001169), and Cancer Research UK (FC001169). C.S. is funded by Cancer Research UK (TRACERx and CRUK Cancer Immunotherapy Catalyst Network), the CRUK Lung Cancer Centre of Excellence, Stand Up 2 Cancer (SU2C), the Rosetrees and Stoneygate Trusts, NovoNordisk Foundation (ID 16584), the Breast Cancer Research Foundation (BCRF), the European Research Council Consolidator Grant (FP7-THESEUS-617844), European Commission ITN (FP7-PloidyNet-607722), Chromavision (this project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 665233), National Institute for Health Research (NIHR), the University College London Hospitals Biomedical Research Centre (BRC) and the Cancer Research UK University College London Experimental Cancer Medicine Centre. N.M. is a Sir Henry Dale Fellow, jointly funded by the Wellcome Trust and the Royal Society (211179/Z/18/Z), and also receives funding from CRUK Lung Cancer Centre of Excellence, Rosetrees and the University College London Hospitals Biomedical Research Centre (BRC) and the Cancer Research UK University College London Experimental Cancer Medicine Centre. E.L.C., J.D. and P.V.L. are supported by the Francis Crick Institute, which receives its core funding from Cancer Research UK (FC001202), the UK Medical Research Council (FC001202), and the Wellcome Trust (FC001202). P.V.L. is a Winton Group Leader in recognition of the Winton Charitable Foundation’s support towards the establishment of The Francis Crick Institute. J.D. is a postdoctoral fellow of the Research Foundation - Flanders (FWO). S.A.Q. is funded by a CRUK Senior Cancer Research Fellowship (C36463/A22246), a CRUK Biotherapeutic Program Grant (C36463/A20764), the Cancer Immunotherapy Accelerator Award (CITA-CRUK) (C33499/A20265) and Rosetrees. M.T. received funding from the People Programme Marie Curie Actions (FP7/2007-2013/WHRI-ACADEMY-608765) and the Danish Council for Strategic Research (1309-00006B). The TRACERx study (Clinicaltrials.gov no: NCT01888601) is sponsored by University College London (UCL/12/0279) and has been approved by an independent Research Ethics Committee (13/LO/1546). TRACERx is funded by Cancer Research UK (C11496/A17786) and coordinated through the Cancer Research UK and UCL Cancer Trials Centre. For the RRBS methylation data, we acknowledge technical support from the CRUK–UCL Centre-funded Genomics and Genome Engineering Core Facility of the UCL Cancer Institute and grant support from the NIHR BRC (BRC275/CN/SB/101330). The views expressed are those of the author(s) and not necessarily those of the NIHR or the Department of Health and Social Care. The results published here are based in part upon data generated by The Cancer Genome Atlas pilot project established by the NCI and the National Human Genome Research Institute. The data were retrieved through database of Genotypes and Phenotypes (dbGaP) authorization (accession number phs000178.v9.p8). Information about TCGA and the constituent investigators and institutions of the TCGA research network can be found at http://cancergenome.nih.gov/.

### Reviewer information

Nature thanks Lynette Sholl and the other anonymous reviewer(s) for their contribution to the peer review of this work.

## Author information

### Contributions

R.R. created the bioinformatics analysis pipeline and wrote the manuscript. R.S., M.A.B., D.A.M., C.T.H. and T.L. jointly analysed pathology TIL estimates. J.L.R., J.Y.H. and E.G. performed flow cytometry experiments for validating immune signatures. K.J. performed T cell receptor sequencing experiments for validating immune signatures. S.V. performed sample preparation and RNA extraction. E.L.C., J.D., A.F., G.A.W. and M.T. generated and analysed RRBS data. E.L.C. and J.D. performed DNA methylation analyses and neoantigen methylation analyses, under supervision of S.B. and P.V.L. N.J.B. gave advice on immune signatures, conducted analyses of multiregion sequencing exome data and reviewed the manuscript. M.J.-H. designed study protocols and helped to analyse patient clinical characteristics. Z.S., S.L. and M.D.H. helped to direct avenues of bioinformatics and pathology TIL analysis. B.C., J.H. and S.A.Q. provided data analysis support and supervision. N.M. and C.S. jointly supervised the study and helped to write the manuscript.

### Corresponding authors

Correspondence to Nicholas McGranahan or Charles Swanton.

## Ethics declarations

### Competing interests

C.S. receives grant support from Pfizer, AstraZeneca, BMS and Ventana. C.S. has consulted for Boehringer Ingelheim, Eli Lilly, Servier, Novartis, Roche-Genentech, GlaxoSmithKline, Pfizer, BMS, Celgene, AstraZeneca, Illumina and the Sarah Cannon Research Institute. C.S. is a shareholder of Apogen Biotechnologies, Epic Bioscience, GRAIL, and has stock options in and is co-founder of Achilles Therapeutics. S.A.Q. is a co-founder of Achilles Therapeutics. R.R., N.M. and G.A.W. have stock options in and have consulted for Achilles Therapeutics. M.A.B. has consulted for Achilles Therapeutics. M.D.H. receives research funding from Bristol-Myers Squibb; is a paid consultant to Merck, Bristol-Myers Squibb, AstraZeneca, Genentech/Roche, Janssen, Nektar, Syndax, Mirati, and Shattuck Labs; received travel support/honoraria from AstraZeneca and BMS; and a patent has been filed by the Memorial Sloan Kettering Cancer Center related to the use of tumour mutation burden to predict response to immunotherapy (PCT/US2015/062208), which has received licensing fees from PGDx.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## Extended data figures and tables

### Extended Data Fig. 1 Determination of robust immune infiltration approach.

a–d, The expression of the genes used in each of the immune-signature definitions is correlated against tumour purity (a, b) and tumour copy number (c, d). Random genes (n = 1,000), or genes in the TIMER8 (n = 575), EPIC7 (n = 98), Danaher et al.11 (n = 60), Rooney et al.16 (n = 100) and Davoli et al.6 (n = 75) immune-signature definitions, are plotted. The Spearman’s ρ value of the correlation is plotted for the immune genes that comprise each signature definition, coloured by the P value of the association. The comparisons are performed separately for lung adenocarcinoma and lung squamous cell carcinoma. The median ρ value for the immune-signature set is indicated by the red line. The fraction of genes with an expression value that is significantly correlated with purity or tumour copy number is shown, and compared to a set of random genes. For every immune signature we considered, there was significant enrichment of genes with expression that was negatively correlated with tumour purity (as compared to the random selection of genes) and a significant enrichment of genes with expression that was positively correlated with tumour copy number (as compared to the random selection of genes). e, Scatter plots show the Spearman correlation between pathology TIL estimates and CD8+ T cells as measured by the Danaher et al.11 approach (n = 140), between flow CD8+ T cell estimates and Danaher et al.11 CD8+ T cells (n = 36), T cell receptor sequencing abundance and Danaher et al.11 CD8+ T cells (n = 72), normalized live-flow CD8+ T cell estimates and Danaher et al.11 CD8+ T cells (n = 39) and normalized live-flow CD8+ T cell to Treg ratio and Danaher et al.11 CD8+ cell to Treg ratio estimates (n = 38). Blue dots indicate regions from a lung adenocarcinoma tumour; red dots indicate regions from a lung squamous cell carcinoma tumour. Spearman ρ values and P values are given for all tumour regions (black), lung adenocarcinoma tumour regions (blue) and lung squamous cell carcinoma tumour regions (red). f, A scatter plot showing the correlation between pathology TIL estimates and CD8+ estimates from each of the immune-infiltration methods is shown (n = 140 tumour regions). Lung adenocarcinoma tumour regions are shown in blue; lung squamous cell carcinoma tumour regions are shown in red. Below, the top six correlations between pathology TIL estimates and an immune-cell subset are shown for each method. Blue boxes indicate positive correlation; red boxes indicate negative correlation. P values were corrected for false discovery rate. g, Example of CD8 T cell quantification in a representative TRACERx TIL sample. TILs were isolated from the tumour regions of surgical resections, as previously described5, and cryopreserved. Thawed samples were stained with a custom-designed 20-marker antibody panel to measure T cell activation, dysfunction and differentiation by flow cytometry.

### Extended Data Fig. 2 TRACERx 100 sample selection and patient characteristics.

a, CONSORT diagram showing the selection of TRACERx 100 patients for RNA-seq and/or pathology TIL estimate analysis. b, Patient characteristics for the TRACERx 100 cohort are shown. Patient characteristics can also be found in Supplementary Table 1.

### Extended Data Fig. 3 Difference in immune infiltration by histology.

The distribution of Danaher et al.11 estimated CD8+ T cell infiltrate is displayed for lung adenocarcinomas (adeno.) and lung squamous cell carcinomas (squam.) (n = 145 tumour regions). The minimum and maximum are indicated by the extreme points of box plot; the median is indicated by a thick horizontal line; and the first and third quartiles are indicated by box edges. A two-sided Wilcoxon rank-sum test is used.

### Extended Data Fig. 4 Incorporating tumour regions that lack RNA-seq data by using pathology TIL estimates.

a, The difference in pathology TIL estimates is shown by immune clusters derived from RNA-seq (n = 139). b, All regional pathology TIL estimates are plotted for each tumour sample (lung adenocarcinoma, n = 121; lung squamous cell carcinoma, n = 90). If a region also had RNA-seq information available, the immune cluster to which that region belonged is also shown as immune high (red) or immune low (blue). Immune clusters for tumour regions without RNA-seq data are annotated as grey. The immune class for the patients is also provided as high (red), low (blue), heterogeneous (orange) or unknown (grey). For all box plots, the minimum and maximum are indicated by the extreme points of the plot; the median is indicated by a thick horizontal line; and first and third quartiles are indicated by box edges. A two-sided Wilcoxon rank-sum test is used for comparisons. c, The number of patients in each immune classification is plotted as inferred from using RNA-seq data alone, or by also incorporating pathology TIL estimates. d, A correlation matrix of the Danaher et al.11 immune-cell estimates with the Jiang et al.12 immunosuppressive cell subsets is shown (Spearman’s test). Positive correlations are indicated in blue and negative correlations are indicated in red. Correlations are significant unless marked with a black X. e, The Jiang et al.12 immune-infiltration estimates are shown for tumour-associated macrophage M2 (TAM M2) and myeloid-derived suppressor cells (MDSC), split by immune cluster (n = 163). f, Tumour purity is shown for the regions of high and low tumour mutational burden (TMB) for every tumour with heterogeneous mutation burdens (n = 12). A two-sided paired t-test is used for comparison. No corrections were made for multiple comparisons.

### Extended Data Fig. 5 Heterogeneity of biomarkers that predict responses to checkpoint blockade.

a, The TIDE gene-signature score of each tumour region is shown per patient, for patients with more than one region available (n = 39). Using a threshold shown by the dashed line, patients are classified as having high TIDE (light blue), low TIDE (dark blue) or heterogeneous TIDE (orange) scores. b, The IPRES gene-signature score of each tumour region is shown per patient, for patients with more than one region available (n = 39). Using the previously defined threshold13 (shown by the dashed line), patients are classified as having low IPRES (light blue), high IPRES (dark blue) or heterogeneous IPRES (orange) scores. c, The expanded Ayers et al.14 IFN signature is shown for each tumour region per patient, for patients with more than one region available (n = 38). For a–c, the immune classification of the patient is also given. d, The greatest difference in the expanded Ayers et al.14 IFN signature between tumour regions from the same tumour is plotted, according to whether or not the tumour has heterogeneous levels of immune infiltration (n = 38). A two-sided Wilcoxon rank-sum test is used for comparison. e, Tumour mutational burden of each tumour region is shown per patient (n = 93). Using a threshold of ten mutations per megabase (dashed line), patients are classified as having a low (light blue), high (dark blue) or heterogeneous tumour mutational burden (orange). For all box plots, the minimum and maximum are indicated by extreme points of the plot; the median is indicated by a thick horizontal line; and the first and third quartiles are indicated by box edges. f, A summary of the tumour histology, immune classification, tumour mutational burden status, TIDE category and IPRES category is shown for each tumour (n = 93). There is an enrichment for heterogeneously immune infiltrated tumours to have heterogeneous tumour mutational burden status and heterogeneous TIDE scores (Fisher’s exact test). No corrections were made for multiple comparisons.

### Extended Data Fig. 6 Relationship between immune infiltration and tumour-region diversity.

a, The pairwise copy number (cn) and immune distances between every two tumour regions from the same patient are compared for lung adenocarcinoma (n = 91) and lung squamous cell carcinoma (n = 60). b, c, For each tumour region, the CD8+ T cell score is plotted against the Shannon diversity score. Lung adenocarcinomas (n = 89) (b) and lung squamous cell carcinomas (n = 50) (c) are shown. d, The correlation between pathology TIL estimates and tumour purity is shown for lung adenocarcinoma (n = 120) (blue) and lung squamous cell carcinoma (n = 90) (red) regions. No relationship for either histology is observed. Spearman’s test is used to determine the relationship. e, The Shannon diversity score per lung adenocarcinoma tumour region (n = 137) is plotted by immune classification, as determined solely by pathology TIL estimates. A two-sided Wilcoxon rank-sum test is used for comparison. f, A comparison of the observed:expected immunoediting score between lung adenocarcinoma and lung squamous cell carcinoma tumours (n = 92) is shown. A two-sided Wilcoxon rank-sum test is used for comparison. g, The observed:expected immunoediting score is shown by number of unique HLAs present in the tumour (patients heterozygous at HLA-A, HLA-B and HLA-C will have six unique HLA alleles) (n = 90). For all box plots, the minimum and maximum are indicated by the extreme points of the plot; the median is indicated by a thick horizontal line; and the first and third quartiles are indicated by box edges. h, The odds ratio and 95% confidence interval of transcriptional neoantigen depletion is shown for strongly binding neoantigens, calculated with Fisher’s exact test. Values <1 indicate that putative neoantigens are less likely to be expressed, as compared to non-synonymous mutations that are not putative neoantigens. Tumours are broken down by HLA LOH status and their immune classification. i, The enrichment for neoantigens and strongly binding neoantigens to occur in non-expressed genes (as compared to non-synonymous non-neoantigens) is shown, calculated with Fisher’s exact test. No corrections were made for multiple comparisons.
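To make the reading of the odds ratio in panel h concrete, the following is a minimal sketch of an odds ratio and a two-sided Fisher's exact test on a 2×2 table of mutation counts. The table layout (neoantigen or non-neoantigen by expressed or not expressed), the function names and the counts are assumptions chosen for illustration; they are not the study's data or code, and the 95% confidence interval is not computed here.

```python
from math import comb


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Sample odds ratio for the table [[a, b], [c, d]].

    Assumed layout: a = neoantigens expressed, b = neoantigens not expressed,
    c = non-neoantigens expressed, d = non-neoantigens not expressed.
    """
    if b == 0 or c == 0:
        raise ValueError("odds ratio is undefined when b or c is zero")
    return (a * d) / (b * c)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact P value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # A small relative tolerance guards against floating-point ties.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Hypothetical counts: 30 of 100 neoantigens expressed,
# 50 of 100 non-neoantigen non-synonymous mutations expressed.
depletion_or = odds_ratio(30, 70, 50, 50)
depletion_p = fisher_exact_two_sided(30, 70, 50, 50)
```

With these illustrative counts the odds ratio is below 1, which under the layout assumed above corresponds to neoantigens being less likely to be expressed than other non-synonymous mutations.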

### Extended Data Fig. 7 Components of immune-evasion mechanisms in NSCLC.

a, Each of the potential immune-evasion mechanisms explored in Fig. 4 is shown, divided into its component genes. Patients are split according to their immune-evasion capacity status. Copy-number losses are shown in blue, and mutations are shown in green. b, A schematic of how LOH of the HLA-C locus in HLA-C1 and HLA-C2 heterozygous tumours may lead to destruction mediated by natural killer cells is shown. c, The level of natural killer cell infiltration divided by the total TIL estimate (using the method of Danaher et al.11) is shown for tumour regions with (n = 45) and without (n = 90) HLA-C LOH, according to their HLA-C1 and HLA-C2 heterozygosity status. A two-sided Wilcoxon rank-sum test is used for comparison.

### Extended Data Fig. 8 Relationship between clonal neoantigen burden, immune infiltration and patient prognosis.

a, c, e, Kaplan–Meier curves are shown for lung adenocarcinoma and lung squamous cell carcinoma. The curves are split on the basis of the upper quartile of clonal neoantigen burden (a), on the upper quartile of subclonal neoantigen burden (c) and on the upper quartile of total neoantigen burden (e). For all survival curves, the number of patients in each group for every time point is indicated below the time point, and significance is determined using a log-rank test. b, d, The hazard ratio is shown for each threshold value of clonal neoantigen (b) and subclonal neoantigen (d) load, which indicates that a high clonal neoantigen burden remains significantly prognostic across a wide range of thresholds. Significant associations are indicated in red; non-significant associations are plotted in black. f, Clonal neoantigen load and immune-infiltration classification are incorporated in a multivariate analysis: this combination is more significant as a predictor of prognosis than either metric individually. Other tumour and clinical characteristics are also controlled for in the multivariate analysis. Hazard ratios of each variable with a 95% confidence interval are shown on the horizontal axis. Significance is calculated using a Cox proportional hazards model. All statistical tests were two-sided.

## Supplementary information

### Supplementary Information

This file contains the full list of names of The TRACERx consortium members and affiliations.

### Supplementary Tables

This file contains Supplementary Tables S1-S4 and their guide.

Rosenthal, R., Cadieux, E.L., Salgado, R. et al. Neoantigen-directed immune escape in lung cancer evolution. Nature 567, 479–485 (2019). https://doi.org/10.1038/s41586-019-1032-7
