Neoantigen quality predicts immunoediting in survivors of pancreatic cancer

Cancer immunoediting1 is a hallmark of cancer2 that predicts that lymphocytes kill more immunogenic cancer cells to cause less immunogenic clones to dominate a population. Although proven in mice1,3, whether immunoediting occurs naturally in human cancers remains unclear. Here, to address this, we investigate how 70 human pancreatic cancers evolved over 10 years. We find that, despite having more time to accumulate mutations, rare long-term survivors of pancreatic cancer who have stronger T cell activity in primary tumours develop genetically less heterogeneous recurrent tumours with fewer immunogenic mutations (neoantigens). To quantify whether immunoediting underlies these observations, we infer that a neoantigen is immunogenic (high-quality) by two features—‘non-selfness’ based on neoantigen similarity to known antigens4,5, and ‘selfness’ based on the antigenic distance required for a neoantigen to differentially bind to the MHC or activate a T cell compared with its wild-type peptide. Using these features, we estimate cancer clone fitness as the aggregate cost of T cells recognizing high-quality neoantigens offset by gains from oncogenic mutations. With this model, we predict the clonal evolution of tumours to reveal that long-term survivors of pancreatic cancer develop recurrent tumours with fewer high-quality neoantigens. Thus, we submit evidence that that the human immune system naturally edits neoantigens. Furthermore, we present a model to predict how immune pressure induces cancer cell populations to evolve over time. More broadly, our results argue that the immune system fundamentally surveils host genetic changes to suppress cancer.


Patient Samples
We collected matched primary and recurrent PDACs through surgical resection at Memorial Sloan Kettering Cancer Center (MSK) (n = 5/9 LTS), and the Garvan Institute of Medical Research (n = 1/9 LTS) (Supplementary Table 1).Additional matched primary and recurrent PDACs were previously obtained through the Gastrointestinal Cancer Rapid Medical Donation Program at The Johns Hopkins Hospital (JHH) (n = 3/9 LTS, 6/6 STS) and have been previously described 19 (Supplementary Table 1) under Institutional Review Board-approved study protocols.Cohorts of primary only PDAC were collected at MSK (MSK primary PDAC cohort), the International Cancer Genome Consortium (ICGC primary PDAC cohort), and The Cancer Genome Atlas (TCGA) through surgical resection as previously described. 5,33 e obtained informed consent from all patients.We performed the study in strict compliance with all institutional ethical regulations and institutional review boards.All tumor samples were PDACs.We excluded adenocarcinomas in cystic pancreatic neoplasms and neuroendocrine tumors.We defined LTS and STS consistent with our previous work. 5We identified all tumors through histopathologic evaluation following surgery or at autopsy.We conservatively estimated that patients had 100 recurrent tumors when the number of tumors were too numerous to count.

Nucleic acid extraction, whole exome sequencing and mutation identification
We previously described methods to extract DNA and sequence samples collected at MSK, Garvan Medical Center, 5 and JHH. 19All samples from MSK, Garvan, and JHH were examined by an expert GI pathologist and confirmed to have at least 20% neoplastic cellularity and preserved tissue quality.We macrodissected samples meeting these criteria from serial unstained sections, and extracted genomic DNA as previously described. 5,19 0 ng of genomic DNA was then fragmented to a target size of 150-200 bp on a LE220 ultrasonicator (Covaris).Barcoded libraries (Kapa Biosystems) were subjected to exon capture by hybridization using the SureSelect Human All Exon 51MB V3 (JHH samples) or V4 (all other samples) kits (Agilent).DNA libraries were subsequently sequenced on a HiSeq 2500 (JHH samples) or 4000 (all other samples) (Illumina) in paired end 100/100 reads, using the TruSeq SBS Kit v3 (Illumina) with target coverage of 150-250X for tumor samples and 70X for matched normal.Sequence data were demultiplexed using Illumina CASAVA software.After removal of adapter sequences using BIC, 34 reads were aligned to the reference human genome (hg19) using the Burrows-Wheeler Alignment tool (bwa mem v0.7.17) and samtools (v1.6).Duplicates were marked with picard-2.11.0 MarkDuplicates (http://broadinstitute.github.io/picard).Indel realignments were done with the Genome Analysis toolkit (GenomeAnalysisTK-3.8-1-0-gf15c1c3ef) RealignerTargetCreator and IndelRealigner 35 using 1000 genome phase1 indel (1000G_pha-se1.indels.b37.vcf) and Mills indel calls (Mills_and_1000G_gold_standard.indels.b37.vcf) as references.Base calls were recalibrated with BaseRecalibrator 35 and dbSNP version 138.Average unique sequence coverages of 207X, 152X, 212X, and 221X were achieved for STS primary, LTS primary, STS recurrent, and LTS recurrent tumor samples respectively, and 88X and 84X were achieved for STS normal and LTS normal samples, respectively.MuTect 1.1.7 35and Strelka 1.0.15 36were used to call SNVs and indels on pre-processed sequencing data.For the MuTect calls, dbSNP 138 and CosmicCodingMuts.vcfversion 86 37 were used as reference files.For the Strelka calls, we set "isSkipDepthFilters = 1" to prevent filtering-out of mutation calls from exome sequencing due to exome-sequencing mapping breadth.Unbiased normal/tumor read counts for each SNV and indel call were then assigned with the bam-readcount software 0.8.0 (https://github.com/genome/bam-readcount).A minimum base quality filter was set with the "-b 15" flag.The reads were counted in an insertion-centric way with the "-i" flag, so that reads overlapping with insertions were not included in the per-base read counts.We then use the normal/tumor read counts to filter mutations.Filtering criteria are 1) total coverage for tumor ≥10, 2) variant allele frequency (VAF) for tumor ≥4%, 3) number of reads with alternative allele ≥ 9 for tumor, 4) total coverage for normal ≥7, and 5) VAF for normal ≤1% at a given mutation.These filters exist for all mutations except for mutations in the KRAS gene.
In order to avoid missing possible KRAS driver mutations, common mutations in KRAS known to be pathogenic were manually curated if the sample did not already contain a KRAS mutation.Using IGV Viewer 2.4.19, 38we inspected the top ten coding mutations in KRAS denoted in the Genomic Data Commons Database 39 at the following positions on chromosome 12; bases 25378562(C>T), 25380275(T>G), 25398281(C>T), 25398282(C>A), 25398284(C>A, T, or G), and 25398285(C>A, T, or G).One mutation was selected at most based on the number of reads containing the alternative allele at the position.In total, six additional KRAS mutations were added for the STS samples, and six were added to the LTS samples.

HLA typing
HLA-I typing for PDAC patients was performed in silico with the OptiType version 1.3.3tool 40 using non-tumor sequencing reads.

Neoantigen prediction
Putative neoantigens were identified in silico.In brief, all wild-type (WT) and mutant genomic sequences corresponding to coding mutations were translated to an amino acid sequence consistent with the GRCh37 reference genome (GRCh37.75)using the snpEff.v4.3t software 41 with options set as "-noStats -strict -hgvs1LetterAa -hgvs -canon -fastaProt [fasta file name]".Only annotations without "WARNING" or "ERROR" were kept and the most deleterious missense mutation was prioritized in mapping a genomic mutation to a gene.Missense mutations were centrally located in a peptide of up to 17 amino acids long, which depended on the location of the missense mutation within a protein.This corresponded to nine 9-mers in a left-to-right sliding fashion, each containing the mutant amino acid in a different position.Predictions of MHC class-I binding for both the WT peptide (p WT ) and mutant peptide (p MT ) were estimated using the NetMHC 3.4 software 42,43 with patient-specific HLA-I types.All p MT s with predicted IC 50 affinities below 500 nM to a patient-specific HLA-I type were defined as neoantigens.

TCR cloning, transduction, and peptide stimulation
We constructed TCR fragments as previously described. 46Briefly, we fused epitope specific TRB V-D-J and TRA V-J sequences to mouse constant TRB and TRA chain sequences respectively (kind gift of Alena Gros), to prevent mispairing of transduced TCRs with the endogenous TCRs. 47We used modified mouse constant regions to further improve pairing and increase cell surface TCR expression as previously described (mTCR). 46We joined the TRB and TRA chains with a furin SGSG P2A linker, cloned the TCR constructs into an SFG γ-retroviral vector, 48 and sequence-verified all plasmids (Genewiz).We transfected retrovirus vectors into H29 cells (gpg29 fibroblasts) using calcium phosphate to produce VSV-G pseudo-typed retroviruses. 44We next used Polybrene (Sigma) and viral-containing supernatants to generate stable RD114-enveloped producer cell lines. 45We collected and concentrated virus-containing supernatants using Retro-X ™ Concentrator (Takara).For T cell transductions, we coated non-tissue culture treated 6-well plates with Retronectin (Takara) as per the manufacturer's protocol.We plated a titrated viral quantity to 3×10 6 activated T cells per well, and centrifuged cells for 1 hour at room temperature at 300g, and used transduced T cells either between day 7-14 post transduction or cryopreserved them for future use.We used mock-transduced T cells as controls.
To stimulate transduced T cells with peptide, we used T2 cells as antigen presenting cells (APCs).All peptide panels were commercially custom synthesized by Genscript (Piscataway, NJ), and were >95% pure.Briefly, we pulsed 5×10 4 T2 cells per well in a 96-well U-bottom plate for 1 hour at 37 • C with the indicated peptide at the indicated concentrations.After 1 hour, we washed out the peptide by centrifugation, and added 5×10 4 TCR-transduced T cells per well.We used an equivalent number of mock transduced total T cells, and irrelevant peptides as controls.We measured CD137 (4-1BB) expression on CD8 + mTCR + T cells 24 hours later.
We stained cells using antibody cocktails in the dark at 4 • C according to manufacturer's instructions, washed, and analyzed on a FACS LSR Fortessa (BD Biosciences).Flow cytometric data were collected using FACSDiva (BD Biosciences, version 8.0.1).We diluted reagents according to manufacturer's instructions.We used Flowjo (version 10, Tree Star) to perform our analyses.

Neoantigen quality model: Q
We consider 9-amino-acid long peptides containing a single point mutation as potential neoantigens if they are a predicted binder to a patient specific HLA allele.For a given neoantigen with a peptide sequence p MT (with corresponding wildtype peptide sequence denoted as p WT ), and an HLA allele h, we define the neoantigen quality Q as: The non-self recognition component R quantifies the recognition probability of peptide p MT by comparison to validated non-self epitopes from infectious diseases and it has been introduced previously. 4,5 he self discrimination component, D, expands on the previously proposed measure based only on MHC-presentation 4, 5 by including a new measure of cross-reactivity distance, C, between p MT and p WT .Below we briefly describe R and derive D.

Non-selfness: R
To estimate the "non-selfness" of a peptide we calculate the similarity of a given peptide to epitopes which have been previously recognized as non-self.To do so we estimate a recognition probability R of a peptide with sequence p based on its sequence similarity to a dataset of recognizable epitopes e.Here similarity is measured using a gapless alignment with BLOSUM62, though this can be easily generalized.We use a thermodynamically motivated model, 4, 5 where Z(p, a, k) = 1 + e exp (−k(a − |p, e|)) is the normalization constant, |p, e| is a local alignment score between p and e, and free parameters a and k represent the horizontal displacement of the binding curve and the slope of the curve.In our case, recognizable epitopes come from the Immune Epitope Database (IEDB), 49 and we restrict our search to all human infectious disease class-I restricted targets with positive immune assays.
As the peptides in IEDB can change over time, we use the current version of IEDB and list the positive epitopes used (Supplementary Table 3).The parameters of the model are set to optimize the separation of survival curves (detailed description of parameter training is included in later sections).To find the set of IEDB epitope sequences with sequence similarity to neoantigens in our cohort, we used the blastp algorithm with the BLOSUM62 matrix (gap opening penalty=-11, gap extention penalty=-1).We calculated alignment scores with the Biopython Bio.pairwise2 package (http://biopython.org)for all alignments identified with blastp.

Cross-reactivity distance: C
To model a cross-reactivity distance we measure and analyze TCR-pMHC avidity curves, by which we mean activation of a monoclonal T cell population as a function of pulsed exogenous peptide concentration.We define the cross-reactivity distance C between the two peptides as ratio of the EC 50 of the two avidity curves to a T cell clone that is specific to at least one of the peptides.If the EC 50 shift between the two avidity curves is small this reflects that the TCR is specific and highly cross-reactive to both peptides; consequently the peptides have a low cross-reactivity distance C. The reverse, a large shift in the EC 50 , indicates a lack in reactivity against one of the peptides, low cross-reactivity, and thus a large cross-reactivity distance C. Formally, this quantity could depend on the TCR and the HLA allele, however we fit a minimal model for peptides that are one amino acid substitution from each other with the intention of extracting coarse grain features that are sufficiently robust for our application.

Fitting avidity curves
To fit our model we measured avidity curves (Figure 3c, Extended Data Figure 4c-e) corresponding to seven different peptide-TCR combinations: 3 TCRs specific to a CMV epitope (NLVPMVATV), 3 TCRs specific to a gp100 epitope (IMDQVPFSV), and 1 TCR specific to a neoantigen arising from a mutation in the RHBDF2 gene (GRLKALCQR).For each peptide-TCR combination we measured the avidity curves for the wildtype peptide along with all 171 peptides one amino acid substitution away from the wildtype peptide (1204 total TCR-pMHC combinations)(Extended Data Figure 5a-c).For each peptide we extract the EC 50 from the TCR-pMHC reactivity curve by fitting a generalized Hill function: This function has 3 parameters: the maximum amplitude V ∞ , the cooperativity n, and, the term to be inferred, the EC 50 .As we have 3 concentration points for each peptide, regularization is key to a robust fit of these curves.To motivate the regularization, we use the priors that V ∞ ≈ 1 (i.e. at infinite peptide concentration, TCR reactivity approaches 100%) and n ≈ 1 (cooperativity of 1 is for the case of simple binding reactions of 2 molecules).Finally, we enforce a slight regularization on the EC 50 if it extends outside of the measured concentration region.We use an L2 cost and regularization to fit, yielding a cost function to minimize: ).The regularization constants used were r V = 0.01, r n = 0.01, and r EC 50 = 0.001.Parameters were then fit using standard least squares.The inferred EC 50 's were further clipped to the range of 10 −4 µg/mL to 10 4 µg/mL.

Cross-reactivity distance model
To model the effect of a single amino acid substitution on an avidity curve's EC 50 , we assume that there is a position independent amino acid substitution matrix M that is rescaled by a position dependent factor d i .Together this yields a model of the following form for the cross-reactivity distance C between two peptides, p A and p B , which differ only by a single mismatched amino acid in position i: This form of the model for C has more parameters than can be reliably be inferred from our experimentally measured TCR-pMHC avidity curves -the distance weight d i has 9 parameters, and the substitution matrix M has 380 free parameters (190 if we assume a symmetric matrix).
To ameliorate this problem, we implement two modifications to reduce the effective number of parameters -first, we embed the 20 amino acids into a bounded 2D region (a 20×20 square) and define the values of the substitution matrix M as the Euclidean distance between the positions of each embedded amino acid.This reduces the number of free parameters for M from 190 to 40 and allows for clear visualization of amino acid clustering.
Second, we introduce the BLOSUM62 substitution matrix as a prior (we find a model inference performed without this assumption shows that the inferred substitution matrix correlates significantly to BLOSUM62).We define a cost function that includes not only the differences between the measured and modeled distances but also a regularization term that reflects how well a linear transformation of the BLOSUM62 matrix matches the inferred substitution matrix (we exclude the diagonal terms from this fit as those terms are not fit under the model).The full expression is: where we sum over pairs of measured peptides {p A , p B } and RSS bl62 is the sum of the square residuals of the optimal linear regression between M and BLOSUM62.
We used a value of r bl62 = 0.01 for the constant that controls the relative weighting of the fit to the measured data or the fit to BLOSUM62.We then use the dual annealing method to minimize the cost function and fit the model parameters.
This model is inferred using the measured log distance between the EC 50 of two peptides to the same TCR.We restrict the peptide pairs we consider in our inference by requiring both that the peptides are a single amino acid substitution away from each other and that there is some minimal reactivity to at least one of the peptides to the associated TCR.We set this reactivity threshold as the criteria that the EC 50 of at least one of the peptides must be less than 0.1µg/mL.This criteria will include all pairs that include one of the wildtype peptides (NLVPMVATV, IMDQVPFSV, or GRLKALCQR), but may also include pairs of mutants that have substitutions in the same position in the wildtype (e.g.NLVMMVATV and NLVKMVATV).Including these additional combinations allows us to more accurately resolve amino acid substitutions not observed from the original 3 wildtype peptides.

Cross-reactivity model validation
To validate the cross reactivity model C, we inferred a model using peptide pairs only from the NLV and gp100 TCRs (6 TCRs in total) (Extended Data Figure 7a, b) and predicted on the remaining RHBDF2 neoantigen peptide pairs (with the same minimum reactivity restrictions as described above for the inference).The NLV and gp100 peptides are presented on HLA-A02:01 whereas the RHBDF2 neoantigen is predicted to be presented on HLA-B27:05, so the validation dataset stems from not only a different wildtype peptide-TCR combination, but also a wholly different HLA allele.We find that the model learned on the NLV and gp100 TCRs provides highly significant predictive power for the peptide pairs from the RHBDF2 neoantigen (Figure 3f).

Self discrimination: D
The self discrimination component quantifies how easily p MT and p WT can be distinguished from each other as a result of negative selection, and is a sum of terms relating to both the MHC presentation and our experimentally derived cross-reactivity distance C. For a given HLA allele h, we calculate the peptide-MHC-I dissociation constants, 50 ) for both peptides.We consider the relative MHC dissociation constants between p MT and its p WT counterpart, as the ratio of their inferred MHC-I binding affinities.
We define the combined self discrimination D of a neoantigen as: Each term represents an affinity difference, or discrimination energy, between p MT and p WT either for MHC presentation or for T cell activation.The self discrimination D therefore increases if either the underlying mutation leads to an increased presentation probability, or if it results in a peptide not cross-reactive with the wildtype and thus recognized by a collection of TCRs distinct from those that recognize the wildtype peptide.Parameter w ∈ [0, 1], sets the relative weight between the two terms: MHC presentation and T cell activation.

Amino acid clustering and subsequent ordering
The dendogram and the amino acid ordering in Figure 3g were computed by unsupervised agglomerative clustering using the sklearn package with 0 distance thresholding (sklearn.cluster.AgglomerativeClustering). The distances used for the clustering were the Euclidean distances arising from the 2D embedding of the amino acids in Figure 3g.

Mutation and neoantigen distributions
The substitution frequency scatter plots (Figure 3h) are generated by determining all nonsynonymous mutations.We determine the corresponding amino acid substitution frequencies by binning the mutation substitutions (e.g.leucine to isoleucine, L→I) for each nonsynonymous mutation and then normalizing by the total number.Each substitution has a particular score from our inferred amino acid substitution matrix M, which we use as the x-axis.The linear fits are done using least squares regression and Pearson correlations are computed.Unseen amino acid substitutions (generally arising from requiring at least 2 nucleotide mutations) are excluded from the analysis.
The cumulative probability distributions (Figure 3i) are computed by determining the total fraction of neoantigens in the defined cohort that have a C or D larger than and or equal to the value on the x-axis.

Clonal structure of tumors
Tumor clones are reconstructed using the PhyloWGS algorithm. 28We use multisample reconstruction, combining all primary and recurrent samples from a given patient.The algorithm returns a family of 10000 trees, each associated with a likelihood, (T i , L i ).When appropriate, our tree-based statistics for a tumor sample will be reported as averages over the top scoring trees, with weight of the i tree defined as w i = L i / 5 j=1 L j , the averaging operator will be denoted as .T .We empirically checked that the the full set of results are consistent with those that use only the 5 top scoring trees.
A given tree T provides a common clone topology for all samples in the patient, with clone definitions informed by clustering of mutations across all the samples.For a given clone α ∈ T , the algorithm estimates the maximum likelihood clone frequency, X α ≡ X α (T ), which is equivalent to the cellular cancer fraction (CCF) associated with that clone in that sample.We refer to these frequencies as the inclusive clone frequencies.Based on the clone definitions, we additionally define the exclusive clone frequencies, Here D (α) is the set of clones that are direct descendants of clone α, as defined by the tree T .By this definition, x α is a probability distribution, with α x α = 1.We denote by x the ensemble of cluster size distributions for each of the phylogenies for a given tumor sample.
Genetic heterogeneity of tumor samples.We compute the heterogeneity of a tumor sample as the entropy of the clone frequency distribution, A higher entropy indicates a more diverse and less clonal tumor composition.
Distance between time points.The amount of evolution between the paired primary and recurrent tumor samples can be computed as the Kullback-Leibler divergence, which quantifies the amount of changes between the clones sizes between time points, To account for predictable evolution, i.e. concerning the fate of the clones present in the primary tumor, we disregard all clones α with inclusive clone frequency X α < 0.03; we observed that such clones are more likely to contain mutations with unobserved reads in the primary tumor and are introduced to the topology by the reconstruction algorithm by support of mutations in the recurrent tumors.
We define the clone frequencies of these shared clones shared between primary and recurrent tumors as: where Z = α∈T :X α prim ≥0.03 x α is the normalization constant.

Fitness model for tumor clones
A fitness model is used to quantify the growth rates of clones.Here we propose a two-component model, which accounts for balancing selective pressures.
Immune fitness component.We quantify the negative selection on tumor clones imposed by the T cell recognition based on the neoantigen quality model as defined in eq. ( 1) where N (α) is the set of neoantigens in clone α and their associated HLA alleles.
Driver gene component.We quantify selective advantage due to mutations in the recognized PDAC oncogenes, O = {KRAS, TP53, CDKN2A, SMAD4} by awarding each mutation from one of these genes, where G(α) is the set of genes mutated in clone α (including genes mutated in clones ancestral to α).
Combined fitness model.To account for negative selection due to immune recognition and positive selection of mutated oncogenes, we define an additive combined fitness model as where σ I ≥ 0 and σ P ≥ 0 are weights assigning the amplitude to their respective fitness components; they also determine the total amplitude of selection described by the fitness model.Constant F 0 ≡ F 0 (T , σ I , σ P ) is a tree specific and clone independent constant determined by the normalization of clone frequencies for a given tumor sample and tree T , giving

Tumor immune cost
For a given tumor, we compute its total immune fitness cost as the negative average of fitness, over clones in a given tree, and over the possible reconstructed trees, To evaluate the fitness of the new clones in the recurrent tumors, we compute an analogous average only over the clones that were not present in the primary tumor.As before, we use a 3% threshold on clone frequency,

Recurrent tumor clone composition predictions
For each primary and recurrent tumor pair, we predict the distribution of clone sizes in the recurrent tumor by fitness model projections from the primary tumor.In our model we combine the probability that a given clone in the primary tumor seeds a recurrence, together with a selective pressure as given by the fitness model.For a given clone α with a fitness F α , the predicted exclusive clone frequency is and the inclusive frequency is where β iterates over all subclones of α (T α being a subtree of the tumor clone tree, rooted at clone α).We restrict predictions to clones that have been observed in the primary tumor, and we will use the shared clone frequency distributions as defined in eq. ( 11), both for the primary and recurrent tumors.

Neoantigen quality model fitting and model selection
The free parameters of the neoantigen quality model, Θ = {a, k, w}, are trained on an independent cohort of 58 pancreatic cancer patients, 5 to optimize survival analysis log-rank score. 4,5 his cohort comprises samples from short and long term survivors and we have previously shown that the long-term survivors are likely to have increased immune activity in their tumors.We use our fitness model for tumor clones (14) to predict tumor growth in the pancreatic cancer patients.For each patient sample in the cohort we compute the predicted tumor population size.To limit the number of parameters, we fixed the slope parameter of the R component, k = 1. 5 The survival analysis is performed by splitting the patient cohort by the median value of n(σ I , σ P , Θ) into high and low fitness groups and evaluation of the log-rank score, S(σ I , σ P , Θ) (multivariate log rank test from python package lifelines.statistics).
We computed the optimal parameters σI , σP , Θ as an average (σ I , σ P , Θ) ρ over ρ, the probability distribution defined by the log-rank test score landscape for the cohort, with Z ρ the normalization constant assuring ρ is a probability distribution over the parameters with significant score, {(σ I , σ P , Θ) : p(S(σ I , σ P , Θ)) < 0.01}(p-values are computed with χ 2 test).This smoothing procedure is applied to select optimal parameters while preventing over-fitting on a potentially rugged score landscape.If no choice of parameters meets the significance threshold, we average the parameters that have the maximum observed value of the score.
The optimal value of parameter a, the midpoint of the logistic binding function R, is at a = 22.9 and the relative weight parameter for the two terms in component D eq. ( 7) is w = 0.22 (Extended Data Table 1).These are the parameter values we use to compute neoantigen qualities in the recurrent tumors cohort used in this study.

Model selection
Along with parameter training we performed a model selection effort to justify that all components of the fitness and neoantigen quality models are informative.We considered a variety of partial models and repeated the parameter training procedure via maximization of the the log-rank test score.We considered clone fitness model of single components only, namely the driver gene component-and the immune component-only, Further we decomposed the immune fitness component by considering various variants of the neoantigen quality model: To compare the performance of the models of different complexities (number of fitted parameters), we computed the BIC 34 and AIC values (Extended Data Table 1).According to these criteria, the best performing model is our full clone fitness model with both the driver gene-and immune components, and the full neoantigen quality model.

Fitness model fitting and model selection
For a given pair of primary-recurrent tumor samples from a given patient, we fit the fitness model parameters, σ I , σ P , to minimize the Kullback-Leibler divergence between the predicted clone composition and the observed clone composition of the recurrent tumor sample, The likelihood that the observed distributions are samples of populations with the predicted clone frequencies takes the form (by Sanov's theorem 51 ) where n is a factor standing for the effective population size of the cells in the recurrent tumor sample, from which the clone frequencies were inferred.The effective population size reflects the sampling error and our ability to correctly estimate clone frequencies from the bulk sequencing data.It depends on multiple factors, such as the sequencing depth, the purity of the sample, and the phylogeny reconstruction algorithm.We estimate the effective population size for each sample, as described in a following section.
We evaluate the likelihood L 0 under the null model of neutral evolution, which assigns fitness values F α N = 0 to all clones and predicts clade frequencies xrec = xprim .The likelihood of the data under this model is, analogously to (27), given by To compare models of varying complexity, we compute the Bayesian Information Criterion (BIC), 34 where Θ is the set of optimized parameters, |Θ| = 2, arriving at an adjusted log-likelihood, To assess the predictive power of individual fitness model components, we consider partial models and their corresponding optimized likelihoods: the immune component only model F α I (σ I ) ≡ F α (σ I , σ P = 0), with likelihood L I ; and the driver gene component only model F α P (σ P ) ≡ F α (σ I = 0, σ P ) with likelihood L P .Each of these models has one free parameter, we apply the BIC-based correction (30) to compute the adjusted log-likelihoods log(L adj I ) = log(L I ) − log(n)/2 and log(L adj P ) = log(L P ) − log(n)/2.In general, to compare fitting of alternative models F 1 and F 2 on a cohort S, for each sample s in the cohort we compute the log-likelhood score, ∆ (s, F 1 , F 2 ) = log(L adj 1 (s)) − log(L adj 2 (s)).We also evaluate the aggregated score over samples in cohort S, where s iterates over samples in the cohort.Positive scores favor model L 1 over model L 2 .

Effective cancer cell population size n
To account for sampling error which affects clone frequency inference, we estimate the error of mutation frequencies for each of the tumor samples in our data.We evaluate frequencies for each mutation m in a given sample s, with the frequencies from the individual trees T , given by x(m) = X α where m originates in clone α ∈ T .The variance is computed over the 5 trees reconstructed for that sample, σ 2 (x m ) = x(m) 2 T − x(m) 2 T .The effective size n for a given sample scales proportionally with the inverse of variance, giving our estimate of n as For the patients with multiple samples the variance of mutation frequencies from tree reconstruction is reduced due to information from other samples; to account for this effect we divide n(s) by the total number of additional samples.The estimated effective cancer population sizes vary from 79.79 to 1189.60, with a mean of 187.23 and median 244.6.

Clone fitness model selection
We compute the log-likelihood score for the alternative models for each recurrent tumor sample (Extended Data Table 1).In particular, we observe that 19 out of 22 LTS samples (86%) and 17 our of 33 (52%) are better described by the model with selection, F , rather than the null model, F N ≡ 0 (giving ∆ (s, F, F N ) > 0).Evaluating the aggregated log-likelihood score on the LTS and STS cohorts, we observe evidence for the model with selection in both cohorts, ∆L LTS (F, F N ) = 1241 nats and ∆L STS (F, F N ) = 198 nats , with a mean of 56.42 nats and 6.01 nats respectively.The fit, and therefore the predictive power of model F , is relatively stronger in the LTS cohort.
We assess that the oncogenic selection, described by the driver gene component model F P , provides predictive signal on its own, with

Accuracy of clone growth fitting
We consider the observed and model fitted clone frequency changes, X α rec /X α prim and Xα rec /X α prim across all clones in all tumors in the cohorts.For a given cohort we define the accuracy of a model as the fraction of clones for which the direction of change is the same, i.e. (X α rec /X α prim > 1 and Xα rec /X α prim > 1) or (X α rec /X α prim < 1 and Xα rec /X α prim < 1) or (X α rec /X α prim = 1 and Xα rec /X α prim = 1).We consider all clones with frequency larger than 0.03 in the primary tumor, from the top scoring trees for each patient.We obtain accuracy of 71% over 243 clones in the LTS cohort, and 58% over 389 clones in the STS cohort.The Pearson correlation coefficients are r LTS = 0.57 and r STS = 0.35 (as computed on log-transformed frequency changes, log X α rec /X α prim and log Xα rec /X α prim ) and Spearman rank coefficients are ρ LTS = 0.65 and ρ STS = 0.28 (Extended Data Table 1b).

TCR beta (TCRB) sequencing
We extracted genomic DNA from n = 23 primary and recurrent STS and LTS PDACs according to the manufacturer's instructions (QIAsymphony, Qiagen).We verified the quantity and quality of extracted DNA before sequencing.We then used a standard quantity of input DNA, amplified and sequenced the CDR3β regions using the survey multiplexed PCR ImmunoSeq assay (Adaptive Biotechnologies).The ImmunoSeq platform combines multiplex PCR with high-throughput sequencing to selectively amplify the rearranged complementarity-determining region 3 (CDR3β) of the TCR, producing fragments sufficiently long to identify the VDJ region spanning each unique CDR3β.After correcting for sequencing coverage, PCR bias, primer bias, and sequencing errors, we define a T cell clone as a T cell with a unique TCRB CDR3β amino acid sequence.

Dissimilarity index to estimate antigen-specificity in a T cell repertoire
To estimate the antigen-specificity of a T cell repertoire, for each repertoire, we apply a sequence based probabilistic model called a Restricted Boltzmann Machine (RBM). 18The RBM model assigns a probabilistic score of an antigen specific response to each T cell clone in a sample, based on the frequency and the CDR3β sequence similarity of the top 25 ranking clones.Based on these RBM scores for each clone, we estimate for each repertoire, a TCR dissimilarity index DI = 1 f where: where T is the total number of terms in the sum (T = M (M −1)/2), M = 25 (the top 25 clones in the repertoire with the highest RBM scores), and d(σ i , σ j ) is a distance obtained from the global pairwise alignment score between the CDR3β amino acid sequences σ i and σ j .This score is computed using the BLOSUM62 matrix corrected with an offset such that all its weights are positive, −S(A, B) + max A,B (S(A, B)) ≥ 0, where S(A, B) are the usual BLOSUM62 matrix elements.The parameter δ represents a typical scale of the BLOSUM-weighted distance d and it is set to δ=9.37, the average distance d between reported epitope-specific CDR3β sequences (we use an influenza-specific repertoire 21 ).As a control, we calculate this TCR dissimilarity index between the top 25 clones in the repertoire based on clone size (TCR dissimilarity index − clone size), and not the RBM computed probability (Extended Data Figure 1c).
To verify that the difference in the TCR dissimilarity index between LTSs and STSs is robust, we randomly subsample the repertoire down to a few hundred clones and repeat the RBM training, score assignment, and TCR dissimilarity index estimation 10 times (Extended Data Figure 1b).

Statistics
Survival curves were compared using log-rank test (Mantel-Cox).Comparison between two groups was performed using unpaired two-tailed Mann-Whitney test, or Wald's test for gene expression analyses to correct for multiple comparison testing.Correlation between two variables was performed using two-tailed Pearson correlation.Categorical variables were compared using chi-square test.Probability distributions were compared using two-sided Kolmogorov-Smirnov (KS) test.All comparison groups had equivalent variances.P < 0.05 was considered to be statistically significant.Data analysis was performed using statistical software (Prism 7.0, GraphPad Software v.9.1.0and Python v.3.4).
r EC 50 d(EC 50 , [0.01, 100]) 2 .(4) Where d(EC 50 , [0.01, 100]) indicates the log distance to the measured concentration range of [0.01µg/mL,100µg/mL] (i.e.d(EC 50 , [0.01, 100]) = max(0, log(EC 50 ) − log(100), log(0.01)− log(EC 50 ) The immune component F I on its own has less predictive power, with positive log-likelihood score ∆ (s, F I , F N ) for 11 out of 22 LTS samples (50%) and only 4 out of 33 STS samples (12%).The aggregated log-likelihood score values are ∆L LTS (F I , F N ) = 579 nats (mean of 26.31 nats) and ∆L STS (F I , F N ) = 91 nats (mean of 2.75 nats).The full model with both components provides an improvement to the driver-gene component-only model for 11 out of 22 LTS samples (50%) but only 5 out of 33 STS samples (15%), as quantified by ∆ (s, F, F P ) > 0. The aggregated log-likelihood score on the LTS and STS cohorts are ∆L LTS (F, F P ) = 200 nats and ∆L STS (F, F P ) = −25 nats respectively.This results means that inclusion of the immune component directly improves prediction of clone dynamics in the LTS cohort, but it does not for the STS cohort.All these results are reported in Extended Data Table1band in Extended Data Figure9.
14 out of 22 LTS samples (64%) and 13 out of 33 STS samples (39%) having positive log-likelihood score, ∆ (s, F P , F N ) > 0. Evaluating the aggregated log-likelihood score on the LTS and STS cohorts, we observe evidence for the model with oncogenic selection in both cohorts, ∆L LTS (F P , F N ) = 1041 nats and ∆L STS (F P , F N ) = 223 nats, with a mean of 47.34 nats and 6.77 nats respectively.Again, the fit of the partial model is stronger in the LTS cohort.This effect could be explained indirectly by the negative immune selection, which reduces tumor heterogeneity and facilitates clonal composition predictions in the LTS cohort.