Evolution of Barrett’s esophagus through space and time at single-crypt and whole-biopsy levels

The low risk of progression of Barrett’s esophagus (BE) to esophageal adenocarcinoma can lead to over-diagnosis and over-treatment of BE patients. This may be addressed through a better understanding of the dynamics surrounding BE malignant progression. Although genetic diversity has been characterized as a marker of malignant development, it is still unclear how BE arises and develops. Here we uncover the evolutionary dynamics of BE at crypt and biopsy levels in eight individuals, including four patients who experienced malignant progression. For each patient we assay eight individual crypts and the remaining epithelium by SNP array in each of 6–11 biopsies over 2 time points (358 samples in total). Our results indicate that most Barrett’s segments are clonal, with similar numbers and inferred rates of alterations observed for crypts and biopsies. Divergence correlates with geographical location, being higher near the gastro-esophageal junction. Relaxed clock analyses show that genomic instability precedes and is enhanced by genome doubling. These results shed light on the clinically relevant evolutionary dynamics of BE.

Events used for phylogenetic reconstruction are breakpoints. No significant correlation was found. Circles highlight non-progressors and triangles highlight progressors. Each patient is identified by a colour. R² indicates the squared Pearson correlation coefficient used to test for significance.

Supplementary Figure 44: Correlation between genetic distances at the micro and macro scales.
Correlation of genetic distances between crypts from the same biopsy and crypts from different biopsies. All genetic distances in both groups (different crypts from the same biopsy; different crypts from different biopsies) are represented by their mean per patient. Progressors are displayed with triangles, non-progressors with circles, and each patient is indicated by a different colour. R² indicates the squared Pearson correlation coefficient used to test for significance.

Supplementary Figure 45: Genetic distances between crypts from the same or different biopsies.
Distributions of all pairwise crypt distances depending on whether the two crypts are from the same biopsy (left) or from different biopsies (right) in all 8 patients. Boxes indicate the middle quartiles, whiskers extend to 1.5 times the interquartile range, the horizontal bar indicates the median and dots represent outliers. All p values were computed using Wilcoxon rank sum tests.

Each violin plot corresponds to the posterior sample of the LUCA age for a given patient (x axis) and dataset (colour: yellow = crypt, green = whole epithelium) included in the 95% highest posterior density (HPD) interval. The mean is indicated with a dot for non-progressors and a triangle for progressors. Inscribed posterior samples with decreasing transparency correspond to the 75%, 50% and 25% HPD intervals. The probability of the two posterior distributions being the same is indicated per patient.

Each violin plot corresponds to the posterior sample of the effective population size estimate for a given patient (x axis) and dataset (colour: yellow = crypt, green = whole epithelium) included in the 95% highest posterior density (HPD) interval. The mean is indicated with a dot for non-progressors and a triangle for progressors. Inscribed posterior samples with decreasing transparency correspond to the 75%, 50% and 25% HPD intervals. The probability of the two posterior distributions being the same is indicated per patient.
Supplementary Figure 55: Genome doubling in patient 740. Endoscopic and surgical maps indicating the spatial orientation of biopsies relative to each other, from the endoscopic sampling (the endoscopic level indicates distance from the incisors as measured by the endoscope) and from the surgical specimens. Increased-4N biopsies are those with a 4N DNA content greater than 6%; aneuploid biopsies have a G1 peak of 2.2N or greater. Genome-doubled crypts and biopsies are as defined in the manuscript.

Supplementary Figure 56: Genome doubling in patient 848. Endoscopic and surgical maps indicating the spatial orientation of biopsies relative to each other, from the endoscopic sampling (the endoscopic level indicates distance from the incisors as measured by the endoscope) and from the surgical specimens. Increased-4N biopsies are those with a 4N DNA content greater than 6%; aneuploid biopsies have a G1 peak of 2.2N or greater. Genome-doubled crypts and biopsies are as defined in the manuscript.

Supplementary Figure 57: Genome doubling in patient 852. Endoscopic and surgical maps indicating the spatial orientation of biopsies relative to each other, from the endoscopic sampling (the endoscopic level indicates distance from the incisors as measured by the endoscope) and from the surgical specimens. Increased-4N biopsies are those with a 4N DNA content greater than 6%; aneuploid biopsies have a G1 peak of 2.2N or greater. Genome-doubled crypts and biopsies are as defined in the manuscript.

Supplementary Figure 58: Genome doubling in patient 911. Endoscopic and surgical maps indicating the spatial orientation of biopsies relative to each other, from the endoscopic sampling (the endoscopic level indicates distance from the incisors as measured by the endoscope) and from the surgical specimens. Increased-4N biopsies are those with a 4N DNA content greater than 6%; aneuploid biopsies have a G1 peak of 2.2N or greater. Genome-doubled crypts and biopsies are as defined in the manuscript.
Supplementary Figure 59: Estimated SCA mutation rate changes in patients 852-P and 391-P using biopsy data. Phylogenetic trees of patients 391-P (a) and 852-P (b). Branch lengths indicate time measured in years, while branch colors indicate the log10 of the estimated SCA rate. SCA rate changes are indicated numerically with their estimated mean rate, accompanied by their posterior probability conditional on the clade defined by their branch (in parentheses). The original rate is indicated without a posterior probability, since it does not constitute a rate change. Nodes with a posterior probability >0.80 are labelled with an asterisk. The x axis indicates years from the birth of the patient. Samples with estimated genome doubling are also labelled.

SUPPLEMENTARY METHODS
Figures and tables specific to this section are embedded in the text.

Pre-processing and quality control
Standard quality control was performed using the Illumina GenomeStudio software. 216 samples did not pass quality control (the 0/1 column in the default GenomeStudio output) and were excluded from further analysis. LogR values were corrected for GC-content bias using the genomic wave correction tool of the PennCNV software suite 1.

Probe filtering
Because of the low initial DNA quantity extracted from single crypts, biases were introduced in the SNP array data for these samples in the form of downward 'spikes' in the logR data, similar to what would occur in double deletions, which were found to be recurrent across samples and patients (Supplementary Figure 60).
We therefore filtered these probes out to prevent them from impacting both segmentation and copy number profiling. From the initial set of probes S0, a subset S25 is selected by taking one probe every 25 probes; for each selected probe, the median logR M and the median absolute deviation d are computed from the 100 probes before and after it. A median of these medians is then calculated at a coarser scale, taking one probe s out of every 5 probes of the S25 set (so 1 in every 5 * 25 = 125 probes overall): Mm, the median of medians, is computed using the 50 S25 probes before and after the selected probe. The region attributed to s thus covers all the probes from S0 included in the calculation (5 * 25 * 50 + 100 probes on each side). Any probe that deviates from Mm by more than 5 times d (the initial median absolute deviation) is excluded (Supplementary Figure 61).

Joint segmentation
In order to ensure that segmentation was compatible across all samples from the same patient, we performed patient-specific joint segmentation using the copynumber R package 2. For each patient, the filtered logRs and BAFs were segmented independently using the winsorize and multipcf functions, with a high gamma value of 1000 (10000 for patient 256). Missing logR values were imputed using the imputeMissing function. Only heterozygous BAFs were considered, defined as those whose value in the corresponding normal sample lay strictly between 0.25 and 0.75. BAF values were mirrored so as to lie between 0.5 and 1, and only segments with more than 10 BAF-informative probes were kept. The breakpoints obtained from the logR and BAF segmentations were then merged, and mean logR and mirrored BAF values were computed for each segment. These values were passed to the ASCAT software 3, bypassing the ascat.aspcf function, to obtain allele-specific copy numbers.
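The two-level median-of-medians filter described above can be sketched as follows. This is a simplified illustration, not the pipeline's actual code: window handling at chromosome edges, the zero-MAD guard and the exact region bookkeeping are assumptions.

```python
import statistics

def filter_spike_probes(logr, step=25, window=100, mad_factor=5.0):
    """Flag probes whose logR deviates from a coarse median-of-medians
    by more than `mad_factor` times the local median absolute deviation.
    Simplified sketch of the two-level scheme described in the text."""
    n = len(logr)
    keep = [True] * n
    # level 1: median and MAD around every `step`-th probe
    meds, mads, centers = [], [], []
    for i in range(0, n, step):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        local = logr[lo:hi]
        m = statistics.median(local)
        meds.append(m)
        mads.append(statistics.median(abs(x - m) for x in local))
        centers.append(i)
    # level 2: median of medians over the coarse grid (50 neighbours each side)
    for j, c in enumerate(centers):
        lo, hi = max(0, j - 50), min(len(meds), j + 51)
        mm = statistics.median(meds[lo:hi])
        d = mads[j] or 1e-6  # guard against a zero MAD (assumption)
        # exclude strongly deviating probes in the region attributed to this probe
        for k in range(max(0, c - step // 2), min(n, c + step // 2 + 1)):
            if abs(logr[k] - mm) > mad_factor * d:
                keep[k] = False
    return keep
```

A downward spike of the kind shown in Supplementary Figure 60 would be flagged while the surrounding probes are retained.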

Fragile site segmentation and breakpoint definition
The genomic coordinates of the known fragile sites FHIT and WWOX were retrieved via the ensembl.org website. The logR values of both loci were independently joint-segmented with the copynumber package, this time using a low gamma value of 25. For each patient p, the total copy number of each segment i was predicted from its mean logR mi and the purity Pp of patient p, as given by the tumor content estimate from ASCAT. Gains and losses were defined using empirical thresholds on the obtained copy numbers.
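As an illustration of how a purity-adjusted total copy number can be obtained from a segment's mean logR mi and patient purity Pp, the sketch below uses the standard two-component mixture inversion. The exact formula used by the authors is not reproduced in the text, so this form is an assumption; their expression may differ, for example by a platform-specific logR compression factor.

```python
def total_copy_number(mean_logr, purity, normal_cn=2):
    """Purity-adjusted total copy number from a segment's mean logR.

    Assumes the standard mixture model: the observed intensity ratio
    2**logR is a purity-weighted average of tumor and normal (diploid)
    copy number. This is a plausible form of the elided formula, not
    necessarily the authors' exact expression.
    """
    return (normal_cn * 2 ** mean_logr - normal_cn * (1 - purity)) / purity
```

With purity 1, a logR of 1 maps back to 4 copies; with purity 0.5, a logR of 0 still maps to the normal 2 copies.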

Event matrices
Gains, losses, and cnLOH events were defined as follows. Allele-specific copy numbers from ASCAT were summed to determine total integer copy number for each segment. The overall median total copy number for the sample was then calculated by weighting segment copy numbers by segment size in base pairs. Each segment was then scored as normal (N), loss (L), gain (G), or cnLOH (O) relative to the median copy number. A segment was called as normal if it had the median copy number and both alleles were present, or as cnLOH if it had the median copy number but one allele was absent. It was scored as loss if it had lower than the median copy number, and gain if it had higher. This approach, which evaluates gains and losses relative to the genomic median copy number, was chosen to avoid calling 4N copy number as gain in a genome-doubled (GD) sample, which would produce large numbers of gains and cause spurious clustering of GD samples in the phylogeny.
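The scoring scheme above can be sketched as follows, assuming segments are given as (allele A copies, allele B copies, length in bp). The length-weighted median computation is a plausible reading of "weighting segment copy numbers by segment size"; it is illustrative rather than the authors' code.

```python
def score_segments(segments):
    """Score each segment as normal (N), loss (L), gain (G) or cnLOH (O)
    relative to the sample's length-weighted median total copy number.
    `segments` is a list of (allele_a, allele_b, length_bp) tuples."""
    # length-weighted median of total copy number
    expanded = sorted((a + b, l) for a, b, l in segments)
    half = sum(l for _, l in expanded) / 2
    acc, median_cn = 0, expanded[-1][0]
    for cn, l in expanded:
        acc += l
        if acc >= half:
            median_cn = cn
            break
    calls = []
    for a, b, _ in segments:
        total = a + b
        if total < median_cn:
            calls.append('L')
        elif total > median_cn:
            calls.append('G')
        elif min(a, b) == 0 and total > 0:
            calls.append('O')  # median copy number, but one allele absent
        else:
            calls.append('N')
    return calls
```

Because calls are relative to the median, a uniformly 4N genome-doubled sample scores as normal rather than as a wall of gains.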

Further segment filtering
All segments within 10 Mb of telomeres were removed. Segments with fewer than 100 BAF-informative probes (heterozygous in the normal reference) or spanning less than 10,000 kb were removed. We removed segments likely to correspond to biases, due to the low input DNA quantity, that were reproducible across patients (Supplementary Figure 60) by comparing the overlap between segments in each patient with segments in all the other patients. If an overlap between segments from two different patients covered 80% or more of the average base-pair length of both segments, and the average segment length was less than 10 Mb, the two segments were considered redundant. Segments that were redundant across at least 4 patients were removed. Finally, we filtered out segments in which discontinuous copy number alterations were found across different samples (thus likely to influence phylogeny reconstruction) while no signal difference was visible when manually inspecting BAF and logR values in the relevant samples. A list of all segments removed after manual inspection is given in Supplementary Data 1. In addition, the following 9 samples (from 5 patients) were deemed too noisy, because of the presence of multiple copy number alterations that could not be identified when manually inspecting logR and BAF values, and were also removed from further calculations: 256_42Y_SS18_3_1_22486, 391_67K_SS4_3_2_22223, 391_67K_SS4_3_1_22222, 391_175R_C4_whole_epi__22327, 451_76Z_SS4_3_1_22374, 451_115T_SS6_2_1_22667, 740_78M_SS7_4_2_21934, 740_78M_SS7_1_1_2192, 911_5U_SS6_2_2_22752. Supplementary Figures 62-69 illustrate the copy number events reported at different stages of the filtering process, while Supplementary Figure 70 gives an example of manually filtered data.
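The cross-patient redundancy rule reduces to a small predicate over segment coordinates. The 80% overlap fraction and 10 Mb cap follow the text; the function itself is an illustrative sketch.

```python
def redundant_pair(seg1, seg2, min_frac=0.8, max_len=10_000_000):
    """Decide whether two segments (start, end in bp) from different
    patients are 'redundant': their overlap covers at least 80% of the
    average of the two segment lengths, and the average length is
    below 10 Mb, per the rule described in the text."""
    s1, e1 = seg1
    s2, e2 = seg2
    overlap = max(0, min(e1, e2) - max(s1, s2))
    avg_len = ((e1 - s1) + (e2 - s2)) / 2
    return avg_len < max_len and overlap >= min_frac * avg_len
```

A segment flagged as redundant with segments from at least 4 patients would then be dropped, as described above.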

Supplementary Figure 62: Filtering steps (patient 256-P).
Blue segments indicate a loss, red ones a gain and green ones a copy-neutral loss of heterozygosity, with white ones showing no alteration. For illustration purposes, all segments have the same size. a) Segments retained prior to filtering on alterations found to recur across patients. b) Segments retained after filtering on alterations found to recur across patients. c) Segments retained after manual inspection and filtering. Gray dots belong to segments that were filtered out of the analysis. One segment near the chromosome end (coordinates 130132392-133977901) was filtered out manually and is highlighted by rectangular boxes. A gain, highlighted in orange, was predicted in this segment for sample TP1_SS3_whole_epi (top left) and a loss, highlighted in cyan, was predicted in sample TP1_SS7_1_3 (top right), with no evidence of a shift in BAF values such as is observable in the loss pattern on the q arm (highlighted in blue). Both samples resemble TP1_SS7_whole_epi (bottom left), in which no alteration was predicted, and the segment was filtered out in all samples for this patient. A genuine-looking copy-neutral loss of heterozygosity (cnLOH) was thus filtered out in sample TP2_SS9_whole_epi. However, due to the breakpoint-oriented method we are using, this would not impair phylogenetic reconstruction.

Allele phasing
Starting with a matrix of allele-specific copy numbers per segment per sample, a copy number (CN) "event" was defined as an abnormal genotype in one segment in one sample. All events were computed relative to baseline ploidy so a total copy number of 2 would be considered a loss if the baseline ploidy was 4. All samples had paired normal information.
For phylogenetic analysis, we needed to determine how many copies of each original allele were present in each segment, and define these alleles in a consistent way across adjacent segments. Our strategy was to assume that, when two segments of identical copy number (CN) call were adjacent, they represented gain or loss of the same haplotype unless there was substantial evidence supporting gain or loss of different haplotypes (Supplementary Fig. 71a, leftmost panel). This assumption minimizes calling of spurious breakpoints.
We began by defining a "seed" segment from which to build haplotypes, preferring segments likely to be highly informative (Supplementary Fig. 71a, center left panel). The segment was chosen to maximize the following score: number of contiguous segments with the same CN call, times log10 of the number of heterozygous SNPs in the segment. We then determined, for each heterozygous SNP in the seed segment, whether it had increased or decreased BAF relative to normal (the patient's constitutive genotype). The sample with the greatest mean BAF difference between its SNPs and normal was chosen as the reference sample as it was presumed to offer the clearest signal.
We divided the heterozygous SNPs in the seed segment into those whose BAFs were increased in the reference sample and those whose BAFs were decreased; the decreased SNPs were arbitrarily designated as corresponding to haplotype A and the increased SNPs to haplotype B.
We then examined this segment in all other samples from the same patient. In segments showing a CN event opposite to the one in the reference sample (loss or copy-neutral LOH versus gain), the increased SNPs were considered as haplotype A and the decreased SNPs as haplotype B. We assumed that the increased haplotype in each sample displaying the same CN event was the same as in the reference sample, unless the number of informative SNPs supporting a reversed assignment was >1.5x the number of SNPs supporting the reference-sample assignment, in which case the sample was designated as "swapped", i.e. as affecting the other allele compared to the reference sample. The threshold of 1.5x was chosen empirically to reduce the frequency of apparently spurious transitions between, for example, loss of A and loss of B in noisy data.
We then counted, for each informative SNP, the number of samples in which that SNP was associated with the same haplotype (that is, varied in the same direction as in the reference sample for a non-swapped sample, or in the opposite direction for a swapped sample). The maximum of this number across SNPs defined the maximum phasing support (MPS). We based our phasing decisions only on SNPs that were associated with the same haplotype in at least 0.75 * MPS of the samples. For example, if we examined 10 samples and the SNP with the highest consistency of haplotype assignment was associated with the same allele in 8 of them, the MPS was 8, and we considered all SNPs associated with the same haplotype in at least 0.75 * 8 = 6 samples to be phased to that haplotype. Based on these SNPs, we assigned a phased genotype to each sample according to the reference-sample phasing, the copy number event detected in each sample, and whether it was considered as swapped.
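The MPS-based SNP selection reduces to a simple thresholding step, sketched here with hypothetical per-SNP support counts (the `rs…` identifiers are placeholders, not SNPs from the study).

```python
def phase_snps(support_counts, frac=0.75):
    """Select phasable SNPs from per-SNP haplotype-support counts.

    support_counts[snp] is the number of samples in which that SNP
    varied consistently with the reference haplotype assignment. The
    maximum phasing support (MPS) is the best such count; SNPs with at
    least frac * MPS support are kept, as in the text."""
    mps = max(support_counts.values())
    threshold = frac * mps
    return {snp for snp, n in support_counts.items() if n >= threshold}
```

Using the worked example from the text (10 samples, best SNP consistent in 8), the threshold is 6, so a SNP consistent in 6 samples is kept while one consistent in 5 is not.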
Once phase was determined for the seed segment, we extended phase inference in both directions down the chromosome until all contiguous segments with the same CN had been phased (Supplementary Fig. 71a, center right panel). It is not possible to infer phase relationships among non-contiguous segments from these data. For each new segment, we found heterozygous SNPs with increased or decreased BAFs as before. We defined the A haplotype as SNPs decreased in normal samples and increased in previously swapped samples, and the B haplotype as the reverse. An additional filtering step was added in which we computed the mirrored BAF frequencies of heterozygous probes in the previous and current segments. If the current segment's frequencies were significantly different from those in the previous segment (defined as p<0.1 in a t-test) we did not merge the current segment with the previous segment; the significant t-test was taken as evidence that these represented two different CN events, perhaps present in different fractions of the cells, rather than a single event.
The A and B probes were then defined as those with MPS * 0.75 phasing support as before. When no samples passed the t-test filtering, all samples sharing a CN event with the previous segment were used for phasing. This was done to favor consistency and avoid creating spurious breakpoints as much as possible. Finally, the "swapped" status of each sample was re-evaluated based on the chosen probes, and the phasing was propagated if there was at least one sample with an identical CN event in the current segment. When no further phasing propagation was possible, the process was restarted by defining a new seed segment on the remaining unphased segments (Supplementary Fig. 71a, rightmost panel).
Associations between all phased SNPs and alleles were recorded (Supplementary Fig. 71b-c) and used to produce phased allele-specific copy number matrices and phased event matrices. Copy number matrices corresponded, for each patient, to two matrices of copy number per chromosomal segment: one for the allele defined as 'A' by our phasing method and one for allele 'B'. In the phased event matrices, losses, gains and cnLOH events were associated with a specific allele.

Supplementary Figure 71: Example of the phasing process (patient 391-P, chromosome 2). a) Different steps of the phasing process in a chromosome made of 13 segments (columns) in 62 samples (rows). From the initially unphased CNV event matrix, the seed segment is selected as the one showing the best potential for reliable phasing propagation (here segment 7). The reference sample is selected as the one with the largest split between alleles; the probes showing a decrease compared to normal are defined as the reference A allele. Segments in which SNPs behave contrarily to the reference sample are "swapped", and the best-matching SNPs are phased to each allele. The phasing is propagated in both directions in stepwise fashion. Segments that display the same event as the previous segment in a given sample are selected to define the SNPs associated with each allele. If a mismatch is detected, due to a discrepancy in mirrored B allele frequencies with the previous segments or because the phasing is not in agreement with other samples being phased, the non-matching segment is excluded from the SNP-to-allele association phase (example given in segment 10, third step of the propagation process). Segments are then "swapped" if not matching the SNP-to-allele association, and the best-matching SNPs are phased to each allele. Finally, CNV events (and allele-specific copy numbers) are phased using the information regarding which SNPs are associated with which allele. b-c) Example of losses impacting different alleles in the same patient.
Samples 391_175R_C2_whole_epi__22331 (panel b and sample highlighted in magenta in panel a) and 391_175R_B4_whole_epi__22328 (panel c and sample highlighted in cyan in panel a) appear to have been impacted by different events and the losses spanning segments 5 to 9 occur in different alleles, which could not be determined prior to phasing.

Breakpoint definition
Genomes were defined as having undergone genome doubling if the ploidy reported by ASCAT was strictly greater than 3, diploid otherwise. Breakpoints were defined as the association between a chromosomal location, an allele as defined by prior phasing, and either a gain or a loss compared to the preceding segment. At the start of the chromosome where there was no previous segment, we substituted the inferred median copy number of the sample. Divergent breakpoints were defined as those that were observed in one sample but not the other. When calculating percentages, only "informative" breakpoints were considered, i.e. those demarcating a copy number alteration observed in at least one sample in the patient.

Maximum parsimony phylogenetic analyses
For each patient p with Ns samples, the list of all Np breakpoints observed at least once was used to create a breakpoint matrix of size Ns x Np. The presence and absence of each breakpoint in each sample was coded by '1' if a given sample presented a given breakpoint, '0' otherwise. This was done separately for whole genome breakpoints and fragile site breakpoints. Phylogenetic trees were produced using maximum parsimony and Fitch optimization via the phangorn R package 4 , which was also used to calculate the Robinson-Foulds distances 5 . Evolutionary distances were defined as the sum of the branch lengths separating crypts and/or biopsies on the maximum parsimony phangorn phylogeny. Evolutionary diversity was defined as the mean evolutionary distance, for example among all biopsy-biopsy comparisons within a patient.
In order to determine whether the two data sets provided compatible phylogenetic information, we pooled the Npwg whole-genome breakpoints and the Npfs fragile-site breakpoints. Npwg breakpoints were then drawn at random to build a first tree, while the Npfs remaining breakpoints were used to build a second tree; the topology-only Robinson-Foulds distance was then calculated between them. The process was repeated 250 times to create an expected distance distribution, which was then compared to the observed distance between the two real trees, giving a one-tailed p-value.
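The randomization test reduces to comparing the observed Robinson-Foulds distance against the 250 replicate distances. Below is a minimal sketch of one resampling step and of the one-tailed p-value; tree building and the RF distance itself are left to external tools (e.g. phangorn), and the add-one correction in the p-value is an assumption, not stated in the text.

```python
import random

def random_split(breakpoints, n_first):
    """Randomly partition pooled breakpoints into two sets of fixed
    sizes (Npwg and the remainder), mirroring one permutation
    replicate of the tree-compatibility test."""
    shuffled = list(breakpoints)
    random.shuffle(shuffled)
    return shuffled[:n_first], shuffled[n_first:]

def permutation_pvalue(observed, null_distances):
    """One-tailed p-value of the observed Robinson-Foulds distance
    against the null distribution of replicate distances. Uses an
    add-one correction (assumption) so p is never exactly 0."""
    worse = sum(1 for d in null_distances if d >= observed)
    return (worse + 1) / (len(null_distances) + 1)
```

Each replicate would build two trees from a `random_split` of the pooled breakpoints and record their RF distance; `permutation_pvalue` then compares the real-tree distance to those 250 values.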

Principal component analysis-based color schemes
The Ns x Np sample-specific breakpoint matrices were used to assign each sample a color based on breakpoint presence/absence. A principal component analysis, computed via singular value decomposition (SVD), was performed on each matrix to extract the first three principal components. The principal components were normalized so that all three distributions ranged from 0 to 200 across samples. Each sample was therefore assigned 3 values between 0 and 200, which served as the red, green and blue channels of its color in a 24-bit RGB scheme (256 levels per channel). Values were limited to 200 to avoid obtaining colors too close to white (255 in all three channels).
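Assuming the first three principal-component scores per sample have already been computed (the PCA itself is done elsewhere, e.g. with numpy or sklearn), the mapping to colors can be sketched as:

```python
def components_to_colors(components):
    """Map each sample's first three principal-component scores to an
    RGB color. Each component is rescaled to the range 0-200 across
    samples (capped below 255 to avoid near-white colors) and used as
    the red, green and blue channel respectively."""
    n = len(components)
    channels = []
    for j in range(3):
        col = [components[i][j] for i in range(n)]
        lo, hi = min(col), max(col)
        span = (hi - lo) or 1.0  # guard against a constant component
        channels.append([round(200 * (x - lo) / span) for x in col])
    return ['#%02x%02x%02x' % (channels[0][i], channels[1][i], channels[2][i])
            for i in range(n)]
```

Samples with similar breakpoint profiles land close together in PC space and therefore receive similar colors.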

Bayesian phylogenetics: methodological development
Bayesian phylogenetic analyses were performed using a modified version of the BEAST 1.8 software 6 adapted to the usage of phased integer SCA state data. We implemented a novel substitution model that parametrizes the acquisition of SCAs based on three rates (gain, loss, and LOH), modifies the likelihood calculation by incorporating a branch connecting the most recent common ancestor of the sample to the last common ancestor with an unaltered genome (LUCA), and incorporates an acquisition bias correction adapted from the one by Lewis 7 . Some of these modifications are built upon re-implementations of strategies developed by Kostadinov et al 8 .
Each variable fragment constitutes an evolutionary character in our model and its state indicates the number of copies of the original two alleles present in the sample. This model assumes that each fragment evolves as an independent and identically distributed (i.i.d.) random variable.

SCA mutation matrix
We designed and implemented a continuous-time non-reversible Markov chain to model the SCA mutation process. Its rate matrix (Supplementary Fig. 80) is parameterized by three rates, each modeling one evolutionary event: gain, loss and conversion. Gain corresponds to the acquisition of an additional copy of one of the original alleles, loss to the loss of a pre-existing copy of one of the original alleles, and conversion to the substitution of one of the original alleles for the other. Diagonal rates are set so that the sum of the transition rates leaving a given state and the rate of remaining in that state equals 0 for every state (i.e., each row of the matrix sums to 0). Conceptually, our model has an infinite number of states; however, we implemented a version limited to 6 copies per locus (fragment), since we consider that our SCA calling pipeline cannot accurately call states with a larger number of copies. The non-reversibility of this process complicates its usage with clock models, since it cannot be normalized so that the total flux of states equals 1. In order to circumvent this issue, the matrix can be normalized by the total SCA rate per fragment and allele (i.e., the sum of the three parameters of the matrix). This is only necessary when using more than one clock in the tree (i.e., a random local clock).
This model was implemented in BEAST as a new substitution model. Since it only has two free parameters, we estimated two rates relative to the third. We relied on the general data type of BEAST to input our SCA states, recoding the 28 biallelic states of our rate matrix as different alphanumerical characters. We did not obtain a simple closed form of the transition probability matrix and therefore carried out the matrix exponentiation using the numerical approach already implemented in BEAST.

Supplementary Figure 80: SCA rate matrix implemented in BEAST. g = gain rate, d = loss rate, c = conversion rate.
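With a 6-copy cap per fragment, the biallelic states (a, b) with a + b <= 6 number exactly 28, matching the recoded alphabet mentioned above. The sketch below enumerates these states and the single-event moves of the three-rate model; the exact rate multipliers (for example, whether event rates scale with the number of existing copies) are not specified in the text and are omitted here.

```python
from itertools import product

def sca_states(max_copies=6):
    """Enumerate the biallelic copy-number states (a, b), a + b <= cap,
    used by the SCA substitution model; a cap of 6 yields 28 states."""
    return [(a, b) for a, b in product(range(max_copies + 1), repeat=2)
            if a + b <= max_copies]

def transitions(state, max_copies=6):
    """Possible single events from a state under the three-rate model:
    gain of either (still present) allele, loss of either allele, and
    conversion replacing a copy of one allele with the other. A sketch
    of the rate-matrix structure, not the authors' exact
    parameterization."""
    a, b = state
    out = []
    if a + b < max_copies:
        if a: out.append(('gain', (a + 1, b)))
        if b: out.append(('gain', (a, b + 1)))
    if a: out.append(('loss', (a - 1, b)))
    if b: out.append(('loss', (a, b - 1)))
    if a and b:  # conversion needs both alleles present
        out.append(('conversion', (a + 1, b - 1)))
        out.append(('conversion', (a - 1, b + 1)))
    return out
```

Off-diagonal rate-matrix entries correspond to these moves (weighted g, d or c), and each diagonal entry is minus the sum of its row's off-diagonal rates.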

Last common ancestor with unaltered genomic state
We implemented a last common ancestor with unaltered genomic state (LUCA) in both tree model and likelihood calculation, thus assuming that all samples shared a common ancestor with an unaltered genome. This allowed us to estimate the time in which the BE lesion started its development and also helped to make better use of the non-reversible SCA rate matrix. LUCA is implemented as an extra degree-1 node that becomes the root of the tree, which is connected to the most recent common ancestor (MRCA) by a new branch. Therefore, the BEAST implementation of the likelihood calculation had to be modified to carry out the Felsenstein pruning algorithm 9 in a degree-1 node. Moreover, the partial likelihood at the root is multiplied by the partial likelihood of the assumed root state (i.e., partial likelihood of one for the wild type state and zero for the rest).

Acquisition bias correction
The evolutionary characters used in our analysis were obtained by joint segmentation of the SNP array data (i.e., filtered logRs and BAFs), and therefore, by definition, all observed characters varied. Moreover, the number of invariant fragments is not measurable or easy to estimate, as it would correspond to the invariable proportion of the total potential fragments the BE tissue could eventually generate in a given patient. Nevertheless, considering only variable fragments constitutes a sampling bias, which we predict would strongly affect the estimation of absolute SCA mutation rates. This bias is well known in other kinds of data, such as restriction sites 10 , single nucleotide polymorphisms 11 and morphological characters 7 . In order to mitigate this bias, we implemented the conditional likelihood correction 7 modified for the SCA characters of our model (see 12 for an equivalent implementation for DNA characters) repurposing BEAST code by Alekseyenko et al. 13 . Reconstituted approaches could not be used in our data due to the difficulty of estimating the number of invariant fragments.

Genome doubling
Our phylogenetic method does not model genome doubling (GD). However, polyploidy is common in cancer cells, and is present in our data. We predicted that the presence of genome doubling would systematically bias our evolutionary parameter estimates, such as SCA mutation rate, tree topology and branch lengths. We developed two alternative data pre-processing strategies intended to alleviate this model misspecification, and carried out a simulation study (Supplementary Note 1) to choose the better of the two, which we then used in our analyses. The chosen strategy consisted of re-coding the SCA states to make them relative to the estimated integer baseline. Thus, states were divided by the estimated baseline state. If the division remainder was not zero, the resulting state was rounded up or down at random for each position, in order to minimize the mutation rate estimation bias. The integer baseline was calculated as the rounded division of the mean copy number state per allele.
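The chosen GD pre-processing strategy can be sketched as follows. The text only says rounding is "at random", so equiprobable up/down rounding is an assumption here.

```python
import random

def recode_relative(states, baseline):
    """Re-code integer allele copy numbers relative to an integer
    baseline (the rounded mean copies per allele). Each state is
    divided by the baseline; non-integer results are rounded up or
    down at random (equiprobably, an assumption) to avoid a
    systematic bias in the mutation rate estimates."""
    recoded = []
    for s in states:
        q, r = divmod(s, baseline)
        if r == 0:
            recoded.append(q)
        else:
            recoded.append(q + (1 if random.random() < 0.5 else 0))
    return recoded
```

For a genome-doubled sample with baseline 2, a 4-copy allele recodes to 2 and a 2-copy allele to 1, so the doubled background no longer reads as genome-wide gain.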

Bayesian phylogenetic analysis
In order to estimate SCA mutation rates we developed a new phylogenetic method (PISCA) implemented as a BEAST 1.8 6 plugin (available at https://github.com/adamallo/PISCA). PISCA analyzes sequences of phased integer SCA states using a new model with three main components: (1) a SCA substitution model, (2) an extra branch connecting the most recent common ancestor of the sample to the last common ancestor with an unaltered genome (LUCA), and (3) acquisition bias correction.
For the 6 patients who had sufficient mutations to perform the analysis, we analyzed single-crypt and whole-biopsy SCA data separately to obtain estimates of the phylogenetic tree, SCA mutation rate (sum of gain, loss and conversion rate), LUCA age and effective population size. We then compared the estimates obtained at the crypt and whole-biopsy level. We used both a strict molecular clock model, which assumes that all branches of the phylogeny have the same (constant) SCA mutation rate, and a random local clock model, which allows for different branches of the phylogeny to have different SCA mutation rates 14 . This allowed us to estimate changes in SCA mutation rate in specific BE cell lineages during progression.
The two phased allele-specific copy number matrices per individual (one for each allele) were translated into SCA states, applying our pre-processing strategy intended to mitigate the GD bias. Fragments called as wild type in all samples from a patient were discarded in order to make appropriate use of the acquisition bias correction. These molecular data, together with the patient's date of birth, sequence dates (tip dates) and prior distributions (Supplementary Tables 1–2), were used to generate the BEAST input files. We performed two analyses, one using a strict molecular clock and one using the random local clock model 14. Patients 256-NP and 911-NP were not analyzed, since the number of variable sites in at least one of their datasets was too low (3 and 1, respectively).

Parameter | Prior distribution (table truncated) 15

One extra chain sampling only from the priors was run per dataset in order to check that the results were not driven by the priors. Parameter and tree estimation were carried out on the combination of the 3 chains after discarding the first 10% of the samples as burn-in (i.e., 27000 posterior samples). Three datasets required the use of the slower Metropolis-coupled alternative (MC3), with three chains and default heating parameters, in order to mix properly and achieve convergence. We performed a simulation study (see the Genome Doubling section of the supplementary methods) using both algorithms to ensure that their results were comparable.

With the strict molecular clock model, effective sample sizes (ESSs) were high for all parameters and samples, approaching the maximum of 9000 in most datasets and parameters, with a global minimum per replicate of 241 (data not shown). The random local clock analysis showed the same performance, except for some rate-change parameters in the largest crypt-based analyses. Parameter point estimates were obtained as the mean of the posterior sample, while maximum clade credibility trees with median heights were used as tree point estimates.

The estimated population size parameter (PNe) is a composite parameter, PNe = Ne × τ, where Ne is the effective population size and τ the generation time. In order to obtain final Ne estimates, we assumed 2 different τ values, 50 and 7 days. The probability of two posterior samples being equal was calculated as their overlap using the R package birdring 16.
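De-confounding the composite parameter PNe into Ne under an assumed generation time can be sketched as follows; the PNe value here is an arbitrary placeholder, not an estimate from this study:

```python
# Recover effective population size Ne from the composite coalescent
# parameter PNe = Ne * tau, under an assumed generation time tau.
# PNe is in units of years, so tau must be converted from days.
def ne_from_composite(pne_years, tau_days):
    tau_years = tau_days / 365.25       # generation time in years
    return pne_years / tau_years

pne = 100.0                              # hypothetical PNe, in years
ne_tau7 = ne_from_composite(pne, 7)      # assuming a 7-day generation time
ne_tau50 = ne_from_composite(pne, 50)    # assuming a 50-day generation time
```

The shorter the assumed generation time, the larger the implied effective population size, which is why both τ values are reported.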

SUPPLEMENTARY NOTE 1: SIMULATION STUDY

Introduction
The current implementation of PISCA does not model genome doubling (GD). However, polyploidy is common in cancer cells and is present in our data. We predict that the presence of genome doubling will systematically bias our evolutionary parameter estimates, such as the SCA acquisition rate, tree topology and branch lengths. In order to characterize this systematic bias, we carried out a simulation study covering different scenarios. This also allowed us to benchmark the two data pre-processing strategies we developed to alleviate the expected bias, and to ensure that results obtained with the MCMC and MC3 BEAST algorithms were comparable.
GD-like genotypes can arise both from real GD events and from baseline estimation errors in the CNV calling procedure. To assess the relative performance of the different strategies against these two scenarios, we developed a simulation strategy for each of them and added a negative control with no doublings. We also predicted that the acquisition rate would strongly affect the accuracy of the different strategies, and accordingly included different rates in our analyses.

Data pre-processing strategies
We developed two data pre-processing strategies intended to reduce the systematic bias induced by GD events. The inclusion of a negative control as a third strategy left us with the following conditions:

-none: CNV states are directly translated from the simulated number of allele copies (i.e., neglecting GD). For clarity, we refer to the two original copies (one per chromosome) as alleles A and B, independently of their identity. Thus, the state 2:1 would correspond to a gain of copy A (2 copies) and the original copy number state of allele B (1 copy).

-baseline: CNV states are calculated relative to the estimated integer baseline. Thus, simulated states are divided by the estimated baseline state and, if the division remainder is not zero, the resulting state is randomly rounded up or down. We adopted random rounding to avoid biasing the acquisition rate estimation. We estimated the integer baseline as the rounded mean copy number state per allele.
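The baseline strategy can be sketched as follows; the function name is illustrative and not taken from PISCA:

```python
import random

# Re-code per-allele copy-number states relative to an estimated integer
# baseline: the baseline is the rounded mean state of the allele, each
# state is divided by it, and non-integer quotients are rounded up or
# down at random to avoid systematically biasing the rate estimate.
def recode_to_baseline(states, rng=None):
    rng = rng or random.Random(0)
    baseline = max(1, round(sum(states) / len(states)))  # integer baseline
    recoded = []
    for s in states:
        q, r = divmod(s, baseline)
        if r != 0:
            q += rng.choice((0, 1))   # random rounding up or down
        recoded.append(q)
    return baseline, recoded

# A mostly genome-doubled allele (baseline 2) is mapped back toward
# its pre-doubling states:
baseline, recoded = recode_to_baseline([2, 2, 4, 2, 2, 3])
```

Under the "none" strategy the states would instead be used as-is, which is what makes it the negative control.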

SCA simulator
We developed an in-house simulation software, implemented in Python, that evolves copy number states down a user-specified tree according to our SCA substitution model. It also incorporates two genome doubling models, one parameterized by a constant GD rate and the other by an exponentially growing GD rate. The implementation of the latter is based on waiting-time scaling, a well-known strategy used to simulate exponential population growth in coalescent simulators; we included it to reflect that GD is more common late in progression. The simulator also implements an option to duplicate a given number of leaves chosen at random, a feature intended to simulate GD-like genotypes arising from baseline estimation errors. Finally, it implements the previously discussed pre-processing strategies and can directly generate BEAST input files. Importantly, it depends on DendroPy 4.1.0 17.
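The constant-rate GD model can be sketched as below. This is a minimal illustration with hand-rolled helpers, assuming a doubling occurs on a branch with probability 1 − exp(−ρt) and substituting simple per-allele gain/loss counts for the full SCA substitution model implemented in the real simulator:

```python
import math
import random

def _poisson(lam, rng):
    # Knuth's inversion algorithm for small lambda.
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

# Evolve a per-allele copy-number state along one branch: a whole-genome
# doubling may fire (constant GD rate), then Poisson-distributed numbers
# of single-copy gains and losses are applied to each allele.
def evolve(state, branch_length, gd_rate, gain_rate, loss_rate, rng):
    state = list(state)                          # do not mutate the input
    if rng.random() < 1.0 - math.exp(-gd_rate * branch_length):
        state = [2 * c for c in state]           # whole-genome doubling
    for i in range(len(state)):
        gains = _poisson(gain_rate * branch_length, rng)
        losses = _poisson(loss_rate * branch_length, rng)
        state[i] = max(0, state[i] + gains - losses)
    return state

rng = random.Random(42)
child = evolve([1, 1], branch_length=10.0, gd_rate=0.05,
               gain_rate=0.01, loss_rate=0.005, rng=rng)
```

Applying `evolve` recursively from the root to each leaf of a tree yields simulated tip genotypes; the exponentially growing GD model would additionally rescale waiting times along each branch.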

Simulation pipeline
We simulated the trees using CoalEvol7.3.5 18 under the coalescent model with an exponentially expanding population and dated tips. We assayed two tree sizes (11 and 56 leaves), corresponding to the median numbers of taxa for non-progressors and progressors, respectively, considering only the single-crypt samples. The other tree-simulation parameters were fixed at biologically reasonable values (Supplementary Table 3). We generated ten replicates (trees) per condition.

Parameter | Value
Time between time points | 5 years
Generation time | 1 day
Haploid effective population size at the tips | 90000 individuals
Effective population size exponential growth rate | 1.563e-3

Supplementary Table 3: Comprehensive list of fixed tree-simulation parameters. The exponential growth rate was parameterized assuming a LUCA age of 20 years before the present. Note that we parameterized the effective population size in haploid fashion, since we assume that cancer cells do not undergo recombination and must therefore be modeled as haploid individuals.
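The quoted growth rate is consistent with assuming that the population expands from a single cell at the LUCA to the tip size over 20 years of one-day generations, which can be checked as:

```python
import math

# Per-generation exponential growth rate implied by the fixed parameters:
# growth from ~1 cell at the LUCA (an assumption made here for the check)
# to a haploid Ne of 90000 at the tips, over 20 years of 1-day generations.
luca_age_generations = 20 * 365.25          # 20 years in 1-day generations
rate = math.log(90000) / luca_age_generations
# rate ≈ 1.562e-3, matching the 1.563e-3 quoted above to rounding
```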
These trees were used as input for our SCA simulator to generate genetic data under 54 different simulation scenarios, resulting from the combination of two tree sizes, three acquisition rates, three pre-processing strategies and three doubling models (Supplementary Table 4). Some CNV-simulation parameters were fixed at biologically reasonable values (Supplementary Table 5).

Parameter | Values
Tree size (number of tips) | Non-progressor (11), Progressor (56)

Supplementary Table 5: Comprehensive list of fixed SCA-simulation parameters. Parameters indicated with * are only applicable in certain simulation scenarios. The exponential growth rate of the GD rate was parameterized to reach a final value of 0.5 at the tips.

Estimation
We performed the phylogenetic estimation on each dataset using three independent runs of 20 million MCMC generations in PISCA. We sampled the posterior distribution every 2000 generations and used the priors shown in Supplementary Table 6. We checked for proper mixing and convergence using Tracer 19 and RWTY 15 on a random replicate of each combination. We discarded the first 10% of the posterior samples in each estimation replicate as burn-in and combined the rest to perform the data analysis.
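The burn-in removal and chain-pooling step can be sketched as below; the sample counts mirror 20M-generation runs sampled every 2000 generations:

```python
# Combine independent MCMC runs after per-chain burn-in removal.
# 20M generations sampled every 2000 gives 10000 samples per chain;
# discarding the first 10% of each and pooling 3 chains leaves 27000
# posterior samples for parameter and tree estimation.
def pool_chains(chains, burnin_frac=0.10):
    pooled = []
    for chain in chains:
        burnin = int(len(chain) * burnin_frac)
        pooled.extend(chain[burnin:])   # drop burn-in, keep the rest
    return pooled

chains = [list(range(10000)) for _ in range(3)]   # placeholder samples
pooled = pool_chains(chains)
```

Pooling after (rather than before) burn-in removal ensures that each chain's non-stationary start is excluded independently.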

Parameter | Prior distribution
LUCA age (years from the present) | Uniform [5, 60]