Inferring clonal composition from multiple tumor biopsies

Knowledge about the clonal evolution of a tumor can help to interpret the function of its genetic alterations by identifying initiating events and events that contribute to the selective advantage of proliferative, metastatic, and drug-resistant subclones. Clonal evolution can be reconstructed from estimates of the relative abundance (frequency) of subclone-specific alterations in tumor biopsies, which, in turn, inform on its composition. However, estimating these frequencies is complicated by the high genetic instability that characterizes many cancers. Models for genetic instability suggest that copy number alterations (CNAs) can influence mutation-frequency estimates and thus impede efforts to reconstruct tumor phylogenies. Our analysis suggested that accurate mutation frequency estimates require accounting for CNAs—a challenging endeavour using the genetic profile of a single tumor biopsy. Instead, we propose an optimization algorithm, Chimæra, to account for the effects of CNAs using profiles of multiple biopsies per tumor. Analyses of simulated data and tumor profiles suggested that Chimæra estimates are consistently more accurate than those of previously proposed methods and resulted in improved phylogeny reconstructions and subclone characterizations. Our analyses inferred recurrent initiating mutations in hepatocellular carcinomas, resolved the clonal composition of Wilms’ tumors, and characterized the acquisition of mutations in drug-resistant prostate cancers.


INTRODUCTION
Pan-cancer tumor profiling has identified recurrent alterations that are associated with tumor etiology at the loci of thousands of genes but the interpretation of genetic alterations remains a major challenge [1][2][3] . Knowledge about the clonal evolution of tumors can point to genetic alterations that both contribute to tumorigenesis, indicate prognostically relevant intratumoral variability, and point to refractory tumor subclones 4,5 . Specifically, clonal evolution-depicted as a phylogenetic tree in Fig. 1a-can help to identify alterations that play a role in tumor initiation as well as those that confer a selective advantage to altered tumor cells. Moreover, information about its subclone composition is important for predicting cancer's potential for drug resistance and metastasis, which vary across tumor subclones 6 and are the key determinants of patient outcome. Consequently, tumor-subclone characterization is essential for designing personalized therapies that target all tumor subclones and may hold the key to predicting tumor progression, metastases, drug sensitivity, and patient outcome.
Current methods that rely on DNA-profiling to reconstruct clonal evolution of tumors can be classified into two categories: methods that primarily rely on single-cell profiles [7][8][9][10] and those that computationally resolve mixtures of subclones from molecular profiles of bulk tumor cells, i.e., profiles of pools of cells that originate from a common malignant lesion [11][12][13][14] . Single-cell DNA sequencing can produce more definitive estimates of the proportion (frequencies) of tumor cells that contain each genetic alteration and more complete profiles of tumor subclones, including information about the co-occurrence of alterations within each subclone. Its primary disadvantage is operational: the availability of high-quality tumor samples that permit single-cell isolation and profiling as well as the accuracy and cost associated with parallel sequencing DNA from a multitude of cells per tumor. Moreover, improving the accuracy of single-cell mutation profiling remains challenging due to limited material availability in single cells 15 ; this is not likely to improve as future sequencing technologies focus on profiling formalin-fixed paraffin-embedded (FFPE) tumor samples 16,17 . Alternatively, single-cell RNA or protein profiling can help indicate tumor subclones, but these assays may not directly point to key driving genetic alterations.
Focusing on single-nucleotide somatic variants (SNVs; or simply mutations), we sought to reconstruct clonal evolution from DNA profiles of genetically unstable cancers. This entails deconvolving mutation frequencies, mutation-subclone associations, and CNAs from DNA profiles-including both whole-exome sequencing (WES) and panel-based (targeted) sequencing assays-that produce average estimates across cellular ensembles ( Fig. 1b-d).
One approach to improve the accuracy of these deconvolutions is to profile multiple biopsies from the same tumor across time points 18 or across regions 6,19 . This approach relies on two key assertions: (1) that genetic alterations that are specific to the same tumor subclone will co-occur with the same frequency across biopsies, and (2) that the clonal composition of heterogeneous regions varies, i.e., multiple sampling will allow for the aggregation and deconvolution of the frequencies of most mutations with improved power. It is important to note that mutations that underwent convergent evolution 20 do not violate these assertions and will not be aggregated with other mutations from the same tumor subclone because of differing frequency estimates across biopsies. It is also important to note that accurate deconvolution must account for tumor purity, and our efforts-including production of simulated data to compare leading methods and 1 IBM Research-Zurich, 8803 Rüschlikon, Switzerland. 2 Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland. 3 Texas Children's Cancer Center, Baylor College of Medicine, Houston, TX, USA. 4 Pathology and Molecular Pathology, University Hospital Zurich, Zurich, Switzerland. 5 RWTH Aachen University, Faculty of Medicine, Joint Research Centre for Computational Biomedicine, Aachen, Germany. 6 Institute for Computational Biomedicine, Heidelberg University Hospital, Heidelberg, Germany. 7 Senckenberg Institute of Pathology, University Hospital Frankfurt, Frankfurt am Main, Germany. ✉ email: Peter.Wild@kgu.de; mrm@zurich.ibm.com; sumazin@bcm.edu analyses of tumor profiles-account for differences tumor purity across samples.
A central challenge for estimating mutation frequencies in tumors with unstable genomes is accounting for the effects of CNAs that can alter mutated-read fractions. These are observed in profiles of tumor biopsies that are composed of tumor cells with wild-type and mutated alleles as well as tumor-adjacent cells (Fig.  1e, f). In turn, inaccurate mutation-frequency estimates can contribute to erroneous associations between mutations and tumor subclones as well as errors in phylogeny reconstructions (Fig. 1g, h). We describe the mutation-frequency inference problem as that of inferring tumor subclone frequencies and associating mutations with subclones. Consequently, we describe the tumor-phylogeny reconstruction problem as that of inferring ancestral relations between tumor subclones. The main challenges for addressing the mutation-frequency inference problem are to Fig. 1 A simulated footprint of the clonal evolution of a tumor, as observed in genetic profiles of multiple biopsies. a Tumor phylogeny composed of six dominant tumor subclones that make up the b cellular composition of six tumor biopsies. c Read fractions in a DNA profiling assay that are associated with these subclones and d corrected fractions after accounting for CNAs. e Frequencies of the variant allele, for each mutation defining a clone in each biopsy, are higher for ancestral clones. f Variant allele frequencies are linked to the cellular composition of the tumor and depend on its associated phylogeny. g Ancestral relations can be inferred by comparing subclone frequency vectors; e.g., Subclone 3 frequencies are greater or equal to those of Subclone 4 across all biopsies, suggesting that Subclone 3 may be ancestral to Subclone 4. h However, errors in frequency estimates (red) can complicate efforts to infer ancestry and tumor-phylogeny reconstruction.
aggregate co-occurring mutations across biopsies, estimate the frequency of each aggregate in every biopsy, and identify partial orders across aggregates that are consistent across biopsies. When viewed this way, each tumor subclone could be associated with a frequency vector that describes the proportion of cells containing its mutations in each biopsy. Ancestral order between two subclones could then be established based on (probabilistic) comparisons between their corresponding mutation frequencies.
Ancestral order inference requires confident frequency assignment to the majority of mutations based on observed mutatedread fractions, and inference methods can be compared based on the number of mutations with frequency estimates, the accuracy of these estimates, and their accuracy at aggregating sister mutations that initiate the same clones.
We studied the mutation-frequency inference problem as a function of genetic instability and proposed the inference method Chimaera to improve these estimates and subsequent phylogeny reconstruction. Chimaera uses an optimization process to resolve the parameters of a natural model for the effects of CNAs on mutated-read fractions and is unique in its emphasis on the simultaneous inference of mutation frequencies and CNAs. We report on a comparison of Chimaera's accuracy to that of other mutation-frequency and heterogeneity inference methods on simulated DNA-profiling data of genetically unstable tumors and on biopsy subsets of thirteen tumors, including liver, kidney, and prostate cancers [21][22][23] . Each of these tumors was profiled in 4-10 regions, and three of the prostate cancers profiled were surveyed across multiple time points. We showed that Chimaera's inferences can be used to identify the key mutations that are associated with increased subclone proliferation, drug response, and tumor grade, as well as to infer the ancestral relations between tumor subclones that harbor these mutations.

RESULTS
We describe the results of our efforts to evaluate inference method accuracy on simulated data and to reconstruct phylogenies based on WES and targeted sequencing assays of tumor biopsies. Our analyses highlighted the challenges in inferring cancer mutation frequencies from these assays and the benefits of methods that rely on profiles of multiple biopsies per tumor to characterize tumor subclones and tumor evolution by tracking mutation aggregates.
Simulation of DNA profiling data We used phylogeny models-with sizes ranging from three to twelve tumor subclones, twenty to fifty somatic mutations per subclone, and varying degrees of genetic instability-to generate simulated DNA profiles. ABSOLUTE 24 , AncesTree 12 , EXPANDS 25 , PhyloWGS 14,26 , SCHISM 13 , and Chimaera were then used to reconstruct phylogenies based on simulated data. Each method inferred ancestral relations between mutation pairs, and errors were estimated as the combined frequencies of false-positive and false-negative predictions. ABSOLUTE infers tumor purity and malignant cell ploidy directly from the analysis of somatic DNA alterations by fitting estimates of copy-ratio of both homologous chromosomes with a Gaussian mixture model, where components were centered at the discrete concentration-ratios implied by an initial frequentist estimation. AncesTree characterizes the clonal evolution of tumors based on the probabilistic model for errors in observed read fractions and infers phylogeny matrices using integer linear programming. EXPANDS clusters mutations based on their cell-frequency probability distributions; clusters are next extended by members with similar distributions and pruned based on statistical confidence by comparing the cluster maxima and peaks observed outside the core region. PhyloWGS reconstructs phylogenies based on a model for simple somatic mutations in addition to a correction for CNAs, all based on a single biopsy per tumor. SCHISM takes as input mutation cellularity estimations and mutation clustering inferred by other methods and uses a generalized likelihood ratio to infer lineage precedence and lineage divergence. A genetic algorithm is then used to build phylogenetic trees.
Accuracy of mutation-frequency estimation based on simulated data Our initial efforts to compare accuracy between methods based on analyses of phylogenies of size three revealed variable success rates, with some methods showing consistently poor accuracy. EXPANDS and PhyloWGS, which were designed to reconstruct phylogenies using profiles of one biopsy per tumor, and ABSOLUTE, which is best known and most effective for estimating tumor purity, had consistently poor accuracy in our simulationswith results statistically indistinguishable from random inferences. SCHISM and AncesTree had better or comparable performance than these three methods in every simulated instance. For example, the magnitude of frequency inference errors by ABSOLUTE, which processed profiles of multiple biopsies per tumor, were more than double those of SCHISM and analyses required manual parameter optimization. However, ABSOLUTE had good accuracy for inferring tumor purity in our synthetic data. SCHISM and AncesTree do not explicitly account for the full range of observed CNAs in tumors, but they were accurate in 100% of our tested instances with three tumor subclones. Consequently, we focused on accuracy comparisons between inferences by SCHISM, AncesTree, and Chimaera on phylogenies composed of 6-12 tumor subclones. Moreover, our analysis suggested that more than one biopsy per tumor was required to accurately approximate mutation frequencies and CNAs at these mutation loci.
We compared the accuracy of SCHISM, AncesTree, and Chimaera on phylogenies that were adapted from a precompiled library that was generated both manually and using CITUP 27 ; see Fig. 2b and Table S1 for representative phylogenies. Each somatic mutation was associated with a trio of copy numbers-δ s , δ s w , and δ s m (Fig.  2a)-that were taken from truncated normal distributions with means μ ∈ {1, 2, 3}, where μ = 1 corresponds to no copy number changes and standard deviation σ ∈ {0, 1, 2, 3}; σ = 0. was used only when μ = 1. The resulting copy numbers modeled a range of genetic instability conditions that were in line with observed CNAs in TCGA-profiled prostate, hepatocellular, breast carcinomas (HCC and BRCA in Fig. 2d, e); we assumed no linkage between simulated CNAs of any mutations, and we omitted prostate adenocarcinoma (PRAD) curves from Fig. 2 for readability. In addition, we added up to 10% of wildtype reads for all simulated mutations to account for the potential inclusion of nontumor cells in biopsied samples (WT subclone in Fig. 1a). Total coverage for each allele-i.e., the number of reads covering both the wildtype and mutated variants of a specific nucleotide-was taken by sampling mutation coverage values from prostate and Wilms' tumor biopsies profiled here. Finally, once idealized counts were available for both mutated and wildtype alleles, we simulated duplication or loss of up to 5% of the observations according to a uniform distribution. To simulate multiple regions per tumor, we repeated each biopsy simulation using the same simulation parameters but with distinct cellular composition vectors to produce simulated profiles of 6-12 biopsies per tumor, as depicted in Fig. 2c. The availability of six biopsies per tumor increases the likelihood that mutations can be aggregated and subclone mutation frequencies can be compared to infer ancestral relations and are in line with pre-existing datasets, including those reported here. The selection of six biopsies is a compromise between clinical feasibility and the power needed to infer mutation frequencies and phylogenies.
AncesTree accepts no external input when estimating mutation frequencies, but SCHISM can be guided by externally inferred mutation frequencies and clusters. SCHISM's implementation includes its own selected clustering methods, and these were also used to compare accuracy. Chimaera can also be guided by externally inferred mutation frequencies and clusters, but by default, it uses a clustering approach modeled after hdbscan 28 . When comparing SCHISM and Chimaera performance on synthetic data, we clustered mutations using their native clustering approaches and with tclust 29 optimization subroutines including ElbowSSE, Entropy, GMD, Mclust, and SDIndex 30,31 . We compared the accuracy of methods and pipelines on 2000 simulated assays, including both with and without modeled genetic instability by varying mutation copy numbers. The accuracy of SCHISM estimates was better, on average, than that of AncesTree, but it was relatively sensitive to clustering optimization methods, with SDIndex outperforming other methods, including SCHISM's native implementation. Comparatively, Chimaera estimates were less dependent on clustering methods and significantly outperformed estimates by SCHISM with SDIndex (p < 1E−16 by U-test).
When using its native clustering approach Chimaera exhibited lower accuracy than implementations using tclust (Fig. 3a), but it estimated frequencies for a significantly larger number of mutations (Fig. 3b). The number of mutations with no frequency estimates by Chimaera was 2-fold less than that of the next best method, which allowed for dramatically improved phylogeny reconstruction in both simulated data and profiled cancers. Inference accuracy, for both SCHISM and Chimaera, was anticorrelated with copy number variability across biopsies. We used CNA variability as a surrogate for genetic instability and quantified it using the coefficient of variation of mutation copy numbers across biopsies, which followed truncated normal distributions (Fig. 3c). However, while Chimaera inferences were affected by copy number variability, they were independent of the actual magnitude of CNAs (Fig. 3d). This, in turn, suggested that instability across biopsies is a key challenge for estimating mutation frequencies. We note that both the SCHISM and tclustbased Chimaera pipelines failed to cluster 40% of mutations in our synthetic data. Moreover, while Chimaera assigned frequencies to all clustered mutations and made predictions for each simulate cancer, SCHISM did not successfully estimate mutation frequencies for some simulated genomes and cancer profiles. To account for this, accuracy comparisons in Fig. 3 relied on only those mutations that had assigned frequencies by all methods. In total, our analysis suggests that mutation frequency estimation is more challenging for genomes with high CNA variability (Fig. 3c, d). All data-including supplementary tables and analyses-are available at Chimaera's GitHub repository.
The number of regions profiled per tumor dictates algorithm convergence Chimaera requires at least two profiled regions per tumor for making tumor subclone predictions, and while SCHISM can predict tumor subclones based on a single region, its accuracy improves with the number of profiled regions per tumor. To test the benefit of profile-multiplicity per tumor, we compared Chimaera and SCHISM analyses of multiregion WES profiles of 13 tumors using only subsets of the available tumor profiles for each analysis. Tumor profiles included profiles of nine hepatocellular carcinomas (HCCs) 32 , three high-risk Wilms' tumors, and a castrate-resistant prostate cancer (CRPC). Each tumor was profiled in 5-10 distinct regions, and we compared the number of subclones detected by each method in each multiregion subset as a function of the number of regions, with a minimum of two WES profiles per tumor.
All HCCs were profiled in five regions, which permitted testing predictions in 2-size, 3-size, and 4-size subsets of the assays for each tumor; Wilms' tumors were profiled in six and eight regions; and the CRPC in ten regions. Our results (Fig. 4) suggested that, for most HCCs and Wilms' tumors, Chimaera-analysis of four tumor regions resulted in a similar number of subclones as profiling five tumor areas, however, profiling two tumor areas was insufficient for predicting tumor subclones accurately with Chimaera. Predictions for our CRPC, which had the highest genomic instability and mutation burden of the 13 tumors, largely converged with seven profiled regions. Interestingly, because mutations that were clustered together by Chimaera were associated with sufficiently different frequencies by SCHISM, the number of subclones Our model for the effects of copy number alterations on mutated-read fractions in DNA profiles. a For each mutation in each biopsy s, the mutated-read fraction is a function of the true proportion of profiled cells with the mutation (mutation frequency) φ s , the copy number of the reference allele in cells without the mutation δ s , and the copy numbers of the reference and the mutated allele in tumor cells with the mutation, δ 0 s and δ a s , respectively. The frequency of tumor and WT cells without the mutation is (1 − φ s ) and the two reference alleles can be assumed to have a combined copy number of 2δ s . c-e To compare inference methods we synthetically generated profiling data based on parametrized simulated copy number distributions. b Shown are representative phylogenies and c a representative cellular composition matrix, as well as d density plots of average copy numbers across profiles of TCGA-profiled hepatocellular (HCC) and breast (BRCA) carcinomas. These include the distribution of copy numbers in two individual HCCs (HCC1 and HCC2) and across all HCCs (black) and all BRCAs (blue); copy numbers ranged from 0 to >260×. e Simulated copy numbers, including simulations of more-stable and less-stable cancers, ranged from 0× to 15× copies.
predicted by SCHISM were often dramatically greater, and SCHISM analyses did not produce subclone predictions for some tumors and often required more regions to converge (Fig. 4b). A detailed description of tumor subclone predictions in each tumor context follows. All data and analyses are given in Table S2.
Phylogeny inference in HCC HCCs are high-risk liver tumors that are known to have high genetic instability 23 . We used Chimaera to infer mutation frequencies and ancestral relations between HCC subclones based on WES profiles of nine HBV-positive HCCs 32 . In total, we obtained mutated-read fractions and CNA estimates for 1424 mutation candidates in 9 tumors and 43 tumor samples. Table S3 lists the data input to Chimaera, including mutation-frequency and CNA estimates for each mutation; it also details the outcome of the analyses described below.
Chimaera inferred frequencies estimates for 60% (858/1424) of all mutations, reconstructing phylogenetic trees for each tumor and predicting initiating clones and proliferative subclones; see representative trees in Fig. 5a-c. In contrast, SCHISM inferred mutations frequency for 18% of the identified mutations.
Interestingly, 100% (9/9) of the HBV-positive HCCs had predicted initiating mutations in WNT-signaling pathway genes (Fig. 5d). An examination of 102 TCGA-profiled HBV-positive HCCs 23 suggested that 74% (75/102) of samples carried mutations in WNT-signaling pathway genes, and that the majority of these samples (76%) had WNT-signaling pathway mutations with mutated-read fractions above 25%-corresponding to mutations that are potentially present in the majority of cells.
To test whether WNT-signaling pathway genes were enriched for mutations-and particularly high-frequency mutations with mutated-read fractions above 25%-we calculated the proportion of tumors with such mutations in each of 186 KEGG pathways in MSigDB 32 . The most enriched pathways by p-value and mutatedsample fraction are shown in Fig. 5e. p-values were estimated using permutation testing, where for each pathway, random same-size gene sets were generated using KEGG pathway genes and the mutated-sample fraction taken to generate a null distribution. WNT-signaling was the most enriched pathway, and most of the remaining enriched pathways significantly overlapped it (p < 0.01, FET). To correct for this overlap 33 -where pathways that overlap another pathway that is mutated in many samples are identified as significant-we recalculated enrichment significance Fig. 3 Performance of mutation frequency estimates on simulated data. a Performance-measured by mean error across simulated WES datasets from genomes with varying mutation copy numbers-of mutation-frequency estimates by AncesTree (purple), SCHISM (red), and Chimaera (green and blue); SCHISM and Chimaera were evaluated using multiple clustering methods in an effort to improve their accuracy, with SDIndex (SCHISM) and ElbowSSE (Chimaera) producing top accuracy, respectively. In blue, are reported estimates for the published Chimaera, which uses hdbscan. b No method is able to estimate mutation frequencies for every mutation; however, Chimaera assigns frequencies for over 80% of simulated mutations, compared to an average of 60% or fewer for other methods. c Errors in frequency estimates were correlated with genetic instability, which was measured here as the coefficient of variation within copy number distributions used in simulated WES profiles. Inferences by some methods were consistently better than others; e.g., SCHISM with SDIndex clustering outperformed AncesTree inferences. Chimaera clearly outperformed all the other methods regardless of the clustering strategy. d While copy-number variability in the same sample was correlated with inference errors, the absolute magnitude of copy numbers had no significant effect on Chimaera's performance. We report results for Chimera (hdbscan) and SCHISM with SDIndex (a representative that resembles results with other clustering methods). Standard errors are reported. Mean error is the mean of L1 distances between true and estimated mutation frequencies after normalizing for the number of biopsies tested.
for each pathway using the same test but after excluding WNTsignaling pathway genes. We note that MAPK-signaling and two other top-10 pathways were still enriched (Fig. 5e). Analyses products and input data are given in Table S3.

Phylogeny inference in Wilms' tumors
To test SCHISM's and Chimaera's predictive ability in tumors with a range of genomic instability, we selected three Wilms' tumors with low-genomic (CG118), intermediate-genomic (CG565), and highgenomic instability (CG163). Multiple regions of these tumors were profiled by WES, including six regions from each of CG118 and CG163 and eight regions from CG565 (Fig. 6a). SCHISM produced stable tumor subclone predictions that agreed with predictions by Chimaera for CG118, but it did not converge on a set of subclones even when profiling was available for seven regions from CG565; it also was not able to predict any subclones for CG163 (Fig. 4b). Chimaera tumor subclone predictions converged with fewer profiled regions, and Chimaera predicted phylogenies for all three profiled tumors.
Chimaera analysis of CG118 profiles (Fig. 6b) suggested that the tumor was composed of primarily two types of cells: tumor subclones with a CTNNB1 (S45F) mutation and those with a mutation in WT1 (R445W) mutation; a predicted daughter clone of the CTNNB1 mutation had no previously studied mutations. Both CTNNB1 and WT1 mutations have been previously implicated with Wilms' tumor genesis 34,35 , and both clone types were present in For each tumor, we exhaustively selected all region subsets size-2 and up, and compared the number of predicted tumor subclones across subsets to those obtained using all available region profiles using a Chimaera and b SCHISM. Chimaera analysis of any 4-tumor regions resulted in a similar number of subclones as analysis of five regions, however, profiling two regions produced fewer predicted subclones. SCHISM performed better on stable genomes (CG118) than on unstable genomes (CG163 and CRPC). c Chimaera analyses (top) of HCC6046 profiles suggested convergence of subclone predictions using three profiled regions, while SCHISM analyses (bottom) produced a higher prediction variability across profiling subsets. d Similarly, Chimaera analyses of four regions of CG565 produced a similar number of clones as profiles of eight regions. e Chimaera analyses of seven regions of our CRPC tumor produced a similar number of clones as analysis of ten regions; however, analyses based on five or fewer tumor areas produced significantly more predicted tumor subclones due to reduced aggregating power. SCHISM analyses based on six or more regions produced consistent counts of predicted subclones, but this number was considerably greater than the number of subclones predicted by Chimaera. every profiled region. However, the majority of cells-in all regions-were predicted to have CTNNB1 mutations. To compare the effects of these mutations, we compared RNA-expression profiles in regions with the lowest and highest frequencies of WT1mutated cells (Fig. 6c), 7% and 3%, respectively. Analysis of the CTNNB1-pathways and WT1-pathways 36,37 suggested that they are differentially regulated across these tumor regions (Fig. 6c). These data suggested that S45F activates CTNNB1 and that R445W inhibits WT1, as previously described 38,39 .
Chimaera analysis of CG163 profiles suggested that the acquisition of missense mutation in LIN28A (p.R132H) was a key event early in the formation of this tumor. Chimaera identified a second event that produced a less frequent tumor clone with many coding mutations with unknown significance. RNAexpression profiles of regions with the lowest and highest concentration of LIN28A mutations suggested that genes downstream of LIN28 are significantly altered in mutated cells. Analysis of CG565 did not reveal any mutations with known significance in Wilms' tumors. Data and analyses are provided in Table S4.

Phylogeny inference in prostate tumors
To further test Chimaera's predictive ability, we studied ultra-deep profiles of multiple regions of a select set of prostate cancers at multiple time points. Ultra-deep targeted profiling allows for improved mutation identification and read-fraction estimation. A single region from each of these cancers was previously profiled and helped identify multiple predicted driver mutations 40,41 . However, because mutations detection by WES are not always reliable and often includes both false-positive and false-negative predictions 42 , we selected three cancers and designed a mutation panel that targets mutated genes in these cancers, as well as other known driver genes in prostate cancers 43 for ultra-deep sequencing. The identity of targeted genes is given in "Methods" section. This approach helped test Chimaera's performance in more restrictive assays, which are quickly becoming standard in oncology clinics. Controls and five areas of each cancer were profiled at 2, 3, and 5 time points per cancer using both our targeted sequencing panel and OncoScan arrays to estimate CNAs on genome scales; areas profiled from tumor PC1 at each of three time points are shown in Fig. 7a and given in Supplementary  Table S5.
We recorded changes in treatment, identified mutations that may have been acquired following treatment, and predicted phylogenies for each tumor based on these mutations. We only considered mutations that were observed in multiple regions at the same time point, thus eliminating two-thirds of the candidates. Our results demonstrate the feasibility of phylogeny inference from targeted-panel profiling of multiple tumor regions and support phylogeny prediction efforts by suggesting that predicted subclones that are supported by multiple genetic variants may persist and accumulate additional genetic variants across time. We describe the results of Chimaera analysis below.
The tumor PC1 was diagnosed as Gleason 3 + 4 and treated with an LHRH analog. It was profiled at 3-time points during follow-ups after treatment was started. Timepoint 1 in Fig. 7a was taken over a year after diagnosis and treatment, and Time points 2 and 3 followed 1.17 and 1.4 years after Time Point 1. These suggested increased risk, with Gleason 5 + 5 and the discovery of a mutation in AR that has been associated with increased cancer cell proliferation and poor outcome. Chimaera analysis of profiles of PC1 suggested two predominant tumor subclones that are represented in Fig. 7b by predicted deleterious mutations in EP300 (p.I997V) and AR (p.T878A); Fig. 7b. Tumor subclones with the EP300 mutation were present in all regions profiled at Time Point 1. AR mutations were identified at high frequencies at Time Points 2 and 3 but only in regions that lacked the EP300 mutation and had high Gleason scores. Together, these findings suggest that either the most aggressive tumor sections were not profiled in Time Point 1, or that subclones with the AR mutation have a proliferative advantage and have overtaken subclones with the EP300 mutation. Protein profiling by Hyper Reaction Monitoring of regions with low-frequencies and high-frequencies for the subclones with the AR mutation confirmed that genes that have been shown to be downregulated by AR 44 are significantly upregulated in AR-mutated regions.
PC2 was diagnosed as Gleason 5 + 4 and biopsied before treatment. The patient was treated for 9 months with LHRH-Analogon, during which the cancer was biopsied two additional times and registered an increase in severity to Gleason 5 + 5 at Time Point 3. Following treatment, the tumor was biopsied a 4th time and showed no change in severity (Gleason 5 + 5). The patient was then treated with combined androgen blockage-LHRH and Casodex-for 2 months, followed by a Gleason 4 + 5 diagnosis at Time Point 5. Time Point 1 profiles revealed the loss of RB1 and a commonly observed stop-gain mutation in PTEN (p. R303X). Mutations in BRCA2 and EP300 were observed in Time Points 4 and 5. Chimaera analysis suggested that the RB1 locus deletion predates the PTEN mutation and that the BRCA2 and EP300 mutations were acquired in tumor subclones that following RB1 loss, lacked the PTEN mutation (Fig. 7d).
The tumor PC3 was diagnosed as Gleason 5 + 4 using posttreatment profiles, which included androgen blockers and orchiectomy 9 years prior to Time Point 1 profiling. A second biopsy, taken 6 months after the first suggested increased severity and Gleason 5 + 5. Tumor profiles identified known deleterious mutations in PTEN (p.Q245) and BRCA1 (p.E1038G), as well as a stop-gain mutation in BRIP1, and nonsynonymous mutations in BRCA2 and PALB2. All mutations were identified at both time points and nearly all regions. Chimaera's analysis suggested that RB1 loss was followed by the BRCA1 mutation. This was followed by the acquisition of a mutation in TP53 (p.V173G) or PALB2 and BRIP1. These sister clones subsequently acquired mutations in PTEN and BRCA2, respectively (Fig. 7e). Comparing the expression of genes downstream from PTEN and TP53 by Hyper Reaction Monitoring in regions with the lowest-consternation and highestconsternation of subclones with PTEN and TP53 mutations suggested that these mutations disrupt the associated pathway (Fig. 7f). Genes known to be downregulated by PTEN 45 were The inferred phylogeny for CG118 suggested that it is composed of two major subclones driven by previously observed mutations in CTNNB1 and WT1, with the CTNNB1-mutated clone accounting for a larger proportion of the tumor. CG565 was predicted to acquire mutations in ITGA3 and MACF1 that coincided with clonal expansion. The initiation of CG163 was predicted to include a mutation in LIN28A, which is sufficient to drive Wilms' tumor genesis. c, d RNA-expression profiles in c regions that were more abundant with each of the CG118 subclones-90% vs. 70%, and 30% vs. 7%, for the CTNNB1-mutated and WT1-mutated subclones, respectively-and d LIN28A-mutated subclones of CG163 (100% vs. 74%) suggested differential expression of the gene programs downstream from these predicted drivers.
upregulated in regions rich with PTEN-mutated subclones as did targets of TP53 46 . All input data-including protein expression profiles-and analyses products are given in Table S5.

DISCUSSION
Bulk tissue DNA profiling by whole-genome and targeted sequencing can identify key DNA alterations that provide insight into the biology of tumors and indicate effective treatment options. Increasingly, these assays are used to help elucidate the clonal composition of heterogeneous tumors and even predict ancestral relationships between tumor subclones. The key to such efforts is the accurate estimation of mutation frequencies from both high-coverage and low-coverage DNA profiles. Our study suggested that such estimation efforts must explicitly account for the copy numbers of both the reference and alternative alleles and that only accurate mutation-frequency estimates can yield accurate tumor phylogenies.
Accordingly, we reported on methodology to improve the accuracy of tumor phylogeny reconstruction by improving mutation-frequency estimates from DNA profiles of multiple same-tumor biopsies. Our analysis suggested that mutationfrequency estimates are particularly challenging in the face of high genetic instability, which is characteristic of high-risk cancers, and that the accuracy of methods that rely on DNA profiles of a single biopsy of such cancers is poor. We also showed that even when profiles of multiple biopsies are available, methods that do not explicitly account for the full range of copy number variability produce inconsistent results and have poor accuracy.
We briefly outlined current methods to reconstruct tumor clonal evolution using DNA-profiling. These include methods that rely on single-cell profiles and methods that resolve subclone mixtures from profiles of bulk tissues. We note that single-cell DNA sequencing is expected to help produce more accurate mutationfrequency estimates, but this technology is yet to mature and does not produce accurate single-cell estimates for FFPE tumor samples. The greatest challenge in cancer genomics-and this is Fig. 7 Inferred phylogenies of tumors from three prostate cancer patients. a Five regions of Prostate Cancer 1 (PC1) were profiled at each of three time points, identifying potentially deleterious mutations at each time point. b PC1's inferred phylogeny suggested that EP300 p.I997V mutation was present at Time Point 1, and that the tumor subclone with EP300 p.I997V (Subclone 1) is distinct from the subclone with AR p. T878A (Subclone 2), which was observed only in the later time point and whose clonal frequency increased with time. c Differential protein expression analysis suggested that regions that were predicted to have high clonal composition of Subclone 2 (82% vs. 0%) had higher ARtarget expression than regions without Subclone 2. d Inferred phylogeny for Prostate Cancer 2 (PC2) suggested that RB1 loss was an initiating event, and that that majority of tumor cells are the results of divergent evolution following the acquisition of PTEN and BRCA2 mutations, Subclones 2 and 3, respectively. In total, 5 regions of PC2 were profiled in each of the five Time Points; Subclone 1 was observed in all time points, and Subclones 3 and 4 in Time Points 4 and 5. e PC3's inferred phylogeny included RB1 loss as an initiating event, followed by the acquisition of a BRCA1 mutation. These tumor cells then either acquired TP53 and PTEN mutations, or PALB2, BRIP1, and BRCA2 mutations. All mutations and subclones were observed in each of the two time points profiled. The PTEN, BRCA1, and BRCA2 mutations were previously observed in cancers. f A comparison of PC3 regions with low and high composition for Subclones 3 and 4, 90% vs. 0% and 88% vs. 0%, respectively, suggested significant differential protein expression for targets of TP53 and PTEN.
not expected to change-is sample availability for patients with rich or specific clinical annotation, and FFPE is and will remain the primary preservation method for solid tumor resections. Moreover, frozen samples that could be used to generate cell suspensions that may be profiled using single-cell DNA sequencing cannot be easily used to evaluate the heterogeneity of tumors. We expect that whole-genome, whole-exome, and targeted-panel sequencing will produce the vast majority of tumor DNA profiles in the near future.
We expect that in the cases that single-cell DNA and RNA profiling is possible and pursued, whole-genome and wholeexome sequencing will be used to inform on the biology of clones. In these cases-where analyses of single-cell RNA, DNA, or protein expression will be integrated with analyses of bulk tumor profiles -methods that impute mutation frequencies and even tumor phylogeny from bulk tumor DNA profiles will play a key role in this integrated analysis. Most importantly, we argue that given the heterogeneity of solid tumors and the often observable and documented differences in the composition of regions on the same tumor, profiling multiple tumor regions will be required for both research and clinical efforts in the future.
Our analyses also suggested that Chimaera improves on mutation-frequency estimates by harnessing added information from multiple profiles and by directly accounting for the influence of CNAs on observations from DNA profiles. Chimaera's advantage was clearly observed in simulated data, where its performance was the most consistent and its accuracy the greatest. Interestingly, while Chimaera was able to estimate mutation frequencies with relatively high accuracy even at very high and very low copy numbers, its performance declined for the most unstable genomes where copy numbers for the same mutation varied widely across samples.
Using three tumor types-including Wilm's tumors, which is expected to have relatively low genetic instability and few mutations-and prostate cancers, where longitudinal biopsies from the same patients were available, we showed that when given a sufficient number of biopsies per tumor, Chimaera is able to address mutation-frequency estimate challenges arising from genetic instability. Our results suggested that Chimaera's mutation aggregation approach can help resolve issues arising from convergent evolution 20 and false-positive mutation calls 42 . We also showed how Chimaera could be used in conjunction with ultra-deep sequencing to improve on mutation calling accuracy. In conclusion, our results suggest that accurate mutation-frequency and cellularity inference are possible using profiles of multiple biopsies per tumor when coupled with analyses that account for the effects of CNAs on observed mutation read fractions.

METHODS
We formulated the phylogeny reconstruction problem in set-theoretic terms, which lead to a natural model for the effects of CNAs on mutatedread fractions in sequencing profiles. We describe our methodology for simulating WES tumor profiles, as well as our efforts to deconvolve mutation frequencies from simulated data using ABSOLUTE, AncesTree, EXPANDS, SCHISM, and Chimaera. Note that, to reduce analysis complexity, CNAs and mutation frequency simulations did not include alterations in sex chromosomes. Finally, to demonstrate that Chimaera can be effectively applied to clinical data, we described reconstructed phylogenies from WES profiles of ten same-tumor CRPC biopsies; a set of five same-tumor HCC biopsies from nine patients; two six same-tumor biopsies and an eight same-tumor biopsy of three Wilms' tumors; and three same-tumor prostate-cancer biopsies from three patients profiled at multiple time points.

Phylogeny reconstruction problem
Let M ¼ m : m 2 N; 1 m n f g denote the set of n mutations identified across a set of profiled biopsies S. The mutation burden in any given cell is given as a subset of M, γ M, or as an element of the power set over M, PðMÞ; i.e., γ ∈ PðMÞ is a specific mutation ensemble that characterizes a tumor subclone. We denote the cellularity of γ and its corresponding subclone in biopsy s ∈ S as ρ s γ , and the frequency of mutation m ∈ γ in biopsy s as φ s m ¼ P γ:γ2PðMÞ;m2γ f g ρ s γ . Consequently, P γ2PðMÞ ρ s γ ¼ 1 and the assignment A ¼ fρ s γ : γ 2 P M ð Þ; s 2 Sg produces a solution to our clonality reconstruction formulation.

Mutation frequencies
As defined above, for a mutation m in biopsy s ∈ S, φ s m denotes the frequency of cells in s with mutation m. The total copy number C s of the allele targeted by the mutation can be estimated from WES data. C s is composed by: the copy numbers of the allele in cells that lack mutation m, δ s , the copy number of the wildtype allele in m-mutated cells, δ s w and the copy number of the mutated allele in m-mutated cells, δ s m (Fig. 2). Notice that if no copy number event has occurred at the locus:δ s = 2, δ s w ¼ 1 and δ s m ¼ 1. Adopting the infinite-sites assumption, we denote the mutatedread fraction-the fraction of reads reflecting the mutated versus wildtype allele in a WES profile-in sample s as f s m . Then, we can formulate the following equations (Eqs. 1, 2). (1) Equation (1) provides a weighted sum of the copy number contribution from each allele type, and Eq. (2) gives the ratio of the number of reads from the mutated allele and the total number of reads.

Chimaera
Chimaera proceeds in three steps. First, mutation frequencies are approximated from sequencing and CNA data in each biopsy; then, mutations with similar frequency vectors (where each vector component gives the mutation frequency in each biopsy) are clustered together to form subclones; and finally, mutation frequencies and CNAs for these alleles are refined using an optimization process. The optimization assumes that clustered mutations that are associated with the same subclone have the same frequencies in each tumor biopsy and that δ s mthe average copy number of m (the mutated allele)-is unchanged across biopsies from the same tumor. This assumption can be relaxed in postprocessing.
We first approximate the true frequency of the mutation φ s m by accounting for tumor purity, i.e., the fraction of tumor cells in biopsies that include non-tumor cells, and assuming that the allele's average copy number in tumor cells-whether mutated or not-is fixed. Let p s be the purity of biopsy s, then Eq. (2) can be rewritten as follows: The experimentally observed copy number, C s obs , depends on the purity of the sample and the copy number of the sample tumor cells, C s , as follows: where C s obs can be estimated using additional biochemical assays, genetic sequencing, or through computational analysis of WES data 47 , and the normal cells are assumed to have been corrected for germline copy number variants associated biases.
The simplifying assumption that the allele's average copy numberaveraged across all profiled cells-of the mutated allele in tumor cells is constant across biopsies, i.e,: δ s m ¼ C s 2 . Under this approximation, we can use Eqs. (3) and (4) to eliminate C s and obtain a first approximation of the mutation frequency f φ s m : This constraint will be later removed in the optimization process that follows but is necessary at this stage to obtain a first approximation that mutation frequencies that take into account the copy number influence from WES measurements. The minimization is necessary because of the interplay between the copy numbers at these alleles that may produce a first approximation above 1.
The approximate mutation frequency vectors (Eq. 5) are next clustered to identify candidate groups of mutations that form subclones. We considered clustering algorithms with the robust treatment of outliers in order to ensure good clustering stability and quality. Specifically, we used a method modeled after hdbscan, a density-based hierarchical clustering method that aims at maximizing the stability of the obtained clustered against noise and requires minimal parameter selection. The number of clusters is determined automatically based on the minimal number of mutations that have to be considered to constitute a cluster. We also use tclust 29 , a nonhierarchical robust clustering that trims outliers based on a probabilistic model. The number of clusters is selected by optimizing intracluster entropy or the sum of squared errors (SSE) and using a variety of optimization methods including the Elbow method, Gaussian mixture decomposition (GMD), and SD index [48][49][50] . The clustering based on hdbscan showed better performance on the generated synthetic data compared to others, especially when considering the number of mutations that could be assigned frequency estimates. Furthermore, it has the advantage of avoiding imposing a prior distribution on the mutations frequencies. Once the clusters are found, Chimaera assumes that each cluster represents a subclone and uses the mutation assignment to infer subclone frequencies and copy number estimates for each mutated allele in the final optimization step.
Focusing on subclone ∈ PðMÞ, Eq. (3) describes a relationship between the frequencies and copy numbers of mutations in γ: where B ms is the entry of a matrix B 2 R S j j; γ j j corresponding to mutation m and biopsy s. B is fully determined from analysis of sequencing assays, including purity, observed copy numbers, and observed mutated-read fractions of each mutation.
Unfortunately, the right-hand side of Eq. (6)-a multiplication of frequencies and copy numbers-can not be analytically decoupled. However, from our problem formulation, mutations from the same subclone are expected to have the same mutation frequencies, i.e., φ s mi ¼ φ s mj φ s 8 m i ; m j 2 γ. Further, we assume that the copy number of each mutation m is constant across biopsies, i.e., δ si m ¼ δ sj m δ m 2 0; CN ½ 8s i ; s j 2 S, where CN is a fixed upper bound for the copy number; CN = 15 in our simulations. While we expect that this assumption will introduce some errors to the approximation of δ s m , it will have limited effects on the selection of optimal mutation frequencies because the variability of copy number averages for the mutated allele across biopsy is expected to be low for most mutated loci. We also note that we have not assumed stable genomes in our simulated data, i.e., the generated data displays variable copy numbers for the same mutated allele across biopsies in order to have an accurate estimate of the committed error.
After these assumptions, the optimization problem for each subclone 2 PðMÞ, based on Eq. (6), can be formulated as: minkφ s * δ m * À Bk 2 ; 0 δ m CN; 0 φ s 1; 8m 2 γ; 8s 2 S; where φ s * is the mutation frequency vector across biopsies for all mutations in γ; δ m * is the copy-number vector for each mutation in γ; B is as defined in Eq. (6); and φ s * δ m * denotes the outer product of vectors φ s * 2 R S j j and δ m * 2 R γ j j . We used the sequential least-squares programming (SLSQP) optimization 51 to find an optimal solution of Eq. (7). To avoid being trapped in local optima, multiple runs with different random initializations for the mutations frequencies and unitary variant allele copy numbers are performed.
Profiling and analysis of Wilms' tumors DNA libraries were constructed and sequenced as previously described 52  were performed on the flow cell with Illumina's cBot cluster generation system. On average, about 80-100 million successful reads, consisting of 2 × 100 bp, were generated on each lane of a flow cell. CNAs and mutation frequencies in sex chromosomes were estimated using Genome Analysis Toolkit pipelines.
RNA-seq profiles of Wilms' tumors were aligned using STAR v2.3.0e 32 to an index of GRCh37 that included GENCODE v16 gene annotation. Alignment files were processed using Picard tools v1.54, and the final BAM files indexed using SAMtools index v0.1.11 33 . RNA-seq run quality was assessed using the RNA-SeQC package 34 using the same GENCODE 16.gtf file. Transcript quantification was performed using Cufflinks 18 v2.02 running in quantification mode against the GENCODE v16.gtf file. FPKM values were used for relative abundance estimation.
Profiling and analysis of WES of prostate cancer biopsies To test our ability to infer mutation frequencies and ancestral relations between subclones based on clinical profiles of four prostate cancers. The Specimen were collected at the Department of Pathology and Molecular Pathology, University Hospital Zurich, Switzerland as previously described 53 with the approval of Cantonal scientific ethics committee Zurich, approval number KEK-ZH-No. 2014-0007, and with informed consent by the patient. Tumor regions were selected for heterogeneous histological presentation by an experienced uropathologist (PJW). DNA from peripheral blood and FFPE punches was isolated with the Maxwell 16 LEV Blood DNA kit (Promega, AS1290) and Maxwell 16 FFPE Tissue LEV DNA Purification Kit (Promega AS1130), respectively, according to manufacturer's recommendations; 300 μl of blood collected in a BD Vacutainer K2 (EDTA 18.0 mg) tube was added to 30 μl of Proteinase K solution (final concentration 2 mg/ml) and subsequently mixed with 300 μl lysis buffer, vortexed, and incubated for 20 min at 56°C. FFPE cylinders were deparaffinised with xylene, washed twice with ethanol, dried 10 min at 37°C and re-suspended in 200 μl incubation buffer containing 2 mg/ml Proteinase K. Samples were incubated overnight at 70°C and mixed with 400 μl lysis buffer. Lysates from both, blood and FFPE tissues, were transferred to well 1 of the supplied cartridge of the corresponding kit and DNA was automatically purified and eluted in 30 μl Tris-buffer, pH 8.0 by the Maxwell instrument.
WES samples were profiled using Agilent SureSelect Whole Exome Enrichment, v6 (58 Mbp) and 2 × 75 bp paired-end reads were used for optimal performance on a HiSeq 4000 (Illumina). Mutation calling was followed by protocols established by TCGA and ExAC 21,54 . Reads were aligned to hg19 using BWA 55 , and variants were called with GenomeA-nalysisTK, MuTect 56 , Picard MarkDuplicates, and additional post-processing utilities from GATK including BaseRecalibrator. FastQ files were deposited in EBI's ENA project PRJEB19193. Predicted mutations are given in Table S5; mutations were annotated with estimated read fractions and estimated CNAs by VarScan using default parameters and after setting the maximum amplification to 15× 47 . Profiling and analysis of targeted-panel sequencing of prostate cancer biopsies We selected a total of 36 genes whose mutations are enriched in prostate cancers for ultra-deep sequencing 40,41 . ImmQuant was used to capture their coding regions and the completeness of the capture was verified against the human reference genome GRCh38. Target genes are given below. The samples were taken from the Prostate Cancer Outcomes Cohort Study from UZH (ProCOC) and Metastatic Prostate Cancer Biobank from UZH (metaProC). Patients and target genes were selected for ultra-deep sequencing based on WES profiling and using our predictive panel. Their profiles were implemented in three batches and sequenced using Illumina HiSeq using 150 bp pair-end reads by Sophia Genetics. Analysis of DNA profiles mirrored the steps described for WES analysis. BAM files are freely available on ENA project PRJEB19193.
Copy numbers were estimated using Affymetrix OncoScan arrays and the OncoScan FFPE Assay Kit for detection of genome-wide copy number changes and loss of heterozygosity in FFPE samples. Oncoscan uses molecular inversion probe technology to query over 220,000 SNPs at carefully selected genomic locations, evenly distributed across the genome and with increased density within approximately 900 cancer or cancerrelated genes. All samples underwent array hybridization and analysis and passed gel QC, as established during validation of the assay for clinical genetic analysis. Data were analyzed using Nexus Express.

Protein-expression profiling and analysis of prostate cancer biopsies
For the comprehensive quantitation of proteins in high throughput manner, we employed a data-independent acquisition (DIA) workflow compiled of the commercially available and standardized iST sample preparation kit (PreOmics GmbH), a Q Exactive HF (Thermo Fisher Scientific Inc.) and Spectronaut Pulsar data analysis software 57 . We have applied a combination of published and specifically generated spectral libraries. We have prepared and analyzed a selected five tissue samples from human prostate adenocarcinomas from the retrospective FFPE PC cohort from the University Hospital Zurich as described above 41 . Each specimen was analysed once.
First, the FFPE fixed material was deparaffinised and peptides were generated by using the iST 96× kit from PreOmics following the optimized protocol for FFPE punches. This was followed with DIA-MS analysis of 1 µg of total peptide mixture for each patient sample including iRT standard peptides for non-linear calibration, batch correction and large-scale data set merge. Then, we generated and applied spectral libraries through the merge of an in house generated library (Spectronaut Pulsar) and a commercial organ-specific library 311 FFPE prostate library (311_FFPE_prostate). The resulting library contained 8200 protein groups with a size of 65.5 mb. For quality control, the peptide yield was determined after protein extraction, digestion, and peptide clean up. MS signal intensity was monitored (TIC) during entire MS run as well as the performance and stability of the liquid chromatography according to a reference peptide set.

Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.

DATA AVAILABILITY
BAM files are freely available without restrictions on ENA project PRJEB19193. All supplementary tables, including synthetic data, processed data, and analyses used to produce Fig. 3 are also given in the GitHub repository https://github.com/ drugilsberg/chimaera.