Reconstructing evolutionary trajectories of mutation signature activities in cancer using TrackSig

The type and genomic context of cancer mutations depend on their causes. These causes have been characterized using signatures that represent mutation types that co-occur in the same tumours. However, it remains unclear how mutation processes change during cancer evolution due to the lack of reliable methods to reconstruct evolutionary trajectories of mutational signature activity. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, which aggregated whole-genome sequencing data from 2658 cancers across 38 tumour types, we present TrackSig, a new method that reconstructs these trajectories using optimal, joint segmentation and deconvolution of mutation type and allele frequencies from a single tumour sample. In simulations, we find TrackSig has a 3–5% activity reconstruction error, and 12% false detection rate. It outperforms an aggressive baseline in situations with branching evolution, CNA gain, and neutral mutations. Applied to data from 2658 tumours and 38 cancer types, TrackSig permits pan-cancer insight into evolutionary changes in mutational processes.

We apply topic modeling ? to infer signature activities. Within the time point, we separate mutation into K mutation types. Mutation types relate to vocabulary in topic modeling. The types used in TrackSig are described in the Results section. Then we use mixture of discrete distributions to infer signature activities. We describe this model below.
We represent each mutation as a K-dimensional binary vector -"one-hot-encoding" of a mutation type. "One-hot-encoding" of a mutation of type k is a binary vector where k-th component is equal to 1, and other components are zeros. We will denote x (n) to be the "one-hot-encoding of mutation n. A sample containing N mutations is represented as a N × K binary matrix X, where each column corresponds one mutation.
1, mutation n belongs to type k 0, otherwise A mutation process is represented as a distribution over mutation types, known as a mutation signature. We will denote signature multinomials as K-dimensional probability vectors s i , where i = {1..M} is an index over signatures. Signatures are fixed and are not updated during the training.
We aim to estimate signature activities m -the proportion of mutations generated by each signature.
We will use the following notation: We represent mutation matrix X as a mixture of signature multinomials s 1 , ..s K with mixture coefficients m: We denote z n to be the signature assignment of mutation n. The probabilities of mutation n to be assigned to i-th signature are equal to the mixing coefficients: The probability of a mutation n to be generated by signature i is given by: To estimate the activities, we fit mixing coefficients m in each bin using Expectation-Maximization (EM) algorithm ? . The EM algorithm iterates between updating a posterior distribution over z n and updating an estimate of the mixing coefficients m We start with initializing EM algorithm with uniform mixing coefficients: Then, we repeat the following E-step and M-step until the algorithm converges.
In E-step, at the t-th iteration, the posterior probabilities of mutation assignments to signatures are estimated as such: In M-step we update the estimates of the mixing coefficients: The algorithm has converged when the value of π is updated by less than 0.001 between iterations. The resulting mixture coefficients as the activities of the mutational signatures. We show the activities as percentage for the convenience of interpretation.

Supplementary Note 2: Pruned Exact Linear Time (PELT) Algorithm
We adapt Pruned Linear Exact Time (PELT) ? algorithm to detect change points in activity trajectories given cost function (likelihood) and BIC penalty. PELT is based on dynamic programming and uses heuristics to prune the set potential changepoints, thus reducing the computational time.
In this section, we will use the following notation: T -number of time points P -number of changepoints M -number of signatures

Locating change points
As described in the Methods section, we separate mutations into bins 100 mutations, each of which represents one time point. Our input is the set of mutation counts across 96 types for each time point: y 1:T = (y 1 , . . . , y T ). We aim to find P changepoints, or in other words, P + 1 segments. We denote τ 1:P = (τ 1 , . . . , τ P ) to be the boundaries for our segments, meaning each segment will contain the data points y τ i−1 ..y τ i . Given a set of changepoints we can compute the likelihood of the data the following way. We fit mutational signatures within each segment (treating all mutations within each segment as one bin) and compute the likelihoodL(y τ i−1 ..y τ i ) as described in Supplementary Note 1. The total likelihood is the sum of likelihoods in each segment:L = P+1 ∑ i=1 L(y (τ i−1 +1):τ i )

3/46
We aim to minimize the Bayesian Information Criterion (BIC): where k is the number of parameters in our model and T is the number of time points. In our case k = (P + 1) · (M − 1) as we fit (M − 1) signature activities in (P + 1) segments (recall that signature activities sum to 1).
We adapt PELT objective to minimize the BIC criterion. PELT aims to minimize sum of cost functions at each time point, while using a penalty β for each placed changepoint minimize P+1 ∑ i=1 C(y (τ i−1 +1):τ i ) + β (P + 1) Intuitively, we are trying to select changepoints which result in the lowest cost (or highest likelihood) while reducing the penalty associated with adding changepoints. We set the parameters as follows to make the PELT equivalent to BIC: TrackSig-PELT algorithm finds the changepoints as follows. The algorithm starts with finding a partial solution in a subset of the timeline and then increases the search space until changepoints are located over the whole timeline. The algorithm keeps track of the time points R τ * that satisfy the pruning condition and which will be considered as potential changepoints at further iterations. At each iteration τ * , the algorithm considers adding a new changepoint out of the set of available time points R τ * . To score a potential new changepoint, the algorithm refits the activities in bins formed by a potential changepoint. It finds a time point τ with the smallest likelihood and adds it to the list of changepoints cp. Then the list of available time points R τ * is updated: the potential changepoints are removed from further consideration if the increase in likelihood associated with this changepoint does not exceed the complexity penalty β .

Pruning
PELT provides an improvement in runtime by pruning certain changepoints from consideration. We prune time point t if for all t < s < T : The cost of placing the last changepoint prior to T at t will always be higher than cost of placing the last changepoint prior to T at s. Given this result, we can eliminate t as a potential changepoint for all iterations of the dynamic programming algorithm as it will never be optimal going forwards.

Supplementary Note 3: Clonal evolution simulations
Choice of signatures We generate the simulations with four active signatures: S1, S5 and two randomlysampled signatures, which we will call A1, A2. Two other signatures A1, A2 are sampled from uniformly from the set of PCAWG (excluding signatures S1, S5, S7 and "artifact signatures" S40-S60). We decided to exclude signature S7 (sum of signatures S7a, S7b, S7c, S7d) as it had a distribution similar to uniform and was easily confused with other signatures both by TrackSig and DeconstructSigs. We include signatures S1 and S5 in all simulations as they are present in all real samples in PCAWG.
We sample activities separately for each cluster. We sample the activity of S1 from [0.03, 0.1] interval, S5 from [0.05, 0.15] interval, A1 from [0.4, 0.7] interval. The remaining activity is assigned to signature A2 (all signature activities have to sum to 1).
Sampling mutation types To sample mutation types from a signature, we treat it as a multinomial distribution and sample from it. The number of mutations sampled from each signature is equal to the activity of this signature multiplied by the total number of mutations.

Sampling number of ref and alt alleles
Here we describe sampling number of ref and alt alleles for each mutation of the cluster, given the cluster CCF, number of mutation in the cluster and desired mean mutation depth. We tested mean mutation depths of 10, 30 and 100.
For each mutation, we sample read depth d from Poisson distribution with specified mean depth. Then we compute the probability of alt allele as p = ccf * mutantCN totalCN , where ccf is CCF of the current cluster. Finally, we sample number of alt alleles a from a Binomial(d, p) and set the number of ref alleles to be the difference between depth and alt alleles.
In simulations with one and two clusters we use normal copy number of 2, mutant copy number of 1 and purity 1. Each simulation has 5000 mutations in total. We generate 100 simulations of each of five simulation types (one-cluster, two-cluster, branching, cna gain and infinite site assumption) and for each read depth that we tested.

Basic simulations
First, we create simple one-and two-cluster simulations.
One-cluster simulations We create one cluster with the average cluster CCF=1. Number of ref and alt alleles for each mutation is sampled as described in the previous section. We sample activity of the first active signature A1 from the interval Uniform([0.4, 0.7]), activity of time-related signature S1 from Uniform([0.03, 0.1]), and time-related signature S5 from Uniform([0.05, 0.15]). The remaining activity is assigned to the signature A2. Finally, we sample mutation types from each of active signatures. Number of mutation types sampled from each signature is proportional to their activities.

Two-cluster simulations
We create the first cluster with CCF=1 as described above. For the second cluster we sample ccf from Uniform([0.2, 0.6]) distribution. To sample signature activities, we follow the produre similar to one-cluster simulations. We sample activity of the first active signature A1 from Uniform([0.4, 0.7]) for the clonal cluster, and Uniform([0.2, 0.4]) for the second cluster to ensure the signature activity change between the two clusters. Full procedure is shown in Supplementary Note 6.

Branching
To test violation of TrackSig assumptions, we create simulations with branching, CNA gain or violation of infinite site assumption.
To simulate branching, we create three clusters. The clonal cluster is always assigned CCF=1. The CCF for the last cluster (with the smallest CCF) is sampled uniformly from [0.2, 0.35]. The middle cluster CCF is sampled such that it has at least 0.15 gap on CCF scale with other clusters. Additionally, we ensure that sum of CCFs of the second and third clusters does not exceed 1 (otherwise the clusters cannot be branched).
In branching simulations, we expect to see the signature activity for A1 signature decreases at the transition to the second cluster and increases again at the transition to the third subclone. If such step-like behavior of is observed in real data, we consider this a sign of branching. Note that if we reversed the order of the branched clusters and assigned the same signature activities to the first and second clusters, it woudn't be possible to distinguish between these two clusters since TrackSig can only find changepoints based on signature change.
To show the effect of branching on signature trajectories, we assign similar activities to the first (clonal) and third cluster (with the smallest CCF), but introduce a signature change in the second (middle) cluster. To do this, we sample signature activity for A1 from Uniform([0.4,0.7]), calculate the exposures for other signatures and assign the same activities to the first and last cluster. For the middle cluster, we sample activity for A1 signature from Uniform([0.2, 0.4]). As before, we sample activity of time-related signature S1 from Uniform([0.03, 0.1]), and time-related signature S5 from Uniform([0.05, 0.15]) and assign the remaining activity to A2.

CNA gain
CNA gain simulations are based on the branching simulations described above and has three clusters: clonal and two subclones.
We introduce a CNA gain for 10% of mutations in the clonal cluster: 5% of mutations have CNA gain on the mutant allele and 5% have CNA gain on reference allele. Thus, 10% of mutations get total copy number 3 and mutant copy number of 2 and 1 respectively. We assume that these copy number changes are inherited by both subclones. To simulate the CNA change, we adjust the mutantCN and totalCN parameter in Supplementary Note 7 for 10% of mutations in each cluster. We provide total copy number a input to both TrackSig and SciClone.

Violation of infinite site assumption
To simulate the violation of infinite site assumption (ISA), we create four clusters. The first three clusters are created the same way as in the branching simulation. The forth cluster simulates mutations that occurred in both clusters independently, thus violating ISA. The CCF of the forth cluster is the sum of CCFs of the two subclonal clusters. We assign 3% of all mutation to the forth cluster. As expected, the presence of mutations that violate ISA don't affect signature activity trajectories.

Neutral Evolution Mutations
To make our simulations more realistic, we add mutations which emerged due to neutral evolution. We follow Williams et al. ? for generating mutations from neutral evolution. First, we establish the number of neutral mutations to be generated. Then we sample those mutations according to the power-law distribution 1 f 2 , where f is variant allele frequency. Both steps are described in more detail below.
The number of neutral mutations is computed as follows: where f c is the variant allele frequency (VAF) of the cluster, f min is a minimal VAF in consideration and s e is effective mutation rate. For clonal cluster, f c = 0.5. We only consider mutations with 3 or more mutant reads. Therefore, we set f min = 3 d , where d is the mean depth of the simulation. We use s e = 16 ? . Next, we sample M( f c ) mutations according to the power-law distribution on interval [ f min ; f c ]. Cumulative distribution function (CDF) of power-law distribution on the interval [ f min ; f c ] is the following: To sample from this distribution, we take samples from uniform distribution and then use inverse cumulative distribution function (I-CDF) to transform them into samples from power-law distribution. Inverse CDF function takes the following form: where f is our target allele frequency (i.e. sample from the power-law) and u is a sample from uniform distribution.
Note that the approach we used to sample neutrally-evolving mutations may not reflect the true, complex clonal dynamics that would be better represented with a branching process. Although our one cluster case precisely matches a standard neutral model ? , using the same model for the two cluster simulations ignores the effect that the introduction of subclone has on the number and VAF distribution of neutrally-evolving mutations.
It does, however, establish a lower bound on performance. The introduction of a subclone is likely to reduce the number of neutral mutations, though their VAF distributions would not drastically different, and the "neutral mode" near the detection limit would be composed of mutations from both clones rather than from just one clone. These differences would make the reconstruction problem easier for TrackSig. As such, although the neutral model is not correct in the two cluster case, the one we used provides lower bound on TrackSig's performance.
Our results are shown in additional bar in Figure 3b and Supplementary Figure 2. At depth 10 and 30 TrackSig's ability to detect subclones is not impacted by neutral mutations. At depth 100, both TrackSig and SciClone detect an extra cluster, which is consistent findings of Williams et al. ? : neutral evolution can be detected at a minimal depth of 100. Figure 4b shows the example of generated simulation at depth 10.

Supplementary Note 4: SciClone+DeconstructSigs baseline
To showcase the potential of our method, we compared TrackSig to SciClone+DeconstructSigs pipeline which is commonly used to infer signature activities.

SciClone + DeconstructSigs
First, we clustered SNVs using SciClone (v1.1) ? . Sciclone uses variational Bayesian mixture model to cluster SNVs based on their CCF. We provided CNA calls as a part of input for SciClone, same as we do in TrackSig.
Since we needed to test clustering at low depth, we used minimum read depth of 1. We report the results with two clustering methods in SciClone: Beta mixture model (BMM, default) and Beta-binomial mixture model (Binomial BMM).
Finally, we took the mutation clustering performed by SciClone and computed activities of mutational signatures withing each cluster using DeconstructSigs (v1.8.0) ? . We used the same set of PCAWG signatures as we used in TrackSig. We fit the same set of active signatures with DeconstructSigs as we do in TrackSig.

Supplementary Note 5: Analysis of multi-region cases
To compare mutational signatures across multiple samples, we run TrackSig separately on each sample. Samples from the same tumour can have different active signatures. Therefore, for each tumour, we split the samples into groups that have the same set of active signatures and compare samples only within the group. To compare signatures of the clonal cluster, we compute KL divergence and mean activity difference between the first time points of the samples with the same active signatures. Within each group of samples with the same set of active signatures, we compute the pair-wise mean activity difference and KL difference between all pairs of samples within the group. We report the mean metrics of all pairs within the signature group.

14:
Sample signature activities: 15: e S1 ∼ Uniform(0.03, 0.1) 16: for each signature s in active signatures (S1, S5, A1, A2) do  q q q q q q q q q q q q q q q q q qq qq q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q     Figure 9. Evolutionary trajectories for multiple samples from same tumour. Each subplot shows signature trajectories for different samples from the same tumour. Signature trajectories shown are the mean of 30 bootstrap trajectories and therefore are not piece-wise constant. We report mean activity difference and KL divergence between the activities in the clonal cluster only. We compare clonal activities across the groups of samples with the same set of active signatures. Tumour DO51954: Group 1 with active signatures "SBS1 SBS5 SBS40": mean activity diff 0.0573, KL divergence 0.05. Group 2 with signatures "SBS1 SBS5 SBS18 SBS40": mean activity diff 0.036, KL divergence 0.018 Tumour DO51958: Group 1 with active signatures "SBS1 SBS5 SBS18 SBS40": mean activity diff 0.014, KL divergence 0.002. Group 2 with signatures "SBS1 SBS5 SBS18 SBS40 SBS2+13": mean activity diff 0.008, KL divergence 0.001. Tumour DO51965: Group with active signatures "SBS1 SBS3 SBS18 SBS40": mean activity diff 0.042, KL divergence 0.028. Tumour DO51953: Group with active signatures "SBS1 SBS5 SBS40": mean activity diff 0.031, KL divergence 0.005. Tumour DO51959: Group with active signatures "SBS1 SBS5 SBS18 SBS41": mean activity diff 0.011 , KL divergence 0.0014. Tumour DO51962: Group with signatures "SBS1 SBS3 SBS5 SBS8 SBS40": mean activity diff 0.037, KL divergence 0.031.