Main

The development of cancer is an evolutionary process1,2. Acquisition of genomic alterations including mutations and somatic copy-number alterations (SCNAs) drives the emergence of genetically heterogeneous subpopulations of cancer cells or subclones3, resulting in intra-tumour heterogeneity (ITH). A subset of genomic alterations (termed ‘drivers’) endows subclones with increased fitness. Subclones compete for resources, including physical space, and undergo expansion or extinction according to their fitness under selective pressures imposed by the tumour microenvironment (TME) or therapeutic intervention. With advances in next-generation sequencing, clonal architecture and evolutionary features have been elucidated in a variety of tumour types4,5,6,7. However, the ability to predict clinically relevant evolutionary trajectories remains limited.

One potential to enhance this ability lies in the detection and characterization of ongoing clonal evolution. ITH provides a substrate for the selection of competent clones8. Detection of ITH is sampling dependent, and accurate measurement of diversity is enhanced by sampling of multiple small tumour areas9. While macrodiversity (that is, the number of subclones in the whole tumour) reflects the established clonal diversity within a tumour, clonal diversity at a narrow spatial scale, or microdiversity (that is, the number of subclones within a single tumour sample), could represent under-detected ongoing clonal evolution (Fig. 1a). Both macrodiversity6,7 and microdiversity have clinical implications. Microdiversity predicts poor survival in paediatric kidney cancer10 and contributes to invasion in breast tumours11. Clonal diversification sometimes manifests as parallel evolution, that is, the selection of distinct mutations in the same gene in distinct subclones4,5,7,12,13,14,15. More recently, parallel evolution of SCNAs was demonstrated through mirrored subclonal allelic imbalance16. Thus, understanding microdiversity could offer important insights into clinically relevant evolutionary potential.

Fig. 1: Construction of in silico tumours.
figure 1

a, Schematic illustrating future evolutionary trajectories delineated by present under-detected subclones. b, Schematic of probabilistic growth, death and driver acquisition in a coarse-grained cellular automaton model. c, Schematic of two growth modes: ‘surface growth’ with proliferation predominating at the tumour surface and ‘volume growth’ with proliferation throughout the tumour volume. d, Schematic figure of three levels of measurements: from 3D tumour to 2D tumour slice and 2D tumour regions within the slice. e, Representative in silico tumours under volume growth (i) and surface growth (ii), respectively, from a 3D view, showing tumour voxels harbouring gain of chromosome arm events (green) and loss of chromosome arm events (red), with different shades reflecting different driver events.

Despite increasing understanding of ITH and macrodiversity, understanding of the temporal features of clonal evolution remains limited owing to ethical and logistical challenges in obtaining serial single- or multi-regional biopsies from patients17. While longitudinal profiling of circulating tumour DNA permits clonal tracking over time18,19,20,21, this approach provides no resolution of the spatial outgrowth or organization of clones. To improve the understanding of spatial and temporal features of clonal diversification, we developed a coarse-grained cellular automata model of tumour growth with stochastic acquisition of driver events (Fig. 1b). Various forms of non-spatial mathematical models have been formulated to describe different types of tumour growth dynamics, including exponential and polynomial growth22,23. More recently, modelling work incorporating spatial elements of tumour growth found an impact of growth modes on the classification of neutral evolution and selection24,25,26,27. Previous analysis of clear cell renal cell carcinoma (ccRCC) supports the predomination of active proliferation at the tumour surface in a subset of tumours28, and we recently described varying rates of proliferation across the tumour29. In this study, we evaluated the effects of different tumour growth modes on spatial and temporal features of clonal diversification in our model, focusing on two simple growth modes: (i) uniform growth throughout the tumour volume (referred to as the ‘volume growth model’) and (ii) active proliferation restricted to the tumour surface (referred to as the ‘surface growth model’) (Methods and Fig. 1c). We additionally investigated a broader set of model conditions including the implementation of necrosis in the tumour interior, building upon our recent study29. We related our model to the TRAcking Cancer Evolution through therapy (Rx) (TRACERx) Renal study, which previously evaluated the genomic profiles and spatial coordinates of 756 patient tumour (PT) regions from 66 tumours29. Tumours with high clonal diversity do not necessarily harbour metastasizing clones30, suggesting that the development of metastatic competence and continuing subclonal diversification may be uncoupled in the tumour as independent evolutionary processes. Through combined modelling and clinical analysis, we show how tumour growth modes determine the extent and trajectories of clonal diversity. Crucially, we explore temporal aspects of tumour evolution that would otherwise be inaccessible from single-timepoint biopsies.

Results

Generation of an agent-based model recapitulating ccRCC evolution

To understand the spatial and temporal features of clonal diversification, we developed a coarse-grained cellular automaton model to simulate the evolutionary dynamics of ccRCCs (see Methods for a detailed description). The model includes 12 genes and 14 SCNAs (Extended Data Fig. 1) identified as canonical driver events in ccRCCs in the TRACERx Renal study7. Each model unit, referred to as a ‘tumour voxel’, represents a tumour volume of 1 mm3. Tumour voxels stochastically undergo growth, death and acquisition of driver events upon growth. As proliferation proceeds, some tumour voxel acquires a driver event conferring selective advantages, manifested in the current study as an increase in the growth probability (pgrowth). Two ways of implementing selective advantages are considered, being referred to as ‘saturated’ and ‘additive’ driver advantage models. In the saturated driver advantage model, the pgrowth of a tumour voxel can be at one of three levels \(\{ p_{\mathrm{growth}}^{\left( {\mathrm{initial}} \right)},p_{\mathrm{growth}}^{\left( {\mathrm{moderate}} \right)},p_{\mathrm{growth}}^{({\mathrm{maximal}})}\}\) . Each driver endows a tumour voxel with one of these levels, and the relative differences in selective advantage of drivers, denoted as s, are assumed to reflect their association with the Ki67 score in tumour regions (Extended Data Fig. 1) and their frequencies in the clinical cohort7. For simplicity, individual driver gene mutations are assigned with \(p_{\mathrm{growth}}^{({\mathrm{initial}})}\), whereas four SCNAs with strong association with Ki67 score (7q gain, 20q gain, 4q loss and 8p loss) are assumed to be the strongest drivers assigned with \(p_{\mathrm{growth}}^{({\mathrm{maximal}})}\), and therefore their acquisition would lead to the biggest increase in growth probability. Importantly, the saturated model is implemented with only two levels of selective advantage, and the growth probability of a tumour voxel becomes saturated at 1 if acquiring the strongest driver. The additive driver advantage model has a more graduated implementation of selective advantage in which the growth probability of a tumour voxel is defined by all the drivers it harbours, \(p_{\mathrm{growth}}=p_{\mathrm{growth}}^{({\mathrm{initial})}} + \mathop {\sum}\nolimits_k {p_{{\mathrm{growth}}\_k}}\), where pgrowth_k reflects the amount of growth probability added by driver k (Extended Data Fig. 2). The amount pgrowth_k varies between drivers and is assigned according to different strengths of their association with Ki67 score (Extended Data Figs. 1and 2).

Additional assumptions are made to keep the model minimal. Individual driver gene mutations are assumed to be acquired with a greater probability (pdriver) than SCNAs. A second mutation in the same gene is assumed to never occur in the same tumour voxel. As the majority of ccRCCs have clonal VHL inactivation events, in general and in the TRACERx Renal cohort7, the founder tumour voxel is assumed to harbour VHL inactivation together with 3p loss as a clonal event. Based on data from the TRACERx Renal study7, and functional evidence31,32, mutations in PBRM1 or BAP1 are assumed to enhance the probability of SCNA acquisition. Mutations and acquisition of SCNAs are assumed to be proliferation dependent, which implies that DNA replication and chromosome mis-segregation are the main source of genomic alterations. Lastly, the selective advantage endowed by a driver is assumed to be fixed, so the variation in driver advantage dependent on changing environments is not considered in the current study.

Each simulation starts from a single tumour voxel carrying VHL and 3p loss and ends when the size exceeds 1 million tumour voxels, reflecting a tumour diameter of approximately 12 cm. Simulated tumours are analysed at multiple spatial scales (Fig. 1d). A flow diagram of the simulation procedure is presented in Extended Data Fig. 3.

Growth modes impact the extent of clonal diversification

We hypothesized that the growth mode influences the extent of clonal diversification. To test this, we first assessed the clonal diversity at the end of a simulation. Tumours under volume growth commonly harboured only parental clone with the lack of further subclonal diversification or contained a single dominant subclone, whereas tumours under surface growth harboured multiple advantageous subclones (Fig. 1e). Intriguingly, expanding subclones in tumours under surface growth were also associated with changes in tumour morphology. These subclones initially appeared as bulging structures in a localized manner and subsequently outgrew to cover large areas of the tumour surface. Next, we counted the number of clones in the whole tumour (Fig. 2a,b). Under volume growth, subclones were only observed in tumours with larger s and larger pdriver (Fig. 2b). By contrast, under the surface growth model, for a wide range of s, tumours with small to moderate pdriver harboured more subclones (Fig. 2b). We then illustrated the fractions of a tumour that subpopulations occupied and tumour fitness (measured as the average growth probabilities of tumour voxels in a tumour slice, Methods). Overall, volume growth models depicted a dichotomous pattern of clonal evolution and fitness: limited evidence of clonal diversification with low tumour fitness (Fig. 2c(i)) or presence of a single dominant clone with high tumour fitness (Fig. 2c(ii)). The latter pattern reflected early fixation of a highly fit clone in a subset of ‘born to be bad’ tumours. In comparison, in the surface growth model, extensive subclonal diversification and enhanced tumour fitness were evident in nearly all cases, even with a small pdriver (Fig. 2c(iii)). With a large pdriver, almost all tumours achieved peak fitness (Extended Data Fig. 1). More extensive subclonal diversification was also noted in models with additive driver advantages (Supplementary Note 1 and Extended Data Fig. 2). These differences between growth models were quantitatively reflected in the whole-tumour cancer cell fraction (CCF) of the parental clone (Fig. 2d) and Shannon diversity index (Methods and Fig. 2e). Using these two metrics, greater extent of diversification in the surface growth model was also noted for conditions with still smaller pdriver (Supplementary Fig. 1) or smaller s (Supplementary Fig. 2). In the interest of characterizing patterns of subclonal diversification and contrasting the two growth modes, we limit our parameter analysis, to s = 1 and a range of pdriver from 2 × 10−4 to 1 × 10−3.

Fig. 2: Growth modes impact the extent of clonal diversification and tumour fitness.
figure 2

a, Schematic of the whole-tumour analysis of clonal diversity. b, Heatmap showing the average number of clones (that is, parental clone and subclones) with respect to driver acquisition probability (pdriver) and selective coefficient (s) in the volume growth (i) and surface growth (ii) models. The average is calculated from 50 in silico tumours per parameter condition. Clones with a whole-tumour CCF of at least 0.05 are counted. c, Whole-tumour CCF of parental and largest subclones in in silico tumours under volume growth (i,ii) and surface growth (iii), respectively. Average fitness in a tumour slice for each simulation is presented as a heatmap. Driver acquisition probabilities in these sets of simulations are pdriver = 2 × 10−4 (i), 1 × 10−3 (ii) and 2 × 10−4 (iii). ‘Parental (3p loss, VHL)’ clone is shown along with up to five subclones with a whole-tumour CCF of 0.01 or higher. All remaining subclones are represented in the ‘other’ group. d, Whole-tumour CCF of parental clone in in silico tumours under volume growth and surface growth with varying driver acquisition probabilities. n = 100 for each condition. e, Shannon diversity index in in silico tumours under volume growth and surface growth with varying driver acquisition probabilities. n = 100 for each condition. f,g, Mean fitness of randomly sampled (10% of all) tumour voxels against the mean fitness of the central-most (10%) tumour voxels, in models with saturated (f) and additive (g) driver advantages. Data points reflect sets of simulations with varying growth patterns (colour), driver acquisition rates (size) and implementation of necrosis (symbol). Heatmaps indicate the fitness in representative in silico tumours under surface growth without or with the implementation of necrosis. Statistical annotations in d and e reflect two-sided Wilcoxon tests: **** P ≤ 0.0001. In box plots in d and e, the ends of the box reflect the lower (Q1) and upper (Q3) quartiles, with the difference indicating the IQR; the horizontal line dividing the box reflects the median; the ends of the vertical line indicate the extreme values within the range from \({\mathrm{Q}}1 - 1.5 \times {\mathrm{IQR}}\) to \({\mathrm{Q}}3 + 1.5 \times {\mathrm{IQR}}\); dots beyond the vertical line indicate potential outliers.

Source data

We further incorporated central necrosis into a subset of models to evaluate its impact on clonal diversification29. In brief, where necrosis was incorporated, tumour voxels located far from the tumour surface underwent death with an elevated probability (pnecrosis = 0.5) (Methods). In both growth modes, incorporation of necrosis generally led to a greater extent of clonal evolution and higher fitness observed at the end of simulations (Extended Data Fig. 4 and Supplementary Fig. 3). To further investigate the impact of necrosis on fitness at different parts of a tumour, we collected samples of centrally, marginally or randomly located tumour voxels (Methods). Surface growth models in general achieved higher fitness than volume growth models, as evidenced in random samples (Fig. 2f,g and Supplementary Figs. 4 and 5). When necrosis was implemented, fitness in the tumour centre was significantly elevated in the surface growth models (Fig. 2f,g, Supplementary Note 2 and Supplementary Figs. 4 and 5), in keeping with our recent study29. More broadly, surface growth led to more extensive subclonal diversification, corresponding to highly branched tumour evolution, while volume growth resulted either in tumours with limited evidence of clonal diversification or in tumours with early fixation of a fit subclone corresponding to punctuated evolution. Interestingly, these modes of evolution correspond to evolutionary subtypes identified in ccRCC7. A subset of tumours with low fitness has limited subclonal diversification, while other tumours have an early fixation of a highly fit clone resulting in rapid clonal sweep7. There are also tumours with extensive subclonal diversification, characterized by a range of drivers.

Growth modes impact the spatial distribution of clonal diversity

We next examined the spatial distribution of clonal diversity (Fig. 3a and Methods). Under surface growth, multiple subclones, representing macrodiversity, outgrew to occupy distinct spatially contiguous areas with hotspots of microdiversity frequent near the tumour edge (Fig. 3b). In the representative volume growth model, a single dominant subclone was observed with a more uniform distribution of microdiversity hotspots (Fig. 3c). Next, we explored whether similar patterns were present in ccRCC. Using regions that contain at least two subclones as a proxy for microdiversity hotspots in the TRACERx Renal data, we observed both spatial patterns of microdiversity corresponding to the surface growth model (for example, ‘K234’) and volume growth model (for example, ‘K446’) (Fig. 3d). Consistently between the surface growth model and clinical data, while microdiversity was generally high near the edge, there was variation along the edge, with higher microdiversity apparently at the more bulging regions.

Fig. 3: Growth modes impact the spatial features of clonal diversification.
figure 3

a, Schematic figure and procedure for the analysis of microdiversity within a 2D tumour slice. b,c, Spatial maps of subclones (i) and microdiversity (ii) in a representative in silico tumour under surface growth (b) or volume growth (c) with pdriver = 2 × 10−4, showing tumour voxels that belong to the parental clone (grey) and other subclones (randomly generated colours). d, Maps of regional biopsies with the number of subclones within a biopsy, colour coded in two cases (K234 and K446) in the TRACERx Renal study. Hues from red to purple to blue reflect decreasing number of subclones. ‘Low’ and ‘High’ reflect one and four subclones in K234 or two and four subclones in K446, respectively. e, Cumulative probability distribution, P(Dd), of the normalized distances to tumour centre in in silico tumours under surface growth and volume growth and in ccRCC tumours. n = 100 for each model condition for surface growth (‘S’) and volume growth (‘V’). ‘pdriver = 2 × 10−4’ indicates a driver acquisition probability of 2 × 10−4. A total of 606 PT regions from 54 ccRCC tumours are considered for this analysis. ‘Null model’ reflects a power law with an exponent of 2. f, Bootstrapped power-law exponent k, as in \(P(D \le d)\approx d^k\), fitted to cumulative probability distributions generated from bootstrap samples (Methods) with n = 100 k values per condition. g,h, Bootstrapped power-law exponent k in ccRCC tumours partitioned according to relapse status (g) or rate of disease progression (h) with n = 100 k values per condition. Statistical annotations in fh reflect two-sided Wilcoxon tests: **** P ≤ 0.0001. In box plots in fh, the ends of the box reflect the lower (Q1) and upper (Q3) quartiles, with the difference indicating the IQR; the horizontal line dividing the box reflects the median; the ends of the vertical line indicate the extreme values within the range from \({\mathrm{Q}}1 - 1.5 \times {\mathrm{IQR}}\) to \({\mathrm{Q}}3 + 1.5 \times {\mathrm{IQR}}\); dots beyond the vertical line indicate potential outliers.

Source data

Intriguingly, the cumulative probability distribution with respect to the normalized distance from microdiversity hotspots to tumour centre, d, depicted power-law scaling (Fig. 3e), suggesting that the probability of observing spots with high microdiversity along the radius of a tumour could be estimated using a simple mathematical formula (that is, \(P(D \le d)\approx d^k\), where k is the power-law exponent to be fitted). In comparison, the surface growth model displayed a larger k (Fig. 3f and Supplementary Fig. 6), indicating a greater likelihood of microdiversity hotspots being enriched near the tumour edge. Critically, spatial homogenization of subclone patterns abolished the characteristic scaling behaviour (Supplementary Note 3 and Extended Data Fig. 5), demonstrating the importance of spatial elements of tumour growth in generating microdiversity. Models with additive driver advantages showed similar distributions of microdiversity hotspots (Supplementary Note 4 and Extended Data Fig. 6). The incorporation of necrosis into the model re-adjusted the spatial profile in surface growth models, leading to the enrichment of additional microdiversity hotspots in the necrotic tumour centre (Supplementary Note 4 and Extended Data Fig. 6). The power-law pattern was also observed in 54 tumours with microdiversity hotspots (Fig. 3e). Moreover, there was an association between the power-law scaling exponent k with relapse status (Fig. 3g) and the rate of disease progression (Fig. 3h) in the TRACERx Renal study (Supplementary Table 1), where tumours mapped to a poorer clinical outcome are typically associated with a steeper spatial distribution of microdiversity hotspots and enrichment towards the tumour margin (Supplementary Note 5 and Supplementary Fig. 7). Tumours with attenuated progression showed a steep gradient of microdiversity hotspots, consistent with surface growth models. Interestingly, both tumours with no progression and those with rapid progression show a shallow gradient of microdiversity hotspots, nicely corresponding to volume growth models with limited evidence of evolution and with early fixation of a fit clone, respectively. The observation of spatial features of clonal diversity (Supplementary Table2) adds to our previous finding that the overall genetic diversity correlated with patient clinical outcome7.

Growth modes impact the spatial patterns of parallel evolution events and youngest subclones

As subclonal diversification could involve acquisition of, and be facilitated by, distinct mutations in the same gene at spatially separate locations, we next evaluated the frequency of parallel evolution events and their spatial features. In the TRACERx Renal study, parallel evolution was observed in 28 tumours, with each event spanning a variable number of regions (Supplementary Note 6 and Supplementary Table 3). Interestingly, parallel mutation events with limited clonal expansion (spanning only a single region) showed distinct spatial patterns in different ccRCCs, suggesting that ongoing convergent evolution in the same gene could operate at varying locations of a tumour (Fig. 4a). In some cases (for example, K252), such events were all close to the margin; in other cases (for example, K520), they were located far from the edge (Fig. 4a,b). We hypothesized that the observed distinct patterns of parallel mutation events could be attributed to the patterns of proliferation and accordingly recent subclone births. To test this, we returned to the computational model and examined whether and how different growth models differ in the patterns of youngest subclones (Fig. 4c). Consistent with the patterns of microdiversity, we observed a preferential distribution of the youngest subclones near the tumour edge in the surface growth model and a more uniform distribution in the volume growth models (Fig. 4d(i) and Extended Data Fig. 7). Interestingly, when necrosis was incorporated, surface growth models often showed a bimodal distribution, with youngest subclones being born either near the tumour surface or in the necrotic interior, a pattern that was rarely observed in volume growth models (Fig. 4d(ii)).

Fig. 4: Growth modes impact the spatial features of parallel evolution and youngest subclones.
figure 4

a, Distance from regions harbouring parallel mutational events that span a single region to the tumour margin. Red rectangles indicate the two representative cases shown in b. b, Maps of regions containing parallel mutation events in two representative cases (PBRM1 events in K520 (i) and BAP1 events in K252 (ii)) in the TRACERx Renal study. Distinct parallel mutation events are indicated using different colours. For regions containing more than two parallel mutations, two colours are applied simultaneously. Double-headed arrow indicates the measurement of distance to tumour edge. c, Schematic of the analysis of youngest subclones within a 2D tumour slice. d, Distance from the positions of youngest subclones to the tumour margin, in models without (i) or with (ii) the implementation of necrosis. n = 100 youngest subclones from each simulation are analysed and shown as grey points, with the mean distance to margin indicated with a coloured vertical bar. n = 50 simulations are shown, arranged from small to large mean distance to margin (top to bottom) for each model condition. Surface growth and volume growth models are shown in red and blue, respectively, with increasing driver acquisition probabilities indicated by increasing colour intensity. e, Distance from regions harbouring genomic alterations that span a single region to the tumour margin. Only tumours with at least ten regions are included. Red rectangles indicate the two representative cases shown in f. f, Maps of regional biopsies with the number of subclones within a biopsy colour coded in two representative cases (K523 and K156) in the TRACERx Renal study. Hues from red to purple to blue reflect decreasing number of subclones. ‘Low’ and ‘High’ reflect one and four subclones in K523 or one and six subclones in K156, respectively. Regions harbouring events that span a single region are marked by arrows, green if located within 10 mm from the tumour edge and orange otherwise. g,h, Histological images of representative areas from tumour regions of K156 (g) and K523 (h). Vertical dashed line in a,d and e corresponds to a distance of 10 mm.

Source data

Motivated by the diverse patterns of youngest subclones observed in models with different growth modes, we then turned to the clinical data. Focusing on the 20 ccRCC cases most extensively sampled (n ≥ 10 regions) from TRACERx Renal, we observe events that spanned a single region (assumed to be the most recent/youngest subclones) close to the tumour margin (for example, K523, K360 and K234) and located at varying distances from the tumour margin (for example, K156, K165 and K272) (Fig. 4e,f and Extended Data Fig. 7). Observations of these representative cases characterized by high levels of clonal diversity7 suggest that they evolved via surface growth but that necrosis in K156, K165 and K272 enabled birth of young subclones in the interior. Intriguingly, histological assessment of K156 showed the presence of paucicellular areas both macroscopically (Supplementary Fig. 8) and microscopically (Fig. 4g) in the interior of the tumour, which interfaced youngest subclones, in contrast to the interior of K523 (Fig. 4h). These observations suggest that continuing proliferation and clonal evolution occur at the tumour margin but are also facilitated by available space in the interior.

Growth modes impact the temporal features of clonal diversification

We investigated temporal features of clonal diversification through in silico time-course analysis. Overall, the number of subclones remained limited in volume growth models but increased over time before reaching a plateau in surface growth models (Fig. 5a,b and Extended Data Fig. 8). Thus, surface growth and volume growth models initially showed similar extent of clonal diversification with subsequent divergence. Notably, when necrosis was implemented, volume growth models were minimally impacted, but we observe a dramatic reduction in clonal diversity at later stages of tumour growth in surface growth models, especially when additive driver advantages were implemented (Extended Data Fig. 8). This reflected a ‘pruning’ effect on the clonal structure with elimination of the less fit subclones. These observations reconciled previous observation of a non-monotonic relationship between tumour size and number of clones, including the collapse of clonal diversity at very large tumour sizes7.

Fig. 5: Growth modes impact the temporal features of clonal diversification.
figure 5

a, Number of subclones as a function of the diameter of a 2D tumour slice in in silico tumours under surface growth and volume growth, for n = 50 simulations with \(p_{\mathrm{driver}} = 6 \times {10^{-4}}\) for each condition. b, KDE with respect to the number of subclones and the diameter of a 2D tumour slice in in silico tumours under volume growth (i) and surface growth (ii). Each KDE plot is based on 250 simulations (50 per condition) under five conditions with \(p_{\mathrm{driver}} = 2 \times 10^{ - 4},4 \times 10^{ - 4},6 \times 10^{ - 4},8 \times 10^{ - 4}\, {\mathrm{and}}\,1 \times 10^{ - 3}\). c, Vector maps of evolutionary flows over time with respective to the number of subclones and the average number of drivers accumulated among tumour voxels in surface growth models without (i) or with (ii) the implementation of necrosis. n = 50 simulations with \(p_{\mathrm{driver}} = 6 \times 10^{ - 4}\) are shown. d, The spatial patterns of parallel mutations in PBRM1 (upper, distinct events in different colours) and microdiversity (lower) over time in a representative in silico tumour under surface growth. The arrows indicate budding structures preceding subclonal expansion. e, Time evolution with respect to the number of subclones and the mean fitness of the tumour slice, along with the diameter of tumour slice, contour circularity and average number of drivers accumulated, in surface growth models. n = 50 simulations with \(p_{\mathrm{driver}} = 6 \times 10^{ - 4}\) are shown. f, Axial image in the corticomedullary contrast phase of a representative case (K523) showing budding structure on the tumour surface (red arrow). Outlines in red reflect the tumour contour giving volumetric tumour coverage. g, Maps of tumour regions with the number of subclones colour coded in a representative case. Hues from red to purple to blue reflect decreasing number of subclones. ‘Low’ and ‘High’ reflect one and four subclones, respectively. h, The number of subclones as a function of ultimate tumour size in the TRACERx Renal study, overlaid with KDE based on simulated data. Tumours with size up to 7 cm and with radiologically evident budding structures are highlighted (orange). Contours reflect 90% probability density based on in silico tumours under surface growth (red) and volume growth (blue) in Fig. 6b.

Source data

As the birth of a new subclone is defined by the acquisition of new driver events in a tumour voxel, expectedly, surface growth models, which showed more extensive clonal diversification, accumulated a larger number of drivers, at a faster rate, than volume growth models (Extended Data Fig. 8). Nevertheless, in contrast to the rate of clonal diversification, the number of accumulated drivers increased monotonically over time in the surface growth models. Repeat simulations under surface growth clearly exhibited an altered direction of ‘evolutionary flows’, indicating out-competence of advantageous subclones and reduction of overall clonal diversity at later stages (Fig. 5c). Underlying these observations, surface growth led to polynomial growth with longer time to reach the stopping condition, while volume growth resulted in exponential growth (Extended Data Fig. 8). The faster growth rate in volume growth models means a large contribution of parental clone to overall tumour growth and shorter time for advantageous subclones to outgrow and compete, leading to tumours with limited diversification.

Early indication of evolutionary potential

We have previously shown that evolutionary features correlate with clinical outcomes and could be used to guide patient management7. Therefore, it is of particular interest to examine whether the computational model could be used to suggest predictive features for the likely evolutionary trajectories and therefore clinical behaviour of individual tumours.

Of note, early-stage tumours under distinct growth modes appeared indistinguishable with respect to the number of subclones (Fig. 5a). We specifically investigated features that could indicate subsequent subclonal diversification in the surface growth model and noted the appearance and outgrowth of budding structures (Fig. 5d). Notably, as the tumour grew with gained fitness, the contour circularity of a tumour slice initially decreased and then recovered, concomitant with the initial increase and subsequent reduction in clonal diversity, respectively (Fig. 5e). Exploratory simulations attempting at ‘replaying’ evolution (that is, re-simulating clonal evolution from a historical tumour state with established clonal structure as a starting point) starting from different tumour sizes suggested that evolution was more repeatable if starting from a historical tumour state with budding structures emerging (Supplementary Note 7 and Extended Data Figs. 9 and 10).

With respect to the above findings, the radiological features of 46 tumours with diameter <7 cm in the TRACERx Renal study were evaluated (Supplementary Table 4). By qualitative examination of radiological images, budding structures were apparent at the surface of 16 tumours and evident in the tumour ex vivo (one representative, K523, is shown in Fig. 5f,g). In this case, adjacent to the budding structures were regions with high clonal diversity, consistent with surface growth models. Interestingly, the presence of budding in these 16 sub-7-cm tumours exhibiting increasing diversity as a function of tumour size, combined with the absence of budding and low diversity in some larger tumours (Fig. 5h), showed clear concordance with the temporal trend of clonal diversification and morphological variation observed in the surface growth models. The failure of this trend to continue in larger tumours supports the observed collapse of subclonal diversity in simulated tumours under surface growth.

Discussion

Genetic ITH arises when clonally related populations of cells in the tumour acquire distinct genomic alterations, endowing the subclones with a range of fitness advantages. Our understanding of how major subclones sculpt evolutionary trajectories has been gleaned primarily from multi-region sampling and deep sequencing. However, our understanding of the temporal features of clonal evolution and our ability to predict evolutionary trajectories remain limited. Therefore, we focus on spatial and temporal characterization of clonal diversity to elucidate predictive features for evolutionary potential, summarized in Fig. 6. To this end, we developed an agent-based model to study tumour growth and clonal evolution, with a focus on examining the contribution of different modes of growth, namely surface and volume growth, and the presence or absence of central necrosis to spatial and temporal features of clonal diversity.

Fig. 6: Summary diagram: tumour growth modes impact the extent, spatial features and temporal trajectories of clonal diversification.
figure 6

Surface growth models lead to attenuated progression with extensive subclonal diversification, reflective of branched evolution. At the early stage, the birth and outgrowth of proliferatively advantageous subclones causes the formation of surface budding structures and consequently a distorted tumour contour. As these subclones grow to collectively constitute the tumour frontier at a later stage, the tumour contour returns to a more circular shape with enrichment of youngest subclones and microdiversity hotspots and enhancement of fitness near the tumour margin. The incorporation of central necrosis causes the loss of macrodiversity but at the same time permits continued subclonal diversification in the tumour interior, evidenced by the enrichment of youngest subclones and microdiversity hotspots. In contrast, volume growth models give rise to dichotomous patterns of tumour growth and clonal evolution, developing tumours that are either indolent with ‘lack of evolution’ or aggressive with early birth of fitter subclone and rapid progression. The distributions of youngest subclones, microdiversity hotspots and fitness in volume growth models are more uniform than those in surface growth models.

Adding to previous modelling work on spatial elements of tumour growth24,25,26,27,33,34,35,36,37, our model demonstrates that growth modes impact subclonal diversification. Specifically, volume growth resulted in either limited evidence of evolution or punctuated evolution with early fixation of a fit clone, whereas surface growth gave rise to branched evolution with extensive subclonal diversification (Fig. 6). Intriguingly, surface growth models revealed a non-monotonic variation in clonal diversity over time with a dramatic collapse of diversity at large tumour sizes, consistent with the apparent relationship between static data of tumour size and clonal diversity in the TRACERx Renal study. These temporal features of clonal diversity informed by the model raise the possibility that evolutionary modes are not a static property but instead can undergo a dynamic switch from a branched to an apparently punctuated sub-type, with peak diversity occurring in the past, during tumour development.

Spatial analyses further uncovered that microdiversity hotspots and youngest subclones were more uniformly distributed in volume growth models while predominantly near the tumour margin in surface growth models (Fig. 6). As in the model, the spatial distribution of microdiversity hotspots exhibited a power-law pattern in ccRCCs. Strikingly, the exponent of the power law was associated with previously described different classes of ccRCC evolution. Tumours with attenuated progression had a larger exponent, which is consistent with their more branched phylogenetic trees. Both indolent mono-driver and aggressive poly-driver tumours had lower exponents, suggesting volume growth patterns, with the aggressive tumours simply determined in our model by the early acquisition of multiple strong drivers.

Investigation on the impact of necrosis in our model further broadened our understanding of the spatial and temporal features of clonal diversification. (Fig. 6). Incorporation of necrosis led to enhanced fitness in the tumour interior, suggestive of selection of fitter clones, in keeping with our recent study29. Furthermore, with necrosis, tumours under surface growth harboured additional microdiversity hotspots and youngest subclones at the centre. The pattern of youngest subclones was corroborated by the analysis of sequencing data and further supported by histological evidence of interface between tumour and acellular areas, suggesting that surface growth with necrosis incorporated could explain the evolution of some ccRCCs. Our recent work in the context of TRACERx Renal demonstrated that metastasis-competent subclones are enriched at the tumour centre, suggesting that environmental factors favoured their selection, possibly through acquisition of advantageous traits such as epithelial-to-mesenchymal transition. In the current study, we present a complementary, and non-exclusive, perspective that necrosis could accelerate the rate of evolution, via turnover of tumour mass, to achieve enhanced fitness in the tumour centre.

Finally, tracking advantageous subclones over time in silico illuminated the rapid increase of their prevalence in small tumours, marked by the appearance of budding structures, concomitant with subclonal diversification in surface growth models. Intriguingly, budding structures were radiologically apparent in 16 early-stage ccRCC tumours, a subset of which already showed high clonal diversity, by molecular profiling. While budding structures in our model arose from advantageous subclonal outgrowth, alternative mechanisms cannot be excluded33,38.

To conclude, we have developed a model that enables us to understand how spatial patterns of growth and necrosis determine patterns of clonal diversity in space and time (Fig. 6). We validate our findings using patient data, thereby opening the potential for predicting future clinical behaviour, precision medicine’s holy grail.

Methods

Computational model

Tumour growth and clonal evolution in a spatio-temporal context have increasingly been studied with the aid of computational models that incorporate spatial elements of tumour growth and acquisition of genomic alterations24,25,33,34. Spatial patterns of tumour growth24,25,26,27 have been shown to impact the ability to classify neutral evolution in contrast to selection, suggesting that spatial growth of a structured population interplays with evolutionary forces (driver acquisition, selection and genetic drift) to shape the spatial patterning of subclones.

In the present study, to establish an understanding of the spatial and temporal features of clonal diversification and to enhance the ability to predict evolutionary trajectories in ccRCC, we constructed a coarse-grained cellular automaton model to simulate tumour growth and clonal evolution. A basic model unit reflects a tumour volume of 1 × 1 × 1 mm3, referred to as a ‘tumour voxel’. The full simulation lattice comprises 200 × 200 ×200 lattice sites, each of which can accommodate a single tumour voxel when a tumour grows. As a simulation proceeds, tumour voxels stochastically undergo growth, death and acquisition of driver events upon growth (Extended Data Fig. 3). The subsequent sections detail the model components and assumptions.

Growth and death

Tumour voxels stochastically undergo growth and death, with baseline probabilities per simulation step of pgrowth = 0.25 and pdeath = 0.05, respectively. Upon death, a tumour voxel is removed from the simulation lattice, rendering the site empty and available for accommodating new tumour voxels. Two different modes of spatial tumour growth are considered: surface growth and volume growth (Fig. 1c). For surface growth, proliferation only takes place when space is available, namely when at least 1 of the 26 neighbouring lattice sites of the tumour voxel selected to divide is empty. Upon duplication of a parent tumour voxel, one child tumour voxel retains the position of the parent while the other is placed at a randomly selected adjacent empty site. For volume growth, all tumour voxels can proliferate. Upon duplication, one child tumour voxel retains the position of the parent while the other is placed at a selected adjacent site according to the rule described below and pushes tumour voxels in that orientation outward. The process for selecting an adjacent site includes two steps: (1) random sampling of 10 candidate positions out of the 26 neighbouring lattice sites and (2) selection of the orientation (that is, pointing from the position of the parent tumour voxel to the candidate position) giving the smallest distance from the tumour surface, similar to the algorithm described in ref. 34.

Driver events

A panel of 26 ccRCC drivers that were highlighted previously7, including mutations in 12 genes and 14 SCNAs, are considered in the present work (Extended Data Fig. 1). For simplicity, the selective advantage conferred by a driver is assumed to manifest as growth advantage.

Two ways of implementing selective advantages are considered and referred to as ‘saturated’ and ‘additive’ driver advantage models. In the saturated driver advantage model, the pgrowth of a tumour voxel can be at one of three levels \(\{ p_{\mathrm{growth}}^{\left( {\mathrm{initial}} \right)},p_{\mathrm{growth}}^{\left( {\mathrm{moderate}} \right)},p_{\mathrm{growth}}^{({\mathrm{maximal}})}\}\). Each driver endows a tumour voxel with one of these growth probabilities, and the relative differences in selective advantage of drivers are assumed to reflect their association with the Ki67 score in tumour regions (Extended Data Fig. 1) and their frequencies in the clinical cohort7. In a general form, \(p_{\mathrm{growth}}^{({\mathrm{moderate}})} = g(s)p_{\mathrm{growth}}^{({\mathrm{initial}})}\) and \(p_{\mathrm{growth}}^{({\mathrm{maximal}})} = h(s)p_{\mathrm{growth}}^{({\mathrm{initial}})}\) are functions of the baseline growth probability, where \(h(s) \ge g(s) \ge 1\) reflect the growth advantages relative to the baseline. As one specific implementation, \(p_{\mathrm{growth}}^{({\mathrm{initial}})} = 0.25\), \(p_{\mathrm{growth}}^{({\mathrm{moderate}})} = (1 + s)p_{\mathrm{growth}}^{({\mathrm{initial}})}\) and \(p_{\mathrm{growth}}^{({\mathrm{maximal}})} = (1 + s)^2p_{\mathrm{growth}}^{({\mathrm{initial}})}\), where 0 ≤ s ≤ 1 reflects the selective advantage. For simplicity, individual driver gene mutations are assigned with \(p_{\mathrm{growth}}^{({\mathrm{initial}})}\), whereas four SCNAs with strong association with Ki67 score (7q gain, 20q gain, 4q loss and 8p loss) are assumed to be the strongest drivers assigned with \(p_{\mathrm{growth}}^{({\mathrm{maximal}})}\) and therefore their acquisition would lead to the biggest increase in growth probability. Importantly, the saturated model is implemented with only two levels of selective advantage, and the growth probability of a tumour voxel becomes saturated at 1 if acquiring the strongest driver. In comparison, the additive driver advantage model has a more graduated implementation of selective advantage. In this implementation, each driver adds a certain amount of growth probability to the tumour voxel that acquires the driver, namely \(p_{{\mathrm{growth}}} = p_{\mathrm{growth}}^{({\mathrm{initial}})} + \mathop {\sum }\limits_k p_{{\mathrm{growth}}\_k}\), where \(p_{{\mathrm{growth}}\_k}\) reflects the amount of growth probability added by driver k (Extended Data Fig. 2). \(p_{\mathrm{growth}}\) is set to 1 if the calculated probability exceeds 1. The amount \(p_{{\mathrm{growth}}\_k}\) varies between drivers, reflecting the different strengths of their association with the Ki67 score (Extended Data Fig. 1). Three different scenarios were explored to reflect different amounts of growth probability endowed by drivers on average, as determined by the sk of the weakest driver, namely min(sk), and the difference in sk between consecutive two drivers in their advantages, namely Δsk (Extended Data Fig. 2).

Upon proliferation of a parent tumour voxel, child tumour voxels inherit existing driver events harboured by the parent tumour voxel and stochastically acquire new drivers. Individual driver gene mutations are assumed to be acquired with a greater probability (pdriver) than SCNAs (0.001pdriver). A second mutation in the same gene is assumed to never occur in the same tumour voxel, but multiple independent, distinct mutations in the same gene may be acquired in parallel within a simulated tumour in different tumour voxels. As the majority of ccRCCs have clonal VHL inactivation events, in general and in the TRACERx Renal cohort7, the founder tumour voxel is assumed to harbour VHL inactivation together with 3p loss as a clonal event. The subpopulation of tumour voxels that only harbour these two events is referred to as the parental clone. Based on their association with a high weighted genome instability index in the data from the TRACERx Renal study7, and functional evidence31,32, mutations in PBRM1 or BAP1 are assumed to enhance the probability of SCNA acquisition (to pdriver). A range of driver acquisition probabilities have been studied to explore the impact on the patterns we investigate (see Supplementary Note 8 for considerations for the selection of pdriver values in the coarse-grained model). Mutations and acquisition of SCNAs are assumed to be proliferation dependent, implying that DNA replication and chromosome mis-segregation is the main source of genomic alterations. Lastly, the selective advantage endowed by a driver is assumed to be fixed, so the variation in driver advantage dependent on changing environments is not considered in the current study.

Necrosis

Building upon our previous work29, necrosis is implemented in a subset of model conditions to evaluate its impact on features of clonal diversification. Specifically, tumour voxels located at a distance greater than \(d_{\mathrm{necrosis}} = 15\,{\mathrm{mm}}\) from the tumour surface undergo death with a probability of \(p_{\mathrm{necrosis}}\). A probability of \(p_{\mathrm{necrosis}} = 0.5\) is used in this study, in keeping with our previous work. Like the spontaneous death described above, upon necrosis-induced death, a tumour voxel is removed from the simulation lattice, rendering the site empty and available for accommodating new tumour voxels.

Simulation

The procedure for simulating events of death, proliferation and acquisition of driver events is illustrated in a flow diagram (Extended Data Fig. 3). Briefly, each simulation starts from a single tumour voxel (that is, founder tumour voxel) that harbours a VHL mutation and 3p loss as truncal events, placed at the centre of the lattice, (x0, y0, z0). During the evaluation of possible death events, for each of all tumour voxels alive, pdeath is compared with a random number generated in the range [0, 1]. If pdeath is larger, a death event occurs, resulting in the lattice site being freed to accommodate a newly born tumour voxel in the future. During the evaluation of possible proliferation events, for each of all valid tumour voxels (see above for the difference between surface growth and volume growth), pgrowth is determined according to the driver events harboured (see above for the difference between ‘saturated’ and ‘additive’ models of fitness advantage) and compared with a random number generated in the range [0, 1]. If pgrowth is larger, a proliferation event occurs, resulting in a new tumour voxel being created nearby. During the evaluation of possible acquisition of driver events, for each of all the daughter tumour voxels just arising from proliferation and for each of the ccRCC drivers, pdriver is compared with a random number generated in the range [0, 1]. If pdriver is larger and that driver is not currently harboured by the tumour voxel, acquisition of the driver takes place in the given tumour voxel. In a subset of simulations, necrosis is implemented. During the evaluation of necrotic death events, for each of all tumour voxels alive, if it is located at a distance greater than \(d_{\mathrm{necrosis}}\) from the tumour surface, \(p_{\mathrm{necrosis}}\) is compared with a random number generated in the range of [0, 1]. If \(p_{\mathrm{necrosis}}\) is larger, a necrotic death event occurs, resulting in the lattice site being freed to accommodate a newly born tumour voxel in the future. The simulation runs until the tumour grows to at least 1 million tumour voxels after the last simulation step. The computer code is written in CUDA C++.

Evolutionary replay

The procedure for simulating evolutionary replay is illustrated in a flow diagram (Extended Data Fig. 9). Briefly, a preparation step is performed to create evolutionary snapshots of a simulated tumour at different timepoints. Specifically, each snapshot contains precise information about the positions, subclone identities and drivers of all tumour voxels. At the beginning of evolutionary replay, N replicate tumours are reconstructed, each with a copy of the same evolutionary snapshot at a given timepoint t. Then, these replicate tumours undergo the events described above, each with a different unique random seed, and grow to the predefined stopping size. Evolutionary outcomes from these replicate tumours are evaluated and compared.

Model analyses

Levels of analysis

Analyses are conducted at three different levels (Fig. 1d): (1) the whole-tumour level, which takes into account all tumour voxels in the three-dimensional (3D) volume, (2) the tumour slice level, which takes into account all tumour voxels within a two-dimensional (2D) plane \((z = z_0)\) and (3) the regional biopsy level, which takes into account tumour voxels within regional biopsies. A regional biopsy is defined as all tumour voxels within a region in the 2D slice. Spatially uniform sampling is performed in this study. This process is carried out by locating the centres of candidate regional biopsies in the 200 mm × 200 mm 2D lattice with a spacing of 20 mm and collecting all voxels within a distance of 5 mm from each biopsy centre.

CCF of subclones

The CCF of a subclone is calculated as the number of tumour voxels that belong to a subclone divided by the total number of tumour voxels in the domain of interest, depending on the level of analysis. A subclone is identified by a set of driver events, shared by a subpopulation of tumour voxels, which are accumulated within the subclone-initiating tumour voxel. A subclone-initiating tumour voxel is defined as a tumour voxel that acquires a new driver event upon birth. A subclone is considered detectable if the CCF is greater than 0.01.

Shannon diversity index

As a measure of clonal diversity, the Shannon diversity index is defined as \(S = \mathop {\sum }\limits_i - f_i\ln f_i\), where fi is the CCF of subclone i. All subclones are considered in this calculation.

Number of drivers accumulated

The drivers (including both mutations and SCNAs) harboured by each tumour voxel within a tumour slice are counted at each timepoint. The average number of drivers among all tumour voxels is then calculated.

Fitness

The fitness of a tumour voxel is defined as the instantaneous growth probability, pgrowth, dependent on the set of drivers harboured by that tumour voxel. In representative cases, the fitness of tumour voxels is mapped within a 2D tumour slice. To examine the spatial features of tumour fitness, the distances from every tumour voxel to the centre of a tumour slice and to the nearest point along the tumour contour are calculated, respectively. The central-most 10% of tumour voxels, with the shortest distances to the centre, and the marginal-most 10% of tumour voxels, with the shortest distances to the margin, are sampled to calculate a sample-average fitness. Additionally, another 10% of tumour voxels is randomly sampled for comparison. The ratio of the mean fitness of the central-most 10% of tumour voxels to that of the marginal-most 10% of tumour voxels, denoted as ‘Ratio_C2M’, is calculated for each simulation.

Microdiversity

Microdiversity is defined as the number of subclones contained in a 3 × 3 mm2 region within the tumour slice. In representative cases, microdiversity is spatially mapped within a tumour slice, by sliding a 3 × 3 mm2 spatial window throughout the tumour slice. Microdiversity hotpots are defined as a subset of these small regions with five or more subclones. The distance from a microdiversity hotspot to the centre of a tumour slice is referred to as the distance to tumour centre (d1). The distance from a microdiversity hotspot to the nearest point along the tumour contour is referred to as the distance to tumour margin (d2). The normalized distance to tumour centre is defined as \(d = d_1/(d_1 + d_2)\). The cumulative probability distribution, \(P(D \le d)\), of d is generated by combining microdiversity hotspots from repeat simulations. The power-law exponent is obtained by bootstrapping 100 samples of 400 hotspots per sample and fitting a power-law function to the cumulative probability distribution of d in each sample.

Youngest subclones

Youngest subclones are defined as the subclones that emerge closest to the end of a simulation. For the analysis of their spatial distribution, we recorded the 100 youngest subclones within a 2D tumour slice and, for each of them, measured the distance from its position to the tumour margin. The mean of the distances for these 100 subclones was calculated for comparison between different replicate simulations.

Contour circularity

Tumour contours are determined from 2D tumour slices. Smoothing is performed for each tumour voxel along the tumour contour by calculating the mean (x, y) position of this tumour voxel and adjacent tumour voxels. After smoothing, contour circularity is calculated as \({\mathrm{Circularity}} = \frac{{4\uppi \,{\mathrm{Area}}}}{{\mathrm{Perimeter^2}}}\).

Temporal analysis

For the time-course study, 2D tumour slices are collected over multiple timepoints. The number of subclones is counted within each historical tumour slice. Kernel density estimation (KDE) with a Gaussian kernel is performed with respect to the number of subclones and the diameter of the tumour slice, based on all simulations under a given model condition, to produce a continuous density estimate.

Analyses of tumour data

TRACERx Renal cohort

Seventy-nine tumour sections of 66 unique primary tumours are included in this study. See the exclusion criterion in our previous publication29.

Computed tomography images

Contrast-enhanced computed tomography images were obtained using standard-of-care imaging sequences in 91 patients and curated using a local research picture archiving and communication system based on the Extensible Neuroimaging Archive Toolkit platform39. Outlines were drawn by consensus between an oncologist (S.T.C.S.) and a radiologist (D.A.), giving volumetric tumour coverage, from which image strips were prepared for rapid visualization of all tumour slices for all patients using an in-house script written in Python.

To detect presence or absence of budding structures in the contoured CT data, qualitative assessment of contoured tumours was performed by X.F. and S.T.C.S and verified by a radiologist (D.A.).

Histological analysis

Haematoxylin and eosin-stained histological sections of representative cases were evaluated by two pathologists (J.I.L. and C.E.S.). Each sample was qualitatively assessed for the relative amount of viable tumour cells, fibrosis and presence of necrosis.

Microdiversity

Spatial maps of regional clone diversity are created for two representative tumour sections. In these maps, regions are colour coded based on the number of subclones. Regions that harbour at least one subclone are treated as a proxy for microdiversity hotspots defined in the model analysis. In total, there are 606 regions from 54 tumours that satisfy this criterion. In evaluation of association between microdiversity features and clinical annotations, subsets of tumours are considered. Subsets with different relapse statuses consist of 270 (‘relapse’) and 336 regions (‘no relapse’), respectively. Subsets with different rates of disease progression consist of 276 (‘no progression’), 265 (‘attenuated progression’) and 65 regions (‘rapid progression’), respectively. For tumour regions, the normalized distance to tumour centre is measured as described above in Model analyses section.

Parallel evolution

Spatial maps of parallel mutational events in PBRM1 and in BAP1, respectively, are created for two representative tumours. In these maps, regions are coloured differently according to different parallel mutational events. Regions that harbour more than one event are indicated with multiple colours. To study the spatial distribution of mutational events with limited clonal expansion, the maximum distance from an event spanning a single region to the tumour margin is measured.

Statistical analysis

The two-sided Wilcoxon’s rank test was performed to compare a particular measurement between different conditions. Statistical significance is annotated within box plots using stat_compare_means (method = ‘wilcox.text’, label = ‘p.signif’) in R.

Bootstrapping was performed to generate 100 random samples of 400 microdiversity hotspots per sample with replacement, using in random.choice() in Python. The power-law exponent was then determined by fitting a power-law function to the cumulative probability distribution from each sample, using scipy.optimize.curve_fit() in Python.

The quantile–quantile (Q–Q) plot was generated to compare the actual distribution of microdiversity hotspots with a power-law distribution with the exponent as the median of fitted values in bootstrapping, using statsmodels.graphics.gofplots.qqplot() in Python.

KDE was performed for simulations with respect to the size of tumour slice and the number of subclones, using seaborn.jointplot (kind = ‘kde’) in Python.

R v.3.6.2 and Python v.3.7.7 were used for these analyses.

Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.