Spatial structure governs the mode of tumour evolution

Noble, Robert; Burri, Dominik; Le Sueur, Cécile; Lemant, Jeanne; Viossat, Yannick; Kather, Jakob Nikolas; Beerenwinkel, Niko

doi:10.1038/s41559-021-01615-9

Download PDF

Article
Open access
Published: 23 December 2021

Spatial structure governs the mode of tumour evolution

Nature Ecology & Evolution volume 6, pages 207–217 (2022)Cite this article

20k Accesses
40 Citations
125 Altmetric
Metrics details

Subjects

Abstract

Characterizing the mode—the way, manner or pattern—of evolution in tumours is important for clinical forecasting and optimizing cancer treatment. Sequencing studies have inferred various modes, including branching, punctuated and neutral evolution, but it is unclear why a particular pattern predominates in any given tumour. Here we propose that tumour architecture is key to explaining the variety of observed genetic patterns. We examine this hypothesis using spatially explicit population genetics models and demonstrate that, within biologically relevant parameter ranges, different spatial structures can generate four tumour evolutionary modes: rapid clonal expansion, progressive diversification, branching evolution and effectively almost neutral evolution. Quantitative indices for describing and classifying these evolutionary modes are presented. Using these indices, we show that our model predictions are consistent with empirical observations for cancer types with corresponding spatial structures. The manner of cell dispersal and the range of cell–cell interactions are found to be essential factors in accurately characterizing, forecasting and controlling tumour evolution.

State-dependent evolutionary models reveal modes of solid tumour growth

Article Open access 09 March 2023

Spatial biology of cancer evolution

Article 09 December 2022

Characterizing the ecological and evolutionary dynamics of cancer

Article 27 July 2020

Main

A tumour is a product of somatic evolution in which mutation, selection, genetic drift and cell dispersal generate a patchwork of cell subpopulations (clones) with varying degrees of aggressiveness and treatment sensitivity¹. A primary goal of modern cancer research is to characterize this evolutionary process to enable precise, patient-specific prognoses and optimize targeted therapy regimens. However, studies revealing the evolutionary features of particular cancers raise as many questions as they answer. Why do different tumour types exhibit different modes of evolution^{2,3,4,5,6,7,8}? What conditions sustain the frequently observed pattern of branching evolution, in which clones diverge and evolve in parallel^2,9,10,11? And why do some pan-cancer analyses indicate that many tumours evolve neutrally¹², whereas others support extensive selection¹³?

Factors proposed as contributing to tumour evolution include microenvironmental heterogeneity, niche construction and positive ecological interactions between clones^{1,14,15,16,17}. However, because such factors have not been well characterized across human cancer types, it remains unclear how they might relate to evolutionary modes. In contrast, it is well established that tumours exhibit a wide range of architectures and types of cell dispersal^18,19 (Fig. 1), the evolutionary effects of which have not been systematically examined. Because gene flow (the transfer of genetic information between localized populations²⁰) is a principal force in evolutionary dynamics, we hypothesized that different tumour structures might result in different evolutionary modes. To test this hypothesis, we developed a way to formulate multiple classes of mathematical models, each tailored to a different class of tumour, within a single general framework, and we implemented this framework as a stochastic computer programme.

**Fig. 1: Representative regions of histology slides from human tumours exemplifying four different kinds of tissue structure and manners of cell dispersal.**

Our modelling approach is built on basic tenets of cancer evolutionary theory¹. Simulated tumours arise from a single cell that has acquired a fitness-enhancing mutation. Each time a tumour cell divides, its daughter cells can acquire passenger mutations, which have no fitness effect, and more rarely driver mutations, which confer a fitness advantage. In solid tumours, we assume that cells compete with one another for space and other resources. Whereas previous studies have assumed that tumours grow into empty space, our model also allows us to simulate the invasion of normal tissue—a defining feature of malignancy.

Results

Tumour architecture can determine the mode of evolution

To test whether varying tumour architecture suffices to alter the tumour evolutionary mode, we considered four particular models with different spatial structures and manners of cell dispersal but identical evolutionary parameters (driver mutation rate and distribution of driver fitness effects). We set the dispersal probability per cell division such that all tumours take a similar amount of time to grow from one cell to one million cells, corresponding to several years in real time.

Our first case is a non-spatial model that has been proposed as appropriate to leukaemia^21,22, a tumour type in which mutated stem cells in semi-solid bone marrow produce cancer cells that mix and proliferate in the bloodstream (Fig. 1a). When simulating tumour growth in the absence of spatial constraints, rapid clonal expansions can result from driver mutations that increase the cell division rate by as little as a few percent, and the vast majority of cells eventually share the same set of driver mutations (Fig. 2a–d). These characteristics are reminiscent of chronic myeloid leukaemia, in which cell proliferation is driven by a single change to the genome²³, and acute myeloid leukaemia, which has relatively few drivers²⁴.

**Fig. 2: Four modes of tumour evolution predicted by our model.**

In our second model, consistent with the biology of colorectal adenoma²⁵ and in common with previous computational models of colorectal carcinoma^5,26,27, we simulate a tumour that consists of large glands (Fig. 1b) and grows via gland fission (bifurcation). Although the driver mutation rate and the fitness effect are exactly the same as in the previous case, the addition of spatial structure dramatically alters the mode of tumour evolution. The organization of cells into glands limits the extent to which driver mutations can spread through the population, so that selective sweeps become progressively localized as the tumour expands. For our parameter values, this process leads to a highly branched, fan-like driver phylogenetic tree and ever greater spatial diversity, with different combinations of driver mutations predominating even in neighbouring glands (Fig. 2e–h). The mean tumour cell fitness increases substantially, but there is also extensive, positively correlated intratumour variation in cell fitness values and passenger mutation counts (Extended Data Fig. 1a–b). Model outcomes are similar even if cells are able to acquire drivers that directly increase the gland fission rate, because such mutations rarely spread within glands (Extended Data Fig. 2a).

The third case corresponds to a glandular tumour that grows by invading adjacent normal tissue, as documented in various types of solid tumour, including many colorectal, breast and lung cancers^19,28. Glandular tumours are subdivided into localized cell communities (Fig. 1c), whose small size has previously been inferred by community detection methods²⁹ and mathematical modelling.³⁰ To obtain additional estimates of gland size in four cancer types, we used semi-automated analysis of histology slides (Extended Data Fig. 3) and found that each gland contains between a few hundred and a few thousand cells (Extended Data Fig. 4a). In simulations with gland sizes within this range, we find that even small increases in cell fitness can spark rapid clonal expansions. Clonal interference nevertheless inhibits selective sweeps, resulting in a zonal tumour in which large regions share the same combination of driver mutations (Fig. 2i–l and Extended Data Fig. 1c,d). Simulated invasive glandular tumours typically exhibit stepwise increases in driver diversity and a phylogeny with several long branches, qualitatively consistent with observations in numerous cancer types^2,3,11. Restricting cell dispersal to the tumour boundary without dispersal within the tumour bulk (to simulate tumours that lack intratumoural budding²⁸ or tumours in which proliferation is confined to the boundary³¹) results in somewhat shorter branches (Extended Data Fig. 2b).

Our fourth and final model represents a tumour with no glandular structure and with growth confined to its boundary (Fig. 1d). Expansive tumour growth associated with a clearly defined boundary and no sign of active migration occurs in tissues that impose relatively weak physical resistance¹⁸. Boundary-growth models have in particular been proposed as appropriate for simulating the evolution of certain kinds of hepatocellular carcinoma^7,32, although it should be noted that hepatocellular carcinoma in general exhibits a wide range of growth patterns³³. The spatial structure of the boundary-growth model favours genetic drift, rather than selection. For our fixed parameter values, tumour evolution in this case is effectively almost neutral (Fig. 2m–p and Extended Data Fig. 1e), and mutations can spread only by surfing on a wave of population expansion^34,35,36. Consequently, the mutation burden generally increases from the tumour core to its boundary (Extended Data Fig. 1f). Selection is only slightly more prominent when cells can compete with their nearest neighbours within the tumour mass (Extended Data Fig. 2c). Suppression of selection in the boundary-growth model is consistent with evidence of effectively neutral evolution in hepatocellular carcinoma⁷, as well as the existence of large, well-differentiated benign tumours such as leiomyomas³⁷ and fibroadenomas³⁸ that only rarely progress to malignancy.

Characterization of evolutionary modes and comparison with data

Together, our models demonstrate that variation in the range of cell–cell interactions and the manner of cell dispersal alone can generate distinct modes of tumour evolution. We next sought to describe these modes more precisely in terms of summary evolutionary indices that can be computed from both our simulations and real cancer genomic data (Fig. 3a). The first index we considered is clonal diversity (denoted D), which grows with the number of large nodes in the driver phylogenetic tree (as in the final column of Fig. 2). The second index n is the mean number of driver mutations per cell, which represents the average depth of the driver phylogenetic tree. Any pair of values of these two indices corresponds to a distinct set of phylogenetic trees. The nodes of these trees represent clones, and their size is proportional to clone population size. The space of attainable n and D values (Fig. 3b) is bounded below by the line D = 1 and above by the curve D = 1/(2−n)² (see Methods). Locations close to the upper boundary correspond to more highly branched trees than locations close to the lower boundary, and locations on the left correspond to trees with shorter branches than locations on the right.

**Fig. 3: Using summary indices to characterize modes of tumour evolution.**

To compare our model outcomes to data, we determined the evolutionary indices of phylogenetic trees previously inferred from multi-region sequencing of four solid cancer types: clear cell renal cell carcinoma (ccRCC)⁹, non-small-cell lung cancer (NSCLC)¹⁰, breast cancer³⁹ and mesothelioma⁴⁰. We also calculated indices from single-cell sequencing data for breast cancer⁴¹ and uveal melanoma⁴². Despite their methodological diversity, these six studies yielded remarkably similar evolutionary indices. The majority of data points (28 of 35) lie above a trajectory corresponding to sequential selective sweeps (pink curve in Fig. 3b) and below a reference curve that represents an intermediate degree of branching (pale blue curve in Fig. 3b and Supplementary information). All the tumours have 1 < D < 12 and 3≤n < 14. Notwithstanding limitations of sampling, sequencing and phylogenetic inference methods, a useful computational model of invasive tumour evolution should generate summary indices that are consistent with these data points, corresponding to branching evolution with a small number of main branches.

The simulation results of the four models discussed previously form four distinct clusters with respect to the summary indices n and D (Fig. 3c; mean silhouette width 0.60). Neutral counterparts of these four models—which have the same parameter values, except that the driver fitness effect is reduced to zero—cluster together, near the boundary-growth model. As expected, we find that the evolutionary indices of the solid tumours are consistent with outcomes of our invasive glandular model, and this consistency is robust to varying gland size, driver mutation rate and driver mutation effect within plausible ranges (Extended Data Fig. 5). Particularly close agreement between unadjusted model output and data occurs when the average driver fitness effect is 0.2 (Fig. 3d).

An important caveat in the above comparison is that the unadjusted model output includes all driver mutations down to a frequency of one in a million, whereas solid tumour sequencing protocols fail to detect most mutations at frequencies below 5%⁹. This difference in sensitivity means that D values calculated from data are expected to underestimate true tumour diversity. It follows that a fairer comparison can be made by removing rare mutations from the model output, to simulate imperfect sensitivity. Such adjustment strengthens the agreement between model and data (Fig. 3e and Extended Data Fig. 6).

Since the non-spatial model most plausibly represents liquid tumour evolution, we compared its predictions to additional data for acute myeloid leukaemia²⁴. We found robust correspondence between the model and this data set (Fig. 3b,c and Supplementary Fig. 1). Within plausible parameter ranges, 83% of tumours simulated using a non-spatial model have coordinates (n, D) consistent with the selective-sweeps evolutionary mode.

Alternative models that have different spatial structures are less consistent with data for both solid and liquid tumours. For the gland fission model, 83% of simulated tumours have coordinates above the intermediate-branching curve, corresponding to high values of D relative to n (Supplementary Fig. 2). For the boundary-growth model, both n and D are typically close to 1 (Supplementary Fig. 3). These outcomes are summarized in Table 1, which provides quantitative definitions of evolutionary modes in terms of evolutionary indices (see also Fig. 3f and Supplementary Table 1).

Table 1 Properties of the four modes of tumour evolution

Full size table

Results for a variant of the invasive glandular model, in which normal cells are absent and the tumour grows into empty space, are also less consistent with data (Supplementary Fig. 4). In this empty-space model, the speed at which the tumour expands (via cell dispersal into empty space) typically exceeds the speed at which clones spread within the tumour (via cell dispersal into fully occupied glands), which leads to a more star-shaped or highly branched phylogeny (high D relative to n). Conversely, when tumour cells must compete with normal cells at the tumour boundary (as in the third row of Fig. 2), the speed at which driver mutations spread within the tumour is similar to the speed of tumour growth, which enables some driver mutations to reach high frequency and results in sparser branching (Extended Data Fig. 5). Yet another alternative model, which includes normal cells but confines cell dispersal to the tumour boundary, thwarts the spread of driver mutations and generates similar D but smaller n values (Supplementary Fig. 5).

Further analysis of tumour evolutionary modes

A complementary way to describe modes of tumour evolution is in terms of phylogenetic tree shape or balance. Because tree balance indices developed for characterizing organismal evolution are poorly suited to tumour data, we developed an index⁴³ that is robust to variation in sampling and sequencing protocols (Methods). This index J¹ takes a high value for trees in which branching events tend to split the tree into subtrees of similar size. Low values are assigned to trees that are approximately linear or are dominated by a single node.

Just as for indices n and D, the tree balance values predicted by our invasive glandular tumour model are consistent with the values obtained from sequencing data (Fig. 4a). Typical J¹ values for both this model and the data are between 0 and 0.5—substantially below the maximum value of 1 corresponding to perfectly balanced trees. The consistency remains when we adjust the model output by removing rare driver mutations (Extended Data Fig. 6, which constitutes a fairer comparison), even though the associated trees appear very different (Extended Data Fig. 7) and have dissimilar degree distributions (Extended Data Fig. 8). Agreement between model and data is also observed for alternative balance indices after removing rare mutations (Supplementary Figs. 6, 7 and 8). Conversely, neutral models and models that do not account for glandular structure predict smaller or more variable tree balance values than the data for solid tumours (Fig. 4a). Tree balance values for the non-spatial model are consistent with data for acute myeloid leukaemia (Fig. 4a).

**Fig. 4: Alternative summary indices for characterizing modes of tumour evolution.**

Whereas n, D and J¹ are determined only by the final tumour state, other indices can be based on time series data. For example, the mean clonal turnover magnitude $\overline{{{\Theta }}}$ provides an alternative to n for measuring the extent of evolutionary change, and the mean clonal turnover time ${\overline{T}}_{{{\Theta }}}$ indicates whether evolutionary change occurs mostly early or late during tumour growth (Methods). As expected, after an appropriate axis transformation, the pattern of clusters of D versus $\overline{{{\Theta }}}$ (Fig. 4b) resembles the pattern of clusters of D versus n (Fig. 3c). Plotting D versus a transformed ${\overline{T}}_{{{\Theta }}}$ reveals a somewhat similar pattern, except that models with low $\overline{{{\Theta }}}$ exhibit high stochastic variation in ${\overline{T}}_{{{\Theta }}}$ (Fig. 4c). Clonal turnover occurs relatively late in the non-spatial model but throughout tumour growth in the gland fission model. Given sufficient data, evolutionary modes can thus be described and classified in terms of various summary indices capturing distinct aspects of tumour evolution (for alternative diversity indices, see Extended Data Fig. 9).

Influence of tumour architecture on mutation frequency distributions

As researchers and clinicians seldom have access to multi-regional sequencing data, or the longitudinal data needed to track how tumour clone sizes change over time, tumour phylogenies and evolutionary parameters are commonly inferred from mutation frequencies measured from a single biopsy sampled at a single time point. Moreover, current cancer sequencing technologies are neither sensitive enough to detect the majority of low-frequency mutations, nor precise enough to distinguish between high-frequency and clonal (100% frequency) mutations. Accordingly, the most relevant part of the mutation frequency distribution for practical purposes is in the intermediate frequency range. One way to examine differences between distributions within this intermediate range is to plot the cumulative mutation count (the number of mutations present at or above frequency f) versus the inverse mutation frequency (1/f). In a neutral non-spatial model, this graph is a straight line (Fig. 5a, blue points). Because the transformed mutation frequency distributions of many human cancers are also approximately linear, it has been proposed that neutral tumour evolution is widespread¹². Deviations from this theoretical straight line have been taken as evidence of selection^27,44.

**Fig. 5: Mutation frequency distributions predicted by our model.**

Our population genetics modelling illustrates how not only selection but also tumour architecture has important effects on tumour mutation frequency distributions (Fig. 5 and Extended Data Fig. 10). In particular, when the cumulative mutation count is plotted against the inverse mutation frequency, the curve for the neutral model is no longer linear. Instead, for spatial models, the average non-neutral curve can be closer to a straight line than the average neutral model curve. These results confirm and extend previous findings^27,35 indicating that methods using mutation frequencies to infer selection in solid tumours can yield incorrect conclusions if they fail to account for effects of population structure. Inappropriate choice of null model can therefore explain otherwise contradictory findings regarding the prevalence of neutral evolution in human cancers^13,45.

Discussion

In summary, we have found that differences in the range of cell–cell interactions and the manner of cell dispersal are sufficient to generate a spectrum of tumour evolutionary modes. This finding has important implications both for understanding tumour genomic data and for interpreting the results of previous computational models. Whereas mathematical oncologists have focused on mutation fitness effects^{5,12,27,44,46,47} or microenvironmental heterogeneity¹⁵, our perspective instead emphasizes the importance of population structure and gene flow in tumour evolution.

Prominent studies have variously used non-spatial models^12,21,44,46, gland fission models^5,27 or variants of the Eden growth model (in which cells compete with their nearest neighbours)^32,47 to investigate aspects of tumour evolution (see Methods for further discussion of previous work). Our results imply that, at best, each of these model types is appropriate only in special cases. Accurate models of solid tumour evolution must faithfully recapitulate interactions within localized patches of cancer cells—the so-called tumour communities²⁹—and between cancer cells and normal cells.

Consistent with previous work⁴⁸, our models predict substantial variability in tumour evolutionary modes due to stochasticity in the timing, location and fitness effects of driver mutations. Our finding that this random variation approaches the variability observed within and between solid tumour types (Fig. 3b,c) suggests that it will be challenging to infer precise information about tumour structure and growth patterns from phylogenetic data, even given previous knowledge of mutation rates and fitness effects. Nevertheless, of the model types we have examined, we have shown that the most plausible for simulating evolution in the majority of malignant solid tumours, which exhibit branching evolution¹¹, is the invasive glandular model introduced herein. A key feature of this model is that the speed at which a fitter clone spreads within its immediate ancestor is similar to the ancestor’s own expansion speed.

It follows from our findings that tumour architecture determines how well biopsy samples reflect intratumour heterogeneity. Oncologists typically base treatment decisions on the presence or absence of particular mutations in cells taken from only a small region of a solid tumour. Tumour types with structures that promote diversification are predicted to be the least responsive to targeted therapies, unless truncal mutations can be reliably identified and targeted.

Our framework also implies that a change in tumour architecture during cancer progression can lead to a change in the mode of tumour evolution. For example, the ‘big bang’ model of colorectal cancer^4,5 posits that early selective sweeps are followed by effectively neutral evolution, such that mutation frequency is determined by the time of mutation occurrence. This idea was previously examined using a computational model of tumour growth via gland fission, with a maximum of one driver mutation per cell⁵. Based on more sophisticated population genetics modelling, we find reason to expect ongoing selection throughout the very early stages of colorectal tumour progression (when growth is driven by gland fission), enabling multiple driver mutations to reach high frequencies. In later stages, after cells from the adenoma invade neighbouring tissue and give rise to an adenocarcinoma, we predict a transition to either branching evolution or – because the invasion begins with numerous and/or highly transformed, rapidly expanding subclones⁴⁹ – effectively neutral evolution. Punctuated evolution in colorectal tumours can thus be explained by the transition from gland fission to invasive growth. This explanation is broadly consistent with the big bang model and more recent multi-region sequencing studies^6,49,50, while also agreeing with results of comparative genomic analysis, which indicate that colorectal cancers evolve subject to strong positive selection and have more driver mutations per cell than most other cancer types^13,51. Transitions between evolutionary modes were recently investigated in a mathematical modelling study of ductal carcinoma,⁵² which complements the current work by likewise highlighting the importance of spatial competition.

In clear cell renal cell carcinoma, separate studies have found that tumour architecture^53,54 and evolutionary trajectory⁹ are predictors of cancer progression and survival. Evolutionary mode correlates with both tumour architecture and clinical outcome in childhood cancers^8,55. The manner of cell dispersal has prognostic value in colorectal and other solid cancers²⁸. By mechanistically connecting tumour architecture to the mode of tumour evolution, our work provides a blueprint for a new generation of patient-specific models for forecasting tumour progression^16,48 and for optimizing evolutionarily-informed treatment regimens^56,57,58. This drive towards personalized models motivates further efforts to characterize how spatial structure interacts with other biological factors, such as spatially-varying carrying capacity²⁹, alternative manners of cell dispersal¹⁹, immune interactions⁵⁹, cancer stem cell hierarchies⁶⁰, and frequency- or density-dependent cell fitness.¹⁷

Methods

Previous mathematical models of tumour population genetics

Many previous studies of tumour population genetics have used non-spatial branching processes²¹, in which cancer clones grow exponentially without interacting. Unless driver mutations increase cell fitness by less than 1%, these models predict lower clonal diversity and lower numbers of driver mutations than typically observed in solid tumours⁴⁶. Among spatial models, a popular option is the Eden growth model (or boundary-growth model), in which cells are located on a regular grid with a maximum of one cell per site, and a cell can divide only if an unoccupied neighbouring site is available to receive the new daughter cell^32,47,61. Other methods with one cell per site include the voter model^32,62,63 (in which cells can invade neighbouring occupied sites) and the spatial branching process⁴⁷ (in which cells budge each other to make space to divide). Further mathematical models have been designed to recapitulate glandular tumour structure by allowing each grid site or ‘deme’ to contain multiple cells and by simulating tumour growth via deme fission throughout the tumour^5,26 or only at the tumour boundary²⁷. A class of models in which cancer cells are organized into demes and disperse into empty space has also been proposed^36,52,64. Supplementary Table 2 summarizes selected studies representing the state of the art of stochastic modelling of tumour population genetics.

Our main methodological innovations are to implement all these distinct model structures, and additional models of invasive tumours, within a common framework, and to combine them with methods for tracking driver and passenger mutations at single-cell resolution. The result is a highly flexible framework for modelling tumour population genetics that can be used to examine consequences of variation not only in mutation rates and selection coefficients, but also in spatial structure and manner of cell dispersal⁶⁵.

Computational model structure

Simulated tumours in our models are made up of patches of interacting cells located on a regular grid of sites. In keeping with the population genetics literature, we refer to these patches as demes. All demes within a model have the same carrying capacity, which can be set to any positive integer. Each cell belongs to both a deme and a genotype. If two cells belong to the same deme and the same genotype then they are identical in every respect, and hence the model state is recorded in terms of such subpopulations rather than in terms of individual cells. For the sake of simplicity, computational efficiency and mathematical tractability, we assume that cells within a deme form a well-mixed population. The well-mixed assumption is consistent with previous mathematical models of tumour evolution^{5,26,27,36,64} and with experimental evidence in the case of stem cells within colonic crypts⁶⁶.

Initial conditions

A simulation begins with a single tumour cell located in a deme at the centre of the grid. If the model is parameterized to include normal cells, then these are initially distributed throughout the grid such that each deme’s population size is equal to its carrying capacity. Otherwise, if normal cells are absent, then the demes surrounding the tumour are initially unoccupied.

Stopping condition

The simulation stops when the number of tumour cells reaches a threshold value. Because we are interested only in tumours that reach a large size, if the tumour cell population succumbs to stochastic extinction, then results are discarded and the simulation is restarted (with a different seed for the pseudo-random number generator).

Within-deme dynamics

Tumour cells undergo stochastic division, death, dispersal and mutation events, whereas normal cells undergo only division and death. The within-deme death rate is density-dependent. When the deme population size is less than or equal to the carrying capacity, the death rate takes a fixed value d₀ that is less than the initial division rate. When the deme population size exceeds carrying capacity, the death rate takes a different fixed value d₁ that is much greater than the largest attainable division rate. Hence, all genotypes grow approximately exponentially until the carrying capacity is attained, after which point the within-deme dynamics resemble a birth–death Moran process—a standard, well characterized model of population genetics.

In all spatially structured simulations, we set d₀ = 0 to prevent demes from becoming empty. For the non-spatial (branching process) model, we set d₀ > 0 and dispersal rate equal to zero, so that all cells always belong to a single deme (with carrying capacity greater than the maximum tumour population size).

Mutation

When a cell divides, each daughter cell inherits its parent’s genotype plus a number of additional mutations drawn from a Poisson distribution. Each mutation is unique, consistent with the infinite-sites assumption of canonical population genetics models. Whereas some previous studies have examined the effects of only a single driver mutation (Supplementary Table 2), in our model there is no limit on the number of mutations a cell can acquire. Most mutations are passenger mutations with no phenotypic effect. The remainder are drivers, each of which increases the cell division or dispersal rate.

The programme records the immediate ancestor of each clone (defined in terms of driver mutations) and the matrix of Hamming distances between clones (that is, for each pair of clones, how many driver mutations are found in only one clone), which together allow us to reconstruct driver phylogenetic trees. To improve efficiency, the distance matrix excludes clones that failed to grow to more than ten cells and failed to produce any other clone before becoming extinct.

Driver mutation effects

Whereas previous models have typically assumed that the effects of driver mutations combine multiplicatively, this can potentially result in implausible trait values (especially in the case of division rate if the rate of acquiring drivers scales with the division rate). To remain biologically realistic, our model invokes diminishing returns epistasis, such that the average effect of driver mutations on a trait value r decreases as r increases. Specifically, the effect of a driver is to multiply the trait value r by a factor of 1 + s(1 − r/m), where s > 0 is the mutation effect and m is an upper bound. Nevertheless, because we set m to be much larger than the initial value of r, the combined effect of drivers in all models in the current study is approximately multiplicative. For each mutation, the value of the selection coefficient s is drawn from an exponential distribution.

Dispersal

Depending on model parameterization, dispersal occurs via either invasion or deme fission (Supplementary Table 3). In the case of invasion, the dispersal rate corresponds to the probability that a cell newly created by a division event will immediately attempt to invade a neighbouring deme. This particular formulation ensures consistency with a standard population genetics model known as the spatial Moran process. The destination deme is chosen uniformly at random from the four nearest neighbours (von Neumann neighbourhood). Invasion can be restricted to the tumour boundary, in which case the probability that a deme can be invaded is 1 − N/K if N≤K and 0 otherwise, where N is the number of tumour cells in the deme and K is the carrying capacity. If a cell fails in an invasion attempt, then it remains in its original deme. If invasion is not restricted to the tumour boundary, then invasion attempts are always successful.

In fission models, a deme can undergo fission only if its population size is greater than or equal to carrying capacity. As with invasion, deme fission immediately follows cell division (so that results for the different dispersal types are readily comparable). The probability that a deme will attempt fission is equal to the sum of the dispersal rates of its constituent cells (up to a maximum of 1). Deme fission involves moving half of the cells from the original deme into a new deme, which is placed beside the original deme. If the dividing deme contains an odd number of cells, then the split is necessarily unequal, in which case each deme has a 50% chance of receiving the larger share. Genotypes are redistributed between the two demes without bias according to a multinomial distribution. Cell division rate has only a minor effect on deme fission rate because a deme created by fission takes only a single cell generation to attain carrying capacity.

If fission is restricted to the tumour boundary, then the new deme’s assigned location is chosen uniformly at random from the four nearest neighbours, and if the assigned location already contains tumour cells, then the fission attempt fails. If fission is allowed throughout the tumour, then an angle is chosen uniformly at random, and demes are budged along a straight line at that angle to make space for the new deme beside the original deme.

Our particular method of cell dispersal was chosen to enable comparison between our results and those of previous studies and to facilitate mathematical analysis. In particular, when the deme carrying capacity is set to 1, our model approximates an Eden growth model (if fission is restricted to the tumour boundary, or if dispersal is restricted to the tumour boundary and normal cells are absent), a voter model (if invasion is allowed throughout the tumour) or a spatial branching process (if fission is allowed throughout).

To fairly compare different spatial structures and manners of cell dispersal, we set dispersal rates in each case such that the time taken for a tumour to grow from one cell to one million cells is approximately the same as in the neutral Eden growth model with maximal dispersal rate. This means that, across models, the cell dispersal rate decreases with increasing deme size. Given that tumour cell cycle times are on the order of a few days, the timespans of several hundred cell generations in our models realistically correspond to several years of tumour growth. More specifically, if we assume tumours take between 5 and 50 years to grow and the cell cycle time is between 1 and 10 days (both uniform priors), then the number of cell generations is between 400 and 8,000 in 95% of plausible cases. This order of magnitude is consistent with tumour ages inferred from molecular data⁶⁷.

We note that, in addition to gland fission, gland fusion has been reported in normal human intestine⁶⁸, which raises the possibility that gland fusion could occur during colorectal tumour development. However, the rate of crypt fission in tumours is much elevated relative to the rate in healthy tissue, and must exceed the rate of crypt fusion (or else the tumour would not grow). Therefore, even if crypt fusion occurs in human tumours, we do not expect it to have a substantial influence on evolutionary mode. This view is supported by previous computational modelling⁶⁹.

Two versus three dimensions

We chose to conduct our study in two dimensions for two main reasons. First, the effects of deme carrying capacity on evolutionary dynamics are qualitatively similar in two and three dimensions, yet a two-dimensional model is simpler, easier to analyse, and easier to visualize. Second, we aimed to create a method that is readily reproducible using modest computational resources and yet can represent the long-term evolution of a reasonably large tumour at single-cell resolution.

One million cells in two dimensions corresponds to a cross-section of a three-dimensional tumour with many more than one million cells. Therefore, compared to a three-dimensional model, a two-dimensional model can provide richer insight into how evolutionary dynamics change over a large number of cell generations. Developing an approximate, coarse-grained analogue of our model that can efficiently simulate the population dynamics of very large tumours with different spatial structures in three dimensions is an important direction for future research.

Implementation

The programme implemented Gillespie’s exact stochastic simulation algorithm⁷⁰ for statistically correct simulation of cell events. The order of event selection is (1) deme, (2) cell type (normal or tumour), (3) genotype, and (4) event type. At each stage, the probability of selecting an item (deme, cell type, genotype or event type) is proportional to the sum of event rates for that item, within the previous item. We measured elapsed time in terms of cell generations, where a generation is equal to the expected cell cycle time of the initial tumour cell.

Sequencing data

We surveyed the multi-region and single-cell tumour sequencing literature to identify data sets suitable for comparison with our model results. Studies published before 2015 (for example, refs. ^71,72,73,74) were excluded as they were found to have insufficient sequencing depth for our purposes. We also excluded studies that reconstructed phylogenies using samples from metastases or from multifocal tumours (for example, refs. ^{75,76,77,78,79,80}) because our model is not designed to correspond to such scenarios. The seven studies we chose to include in our comparison are characterized by either high-coverage multi-region sequencing or large-sample single-cell sequencing of several tumours.

The ccRCC investigation⁸¹ we selected involved multi-region deep sequencing, targeting a panel of more than 100 putative driver genes. Three studies of NSCLC¹⁰, mesothelioma⁴⁰ and breast cancer³⁹ conducted multi-region whole-exome sequencing (first two studies) or whole-genome sequencing (latter study), and reported putative driver mutations. We also used data from single-cell RNA sequencing studies of uveal melanoma⁴² and breast cancer⁴¹, in which chromosome copy number variations were used to infer clonal structure, and a study of acute myeloid leukaemia (AML) that used single-cell DNA sequencing²⁴. All seven studies constructed phylogenetic trees, which are readily comparable to the trees predicted by our modelling. The methodological diversity of these studies contributes to demonstrating the robustness of the patterns we seek to explain.

From each of the seven cohorts, we obtained data for between three and eight tumours. In the ccRCC data set, we focused on the five tumours for which driver frequencies were reported in the original publication. For NSCLC, we used data for the five tumours for which at least six multi-region samples were sequenced. In mesothelioma, we selected the six tumours that had at least five samples taken. From the breast cancer multi-region study, we used data for the three untreated tumours that were subjected to multi-region sequencing. From the single-cell sequencing studies of uveal melanoma and breast cancer, we used all the published data (eight tumours in each case), and from the AML study, we selected a random sample of eight tumours.

In multi-region sequencing data sets, it is uncertain whether all putative driver mutations were true drivers of tumour progression. One way to interpret the data (interpretation I1) is to assume that all putative driver mutations were true drivers that occurred independently. Alternatively, the more conservative interpretation I2 assumes that each mutational cluster (a distinct peak in the variant allele frequency distribution) corresponds to exactly one driver mutation, while all other mutations are hitchhikers. Thus, I1 permits linear chains of nodes that in I2 are combined into single nodes (compare Supplementary Figs. 9 and 10), and I1 leads to a higher estimate of the mean number of driver mutations per cell (our summary index n). If both the fraction of putative driver mutations that are not true drivers (false positives) and the fraction of true driver mutations that are not counted as such (false negatives) are low, or if these fractions approximately cancel out, then interpretation I1 will give a good approximation of n whereas I2 will give a lower bound. For the ccRCC, NSCLC and breast cancer cases in our data set, I1 generates values of n in the range 3–10 (mean 6.1), consistent with estimates based on other methodologies^13,51, whereas for I2 the range is only 1–4 (mean 2.5). Accordingly, we used interpretation I1.

Clonal diversity index

To measure clonal diversity, we used the inverse Simpson index defined as $D=1/{\sum }_{i}{p}_{i}^{2}$, where p_i is the frequency of the ith combination of driver mutations. For example, if the population comprises k clones of equal size, then p_i = 1/k for every value of i, and so D = 1/(k × 1/k²) = k. Clonal diversity has a lower bound D = 1. The inverse Simpson index is relatively robust to adding or removing rare types, which makes it appropriate for comparing data sets with differing sensitivity thresholds. Further examples are illustrated in Supplementary Fig. 11.

D is constrained by an upper bound for trees with n < 2, where n is the mean number of driver mutations per cell. Indeed, n = ∑_iip_i ≥ p₁ + 2(1 − p₁) = 2 − p₁, so p₁ ≥ 2 − n > 0, since n < 2. Therefore,

$$D=\frac{1}{{\sum }_{i}{p}_{i}^{2}}\le \frac{1}{{p}_{1}^{2}}\le \frac{1}{{(2-n)}^{2}}.$$

To see that this bound is tight, assume 1 ≤ n < 2 and consider a star-shaped tree with N nodes such that p₁ = 2 − n and other nodes have equal weights p_i = (1 − p₁)/(N − 1) = (n − 1)/(N − 1) for i ≥ 2. The mean number of driver mutations per cell is p₁ + 2(1 − p₁) = 2 − p₁ = n, and the inverse Simpson index is

$$\begin{array}{l}D=\frac{1}{\mathop{\sum }\nolimits_{i = 1}^{N}{p}_{i}^{2}}=\frac{1}{{p}_{1}^{2}+\mathop{\sum }\nolimits_{i = 2}^{N}{p}_{i}^{2}}\\=\frac{1}{{(2-n)}^{2}+(N-1){((n-1)/(N-1))}^{2}}=\frac{1}{{(2-n)}^{2}+{(n-1)}^{2}/(N-1)}.\end{array}$$

This quantity goes to 1/(2 − n)² as the number of nodes N goes to infinity, so the bound 1/(2 − n)² may be approached arbitrarily closely.

It is informative to derive the relationship between D and n for a population that evolves via a sequence of clonal sweeps, such that each new sweep begins only after the previous sweep is complete. For a given value of n, our simulations rarely produce trees with D values below the curves of this trajectory. Suppose that a population comprises a parent type and a daughter type, with frequencies p and 1 − p, respectively. If the daughter has m driver mutations, then the parent must have m − 1 driver mutations and n must satisfy m − 1 ≤ n ≤ m. More specifically,

$$n=(m-1)p+m(1-p)=m-p\ \Rightarrow \ p=m-n=1-\{n\},$$

where {n} denotes the fractional part of n (or 1 if n = m). The trajectory is therefore described by

$$D=\frac{1}{{p}^{2}+{(1-p)}^{2}}=\frac{1}{{(1-\{n\})}^{2}+{\{n\}}^{2}}.$$

We additionally calculated a curve representing the maximum possible diversity of linear trees. In the main text and below, we refer to this curve as corresponding to trees with an intermediate degree of branching. Specifically, this intermediate-branching curve is defined such that for every point below the curve (and with D > 1), there exist both linear trees and branching trees that have the corresponding values of n and D, whereas for every point above the curve there exist only branching trees. Derivation of the curve’s equation is provided in Supplementary Information. A first-order approximation (accurate within 1% for n ≥ 2.2) is D ≈ 9(2n − 1)/8.

To assess the extent to which clusters of points (n, D) are well separated, we calculated silhouette widths using the cluster R package⁸². A positive mean silhouette width indicates that clusters are distinct.

Other diversity indices

Our diversity index fulfills the same purpose as the intratumour heterogeneity (ITH) index used in the TRACERx Renal study⁹, defined as the ratio of the number of subclonal driver mutations to the number of clonal driver mutations. However, compared to ITH, our index has the advantages of being a continuous variable and being robust to methodological differences that affect ability to detect low-frequency mutations. In calculating ITH from sequencing data, we included all putative driver mutations, whereas ref. ⁹ used only a subset of mutations. For model output, we classified mutations with frequency above 99% as clonal and we excluded mutations with frequency less than 1%. ITH and the inverse Simpson index are strongly correlated across our models (Spearman’s ρ = 0.98, or ρ = 0.81 for cases with D > 2; Extended Data Fig. 9c).

The Shannon index, defined as ${\sum }_{i}{p}_{i}{{\mathrm{log}}}\,{p}_{i}$, is another alternative to the Simpson index. The exponential of this index has the same units as the inverse Simpson index (equivalent number of types). Compared to the Simpson index, the Shannon index gives more weight to rare types, which makes it somewhat less suitable for comparing data sets with differing sensitivity thresholds.

Defining evolutionary modes in terms of indices D and n

In defining regions in terms of indices D and n (Table 1 and Fig. 3c), we first noted that if a population undergoes a succession of non-overlapping clonal sweeps, then at most two clones coexist at any time, and hence D ≤ 2. Allowing for some overlap between sweeps, we defined the ‘selective sweeps’ region as having D < 10/3 and D below the intermediate-branching curve. We put the upper boundary at D = 10/3 because this intersects with the intermediate-branching curve at n = 2.

We used D = 20 to define the boundary between the ‘branching’ and ‘progressive diversification’ regions. The TRACERx Renal study⁹ instead categorized trees containing more than 10 clones as highly branched, as opposed to branched. It is appropriate for us to use a higher threshold because our regions are based on true tumour diversity values, rather than the typically lower values inferred from multi-region sequencing data. Finally, we defined an ‘effectively almost neutral’ region containing star-shaped trees with n < 2 and D above the intermediate-branching curve.

It is possible to construct trees that do not fit the labels we have assigned to regions. For example (as shown in Supplementary Information), there exist linear trees within the branching and progressive diversification regions. Such exceptions are an unavoidable consequence of representing high-dimensional objects, such as phylogenetic trees, in terms of a small number of summary indices. Our labels are appropriate for the subset of trees that we have shown to arise from tumour evolution.

Previously defined tree balance indices

Conventionally, the balance of a tree is the degree to which branching events split the tree into subtrees with the same number of leaves, or terminal nodes. A balanced tree thus indicates more equal extinction and speciation rates than an unbalanced tree⁸³. Tree balance indices are commonly used to assert the correctness of tree reconstruction methods and to classify trees. We considered three previously defined indices, all of which are imbalance indices, which means that more balanced trees are assigned smaller values. We subtracted each of these indices from 1 to obtain measurements of tree balance.

Let T = (V, E) be a tree with a set of nodes V and edges E. Let ∣V∣ = N, and hence ∣E∣ = N − 1 (since each node has exactly one parent, except the root). We defined l as the number of leaves of the tree. The root is labelled 1 and the leaves are numbered from N − l + 1 to N. There is only one cladogram with two leaves, which is maximally balanced according to all the previously defined indices discussed below. We also considered the single-node tree to be maximally balanced with respect to these previously defined indices. The following definitions then apply when l ≥ 3.

For each leaf j, we defined ν_j as the number of interior nodes between j and the root, which is included in the count. Then a normalized version of Sackin’s index, originally introduced in ref. ⁸⁴, is defined as

$${I}_{S,\mathrm{norm}}(T)=\frac{\mathop{\sum }\limits_{j=N-l+1}^{N}{\nu }_{j}-l}{\frac{1}{2}(l+2)(l-1)-l},$$

where to be able to compare indices of trees on different number of leaves l, we subtracted the minimal value for a given l and divided by the range of the index on all trees on n leaves, as in ref. ⁸⁵.

For an interior node i of a binary tree T, we defined T_L(i) as the number of leaves subtended by the left branch of T_i, the subtree rooted at i, and T_R(i) the number of leaves subtended by its right branch. Then, the unnormalized Colless index⁸⁶ of T is

$${I}_{C}(T)=\mathop{\sum }\limits_{i=1}^{N-l}| {T}_{L}(i)-{T}_{R}(i)| .$$

Since Colless index is defined only for bifurcating trees, we used the default normalized Colless-like index ${{\mathfrak{C}}}_{{\mathrm{MDM}},\,{{\mathrm{ln}}}(l+e),\,{\mathrm{norm}}}$ defined in ref. ⁸⁵. This consisted of measuring the dissimilarity between the subtrees $T^{\prime}$ rooted at a given internal node by computing the mean deviation from the median (MDM) of the f-sizes of these subtrees. In this case, $f(l)={{\mathrm{ln}}}(l+e)$ and the f-size of $T^{\prime}$ is defined as

$$\mathop{\sum}\limits_{v\in V(T^{\prime} )}{\mathrm{ln}}({\mbox{deg}}(v)+e).$$

These dissimilarities were then summed and the result was normalized as for Sackin’s index.

The cophenetic value ϕ(i, j) of a pair of leaves i, j is the depth of their lowest common ancestor (such that the root has depth 0). The total cophenetic index⁸⁷ of T is then the sum of the cophenetic values over all pairs of leaves, and a normalized version is

$${I}_{{{\Phi }},{\mathrm{norm}}}(T)=\frac{\mathop{\sum}\limits_{N-l+1\le i < j\le N}\phi (i,j)}{\left({l}\atop{3}\right)},$$

where here the minimal value of the cophenetic index is 0 for all l (for a star-shaped tree with l leaves).

These three balance indices were designed for analysing species phylogenies and are thus defined on cladograms, which are trees in which leaves correspond to extant species and internal nodes are hypothetical common ancestors. Conventional cladograms have no notion of node size. Cladograms also lack linear components as each internal node necessarily corresponds to a branching event. The driver phylogenetic trees reported in multi-region sequencing studies and generated by our models are instead clone trees (also known as mutation trees), in which all nodes of non-zero size represent extant clones. To apply previous balance indices to driver phylogenetic trees, we first converted the trees to cladograms by adding a leaf to each non-zero-sized internal node and collapsing linear chains of zero-sized nodes.

Whereas diversity indices such as D are relatively robust to the addition or removal of rare clones, the balance indices described above are much less robust because they treat all clones equally, regardless of population size (Supplementary Figs. 6, 7 and 8). This hampered comparison between model results and data for two reasons. First, due to sampling error, even high quality multi-region sequencing studies underestimate the number of subclonal, locally abundant driver mutations by approximately 25%⁸¹. Second, bulk sequencing cannot detect driver mutations present in only a very small fraction of cells.

A robust tree balance index

To overcome the shortcomings of previous indices, we have developed a more robust tree balance index based on an extended definition: tree balance is the degree to which internal nodes split the tree into subtrees of equal size, where size refers to the sum of all node populations.

Let f(v) > 0 denote the size of node v. For an internal node i, let V(T_i) denote the set of nodes of T_i, the subtree rooted at i. We then define

$$\begin{array}{l}{S}_{i}=\mathop{\sum}\limits_{v\in V({T}_{i})}f(v)=\,{{\mathrm{the}}\ {\mathrm{size}}\ {\mathrm{of}}}\,\,{T}_{i},\\ {S}_{i}^{* }=\mathop{\sum}\limits_{v\in V({T}_{i})\atop {v\ne i}}f(v)=\,{{\mathrm{the}}\ {\mathrm{size}}\ {\mathrm{of}}}\,\,{T}_{i}\,\,{{\mathrm{without}}\ {\mathrm{its}}\ {\mathrm{root}}}\,\,i.\end{array}$$

For i in the set of internal nodes $\widetilde{V}$, and j in the set C(i) of children of i, we define ${p}_{ij}={S}_{j}/{S}_{i}^{* }$. We then computed the balance score ${W}_{i}^{1}$ of a node $i\in \widetilde{V}$ as the normalized Shannon entropy of the sizes of the subtrees rooted at the children of i:

$${W}_{i}^{1}=\mathop{\sum}\limits_{j\in C(i)}{W}_{ij}^{1},\quad \,{{\mbox{with}}}\,{W}_{ij}^{1}=\left\{\begin{array}{ll}-{p}_{ij}{{{\mathrm{log}}}\,}_{{d}^{+}(i)}{p}_{ij}&\,{{\mbox{if}}}\,\,{p}_{ij} > 0\,{{\mbox{and}}}\,\,{d}^{+}(i)\ge 2,\\ 0&\,{{\mbox{otherwise,}}}\,\end{array}\right.$$

where d⁺(i) is the out-degree (the number of children) of node i. Finally, for each node i, we weighted the balance score by the product of ${S}_{i}^{* }$ and a non-root dominance factor ${S}_{i}^{* }/{S}_{i}.$ Our normalized balance index is then

$${J}^{1}:= \frac{1}{{\sum }_{k\in \widetilde{V}}{S}_{k}^{* }}\mathop{\sum}\limits_{i\in \widetilde{V}}{S}_{i}^{* }\frac{{S}_{i}^{* }}{{S}_{i}}{W}_{i}^{1}.$$

Supplementary Fig. 11 illustrates the calculation of J¹ for four exemplary trees. We further describe the desirable properties of this index, and its relationship to other tree balance indices, in another article⁴³.

When n ≤ 2 (where n is the mean number of driver mutations per cell), the non-root dominance factor cannot exceed n − 1, while the other factors in J¹ are at most 1, which implies J¹ ≤ n − 1 for all n ≤ 2. Also for n > 2, we have J¹ ≤ 1 < n − 1. Thus, it is impossible to construct trees that have J¹ > n − 1, as shown in Fig. 4a.

Clonal turnover indices

For each time point t ≥ δt, we defined a clonal turnover index as

$${{\Theta }}(t)=\mathop{\sum}\limits_{i}{\left({f}_{i}(t)-{f}_{i}(t-\tau )\right)}^{2},$$

where f_i(t) is the frequency of clone i at time t, and τ is 10% of the total simulation time measured in cell generations. The mean value $\overline{{{\Theta }}}$ over time measures the total extent of clonal turnover.

To determine whether clonal turnover mostly occurred early, late or throughout tumour evolution, we calculated the weighted average

$${\overline{T}}_{{{\Theta }}}=\frac{1}{\max (t)}\left(\mathop{\sum}\limits_{t}{{\Theta }}(t)t\bigg/\mathop{\sum}\limits_{t}{{\Theta }}(t)\right),$$

where $\max (t)$ denotes the final time of the simulation. This quantity takes values between 0 and 1, and is higher if clonal turnover occurs mostly late during tumour growth. If the rate of clonal turnover is constant over time, then ${\overline{T}}_{{{\Theta }}}\approx 0.55$.

Histology slide analysis to determine the number of cells per gland

We randomly selected five tumours of each of four cancer types (colorectal cancer, clear cell renal cancer, lung adenocarcinoma and breast cancer) from The Cancer Genome Atlas (TCGA) reference database (http://portal.gdc.cancer.gov). Using QuPath v0.2.0m4⁸⁸, we manually delineated five representative groups of tumour cells in each image and automatically counted the number of cells in each group. We defined a group as a set of tumour cells directly touching each other, separated from other groups by stroma or other non-tumour tissue (Extended Data Fig. 3).

The number of cells per group ranged from 5 to 8,485, with 50% of cases having between 53 and 387 cells (Extended Data Fig. 4a). Variation in the number of cells per group was larger between rather than within tumours, whereas cell density was relatively consistent between tumours (Extended Data Fig. 4b). Because our cell counts were derived from cross sections, they would underestimate the true populations of three-dimensional glands. On the other hand, it is unknown what proportion of cells are able to self-renew and contribute to long-term tumour growth and evolution⁸⁹. On balance, therefore, it is reasonable to assume that each gland of an invasive glandular tumour can contain between a few hundred and a few thousand interacting cells. This range of values is, moreover, remarkably consistent with results of a recent study that used a very different method to infer the number of cells in tumour-originating niches. Across a range of tissue types, this study concluded that cells typically interact in communities of 300–1,900 cells³⁰. Another recent study of breast cancer applied the Louvain method for community detection to identify two-dimensional tumour communities typically in the range of 10–100 cells.²⁹

Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

Data can be accessed at https://github.com/robjohnnoble/ModesOfEvolution.

Code availability

Our computational modelling code is available in an online repository⁶⁵. Our data analysis R package is at https://github.com/robjohnnoble/demonanalysis. Specific figure plotting code is at https://github.com/robjohnnoble/ModesOfEvolution. Muller plots were drawn using the ggmuller R package⁹⁰.

References

Greaves, M. & Maley, C. C. Clonal evolution in cancer. Nature 481, 306–313 (2012).
Article CAS PubMed PubMed Central Google Scholar
Davis, A., Gao, R. & Navin, N. Tumor evolution: linear, branching, neutral or punctuated? Biochim. Biophys. Acta Rev. Cancer 1867, 151–161 (2017).
Article CAS PubMed Google Scholar
Turajlic, S., Sottoriva, A., Graham, T. & Swanton, C. Resolving genetic heterogeneity in cancer. Nat. Rev. Genet. 20, 404–416 (2019).
Article CAS PubMed Google Scholar
Sun, R., Hu, Z. & Curtis, C. Big Bang tumor growth and clonal evolution. Cold Spring Harb. Perspect. Med. 8, a028381 (2017).
Article Google Scholar
Sottoriva, A. et al. A Big Bang model of human colorectal tumor growth. Nat. Genet. 47, 209–216 (2015).
Article CAS PubMed PubMed Central Google Scholar
Suzuki, Y. et al. Multiregion ultra-deep sequencing reveals early intermixing and variable levels of intratumoral heterogeneity in colorectal cancer. Mol. Oncol. 11, 124–139 (2017).
Article CAS PubMed Google Scholar
Ling, S. et al. Extremely high genetic diversity in a single tumor points to prevalence of non-Darwinian cell evolution. Proc. Natl Acad. Sci. USA 112, E6496–E6505 (2015).
Article CAS PubMed PubMed Central Google Scholar
Karlsson, J. et al. Four evolutionary trajectories underlie genetic intratumoral variation in childhood cancer. Nat. Genet. 50, 944–950 (2018).
Article CAS PubMed Google Scholar
Turajlic, S. et al. Deterministic evolutionary trajectories influence primary tumor growth: TRACERx Renal. Cell 173, 595–610.e11 (2018).
Article PubMed PubMed Central Google Scholar
Jamal-Hanjani, M. et al. Tracking the evolution of non-small-cell lung cancer. N. Engl. J. Med. 376, NEJMoa1616288 (2017).
Article Google Scholar
Dentro, S. C. et al. Characterizing genetic intra-tumor heterogeneity across 2,658 human cancer genomes. Cell 76, 2239–2254 (2021).
Article Google Scholar
Williams, M. J., Werner, B., Barnes, C. P., Graham, T. A. & Sottoriva, A. Identification of neutral tumor evolution across cancer types. Nat. Genet. 48, 238–244 (2016).
Article CAS PubMed PubMed Central Google Scholar
Martincorena, I. et al. Universal patterns of selection in cancer and somatic tissues. Cell 171, 1029–1041.e21 (2017).
Article PubMed PubMed Central Google Scholar
Marusyk, A. & Polyak, K. Tumor heterogeneity: causes and consequences. Biochim Biophys. Acta 1805, 105–117 (2010).
CAS PubMed Google Scholar
Lloyd, M. C. et al. Darwinian dynamics of intratumoral heterogeneity: not solely random mutations but also variable environmental selection forces. Cancer Res. 76, 3136–3144 (2016).
Article CAS PubMed PubMed Central Google Scholar
Maley, C. C. et al. Classifying the evolutionary and ecological features of neoplasms. Nat. Rev. Cancer 17, 605–619 (2017).
Article CAS PubMed PubMed Central Google Scholar
Noble, R. J. et al. Paracrine behaviors arbitrate parasite-like interactions between tumor subclones. Front. Ecol. Evol. 9, 675638 (2021).
Article PubMed PubMed Central Google Scholar
Friedl, P. & Alexander, S. Cancer invasion and the microenvironment: plasticity and reciprocity. Cell 147, 992–1009 (2011).
Article CAS PubMed Google Scholar
Pandya, P., Orgaz, J. L. & Sanz-Moreno, V. Modes of invasion during tumour dissemination. Mol. Oncol. 11, 5–27 (2017).
Article PubMed Google Scholar
Slarkin, M. Gene flow in natural populations. Annu. Rev. Ecol. Syst. 16, 393–430 (1985).
Article Google Scholar
Durrett, R. Branching Process Models of Cancer (Springer International, 2015).
Book Google Scholar
Clapp, G. & Levy, D. A review of mathematical models for leukemia and lymphoma. Drug Discov. Today. Dis. Models 16, 1–6 (2015).
Article PubMed Google Scholar
Hehlmann, R., Hochhaus, A. & Baccarani, M. Chronic myeloid leukaemia. Lancet 370, 342–350 (2007).
Article CAS PubMed Google Scholar
Morita, K. et al. Clonal evolution of acute myeloid leukemia revealed by high-throughput single-cell genomics. Nat. Commun. 11, 5327 (2020).
Article CAS PubMed PubMed Central Google Scholar
Preston, S. L. et al. Bottom-up histogenesis of colorectal adenomas: origin in the monocryptal adenoma and initial expansion by crypt fission. Cancer Res. 63, 3819–3825 (2003).
CAS PubMed Google Scholar
Ryser, M. D., Min, B.-H., Siegmund, K. D. & Shibata, D. Spatial mutation patterns as markers of early colorectal tumor cell mobility. Proc. Natl Acad. Sci. USA 115, 5774–5779 (2018).
Article CAS PubMed PubMed Central Google Scholar
Sun, R. et al. Between-region genetic divergence reflects the mode and tempo of tumor evolution. Nat. Genet. 49, 1015–1024 (2017).
Article CAS PubMed PubMed Central Google Scholar
Lugli, A., Zlobec, I., Berger, M. D., Kirsch, R. & Nagtegaal, I. D. Tumour budding in solid cancers. Nat. Rev. Clin. Oncol. 18, 101–115 (2021).
Article PubMed Google Scholar
Jackson, H. W. et al. The single-cell pathology landscape of breast cancer. Nature 578, 615–620 (2020).
Article CAS PubMed Google Scholar
Buder, T., Deutsch, A., Klink, B. & Voss-Böhme, A. Patterns of tumor progression predict small and tissue-specific tumor-originating niches. Front. Oncol. 8, 668 (2019).
Article PubMed PubMed Central Google Scholar
Hoefflin, R. et al. Spatial niche formation but not malignant progression is a driving force for intratumoural heterogeneity. Nat. Commun. 7, ncomms11845 (2016).
Article CAS PubMed PubMed Central Google Scholar
Waclaw, B. et al. A spatial model predicts that dispersal and cell turnover limit intratumour heterogeneity. Nature 525, 261–264 (2015).
Article CAS PubMed PubMed Central Google Scholar
Schlageter, M., Terracciano, L. M., D’Angelo, S. & Sorrentino, P. Histopathology of hepatocellular carcinoma. World J. Gastroenterol. 20, 15955 (2014).
Article PubMed PubMed Central Google Scholar
Hallatschek, O. & Nelson, D. R. Gene surfing in expanding populations. Theor. Popul. Biol. 73, 158–70 (2008).
Article PubMed Google Scholar
Fusco, D., Gralka, M., Kayser, J., Anderson, A. & Hallatschek, O. Excess of mutational jackpot events in expanding populations revealed by spatial Luria–Delbrück experiments. Nat. Commun. 7, 12760 (2016).
Article CAS PubMed PubMed Central Google Scholar
Gidoin, C. & Peischl, S. Range expansion theories could shed light on the spatial structure of intra-tumour heterogeneity. Bull. Math. Biol. 81, 4761–4777 (2018).
Article PubMed Google Scholar
Murase, E., Siegelman, E. S., Outwater, E. K., Perez-Jaffe, L. A. & Tureck, R. W. Uterine leiomyomas: histopathologic features, MR imaging findings, differential diagnosis, and treatment. Radiographics 19, 1179–1197 (1999).
Article CAS PubMed Google Scholar
Basara Akin, I. & Balci, P. Fibroadenomas: a multidisciplinary review of the variants. Clin. Imaging 71, 83–100 (2021).
Article PubMed Google Scholar
Yates, L. R. et al. Subclonal diversification of primary breast cancer revealed by multiregion sequencing. Nat. Med. 21, 751–759 (2015).
Article CAS PubMed PubMed Central Google Scholar
Zhang, M. et al. Clonal architecture in mesothelioma is prognostic and shapes the tumour microenvironment. Nat. Commun. 12, 1751 (2021).
Article CAS PubMed PubMed Central Google Scholar
Minussi, D. C. et al. Breast tumours maintain a reservoir of subclonal diversity during expansion. Nature 592, 302–308 (2021).
Article CAS PubMed PubMed Central Google Scholar
Durante, M. A. et al. Single-cell analysis reveals new evolutionary complexity in uveal melanoma. Nat. Commun. 11, 496 (2020).
Article CAS PubMed PubMed Central Google Scholar
Lemant, J., Le Sueur, C., Manojlovic, V. & Noble, R. Robust, universal tree balance indices. Preprint at bioRxiv https://doi.org/10.1101/2021.08.25.457695 (2021).
Williams, M. J. et al. Quantification of subclonal selection in cancer from bulk sequencing data. Nat. Genet. 50, 895–903 (2018).
Article CAS PubMed PubMed Central Google Scholar
Tarabichi, M. et al. Neutral tumor evolution? Nat. Genet. 50, 1630–1633 (2018).
Article CAS PubMed PubMed Central Google Scholar
Bozic, I. et al. Accumulation of driver and passenger mutations during tumor progression. Proc. Natl Acad. Sci. USA 107, 18545–18550 (2010).
Article CAS PubMed PubMed Central Google Scholar
Chkhaidze, K. et al. Spatially constrained tumour growth affects the patterns of clonal selection and neutral drift in cancer genomic data. PLoS Comput. Biol. 15, e1007243 (2019).
Article CAS PubMed PubMed Central Google Scholar
Noble, R., Burley, J. T., Le Sueur, C. & Hochberg, M. E. When, why and how tumour clonal diversity predicts survival. Evol. Appl. 13, 1558–1568 (2020).
Article PubMed PubMed Central Google Scholar
Ryser, M. D. et al. Minimal barriers to invasion during human colorectal tumor growth. Nat. Commun. 11, 1280 (2020).
Article CAS PubMed PubMed Central Google Scholar
Cross, W. et al. The evolutionary landscape of colorectal tumorigenesis. Nat. Ecol. Evol. 2, 1661–1672 (2018).
Article PubMed PubMed Central Google Scholar
Sabarinathan, R. et al. The whole-genome panorama of cancer drivers. Preprint at bioRxiv https://doi.org/10.1101/190330 (2017).
West, J., Schenck, R. O., Gatenbee, C., Robertson-Tessi, M. & Anderson, A. R. A. Normal tissue architecture determines the evolutionary course of cancer. Nat. Commun. 12, 2060 (2021).
Article CAS PubMed PubMed Central Google Scholar
Fukatsu, A. et al. Growth pattern, an important pathologic prognostic parameter for clear cell renal cell carcinoma. Am. J. Clin. Pathol. 140, 500–505 (2013).
Article PubMed Google Scholar
Verine, J. et al. Architectural patterns are a relevant morphologic grading system for clear cell renal cell carcinoma prognosis assessment. Am. J. Surg. Pathol. 42, 423–441 (2018).
Article PubMed Google Scholar
Andersson, N. et al. Extensive clonal branching shapes the evolutionary history of high-risk pediatric cancers. Cancer Res. 80, 1512–1523 (2020).
Article CAS PubMed Google Scholar
Bacevic, K. et al. Spatial competition constrains resistance to targeted cancer therapy. Nat. Commun. 8, 1995 (2017).
Article PubMed PubMed Central Google Scholar
Gallaher, J. A., Enriquez-Navas, P. M., Luddy, K. A., Gatenby, R. A. & Anderson, A. R. Spatial heterogeneity and evolutionary dynamics modulate time to recurrence in continuous and adaptive cancer therapies. Cancer Res. 78, 2127–2139 (2018).
Article CAS PubMed PubMed Central Google Scholar
Viossat, Y. & Noble, R. A theoretical analysis of tumour containment. Nat. Ecol. Evol. 5, 826–835 (2021).
Article PubMed PubMed Central Google Scholar
von Loga, K. et al. Extreme intratumour heterogeneity and driver evolution in mismatch repair deficient gastro-oesophageal cancer. Nat. Commun. 11, 139 (2020).
Article CAS PubMed PubMed Central Google Scholar
Scott, J. G., Maini, P. K., Anderson, A. R. A. & Fletcher, A. G. Inferring tumor proliferative organization from phylogenetic tree measures in a computational model. Syst. Biol. 69, 623–637 (2020).
Article PubMed Google Scholar
Eden, M. A two-dimensional Poisson growth process. J. R. Stat. Soc. Ser. B 27, 497–504 (1965).
Google Scholar
Williams, T. & Bjerknes, R. Stochastic model for abnormal clone spread through epithelial basal layer. Nature 236, 19–21 (1972).
Article CAS PubMed Google Scholar
Ahmed, Z. & Gravel, S. Intratumor heterogeneity and circulating tumor cell clusters. Mol. Biol. Evol. 35, 2135–2144 (2018).
Article CAS PubMed Google Scholar
Durrett, R., Foo, J. & Leder, K. Spatial Moran models, II: cancer initiation in spatially structured tissue. J. Math. Biol. 72, 1369–1400 (2016).
Article CAS PubMed Google Scholar
Noble, R. demon: Deme-based Oncology Model (2019); https://github.com/robjohnnoble/demon_model
Snippert, H. J. et al. Intestinal crypt homeostasis results from neutral competition between symmetrically dividing Lgr5 stem cells. Cell 143, 134–144 (2010).
Article CAS PubMed Google Scholar
Gerstung, M. et al. The evolutionary history of 2,658 cancers. Nature 578, 122–128 (2020).
Article CAS PubMed PubMed Central Google Scholar
Baker, A.-M. et al. Crypt fusion as a homeostatic mechanism in the human colon. Gut 68, 1986–1993 (2019).
Article CAS PubMed Google Scholar
Nicholson, A. M. et al. Fixation and spread of somatic mutations in adult human colonic epithelium. Cell Stem Cell 22, 909–918.e8 (2018).
Article CAS PubMed PubMed Central Google Scholar
Gillespie, D. T. Exact stochastic simulation of coupled chemical reactions. J. Phys. Chem. 81, 2340–2361 (1977).
Article CAS Google Scholar
Gerlinger, M. et al. Intratumor heterogeneity and branched evolution revealed by multiregion sequencing. N. Engl. J. Med. 366, 883–892 (2012).
Article CAS PubMed PubMed Central Google Scholar
Zhang, J. et al. Intratumor heterogeneity in localized lung adenocarcinomas delineated by multiregion sequencing. Science 346, 256–259 (2014).
Article CAS PubMed PubMed Central Google Scholar
Sottoriva, A. et al. Intratumor heterogeneity in human glioblastoma reflects cancer evolutionary dynamics. Proc. Natl Acad. Sci. USA 110, 4009–4014 (2013).
Article CAS PubMed PubMed Central Google Scholar
Gerlinger, M. et al. Genomic architecture and evolution of clear cell renal cell carcinomas defined by multiregion sequencing. Nat. Genet. 46, 225–233 (2014).
Article CAS PubMed PubMed Central Google Scholar
Cooper, C. S. et al. Analysis of the genetic phylogeny of multifocal prostate cancer identifies multiple independent clonal expansions in neoplastic and morphologically normal prostate tissue. Nat. Genet. 47, 367–372 (2015).
Article CAS PubMed PubMed Central Google Scholar
Boutros, P. C. et al. Spatial genomic heterogeneity within localized, multifocal prostate cancer. Nat. Genet. 47, 736–745 (2015).
Article CAS PubMed Google Scholar
Zhao, Z.-M. et al. Early and multiple origins of metastatic lineages within primary tumors. Proc. Natl Acad. Sci. USA 113, 2140–2145 (2016).
Article CAS PubMed PubMed Central Google Scholar
McPherson, A. et al. Divergent modes of clonal spread and intraperitoneal mixing in high-grade serous ovarian cancer. Nat. Genet. 48, 758–767 (2016).
Article CAS PubMed Google Scholar
Rasche, L. et al. Spatial genomic heterogeneity in multiple myeloma revealed by multi-region sequencing. Nat. Commun. 8, 268 (2017).
Article CAS PubMed PubMed Central Google Scholar
Pectasides, E. et al. Genomic heterogeneity as a barrier to precision medicine in gastroesophageal adenocarcinoma. Cancer Discov. 8, 37–48 (2018).
Article CAS PubMed Google Scholar
Turajlic, S. et al. Tracking cancer evolution reveals constrained routes to metastases: TRACERx Renal. Cell 173, 581–594 (2018).
Article CAS PubMed PubMed Central Google Scholar
Maechler, M., Rousseeuw, P., Struyf, A., Hubert, M. & Hornik, K. cluster: Cluster Analysis Basics and Extensions. R package version 2.1.0 (2019). https://CRAN.R-project.org/package=cluster
Shao, K.-T. & Sokal, R. R. Tree balance. Syst. Biol. 39, 266–276 (1990).
Article Google Scholar
Sackin, M. J. ‘Good’ and ‘bad’ phenograms. Syst. Biol. 21, 225–226 (1972).
Article Google Scholar
Mir, A., Rotger, L. & Rosselló, F. Sound Colless-like balance indices for multifurcating trees. PLoS ONE 13, 559–560 (2018).
Article Google Scholar
Colless, D. H. Review of phylogenetics, the theory and practice of phylogenetic systematics. Syst. Zool. 31, 100–104 (1982).
Article Google Scholar
Mir, A., Rosselló, F. & Rotger, L. A. A new balance index for phylogenetic trees. Math. Biosci. 241, 125–136 (2013).
Article PubMed Google Scholar
Bankhead, P. et al. QuPath: open source software for digital pathology image analysis. Sci. Rep. 7, 16878 (2017).
Article PubMed PubMed Central Google Scholar
Batlle, E. & Clevers, H. Cancer stem cells revisited. Nat. Med. 23, 1124–1134 (2017).
Article CAS PubMed Google Scholar
Noble, R. ggmuller: Create Muller Plots of Evolutionary Dynamics. R package version 0.5.4 (2019); https://cran.r-project.org/package=ggmuller
St Hill, C. A. et al. The high affinity selectin glycan ligand C2-O-sLe^x and mRNA transcripts of the core 2 β-1,6-N-acetylglusaminyltransferase (C2GnT1) gene are highly expressed in human colorectal adenocarcinomas. BMC Cancer 9, 79 (2009).
Article Google Scholar
Brunet, É., Derrida, B., Mueller, A. H. & Munier, S. Effect of selection on ancestry: an exactly soluble case and its phenomenological generalization. Phys. Rev. E 76, 041104 (2007).
Article CAS Google Scholar
Neher, R. A. & Hallatschek, O. Genealogies of rapidly adapting populations. Proc. Natl Acad. Sci. USA 110, 437–442 (2013).
Article CAS PubMed Google Scholar

Download references

Acknowledgements

We thank M. Hochberg, M. Lebid, K. Jahn, R. Neher, A. Deutsch, K. Korolev, C.-A. Weis, B. Werner, A. Sottoriva and A. Kaznatcheev for helpful discussions, and D. Minussi for sharing data. R.N. and N.B. were supported by ERC Synergy Grant 609883. J.N.K. was supported by the German Consortium for Translational Cancer Research (DKTK/DKFZ) fellowship programme and by RWTH Aachen START grant 2018/691906. R.N. was also supported by the National Cancer Institute of the National Institutes of Health under Award Number U54CA217376. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Author information

Robert Noble
Present address: Department of Mathematics, City, University of London, London, UK

Authors and Affiliations

Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
Robert Noble, Dominik Burri, Cécile Le Sueur, Jeanne Lemant & Niko Beerenwinkel
SIB Swiss Institute of Bioinformatics, Basel, Switzerland
Robert Noble & Niko Beerenwinkel
Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland
Robert Noble
Biozentrum, University of Basel, Basel, Switzerland
Dominik Burri
Ceremade, Université Paris Dauphine-PSL, Paris, France
Yannick Viossat
German Cancer Consortium (DKTK), Heidelberg, Germany
Jakob Nikolas Kather
Applied Tumor Immunity, German Cancer Research Center (DKFZ), Heidelberg, Germany
Jakob Nikolas Kather
Internal Medicine III, University Hospital RWTH Aachen, Aachen, Germany
Jakob Nikolas Kather

Authors

Robert Noble
View author publications
You can also search for this author in PubMed Google Scholar
Dominik Burri
View author publications
You can also search for this author in PubMed Google Scholar
Cécile Le Sueur
View author publications
You can also search for this author in PubMed Google Scholar
Jeanne Lemant
View author publications
You can also search for this author in PubMed Google Scholar
Yannick Viossat
View author publications
You can also search for this author in PubMed Google Scholar
Jakob Nikolas Kather
View author publications
You can also search for this author in PubMed Google Scholar
Niko Beerenwinkel
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

R.N. conceived the research question and designed and created the modelling framework. R.N., D.B. and C.L.S. ran computational models and analysed the output. R.N. and J.L. developed tree balance indices. J.N.K. obtained and analysed histology slides. Y.V. derived the formula for the maximum possible diversity of linear trees (Supplementary Information). R.N. wrote the manuscript with critical comments and input from N.B. and the other authors. All authors have read and approved the final manuscript.

Corresponding authors

Correspondence to Robert Noble or Niko Beerenwinkel.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Ecology and Evolution thanks the anonymous reviewers for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Cell proliferation rates and mutation burdens.

a, Evolution and final spatial distribution of cell proliferation rates, for a model of tumour growth via gland fission (8,192 cells per gland). b, Final spatial distribution of mutation burdens. c-d, Results of a model in which tumour cells disperse between neighbouring glands and invade normal tissue (512 cells per gland). e-f, Results of a boundary-growth model of a non-glandular tumour. In all cases, the driver mutation rate is 10⁻⁵ per cell division, and driver fitness effects are drawn from an exponential distribution with mean 0.1. Other parameter values are listed in Supplementary Table 4.

Extended Data Fig. 2 Results of variant models.

First column: Dynamics of driver mutation diversity in 20 stochastic simulations. Dynamics of clonal diversity (inverse Simpson index D) in 20 stochastic simulations of a non-spatial model. Black curves correspond to the individual simulations illustrated in subsequent columns. These particular simulations are those with indices D and n closest to the medians of sets of 100 replicates. Second column: Muller plots of clonal dynamics over time. Colours represent clones with distinct combinations of driver mutations (the original clone is grey-brown; subsequent clones are coloured using a recycled palette of 26 colours). Descendant clones are shown emerging from inside their parents. Third column: Final clone proportions (for the non-spatial model) or spatial arrangement (for spatial models). For spatial models, each pixel corresponds to a patch of cells, corresponding to a tumour gland, coloured according to the most abundant clone within the patch. Fourth column: Driver phylogenetic trees. Node size corresponds to clone population size at the final time point and the founding clone is coloured red. Only clones whose descendants represent at least 1% of the final population are shown. Final column: Evolutionary indices D and n at the final time point. Black points correspond to the individual simulations illustrated in previous columns. a, A model of tumour growth via gland fission (8,192 cells per gland), in which cells can acquire driver mutations that increase their contribution to the gland fission rate (with an average effect size of 50%), in addition to drivers that increase the cell division rate. b, A model in which tumour cells invade normal tissue but do not disperse within the tumour bulk (512 cells per gland). c, A boundary-growth model of a non-glandular tumour in which cells invade neighbouring sites within the tumour. d, A model in which tumour cells invade normal tissue at the tumour boundary only (2,048 cells per gland). e, A model of tumour growth via gland fission (2,048 cells per gland). Other parameter values are listed in Supplementary Table 4.

Extended Data Fig. 3 Quantification of tumour cell numbers per gland in a breast cancer histology slide.

Multiple glands were manually outlined and the number of cells in each gland was counted automatically. Shaded areas are cell masks obtained with QuPath. The original image was retrieved from The Cancer Genome Atlas at https://portal.gdc.cancer.gov (patient TCGA-49-AARR, slide 01Z-00-DX1).

Extended Data Fig. 4 Results of semi-automated analysis of five glands in each of twenty randomly selected TCGA histology slides, representing four cancer types.

a, Number of cells per gland (dashed lines correspond to the overall median, 25% and 75% quantiles). b, Cell density. BRCA = breast invasive carcinoma; ccRCC = clear cell renal cell carcinoma; CRC = colorectal cancer; NSCLC = non-small-cell lung cancer.

Extended Data Fig. 5 Variation in evolutionary indices D and n for an invasive glandular model with cell dispersal throughout the tumour and at the tumour boundary.

Results are shown for varied gland size (colours), driver mutation rate (columns) and average driver fitness effect (rows), with 100 stochastic simulations per model. Black points show values derived from multi-region sequencing of kidney cancers, lung cancers and breast cancers. Non-varied parameter values are the same as in Fig. 2.

Extended Data Fig. 6 Variation in tree balance index J¹ versus clonal diversity D for an invasive glandular model with cell dispersal throughout the tumour and at the tumour boundary.

Results are shown for varied gland size (colours), driver mutation rate (columns) and sensitivity threshold (rows), with 100 stochastic simulations per model. Driver mutations with frequency below the sensitivity threshold (0.005, 0.02, 0.05 or 0.1) are removed from the model output before calculating J¹ and D. Non-varied parameter values are the same as in Fig. 2. Black points show values derived from multi-region sequencing of kidney cancers, lung cancers and breast cancers.

Extended Data Fig. 7 Driver phylogenetic trees resulting from an evolutionary model with cell dispersal throughout the tumour and at the tumour boundary (512 cells per gland).

The rows show final outcomes of the same five simulations (in the same order) after adjustment to simulate different sensitivities in detecting rare mutations. Driver mutations with frequency below the sensitivity threshold are removed from the model output. This means that if the combined frequency of a clone and its descendants is below the sensitivity threshold then the clone is merged with its parent clone. The sensitivity threshold is varied from 0.5% (top row) to 10% (bottom row). Node size corresponds to clone population size and the founding clone is coloured red.

Extended Data Fig. 8 Combined degree distribution for 100 driver phylogenetic trees resulting from an evolutionary model with cell dispersal throughout the tumour and at the tumour boundary (512 cells per gland).

The four panels correspond to the same set of simulations after adjustment to simulate different sensitivities in detecting rare mutations. Driver mutations with frequency below the sensitivity threshold are removed from the model output. This means that if the combined frequency of a clone and its descendants is below the sensitivity threshold then the clone is merged with its parent clone. The sensitivity threshold is varied from 0.5% to 10%.

Extended Data Fig. 9 Alternative diversity indices.

a, Variation in the exponential of the Shannon diversity index versus mean number of driver mutations per cell (n). b, Variation in the ITH index versus mean number of driver mutations per cell (n). c, Correlation between the inverse Simpson index (D) and the ITH index. Coloured points correspond to four example models with different spatial structures and different manners of cell dispersal but identical driver mutation rates and identical driver mutation effects (100 stochastic simulations per model). Neutral counterparts of the four models are represented together as an additional group. Mutations with frequency less than 1% are removed from model outcomes before calculating ITH and D. Black circular points show values derived from multi-region sequencing of kidney cancers, lung cancers and breast cancers. Purple squares show values derived from single-cell sequencing data for acute myeloid leukaemia. Estimates of the Shannon index and ITH index based on multi-region sequencing data are expected to be lower than true values because these indices are sensitive to the removal of rare types, many of which are likely to be missing from the data due to sampling error.

Extended Data Fig. 10 Mutation frequency distributions for simulated tumours.

a-d, Complete mutation frequency distributions for models with only neutral mutations (blue points) or both neutral and driver mutations (red points). Each distribution represents combined data from 100 simulations of each of the four model types of Figs. 2 and 3. To clarify the shape of the distributions, especially at high frequencies, the x-axes are transformed as logit(x) = log(x/(1 − x)), which is approximately equal to log x when x is much less than 1. Dashed lines indicate the slope predicted for an exponentially-growing population acquiring only neutral mutations (negative slope) and a prediction of the Bolthausen-Sznitman coalescent (positive slope), which has been shown to describe genealogies when a constant-size population expands into uninhabited territory⁹² or when a constant-size population acquires both neutral and highly beneficial mutations⁹³. e-h, Mutation frequency versus timing of mutation for the specific model instances of Fig. 2. Point colour corresponds to clone (as in Fig. 2), and size corresponds to the division rate of cells within the clone. Driver mutations are typically preceded by a string of hitchhiking passenger mutations with similar frequencies. This figure format is inspired by Figure 2 of ref. ⁴. i-l, Mutation frequency distributions for the specific model instances of Fig. 2, with linear axes. Results are shown for a non-spatial branching process (a, e, i); tumour growth via gland fission (b, f, j); cell dispersal throughout the tumour and at the tumour boundary (c, g, k); and a boundary-growth model (d, h, l). Parameter values are the same as in Figs. 2 and 3.

Supplementary information

Supplementary Information

The maximum possible diversity of linear trees, Supplementary Tables 1–4 and Figs. 1–11.

Reporting Summary

Peer Review Information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Noble, R., Burri, D., Le Sueur, C. et al. Spatial structure governs the mode of tumour evolution. Nat Ecol Evol 6, 207–217 (2022). https://doi.org/10.1038/s41559-021-01615-9

Download citation

Received: 17 June 2021
Accepted: 10 November 2021
Published: 23 December 2021
Issue Date: February 2022
DOI: https://doi.org/10.1038/s41559-021-01615-9

This article is cited by

Mapping cancer biology in space: applications and perspectives on spatial omics for oncology
- Sumin Lee
- Gyeongjun Kim
- Sunghoon Kwon
Molecular Cancer (2024)
LinG3D: visualizing the spatio-temporal dynamics of clonal evolution
- Anjun Hu
- Awino Maureiq E. Ojwang’
- Katarzyna A. Rejniak
BMC Bioinformatics (2024)
Nucleic acid-based drugs for patients with solid tumours
- Sebastian G. Huayamares
- David Loughrey
- Eric J. Sorscher
Nature Reviews Clinical Oncology (2024)
Tumor phylogeography reveals block-shaped spatial heterogeneity and the mode of evolution in Hepatocellular Carcinoma
- Xiaodong Liu
- Ke Zhang
- Weiwei Zhai
Nature Communications (2024)
Tumour Growth Mechanisms Determine Effectiveness of Adaptive Therapy in Glandular Tumours
- Rui Zhen Tan
Interdisciplinary Sciences: Computational Life Sciences (2024)

Subjects

Abstract

Similar content being viewed by others

Main

Results

Tumour architecture can determine the mode of evolution

Characterization of evolutionary modes and comparison with data

Further analysis of tumour evolutionary modes

Influence of tumour architecture on mutation frequency distributions

Discussion

Methods

Previous mathematical models of tumour population genetics

Computational model structure

Initial conditions

Stopping condition

Within-deme dynamics

Mutation

Driver mutation effects

Dispersal

Two versus three dimensions

Implementation

Sequencing data

Clonal diversity index

Other diversity indices

Defining evolutionary modes in terms of indices D and n

Previously defined tree balance indices

A robust tree balance index

Clonal turnover indices

Histology slide analysis to determine the number of cells per gland

Reporting Summary

Data availability

Code availability

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding authors

Ethics declarations

Competing interests

Additional information

Extended data

Supplementary information

Rights and permissions

About this article

Cite this article

Share this article

This article is cited by

Search

Quick links