Modeling colorectal cancer evolution

Understanding cancer evolution provides a clue to tackle therapeutic difficulties in colorectal cancer. In this review, together with related works, we will introduce a series of our studies, in which we constructed an evolutionary model of colorectal cancer by combining genomic analysis and mathematical modeling. In our model, multiple subclones were generated by driver mutation acquisition and subsequent clonal expansion in early-stage tumors. Among the subclones, the one obtaining driver copy number alterations is endowed with malignant potentials to constitute a late-stage tumor in which extensive intratumor heterogeneity is generated by the accumulation of neutral mutations. We will also discuss how to translate our understanding of cancer evolution to a solution to the problem related to therapeutic resistance: mathematical modeling suggests that relapse caused by acquired resistance could be suppressed by utilizing clonal competition between sensitive and resistant clones. Considering the current rate of technological development, modeling cancer evolution by combining genomic analysis and mathematical modeling will be an increasingly important approach for understanding and overcoming cancer.


Introduction
Colorectal cancer is one of the most prevalent and deadly tumor types in both men and women worldwide. Patients are often diagnosed at an advanced stage, when tumor cell dissemination has taken place, and provided with a limited increase in overall survival by chemo-and targeted therapies due to therapeutic resistance. Although not limited to colorectal cancer, a clue to overcome the therapeutic difficulty resides in understanding the evolution of cancer cells. During tumorigenesis, normal cells are transformed into malignant cancer cells through the accumulation of mutations and natural selection. Evolution allows cancer cells to adapt to new environments and to acquire malignant phenotypes, such as metastatic potential and resistance to treatment. The idea that cancer is an evolutionary system was first proposed by Nowell in 1976 [1]. Subsequent discoveries of oncogenes and tumor suppressor genes (collectively referred to as driver genes) were integrated into this view, leading to Fearon and Vogelstein's multistep carcinogenesis hypothesis [2]: in colorectal tumorigenesis, while accumulating multiple driver genes including APC, KRAS, TP53, and SMAD4, a normal epithelial cell linearly transforms through a benign lesion into a malignant tumor. Since then, tumorigenesis has been viewed as a linear evolutionary process of malignant transformation through repeated acquisition of driver mutations and Darwinian selection.
However, since the advent of next-generation sequencing, this view has changed dramatically [3]. In particular, multiregion sequencing, in which we analyze multiple samples obtained from physically separate regions within the tumor of a single patient, has revealed that solid tumors harbor extensive intratumor heterogeneity (ITH) formed by branching evolutionary processes. Through multiregion sequencing, we identified two categories of somatic singlenucleotide mutations: "founder" and "progressor" mutations. Founder mutations are defined to be present in all of the regions while progressor mutations are defined to be present in some of the regions (note that they are also referred to by different terms in different studies, e.g., public/private or trunk/branch mutations). Founder mutations are assumed to accumulate during the early phases of cancer evolution. The common ancestor clone that has acquired all the founder mutations then branches into subclones, which accumulate progressor mutations and contribute to the formation of ITH. Through these multiregion mutational profiles, we can infer an evolutionary history of the cancer by constructing a phylogenetic tree.
For example, whole-exome multiregion sequencing using multiple samples from primary and metastatic lesions in 10 patients with renal cancer revealed extensive ITH and clonal branching evolution [4,5]. Their study also revealed not only founder non-silent mutations in some known driver genes such as VHL, but also progressor non-silent mutations in other known driver genes such as SETD2 and BAP1. It is intriguing that, in some cases, different mutations of the same driver gene or pathway were acquired independently. This phenomenon called parallel evolution also indicates that a part of the ITH was generated by Darwinian selection.

Neutral evolution in advanced colorectal cancer
Inspired by this pioneering study, we investigated ITH in nine cases of surgically resected late-stage colorectal tumors through whole-exome multiregion sequencing to identify founder and progressor mutations in each case [6]. The results obtained from one of the nine cases are shown in Fig. 1. Progressor mutations showed a mutational pattern that was geographically correlated with the sampling locations. Moreover, we found that, in each region, founder mutations existed as clonal mutations, while progressor mutations existed as subclonal mutations. This finding suggests that, even in each region, extensive ITH existed, which was not captured by the resolution of multiregion sequencing. In addition, most of the mutations in known driver genes such as APC and KRAS were found to be founder mutations. However, progressor mutations containing few driver mutations and parallel evolution were not confirmed, which is in contrast to the findings obtained in renal cancer. These observations suggest that apart from Darwinian selection, there are other evolutionary principles generating ITH.
In pursuit of these principles, we employed mathematical modeling of the evolutionary process generating ITH using an agent-based model [7]. The agent-based model assumes a set of system constituents, called independent agents, and specifies rules for the independent behavior of the agents themselves, as well as for the interaction between agents and the agent environment. The agent-based model is a flexible representation of the model, and given the initial conditions and parameters of the system, the behavior of the system can be easily analyzed by computational simulation. In the case of modeling the evolution of cancer, if each cell is assumed to be an agent, ITH can be easily represented by the differences in the internal states of each agent. For example, in a pioneering model, agents were assumed to be cells that contained a few genes and proliferated while accumulating mutations. As a result, computer simulation succeeded in reproducing ITH observed in single-genefocused experiments [8]. Since then, multiple mathematical modeling studies employing agent-based models have been developed to shed light on the principles underlying the generation of ITH. For example, stem cell hierarchy may contribute to ITH, and the interaction between cells as well as the turnover of cells in three-dimensional space may affect the formation of ITH [9].
Because the existing models could not completely reproduce the extensive ITH revealed by our multiregion  Fig. 1 Multiregion sequencing of colorectal cancer. a A schema of a multiregion sampling in a primary colorectal cancer and matched metastatic liver lesion. In this case, we obtained 20 samples from the primary lesion and one sample from the metastatic lesion. b A multiregion mutation profile. The depth of red represents mutant allele frequency while the colors of sample labels were prepared so that the similarities of colors represent those of mutation patterns. c A phylogenetic tree constructed from the multiregion mutation profile. The time when mutations in known driver genes of colorectal cancer are acquired is indicated along the tree. This image was obtained by modifying a figure which originally appeared in our previous work [6] sequencing of colorectal cancer, we developed a new agentbased model, the branching evolutionary process (BEP) model, to simulate heterogeneous cancer evolution [6]. Similar to the other models, the BEP model assumes cells to be agents (Fig. 2a). Each cell harbors n genes, including d driver genes, while each cell divides and dies in a unit time with a probability p and q, respectively. When the cell divides, each gene is mutated with probability r, and if any driver genes are mutated, the division probability p increases by 10 f -fold per mutation. In the BEP model, f can be regarded as the strength of driver genes. Given that a cell without mutations divides according to this rule, after the normal cell acquires the first driver mutation, which accelerates cell division, the proportion of the clone originating from the cell increases in a whole cell population. By repeating these steps, each cell gradually accumulates driver mutations as well as accompanying passenger mutations, which do not affect the cell division rate, and finally, a tumor is formed with numerous mutations accumulated. Depending on the parameter values during the course of cancer evolution, each cancer cell can accumulate different combinations of mutations to generate different types of ITH. In Fig. 2b, c, an example of snapshots of twodimensional tumor growth is shown, which was simulated using the BEP model with appropriate parameter values. In this example, driver mutations gradually accumulated in the cells, and a clone with four mutations was selected through Darwinian selection, which finally became dominant in the tumor.
To find the principle generating ITH, we performed a large number of BEP simulations using a supercomputer with various parameter settings to find conditions leading to the extensive ITH observed in our genomic analyses [6]. As a result, when cancer evolution was simulated with the assumption of a high mutation rate, followed by computer simulation of multiregion sequencing, we could reproduce mutation profiles similar to those obtained by our multiregion sequencing of colorectal cancers (Fig. 3a, b). That is, irrespective of the presence of founder mutations, progressor mutations contributed to the formation of a heterogeneous mutation profile, which was geographically correlated with sampling locations. Moreover, we could also reconstruct local heterogeneity, as illustrated by the finding that progressor mutations existed as subclonal mutations in each region. Intriguingly, while driver mutations were acquired as founder mutations, progressor mutations contained few driver mutations, and most of them comprised neutral mutations that did not affect the cell division rate. This suggests that, after the appearance of the common ancestor clone with accumulated driver mutations, extensive ITH was generated by neutral evolution. Moreover, single-cell mutation profiles of the simulated tumor suggest that the tumor comprises a large number of minute clones with numerous neutral mutations accumulated (Fig. 3c).
Similar to our study, other studies employed multiregion genomic analysis and mathematical modeling to propose a neutral evolution model to explain the origin of ITH in colorectal tumors [10] and liver cancer [11]. In contrast, an increasing number of multiregion genomic studies have demonstrated different evolutionary properties of ITH among cancer types: non-neutral, neutral, and in-between. [12]. It should also be noted that although we mentioned Top colored bands represent clones, while the blue bands on the left represent driver genes. This figure was obtained by modifying a figure which originally appeared in our previous work [53] earlier that the signature of Darwinian selection (i.e., the presence of subclonal driver mutation) is prominent in renal cancer, a large-scale multiregion cancer genome project, TRACERx Renal, identified the renal cancer subtype without the signature of Darwinian selection [13]. The clinical significance of different ITH properties remains unclear but has been proposed to be associated with clinical disease course [10]. A few studies employing mathematical modeling have also been performed to understand the mechanisms generating ITHs with different evolutionary properties [14,15].

Evolutionary shift during colorectal tumorigenesis
As described so far, by combining genomic analysis and mathematical modeling, we demonstrated that neutral evolution shapes the ITH of advanced colorectal cancer. The next question then arises: When is the ITH generated?
To answer this question, we performed multiregion sequencing of nine early-stage colorectal tumors, which were resected through endomicroscopic surgery [16]. As a result, we found that extensive ITH was already generated in the early stages of colorectal tumorigenesis. However, compared with late-stage tumors, early-stage tumors demonstrated the enrichment of driver genes in progressor mutations, suggesting the contribution of Darwinian selection. The difference in evolutionary modes is evident in the shapes of phylogenetic trees in which the lengths of the trunk and branches represent the number of founder and progressor mutations, respectively (Fig. 4). The late-stage tumors tended to have "palm tree-like" shapes that were composed of long trunks and short branches. In contrast, the  (Fig. 1b), and that driver mutations consisted only of founder mutations. c A simulated singlecell mutation profile of the simulated tumor, suggesting the existence of numerous clones that cannot be detected by multiregion sequencing. This image was obtained by modifying a figure which originally appeared in our previous work [6] early-stage tumors tended to have "forked tree-like" shapes that were composed of short trunks and long branches. It has been reported that the contribution of Darwinian selection to ITH can also be measured by the distribution of mutant allele frequencies (MAFs) in single-region sequencing data [17]; if a set of subclonal mutations comprised driver and accompanying passenger mutations, Darwinian selection should have made their MAFs high, compared with those from a set without driver mutations. By applying this idea to multiregion analysis, we examined MAFs in each sample of multiregion data to find that MAFs of progressor mutations, especially mutations shared by multiple samples, tended to shift to a higher level in early-stage tumors than in late-stage tumors. Collectively, these results demonstrate that Darwinian selection contributes to the formation of ITH in the early stages of colorectal tumorigenesis. It should be noted that, consistently to our data, another study has independently reported that early-stage colorectal tumors harbor subclonal driver mutations [18]. Although ITH exists in both early-stage and late-stage colorectal tumors, the evolutionary principle generating ITH shifts from Darwinian selection to neutral evolution. Next, we had to understand the mechanism underlying the temporal shift. To solve this problem, we focused on copy number alterations (CNAs). By comparing copy number profiles inferred from exome sequencing data, we found that CNAs drastically increased during the progression from the early stage to the late stage of colorectal tumorigenesis [16]. Notably, although some of the early-stage tumors that we analyzed contained not only adenoma but also carcinoma samples, we observed a significant increase in CNAs in the carcinoma samples alone. In addition to single-nucleotide mutations that we discussed so far, recent studies have demonstrated that, in multiple types of cancers, more drastic chromosome-and/or genome-wide evolutionary events that produce CNAs and chromosomal rearrangements may have occurred in a short time at the early stage of cancer evolution [19,20]. Such large-scale events could confer a marked fitness increase on one or a few cells, which expand to constitute the tumor mass uniformly. This type of evolution is referred to as "punctuated evolution" after the term "punctuated equilibrium," which was proposed for species evolution by Gould and Eldredge to challenge the longstanding paradigm of gradual Darwinian evolution [21] although the underlying molecular mechanisms that cause rapid bursts of change are very different. Based on our observation of the drastic increase in CNAs, we hypothesized that punctuated evolution triggers the temporal shift of the evolutionary principle generating ITH.
To examine this hypothesis, we employed mathematical modeling [14]. We modified the BEP model to reproduce punctuated evolution. In the original BEP model, we assumed that a cell can grow infinitely without a decrease in their growth speed. However, it is more natural to assume that there exists a limit of population size because of resource limitations and that the growth speed gradually slows down as the population size approaches the limit. The limit in the population size is called the carrying capacity and is employed in the well-known logistic equation [22]. By mimicking the logistic equation, we introduced the carrying capacity into the modified model and additionally employ an "explosive" driver mutation, which negates the effect of the carrying capacity; that is, it is assumed that the explosive driver mutation rapidly evolves the cell such that it can conquer the growth limit and attain infinite proliferation ability. We show the results of a simulation based on the modified model in Fig. 5. It was observed that multiple subclones with different driver genes coexist; that is, ITH shaped by Darwinian selection is prominent during the early phase of the simulation. Note that a growth curve plot indicates that, as the population size approaches the carrying capacity, the growth speed slows down; however, the tumor regrows after the appearance of a clone that has acquired an explosive driver mutation. Darwinian selection makes the clone with the explosive driver mutation dominant in the population, which causes subclonal driver mutations in the clone to shift to clonal mutations. Then, neutral mutations alone accumulate as subclonal mutations; that is, ITH is finally generated by neutral evolution. Collectively, our mathematical modeling also supports the notion that punctuated evolution triggers an evolutionary shift in colorectal tumorigenesis.
In summary, our genomic analysis and mathematical modeling led us to develop a new model of colorectal cancer evolution (Fig. 6). In our model, multiple subclones are generated by driver mutation acquisition and subsequent clonal expansion in an early-stage tumor; however, early tumor growth is inevitably hampered by obstacles such as spatial and nutritional limitations and immune attack. However, of the multiple subclones generated by Darwinian selection, a parental clone that can overcome the obstacles emerges. In addition to a sufficient set of driver mutations, such a clone acquires driver CNAs that endow a tumor with malignant phenotypes, such as invasion, angiogenesis, and immune escape, and then it dominantly regrows by overcoming the obstacles. Reinforcing this view, we recently disclosed that the arm-level CNA in cancer tissues elicits immune tolerance in the cancer microenvironment, such as impaired cytolytic activity and diminished expression of cytotoxic cell-related genes [23]. After punctuated evolution, numerous subclones were generated by the accumulation of neutral mutations.
Our model is consistent with the well-established multistep carcinogenesis model of CRC [2], in which mutations in major driver genes such as APC, KRAS, and TP53 are sequentially accumulated in adenoma and then additional CNAs are acquired during the progression from adenoma to carcinoma. The neutral evolution phase following punctuated evolution is also consistent with the recently proposed Big Bang model [10], where a tumor predominantly grows as a single expansion without selective sweep. It should also be noted that our model of the evolutionary shift from Darwinian selection to neutral evolution might be a simplified view; different evolutionary processes actually work not separately but simultaneously and continuously as a series of phases of cancer evolution, which is discussed in our previous paper [14].
Either way, our evolution model is far from complete, and many aspects remain to be done. Our model is a rough sketch of the evolution of microsatellite stable tumors, which is the major subtype of colorectal cancer. Since the minor subtype, microsatellite instable tumors, generally shows higher mutation rates and lower CNA rates than microsatellite stable tumors, our evolution model is not applicable [24]. Subtypes exist even in microsatellite stable tumors. For example, we identified depressed-type carcinoma, which is characterized by a depressed surface in colorectal mucosa, as early-stage lesions for a possible novel subtype of a microsatellite stable tumor. The depressed cancers were positively correlated with lymphovascular invasion, tumor budding, and massive submucosal invasion. Our genomic analysis also demonstrated that depressed carcinomas harbor arm-level copy number amplification in 13q and 20p more frequently than protruding carcinomas [25]. The construction of a subtype-specific evolution model should be addressed in future studies.
Although clinically important, the evolutionary path to metastasis was not addressed in our model. Recently, several studies employing multiregion sequencing have successfully uncovered metastatic routes in colorectal cancer [26][27][28]. So far, no driver events that specifically occurred in metastatic respectively. Top green bands represent a clone acquiring an explosive driver mutation, while the blue bands on the left represent driver mutations. Note that the driver mutation exists as subclonal mutations at the beginning but shifts to clonal mutations, as the clone acquiring the explosive driver mutation expands. This image was obtained by modifying a figure which originally appeared in our previous work [14] samples have been identified, suggesting that metastatic potential is already acquired in founder clones, consistent with our neutral evolution model. Mathematical modeling of the multiregion data indicated that metastatic clones tended to branch out from the primary tumor in the early phase of evolution, which is in contrast with the case presented in Fig. 1 [29]. Another study employing mathematical modeling proposed that fewer primary tumor lineages seed distant metastases than lymph node metastases, that is, different levels of selection work against the two sites [30]. The evolution and ITH of tumor-immune interactions should also be studied in greater depth.
Recently, studies that integrated multiregion immunogenomic data such as human leukocyte antigens, neoantigens, T cell receptor repertoire, and expression of immune-related genes have been reported for several cancer types [31][32][33][34]. Although much work on colorectal cancer remains to be done, such an approach might provide insight into the mechanism underlying resistance against immune checkpoint inhibitors, which is only approved and partially effective for microsatellite instable tumors [32].

Evolution-based strategy for coping with therapeutic resistance
Finally, we discuss on how to utilize our understanding of cancer evolution to address the problems related to therapeutic resistance. Currently, a large number of molecular target drugs are available or under development.
Targeting driver genes in colorectal cancer appears to be a rational approach when considering our evolution model where driver genes exist as founder mutations. However, drug resistance frequently appears during therapy for most drugs, leading to therapeutic failure. Our model indicates that late-stage colorectal tumors harbor extensive ITH generated by neutral evolution, which could be a fundamental cause of therapeutic failure. Whether a mutation is neutral or not depends on the surrounding environment, and an environmental change induced by a specific therapy can convert a neutral mutation that has no selective advantage before the therapy to a driver mutation leading to therapeutic refractoriness (resistant mutation). This means that any type of therapy can potentially generate a resistant clone with a resistant mutation, which leads to tumor relapse even if the therapy is temporarily effective.
Studies employing mathematical modeling have shown that tumor regrowth can be delayed or prevented by adjusting the therapeutic regimen [35,36]. In normal clinical practice, an anticancer drug is continuously administered to a patient with cancer at the maximum tolerated dose, if possible. Given that a tumor comprises major and minor clones that are sensitive and resistant to chemotherapy, respectively, the tumor temporarily shr inks since the major drug-sensitive clone is eradicated. However, as the major drug-sensitive clone disappears, the minor resistant clone can grow freely because of the release from growth competition, that is, the competitive relationship between the two clones is dissolved (Fig. 7a). In contrast, if we can keep the two clones in a competitive state while controlling the total tumor volume within an acceptable threshold, the  Fig. 6 Our model of colorectal cancer evolution. During early tumorigenesis, multiple subclones harboring different single-nucleotide mutations appear and constitute ITH by Darwinian selection. The tumor is then confronted with growth limitation before progressing to the late phase of tumorigenesis. Out of the multiple subclones generated by Darwinian selection, the parental clone that can conquer the growth limitation emerges. In addition to a sufficient set of driver mutations, such a clone acquires driver CNAs. The parental clone is selected to progress locally advanced cancer or metastatic cancer. During the late phase, extensive ITH is generated by neutral evolution. This image was obtained by modifying a figure which originally appeared in our previous work [16] survival of the patient can be prolonged as compared with the routine continuous administration. For example, in "adaptive therapy," the initial dose of an anticancer drug is high, and then the dosage is decreased as the tumor shrinks to eventually maintain the sensitive clone at a level sufficient to suppress the growth of the resistant clone (Fig. 7b) [37]. In fact, in an experimental system using xenografts, it has been shown to prolong the survival rate of mice compared with the standard dosing schedule [38]. In antiandrogen therapy for prostate cancer, continuous administration causes adverse events, leading to poor quality of life. To address this problem, "intermittent therapy" (Fig. 7c) has been proposed [39] in which administration is repeated cyclically while monitoring the level of serum prostate-specific antigen (PSA), a noninvasive biomarker quantifying prostate tumor growth. Current data do not show that intermittent therapy is inferior to continuous therapy, with statistical certainty. Moreover, the rationality of intermittent therapy is provided by the mathematical modeling of tumor growth dynamics measured by PSA [40,41]; that is, intermittent therapy can suppress relapse by utilizing clonal competition, similarly to adaptive therapy.
Although the availability of noninvasive biomarkers such as PSA is essential for mathematical modeling of actual clinical data, recent advances in liquid biopsy technology have made it possible to take a similar approach for other cancer types. In particular, liquid biopsy based on circulating tumor DNA (ctDNA) appears to be a promising tool for this purpose [42]. ctDNA is a tumor-derived portion of cell-free DNA (cfDNA), which is all non-encapsulated DNA circulating in the bloodstream. By applying digital PCR or deep sequencing to cfDNA extracted from patients' plasma, we can non-invasively detect mutations in ctDNA. The allele frequencies of the mutations in ctDNA are supposed to reflect the real-time clonal proportions in the whole tumor, including primary and metastatic lesions, which can provide an opportunity for tracking clonal dynamics during a therapeutic course.
A few pioneering studies have recently combined ctDNA-based liquid biopsy with mathematical modeling to understand the therapeutic resistance in colorectal cancer. For example, the time-series data acquired by digital PCR of cfDNA showed the emergence of a resistant KRAS mutation during anti-EGFR therapy in patients with metastatic colorectal cancer. Moreover, mathematical analysis of the time-series data suggested that the KRAS mutation already existed in the tumor before the initiation of chemotherapy, which is consistent with the view derived from our neutral evolution model [43]. Targeted cfDNA sequencing demonstrated that acquired resistance to anti-EGFR therapy results from multiple resistant clones, and mathematical modeling combined with frequent serial sampling of cfDNA allows prediction of the expected time to treatment failure in individual patients [44]. It has also been reported that after discontinuation of anti-EGFR therapy, resistant clones decay due to a lack of growth advantage relative to sensitive clones, which supports anti-EGFR rechallenge [45]. Collectively, we expect that mathematical modeling of tumor growth data profiled by liquid biopsy will not only help us understand colorectal tumor evolution during anticancer drug therapy but also work out therapeutic strategies for coping with therapeutic resistance.

Conclusion
In this review, we introduced our works in which we modeled colorectal cancer evolution by genomic analysis and mathematical modeling. The explosion of cancer genomic data still continues on; moreover, technological innovation represented by single-cell sequencing technologies is also accelerating [46]. Although most of the current single-cell sequencing technologies focus on transcriptome analysis, it has been reported that ITH of CNAs can be computationally inferred form single-cell RNA sequencing data [47]. Recently, a protocol for performing both transcriptomic profiling and targeted mutation detection simultaneously at the single-cell level has also developed [48]. These approaches appear to be useful to study not only genomic evolution itself but also phenotypic changes accompanying it. The resolution of spatial sequencing is approaching to the single-cell level [49]; the spatial sequencing approach will expectedly be a powerful tool for understanding ITH of solid tumors at the ultimate level.
Mathematical modeling incorporating large amount and complexity of data will be empowered by approximate Bayesian computation (ABC) [50]. ABC is a computational method that estimated the parameter distributions of a simulation model. Although ABC is originally introduced in population genetics, it has also recently gained popularity in other fields of biological science. In fact, our work [6] and another work cited above [29] employed ABC for fitting simulation models to real data. Although the performance of ABC depends on the choice of summary statistic, which is used for evaluating similarities between real and simulation data, it has recently been reported that the summary statistic can automatedly be selected by deep learning [51,52]. It is expected that, together with such methodological improvements, the expansion of computational resources will broaden the applicability of ABC. In summary, considering these recent technological advancements, modeling cancer evolution by combining genomic analysis and mathematical modeling will be an increasingly important approach for understanding and overcoming not only colorectal cancer but also other types of cancer.