A stochastic model for immunotherapy of cancer

We propose an extension of a standard stochastic individual-based model in population dynamics which broadens the range of biological applications. Our primary motivation is modelling of immunotherapy of malignant tumours. In this context the different actors, T-cells, cytokines or cancer cells, are modelled as single particles (individuals) in the stochastic system. The main expansions of the model are distinguishing cancer cells by phenotype and genotype, including environment-dependent phenotypic plasticity that does not affect the genotype, taking into account the effects of therapy and introducing a competition term which lowers the reproduction rate of an individual in addition to the usual term that increases its death rate. We illustrate the new setup by using it to model various phenomena arising in immunotherapy. Our aim is twofold: on the one hand, we show that the interplay of genetic mutations and phenotypic switches on different timescales as well as the occurrence of metastability phenomena raise new mathematical challenges. On the other hand, we argue why understanding purely stochastic events (which cannot be obtained with deterministic models) may help to understand the resistance of tumours to therapeutic approaches and may have non-trivial consequences on tumour treatment protocols. This is supported through numerical simulations.


Introduction
Immunotherapy of cancer received a lot of attention in the medical as well as the mathematical modeling communities during the last decades [1,2,3,4,5,6].Many different therapeutic approaches were developed and tested experimentally.As for the classical therapies such as chemo-and radiotherapy, resistance is an important issue also for immunotherapy: although a therapy leads to an initial phase of remission, very often a relapse occurs.The main driving forces for resistance are considered to be the genotypic and phenotypic heterogeneity of tumors, which may be enhanced during therapy, see [6,7,5] and references therein.A tumor is a complex tissue which evolves in mutual influence with its environment [8].
In this article, we consider the example of melanoma (tumors associated to skin cancer) under T-cell therapy.Our work is motivated by the experiments of Landsberg et al. [9], which investigate melanoma in mice under adoptive cell transfer (ACT) therapy.This therapeutic approach involves the injection of T-cells which recognize a melanocyte-specific antigen and are able to kill differentiated types of melanoma cells.The therapy induces an inflammation and the melanoma cells react to this environmental change by switching their phenotype, i.e. by passing from a differentiated phenotype to a dedifferentiated one (special markers on the cell surface disappear).The T-cells recognize the cancerous cells through the markers which are downregulated in the dedifferentiated types.Thus, they are not capable of killing these cancer cells, and a relapse is often observed.The phenotype switch is enhanced, if pro-inflammatory cytokines, called TNF-α (Tumor Necrosis Factor), are present.A second reason for the appearance of a relapse is that the T-cells become exhausted and are not working efficiently anymore.This problem was addressed by re-stimulation of the T-cells, but this led only to a delay in the occurrence of the relapse.Of course, other immune cells and cytokines are also present.However, according to the careful control experiments, their influence can be neglected in the context of the phenomena considered here.
Cell division is not required for switching, and switching is reversible.This means that the melanoma cells can recover their initial (differentiated) phenotype [9].The switch is thus a purely phenotypic change which is not induced by a mutation.The state of the tumor is a mixture of differentiated and dedifferentiated cells.One possibility to avoid a relapse is to inject two types of T-cells (one specific to differentiated cells as above, and the other specific to dedifferentiated cells) as suggested also in [9].
In this paper, we propose a quantitative mathematical model that can reproduce the phenomena observed in the experiments of [9], and which allows to simulate different therapy protocols, including some where several types of T-cells are used.The model we propose is an extension of the individual-based stochastic models for adaptive dynamics that were introduced in Metz et al. [10] and developed and analyzed by many authors in recent years (see e.g.[11,12,13,14,15,16,17,18,19]) to the setting of tumor growth under immunotherapy.Such models are also referred to as agent-based models or particle systems.Note that we use the term individual for T-cells, cytokines or cancer cells, in particular not for patients or mice.
These stochastic models describe the evolution of interacting cell populations, in which the relevant events for each individual (e.g.birth and death) occur randomly.
It is well known that in the limit of large cell-populations, these models are approximated by deterministic kinetic rate models, which are widely used in the modeling of cell populations.However, these approximations are inaccurate and fail to account for important phenomena, if the numbers of some sub-populations become small.In such situations, random fluctuation may become highly significant and completely alter the long-term behavior of the system.For example, in a phase of remission during therapy, the cancer and the T-cell populations drop to a low level and may die out due to fluctuations.
A number of (mostly deterministic) models have been proposed that describe the development of a tumor under treatment, focusing on different aspects.For example, a deterministic model for ACT therapy is presented in [3].Stochastic approaches were used to understand certain aspects of tumor development, for example rate models [20] or multi-type branching processes; see the book by Durrett [21] or [22,23,24].To our knowledge, however, it is a novel feature of our models to describe the coevolution of immune-and tumor cells taking into account both interactions and phenotypic plasticity.Our models can help understanding the interplay of therapy and resistance, in particular in the case of immunotherapy, and may be used to predict successful therapy protocols.The main expansions of our models are the following: • Two types of transitions are allowed: genotypic mutations and phenotypic switches.
The characteristic timescale for a mutation to occur can be significantly longer than a timescale for epigenetic transitions.• Phenotypic changes may be affected by the environment which is not modeled deterministically as in [18] but as particles undergoing the random dynamics as well.• A predator-prey mechanism is included (modeling the interaction of cancer cells and immune cells).The defense strategies of the prey are not modeled by different interaction rates as in [19] but by actually modeling the escape strategy, namely switching.• A birth-reducing competition term is included which takes account of the fact that competition between individuals may also affect their reproduction behavior.The paper is structured as follows.In Section 2 we define the model and state the convergence towards a quadratic system of ODEs in the large population limit.In Section 3, we first present an example which qualitatively models the therapy carried out in Landsberg et al. [9] (Subsection 3.1).We point out a phenomenon of relapse caused by random fluctuations: in the phase of remission, the typical number of T-cells is small, and random fluctuations can cause their extinction, allowing for a growth of the tumor.As a second example (Subsection 3.2) we study the T-cell therapy with two types of T-cells.In this case an ever richer class of possible behavior occurs: either the tumor is cured, or one type of T-cells dies out, or both types of T-cells do, or all populations survive for a given time.Subsection 3.3 presents our choice of physiologically reasonable parameters, and the reproduction of experimental observations together with predictions.In Section 4 we consider the case of rare mutations.Without therapy (Subsection 4.1), we first show how to extract an effective Markov chain on the space of genotypes, whose transition rates are given by the asymptotic behavior of a faster Markov process on the space of phenotypes.In particular, we define a notion of invasion fitness in this setting, and describe how we can obtain a generalization of the Polymorphic Evolution Sequence introduced in [27].Second (Subsection 4.2), we study the interplay of mutation and therapy.We consider birth-reducing competition between tumor cells and show that the appearance of a mutant genotype may be enhanced under treatment.

The model
We introduce a general model, which contains three types of actors: • Cancer cells: each cell is characterized by a genotype and a phenotype.These cells can divide (with or without mutation), die (due to age, competition or therapy) and switch their phenotype.We assume that the switch is inherited by the descendants of the switched cells.
• T-cells: each cell can divide, die and produce cytokines.
• Cytokines: each messenger can vanish and influence the switching of cancer cells.The trait space, X , is a finite set of the form where (g, p) ∈ G × P denotes a cancer cell with genotype g and phenotype p, z ∈ Z a T-cell of type z and w ∈ W a cytokine of type w.We write | • | for the number of elements of a set and ∪ • for disjoint unions of sets.
2.1.General notations and parameters.A population at time t ∈ R + is represented by the measure where ν t (x) is the number of individuals of type x at time t and δ x denotes the Dirac measure at x. Here, K is a parameter that allows to scale the population size and is usually called carryingcapacity of the environment.The dynamics of the population is described by a continuous time Markov process, (ν K t ) t≥0 , with the following rates: Each cancer cell of type (g, p) is characterized by • natural birth and death rates: b(p) ∈ R + and d(p) ∈ R + .
• competition kernels: c(p, p)K −1 ∈ R + and c b (p, p)K −1 ∈ R + , where the first term increases the death rate and the second term, called birth-reducing competition, lowers the birth rate of a cancer cell of phenotype p in presence of a cancer cell of phenotype p.If the total birth rate is already at a level 0, then c b (p, p)K −1 ∈ R + acts as an additional death rate.• therapy kernel: t(z, p)K −1 ∈ R + additional death rate of a cancer cell of phenotype p due to the presence of a T-cell of type z.In addition, kill w (z, p) ∈ N 0 cytokines of type w are deterministically produced at each killing event.
• switch kernels: s g (p, p) ∈ R + and s g w (p, p)K −1 ∈ R + denote the natural and cytokineinduced switch kernels from a cancer cell of type (g, p) to one of type (g, p).
• mutation probability and law: µ g ∈ [0, 1] denotes the probability that a birth event of a cancer cell of genotype g is a mutation.m((g, p), (g, p)) ∈ [0, 1] encodes the probability that, whenever a mutation occurs, a cancer cell of type (g, p) gives birth to a cancer cell of type (g, p).By definition m((g, p), (g, p)) = 0 and g,p m((g, p), (g, p)) = 1.Each T-cell of type z is characterized by • natural birth and death rates: b(z) ∈ R + and d(z) ∈ R + .
• reproduction kernel: b(z, p)K −1 ∈ R + denotes the rate of reproduction of a T-cell with trait z in presence of a cancer cell of phenotype p.In addition, prod w (z, p) ∈ N cytokines of type w are deterministically produced at each reproduction event.Each cytokine of type w is characterized by • natural death rate: d(w) ∈ R + .
• the molecules are produced when a cancer cell dies due to therapy or a T-cell reproduces.Note that the relation between G and P is encoded in the switch kernels.They specify which phenotypes are expressed by a given genotype and influence the proportions of the different phenotypes in a (dynamic) environment.
Figure 1 provides a graphical representation of the transitions for a population with trait space X = {x = (g, p), y = (g, p )} ∪ • {z x } ∪ • {w}, which constitute our model for the ACT therapy described in Landsberg et al. [9], see Subsection 3.1.
Dynamics of the process (without mutations) modelling the experiments described in [9].Here, x denotes differentiated melanoma cells, y dedifferentiated melanoma cells, z x T-cells and w TNF-α.At each arrow the rate for occurrence of the corresponding event is indicated (e.g.birth is illustrated with two arrowheads and death with an arrow directed to †).
2.2.The dynamics.Let M K (X ) ≡ 1 K n i=1 δ x i : n ∈ N, x 1 , . . ., x n ∈ X , i.e. the set of finite point measures on X rescaled by K.Then, for each K ∈ N, the dynamics of the M K (X )-valued Markov process, (ν K t ) t≥0 , describing the evolution of the population at each time t, can be summarized as follows: At time t = 0 the population is characterized by a given measure ν K 0 ∈ M K (X ).Each individual present at time t has several exponential clocks with intensities depending on its trait x ∈ X and on the current state of the system.We use the shorthand notation ν K t (p) = g∈G ν K t (g, p) and • ± to denote the positive/negative part of the argument.
( (3) Each present cytokine has -a mortality clock with rate d(w).
Moreover, an amount of kill w (z, p) particles (of trait w) are produced every time a T-cell of trait z kills a cancer cell of phenotype p, and a number of prod w (z, p) particles are produced every time a T-cell of trait z is produced in the presence of a cancer cell of phenotype p.
The measure-valued process (ν K t ) t≥0 is a Markov process whose law is characterized by its infinitesimal generator L K which captures the dynamics described above (cf.[25] Chapter 11 and [30]).The generator acts on bounded measurable functions φ from M K into R, for all η ∈ M K by × s g (p, p) + w∈W s g w (p, p)η(w) Kη(g, p) All the figures of this article containing simulations of the above model were generated by a computer program written by Boris Prochnau.The pseudo-code can be found in the Appendix A.

Large population approximation.
There is a natural way to jointly construct the processes (ν K t ) t≥0 for different values of K, see [25].For this joint realization, the sequence of rescaled processes ((ν K t ) t≥0 ) K converges almost surely as K tends to infinity to the solution of a quadratic system of ODEs, as stated in the following proposition, which can be seen as a law of large numbers.The deterministic system, which provides (partial) information on the stochastic system, consists of a logistic part, a predator-prey relation between T-cells and cancer cells, a mutation and a switch part.
Note that the population can be represented as a vector Proposition 1. Suppose that the initial conditions converge almost surely to a deterministic limit, i.e. lim K→∞ V K (0) = v(0).Then, for each T ∈ R + the sequence of rescaled processes (V K (t)) 0≤t≤T converges almost surely as K → ∞ to the d-dimensional deterministic process which is the unique solution to the following quadratic dynamical system: For all (g, p) ∈ G × P, for all z ∈ Z, ) where n(t) denotes the solution to Equations (2.4) with initial condition v(0).
Proof.This result follows from Theorem 2.1 in Chapter 11 of [25].
It is an important feature of stochastic models opposed to deterministic ones that populations can die out.There are two main reasons for the extinction of a population for finite K: first, the trajectory of the population size is transient and passes typically through a low minimum.In this case, random fluctuations can lead to extinction before the population reaches its equilibrium.Second, fluctuations around a finite equilibrium cause extinction of a population after a long enough time.The second case happens at much longer times scales than the first one.In both cases, the value of K plays a crucial role, since it determines the amplitude of the fluctuations and thus the probability of extinction.
The relevant mutations in the setup of cancer therapy are driver mutations and appear only rarely.In this case, more precisely, when the mutation probabilities, µ g ≡ µ K g , tend to zero as K → ∞, mutations are invisible in the deterministic limit.Due to the presence of the switches the analysis of the system is difficult.It is not a generalized Lotka-Volterra system of the form ṅ = nf (n), where f is linear in n.In Section 4 we show how to deal mathematically with rare mutations and their interactions with fast phenotypic switches or therapy.

3.
Relapse due to random fluctuations 3.1.Therapy with T-cells with one specificity.In this section we present an example which qualitatively models the experiment of Landsberg et al. [9], where melanoma escape ACT therapy by phenotypic plasticity in presence of TNF-α.Mutations are not considered here, since this was not investigated in the experiments.
Let us denote by x := (g, p) the differentiated cancer cells, by y := (g, p ) the dedifferentiated cancer cells, by z x the T-cells attacking only cells of type x, and by w the TNF-α proteins.We start with describing the deterministic system and denote by n i its solution for trait i ∈ {x, y, z x , w}.It is the solution to the following system of four differential equations: We fix the following parameters: and initial conditions: 3) The solution to the deterministic system (3.1) with parameters (3.2) and initial conditions (3.3) can be seen on Figure 4 (A).There are three fixed points in this example, see the black dots on Figure 3 (B): P 0000 where all populations sizes are zero, P xy00 where the T-cells and TNF-α are absent and both melanoma populations are present and P xyzxw where all populations are present.P xyzxw is the only stable fixed point and P xy00 is stable in the invariant subspace {n zx = 0} (i.e. if the T-cell population is zero).To highlight the qualitative features of the system, we choose parameters such that the minimum of the T-cell population during remission is low, and such that the equilibrium value of melanoma of type x in presence of T-cells is low, whereas equilibrium values of both melanoma types in absence of T-cells are high.
For initial conditions such that the number of differentiated melanoma cells, n x (0), is large, the number of injected T-cells, n zx (0), is small, and the numbers of dedifferentiated melanoma cells, n y (0), and TNF-α molecules, n w (0), are small or equal to zero, the deterministic system is attracted to P xyzxw : the T-cell population, n zx , increases in presence of its target x, TNF-α is secreted, and the population of differentiated melanoma cells, n x , shrinks due to killing and TNF-α induced switching, whereas the population of dedifferentiated melanoma cells, n y , grows.
For the stochastic system, several types of behavior can occur with certain probabilities: either the trajectory stays close to that of the deterministic system and the system reaches a neighborhood of the fixed point P xyzxw (Fig. 2 (A)) or the T-cell population, ν K (z x ), dies out and the system reaches a neighborhood of P xy00 (Figure 2 (B)).In the latter case the TNF-α population, ν K (w), becomes extinct shortly after the extinction of the T-cells, ν K (z x ), and the population of differentiated melanoma cells, ν K (x), can grow again.Moreover, TNF-α inducing the switch from x to y vanishes and we observe a relapse which consists mainly of differentiated cells.Depending on the choice of parameters (in particular switching, therapy or cross-competition), a variety of different behavior is possible.to add T-cells attacking the dedifferentiated cells as new actors to the setting described above.We denote them by z y .The system contains one more predator-prey term between y and z y : In addition to parameters (3.2), we use the following ones: and initial conditions: The solution to the deterministic system (3.The introduction of z y adds two new fixed points (see the blue dots on Figure 3 (B)): P xyzxzyw is the new stable fixed point with all non-zero populations, and P xy0zyw corresponds to the absence of the T-cell population of type z x .The invariant subspaces are now {n zx = 0}, in which P xy0zyw is stable, {n zy = 0}, in which P xyzx0w is stable and {n zx = 0} ∩ {n zy = 0}, in which P xy000 is stable.Note that P xyzx0w , corresponding to P xyzxw from the last section, is unstable in the enlarged space.
With the same initial conditions as before and n zy (0) small but positive, the deterministic system is attracted to the stable fixed point P xyzxzyw : the T-cell population, n zx , increases in presence of its target x, TNF-α is secreted, and the differentiated melanoma population shrinks due to killing and switching, the population of dedifferentiated melanoma grows, but is regulated and kept at a low level by the T-cells of type z y .Similarly, n x is regulated by n zx .
We choose the parameters such that the minima of the two types of T-cells during remission are low, so that they have a large enough probability to die out in the stochastic system.Since at the beginning of therapy no or only very few dedifferentiated melanoma cells are present, the population of T-cells of type z y starts growing only later.In order to avoid their early extinction a higher initial amount of these T-cells can be injected.There are now five main different scenarios in the stochastic system (see Figure 5).Either the T-cells of type z x (B), or the T-cells of type z y (C), or both of them die out (D).Also all populations can survive for some time fluctuating around their joint equilibrium (A).The fifth scenario is a cure, i.e. the extinction of the entire tumor due to the simultaneous attack of the two different T-cell types (F).T-cells and TNF-α vanish since they are not produced any more in the absence of their target.Of course, transitions between the different scenarios are also possible, e.g. the system could pass from Case (A) to (B) or (C) and then to (D), see Figure 5 (E).Furthermore, note that setting the switch from x to y to zero introduces an additional scenario: it is then possible that a relapse appears, which consists only of differentiated melanoma cells.
Starting from our choice of initial conditions, the deterministic system converges to P xyzxzy , but the stochastic system can hit one of the invariant hyperplanes due to fluctuations, and is driven to different possible fixed points, see Figure 3 (A).The transitions between the different scenarios can be seen as a metastability phenomenon.

Reproduction of experimental observations and predictions.
3.3.1.Comparison of experimental observations and simulations.The parameters of Section 3 are chosen ad hoc to highlight the influence of randomness and the possible behavior of the system.Let us now show that our models are capable to reproduce the experimental data of Landsberg et al. [9] quantitatively.The choice of parameters is explained below (Subsection 3.3.2).
Figure 6 (A) shows the experimental data of [9] whereas Figure 6 (B) shows the results of our simulations.Each curve describes the evolution of the diameter of the tumor over time.In the stochastic system two situations can occur: first, the relapse consists mainly of differentiated melanoma cells and the tumor reaches its original size again after 90 days.This is the case if the T-cells die out.Second, the relapse consists mainly of dedifferentiated cells and the tumor reaches its original size again after roughly 190 days.This is the case if the T-cells survive the phase of remission, become active again and kill differentiated cancer cells.In the simulations the therapy with one type of T-cells pushes the tumor down to a microscopic level for 50 to 60 days, as in the experimental data.The curves marked ACT in the experimental data in Figure 6 (A) are matched by simulation data when the T-cells die out (Differentiated relapse in Figure 6 (B)).In the experiments there might be T-cells which lose their function, e.g.due to exhaustion, and cannot kill the differentiated melanoma cells.This effect is to be seen as included in the death rate of T-cells in the model.They can be re-stimulated and become active again which is marked as ACT+Re in Figure 6 (A).Although our model does not include restimulation, the case of surviving T-cells in the simulations (Dediff.relapse in Figure 6 (B)) can qualitatively be interpreted as the case of ACT+Re.Note that the scales of the axes are the same in both figures and that the experimental findings are met very well by the simulations.The simulated curves under treatment start at the beginning of the treatment and not at day zero.The detailed pictures showing the evolution of melanoma and T-cell populations during the therapy are given in Figure 7.
As there is no data for the case of two T-cells, numerical simulations of such a therapy strategy should be seen as predictions.For the new T-cell population (of type z y ) we choose the same parameters as for the first population (of type z x ), just the target is different.The therapy seems to be very promising: almost all simulations show a cure for these parameters, only very few times a relapse occurs.Nevertheless, the behavior of the system (e.g. the probability to end up in the different scenarios) depends strongly on the choice of certain parameters, as pointed out in the last two sections.In order to give a reliable prediction we need data to obtain safer estimates for the most important parameters, which seem to be the switching and therapy rates as well as initial values.
The initial values play an important role for the success of a therapy.In the case of therapy with T-cells of one specificity, increasing the initial amount of T-cells has the following effect: the melanoma cells are killed faster, the population of differentiated melanoma cells reaches a lower minimum and as a consequence the T-cells pass through a lower and broader minimum.The probability that the T-cells die out increases, and a differentiated relapse is more likely than in the case of a smaller initial T-cell population.Moreover, the broadening of the minima causes a "delay" and both kind of relapses (consisting mainly of differentiated or dedifferentiated cells) appear later.But since the extinction of T-cells is more likely, the tumor may reach its original size earlier, see Figure 8.For an initial value ten times as large as in Figure 6 (B) the probability of an eradication of the tumor is still very small.If the number of T-cells initially is half the number of tumor cells, the probability of a favorable outcome is much higher.But such a high amount of T-cells is unrealistic.

Physiologically reasonable parameters.
We explain here how we choose the biological parameters.Some parameters can be estimated from the experimental data.Recall that the subject of [9] is to investigate the behavior of melanoma under T-cell therapy in mice.Without therapy the tumor undergoes only natural birth, death and switch events.
• Choice of birth and death rates: We assume that the number of cells in the tumor is described by where N t denotes the number of cells at time t, N 0 the initial population size and r the overall growth rate.Note that the estimate of the growth rate is independent of the initial value.Figure 4 (A) in the main part shows that the tumor needs roughly 50 days (without therapy) to grow from 2 mm diameter to 10 mm diameter.Since the structure of a melanoma is 3-dimensional this corresponds roughly to N 50 = 125N 0 which implies r = 0.1.Unfortunately, no data that allow to estimate the ratio of birth and death events are provided.As long as mutations are not considered this should not have a big impact and we chose b = 0.12 and d = 0.02 for the differentiated as well as the dedifferentiated cells.Landsberg et al. observed that the growth kinetics appear to be the same for both cell types, see Supplementary Figure 11 in [9].
• Choice of the competition: We assume that the competition has a very little effect here because the tumor grows exponentially in the observed time frame and does not come close to its equilibrium.We choose the competition between melanoma cells of the same type as c(x, x) = c(y, y) = 0.00005 and between different types of melanoma cells as c(x, y) = c(y, x) = 0.00002.The values are not set to 0 since the melanoma can grow only up to a finite size.
• Choice of the switch parameters: We can now estimate the switching parameters by using the data of Supplementary Figure 9e in [9].In this experiment where cell division is inhibited, we can set b = 0. Furthermore, the amount of TNF-α is constant and we set here n w = 2. Thus, the dynamics of the melanoma populations is described by At the beginning of their observations the switch is very slow and speeds up after the first 24 hours.We assume that there is a delay until the reaction really starts and thus we choose the proportions at day 1 (n x = 0.81 and n y = 0.19) as initial data and choose switching parameters such that roughly the concentrations at day 2 (n x = 0.45 and n y = 0.54) and 3 (n x = 0.24 and n y = 0.72) are reached as shown in Figure 9. Thereby we obtain s(x, y) = 0.0008, s(y, x) = 0.065 and s w (x, y) = 0.33.Note that the experiments we refer to provide only in vitro data and it is not clear if the in vivo situation is similar.

• Choice of parameters concerning T-cells: It remains to characterize the T-cells. Their
natural birth rate is set to 0 since they are transferred by adoptive cell transfer and not produced by the mice themselves and do not proliferate in absence of targets.We assume that they have a relatively high birth rate depending on the amount of cancer cells present, b(z x , x) = b(z y , y) = 2 and produce one TNF-α molecule when they divide, prod w (z x , x) = prod w (z y , y) = 1.Furthermore, we assume that 4.5 cancer cells can be killed per hour (including indirect mechanisms), t(z x , x) = t(z y , y) = 108.The rate of death for the T-cell population is chosen as d(z x ) = d(z y ) = 0.12.These parameters are chosen such that the qualitative behavior of the tumor was recovered.We choose the same parameters for the second T-cell type as for the first one because there are no data concerning the second T-cell type.
• Choice of starting values and the scale K: We set K = 10 5 , the initial value for the differentiated melanoma cell population to 1 and to 0 for the population of dedifferentiated melanoma cells.The ratio of differentiated and dedifferentiated cells is not known for small tumors which do not result from cell transfer of cells of in vitro cell lines.The initial value of the T-cell population is set to 0.02.We assume that the Tcells appear directly in the tumor, i.e. the migration phase into the tumor is not modeled.
To sum up, biological rates (per day) and initial conditions (in 100 000 cells) are:

Arrival of a mutant
The model we defined in Section 2 can be seen as a generalization of one of the standard models of adaptive dynamics, usually called BPDL, which was introduced in publications of Bolker, Pacala, Dieckmann and Law [11,12,13].In a possibly continuous trait space X ⊆ R d , the BPDL model allows for each individual to reproduce, with or without mutation, or die due to natural death or to competition (e.g. this amounts to only keep parameters b(x), d(x), c(x, y), m(x, y) and µ in Subsection 2.1).
In the limit of large populations (K → ∞ and µ fixed) Fournier and Méléard [30] proved a law of large numbers: the process converges to a system of deterministic integro-differential equations.In the case µ ≡ 0 the process converges to the solution of a system of coupled logistic equations (of Lotka-Volterra type) without mutations.The limit of large populations (K → ∞) followed by the limit of rare mutations (µ → 0) on the timescale t ∼ log(1/µ) was studied by Bovier and Wang [17] and a deterministic jump process is obtained as limiting behavior.
The simultaneous limits of large populations and rare mutations, where (K, µ) → (∞, 0) at a rate such that 1/(Kµ) log K and a timescale t ∼ 1/(Kµ) was studied by Champagnat and Méléard [15,27].At this scale the system has time to equilibrate between two successive mutations.The long-term behavior of the population can be described as a Markov jump process along a sequence of equilibria of, in general, polymorphic populations.An important (and in some sense generic) special case occurs when the mutant population fixates while resident population dies out in each step.The corresponding jump process is called the Trait Substitution Sequence (TSS) in adaptive dynamics.Champagnat [15] derived criteria in the context of individual-based models under which convergence to the TSS can be proven.The general process is called the Polymorphic Evolution Sequence (PES).It is described partly implicitly in [27], as it involves the identification of attractive fixed points in a sequence of Lotka-Volterra equations that are in general not tractable analytically.
In situations when the population converges to a TSS, one may take a further limit of small mutation steps, σ, and look at the time scale t ∼ 1/σ 2 .The mutant is then of trait y = x + σh where h is chosen according to a probability kernel on X .For birth and death rates that vary smoothly on X , this ensures a vanishing evolutionary advantage of mutants when σ → 0. The TSS converges in this limit to the so-called Canonical Equation of Adaptive Dynamics (CEAD), see e.g.[27], which describes the continuous evolution of a monomorphic population in a fitness landscape.The convergence of the individual-based model to the CEAD in a single step has recently be shown by Baar et al. [26], i.e. the limit (K, µ, σ) → (∞, 0, 0) of large populations, rare mutations and small mutation steps are taken simultaneously, provided 1/(Kµ) log(K)/σ and 1/ √ K σ 1.The time-scale on which this convergence takes place is t ∼ 1/(Kµσ 2 ), and corresponds to the combination of the previous two.
Costa et al. [19] study an extension of the model with a predator-prey relation.The predatorprey kernel is an explicit function of parameters describing defense strategies for preys, together with the ability of predators to circumvent the defense mechanism.In the limit 1/(Kµ) log K, convergence to a Markov jump process generalizing the Polymorphic Evolution Sequence is derived, and in the subsequent limit σ → 0, in the case of a monomorphic prey and predator populations, convergence to an extended version of the CEAD is obtained.4.1.Rare mutations and fast switches.In this section, we give a generalization of the Polymorphic Evolution Sequence in the case of fast switches in the phenotypic space, without therapy (no predator-prey term).We consider the case of rare mutations in large populations on a timescale such that a population reaches equilibrium before a new mutant appears: Champagnat and Méléard's proof of convergence to the PES [27] relies on a precise study of the way a mutant population fixates, which we now describe.A crucial assumption is that the large population limit is a competitive Lotka-Volterra system with a unique stable 1 fixed point n.The main task is to study the invasion of a mutant that has just appeared in a population close to equilibrium.The invasion can be divided into three steps.First, as long as the mutant population size is smaller than K for a fixed small > 0, the resident population stays close to its equilibrium.Therefore the mutant population can be approximated by a binary branching process.Second, once the mutant population reaches the level K , the whole system is close to the solution of the corresponding deterministic system (this is a consequence of Proposition 1) and reaches an -neighborhood of n in finite time.Finally, the subpopulations which have a zero coordinate in n can be approximated by sub-critical branching processes.The durations of the first and third steps are proportional to log(K), whereas that of the second step is bounded.The second inequality in (4.1) guarantees that, with high probability, the three steps of invasion are completed before a new mutation occurs.4.1.1.Invasion fitness.Given a population in a stable equilibrium that populates a certain set of traits, say M ⊂ X , the invasion fitness f (x, M ) is the growth rate of a population consisting of a single individual with trait x ∈ M in the presence of the equilibrium population n on M .In the case of the BPDL model, it is given by Positive f (x, M ) implies that a mutant appearing with trait x from the equilibrium population on M has a positive probability (uniformly in K) to grow to a population of size of order K; negative invasion fitness implies that such a mutant population will die out with probability tending to one (as K → ∞) before this happens.Invasion fitness is a fundamental concept in the analysis of stochastic population models.We first generalize it to the case where fast phenotypic switches are present, for pure tumor evolutions, i.e. we ignore the T-cells and the cytokines in our model.
Let us consider an initial population of genotype g (associated with different phenotypes p 1 , . . ., p ) which is able to mutate at rate µ K g to another genotype g , associated with k different phenotypes p 1 , . . ., p k .The assumption (4.1) ensures that no mutation occurs during the equilibration phase in the phenotypic space.
Consider as initial condition n(0) = (n (g,p 1 ) (0), . . ., n (g,p ) (0)) a stable fixed point, n, of the following system: We write for simplicity b We want to analyze whether a single mutant appearing with a new genotype g (and one of its possible phenotypes p i ), has a positive probability to give rise to a population of size of order K. Using the same arguments as Champagnat et al. [15,27], it is easy to show that as long as the mutant population has less than K individuals (with 1), the mutant population 1 By stable fixed point we mean that the eigenvalues of the Jacobian matrix of the system at the fixed point have all strictly negative real parts.
(g , p 1 ), . . ., (g , p k ) is well approximated by a k-type branching process, where each individual undergoes binary branching, death, or switch to another phenotype with the following rates: where (n 1 , . . ., n ) denotes the fixed point of (4.3).We will assume that the switch rates s ij are the transition rates of an irreducible Markov chain on {1, . . ., k}.The simplest example is the case where s ij > 0, for all i, j ∈ {1, . . ., k}.
Multi-type branching processes have been analyzed by Kesten and Stigum [31,32,33] and Athreya and Ney [34].Their behavior are classified in terms of the matrix A, given by where Note that f i would be the invasion fitness of phenotype i if there was no switch back from the other phenotypes to i (or if all switched cells would be killed).It is well-known that the multitype process is super-critical, if and only if the largest eigenvalue, λ 1 = λ 1 (A), of the matrix A is strictly positive, meaning that if λ 1 > 0, the mutant population will grow (with rate λ 1 ) to any desired population size before dying out.Thus λ 1 (A) is the appropriate generalization of the invasion fitness of a genotype: This notion can easily be generalized to the case when the initial condition is the equilibrium population of several coexisting genotypes.Note that this notion of invasion fitness of course reduces to the standard one of [15] if there is only one mutant phenotype.This settles the first step of our analysis, which is the invasion of the mutant.
4.1.2.Towards a generalized Polymorphic Evolution Sequence.In fact, one can say more about how the mutant population grows.Write Z (i) j (t) for the number of individuals of phenotype p j existing at time t for this branching process when the first mutant is of phenotype p i .Then, for i, j ≤ k, where Assume that the largest eigenvalue λ 1 (A) is simple and strictly positive.Let v and u be the left and right eigenvectors of A associated to λ 1 , normalized such that u • v = 1 and u • 1 = 1.The extinction probability vector q = (q 1 , . . ., q k ) where q i = P(Z (i) (t) = 0 for some t) is the unique solution of the system of equations: which has in general no analytical solution.Then the following limit theorem holds [31]: where (W i ) i=1,...,k is random vector with non-negative entries such that In particular, conditionally on survival, the phenotypic distribution of the mutant populations converges almost surely to a deterministic quantity, which moreover does not depend on the phenotype of the initial mutant, namely For us, this implies the important fact that the phenotypic structure of the mutant population once it reaches the level εK > 0 is almost deterministic.Then, conditionally on survival, (4.11) implies that the time τ εK until the total mutant population reached εK is of order log(K).Moreover, the proportions of the k types of phenotypes converge to deterministic quantities given above, Once the mutant population has reached the level εK, the behavior of the process can be approximated by the solution of the deterministic system: (where i runs from 1 to in the first line and from 1 to k in the second one) with initial conditions in a small neighborhood of (n (g,p 1 ) (0), . . ., n (g,p ) (0), n (g ,p 1 ) (0), . . ., n (g ,p k ) (0)) = n1 , . . ., n , εv (4.16)If the system (4.15) is such that a neighborhood of (4.16) is attracted to the same stable fixed point, we are in the same situation as in Champagnat and Méléard [27] and get the generalization of the Polymorphic Evolution Sequence on the genotypic trait space.
Observe that in the case when the system of equations (4.15) has multiple attractors, and different points near (4.16) lie in different basins of attraction, then for finite K, the choice of attractor the system approaches may be random.The characterization of the asymptotic behavior of the system (4.15) is needed to describe the final state of our stochastic process.This is in general a very difficult and complex problem, which is not doable analytically and requires numerical analysis.
Figure 10 shows examples where in a population consisting only of type (g, p) a mutation to genotype g occurs.g is associated to two possible phenotypes p 1 and p 2 .The mutant phenotype p 2 has a negative initial growth rate but can switch to p 1 which has a positive one.The fitness of the genotype g is positive.(B) The fitness of the mutant genotype g is positive, although each phenotype has a negative initial growth rate.This is possible because an outgoing switch is a loss of a particle for a phenotype, but not for the whole genotype.
In this case, p 2 can switch to p 1 but the back-switch is very weak, and we have f 2 < 0 and f 1 > 0 according to definition (4.6).The global fitness of the genotype g is positive and close to f 1 .The coexistence of the two phenotypes depends on the cross-competition c 12 and c 21 .For these parameters f 1 and f 2 as defined in (4.6) are negative, but, due to the cooperation of the two phenotypes, the fitness of the genotype is positive and it invades with positive probability as indicated by the definition (4.7).Moreover, both phenotypes appear on a macroscopic level.
4.2.Interplay of mutation and therapy.In the previous section we considered the probability of invasion of a mutant when the resident population is at an equilibrium given by a stable fixed point.In the context of therapy, there are phases when populations shrink and regrow due to treatment and relapse phenomena.In the medical literature, there are frequent allusions to the possibility that such growth cycles may induce fixation of a "super-resistant mutant", see e.g.[28,5,29].It is important to understand whether and under what circumstances such effects may happen.Here we show an example where the appearance of a mutant genotype may indeed be enhanced under treatment.We consider birth-reducing competition (BRC) between tumor cells.In such a case, a large population at equilibrium may encounter fewer births and hence mutations, than a smaller population growing towards its equilibrium size.
Let us discuss in more detail how the birth-reducing competition can have a crucial effect on the mutation timescale.For the sake of simplicity we consider an example where the switching rates are set to 0. Consider a melanoma population (g, p) which is able to mutate to a fitter type of melanoma (g , p ).We allow for one T-cell population attacking the resident melanoma population since this is the simplest scenario where the effect of therapy in this context can be explained.As the presence of TNF-α only influences the switch between phenotypes, it does not play any role in this example and we can set the corresponding parameters ( w and d(w)) to zero.If µ K g → 0 as K → ∞ the limiting deterministic system is given by: ṅ(g,p) = n Note that the mutation term does not appear in the deterministic system and that the difference between birth-reducing competition and usual competition is not visible.The effects we are looking for are intrinsically stochastic and happen on time-scales that diverge with K.
If the competition is only of birth-reducing type, then the total mutation rate of the population of type (g, p) at time t is  The interesting scaling of the mutation rate is Kµ K g → α > 0 as K → ∞.In this case, there is a number of mutations of order one while the population grows by O(K) individuals.For lower mutation rates, no mutant can be expected before the population reaches its equilibrium, while for higher rates, mutations occur unrealistically fast.Since for µ K g → 0 the mutation term does not appear in the deterministic system, the difference between BRC and usual competition is invisible.
During therapy, a tumor which is close to equilibrium (similar to Figure 12 (B)) can shrink to a small size (similar to Figure 12 (A)): the introduction of T-cells in the system lowers the population size of melanoma, and the total mutation rate in the tumor population of type (g, p) can be larger during treatment, see Figure 12 (C).This means that treatment could lead to earlier mutations and thereby accelerate the evolution towards more aggressive tumor variants.
Note that this example provides an interesting situation of interplay between therapy and mutation.By lowering the melanoma population, the T-cell therapy actually increases the probability for it to mutate to a potentially fitter and pathogenic genotype, which is not affected by the T-cells.The number of individuals divided by 1000 is plotted versus time: Effect for an initial population which is small (A), or at equilibrium (B) or under therapy (C) and (D).

Discussion
Therapy resistance is a major issue in the treatment of advanced stages of cancer.We have proposed a stochastic mathematical model that allows to simulate treatment scenarios and applied it to the specific case of immunotherapy of melanomas.Comparison to experimental data is so far promising.The models pose challenging new mathematical questions, in particular due to the interplay of fast phenotypic switches and rare driver mutations.First numerical results point to a significant effect of stochastic fluctuations in the success of therapies.More precise experimental data will be needed in the future to fit crucial model parameters.While our models describe the actions of individual cells and cytokines, they do not by far resolve the full complexity of the biological system.In particular, they do not reflect the spatial structure of the tumour and its microenvironment.Also, the distinction of only two phenotypes of the tumour cells is a simplification.The same is true for the interaction with other immune cells and cytokines.This reflects on the one hand the limitation due to available experimental data, on the other hand the use of a model of reduced complexity also makes numerical computations and theoretical understanding of the key phenomena feasible.The rates entering as model parameters therefore have to be understood as effective parameters, e.g. the death rate of T-cells accounts for their natural death as well as the exhaustion phenomenon.In principle it is possible to increase the resolution of the model; this, however, increases the number of parameters that need to be determined experimentally which would pose a major challenge.Already at the present stage, the model parameters are not known well enough and are adjusted to reproduce the experimental findings.Some parameters that it would be very useful to see measured precisely are: • birth and death rates of tumour cells, both in differentiated and dedifferentiated types.Currently these are estimated from the growth rate of the tumour, but this yields only the difference of these rates; • killing rates of T-cells, both of the differentiated and the dedifferentiated tumour cells; • rates of phenotypic switches, both in the absence and the presence of TNF-α; • death rates of T-cells and their expansion rates when interacting with tumour cells.Nevertheless, we see the proposed model as a promising tool to assist the development of improved treatment protocols.Simulations may guide the choice of experiments such that the number of necessary experiments can be reduced.The obvious strength of our approach is to model reciprocal interactions and phenotypic state transitions of tumour and immune cells in a heterogeneous and dynamic microenvironment in the context of therapeutic perturbations.
The clinical importance of phenotypic coevolution in response to therapy has been recently documented in patients' samples from melanomas acquiring resistance to MAPK inhibitors [35].Adaptive activation of bypass survival pathways in melanoma cells was accompanied by the induction of an exhausted phenotype of cytolytic T cells.This has important implications for the combinatorial use of cancer immunotherapy (checkpoint inhibitors like anti PD-1) with respect to scheduling.We envision that our mathematical approach will help to integrate such patient omics data with experimental findings to guide novel strategies.Of note and similar to our previous study, dedifferentiation of melanoma cells was identified as a major mechanism of escape from MAPK inhibitors [36,37].We recently dissected the molecular circuitries that control melanoma cell states and showed how melanoma dedifferentiation governs the composition of the immune cell compartment through a cytokine-based crosstalk in the microenvironment [38,39].Hence, malignant melanoma is a paradigm for a phenotypic heterogeneous tumour and a future goal is to incorporate this increasing knowledge of melanoma cell plasticity into our method to refine its capability to model complex interactions with immune cells.
Importantly, phenotypic plasticity in response to therapy is a widespread phenomenon and non-small cell lung cancer (NSCLC) represents a prominent example.A subset of NSCLCs harbour activating mutations in the epidermal-growth factor receptor EGFR and respective small molecule inhibitors (EGFRi) are potent first line cancer drugs for this NSCLC subtype.However, tumours invariably relapse and genetic selection of subclones with the second site resistance mutation EGFRT790M is the major event.A substantial number of relapse tumours show remarkable transitions from an NSCLC adenocarcinoma to a neuroendocrine-related small cell lung cancer (SCLC) phenotype [40].Given the recent success of immune checkpoint inhibitors in NSCLC [41], it will be of clinical interest to investigate the phenotypic coevolution of immune cells in the context of NSCLC-SCLC lineage transitions.Again, our mathematical approach could represent a valuable tool to support this research.Finally, our results suggest that stochastic events play an unanticipated critical role in the dynamic evolution of tumours and the emergence of therapy resistance that requires further experimental and clinical investigation.Sample (g, p) according to m((g, p), (g, p)) Note that x chosen = (g, p) for some (g, p) ∈ G × P in this case.Sample z ∈ Z according to the weights t(z, p)ν Sample (g, p) ∈ Supp(ν K T k ) according to the weights b(x chosen , p)ν K T k (g, p), (g, p) ∈ Supp(ν Note that x chosen = (g, p) for some (g, p) ∈ G × P in this case.
Sample p ∈ P according to s g (p, p) + w∈Supp(ν K T k ) s g w (p, p)ν K T k (w), p ∈ P Algorithm 1: Pseudo-code of the Gillespie algorithm used for generating the figures.

Figure 2 .Figure 3 .
Figure 2. Simulations of the evolution of melanoma under T-cell therapy for parameters (3.2) and initial conditions (3.3).The graphs show the number of individuals divided by 200 versus time.Two scenarios are possible for therapy with T-cells of one specificity: (A) T-cells z x survive and the system is attracted to P xyzxw , or (B) T-cells z x die out and the system is attracted to P xy00 .

Figure 6 .Figure 7 .
Figure 6.Comparison of experimental data obtained by Landsberg et al. with simulations for biologically reasonable parameters.The graphs show the diameter of the tumor measured in millimeters versus time in days after tumor initiation: (A) experimental data, (B) simulated data (K = 10 5 and n zx (0) = 0.02).

Figure 9 .
Figure 9. Switch in the in vitro experiments for inhibited cell division and constant concentration of TNF-α.Dashed lines indicate experimental data.

Figure 10 (Figure 10 .
Figure 10.Simulations for rare mutations in combination with fast switching, where the number of individuals divided by 200 is plotted versus time.(A)The mutant phenotype p 2 has a negative initial growth rate but can switch to p 1 which has a positive one.The fitness of the genotype g is positive.(B) The fitness of the mutant genotype g is positive, although each phenotype has a negative initial growth rate.This is possible because an outgoing switch is a loss of a particle for a phenotype, but not for the whole genotype.

. 20 )
This is a positive and concave function of ν K t (g, p) on the interval [0, b(p)/c b (p, p)], see Figure 11.If the population is at equilibrium (without or before therapy), meaning ν K t (g, p) = n(g,p) = (b(p) − d(p))/c b (p, p), then the time until a mutation occurs is approximately exponentially distributed with parameter equal to

Figure 11 .
Figure 11.Initial total mutation rate of the population (g, p).

Figure 12 .
Figure 12.Simulations of mutation events in a population, where competition is acting via birth reduction (parameters are given by (4.22)).The number of individuals divided by 1000 is plotted versus time: Effect for an initial population which is small (A), or at equilibrium (B) or under therapy (C) and (D).