A mechanistic mathematical model of initiation and malignant transformation in sporadic vestibular schwannoma

Background A vestibular schwannoma (VS) is a relatively rare, benign tumour of the eighth cranial nerve, often involving alterations to the gene NF2. Previous mathematical models of schwannoma incidence have not attempted to account for alterations in specific genes, and could not distinguish between nonsense mutations and loss of heterozygosity (LOH). Methods Here, we present a mechanistic approach to modelling initiation and malignant transformation in schwannoma. Each parameter is associated with a specific gene or mechanism operative in Schwann cells, and can be determined by combining incidence data with empirical frequencies of pathogenic variants and LOH. Results This results in new estimates for the base-pair mutation rate u = 4.48 × 10⁻¹⁰ and the rate of LOH r_LOH = 2.03 × 10⁻⁶/yr in Schwann cells. In addition to new parameter estimates, we extend the approach to estimate the risk of both spontaneous and radiation-induced malignant transformation. Discussion We conclude that radiotherapy is likely to have a negligible excess risk of malignancy for sporadic VS, with the possible exception of rapidly growing tumours.


Appendices

A Series solution of sporadic incidence model
Equation (1) can be expressed explicitly in matrix form, with the matrix M in (1) encoding the dependency structure between subpopulations.
The system of differential equations (1) is linear, and solutions can be found by a wide variety of well-studied methods. Exact symbolic solutions are available but not very useful in our case. The evolutionary dynamics are neutral, with no exponential clonal expansion during initiation. Furthermore, we are interested only in relatively early times t compared to r_LOH⁻¹ ≈ 300,000 years. Our initial conditions are also not on a singular point [94,95]. These facts make a power series solution appropriate. This also makes the similarity of the three-hit model to the classic Armitage-Doll curve very clear [4,23].
We therefore seek power-series solutions P_m(t) = Σ_n A_{m,n} tⁿ for each subpopulation m (see figure 2 and system (1)). The A_{m,n} coefficients can then be found recursively [94].
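Because system (1) is linear, dP/dt = M P, the recursion for the series coefficients is simply A_{n+1} = M A_n / (n + 1). The idea can be sketched as follows; the lower-triangular rate matrix here is an arbitrary placeholder with illustrative magnitudes, not the actual matrix of system (1):

```python
import numpy as np

def series_coefficients(M, P0, order):
    """Coefficients A[n] of the power-series solution P(t) = sum_n A[n] t^n
    of the linear system dP/dt = M P with P(0) = P0.
    Recursion: A[n+1] = M @ A[n] / (n + 1)."""
    A = [np.asarray(P0, dtype=float)]
    for n in range(order):
        A.append(M @ A[n] / (n + 1))
    return A

def truncated_solution(M, P0, t, order=3):
    """Evaluate the power series truncated at the t**order term."""
    return sum(a * t**n for n, a in enumerate(series_coefficients(M, P0, order)))

# Placeholder 3x3 rate matrix (NOT the matrix of system (1)); columns sum
# to zero, so total population is conserved.
M = np.array([[-1e-4, 0.0,   0.0],
              [ 1e-4, -2e-6, 0.0],
              [ 0.0,   2e-6, 0.0]])
P0 = np.array([1.0, 0.0, 0.0])   # all cells initially unmutated
print(truncated_solution(M, P0, t=50.0, order=3))
```

For early times t relative to the slowest rate, the low-order truncation is already accurate, which is the regime the text describes.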
The resulting solutions, truncated at the third term, give explicit expressions for P_1, P_2 and P_3. It is also clear that f_LOH and f_SMARCB1 are only approximately independent of patient age: the approximation is accurate up to a correction of order O(r_LOH t), so any age-dependence should only show up on a timescale of r_LOH⁻¹ ≈ 300,000 years. This is far in excess of a typical human lifetime, so in any real empirical study any association can be expected to be unobservable. This assumes that these alterations are selectively neutral. If this is not true, then the relative frequency of these alterations may change over time, indicating selection [81]. It may also be the case that the model ceases to be accurate over timescales shorter than r_LOH⁻¹: without knowledge of the selection coefficients of the relevant genes, we cannot comment further.

B Statistical methods: bootstrapping and regularisation
Bootstrapping consists of drawing new samples at random from a dataset, with replacement [47,48]. Our measures of relative frequency come from datasets with n samples, k of which are positive results. When one sample is drawn uniformly at random, there is a probability p that this draw will be positive.
Bootstrapping n samples with this probability p will result in a "sample" value k̂. Because this sampling process consists of n independent draws with replacement, the resulting sample value k̂ will be binomially distributed, k̂ ∼ Binomial(n, p). These draws can therefore be simulated by generating binomially distributed random numbers, given an estimate of the parameter p.
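This simulation step can be sketched directly with NumPy; the sample counts n and k below are illustrative placeholders, not the study's data:

```python
import numpy as np

rng = np.random.default_rng(seed=1)

n, k = 46, 3        # illustrative: n samples, k of which are positive
p = k / n           # point estimate of the positive-draw probability

# Each bootstrap replicate is one draw k_hat ~ Binomial(n, p), which is
# equivalent to resampling n data points with replacement and counting
# the positives.
k_hat = rng.binomial(n=n, p=p, size=10_000)
f_hat = k_hat / n   # resampled relative frequencies

print(f_hat.mean(), f_hat.std())
```

The spread of the resampled frequencies f_hat then reflects the sampling uncertainty in the original experiment.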
Binomial random variables were generated with Python's NumPy library and substituted for f_LOH or f_SMARCB1 in formulae (13) and (14). Repeating this process for many values of f̂_LOH and f̂_SMARCB1 allowed us (in theory) to generate distributions for n_GFX and r_LOH that reflect the uncertainty due to the small sample size of the underlying experiment.
However, a problem arises if k̂ is 0 or n: when f̂_SMARCB1 = 0 or f̂_LOH = 1, equations (13) and (14) are singular. This can make distribution means, variances, and even some quantiles ill-defined. To address this, we used a Bayesian estimator for f̂ given k̂ and n [42,96], with a Beta(α, β) prior for some α = β > 0. The exact choice of the constant α = β was not found to have a strong effect on our estimates or computed confidence intervals: we chose the Jeffreys prior, α = β = 1/2, so that f̂ = (k̂ + 1/2)/(n + 1). This gives a rigorous justification for additive smoothing: it ensures that our prior distribution is uninformative, and regularises the sample estimator f̂, making it well-behaved when k̂ is 0 or n [42]. Intuitively, this can be thought of as adding one additional datum with a value of 1/2 to the resampled dataset. It should be noted that we already apply additive smoothing with α = 1/2 to our estimate of the parameter p in 2.1.4, before the bootstrapping procedure. This avoids the case p = 0, for which the results would be meaningless: it is impossible to meaningfully resample data with no events. Ours is therefore not a true bootstrapping procedure, in which we would resample the underlying data, but a simulation of a bootstrapping procedure applied to a regularised dataset: namely, a dataset equivalent to the original with one extra datum of value 1/2 added. It is easily seen that this procedure becomes equivalent to bootstrapping in the limit α → 0.
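With a Beta(α, α) prior, the posterior-mean estimator is (k̂ + α)/(n + 2α), which stays strictly inside (0, 1) even when k̂ = 0 or k̂ = n. A minimal sketch (the counts are illustrative placeholders):

```python
def smoothed_frequency(k, n, alpha=0.5):
    """Posterior-mean estimate of a relative frequency under a
    Beta(alpha, alpha) prior; alpha = 0.5 is the Jeffreys prior.
    With alpha = 0.5 this is equivalent to adding one extra datum
    with value 1/2 to the dataset."""
    return (k + alpha) / (n + 2 * alpha)

# Well-behaved at the boundaries, unlike the raw estimator k / n:
print(smoothed_frequency(0, 46))   # > 0 even with no positive results
print(smoothed_frequency(46, 46))  # < 1 even with all-positive results
```

As α → 0 the estimator reduces to the raw proportion k/n, recovering the unregularised bootstrap.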
We also considered and rejected the use of an informative prior. Because the regularisation procedure is applied both before and after resampling, an uninformative prior should be used in both cases, so as not to introduce inappropriate bias. From a Bayesian perspective, while there are good reasons to believe that f_LOH and f_SMARCB1 are neither exactly zero nor exactly one, there is no objective reason to pick a particular value otherwise.
The choice of prior is easily adjusted to see what effect it has on our conclusions: our computed confidence intervals are relatively robust. The Python code that implements the bootstrapping procedure has been included in the supplemental material.

C Parameter sensitivity
The parameters of the model and the experimental results (consisting of the epidemiological data of Evans et al. 2005, the LOH(22q) data of M. Carlson et al. 2018, and the present work's results on SMARCB1) can be gathered into two subsets: the parameters that can be determined or measured directly, the "inputs" X = {A, N_0, b, f_LOH, f_SMARCB1}; and the parameters that we predict theoretically, the "outputs" Y = {n_GFX, r_LOH, u}. Estimating parameter sensitivity for non-linear models in which the variables have no explicit form is an open research topic [97,98]. In our case, the model is a system of linear ordinary differential equations, and explicit forms are available for all parameter dependencies. As such, the change in the theoretical outputs with respect to changes in the inputs is straightforwardly represented by the Jacobian J of the outputs with respect to the inputs [46].
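An analytical Jacobian of this kind can be cross-checked numerically by central finite differences. The sketch below uses hypothetical placeholder expressions for the three outputs, chosen only to illustrate the structural zeros discussed next; they are NOT the paper's formulae for n_GFX, r_LOH and u:

```python
import numpy as np

def jacobian(f, x, eps=1e-6):
    """Central-difference Jacobian J[i, j] = d f_i / d x_j of a vector
    function f at the point x. A numerical cross-check for an
    analytical Jacobian."""
    x = np.asarray(x, dtype=float)
    fx = np.asarray(f(x))
    J = np.zeros((fx.size, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps
        J[:, j] = (np.asarray(f(x + step)) - np.asarray(f(x - step))) / (2 * eps)
    return J

# Hypothetical placeholder outputs over inputs x = (A, N0, b, f_LOH, f_SMARCB1).
def outputs(x):
    A, N0, b, f_loh, f_sm = x
    return [f_sm / f_loh,                  # stand-in for n_GFX: no A, N0, b dependence
            A * f_loh,                     # stand-in for r_LOH: no b dependence
            A * b / (N0 * f_loh * f_sm)]   # stand-in for u: depends on everything

J = jacobian(outputs, [2.0, 1e7, 0.1, 0.4, 0.25])
print(np.round(J, 6))
```

Entries that are exactly zero in the analytical Jacobian come out as (numerically) zero columns here, which is a quick way to verify the insensitivity claims below against any explicit parameter expressions.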
Using the explicit expression for u, the Jacobian J can be computed. Many entries of the analytical Jacobian are exactly zero, which warrants several comments. It is remarkable that n_GFX is completely insensitive to the estimates of A, N_0 and b. This implies that improved measurements of epidemiological data, imaging, or precursor cell dynamics are likely to have no effect on the predicted properties of the underlying hypothetical gene GFX.
It is also remarkable that r_LOH is completely insensitive to changes in the value of b. This suggests that the dynamics of LOH are better constrained by epidemiological data than by microscopic estimates, as r_LOH depends on A but not on b.
Clearly, all outputs appear to be most sensitive to changes in f_LOH and f_SMARCB1, which strongly reinforces our recommendation to increase the precision with which these parameters are measured, namely by a follow-up experiment with a larger sample size.
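The benefit of a larger sample can be gauged from the approximate half-width of a normal-approximation confidence interval for a binomial proportion, z·sqrt(f(1−f)/n). A rough sketch, with an illustrative (not measured) frequency value:

```python
import math

def ci_half_width(f, n, z=1.96):
    """Approximate 95% half-width for a binomial proportion estimate
    (normal approximation); shrinks like 1/sqrt(n)."""
    return z * math.sqrt(f * (1 - f) / n)

f = 0.4  # illustrative relative frequency, not a measured value
for n in (20, 80, 320):
    print(n, round(ci_half_width(f, n), 3))
```

Because the half-width scales as 1/sqrt(n), quadrupling the sample size halves the uncertainty in the measured frequency.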