Ethics

Tissues were collected at the University of Southern California Keck School of Medicine from excess surgical samples taken in the course of routine clinical care, with Institutional Review Board approval. Additional normal colon specimens were obtained from University College London Hospital (UCLH) Cancer Biobank (REC approval: 15/YH/0311).

Methylation array

Crypts or endometrial glands were isolated using an EDTA washout method, as previously described26,67. DNA methylation was measured with EPIC bead arrays (Illumina) using the Restore protocol and the manufacturers’ protocols79. IDAT files were processed using the noob normalization function in the minfi R package80.

Blood methylation data were obtained from the Gene Expression Omnibus (GEO)81,82 using β values as provided. The datasets are GSE40279 (normal blood; Fig. 6a)70, GSE73115 (10-year serial samples; Fig. 6b)83, GSE51759 (myelodysplastic syndromes84), GSE42042 (essential thrombocythemia, polycythemia vera, primary myelofibrosis85), GSE106600 (chronic myeloid leukemia86), GSE105420 (chronic myelomonocytic leukemia87), GSE62298 (AML88) and GSE69229 (ALL89).

RNA expression data for normal tissue derived from 40 individuals were retrieved from TCGA54.

Derivation of fCpG loci

To isolate those CpG sites that behave as FMCs, it was first necessary to filter out those loci that are likely to have a regulatory function or change their methylation status over the length of the crypt. This was done by selecting only those CpG sites that lie in the ‘open sea’ (further than 4 kb from a CpG island). Furthermore, probes of CpG loci that were identified90,91 as being cross-reactive were filtered out, along with CpG loci positioned on sex-determinant chromosomes. Given the relatively low amounts of DNA contained within a single crypt, we also filtered out probes that were likely to have experienced incomplete binding by restricting our analysis to probes that had a total intensity greater than 1,200 (arbitrary units).

The Illumina EPIC array features two different probe types, type I and type II (ref. 91). Type I probes feature a higher dynamic range, leading to the two probe types having different underlying distributions of β values. Due to difficulties in simultaneously modeling the two different probe types, and given that type I probes are overrepresented in CpG-dense regions of the genome, the analysis was restricted to type II probes.

CpG sites with fluctuating methylation were then detected by comparing between-individual to within-individual heterogeneity in methylation value. At fluctuating sites, we expect the average methylation in non-clonal bulk samples to follow a distribution centered on 0.5 (because methylation at the site is uncorrelated between the multiple lineages that make up the bulk sample), whereas in individual clonal samples, the methylation value can take any value between 0 and 1. Thus, to select for fCpG loci, we selected CpG sites that had the highest 5% of variance in β value between individual samples and then filtered these for sites with mean methylation across all samples and individuals of ~0.5 (mean β value between 0.4 and 0.6) (Fig. 2a).
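The two-step selection described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the input layout (a dictionary mapping each CpG identifier to its β values across samples) and the function name `select_fcpg` are assumptions made for the example.

```python
from statistics import mean, pvariance

def select_fcpg(beta, top_fraction=0.05, lo=0.4, hi=0.6):
    """Return identifiers of candidate fCpG loci.

    beta: dict mapping a CpG identifier to a list of beta values,
    one per sample (hypothetical input layout).
    """
    variances = {cpg: pvariance(vals) for cpg, vals in beta.items()}
    # Step 1: keep the loci with the highest variance between samples.
    n_keep = max(1, int(round(top_fraction * len(variances))))
    most_variable = sorted(variances, key=variances.get, reverse=True)[:n_keep]
    # Step 2: of those, keep loci whose mean beta value lies close to 0.5.
    return [cpg for cpg in most_variable if lo <= mean(beta[cpg]) <= hi]
```

A locus that is highly variable but has a mean far from 0.5 is discarded in the second step, as is a locus with a mean of 0.5 but little variance between samples.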

To demonstrate technical accuracy in methylation measurement from the small amounts of DNA in single small intestinal crypts (~400 cells), colon crypts (~2,000 cells) or endometrial glands (~5,000 cells), we identified similar fCpG sites on the X chromosome and compared methylation between male and female individuals. In males, there is only a single copy of the X chromosome; hence, only two modal peaks near 0% and 100% methylation should be present in clonal populations, as opposed to the trimodal distribution observed on autosomes. Consistent with the ability to measure fluctuating methylation in small tissue samples, the X chromosome fluctuating sites exhibited trimodal W-shaped distributions in female colon crypts and bimodal ‘U-shaped’ distributions in male colon crypts (Supplementary Fig. 1d). This observation is supportive of the hypothesis that the methylation distribution of fCpG loci is reflective of that of the most recent recurrent clone rather than varying with cell type or differentiation status.

We compared methylation of fluctuating sites between crypts from the same individual. If fluctuating methylation occurs stochastically and without biological regulation, then each crypt should independently evolve a unique pattern of fCpG site methylation. Intercrypt comparison between crypts within the colon or small intestine, both across the set of crypts sampled from each individual and across crypts from different individuals, showed that fluctuating methylation patterns between crypts were uncorrelated (Supplementary Fig. 2e). There was weak correlation of fluctuating methylation patterns between crypts for younger individuals (age <30 years), but this was lost with advancing age (Supplementary Fig. 2f).

Mathematical model of methylation within the stem cell niche

We developed a stochastic model to describe how the fraction of methylated alleles (β value) in the stem cell niche of a given CpG locus changes over time. This model draws on previous attempts1,75 to model the behavior of the stem cell niche in colonic crypts but with a number of modifications that account for the differences when considering methylation as a lineage tracing marker rather than DNA mutations. Namely, while DNA mutations occur relatively infrequently, allowing for a model that only considers a single mutant population expanding or contracting with reference to a single wild-type population, the relatively high methylation switching rate requires us to consider the potential of multiple clones existing simultaneously. Further, while DNA mutations can be generally regarded as irreversible, the methylation status of a given cell (that is, whether a particular cell is homozygously (de)methylated or heterozygously methylated) can theoretically flip–flop, necessitating a careful consideration regarding the possible ways the overall β value can change.

For this reason, we made the simplifying assumption that the population was well mixed, such that any of the S stem cells can replace any of the other S – 1 stem cells with equal probability and that these replacements occur at a constant rate λ per stem cell. This assumption greatly simplified our analysis, as the system can be fully characterized using just two state variables: k – the number of stem cells containing a single methylated allele, and m – the number of stem cells containing two methylated alleles. The admitted states are constrained by the inequality \(0 \le k + m \le S\) for a total of \(\frac{1}{2}\left( {S + 1} \right)\left( {S + 2} \right)\) states.

Along with the replacement process, we assumed that a previously unmethylated CpG locus could spontaneously become methylated with a rate μ per year and, conversely, that a previously methylated CpG locus could spontaneously become demethylated with a rate γ per year.

To develop the series of ordinary differential equations that fully determine the system, we considered the ways in which a state (k,m) could transition to a state \(\left( {k^\prime ,m^\prime } \right)\). As an example, if we consider Fig. 1c, we observe that of the S = 5 stem cells, 3 of the stem cells are heterozygously methylated, and 1 of the cells is homozygously methylated; hence, the system is initially in state (k = 3,m = 1). To transition to state \(\left( {k^\prime = 3,m^\prime = 2} \right)\), the homozygously methylated stem cell must clonally expand, replacing the homozygously demethylated cell. The rate at which any one of the stem cells replaces another is λS = 5λ, but of the S(S – 1) = 20 possible transitions, only 1 would lead to the desired (3,2) state; hence, the rate at which the system transitions \(\left( {3,1} \right) \to \left( {3,2} \right)\) is \(\frac{1}{{20}} \ast 5\lambda = \frac{1}{4}\lambda\). We continue this process (Supplementary Information) considering the general transition \(\left( {k,m} \right) \to \left( {k^\prime ,m^\prime } \right)\), deriving the following master equation:

$$\begin{array}{l}\frac{{d{{{\mathrm{P}}}}\left( {k,m|\lambda ,\mu ,\gamma ;t} \right)}}{{dt}} = \left( {S - m - \left( {k - 1} \right)} \right)\left( {\left( {k - 1} \right)\frac{\lambda }{{S - 1}} + 2\mu } \right){{{\mathrm{P}}}}\left( {k - 1,m|\lambda ,\mu ,\gamma ;t} \right)\\ + \left( {m - 1} \right)\left( {S - \left( {m - 1} \right) - k} \right)\frac{\lambda }{{S - 1}}{{{\mathrm{P}}}}\left( {k,m - 1|\lambda ,\mu ,\gamma ;t} \right)\\ + \left( {k + 1} \right)\left( {\left( {m - 1} \right)\frac{\lambda }{{S - 1}} + \mu } \right){{{\mathrm{P}}}}\left( {k + 1,m - 1|\lambda ,\mu ,\gamma ;t} \right)\\ + \left( {k + 1} \right)\left( {\left( {S - m - \left( {k + 1} \right)} \right)\frac{\lambda }{{S - 1}} + \gamma } \right){{{\mathrm{P}}}}\left( {k + 1,m|\lambda ,\mu ,\gamma ;t} \right)\\ + \left( {m + 1} \right)\left( {S - \left( {m + 1} \right) - k} \right)\frac{\lambda }{{S - 1}}{{{\mathrm{P}}}}\left( {k,m + 1|\lambda ,\mu ,\gamma ;t} \right)\\ + \left( {m + 1} \right)\left( {\left( {k - 1} \right)\frac{\lambda }{{S - 1}} + 2\gamma } \right){{{\mathrm{P}}}}\left( {k - 1,m + 1|\lambda ,\mu ,\gamma ;t} \right)\\ - \left( {2\left( {k\left( {S - k} \right) + m\left( {S - k - m} \right)} \right)\frac{\lambda }{{S - 1}} + \left( {2S - \left( {k + 2m} \right)} \right)\mu + \left( {k + 2m} \right)\gamma } \right)\\{{{\mathrm{P}}}}\left( {k,m|\lambda ,\mu ,\gamma ;t} \right)\end{array}$$

This linear series of differential equations can be solved computationally by rewriting the equations into a matrix equation, \(\frac{{d\mathop{P}\limits^{\rightharpoonup} \left( t \right)}}{{dt}} = {\it{T}}\mathop{P}\limits^{\rightharpoonup} \left( t \right)\) and applying matrix exponentiation to the resulting transition matrix T.

$$\mathop{P}\limits^{\rightharpoonup} \left( t \right) = e^{t{\it{T}}}\mathop{P}\limits^{\rightharpoonup} \left( {t = 0} \right)$$
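As an illustration of how the master equation can be evaluated numerically, the sketch below enumerates the admissible (k,m) states, lists the six transitions leaving each state and propagates an initial distribution to time t. It is a minimal sketch rather than the published implementation: the function names are hypothetical, S ≥ 2 is assumed, and uniformization (a Poisson-weighted series that equals the matrix exponential) is used in place of a library matrix-exponential routine so that the example needs only the standard library.

```python
import math

def states(S):
    """All admissible (k, m) states with 0 <= k + m <= S."""
    return [(k, m) for m in range(S + 1) for k in range(S + 1 - m)]

def outgoing(k, m, S, lam, mu, gamma):
    """Rates of every transition leaving state (k, m), keyed by target state."""
    u = S - k - m        # homozygously unmethylated stem cells
    r = lam / (S - 1)    # rate at which one given cell replaces another given cell
    rates = {
        (k + 1, m): u * (k * r + 2 * mu),         # unmethylated -> heterozygous
        (k, m + 1): u * m * r,                    # unmethylated -> homozygous
        (k - 1, m + 1): k * (m * r + mu),         # heterozygous -> homozygous
        (k - 1, m): k * (u * r + gamma),          # heterozygous -> unmethylated
        (k, m - 1): m * u * r,                    # homozygous -> unmethylated
        (k + 1, m - 1): m * (k * r + 2 * gamma),  # homozygous -> heterozygous
    }
    return {s: v for s, v in rates.items() if v > 0}

def propagate(S, lam, mu, gamma, t, p0, tol=1e-12, max_terms=100000):
    """Approximate P(t) = exp(tT) P(0) by uniformization."""
    out = {s: outgoing(s[0], s[1], S, lam, mu, gamma) for s in states(S)}
    exit_rate = {s: sum(v.values()) for s, v in out.items()}
    q = max(exit_rate.values())               # uniformization rate
    term = {s: p0.get(s, 0.0) for s in out}   # (I + T/q)^n P(0), starting at n = 0
    weight = math.exp(-q * t)                 # Poisson(n; q t), starting at n = 0
    covered = weight                          # total Poisson mass summed so far
    result = {s: weight * p for s, p in term.items()}
    for n in range(1, max_terms + 1):
        if 1.0 - covered < tol:
            break
        # Apply one step of the embedded discrete-time chain I + T/q.
        nxt = {s: p * (1.0 - exit_rate[s] / q) for s, p in term.items()}
        for s, p in term.items():
            for target, rate in out[s].items():
                nxt[target] += p * rate / q
        term = nxt
        weight *= q * t / n
        covered += weight
        for s in result:
            result[s] += weight * term[s]
    return result
```

For S = 5 and λ = 1, `outgoing(3, 1, 5, 1.0, 0.0, 0.0)` assigns the transition (3,1) → (3,2) a rate of λ/4, matching the worked example in the text. The summed outgoing rate of each state corresponds to the negative term of the master equation.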

During the very early stages of embryogenesis, the existing methylation patterns inherited from parental gametes are largely erased before a large wave of de novo methylation remodels the entire genome, resulting in a bimodal methylation distribution92. Given that all the stem cells within a niche are initially clonal, we thus assumed that it was equally likely to find a given fCpG locus as homozygously methylated or unmethylated across all the stem cells within the niche at time 0. Further study is necessary to ensure the validity of this assumption.

$${{{\mathrm{P}}}}\left( {k,m|\lambda ,{\it{\upmu }},\gamma ;t = 0} \right) = \left\{ {\begin{array}{*{20}{c}} {0.5} & {{\textrm{if}}} & k & = & {0 \wedge m} & = & S \\ {0.5} & {{\textrm{if}}} & k & = & {0 \wedge m} & = & 0 \\ 0 & {{\textrm{otherwise}}} & {} & {} & {} & {} & {} \end{array}} \right.$$

However, the methylation status of individual cells is not available using methylation arrays; hence, the hidden states must be marginalized over to calculate the probability of there being z methylated copies within the stem cell niche (note that \(0 \le z \le 2S\)). This can be achieved by summing the various combinations of k and m states that satisfy the equation z = k + 2m.

$${{{\mathrm{P}}}}\left( {z|\lambda ,\mu ,\gamma ;t} \right) = \mathop {\sum }\limits_{m = 0}^S \mathop {\sum }\limits_{k = 0}^{S - m} {{{\mathrm{P}}}}\left( {k,m|\lambda ,\mu ,\gamma ;t} \right)\delta _{k + 2m,z}$$
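The marginalization above amounts to accumulating each P(k,m) into the bin z = k + 2m. A minimal sketch, assuming the joint distribution is stored as a dictionary keyed by (k,m) (a layout chosen for this example only):

```python
def marginalize(p_km, S):
    """Collapse P(k, m) onto z = k + 2m, the number of methylated alleles."""
    p_z = [0.0] * (2 * S + 1)          # z runs from 0 to 2S inclusive
    for (k, m), p in p_km.items():
        if k < 0 or m < 0 or k + m > S:
            raise ValueError(f"state {(k, m)} is not admissible for S={S}")
        p_z[k + 2 * m] += p            # the Kronecker delta picks out one bin
    return p_z
```

Several hidden states can map to the same z (for example, (2,0) and (0,1) both give z = 2), which is why the array cannot distinguish them.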

The resulting distribution of \({{{\mathrm{P}}}}\left( {z|\lambda ,\mu ,\gamma ;t} \right)\) can qualitatively reproduce the characteristic W shape exhibited in the methylation fraction of individual crypts.

Error model

The probability distribution calculated above, \({{{\mathrm{P}}}}\left( {z|\lambda ,\mu ,\gamma ;t} \right)\), gives the probability that exactly z of the 2S alleles (across S stem cells) are methylated at a particular CpG locus; however, the Illumina EPIC array quantifies the methylation level at specific loci aggregated over the whole crypt. As such, we introduced an error model to link the measured β value with the ‘true’ z value at a specific site. We chose to model the measured β values as a mixture of β-distributed random variables, one for each possible value of z, each with a mean value determined by z and a scale parameter \(\kappa _z\).

To account for the background noise of the array, the mean value of each β peak was set to be equal to a linear transform of z: \(x = \left( {{\it{\epsilon }} - {{{\mathrm{{\Delta}}}}}} \right)\frac{z}{{2S}} + {{{\mathrm{{\Delta}}}}}\), with the parameters describing this transform (\({\epsilon }\) and Δ) to be inferred. The scale parameters (sometimes referred to as the sample size), \(\vec \kappa\), of each β peak were modeled as hierarchical, with each \(\kappa _z\) being drawn from a lognormal distribution parameterized in terms of the population mean, θ, and its standard deviation, σ. These hyperparameters were also inferred during the Bayesian inference.

Likelihood and prior functions

As rate parameters are naturally positive quantities, λ, μ and γ were constrained to positive real values by defining the prior distributions in terms of positive half-normals with a scale informed by prior literature. Following the finding of Nicholson et al.53 that the replacement rate is approximately 1.3 replacements per stem cell per year, we set the scale of the prior on the replacement rate equal to 1. Similarly, θ and σ were also constrained to positive values using broad half-normal prior distributions, with a scale of 500 and 50, respectively. Previous work has found that methylation fidelity can vary dramatically across the genome, from ~\(10^{-4}\) to \(10^{-2}\), and we took an estimate of \(10^{-3}\) per division as a reasonable scale93. If we assume intestinal stem cells divide every ~3 d and we consider that our definition of μ, γ is in units of (per allele per year), this corresponds to a (de)methylation rate of ~0.05. We note that the inference is relatively insensitive to the exact choice of prior on the (de)methylation rate (Supplementary Fig. 7b,c). The lognormal hierarchical prior distribution naturally constrains \(\vec \kappa\) to positive real values. The ‘offsets’ in the linear transform, Δ and \({\it{\epsilon }}\), were constrained to lie between 0 and 1 by placing a β distribution on each parameter, such that the mean prior value was 0.05 and 0.95, respectively.

The behavior of individual CpG loci was assumed to be independent, such that the likelihood of all N = 1,794 CpG loci was simply the product of the per-CpG likelihood computed according to the mathematical model outlined above.

Likelihood:

$$x = \left( {{\it{\epsilon }} - {{{\mathrm{{\Delta}}}}}} \right)\frac{z}{{2S}} + {{{\mathrm{{\Delta}}}}}$$

$${{{\mathrm{P}}}}\left( {\beta _i|z,{{{\mathrm{{\Delta}}}}},{\it{\epsilon }},\kappa _z} \right) = \frac{{\beta _i^{\kappa _zx - 1}\left( {1 - \beta _i} \right)^{\kappa _z\left( {1 - x} \right) - 1}}}{{{{{\mathrm{B}}}}\left( {\kappa _zx,\kappa _z\left( {1 - x} \right)} \right)}}$$

$${{{\mathcal{L}}}}\left( {\lambda ,\mu ,\gamma ,{{{\mathrm{{\Delta}}}}},{\it{\epsilon }},\vec \kappa ,S|\vec \beta } \right) = \mathop {\prod }\limits_{i = 1}^N \mathop {\sum }\limits_{z = 0}^{2S} {{{\mathrm{P}}}}\left( {\beta _i|z,{{{\mathrm{{\Delta}}}}},{\it{\epsilon }},\kappa _z} \right){{{\mathrm{P}}}}\left( {z|\lambda ,\mu ,\gamma ;t} \right)$$
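The likelihood above can be evaluated in log space to avoid numerical underflow when multiplying over many loci. The sketch below is illustrative only: the function names are hypothetical, the distribution P(z) is assumed to be supplied as a list indexed by z, and the measured β values are assumed to lie strictly between 0 and 1.

```python
import math

def beta_logpdf(b, mean, kappa):
    """Log density of a beta distribution with the given mean and scale kappa."""
    a, c = kappa * mean, kappa * (1.0 - mean)
    log_norm = math.lgamma(a) + math.lgamma(c) - math.lgamma(a + c)  # log B(a, c)
    return (a - 1.0) * math.log(b) + (c - 1.0) * math.log1p(-b) - log_norm

def log_likelihood(betas, p_z, S, delta, eps, kappa):
    """Sum over loci of log sum_z P(beta_i | z) P(z)."""
    total = 0.0
    for b in betas:
        terms = []
        for z, pz in enumerate(p_z):
            if pz <= 0.0:
                continue                                # empty mixture component
            x = (eps - delta) * z / (2 * S) + delta     # peak mean for this z
            terms.append(math.log(pz) + beta_logpdf(b, x, kappa[z]))
        top = max(terms)                                # log-sum-exp for stability
        total += top + math.log(sum(math.exp(v - top) for v in terms))
    return total
```

Because loci are treated as independent, the log-likelihood of a set of β values is the sum of the per-locus contributions.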

Hyperpriors:

$$\theta \sim {{{\mathrm{halfnormal}}}}\left( {500} \right)$$

$$\sigma \sim {{{\mathrm{halfnormal}}}}\left( {50} \right)$$

Priors:

$$\lambda \sim {{{\mathrm{halfnormal}}}}\left( {1.0} \right)$$

$$\mu \sim {{{\mathrm{halfnormal}}}}\left( {0.05} \right)$$

$$\gamma \sim {{{\mathrm{halfnormal}}}}\left( {0.05} \right)$$

$${{{\mathrm{{\Delta}}}}}\sim {{{\mathrm{\beta}}}}\left( {5,95} \right)$$

$${\it{\epsilon }}\sim {{{\mathrm{\beta}}}}\left( {95,5} \right)$$

$$\kappa _z\sim {{{\mathrm{lognormal}}}}\left( {\ln \left( {\frac{{\theta ^2}}{{\sqrt {\theta ^2 + \sigma ^2} }}} \right),\sqrt {\ln \left( {1 + \frac{{\sigma ^2}}{{\theta ^2}}} \right)} } \right)$$
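The two arguments of the lognormal prior are the log-scale location and spread that give the distribution a mean of θ and a standard deviation of σ on the natural scale. The short check below, with hypothetical function names, converts (θ, σ) to the log-scale parameters and recovers the moments from the standard lognormal formulas.

```python
import math

def lognormal_params(theta, sigma):
    """Log-scale (location, spread) for a lognormal with mean theta and s.d. sigma."""
    mu_log = math.log(theta**2 / math.sqrt(theta**2 + sigma**2))
    sd_log = math.sqrt(math.log(1.0 + sigma**2 / theta**2))
    return mu_log, sd_log

def lognormal_moments(mu_log, sd_log):
    """Mean and standard deviation of a lognormal given its log-scale parameters."""
    mean = math.exp(mu_log + sd_log**2 / 2.0)
    var = (math.exp(sd_log**2) - 1.0) * math.exp(2.0 * mu_log + sd_log**2)
    return mean, math.sqrt(var)
```

Passing the output of `lognormal_params(500, 50)` to `lognormal_moments` returns a mean of 500 and a standard deviation of 50 up to floating-point error, using the hyperprior scales as example values.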

Bayesian inference

A Bayesian inference methodology was developed to infer the biological model parameters (number of stem cells within the stem cell niche (S), replacement rate per stem cell per year (λ), and methylation (μ) and demethylation (γ) rate per CpG locus per stem cell per year) directly from the distribution of FMC β values for each crypt.

Investigation of simulated datasets revealed that the resulting posterior distributions were multimodal, with each S value associated with a local maximum (due to the correlation in the posterior between S and λ). This multimodality can make the posterior difficult to explore using traditional Markov chain Monte Carlo techniques, such as Hamiltonian Monte Carlo. To overcome this, a nested sampling method94 was developed to calculate the Bayesian evidence (marginal probability density, \({{{\mathcal{Z}}}}\)) of each S value considered (\(S \in \left[ {3..20} \right]\)) while simultaneously generating samples from the posterior associated with each value of S. The probability of S for a given crypt can then be calculated as

$${{{\mathrm{P}}}}\left( {S|\vec \beta } \right) = \frac{{{{{\mathcal{Z}}}}\left( {S|\vec \beta } \right)}}{{\mathop {\sum }\nolimits_j {{{\mathcal{Z}}}}\left( {S_j|\vec \beta } \right)}}$$

The full posterior can be approximated by drawing samples from each S mode with a weight equal to the inferred probability of S. The nested sampling was performed using dynesty95, a Python implementation of the nested sampling algorithm, using the ‘rwalk’ sampling option, such that new live points are generated from existing live points under random walk behavior.
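The evidence-weighted combination across S values can be sketched as below. This is not taken from the published code: the function names are hypothetical, the log-evidences are assumed to have been extracted from each nested sampling run beforehand, and the posterior samples for each S are assumed to have already been resampled to equal weight.

```python
import math
import random

def model_probabilities(log_evidence):
    """Normalize log-evidences {S: log Z} into probabilities P(S | data)."""
    top = max(log_evidence.values())      # subtract the maximum to avoid underflow
    unnorm = {S: math.exp(lz - top) for S, lz in log_evidence.items()}
    total = sum(unnorm.values())
    return {S: u / total for S, u in unnorm.items()}

def mix_posteriors(samples_by_S, probs, n, rng=None):
    """Draw n (S, sample) pairs, choosing S with probability P(S | data)."""
    rng = rng or random.Random()
    S_values = list(probs)
    picks = rng.choices(S_values, weights=[probs[S] for S in S_values], k=n)
    # Within the chosen S, every stored posterior sample is equally likely.
    return [(S, rng.choice(samples_by_S[S])) for S in picks]
```

Working with log-evidences matters in practice because the evidences themselves are typically far too small to represent directly as floating-point numbers.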

To ensure that the posterior samples had converged to the equilibrium distribution, four independent sampling runs were performed, each with random initialization, and the rank-normalized potential scale reduction statistic (\(\hat R\)) was calculated96,97. \(\hat R\) was found to be less than 1.1 (a typical threshold used to determine convergence) in all cases. The inference code can be obtained from https://github.com/CalumGabbutt/flipflop.git (ref. 56).

Impact of simplifying assumptions

Our mathematical model of intestinal stem cell niche dynamics inevitably rested on a number of simplifying assumptions. We investigated the impact of these assumptions.

First, we assumed a well-mixed population. This differed from previous prominent modeling approaches, foremost the approach of Lopez-Garcia et al.1 who assumed that stem cells were organized in a ring geometry where replacement could only happen between neighboring cells on the ring (Supplementary Fig. 3a and Supplementary Information). We used stochastic simulation to explore the effect of a well-mixed versus ring geometry. Across biologically relevant numbers of stem cells (\(S \lesssim 10\)), the differing geometry was found to have a negligible effect on the resulting fCpG methylation distribution (Supplementary Fig. 3b). We performed statistical inference upon these simulations (using the inference framework that makes the well-mixed assumption) and were able to accurately recover the known parameters (Supplementary Fig. 3c). We note that Lopez-Garcia et al.’s model only needed to consider the clonal expansion or retraction of a single labeled clone, whereas our model had to account for the possibility of multiple labeled clones due to the increased mutation rate of the epigenome; hence, the well-mixed assumption was chosen to minimize mathematical complexity. Further, we note that live-imaging data from mouse crypts69 show that murine stem cells can exchange places within the niche, suggesting that the stem cell population may be neither strictly ring-like nor well mixed but rather a hybrid model between the two extremes.
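A well-mixed stochastic simulation of a single CpG locus can be sketched with the Gillespie algorithm as follows. This is a minimal illustration under the assumptions stated in the model section (replacement at rate λ per stem cell, methylation at rate μ and demethylation at rate γ per allele, clonal initial condition); the function name and data layout are hypothetical, and the ring geometry is not implemented here.

```python
import random

def gillespie_niche(S, lam, mu, gamma, t_end, rng):
    """Simulate one CpG locus in a well-mixed niche; return z at time t_end.

    Each cell is stored as its number of methylated alleles (0, 1 or 2).
    """
    cells = [rng.choice([0, 2])] * S       # clonal start: all 0 or all 2
    t = 0.0
    while True:
        z = sum(cells)                     # methylated alleles in the niche
        rates = [lam * S, mu * (2 * S - z), gamma * z]
        t += rng.expovariate(sum(rates))   # waiting time to the next event
        if t > t_end:
            return z
        event = rng.choices(range(3), weights=rates)[0]
        if event == 0:
            # Replacement: a random cell copies itself over a different cell.
            src, dst = rng.sample(range(S), 2)
            cells[dst] = cells[src]
        elif event == 1:
            # Methylation: pick an unmethylated allele uniformly at random.
            i = rng.choices(range(S), weights=[2 - c for c in cells])[0]
            cells[i] += 1
        else:
            # Demethylation: pick a methylated allele uniformly at random.
            i = rng.choices(range(S), weights=cells)[0]
            cells[i] -= 1
```

Repeating the simulation over many independent loci and collecting z/2S gives an empirical methylation distribution that can be compared with the analytic P(z). With μ = γ = 0, the niche stays clonal, so z remains at 0 or 2S.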

Second, we neglected genetic ‘linkage’ between different CpG loci (each cell carries a set of linked CpGs) to limit mathematical complexity. We explored the effect of linkage using the same well-mixed Gillespie simulations as above and found that the mean methylation per peak of the individual crypts simulated with linkage exactly matches that of the analytic probability distribution that we derived but that the individual crypts exhibit a greater degree of variability than that predicted by sampling from the analytic model (Supplementary Fig. 6). Consequently, credible intervals of the posterior inferred with our non-linkage inference method will be marginally too narrow.

Third, we assumed that all of the fCpG loci that we had identified as fluctuating were not under selection or active regulation. We explored the consequence of a fraction of CpG sites not behaving in a fluctuating manner on the accuracy of the inference (Supplementary Information and Supplementary Fig. 4). Including non-fluctuating sites caused a systematic underestimation of the replacement rate, but when the number of non-fluctuating sites was sufficiently low (≲5%), the number of stem cells and the replacement rate could still be accurately inferred.

Finally, we assumed that the replacement rate, methylation rate and demethylation rate are constant over an individual’s lifetime. While previous research suggests that the stem cell division rate lowers over an individual’s lifetime98, and our findings are consistent with such a decrease, it is likely that both the replacement rate and the methylation error rate are proportional to the cell division rate, such that the ratio of the two rates does not change over time. In this way, our model describes the stem cell dynamics of an individual crypt averaged over an individual’s lifetime.

Tissue-specific differences in stem cell dynamics

To compare the stem cell dynamics of different tissue and disease types in a statistically rigorous manner, we must account for the hierarchical individual structure (that is, we have multiple glands from each individual that are likely to be correlated) while controlling for the age and sex of each individual. We developed a hierarchical Bayesian GLM using a log-link function to constrain our dependent variable to be positive (presented fully in the Supplementary Information) and took a hypothesis testing by parameter estimation approach (that is, the difference between small intestine and colon is statistically significant if the 95% equal-tailed credible interval excludes 0).

Spatial model of the crypt

Ignoring villi in the small intestine, a crypt forms a cylindrical geometry with stem cells at the base and a crypt wall whose cells move up the crypt. Here, we developed an off-lattice mechanistic agent-based model of the human crypt using the hybrid automata library (HAL) modeling framework99, capable of representing a crypt of the small intestine or colon (Fig. 3d). The cylindrical unit is separated into two compartments: the stem cell compartment, represented as a pool at the base of the crypt, and the wall of the crypt, where transit amplifying cells are pushed upward until they are removed from the top of the crypt. The spatial model of the crypt is dynamic in the sense that the x and y dimensions are calculated using the total population size (\(N_T\)) and the stem cell pool radius (ψ). The x dimension is defined as x = 2πψ. The center of the circular stem cell pool, whose size determines the number of stem cells allowed within the pool, is placed at (h,k), where h = x/2 and k = ψ + 5. Division for each stem cell is defined by \(\rho _c\), the cell cycle length in hours, which is randomly assigned as \(\rho _c\sim U\left( {\rho _{min},\rho _{max}} \right)\), where \(\rho _{min}\) and \(\rho _{max}\) are ρ ± 4 h.

As a cell approaches \(\rho _c\), the cell’s diameter doubles over the 5 h (time steps) preceding the cell’s division. Following division, both daughter cells occupy this space. When a stem cell (defined by \(d\left( {x_c,y_c} \right) \le \psi\) where \(d\left( {x_c,y_c} \right) = \sqrt {\left( {x_c - h} \right)^2 + \left( {y_c - k} \right)^2}\)) divides, the daughter cells can be placed in any arrangement around the parent cell’s \(x_c\) and \(y_c\) position; differentiated cells can only be placed vertically (that is, the \(x_c\) values are equal). The base of the crypt wall is set just above the origin of the stem cell pool plus ψ and a small offset to provide space so that no cell forces interact between the stem cell pool and the base of the crypt wall. If \(d\left( {x_c,y_c} \right) > \psi\), then the cell is moved to the base of the crypt wall. In the cell’s new position \(\left( {x_2,y_2} \right)\), \(y_2\) is the base of the crypt wall, and \(x_2\) is given by the cell’s exit angle in radians, \(rad_s\), given by \(atan2\left( {y_c,x_c} \right)\), so that the cell’s position along the x dimension is \(x_2 = \left( {rad_s + \pi } \right)\left( {\frac{x}{{2\pi }}} \right)\). Boundary conditions for the cells within the crypt wall are periodic (that is, allowed to wrap around) and no flux at the top and bottom of the crypt (that is, no cell can breach these boundaries). A run step in the model is hourly, and updates to cell positions occur for the whole crypt and are applied at each time step. We give each cell 1,794 CpG loci (with the possible status of 0 for demethylated or 1 for methylated). At each division, these loci can switch methylation status at a rate defined by ω.

At each hourly time step, we assume that the forces acting on each individual cell are at equilibrium, \(F_{c_i} = 0\), where \(F_{c_i}\) is equal to the contact force between cell i and its neighbors. For two cells whose radii are \(R_i\) and \(R_j\), respectively, the contact force between them is based on a linear spring constant model (Hooke’s law) and is calculated as

$$F_{c_{ij}} = \left\{ {\begin{array}{*{20}{c}} {k_i\frac{{{{{\mathrm{{\Delta}}}}}R_{ij}}}{{R_i + R_j}}} & {if} & {\frac{{{{{\mathrm{{\Delta}}}}}R_{ij}}}{{R_i + R_j}}} & > & 0 \\ 0 & {if} & {\frac{{{{{\mathrm{{\Delta}}}}}R_{ij}}}{{R_i + R_j}}} & < & 0 \end{array}} \right.$$

Assuming that each cell has the same spring constant k, the overlap of cells (\(\frac{{{{{\mathrm{{\Delta}}}}}R_{ij}}}{{R_i + R_j}}\)) and the overall number of cells in contact with any given cell (\(n_i\)) give the velocity for an individual cell, \(v_i = k\mathop {\sum }\nolimits_{j = 1}^{n_i} \frac{{{{{\mathrm{{\Delta}}}}}R_{ij}}}{{R_i + R_j}}\). The modeling framework can be obtained from https://github.com/MathOnco/flipflopspatialmodel.git (ref. 57).
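The contact-force rule can be sketched as follows. The text does not define ΔR_ij explicitly, so this example assumes it is the overlap distance R_i + R_j minus the distance between the two cell centers; that definition, the tuple layout (x, y, radius) and the function names are assumptions for illustration, and the code is not taken from the HAL-based implementation.

```python
import math

def relative_overlap(ci, cj):
    """Delta R_ij / (R_i + R_j), assuming Delta R_ij = R_i + R_j - center distance."""
    (xi, yi, ri), (xj, yj, rj) = ci, cj
    dist = math.hypot(xi - xj, yi - yj)
    return (ri + rj - dist) / (ri + rj)

def contact_force(ci, cj, k):
    """Linear-spring contact force; zero when the cells do not overlap."""
    rel = relative_overlap(ci, cj)
    return k * rel if rel > 0 else 0.0

def speed(i, cells, k):
    """Sum of contact terms for cell i over all other cells, as in the text."""
    return sum(contact_force(cells[i], c, k) for j, c in enumerate(cells) if j != i)
```

Cells that are not in contact contribute nothing to the sum, so summing over all other cells is equivalent to summing over the neighbors in contact with cell i.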

Inference of stem cell numbers on the spatial model

To provide insights into the FMC signal from a first-principles model of the homeostatic crypt (balanced birth/death with a methylation error rate), we had to add noise to the output data of the spatial model. This is because the inference framework is designed to fit noisy experimental data and because fCpG sites with values of exactly zero or one are not captured within the data. To add a small amount of noise to the perfect methylation distribution output by the spatial model, a binomial is used with two offsets to provide a distribution on which the inference can be performed. For each β value, a sample is drawn from a β distribution with a sample size (κ) of 1,000, using an offset from 0 (Δ = 0.04) and an offset from 1 (\({\it{\epsilon }} = 0.92\)) (Fig. 3d). The script required to add noise to this model accompanies the inference framework (see add_noise.py). Once noise has been added to the β values, the inference framework is executed on the β value distribution of each model simulation across stem cell number ranges of 2 to 9, 3 to 10 and 8 to 15, respectively, using 400 live points for the dynesty sampler95.
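One way to apply this kind of noise is sketched below. It is not the contents of add_noise.py: it assumes that the offsets enter through the same linear transform as in the error model (so that a noiseless value of 0 maps to a mean of Δ and a value of 1 maps to a mean of ε) and that each noisy value is a single draw from a β distribution with that mean and scale κ.

```python
import random

def add_noise(betas, delta=0.04, eps=0.92, kappa=1000, rng=None):
    """Return noisy copies of noiseless methylation fractions in [0, 1]."""
    rng = rng or random.Random()
    noisy = []
    for b in betas:
        mean = (eps - delta) * b + delta    # shift the peak away from 0 and 1
        # Beta distribution with the chosen mean and scale (sample size) kappa.
        noisy.append(rng.betavariate(kappa * mean, kappa * (1.0 - mean)))
    return noisy
```

After this step, no value is exactly 0 or 1, which keeps every simulated β value within the support of the β-distributed error model used in the inference.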

Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.