Introduction

Human embryonic stem cells (hESCs) and induced pluripotent stem cells (iPSCs) have promising clinical applications, including the advancement of cellular therapies, disease modelling and drug development, due to their self renewal potential and the ability to differentiate into specialised cells (pluripotency)1,2. However, continued efforts in understanding the complex behaviour of hESCs and iPSCs are necessary to make their clinical applications a reality.

A typical in-vitro hESC experiment involves the distribution of cells upon a growth material (the ‘seeding’ of cells onto a plate). The seeding density is then the number of cells placed on the growth material per unit area. Cells need to attach to the plate surface, which is covered by Matrigel or similar, for viability and proliferation; however, some cells do not successfully attach and are lost. The hESCs then form colonies by repeated mitosis in which two genetically identical daughter cells are produced from the division of the mother cell. The proliferation of cells in this way results in colonies of tightly packed cells in mono-layers along the growth material. The doubling time of stem cells varies and can be affected by various environmental and chemical factors, including cell density3,4,5.

An important measure of the self-renewal potential of stem cells is the clonality, the condition of being genetically identical. Generating homogeneous populations of clonal cells is of great importance6,7 as clonally derived stem cell lines maintain pluripotency and proliferative potential for prolonged periods8. Some applications require clonal homogeneous populations, e.g. drug discovery9 and iPSCs for personalised medicine. The selection of the best clones for further experimentation needs to be optimised to make clinical applications safe. If the seeding density is high, the migration of cells and the growth of closely-separated cell groups can cause aggregation of colonies; this is undesirable when a homogeneous clonal population with identical genetic composition is required. The seeding density of cells has been shown to not only have an effect on the clonality of stem cells10, but also on their differentiation potential11. Moreover, culturing at an overly high density can cause DNA damage and culture adaptation, leading to increasing occurrence of chromosomal aberrations3,12,13.

Single hESCs are reported to have no effect on each other’s movement if they are greater than 150 μm apart10. It is therefore recommended to keep a minimum distance of 150 μm between colony boundaries throughout growth to assure the resulting clonal structures are from single founding hESCs. In our previous work we considered the kinematics of single and pairs of cells, and identified the occasional super-diffusive movements of cells which could lead to re-aggregation14,15.

The quantification of stem cell behaviour assists in the development of mathematical models which can then be used as non-invasive predictive tools. The development of colonies is inherently stochastic containing many properties which occur with some randomness, such as cell growth, cell division and cell death. An overview of the modelling of cell processes is provided in16. For mammalian cells, colony formation from a single cell has been previously described using a stochastic Galton-Watson branching process model17. Cell proliferation or colony models can either be set on or off a regular or irregular lattice, and each cell can be modelled individually in an agent based model or using continuum models. A thorough summary of these different model types, along with their advantages and disadvantages with a view to tissue mechanics is provided in18.

Here we focus on quantifying the possibilities of re-aggregation due to the physical proximity of colonies at different seeding densities and consider the optimal seeding densities to form clonal structures. From experimental observations of colonies after a fixed evolution time, we infer a stochastic model for colony growth, a method previously applied to other cell types, including bacterial and cancerous cells19,20. To correctly initialize the model for a given seeding density, we take into account the proportion of seeded cells which begin as isolated cells, pairs, triples, and so on. We find that this is essential to capture the experimental observations. We use the model to simulate hESC colony growth at different seeding densities, consider the area coverage with time and calculate the critical time at which the homogeneous colonies begin to merge. These results can help inform cell seeding decisions to form homogeneous colonies from single founder cells.

Results

Notation

The notation used throughout the manuscript is outlined here for convenience. The number of cells in a colony at a given time, t, is N(t), with t always in hours. Therefore the number of cells at 72 h is N(72). The initial number of cells at time t = 0 is N(0) ≡ N0. The seeding density is n0 and the density of attached cells after 24 h is η0 = 0.35n0, given in cells/cm2. The growth rate of a colony is γ (given in h−1), the division rate is 1/γ (given in h) and the population doubling time is td (given in h). The time at which the number of founder cells is indistinguishable based on colony size is t* (given in h). The average time at which colonies merge due to physical proximity is τ (given in h).

Experiments

hESCs were seeded at low density and grown into colonies. Two types of experiment were carried out: Experiment 1 collected data on colony numbers 72 hours post-attachment and Experiment 2 collected data on the rates of cell attachment and the time to the first colonies merging. Further details are given in the Methods section.

Experimental colony size

From Experiment 1, the number of cells in each of the 48 colonies at 72 hours after cell attachment, N(72), was recorded and is shown for each colony arranged in ascending order in Fig. 1a. The corresponding histogram of N(72) is shown in Fig. 1b. The distribution is bimodal, confirmed by the kernel density estimation, with an outlier colony at N(72) = 77 cells. We remove this outlier colony for further analysis as it was most likely formed by several colony merges. We expect the number of cells at time t to evolve roughly as N(t) = N02t/td, where N0 ≡ N(0) is the initial number of cells and td is the time it takes for the population to double, or equivalently N(t) = N0eγt, where γ is the growth rate. The bimodal nature of N(72) implies that we have two distinct groups of colonies, lead by differences in N0 and/or differences in the growth rates between colonies. For a typical duration of the cell cycle, 16–18 hours, one expects 20 cells at 72 hours, corresponding to the first histogram peak. The second peak, at about 40 cells per colony, suggests that some groups of cells have merged to form larger colonies during the 72 hour period, or that the initial condition of the colony growth was in fact N0 = 2. K-means clustering, a standard algorithm which partitions observations into clusters based upon minimising within cluster variance, splits N(72) into the two ranges, 7 ≤ N(72) ≤ 29 and 34 ≤ N(72) ≤ 77.

Figure 1
figure 1

(a) N(72) for each colony arranged in ascending order. The horizontal orange block shows the splitting of the data into two groups using K-means clustering. The split occurs between 29 ≤ N(72) ≤ 34. (b) A histogram of N(72) with kernel density estimates with bandwidths of 4.5 (orange) and 5.43 (auto bandwidth, blue dashed). The vertical orange block shows the splitting of the data into the two groups using K-means clustering. (c) Microscopy image of cells seeded at density n0 = 1200 cells/cm2. Most cells have no neighbours within the critical interaction distance of 150 μm (highlighted in blue), but some are in pairs (orange) and triples (green). The scale bar represents 100 μm. (d) Bar chart showing the initial cell seeding density (cells/cm2) at day zero (n0, unfilled bar height), and the mean cell density of the remaining cells attached to the plate at day one (η0, filled bars). Each mean is based on three independent measurements and the error bars represent standard deviations.

To ascertain the initial conditions that underlie the colony growth, we turn to Experiment 2, examining the cells after 24 hours; a typical image is shown in Fig. 1c. There are several characteristic features of the cell distribution revealed in this experiment. Firstly, the random initial positioning of the seeded cells means that some cells are initially isolated with no cells within the interaction distance (estimated as 150 μm). Other cells lie within the interaction distance of each other, forming groups of varying sizes. In Fig. 1c we colour-code the cells according to whether they are isolated, or are effectively in a pair or in a triple, to illustrate how N0 can vary at low seeding densities. We will return to this feature later.

Secondly, only a fraction of the originally seeded cells are attached to the plate at this time. Cells need to be attached to the plate for viability and proliferation - cells which do not attach are lost. We find for a range of seeding densities, 1000 ≤ n0 ≤ 7000 cells/cm2, that on average 35.19% ± 4.23[0.99] (the mean ± one standard deviation [standard error]) of initially seeded cells were attached 24 hours after plating. Figure 1d shows the proportion of attached cells at different seeding densities. This rate of attachment is usual for stem cells in similar experiments. In the following modelling section of this paper we discuss N(t), the number of cells present in a colony over time, independent of original cell seeding densities. In the cell seeding section we discuss the effects of the cell plating densities, n0, where we assume that the actual density of cells present is η0 = 0.35n0 to account for the loss at the attachment stage. Note this relationship is easily adjustable for different attachment rates.

Development of the exponential growth model

Throughout this paper we define time t = 0 to be the time that seeded cells have attached to the plate and their proliferation starts. Before this time some cells are lost (as reported above and shown in Fig. 1d) and there is a delay in the growth from the lag-phase experienced by newly plated in vitro hESCs as they adjust to the environment21. This is consistent with the experimental data which considers 72 h after cell attachment.

The simplest deterministic model for the number of cells in a colony at time t, N(t), assumes a constant cell division time 1/γ and simultaneous division of all the cells, leading to

$$\frac{dN(t)}{dt}=\gamma N,$$

which has the solution,

$$N(t)={N}_{0}{e}^{\gamma t},$$

where N0 ≡ N(0) is the initial number of cells at t = 0. However, the cell cycle duration is variable due to various factors, such as inhomogeneities in the nutrient distribution within the growth medium and the inherent variation in the cell cycle duration between different clones. Such effects can be allowed for by considering a Gaussian random growth rate γ, with a mean value μ and standard deviation σ:

$$N(t)={N}_{0}{e}^{\gamma t},\,\gamma \sim {\rm{Norm}}(\mu ,{\sigma }^{2}\mathrm{).}$$
(1)

Different colonies thus grow at different rates sampled from the Gaussian probability distribution. The number of cells then follows a lognormal distribution, \(N(t) \sim {\rm{LogNorm}}({\mu }_{0},{{\sigma }_{0}}^{2})\), where μ0 =  + log(N0) and σ02 = t2σ2. A short mathematical explanation can be found in the Supplementary Information. However, this model fails to explain the bimodal distribution of the colony sizes observed at t = 72 h. This is presented in the Supplementary Information, Fig. S1.

We suggest that a bimodal distribution of the colony sizes can be a consequence of a difference in the cell proliferation rates in cell groups of different sizes that may arise from their interactions. It can be expected that colonies starting from larger groups grow faster, not only due to the initial conditions but the preference of cells to growth in close proximity to neighbours22,23. To capture the bimodal nature of the colony size distribution, we consider two populations, A and B, each with a different initial condition,

$$\begin{array}{ccc}{N}_{0} & = & \{\begin{array}{cc}\mathrm{1,} & {\rm{group}}\,{\rm{A}},\\ \mathrm{2,} & {\rm{group}}\,{\rm{B}},\end{array}\end{array}$$

where the probabilities for a colony to belong to groups A and B are α and β, respectively. Each population then follows Eq. (1) with its corresponding initial condition,

$$(\begin{array}{cc}{N}_{{\rm{A}}}(t)={e}^{{\gamma }_{{\rm{A}}}t},\,{\gamma }_{{\rm{A}}} \sim {\rm{Norm}}({\mu }_{{\rm{A}}},{\sigma }_{{\rm{A}}}^{2}), & {\rm{with}}\,{\rm{probability}}\,\alpha ,\\ {N}_{{\rm{B}}}(t)=2{e}^{{\gamma }_{{\rm{B}}}t},\,{\gamma }_{{\rm{B}}} \sim {\rm{Norm}}({\mu }_{{\rm{B}}},{\sigma }_{{\rm{B}}}^{2}), & {\rm{with}}\,{\rm{probability}}\,\beta ,\end{array}$$
(2)

and each of NA and NB has a lognormal probability distribution. Thus we consider separately colonies that originate from a single cell and those from cell pairs. We consider the possible role of cell triples and larger progenitor groups in the Discussion.

A lognormal mixture fit to the data, shown in Fig. 2a, gives \({N}_{{\rm{A}}}\mathrm{(72)} \sim {\rm{LogNorm}}{\mathrm{(2.84,0.41}}^{2})\) and \({N}_{{\rm{B}}}\mathrm{(72)} \sim {\rm{LogNorm}}{\mathrm{(3.76,0.13}}^{2})\), with the mixture probabilities α = 0.77 and β = 0.23. Therefore, we have μA = 0.0394 h−1, σA = 0.0057 h−1, μB = 0.0426 h−1, and σB = 0.0018 h−1. A comparison between these growth parameter distributions is shown in Fig. 2b. These values correspond to a doubling time of 17.5 h for the single founder cell population (group A) and 16.3 h for the pairs of cells population (group B). Note that here we have assumed the growth rates for the two populations are different. Our fitting suggests colonies starting with two cells grow faster than their single cell counterparts, consistent with the idea that cells growing in larger colonies proliferate more effectively than isolated cells22,23. The fitting under the assumption of identical growth rates, γA = γB, is presented in the Supplementary Information, Fig. S2. The fitting does not capture the experimental data and so we conclude it is appropriate to continue assuming different growth rates for the two populations.

Figure 2
figure 2

(a) N(72) with a lognormal mixture model fitting corresponding to the two populations. The population fittings, \({N}_{A}\mathrm{(72)} \sim {\rm{LogNorm}}{\mathrm{(2.84,0.41}}^{2})\) with mixture probability 0.77 and \({N}_{B}\mathrm{(72)} \sim {\rm{LogNorm}}{\mathrm{(3.76,0.13}}^{2})\) with mixture probability 0.23 are shown in blue dashed, and the overall mixture distribution in orange. (b) The two γ distributions with group A in blue and group B in orange. Note that the distributions are not scaled to represent the group probabilities α and β. The inset shows the parameters for γ, μ with ±σ error bars for the initial conditions N0 = 1 and N0 = 2, corresponding to group A and group B, respectively. The dashed line shows the extrapolation of the trend to higher values of N0.

Modelling population growth

We have demonstrated how accurately the two-population model captures the experimental data at 72 hours. Now we proceed to develop it into a prognostic model for the colony size at later times. The evolution of the colony size, N(t), according to this two-population model is shown in Fig. 3. Because of the random scatter in the colony growth parameters the admissible range of colony size N(t) increases as \(\sqrt{t}\), and sooner or later, the size of the two colony types overlap. At early times, the sizes of the two colony types are distinct, where those beginning from two cells are larger than those from one founder cell, but as time progresses the stochasticity in the growth rates causes an overlap in the two populations. This overlap becomes more prominent for larger numbers of simulations as this increases the incidence of extreme growth rate values. Histograms of N(20) and N(72) in Fig. 3b,d illustrate how at early times the two colony types are distinguishable but over time the distributions spread and merge to make single-clone colonies indistinguishable from heterogeneous colonies. The time at which NA first becomes equal to NB, t*, is the critical time after which it is not possible to distinguish which colonies originated from a single progenitor based on the colony size. This time is shown for increasing numbers of colonies in Fig. 4. As we increase the number of colonies, Ncols, we see more of the extreme values occurring with low probability, causing the blurring of the two populations to begin at an earlier time. The relationship is a power law, with the best fit t* = aNcolsb with a = 77.9 ± 4.7 h and b = 0.12 ± 0.01 with an R2 coefficient of 0.98. This allows us to estimate, for a given plating cell density, the time up to which colonies originating from a single founder cell are identifiable based on the current number of cells in the colony.

Figure 3
figure 3

Evolution of N(t) from the two-population model for (a) 50 and (c) 5000 colonies, with blue NA(t) (N0 = 1) and orange NB(t) (N0 = 2). The insets show the corresponding plots of log(N(t)). Histograms of N(t) for (b) 50 and (d) 5000 colonies at t = 20 h and t = 72 h. Note that the distributions are unscaled to highlight the difference in the frequencies of each group, resulting from the difference in the population probabilities α = 0.77 (blue population) and β = 0.23 (orange population).

Figure 4
figure 4

The mean critical time, t* at which colonies originating from a single cell are no longer identifiable based on colony size, N(t). Each data point is the mean of 500 simulations. The error bars represent the standard deviations in the means. The data is well captured by a power law relation t* = aNcolsb (orange line, R2 = 0.98); linear least-squares fitting on a log-log plot (inset) gives a = 77.9 ± 4.7 h and b = 0.12 ± 0.01.

Role of seeding density and cell clustering on the formation of homogeneous hESC colonies

Typical low seeding densities for hESCs, intended to grow colonies from single founder cells, commonly range from 500 to 3000 cells/cm2. Across a range of seeding densities, we find that the average proportion of cells attached to the substrate after 24 hours is 35 ± 4%, where the range represents one standard deviation within the sample, and the accuracy of the mean value (the standard error) is ±1.0%, presented in Fig. 1d. For example, an initial seeding density of n0 = 1500 cells/cm2 results in around 500 cells continuing past day one of the experiment. Throughout this section we will present the initial seeding densities n0 and work on the assumption that 35% of these cells are successfully attached and survive, η0 = 0.35n0.

In this section we find the initial conditions corresponding to cell plating at different cell seeding densities and use this to inform the model for colony growth. The seeding of cells randomly across a growth area, A, can be simulated as a homogeneous Poisson point process in which the number of point counts is sampled from a Poisson distribution with the mean λA. The points are then independently and uniformly scattered across the region. For example, if we consider an initial seeding density of n0 = 1500 cells/cm2, we can simulate the seeding of η0 = 0.35n0 cells in a 1 cm2 area by a homogeneous point Poisson process in which the η0 point counts are sampled from Po(λA) where A = 1 cm2, and then locating cells according to the uniform distributions in x and y.

Once cells have been scattered, we can then consider the distances between cells and their nearest neighbours with the aim to estimate the fraction of isolated cells, their pairs (defined as two cells separated by less than 150 μm) and triples etc. The probability density function of the distance, r, to the kth nearest neighbour is known from the theory of Poisson point processes as \({d}_{k}(r)=2{(\lambda \pi {r}^{2})}^{k}{e}^{-lp{r}^{2}}/r(k-1)\)24. This reduces to \({d}_{1}(r)=2\pi \lambda {r}^{2}{e}^{-lp{r}^{2}}\) for the first nearest neighbour. The theoretical distributions along with histograms from simulated data for d1(r) are shown in Fig. 5a for initial seeding densities of n0 = 500, 1500 and 5000 cells/cm2 corresponding to λ = η0 = 0.35n0. These distributions allow us to calculate the proportion of seeded cells with the nearest neighbour at a given distance. The nearest neighbour cumulative distribution function for the proportion of cells with a nearest neighbour at a distance < r for a 2D homogeneous Poisson process is given by \({D}_{1}(r)=1-{e}^{-lp{r}^{2}}\)24. This theoretical proportion of cells with a first nearest neighbour less than r away, D1, for initial seeding densities n0 = 500, 1500 and 5000 cells/cm2 is shown in Fig. 5b along with data from a simulation at each seeding density. For the initial seeding density of n0 = 1500 cells/cm2 the nearest neighbour distance between cells will be less than 150 μm in around 30% of cases, similar to the experimental estimate of 23%. We have neglected the movement of cells as, based on observed migration speeds of approximately 16 μm/h14, the time required to traverse the critical interaction distance of 150 μm is around 9 h, a large portion of the cell cycle time.

Figure 5
figure 5

(a) The probability distributions for the nearest neighbour distance in a Poisson point process, given by \({d}_{1}(r)=2\lambda \pi r{e}^{-lp{r}^{2}}\) for λ = η0 = 175 (blue, n0 = 500 cells/cm2), λ = η0 = 525 (orange, n0 = 1500 cells/cm2) and λ = η0 = 1750 (green, n0 = 5000 cells/cm2). The histograms show the simulated distributions for seeding cells according to a Poisson process with rate λ, each is for one run of the seeding simulation. (b) The theoretical proportion of cells with a nearest neighbour < d cmaway, \({D}_{1}(r)=1-{e}^{-\lambda \pi {r}^{2}}\), for λ = η0 = 175 (blue dashed, n0 = 500 cells/cm2), λ = η0 = 525 (orange dashed, n0 = 1500 cells/cm2) and λ = η0 = 1750 (green dashed, n0 = 5000 cells/cm2). The corresponding solid lines show the values of D1 from a single run of the seeding simulation. (c) Example clustering configurations for cell triples with microscopy examples of these formations. The distance between cells, d, must be less than 150 μm. (d) The percentage of cells in a cluster of size n (for an interaction radius of 150 μm) for n = 1 (single cells, blue circles), n = 2 (pairs, orange diamonds), n = 3 (triplets, green square) and n = 4 (quadruples, pink open circles). The error bars represent standard deviations. Inset: The data on a log-log scale. The convergence to a linear relationship (and hence an exponential relationship for the non transformed data) can be seen.

To consider the groupings of seeded cells we use a density based clustering algorithm. Cells less than 150 μm apart are considered as being part of the same cluster, and any neighbouring cell less than 150 μm away from any other cell in the cluster is also considered to be part of the cluster. This allows for clusters of elongated shapes. Note that this definition of a cluster of cells is non-trivial and has implications for the interactions of cells, but from the experimental images we see examples of clusters in elongated shapes as well as the more common regular shapes. Examples of different cluster formations for triples are shown in Fig. 5c. The proportion of cells in a cluster of a size n at different seeding densities is shown in Fig. 5d. At low initial seeding densities, e.g. n0 = 500 cells/cm2 the majority of cells have no close neighbours. As the seeding density increases, the proportion of pairs of cells increases. The proportion of each cluster size first rises with n0 before reaching a maximum and then tends to zero as more possible cluster sizes become available. The distributions shown in Fig. 5 provide the initial conditions corresponding to cell seeding at different densities.

Now the initial conditions of cell seeding are known, the growth of colonies from these cells can be considered. Cells are seeded at density n0 according to a Poisson point process as described above, and then the division of the cells and growth into colonies can be described by the two-population model. The area coverage of the plate can be estimated from the number of cells we expect to be present. The average area of a cell, Acell, is approximately 250 μm2 [S. Orozco-Fuentes, private communication], corresponding to a cell diameter of 18 μm. The percentage of area covered by cells evolves as shown in Fig. 6a. Taking this value, the proportion of area coverage is the area covered by the cells, N(t)Acell, divided by the growth area of the plate and Aplate = Nseeded/n0. We therefore expect the percentage area coverage over time to tend to an exponential relationship due to the growth of N(t), scaled by a factor equal to n0Acell/Nseeded, as we see in Fig. 6a. The time taken for the growth area to be 100% covered, t100%, for varying initial seeding densities, is shown in Fig. 6b.

Figure 6
figure 6

(a) The percentage area coverage of cells at different time points for different cell seeding densities (blue 500 cells/cm2, orange 1000 cells/cm2, green 1500 cells/cm2 and pink 2000 cells/cm2) according to the two-population model. Inset: The data on a log-linear scale. (b) The time the growth area is 100% covered for varying cell seeding densities, with line of best fit t100% = 20n0−0.15. Inset: The data on a log-log scale, with line of best fit log(t100%) = (−0.15 ± 0.005)log(n0) + (3.0 ± 0.04).

Simulating the initial conditions as described above and the colony growth allows us to estimate the crucial time at which the colonies begin to merge. The cells are seeded at density n0, with η0 cells attached, and are then sorted into clusters based on their spatial distances away from each other. Each cluster grows according to the two-population model, estimated as a circle with centre at the geometric centre of the cluster and radius based on the number of cells present, N(t). The growth rate for triples and larger clusters of cells is assumed to be the same as that for pairs of cells. The time at which any colony begins to merge with its neighbour is critical as the time that clonality is lost, τ, illustrated in Fig. 7a. The time the first colony merge occurs at varying seeding densities is shown in Fig. 7b, with least squares fitting τ = (−0.007 ± 0.0001)n0 + (102 ± 3) with R2 = 0.99, τ in hours and n0 in cells/cm2. We are therefore able to estimate the time taken for the first colony merge to occur from the equation

$$\frac{\tau }{1\,{\rm{h}}}\approx 100-\frac{{n}_{0}}{140\,\,{{\rm{cm}}}^{-2}},$$
(3)

where n0 is the initial seeding density of cells before attachment in cells/cm2 and τ is produced in hours. Experimental values were extracted for τ from Experiment 2 and the model captures these values within errors for the seeding densities 3000, 4000 and 7000 cells/cm2. These results are summarised in Table 1 for convenience. The results are also shown for extrapolated growth rates in Fig. 7b, under the assumption growth rates continue to increase with cluster size. The least squares fitting is τ = (−0.01 ± 0.001)n0 + (97 ± 4) with R2 = 0.99, τ in hours and n0 in cells/cm2. The increasing growth rates cause an earlier first merger time, particularly at the higher seeding densities where larger clusters are more likely.

Figure 7
figure 7

(a) Diagram illustrating initially seeded cells, and the colonies at time τ, the first time at which two growing colonies touch from a simulation of the cell seeding model. The orange cells are classed as a pair and grow accordingly. (b) An example of two colonies merging from experimental images. The two colonies, highlighted in blue and orange are beginning to merge together at 5 days. The scale bar represents 100 μm. (c) The mean time (averaged over 20 simulations) that the first colony merge occurs for cells seeded at different densities growing according to the two-population model. The time assuming the growth rates for clusters of three or above are the same as the growth rate for pairs is shown as blue circles with the line of best fit, τ = (−0.007 ± 0.001)n0 + (102 ± 3) with R2 = 0.99 in blue. The mean time using extrapolated growth rates for larger clusters is shown as orange squares with line of best fit τ = (−0.01 ± 0.001)n0 + (97 ± 4) with R2 = 0.99 in orange. To test the prediction, experimental values of τ were extracted from Experiment 2 and are shown as green crosses. The error bars have been calculated through error propagation based on an error of ±0.5 days for each of the measured values due to the time resolution of the images.

Table 1 The expected number of attached cells, single cells at seeding and the time to the first colony merge for varying seeding densities.

Discussion

Colony growth originating from single or pairs of cells is well modelled using a two-population stochastic exponential model where the growth rate is sampled from a Normal distribution. Experimental results show that colonies that start from a single founder cell grow at a rate different from those originating from a cell pair, with a relative difference of 8 ± 1.8%. The colonies originating from pairs of cells have a higher mean growth rate (although it is within standard deviation errors of the growth rate for single cell colonies), and the standard deviation of the growth rate for pairs is much lower, as seen in Fig. 2b.

Stem cells do not show any contact inhibition of proliferation which would slow colony growth, however, it is worth noting that there could be a trend in growth rates with respect to the ‘tightness’ of the cluster packing. It is likely that a triplet of cells in contact grows faster than a triplet where each cell is 150 μm apart, based on the evidence that stem cells proliferate more effectively in close proximity with each other22,23. It is likely that the growth rates are actually time dependent, so although we see a difference in the growth rates from single and paired founding state up to 72 h, we expect this difference to reduce and eventually become negligible over time. Similarly, we expect that the difference in growth rates between triples and other larger groups of founding cells also decrease as the cluster size increases. It is for this reason we set the growth rate for clusters of triples and larger groups to be the same as the growth rate for pairs. Allowing the growth rates to continue increasing with cluster size (extrapolated from the data) results in less time to the first colony merge, especially at the higher seeding densities where the probabilities of larger clusters is increased, Fig. 7c. Further experimental data would be needed to clarify the trends in growth rates for both increasing cluster size and cell separation distance within the cluster.

This model can be used to predict colony sizes at different time scales. We thus suggest that single clone colonies (that originate from a single founder cell) can be selected as the smaller of the growing colonies, but only within a certain period, t*, before colony sizes from a range of founder cells can be equivalent. This time gives an indication of when colonies should be observed to identify single founder cell colonies based on colony size. This time, t*, ranges from around 30 to 50 h depending on the seeding density.

Experimental results show that on average 35% of initially seeded cells are attached to the plate 24 hours after cell seeding, Fig. 1d. We take this into account when considering the initial conditions of cell seeding. It is thought that cells affect each other’s movement if they are less than 150 μm apart10. The nearest neighbour distributions for cells seeding according to a Poisson process, shown in Fig. 5, indicates that for a low seeding density of n0 = 1500 cells/cm2 (i.e. around 500 attached cells) we would expect to see around 30% of cells with a nearest neighbour closer than the critical value d = 150 μm. However, our previous results suggest that we wouldn’t expect all these pairs to migrate towards each other and form colonies14. This critical separation distance may be affected by the cluster size of neighbouring colonies. Since the cellular communication is by inter-cellular signalling, it is likely reduced as neighbouring colony sizes increase. At higher seeding densities this individual cell migration is more likely to have an effect on the cell clustering at early times. However, as larger groups of cells in close contact form colonies, individual migration reduces to a negligible amount, meaning it is only a factor at early times. For large colonies there could be collective migration effects to consider. Further experiments are needed to quantify this movement, which could then be introduced into the model.

The initial clustering of cells is an important and non-trivial consideration with implications on cell communications. We define a cluster to be a group of cells where each cell is 150 μm or closer to another cell in the cluster. This allows clusters of different structures, as illustrated in Fig. 5c. This random seeding process gives the proportion of cell clusters of different sizes presented in Fig. 5d. At low seeding densities up to ≈3000 cells/cm2, i.e. ≈1000 attached cells, single cells with no neighbours dominate, making up over 50% of cells present. As the seeding density increases the amount of single cells decreases, with the number of cells in pairs, triples and larger groups increasing (each proportion will always tend to zero as a wider variety of cluster sizes appear). This gives an indication of the seeding densities required to ensure a large proportion of the colonies originate from a single founder cell.

Following the population growth model we can also estimate the time to completely fill a growth area at different seeding densities, Fig. 6. A more interesting time to consider is the average time that growing colonies first merge due to physical proximity at different seeding densities, Fig. 7b. This time, τ, has a linear relationship with τ ≈ 100 − 0.007n0, allowing us to predict this time for any seeding density. This statistical estimation of τ suggests the latest possible observation time to catch colonies for re-plating before they merge.

Although developed here for a set of particular hESC experiments, the modelling framework can be applied more generally to other cell lines and hiPSCs, and experimental conditions to predict the critical timescales. These results can be used to inform cell seeding density choices to maximise clonal colonies and avoid those arising from more than one founder cell. Future work should explore the role of migrating cells and colonies of clonality and apply the model to hiPSCs.

Methods

Two types of experiment were carried out: Experiment 1 collected data on colony numbers at 72 hours post-attachment and Experiment 2 collected data on the rates of cell attachment and the time to colony merging.

Experiment 1: Human embryonic stem cells (WiCell, Madison WI) were plated at a density of 1500 cells/cm2 onto 6-well plates coated with Matrigel® Basement Membrane Matrix (Corning Inc.), in the mTeSR1TM media (STEMCELL Technologies). The cells were stained with Cell Trace Violet Dye (Thermo Fisher). At 72 hours after cell attachment the cells were fixed and microscopy images (Nikon Eclipse Ti-E microscope) were taken of the colonies. Data was then collected using Imaris Image Analysis Software (BITPLANE Inc) to identify cell boundaries and count the number of cells in each colony. This data was extracted for 48 colonies.

Experiment 2: Following the same set-up as Experiment 1, the numbers of cells attached 24 hours after seeding were recorded for different initial densities of 1000, 1200, 2000, 3000, 4000 and 7000 cells/cm2. Microscopy images were also taken of these wells each day for eight days for the initial seeding densities 1200, 3000, 4000 and 7000 cells/cm2. The time the first colony merge occurred at each seeding density was extracted from examination of the images.