Seeding hESCs to achieve optimal colony clonality

Human embryonic stem cells (hESCs) and induced pluripotent stem cells (iPSCs) have promising clinical applications which often rely on clonally-homogeneous cell populations. To achieve this, it is important to ensure that each colony originates from a single founding cell and to avoid subsequent merging of colonies during their growth. Clonal homogeneity can be obtained with low seeding densities; however, this leads to low yield and viability. It is therefore important to quantitatively assess how seeding density affects clonality loss so that experimental protocols can be optimised to meet the required standards. Here we develop a quantitative framework for modelling the growth of hESC colonies from a given seeding density based on stochastic exponential growth. This allows us to identify the timescales for colony merges and over which colony size no longer predicts the number of founding cells. We demonstrate the success of our model by applying it to our own experiments of hESC colony growth; while this is based on a particular experimental set-up, the model can be applied more generally to other cell lines and experimental conditions to predict these important timescales.

Human embryonic stem cells (hESCs) and induced pluripotent stem cells (iPSCs) have promising clinical applications, including the advancement of cellular therapies, disease modelling and drug development, due to their self renewal potential and the ability to differentiate into specialised cells (pluripotency) 1,2 . However, continued efforts in understanding the complex behaviour of hESCs and iPSCs are necessary to make their clinical applications a reality.
A typical in-vitro hESC experiment involves the distribution of cells upon a growth material (the 'seeding' of cells onto a plate). The seeding density is then the number of cells placed on the growth material per unit area. Cells need to attach to the plate surface, which is covered by Matrigel or similar, for viability and proliferation; however, some cells do not successfully attach and are lost. The hESCs then form colonies by repeated mitosis in which two genetically identical daughter cells are produced from the division of the mother cell. The proliferation of cells in this way results in colonies of tightly packed cells in mono-layers along the growth material. The doubling time of stem cells varies and can be affected by various environmental and chemical factors, including cell density [3][4][5] . An important measure of the self-renewal potential of stem cells is the clonality, the condition of being genetically identical. Generating homogeneous populations of clonal cells is of great importance 6,7 as clonally derived stem cell lines maintain pluripotency and proliferative potential for prolonged periods 8 . Some applications require clonal homogeneous populations, e.g. drug discovery 9 and iPSCs for personalised medicine. The selection of the best clones for further experimentation needs to be optimised to make clinical applications safe. If the seeding density is high, the migration of cells and the growth of closely-separated cell groups can cause aggregation of colonies; this is undesirable when a homogeneous clonal population with identical genetic composition is required. The seeding density of cells has been shown to not only have an effect on the clonality of stem cells 10 , but also on their differentiation potential 11 . Moreover, culturing at an overly high density can cause DNA damage and culture adaptation, leading to increasing occurrence of chromosomal aberrations 3,12,13 .
Single hESCs are reported to have no effect on each other's movement if they are greater than 150 μm apart 10 . It is therefore recommended to keep a minimum distance of 150 μm between colony boundaries throughout growth to assure the resulting clonal structures are from single founding hESCs. In our previous work we considered the kinematics of single and pairs of cells, and identified the occasional super-diffusive movements of cells which could lead to re-aggregation 14,15 . 1 School of Mathematics, Statistics and Physics, Newcastle University, Newcastle upon Tyne, UK. 2 Institute of Genetic Medicine, Newcastle University, Newcastle upon Tyne, UK. 3 Institute of Cytology, RAS, Sankt-Petersburg, Russia. 4 Bio-Imaging Unit, Medical School, Newcastle University, Newcastle upon Tyne, UK. *email: l.e.wadkin@ncl.ac.uk open Results notation. The notation used throughout the manuscript is outlined here for convenience. The number of cells in a colony at a given time, t, is N(t), with t always in hours. Therefore the number of cells at 72 h is N(72). The initial number of cells at time t = 0 is N(0) ≡ N 0 . The seeding density is n 0 and the density of attached cells after 24 h is η 0 = 0.35n 0 , given in cells/cm 2 . The growth rate of a colony is γ (given in h −1 ), the division rate is 1/γ (given in h) and the population doubling time is t d (given in h). The time at which the number of founder cells is indistinguishable based on colony size is t * (given in h). The average time at which colonies merge due to physical proximity is τ (given in h).
Experiments. hESCs were seeded at low density and grown into colonies. Two types of experiment were carried out: Experiment 1 collected data on colony numbers 72 hours post-attachment and Experiment 2 collected data on the rates of cell attachment and the time to the first colonies merging. Further details are given in the Methods section.
Experimental colony size. From Experiment 1, the number of cells in each of the 48 colonies at 72 hours after cell attachment, N(72), was recorded and is shown for each colony arranged in ascending order in Fig. 1a. The corresponding histogram of N(72) is shown in Fig. 1b. The distribution is bimodal, confirmed by the kernel density estimation, with an outlier colony at N(72) = 77 cells. We remove this outlier colony for further analysis as it was most likely formed by several colony merges. We expect the number of cells at time t to evolve roughly as N(t) = N 0 2 t/td , where N 0 ≡ N(0) is the initial number of cells and t d is the time it takes for the population to double, or equivalently N(t) = N 0 e γt , where γ is the growth rate. The bimodal nature of N(72) implies that we have two distinct groups of colonies, lead by differences in N 0 and/or differences in the growth rates between colonies. For a typical duration of the cell cycle, 16-18 hours, one expects 20 cells at 72 hours, corresponding to the first histogram peak. The second peak, at about 40 cells per colony, suggests that some groups of cells have merged to form larger colonies during the 72 hour period, or that the initial condition of the colony growth was in fact N 0 = 2. K-means clustering, a standard algorithm which partitions observations into clusters based upon minimising within cluster variance, splits N(72) into the two ranges, 7 ≤ N(72) ≤ 29 and 34 ≤ N(72) ≤ 77.
To ascertain the initial conditions that underlie the colony growth, we turn to Experiment 2, examining the cells after 24 hours; a typical image is shown in Fig. 1c. There are several characteristic features of the cell distribution revealed in this experiment. Firstly, the random initial positioning of the seeded cells means that some cells are initially isolated with no cells within the interaction distance (estimated as 150 μm). Other cells lie within the interaction distance of each other, forming groups of varying sizes. In Fig. 1c we colour-code the cells according to whether they are isolated, or are effectively in a pair or in a triple, to illustrate how N 0 can vary at low seeding densities. We will return to this feature later.
Secondly, only a fraction of the originally seeded cells are attached to the plate at this time. Cells need to be attached to the plate for viability and proliferation -cells which do not attach are lost. We find for a range of seeding densities, 1000 ≤ n 0 ≤ 7000 cells/cm 2 , that on average 35.19% ± 4.23[0.99] (the mean ± one standard deviation [standard error]) of initially seeded cells were attached 24 hours after plating. Figure 1d shows the proportion of attached cells at different seeding densities. This rate of attachment is usual for stem cells in similar experiments. In the following modelling section of this paper we discuss N(t), the number of cells present in a colony over time, independent of original cell seeding densities. In the cell seeding section we discuss the effects of the cell plating densities, n 0 , where we assume that the actual density of cells present is η 0 = 0.35n 0 to account for the loss at the attachment stage. Note this relationship is easily adjustable for different attachment rates.
Development of the exponential growth model. Throughout this paper we define time t = 0 to be the time that seeded cells have attached to the plate and their proliferation starts. Before this time some cells are lost (as reported above and shown in Fig. 1d) and there is a delay in the growth from the lag-phase experienced by newly plated in vitro hESCs as they adjust to the environment 21 . This is consistent with the experimental data which considers 72 h after cell attachment.
The simplest deterministic model for the number of cells in a colony at time t, N(t), assumes a constant cell division time 1/γ and simultaneous division of all the cells, leading to www.nature.com/scientificreports www.nature.com/scientificreports/ which has the solution, is the initial number of cells at t = 0. However, the cell cycle duration is variable due to various factors, such as inhomogeneities in the nutrient distribution within the growth medium and the inherent variation in the cell cycle duration between different clones. Such effects can be allowed for by considering a Gaussian random growth rate γ, with a mean value μ and standard deviation σ: Different colonies thus grow at different rates sampled from the Gaussian probability distribution. The number of cells then follows a lognormal distribution, A short mathematical explanation can be found in the Supplementary Information. However, this model fails to explain the bimodal distribution of the colony sizes observed at t = 72 h. This is presented in the Supplementary Information, Fig. S1.
We suggest that a bimodal distribution of the colony sizes can be a consequence of a difference in the cell proliferation rates in cell groups of different sizes that may arise from their interactions. It can be expected that colonies starting from larger groups grow faster, not only due to the initial conditions but the preference of cells to growth in close proximity to neighbours 22,23 . To capture the bimodal nature of the colony size distribution, we consider two populations, A and B, each with a different initial condition, www.nature.com/scientificreports www.nature.com/scientificreports/ where the probabilities for a colony to belong to groups A and B are α and β, respectively. Each population then follows Eq. (1) with its corresponding initial condition, and each of N A and N B has a lognormal probability distribution. Thus we consider separately colonies that originate from a single cell and those from cell pairs. We consider the possible role of cell triples and larger progenitor groups in the Discussion. A lognormal mixture fit to the data, shown in Fig . Note that here we have assumed the growth rates for the two populations are different. Our fitting suggests colonies starting with two cells grow faster than their single cell counterparts, consistent with the idea that cells growing in larger colonies proliferate more effectively than isolated cells 22,23 . The fitting under the assumption of identical growth rates, γ A = γ B , is presented in the Supplementary Information, Fig. S2. The fitting does not capture the experimental data and so we conclude it is appropriate to continue assuming different growth rates for the two populations.
Modelling population growth. We have demonstrated how accurately the two-population model captures the experimental data at 72 hours. Now we proceed to develop it into a prognostic model for the colony size at later times. The evolution of the colony size, N(t), according to this two-population model is shown in Fig. 3. Because of the random scatter in the colony growth parameters the admissible range of colony size N(t) increases as t , and sooner or later, the size of the two colony types overlap. At early times, the sizes of the two colony types are distinct, where those beginning from two cells are larger than those from one founder cell, but as time progresses the stochasticity in the growth rates causes an overlap in the two populations. This overlap becomes more prominent for larger numbers of simulations as this increases the incidence of extreme growth rate values. Histograms of N (20) and N(72) in Fig. 3b,d illustrate how at early times the two colony types are distinguishable but over time the distributions spread and merge to make single-clone colonies indistinguishable from heterogeneous colonies. The time at which N A first becomes equal to N B , t * , is the critical time after which it is not possible to distinguish which colonies originated from a single progenitor based on the colony size. This time is shown for increasing numbers of colonies in Fig. 4. As we increase the number of colonies, N cols , we see more of the extreme values occurring with low probability, causing the blurring of the two populations to begin at an earlier time. The relationship is a power law, with the best fit t * = aN cols −b with a = 77.9 ± 4.7 h and b = 0.12 ± 0.01 with an R 2 coefficient of 0.98. This allows us to estimate, for a given plating cell density, the time up to which colonies originating from a single founder cell are identifiable based on the current number of cells in the colony.
Role of seeding density and cell clustering on the formation of homogeneous heSc colonies. Typical low seeding densities for hESCs, intended to grow colonies from single founder cells, commonly range from 500 to 3000 cells/cm 2 . Across a range of seeding densities, we find that the average proportion of cells attached to the substrate after 24 hours is 35 ± 4%, where the range represents one standard deviation within the sample, and the accuracy of the mean value (the standard error) is ±1.0%, presented in Fig. 1d. For example, an www.nature.com/scientificreports www.nature.com/scientificreports/ initial seeding density of n 0 = 1500 cells/cm 2 results in around 500 cells continuing past day one of the experiment. Throughout this section we will present the initial seeding densities n 0 and work on the assumption that 35% of these cells are successfully attached and survive, η 0 = 0.35n 0 .  www.nature.com/scientificreports www.nature.com/scientificreports/ In this section we find the initial conditions corresponding to cell plating at different cell seeding densities and use this to inform the model for colony growth. The seeding of cells randomly across a growth area, A, can be simulated as a homogeneous Poisson point process in which the number of point counts is sampled from a Poisson distribution with the mean λA. The points are then independently and uniformly scattered across the region. For example, if we consider an initial seeding density of n 0 = 1500 cells/cm 2 , we can simulate the seeding of η 0 = 0.35n 0 cells in a 1 cm 2 area by a homogeneous point Poisson process in which the η 0 point counts are sampled from Po(λA) where A = 1 cm 2 , and then locating cells according to the uniform distributions in x and y.
Once cells have been scattered, we can then consider the distances between cells and their nearest neighbours with the aim to estimate the fraction of isolated cells, their pairs (defined as two cells separated by less than 150 μm) and triples etc. The probability density function of the distance, r, to the k th nearest neighbour is known from the theory of Poisson point processes as for the first nearest neighbour. The theoretical distributions along with histograms from simulated data for d 1 (r) are shown in Fig. 5a for initial seeding densities of n 0 = 500, 1500 and 5000 cells/cm 2 corresponding to λ = η 0 = 0.35n 0 . These distributions allow us to calculate the proportion of seeded cells with the nearest neighbour at a given distance. The nearest neighbour cumulative distribution function for the proportion of cells with a nearest neighbour at a distance < r for a 2D homogeneous Poisson process is given by = − − D r e ( ) 1 lpr 1 2 24 . This theoretical proportion of cells with a first nearest neighbour less than r away, D 1 , for initial seeding densities n 0 = 500, 1500 and 5000 cells/cm 2 is shown in Fig. 5b along with data from a simulation at each seeding density. For the initial seeding density of n 0 = 1500 cells/cm 2 the nearest neighbour distance between cells will be less than 150 μm in around 30% of cases, similar to the experimental estimate of 23%. We have neglected the movement of cells as, based on observed migration speeds of approximately 16 μm/h 14 , the time required to traverse the critical interaction distance of 150 μm is around 9 h, a large portion of the cell cycle time.
To consider the groupings of seeded cells we use a density based clustering algorithm. Cells less than 150 μm apart are considered as being part of the same cluster, and any neighbouring cell less than 150 μm away from any www.nature.com/scientificreports www.nature.com/scientificreports/ other cell in the cluster is also considered to be part of the cluster. This allows for clusters of elongated shapes. Note that this definition of a cluster of cells is non-trivial and has implications for the interactions of cells, but from the experimental images we see examples of clusters in elongated shapes as well as the more common regular shapes. Examples of different cluster formations for triples are shown in Fig. 5c. The proportion of cells in a cluster of a size n at different seeding densities is shown in Fig. 5d. At low initial seeding densities, e.g. n 0 = 500 cells/cm 2 the majority of cells have no close neighbours. As the seeding density increases, the proportion of pairs of cells increases. The proportion of each cluster size first rises with n 0 before reaching a maximum and then tends to zero as more possible cluster sizes become available. The distributions shown in Fig. 5 provide the initial conditions corresponding to cell seeding at different densities. Now the initial conditions of cell seeding are known, the growth of colonies from these cells can be considered. Cells are seeded at density n 0 according to a Poisson point process as described above, and then the division of the cells and growth into colonies can be described by the two-population model. The area coverage of the plate can be estimated from the number of cells we expect to be present. The average area of a cell, A cell , is approximately 250 μm 2 [S. Orozco-Fuentes, private communication], corresponding to a cell diameter of 18 μm. The percentage of area covered by cells evolves as shown in Fig. 6a. Taking this value, the proportion of area coverage is the area covered by the cells, N(t)A cell , divided by the growth area of the plate and A plate = N seeded /n 0 . We therefore expect the percentage area coverage over time to tend to an exponential relationship due to the growth of N(t), scaled by a factor equal to n 0 A cell /N seeded , as we see in Fig. 6a. The time taken for the growth area to be 100% covered, t 100% , for varying initial seeding densities, is shown in Fig. 6b.
Simulating the initial conditions as described above and the colony growth allows us to estimate the crucial time at which the colonies begin to merge. The cells are seeded at density n 0 , with η 0 cells attached, and are then sorted into clusters based on their spatial distances away from each other. Each cluster grows according to the two-population model, estimated as a circle with centre at the geometric centre of the cluster and radius based on the number of cells present, N(t). The growth rate for triples and larger clusters of cells is assumed to be the same as that for pairs of cells. The time at which any colony begins to merge with its neighbour is critical as the time that clonality is lost, τ, illustrated in Fig. 7a. The time the first colony merge occurs at varying seeding densities is shown in Fig. 7b, with least squares fitting τ = (−0.007 ± 0.0001)n 0 + (102 ± 3) with R 2 = 0.99, τ in hours and n 0 in cells/cm 2 . We are therefore able to estimate the time taken for the first colony merge to occur from the equation where n 0 is the initial seeding density of cells before attachment in cells/cm 2 and τ is produced in hours. Experimental values were extracted for τ from Experiment 2 and the model captures these values within errors for the seeding densities 3000, 4000 and 7000 cells/cm 2 . These results are summarised in Table 1 for convenience. The results are also shown for extrapolated growth rates in Fig. 7b, under the assumption growth rates continue to increase with cluster size. The least squares fitting is τ = (−0.01 ± 0.001)n 0 + (97 ± 4) with R 2 = 0.99, τ in hours and n 0 in cells/cm 2 . The increasing growth rates cause an earlier first merger time, particularly at the higher seeding densities where larger clusters are more likely.

Discussion
Colony growth originating from single or pairs of cells is well modelled using a two-population stochastic exponential model where the growth rate is sampled from a Normal distribution. Experimental results show that colonies that start from a single founder cell grow at a rate different from those originating from a cell pair, with a relative difference of 8 ± 1.8%. The colonies originating from pairs of cells have a higher mean growth rate (although it is within standard deviation errors of the growth rate for single cell colonies), and the standard deviation of the growth rate for pairs is much lower, as seen in Fig. 2b. www.nature.com/scientificreports www.nature.com/scientificreports/ Stem cells do not show any contact inhibition of proliferation which would slow colony growth, however, it is worth noting that there could be a trend in growth rates with respect to the 'tightness' of the cluster packing. It is likely that a triplet of cells in contact grows faster than a triplet where each cell is 150 μm apart, based on the evidence that stem cells proliferate more effectively in close proximity with each other 22,23 . It is likely that the growth rates are actually time dependent, so although we see a difference in the growth rates from single and paired founding state up to 72 h, we expect this difference to reduce and eventually become negligible over time. Similarly, we expect that the difference in growth rates between triples and other larger groups of founding cells also decrease as the cluster size increases. It is for this reason we set the growth rate for clusters of triples and larger groups to be the same as the growth rate for pairs. Allowing the growth rates to continue increasing with cluster size (extrapolated from the data) results in less time to the first colony merge, especially at the higher seeding densities where the probabilities of larger clusters is increased, Fig. 7c. Further experimental data would be needed to clarify the trends in growth rates for both increasing cluster size and cell separation distance within the cluster.
This model can be used to predict colony sizes at different time scales. We thus suggest that single clone colonies (that originate from a single founder cell) can be selected as the smaller of the growing colonies, but only within a certain period, t * , before colony sizes from a range of founder cells can be equivalent. This time gives an indication of when colonies should be observed to identify single founder cell colonies based on colony size. This time, t * , ranges from around 30 to 50 h depending on the seeding density.
Experimental results show that on average 35% of initially seeded cells are attached to the plate 24 hours after cell seeding, Fig. 1d. We take this into account when considering the initial conditions of cell seeding. It is thought that cells affect each other's movement if they are less than 150 μm apart 10 . The nearest neighbour distributions for The mean time (averaged over 20 simulations) that the first colony merge occurs for cells seeded at different densities growing according to the two-population model. The time assuming the growth rates for clusters of three or above are the same as the growth rate for pairs is shown as blue circles with the line of best fit, τ = (−0.007 ± 0.001)n 0 + (102 ± 3) with R 2 = 0.99 in blue. The mean time using extrapolated growth rates for larger clusters is shown as orange squares with line of best fit τ = (−0.01 ± 0.001)n 0 + (97 ± 4) with R 2 = 0.99 in orange. To test the prediction, experimental values of τ were extracted from Experiment 2 and are shown as green crosses. The error bars have been calculated through error propagation based on an error of ±0.5 days for each of the measured values due to the time resolution of the images. www.nature.com/scientificreports www.nature.com/scientificreports/ cells seeding according to a Poisson process, shown in Fig. 5, indicates that for a low seeding density of n 0 = 1500 cells/cm 2 (i.e. around 500 attached cells) we would expect to see around 30% of cells with a nearest neighbour closer than the critical value d = 150 μm. However, our previous results suggest that we wouldn't expect all these pairs to migrate towards each other and form colonies 14 . This critical separation distance may be affected by the cluster size of neighbouring colonies. Since the cellular communication is by inter-cellular signalling, it is likely reduced as neighbouring colony sizes increase. At higher seeding densities this individual cell migration is more likely to have an effect on the cell clustering at early times. However, as larger groups of cells in close contact form colonies, individual migration reduces to a negligible amount, meaning it is only a factor at early times. For large colonies there could be collective migration effects to consider. Further experiments are needed to quantify this movement, which could then be introduced into the model.
The initial clustering of cells is an important and non-trivial consideration with implications on cell communications. We define a cluster to be a group of cells where each cell is 150 μm or closer to another cell in the cluster. This allows clusters of different structures, as illustrated in Fig. 5c. This random seeding process gives the proportion of cell clusters of different sizes presented in Fig. 5d. At low seeding densities up to ≈3000 cells/cm 2 , i.e. ≈1000 attached cells, single cells with no neighbours dominate, making up over 50% of cells present. As the seeding density increases the amount of single cells decreases, with the number of cells in pairs, triples and larger groups increasing (each proportion will always tend to zero as a wider variety of cluster sizes appear). This gives an indication of the seeding densities required to ensure a large proportion of the colonies originate from a single founder cell.
Following the population growth model we can also estimate the time to completely fill a growth area at different seeding densities, Fig. 6. A more interesting time to consider is the average time that growing colonies first merge due to physical proximity at different seeding densities, Fig. 7b. This time, τ, has a linear relationship with τ ≈ 100 − 0.007n 0 , allowing us to predict this time for any seeding density. This statistical estimation of τ suggests the latest possible observation time to catch colonies for re-plating before they merge.
Although developed here for a set of particular hESC experiments, the modelling framework can be applied more generally to other cell lines and hiPSCs, and experimental conditions to predict the critical timescales. These results can be used to inform cell seeding density choices to maximise clonal colonies and avoid those arising from more than one founder cell. Future work should explore the role of migrating cells and colonies of clonality and apply the model to hiPSCs.

Methods
Two types of experiment were carried out: Experiment 1 collected data on colony numbers at 72 hours post-attachment and Experiment 2 collected data on the rates of cell attachment and the time to colony merging. Experiment 1: Human embryonic stem cells (WiCell, Madison WI) were plated at a density of 1500 cells/cm 2 onto 6-well plates coated with Matrigel ® Basement Membrane Matrix (Corning Inc.), in the mTeSR1 TM media (STEMCELL Technologies). The cells were stained with Cell Trace Violet Dye (Thermo Fisher). At 72 hours after cell attachment the cells were fixed and microscopy images (Nikon Eclipse Ti-E microscope) were taken of the colonies. Data was then collected using Imaris Image Analysis Software (BITPLANE Inc) to identify cell boundaries and count the number of cells in each colony. This data was extracted for 48 colonies. Experiment 2: Following the same set-up as Experiment 1, the numbers of cells attached 24 hours after seeding were recorded for different initial densities of 1000, 1200, 2000, 3000, 4000 and 7000 cells/cm 2 . Microscopy images were also taken of these wells each day for eight days for the initial seeding densities 1200, 3000, 4000 and 7000 cells/cm 2 . The time the first colony merge occurred at each seeding density was extracted from examination of the images.

Data availability
The datasets generated during and analysed during the current study are available from the corresponding author on reasonable request.