Predicting pattern formation in embryonic stem cells using a minimalist, agent-based probabilistic model

The mechanisms of pattern formation during embryonic development remain poorly understood. Embryonic stem cells in culture self-organise to form spatial patterns of gene expression upon geometrical confinement indicating that patterning is an emergent phenomenon that results from the many interactions between the cells. Here, we applied an agent-based modelling approach in order to identify plausible biological rules acting at the meso-scale within stem cell collectives that may explain spontaneous patterning. We tested different models involving differential motile behaviours with or without biases due to neighbour interactions. We introduced a new metric, termed stem cell aggregate pattern distance (SCAPD) to probabilistically assess the fitness of our models with empirical data. The best of our models improves fitness by 70% and 77% over the random models for a discoidal or an ellipsoidal stem cell confinement respectively. Collectively, our findings show that a parsimonious mechanism that involves differential motility is sufficient to explain the spontaneous patterning of the cells upon confinement. Our work also defines a region of the parameter space that is compatible with patterning. We hope that our approach will be applicable to many biological systems and will contribute towards facilitating progress by reducing the need for extensive and costly experiments.

the mechanisms of pattern formation during embryonic development remain poorly understood. embryonic stem cells in culture self-organise to form spatial patterns of gene expression upon geometrical confinement indicating that patterning is an emergent phenomenon that results from the many interactions between the cells. Here, we applied an agent-based modelling approach in order to identify plausible biological rules acting at the meso-scale within stem cell collectives that may explain spontaneous patterning. We tested different models involving differential motile behaviours with or without biases due to neighbour interactions. We introduced a new metric, termed stem cell aggregate pattern distance (SCAPD) to probabilistically assess the fitness of our models with empirical data. The best of our models improves fitness by 70% and 77% over the random models for a discoidal or an ellipsoidal stem cell confinement respectively. Collectively, our findings show that a parsimonious mechanism that involves differential motility is sufficient to explain the spontaneous patterning of the cells upon confinement. Our work also defines a region of the parameter space that is compatible with patterning. We hope that our approach will be applicable to many biological systems and will contribute towards facilitating progress by reducing the need for extensive and costly experiments.
Developmental patterning is the process whereby a population of cells initially seemingly homogenous differentiates in a spatio-temporally coordinated manner to generate segregated domains of distinct cell fates. During mammalian development, early patterning events break symmetry and define the future body axes of the embryo 1,2 .
Decades of research in model animals have permitted to uncover molecular players responsible for the patterning of the mammalian embryo. These include diffusible signals secreted by the cells such as Nodal, BMP4, Wnt and their respective inhibitors engaged in a complex molecular interplay that involves many feedback mechanisms 3,4 . Yet much remains to be understood about how the precise positioning of individual cell fates are coordinated with morphogenetic events in space and time.
In recent years, new insights have been gained with embryonic stem cells (ESCs) in vitro. ESCs are a population of cells endowed with the capacity to self-renew and differentiate in vitro into all somatic cell types as well as into the germ cell lineage. When ESCs are geometrically confined on adhesive areas of specific sizes and shapes (micropatterns), ESCs self-organise to form patterns of cell fates reminiscent of the organisation of cell fates found in vivo [5][6][7][8] . One advantage of in vitro systems is that they bring the system to a level of complexity that makes it amenable to interpretation and computational modelling. For example, ESCs-based in vitro systems have made it possible to investigate signalling dynamics using both experimental perturbations and computational modelling approaches to better understand how distinct domains of cell fates are specified over time [9][10][11] .
One aspect that remains relatively unexplored is how distinct group geometries and local cellular movements may affect the boundaries between domains of cell fates initially specified by signalling molecules. We recently Materials and methods empirical data. We used the data previously described in Blin et al. 4 mESCs were cultured in LIF/Serum conditions. These conditions are known to maintain the cells in a dynamic equilibrium comprised of heterogeneous cell states including both undifferentiated cells and cells primed for differentiation 20 . Micropatterning was applied to control the precise spatial deposition of cell adhesive molecules onto the surface of the culture dish. This technique makes it possible to control the biochemical environment of the cells while at the same time providing precise geometrical confinement. A detailed video protocol of the method employed in Blin et al. is available at https ://www.jove.com/t/59634 /mappi ng-emerg ent-spati al-organ izati on-mamma lian-cells -using 21 . Briefly, we have 186 images for disc micropatterns and 152 images for ellipse micropatterns. Each images contains one cell colony that was captured after 48 hours from the initial seeding of the stem cells. The dimensions of disc and ellipse are shown in Fig. 1. The radius of the disc is 97.5 μm; the semi-major axis of the ellipse is 195.5 μm and the semi-minor axis of the ellipse is 49 μm.
In this study, we focus on studying the pattern formation observed in T+ and T− cells on disc and ellipse micropatterns. T+ cells are early differentiated cells as they are marked by Brachyury (T); T− cells are naïve cells, which means they have not started differentiation yet. On a cellular level, specific pattern formation of T+ and www.nature.com/scientificreports/ T− cells can be observed 48 hours after seeding cells randomly on the disc and ellipse shaped micropatterns. On average 6 cells were seeded on the disc or ellipse micropatterns, which means theoretically cells can only survive within these restricted areas. However, it may be possible that a few cells might migrate out of these restricted areas in practice. This might be due to the degrading of the hydrophobic substrate, proteins attaching to the hydrophobic regions, or matrix secreted by cells etc. We do not include the cells migrated out of the micropatterns in our modelling since we assume it is a result of uncontrollable noise in the wet lab. In addition, we selected cells within the micropatterns that are more than 2 μm inside the border to obtain a clear cut at the border of the micropatterns. For disc micropatterns, we selected cells within 95.5 μm from the centre of the disc; for ellipse micropatterns, we selected cells within the ellipse with the semi-major axis as 193.5 μm and semi-minor axis as 47 μm.
On an aggregated level having multiple cell colonies, after 48 hours from random seeding for disc micropatterns, T− cells preferentially stay at the centre of the disc while T+ cells preferentially localise at the edge of the disc. For ellipse micropatterns, T+ cells preferentially localise at the tips of the ellipse only, while T− cells preferentially stay at the centre of the ellipse.
As we only consider the 2D pattern we observed, we extracted the cells' locations (described by x and y coordinates) based on the centres of the nuclei. Based on the 2D locations, we counted the number of neighbouring cells for each cell within a certain radius. We calculated the percentage of cells with at least one neighbouring cell within a certain radius (as shown in Fig. 2). For example, 14% of cells have at least one neighbouring cell within 2 μm. Figure 3 presents the workflow in this study. We first propose biologically plausible rules of cell motility which may lead to the resulting pattern formation. We next constructed a set of models to test all combinations of these possible rules (see below for details) as evaluated the models through a novel metric that measures the distance between models and empirical data. With inversing models, we can obtain possible initial states providing information to biologist about seeding cells.

Model construction.
We applied agent-based models for self-organised pattern formation in ESCs from a population level. We have previously described the model structure 22 . Models were constructed in NetLogo 6.0.4, an open sourced environment specifically designed for agent-based modelling. The code is available online at https ://githu b.com/ Minho ngW/ESCs_model s. In short, there are two types of agents standing for cells and environments with their own parameters (parameters are listed in Table 1). The models are lattice-based as environment agents are a set of square patches that cell agents can move on top of, one patch stands for 1 μm. The models are state-based with 192 timestamps representing 48 hours in real life, i.e. we have measurements every 15 minutes which correspond to the timing resolution used (timestamps).
The aim of the study is to construct a minimal model to investigate the contribution from cell motility to pattern formation, therefore, we omit the contribution from different shapes of cells, cell differentiation, division and apoptosis. Similarly, we do not include additional physical rules between the cells and their environment to simplify the resulting model.
The biologically plausible rules we proposed are:

Directional movements decided by neighbouring cells (based on the empirical observations from wet lab that T+ cells move away from neighbouring cells and T− cells move toward to neighbouring cells).
We hypothesize that cells immigrating direction depends on the forces they received from their neighbouring cells within a distance (sensing radius R). We calculate the force according to where F x,y is the force with x and y representing the direction of the force in the two dimensional hyper-plane. D is the Euclidean distance between cells. T+ cells receive pushing forces from their neighbouring cells, while T− cells receive pulling forces from their neighbouring cells. Then we sum up the forces to get the direction of cells. The final immigration direction is generated by a normal distribution of the final sum force with a specific standard deviation (σ). Normal distribution was used as noise generator. 4. Border effect of cells (based on the assumption that cells tend to stay within the micropatterns instead of escaping from these constraint areas). When a cell reach the border of the constraint area, which means with current immigrating direction and speed the cell is tempting to immigrate to somewhere outside of the constraint area in next state, this cell will change its immigrating direction by a small angle (α). By comparing the current direction and the direction to the closest patch outside of the constraint area, the cell will decide the angle change is clockwise or counter clockwise. Figure 4 shows the diagram illustrating these rules. In this study, we tested all combinations of these four rules with 16 agent-based models. The list of the models numbers with corresponding rules is shown in Tables 2 and 3.  www.nature.com/scientificreports/ Subsequently, based on the previous rule that cells get the direction from neighbouring cells and move with fixed speed, we improved the selected best models by adjusting cell velocity by neighbouring cells, hence, cells do not always migrate with the same velocity. In addition to the previous rule of directional movements, we calculate the ratio of actual velocity and previous mean velocity based on the sum of forces divided by the sum of the magnitude of the forces. v is the previous mean velocity of this type of cell. v actual is the actual speed of cell migration. parameter optimisation through grid search. Based on the model with improved rules, we tested a set of parameters. Because the number of free parameters is not big, we applied grid search for parameter optimisation to test all combinations of the values of these free parameters. The free parameters we tested are different sensing radius (R) for getting neighbouring cells and standard deviation (σ) for generating the final direction (as listed in Table 4). According to the results illustrated in former experiments of growing T+/T− cells in confined areas 4 , T+ and T− cells shows different preferential localisation by taking 50 μm and above as radius for neighbourhood on disc and ellipse micropatterns. Considering the size of the micropatterns, we tested 25, 50, 75, and 100 μm for sensing radius in our models.
Following parameter optimisation, we tested applying different sensing radius (R) and different standard deviation (σ) for T+ and T− cells separately. Hence, T+ and T− cells can have different optimised value for these  www.nature.com/scientificreports/ parameters. Again, we applied grid search for parameter optimisation by testing all combinations of the potential values. We get the model with the best performance by applying the metric explained in the following section.

Model validation and a new evaluation metric.
We tested related existing approaches to evaluate pattern formation in disc and ellipse models against empirical data. For example, we considered the earth mover's distance 26 (EMD), Kullback-Leibler divergence 27 (KL divergence) and continuous rank probability score (CPRS) metrics (for distribution density of each grid point) 28 . We take output from Model 1 (random model) and Model 7 on disc micropatterns as example. Figure 5 shows the density plot of T+ cells distribution after running models 100 times. The results from Model 7 is closer to our desired pattern as T+ cells prefer to localise at the border on the disc. Table 5 shows the results of EMD, KL divergence and CRPS. Hence, for all these established approaches,    www.nature.com/scientificreports/ the resulting outcome did not follow the visual impression we had from observing the resulting patterns, which motivated the development of a new approach that was tailored specifically to this application. We developed a new metric to quantify the difference in the constellation of the cell patterns between the model results and the empirical data, which we call the stem cell aggregate pattern distance (SCAPD). The metric calculates the distance between the total densities within high density areas (HDA) of aggregate cell cultures and takes T+ and T− cells into account separately. Figure 6 illustrates the process of getting the borders of high density areas for evaluation. The steps of getting the borders are: • Step 1 Getting density maps: We applied 2D kernel density estimation 29 with Gaussian kernels to generate aggregate density maps of T+ and T− cell colonies separately. We applied Botev's approach for density estimation and used their internal bandwidth estimate with default optionse 30 . We calculated the density over a 256 × 256 grid space, hence the resulting density estimate comprises 256 × 256 entries. • Step 2 Getting points by thresholding: Based on density maps, we collected a list of points marking the border of the areas by thresholding. The threshold T was calculated as the mean of maximum and minimum value of density of each grid point (g i ), as shown below.
• Step 3 Best fit circles/ellipses: Based on the points we marked, we generated the best fit circles or ellipses based on least-squares fitting 29 . • Step 4 Fixing asymmetric: Since the cells were seeded randomly and there is no fundamental reason to suggest the micropatterns should exhibit particular preference on either end (e.g. left vs right part of the ellipse), we do not believe cells would end up on a specific direction. Due to the symmetrical properties of disc and ellipse, we believe that the assessment of the disc micropatterns should be symmetric according to any arbitrary angle, and the pattern on ellipse micropattern should be symmetric according to x and y axis.
For disc micropattern, we keep the radius of the best-fit circles and moved the centres to the centre of the disc micropattern (point(0,0)). For ellipse micropatterns, we use the maximum value for both semi-minor and semi-major axis of the best fit ellipses, and then we keep the absolute value of x of the centre of the best-fit ellipse as the mean of the absolute values of two x values, and y of the centre of the best-fit ellipses as 0.
After setting the borders, we calculated the total density within these areas for T+ and T− cells separately in empirical data on both disc and ellipse micropatterns. The total density is the sum of the density for the subset of the originally evaluated 256 × 256 grid points for the density which fall within the defined borders that we Figure 6. The process of getting the borders of evaluation of T+ cells and T− cells separately for disc and ellipse micropatterns from step 1 to step 4 accordingly. White lines stand for the border of the micropatterns. In step 1, the density maps of T+/T− cells on disc and ellipse micropatterns are provided. In step 2, the contour plots were generated by thresholding. In step 3, best fit circles/ellipses for the borders of HDA are shown in red lines. In step 4, the asymmetric problems are fixed. www.nature.com/scientificreports/ have set up for matching to ground truth. Hence, for empirical data, we got the total density within the area for both T+ and T− cells, noted as T e_T+ and T e_T− . In this study, we take the empirical data collected from the wet lab as ground truth. For the evaluation of the models, we compare the model results against the empirical data. In order to do so, we firstly applied kernel density estimation with using same parameters and approaches we applied in empirical data. Subsequently, with the borders we generated from the empirical data, we calculated the total density within the defined borders on our model results as well. We run each model 100 times to take into stochasticity effects, and calculated the total density of the aggregated 100 runs results from both T+ and T− cells separately noted as T m_T+ and T m_T− . We calculate SCAPD to express the difference observed between the model results and the experimental data (used as ground truth), as follows: In the next step, we got the sum of the absolute difference of T+ and T− cells total density within the area for both disc and ellipse colonies. A small sum of absolute total density difference indicates the model is close to the empirical data (if completely matching the empirical data, then the absolute difference would be exactly zero).

Results
Ground truth. Using the borders we defined for SCAPD on Fig. 6, we calculated the total density within the areas for T+ and T− cells in the empirical data on both disc and ellipse micropatterns. The results of total density from the empirical data are shown in Table 6.
Basic models. On Fig. 7, we show the SCAPD results from 16 models for both disc and ellipse micropatterns. As shown in Fig. 7, for both disc and ellipse micropatterns, Model 7 (with different velocity and directional movements) and Model 14 (with different velocity, directional movements, and border effect) have the best performance. For disc models, Model 14 is slightly better than Model 7, while for ellipse models, Model 7 is slightly better than Model 14. However, the difference between Model 7 and Model 14 is minimal. So in the next step, we focus on Model 7 and Model 14.
We tested the sensitivity of model output with respect to changes of the parameter of angle change in Model 14. As the results shown in Table 7, the model output, based on the results of SCAPD, is not sensitive to the small degree of angle change.
Models with parameter optimisation. Based on the original Model 7 and Model 14, we improved the rules by getting the ratio of velocity. Afterwards, we tested a set of parameters combination of sensing radius (R) and standard deviation (σ) (the possible values we tested are listed in Table 4) and got the SCAPD results as (4) SCAPD = T e_T+ − T m_T+ + T e_T− − T m_T− Table 6. Total density within the area from empirical data. www.nature.com/scientificreports/ shown in Fig. 8. For disc models, Model 14 is slightly better than Model 7, while for ellipse models, Model 7 is slightly better than Model 14. However, the difference is extremely small (0.002213 for disc models and 0.0006 for ellipse models). Hence, with getting the ratio of velocity and parameter optimisation through grid search, we reduced SCAPD about 38% from the original best model for disc and about 27% from the original best model for ellipse.

T− cells T+ cells
Best performance models. Subsequently, we tested different sensing radius (R) and different standard deviation (σ) for T+ and T− cells separately to bring the models closer to the empirical data. Figure 9 shows the results of all combinations of the parameters. As we can see in Fig. 9, sensing radius (R) plays an important role to models results, while models are not so sensitive to standard deviation (σ). Even though sensing radius should be an intrinsic cell property, we have different optimised values for disc and ellipse micropatterns. This difference  Figure 8. SCAPD results from models with grid search for parameters optimization. Black vertical lines stand for parameters we tested, sensing radius (R) and standard deviation (σ). The last blue vertical lines show SCAPD results. Each line crossing three vertical lines stand for each mode with specific parameters setting and quantified SCAPD. Red line is the best performance model in this group; orange lines are the following three good models; grey lines stand for the rest models. www.nature.com/scientificreports/ might cause by (1) the natural properties of the different shapes of micropatterns since part of the circle that we considered as neighbourhood of the cells would be empty due to part of the circle would be outside of the micropatterns; (2) different crowdedness between different shaped micropatterns. Hence, it is reasonable that disc and ellipse micropatterns have different optimised values for these parameters. Table 8 summarizes the final SCAPD results from the best performance models compared to the initial random models (Model 1 without any specific motility rules in the original 16 models). Since the mathematical properties of disc and ellipse, we took the results from the random model as the baseline and checked the improvements of our models by comparing against the initial random models. The improvements in our models reach up to 70% and 77% performance improvements in terms of SCAPD for disc and ellipse micropatterns respectively.  www.nature.com/scientificreports/ Finally, we wanted to assess the computational time required to test each model. We tested out the models on a high performance compute cluster with requesting 64 GB memory. We repeated the timing computations 100 times for each model. The total time consumption for each model is listed on Table 9.

Discussion
Previous studies on cell sorting and tissue morphogenesis have described motility driven pattern formation and provided some insights at the molecular level 31 . In this study, we focused on specific pattern formation in ESCs and constructed 16 agent-based stem cell pattern formation models to test all combinations of four biologically plausible rules of cell motility that we have defined. Furthermore, we introduced a new metric, SCAPD, quantifying differences between the model derived pattern formation and the experimental ESCs pattern formation. We found that Model 7 and Model 14 have best performance, which is consistent for both disc and ellipse micropatterns. Model 7 is built using different velocity and directional movements, while Model 14 has an additional rule compared to Model 7 also including the border effect.
We applied grid search for parameter optimisation to bring our model outputs closer to the empirical data. We tested different values of sensing radius and standard deviation. We also tested different values for angle change in rule iv and proved that the model output is not sensitive to this parameter. In consequence, we observed that SCAPD of Model 7 and Model 14 after parameter optimisation is considerably lower compared to the original outputs. Specifically, we computed SCAPD as 0.08 and 0.07 for disc and ellipse micropatterns respectively after revising rules and parameters optimization. Compared to the initial random models, the models' performance was improved by 70% and 77% for disc and ellipse micropatterns, respectively. The presented models can probabilistically produce broadly realistic pattern formation (when compared to the empirical data).
Different modelling methods have been applied to study stem cells at a population level to reproduce the population dynamics by generating minimal models 32 . For example, Libby and colleagues have applied cellular Pott models to human pluripotent stem cells, enabling a machine learning optimisation approach to predict experimental conditions that yield targeted multicellular patterns 33 . Multiple mathematical models were applied to increase the precision of modelling stem cell proliferation 34 and investigating stem cells self-renewal 35 . Compared to cellular Pott models, which is targeted to achieve the pattern emergence with the possible simplest computation, agent-based modelling provides higher freedom as cells are free to move and cells can interact with other cells and their environment. Hence, agent-based modelling is a suitable modelling method for investigating cell motility. In addition, agent-based modelling have been widely applied in cell biology. For example, Briers and colleagues applied agent-based modelling to study specific pattern formation in ESCs differentiaion 36 . In this study, we applied agent-based modelling to investigate specific observed pattern formation in ESCs with focusing cell motility, and tested a wide range of motility rules to obtain the minimal rules that can reproduce the pattern formation. Agent-based modelling allows us to model cell motility intuitively and gives insight of cell motility.
Compared to our previous work, in this paper, we tested all combinations of the proposed biological plausible rules and improved the models performance to a better level based on the results from the new proposed metric SCAPD. As one of the challenges of applying agent-based modelling in morphogenesis is the evaluation of patterning 37 , in this study, we tested multiple existing metrics for evaluation, including earth mover's distance, Kullback-Leibler divergence, and continuous rank probability score, and demonstrated that a new metric is required. SCAPD evaluates the models' results by calculating the distance between models' results and empirical data. Based on SCAPD, we quantified our models' results with different rules instead of visually estimating the results. The high variety in empirical data could be the reason that existing approaches for evaluation resulting outcomes that do not follow the visual impression we had from the resulting patterns.
The cell behaviours in reality may differ in practice from the simplifying assumptions explored in the context of this study. However, we demonstrated that we could replicate the pattern formation with a quantified level of uncertainty. Firstly, we proposed new testable rules for understanding the mechanisms of pattern formation based on the presented work. The new hypotheses are that the pattern formation can be achieved by engineering cell speed and the level of adherence of cells by using synthetic biology 38,39 . These hypothesis could be verified through experiments in the future. Secondly, these findings can support the future study on engineering cell motility to obtain desired patterns. Lastly, we can propose potential initial states for desired patterns by finding the probabilistic corresponding relationship between initial states and the final states in our models. More experiments would be required to verify the corresponding relationship. Through these experiments, we provide contributions to regenerative medicine by increasing the robustness of achieving desired pattern formation and having a better understanding of the driving power of pattern formation from population level.
In this study, we generated probabilistic models to reproduce the observed pattern formation in pluripotent stem cells. We developed a new metric SCAPD to evaluate the models' results at a quantified level. Through Table 9. Time consuming of running model 7 and Model 14 for disc and ellipse micropatterns.

Models
Total running time (22, www.nature.com/scientificreports/ revising cell motility rules and parameters optimization, we improved the models performance about 70% and 77% compared to initial random model for disc and ellipse micropatterns respectively. The models provide opportunities for engineering the cells in reality to achieve the desired pattern formation. Overall, we see this study as a step towards extending the current understanding of ESCs pattern formation, which may facilitate the development of novel stem cell therapies.