Cell cycle dynamics of mouse embryonic stem cells in the ground state and during transition to formative pluripotency

Mouse embryonic stem cells (mESCs) can be maintained as homogeneous populations in the ground state of pluripotency. Release from this state in minimal conditions yields cells that resemble those of the early post-implantation epiblast, providing an important developmental model to study cell identity transitions. However, the cell cycle dynamics of mESCs in the ground state and during its dissolution have not been extensively studied. By performing live imaging experiments of mESCs bearing cell cycle reporters, we show here that cells in the pluripotent ground state display a cell cycle structure comparable to that reported for mESCs in serum-based media. Upon release from self-renewal, the cell cycle is rapidly accelerated by a reduction in the length of both the G1 phase and the S/G2/M phases, causing an increased proliferation rate. Analysis of cell lineages indicates that cell cycle variables of sister cells are highly correlated, suggesting the existence of cell cycle regulators inherited from the parental cell. Together with a major morphological reconfiguration upon differentiation, our findings support a correlation between this in vitro model and early embryonic events.


Supplementary Figures
Nuclear segmentation, cell and division tracking, and graphical interface for data validation. (A) Upper panel, unprocessed images from a time lapse experiment of a colony maintained in naive ground state conditions. Images are merged compositions of the three channels corresponding to hCdt1-mCherry, hGeminin-mVenus and H2B-mCerulean. The lower panel displays the same colony after manually corrected automatic nuclear segmentation, cell tracking, and division tracking using the LineageTracker plugin of ImageJ. (B) Graphical interface developed for the visualization, annotation and data correction of individual lineages and their constituent cells. The script interprets the data produced by LineageTracker and automatically generates the lineage dendrogram, while also automatically determining the cell cycle variables CC-L, G1-L, and SG2M-L for each cell that completed the cell cycle during the time lapse imaging. After manually validating each cell and annotating different features (e.g. apoptosis, polyploidy), the script generates a convenient database that allows further analysis.
For example, for CC-L, the distribution cannot be calculated directly at a given time because incomplete cells (those that did not finish the cell cycle during imaging) have no observed CC-L. If these cells are discarded, the distribution and its mean value will be biased. However, the cumulative distribution can be estimated by counting the number of divisions over the total number of cells (including the incomplete ones). This provides an approximate cumulative distribution from which the median can be obtained despite the incomplete cells (see Fig. S2C).
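The cumulative estimate described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the example values are hypothetical and not taken from the study, and the median is read as the shortest observed CC-L at which at least half of all tracked cells (complete and incomplete) have divided.

```python
import statistics


def cumulative_division_fraction(cc_lengths, n_total, times):
    """Fraction of all tracked cells that have divided by each time in `times`.

    cc_lengths: observed CC-L values (hours) of cells that completed the cycle.
    n_total: total number of tracked cells, including incomplete ones.
    """
    return [sum(1 for x in cc_lengths if x <= t) / n_total for t in times]


def median_from_cumulative(cc_lengths, n_total):
    """Shortest observed CC-L at which the cumulative fraction reaches 0.5.

    Returns None when fewer than half of the cells completed the cycle.
    """
    for k, length in enumerate(sorted(cc_lengths), start=1):
        if k / n_total >= 0.5:
            return length
    return None


# Hypothetical example: 10 tracked cells, 7 of which divided during imaging.
completed = [10.5, 11.0, 11.5, 12.0, 12.5, 13.0, 14.0]
n_total = 10

print(statistics.median(completed))                 # 12.0, ignores incomplete cells
print(median_from_cumulative(completed, n_total))   # 12.5, counts incomplete cells
```

In this toy example, discarding the three incomplete cells shifts the median towards shorter cycles, which is the bias the cumulative estimate is meant to reduce.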
Estimation of Spearman correlation coefficients using a bootstrap strategy
To calculate the Spearman correlation coefficients while reducing the bias introduced by generation number and by differences between colonies, we applied a bootstrap strategy 1. Briefly, we calculated the residuals of the cell cycle variables for each cell with respect to the generational mean within each colony. To avoid oversampling cells of later generations, we randomly selected an even number of pairs of sister cells, mother-daughter or cousin cells among the different generations for the different colonies analyzed, and calculated a preliminary Spearman coefficient for that iteration.
We repeated the random sampling with replacement 1000 times and calculated the median Spearman coefficient together with a 95% confidence interval (see Fig. S3 B and C). This analysis also yields the p-value for a Spearman rho greater than 0, calculated as the number of iterations in which the Spearman rho was ≤ 0 divided by the total number of iterations.
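A minimal sketch of this bootstrap is given below, assuming the residuals have already been computed and grouped into pairs of related cells per generation. The function names, the `pairs_by_generation` structure, the percentile-based interval and the synthetic data are assumptions made for illustration; they are not the analysis code used in the study.

```python
import random
import statistics


def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rho, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = statistics.fmean(rx), statistics.fmean(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        return 0.0  # degenerate resample; treating it as rho = 0 is an assumption
    return sxy / (sxx * syy) ** 0.5


def bootstrap_spearman(pairs_by_generation, n_per_generation, n_iter=1000, seed=0):
    """Return (median rho, (2.5th, 97.5th percentile), fraction of rho <= 0)."""
    rng = random.Random(seed)
    rhos = []
    for _ in range(n_iter):
        sample = []
        for pairs in pairs_by_generation.values():
            # Same number of pairs from every generation, drawn with replacement.
            sample.extend(rng.choices(pairs, k=n_per_generation))
        x, y = zip(*sample)
        rhos.append(spearman(x, y))
    rhos.sort()
    lower = rhos[int(0.025 * n_iter)]
    upper = rhos[max(int(0.975 * n_iter) - 1, 0)]
    p_value = sum(r <= 0 for r in rhos) / n_iter
    return statistics.median(rhos), (lower, upper), p_value


# Synthetic sister-pair residuals (hypothetical data, not from the study).
demo_rng = random.Random(1)
demo_pairs = {}
for generation in (1, 2, 3):
    shared = [demo_rng.gauss(0, 1) for _ in range(30)]
    demo_pairs[generation] = [
        (s + demo_rng.gauss(0, 0.3), s + demo_rng.gauss(0, 0.3)) for s in shared
    ]

rho, ci, p = bootstrap_spearman(demo_pairs, n_per_generation=20)
print(f"median rho = {rho:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f}), p = {p:.3f}")
```

Drawing the same number of pairs from each generation in every iteration is one way to keep later, more populated generations from dominating the estimate.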

Grassberger-Procaccia algorithm
The Grassberger-Procaccia algorithm was first applied to study lineage inheritance by Sandler et al. 2 , where it is explained in detail. The general aim of the analysis is to detect whether there is a deterministic relationship within any group of variables. For data from 2 independent and uncorrelated variables, plotting one against the other gives points that are randomly distributed in the 2-dimensional plane, occupying all the space with a nearly constant density.
Conversely, if a relationship exists and one variable determines the other, the data will trace a curve in the plane. To distinguish a curve from a uniform density, one can imagine an expanding circle centered on one data point and count the number of points inside this circle as a function of its radius r. If the variables are independent, this number will increase in proportion to the area of the circle, that is, as r^2. In the case of a deterministic relation, this number will instead be proportional to the length of the curve enclosed by the circle, that is, as r^1.
The final step of the algorithm is to fit the number of points inside the circle as a function of the circle radius, extract the radius exponent and check whether it is equal to 2 (independent) or lower than 2 (deterministic).
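The 2-dimensional case can be illustrated with a short Python sketch. It assumes the usual Grassberger-Procaccia formulation, in which the expanding-circle count is averaged over all data points (the fraction of point pairs closer than r) and the exponent is taken as the least-squares slope of the log-log relation. The function name, the choice of radii and the synthetic data are hypothetical.

```python
import math
import random


def radius_exponent(points, radii):
    """Slope of log C(r) versus log r, where C(r) is the fraction of point
    pairs closer than r (the circle count averaged over all data points)."""
    dists = [math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]]
    xs, ys = [], []
    for r in radii:
        fraction = sum(d < r for d in dists) / len(dists)
        xs.append(math.log(r))
        ys.append(math.log(fraction))  # needs at least one pair closer than r
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    numerator = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    denominator = sum((x - mx) ** 2 for x in xs)
    return numerator / denominator


rng = random.Random(0)
radii = [0.05, 0.1, 0.15, 0.2]

# Two independent variables: points fill the unit square.
independent = [(rng.random(), rng.random()) for _ in range(400)]

# One variable determines the other: points lie on the curve y = x^2.
determined = [(t, t ** 2) for t in (rng.random() for _ in range(400))]

d_independent = radius_exponent(independent, radii)
d_determined = radius_exponent(determined, radii)
print(f"independent: d = {d_independent:.2f}")  # expected close to 2
print(f"determined:  d = {d_determined:.2f}")   # expected close to 1
```

With a finite sample the exponent for independent variables tends to fall slightly below 2, because circles near the border of the data cloud are only partly filled. This is one reason why a reference distribution is useful when interpreting d.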
This same idea can be extended to 3 or more dimensions. For example, for 3 independent variables, data points will be distributed throughout the 3-dimensional space and the number of points inside a sphere will increase as r^3. However, there is now an additional case, in which two variables, instead of one, determine the third. In this case, data points will form a surface in the 3-dimensional space, so the number of points inside the sphere will increase in proportion to the enclosed surface area, as r^2. The final step is the same as before: if the radius exponent is equal to 3, the 3 variables are independent, but if it is lower there is a deterministic factor relating them.
In the case of more dimensions (more variables), the conclusion is reached simply by checking whether the radius exponent, d, is equal to or lower than the embedding dimension (the number of variables), DE. To study lineage inheritance, the algorithm is applied to 2, 3 and 4 variables, going back through successive progenitor generations of the lineage. For example, we take the G1-L of one cell and the G1-L of its mother cell as the two variables to study whether there is an inheritance factor by which the G1-L of the mother cell determines the G1-L of the daughter cell. Another possibility is that the determination is made by both mother and grandmother cells, so we extended the analysis with the G1-L of the grandmother cell and applied the algorithm again. This process is extended back in the lineage until the results show that either d reaches a fixed value independent of r, or it continues increasing equally to DE. In this second case, the conclusion is that the G1-L of daughter cells is independent of cell lineage or that we must look even further back. On the other hand, if d reaches a fixed value, this value is close to the number of generations that determine the daughter cell's G1-L. For example, if d=2, then it is determined by the mother and grandmother cells.
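As an illustration of how the variables for each embedding dimension might be assembled, the sketch below builds one vector per cell containing its own G1-L followed by that of its mother, grandmother and so on. The dictionary-based lineage representation, the function name and the values are assumptions made for this example only.

```python
def ancestor_vectors(g1_length, mother_of, depth):
    """Build one vector per cell: (own G1-L, mother's G1-L, grandmother's G1-L, ...).

    g1_length: dict mapping cell id to its measured G1-L.
    mother_of: dict mapping cell id to the id of its mother.
    depth: embedding dimension DE (number of generations in each vector).
    Cells without a measured ancestor at every required level are skipped.
    """
    vectors = []
    for cell in g1_length:
        vector, current = [], cell
        while current is not None and current in g1_length and len(vector) < depth:
            vector.append(g1_length[current])
            current = mother_of.get(current)
        if len(vector) == depth:
            vectors.append(tuple(vector))
    return vectors


# Hypothetical lineage: "a" divides into "a1" and "a2"; "a1" into "a11" and "a12".
g1_length = {"a": 3.0, "a1": 2.5, "a2": 2.8, "a11": 2.4, "a12": 2.6}
mother_of = {"a1": "a", "a2": "a", "a11": "a1", "a12": "a1"}

print(ancestor_vectors(g1_length, mother_of, depth=2))
# [(2.5, 3.0), (2.8, 3.0), (2.4, 2.5), (2.6, 2.5)]
print(ancestor_vectors(g1_length, mother_of, depth=3))
# [(2.4, 2.5, 3.0), (2.6, 2.5, 3.0)]
```

Each increase in depth reduces the number of usable cells, since only cells with a complete recorded ancestry contribute a vector.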
The Grassberger-Procaccia plots (Fig. S4) show d for the different values of DE, where it can be seen that d is always lower than DE. However, it is important to quantify how large the difference between d and DE has to be to conclude that there is a determination. For this purpose, we developed a statistical analysis that consists of randomly mixing all lineages, applying the algorithm to the shuffled data and repeating the process several times. This procedure provides a distribution of d values whose mean value, compared to DE, indicates whether there is no external determination. This distribution corresponds to data free of lineage relations and represents our null hypothesis, against which we can compare the real data. We used the confidence intervals of these shuffled data distributions instead of the standard deviation because of the asymmetry of the distributions obtained.
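The shuffling control can be sketched as follows. This is a simplified illustration rather than the procedure used for Fig. S4: it assumes that "mixing lineages" can be modeled by permuting the ancestor columns of the data independently, it uses a percentile interval of the shuffled exponents, and all names and data are hypothetical.

```python
import math
import random


def radius_exponent(points, radii):
    """Least-squares slope of log(fraction of pairs closer than r) versus log r."""
    dists = [math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]]
    xs = [math.log(r) for r in radii]
    ys = [math.log(sum(d < r for d in dists) / len(dists)) for r in radii]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    numerator = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return numerator / sum((x - mx) ** 2 for x in xs)


def shuffle_columns(vectors, rng):
    """Break lineage relations by permuting every ancestor column independently
    (an assumed model of the lineage mixing)."""
    columns = [list(column) for column in zip(*vectors)]
    for column in columns[1:]:
        rng.shuffle(column)
    return list(zip(*columns))


def shuffled_interval(vectors, radii, n_shuffles=100, level=0.95, seed=0):
    """Percentile interval and mean of d over shuffled copies of the data."""
    rng = random.Random(seed)
    null = sorted(
        radius_exponent(shuffle_columns(vectors, rng), radii) for _ in range(n_shuffles)
    )
    lower = null[int((1 - level) / 2 * n_shuffles)]
    upper = null[max(int((1 + level) / 2 * n_shuffles) - 1, 0)]
    return lower, upper, sum(null) / n_shuffles


# Synthetic (daughter, mother) values in which the mother nearly determines the daughter.
data_rng = random.Random(2)
mothers = [data_rng.random() for _ in range(150)]
vectors = [(m + data_rng.gauss(0, 0.01), m) for m in mothers]
radii = [0.05, 0.1, 0.15, 0.2]

d_observed = radius_exponent(vectors, radii)
lower, upper, null_mean = shuffled_interval(vectors, radii)
print(f"observed d = {d_observed:.2f}")
print(f"shuffled d: mean = {null_mean:.2f}, 95% interval = ({lower:.2f}, {upper:.2f})")
```

An observed d below the lower bound of the shuffled interval would then be read as evidence of determination, whereas a value inside the interval would be compatible with the null hypothesis.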
Finally, we performed a positive and a negative control of the Grassberger-Procaccia algorithm (Fig. S4C). We applied the algorithm to each cell's first position in the video and also to the mean velocity of cells. As cells are "born" close to the position of their mother, this variable behaves as one determined by the mother, and the analysis confirms it (positive control). On the other hand, we expected mean velocity to be a purely stochastic variable, which was also confirmed (negative control).