Unlike pluripotent cells, which generate only embryonic tissues, totipotent cells can generate a full organism, including extra-embryonic tissues. A rare population of cells resembling 2-cell-stage embryos arises in pluripotent embryonic stem (ES) cell cultures. These 2-cell-like cells display molecular features of totipotency and broader developmental plasticity. However, their specific nature and the process through which they arise remain outstanding questions. Here we identified intermediate cellular states and molecular determinants during the emergence of 2-cell-like cells. By deploying a quantitative single-cell expression approach, we identified an intermediate population characterized by expression of the transcription factor ZSCAN4 as a precursor of 2-cell-like cells. By using a small interfering RNA (siRNA) screen, we identified epigenetic regulators of 2-cell-like cell emergence, including the non-canonical PRC1 complex PRC1.6 and the EP400–TIP60 complex. Our data shed light on the mechanisms that underlie exit from the ES cell state toward the formation of early-embryonic-like cells in culture and identify key epigenetic pathways that promote this transition.
We thank A. Smith (Wellcome Trust/MRC Stem Cell Institute) for providing the knock-in REX1 reporter cell line, M. Ko (Keio University) for the Zscan4c promoter plasmid, R. Enriquez-Gasca for providing a classification of MERVLs before publication, D. Reinberg (New York University Langone School of Medicine) for the rabbit antibody to PRDM14, A. Ettinger for time-lapse analysis, C. Ebel, D. Pich, T. Hofer and W. Hammerschmidt for help and access to FACS, the INGESTEM infrastructure for access to the IGBMC high-throughput high-content screening workstation, C. Thibault, F. Recillas-Targa and M. Zurita-Ortega for helpful discussions and A. Burton for critical reading of the manuscript. M.-E.T.-P. acknowledges funding from EpiGeneSys NoE, ERC-Stg ‘NuclearPotency’ (280840), the EMBO Young Investigator Programme, the Fondation Schlumberger pour l’Education et la Recherche (2016-Torres-Padilla) and the Helmholtz Association. J.M.V. acknowledges funding from the Max Planck Society and EpiGeneSys NoE. T.I. was a recipient of postdoctoral fellowships from the Uehara Memorial Foundation and the Human Frontier Science Programme (LT000015/2012-l). D.R.-T. was partially supported by a DGECI fellowship (2890/2014) from the National University of Mexico.
Integrated supplementary information
Supplementary Figure 1 Controls for the single-cell expression profiling experiments of ES and 2-cell-like cells
a, List of genes selected for the single-cell analysis classified according to their pathway or function. b, Immunofluorescence analysis using a turboGFP and an OCT4 antibody in the 2C::turboGFP cell line before and after sorting out 2-cell-like cells as indicated in Fig. 1b. Scale bar, 100 μm. c, Scatterplots of turboGFP fluorescence versus tdTomato fluorescence for feeder cells only (bottom), WT ES cells and feeder cells (middle), and the 2C::turboGFP/CAG-tdTomato reporter line with feeders (top) assayed by FACS. The presence of constitutively expressed NLS-tdTomato in the reporter line allows efficient discrimination from feeder cells. d, Normalized Ct values for the ERCC-943 spike-in comparing turboGFP– and turboGFP+ cells. Note that the turboGFP– and turboGFP+ cells analyzed in these plots come from independent sample preparation experiments but were processed on the same Biomark chip. Because the two groups both exhibit constant expression that is highly similar for ERCC-943, we conclude that they were normalized properly and that their expression levels are therefore comparable. Boxes indicate 25% and 75% quartiles, and the whiskers extend to 1.5 times the interquartile range. e, Graphic interpretation of the features contrasted across the first three principal components of the principal-component analyses shown in Figs. 1–3. f,g, Different viewpoints of the principal-component analysis of the ES and 2-cell-like single-cell dataset. This PCA was computed without the expression data from turboGFP and Zscan4. Each point corresponds to a single cell and is color-coded based on the original expression level of turboGFP (f) or Zscan4c/d/f (g) as indicated on the right. Black dots indicate no expression.
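The box-plot convention used in panel d (boxes at the 25% and 75% quartiles, whiskers extending to 1.5 times the interquartile range) can be illustrated with a minimal sketch. It assumes the common Tukey convention in which each whisker ends at the most extreme observation still inside the 1.5 × IQR fence; the legend does not state which quartile estimator or plotting software was used, so the function name `box_whisker_stats` and the `inclusive` quantile method are illustrative choices, and the input values are invented.

```python
import statistics


def box_whisker_stats(values):
    """Summarize values for a box plot with 1.5 x IQR whiskers.

    Assumption: Tukey-style whiskers and the 'inclusive' quartile
    estimator; the original analysis may have used a different one.
    """
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme data points inside the fences.
    lower_whisker = min(v for v in data if v >= lower_fence)
    upper_whisker = max(v for v in data if v <= upper_fence)
    # Anything beyond the fences would be drawn (or hidden) as an outlier.
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_whisker": lower_whisker,
        "upper_whisker": upper_whisker,
        "outliers": outliers,
    }


if __name__ == "__main__":
    # Invented numbers, not data from the study.
    print(box_whisker_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 30]))
```

Applied to normalized Ct values of a spike-in from two groups of cells, similar boxes and whiskers in both groups would correspond to the kind of agreement the legend describes.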
Supplementary Figure 2 Zscan4+ cells are an intermediate cellular state between the ES and 2-cell-like states
a, Accuracy of the Zscan4c::tdTomato reporter cell line used for the single-cell profiling described in Fig. 2. The graph shows the number of tdTomato+ cells that scored positive as assessed by FACS in relation to whether they belong to ES cells (no Zscan4c/d/f transcripts detected), Zscan4+ cells (Zscan4c/d/f transcripts detected) or 2-cell-like cells (Zscan4c/d/f and turboGFP transcripts detected). b,c, Principal-component projection of all datasets combined. Principal components were calculated for the aggregate of the ES, Zscan4+ and 2-cell-like datasets (Figs. 1 and 2), unlike the analyses in Figs. 2 and 3 where the Zscan4 dataset (from Fig. 2) was projected onto the principal components of the ES and 2-cell-like datasets. In b, turboGFP and Zscan4c/d/f were omitted from the calculation of the principal components. d,e, Validation of the Zscan4c::tdTomato and 2C::turboGFP cell line used for the time-lapse analysis in Fig. 2e. d, Representative immunostaining for mCherry and Zscan4 (top) and mCherry and turboGFP (bottom) from three independent cell cultures. e, Quantification of the percentage of (endogenous) ZSCAN4+ cells that also express mCherry. The reporter recapitulates endogenous expression of ZSCAN4 protein with ~92% accuracy. Error bars, s.d. Scale bar, 10 μm. f,g, Zscan4– and Zscan4+ cells were FACS sorted based on the Zscan4::mCherry reporter and cultured for 24 h, after which the percentage of turboGFP+ cells was quantified by FACS. Shown are the means ± s.d. of four independent experiments. During the 24-h window, 4% of the Zscan4+ cell population became 2-cell-like cells, 63% remained Zscan4+ cells and 33% lost Zscan4 reporter expression. h, Heat maps showing ATAC–seq signal intensity over 1,911 genomic regions with different accessibility in ES and 2-cell-like cells.
Supplementary Figure 3 Gradual transcriptional changes accompany Zscan4 upregulation and precede entry to the 2-cell-like state
a, The graph combines two parameters: the line (left y axis) depicts probability density and the histogram under it (right y axis) refers to absolute frequency of occurrence. The probability density function of Zscan4c/d/f expression in ES (blue), Zscan4+ (orange) and 2-cell-like (green) cells is plotted against the normalized expression of Zscan4c/d/f (x axis) in each individual cell. These three distinct levels were classified as low, mid and high based on the histogram data, which derive from the Biomark analysis. b,c, Projection of the expression profiles of Zscan4+ cells onto the principal components of the ES and 2-cell-like cell dataset (Fig. 1d). Each dot represents a single cell and is color-coded according to whether it corresponds to an ES cell, a Zscan4 low, Zscan4 mid or Zscan4 high cell, or a 2-cell-like cell according to the legend on the right. In c, cells are colored based on their expression levels of Zscan4c/d/f as indicated on the right. Black indicates no expression. d, Density plots for Zscan4c/d/f and MT2_Mm based on single-cell RNA-seq data (ref. 39). Dotted lines represent the thresholds used to classify individual cells into ES cells, Zscan4 low, Zscan4 mid or Zscan4 high cells, and 2-cell-like cells. e, Violin plots for the MT2_Mm LTR, Zscan4c/d/f and two MERVL-driven chimeric genes in the single-cell RNA-seq dataset. f, MA plots showing significantly differentially expressed genes (red) for each transitional state analyzed from single-cell RNA-seq data. The list of differentially expressed genes for each transition is shown in Supplementary Table 9. g, Heat map showing a gradual transition in the expression profiles of cells transitioning between ES cells and 2-cell-like cells based on single-cell RNA-seq data.
Supplementary Figure 4
a, Scatterplot showing the fluorescence intensity measurements for Oct4 and Zscan4 in individual cells as judged by immunostaining. r depicts the Pearson correlation coefficient between OCT4 and ZSCAN4 expression for each group of cells, as indicated. b,c, Validation of the Rex1::EGFP and Zscan4::tdTomato cell line by immunofluorescence. A representative single confocal section from three independent cell cultures is shown. The Rex1 knock-in construct was validated previously (ref. 44). d, Density plot showing the gating parameters used for sorting the Rex1 high and Rex1 low cells in Fig. 4a. e,f, Violin and density plots showing the distribution of single-cell expression for Rex1 and Nanog. Note that in these plots ES cells were further classified into two groups according to whether they express high or low levels of Rex1, which highlights naive versus primed pluripotent states, as confirmed also by the abundance of Nanog transcripts in the same cells. g, Percentage of OCT4+, EGFP+ and ZSCAN4+ cells 48 h after transfection with siRNA for Oct4 or the scrambled control. Data shown are the means ± s.d. for three independent cell cultures. h, Percentage of EGFP+, ZSCAN4+ and 2-cell-like cells after transfection with Oct4, Nanog, Sox2 or Rex1 siRNA as compared to p150 siRNA and to the negative controls (NT and Neg). Transfection and analysis were performed as described in the Methods for the RNAi screen. Shown are the means ± s.d. from triplicate cell cultures. i, RT–qPCR analysis of MERVL and Zscan4 in the 2C::EGFP reporter cell line after transfection with the indicated siRNAs. Shown are the mean values ± s.d. of two independent cell cultures.
Supplementary Figure 5 Sequential gene expression changes during the transition to the 2-cell-like state
a, Violin plots showing the distribution of expression levels of individual cells for the indicated genes. Higher values correspond to higher expression levels, and a Ct value of 0 indicates that no amplification was detected. The median is indicated by a square. b, Schematic of significantly and differentially expressed genes related to germline development between individual stages of the transition from the ES to the 2-cell-like state. Changes were considered significant if they exhibited at least 2-fold changes across cells between individual states and P < 0.05 (Mann–Whitney U test). The arrow indicates the direction (up or down) of the changes in gene expression.
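The significance criterion in panel b (at least a 2-fold change between states and P < 0.05 by the Mann–Whitney U test) can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: expression values are taken to be on a log2 scale, so that a 2-fold change corresponds to a difference of 1; the fold change is computed from group medians; and the P value uses the two-sided normal approximation with tie correction and no continuity correction. The function names are hypothetical.

```python
import math
from statistics import median


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions i..j are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic for x, P value). Assumption: no continuity
    correction; an exact test may be preferable for very small groups.
    """
    n1, n2 = len(x), len(y)
    combined = list(x) + list(y)
    ranks = average_ranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    counts = {}
    for v in combined:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(var_u)
    return u1, math.erfc(abs(z) / math.sqrt(2))


def is_changed(state_a, state_b, min_log2_fc=1.0, alpha=0.05):
    """Apply both criteria; returns (changed, log2 fold change, P value)."""
    log2_fc = median(state_b) - median(state_a)
    _, p = mann_whitney_u(state_a, state_b)
    return abs(log2_fc) >= min_log2_fc and p < alpha, log2_fc, p
```

The sign of the returned log2 fold change plays the role of the arrow in the schematic: positive for genes that go up in the later state, negative for genes that go down.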
Supplementary Figure 6
a, Screening was based on nuclear segmentation, following DAPI staining, for which a representative image is shown. Nuclei were segmented based on DAPI intensity, and only nuclei that met the quality control were used for further analysis (blue outlines). Scale bars, 100 μm (left) and 5 μm (right). b, Box-and-whisker plots for the negative (non-transfected cells (NT; n = 45 wells) and negative-control-siRNA-transfected cells (neg; n = 270 wells)) and positive (p150-siRNA-transfected cells (p150; n = 270 wells)) controls from the primary screen. The percentage of EGFP+, ZSCAN4+ and OCT4+ cells was determined for each cell culture well. Two-cell-like cells were defined as cells fitting all three criteria, namely: positive for EGFP and ZSCAN4 but negative for OCT4. On the graphs, boxes indicate 25% and 75% quartiles, and the whiskers extend to 1.5 times the interquartile range. Outlier wells are not shown. c, Complete results from the primary screen depicting the z scores of the 1,167 targets (mean z score of triplicate wells for each target) relative to the negative controls. The positive control p150 is depicted in red. d, Analysis of cell toxicity, as inferred by cell number, elicited upon treatment with siRNA for the top 50 hits. The heat map displays the top 50 hits ranked by their ability to induce 2-cell-like cells (left) and the cell number per well upon siRNA transfection (right). Note that, because all siRNAs were transfected using the same number of cells, changes in cell number indicate cell death and/or cell growth defects resulting from RNAi.
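The scoring logic described in the legend above, classifying a cell as 2-cell-like when it is EGFP+ and ZSCAN4+ but OCT4–, and expressing each target as the mean z score of its replicate wells relative to the negative controls, can be sketched in a few lines. This is a minimal illustration: the per-cell boolean flags, the function names and the use of the sample standard deviation of the negative-control wells are assumptions, since the legend does not specify how marker positivity was thresholded or which z-score variant was used.

```python
from statistics import mean, stdev


def percent_two_cell_like(cells):
    """Percentage of 2-cell-like cells in one well.

    `cells` is a list of dicts with boolean 'egfp', 'zscan4' and 'oct4'
    flags (hypothetical representation of segmented nuclei).
    """
    hits = sum(
        1 for c in cells if c["egfp"] and c["zscan4"] and not c["oct4"]
    )
    return 100.0 * hits / len(cells)


def mean_z_score(target_wells, negative_wells):
    """Mean z score of a target's replicate wells versus negative controls.

    Assumption: a plain z score using the mean and sample standard
    deviation of the negative-control wells.
    """
    mu = mean(negative_wells)
    sd = stdev(negative_wells)
    return mean((w - mu) / sd for w in target_wells)


if __name__ == "__main__":
    # Invented well: one of four cells meets all three criteria.
    well = [
        {"egfp": True, "zscan4": True, "oct4": False},
        {"egfp": True, "zscan4": True, "oct4": True},
        {"egfp": False, "zscan4": True, "oct4": True},
        {"egfp": False, "zscan4": False, "oct4": True},
    ]
    print(percent_two_cell_like(well))
```

Requiring all three markers together is stricter than scoring EGFP alone, which is why the percentage of 2-cell-like cells can be lower than the percentage of EGFP+ cells in the same well.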
Supplementary Figure 7 Validation of the hits obtained in the primary screen by a secondary screen and identification of new hits
a, Box-and-whisker plots representing the results of the secondary screen for the non-transfected cells (NT), scrambled-siRNA-transfected cells (Neg) and cells transfected with siRNA for p150 (positive control). n indicates the number of cell culture wells analyzed. Two-cell-like cells are defined as cells positive for EGFP and ZSCAN4 but negative for OCT4. Boxes indicate 25% and 75% quartiles, and the whiskers extend to 1.5 times the interquartile range. Outlier wells are not shown. The mean ± s.d. of two technical replicates is shown. b, Comparison of the primary and secondary screen results for three selected hits (Ep400, Dmap1 and Ring1b). Fold changes relative to the negative control are indicated. c, Validation of individual siRNAs from the siRNA pool for the top 50 hits. The top 50 hits from the primary screen were selected for validation in the secondary screen by transfecting four different individual siRNAs, and the effect of each individual siRNA on 2-cell-like cell emergence was assessed. The number of validated hits (z score > 2 as compared to the negative control) by 4, 3, 2 or 1 siRNA is depicted. Only one hit (Dnmt3b) from the primary screen was not validated by any of the four individual siRNAs. d, Representative random, inverted dynamics merged fields of view from the secondary screen for the indicated siRNAs as compared to the negative and positive controls. Scale bar, 500 μm. e, Percentage of EGFP+, OCT4+ and ZSCAN4+ cells and of 2-cell-like cells obtained in the secondary screen for Mga, Max, Rybp and Daxx as compared to the negative (NT and Neg) and positive (p150) controls. Mean values ± s.d. derived from triplicate cell cultures are shown.
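The tally in panel c, where each hit is tested with four individual siRNAs and counted by how many of them reach a z score above 2, amounts to a simple count per target. The sketch below uses placeholder target names and invented z scores; the function names are hypothetical, and only the threshold of 2 is taken from the legend.

```python
from collections import Counter


def count_validating_sirnas(z_scores, threshold=2.0):
    """Number of individual siRNAs whose z score exceeds the threshold."""
    return sum(1 for z in z_scores if z > threshold)


def tally_validation(z_by_target, threshold=2.0):
    """Map 'number of validating siRNAs' -> 'number of targets'."""
    return Counter(
        count_validating_sirnas(z, threshold) for z in z_by_target.values()
    )


if __name__ == "__main__":
    # Placeholder targets with made-up z scores for four siRNAs each.
    example = {
        "geneA": [3.1, 2.5, 4.0, 2.2],
        "geneB": [2.4, 1.0, 0.5, 3.0],
        "geneC": [0.1, 0.3, 1.9, 2.0],
    }
    print(tally_validation(example))
```

A target with a count of 0 corresponds to a primary-screen hit that no individual siRNA reproduced, as the legend reports for a single gene.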
Supplementary Figure 8 Gene expression dynamics of the novel regulators of the 2-cell-like state in the preimplantation mouse embryo and 2-cell-like cells
a, Heat map showing the changes in mRNA levels for the top 50 candidates in endogenous, p60-knockdown-induced and p150-knockdown-induced 2-cell-like cells. Log fold changes were calculated based on bulk RNA-seq data (ref. 27) and are color-coded relative to ES cells. Genes are ranked according to differential expression in endogenous 2-cell-like cells. b, Heat map showing unsupervised clustering of the relative expression levels of the top 50 candidates during early mouse development (zygote, early, mid and late 2-cell, 4-cell, 8-cell, 16-cell stages, and early, mid and late blastocyst stages). Protein names are color-coded according to the complex to which they belong. Expression data are derived from ref. 66. Notably, while most spliceosome proteins were enriched upon development to the morula stage, PRC1 subunits peaked in expression at different time points during the 2-cell stage, further suggesting that parallel pathways act in concert to restrict totipotent/2-cell/2-cell-like identity.
Supplementary Figure 9 Characterization of 2-cell-like cell protein markers in 2-cell-like cells induced upon siRNA of the hits identified in the siRNA screen
Characterization of 2-cell-like cell markers in 2-cell-like cells induced upon siRNA targeting of the hits identified in the siRNA screens. a,b, Immunostaining with an antibody against the protein from MERVL reveals expression of endogenous MERVL loci in cells expressing the 2C::EGFP reporter in controls (a) as well as in 2-cell-like cells induced upon siRNA targeting of the indicated chromatin modifiers (b). c, Immunostaining for ZSCAN4 and EGFP in the 2C::EGFP reporter ES cell line. Representative images from at least three independent cell cultures performed on different days are shown. Scale bars, 10 μm.
Supplementary Figure 10 Characterization of 2-cell-like cell transcriptional markers in 2-cell-like cells induced upon siRNA of the hits identified in the siRNA screen
a, Expression of 2-cell-like genes upon siRNA targeting of the identified hits. RT–qPCR analysis was performed for repetitive elements (top) and chimeric LTR transcripts (bottom) upon transfection with the indicated siRNAs. Shown are the mean values ± s.d. from four independent cell cultures performed on two different days. b, FACS analysis of the 2C::turboGFP and Zscan4::mCherry cell line after transfection with the indicated siRNAs individually or in pairs. Fold changes in turboGFP+, mCherry+ and double-positive (2-cell-like) cells are shown. The mean ± s.d. of the indicated number of cell cultures is shown.
Supplementary Figure 11
a, Schematic of PRC1 complexes identified in mammals. PRC1 complexes are divided into cPRC1 (canonical PRC1) (left) and ncPRC1 (non-canonical PRC1) (right). RING1a and RING1b interact with distinct PCGF proteins. PCGF2 and PCGF4 are present only in canonical PRC1 complexes (PRC1.2 and PRC1.4, respectively). PCGF1, PCGF3, PCGF5 and PCGF6 proteins associate with RYBP or YAF2 to form the non-canonical PRC1 complexes (PRC1.1, PRC1.3, PRC1.5 and PRC1.6, respectively). b, Two-cell-like cell induction after transfection with siRNAs for all PRC1 components. Results for the siRNA pools identified in the primary or secondary screen are shown as a z score. c,g, RT–qPCR analysis was performed to measure siRNA efficiency for Yaf2 and Ring1a (c) or Eed and Ezh2 (g) after transfection with the corresponding siRNAs as compared to scrambled siRNA in the 2C::EGFP reporter cell line. Mean values ± s.d. from four independent cell cultures performed on two different days are shown. d,h, Quantification of EGFP+ cells (%) by FACS after transfection with the indicated siRNAs. Shown are the means ± s.d. of the indicated number of cell cultures. e,f,i,j, Expression of 2-cell-like genes upon treatment with the indicated siRNAs. RT–qPCR analysis was performed of MERVL (e,i) and Zscan4 (f,j) expression in the 2C::EGFP reporter cell line after transfection with the indicated siRNA. The mean values ± s.d. from four independent cell cultures performed on two different days are shown.
Supplementary Figure 12
a, Immunostaining for OCT4, EGFP and H2AK119Ub in the 2C::EGFP reporter ES cell line depicting endogenous (Neg siRNA) as well as p60- and p150-knockdown-induced 2-cell-like cells. Representative single-section confocal images of at least three independent cell cultures are shown. Dashed white lines demarcate EGFP+ cells. Scale bar, 20 μm. b, H2AK119Ub levels in endogenous 2-cell-like cells and in 2-cell-like cells induced upon transfection with the indicated siRNAs. EGFP (top), OCT4 (middle) and H2AK119Ub (bottom) fluorescence was quantified in ES cells (blue, EGFP negative) and in 2-cell-like cells (red, EGFP positive). Each dot represents a single cell. Shown are raw values obtained in one representative experiment of three independent biological replicates performed on different days. c, Quantification of EGFP+ cells (fold change as compared to negative control Neg) by FACS after transfection with the indicated siRNAs in combination with Rex1 (left) or Nanog (right) siRNA. The mean ± s.d. of the indicated number of cell cultures is shown. d, RT–qPCR analysis of MERVL, Zscan4 and Gm6763 in the 2C::EGFP reporter cell line after transfection with the indicated siRNAs and/or overexpression of Nanog (OE Nanog). Expression of Pcgf6 (lower left), Dmap1 (lower middle) and Nanog (lower right) is shown as controls for siRNA and overexpression efficiency. The mean ± s.d. of the indicated number of cell cultures performed on different days is shown.
Supplementary Figures 1–12, Supplementary Table 10 and Supplementary Note
List of TaqMan assays.
Raw data from the Biomark expression analysis.
Significantly differentially expressed genes between transitional states, based on the Biomark expression data.
List of siRNA targets used in the library.
List of siRNAs used for validation and subsequent experiments.
List of all primers used in this study.
Results from primary screening.
Results from secondary screening.
Differentially expressed genes across each transitional state.
Embryonic stem cells transitioning to the 2-cell-like state through an intermediate Zscan4+ state (example 1). Example video for the time-lapse experiments shown in Fig. 2. The destabilized 2C::tbGFP reporter is shown in green, the destabilized ZSCAN4::mCherry reporter is shown in red and the constitutively expressed H2B-iRFP marking all nuclei is shown in cyan.
Embryonic stem cells transitioning to the 2-cell-like state through an intermediate Zscan4+ state (example 2). Example video for the time-lapse experiments shown in Fig. 2. The destabilized 2C::tbGFP reporter is shown in green, the destabilized ZSCAN4::mCherry reporter is shown in red and the constitutively expressed H2B-iRFP marking all nuclei is shown in cyan.
Embryonic stem cells transitioning to the 2-cell-like state through an intermediate Zscan4+ state (example 3). Example video for the time-lapse experiments shown in Fig. 2. The destabilized 2C::tbGFP reporter is shown in green, the destabilized ZSCAN4::mCherry reporter is shown in red and the constitutively expressed H2B-iRFP marking all nuclei is shown in cyan.