Cell fate roadmap of human primed-to-naive transition reveals preimplantation cell lineage signatures

Human naive pluripotent stem cells offer a unique window into early embryogenesis studies. Recent studies have reported several strategies to obtain cells in the naive state. However, cell fate transitions and the underlying mechanisms remain poorly understood. Here, by a dual fluorescent reporter system, we depict the cell fate dynamics from primed state toward naive pluripotency with ALPG activation followed by the activation of OCT4-distal enhancer. Integration of transcription profiles and the chromatin accessibility landscape reveals the appearance of primitive endoderm and trophectoderm signatures in the transitioning subpopulations, with the capacities for derivation of extra-embryonic endoderm and trophoblast stem cell lines, respectively. Furthermore, despite different fluorescent dynamics, all transitioning intermediates are capable of reaching the naive state with prolonged induction, showing their developmental plasticity and potential. Overall, our study describes a global cell roadmap toward naive pluripotency and provides hints for embryo modeling-related studies.

H uman naive pluripotent stem cells (PSCs) capture the ground pluripotent state corresponding to the preimplantation epiblast [1][2][3][4] and exhibit more plasticity and unbiased differentiation potential than conventional PSCs in the primed pluripotent state [5][6][7] , thus providing an inexhaustible model for developmental studies and therapeutic applications. Important breakthroughs in culture system optimization have allowed the development of several strategies for achieving naive pluripotency. Naive PSCs can be generated by direct derivation from preimplantation embryos, reprogramming of somatic cells, or transitioning of conventional PSCs in the primed state [8][9][10][11][12][13][14][15][16][17][18][19][20] . Several molecular events during the establishment of naive pluripotency have been reported: The surface marker SSEA4 disappears during the derivation of naive PSCs directly from human preimplantation blastocysts or from PSCs in the primed state 9 ; The activity of OCT4 enhancer swiches from proximal enhancer (PE) to distal enhancer during the transition of cells from primed state to naive state 12 ; The expression of ALPG (also known as ALPPL2) is acquired during the establishment of naive pluripotency from cells at either primed state or somatic state 21 . Moreover, a recent study depicted a high-resolution roadmap for naive reprogramming process from somatic cells, showing the molecular reprogramming trajectories with trophectoderm (TE) lineage-specific signatures 22 . Although the molecular criteria for distinguishing naive and primed pluripotency have been systematically defined 9,23-27 , neither detailed molecular events nor subpopulation dynamics have been described during the primedto-naive transition process.
In this study, to precisely monitor naive pluripotency establishment from cells in the primed state, we constructed a dual fluorescent reporter system composed of ALPG-promoter-RFP and OCT4-ΔPE-GFP, and traced the fluorescence dynamics during the process. Transcriptional profiling by both bulk and single cell RNA-seq (scRNA-seq) analyses shows a transitioning trajectory toward naive pluripotency, with ALPG activation followed by the activation of OCT4-distal enhancer. Integrative analysis with chromatin accessibility dynamics (CAD) indicates that primitive endoderm (PrE) signatures and TE signatures emerge successively in the transitioning subpopulations during the primed-to-naive induction process, and the intermediates with strong PrE or TE signatures enable the generation of extraembryonic endoderm cell lines or trophoblast stem cell (TSC) lines, respectively. Furthermore, despite different fluorescent dynamics, all transitioning intermediate cells are capable of reaching a naive pluripotent state with prolonged induction by 5iLAF culture system, showing their developmental plasticity and potential. Overall, our study describes a high-resolution cell roadmap toward naive pluripotency, and provides valuable sources for human blastocyst modeling and early embryogenesis studies.

Results
Dual reporter system to monitor fluorescent dynamics during the primed-to-naive transition process. To precisely investigate the dynamics during the primed-to-naive transition, we constructed a dual fluorescent reporter system. One was ALPGpromoter-RFP (hereafter mentioned as RFP), in which RFP was fused to the DNA sequences 3 kb upstream of ALPG transcription start site (TSS) 21 ; the other was OCT4-ΔPE-GFP (hereafter mentioned as GFP), in which OCT4 PE element was deleted and OCT4 expression was primarily dependent on its distal enhancer activity as previousely reported 12 . Then, we genetically engineered primed embryonic stem cells (pESCs) with this reporter system and performed the primed-to-naive resetting under 5iLAF 14 culture conditions (Fig. 1a). Small colonies were observed on day 6 of induction, when the expression of SSEA4 (primed state-specific surface marker) was dramatically decreased and a small portion of cells started to express SUSD2, a naive state-specific surface marker reported recently 28 (Fig. 1b, c). Dome-shaped colonies resembling naive clones emerged on day 8, and the subpopulation of cells expressing RFP was greatly increased, with the proportion similar to the SUSD2 + cells (Fig. 1b, c). At this time, almost all the cells turned SSEA4negative. GFP expression was not synchronized with RFP expression. The GFP + subpopulation was not observable until day 10. The proportion of RFP + GFP + cells increased along the induction process, and naive-like colonies were picked to establish naive ESC (nESC) lines on day 14 (Fig. 1b, c).
Transcriptional profiling of intermediates during the primedto-naive transition. Next, we collected the transitioning intermediates with distinct fluorescence dynamics through the primed-to-naive process, and subjected them to bulk RNA-seq (Supplementary Data 1). Principal component analysis (PCA) indicated that RFP + cells collected since day 8 clustered closely with nESCs and were separated from pESCs (Fig. 1d, e). When integrated with human embryo datasets 29 , RFP + cells and nESCs clustered closely with ICM cells, while the pESCs showed great similarities in transcriptome to reported blastocyst-derived hESCs at P0 and P10 ( Supplementary Fig. 1a).
Toward naive pluripotency, epiblast-specific markers were upregulated starting on day 6 ( Fig. 1f; Supplementary Fig. 1b), whereas primed state-specific genes were gradually downregulated ( Supplementary Fig. 1b, Supplementary Data 2). Next, we characterized expression clusters based on the dynamics of gene expression (FPKM ≥ 5 in at least one sample) and identified six major expression patterns (Supplementary Fig. 1c-h, Supplementary Data 1). Primed pluripotency-associated genes participating in stem cell maintenance and embryonic morphogenesis were rapidly downregulated, including OTX2, ZIC2 and ZIC3 ( Supplementary Fig. 1c), while naive pluripotency-related genes were activated in three categories: One group was enriched with genes essential for naive pluripotency regulation and mRNA processing, such as NANOG, TFAP2C, LIN28B and DPPA3, which showed increased expression from day 6 ( Supplementary  Fig. 1d); one group enriched with genes related to embryonic development and protein modification, such as DNMT3L and NODAL, showed peak expression on day 8 ( Supplementary  Fig. 1e); the third group was enriched with genes involved in oxidative phosphorylation metabolism, as well as core naive pluripotency markers ALPG and UTF1, the expression of which peaked on day 10 ( Supplementary Fig. 1f). On day 6, SSEA4 − cells exhibited unique characteristics: one cluster with the transient upregulation of genes associated with PrE development (GATA4, GATA6), extracellular matrix organization (KRT8, COL1A1), and embryonic morphogenesis (HAND1, HAND2, HOXB4) ( Supplementary Fig. 1g), and another cluster enriched with genes related to shared pluripotency markers, including POU5F1, SOX2, SALL4, etc., showing transient waves of downregulated expression on day 6 ( Supplementary Fig. 1h). Thus, the bulk RNA-seq analysis uncovered different gene expression patterns during the primed-to-naive transition process.
Charaterization of cell populations during the primed-to-naive transition at single-cell resolution. RFP + cells on day 8 showed similarities in transcription with RFP + GFP − or RFP + GFP + cells on days 10, 12, and 14, as well as nESCs (Fig. 1d, e; Supplementary Fig. 1a). Multiple naive pluripotency-related genes, such as ALPG, DNMT3L, DPPA3, KLF17 and SUSD2, showed robust upregulation during the primed-to-naive transition, and their expression levels reached a state comparable to that of naive pluripotency from day 8 (Fig. 1f). To further characterize the cell populations during the transition at single-cell resolution, we subjected cells harvested on days 6, 8, 10, and 14 during the transition process as well as human nESCs and pESCs to dropletbased 10× Genomics scRNA-seq, which generated a dataset of 38,036 cells with 16,929 common genes. The force-directed layout (FDL) 30 shows the transitioning trajectory of intermediates and the relationships between single cells during the primed-to-naive transition (Fig. 2a, b). We also confirmed these findings by utilizing multiple dimensionality reduction methods to visualize cell embeddings in a low-dimensional space such as uniform manifold approximation and projection (UMAP) ( Supplementary   Fig. 2a-b), and t-distributed stochastic neighbor embedding (tSNE) ( Supplementary Fig. 2c-d). Together with the expression patterns of known marker genes for shared pluripotency (POU5F1, PRDM14, NANOG, LEFTY1, TDGF1), primed pluripotency (ZIC2, SOX11) and naive pluripotency (DNMT3L, DPPA3, ALPG, DPPA5, FGF4) ( Fig. 2c-e, Supplementary  Fig. 3a-b), we identified and characterized 15 clusters during the transition process by performing unsupervised clustering analysis (Fig. 2f, Supplementary Fig. 3c-d). According to the cell proportions in different libraries or clusters, we observed that the populations with naive pluripotency signatures were greatly increased from day 8 (Fig. 2g, Supplementary Fig. 3e) Fig. 3f), which is consistent with our bulk RNA-seq analysis results.
Chromatin accessibility dynamics during the primed-to-naive transition. Next, we tried to illustrate the chromatin accessibility landscape during the primed-to-naive transition process by transposase-accessible chromatin sequencing (ATAC-seq) of the intermediates according to their fluorescence dynamics (Supplementary Data 3-4). Conversion of OCT4 enhancer activity from the PE to the distal enhancer was observed during the primed-tonaive transition process ( Supplementary Fig. 4a). Analyses of the repeatability among replicates, peak enrichment regions, and peak size distributions indicated the acquisition of ATAC-seq datasets with high-quality (Supplementary Fig. 4b-d).
PCA showed that all RFP + cells correlated well with nESCs at different time points (Fig. 3a), indicating a similar chromatin accessibility state among cells with ALPG activation from day 8. Interestingly, cells that remained in the RFP − state clustered closely, apart from the RFP + cells ( Fig. 3a; Supplementary  Fig. 4b), suggesting cells with and without ALPG activation may have distinct chromatin landscapes and cell fates. Then we focused on the chromatin landscape differences between the two conditions: cells turning RFP-positive and cells remaining RFPnegative during the primed-to-naïve transition process. However, CAD charting revealed pattern similarities between the two conditions. ( Supplementary Fig. 4e). The loci of naive pluripotency-related genes, such as DNMT3L and NANOG, were opened not only in cells moving toward naive pluripotency but also in intermediates that remained RFP-negative ( Fig. 3b-g), suggesting that these RFPcells may still possess the potential to reach the naive pluripotent state. Furthermore, the loci of both TE markers (GATA3, KRT7 etc.) and PrE markers (GATA4, GATA6, FGFR2, etc.) are specifically in the CO category of CAD for intermediates remaining RFP-negative (Fig. 3c, e, g; Supplementary Fig. 4g). GO analysis of the genes within the CO categories showed that these genes are involved in embryonic epithelial tube formation and stem cell population maintenance in CADs composed of cells moving toward naive pluripotency and cells remaining RFP-negative, respectively ( Supplementary Fig. 4f).
We also performed motif enrichment analysis among the ATAC-seq datasets. During the primed-to-naive transition process, the motifs of naive TFs (POU5F1-SOX2-TCF-NANOG, SOX2/15, and KLF4/5) were significantly enriched in RFP + cells (Fig. 3h). The motifs of TE-specific TFs (TEAD family and GATA2/3) and TFs highly expressed in TSCs (JUN, JUNB, ATF3, FOS and FOSL2 31 ) were enriched in RFP − cells, which coincided with their corresponding TF expression (Fig. 3h). The TFAP2C motif was enriched in both RFP + and RFP − cells from day 8 ( Fig. 3h), consistent with the activation of TFAP2C gene in both naive PSCs and TSCs. Thus, the analyses of the ATAC-seq datasets revealed clear TE signatures in RFP − cells during the primed-to-naive transition process.
TE signatures during the primed-to-naive transition. We further characterized the TE signatures of RFP − intermediates by RNA-seq analysis. Scoring analyses were performed to examine different embryonic signatures of the intermediate cells along the transition process ( Fig. 4a; Supplementary Data 2). As expected, during the primed-to-naïve transition, signatures for the naive state and EPI were upregulated and maintained in RFP + cells, while signatures for primed state was only enriched in pESCs (Fig. 4a). We also observed great enrichment of TE/TSC signatures in the RFP − intermediates (Fig. 4a). In addition, when our bulk RNA-seq datasets were integrated with published TSC datasets 32 , RFP − GFP − cells on day 14 clustered closely with TSCs derived from nESCs or blastocysts ( Fig. 4b; Supplementary  Fig. 5a). Representative TE markers, such as GATA2/3, KRT7 and TP63, were specifically expressed in RFP − intermediates ( Supplementary Fig. 5b), confirming their similarities with TE/ TSCs in transcriptional programs. scRNA-seq analysis also showed that a subpopulation of RFPcells specifically expressed TE-associated genes ( Fig. 4c; Supplementary Fig. 5c). Thus, together with the results of ATAC-seq analyses, we confirmed the appearance of a TE-like subpopulation with TE signatures within RFP − intermediates.

Derivation of TSCs from transitioning intermediate cells.
Next, we speculated that this TE-like subpopulation could give rise to TSCs in vitro. We collected RFP − intermediates on days 8 and 14 by flow cytometry and cultured them in TSC medium as previously reported 33 (Fig. 4d). As TSCs can be generated from naive PSCs 7,32 , intermediate RFP + cells were also subjected to TSC induction as controls. As expected, both RFP − and RFP + cells on day 8 or 14 successfully generated TSCs with representative colony morphologies and high expression of TSC markers, such as TP63 and KRT7 ( Fig. 4e; Supplementary Fig. 5d-e), while pESCs failed to establish stable TSC lines, consistent with the results of previous studies 32 ( Supplementary Fig. 5f). Transcriptional profiling revealed high similarities among TSCs derived from RFP − and RFP + intermediates on day 8 of the primed-to-naive resetting, as well as TSCs described in published reports 32 ( Fig. 4f; Supplementary Data 5), further suggesting the capacity of these transitioning intermediates to generate TSCs. Interestingly, RFP − cells and RFP + cells may adopt different routes to establish TSCs during the TSC induction process, both with activation of TSCrelated gene expression programs eventually (Fig. 4f, g). In addition, we also observed that RFPintermediates show accelerated trophoblast induction at early time points compared to RFP + cells ( Supplementary Fig. 5g).
We further assessed the differentiation potential of the TSCs derived from intermediates on day 8. These cells can differentiate into extravillous trophoblast (EVT) cells with specific HLA-G expression and syncytiotrophoblast (ST) cells with SDC1 and CGB expression when cultured under specific conditions 33 (Fig. 4h). When subcutaneously injected into NOD-SCID mice, these TSCs generate lesions, as indicated by TP63 + cells, SDC1 + ST-like cells and HLA-G + EVT-like cells observed by immunostaining (Fig. 4i, j). Moreover, human chorionic gonadotropin (hCG) was detected in the supernatant of ST cell cultures, urine and serum of the host mice injected with TSCs, as determined by pregnancy test sticks ( Fig. 4k; Supplementary Fig. 5h). Taken together, the findings indicate that TSCs with functional differentiation capacities can be established from RFP − cells, which exhibit TE signatures during the primed-to-naive transition.
We then subjected the intermediates on day 6 and 8 of the transition process, as well as nESCs and pESCs, to endoderm differentiation respectively 34 (Fig. 5d). All the derived endoderm cells could be maintained and expanded as stable cell lines with typical morphologies of embryonic endoderm cells and strong expression of endoderm marker gene GATA6 (Fig. 5e). However, these cell lines represent different characteristics corresponding to different embryonic development stages as reported 34 (Fig. 5f).The endoderm cells derived from pESC (named as pESC_End cells) showed the definitive endoderm (DE) signatures with high expression of a series of DE genes (Fig. 5g, Supplementary Fig. 6c), while the endoderm cell lines derived from SSEA4 + and SSEA4 − cells on day 6, and RFP − cells on day 8 of the primed-to-naive transition process, as well as nESC exhibit strong PrE signatures, with enriched expression of PrErelated genes (Fig. 5g, Supplementary Fig. 6c). Thus, the transitioning intermediates of the primed-to-naive transition, as well as nESCs, possess the potential to differentiate into endoderm cells with PrE signatures.
Prolonged induction of RFPintermediates transitioning toward naive pluripotency. As we observed a tendency of intermediate cells transitioning toward naive pluripotency by FDL (Fig. 2a), as well as the gradual opening of naïve state-related gene loci in RFP − intermediates by CAD charting (Fig. 3c), we wondered whether the RFP − intermediates can also reach naive state similar to the RFP + cells by prolonged induction in 5iLAF medium. Although day 8-RFPcells remained fluorescencenegative for the first 6 days of prolonged 5iLAF culture, most cells became RFP + GFP + after 9 days of prolonged induction in 5iLAF medium (Fig. 6a). Meanwhile, day 8-RFP + cells converted into RFP + GFP + cells more quickly than day 8-RFP − cells (Fig. 6a). These results were also validated by subcloning assays, which showed higher efficiency of naive colony formation in day 8-RFP + cells than day 8-RFP − cells after 8 days of prolonged 5iLAF culture (Fig. 6b). Similar results could also be observed in prolonged 5iLAF culture of RFP + and RFP − cells on day 14 (Fig. 6c). These RFP − intermediates finally reached a state resembling nESCs, as indicated by their transcription profiling, after prolonged induction in 5iLAF medium (Fig. 6d, e; Supplementary Data 5). Collectively, these results demonstrate the naive state-induction potential and developmental plasticity of RFP − intermediates in the primed-to-naive transition.
Cell fate roadmap of the primed-to-naive process. Finally, we tried to describe the global cell fate trajectories of the primed-tonaive transition process. Together with the results above, We characterized and identified transitioning subpopulations with PrE signatures (transitioning subpopulations 4 and 5), TE signatures (transitioning subpopulations 6 and 7), and naive signatures according to the corresponding gene signatures and marker genes expression ( Fig. 7a; Supplementary Fig. 7a), such as PITX1 and GATA6 for PrE signatures, GATA2 and GATA3 for TE signatures, and NANOG and ALPG for naive signatures (Fig. 7b). Moreover, knockdown of some of these branchingdependent transcription factors dramatically reduced the proportion of corresponding subpopulations, compared to the control group (Fig. 7c).
We also performed partition-based graph abstraction (PAGA) 35 trajectory inference (Fig. 7d, Supplementary Fig. 7b-c) and RNA velocity 36,37 computation (Fig. 7d, Supplementary  Fig. 7d) to predict the future state of individual cells. Together with the latent time inference analysis ( Supplementary Fig. 7e-f) that can reconstruct the temporal sequence of transcriptomic events, these results indicate that while a small proportion of cells at primed state can reach the naive state directly, most cells still undergo a more complicated transition: losing their primed pluripotency first, then appearing PrE and TE signatures successively in subpopulations, and acquiring the naïve pluripotency ultimately (Fig. 7e).
In conclusion, these results showed the cell fate roadmap from primed state toward naive state, in which subpopulations with PrE and TE signatures emerge successively, with a majority of cells ultimately reach the state with naive signatures.

Discussion
In this study, we constructed a high-resolution roadmap to elucidate the cell fate transitions from the primed to naive pluripotency by a dual fluorescent reporter system via integration of transcription profiles and the chromatin accessibility landscape. Further investigation into transitioning cells with dynamic fluorescence indicated the appearance of PrE-and TE-like subpopulations with the capacity for extra-embryonic stem cell lines derivation including endoderm cell lines and TSC lines, respectively. However, the intermediate cells with either PrE or TE signatures could also acquire naive pluripotency by prolonged induction under naive conditions, strongly suggesting their developmental potential and diverse plasticity.
Using the dual fluorescent reporter system as well as SSEA4 antibodies, we collected cell populations according to their distinct fluorescence signals during the primed-to-naive process and identified several cell fate transitions by transcriptional analysis  Fig. 2b (d) and Fig. 2c (e). f-g Representative ATAC-seq tracks for the OC and CO peaks of intermediates during the primed-to-naive transition. OTX2 and SOX11, primed state-specific markers; DNMT3L and NANOG, naive state-specific markers; GATA3 and KRT7, trophectoderm (TE)-specific markers. h Motif enrichment analysis of TFs. Colors and sizes represent motif enrichment (−log (p value)) and expression values (FPKM), respectively (n = 3 biologically independent samples). Source data are provided as a Source Data file. Previous studies demonstrated the endoderm differentiation occurs in two waves during mammalian embryonic development. PrE in the preimplantation stage can predominantly give rise to extra-embryonic visceral yolk sac, and later some embryonic cells at gastrulation stage differentiate into definitive endoderm that contribute to embryonic organs. Naive PSCs are capable to differentiate into naïve extra-embryonic endoderm (nEnd) efficiently, which can be expanded as an in vitro study model for extra-embryonic PrE (hypoblast) development 34 CAV1  NPPB  TMEM52B  ANXA8  ANXA8L1  MYOF  BHLHE40  SLC40A1  ITGB4  KCNN4  ITIH5  LINC01687  MRGPRX1  DUSP5  ERVW−1  LCMT1−AS2  DLX3  KRT80  CDH5  ERVFRD−1  TRPV2  ARHGAP18  LYPD5  GRHL3  MAGEA8  NECTIN4  LIPG  STS  S100A16  CLDN4  SIGLEC6  TRIM29  ANXA1  RIPOR2  MYO7A  TNNT3  PAGE4  MIR22HG  IFI6  VGLL1  KRT7  LRP2  SP6  GATA2  GATA3  MBNL3  DUXAP9  DUXAP10  DUXAP8  GPR87  OLFML1  SFN  KRT18  KRT8  ANXA3  ARHGDIB  ABCG2  SLC27A6  P2RY6  UCA1 scaled expression ARTICLE NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-022-30924-1 characterized subpopulations with PrE signatures during the primed-to-naïve transition, with the capacities to derive PrE-like endoderm cell lines. Different from endoderm stem cells differentiated from pESCs with definitive endoderm characteristics, these endoderm cells derived from the transitioning intermediates provide an in vitro culture model for extra-embryonic endoderm development studies.
During the past 5 years, diligent work and substantial advances have been made in human TSC derivation and culture condition optimization. Human TSCs can be derived from villous cytotrophoblast (CT) cells, blastocysts, and naive PSCs 7,31-33 , indicating powerful in vitro models that can be used to recapitulate human trophoblast development 31,33 . Excitingly, a recent study reported that human TSCs can even be induced directly from somatic cells, as indicated by a reprogramming roadmap analysis of transitioning somatic cells moving toward naive pluripotency 22 . In this study, we observed appearance of prominent TE-like and PrE-like subpopulations with corresponding signatures, which may mimic the postimplantation to preimplantation conversion in vitro. Moreover, we analyzed the dynamics of cell proportions with different lineage signatures during the primed-to-naive transition (Fig. 2g). The proportion of naive state-like cells was greatly upregulated on day 8 and ultimately reached 93% by the end of the transition process. The proportions of subpopulations with PrE or TE signatures increased from day 6 to day 8, then decreased sharply and were nearly undetectable by the end of the period.
Finally, the appearance of PrE-like, TE-like, and EPI-like (naive) cells on day 8 suggests the possibility that day 8-intermediates may serve as valuable sources for human blastocyst modeling and early embryogenesis studies.

Methods
Cell lines. 293 T (human embryonic kidney cells) were acquired from ATCC (CRL-3216). The human primed ESCs with H9 background (kindly provided by Haoyi Wang, Institute of Zoology, CAS) were genetically engineered with ALPG-promoter-RFP (RFP) and OCT4-△PE-GFP (GFP) to generate the dual-fluorescence reporter cell lines. Human TSC and endoderm cell lines were generated from transitioning intermediates of the primed-to-naïve induction process. All research with human cell lines in this study complied with the principles laid out in the International Society for Stem Cell Research and with ethical approval for these experiments by the Biological Research Ethics Committee of Tongji University.
The primed-to-naive transition. The generation of ALPG-promoter-RFP (RFP); OCT4-△PE-GFP (GFP) pESCs was performed as previously described 14,21 . In brief, the ALPG promoter was cloned into a pSicoR-RFP plasmid (Addgene) to replace the CMV promoter, and was transiently co-transfected with packaging plasmids into 293 T cells. After 48 h, the viral supernatants were harvested, concentrated and incubated with OCT4-△PE-GFP pESCs. For inducing the primed to naive state transition, 2-3 × 10 5 dissociated single RFP; GFP pESCs were seeded on an irradiated feeder layer in conventional ESC medium supplemented with Y-27632 (Selleck, 10 mM). The medium was then switched to 5iLAF medium on the second day and was changed every day thereafter. The intermediate cells were identified by flow cytometry analysis and collected at different time points during the primed-to-naive transition.
Bulk RNA-seq library generation and sequencing. Total RNA was isolated from cells using TRIzol (Invitrogen). To generate RNA-sequencing libraries, a KAPA stranded mRNA-Seq kit (KAPA) was used following the manufacturer's instructions. Adapters were through a TruSeq Library Prep Pooling kit (Illumina). Paired-end150 bp sequencing was further performed on a Novaseq 6000 (Illumina) at Berry Genomics Corporation.
RNA-seq data processing. RNA-seq raw reads were processed with default parameters by Trim_galore (version 0.6.6) to remove adapters and low-quality reads. Bulk RNA-seq reads were then aligned to the human genome (hg38) using STAR (STAR_2.5.2b) 39 with the default parameters except for "--outSAMattrIHstart 0", "--outSAMstrandField intronMotif", "--outFilterIntronMotifs RemoveNoncanonical", "--outFilterMismatchNmax 999", "--outFilterMismatchNoverReadLmax 0.04", "--quantMode GeneCounts", and "--twopassMode Basic" parameters. Expression levels of all Refseq genes for samples were quantified to FPKM using Stringtie (version 2.1.4) 40 . To perform differential gene expression analysis, the RNA-seq raw counts calculated by STAR were processed by DEseq2 41 , and genes with a Benjamini-Hochberg adjusted p value < 0.05 and a fold change >2 were considered DE. For principal component analysis (PCA) of the RNA-seq data, the rlog() and Fig. 4 TE signatures during the primed-to-naive transition. a EPI, TE, PrE and trophoblast stem cell (TSC) signature scores of the primed-to-naive transitioning intermediates. b PCA of the bulk RNA-seq datasets (circles) from the primed-to-naive transitioning intermediates with published RNA-seq (diamonds) datasets 32 . n ≥ 2. c Expression of GATA3 in FDL. d Experimental design for the induction of TSCs from the primed-to-naive intermediates. e TP63 and KRT7 immunostaining of TSCs derived from day 8 RFP − and day 8 RFP + cells during the primed-to-naive transition. Scale bars, 20 μm. Representative images from n = 3. f PCA of the bulk RNA-seq datasets (circles) from the transitioning intermediates-derived TSCs with published RNAseq (diamonds) datasets 32 . n ≥ 2. g Heatmap (left) and TSC score (right) showing the expression levels of representative TSC-related genes during the TSC derivation process from RFP − and RFP + transitioning intermediates on day 8. Source data are provided as a Source Data file. h HLA-G, SDC1 and CGB immunostaining of extravillous trophoblast (EVT) (upper) and syncytiotrophoblast (ST) (lower) cells, respectively. EVT and ST cells were differentiated from day 8-RFP − and day 8-RFP + cell-derived TSCs. Scale bar, 20 μm. Representative images from n = 3. i Representation of day 8-RFP − and day 8 RFP + cell-derived TSC engraftment assay by injection into NOD-SCID mice. j Immunostaining of TP63, HLA-G and SDC1 in the lesions collected from day 8-RFP − and day 8 RFP + cell-derived TSC engrafts in NOD-SCID mice. No lesions were evident in the vehicle controls. Scale bar, 20 μm. Representative images from n = 3. k Representative positive results for the hCG pregnancy test performed on urine samples, serum samples, and ST cell culture supernatant collected from day 8-RFP − cell -derived TSCs. Source data are provided as a Source Data file.
plotPCA() functions with the "returnData=T" parameter in the DEseq2 package were used to normalize the counts and compute the PCA data. PCA data were then plotted with the ggplot2 package in R (http://ggplot2.org). Pearson correlation coefficient between samples was calculated using the R function cor(), and heatmap was plotted by the pheatmap package in R. K-means clustering was performed for genes with FPKM ≥ 5 in at least one sample among the selected stages setting k = 6. For public RNA-seq datasets, we downloaded the raw data and performed de novo analysis to obtain the raw counts and FPKM of the samples. To perform the integrated PCA with our all RNA-seq samples, we first merged the raw counts and performed normalization using the vst() function in the DEseq2 package, and the batch effects of the samples were removed using removeBatchEffect function in the limma package in R, then PCA was performed with all genes by the R prcomp() function. The sample distance was calculated by the R hclust() function by the "ward.D2" method.  TMTC1  LAPTM4B  FAM174B  FAM184A  SLC39A8  TRPA1  MANEA  VANGL1  ALDH1A3  CASP3  NCOA4  NTN1  TPRG1L  ROR2  VAMP8  PERP  KCNF1  ZSWIM5  MEIS2  DOK4  FLT1  APLNR  LIX1  MEST  PRKCE  TGFB2  STXBP6  KCNMA1  NID2  ARSI  CSRP1  CRB2  NAALAD2  PITX1  FOSL2  TMEM88  ADGRA2   ATAC-seq library generation and sequencing. ATAC-seq was performed as previously described 42 . In brief, a total of 50,000 cells were washed once with 50 μl of cold PBS, centrifuged for 5 min at 500 g at 4°C, resuspended in 50 μl of lysis buffer (10 mM Tris-HCl (pH 7.4), 10 mM NaCl, 3 mM MgCl 2 , and 0.1% (v/v) NP40 and incubated on ice for 10 min. The suspension with nuclei was then centrifuged for 5 min at 500 g at 4°C, and 50 μl of a transposition reaction mixture (10 μl of 5× TTBL, 5 μl of TTE Mix V50 and 35 μl of nuclease-free H 2 O) obtained through a TruePrep DNA Library Prep Kit V2 for Illumina (TD501-TD503, Vazyme) was added, and the mixture was incubated at 37°C for 30 min. DNA fragments were isolated using a MinElute kit (QIAGEN). ATAC-seq libraries were processed through 13 cycles of amplification with a TruePrep DNA Library Prep Kit V2 for Illumina (TD501-TD503, Vazyme) according to the manufacturer's instructions, and then, the libraries were purified using a QIAquick PCR (QIA-GEN) column. The library concentration was measured using Qubit kit according to the manufacturer's instructions. Finally, the ATAC library was sequenced on Novaseq 6000 (Illumina) at Berry Genomics Corporation.
ATAC-seq data processing. The ATAC-seq sequencing data were preprocessed with the default parameters by Trim_galore (version 0.6.6) 43 to remove adapters and low-quality reads. All the cleaned reads were aligned to the human genome assembly (hg38) using bowtie2 (version 2.4.1) 44 with the default parameters except for the following options: "-X 2000 --no-unal --very-sensitive". Reads mapping to mitochondrial DNA were discarded using the "grep -v chrM" command. Only high-quality mapped reads and concordantly aligned pairs were retained using SAMtools (view -q 30 -f 2) 45 . For downstream analysis, PCA duplicates were removed using the sambamba markdup function (version 0.7.1) 46 with "-r" parameters. Alignment BAM files were transformed into read coverage files (big-Wig format) using deepTools (version 3.5.0) 47 through the RPKM normalization method, and the hg38 blacklist regions were also removed using "--black-ListFileName" parameters. Biological replicates with high correlation were merged, and peaks were called using MACS2 (version 2.2.7.1) 48 with default options except for the following options: --nomode -f BAMPE --keep-dup all. A motif analysis was performed using HOMER (v.4.11.1) 49 "findMotifsGenome.pl" function with the "-size given" option. For PCA analysis, ATAC-seq peaks across all samples were merged into one union ATAC-seq peak set using the BEDTools 50 merge function, and ATAC-seq reads in each sample were calculated over the union ATAC-seq peak set using the deepTools multiBigwigSummary function with RPKMnormalizd bigWig files. The output matrix was then log2 transformed (log2 + 1) and used as input for the PCA. The variance of normalized ATAC-seq reads over each peak was then calculated, and PCA analysis (prcomp function in R) was performed on peaks with the highest 2000 variances across samples. PCA plots were then plotted with the ggplot2 package in R. The Pearson correlation coefficient was calculated for samples using the R function cor(), and a heatmap was plotted by the pheatmap package in R. Peak annotation of the union ATAC-seq peak set was performed by chIPseeker 51 . For the definition of the "open" or "closed" state of ATAC-seq peaks, the background regions in the genome were first randomly identified, and the combined ATAC-seq peak set regions were excluded using the BEDTools shuffle function with the "-excl" parameter. Then, the ATACseq reads in each sample were calculated over the background regions similarly to the aforementioned analysis. We calculated the false discovery rate (FDR) between the peak region matrix and the background region matrix, setting the peak threshold RPKM value to 14.22, which resulted in a 1% false discovery rate. All downstream analyses were based on this threshold value: Reads with a value below this threshold were annotated to indicate "closed" loci, while those with a value above the threshold were considered to be "opened" loci.
Single-cell RNA-seq data processing and integration. The 10× Genomics single-cell data were preprocessed using the Cell Ranger pipeline (v.4.0.0) with default parameters to generate the expression matrix. For quality control, all cutoffs were determined after investigating the distributions of each variable. Cells with a low number of expressed genes (nFeature), extremely high counts (nCount) or a high percentage of mitochondrial genes (pctMT) were discarded. The following thresholds were applied to retain cells: nFeature >2500, 1000 <nCount <100,000 and pctMT < 10. Genes not present in at least 10 cells with at least 1 read each were discarded. Ribosomal genes were also removed from downstream analysis. After Single-cell RNA-seq dimension reduction, trajectory inference and RNA velocity analysis. For the dimension reduction, PCA was performed on the scaled gene expression using the RunPCA function in Seurat package. Following that, UMAP and t-SNE were implemented on the top 24 PCs via the RunUMAP and RunTSNE functions, respectively. FDL was generated using the scanpy.tl.draw_graph function in the scanpy package using the ForceAtlas 2 layout and initialized using the UMAP coordinates.
Partition-based graph abstraction (PAGA) 35 method was utilized to perform trajectory inference with preserving the global topology of data, which is robust and qualitatively outperforms previous lineage reconstruction algorithms. The PAGA algorithm was performed using the scanpy.tl.paga function in the scanpy package (v.1.7.2) using the Seurat cell clusters as input.
Scoring of single-cell RNA-seq and bulk RNA-seq samples using primed or naive gene signatures and TE, EPI, DE, PrE and TSC signatures. Scores of the gene signatures (EPI, TE, and PE) of single-cell RNA-seq was calculated with the AddModuleScore function in Seurat. Scores of the different gene signatures in the bulk RNA-seq samples were calculated as previously described 22 . In brief, the expression range value (max -min) for each gene across all samples was first computed. Then, the scores of each gene of the gene set across all samples were computed by the formula: (gene expression − min)/(max − min), obtaining scaled gene expression ranging from 0 to 1. Finally, the sample score of the gene signatures was the mean expression of all the gene scores per sample. The primed, naive, TE, EPI, DE, and PE gene sets were obtained from the paper 53,54 , and the list of TSC marker genes was defined by the relative gene expression (FPKM) of nTSCs compared to human ESCs (naïve and primed), primed TSCs, EVT and ST cells. Genes with log2((nTSC + 1)/(hES + 1)) > 3, log2((nTSC + 1)/(pTSC + 1)) > 2.5, log2((EVT + 1)/(nTSC + 1)) < −1.5 and log2((ST + 1)/(nTSC + 1)) < −1.5 were kept as TSC markers, RNA-seq data of TSCs, EVT and ST were from this paper 32 . For the score computation in this study, we only retained the genes with FPKM ≥ 5 in at least 1 sample in the gene set. Descriptions of these gene sets can be found in Supplemental Data.
Statistical analyses. For flow cytometry analysis and immunostaining, n = 3 biologically independent replicates were included for each sample. For bulk RNAseq data of the intermediate cells during the primed-to-naive transition process, n = 2 biological replicates were obtained for each sample at each time point during the transition process, except for pESCs (n = 3), nESCs (n = 3) and RFP -GFP − cells on day 8 (n = 4). For ATAC-seq, n = 3 biological replicates were obtained for each sample at each time point except for RFP + GFP − cells on day 14 (n = 6). For 10× Genomics scRNA-seq data, libraries were generated on pESC (n = 1), day 6 (n = 1), day 8 (n = 1), day 10 (n = 1), day 14 (n = 1) and nESC (n = 1).
The number of cells used for downstream analysis were 5707 for the pES library, 6812 for the day 6 library, 7175 for day 8 library, 7035 for day 10 library, 6831 for the day 14 library, 4476 for the nES library. For the bulk RNA-seq data of intermediate cells toward TSC induction and those in prolonged 5iLAF culture toward naive pluripotency respectively, n = 2 biological replicates were obtained for each sample. Detailed information can be found in specific parts of the Methods section.
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
The bulk RNA-seq datasets, scRNA-seq datasets and ATAC-seq datasets generated in this study are available at GEO: GSE173756 and GSE174771. The accession number for the RNA-seq data of human embryos is GSE36552. The accession numbers for the RNAseq data of published TSC cell lines is GSE138762. The accession number for the RNAseq data of End cell lines is GSE138012. Source data are provided with this paper.

Code availability
All data were analyzed with standard programs and packages as detailed. Scripts can be found at https://github.com/zftu/Human-primed-to-naive-transition-analysis 55 .