DNA replication fork speed underlies cell fate changes and promotes reprogramming

Totipotency emerges in early embryogenesis, but its molecular underpinnings remain poorly characterized. In the present study, we employed DNA fiber analysis to investigate how pluripotent stem cells are reprogrammed into totipotent-like 2-cell-like cells (2CLCs). We show that totipotent cells of the early mouse embryo have slow DNA replication fork speed and that 2CLCs recapitulate this feature, suggesting that fork speed underlies the transition to a totipotent-like state. 2CLCs emerge concomitant with DNA replication and display changes in replication timing (RT), particularly during the early S-phase. RT changes occur prior to 2CLC emergence, suggesting that RT may predispose to gene expression changes and consequent reprogramming of cell fate. Slowing down replication fork speed experimentally induces 2CLCs. In vivo, slowing fork speed improves the reprogramming efficiency of somatic cell nuclear transfer. Our data suggest that fork speed regulates cellular plasticity and that remodeling of replication features leads to changes in cell fate and reprogramming.


Plasmid construction
The 2C::3xturboGFP reporter plasmid was described and characterised previously 1 and the the 2C::tdTomato reporter was acquired from Addgene (#40281) 2 . Both reporters are known to mark 2CLCs 1,2 . Expression plasmids that contain mCherry-hCdt1(1/100)Cy(-) or iRFP-hGeminin(1/110) were generated after amplification from mCherry-hCdt1(1/100)/pCSII-EF (RIKEN BRC, RDB15442) 3 , iRFP-C1 (Addgene plasmid #54786), and ES-FUCCI (Addgene plasmid #62451) 4 and inserted into pCAG-Hyg or pCAG-bsd plasmid using the Ligation Mix (TaKaRa) or Gibson Assembly Master Mix (NEB). The cDNA encoding TIR1 was obtained from Addgene (#47328) and the amplified PCR fragment was ligated into the pCAG-Hyg. The Dux cDNA 5 was subcloned into pRN3P for in vitro transcription. The H2B-tdiRFP (Addgene #47884) has been described elsewhere 1 Tables S1 and Table S2, see references 1,2,5-7 . We also examined Annexin-V levels in 2CLCs and ESCs, which indicated that while GFP + cells contain more Annexin-positive cells than GFPcells (5% versus 1%, respectively), GFP + cells are mostly Annexin-V negative (Extended Data Fig. 7j). While slight differences in the proportion of 2CLCs may vary from reporter to reporter and between experiments, fold-induction is systematically compared to controls performed in the same experiment and under identical experimental conditions. Note that, in addition to the fast maturation/folding time of tbGFP (~30 min 8 ), the use of a PEST degradation signal to track 2CLCs 1,6 allows capturing 2CLCs more accurately than without PEST, in which the fluorescence also reflects 2CLC 'history' as the protein persists for longer in the cells.

Immunostaining and detection of EdU incorporation
Cells were washed with PBS, fixed for 15 min in 4% PFA in PBS at room temperature and permeabilized with 0.5% Triton X-100 in PBS for 15 min at room temperature. Cells were blocked in 5% normal goat serum in PBS for 1 h at room temperature and incubated overnight at 4ºC with the primary antibodies described in Supplementary Table S7

Analysis of EdU incorporation
For the analysis of EdU spots in 2CLCs, we set up an automated analysis pipeline in Fiji 9 and Icy 10 . First, we combined the GFP channel (indicating 2CLCs, based on the 2C::tbGFP reporter) with the EdU-channel and a manual annotation of nuclei from the DAPI signal. In Icy, EdU spots were identified with the Wavelet Spot detector 11  reprogramming is always higher in S-phase (fS) as compared to either G2/M (fG2M) (y axis) or G1 (fG1) (x axis). The fitting in Figure 2f is represented by the gray area below the dashed line: all the values of the transition rates falling within the grey area fit the experimental data.
Because the dashed line cuts the y and the x axis at values that are lower than 1 for both G2/M over S (fG2M/fS, y axis) or G1 over S (fG1/fS x axis), it follows that the transitions from ESCs to 2CLCs must occur most frequently in S-phase (e.g. else the dashed line would meet the axis at a value greater than 1). Under these assumptions, the following ODE system can be written: By rearranging terms, we obtain the following equation: where is a constant that depends on the initial conditions.
Model fitting. As we show above, the following two equations describe the fraction of 2CLC at the steady state and during the DTB block&release experiment, respectively: -( − *5 ): this is the difference between the cell death rate and the growth rate of 2CLC.
Note that 1/( − *5 ) corresponds to the typical time during which 2CLC remain alive, as estimated by live-cell time-lapse microscopy, which we considered between 12 and 24 hours.
Once a particular combination of the parameters is chosen, the fitting procedure consists of two steps: during the first step, equation (2) is used to fit the data from the DTB block and release experiment to estimate A . We considered only measurements starting from t=3h to take into account the delay in detecting the GFP signal.
Then, in a second step, the estimation of A is plugged in equation (1) where I is the image intensity, Imin and Imax are minimum and maximum intensities of the current cell time series, respectively, and Inorm is the normalised intensity.

Equation for logistic fit:
where A is the curve's amplitude, k the logistic growth rate, t the time and  Table S7) and at room temperature for 1 hour with secondary antibody conjugated with horseradish peroxidase (HRP). HRP activity was detected by ECL western blotting detection reagent (GE healthcare) and the band intensities were quantified using Fiji.

RNA sequencing analysis and sample clustering
STAR aligner 16 was used to map sequencing reads to transcripts in the mouse mm9 reference genome. Read counts for individual transcripts were produced with HTSeq-count 17 , followed by the estimation of expression values and detection of differentially expressed transcripts using EdgeR 18 . For the comparison between DE genes in endogenous and siUSP7-induced 2CLCs, log fold change of gene expression between GFPvs GFP + cells, and between siUSP7 GFPand siUSP7 GFP + cells (x and y axes in scatter plot in Extended Data Fig. 4e) was estimated using EdgeR 18 . Genes that were up-regulated, based on the cutoffs of at least 2fold change and FDR<0.01, in both comparisons are marked red. For all analyses, differentially expressed genes were defined by at least 2-fold change with FDR less than 0.01.
For the comparison between several 2CLC lines and mouse embryo stages, sample clustering was carried out as previously described 19 , using gene counts obtained for each sample by STAR and HTseq. Batch effects elicited by differences across previous studies were corrected using the ComBat method implemented in the SVA package (https://bioconductor. org/packages/release/bioc/html/sva.html). A sample distance matrix was calculated using the previously described 19

Chimera Assay
Collected zygotes were grown for two days in KSOM until they reached the 4-8-cell stage.
Their zona pellucida was removed by short exposure to Acid Tyrode solution and individual denuded embryos were placed each in a concave microwell, created by a smooth depression using darning needles. For the donor cell preparation, we transfected siControl, siUSP7, or treated with 50 µM HU into the 2C::tbGFP reporter ESC line stably expressing H2B-tdiRFP and ESCs and the 2CLCs of each group were sorted by FACS according to their GFP fluorescence. Approximately 10 cells were aggregated with each host embryo, and cultured for an additional two days. Chimera blastocysts were fixed for 15 min in 4% PFA in PBS and permeabilised with 0.5% Triton X-100 in PBS for 15 min at room temperature. Embryos were blocked in 5% normal goat serum in PBS for 1 h and incubated with the Alexa Fluor 647 Phalloidin (Thermofisher Scientific) for 1 h at room temperature. DNA was stained with 1 µg/ml DAPI. To analyse the contribution of donor cells, we reconstructed blastocysts in 3D using the IMARIS software, with the help of orthogonal planes, and defined outer (TE) and inner (ICM) cells, based on phallaoidin staining, which labels cortical actin, as before 22 . The use of phalloidin for cell membrane labelling allowed us to identify 'Inside' cells as those lacking any contact with the outer surface of the embryo, whereas 'outside' cells have such contact 22,23 .
We discarded cells that appeared morphologically abnormal or dead (e.g. based on DAPI and/or abnormal cytoplasm as judged by phallaoidin distribution) and cells which were not fully incorporated into the blastocysts analysed. Individual cells were quantified across 96 embryos by two independent people with double blind scoring. In Extended Data Fig. 5d

Reprogramming of MEFs to iPSC
i4F MEFs were derived from the i4F mice 24 and kindly provided by Anne Dejean (Institut Pasteur). 2.5x10 5 i4F MEFs were seeded on 6-well plates the day before the induction of reprogramming. Cells were kept in KSR medium with 1 µg/mL doxycycline for 10 days in the presence or absence of HU treatment with indicated time windows. Colonies were stained with Alkaline Phosphatase Detection Kit (Milipore) to assess the reprogramming efficiency.
Colony counting was performed in Fiji using the Trainable Weka segmentation plugin. We trained a FastRandomForest classifier with the default settings and the following features: Gaussian, Sobel, Hessian, Difference of Gaussians, Membrane projections, Variance, Mean, Minimum, Maximum, Median, Structure, Entropy, and Neighbours. We defined 4 classes, colonies, background, dust, and other (non-colony containing bright image regions, mainly from light reflecting off the plastic dish). We trained the classifier on several randomly chosen images. With a custom Fiji macro, the Weka classifier model was then applied to 25 randomly chosen, 500 pixel wide square regions per well of a 6-well plate. Candidate colony labels were binarised and binary masks extended 1 pixel to merge any fragmented colonies. We then used the Particle Analyser plugin to determine shape descriptors, specifically area and roundness of the identified objects. In 'R', we then filtered out any objects smaller than 20 square pixels and of a roundness smaller than 0.2 which we empirically determined as cutoff for wrongly classified noise or very elongated objects, respectively, the latter one originating most often from the border of the well. Plotting and statistical analysis were done in 'R'. We applied a generalised linear model with Poisson distribution to determine whether HU treatment and/or length of treatment lead to a significant change in colony number.

Analysis of repeat elements
The annotation of repeat elements in the mm9 reference genome was downloaded from Repeatmasker (http://www.repeatmasker.org). To estimate the enrichment of a given type of repeat elements within a given set of genomic regions, the number of element copies that overlapped with these regions was compared to the random distribution of the number of overlaps between these genomic regions and the instances of repeats randomly shuffled across the genome, based on 1000 random shuffles. The average expression value for each repeat type was then estimated as RPKM by normalizing the read count by the total size of sequencing library and total genomic length of all repeats of this type.

Histone modification profiles of genes that change the replication timing
Generation of libraries and analysis of histone modifications was done globally as described 25 .
ChIP-Seq data was downloaded from previously published data by Liu, X., et al. 26 for H3K4me3 and H3K27me3 (GSE73952) and Wang, C., et al. 27

Analysis of H3.3 enrichment on MERVL
ChIP-seq datasets for 2-cell stage mouse embryos 29 were aligned using bowtie2 (version 2.3.5) 30 with the options "--local --very-sensitive-local -I 5 -X 700". Duplicates were removed using samtools (version 1.9) 31 . Only correctly paired reads were used for subsequent analyses, without multi-mapping filtering. Reads overlapping MERVL elements (MT2_Mm, MERVL-int) were quantified for each locus using bedtools (v2. 26.0) and normalized by the sequencing depth and length of the fragment. The GTF annotation used was from the TEtranscripts 32 . Statistical tests were used against the corresponding chromatin input sample using a paired Wilcoxon test. Outliers were excluded from the figure.

Single embryo RNA sequencing analysis
Analyses were carried out on R (version 4.0.2). Reads were aligned with STAR (2.7.3a) 16 to the mm10 genome with the default settings and counting the reads for every gene using the option "--quantMode GeneCounts". The index was created using the mm10 annotation from iGenomes from the UCSC source, the ERCC spike-in control genes (Thermofisher catalog #4456739), and the mitochondrial genes (ENSEMBL annotation downloaded from UCSC) were added to the index. FPKM values were calculated for each sample using the sum of all the non-ERCC counts and the number of exonic kilobases for each gene as scaling factors.
We applied the following quality thresholds and kept cells with: less than 50% of ERCC counts; more than 5000 genes detected; >500000 reads mapping to genes; <10% mitochondrial reads. ERCC threshold was set to 50% to take into account the low transcriptional complexity and RNA content of cumulus cells (CC), compared to embryos. All embryo samples showed ERCC percentages lower than 10%. Based on the above QC, seven CC cells did not pass the thresholds and were removed from the analysis. For the embryonic PCA analysis, we used the processed RPKMs from available scRNA-seq data of mouse embryos 35  Genes of the cluster of the "Major ZGA" were filtered for those with a correlation greater than 0.75 and FPKM values greater than 3. The heatmap was plotted with pheatmap with a complete linkage hierarchical clustering was applied to rows and columns.