Transcription of MERVL retrotransposons is required for preimplantation embryo development

Zygotic genome activation (ZGA) is a critical postfertilization step that promotes totipotency and allows different cell fates to emerge in the developing embryo. MERVL (murine endogenous retrovirus-L) is transiently upregulated at the two-cell stage during ZGA. Although MERVL expression is widely used as a marker of totipotency, the role of this retrotransposon in mouse embryogenesis remains elusive. Here, we show that full-length MERVL transcripts, but not encoded retroviral proteins, are essential for accurate regulation of the host transcriptome and chromatin state during preimplantation development. Both knockdown and CRISPRi-based repression of MERVL result in embryonic lethality due to defects in differentiation and genomic stability. Furthermore, transcriptome and epigenome analysis revealed that loss of MERVL transcripts led to retention of an accessible chromatin state at, and aberrant expression of, a subset of two-cell-specific genes. Taken together, our results suggest a model in which an endogenous retrovirus plays a key role in regulating host cell fate potential.

Zygotic genome activation (ZGA) is a critical postfertilization step that promotes totipotency and allows different cell fates to emerge in the developing embryo. MERVL (murine endogenous retrovirus-L) is transiently upregulated at the two-cell stage during ZGA. Although MERVL expression is widely used as a marker of totipotency, the role of this retrotransposon in mouse embryogenesis remains elusive. Here, we show that full-length MERVL transcripts, but not encoded retroviral proteins, are essential for accurate regulation of the host transcriptome and chromatin state during preimplantation development. Both knockdown and CRISPRi-based repression of MERVL result in embryonic lethality due to defects in differentiation and genomic stability. Furthermore, transcriptome and epigenome analysis revealed that loss of MERVL transcripts led to retention of an accessible chromatin state at, and aberrant expression of, a subset of two-cell-specific genes. Taken together, our results suggest a model in which an endogenous retrovirus plays a key role in regulating host cell fate potential.
Fertilization and early preimplantation development are processes in which unipotent gametes unite and acquire totipotency (Fig. 1a). After fertilization, embryos undergo zygotic genome activation (ZGA), a process that is widely conserved in vertebrates [1][2][3] . ZGA involves a transcriptional burst of hundreds to thousands of two-cell-specific genes. At this point, gene expression switches from a maternal to zygotic program 2,4 . ZGA occurs in two distinct waves called minor and major ZGA 5 . In mice, minor ZGA occurs from S phase in the zygote to G1 phase in the early two-cell stage embryo, whereas the major wave occurs during the second round of DNA replication at the middle-to-late two-cell stage 6,7 . Both waves of ZGA are critical for the embryo to acquire developmental competence 6,8 . However, the molecular events that drive ZGA and lead to acquisition of totipotency and developmental competence are still enigmatic.
Approximately 40% of the mouse genome is occupied by transposable elements (TEs), mobile genetic elements of which ~10% are endogenous retrovirus (ERV) 9 . Notably, the expression of murine endogenous retrovirus with leucine transfer RNA primer binding site (MERVL) is specifically activated at the two-cell stage concomitant with ZGA 10-12 . Recently, the transcription factor DUX, which is expressed during minor ZGA, was documented as an upstream regulator that activates two-cell genes and MERVL [13][14][15] . Furthermore, the MERVL long terminal repeat (LTR) promoter drives a subset of two-cell genes and generates chimeric transcripts with the host genes 12 . The above findings suggest that DUX/MERVL may activate an early transcriptional network that is required for ZGA and totipotency.
In 2012, Macfarlan et al. found that a rare transient cell population (~1%) in mouse embryonic stem cell (ESC) and induced pluripotent stem cell cultures expresses high levels of MERVL and two-cell genes without expression of pluripotent inner cell mass (ICM) maker genes, such as Pou5f1 (also known as Oct4), Sox2 and Nanog 12 . MERVL expression has been used as a marker for totipotent cells, as MERVL + cells can commit to both embryonic and extraembryonic lineages after injection into recipient embryos at the eight-cell and morula stages 12,16,17 . Article https://doi.org/10.1038/s41588-023-01324-y from each blastomere at eight representative stages of preimplantation development 18 (Fig. 1a). To define regions of nonredundant MERVLs in mouse genome, we used RepeatMasker to annotate the genome for unique interspersed internal regions of MERVL (MERVL-int, n = 1,426) and LTR promoters of the MERVL (MT2_Mm, n = 2,366). The expression of MERVL and its LTR promoter culminated in the middle of the two-cell stage and then gradually decreased until blastocyst stage (Fig. 1b,c).
We also set out to investigate the expression and localization of MERVL transcripts in preimplantation embryos using single-molecule fluorescence in situ hybridization (smFISH). Interestingly, smFISH revealed that MERVL expression is detectable in the nuclei from zygotes and early two-cell stage embryos in which polyadenylated MERVL Despite the above findings, the function of MERVL itself remains unclear. Here, we overcome technical limitations in interrogating TE functions and analyze the role of MERVL in preimplantation development. We found that depletion of MERVL transcripts resulted in embryonic lethality due to defects in early lineage specification and genome stability, demonstrating that MERVL is essential for mouse preimplantation development.

MERVL exhibits distinct localization in mouse embryos
To understand the dynamics of MERVL expression, we first analyzed publicly available single-cell RNA-sequencing (scRNA-seq) datasets Article https://doi.org/10.1038/s41588-023-01324-y mRNA cannot be detected (Fig. 1b,d). Afterwards, MERVL RNA gradually translocated from the nucleus at middle two-cell stage onward and was highly restricted to the cytoplasm by late two-cell stage (Fig. 1d). These changes in MERVL transcript localization were consistent with increased MERVL protein levels during the middle two-cell stage (Fig. 1e). These observations raise the possibility that nuclear MERVL transcript has distinct roles in gene regulation in the early stages of preimplantation development compared to cytoplasmic MERVL transcript, leading us to investigate MERVL function further.

MERVL-KD results in embryonic lethality
Inconsistencies regarding the early embryonic phenotypes of MERVL knockdown (KD) in previous studies 11,19,20 , led us to re-examine the KD effects of MERVL on preimplantation development. To this end, we developed specific antisense oligonucleotides (ASOs) that target interspersed MERVL copies ( Fig. 2a and Extended Data Fig. 1a,b). After predicting the genome-wide target sites of individual ASOs using BLASTn, we confirmed that 46.9% (n = 669/1,426) of MERVL copies were targeted by at least one ASO with up to two mismatches allowed (Extended Data Fig. 1c). Subsequently, we confirmed that our ASO sequences efficiently targeted full-length MERVL (≥5 kb, n = 377/556, 67.8%), by combining three independent anti-MERVL ASOs (Extended Data Fig. 1d). We experimentally validated the KD efficiency of each ASO using a recently developed ESC-based in vitro system (Extended Data Fig. 1e and Methods) 21 in which MERVL expression was drastically reduced at both the mRNA and protein levels (Extended Data Fig. 1f-h). Injection of each ASO into the male pronucleus of zygotes also leads to substantial reduction of MERVL RNA signal at the late two-cell stage (Extended Data Fig. 2a). Because we noted that cocktail of three independent ASOs increased MERVL-KD efficiency ( Fig. 2b and Extended Data Fig. 2), mixed ASOs (1:1:1 = 20 µM) were used in subsequent experiments. Next, we monitored the effects of MERVL-KD using ASOs on preimplantation development (Fig. 2c,d). MERVL-KD embryos displayed a significant developmental delay from 2.5 dpc (Fig. 2d). Despite longer culture, a greater percentage of MERVL-KD embryos remained at the two-to four-cell stage (26.2% for control and 82.7% for MERVL-KD at 2.5 dpc (**P = 0.00533, chi-square test; Fig. 2d)). Strikingly, at 4.5 dpc, almost no blastocyst formation was observed in MERVL-KD embryos (Fig. 2c,d). These findings revealed that majority of MERVL-KD embryos suffered a developmental delay and halted development before blastocyst formation. To exclude the possibility that the reduction of MERVL transcript level and aberrant developmental phenotypes in MERVL-KD embryos might arise from unexpected artificial effects due to the presence of excessive small nucleotides, such as DNA replication stress and innate immune response, we performed microinjection of sense oligonucleotides (SOs), which are complementary to individual MERVL-targeting ASOs into the male pronucleus of zygotes.
This confirmed that injection of sense oligonucleotides (SOs) did not affect the expression of MERVL and preimplantation development (Extended Data Fig. 2).
Alternatively, we also tested KD of MERVL using CasRx, another KD strategy with a CRISPR-based RNA-targeting system 22 (Extended Data Fig. 3a). One day after induction of CasRx-mediated KD, no MERVL protein was detected in MERVL-KD two-cell stage embryos, and nuclear and cytoplasmic MERVL RNA signals remained, albeit with clear reductions compared to that of control (Extended Data Fig. 3b). Even with this KD inefficiency, the preimplantation development upon CasRx-mediated KD partially phenocopied that observed in ASO-mediated KD embryos (Extended Data Fig. 3c,d).
Of note, the phenotype of ASO-mediated MERVL-KD embryos is reminiscent of that observed in loss-of-function of genes associated with ICM and trophectoderm differentiation [23][24][25][26][27] , implying that MERVL transcription might function in early cell lineage specification during preimplantation development. To test this hypothesis, we examined mRNA levels of genes related to ICM (Oct4, Sox2 and Nanog) and trophectoderm differentiation (Tead4, Tcfap2c and Cdx2) in MERVL-KD embryos at 3.5 dpc, by RT-qPCR (Fig. 2e). Sox2, Nanog and Cdx2 mRNA levels were significantly reduced in MERVL-KD morula, in contrast to Oct4, Tead4 and Tcfap2c transcripts, which increased (Fig. 2e). These data suggest that MERVL transcription during early stages of preimplantation development is required for subsequent proper expression of genes linked to the earliest cell lineage specification events. To further investigate the molecular etiologies of the phenotype upon MERVL-KD, we examined OCT4 and CDX2 protein expression using immunofluorescence analysis at the morula stage (Fig. 2f). MERVL-KD morula is notably composed of significantly fewer blastomeres (Fig. 2g) with a significant reduction in OCT4 and CDX2 protein expression (Fig. 2h). Notably, MERVL-KD embryos also displayed increased hallmarks of genomic instability such as nuclear deformation and micronuclei, and apoptosis ( Fig. 2i,j). In sum, we conclude that the depletion of MERVL transcript results in disruption of lineage specification, cell death, and ultimately early embryonic lethality.

Cis-acting functions of MERVL is essential for development
We next interrogated which products from MERVL were required for preimplantation development. Firstly, we used two independent short interfering RNA (siRNAs) to knock down MERVL (Fig. 3a). Upon siRNA-mediated KD of MERVL, no cytoplasmic MERVL RNA and protein were detected in late two-cell stage embryos, whereas the nuclear signal of MERVL RNA was still intact in early two-cell stage embryos (Fig. 3b). Consequently, although the phenotypes of siRNA-mediated MERVL-KD are inconsistent between previous studies 19, 20 , we confirmed that siRNA-mediated KD of MERVL had almost no significant impact on preimplantation development (61.6% for control and 66.1% for MERVL-KD (P = 0.5966, chi-square test) (Fig. 3c,d)), suggesting that . NS, not significant; **P < 0.01; ***P < 0.001, chi-square test. e, Expressions of ICM-and TE-associated genes, as measured by qRT-PCR in control and MERVL-KD morula at 3.5 dpc. Bars show means with standard error of the mean (s.e.m.). Dots represent biological replicates (n = 6). *P < 0.05, **P < 0.01, ***P < 0.001, two-tailed unpaired t-tests. f, Representative images of immunofluorescence staining for OCT4 and CDX2 with DAPI counterstain in control and MERVL-KD morula at 3.5 dpc, from three independent experiments. Scale bars, 20 µm. g,h, Dot plots showing the total number of blastomere per embryo and relative intensity for OCT4 and CDX2 per nucleus in control and MERVL-KD morula at 3.5 dpc, normalized to DAPI signal. Central bars represent medians, and the top and bottom lines encompass 50% of the data points. ***P < 0.001, two-tailed unpaired t-tests. i, Representative images of immunofluorescence staining for E-cadherin (E-Cad, green) with DAPI counterstain (gray) in control and MERVL-KD morula at 3.5 dpc, from two independent experiments (left). Arrows and arrowheads indicate nuclear deformation and micronuclei. Scale bars, 20 µm. Bar chart showing the percentage of abnormal nuclear morphology at morula stage (right). *P < 0.05, chi-square test. j, Representative images of immunofluorescence staining for cleaved caspase-3 (cCasp3, green) with DAPI counterstain (gray) in control and MERVL-KD morula at 3.5 dpc, from three independent experiments (left). Scale bars, 20 µm. Bar chart showing the percentage of apoptotic cells per embryo at morula stage (right). ***P < 0.001, chi-square test. Data for panels in d, e and g-j are available as source data.
To determine whether transcribed MERVL RNA plays a critical role in preimplantation development, we used a MERVL RNA construct with point mutations conferring resistance to ASO-mediated KD (Fig. 3e) and co-injected an ASO-resistant MERVL RNA with mixed ASOs into zygotes. MERVL transcript was observed in nuclei of MERVL-KD early two-cell stage embryos at levels comparable to control embryos, when ASO-resistant MERVL transcript was present (Fig. 3f). However, upon injection of MERVL RNA, no cytoplasmic MERVL RNA signal was detected in late two-cell stage embryos. (Fig. 3f), which is to be expected, given that our exogenous MERVL construct is likely degraded by RNA surveillance pathways 28 . Although nuclear MERVL transcript was present in MERVL-KD embryos by co-injection with ASO-resistant MERVL RNA, the embryonic lethality was not rescued (Fig. 3g,h).
Collectively, our data demonstrated that neither encoded proteins nor trans-acting RNA of MERVL is indispensable for preimplantation development.   To gain better insight into cis-acting functions of MERVL, we employed the CRISPR interference (CRISPRi) system [29][30][31] (Fig. 3i). After microinjection, we verified that the expression of MERVL protein was efficiently suppressed in two-cell stage embryos upon CRISPRi to MERVL (MERVLi), whereas MERVL transcriptional level was significantly decreased in MERVLi embryos compared to that in GFPi control, but not fully silenced (Fig. 3j). Notably, blastocyst formation was severely impaired upon MERVLi (Fig. 3k). Only 38.7% MERVLi embryos reached blastocyst stage, compared with 78.5% for GFPi control embryos (Fig. 3l). These findings showed the necessity of MERVL transcription in early stage of development at the chromatin level.

A subset of two-cell genes are dysregulated in MERVL-KD embryos
To assess the effects of ASO-mediated MERVL-KD on ZGA, we made use of 5'-ethynyluridine (EU) to assay global transcriptional levels in control and MERVL-KD two-cell stage embryos (Fig. 4a). We observed no significant change in zygotic transcriptional activity between control and MERVL-KD two-cell stage embryos during major ZGA (Fig. 4b,c).
Further, we carried out total RNA-seq analyses in control and MERVL-KD embryos (Extended Data Fig. 4a,b). The RNA-seq dataset provided accurate gene expression profiles ( Fig. 4d and Supplementary Tables 1 and 2). When we compared differentially expressed genes (DEGs) between control and MERVL-KD embryos, we found that 938, 745 and 277 genes were significantly dysregulated in MERVL-KD embryos, at the two-cell, four-cell and eight-cell stages, respectively  Table 3). Of these DEGs, only 3-15% of genes were defined as 'maternally inherited RNA' in a database of transcriptome in mouse early embryos version 2 (DBTMEE v2 (refs. 32,33 )) across all stages examined (Fig. 4f), suggesting that MERVL-KD led to major defects in expression of zygotic genes rather than maternal mRNA clearance. In contrast to annotated genes, among all interspersed repetitive elements, only MERVL and a small number of closely related subtypes were significantly downregulated in MERVL-KD preimplantation embryos (Extended Data Fig. 5a,b). Moreover, largely consistent with previous reports 12 , we also detected 77 genes that generated chimeric transcripts with junction to MT2_Mm in both control and MERVL-KD embryos. However, the overall expression level of chimeric transcripts is unchanged in MERVL-KD embryos (Extended Data Fig. 5c).
To better understand the functions of the DEGs, we performed gene ontology (GO) and ChIP Enrichment Analysis (ChEA) 34 , indicating that TP53 (also known as p53)-related terms were significantly enriched in each set of upregulated DEGs ( Fig. 4g and Extended Data Fig. 4c). Taken together, these transcriptome analyses raise the possibility that activated p53 ectopically accumulates in the nuclei of MERVL-KD embryos. Because DNA damage and activated p53 contribute to the induction of two-cell genes in early-stage preimplantation embryos and ESCs 35 , we reasoned that the genes significantly upregulated in MERVL-KD embryos might be enriched for two-cell and p53-target genes. To test this hypothesis, we next compared DEGs with the lists of two-cell genes (as defined in Macfarlan et al. 12 ) and DBTMEE v2 (refs. 32,33 ). Significant enrichment of two-cell genes was observed in all DEG sets (P < 0.001, hypergeometric test for overrepresentation; Fig. 4h). Strikingly, upregulated DEGs in MERVL-KD four-cell embryos exhibited the highest increase in enrichment of two-cell genes (Fig. 4h). Unsupervised hierarchical clustering of all two-cell gene transcripts revealed that more than half of two-cell genes were dysregulated in at least one developmental stage of MERVL-KD embryos (Extended Data Fig. 4d). In addition, two-cell-and four-cell-transient genes were also enriched in all sets of upregulated DEGs (Fig. 4h). These findings argue that MERVL-KD embryos still retain a two-cell/totipotent-like transcriptome even at mid-preimplantation stages (that is, four-cell and eight-cell stages). Indeed, representative track views demonstrated that two-cell gene loci abnormally maintained high expression levels of two-cell genes in MERVL-KD embryos (Fig. 4i). Consistent with our hypothesis, we also observed a significant increase in phospho-S15-p53 signal intensity in MERVL-KD embryos compared to that in controls (Fig. 4j,k). Overall, these results indicate that a subset of two-cell genes are persistently active due to defects in MERVL transcription.
We also performed total RNA-seq analysis for GFPi control and MERVLi embryos (Supplementary Table 1). Hierarchical clustering and Pearson correlation analyses for the expression of all annotated transcripts showed higher similarity between ASO-mediated MERVL-KD and MERVLi embryos (Extended Data Fig. 6a,b), suggesting that the extensive transcriptional anomalies in MERVLi embryos at least partially resemble that in MERVL-KD embryos. Indeed, although only moderate reduction of MERVL expression was observed in MERVLi two-cell stage embryos due to limited effectiveness of CRISPRi for depleting nuclear MERVL RNA (Extended Data Fig. 6c), we identified 872, 535 and 273 genes that were differentially expressed in MERVLi embryos, at the two-cell, four-cell and eight-cell stages, respectively (Extended Data Fig. 6d and Supplementary Table 4). Importantly, significant enrichment of two-cell genes was observed in all sets of DEGs through MERVLi preimplantation development (P < 0.001, hypergeometric test for overrepresentation; Extended Data Fig. 6e). Consistently, a representative two-cell gene maintained abnormally high expression level in MERVLi embryos (Extended Data Fig. 6f). These profiles demonstrated that MERVLi embryos exhibited a transcriptome that was highly similar to that of MERVL-KD embryos, corroborating that the embryonic phenotypes in MERVL-KD and MERVLi embryos arise from defects in cis-regulatory function of MERVL.

MERVL modulates dynamic changes in accessible chromatin
To further investigate the correlation between transcription and chromatin state, we next asked whether MERVL-KD impacts chromatin accessibility at the transcription start sites (TSSs) of dysregulated two-cell  12 ). The y-axis represents normalized tag counts for total RNA-seq in each sample. Refgene, refseq gene in NCBI and UCSC. j, Representative images of immunofluorescence staining for phosphorylated p53 on serine 15 (p-S15-p53, pink) with DAPI counterstain (gray) in control and ASO-mediated MERVL-KD embryos, from three independent experiments. Scale bars, 20 µm. k, Dot plots showing the relative intensity for p-S15-p53 per nucleus in control and MERVL-KD embryos, normalized to DAPI signal. Central bars represent medians, the top and bottom lines encompass 50% of the data points. *P < 0.05 and ***P < 0.001, two-tailed unpaired t-tests. Data for panels in c-h and k are available as source data.
Article https://doi.org/10.1038/s41588-023-01324-y genes by using the optimized low-input assay for transposase-accessible chromatin with sequencing (miniATAC-seq) method (Supplementary  Table 1 and Extended Data Fig. 7) 36 . The overall enrichment pattern of ATAC-seq signals at 10-kb intervals across the genome was modestly changed in MERVL-KD embryos during preimplantation development (Fig. 5a)  Article https://doi.org/10.1038/s41588-023-01324-y reduced in MERVL-KD two-cell-stage embryos (Fig. 5b). Because ASO-mediated KD might trigger premature transcriptional termination via degradation of the residual Pol II-associated transcripts 37 , suggesting that the reduction of chromatin openness at MERVL loci is associated with premature termination of MERVL transcripts induced by KD. Next, we determined whether the transcriptional changes in MERVL-KD embryos were associated with their chromatin states by examining enrichment of ATAC-seq signals at TSSs of two-cell genes, ATAC-seq signals showed significantly higher enrichment at the TSSs of two-cell genes in MERVL-KD embryos at the four-cell stage compared to control embryos (Fig. 5c). In line with this finding, a representative track view showed that an accessible chromatin state was detected at the TSS of a two-cell gene. This open state persisted into the mid-preimplantation stage upon MERVL-KD, a time when this locus is repressed/closed in control embryos (Fig. 5d). These results confirm that ectopic expression of two-cell genes in mid-preimplantation embryos upon MERVL-KD is associated with accessible chromatin states at these loci.
Furthermore, k-means clustering of ATAC-seq signals across all peak regions in control embryos yielded five clusters that fall into two distinct categories. Category A became accessible gradually as development proceeds (composed of cluster 1 and 2), and category B became specifically accessible in two-cell-stage embryos (composed of cluster 3-5) (Fig. 5e). Among each cluster, approximately half of ATAC-seq peaks were enriched at largely genic regions, including promoter, exon and intronic regions (Fig. 5f). GO analysis revealed that genes adjacent to peaks in category A were highly enriched for roles in 'chromatin organization' and 'posttranscriptional regulation', such as those involved in regulation of the pluripotent state (Fig. 5g, top) 38,39 . Moreover, the peaks in category A contained genic regions that associated with pluripotency, including Oct4 and Cdx2 (Fig. 5g, top, and Supplementary Table 5). Conversely, the peaks in category B were characterized by genes and biological terms associated with totipotent and two-cell states, such as Dux and Zscan4c, whose expression was restricted to two-cell stage embryos (Fig. 5g, bottom, and Supplementary Table 5) 13,14,40 . Finally, we examined ATAC-seq signals in each cluster and compared control and MERVL-KD embryos. In category B (cluster 3-5), analysis of average tag densities of ATAC-seq signals confirmed that chromatin accessibility was significantly decreased in peaks of both cluster 3 and 4 in MERVL-KD two-cell stage embryos (Fig. 5h). In contrast, from the four-cell stage onward, ATAC-seq signals were significantly increased in peaks of cluster 4 and 5 in MERVL-KD embryos (Fig. 5h), suggesting that a two-cell-like chromatin state was partially retained during preimplantation development of MERVL-KD embryos.

MERVL-KD exhibits less abundant transcripts adjacent to MERVL
We interrogated the cis-regulatory functions of MERVL. First, we compared the expression levels of annotated genes (in GENCODE vM25) and their distances to nearby full-length MERVL in control and MERVL-KD two-cell stage embryos. No significant differences were observed in the expression of adjacent annotated genes to MERVL, ranging from 0 to 500 kb (Fig. 6a). Intriguingly, in contrast to annotated genes, the unannotated transcripts from the adjacent regions to MERVL were significantly downregulated in MERVL-KD embryos, and this was positively correlated with their proximity to the full-length MERVL (Fig. 6b).
To evaluate the expression levels of unannotated intergenic transcripts in control and MERVL-KD two-cell stage embryos, we used groHMM, a two-state hidden Markov model (HMM)-based algorithm 41 . This framework classified transcriptionally active regions (TARs) across the genome into annotated TAR (aTAR) that overlapped with existing gene annotation and unannotated TAR (uTAR) that did not overlap with regions of any genes, pseudogenes and MERVL families annotated by GENCODE and RepeatMasker (Fig. 6c), identifying in total 240,869 uTARs ( Fig. 6d and Supplementary Table 6). Despite the relative scarcity of uTAR, which accounted for 27.5-34.6% of aligned reads, we detected 3,499 differentially expressed uTARs between control and MERVL-KD embryos (Fig. 6d and Supplementary Table 7). In particular, the vast majority of them were shown to be downregulated upon the loss of MERVL transcription (Fig. 6e). Representative track views demonstrated that transcriptional activities in MERVL-adjacent regions were significantly reduced at upstream of genic TSS and gene desert regions, in MERVL-KD two-cell embryos (Fig. 6f). Specifically, we found a higher fraction of differentially expressed uTARs preferentially located in the neighborhood of MERVL, compared with all called TARs (Fig. 6g). These data suggest a cis-regulatory function for MERVL in the regulation of unannotated transcripts from their adjacent loci. Moreover, concomitantly with the change in expression of uTARs, the intensity of ATAC-seq signal on differentially expressed uTARs was also decreased in MERVL-KD embryos (Fig. 6h), leading to the suggestion that MERVL-driven unannotated transcripts may play a role in chromatin remodeling.

Discussion
We provided functional evidence that transcriptional activation of MERVL is essential for progression of development in mouse preimplantation embryos. Depletion of MERVL transcripts results in embryonic lethality with profound defects in development and is associated with dysregulation of MERVL including their adjacent transcripts, and retaining two-cell-like transcriptome and chromatin state (Fig. 6i). These findings suggest the possibility that MERVL transcription in totipotent cells may act as a switch for the transition from totipotency to pluripotency and is responsible for the onset of differentiation and ontogeny (Fig. 6i).
One of most striking consequences of loss of the MERVL transcript was complete defects in preimplantation development. Several groups have used KD strategies in an effort to characterize the role of MERVL in preimplantation development. However, results  12 ). The yaxis represents normalized tag counts for ATAC-seq in each sample. The flanking region around TSS is highlighted in red. e, Heatmaps showing ATAC-seq signals across all peak regions ±1 kb in control embryos at two-cell, four-cell, and eightcell stages. Each peak was ordered by k-means clustering of ATAC-seq signal and yielded five clusters. Cluster 1-2, increased accessibility during preimplantation development (defined as category A); cluster 3-5, were transiently accessible at two-cell stage and reduced accessibility during preimplantation development (defined as category B). f, Stacked bar chart shows ATAC-seq peak distributions across genomic entities (intergenic, intron, exon and promoter) in each cluster. g, GO analysis of genes adjacent to ATAC-seq peaks of cluster 1-2 (category A) and cluster 3-5 (category B). The pluripotent-related and 2C genes were obviously enriched in gene set adjacent to category A and category B, respectively. h, Average tag density plots of ATAC-seq enrichment around peak regions (±1 kb) in each cluster by stages of development in control and MERVL-KD embryos. *Padj < 0.05, **Padj < 0.01, ***Padj < 0.001, Mann-Whitney U test with Bonferroni correction. Data for panels in a, f and g are available as source data. Article https://doi.org/10.1038/s41588-023-01324-y and genomic stability. The phenotypic differences appear to be explained by methodologies for KD. First, we microinjected multiple ASOs with 2′-O-methoxyethyl modifications, providing enhanced duplex stability and nuclease resistance 42 , into zygotes resulting in efficient degradation of nuclear MERVL transcripts at much higher levels than liposome-based transfection (Figs. 2b and 3f). In addition, it is noteworthy that our ASOs more efficiently target interspersed full-length MERVL elements than that in previous study (n = 377/556, 67.8% versus n = 259/556, 46.6%). Consistent with previous studies, we confirmed that siRNA-mediated KD of MERVL did not affect development to the blastocyst stage (Fig. 3a-d), indicating that nuclear MERVL transcripts, but not encoded proteins, are required for preimplantation development. Although the activation of MT2_Mm inducing the two-cell state in previous study is readily appreciable 12,16,43 , we also found that the expression levels of chimeric transcripts with MT2_ Mm were unchanged in MERVL-KD two-cell stage embryos (Extended Data Fig. 5c), suggesting that the phenotypes of MERVL-KD embryos are unlikely to arise from dysregulation of chimeric transcripts with MT2_Mm. Given the results with siRNA-KD and in trans RNA rescue, cis-regulatory activity of MERVL loci is likely to play a key role in host chromatin remodeling at totipotent-to-pluripotent transition. As a proof-of-concept for this hypothesis, we performed CRISPRi-mediated repression of MERVL and revealed that MERVLi embryos partially mimicked ASO-mediated KD of MERVL with regard to developmental defects and retaining two-cell transcriptome (Fig. 3i-l and Extended Data Fig. 6).
Our findings showed that transcription itself and/or cis-acting RNA from MERVL loci is likely to play a critical role for host preimplantation development. Of note, having observed a subset of two-cell genes retaining expression in MERVL-KD mid-preimplantation embryos (Fig.  4g-i), MERVL transcripts may act to repress expression of two-cell genes during preimplantation development, at least in part. Several lines of evidence allow us to predict the mechanism by which MERVL may regulate host transcriptome and chromatin state in preimplantation development. First, a recent study demonstrated that MERVL drives the 3D reorganization of the genome in two-cell-like cells, the rare cell population in ESC culture, and early mouse embryos by providing a topologically associating domain boundary that is coupled to directional transcription from MERVL 44,45 . Thus, it is plausible that transcription itself from MERVL might be responsible for remodeling of chromatin structure during the transition from totipotency to pluripotency. As shown in Fig. 6, it is noteworthy that loss of MERVL transcription also leads to the reduction of transcription levels and chromatin accessibility in unannotated adjacent regions ranging up to 50 kb away. One possible explanation for this finding is that these cryptic transcripts may result from readthrough of RNA polymerase beyond MERVL elements, which leads to pervasive epigenetic changes. Related to our findings, Jachowicz et al. demonstrated that transcriptional activity of LINE1 that peaks at two-cell stage embryos affects global chromatin accessibility and unfolding 46 . Likewise, there has been reported that the transcribed RNA from other type of endogenous retroviruses (that is, intracisternal A-particles and MMERVK10C) can act in cis in ESC 47 . Accordingly, we speculate that en masse transcription of MERVL loci acts in cis as starting events for long-range 3D chromatin organization and remodeling toward the onset of ontogeny. Future studies can now interrogate the intriguing mechanisms that may underlie the important role of MERVL transcript in regulating the host genome that we have uncovered here.

Online content
Any methods, additional references, Nature Portfolio reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41588-023-01324-y.
Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons. org/licenses/by/4.0/. Article https://doi.org/10.1038/s41588-023-01324-y

Animals
Eight-week-old female and male B6D2F1 (BDF1) mice were purchased from the Japan SLC. All animal experiments were reviewed and approved by the Institutional Animal Care and Use Committee (protocols 09105-(10) and 11045-(6)) at Keio University, where mice were maintained and fed ad libitum with standard diet and water in a temperature-, humidity-and light-controlled room (23°C ± 3°C, a humidity of 50% ± 10% and 14-h light/10-h dark cycle).

MERVL-targeting ASO design
Three independent 20-nt ASOs were designed based on the consensus sequence of MERVL (Supplementary Table 8). ASO candidates that effectively target the internal retroviral regions of MERVL were selected using ChIRP Probe Designer (https://www.biosearchtech. com/support/tools/design-software/chirp-probe-designer), taking into account the secondary structure of the MERVL RNA. To identify potential binding sites for MERVL-targeting ASOs in silico, nonredundant high-confidence MERVL loci were downloaded from RepeatMasker database (http://www.repeatmasker.org/species/mm.html) and converted to BED format. Consequently, FASTA files for these MERVL loci were extracted using the getfasta function as implemented in BEDTools (version 2.30.0) 48 and entered into the makeblastdb program from BLASTn (version 2.6.0+) to generate DNA database for BLASTn search. Each ASO sequence was used to search the database of MERVL DNA using BLASTn 49 , with up to two mismatches allowed. The synthesized ASOs were chemically modified with phosphorothioate linkage and 2′-O-methoxyethyl modification at positions 1-5 and 15-20. As the control for the ASO experiments, we also generated a 20-nt randomized oligonucleotide that does not target the mouse genome (as determined by BLASTn search) and three independent 20-nt SOs that were complementary to respective MERVL-targeting ASOs. The three independent ASOs/SOs against MERVL with the highest targeting rate and randomized non-targeting ASO (scrambled ASO) were used for KD experiments (Supplementary Table 8).

ASO-mediated KD of MERVL in ESCs
A total of 1 × 10 5 ESC Dux s were harvested from culture dishes per one nucleofection. To validate efficiency of the ASO-mediated KD, the cells were nucleofected with 1 nmol ASO against MERVL or scrambled ASO using P3 Primary Cell 96-well Nucleofector Kit (Lonza) according to manufacturer's instruction and seeded into each well of a 24-well plate containing 500 µl ESC medium with 2i, LIF and 1.1 µl iMatrix-511 Silk solution (0.5 mg ml −1 , Nippi). After 6 h of nucleofection, we added 0.5 µl of 10 µg ml −1 Dox to each well of 24-well plate. The following day (24 h after nucleofection), the cells were harvested for quantitative RT-PCR and western blotting to evaluate the expression of MERVL.

Embryo culture and microinjection
Female mice were superovulated by intraperitoneal administrations of 150 µl CARD HyperOva (Kudo) followed 48 h later with 7.5 IU human chorionic gonadotropin (Asuka Pharmaceutical). After injection of human chorionic gonadotropin, females were housed with male mice overnight for copulation. The following day, zygotes were collected from the oviducts of superovulated/mated females. The microinjection was carried out under a phase-contrast inverted microscope (IX73, Olympus) equipped with a micromanipulation system (Narishige). Each ASO (20 µM), SO (20 µM) and siRNA (20 and 80 µM) was microinjected into the male pronuclei of zygotes using FemtoJet 4i (Eppendorf). All ASO, SO and siRNA sequences used in this study are listed in Supplementary Table 8. Embryos were cultured in KSOM (MR-101-D, Sigma) at 37°C with 5% CO 2 .

RNA rescue assay
ASO-resistant full-length MERVL was constructed by substituting all of the three ASO-targeted sequences of MERVL using inverse PCR. The primers used to produce the constructs are listed in Supplementary  Table 8. Briefly, intact MERVL sequence was obtained from a bacterial artificial chromosome plasmid (RP23-231A8, https://www.ncbi.nlm. nih.gov/nuccore/AC127321.4, Thermo Fisher Scientific) and subcloned into pCAGGS_Myc plasmid using NEBuilder HiFi DNA Assembly Master Mix (New England Biolabs). Individual ASO-targeted sequences were mutated by inverse PCR with PrimeSTAR GXL DNA polymerase (TaKaRa) and specific primer sets (Supplementary Table 8). For subsequent steps, ASO-resistant full-length MERVL was amplified by PCR with a specific primer set containing the T7 promoter sequence at the 5′ end of forward primer for in vitro transcription with T7 polymerase. Following PCR amplification, in vitro transcription was performed using MEGAscript T7 Transcription Kit (Thermo Fisher Scientific) according to the manufacturer's instructions. Transcribed RNA was purified using RNeasy Mini Kit (QIAGEN). ASO-resistant MERVL RNA was co-microinjected into the male pronucleus of zygotes with 20 µM ASOs (mixed) at the final concentration of 200 ng µl −1 .

smFISH and immunofluorescence analysis
Embryos were fixed using fixative solution (4% paraformaldehyde and 0.2% polyvinyl alcohol (PVA) in PBS) for 20 min at room temperature (RT) and then permeabilized with 0.5% TritonX-100 in PBS for 10 min at RT. For smFISH, embryos were washed once with Stellaris Wash Buffer A (Bioresearch Technologies) for 5 min at RT. FISH was then performed overnight at 37°C in Stellaris Hybridization Buffer (Bioresearch Technologies) containing 25 pmol of a 48-probe set (Quasar 670-labeled, Bioresearch Technologies) against MERVL RNA. The following day, embryos were washed once with Stellaris Wash Buffer A for 30 min at 37°C and then counterstained with DAPI (1 µg ml −1 , Nacalai Tesque) for 30 min at 37°C. After washing with Stellaris Wash Buffer B (Bioresearch Technologies) for 5 min at RT, embryos were transferred to 10 µl drops of 0.2% PVA in PBS on a glass-bottomed dish covered by paraffin oil. The FISH probe sequences used in this study are listed in Supplementary  Table 8. For immunofluorescence analysis, nonspecific immunoreaction was blocked by incubating embryos in 2% bovine serum albumin (BSA) in PBS for 1 h at RT. After blocking, whole-mount immunofluorescence analyses were conducted using the following primary antibodies: mouse anti-MERVL-Gag (1/5 dilution, kept in our lab), rabbit anti-OCT4 (1/100 dilution, ab181557, abcam), mouse anti-CDX2 (1/100 dilution, ab157524, abcam), mouse anti-E-Cadherin (1/100 dilution, 610182, BD Transduction Laboratories), rabbit anti-cleaved caspase-3 (1/200 dilution, 9661, CST) and rabbit anti-Phospho-S15-p53 (1/200 dilution, 9284, CST). Embryos were incubated overnight at 4°C with primary antibody. The following day, washed embryos three times with 0.5% BSA in PBS for 5 min each and then incubated with Alexa Fluor 488-and/ or 568-conjugated anti-mouse and/or anti-rabbit immunoglobulin G secondary antibodies (1/500 dilution, Thermo Fisher Scientific) for 1 hr at RT. After washing three times with 0.5% BSA in PBS for 10 min each, the embryos were counterstained with DAPI (1 µg ml −1 ) for 30 min at RT and transferred to 10 µl drops of 0.2% PVA in PBS on a glass-bottomed dish covered by paraffin oil. smFISH and immunofluorescence images were obtained with a BZ-X810 fluorescence microscope (Keyence) or FV3000 confocal laser scanning microscope (Olympus) and processed with ImageJ (NIH) 52 .

EU incorporation assay
The embryos microinjected with either scramble ASOs or MERVL ASOs, were cultured in KSOM for 24 hrs. On the next day, two-cell-stage embryos were transferred to new KSOM containing 1 mM EU (Click-iT RNA Alexa Fluor 488 Imaging Kit, Thermo Fisher Scientific) and cultured for 1 h at 37°C with 5% CO 2 . The EU-labeled embryos were immediately fixed and permeabilized as described above. Incorporated EU into newly synthesized RNA was detected using the Click-iT RNA Alexa Fluor 488 Imaging Kit (Thermo Fisher Scientific) according to the manufacturer's instructions. In brief, the fixed embryos were treated with the Click-iT reaction cocktail containing Alexa Fluor 488 azide for 30 min at RT, and then washed once with Click-iT reaction rinse buffer. After washing once with 0.2% PVA in PBS, the embryos were counterstained with Hoechst 33342 (2 µg ml −1 , Dojindo) for 30 min at RT and transferred to 10 µl drops of 0.2% PVA in PBS on a glass-bottomed dish covered by paraffin oil. Images of EU-labeled embryos were obtained with a FV3000 confocal laser scanning microscope (Olympus) and processed with ImageJ (NIH) 52 .

Quantitative RT-PCR
Acidic Tyrode's solution (Sigma) containing 0.2% PVA was used to remove the zona pellucida (ZP) from embryos before sample collection. Then, 30 ZP-free morula-stage embryos and nucleofected ESC Dux s were lysed in Lysis Solution (from SuperPrep II Cell Lysis & RT Kit, TOYOBO), directly. Genomic DNA elimination and reverse transcription were performed using the SuperPrep II Cell Lysis & RT Kit with random hexamers and oligo dT primers according to the manufacturer's instructions. Real-time quantitative PCR was carried out using the following conditions: 95°C for 1 min, followed by 45 cycles each of 95°C for 15 s and 60°C for 1 min, on a Thermal Cycler Dice Real Time System III (TaKaRa) with Thunderbird SYBR qPCR Mix (TOYOBO) and specific primer sets (Supplementary Table 8). Relative gene expression was quantified with the ΔΔCT method and normalized to Actb or Gapdh expression.

Western blotting
The cells were directly lysed in Laemmli SDS sample buffer (1×, 62.5 mM Tris-HCl (pH 6.8), 2% SDS, 10% glycerol, 5% β-mercaptoethanol and 0.02% bromophenol blue) and sonicate with Bioruptor at a high setting, for 10 cycles each of 30 s with 30 s intervals. Total proteins were denatured at 95°C for 5 min and separated by 10% SDS-PAGE. The proteins were then transferred onto a Protran Nitrocellulose Membranes (0.45 µm pore size, GE Healthcare) via Power Blotter-Semi-dry Transfer System (Thermo Fisher Scientific). The membrane was blocked with Bullet Blocking One for Western Blotting (Nacalai Tesque) for 10 min at RT with gently rocking before incubation with mouse anti-MERVL-Gag (1/5 dilution, kept in our lab) or mouse anti-β-tubulin (1/2,000 dilution, E7, Developmental Studies Hybridoma Bank) antibody in 1:20 diluted blocking buffer with 0.1% PBST at 4°C overnight. The following day, the membranes were then incubated with HRP-conjugated goat anti-mouse immunoglobulin G secondary antibody (1/10,000 dilution, #330, MBL Life Science) in 1:20 diluted blocking buffer with 0.1% PBST for 30 min at RT with gently rocking. After washing three times with 0.1% PBST for 10 min each, Blots were developed using ECL Western Blotting Detection Reagent (Sigma) and exposed onto X-ray film.

RNA-seq
Preparation of total RNA-seq libraries was performed using SMART-seq Stranded Kit (Clontech), according to the manufacturer's instructions. In brief, 30 ZP-free two-cell, four-cell and eight-cell stage embryos whose cleavage stages were visually confirmed under the microscope, were lysed in 1× Lysis Buffer containing RNase inhibitor (0.2 IU µl −1 , from SMART-seq Stranded Kit, Clontech), directly. RNAs were randomly sheared by heating at 85°C for 8 min and subjected to reverse transcription with random hexamers and PCR amplification. Ribosomal fragments were depleted from each cDNA sample with scZapR and scR-Probes. Indexed total RNA-seq libraries were enriched through a second PCR amplification and sequenced using an Illumina HiSeqX sequencer (paired end, 150 bp). Two biological replicates were generated for each sample.

miniATAC-seq
The protocol for miniATAC-seq library was adapted from a previous report with minor modifications 36 . Briefly, 30 ZP-free two-cell, four-cell and eight-cell stage embryos whose cleavage stages were visually confirmed under the microscope were lysed in 6 µl lysis buffer (10 mM Tris-HCl (pH 7.4), 10 mM NaCl, 3 mM MgCl 2 and 0.5% NP-40) for 10 min on ice. After cell lysis, 2 µl ddH 2 O, 10 µl of 2× TD Buffer (from Nextera XT DNA Prep Kit, Illumina) and 2 µl of Tn5 Transposase (from Nextera XT DNA Prep Kit, Illumina) were added to the lysates, which were mixed by pipetting followed by incubation at 37°C for 30 min. To stop the transposase reaction, 5 µl of 0.5% SDS was added and incubated at RT for 5 min. After Tn5 tagmentation, 100 ng Carrier RNA (Yeast tRNA, Sigma) was added and then messed samples up to 130 µl with 1× TE Buffer (10 mM Tris-HCl (pH 8.0) and 1 mM EDTA). Tagmented DNA was Article https://doi.org/10.1038/s41588-023-01324-y purified by phenol-chloroform extraction and ethanol precipitation with 20 µg glycogen as carrier and dissolved in 20 µl ddH 2 O. Afterwards, PCR was performed to amplify and index libraries using the following conditions: 72°C for 5 min and 98°C for 30 s, followed by 18 cycles each of 98°C for 10 s, 63°C for 30 s and 72°C for 1 min, with NEBNext HiFi PCR Master Mix and specific index primer sets (Supplementary Table  8). Pooled miniATAC-seq libraries were sequenced using an Illumina HiSeqX sequencer (paired end, 150 bp). Two biological replicates were generated for each sample.

RNA-seq and ATAC-seq data processing
Methods for RNA-seq analysis have been described previously 53 , In short, raw paired-end RNA-seq reads were aligned to indexed mouse genome (GRCm38/mm10) using STAR aligner version 2.5.3a 54 with following options: --twopassMode Basic; --outSAMtype BAM SortedByCoordinate; --outFilterType ByS-Jout; --outFilterMultimapNmax 1; --winAnchorMultimapNmax 50; --chimSegmentMin 12; --chimJunctionOverhangMin 8; --alignSJoverhangMin 8; --alignSJDBoverhangMin 10; --outFilterMismatchNmax 999; --outFilterMismatchNoverReadLmax 0.04; --alignSJstitchMismatchNmax 5 -1 5 5; --outSAMattrRGline ID:GRPundef; --alignIntronMin 20; --alignIntronMax 1000000; --alignMatesGapMax 1000000 for unique alignments. To quantify aligned reads on annotated genes (GENCODE vM25) and repetitive loci (originating from mm10.fa.out (RepeatMasker), best-match TE annotation defined previously 53 ), we used the featureCounts function, which is part of the Subread package 55 . To detect differentially expressed genes and repetitive elements between control and MERVL-KD embryos, an output file of featureCounts was entered into the DESeq2 package (version 1.16.1) 56 ; then, the program functions DESeqDataSetFrom-Matrix and DESeq were used to compare each gene's expression level between two biological samples. Differentially expressed genes were identified through two criteria: (1) at least twofold change and (2) binominal tests (Padj < 0.05; P values were adjusted for multiple testing using the Benjamini-Hochberg method). To perform GO and ChEA analyses, we used the functional annotation clustering tool in Enrichr 57 . The annotated terms with P < 0.05 from a modified Fisher's exact test were considered significant. To visualize read enrichments over representative genomic loci, TDF files were created from sorted BAM files using the IGVTools count function (Broad Institute) 58 . Figures of continuous tag counts over selected genomic intervals were created in the IGV browser (Broad Institute) 58 . To comprehensively evaluate the expression profiles across the entire transcribed regions (genic and intergenic region) in the genome, alignment file from STAR was entered into the groHMM package (version 1.24.0) 41 ; then, program function detectTranscripts with default parameter settings was used for detection of TARs de novo based on a two-state hidden Markov model. TARs were further compartmentalized into aTARs and uTARs according to overlap with existing known annotations (in GENCODE vM25 and RepeatMasker) using subtract function from BEDTools (version 2.30.0) 48 . After quantification of aligned reads on uTARs, differentially expressed uTARs between control and MERVL-KD embryos were detected using DESeq2 package 56 with the threshold of Padj < 0.01.
Raw paired-end ATAC-seq reads were filtered by TrimGalore (version 0.6.4, https://github.com/FelixKrueger/TrimGalore) with the default setting and aligned to indexed mouse genome (GRCm38/ mm10) using bowtie2 (version 2.4.4) 59 with the following options, -N 1; -L 25; --no-mixed; --no-discordant. Multiple aligned reads and PCR duplicates were removed using grep -v 'XS:' and MarkDuplicates function with REMOVE_DUPLICATES = true option, a part of Picard tools. Using SeqMonk (Babraham Bioinformatics), we calculated Pearson correlation coefficients between biological replicates. Peak calling for ATAC-seq data was performed using MACS2 (version 2.1.4) 60 with default arguments; we used a cut-off of P ≤ 10 −2 . The average tag density plots were drawn using Ngsplot (version 2.47.1) and plotHeatmap program as implemented in deepTools (version 3.1.3) 61 . To visualize read enrichment over representative genomic loci, TDF files were created from sorted BAM files using the IGVTools count function (Broad Institute) 58 . Figures for continuous tag counts over selected genomic intervals were created in the IGV browser (Broad Institute) 58 . For k-means clustering of ATAC-seq peaks, we firstly generated a set of regions that were called peaks with MACS2 in at least one of the samples, by merging ATAC-seq peak regions of all biological replicates using the mergeBed function from BEDTools (version 2.30.0) 48 . Using this merged peak file and ATAC-seq data from control embryos, we ran k-means clustering analysis using computeMatrix and plotHeatmap program as implemented in deepTools (version 3.1.3) 61 and determined that k = 5 was suitable for our data. To evaluate the functional annotation of each cluster, we used Genomic Region Enrichment of Annotation Tool (GREAT, version 4.0.4) 62 and HOMER (version 4.9) 63 , which associates ATAC-seq peaks in each cluster with their genomic feature and ontology of their putative target genes adjacent to peaks.

Statics and reproducibility
All statistical methods, sample sizes and P values for each plot are listed in the figure legends and/or in the corresponding Methods section. In brief, all grouped data are represented as mean ± s.e.m. All box plots are represented as follows: center lines, median; box edges, interquartile range (25 and 75 percentiles); whisker, 90% of the data points. Statistical significance for pairwise comparisons were determined using the two-sided unpaired t-tests, chi-square tests and Mann-Whitney U-tests with Bonferroni correction. All quantitative data are represented from three or more biological replicates. Fisher's exact test and hypergeometric test were used for the detection of significantly enriched GO terms, genes, and loci compared with backgrounds. Differentially expressed genes, TEs and loci were determined in the DESeq2 package 56 . Next-generation sequencing data (total RNA-seq and miniATAC-seq) are based on two biological replicates. For all experiments, no statistical methods were used to predetermine sample size. Experiments were not randomized, and investigators were not blinded to allocation during experiments and outcome assessments.

Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability
The total RNA-seq and miniATAC-seq data in embryos upon MERVL-KD, MERVLi and their respective controls are deposited in the Gene Expression Omnibus under accession code GSE196520. Source data are provided with this paper.

Code availability
This study did not make any custom code or algorithm. All software used in this study is publicly available and cited in the main text and/ or Methods section.