Zygotic genome activation (ZGA) is a critical postfertilization step that promotes totipotency and allows different cell fates to emerge in the developing embryo. MERVL (murine endogenous retrovirus-L) is transiently upregulated at the two-cell stage during ZGA. Although MERVL expression is widely used as a marker of totipotency, the role of this retrotransposon in mouse embryogenesis remains elusive. Here, we show that full-length MERVL transcripts, but not encoded retroviral proteins, are essential for accurate regulation of the host transcriptome and chromatin state during preimplantation development. Both knockdown and CRISPRi-based repression of MERVL result in embryonic lethality due to defects in differentiation and genomic stability. Furthermore, transcriptome and epigenome analysis revealed that loss of MERVL transcripts led to retention of an accessible chromatin state at, and aberrant expression of, a subset of two-cell-specific genes. Taken together, our results suggest a model in which an endogenous retrovirus plays a key role in regulating host cell fate potential.
Fertilization and early preimplantation development are processes in which unipotent gametes unite and acquire totipotency (Fig. 1a). After fertilization, embryos undergo zygotic genome activation (ZGA), a process that is widely conserved in vertebrates1,2,3. ZGA involves a transcriptional burst of hundreds to thousands of two-cell-specific genes. At this point, gene expression switches from a maternal to zygotic program2,4. ZGA occurs in two distinct waves called minor and major ZGA5. In mice, minor ZGA occurs from S phase in the zygote to G1 phase in the early two-cell stage embryo, whereas the major wave occurs during the second round of DNA replication at the middle-to-late two-cell stage6,7. Both waves of ZGA are critical for the embryo to acquire developmental competence6,8. However, the molecular events that drive ZGA and lead to acquisition of totipotency and developmental competence are still enigmatic.
Approximately 40% of the mouse genome is occupied by transposable elements (TEs), mobile genetic elements of which ~10% are endogenous retrovirus (ERV)9. Notably, the expression of murine endogenous retrovirus with leucine transfer RNA primer binding site (MERVL) is specifically activated at the two-cell stage concomitant with ZGA10,11,12. Recently, the transcription factor DUX, which is expressed during minor ZGA, was documented as an upstream regulator that activates two-cell genes and MERVL13,14,15. Furthermore, the MERVL long terminal repeat (LTR) promoter drives a subset of two-cell genes and generates chimeric transcripts with the host genes12. The above findings suggest that DUX/MERVL may activate an early transcriptional network that is required for ZGA and totipotency.
In 2012, Macfarlan et al. found that a rare transient cell population (~1%) in mouse embryonic stem cell (ESC) and induced pluripotent stem cell cultures expresses high levels of MERVL and two-cell genes without expression of pluripotent inner cell mass (ICM) maker genes, such as Pou5f1 (also known as Oct4), Sox2 and Nanog12. MERVL expression has been used as a marker for totipotent cells, as MERVL+ cells can commit to both embryonic and extraembryonic lineages after injection into recipient embryos at the eight-cell and morula stages12,16,17.
Despite the above findings, the function of MERVL itself remains unclear. Here, we overcome technical limitations in interrogating TE functions and analyze the role of MERVL in preimplantation development. We found that depletion of MERVL transcripts resulted in embryonic lethality due to defects in early lineage specification and genome stability, demonstrating that MERVL is essential for mouse preimplantation development.
MERVL exhibits distinct localization in mouse embryos
To understand the dynamics of MERVL expression, we first analyzed publicly available single-cell RNA-sequencing (scRNA-seq) datasets from each blastomere at eight representative stages of preimplantation development18 (Fig. 1a). To define regions of nonredundant MERVLs in mouse genome, we used RepeatMasker to annotate the genome for unique interspersed internal regions of MERVL (MERVL-int, n = 1,426) and LTR promoters of the MERVL (MT2_Mm, n = 2,366). The expression of MERVL and its LTR promoter culminated in the middle of the two-cell stage and then gradually decreased until blastocyst stage (Fig. 1b,c).
We also set out to investigate the expression and localization of MERVL transcripts in preimplantation embryos using single-molecule fluorescence in situ hybridization (smFISH). Interestingly, smFISH revealed that MERVL expression is detectable in the nuclei from zygotes and early two-cell stage embryos in which polyadenylated MERVL mRNA cannot be detected (Fig. 1b,d). Afterwards, MERVL RNA gradually translocated from the nucleus at middle two-cell stage onward and was highly restricted to the cytoplasm by late two-cell stage (Fig. 1d). These changes in MERVL transcript localization were consistent with increased MERVL protein levels during the middle two-cell stage (Fig. 1e). These observations raise the possibility that nuclear MERVL transcript has distinct roles in gene regulation in the early stages of preimplantation development compared to cytoplasmic MERVL transcript, leading us to investigate MERVL function further.
MERVL-KD results in embryonic lethality
Inconsistencies regarding the early embryonic phenotypes of MERVL knockdown (KD) in previous studies11,19,20, led us to re-examine the KD effects of MERVL on preimplantation development. To this end, we developed specific antisense oligonucleotides (ASOs) that target interspersed MERVL copies (Fig. 2a and Extended Data Fig. 1a,b). After predicting the genome-wide target sites of individual ASOs using BLASTn, we confirmed that 46.9% (n = 669/1,426) of MERVL copies were targeted by at least one ASO with up to two mismatches allowed (Extended Data Fig. 1c). Subsequently, we confirmed that our ASO sequences efficiently targeted full-length MERVL (≥5 kb, n = 377/556, 67.8%), by combining three independent anti-MERVL ASOs (Extended Data Fig. 1d). We experimentally validated the KD efficiency of each ASO using a recently developed ESC-based in vitro system (Extended Data Fig. 1e and Methods)21 in which MERVL expression was drastically reduced at both the mRNA and protein levels (Extended Data Fig. 1f–h). Injection of each ASO into the male pronucleus of zygotes also leads to substantial reduction of MERVL RNA signal at the late two-cell stage (Extended Data Fig. 2a). Because we noted that cocktail of three independent ASOs increased MERVL-KD efficiency (Fig. 2b and Extended Data Fig. 2), mixed ASOs (1:1:1 = 20 µM) were used in subsequent experiments.
Next, we monitored the effects of MERVL-KD using ASOs on preimplantation development (Fig. 2c,d). MERVL-KD embryos displayed a significant developmental delay from 2.5 dpc (Fig. 2d). Despite longer culture, a greater percentage of MERVL-KD embryos remained at the two- to four-cell stage (26.2% for control and 82.7% for MERVL-KD at 2.5 dpc (**P = 0.00533, chi-square test; Fig. 2d)). Strikingly, at 4.5 dpc, almost no blastocyst formation was observed in MERVL-KD embryos (Fig. 2c,d). These findings revealed that majority of MERVL-KD embryos suffered a developmental delay and halted development before blastocyst formation. To exclude the possibility that the reduction of MERVL transcript level and aberrant developmental phenotypes in MERVL-KD embryos might arise from unexpected artificial effects due to the presence of excessive small nucleotides, such as DNA replication stress and innate immune response, we performed microinjection of sense oligonucleotides (SOs), which are complementary to individual MERVL-targeting ASOs into the male pronucleus of zygotes. This confirmed that injection of sense oligonucleotides (SOs) did not affect the expression of MERVL and preimplantation development (Extended Data Fig. 2).
Alternatively, we also tested KD of MERVL using CasRx, another KD strategy with a CRISPR-based RNA-targeting system22 (Extended Data Fig. 3a). One day after induction of CasRx-mediated KD, no MERVL protein was detected in MERVL-KD two-cell stage embryos, and nuclear and cytoplasmic MERVL RNA signals remained, albeit with clear reductions compared to that of control (Extended Data Fig. 3b). Even with this KD inefficiency, the preimplantation development upon CasRx-mediated KD partially phenocopied that observed in ASO-mediated KD embryos (Extended Data Fig. 3c,d).
Of note, the phenotype of ASO-mediated MERVL-KD embryos is reminiscent of that observed in loss-of-function of genes associated with ICM and trophectoderm differentiation23,24,25,26,27, implying that MERVL transcription might function in early cell lineage specification during preimplantation development. To test this hypothesis, we examined mRNA levels of genes related to ICM (Oct4, Sox2 and Nanog) and trophectoderm differentiation (Tead4, Tcfap2c and Cdx2) in MERVL-KD embryos at 3.5 dpc, by RT-qPCR (Fig. 2e). Sox2, Nanog and Cdx2 mRNA levels were significantly reduced in MERVL-KD morula, in contrast to Oct4, Tead4 and Tcfap2c transcripts, which increased (Fig. 2e). These data suggest that MERVL transcription during early stages of preimplantation development is required for subsequent proper expression of genes linked to the earliest cell lineage specification events. To further investigate the molecular etiologies of the phenotype upon MERVL-KD, we examined OCT4 and CDX2 protein expression using immunofluorescence analysis at the morula stage (Fig. 2f). MERVL-KD morula is notably composed of significantly fewer blastomeres (Fig. 2g) with a significant reduction in OCT4 and CDX2 protein expression (Fig. 2h). Notably, MERVL-KD embryos also displayed increased hallmarks of genomic instability such as nuclear deformation and micronuclei, and apoptosis (Fig. 2i,j). In sum, we conclude that the depletion of MERVL transcript results in disruption of lineage specification, cell death, and ultimately early embryonic lethality.
Cis-acting functions of MERVL is essential for development
We next interrogated which products from MERVL were required for preimplantation development. Firstly, we used two independent short interfering RNA (siRNAs) to knock down MERVL (Fig. 3a). Upon siRNA-mediated KD of MERVL, no cytoplasmic MERVL RNA and protein were detected in late two-cell stage embryos, whereas the nuclear signal of MERVL RNA was still intact in early two-cell stage embryos (Fig. 3b). Consequently, although the phenotypes of siRNA-mediated MERVL-KD are inconsistent between previous studies19,20, we confirmed that siRNA-mediated KD of MERVL had almost no significant impact on preimplantation development (61.6% for control and 66.1% for MERVL-KD (P = 0.5966, chi-square test) (Fig. 3c,d)), suggesting that proteins encoded by MERVL are dispensable for preimplantation development.
To determine whether transcribed MERVL RNA plays a critical role in preimplantation development, we used a MERVL RNA construct with point mutations conferring resistance to ASO-mediated KD (Fig. 3e) and co-injected an ASO-resistant MERVL RNA with mixed ASOs into zygotes. MERVL transcript was observed in nuclei of MERVL-KD early two-cell stage embryos at levels comparable to control embryos, when ASO-resistant MERVL transcript was present (Fig. 3f). However, upon injection of MERVL RNA, no cytoplasmic MERVL RNA signal was detected in late two-cell stage embryos. (Fig. 3f), which is to be expected, given that our exogenous MERVL construct is likely degraded by RNA surveillance pathways28. Although nuclear MERVL transcript was present in MERVL-KD embryos by co-injection with ASO-resistant MERVL RNA, the embryonic lethality was not rescued (Fig. 3g,h). Collectively, our data demonstrated that neither encoded proteins nor trans-acting RNA of MERVL is indispensable for preimplantation development.
To gain better insight into cis-acting functions of MERVL, we employed the CRISPR interference (CRISPRi) system29,30,31 (Fig. 3i). After microinjection, we verified that the expression of MERVL protein was efficiently suppressed in two-cell stage embryos upon CRISPRi to MERVL (MERVLi), whereas MERVL transcriptional level was significantly decreased in MERVLi embryos compared to that in GFPi control, but not fully silenced (Fig. 3j). Notably, blastocyst formation was severely impaired upon MERVLi (Fig. 3k). Only 38.7% MERVLi embryos reached blastocyst stage, compared with 78.5% for GFPi control embryos (Fig. 3l). These findings showed the necessity of MERVL transcription in early stage of development at the chromatin level.
A subset of two-cell genes are dysregulated in MERVL-KD embryos
To assess the effects of ASO-mediated MERVL-KD on ZGA, we made use of 5’-ethynyluridine (EU) to assay global transcriptional levels in control and MERVL-KD two-cell stage embryos (Fig. 4a). We observed no significant change in zygotic transcriptional activity between control and MERVL-KD two-cell stage embryos during major ZGA (Fig. 4b,c).
Further, we carried out total RNA-seq analyses in control and MERVL-KD embryos (Extended Data Fig. 4a,b). The RNA-seq dataset provided accurate gene expression profiles (Fig. 4d and Supplementary Tables 1 and 2). When we compared differentially expressed genes (DEGs) between control and MERVL-KD embryos, we found that 938, 745 and 277 genes were significantly dysregulated in MERVL-KD embryos, at the two-cell, four-cell and eight-cell stages, respectively (Fig. 4e, Supplementary Table 3). Of these DEGs, only 3–15% of genes were defined as ‘maternally inherited RNA’ in a database of transcriptome in mouse early embryos version 2 (DBTMEE v2 (refs. 32,33)) across all stages examined (Fig. 4f), suggesting that MERVL-KD led to major defects in expression of zygotic genes rather than maternal mRNA clearance. In contrast to annotated genes, among all interspersed repetitive elements, only MERVL and a small number of closely related subtypes were significantly downregulated in MERVL-KD preimplantation embryos (Extended Data Fig. 5a,b). Moreover, largely consistent with previous reports12, we also detected 77 genes that generated chimeric transcripts with junction to MT2_Mm in both control and MERVL-KD embryos. However, the overall expression level of chimeric transcripts is unchanged in MERVL-KD embryos (Extended Data Fig. 5c).
To better understand the functions of the DEGs, we performed gene ontology (GO) and ChIP Enrichment Analysis (ChEA)34, indicating that TP53 (also known as p53)-related terms were significantly enriched in each set of upregulated DEGs (Fig. 4g and Extended Data Fig. 4c). Taken together, these transcriptome analyses raise the possibility that activated p53 ectopically accumulates in the nuclei of MERVL-KD embryos. Because DNA damage and activated p53 contribute to the induction of two-cell genes in early-stage preimplantation embryos and ESCs35, we reasoned that the genes significantly upregulated in MERVL-KD embryos might be enriched for two-cell and p53-target genes. To test this hypothesis, we next compared DEGs with the lists of two-cell genes (as defined in Macfarlan et al.12) and DBTMEE v2 (refs. 32,33). Significant enrichment of two-cell genes was observed in all DEG sets (P < 0.001, hypergeometric test for overrepresentation; Fig. 4h). Strikingly, upregulated DEGs in MERVL-KD four-cell embryos exhibited the highest increase in enrichment of two-cell genes (Fig. 4h). Unsupervised hierarchical clustering of all two-cell gene transcripts revealed that more than half of two-cell genes were dysregulated in at least one developmental stage of MERVL-KD embryos (Extended Data Fig. 4d). In addition, two-cell- and four-cell-transient genes were also enriched in all sets of upregulated DEGs (Fig. 4h). These findings argue that MERVL-KD embryos still retain a two-cell/totipotent-like transcriptome even at mid-preimplantation stages (that is, four-cell and eight-cell stages). Indeed, representative track views demonstrated that two-cell gene loci abnormally maintained high expression levels of two-cell genes in MERVL-KD embryos (Fig. 4i). Consistent with our hypothesis, we also observed a significant increase in phospho-S15-p53 signal intensity in MERVL-KD embryos compared to that in controls (Fig. 4j,k). Overall, these results indicate that a subset of two-cell genes are persistently active due to defects in MERVL transcription.
We also performed total RNA-seq analysis for GFPi control and MERVLi embryos (Supplementary Table 1). Hierarchical clustering and Pearson correlation analyses for the expression of all annotated transcripts showed higher similarity between ASO-mediated MERVL-KD and MERVLi embryos (Extended Data Fig. 6a,b), suggesting that the extensive transcriptional anomalies in MERVLi embryos at least partially resemble that in MERVL-KD embryos. Indeed, although only moderate reduction of MERVL expression was observed in MERVLi two-cell stage embryos due to limited effectiveness of CRISPRi for depleting nuclear MERVL RNA (Extended Data Fig. 6c), we identified 872, 535 and 273 genes that were differentially expressed in MERVLi embryos, at the two-cell, four-cell and eight-cell stages, respectively (Extended Data Fig. 6d and Supplementary Table 4). Importantly, significant enrichment of two-cell genes was observed in all sets of DEGs through MERVLi preimplantation development (P < 0.001, hypergeometric test for overrepresentation; Extended Data Fig. 6e). Consistently, a representative two-cell gene maintained abnormally high expression level in MERVLi embryos (Extended Data Fig. 6f). These profiles demonstrated that MERVLi embryos exhibited a transcriptome that was highly similar to that of MERVL-KD embryos, corroborating that the embryonic phenotypes in MERVL-KD and MERVLi embryos arise from defects in cis-regulatory function of MERVL.
MERVL modulates dynamic changes in accessible chromatin
To further investigate the correlation between transcription and chromatin state, we next asked whether MERVL-KD impacts chromatin accessibility at the transcription start sites (TSSs) of dysregulated two-cell genes by using the optimized low-input assay for transposase-accessible chromatin with sequencing (miniATAC-seq) method (Supplementary Table 1 and Extended Data Fig. 7)36. The overall enrichment pattern of ATAC-seq signals at 10-kb intervals across the genome was modestly changed in MERVL-KD embryos during preimplantation development (Fig. 5a). Meanwhile, open chromatin at MERVL loci was significantly reduced in MERVL-KD two-cell-stage embryos (Fig. 5b). Because ASO-mediated KD might trigger premature transcriptional termination via degradation of the residual Pol II-associated transcripts37, suggesting that the reduction of chromatin openness at MERVL loci is associated with premature termination of MERVL transcripts induced by KD.
Next, we determined whether the transcriptional changes in MERVL-KD embryos were associated with their chromatin states by examining enrichment of ATAC-seq signals at TSSs of two-cell genes, ATAC-seq signals showed significantly higher enrichment at the TSSs of two-cell genes in MERVL-KD embryos at the four-cell stage compared to control embryos (Fig. 5c). In line with this finding, a representative track view showed that an accessible chromatin state was detected at the TSS of a two-cell gene. This open state persisted into the mid-preimplantation stage upon MERVL-KD, a time when this locus is repressed/closed in control embryos (Fig. 5d). These results confirm that ectopic expression of two-cell genes in mid-preimplantation embryos upon MERVL-KD is associated with accessible chromatin states at these loci.
Furthermore, k-means clustering of ATAC-seq signals across all peak regions in control embryos yielded five clusters that fall into two distinct categories. Category A became accessible gradually as development proceeds (composed of cluster 1 and 2), and category B became specifically accessible in two-cell-stage embryos (composed of cluster 3–5) (Fig. 5e). Among each cluster, approximately half of ATAC-seq peaks were enriched at largely genic regions, including promoter, exon and intronic regions (Fig. 5f). GO analysis revealed that genes adjacent to peaks in category A were highly enriched for roles in ‘chromatin organization’ and ‘posttranscriptional regulation’, such as those involved in regulation of the pluripotent state (Fig. 5g, top)38,39. Moreover, the peaks in category A contained genic regions that associated with pluripotency, including Oct4 and Cdx2 (Fig. 5g, top, and Supplementary Table 5). Conversely, the peaks in category B were characterized by genes and biological terms associated with totipotent and two-cell states, such as Dux and Zscan4c, whose expression was restricted to two-cell stage embryos (Fig. 5g, bottom, and Supplementary Table 5)13,14,40. Finally, we examined ATAC-seq signals in each cluster and compared control and MERVL-KD embryos. In category B (cluster 3–5), analysis of average tag densities of ATAC-seq signals confirmed that chromatin accessibility was significantly decreased in peaks of both cluster 3 and 4 in MERVL-KD two-cell stage embryos (Fig. 5h). In contrast, from the four-cell stage onward, ATAC-seq signals were significantly increased in peaks of cluster 4 and 5 in MERVL-KD embryos (Fig. 5h), suggesting that a two-cell-like chromatin state was partially retained during preimplantation development of MERVL-KD embryos.
MERVL-KD exhibits less abundant transcripts adjacent to MERVL
We interrogated the cis-regulatory functions of MERVL. First, we compared the expression levels of annotated genes (in GENCODE vM25) and their distances to nearby full-length MERVL in control and MERVL-KD two-cell stage embryos. No significant differences were observed in the expression of adjacent annotated genes to MERVL, ranging from 0 to 500 kb (Fig. 6a). Intriguingly, in contrast to annotated genes, the unannotated transcripts from the adjacent regions to MERVL were significantly downregulated in MERVL-KD embryos, and this was positively correlated with their proximity to the full-length MERVL (Fig. 6b).
To evaluate the expression levels of unannotated intergenic transcripts in control and MERVL-KD two-cell stage embryos, we used groHMM, a two-state hidden Markov model (HMM)-based algorithm41. This framework classified transcriptionally active regions (TARs) across the genome into annotated TAR (aTAR) that overlapped with existing gene annotation and unannotated TAR (uTAR) that did not overlap with regions of any genes, pseudogenes and MERVL families annotated by GENCODE and RepeatMasker (Fig. 6c), identifying in total 240,869 uTARs (Fig. 6d and Supplementary Table 6). Despite the relative scarcity of uTAR, which accounted for 27.5–34.6% of aligned reads, we detected 3,499 differentially expressed uTARs between control and MERVL-KD embryos (Fig. 6d and Supplementary Table 7). In particular, the vast majority of them were shown to be downregulated upon the loss of MERVL transcription (Fig. 6e). Representative track views demonstrated that transcriptional activities in MERVL-adjacent regions were significantly reduced at upstream of genic TSS and gene desert regions, in MERVL-KD two-cell embryos (Fig. 6f). Specifically, we found a higher fraction of differentially expressed uTARs preferentially located in the neighborhood of MERVL, compared with all called TARs (Fig. 6g). These data suggest a cis-regulatory function for MERVL in the regulation of unannotated transcripts from their adjacent loci. Moreover, concomitantly with the change in expression of uTARs, the intensity of ATAC-seq signal on differentially expressed uTARs was also decreased in MERVL-KD embryos (Fig. 6h), leading to the suggestion that MERVL-driven unannotated transcripts may play a role in chromatin remodeling.
We provided functional evidence that transcriptional activation of MERVL is essential for progression of development in mouse preimplantation embryos. Depletion of MERVL transcripts results in embryonic lethality with profound defects in development and is associated with dysregulation of MERVL including their adjacent transcripts, and retaining two-cell-like transcriptome and chromatin state (Fig. 6i). These findings suggest the possibility that MERVL transcription in totipotent cells may act as a switch for the transition from totipotency to pluripotency and is responsible for the onset of differentiation and ontogeny (Fig. 6i).
One of most striking consequences of loss of the MERVL transcript was complete defects in preimplantation development. Several groups have used KD strategies in an effort to characterize the role of MERVL in preimplantation development. However, results from these studies were inconsistent. By performing liposome-based transfection with ASO against MERVL, Kigami et al. reported that MERVL-KD embryos displayed a ~50% decrease in the developmental competence to the four-cell stage but no significant impact on developmental rate to the morula and blastocyst stages11. On the other hand, Huang et al. recently performed siRNA-mediated KD against MERVL by microinjection of siRNA into the cytoplasm of zygotes19. siRNA-mediated KD of MERVL exhibited a mild developmental delay at four- to eight-cell stage, but no statistically significant differences were observed on developmental rate between control and KD embryos20. In sharp contrast, ASO-mediated KD of MERVL in this study results in embryonic lethality with severe defects in first lineage specification and genomic stability. The phenotypic differences appear to be explained by methodologies for KD. First, we microinjected multiple ASOs with 2′-O-methoxyethyl modifications, providing enhanced duplex stability and nuclease resistance42, into zygotes resulting in efficient degradation of nuclear MERVL transcripts at much higher levels than liposome-based transfection (Figs. 2b and 3f). In addition, it is noteworthy that our ASOs more efficiently target interspersed full-length MERVL elements than that in previous study (n = 377/556, 67.8% versus n = 259/556, 46.6%). Consistent with previous studies, we confirmed that siRNA-mediated KD of MERVL did not affect development to the blastocyst stage (Fig. 3a–d), indicating that nuclear MERVL transcripts, but not encoded proteins, are required for preimplantation development. Although the activation of MT2_Mm inducing the two-cell state in previous study is readily appreciable12,16,43, we also found that the expression levels of chimeric transcripts with MT2_Mm were unchanged in MERVL-KD two-cell stage embryos (Extended Data Fig. 5c), suggesting that the phenotypes of MERVL-KD embryos are unlikely to arise from dysregulation of chimeric transcripts with MT2_Mm. Given the results with siRNA-KD and in trans RNA rescue, cis-regulatory activity of MERVL loci is likely to play a key role in host chromatin remodeling at totipotent-to-pluripotent transition. As a proof-of-concept for this hypothesis, we performed CRISPRi-mediated repression of MERVL and revealed that MERVLi embryos partially mimicked ASO-mediated KD of MERVL with regard to developmental defects and retaining two-cell transcriptome (Fig. 3i–l and Extended Data Fig. 6).
Our findings showed that transcription itself and/or cis-acting RNA from MERVL loci is likely to play a critical role for host preimplantation development. Of note, having observed a subset of two-cell genes retaining expression in MERVL-KD mid-preimplantation embryos (Fig. 4g–i), MERVL transcripts may act to repress expression of two-cell genes during preimplantation development, at least in part. Several lines of evidence allow us to predict the mechanism by which MERVL may regulate host transcriptome and chromatin state in preimplantation development. First, a recent study demonstrated that MERVL drives the 3D reorganization of the genome in two-cell-like cells, the rare cell population in ESC culture, and early mouse embryos by providing a topologically associating domain boundary that is coupled to directional transcription from MERVL44,45. Thus, it is plausible that transcription itself from MERVL might be responsible for remodeling of chromatin structure during the transition from totipotency to pluripotency. As shown in Fig. 6, it is noteworthy that loss of MERVL transcription also leads to the reduction of transcription levels and chromatin accessibility in unannotated adjacent regions ranging up to 50 kb away. One possible explanation for this finding is that these cryptic transcripts may result from readthrough of RNA polymerase beyond MERVL elements, which leads to pervasive epigenetic changes. Related to our findings, Jachowicz et al. demonstrated that transcriptional activity of LINE1 that peaks at two-cell stage embryos affects global chromatin accessibility and unfolding46. Likewise, there has been reported that the transcribed RNA from other type of endogenous retroviruses (that is, intracisternal A-particles and MMERVK10C) can act in cis in ESC47. Accordingly, we speculate that en masse transcription of MERVL loci acts in cis as starting events for long-range 3D chromatin organization and remodeling toward the onset of ontogeny. Future studies can now interrogate the intriguing mechanisms that may underlie the important role of MERVL transcript in regulating the host genome that we have uncovered here.
Eight-week-old female and male B6D2F1 (BDF1) mice were purchased from the Japan SLC. All animal experiments were reviewed and approved by the Institutional Animal Care and Use Committee (protocols 09105-(10) and 11045-(6)) at Keio University, where mice were maintained and fed ad libitum with standard diet and water in a temperature-, humidity- and light-controlled room (23°C ± 3°C, a humidity of 50% ± 10% and 14-h light/10-h dark cycle).
Doxycycline (Dox)-inducible Dux ESC line was generated using the PiggyBac Transposon System in our laboratory (hereafter referred to as ESCDUX), in which Dux expression was induced in the presence of Dox21. The ESCDUX was cultured in ESC medium (10% FBS, 1× sodium pyruvate, 1× GlutaMAX, 1× MEM non-essential amino acids solution, 1× penicillin/streptomycin and 0.055 mM β-mercaptoethanol in DMEM high glucose (4.5 g liter−1)) containing 2i (1 µM PD0325901, LC Laboratories; and 3 µM CHIR99021, LC Laboratories) and LIF (1,000 U ml−1, in-house) on cell-culture plates coated with 0.1% gelatin under feeder-free conditions. The expanded colonies were dissociated using 0.05% trypsin-EDTA solution for passaging.
MERVL-targeting ASO design
Three independent 20-nt ASOs were designed based on the consensus sequence of MERVL (Supplementary Table 8). ASO candidates that effectively target the internal retroviral regions of MERVL were selected using ChIRP Probe Designer (https://www.biosearchtech.com/support/tools/design-software/chirp-probe-designer), taking into account the secondary structure of the MERVL RNA. To identify potential binding sites for MERVL-targeting ASOs in silico, nonredundant high-confidence MERVL loci were downloaded from RepeatMasker database (http://www.repeatmasker.org/species/mm.html) and converted to BED format. Consequently, FASTA files for these MERVL loci were extracted using the getfasta function as implemented in BEDTools (version 2.30.0)48 and entered into the makeblastdb program from BLASTn (version 2.6.0+) to generate DNA database for BLASTn search. Each ASO sequence was used to search the database of MERVL DNA using BLASTn49, with up to two mismatches allowed. The synthesized ASOs were chemically modified with phosphorothioate linkage and 2′-O-methoxyethyl modification at positions 1–5 and 15–20. As the control for the ASO experiments, we also generated a 20-nt randomized oligonucleotide that does not target the mouse genome (as determined by BLASTn search) and three independent 20-nt SOs that were complementary to respective MERVL-targeting ASOs. The three independent ASOs/SOs against MERVL with the highest targeting rate and randomized non-targeting ASO (scrambled ASO) were used for KD experiments (Supplementary Table 8).
ASO-mediated KD of MERVL in ESCs
A total of 1 × 105 ESCDuxs were harvested from culture dishes per one nucleofection. To validate efficiency of the ASO-mediated KD, the cells were nucleofected with 1 nmol ASO against MERVL or scrambled ASO using P3 Primary Cell 96-well Nucleofector Kit (Lonza) according to manufacturer’s instruction and seeded into each well of a 24-well plate containing 500 µl ESC medium with 2i, LIF and 1.1 µl iMatrix-511 Silk solution (0.5 mg ml−1, Nippi). After 6 h of nucleofection, we added 0.5 µl of 10 µg ml−1 Dox to each well of 24-well plate. The following day (24 h after nucleofection), the cells were harvested for quantitative RT-PCR and western blotting to evaluate the expression of MERVL.
Embryo culture and microinjection
Female mice were superovulated by intraperitoneal administrations of 150 µl CARD HyperOva (Kudo) followed 48 h later with 7.5 IU human chorionic gonadotropin (Asuka Pharmaceutical). After injection of human chorionic gonadotropin, females were housed with male mice overnight for copulation. The following day, zygotes were collected from the oviducts of superovulated/mated females. The microinjection was carried out under a phase-contrast inverted microscope (IX73, Olympus) equipped with a micromanipulation system (Narishige). Each ASO (20 µM), SO (20 µM) and siRNA (20 and 80 µM) was microinjected into the male pronuclei of zygotes using FemtoJet 4i (Eppendorf). All ASO, SO and siRNA sequences used in this study are listed in Supplementary Table 8. Embryos were cultured in KSOM (MR-101-D, Sigma) at 37°C with 5% CO2.
RNA rescue assay
ASO-resistant full-length MERVL was constructed by substituting all of the three ASO-targeted sequences of MERVL using inverse PCR. The primers used to produce the constructs are listed in Supplementary Table 8. Briefly, intact MERVL sequence was obtained from a bacterial artificial chromosome plasmid (RP23-231A8, https://www.ncbi.nlm.nih.gov/nuccore/AC127321.4, Thermo Fisher Scientific) and subcloned into pCAGGS_Myc plasmid using NEBuilder HiFi DNA Assembly Master Mix (New England Biolabs). Individual ASO-targeted sequences were mutated by inverse PCR with PrimeSTAR GXL DNA polymerase (TaKaRa) and specific primer sets (Supplementary Table 8). For subsequent steps, ASO-resistant full-length MERVL was amplified by PCR with a specific primer set containing the T7 promoter sequence at the 5′ end of forward primer for in vitro transcription with T7 polymerase. Following PCR amplification, in vitro transcription was performed using MEGAscript T7 Transcription Kit (Thermo Fisher Scientific) according to the manufacturer’s instructions. Transcribed RNA was purified using RNeasy Mini Kit (QIAGEN). ASO-resistant MERVL RNA was co-microinjected into the male pronucleus of zygotes with 20 µM ASOs (mixed) at the final concentration of 200 ng µl−1.
CasRx-mediated KD of MERVL
The expression plasmids of CasRx (PB-CAG-CasRx-P2A-EGFP-BGH PolyA, #154004, Addgene) and gRNA (pXR003: CasRx gRNA cloning backbone, #109053, Addgene) were gifted from H. Yang and P. Hsu22,50. For optimal expression of CasRx in early stages of preimplantation embryos, the CAG promoter sequence in the CasRx-expressing plasmid was replaced with CBh promoter, which was obtained from pX330-U6-Chimeric_BB-CBh-hSpCas9 (pX330, gifted from F. Zhang, #42230, Addgene)51, using NEBuilder HiFi DNA Assembly Cloning Kit (New England Biolabs). The pXR003 CasRx gRNA-expressing plasmid was linearized by FastDigest BpiI (Thermo Fisher Scientific), an isoschizomers of BbsI, and then individual annealed gRNA spacer with four overhangs (sense: 5′-AAAC, antisense: 5′-AAAA) was inserted into the linearized pXR003 CasRx gRNA-expressing plasmid by T4 DNA ligase (Ligation high v2, TOYOBO). The primers and gRNA sequences used to produce the constructs are listed in Supplementary Table 8. The expression plasmids of CasRx and MERVL-targeting gRNA (or non-targeting gRNA) were microinjected into the male pronucleus of zygotes at a final concentration of 10 ng µl−1 each.
CRISPRi targeting MERVL
For evaluation of cis-acting functions of MERVL, we used the CRISPRi system, composed of dCas9-KRAB-MeCP2, which specifically induces H3K9 trimethylation at target loci. We designed three independent gRNAs targeting the coding regions of Gag and Pol in full-length copies of MERVL to repress MERVL transcription in preimplantation embryos. The CBh-dCas9-KRAB-MeCP2 fusion construct was constructed using the pX330 (#42230, Addgene)51 plasmid backbone. dCas9-KRAB-MeCP2 sequence was obtained from a PB-TRE-dCas9-KRAB-MeCP2 (gifted from A. Califano, #122267, Addgene) plasmid and subcloned into pX330 using NEBuilder HiFi DNA Assembly Cloning Kit (New England Biolabs). Constructed pX330 plasmid in which hSpCas9 site had been replaced with dCas9-KRAB-MeCP2 (hereafter referred as pX330-CRISPRi), was then subjected to subcloning of sgRNA sequences. Individual annealed sgRNA spacer with four overhangs (sense: 5′-CACC, antisense: 5′-AAAC) was subcloned into the BbsI site of the pX330 plasmid51. Subsequently, a transcriptional unit of each gRNA vector (consisting of a human U6 promoter, gRNA spacer and scaffold sequences) was inserted into SacII-to-XbaI sites of pX330-CRISPRi plasmid. The primers and gRNA sequences used to produce the constructs are listed in Supplementary Table 8. MERVL-targeting pX330-CRISPRi plasmid was microinjected into the male pronucleus of zygotes at a final concentration of 15 ng µl−1.
smFISH and immunofluorescence analysis
Embryos were fixed using fixative solution (4% paraformaldehyde and 0.2% polyvinyl alcohol (PVA) in PBS) for 20 min at room temperature (RT) and then permeabilized with 0.5% TritonX-100 in PBS for 10 min at RT. For smFISH, embryos were washed once with Stellaris Wash Buffer A (Bioresearch Technologies) for 5 min at RT. FISH was then performed overnight at 37°C in Stellaris Hybridization Buffer (Bioresearch Technologies) containing 25 pmol of a 48-probe set (Quasar 670-labeled, Bioresearch Technologies) against MERVL RNA. The following day, embryos were washed once with Stellaris Wash Buffer A for 30 min at 37°C and then counterstained with DAPI (1 µg ml−1, Nacalai Tesque) for 30 min at 37°C. After washing with Stellaris Wash Buffer B (Bioresearch Technologies) for 5 min at RT, embryos were transferred to 10 µl drops of 0.2% PVA in PBS on a glass-bottomed dish covered by paraffin oil. The FISH probe sequences used in this study are listed in Supplementary Table 8. For immunofluorescence analysis, nonspecific immunoreaction was blocked by incubating embryos in 2% bovine serum albumin (BSA) in PBS for 1 h at RT. After blocking, whole-mount immunofluorescence analyses were conducted using the following primary antibodies: mouse anti-MERVL-Gag (1/5 dilution, kept in our lab), rabbit anti-OCT4 (1/100 dilution, ab181557, abcam), mouse anti-CDX2 (1/100 dilution, ab157524, abcam), mouse anti-E-Cadherin (1/100 dilution, 610182, BD Transduction Laboratories), rabbit anti-cleaved caspase-3 (1/200 dilution, 9661, CST) and rabbit anti-Phospho-S15-p53 (1/200 dilution, 9284, CST). Embryos were incubated overnight at 4°C with primary antibody. The following day, washed embryos three times with 0.5% BSA in PBS for 5 min each and then incubated with Alexa Fluor 488- and/or 568-conjugated anti-mouse and/or anti-rabbit immunoglobulin G secondary antibodies (1/500 dilution, Thermo Fisher Scientific) for 1 hr at RT. After washing three times with 0.5% BSA in PBS for 10 min each, the embryos were counterstained with DAPI (1 µg ml−1) for 30 min at RT and transferred to 10 µl drops of 0.2% PVA in PBS on a glass-bottomed dish covered by paraffin oil. smFISH and immunofluorescence images were obtained with a BZ-X810 fluorescence microscope (Keyence) or FV3000 confocal laser scanning microscope (Olympus) and processed with ImageJ (NIH)52.
EU incorporation assay
The embryos microinjected with either scramble ASOs or MERVL ASOs, were cultured in KSOM for 24 hrs. On the next day, two-cell-stage embryos were transferred to new KSOM containing 1 mM EU (Click-iT RNA Alexa Fluor 488 Imaging Kit, Thermo Fisher Scientific) and cultured for 1 h at 37°C with 5% CO2. The EU-labeled embryos were immediately fixed and permeabilized as described above. Incorporated EU into newly synthesized RNA was detected using the Click-iT RNA Alexa Fluor 488 Imaging Kit (Thermo Fisher Scientific) according to the manufacturer’s instructions. In brief, the fixed embryos were treated with the Click-iT reaction cocktail containing Alexa Fluor 488 azide for 30 min at RT, and then washed once with Click-iT reaction rinse buffer. After washing once with 0.2% PVA in PBS, the embryos were counterstained with Hoechst 33342 (2 µg ml−1, Dojindo) for 30 min at RT and transferred to 10 µl drops of 0.2% PVA in PBS on a glass-bottomed dish covered by paraffin oil. Images of EU-labeled embryos were obtained with a FV3000 confocal laser scanning microscope (Olympus) and processed with ImageJ (NIH)52.
Acidic Tyrode’s solution (Sigma) containing 0.2% PVA was used to remove the zona pellucida (ZP) from embryos before sample collection. Then, 30 ZP-free morula-stage embryos and nucleofected ESCDuxs were lysed in Lysis Solution (from SuperPrep II Cell Lysis & RT Kit, TOYOBO), directly. Genomic DNA elimination and reverse transcription were performed using the SuperPrep II Cell Lysis & RT Kit with random hexamers and oligo dT primers according to the manufacturer’s instructions. Real-time quantitative PCR was carried out using the following conditions: 95°C for 1 min, followed by 45 cycles each of 95°C for 15 s and 60°C for 1 min, on a Thermal Cycler Dice Real Time System III (TaKaRa) with Thunderbird SYBR qPCR Mix (TOYOBO) and specific primer sets (Supplementary Table 8). Relative gene expression was quantified with the ΔΔCT method and normalized to Actb or Gapdh expression.
The cells were directly lysed in Laemmli SDS sample buffer (1×, 62.5 mM Tris-HCl (pH 6.8), 2% SDS, 10% glycerol, 5% β-mercaptoethanol and 0.02% bromophenol blue) and sonicate with Bioruptor at a high setting, for 10 cycles each of 30 s with 30 s intervals. Total proteins were denatured at 95°C for 5 min and separated by 10% SDS-PAGE. The proteins were then transferred onto a Protran Nitrocellulose Membranes (0.45 µm pore size, GE Healthcare) via Power Blotter-Semi-dry Transfer System (Thermo Fisher Scientific). The membrane was blocked with Bullet Blocking One for Western Blotting (Nacalai Tesque) for 10 min at RT with gently rocking before incubation with mouse anti-MERVL-Gag (1/5 dilution, kept in our lab) or mouse anti-β-tubulin (1/2,000 dilution, E7, Developmental Studies Hybridoma Bank) antibody in 1:20 diluted blocking buffer with 0.1% PBST at 4°C overnight. The following day, the membranes were then incubated with HRP-conjugated goat anti-mouse immunoglobulin G secondary antibody (1/10,000 dilution, #330, MBL Life Science) in 1:20 diluted blocking buffer with 0.1% PBST for 30 min at RT with gently rocking. After washing three times with 0.1% PBST for 10 min each, Blots were developed using ECL Western Blotting Detection Reagent (Sigma) and exposed onto X-ray film.
Preparation of total RNA-seq libraries was performed using SMART-seq Stranded Kit (Clontech), according to the manufacturer’s instructions. In brief, 30 ZP-free two-cell, four-cell and eight-cell stage embryos whose cleavage stages were visually confirmed under the microscope, were lysed in 1× Lysis Buffer containing RNase inhibitor (0.2 IU µl−1, from SMART-seq Stranded Kit, Clontech), directly. RNAs were randomly sheared by heating at 85°C for 8 min and subjected to reverse transcription with random hexamers and PCR amplification. Ribosomal fragments were depleted from each cDNA sample with scZapR and scR-Probes. Indexed total RNA-seq libraries were enriched through a second PCR amplification and sequenced using an Illumina HiSeqX sequencer (paired end, 150 bp). Two biological replicates were generated for each sample.
The protocol for miniATAC-seq library was adapted from a previous report with minor modifications36. Briefly, 30 ZP-free two-cell, four-cell and eight-cell stage embryos whose cleavage stages were visually confirmed under the microscope were lysed in 6 µl lysis buffer (10 mM Tris-HCl (pH 7.4), 10 mM NaCl, 3 mM MgCl2 and 0.5% NP-40) for 10 min on ice. After cell lysis, 2 µl ddH2O, 10 µl of 2× TD Buffer (from Nextera XT DNA Prep Kit, Illumina) and 2 µl of Tn5 Transposase (from Nextera XT DNA Prep Kit, Illumina) were added to the lysates, which were mixed by pipetting followed by incubation at 37°C for 30 min. To stop the transposase reaction, 5 µl of 0.5% SDS was added and incubated at RT for 5 min. After Tn5 tagmentation, 100 ng Carrier RNA (Yeast tRNA, Sigma) was added and then messed samples up to 130 µl with 1× TE Buffer (10 mM Tris-HCl (pH 8.0) and 1 mM EDTA). Tagmented DNA was purified by phenol-chloroform extraction and ethanol precipitation with 20 µg glycogen as carrier and dissolved in 20 µl ddH2O. Afterwards, PCR was performed to amplify and index libraries using the following conditions: 72°C for 5 min and 98°C for 30 s, followed by 18 cycles each of 98°C for 10 s, 63°C for 30 s and 72°C for 1 min, with NEBNext HiFi PCR Master Mix and specific index primer sets (Supplementary Table 8). Pooled miniATAC-seq libraries were sequenced using an Illumina HiSeqX sequencer (paired end, 150 bp). Two biological replicates were generated for each sample.
RNA-seq and ATAC-seq data processing
Methods for RNA-seq analysis have been described previously53, In short, raw paired-end RNA-seq reads were aligned to indexed mouse genome (GRCm38/mm10) using STAR aligner version 2.5.3a54 with following options: --twopassMode Basic; --outSAMtype BAM SortedByCoordinate; --outFilterType BySJout; --outFilterMultimapNmax 1; --winAnchorMultimapNmax 50; --chimSegmentMin 12; --chimJunctionOverhangMin 8; --alignSJoverhangMin 8; --alignSJDBoverhangMin 10; --outFilterMismatchNmax 999; --outFilterMismatchNoverReadLmax 0.04; --alignSJstitchMismatchNmax 5 -1 5 5; --outSAMattrRGline ID:GRPundef; --alignIntronMin 20; --alignIntronMax 1000000; --alignMatesGapMax 1000000 for unique alignments. To quantify aligned reads on annotated genes (GENCODE vM25) and repetitive loci (originating from mm10.fa.out (RepeatMasker), best-match TE annotation defined previously53), we used the featureCounts function, which is part of the Subread package55. To detect differentially expressed genes and repetitive elements between control and MERVL-KD embryos, an output file of featureCounts was entered into the DESeq2 package (version 1.16.1)56; then, the program functions DESeqDataSetFromMatrix and DESeq were used to compare each gene’s expression level between two biological samples. Differentially expressed genes were identified through two criteria: (1) at least twofold change and (2) binominal tests (Padj < 0.05; P values were adjusted for multiple testing using the Benjamini–Hochberg method). To perform GO and ChEA analyses, we used the functional annotation clustering tool in Enrichr57. The annotated terms with P < 0.05 from a modified Fisher’s exact test were considered significant. To visualize read enrichments over representative genomic loci, TDF files were created from sorted BAM files using the IGVTools count function (Broad Institute)58. Figures of continuous tag counts over selected genomic intervals were created in the IGV browser (Broad Institute)58. To comprehensively evaluate the expression profiles across the entire transcribed regions (genic and intergenic region) in the genome, alignment file from STAR was entered into the groHMM package (version 1.24.0)41; then, program function detectTranscripts with default parameter settings was used for detection of TARs de novo based on a two-state hidden Markov model. TARs were further compartmentalized into aTARs and uTARs according to overlap with existing known annotations (in GENCODE vM25 and RepeatMasker) using subtract function from BEDTools (version 2.30.0)48. After quantification of aligned reads on uTARs, differentially expressed uTARs between control and MERVL-KD embryos were detected using DESeq2 package56 with the threshold of Padj < 0.01.
Raw paired-end ATAC-seq reads were filtered by TrimGalore (version 0.6.4, https://github.com/FelixKrueger/TrimGalore) with the default setting and aligned to indexed mouse genome (GRCm38/mm10) using bowtie2 (version 2.4.4)59 with the following options, -N 1; -L 25; --no-mixed; --no-discordant. Multiple aligned reads and PCR duplicates were removed using grep -v ‘XS:’ and MarkDuplicates function with REMOVE_DUPLICATES = true option, a part of Picard tools. Using SeqMonk (Babraham Bioinformatics), we calculated Pearson correlation coefficients between biological replicates. Peak calling for ATAC-seq data was performed using MACS2 (version 2.1.4)60 with default arguments; we used a cut-off of P ≤ 10−2. The average tag density plots were drawn using Ngsplot (version 2.47.1) and plotHeatmap program as implemented in deepTools (version 3.1.3)61. To visualize read enrichment over representative genomic loci, TDF files were created from sorted BAM files using the IGVTools count function (Broad Institute)58. Figures for continuous tag counts over selected genomic intervals were created in the IGV browser (Broad Institute)58. For k-means clustering of ATAC-seq peaks, we firstly generated a set of regions that were called peaks with MACS2 in at least one of the samples, by merging ATAC-seq peak regions of all biological replicates using the mergeBed function from BEDTools (version 2.30.0)48. Using this merged peak file and ATAC-seq data from control embryos, we ran k-means clustering analysis using computeMatrix and plotHeatmap program as implemented in deepTools (version 3.1.3)61 and determined that k = 5 was suitable for our data. To evaluate the functional annotation of each cluster, we used Genomic Region Enrichment of Annotation Tool (GREAT, version 4.0.4)62 and HOMER (version 4.9)63, which associates ATAC-seq peaks in each cluster with their genomic feature and ontology of their putative target genes adjacent to peaks.
Statics and reproducibility
All statistical methods, sample sizes and P values for each plot are listed in the figure legends and/or in the corresponding Methods section. In brief, all grouped data are represented as mean ± s.e.m. All box plots are represented as follows: center lines, median; box edges, interquartile range (25 and 75 percentiles); whisker, 90% of the data points. Statistical significance for pairwise comparisons were determined using the two-sided unpaired t-tests, chi-square tests and Mann–Whitney U-tests with Bonferroni correction. All quantitative data are represented from three or more biological replicates. Fisher’s exact test and hypergeometric test were used for the detection of significantly enriched GO terms, genes, and loci compared with backgrounds. Differentially expressed genes, TEs and loci were determined in the DESeq2 package56. Next-generation sequencing data (total RNA-seq and miniATAC-seq) are based on two biological replicates. For all experiments, no statistical methods were used to predetermine sample size. Experiments were not randomized, and investigators were not blinded to allocation during experiments and outcome assessments.
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
The total RNA-seq and miniATAC-seq data in embryos upon MERVL-KD, MERVLi and their respective controls are deposited in the Gene Expression Omnibus under accession code GSE196520. Source data are provided with this paper.
This study did not make any custom code or algorithm. All software used in this study is publicly available and cited in the main text and/or Methods section.
Minami, N., Suzuki, T. & Tsukamoto, S. Zygotic gene activation and maternal factors in mammals. J. Reprod. Dev. 53, 707–715 (2007).
Schultz, R. M., Stein, P. & Svoboda, P. The oocyte-to-embryo transition in mouse: past, present, and future. Biol. Reprod. 99, 160–174 (2018).
Jukam, D., Shariati, S. A. M. & Skotheim, J. M. Zygotic genome activation in vertebrates. Dev. Cell 42, 316–332 (2017).
Hamatani, T., Carter, M. G., Sharov, A. A. & Ko, M. S. Dynamics of global gene expression changes during mouse preimplantation development. Dev. Cell 6, 117–131 (2004).
Abe, K. et al. The first murine zygotic transcription is promiscuous and uncoupled from splicing and 3’ processing. EMBO J. 34, 1523–1537 (2015).
Abe, K. I. et al. Minor zygotic gene activation is essential for mouse preimplantation development. Proc. Natl Acad. Sci. USA 115, E6780–E6788 (2018).
Aoki, F., Worrad, D. M. & Schultz, R. M. Regulation of transcriptional activity during the first and second cell cycles in the preimplantation mouse embryo. Dev. Biol. 181, 296–307 (1997).
Warner, C. M. & Versteegh, L. R. In vivo and in vitro effect of alpha-amanitin on preimplantation mouse embryo RNA polymerase. Nature 248, 678–680 (1974).
Mouse Genome Sequencing Consortium. Initial sequencing and comparative analysis of the mouse genome. Nature 420, 520–562 (2002).
Peaston, A. E. et al. Retrotransposons regulate host genes in mouse oocytes and preimplantation embryos. Dev. Cell 7, 597–606 (2004).
Kigami, D., Minami, N., Takayama, H. & Imai, H. MuERV-L is one of the earliest transcribed genes in mouse one-cell embryos. Biol. Reprod. 68, 651–654 (2003).
Macfarlan, T. S. et al. Embryonic stem cell potency fluctuates with endogenous retrovirus activity. Nature 487, 57–63 (2012).
Hendrickson, P. G. et al. Conserved roles of mouse DUX and human DUX4 in activating cleavage-stage genes and MERVL/HERVL retrotransposons. Nat. Genet. 49, 925–934 (2017).
De Iaco, A. et al. DUX-family transcription factors regulate zygotic genome activation in placental mammals. Nat. Genet. 49, 941–945 (2017).
Whiddon, J. L., Langford, A. T., Wong, C. J., Zhong, J. W. & Tapscott, S. J. Conservation and innovation in the DUX4-family gene network. Nat. Genet. 49, 935–940 (2017).
Yang, F. et al. DUX-miR-344-ZMYM2-mediated activation of MERVL LTRs induces a totipotent 2C-like state. Cell Stem Cell 26, 234–250 (2020).
Choi, Y. J. et al. Deficiency of microRNA miR-34a expands cell fate potential in pluripotent stem cells. Science 355, eaag1927 (2017).
Deng, Q., Ramskold, D., Reinius, B. & Sandberg, R. Single-cell RNA-seq reveals dynamic, random monoallelic gene expression in mammalian cells. Science 343, 193–196 (2014).
Huang, Y. et al. Stella modulates transcriptional and endogenous retrovirus programs during maternal-to-zygotic transition. Elife 6, e22345 (2017).
Wu, G., Lei, L. & Scholer, H. R. Totipotency in the mouse. J. Mol. Med (Berl.) 95, 687–694 (2017).
Li, T. D. et al. TDP-43 safeguards the embryo genome from L1 retrotransposition. Sci. Adv. 8, eabq3806 (2022).
Konermann, S. et al. Transcriptome engineering with RNA-targeting type VI-D CRISPR effectors. Cell 173, 665–676.e14 (2018).
Cao, Z. et al. Transcription factor AP-2gamma induces early Cdx2 expression and represses HIPPO signaling to specify the trophectoderm lineage. Development 142, 1606–1615 (2015).
Choi, I., Carey, T. S., Wilson, C. A. & Knott, J. G. Transcription factor AP-2gamma is a core regulator of tight junction biogenesis and cavity formation during mouse early embryogenesis. Development 139, 4623–4632 (2012).
Jedrusik, A., Cox, A., Wicher, K. B., Glover, D. M. & Zernicka-Goetz, M. Maternal-zygotic knockout reveals a critical role of Cdx2 in the morula to blastocyst transition. Dev. Biol. 398, 147–152 (2015).
Keramari, M. et al. Sox2 is essential for formation of trophectoderm in the preimplantation embryo. PLoS One 5, e13952 (2010).
Tan, M. H. et al. An Oct4-Sall4-Nanog network controls developmental progression in the pre-implantation mouse embryo. Mol. Syst. Biol. 9, 632 (2013).
Laffleur, B. & Basu, U. Biology of RNA surveillance in development and disease. Trends Cell Biol. 29, 428–445 (2019).
Todd, C. D., Deniz, O., Taylor, D. & Branco, M. R. Functional evaluation of transposable elements as enhancers in mouse embryonic and trophoblast stem cells. Elife 8, e44344 (2019).
Fuentes, D. R., Swigut, T. & Wysocka, J. Systematic perturbation of retroviral LTRs reveals widespread long-range effects on human gene regulation. Elife 7, e35989 (2018).
Yeo, N. C. et al. An enhanced CRISPR repressor for targeted mammalian gene regulation. Nat. Methods 15, 611–616 (2018).
Park, S. J., Shirahige, K., Ohsugi, M. & Nakai, K. DBTMEE: a database of transcriptome in mouse early embryos. Nucleic Acids Res. 43, D771–D776 (2015).
Park, S. J. et al. Inferring the choreography of parental genomes during fertilization from ultralarge-scale whole-transcriptome analysis. Genes Dev. 27, 2736–2748 (2013).
Lachmann, A. et al. ChEA: transcription factor regulation inferred from integrating genome-wide ChIP-X experiments. Bioinformatics 26, 2438–2444 (2010).
Grow, E. J. et al. p53 convergently activates Dux/DUX4 in embryonic stem cells and in facioscapulohumeral muscular dystrophy cell models. Nat. Genet. 53, 1207–1220 (2021).
Wu, J. et al. Chromatin analysis in human early development reveals epigenetic transition during ZGA. Nature 557, 256–260 (2018).
Lee, J. S. & Mendell, J. T. Antisense-mediated transcript knockdown triggers premature transcription termination. Mol. Cell 77, 1044–1054 (2020).
Chen, Q. & Hu, G. Post-transcriptional regulation of the pluripotent state. Curr. Opin. Genet Dev. 46, 15–23 (2017).
Mattout, A. & Meshorer, E. Chromatin plasticity and genome organization in pluripotent embryonic stem cells. Curr. Opin. Cell Biol. 22, 334–341 (2010).
Falco, G. et al. Zscan4: a novel gene expressed exclusively in late two-cell embryos and embryonic stem cells. Dev. Biol. 307, 539–550 (2007).
Chae, M., Danko, C. G. & Kraus, W. L. groHMM: a computational tool for identifying unannotated and cell type-specific transcription units from global run-on sequencing data. BMC Bioinform. 16, 222 (2015).
Crooke, S. T., Baker, B. F., Crooke, R. M. & Liang, X. H. Antisense technology: an overview and prospectus. Nat. Rev. Drug Disco. 20, 427–453 (2021).
Modzelewski, A. J. et al. A mouse-specific retrotransposon drives a conserved Cdk2ap1 isoform essential for development. Cell 184, 5541–5558 (2021).
Genet, M. & Torres-Padilla, M. E. The molecular and cellular features of two-cell-like cells: a reference guide. Development 147, dev189688 (2020).
Kruse, K. et al. Transposable elements drive reorganisation of 3D chromatin during early embryogenesis. Preprint at bioRxiv https://doi.org/10.1101/523712 (2019).
Jachowicz, J. W. et al. LINE-1 activation after fertilization regulates global chromatin accessibility in the early mouse embryo. Nat. Genet. 49, 1502–1510 (2017).
Asimi, V. et al. Hijacking of transcriptional condensates by endogenous retroviruses. Nat. Genet. 54, 1238–1247 (2022).
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).
He, B. et al. Modulation of metabolic functions through Cas13d-mediated gene knockdown in liver. Protein Cell 11, 518–524 (2020).
Cong, L. et al. Multiplex genome engineering using CRISPR/Cas systems. Science 339, 819–823 (2013).
Schindelin, J. et al. Fiji: an open-source platform for biological-image analysis. Nat. Methods 9, 676–682 (2012).
Sakashita, A. et al. Endogenous retroviruses drive species-specific germline transcriptomes in mammals. Nat. Struct. Mol. Biol. 27, 967–977 (2020).
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
Liao, Y., Smyth, G. K. & Shi, W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30, 923–930 (2014).
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
Chen, E. Y. et al. Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool. BMC Bioinform. 14, 128 (2013).
Robinson, J. T. et al. Integrative genomics viewer. Nat. Biotechnol. 29, 24–26 (2011).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Zhang, Y. et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 9, R137 (2008).
Ramirez, F., Dundar, F., Diehl, S., Gruning, B. A. & Manke, T. deepTools: a flexible platform for exploring deep-sequencing data. Nucleic Acids Res. 42, W187–W191 (2014).
McLean, C. Y. et al. GREAT improves functional interpretation of cis-regulatory regions. Nat. Biotechnol. 28, 495–501 (2010).
Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576–589 (2010).
We thank all members of the Siomi laboratory for discussions on this work, K. Hayashi (Graduate School of Medicine, Faculty of Medicine, Osaka University) and A. Inoue (Center for Integrative Medical Science, RIKEN) for critical reading of the manuscript, and S. Aikawa (Graduate School of Medicine, The University of Tokyo) and C. Takeuchi (Keio University School of Medicine) for assistance with NGS data analyses. This work was supported by JSPS Grants-in-Aid for Early-Career Scientists (21K15108 to A.S.), the Kato Memorial Bioscience Foundation Research Grant (to A.S.), JST Fusion Oriented Research for disruptive Science and Technology (JPMJFR214O to A.S.), MEXT Grants-in-Aid for Scientific Research in Innovative Areas (19H05753 to H.S. and 22H04700 to A.S.), AMED project for elucidating and controlling mechanisms of aging and longevity (1005442 to H.S.), the Uehara Memorial Foundation Research Incentive Grant (to A.S.) and the Uehara Memorial Foundation Research Grant (to H.S.).
The authors declare no competing interests.
Peer review information
Nature Genetics thanks Magdalena Zernicka-Goetz, Denes Hnisz and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended Data Fig. 1 Design and validation of ASOs targeting MERVL.
(a) Schematic of MERVL indicating the positions of ASOs. Intact MERVL encodes the retroviral proteins; Gag (Group-specific antigens, comprised of MA, matrix; CA, capsid proteins) and Pol (Polymerase, comprised of RT, reverse transcriptase; INT, integrase; dUTPase, dUTP phosphatase). Below whisker plot showing alignment quality in each interspersed genomic MERVL copy (adapted from Dfam: https://dfam.org/family/DF0003918/seed). (b) Pie chart indicates the populations of full length (≥ 5001 bp) and truncated (1–5000 bp) MERVL copies across the mouse genome. (c) Chromosome maps showing the distribution of MERVL copies that are targeted by ASOs throughout the mouse genome. Each colored rectangle (blue for ASO-#1, yellow for ASO-#2, and red for ASO-#3) indicates targeted MERVL copy. (d) Histogram showing the distributions of ASO-targeted (magenta) and untargeted (green) MERVL copies. The abundance of each length of MERVL element was tallied with a given count. The proportion of full-length MERVL (>5 kb) was highlighted in red. (e) Schematic for ASO-mediated KD against MERVL in ESC harboring a doxycycline (Dox)-inducible Dux transgene (termed ESCDUX). TRE, tetracycline responsive element; rtTA, reverse tetracycline responsive transcriptional activator. (f) Expression levels of MERVL mRNA measured by qRT-PCR in ESCDUXs nucleofected with each individual MERVL-targeting ASO and scrambled ASO. Relative expression is quantified with ΔΔCt method and normalized to Gapdh expression. Bars show means with s.e.m. Dots represent biological replicates (n = 3 samples). (g) Expression level of MERVL mRNA measured by qRT-PCR in scrambled control and ASOs (Mixed)-mediated MERVL-KD ESCDUXs in the presence or absence of Dox. Relative expression is quantified with ΔΔCt method and normalized to Gapdh expression. Bars show means with s.e.m. Dots represent biological replicates (n = 4 samples). (h) Expressions of MERVL retroviral protein (Gag), measured by western blotting in scrambled control and ASOs (Mixed)-mediated MERVL-KD ESCDUXs in the presence or absence of Dox. β-Tubulin was used as a loading control. Representative blot image is from 3 independent experiments. Data for panels in b-d and f-h are available as source data.
Extended Data Fig. 2 The KD effectiveness of each ASO against MERVL in preimplantation embryos.
(a) Representative images of smFISH for MERVL RNA (green) with DAPI counterstain (gray) in two-cell stage embryos from each experimental condition, from 3 independent experiments. Zygotes were injected with scrambled ASO, MERVL ASOs or the corresponding SOs. Scale bar: 20 µm. (b) Representative phase-contrast images of 4.5 dpc blastocysts from each experimental condition, from 2 independent experiments. Scale bar: 100 µm. (c) Percentage of embryos by developmental stage at 4.5 dpc in each experimental condition. The number of embryos in each experimental condition is shown in the bottom. Data for panels in c are available as source data.
Extended Data Fig. 3 CasRx-mediated MERVL-KD and its effects on preimplantation development.
(a) Schematic of experimental procedure for CasRx-mediated KD of MERVL. (b) Representative images of smFISH and immunofluorescence staining for MERVL RNA (top) and MERVL-Gag protein (bottom) with DAPI counterstain in untargeted control and MERVL-KD two-cell stage embryos at 1.5 dpc, from 3 independent experiments. Scale bar: 20 µm. (c) Representative phase-contrast images of 4.5 dpc blastocysts upon untargeted control and MERVL-KD conditions, from 3 independent experiments. Scale bar: 100 µm. (d) Percentage of embryos by stages of development, upon untargeted control (n = 69) and MERVL-KD (n = 70) conditions. N.S., not significant; ***P < 0.001, chi-square test. Data for panel in d is available as source data.
Extended Data Fig. 4 Total RNA-seq analysis of control and MERVL-KD embryos at two-cell, four-cell, and eight-cell stages.
(a) Schematic of sample collection for total RNA-seq analysis in scrambled control and ASO-mediated MERVL-KD embryos. We supplied the apparently healthiest embryos upon MERVL-KD for preparation of a total RNA-seq library. (b) Scatter plots showing the reproducibility between biological replicates in total RNA-seq data. Pearson correlation values (R) are shown. (c) GO analysis of DEGs in MERVL-KD embryos, assessed by the Enrichr. (d) Heatmap showing hierarchical clustering of expression patterns of all 2C genes (as defined in 12) in control and MERVL-KD embryos at two-cell, four-cell and eight-cell stages. Expression level of each gene is shown in Z-score, calculated by subtracting the mean expression value and dividing by standard deviation. Data for panels in c and d are available as source data.
Extended Data Fig. 5 MERVL-KD only dysregulates a few types of transposable element.
(a) Dot plots showing RPKM values for MERVL-int and MT2_Mm in scrambled control and MERVL-KD embryos at two-cell, four-cell, and eight-cell stages. Central bars represent medians, the top and bottom lines encompass 50% of the data points. (b) MA plots show differentially expressed (DE) repetitive elements (annotated in RepeatMasker) between scrambled control and MERVL-KD embryos. DE repeats were defined as those with a |FoldChange| ≥ 2 and Padj < 0.05 (binomial test with Benjamini–Hochberg correction) and shown in red circles. MERVL-int and MT2_Mm were included in downregulated repeats. (c) Violin plots indicate CPM values for chimeric fusion transcripts in scrambled control and MERVL-KD two-cell stage embryos. Each plot encompasses box plot; central bars represent medians, box edges indicate 50% of data points, and the whiskers show 90% of data points. N.S., not significant, two-tailed unpaired t-tests. Data for panels in a-c are available as source data.
Extended Data Fig. 6 RNA-seq analysis upon CRISPRi induction shows that MERVLi embryos resemble ASO-mediated MERVL-KD embryos.
(a) Unsupervised hierarchical cluster of all annotated transcript profiles (n = 55,254) from RNA-seq data in embryos from different groups and developmental stages. Each dendrogram leaf represents an RNA-seq sample and the y-axis shows the distance based on Pearson correlation between each pair of samples. (b) Heatmap showing pairwise Pearson correlation coefficient of gene expression between each pair of samples. (c) Dot plots showing RPKM values for MERVL-int and MT2_Mm in GFPi control and MERVLi embryos. Central bars represent medians, the top and bottom lines encompass 50% of the data points. (d) RNA-seq differential gene expression analysis: MERVLi versus GFPi control embryos obtained at two-cell, four-cell and eight-cell stages. 872, 535 and 273 genes evinced significant changes in expression in MERVLi embryos (blue circles, Padj < 0.05; binomial test with Benjamini–Hochberg correction). (e) Bubble plot showing overlap between all DEGs in MERVLi embryos with the list of 2C genes and DBTMEE v2 transcriptome categories. The bubble plot sizes show the -log10[P values] derived from a hypergeometric test. (f) Track views show RNA-seq signals in GFPi control and MERVLi embryos, on a representative 2C gene locus (as defined in 12). The y-axis represents normalized tag counts for total RNA-seq in each sample. Data for panels in a-e are available as source data.
Extended Data Fig. 7 miniATAC-seq analysis of control and MERVL-KD embryos at two-cell, four-cell, and eight-cell stages.
(a) Schematic of sample collection for miniATAC-seq analysis in scrambled control and ASO-mediated MERVL-KD embryos. We supplied apparently healthiest embryos upon MERVL-KD for preparation of miniATAC-seq library. (b) Scatter plots showing the reproducibility between biological replicates in miniATAC-seq data. Pearson correlation values (R) are shown. (c) Track view of ATAC-seq read enrichments at a representative chromosomal position in all biological replicates. The y-axis represents normalized tag counts for ATAC-seq in each sample.
Supplementary Table 1
Alignment and quantification statistics in each NGS sample.
Supplementary Table 2
Transcript profiles of genes and repetitive elements for scrambled control and MERVL-KD embryos from total RNA-seq analysis.
Supplementary Table 3
List of DEGs between scrambled control and MERVL-KD embryos.
Supplementary Table 4
List of DEGs between GFPi control and MERVLi embryos.
Supplementary Table 5
List of locations of ATAC-seq peaks in each cluster and their biological features.
Supplementary Table 6
List of locations of uTARs from total RNA-seq in scrambled control and MERVL-KD two-cell embryos.
Supplementary Table 7
List of DE uTARs between scrambled control and MERVL-KD two-cell embryos.
Supplementary Table 8
List of sequence information used in this study.
Source Data Fig. 1
Statistical Source Data
Source Data Fig. 2
Statistical Source Data
Source Data Fig. 3
Statistical Source Data
Source Data Fig. 4
Statistical Source Data
Source Data Fig. 5
Statistical Source Data
Source Data Fig. 6
Statistical Source Data
Source Data Extended Data Fig. 1
Statistical Source Data
Source Data Extended Data Fig. 1h
Unprocessed western Blots
Source Data Extended Data Fig. 2
Statistical Source Data
Source Data Extended Data Fig. 3
Statistical Source Data
Source Data Extended Data Fig. 4
Statistical Source Data
Source Data Extended Data Fig. 5
Statistical Source Data
Source Data Extended Data Fig. 6
Statistical Source Data
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Sakashita, A., Kitano, T., Ishizu, H. et al. Transcription of MERVL retrotransposons is required for preimplantation embryo development. Nat Genet 55, 484–495 (2023). https://doi.org/10.1038/s41588-023-01324-y