DNA methylation, chromatin states and their interrelationships represent critical epigenetic information, but these are largely unknown in human early embryos. Here, we apply single-cell chromatin overall omic-scale landscape sequencing (scCOOL-seq) to generate a genome-wide map of DNA methylation and chromatin accessibility at single-cell resolution during human preimplantation development. Unlike in mice, the chromatin of the paternal genome is already more open than that of the maternal genome at the mid-zygote stage in humans, and this state is maintained until the 4-cell stage. After fertilization, genes with high variations in DNA methylation, and those with high variations in chromatin accessibility, tend to be two different sets. Furthermore, 1,797 out of 5,155 (35%) widely open chromatin regions in promoters closed when transcription activity was inhibited, indicating a feedback mechanism between transcription and open chromatin maintenance. Our work paves the way for dissecting the complex, yet highly coordinated, epigenetic reprogramming during human preimplantation development.
After fertilization, DNA methylation reprogramming and chromatin remodelling are crucial for converting terminally differentiated gametes into a totipotent state and subsequently into a pluripotent state1,2,3, and these processes have been extensively studied during mouse preimplantation development4,5,6,7,8,9,10,11. Although DNA methylation dynamics12,13,14,15,16,17 and parental allele-specific DNA methylation18,19 in human preimplantation embryos have been analysed, chromatin remodelling, parental allele-specific chromatin accessibility and the relationships between these different omics are still largely unknown. Furthermore, epigenetic reprogramming features dissected by low-input chromatin sequencing technologies, such as chromatin immunoprecipitation followed by sequencing (ChIP–seq), assay for transposase-accessible chromatin using sequencing (ATAC-seq) and DNase-seq, were potentially confounded by aneuploid embryos20,21,22, which were mixed in the pools of samples. Recently developed techniques6,23,24 based on nucleosome occupancy and methylome sequencing (NOMe-seq)25,26,27,28,29 and post-bisulfite adaptor tagging sequencing (PBAT-seq)30,31 can analyse DNA methylation and chromatin accessibility in the same sample. Here, we used the single-cell chromatin overall omic-scale landscape sequencing (scCOOL-seq) technique6 to simultaneously analyse the chromatin state, nucleosome positioning, DNA methylation, copy number variation (CNV) and interrelationships among different epigenetic layers in the same individual cells from human preimplantation embryos.
Chromatin accessibility remodelling of human early embryos
Using scCOOL-seq6, we analysed epigenetic reprogramming at six critical stages during human preimplantation development and in gametes and human embryonic stem cells (hESCs) (Fig. 1a). First, we examined the CNV of each individual cell and found that, in general, aneuploidy did not influence the global chromatin status during human preimplantation development (Supplementary Fig. 1a–c). However, for our analysis to remain stringent, we excluded all aneuploid cells and focused on euploid cells. Then, 227 euploid cells in total were retained for further analysis (Fig. 1a and Supplementary Fig. 1d–f), unless specified otherwise. On average, we could detect chromatin accessibility and DNA methylation in 14,916 (59%) promoters and 16,688 (60%) CpG islands (CGIs) simultaneously in each individual cell (Supplementary Fig. 1g,h and Supplementary Table 1).
First, the global chromatin accessibility pattern was characterized (Fig. 1b). The chromatin of oocytes was more accessible than that of sperm (4.7% mean GCH (GCA/GCC/GCT) methylation level in sperm and 41.8% in oocytes); however, oocytes still had far fewer nucleosome-depleted regions (NDRs) around the transcription start sites (TSSs; 368 TSS NDRs (NDRs containing TSSs)) (Fig. 1b–d and Supplementary Fig. 2a). Once fertilized, the average chromatin accessibility of the genome decreased and reached its lowest level at the 8-cell stage (34.2% mean GCH methylation level in zygotes, 31.9% at the 2-cell stage, 30.3% at the 4-cell stage and 26.9% at the 8-cell stage). After zygotic gene activation (ZGA), global chromatin accessibility increased and reached its highest point at the morula stage (39.6% mean GCH methylation level; Fig. 1b).
Like mouse embryos, human embryos had three strongly positioned nucleosomes located downstream of the TSS from the 2-cell stage onward (Fig. 1c), and the openness of the promoters was strongly correlated with the corresponding gene expression levels (Supplementary Fig. 2b,c). At the mid-zygote stage, 8,385 gene promoter regions abruptly turned into an open state that was maintained thereafter (Fig. 1d). Over half of these open promoters (58%) had a high CpG density and were enriched in Gene Ontology (GO) terms, such as cellular metabolic process, RNA processing and biosynthetic process (Supplementary Fig. 2d). Next, we called NDRs and nucleosome-occupied regions (NORs) of the pools of cells from each stage (Supplementary Fig. 3a–d). The promoter regions in oocytes were much more closed and only 3,033 proximal NDRs (NDRs within 2 kb upstream and downstream of the TSS) were identified. After fertilization, 13,406–37,916 proximal NDRs were called from zygote to the blastocyst stage (Supplementary Fig. 3a). Furthermore, the length of the open chromatin around TSSs gradually increased after the 2-cell stage, especially after ZGA (Supplementary Fig. 3b). Similar to the arrangement in somatic cells32,33,34,35, the nucleosomes were strongly enriched at the intron–exon boundaries (Fig. 2a). Relative enrichment analysis showed that NDRs were enriched in gene promoters, CGIs and enhancers (Fig. 2b). In addition, NDRs were more enriched in exons than in introns and were more enriched in intragenic regions than in intergenic regions (Fig. 2b,c). Then, 61,403 proximal NDRs were used to perform t-distributed stochastic neighbour embedding (t-SNE) and unsupervised hierarchical clustering analysis. We found that proximal NDRs had clear stage-specific features during preimplantation development (Fig. 2d,e). The most dramatic chromatin remodelling of proximal NDRs occurred between the 4-cell stage and the 8-cell stage during ZGA (Fig. 2e), indicating that ZGA is accompanied by global chromatin-state remodelling.
Next, we classified genes into three groups (homogeneously open, divergent and homogeneously closed) based on the heterogeneity of chromatin states around their TSSs to investigate the stepwise gene transition during preimplantation development and in hESCs. We used the GCH methylation level of TSS NDRs and nucleosomes in hESCs as a cut-off to define the chromatin states in each individual cell (Fig. 3a; see Methods). The regions around TSSs of homogeneously open and divergent genes in hESCs were unmethylated, whereas those around TSSs of homogeneously closed genes were hypermethylated (Fig. 3b). Interestingly, genes associated with developmental process, nervous system development and organ morphogenesis were enriched in the divergent cluster in hESCs, which indicated that early-differentiation genes in hESCs tend to have variable promoter chromatin states among individual cells (Fig. 3b). By comparison with ChIP–seq data from hESCs36, we found that homogeneously open genes in hESCs were mainly marked by trimethylation of lysine 4 on histone H3 (H3K4me3; 71.7%), whereas divergent genes were more enriched with H3K4me3/H3K27me3 bivalent marks (49.2%). In addition, homogeneously closed genes were mainly H3K4me3/H3K27me3 unmarked (64.7%) (Fig. 3c). The expression levels of homogeneously open genes were much higher and more uniform than those of divergent genes, whereas homogeneously closed genes were not expressed (Fig. 3d; RNA sequencing (RNA-seq) data were from a previous study37). Similar to genes in mouse oocytes6, the majority of genes in human oocytes were homogeneously closed and a large proportion was reprogrammed to either homogeneously open or divergent states soon after fertilization (Fig. 3e,f and Supplementary Table 2). Compared to the divergent genes, those in the homogeneously open state tended to have higher expression levels and less variation among individual cells after ZGA (Supplementary Fig. 4a,b; RNA-seq data were from a previous study37). 
Soon after fertilization, genes related to general cell activities, such as RNA processing, cellular metabolic process and cell cycle, became homogeneously open, and this state was maintained thereafter (Fig. 3e and Supplementary Fig. 4c).
Differential DNA methylation and chromatin accessibility of parental genomes within each human early blastomere
A total of 1,402,172–1,612,418 single-nucleotide polymorphisms (SNPs) identified from donors’ genome sequencing data were used to separately trace the epigenetic reprogramming of parental genomes in each individual cell (Fig. 4a). The initially higher methylation level of the paternal genome relative to the maternal genome was reduced after fertilization owing to faster demethylation of the paternal genome. After the 2-cell stage, residual DNA methylation on the maternal genome was already higher than that on the paternal genome (Fig. 4b). Unlike in mice6, shortly after fertilization, the chromatin of the paternal genome in human embryos was reprogrammed to a more open state than that of the maternal genome, which was maintained to the 4-cell stage. Afterwards, parental genomes reached comparable chromatin states in humans (Fig. 4b). We found that intragenic regions, including exons and introns, showed higher DNA methylation levels in the maternal genome shortly after fertilization, which was maintained throughout preimplantation development (Fig. 4c). Furthermore, for intragenic regions, residual DNA methylation was more strongly preserved on the maternal genome for highly and lowly expressed genes than for genes that were not expressed (Fig. 4d). By contrast, for intergenic regions, residual methylation was preferentially preserved on the paternal genome until the 8-cell stage (Fig. 4c). Chromatin remodelling in repeat regions and enhancers generally mirrored the global pattern (Fig. 4c). For aneuploid cells, the differences in DNA methylation and chromatin accessibility between parental genomes were similar to those of euploid cells (Supplementary Fig. 5a,b).
Distinct features of chromatin accessibility between human and mouse embryos and paternal X chromosome reactivation in female embryos
Based on t-SNE analysis, except for sperm, early embryos showed clear species-specific and stage-specific features based on 15,057 human–mouse homologous proximal NDRs (Supplementary Fig. 6a). Interestingly, the chromatin states of these NDRs were clearly different between the inner cell mass (ICM) and ESCs in both species, indicating that chromatin states changed drastically during the derivation of ESCs from the ICM. Moreover, we found that chromatin accessibility variance within an embryo was low until the 8-cell stage in humans, whereas in mice, the variance remained low until the morula stage (Fig. 5a). The digestion process caused embryos to become quite fragile; thus, for some of the embryos, not all cells within an embryo were recovered and sequenced, especially after the 8-cell stage. As a result, we cannot exclude the possibility that variance of chromatin accessibility within an embryo was underestimated (but not overestimated) in the late stages.
More importantly, we found that there were distinct features of chromatin accessibility in parental genomes during mouse and human preimplantation development. Compared to mouse embryos at the same stage6, human embryos had much more accessible chromatin (Fig. 5b). For instance, the chromatin of human oocytes was more open than that of mouse oocytes (41.8% mean GCH methylation level in human oocytes and 17.4% in mouse oocytes; P = 1.3 × 10^−4). After fertilization, chromatin accessibility of the human maternal genome was lower at the mid-zygote stage than in oocytes, and then continuously decreased until the 8-cell stage (Fig. 5b; 31.2% mean GCH methylation level in zygotes, 27.3% at the 2-cell stage, 27.5% at the 4-cell stage and 25.8% at the 8-cell stage). For the mouse maternal genome, chromatin was more open in early zygotes than in oocytes, and the accessibility later decreased to the lowest level at the 2-cell stage (20.7% mean GCH methylation level in zygotes and 9.6% at the 2-cell stage). The chromatin of sperm was closed in humans and mice, but it opened sharply after fertilization in both species (Fig. 5b). Then, the chromatin accessibility of the human paternal genome decreased gradually and reached its lowest level at the 8-cell stage (34.8% mean GCH methylation level in zygotes, 34.9% at the 2-cell stage, 31.9% at the 4-cell stage and 27.2% at the 8-cell stage), whereas that of the mouse paternal genome dramatically decreased to the lowest level at the 2-cell stage (22.0% mean GCH methylation level in zygotes and 8.8% at the 2-cell stage; scCOOL-seq data of mouse preimplantation embryos were from a previous study6). The differences mentioned above were probably due to the different times of ZGA between species.
Next, we analysed the chromatin accessibility and DNA methylation of the parental X chromosomes in each female blastomere during human preimplantation development (Fig. 5c and Supplementary Table 1). Similar to the global pattern of differential epigenetic reprogramming, the paternal X chromosome demethylated and reactivated quickly after fertilization. After the 2-cell stage, the DNA methylation level of the maternal X chromosome was already higher than that of the paternal X chromosome, and this methylation difference was maintained. For the chromatin state, the paternal X chromosome was more open than the maternal X chromosome from zygote to the 4-cell stage, which probably facilitated the reactivation of the paternal X chromosome during this period38,39.
Linking DNA methylation variance to chromatin accessibility changes
Cell-to-cell variances in DNA methylation and chromatin accessibility were calculated31 to determine whether demethylation and chromatin remodelling were synchronized. Oocytes showed the lowest variance among individual cells (Fig. 6a,b). Once fertilized, strong variations in DNA methylation emerged in exons, introns, repeat elements (short interspersed nuclear elements (SINEs), long interspersed nuclear elements (LINEs) and long terminal repeats), enhancers and heterochromatin regions (H3K9me3 marked), whereas DNA methylation was less heterogeneous in promoters, CGIs, proximal NDRs and RNA polymerase II-binding sites, which were lowly methylated (Fig. 6a). By contrast, stronger variations in chromatin accessibility were found in proximal and distal NDRs and in the binding sites of the pluripotency factors POU5F1 (also known as OCT4), SOX2 and NANOG among individual cells (Fig. 6b). These results indicated that, during human preimplantation development, DNA demethylation and chromatin remodelling were unsynchronized in different genomic regions.
Next, we explored the relationship between DNA methylation variance and chromatin accessibility variance in promoter regions. In sperm, promoters showed the lowest variations in both omics among individual cells (Fig. 6c). Compared to sperm, oocytes showed more variations in chromatin accessibility. Interestingly, after fertilization, genes showing high variations in DNA methylation and those with high variations in chromatin accessibility tended to be two different sets. For example, from the 4-cell stage to the morula stage, genes showing high chromatin accessibility variations but low DNA methylation variations (most were hypomethylated; Supplementary Fig. 6b) were strongly enriched in genes related to chromosome organization, chromatin modification and the cell cycle (Supplementary Table 3). This finding indicated that chromatin states of the promoters of chromatin remodelling-associated genes were heterogeneous at these stages. However, from zygote to the blastocyst stage, genes showing high DNA methylation variations but low chromatin accessibility variations were strongly enriched in GO terms, such as the inflammatory response and the detection of a chemical stimulus involved in sensory perception (Supplementary Table 3); these genes were not involved in early embryogenesis and showed unsynchronized DNA demethylation. After fertilization, especially after ZGA, genes with higher variations in DNA methylation tended to be in a closed state (Supplementary Fig. 7a). However, genes with higher variations in chromatin accessibility tended to be hypomethylated throughout preimplantation development (Supplementary Fig. 7b).
In addition, we also explored the relationships among chromatin accessibility, DNA methylation and gene expression during human preimplantation development (RNA-seq data were from a previous study37). We observed a clearly positive correlation between the chromatin accessibility of promoters and the expression levels of corresponding genes that gradually increased from zygote to the blastocyst stage, especially after ZGA (Fig. 6d). In addition, a clearly negative correlation between DNA methylation and chromatin accessibility of promoters was revealed in zygotes and throughout preimplantation development (Fig. 6d).
Chromatin accessibility of repeats and the regulation network during human preimplantation development
Certain groups of repeat elements were highly expressed during human preimplantation development12, so we analysed the potential effect of DNA methylation and chromatin accessibility on their transcription (Supplementary Fig. 8a–d). Interestingly, NDRs were more enriched on SINE/variable number of tandem repeats/Alu elements (SVAs; which are non-autonomous hominid-specific retrotransposons) than on other repeat elements from the 2-cell stage to the blastocyst stage (Fig. 7a). Consistently, from zygote to the morula stage, especially from the 8-cell stage, chromatin accessibility gradually increased on SVAs, in line with their expression levels, and this increase in accessibility probably accounts for the sharp increase in the SVA expression level during this period (Fig. 7a and Supplementary Fig. 8a; gene expression and DNA methylation patterns of SVAs, LINE1 (L1) and L2, Alu and mammalian-wide interspersed repeats (MIR) during human preimplantation development were reported previously12). In addition, for LINEs, evolutionarily younger L1 was less open than evolutionarily older L2, which was consistent with our previous finding12 that residual methylation was more preserved on L1 than on L2 (Fig. 7a and Supplementary Fig. 8b). However, for SINEs, evolutionarily younger Alu and evolutionarily older MIR showed comparable chromatin accessibility at each stage, although the chromatin states of both were drastically reprogrammed during this period (Supplementary Fig. 8c).
As open chromatin could potentially be bound by transcription factors to regulate gene transcription as enhancers or insulators40, we next searched for the transcription factor motifs in NDRs. Using HOMER (hypergeometric optimization of motif enrichment)41, we found that proximal NDRs were strongly enriched for the binding motifs of transcription factors for transcription complex formation and the initiation of transcription, such as SP1, MAZ, E2F1, ELF1, YY1 and TATA box (Fig. 7b). Interestingly, they were also enriched for the binding sequences of pluripotency and early embryonic regulators, such as RONIN, GATA2, GATA4, GATA6 and NANOG (Fig. 7b). By contrast, distal NDRs marked both enhancers and insulators in a stage-specific manner. For example, the SOX family (SOX2, SOX4 and SOX15), the GATA family (GATA2, GATA4 and GATA6), FOXA2 and RBPJ, which had key roles in differentiation and cell fate determination, showed strong motif enrichment from the 8-cell stage onward (Fig. 7c). The distal NDRs in the trophectoderm were specifically enriched for motifs of CDX2, AP2-γ and GATA2 (Fig. 7c). In addition, motifs of the pluripotency master regulators OCT4, SOX2 and NANOG were enriched in hESCs (Fig. 7b,c). Interestingly, the presumptive binding sites of OCT4, NANOG42 and P300 (ref.36) in hESCs were already opened in zygotes, long before the pluripotency was established in the ICM (Fig. 7d,e and Supplementary Fig. 8e). Furthermore, we performed de novo prediction of enhancers using information from both open chromatin and low-methylated regions (LMRs)43 at each stage (Supplementary Table 4; see Methods). However, few overlaps were found based on human–mouse homologous regions (Supplementary Fig. 8f).
Continual transcription is crucial for human preimplantation chromatin remodelling
For the human genome, the proportion of wide proximal NDRs (≥300 bp) increased slightly after fertilization, from 1.2% in mature metaphase II oocytes to 7.6% at the mid-zygote stage, then remained stable before increasing sharply from 6.9% at the 4-cell stage to 21% at the 8-cell stage (Fig. 8a). In mice, the proportion reached its highest percentage at the 2-cell stage, when zygotic genes are activated6 (Fig. 8a). Compared to proximal NDRs of ≤200 bp, these wider proximal NDRs showed higher GCH methylation levels and were enriched for the RNA polymerase II signal42 (Fig. 8b,c). These results indicated that, in both species, a gain of wide proximal NDRs was associated with the transcription process, especially in ZGA. Then, we blocked RNA polymerase II-mediated transcription by α-amanitin in zygotes to explore the effect of transcription on the maintenance of wide proximal NDRs at the 8-cell stage. Compared to the control group, 1,797 out of 5,155 (35%) wide proximal NDRs could not be maintained when transcription was inhibited (Fig. 8d). Among the nearest genes of these 1,797 transcription-dependent proximal NDRs (Supplementary Table 4), many had key roles in embryogenesis. For example, E2F3, E2F6, EIF1 and ELL2 were required for transcription complex formation and the initiation of transcription. DPPA3, GATA6, LIN28A, POU5F1, HDAC1 and KDM3A were related to embryonic development and histone modification (Fig. 8d). These results indicated that continual transcription was functionally crucial for a large set of zygotic genes to maintain their promoters’ openness as a feedback mechanism.
We applied scCOOL-seq to six critical stages of human preimplantation development. When our work was under review, chromatin accessibility changes in human early embryos were reported44,45. The chromatin states underwent dramatic remodelling during this process. We also found that the most drastic chromatin remodelling was completed within the first 19 h after fertilization. In addition, we analysed DNA methylation reprogramming and chromatin remodelling simultaneously during this process. Our study provides insights toward a deeper understanding of epigenetic reprogramming during human preimplantation development.
Aneuploidy in human embryos, which is partially related to maternal age, probably results in embryo developmental retardation or other abnormalities46. In this study, our single-cell sequencing strategy enabled the discrimination of aneuploid cells from euploid cells. By focusing on euploid cells, we found that the genomic regions showing the strongest variations in chromatin accessibility were mainly enriched at the proximal and distal NDRs and pluripotency master transcription factor-binding regions42 (Fig. 6b). This result indicated that the chromatin-state heterogeneity of the binding sites for master transcription factors was an intrinsic feature of preimplantation embryos, which may contribute to the flexibility of cell fate determination in these blastomeres1,47.
Our previous studies using a mouse model showed that chromatin accessibility of parental alleles was comparable from the zygote stage onwards6,7,48. However, human embryos showed a delayed balance between parental alleles until the 4-cell stage (Fig. 4b). Moreover, the chromatin of the maternal genome in zygotes was less accessible than that of oocytes in humans, which was also different from the scenario in mice (Fig. 5b). These results indicated that chromatin accessibility has species-specific features.
Using multi-omics sequencing methods, such as scCOOL-seq6, scNOMe-seq24 and scNMT-seq23, we can estimate several epigenetic characteristics simultaneously in the same individual cell. We found that genes with heterogeneous chromatin accessibility tended to be different from those with heterogeneous DNA methylation throughout preimplantation development (Fig. 6c). To decrease costs and obtain sufficient information, improving the technique, such as its mapping rate and uniformity of coverage49,50, is still worthwhile. In summary, our work offers a new possibility to decipher highly complex, yet orderly and orchestrated epigenomic reprogramming processes and their effects on gene expression in human early embryonic development.
Collection of human early preimplantation embryos
Human sperm was obtained from a fertile male, and oocytes were donated by volunteers, in compliance with the requirements of the Reproductive Medicine Ethics Committee of Peking University Third Hospital. All volunteers signed an informed consent document.
The superovulation procedure and the acquisition of oocytes were performed according to a published method51. Metaphase II oocytes were treated with hyaluronidase (Sigma) to remove granulosa cells. The embryos were obtained through intracytoplasmic sperm injection. The embryos were maintained in G-1 media (Vitrolife) after intracytoplasmic sperm injection and were transferred to G-2 media (Vitrolife) on day 3. The zygotes were collected 19–21 h after fertilization. The assessment of embryos and the timing of embryo collection were performed according to a previous study52. The 2-cell and 4-cell blastomeres were collected 27 h and 48 h after fertilization, respectively. The 8-cell blastomeres and morulae were collected on day 3 and day 4, respectively. Blastocysts were collected on day 5 to day 6. No delayed or arrested embryos were included in this study. To digest the zona pellucida, the embryos were treated with diluted 36% HCl solution (1:1,000) and were washed in DPBS with 0.1% HSA several times. Then, a mixture of Accutase (Millipore) and 0.25% trypsin-EDTA at a ratio of 1:1 was used to digest the embryo at 37 °C for 20–60 min supplied with 5% CO2. The single blastomeres were washed extensively with 0.1% HSA in DPBS before being transferred to lysis buffer.
hESC culture was conducted according to a previous method37. Briefly, hESCs were maintained on a mitotically inactivated mouse embryonic fibroblast feeder layer supplied with 4 ng ml^−1 basic fibroblast growth factor. The undifferentiated hESC clones were digested in Accutase (Sigma) at 37 °C for 1 h to obtain a single-cell suspension for scCOOL-seq library construction.
scCOOL-seq library preparation
The scCOOL-seq libraries were constructed according to previous studies6,30,31. The single cells of early embryos were picked into lysis buffer by mouth pipette. The lysate was incubated with 5 U GpC MTase (NEB) at 37 °C for 45 min and then at 65 °C for 25 min. After adding 10 μg protease (Qiagen), the sample was incubated at 50 °C for 3 h and then at 70 °C for 20 min. The bisulfite conversion was performed using EZ-96 DNA Methylation-Direct MagPrep (Zymo) according to the user guide. Ten rounds of amplification were completed in the presence of Klenow exo- (Enzymics) and scBS-seq-P5-N6-oligo1 (CTACACGACGCTCTTCCGATCTNNNNNN). The amplification product was purified by 0.8× Agencourt AMPure XP beads (Beckman). The second strand was synthesized using scBS-seq-P7-N6-oligo2 (AGACGTGTGCTCTTCCGATCTNNNNN), which was followed by 0.8× Agencourt AMPure XP beads (Beckman) purification. Thirteen cycles of PCR were performed to amplify the library using the index primer and the universal primer (NEB). The libraries were sequenced on the Illumina HiSeq 4000 platform in paired-end 150-bp mode.
Transcription inhibition assay
The zygotes were treated with 100 μg ml^−1 α-amanitin (Sigma). When the untreated group developed into the 8-cell stage, the blastomeres were collected to perform scCOOL-seq library construction as described above.
Human genomic DNA extraction and library preparation
The whole blood of female volunteers and sperm from the male volunteer were used to extract genomic DNA using a DNeasy Blood and Tissue Kit (Qiagen). Genomic DNA (500 ng) was fragmented into 300 bp by Covaris. Then, the libraries were constructed using a KAPA Hyper Prep Kit (Kapa Biosystems). The genomic DNA libraries were sequenced in the same way as the scCOOL-seq libraries.
Data quality control and alignment
For scCOOL-seq data, we used Trim Galore (v0.3.3) to remove the first six bases of random primer, adaptor sequences and low-quality bases. Bismark (v0.7.6)53 was used to map clean reads to the human reference genome hg19 (downloaded from the UCSC genome browser) with a paired-end and non-directional module, and then, unmapped reads were realigned to the same reference genome in a single-end and non-directional module31,54. After alignment, PCR duplicates were removed with SAMtools55 (v0.1.18).
For RNA-seq data37 (GSE36552), clean reads were realigned to the human reference genome hg19 using TopHat56 (v2.0.12). The gene expression levels were calculated using the reads per kilobase per million mapped reads (RPKM) method with a customized Python script.
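The RPKM calculation can be illustrated with a minimal sketch. This is not the customized script used in the study; the function name, the toy gene names and the counts are hypothetical, and the gene length is assumed to be the summed exonic length in base pairs.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene length per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and library size must be positive")
    # reads / (length in kb) / (library size in millions)
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)

# Hypothetical toy input: gene -> (mapped reads, exonic length in bp)
counts = {"GENE_A": (500, 2000), "GENE_B": (50, 1000)}
total_mapped = 10_000_000  # hypothetical library size

expression = {
    gene: rpkm(n_reads, length, total_mapped)
    for gene, (n_reads, length) in counts.items()
}
```

Dividing by both gene length and library size makes values comparable across genes of different lengths and across cells sequenced to different depths.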
CNV deduction with scCOOL-seq data
After alignment, R package HMMcopy57 was used to deduce CNV in scCOOL-seq data across the genome with GC and mappability corrections. ‘readCounter’ was used to obtain read counts for non-overlapping 1-Mb windows based on bam files. HMMcopy in R was used to correct counts and estimate the copy number in each window. Because systematic coverage bias still existed, we normalized the corrected counts of each window by dividing it by the median of corrected counts of the same window across each stage. Only euploid cells were retained for further analysis, unless specified elsewhere.
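The per-window normalization step described above can be sketched as follows. The sketch assumes that the corrected counts are on a linear scale and that the median is taken across the cells of one stage; the function name and the toy values are hypothetical, and the HMMcopy correction itself is not reproduced.

```python
from statistics import median

def normalize_by_stage_median(corrected):
    """Divide each window's corrected count by the median of that window
    across all cells of the same stage.

    corrected: dict mapping cell name -> list of corrected counts,
               one value per 1-Mb window, same window order in every cell.
    """
    cells = list(corrected)
    n_windows = len(corrected[cells[0]])
    medians = [median(corrected[c][i] for c in cells) for i in range(n_windows)]
    return {
        c: [v / m if m > 0 else float("nan")
            for v, m in zip(corrected[c], medians)]
        for c in cells
    }

# Hypothetical toy data: three cells from one stage, three windows each.
stage_counts = {
    "cell1": [1.0, 1.0, 1.0],
    "cell2": [1.0, 1.1, 0.9],
    "cell3": [1.0, 1.5, 1.0],  # second window elevated relative to the others
}
normalized = normalize_by_stage_median(stage_counts)
```

A window whose coverage is biased in every cell yields ratios near 1 after this step, whereas a gain or loss in a single cell stands out as a ratio well above or below 1.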
WCG and GCH methylation level estimation
For scCOOL-seq data, we used 1× as the read depth cut-off. The methylation level of each covered cytosine (considered separately on each strand) was estimated as the number of methylated reads (‘C’) divided by the sum of the methylated and unmethylated reads (‘C + T’). GCG sites were removed because DNA methylation and chromatin accessibility cannot be distinguished at these sites. CCG sites were also removed because M.CviPI methyltransferase has some low-level activity at CC sites26,28,58. Thus, we used WCG (ACG/TCG) for DNA methylation analysis and GCH (GCA/GCC/GCT) for chromatin accessibility analysis.
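The context rules and the per-site methylation level can be sketched as follows. The function names are hypothetical, and the trinucleotide is assumed to be given 5′ to 3′ on the strand carrying the cytosine, with the cytosine in the middle.

```python
def classify_context(trinucleotide):
    """Classify a cytosine by its flanking bases.

    Returns 'WCG' (endogenous DNA methylation), 'GCH' (chromatin
    accessibility) or None for excluded or uninformative contexts.
    """
    upstream, centre, downstream = trinucleotide.upper()
    if centre != "C":
        raise ValueError("the middle base must be a cytosine")
    if upstream == "G" and downstream == "G":
        return None   # GCG: the two signals cannot be told apart
    if upstream == "C" and downstream == "G":
        return None   # CCG: removed because of off-target enzyme activity
    if upstream in "AT" and downstream == "G":
        return "WCG"  # ACG or TCG
    if upstream == "G" and downstream in "ACT":
        return "GCH"  # GCA, GCC or GCT
    return None       # any other context is not used

def methylation_level(methylated_reads, unmethylated_reads, min_depth=1):
    """C / (C + T) for one cytosine; None if coverage is below the cut-off."""
    depth = methylated_reads + unmethylated_reads
    if depth < min_depth:
        return None
    return methylated_reads / depth
```

For example, `classify_context("ACG")` yields `'WCG'` and `classify_context("GCA")` yields `'GCH'`, whereas GCG and CCG sites are dropped.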
We also analysed the DNA methylation level and chromatin accessibility in different genomic regions, repeat elements and functional regions defined in hESCs. For each region, DNA methylation and chromatin accessibility were calculated as the mean methylation level of all WCG or GCH sites covered in this region. Only regions that covered at least three WCG or five GCH sites were retained for further analysis. DNA methylation and chromatin accessibility coverage of regions in each single cell are shown in Supplementary Table 1. Genomic region annotations, including exons, introns and CGIs, were downloaded from the UCSC genome table. Promoters (regions 1 kb upstream and 0.5 kb downstream of the TSS) were classified into three groups based on the CpG density (HCP, high-density CpG promoter; ICP, intermediate-density CpG promoter; and LCP, low-density CpG promoter) as previously described59. Intragenic regions were defined as regions from the TSS to the transcription end site, and intergenic regions were regions complementary to the intragenic regions in the genome. Human enhancer annotations were obtained from a previous study12. The locations of repeat elements and their subfamilies were downloaded from hg19 RepeatMasker track of the UCSC genome browser. Transcription factor-binding sites in hESCs, such as CTCF, NANOG, POU5F1, SOX2 and RNA polymerase II, and the normalized signal intensity of RNA polymerase II in hESCs were downloaded from the Epigenome Roadmap project (GSE61475)42. DNase-seq peaks of hESCs were downloaded from the ENCODE project (GSE32970)60. The other ChIP–seq broadPeak files (P300, H3K4me1, H3K27ac, H3K4me3, H3K9me3 and H3K27me3) were downloaded from another ENCODE project (GSE29611)36. All the processed data of published ChIP–seq, such as peak files or normalized signal files, were directly downloaded from previous studies36,42,60.
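A minimal sketch of the region-level summary and the promoter definition follows. It assumes that the region value is the mean of per-site levels (rather than pooled read counts) and that coordinates are 0-based and half-open; the function names are hypothetical.

```python
# Minimum number of covered sites required per region, as stated above.
MIN_SITES = {"WCG": 3, "GCH": 5}

def region_level(site_levels, context):
    """Mean methylation level of the covered sites in one region,
    or None if the region has too few covered sites."""
    if len(site_levels) < MIN_SITES[context]:
        return None
    return sum(site_levels) / len(site_levels)

def promoter_interval(tss, strand, upstream=1000, downstream=500):
    """Promoter as 1 kb upstream and 0.5 kb downstream of the TSS,
    taking strand into account; the start is clamped at 0."""
    if strand == "+":
        return max(0, tss - upstream), tss + downstream
    if strand == "-":
        return max(0, tss - downstream), tss + upstream
    raise ValueError("strand must be '+' or '-'")
```

With three covered sites, a region is retained for WCG analysis but discarded for GCH analysis, which requires at least five.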
Parental SNP calling
For parental genomic DNA sequencing data, we used Trim Galore for read quality control. The clean reads were then mapped to the human reference genome hg19 with the command ‘bwa (v0.7.12)61 mem -M’. PCR duplicates were removed using the ‘MarkDuplicates’ module of Picard (v1.126) (http://broadinstitute.github.io/picard). After alignment, GATK (v3.4-46) was used to perform local realignment of reads around indels using the ‘RealignerTargetCreator’ and ‘IndelRealigner’ modules. The ‘BaseRecalibrator’ and ‘PrintReads’ modules were then used to recalibrate the base qualities to improve the accuracy of SNP calling. Then, SNPs were called using GATK62 with the parameters ‘-T HaplotypeCaller -D dbsnp_v135.hg19.vcf --genotyping_mode DISCOVERY -stand_emit_conf 10 -stand_call_conf 30’. To reduce the false-positive rate, we filtered raw SNPs using the ‘VariantFiltration’ module with the command ‘-T VariantFiltration --filterExpression "QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0"’. Only SNPs that were annotated in the dbSNP database (v135) with a read depth between 10 and 60 (ref. 63) and that were distinguishable between parental genomes were retained for further analysis.
On average, 1,489,597 SNPs could be used to reassign the mapped reads covering these sites into the maternal or paternal groups. Then, SNP-linked WCG or GCH sites were used to analyse parental allele-specific DNA methylation and chromatin accessibility in scCOOL-seq data.
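The read reassignment step can be illustrated with the following Python sketch. The rule used here (an allele is informative when it is present in only one parent's genotype) is an assumption about what ‘distinguishable between parental genomes’ means in practice, and the function names and genotype encoding are hypothetical.

```python
def passes_depth(depth, min_depth=10, max_depth=60):
    """Depth filter applied to parental SNPs (read depth between 10 and 60)."""
    return min_depth <= depth <= max_depth


def assign_read(base, maternal_genotype, paternal_genotype):
    """Assign a read to a parent from the base it carries at a SNP site.

    Genotypes are strings of alleles such as 'AA' or 'AG' (assumed encoding).
    Returns 'maternal', 'paternal' or None when the base is uninformative.
    """
    in_mother = base in set(maternal_genotype)
    in_father = base in set(paternal_genotype)
    if in_mother and not in_father:
        return "maternal"
    if in_father and not in_mother:
        return "paternal"
    return None
```

WCG or GCH sites on a read assigned in this way would then contribute to the maternal or paternal methylation estimate, respectively.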
Detection of NDRs and NORs
Only GCH sites were used to deduce chromatin states. Owing to the sparse nature of current single-cell sequencing data, we merged all euploid cells from the same stage to robustly and accurately detect the NDRs and NORs at each stage. Then, we took advantage of these well-defined open or closed regions to analyse their chromatin states in each individual cell. We used a sliding-window chi-squared test to detect regions with significantly higher GCH methylation levels than the whole-genomic background26,28,58. Briefly, the GCH methylation level of 100-bp windows at 20-bp spacing was calculated based on all GCH sites covered at a depth of at least 1×, and only regions with P ≤ 10⁻²⁰ that were longer than 140 bp and covered at least 5 GCH sites were defined as NDRs. NDRs were grouped into three classes based on the distance to the TSS: TSS NDRs (NDRs containing TSSs); proximal NDRs (NDRs within 2 kb upstream and 2 kb downstream of the TSS); and distal NDRs (NDRs at least 2 kb away from the nearest TSS). By contrast, significantly hypomethylated regions were called with 40-bp windows at 20-bp spacing. Only the regions with P ≤ 10⁻⁴ that were longer than 60 bp and covered at least 3 GCH sites were defined as NORs.
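A minimal Python sketch of a sliding-window chi-squared NDR caller is given below for illustration. Several details are assumptions that the text does not specify: each window is compared with the rest of the genome in a 2 × 2 table without continuity correction, significant overlapping windows are merged before the length and site-count filters are applied, and the function names and input format are hypothetical.

```python
import math


def chi2_pvalue_2x2(a, b, c, d):
    """P value of a 2x2 chi-squared test (1 d.f., no continuity correction)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0
    stat = n * (a * d - b * c) ** 2 / denom
    # Survival function of the chi-squared distribution with 1 d.f.
    return math.erfc(math.sqrt(stat / 2.0))


def call_ndrs(sites, window=100, step=20, p_cutoff=1e-20, min_len=140, min_sites=5):
    """Call NDRs from pooled GCH sites.

    sites: list of (position, methylated_reads, unmethylated_reads), sorted by position.
    Returns a list of (start, end) half-open intervals.
    """
    if not sites:
        return []
    total_m = sum(m for _, m, _ in sites)
    total_u = sum(u for _, _, u in sites)
    significant = []
    for start in range(sites[0][0], sites[-1][0] + 1, step):
        end = start + window
        m = sum(mm for pos, mm, _ in sites if start <= pos < end)
        u = sum(uu for pos, _, uu in sites if start <= pos < end)
        if m + u == 0:
            continue
        rest_m, rest_u = total_m - m, total_u - u
        # Keep only windows whose methylation level exceeds the background.
        if m * (rest_m + rest_u) <= rest_m * (m + u):
            continue
        if chi2_pvalue_2x2(m, u, rest_m, rest_u) <= p_cutoff:
            significant.append((start, end))
    # Merge overlapping or adjacent significant windows into regions.
    merged = []
    for start, end in significant:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    # Apply the length and site-count filters to the merged regions.
    ndrs = []
    for start, end in merged:
        n_sites = sum(1 for pos, _, _ in sites if start <= pos < end)
        if end - start > min_len and n_sites >= min_sites:
            ndrs.append((start, end))
    return ndrs
```

Under the same assumptions, NORs could be called by reversing the direction of the comparison and passing window=40, p_cutoff=1e-4, min_len=60 and min_sites=3. The per-window rescan of all sites keeps the sketch short but would be too slow for genome-scale data.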
Heterogeneity analysis of promoter status
We used TSS NDRs called in the pools of scCOOL-seq data to measure chromatin status among individual cells to deduce each gene’s status in each stage. In the pools of hESC data, the average GCH methylation levels of TSS NDRs were >0.5, whereas the average GCH methylation levels of NORs were <0.3. Thus, we used these cut-off values to measure the chromatin status in each individual cell. Specifically, if a TSS NDR called in the pools of data was hypermethylated (GCH of ≥0.5) and covered at least 5 GCH sites in a single cell, then this TSS NDR was open in this cell. By contrast, if a TSS NDR called in pools of data was hypomethylated (GCH of ≤0.3) and covered at least 5 GCH sites in a single cell, then this TSS NDR was closed in this cell. For those TSSs without NDRs, the regions that were 200 bp upstream and 100 bp downstream of the TSS were used for chromatin accessibility estimations in each individual cell with the same cut-off. Only regions covered in at least half of the cells in a stage were retained for further analysis. If a TSS NDR was open in >80% of the covered cells in a stage, then the corresponding gene was defined as homogeneously open at this stage. If a TSS NDR was open in 30–80% of cells in a stage, then the corresponding gene was defined as divergent at this stage. If a TSS NDR was open in <30% of the cells at a stage, then the corresponding gene was homogeneously closed at this stage.
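The classification rules above can be summarized in a short Python sketch. The function names and the handling of intermediate cells are assumptions made for illustration: a covered cell with a GCH level between 0.3 and 0.5 is counted as covered but neither open nor closed.

```python
def cell_status(gch_level, n_sites, open_cut=0.5, closed_cut=0.3, min_sites=5):
    """Status of one TSS NDR in one cell: 'open', 'closed', 'intermediate' or None.

    None means that the region is not sufficiently covered in this cell.
    """
    if gch_level is None or n_sites < min_sites:
        return None
    if gch_level >= open_cut:
        return "open"
    if gch_level <= closed_cut:
        return "closed"
    return "intermediate"


def gene_status(cell_statuses):
    """Stage-level status of a gene from the per-cell statuses of its TSS NDR.

    Returns None when the region is covered in fewer than half of the cells.
    """
    covered = [s for s in cell_statuses if s is not None]
    if not cell_statuses or len(covered) * 2 < len(cell_statuses):
        return None
    open_fraction = sum(1 for s in covered if s == "open") / len(covered)
    if open_fraction > 0.8:
        return "homogeneously open"
    if open_fraction < 0.3:
        return "homogeneously closed"
    return "divergent"
```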
The GOstats64 package in R was used to obtain the enriched GO terms. All genes with a defined status in each stage were used as the enrichment background.
Connecting transcription factor regulation to open chromatin
‘findMotifsGenome.pl’ in HOMER41 (v4.9.1) was used to search for transcription factor motif enrichment in NDRs with the parameters ‘-size 2000 -len 8 -S 100’. Only motifs with P ≤ 10⁻¹⁰ whose corresponding transcription factors had an RPKM ≥ 5 in at least one stage were retained.
De novo prediction of enhancers
We used chromatin accessibility and DNA methylation levels from the same group of cells at a stage to perform de novo prediction of enhancers. First, we used the ‘MethylSeekR’ package65 in R to call LMRs in hESCs (the mean of the DNA methylation level of hESCs was 82.3%) with the parameters ‘FDR.cutoff 5, m.sel 0.5’. Then, we searched for NDRs that overlapped with LMRs as potential enhancers, but only NDRs that were located 200 bp upstream or 200 bp downstream of the TSS were taken into account. Owing to the strong demethylation process during preimplantation development, we used the parameter ‘m.sel 0.3’ to search for LMRs in the other stages, and only LMRs with four WCG sites covered were retained. In total, there were 44,222 de novo-predicted enhancers from zygote to ESCs in humans and 93,670 de novo-predicted enhancers in mouse (overlapping enhancers within each species were elongated and combined as one enhancer). Using the UCSC liftOver tool, 19,287 human–mouse homologous regions were found in mouse-predicted enhancers. Only 1,167 de novo-predicted enhancers were overlapping between human and mouse homologous regions.
t-SNE and unsupervised hierarchical clustering analysis of chromatin accessibility based on proximal NDRs
To analyse cell populations based on chromatin accessibility, t-SNE and unsupervised hierarchical clustering were performed using proximal NDRs called from oocytes to hESCs covering at least five GCH sites. t-SNE was performed using the ‘tsne’ package in R. Pearson correlation coefficients were calculated with the option ‘pairwise.complete.obs’, and then the ‘hclust’ function in R with the ‘ward.D2’ method was used for hierarchical clustering.
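Because each single cell covers a different subset of proximal NDRs, the correlation between two cells has to be computed over the regions covered in both. The following Python sketch illustrates this pairwise-complete Pearson correlation. It is an illustrative re-implementation rather than the R code used in the analysis, the function names are hypothetical, and the clustering step itself is not reproduced here.

```python
import math


def pearson_pairwise_complete(x, y):
    """Pearson correlation using only positions where both values are present.

    x and y are equal-length lists with None marking uncovered regions.
    Returns None when fewer than two shared regions exist or a variance is zero.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return None
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(cells):
    """Cell-by-cell correlation matrix; cells is a list of per-region level lists."""
    return [[pearson_pairwise_complete(a, b) for b in cells] for a in cells]
```

A matrix of this kind (or a distance derived from it, such as one minus the correlation) is the usual input to hierarchical clustering.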
To compare the chromatin accessibility of proximal NDRs between mouse and human species during preimplantation development, the UCSC liftOver tool was used to convert the mouse (mm9) GCH methylation levels of each site to the human assembly (hg19), which was combined with the genomic coordinates of the mouse proximal NDRs called from oocytes to mouse ESCs. Only proximal NDRs overlapping between human and mouse that were longer than 100 bp and covered at least 3 GCH sites were used to estimate chromatin accessibility in each individual cell. The scCOOL-seq data from mouse preimplantation development were obtained from our previous study6.
Variance of methylation levels
A reported method31 with minor revisions was used to compare DNA methylation variance and chromatin accessibility variance in different regions. Briefly, DNA methylation variance was calculated using a 3,000-bp sliding window with 600-bp steps as previously described31. Only windows with at least three WCG sites covered were retained for further analysis. The lower-bound variance of the chi-square confidence interval of the variance estimator with a confidence level of 0.95 was used to estimate cell-to-cell variance in each region. Only regions covered in at least 30% of the single cells were assessed. Although open chromatin called in the scCOOL-seq method relied on successive highly methylated GCH sites, the mean length of open chromatin was approximately 200 bp. Thus, we used a 200-bp sliding window with a 100-bp step to estimate the variance in chromatin accessibility among single cells along development, and only windows with at least 5 GCH sites covered were counted. For the density plot showing the relationship between DNA methylation variance (1 kb upstream and 0.5 kb downstream of the TSS) and chromatin accessibility variance (200 bp upstream and 100 bp downstream of the TSS) and the relationship between the mean and the variance, the ‘var’ function in R was used.
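The lower bound of the chi-square confidence interval for a variance is (n − 1)s²/χ²(1 − α/2, n − 1), where s² is the sample variance across n cells. The Python sketch below illustrates this calculation under two stated assumptions: the interval is taken to be two-sided (so the 0.975 quantile is used for a confidence level of 0.95), and the chi-square quantile is obtained with the Wilson–Hilferty approximation to avoid external dependencies. The function names are hypothetical.

```python
import math
from statistics import NormalDist, variance


def chi2_quantile_wh(p, df):
    """Approximate chi-square quantile (Wilson-Hilferty approximation)."""
    z = NormalDist().inv_cdf(p)
    a = 2.0 / (9.0 * df)
    return df * (1.0 - a + z * math.sqrt(a)) ** 3


def variance_lower_bound(values, confidence=0.95):
    """Lower bound of the chi-square confidence interval of the sample variance.

    values: per-cell methylation levels in one window (covered cells only).
    """
    n = len(values)
    if n < 2:
        return None
    df = n - 1
    upper_quantile = chi2_quantile_wh(1.0 - (1.0 - confidence) / 2.0, df)
    return df * variance(values) / upper_quantile


def window_starts(region_length, size=3000, step=600):
    """Start coordinates of sliding windows that fit inside a region."""
    return list(range(0, max(region_length - size, 0) + 1, step))
```

The default arguments of window_starts correspond to the DNA methylation windows; size=200 and step=100 would correspond to the chromatin accessibility windows. An exact quantile function from a statistics library could replace the approximation without changing the rest of the sketch.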
Statistics and reproducibility
Statistical significance between two groups was calculated by two-tailed Student’s t-test. NDRs and NORs were called using a sliding-window chi-squared test. Motif enrichment analysis was performed by HOMER. The enriched GO terms were obtained with the GOstats package in R. All error bars represent 1 s.e.m., and the centre represents the mean. In box plots, each box represents the median and the 25% and 75% quantiles, and the whiskers indicate 1.5 times the interquartile range. The number of biological replicates in each stage is shown in Fig. 1a. All the replicates showed similar chromatin accessibility within each stage. The results of all the replicates are shown in the paper.
This research project was reviewed and approved through an Embryo Research Oversight (EMRO) Process performed by the Reproductive Medicine Ethics Committee of Peking University Third Hospital (2012SZ015). Fifteen members constitute the committee, including scientists, advisors who are familiar with the law, experts in ethics and experienced physicians. The scientific value and ethical justification of this study and adverse events of the donors were assessed by the committee. In addition, the donation processes and follow-up manipulation of donated samples were also supervised by the committee. All experimental protocols underwent ethical review by the Reproductive Medicine Ethics Committee of Peking University Third Hospital.
Peking University Third Hospital was in charge of recruiting research donors in this study. All human gametes were collected after receiving written informed consent from the donors. A donor manager was trained by the researchers, and her major responsibility was to explain the details of the research project to the donors in plain words before they gave consent, including the procedure of creating embryos with the donated gamete samples, lysis of the samples and the generation of sequencing data, the benefits and risks of the donation, their right of withdrawal, data confidentiality and the publication of results derived from their donated samples.
All experiments on hESCs complied with the principles laid out in the 2016 guidelines for Stem Cell Research and Clinical Translation issued by the International Society for Stem Cell Research (ISSCR), and experiments on the human embryos were performed under the regulations of the official ethical guidelines for hESC research, which were issued by the Ministry of Science and Technology and Ministry of Health of China in 2003. All of our studies and experiments were conducted in compliance with the laws and policies in China.
Further information on experimental design is available in the Nature Research Reporting Summary linked to this article.
All the analyses were performed with custom Python, Perl and R code based on published software as described in the Methods. All computational code is available upon request from the corresponding authors.
scCOOL-seq data that support the findings of this study have been deposited in the Gene Expression Omnibus (GEO) under the accession code GSE100272. Whole-genome sequencing data have been deposited in the European Genome-phenome Archive (EGA) under the accession code EGAS00001002987. RNA-seq data of human embryos and scCOOL-seq data of mouse embryos were from our previous publication (GSE36552 and GSE78140). ChIP–seq and DNase-seq data were obtained from GSE61475, GSE29611 and GSE32970. All other data supporting the findings of this study are available from the corresponding authors on reasonable request.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This work was supported by grants from the Ministry of Science and Technology of China (2017YFA0102702 and 2017YFA0103402) and the National Natural Science Foundation of China (81521002 and 31625018). F.T. and J.Q. were also supported by a grant from the Beijing Municipal Science and Technology Commission (D151100002415000). This work was also supported by the Beijing Advanced Innovation Center for Genomics at Peking University. Some of the bioinformatics analyses were conducted on the Computing Platform at the Center for Life Science.