Abundant DNA 6mA methylation during early embryogenesis of zebrafish and pig

DNA N6-methyldeoxyadenosine (6mA) is a well-known prokaryotic DNA modification that has been shown to exist and play epigenetic roles in eukaryotic DNA. Here we report that 6mA accumulates up to ∼0.1–0.2% of total deoxyadenosine during early embryogenesis of vertebrates, but diminishes to the background level as embryonic development progresses. During this process a large fraction of 6mA sites are located in repetitive regions of the genome.

Dynamic DNA methylations expand the genetic code beyond the four canonical bases and carry important epigenetic information that can be transmitted among generations of cells or organisms 1. DNA methylations such as N4-methyldeoxycytosine, 5-methyldeoxycytosine (5mC or m5dC) and N6-methyldeoxyadenosine (6mA or m6dA) are known to exist and carry functional roles in living organisms 2. N4-methyldeoxycytosine and 6mA are primarily used by prokaryotes in the restriction-modification system to protect their own genomes from foreign DNA invasion 2, whereas 5mC is best known for its significant epigenetic roles in eukaryotes, especially in mammals and plants 3,4.
Although 6mA and 5mC were discovered at almost the same time a few decades ago, attention to 6mA in eukaryotes has been limited, mostly because of its low abundance in multicellular eukaryotes and technological limitations. In most cells and tissues of vertebrates, including mammals, 6mA is either detected at a level of a few to tens of p.p.m. (parts per million of A) or not detectable at all using modern mass spectrometry techniques 5. The extremely low abundance of 6mA in most organisms, at a level similar to DNA base damage, has in the past cast further doubt on the functional relevance of 6mA. Very recently, we and others reported the discovery and characterization of 6mA in Chlamydomonas reinhardtii 6, Caenorhabditis elegans 7 and Drosophila 8, suggesting its potential epigenetic functions 9,10. In C. reinhardtii, 6mA was found to be located in the AT motif near transcription start sites. It marks active genes and is associated with nucleosome positioning 6. In the genomes of C. elegans and Drosophila, 6mA affects trans-generational inheritance 7 and expression of certain transposons 8, respectively. While we were working on 6mA in vertebrates and mammals, a recent publication reported the detection of 6mA in Xenopus laevis and mammals, albeit at very low levels (0.00009%) 11. A more recent work reported a potential 6mA demethylase in mammals, which still needs to be validated with biochemical and physiological studies 12. The 6mA level reported is again quite low, even after enrichment. This modification was found to be depleted in coding regions and might exist in an AG sequence context. In this work, we characterize DNA 6mA modification in zebrafish and pig. We found that 6mA accumulates to relatively high abundance during early embryogenesis (6mA/A up to ∼0.1–0.2%), but is attenuated to the background p.p.m. level as development progresses. During this process, a substantial set of 6mA sites are located in repetitive regions of the genome.

Results
DNA 6mA modification during early embryogenesis of zebrafish. We have been searching for biological processes during which 6mA accumulates to relatively high abundance. We reason that an abundant accumulation and a dynamic change of 6mA could indicate functional relevance. After quantifying 6mA levels in various cell types and tissues, we noticed that 6mA accumulates to surprisingly high levels during early embryogenesis in vertebrates such as zebrafish and pig. Using zebrafish as a model, we purified genomic DNAs (gDNAs) from sperm, oocytes, early embryos at different time points and adult fish organs, and quantified 6mA using a sensitive ultra-high-performance liquid chromatography coupled with triple-quadrupole tandem mass spectrometry (UHPLC-QQQ-MS/MS) assay with pure 6mA nucleoside as an external standard 6. We devoted significant effort to collecting oocytes and early embryos, to purify sufficient gDNA for LC-MS/MS measurements. On average, we collected over 10,000 cells at each embryo stage and purified gDNA.
The LC-MS/MS result shows that the sperm gDNA contains around 0.003% of 6mA/A, whereas the oocyte contains fivefold higher 6mA than sperm. On zygote formation, the level of 6mA increases and doubles that of the oocyte. Interestingly, starting from the 1-cell zygote, the level of 6mA in DNA further increases along with development and reaches a maximum of ∼0.1% 6mA/A at around the 32-cell to 64-cell embryo stage, corresponding to ∼2 h post fertilization (2 hpf; Fig. 1a and Supplementary Fig. 1a) 13. Afterwards, the 6mA level gradually decreases to around 0.006% at the 512-cell stage and stays at this low level until prim22 (∼36 hpf). We further characterized adult fish organs including the brain, eye, heart, ovary, testis, muscle and intestine; all of them yielded a relatively low ratio of 6mA/A at 0.002–0.004% (Supplementary Fig. 1b). Dot blot assay was used as an independent way to measure 6mA levels at selected zebrafish early development stages (Supplementary Fig. 2), showing results consistent with the LC-MS/MS experiments.
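The percentages quoted above can be put on the p.p.m. scale used elsewhere in the text with a short calculation. This is only an illustrative sketch: the oocyte and zygote values are derived here from the stated fold changes (assumed to mean 5× and 2×), and the function and dictionary names are hypothetical.

```python
def percent_to_ppm(percent):
    """Convert a 6mA/A percentage to parts per million of A."""
    return percent * 1e4

# Values quoted in the text. The oocyte and zygote entries are derived from
# the stated fold changes (assumed 5x and 2x), not independent measurements.
levels_percent = {
    "sperm": 0.003,
    "oocyte": 0.003 * 5,
    "zygote": 0.003 * 5 * 2,
    "32-64 cell peak": 0.1,
    "512 cell": 0.006,
}

for stage, pct in levels_percent.items():
    fold = pct / levels_percent["sperm"]
    print(f"{stage}: {pct:.3f}% = {percent_to_ppm(pct):.0f} p.p.m., {fold:.1f}x sperm")
```

On this scale the sperm level corresponds to tens of p.p.m., whereas the embryonic peak is roughly a thousand p.p.m.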
Immunostaining of 6mA in zebrafish embryonic gDNAs. To exclude the possibility that the detected LC-MS/MS signal is due to foreign 6mA contamination, we applied immunofluorescence staining of the embryos 14 with an anti-6mA-specific antibody (after treating with RNase A to digest and wash away all RNAs) for 6mA detection in gDNA. Very weak fluorescence was observed from the sperm staining, while oocyte staining displayed stronger signals (Fig. 1b and Supplementary Fig. 3a). Starting from the zygote stage, the nuclear immunofluorescence of embryos increases significantly along with development, reaching its maximum at about the 32-cell to 64-cell stage (Fig. 1b). Afterwards, the signal decreases as development progresses. These observations are consistent with the LC-MS/MS result. After treating with DNase, the immunofluorescence signal disappeared (Supplementary Fig. 3b). Together, the LC-MS/MS, dot blot assay and immunostaining results confirm the existence of abundant 6mA during early embryogenesis of zebrafish. Mammalian cells and tissues tend to contain a few percent of 5mC and 0.01–1% 5-hydroxymethylcytosine (5hmC) in gDNA. Our observation of 6mA during early embryogenesis represents the first example in which an abundant 6mA level comparable to those of 5mC and 5hmC is observed in vertebrates. The dynamic changes suggest functional roles of 6mA during the process. To further substantiate this observation in mammals, we studied early embryogenesis in pig.
Quantification of DNA 6mA level in pig embryos. It is much harder to obtain sufficient pig oocytes and embryos. The currently available way to obtain early embryos is in vitro fertilization followed by culturing to different embryonic stages 15; however, this method is restricted by low fertilization rates. Owing to this technical limitation, the successfully fertilized oocyte can only be cultured to, at most, the blastocyst stage. To have enough gDNA for LC-MS/MS and immunostaining experiments, we devoted significant effort to collecting sufficient material. Besides pig sperm and oocytes, we selected the 4-cell, morula and blastocyst stages (∼2,500 oocytes, ∼1,800 4-cell embryos, ∼900 morulae and ∼500 blastocysts) and purified gDNAs for LC-MS/MS measurement (Fig. 1c). Similar to zebrafish, the 6mA/A ratio in oocyte (0.09%) is approximately six times higher than that of sperm. This ratio rises to ∼0.17% from the 4-cell to the morula stage and then decreases to 0.05% at the blastocyst stage. The corresponding immunostaining results are consistent with the LC-MS/MS measurements (Fig. 1d and Supplementary Fig. 3c), which further supports the presence of abundant 6mA. We also checked the 6mA level in the gDNAs of various adult pig tissues and observed only low p.p.m. levels (Supplementary Fig. 1c).
Distribution of 6mA in zebrafish embryonic gDNAs. Next, we studied the genomic distribution of 6mA in selected systems. Antibody-based (SYSY 6mA antibody) immunoprecipitation of 6mA-containing DNA followed by next-generation high-throughput sequencing (6mA-IP-seq) 6 was used to map the genomic distribution patterns of 6mA. 6mA-IP-seq generally requires micrograms of gDNA because of the unavoidable loss of material in the many purification steps during the construction of sequencing libraries. A large number of embryos is therefore required to obtain sufficient gDNA, in particular at the early stages. We collected 5,000 zebrafish embryos at the 64-cell stage (∼1 µg gDNA isolated), in which 6mA accumulates to the highest level, as well as gDNAs from later stages including 11 hpf, 12 hpf and 13 hpf, and performed 6mA-IP-seq. Owing to the difficulty of acquiring enough gDNA from pig embryos, we were unable to sequence pig samples at the current stage.
We performed bioinformatic analysis of the sequencing data generated from the zebrafish genome. By comparing 6mA-IP with input data, we identified ∼57,000 to ∼3,300 potential 6mA peaks in samples of 64C, 11 hpf, 12 hpf and 13 hpf (Supplementary Fig. 4a), among which the 64C samples possess the most peaks, consistent with the 6mA abundance measurements. These samples show a peak overlapping ratio of 41–50% across different time points (Supplementary Fig. 4b). In general, the 6mA peaks are distributed widely across the genome with slight enrichment after the transcription start site (TSS; Supplementary Fig. 5), and they were found to be enriched at exons, but not at introns or intergenic regions (Supplementary Fig. 6), which is different from the exon depletion observed in the X. laevis testis genome 11. Specifically, we found that ∼78–81% of the peaks are located in repetitive elements (REs), which suggests a close relationship between 6mA and REs (Fig. 2a and Supplementary Fig. 7a). Interestingly, 6mA peaks in simple repeat regions account for 37–45% of the total peaks. We further divided the RE regions into subtypes based on RepeatMasker 16. Compared with the background constitution of RE regions in the genome, 6mA peaks are enriched in the simple repeat, RC/Helitron (rolling-circle transposon), LTR (long terminal repeat retrotransposon), LINE/L1 (long interspersed element) and DNA/Maverick (virus-like DNA transposon) classes of REs (Fig. 2b).
Consensus motifs containing 6mA. To search for sequence preference of 6mA, we performed motif analysis on 6mA peak regions. Considering the sequence diversity between RE and non-RE regions, we divided peaks into three groups: simple repeat regions, other RE regions apart from simple repeats and regions not related to REs (non-RE). Interestingly, we obtained significant but also diverse motifs for each group.
For instance, at the 64C stage the simple repeat regions, other RE regions and non-RE regions are mainly featured with 5′-CACACACA-3′, 5′-CCTAGC-3′ and 5′-CAGCAG-3′ motifs, respectively (Fig. 2c). Among the simple repeats, the tandem CA motif is the most prevalent, followed by tandem 5′-TCCA-3′ and 5′-CCAA-3′ motifs. The most significant motif sequences recurred in all sequencing samples (Supplementary Fig. 7c) and are different from those observed in a recent study 11.
6mA-IP-seq by using different anti-6mA antibody. To avoid potential sequencing bias introduced by the antibody, we chose another 6mA antibody (Abcam) to perform independent 6mA-IP-seq for the 64C stage and found a 6mA distribution pattern and peak motifs consistent with those described above using the SYSY antibody (Supplementary Fig. 8). Simple repeat regions are again significantly enriched with 6mA, accounting for ∼53% of peaks across the genome. These two antibodies show slightly different affinities to 6mA, with the Abcam antibody possessing a relatively lower affinity but slightly higher selectivity (Supplementary Fig. 9); the use of the Abcam antibody appears to further enrich 6mA-abundant DNA segments. This result further suggests that simple repeat regions are more enriched with 6mA modification.
Selected 6mA-containing targets. To probe the correlation between the expression of REs enriched with 6mA and the timing of accumulation, we selected a few targets and used quantitative reverse transcription PCR to quantify their transcript expression levels at various developmental stages (Supplementary Fig. 10). Among the selected targets, Thy1 (LTR/Gypsy), Parp4 (LTR/Gypsy) and Hel1 (RC/Helitron) all showed a trend of reduced expression as development progressed. These data suggest that the expression of REs might be positively correlated with the 6mA abundance.

Discussion
6mA is a well-studied prokaryotic DNA marker. In Escherichia coli, 6mA is present in the palindromic 5′-GATC-3′ motif and participates in the control of chromosome replication, nucleoid segregation, mismatch repair and transcription regulation 17. 6mA is also present in eukaryotes and appears to have functional roles 10. We show here that 6mA can accumulate to ∼0.1–0.2% of 6mA/A during early embryogenesis in zebrafish and pig. Levels of 6mA in oocytes are severalfold higher than those in sperm (p.p.m. level) and, on zygote formation, the abundance of 6mA increases (6mA/A up to ∼0.1–0.2%) and then decreases (back to the p.p.m. level) as embryonic development progresses. 6mA is preferentially enriched in REs, whose expression might be positively correlated with the 6mA abundance. The highest abundance of 6mA observed approaches the level of 5hmC in most mammalian tissues. Future studies should reveal how 6mA may correlate with 5mC reprogramming in mammals 18, the role of 6mA in REs 12 and other regions (with motifs such as 5′-CACACACA-3′, 5′-CCTAGC-3′ and 5′-CAGCAG-3′), the enzymes that install and/or erase 6mA, and potential 'reader' proteins that may bind 6mA to mediate biological functions.

Methods
Zebrafish strains and husbandry. Zebrafish (Danio rerio) embryos were derived from the Tübingen zebrafish lines. Embryos were incubated in Holtfreter's solution at 28.5°C.
Collection of early embryos and isolation of their gDNAs. Sperm were squeezed out of anaesthetized males and oocytes were squeezed out of anaesthetized females (the oocytes were activated when they were squeezed into Holtfreter's solution; the polar bodies were extruded from the egg surface following egg activation). The fertilized embryos were grown in Holtfreter's solution at 28.5°C and were staged according to standard morphological criteria. The oocytes and embryos at different stages were frozen in liquid nitrogen until enough embryos were collected for DNA extraction. Sperm were squeezed directly into lysis solution (100 mM Tris-HCl pH 8.5, 200 mM NaCl, 5 mM EDTA, 0.2% SDS and 50 µg ml−1 proteinase K) for gDNA extraction. For oocytes and early embryos, 600 µl of lysis solution was added to each aliquot of 300 dechorionated embryos. The solution was incubated in a rotation oven at 55°C overnight. An equal volume of phenol/chloroform (pH 8.0) was added for extraction, and the mixture was vortexed briefly and spun for 10 min in a microfuge at 16,000g. To the collected aqueous solution, 2 µl of RNase A (10 mg ml−1) was added and the resultant solution was incubated in a thermomixer at 37°C for an additional 3 h. Phenol/chloroform extraction and DNA precipitation with 1/10 volume of 5 M NaCl and 1 volume of isopropanol were then performed. The obtained DNA pellet was dissolved in 3dH2O.
Pig oocyte collection and in vitro maturation. Ovaries of prepubertal gilts were collected in a local abattoir and transported to the laboratory at 37°C in a 0.9% NaCl solution supplemented with kanamycin and penicillin at 30–35°C within 3 h. Antral follicles of 3–6 mm were aspirated to collect the cumulus-oocyte complexes (COCs) using a syringe. After washing three times with PVA-TL-HEPES medium, the oocytes were screened for those that had a multilayered compact cumulus and a homogeneous ooplasm. The oocytes were washed twice again with TCM-199 medium, which had been equilibrated for 3 h at 38.5°C under 5% CO2. Subsequently, the COCs were matured in TCM-199 medium without hormones at 38.5°C in a humidified atmosphere of 5% CO2 for 42–44 h. Then, the COCs were washed in the working solution (0.03 g hyaluronidase, 5.46 g mannitol, 0.001 g BSA, 5 ml PVA-TL-HEPES and 95 ml ddH2O) to remove the cumulus cells, and the mature oocytes were picked out for storage at −80°C.
In vitro fertilization. After maturation, the COCs were washed gently with 2 ml fresh IVF medium, to remove cumulus cells. After washing, denuded oocytes were placed in 50 µl drops of IVF medium covered with mineral oil. The oocytes were incubated at 38.5°C in an atmosphere of 5% CO2 for 30 min until the spermatozoa were added. The sperm were washed twice with DPBS (136.89 mM NaCl, 2.68 mM KCl, 8.1 mM Na2HPO4 and 1.46 mM CaCl2·2H2O) at 12,000 g min−1 for 30 s and the resulting pellet was suspended in the IVF medium. After appropriate dilution, 50 µl of this sperm suspension was added to a 50 µl drop of IVF medium containing the oocytes at an oocyte:spermatozoa ratio of 1:4,000. The oocytes were incubated with the spermatozoa in IVF medium for 6 h at 38.5°C under 5% CO2. Next, they were transferred into 500 µl PZM-3 medium and cultured for another 2 days to the 4-cell stage, 4 days to the morula stage and 7 days to the blastocyst stage, and the samples were stored at −80°C for further assays.
Media used in pig oocyte maturation and fertilization. The related medium recipes are shown as follows.
Immunostaining and confocal imaging. For 6mA immunostaining, the zebrafish embryos were fixed in 4% paraformaldehyde at 4°C overnight. Afterwards, the embryos were dechorionated manually, dehydrated with methanol and rehydrated with PBS, and then permeabilized for 15 min with PBS containing 1% Triton X-100. The permeabilized embryos were denatured with 2 M HCl for 1 h and then neutralized with 100 mM Tris-HCl (pH 8.5) for 20 min. The embryos were washed and then incubated for 1 h with blocking buffer (1.5% BSA in PBS containing 0.3% Tween and 50 µg ml−1 RNase A for RNA digestion). After that, the embryos were incubated with 6mA antibody (1:1,000, SYSY antibody) at 4°C overnight. After three washes with PBS, the embryos were incubated with the secondary antibody at a 1:2,000 dilution (goat anti-rabbit Alexa488, Molecular Probes). Images were acquired using Nikon software on a Nikon Eclipse TI microscope.
For pig sperm, oocyte and embryo immunostaining, the published protocol was followed 19. The gametes and embryos were washed with PBS, fixed with 4% paraformaldehyde in PBS for 15 min, and permeabilized with 0.1% Triton X-100 in PBS at room temperature for 15 min. The permeabilized gametes and embryos were incubated in 4 N HCl solution at room temperature for 15 min, followed by neutralization in 100 mM Tris-Cl, pH 8.0 for 30 min. After blocking and incubation overnight with anti-6mA antibody, they were washed and incubated with secondary antibody. Nuclei were stained with DAPI. Immunofluorescence was visualized using a confocal laser scanning microscope.
6mA dot blot assay. DNA samples were denatured by heating at 98°C for 10 min and spotted on nitrocellulose membranes (GE Healthcare, catalogue number RPN303B). The membrane was baked at 80°C for 1 h and cross-linked by ultraviolet irradiation. The membrane was then blocked in 5% BSA in PBS containing 0.5% Tween 20 (PBST) for 1.5 h at room temperature and incubated with a 1:10,000 dilution of 6mA antibody (SYSY) overnight at 4°C. After three washes with PBST, the membrane was incubated with a 1:5,000 dilution of horseradish peroxidase-conjugated anti-rabbit secondary antibody. The membrane was then washed with PBST and treated with enhanced chemiluminescence (ECL).
To estimate the level of 6mA in zebrafish gDNA at different stages, a synthetic 6mA-containing DNA oligonucleotide was diluted with unmethylated oligonucleotide to generate standards with a gradient of 6mA content (0.2, 0.1, 0.05, 0.025, 0.0125 and 0.00625%). All samples and standards were loaded in the same amount. Among the samples we tested, gDNAs at the 64-cell, 256-cell, sphere, 13 hpf and 24 hpf stages have 6mA levels of around 0.05, 0.01, 0.004, 0.002 and 0.002%, respectively, which is consistent with the LC-MS/MS observation.
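The standard gradient listed above is a twofold serial dilution starting from 0.2%. A minimal sketch of how such a series can be generated, and how a sample estimate can be placed between two neighbouring standards, is given below; the function names are hypothetical and the bracketing step is an illustration rather than the procedure used in this study.

```python
def twofold_series(top_percent, n_points):
    """Return n_points standards, each half the previous, starting at top_percent."""
    return [top_percent / (2 ** i) for i in range(n_points)]

def bracket(sample_percent, standards):
    """Return the (lower, upper) pair of standards enclosing the sample,
    or None when the sample lies outside the range of the standards."""
    ordered = sorted(standards)
    for low, high in zip(ordered, ordered[1:]):
        if low <= sample_percent <= high:
            return (low, high)
    return None

standards = twofold_series(0.2, 6)   # 0.2% down to 0.00625%
print(standards)
print(bracket(0.01, standards))      # a value within the standard range
print(bracket(0.004, standards))     # a value below the lowest standard
```

In this sketch a value below the lowest standard returns `None`, which is a reminder that estimates under 0.00625% fall outside the calibrated range.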
6mA-containing DNA oligo: 5′-TTGCT(6mA)GGTGGTTGCT(6mA)GGCGGTTGCT(6mA)GGGT-3′
DNA 6mA-IP-seq and bioinformatic analysis. The published protocol of DNA 6mA-IP-seq was applied 6. Fifty-base-pair single-end sequencing was performed on the Illumina HiSeq2500 platform. Raw sequence reads were mapped to the reference genome (zv10/danRer10 for zebrafish) using Bowtie v1.0.1 (ref. 20) with the parameter -M 1, which randomly selects one best match if multiple alignments are found. MACS 21 software was then used to identify the enriched regions (6mA peaks) by comparing reads from the IP sample with those from the input sample. The false discovery rate cutoff was set to 0.01, to select statistically significant peaks. After these peak regions were obtained, the genomic loci were compared with the coordinates of repeat elements downloaded from the RepeatMasker database 22. If more than half the length of a peak region overlapped with one annotated repeat element, the peak was annotated as a repeat-originated peak. Gene annotation information was downloaded from the UCSC database (http://hgdownload.soe.ucsc.edu/goldenPath/danRer10/database/). The enrichment of peaks distributed in each genomic region was calculated by HOMER (Hypergeometric Optimization of Motif EnRichment) software 23.
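The repeat-annotation rule described above (a peak counts as repeat-originated when more than half of its length overlaps a single annotated repeat element) can be sketched as follows. This is an illustrative reimplementation, not the authors' script: the tuple layouts, half-open coordinates and function names are assumptions, and the linear scan would be replaced by an interval index for genome-scale data.

```python
def overlap_len(a_start, a_end, b_start, b_end):
    """Length of the overlap between two half-open intervals (0 if disjoint)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))

def annotate_peak(peak, repeats):
    """Return the class of the first repeat covering more than half of the peak.

    peak    : (chrom, start, end), half-open coordinates (assumed layout)
    repeats : iterable of (chrom, start, end, repeat_class) (assumed layout)
    Returns None when no single repeat covers more than half of the peak.
    """
    chrom, start, end = peak
    peak_length = end - start
    for r_chrom, r_start, r_end, r_class in repeats:
        if r_chrom != chrom:
            continue
        # "More than half" is strict: exactly 50% overlap does not qualify.
        if 2 * overlap_len(start, end, r_start, r_end) > peak_length:
            return r_class
    return None

# Toy coordinates, invented for illustration only.
repeats = [("chr1", 140, 300, "Simple_repeat"), ("chr2", 0, 500, "LTR/Gypsy")]
print(annotate_peak(("chr1", 100, 200), repeats))
print(annotate_peak(("chr1", 50, 150), repeats))
```

Requiring the overlap to come from one element, rather than summing over several, follows the wording of the rule in the text.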
Primers for 6mA-containing REs. See Supplementary Fig. 10.
Data availability. The data that support the findings of this study are available from the corresponding author upon request.