LINE-1 transcription in round spermatids is associated with accretion of 5-carboxylcytosine in their open reading frames

Chromatin of male and female gametes undergoes a number of reprogramming events during the transition from germ cell to embryonic developmental programs. Although the rearrangement of DNA methylation patterns occurring in the zygote has been extensively characterized, little is known about the dynamics of DNA modifications during spermatid maturation. Here, we demonstrate that the dynamics of 5-carboxylcytosine (5caC) correlate with active transcription of LINE-1 retroelements during murine spermiogenesis. We show that the open reading frames of active and evolutionary young LINE-1s are 5caC-enriched in round spermatids and 5caC is eliminated from LINE-1s and spermiogenesis-specific genes during spermatid maturation, being simultaneously retained at promoters and introns of developmental genes. Our results reveal an association of 5caC with activity of LINE-1 retrotransposons suggesting a potential direct role for this DNA modification in fine regulation of their transcription. Blythe et al investigate the patterns of oxidized forms of the 5-methylcytosine DNA modification during spermatogenesis. They find that open reading frames of evolutionarily young and transcriptionally active LINE-1 retrotransposons are enriched in 5-carboxylcytosine (5caC) during spermatid maturation, suggesting a link between 5cac and LINE-1 activity.

T he chromatin of both maternal and paternal pronuclei of the mammalian zygote undergoes extensive genome-wide reprogramming after fertilization, as the embryo transitions from germ cell to somatic developmental programs 1 . This process involves reorganization of the patterns of 5methylcytosine (5mC), a DNA modification associated with regulation of gene activity 1,2 and repression of transposable elements (TEs) 3 . Moreover, both maternal and paternal genomes are subjected to TET3-dependent oxidation of 5mC to 5hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC), and 5carboxylcytosine (5caC) in one-cell embryos [4][5][6][7] . Although the precise biological functions of these oxidized forms of 5mC remain elusive, they likely contribute to the embryonic developmental program due to their anticipated roles in DNA demethylation 8 and transcriptional regulation [9][10][11] .
In contrast with a large volume of experimental data on the rearrangement of DNA methylation patterns in pre-implantation embryos, little is known about the dynamics of DNA modifications during spermatid maturation. Nevertheless, a number of studies imply different epigenetic states for round spermatids (rST) and spermatozoa (SZ) nuclei 12,13 . Thus, the rate of successful development of the embryos produced by rST injection into oocytes (ROSI) is substantially lower than that of the embryos generated by injection using mature spermatozoa (intracytoplasmic sperm injection, ICSI). Moreover, ROSIderived embryos exhibit aberrant patterns of zygotic DNA methylation 13 and impaired zygotic demethylation 12 .
Since, 5fC/5caC can be recognized and excised from DNA by thymine-DNA glycosylase (TDG) followed by subsequent incorporation of unmodified cytosine into the abasic site by the components of the base excision repair (BER) pathway 8 , transient accumulation of these marks during differentiation may serve as an indicator of active demethylation 14,15 .
In this study, we aimed to examine whether TDG/BERdependent demethylation is utilized in spermatogenesis and to determine the patterns of the genomic distribution of oxidized forms of 5mC during spermatid maturation.

Results
The levels of 5caC substantially decrease in testis germ cells during spermatid maturation. To determine the levels of oxidized forms of 5mC in testicular cells, we initially used a method of sensitive immunostaining we previously employed for the detection of 5hmC and 5caC 14,15 . Whereas both germ and somatic cells of the murine testis displayed strong 5mC staining, 5caC was detectable only in germ cells, exhibiting particularly high levels of the signal in rST ( Fig. 1a and Supplementary  Fig. 1a). We next confirmed the specificity of 5caC staining in competition experiments on testis tissue sections ( Supplementary  Fig. 1b). Correspondingly, despite obvious detection of 5hmC, we did not observe a 5caC signal in the testes of two mouse models devoid of germ cells: busulfan treated 16 (busulfan destroys spermatogenic stem cells leading to a lack of any type of germ cells); and adult GILZ knockout 17 mice (where seminiferous tubules also contain only somatic cells) (Supplementary Fig. 1c). Moreover, the 5caC staining was not detectable in the testes of P4 (postnatal day) mice, lacking germ cells of any type other than spermatogonia, but was obvious in spermatocytes at P14 (Supplementary Fig. 1d). Remarkably, although 5hmC and 5mC signals were relatively high in both rST and SZ, the 5caC signal intensity dropped markedly between these two stages ( Fig. 1a and Supplementary Fig. 1e, f). In addition, the patterns of nuclear distribution of 5hmC and 5caC were not identical in rST, suggesting that the generation of these two marks is regulated independently in these cells ( Supplementary Fig. 1g, h).
To test if the dynamics of 5caC during spermatid maturation are associated with global DNA demethylation, we performed mass spectrometry (MS) detection of 5mC, 5hmC, 5fC, and 5caC in rST, and SZ ( Fig. 1b and Supplementary Fig. 2). Although in agreement with a previous study 18 , we could not detect any substantial differences in 5mC content between these stages of spermatogenesis, its oxidized derivatives exhibited different dynamics in our MS experiments (Fig. 1b). Indeed, a 30 percent reduction in the levels of 5hmC was accompanied by a marked 5fold decrease in 5caC content between rST and SZ, although no changes were detected in 5fC (Fig. 1b), at least under our experimental conditions. Intriguingly, although the MS results confirmed our immunostaining data, the analyses of publicly available RNA expression datasets [18][19][20][21][22][23][24] revealed that contrasting with Dnmt1/3a/3b/L and Tet1/2/3 transcripts, Tdg mRNA was not expressed at any appreciable levels in the analyzed testis cell types ( Fig. 1c and Supplementary Fig. 3), suggesting that TDG/ BER-dependent demethylation is not responsible for the elimination of 5caC during spermatid maturation.
The patterns of 5caC genomic distribution are highly dynamic during spermiogenesis. Since 5caC exhibited specific dynamics during spermatid maturation, we next profiled the genomic distributions of 5caC, 5mC, and 5hmC (referred to together as mod-Cs) in purified rST ( Supplementary Fig. 4) and SZ using DNA immunoprecipitation (DIP) coupled with high-throughput DNA sequencing 25 . After mapping reads to the mouse genome (Supplementary Fig. 5a), we found that the distribution of the densities of mod-Cs across CpG islands, transcription start sites, and most of the regulatory sequences previously identified in mouse testis 26,27 , followed the same pattern in rST and SZ (Supplementary Fig. 5b, c).
To determine the genomic regions enriched in mod-Cs, we performed calling of highly methylated (modified) regions (peaks) 28 , followed by identification of the confident peaks for each sample by comparing the corresponding replicates ( Fig. 2a-d and Supplementary Fig. 6a-e, Supplementary Data 1). Consistent with our MS and immunostaining results, the distribution of 5caC confident peaks in rST and SZ was highly dynamic and differed for peaks localized in repetitive and non-repetitive sequences, respectively (Fig. 2b, c and Supplementary Fig. 6a, b). Although the sequence-space occupied by 5caC peaks in non-repetitive sequences increased between rST and SZ ( Supplementary Fig. 6a), the majority of the 5caC peaks associated with Transposable Elements (TEs) in rST were not identified in SZ (Fig. 2b). Moreover, while most of the mod-C peaks overlapped with each other in SZ, a large proportion of the 5caC peaks coincided neither with 5mC nor 5hmC peaks in rST (Fig. 2c). Importantly, the majority of 5caC rST peaks did not correspond to 5mC or 5hmC enriched regions in SZ, suggesting that elimination of 5caC from DNA leads to the generation of unmodified cytosine during the rST to SZ transition ( Supplementary Fig. 6c). These analyses also revealed that the majority of the 5caC peaks were associated with introns and Long INterspersed Element class-1 (LINE-1 or L1) retrotransposons at the rST stage, but were distributed more evenly between different gene features and classes of repetitive sequences in SZ ( Supplementary Fig. 6d, e). Notably, during spermiogenesis, the numbers of 5caC peaks dropped in introns, LINE-1 retrotransposons, and in two classes of Short INterspersed Elements (SINEs) (B1 or Alu-like and B2 elements, Fig. 2d). Importantly, in agreement with our immunostaining and MSbased results (Fig. 1), general 5caC reads density markedly decreased over the majority of the 5caC peaks between rST and SZ stages, whereas the density of 5mC reads did not considerably change at the 5mC peaks (Fig. 3a, b). In summary, these data demonstrated that 5caC patterns are dynamic during spermiogenesis, and that this dynamism is specific for particular classes of genomic sequences.
5caC-enriched regions are eliminated from LINE-1 retrotransposons and LINE-1-associated spermiogenesis-specific genes and retained at developmental genes during spermatid maturation. To gain insight into the biological significance of the observed dynamics of mod-Cs, we next identified genes containing peaks of mod-Cs in rST or SZ (Supplementary Data 1). Notably, the majority of 5caC and 5hmC peak-containing genes (4311 and 1962, respectively) also displayed 5mC peaks in SZ, contrasting with a substantial number of genes (n = 727) containing only 5caC but not 5hmC or 5mC peaks at the rST stage (Fig. 4a). Using previously published datasets 18,19 , we next performed clustering analysis of the genes containing 5caC peaks, genes encompassing only 5caC but not 5hmC/5mC peaks, and genes containing only 5mC but not 5hmC/5caC peaks, according to their expression in different testis cell types ( Supplementary   Fig. 6f). Although we could not detect any significant differences in the representation of the various clusters between 5caC peakcontaining and 5mC only peak-containing genes in SZ, the genes with spermiogenesis-specific patterns of expression (clusters 7 and 10) were significantly enriched in the 5caC peak-containing than the 5mC only peak-containing category in rST (Fig. 4b, c). In contrast, genes transcriptionally silent during spermatid maturation but expressed at earlier stages of spermatogenesis ( Supplementary Fig. 6f, cluster 11), were enriched among 5mC only peak-containing genes in rST (Fig. 4b). Subsequent gene ontology (GO) analysis demonstrated that genes associated with "developmental process," "biological regulation" and "cellular process" categories acquired 5caC peaks between rST and SZ (Fig. 4d). Moreover, we observed retention of 5caC peaks in genes associated with morphogenesis of various anatomical structures, but not with reproduction-linked developmental processes during the rST/SZ transition (Fig. 4e). It is important to note that due to extremely low absolute levels of 5caC in sperm DNA (Figs. 1e and 3a), these 5caC peaks are unlikely to reflect the accumulation of this modification at the corresponding loci and more probably indicate that 5caC is retained at developmental genes in SZ. Interestingly, the genes containing only 5caC peaks in rST were enriched in "sperm-capacitation" and "oxidation reduction" GO categories, reportedly important for spermiogenesis 29 (Fig. 4f). To validate our results by an independent method, we identified differentially methylated regions (DMRs) 30 at the rST/SZ transition (Supplementary Data 2). We detected 12,539 DMRs that "gain" (rST↗SZ) and 2,874 DMRs that "loose" (rST↘SZ) 5caC in SZ cells (Supplementary Data 2). Importantly, as the density of 5caC reads was substantially decreasing between rST and SZ ( Fig. 3a, b), the 5caC enrichment at most of the DMRs "gaining 5caC" at SZ was likely extremely low. Next, we identified genes associated with both classes of 5caC DMRs (Supplementary Data 2). Notably, the majority of the rST↘SZ 5caC DMR-containing genes encompassed LINE-1s in their gene bodies or promoter regions, and corresponded to genes containing 5caC peaks in rST (Fig. 5a, b). In line with this, the 5caC density considerably dropped across the bodies of LINE-1 elements between rST and SZ stages at the same time (Fig. 5c). Moreover, similar to the cluster of expression enriched in 5caC only peak-containing genes (Fig. 5c), the expression of both LINE-1s and rST↘SZ 5caC DMR-containing genes reaches a maximum at the rST stage during spermatogenesis ( Fig. 5d-f).
The genes encompassing rST↘SZ 5caC DMRs were enriched in "signaling" and "biological adhesion" GO categories and included well-characterized spermiogenesis-specific genes located on the Y-chromosome such as Ssty2 and Sly 31 , which both contain several LINE-1 copies 32 (Fig. 5g). In contrast, confirming our peak analysis-based results (Fig. 4d), rST↗SZ 5caC DMRcontaining genes were associated with a wide range of "developmental process"-related GO categories (Fig. 5h). Thus, using both peak-based and DMR-based approaches, we identified an association of 5caC with LINE-1s and LINE-1-containing spermiogenesis genes in rST and with developmental genes in SZ.
Transcriptionally active and evolutionary young LINE-1 elements are enriched in 5caC in round spermatids. Since we identified an association of 5caC with LINE-1 elements during spermiogenesis, we next enquired how this modification is associated with the evolutionary age and transcriptional activity of these retrotransposons. Several TE types continue to impact the mouse genome, including SINE, LINE-1, and Endogenous Retrovirus (ERV) retrotransposons (reviewed in ref. 33 ). Active LINE-1 retrotransposons replicate in genomes using a copy-andpaste mechanism and have amplified to astonishing numbers in   the mouse genome, comprising at least 20% of its genomic mass 34 . Although different LINE-1 subfamilies can be identified within the mouse genome 34 , only three evolutionary young subfamilies of these retrotransposons continue to impact the murine genome: L1Md_G f 35 , L1Md_T f 36 , and L1Md_A-type 37 LINE-1s (reviewed in ref. 29 ). Based on two previous studies 38, 39 , we categorized murine TEs according to their evolutionary age and determined the degree of 5caC and 5mC enrichment for each of the obtained TE classes. This analysis revealed that the levels of 5caC enrichment of LINE-1 and SINE elements in rST showed a strong inverse correlation with the evolutionary age of these elements (Fig. 6a). In contrast with 5caC, we did not observe a comparable association of 5mC content with the age of TEs (Fig. 6b). Thus, a number of young LINE-1s exhibited moderate or low levels of methylation in round spermatids (Fig. 6b). Importantly, the majority of evolutionary young, originated <2 or 2-5 million years ago (MYA), LINE-1s displayed very substantial degrees of 5caC enrichment in rST (Fig. 6c) and most of them were highly expressed in these cells, as revealed after exploring previously published RNA-seq data set 18,19 (Fig. 6d). Conversely, we noticed that currently active LINE-1s were considerably enriched in 5caC and highly expressed in rST ( Supplementary  Fig. 7). Consistently, lower levels of 5caC enrichment were detected in evolutionarily older LINE-1s regardless of the levels of their transcriptional activity (Fig. 6c, d). Next, we compared the distribution of 5caC-, 5mC-DIP-, and whole-transcriptome sequencing reads in the consensus sequences of currently active mouse LINE-1s, focusing on their ORFs and 5′UTRs (Supplementary Fig. 7). These analyses revealed that the promoter regions (i.e., 5′UTR) of evolutionarily young and active LINE-1 retrotransposons (L1Md_A, L1Md_G f , and L1Md_T f ) are substantially depleted of 5caC compared with the ORFs of these retroelements ( Supplementary Fig. 7). Remarkably, we did not observe a similar depletion in 5mC in the 5′UTRs of L1Md_A elements ( Supplementary Fig. 7).
In summary, our analyses demonstrated that ORFs of evolutionarily young and transcriptionally active LINE-1s are considerably enriched in 5caC during spermatid maturation.

Discussion
A number of recent studies suggest that, in addition to their roles as intermediates in active demethylation pathway, both 5fC and 5caC may also function as informative epigenetic marks [40][41][42] . 5fC has been shown to be associated with specific sets of regulatory genomic sequences 43 , and both 5fC and 5caC have been reported to interact with specific groups of candidate "reader" proteins in MS-based proteomics experiments 40 . Most importantly, compared with relatively small numbers of putative 5hmC-binding proteins, potential 5fC and 5caC readers include rather long lists of transcription factors, chromatin remodelers, and histonemodifying enzymes 40,44 . As our results suggest an association of 5caC with actively transcribed loci and show that, in contrast with those of 5mC and 5hmC, 5caC patterns are extremely dynamic during spermiogenesis, our data, collectively, contribute to the emerging body of experimental evidence suggesting a specific function for 5caC in the regulation of gene expression.
Given that reorganization of paternal patterns of 5caC is undergoing during maturation of spermatids, and that Tdg is extremely poorly expressed in the mouse testis, our data support the existence of a TDG-independent mechanism of active demethylation, and are in line with previous studies implying that such a mechanism is operative in pre-implantation embryos 6,7 . Interestingly, although it seems unlikely that DNA glycosylases other than TDG contribute to active demethylation as their knockouts do not appear to affect developmental capacity 45 , an unidentified 5caC-specific decarboxylase activity has been detected in mouse ESCs on the basis of isotope tracing 46 . Furthermore, since mammalian de novo DNA methyltransferases DNMT3A and DNMT3B have been reported to convert 5mC and 5hmC to unmodified cytosines in vitro 47,48 , these enzymes may potentially possess DNA decarboxylase activity in specific chromatin microenvironments as well 49 .
Despite the fact that 5fC/5caC are immunochemically detectable in one-cell embryos, a recent study demonstrated that the paternal germline-specific knockout of all three TET proteins does not affect early embryogenesis 50 , implying either dispensability of the spermatogenesis-specific 5mC oxidation for the Genes with 5hmC peaks ~20% of the mouse 52 and human 53 genomes, and are actively transcribed during germline and early embryonic development in most mammalian species [54][55][56] , generating new insertions in these cell types 57,58 . Although the transcriptional activity of LINE-1s has long been regarded as a side effect of chromatin remodeling taking place at these developmental stages, a number of recent studies suggest that activation of these retrotransposons is essential for regulating global chromatin accessibility and activity of their host genes during normal development [59][60][61] . This implies the existence of a complex interplay between the activity of TEs and host gene expression, and suggests that transcription of retrotransposons needs to be finely tuned in both the germline and the early embryo [61][62][63] . Given that our analysis demonstrates an association between 5caC and transcriptionally active LINE-1s in rSTs, we speculate that the oxidation of 5mC to 5caC may contribute to the delicate regulation of activity of LINE-1 elements and LINE-1-linked genes in these cells. As both 5fC and 5caC have been shown to decrease the rate and substrate specificity of RNA polymerase II transcription and retard transcript elongation on gene bodies 64,65 , these modifications may directly and specifically reduce the transcription rate of active, evolutionarily young LINE-1s, "correcting" the levels of their activity during spermiogenesis. Indeed, together with the presence of LINE-1associated 5caC-enriched regions in the introns of genes essential for transposon repression, such as piRNA pathway gene Maelstrom (Mael) 66,67 , our data add an additional level of complexity to the potential role of this DNA modification in the regulation of LINE-1 elements. In summary, our results suggest that 5caC may be an integral part of an intricate regulatory network governing the activity of LINE-1 retrotransposons during mammalian development, based on finely adjusting the levels of their transcription to avoid accumulating deleterious mutations in the germline genome over evolution.

Methods
Animals. Experiments were performed in compliance with the UK and EU guidelines for the care and use of laboratory animals. Animal procedures were subjected to local ethical review (Comite d'Ethique pour l'Experimentation Animale, Universite Paris Descartes; registration number CEEA34.JC.114.12). C57BL/ 6 wild type, CD1 wild type and CD1 GILZ Y/-adult, 4 and 14 dpp male mice were culled by decapitation or by cervical dislocation following sedation by inhalation of CO 2 . Spermatozoa were collected from the caudal segment of the epididymis.
Mass spectrometry. Purification of spermatogenic cells was performed by elutriation as previously described 68 . DNA was isolated according to standard procedures. The 2-dimensional ultra-performance liquid chromatography-tandem mass spectrometry (2D-UPLC-MS/MS) analyses were performed according to the method described in ref. 69  Chromatographic separation was performed with a Waters Acquity 2D-UPLC system with photo-diode array detector, for the first-dimension chromatography (used for quantification of unmodified deoxynucleosides (dN) and 5-methyl-2′-deoxycytidine (5-mdC)), and Xevo TQ-S tandem quadrupole mass spectrometer for second-dimension chromatography. Atcolumn-dilution technique was used between the first and second dimension for improving retention at trap/transfer column. The columns used were: a Phenomenex Kinetex C18 column (150 mm × 2.1 mm, 1.7 µm) at the first dimension, a Waters Xselect C18 CSH (30 mm × 2.1 mm, 1.7 µm) at the second dimension, and Waters Xselect C18 CSH (30 mm × 2.1 mm, 1,7 µm) as trap/transfer column. The chromatographic system operated in heart-cutting mode, indicating that selected parts of effluent from the first dimension were directed to trap/transfer column via 6-port valve switching, which served as "injector" for the second-dimension chromatography system. The flow rate at the first dimension was 0.25 mL/min and the injection volume was 0.5-2 µL. The separation was performed with a gradient elution for 10 min using a mobile phase 0.1% acetate (A) and acetonitrile (B) (1-5% B for 5 min, column washing with 30% acetonitrile, and re-equilibration with 99% A for 3.6 min). Flow rate at the second dimension was 0.35 mL/min The separation was performed with a gradient elution for 10 min using a mobile phase 0.01% acetate (A) and methanol (B) (4-50% B for 4 min, isocratic flow of 50% B for 1.5 min, and reequilibration with 96% A up to next injection). All samples were analyzed in three to five technical replicates of which technical mean was used for further calculation. Mass spectrometric detection was performed using the Waters Xevo TQ-S tandem quadrupole mass spectrometer, equipped with an electrospray ionization source. Collision-induced dissociation was obtained using argon 6.0 at 3 × 10 −6 bar pressure as the collision gas. Transition patterns for all the analyzed compounds, as well as specific detector settings, were determined using the MassLynx 4.1 Intelli-Start feature.
5mC-, 5hmC, and 5caC-DNA IP (DIP). 5mC-, 5hmC, and 5caC-DNA IP (DIP) was performed as described 14,43 . Spermatogenic cells were purified using FACS sorting 31 . Fractions were assessed under phase optics and the purity of rST fraction was consistently more than 95%. Genomic DNA was isolated from rST and SZ cells or hPSCs according to standard procedures and fragmented to 100-300 bp using Diagenode Bioruptor Standard UCD-200. 10 μg of genomic DNA was used for immunoprecipitation. 5hmC-and 5caC-DIP were carried out using rabbit polyclonal antibodies (Active Motif) and magnetic anti-rabbit Dynabeads (Invitrogen). Mouse monoclonal antibody (clone 33D3, Diagenode) and anti-mouse Dynabeads (Invitrogen) were used for 5mC-DIP. Specificity of the antibodies was assessed in DIP experiments with modified oligonucleotides as described 14 .
Library preparation and high-throughput sequencing. SOLiD sequencing libraries (rST and SZ samples) were prepared from 5caC-, 5hmC-, and 5mC-DIP enriched DNA as stated in the Lifetech Solid 5500 Chip-Seq library preparation guide. Enzymes and reagents were used from the 5500 SOLiD Fragment Library Core Kit (Life Technologies, 4464412). Fifteen cycles of library amplification were carried out using primers specific to the library sequencing adapters. Barcoded DNA fragment libraries were quantified using the Kapa Library Quantification kit (Kapa Biosystems, KK4823) and pooled equally. The Solid EZ bead system was used according to the manufacturer's guidance to prepare ePCR and enrichment of templated beads.
Bioinformatics analysis. 5mC, 5hmC, and 5caC-DIP-SOLiD read alignment were performed as follows. The 75 bp color space (cs) SOLiD reads were aligned to the Fig. 4 5caC-enriched regions are eliminated from spermiogenesis-specific genes and are retained at developmental genes during spermatid maturation. a Venn diagrams showing overlaps between the genes containing mod-Cs peaks in rST and SZ. Each circle's area is equivalent to the number of genes in each category (indicated). b Distribution of all the genes containing 5caC peaks (5caC all), genes encompassing only 5caC but not 5hmC/5mC peaks (5caC only), and genes containing only 5mC but not 5hmC/5caC peaks (5mC only) in rST and SZ regarding to different clusters of gene expression. The significance was determined by Z-test of proportions, **Z score = 7.5481 and ***Z score = 9.03 designate significance at 99% confidence interval. c The clusters of gene expression enriched in 5caC peak-containing genes in rST. d Gene ontology (GO) categories significantly enriched in 5caC peakcontaining genes in rST and SZ classified according to their GO score. e GO categories associated with developmental processes significantly enriched in 5caC peak-containing genes at rST and SZ stages classified according to their GO score. f GO categories enriched in the genes containing exclusively 5caC but not 5hmC/5mC peaks. The significance threshold (GO score = 3) is indicated with a dashed line in d-f.
mouse genome (mm10) using the aligner LifeScope (Life Technologies). The alignment parameters were modified to use a seed length of 60 cs with a 6 cs miss-match allowance. Reads that aligned to more than 99 genomic positions were discarded. Reads were mapped with a low miss-match tolerance. The primary alignment position was recorded for each read. If a read mapped to more than one position with the same best alignment score then one of these positions was selected at random as the primary alignment position. Highly modified (methylated) regions (HMRs, peaks) were first identified from alignment BAM files for each replicate sample using the peak-calling software MACS1.4 28 26,27 , extracted from the ENCODE project (http://genome.ucsc.edu/ENCODE/) and converted to the mm10 assembly coordinates using LiftOver 28 . The genomic coordinates of features were compared using BedTools 71 . Pybedtools were used to generate base pair-specific Venn diagrams 72 . The coverage plots were generated with the R package Gviz and BEDtools. For the analysis of gene expression in spermatogenic cell types, the RNAseq reads from a previously published data set 18,19 were re-analyzed using the mouse genome assembly mm10. Reads were aligned to the mouse genome using TopHat2 73 . Read counts per gene were subsequently calculated with Htseq-count 74 and normalized gene expression values (RPKM) calculated. The RPKM values for LINE-1 elements were calculated using primary read alignments. Gene expression profiles were clustered according to changes in expression value across the testis cell types using the bioconductor package Mfuzz 75 . Statistically enriched Gene Ontology (GO) categories were determined within lists of genes using Partek Genomics Suite version 6.6 Gene Set ANOVA. Area-proportional Venn diagrams for lists of genes were generated using BioVenn 76 . The heatmap plots that compare computed read densities across 5caC and 5mC SZ and rST peaks were generated using deeptools2 plotHeatmap tool 77 . The deeptools computeMatrix parameters were: referencepoint-referencePoint center -a 3000 -b 3000-skipZeros-missingDataAsZero. To   quantify Transposable Elements at the subfamily level, we used SQuIRE (Software for Quantifying Interspersed Repeat Elements) 78 . Briefly, DIP-SOLiD or RNA-seq-Illumina reads from the previous study 18,19 (GEO accession number GSE35005) were aligned to mm10 reference genome, and the quantification stage was performed using the SQuIRE-specific algorithm, which incorporates both unique and multi-mapped reads, generating output read counts and fragments per kilobase transcript per million reads (FPKM) for each TE locus. TE count tables were used for differential (expression) analysis of genes and TEs using DESeq2 via Squire Call tool. DESeq2 79 estimates variance-mean dependence in count data, and tests for differential (expression) analysis, based on a model using the negative binomial distribution. The age of TEs was estimated as described in ref. 34 , and TE-consensus sequences were obtained from DFAM 80 .
Statistics and reproducibility. At least three independent experiments were carried out for MS and immunostaining experiments. All experiments were replicated independently. DIP was performed in two biologically independent experiments. We observed a generally good correlation between the replicates. Statistical tests used for individual experiments are described in corresponding figure legends. Signal intensity and MS data were plotted and analyzed in GraphPad Prism 7.04.
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
The rST and SZ deep sequencing data have been deposited in the EBI's European Nucleotide Achieve (ENA) (http://www.ebi.ac.uk/ena) under accession number PRJEB8358. MS source data for Fig. 1b can be found in Supplementary Data 3. The confocal raw data and all other data supporting the conclusions of this study are available from the corresponding author upon reasonable request.

Code availability
The in-house scripts used for the analysis can be found in the following online repository (https://bitbucket.org/thomgiles/adac0175-code).