Dnmt1 has de novo activity targeted to transposable elements

DNA methylation plays a critical role during development, particularly in repressing retrotransposons. The mammalian methylation landscape is dependent on the combined activities of the canonical maintenance enzyme Dnmt1 and the de novo Dnmts, 3a and 3b. Here, we demonstrate that Dnmt1 displays de novo methylation activity in vitro and in vivo with specific retrotransposon targeting. We used whole-genome bisulfite and long-read Nanopore sequencing in genetically engineered methylation-depleted mouse embryonic stem cells to provide an in-depth assessment and quantification of this activity. Utilizing additional knockout lines and molecular characterization, we show that the de novo methylation activity of Dnmt1 depends on Uhrf1, and its genomic recruitment overlaps with regions that enrich for Uhrf1, Trim28 and H3K9 trimethylation. Our data demonstrate that Dnmt1 can catalyze DNA methylation in both a de novo and maintenance context, especially at retrotransposons, where this mechanism may provide additional stability for long-term repression and epigenetic propagation throughout development.

the MOABS software suite with standard parameter settings 6 . Finally, all CpGs that were covered with less than 1,000 reads (1,000X) were discarded from the analysis.

Hairpin bisulfite processing
CpG methylation states were determined using BiQAnalyzerHT, filtering for a sequence identity of >= 0.8 (reference sequence for IAP and IAPez have been shortened in order to obtain a better mapping of the sequencing reads) and strand-specific methylation patterns were obtained using Hairpinanalyzer V0.3 (Mathias Bader and Julia Arand, available under: https://github.com/PascalGiehr/Hairpinanalyzer), filtering for a linker conversion rate of >= 0.8. H(O)TA was applied in order to determine enzyme efficiencies 7,8 .

MeDIP-seq processing
The MeDIP-seq sequencing data were aligned to the mouse reference genome mm9 using segemehl (v0.2.0, default parameters) 9-11 and the alignment was filtered to remove unusual insert sizes (>1kb) and multiple mappers. Postprocessing was done using the R package QSEA 12 with the DKO0 methylation rates as reference.

ChIP-seq processing
The ChIP-seq sequencing data as well as the control input sequencing were aligned to the mouse mm9 reference genome using BWA mem 13 using the default parameter. GATK 14 was used to obtain alignment metrics and remove duplicates. Peaks were called using the MACS2 (2.1.2_dev) 15 peakcall function using default parameters. After validation of replicate comparability and quality, replicates were merged on read level and reprocessed together with input samples. Background subtracted coverage files were obtained using MACS2 bdgcomp with -m FE.

Bioinformatics tools, statistics, and visualization
Command-line processing of BAM, BED and bigwig files was done using SAMtools (v1.10) 16 , BEDtools (v2.25.0) 17 and UCSCtools (v4) 18 . If not stated otherwise: All statistics and plots are generated using R version 3.6.1 "Action of the Toes". In all boxplots, the centerline is median; boxes, first and third quartiles; whiskers, 1.5 x inter-quartile range; data beyond the end of the whiskers are displayed as points. Statistical

Differentially methylated regions
Differentially methylated regions (DMRs) were called using a dual sliding window approach.
To capture smaller windows, the first window was defined as 1 kb in length and 250 bp sliding, for broader regions the second window was defined as 2 kb and 500 bp sliding. Windows covering less than 10 CpGs or located in regions called as missing based on Nanopore sequencing were excluded from the analysis. For each of the remaining windows, the average methylation rate difference between the P15 DKO0 and TKOL samples was calculated and any window with an absolute difference >0.15 was considered a potential DMR. Finally, all DMRs within 5 kb distance were merged and regions with an average methylation rate of >0.15 in the DKO0 P15 were considered as DMRs.

Control regions
Control regions (CRs) were resampled from the genome using BEDtools shuffle. DMR length distribution (in nt) and the number of DMRs were preserved and only regions with 10 or more CpGs were allowed as a destination for BEDtools shuffle. To obtain these regions, we merged all CpGs covered by TKOL and DKO0 samples that are within a distance of 2.5 kb and filtered for a minimum of 10 CpGs. The identity of the original DMR was transferred to allow for the exclusion of the CR if the DMR was removed based on Nanopore sequencing data analysis.
Each DMR was placed randomly 1,000 times to avoid strong impact of outlier samples.
Additionally, the same number of CRs were placed only into regions with DMR-like CpG density to create the CpG-density matched CRs (CR CpG ).

Filtering for high confidence DMRs
DMRs were filtered based on Nanopore long reads from the same cell line by stringently filtering on methylation difference as measured from Nanopore read data. Only DMRs that had at least 10 informative CpG positions were subject to filtering and any DMR with a mean difference between the DKO0 and TKOL samples of less than 0.05 was excluded (n = 1,058).
The corresponding control regions were removed from the set of CRs used for further analysis (n = 1,515).

Comparison of alignments at DMRs/CRs
For Illumina sequenced WGBS data global as well as alignments overlapping DMRs and CRs were counted individually using the SAMtools view command for counting (-c), DMRs, or CRs specified by -L and filtered out (-F) of specific bitwise flags. Unique alignments of Illumina short-read WGBS were defined by the not set "not primary alignment" SAM flag (bitwise flag 256). For Nanopore long-read alignments, the aligner does not report any reads that align to more than one location with the same quality. To validate that DMRs are not biased towards lower coverage with this approach, the read coverage distribution of DMRs and CRs for Nanopore sequencing were compared to Illumina WGBS read coverage distributions with and without multi-mappers (bam to coverage in-house python script).

Characterization of genome-wide methylation
1 kb tiling windows were calculated using BEDtools makewindow with parameter -w 1000.
The average methylation level of all 1 kb windows was calculated using UCSCtools bigWigAverageOverBed and filtered for a minimum coverage of 5 CpGs. The methylation rate distribution of the DKO0 and TKOL was visualized using the density plot function of ggplot2.
The methylation difference in these windows between the DKO0 and TKOL was calculated and visualized using the empirical cumulative density function of the ggplot2 package. The genomic distribution of DMRs was visualized using the Cis-regulatory Element Annotation System (CEAS) 23 .

Correlation of methylation rates in 5 kb windows and IAPs
5 kb tiling windows were calculated using BEDtools makewindow with parameter -w 5000.
The average methylation level of all 5 kb windows not overlapping an IAP as well as for all IAP elements were calculated using UCSCtools bigWigAverageOverBed and filtered for a minimum coverage of 10 CpGs. For visualization the R smoothScatter function was used.

Split violin plots for DMRs and CRs
Genome-wide methylation information was filtered by sample to CpGs within DMRs or CRs.
The single CpG resolution methylation values were visualized in R using the vioplot package (parameter side=left/right).

Read distribution in MeDIPseq
SAMtools view was used to calculate the number of reads in DMRs/CRs (parameter -c -F 256 -F4). CR counts were divided by 1,000 to account for their overrepresentation. Boxplots were used to display the mean number of reads.

Profile plots and enrichment heatmaps (ChIP-seq/MeDIP-seq)
The computeMatrix function of the deepTools package in scale-regions mode with parameters -missingDataAsZero binSize=50 or 100 was used to calculate the input for the plotHeatmap function. Missing data color was set to white.

Overlap of DMRs and CRs with genomic element
Genomic regions were each first collapsed using BEDtools merge to avoid counting regions twice, i.e., overlapping repeat classes within a family. Next, for each genomic element the overlap with DMRs (CRs) was calculated, normalized by the total length of DMRs (CRs) and visualized using ggplot2. For enrichment of DMRs a two-sample Wilcoxon test was used to compare DMR and CR overlaps (alternative='greater') with the respective elements.

Fold-change of hairpin bisulfite derived CpG methylation status
Hairpinanalyzer output was further used to visualize the change in CpG status composition per target. The fraction of CpGs that are fully methylated, hemi-methylated on plus-or minusstrand or fully unmethylated were normalized by TKOL values for DKO0 P1, DKO0 P5 and WT. Log2-fold change of these normalized values was plotted using ggplot2.

Unique vs. multiple mapper analysis
Alignment statistics of Illumina sequenced samples for DMRs (CRs) were obtained using SAMtools view with parameters -c (count) -F 4 (exclude unmapped reads) and -F/-f 256 (unique/multiple alignments) and -L DMRs/CRs (reduce to regions). For CRs the average per replicate was used to calculate the log2 fold-change of unique (multiple) alignments between DMRs and CRs. A paired Wilcoxon test was used to test if unique (multiple) alignments show a different prevalence in DMRs compared to CR. Nanopore reads are by default only aligned uniquely, therefore the distribution of DMR vs. CR alignment rates (log2 fold-change) is only shown for unique alignments.

Gene set enrichment analysis of mass spectrometry experiments
Genes that were detected as significantly enriched in DKO0 and TKOL Uhrf1-FLAG Mass Spec experiment were used as input for the WEB-based Gene SeT AnaLysis Toolkit (WebGestalt) to calculate Gene Set Enrichment Analysis (gene ontology: cellular components) 24 .

ERVK, methylation and Uhrf1, Trim28 and H3K9me3 enrichment at DMRs
The heatmap visualizing the overlap with ERVKs, the binding signal of Uhrf1, Trim28 and H3K9me3 as well as DMR was plotted using the R package EnrichedHeatmap. Single nucleotide resolution bigWig files and the ERVK annotation file were loaded to R and for each sample the normalizeToMatrix function (w=250; methylation: mean_mode='absoltute', background=NA; Uhrf1, Trim28 and H3K9me3: mean_mode='coverage', background=0) was used to calculate the signal across DMRs.

IAPEz-int methylation comparison of E3.5 and E6.5
The mean DMR methylation rate of each E3.5 and E6.5 embryo conditions was calculated and visualized using the R package ComplexHeatmap. The color key was set display the center color to 0.15 to allow for visualization of expected methylation rate dynamics.