Generation of DCM–Polr2b ESC line

DCM and Polr2b sequences were amplified by polymerase chain reaction (PCR) from cDNA. For DCM, the primers 5′-GCTAGCATGGTCGACCAGGAAAATATATCAGTAACCGATTCAT-3′ and 5′-GCGGCCGCTTAGGGCGCGCCTCGTGAACGTCGGCCATGTTGTGCCTC-3′ were used, and for Polr2b the primers 5′-GCTAGCATGGGCGCGCCGCAATATGATGAAGACGACGATGAGATCA-3′ (forward) and 5′-GCGGCCGCCTAGTCGACTCATTCGTGGTGCGATGCTCATGGACAT-3′ (reverse). The DCM sequence was inserted (XhoI-AscI) in frame 5′ to the Polr2b coding region, eliminating the translational stop and generating the DCM–Polr2b fusion gene. The fusion gene was introduced (XhoI-NotI, blunt) into pgk-ATG-frt (Addgene, 20734). This DCM–Polr2b shuttle vector was targeted to KH2 mouse ESCs harboring a frt homing site and hygromycin resistance gene in the Col1a1 locus using a pCAGGS-flpE flippase expression cassette15. Positive clones were identified by PCR using the following primers: Cola1-flpin: 5′-TGCTCGCACGTACTTCATTC-3′ and 5′-GAAAGACCGCGAAGAGTTTG-3′.

Empty clones were excluded using primers: Cola-flpin-empty: F1: 5′-TGCTCGCACGTACTTCATTC-3′ and R1: 5′-GGGGAACTTCCTGACTAGGG-3′.

ESCs were maintained on mouse embryonic fibroblasts in ES medium (DMEM, 15% FCS, penicillin–streptomycin (Invitrogen), 1% non-essential amino acids (Lonza, BE13-114E), LIF (1,000 U ml−1), 0.1 mM 2-mercaptoethanol (Gibco, 31350-010)) with daily media changes. Induction was achieved by adding dox (final concentration 2 µg ml−1; Sigma-Aldrich, D9891) to the medium. ESCs were passaged every 4 days using trypsin/EDTA dissociation.

ESC pulse-chase experiment

DCM–Polr2b:m2rtTA transgenic ESCs were induced with dox for 24 hours, followed by a chase in which cells were collected 4 hours, 8 hours, 12 hours and 24 hours after the wash. RNA was then isolated, reverse transcribed and analyzed by quantitative PCR with primers detecting:

DCM-Polr2b: Fwd: 5′-GGT TTC GGA CAC TCA GGC-3′, Rev: 5′-AGT GAT CTC ATC GTC GTC TTC A-3′

Gapdh: Fwd: 5′-TGC CCC CAT GTT TGT GAT G-3′, Rev: 5′-TGT GGT CAT GAG CCC TTC C-3′

Hsp90: Fwd: 5′-CCA CCA CCC TGC TCT GTA CTA-3′, Rev: 5′-CCT CTC CAT GGT GCA CTT CC-3′

Differentiation of mouse ESCs to NPCs

Mouse ESCs were differentiated toward NPCs according to an established differentiation protocol46,47. Before differentiation, DCM–Polr2b ESCs were cultured on mouse embryonic fibroblasts in ES medium and induced with dox for 5 days, with medium refreshed every other day. For differentiation, an adaptation of published protocols was used (Conti et al.46 and Splinter et al.47). In brief, cells were trypsinized and pre-plated for 40 minutes to remove mouse embryonic fibroblasts. In total, 8 × 10⁵ cells were plated on gelatinized 10-cm dishes and grown in N2B27 medium for 7 days. N2B27 medium consists of 1:1 DMEM/F12 (Gibco, 31330-038) and Neurobasal Medium (Gibco, 21103-049) supplemented with NDiff Neuro-2 Medium Supplement (200×; Millipore, SCM012), 0.5× B-27 supplement (50×), serum free (Gibco, 17504-044), l-glutamine (100×) (Gibco, 25030-024), 1 ml of 50 mM 2-mercaptoethanol (Gibco, 31350-010) and penicillin–streptomycin. Afterwards, cells were incubated with Accutase cell detachment solution (Millipore, SCR005) at room temperature until detached and centrifuged for 5 minutes at 188g. In total, 3 × 10⁶ cells were resuspended in N2B27 medium + EGF (10 ng ml−1; PeproTech, 315-09) and FGF2 (10 ng ml−1; PeproTech, 100-18B) and grown in suspension in 10-cm dishes. The cells formed aggregates, and, after 3 days, the aggregates were collected and treated for 5 minutes with Accutase at room temperature. Pelleted cells were resuspended in N2B27 medium + EGF/FGF2 and grown on laminin-coated (Sigma-Aldrich, L2020) plates with media change every other day. After 14 days, the NPCs were harvested, and RNA was isolated using the ReliaPrep RNA Cell Miniprep System (Promega, Z6012) according to the manufacturerʼs protocol. As a negative control, non-induced DCM–Polr2b ESCs were used.

Polr2b–DCM transgenic mice

Expression of the DCM–Polr2b fusion transgene in DCM–Polr2b:m2rtTA mice was mediated by addition of dox to the drinking water (2.0 mg ml−1, 2% sucrose). Mice were sacrificed by cervical dislocation, after which the jejunum was isolated, and 3–4 cm of proximal jejunum was used for either isolation of total epithelium or villi-enriched isolation. Total jejunum samples were used for DNA and/or RNA isolation; villi samples from jejunum were either used for DNA and/or RNA isolation or dissociated into single cells for FACS. Total epithelium was isolated using chelation of Ca2+ ions to weaken cell adhesions and subsequent mechanical separation as previously described48. For villi isolation from jejunum, we continued with an additional protocol to separate villi from crypts, followed, if needed, by single-cell isolation49. Unless stated otherwise, all incubation steps were performed at 4 °C and centrifugation for 5 minutes at 200g and 4 °C. The jejunum was collected, flushed with cold 1× PBS, opened longitudinally and cut into pieces of approximately 1 cm. The pieces were incubated in 2 mM EDTA/PBS for 30 minutes on a shaker. After washing twice with cold PBS, the fragments were resuspended in 5 ml of PBS, and villi were mechanically separated with a 10-ml serological pipette. Villi were collected and supplemented with Dispase II (0.4 mg ml−1, Sigma-Aldrich). After 30-minute incubation on a shaker at 120 cycles per minute at 37 °C, the Dispase II reaction was stopped by addition of FCS (5% final concentration). Cells were filtered through a 40-µm cell strainer, counted and collected in 2% FCS/PBS at 1 × 10⁶ cells per milliliter. Cells were incubated with antibodies against EpCAM (CD326, eFluor 450; eBioscience, from Thermo Fisher Scientific, 48-5791-82) and SLC2A2 (GLUT-2, Cy5; Bioss, bs-0351R-Cy5) for 45 minutes at 4 °C, protected from light, and washed twice in 1 ml of cold PBS. After final centrifugation (5 minutes at 200g and 4 °C), cells were resuspended in 1 ml of cold PBS and filtered through a 40-µm cell strainer before proceeding with FACS. Cells stained for CD326 and GLUT-2 were sorted on a BD FACSAria II (version 9.0.1), with FlowJo 10.7.2 used for analysis, and double-positive cell populations were isolated, collecting >10,000 enterocytes per timepoint. DNA was isolated using a QIAamp DNA Micro Kit (Qiagen, 56304) according to the manufacturer’s protocol. RNA was isolated by TRIzol extraction. All animal experiments were approved by the Dutch Central Committee on the Ethics of Animal Experiments (AVD10100202115681).

Western blot analysis

For western blot analysis, total protein of mESC DCM–Polr2b was isolated at different timepoints after dox induction (2 µg ml−1; Sigma-Aldrich, D9891) with RIPA buffer (Abcam, ab288006). The total protein was run on NuPAGE 3–8% Tris-Acetate Gel (Invitrogen, EA03755BOX) and transferred to a PVDF membrane overnight at 4 °C and constant current (60 mA). The membrane was blocked for 30 minutes at room temperature in blocking buffer (1.3 g of non-fat dry milk in 50 ml of 1× Tris-buffered saline) and probed with POLR2B (Thermo Fisher Scientific, PA5-30122) or DCM (Cusabio, CSB-PA365131XA01ENV) primary antibodies (1:1,000) in blocking buffer + 0.1% Tween for 2 hours at room temperature. Subsequently, the membrane was incubated with anti-rabbit HRP secondary antibody (Sigma-Aldrich, A6154, 1:5,000) in blocking buffer + 0.1% Tween for 1 hour at room temperature. As an internal protein control, β-actin was used: monoclonal anti-β-actin peroxidase antibody (Sigma-Aldrich, A3854, 1:7,500). Protein was detected with SuperSignal West Femto Maximum Sensitivity Substrate (Thermo Fisher Scientific, 34094) and an Amersham Imager 600 (Amersham Biosciences).

Immunohistochemistry on cryosections

Mice were sacrificed, and the proximal jejunum was collected and flushed with cold PBS, opened longitudinally and cut into 1-cm pieces. The tissue pieces were fixed for 3 hours at 4 °C in 4% PFA/PBS and subsequently rotated overnight at 4 °C in 4% PFA/30% sucrose/PBS. Fixed tissue pieces were embedded in OCT, and 8-µm-thick slices were sectioned onto silane adhesive slides and fixed with cold methanol for 20 minutes. Sections were washed three times for 5 minutes with PBST (1× PBS, 1% BSA and 0.1% Tween 20) and were incubated for 10 minutes at room temperature in PBSTX (1× PBS, 0.5% Triton X-100 and 1% BSA) for permeabilization. After three PBST washes of 5 minutes, sections were blocked for 1 hour at room temperature in blocking solution (1× PBS, 0.1% Tween 20 and 5% normal goat serum (Sigma-Aldrich, G9023)). This was followed by an overnight incubation at 4 °C with primary antibody in blocking solution. Sections were washed three times for 5 minutes in PBST and incubated for 1 hour at room temperature with secondary antibody (1:500) in blocking solution. Sections were then washed with PBST three times for 5 minutes and covered with ProLongGold antifade reagent with DAPI (Invitrogen, P36931).

Immunohistochemistry on paraffin sections

Jejunum was collected and prepared as described above. The tissue fragments were fixed overnight at 4 °C in 4% PFA/PBS and embedded in paraffin according to standard histologic protocols. Then, 6-µm sections on silane adhesive slides were incubated for 1 hour at 60 °C, deparaffinized and rehydrated in serial xylene and ethanol steps and washed three times in PBS. Sections were incubated for 15 minutes at room temperature in ProtK (1 µg ml−1 in PBS) and washed four times for 2 minutes in dH2O. Epitope retrieval was performed in 1× sodium citrate buffer (0.01 M), pH 6, + 0.05% Tween by heating in a microwave at 900 W for 20 minutes. The sections were cooled down in the buffer to room temperature for 1 hour, washed three times in PBS for 5 minutes and blocked for 30 minutes at room temperature in 10% normal goat serum/5% BSA/PBS (for the GNL3/nucleostemin antibody, donkey serum (Sigma-Aldrich, D9663) was used). This was followed by an overnight primary antibody incubation at 4 °C in 5% BSA/PBS. The sections were washed three times for 5 minutes in PBS, incubated for 1 hour at room temperature with secondary antibody (1:500) in 1% BSA/PBS, washed again three times for 5 minutes in PBS and covered with ProLongGold antifade reagent with DAPI. For the staining of GNL3/nucleostemin in combination with HCAM, a TSA Biotin Systems Kit was used according to the manufacturerʼs protocol.

| Antibody | Source | Dilution |
| --- | --- | --- |
| Rabbit anti-SGLT1 | Alomone Labs, AGT-031 | 1:200 |
| Rabbit anti-SLC43A2 | MyBioSource, MBS9210948 | 1:50 |
| Rabbit anti-SLC2A2/Glut2 Cy5 | Bioss, bs-0351R-Cy5 | 1:50, 5 µl per 1 × 10⁶ cells |
| Rat anti-EpCam 450 | Invitrogen, 48-5791-82 | 1:50, 2 µl per 1 × 10⁶ cells |
| Rat anti-CD31-BV421 | BD Horizon, 563356 | 0.2 µl per 1 × 10⁶ cells |
| Rat anti-CD45-BV421 | BD Horizon, 563890 | 0.2 µl per 1 × 10⁶ cells |
| Rat anti-TER119-BV421 | BD Horizon, 563998 | 0.2 µl per 1 × 10⁶ cells |
| Rat anti-CD24-APC | BioLegend, 562349 | 0.4 µl per 1 × 10⁶ cells |
| Rat anti-CD117-PE (cKit) | BioLegend, 105808 | 0.3 µl per 1 × 10⁶ cells |
| Goat anti-GNL3/nucleostemin | R&D Systems, AF1638 | 1:50 |
| Rabbit anti-Nup54 | Novus, NBP1-85899 | 1:50 |
| Rabbit anti-CBX3 | Proteintech, 11650-2AP | 1:20 |
| Rat anti-HCAM | Santa Cruz Biotechnology, sc-18849 | 1:50 |
| Rabbit anti-H2AK119Ac | Gift from Zu-Wen Sun | 1:500 |
| Rabbit anti-H3K9me2 | Upstate, 07-212 | 1:100 |
| Rabbit anti-H3K9me3 | Diagenode, cs-056-050 | 1:200 |
| Rabbit anti-histone H2A.Z | Abcam, ab4174 | IHC: 1:500, CUT&Tag: 1:100 |
| Rabbit anti-acetyl histone H2A.Z | Merck, ABE1363 | IHC: 1:500, CUT&Tag: 1:100 |
| Rabbit anti-H3K27me3 | Cell Signaling Technology, 9733 | 1:100 |
| Rabbit α-mouse antibody | Abcam, ab46540 | 1:100 |
| Mouse Ring1B clone #3 | Atsuta, T., Fujimura, Y., Moriya, H., Vidal, M., Akasaka, T. & Koseki, H. Production of monoclonal antibodies against mammalian Ring1B proteins. Hybridoma 20, 43–46 (2001) | 1:2 |
| Goat anti-rat Alexa 488 | Invitrogen, A-11006 | 1:500 |
| Goat anti-rabbit Alexa 488 | Invitrogen, A-11008 | 1:500 |
| Streptavidin Alexa 488 | Invitrogen, s-32354 | 1:200 |
| Goat anti-rat Alexa 546 | Invitrogen, A-11081 | 1:500 |
| Goat anti-rabbit Alexa 546 | Invitrogen, A-11010 | 1:500 |
| Donkey anti-rabbit Alexa 546 | Invitrogen, A-10040 | 1:500 |
| Donkey anti-goat Alexa 555 | Invitrogen, A-21432 | 1:500 |
| Rabbit anti-rat biotinylated | Dako, E0468 | 1:200 |
| TSA Biotin Systems | PerkinElmer, NEL700A | |

MeD-seq sample preparations

MeD-seq analyses were essentially carried out as previously described16. At least 10 ng of DNA was digested by LpnPI (New England Biolabs). Stem-loop adapters were blunt-end ligated to repaired input DNA and amplified to include dual-indexed barcodes using a high-fidelity polymerase to generate an indexed Illumina next-generation sequencing (NGS) library. The amplified product was purified on a Pippin HT system with 3% agarose gel cassettes (Sage Science). Multiplexed samples were sequenced on Illumina HiSeq 2500 systems for single reads of 50 bp according to the manufacturer’s instructions. Dual-indexed samples were demultiplexed using bcl2fastq (Illumina). All experimental timepoints were performed in triplicate.

MeD-seq data analysis

Data processing was carried out using custom scripts in Python and MATLAB. Raw FASTQ files were subjected to Illumina adaptor trimming, and reads were filtered based on LpnPI restriction site occurrence between 13 bp and 17 bp from either 5′ or 3′ end of the read. DCM methylation data (CCWGG sites) and CpG methylation data (CCG, CGG and GCGC sites) were separated during filtering and mapped separately to mm10 using bowtie2 (ref. 50). Genome-wide individual DCM site scores were used to generate read count scores for all annotated genes from UCSC (GRCm38.p2). BAM files were generated using SAMtools version 0.1.19 for visualization in IGV51,52. Because DCM and CpG methylation can be detected separately using MeD-seq, DCM enrichment was determined by either data normalization using CpG read coverage (for absolute DCM enrichment) or DCM read coverage (for relative DCM enrichment) between samples. For both situations, normalization was done using reads per million (RPM), where absolute DCM levels indicate the level of DCM–Polr2b induction, and relative DCM levels are used to correct for differences in DCM–Polr2b induction between mice and/or timepoints.
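The two normalizations described above can be illustrated with a minimal sketch. It assumes per-gene DCM read counts are held in a plain dictionary; the function name `rpm`, the gene names and all numbers are hypothetical and are not taken from the custom scripts used in this study.

```python
def rpm(gene_counts, total_reads):
    """Scale raw per-gene read counts to reads per million (RPM)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return {gene: count * 1_000_000 / total_reads
            for gene, count in gene_counts.items()}


# Hypothetical sample: DCM reads per gene plus the two library totals.
dcm_counts = {"GeneA": 50, "GeneB": 150}
total_cpg_reads = 2_000_000  # all CpG (CCG, CGG, GCGC) reads in the sample
total_dcm_reads = 400_000    # all DCM (CCWGG) reads in the sample

# Absolute enrichment: DCM reads scaled by CpG coverage (reflects induction level).
absolute = rpm(dcm_counts, total_cpg_reads)
# Relative enrichment: DCM reads scaled by DCM coverage (corrects for induction level).
relative = rpm(dcm_counts, total_dcm_reads)
```

Only the denominator differs between the two modes, which is why the same per-gene counts can be reported either as a measure of induction or corrected for it.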

For metagene analyses, intragenic distribution of DCM (or CpG) reads was shown by generating 100 bins of 100 bp (10 kb) either upstream of the TSS or downstream of the TES. Gene body bins were generated using genes with a minimal gene size of 100 bp and dividing each gene body into 100 bins of 1% of the total gene body size; genes with overlapping gene bodies were excluded. For each bin, the number of DCM (or CpG) reads was plotted after adjusting for the DCM (or CpG) site frequency per bin. To compare pre-TSS and post-TES regions (10 kb) to the gene body regions, the DCM site count for each gene body bin was adjusted for gene size and the 10-kb region. Subgroups were based on RNA expression data of the corresponding gene. Distribution of DCM reads across peaks from ChIP-seq data was generated accordingly, using genome-wide ChIP-seq peak boundaries instead of annotated genes. All ChIP-seq datasets were downloaded from the ENCODE portal53 (https://www.encodeproject.org; mouse ESC: ENCSR000CCC, ENCSR000CCD, ENCSR000CGN, ENCSR000CGO, ENCSR000CFZ, ENCSR000CGQ, ENCSR000CFN and ENCSR000CGR; mouse intestine: ENCSR159RVN, ENCSR198ACZ, ENCSR311VKI, ENCSR642VYW, ENCSR389EYR, ENCSR483KOD and ENCSR000CEE; ATAC-seq: ENCSR079GOY; and Lgr5+ ATAC-seq from the Gene Expression Omnibus (GEO): GSE83394). To compare DCM methylation with ChIP-seq data, DCM and ChIP-seq read counts from the ChIP-seq peaks were used. The log10 of (DCM read counts in peak / DCM sites in peak) was plotted against the log10 of (ChIP-seq read counts in peak / peak length), followed by a Pearson correlation coefficient calculation (removing outliers with z-score >4).
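The binning scheme described above can be sketched as follows. This is an illustration only: it assumes plus-strand genes and 0-based half-open coordinates, and the function names (`gene_body_bins`, `flank_bins`, `site_adjusted`) are hypothetical rather than those of the custom scripts.

```python
def gene_body_bins(start, end, n_bins=100):
    """Split [start, end) into n_bins bins, each 1/n_bins of the gene body."""
    length = end - start
    if length < n_bins:
        raise ValueError("gene is shorter than the minimal gene size")
    edges = [start + (length * i) // n_bins for i in range(n_bins + 1)]
    return list(zip(edges[:-1], edges[1:]))


def flank_bins(anchor, n_bins=100, bin_size=100, upstream=True):
    """Fixed-width bins covering 10 kb upstream of a TSS or downstream of a TES."""
    first = anchor - n_bins * bin_size if upstream else anchor
    return [(first + i * bin_size, first + (i + 1) * bin_size)
            for i in range(n_bins)]


def site_adjusted(read_count, site_count):
    """Reads per DCM (or CpG) site in a bin; bins without sites give None."""
    return read_count / site_count if site_count else None


# Hypothetical plus-strand gene with TSS at 20,000 and TES at 25,000.
body = gene_body_bins(20_000, 25_000)
pre_tss = flank_bins(20_000, upstream=True)
post_tes = flank_bins(25_000, upstream=False)
```

For minus-strand genes the upstream and downstream flanks would be swapped and the bin order reversed; that handling is omitted here for brevity.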

To visualize relative DCM methylation changes over time during dox induction, we used triplicates of DCM read counts per gene. DCM read counts per gene were normalized for the total amount of DCM read counts per timepoint; mean DCM methylation levels were calculated; and the s.e.m. was used as a measure of variability. Fold changes in mean DCM methylation per timepoint were calculated versus day 1, which was set as calibrator. DCM genes were selected for further analysis when the P value of the Mann–Whitney U-test was below 0.05, using the negative DCM days as set X (n = 3) and all other DCM days as set Y (n = 21). MeD-seq sequence data are deposited at the National Center for Biotechnology Information (NCBI) with GEO accession number PRJNA615329.
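As an illustration of the normalization and fold-change steps described above, the following sketch processes one hypothetical gene measured in triplicate on two days. All values and variable names are invented for the example, and the Mann–Whitney U-test itself is not reproduced here.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical triplicates for one gene: (DCM reads in gene, total DCM reads).
samples = {
    1: [(100, 1_000_000), (110, 1_000_000), (90, 1_000_000)],
    2: [(400, 2_000_000), (420, 2_000_000), (380, 2_000_000)],
}

summary = {}
for day, replicates in samples.items():
    # Normalize each replicate for the total DCM read count of that sample.
    normalized = [count / total for count, total in replicates]
    # Mean methylation level and s.e.m. across the triplicate.
    summary[day] = (mean(normalized), stdev(normalized) / sqrt(len(normalized)))

# Day 1 serves as the calibrator for the fold changes.
calibrator = summary[1][0]
fold_change = {day: m / calibrator for day, (m, _sem) in summary.items()}
```

In this example the raw counts rise fourfold from day 1 to day 2, but the fold change after normalization is about twofold because the day 2 libraries are twice as deep.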

DCM propagation rate in the small intestine

Pulse-chase experiments were performed with m2rtTA;H2B-GFP;DCMPolr2b compound transgenic reporter mice through an intraperitoneal (IP) dox injection. Enterocytes were isolated by FACS of Epcam+/SLC2A2+ (GLUT2) cells 3 days after a dox IP injection. The ratios of the GFP high and GFP low populations of sorted fractions were established, and DNA was isolated for MeD-seq analysis. The DCM methylation propagation rate was then calculated based on the DCM and CpG methylation read count ratio in relation to the number of cell divisions according to the following equations:

$$\mathrm{division\ nr} = \log_{0.5}\left(\frac{\mathrm{GFP_{low}}}{\mathrm{GFP_{high}}}\right)$$

$$\mathrm{propagation\ rate} = \sqrt[\mathrm{division\ nr}]{\frac{\mathrm{DCM_{low}}}{\mathrm{DCM_{high}}}}$$
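The two equations above can be evaluated with a few lines of Python. This is a minimal sketch: the function names and the input values are hypothetical, and the inputs are taken to be the GFP signal and the CpG-normalized DCM read counts of the GFP-low and GFP-high fractions.

```python
from math import log


def division_number(gfp_low, gfp_high):
    """Number of divisions: log base 0.5 of the GFP-low / GFP-high ratio."""
    return log(gfp_low / gfp_high, 0.5)


def propagation_rate(dcm_low, dcm_high, divisions):
    """The division-number-th root of the DCM-low / DCM-high ratio."""
    return (dcm_low / dcm_high) ** (1.0 / divisions)


# Hypothetical values: an eightfold GFP dilution corresponds to three divisions.
divisions = division_number(125.0, 1000.0)
rate = propagation_rate(0.175616, 1.0, divisions)
```

With these invented inputs the DCM signal drops to 0.56³ of its starting level over three divisions, so the function returns a propagation rate of about 0.56 per division.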

We simulated what the DCM labeling levels would be after each cell division based on an average propagation rate of 56% to discover how many active genes could still be detected. Random subsets of the 2-day dox-induced samples (n = 3) were taken using bbtools version 37.62 reformat.sh using a ‘samplerate’ of 0.56^(division nr). From each simulated dataset, the number of reads overlapping each gene was counted using BEDTools version 2.29.2 intersect, and the read counts were normalized for the sequencing depth using the number of CpG reads of the original sample54,55. Finally, the fold change between the simulated subsets (n = 3) and the non-induced samples (n = 3) was plotted for all genes active in the complete MeD-seq dataset and peaking on day 2. For this list of genes, we calculated which percentage of genes had a fold change above 1, indicating that the simulated induced samples still have higher DCM methylation levels compared to the non-induced samples, and labels can still be detected.
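The arithmetic behind the simulation can be sketched as follows. The helper names and the per-gene values are hypothetical; the actual subsampling was done on the read files, whereas this example only computes the sampling rate per division and the fraction of genes that remain above the non-induced level.

```python
def sample_rate(propagation_rate, divisions):
    """Fraction of reads retained after a given number of cell divisions."""
    return propagation_rate ** divisions


def fraction_detectable(simulated, non_induced):
    """Fraction of genes whose simulated / non-induced fold change exceeds 1."""
    genes = [g for g in simulated if non_induced.get(g, 0) > 0]
    above = sum(1 for g in genes if simulated[g] / non_induced[g] > 1)
    return above / len(genes)


# Sampling rates for one to three divisions at a propagation rate of 56%.
rates = [sample_rate(0.56, n) for n in (1, 2, 3)]

# Hypothetical normalized DCM counts per gene (mean of n = 3 in each group).
simulated = {"GeneA": 5.0, "GeneB": 1.0, "GeneC": 3.0}
non_induced = {"GeneA": 2.0, "GeneB": 2.0, "GeneC": 1.5}
detectable = fraction_detectable(simulated, non_induced)
```

Here two of the three invented genes keep a fold change above 1, so they would still count as detectably labeled.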

RNA-seq analysis

Total RNA (1,000 ng per sample) was extracted in triplicate for the ESCs, NPCs and transgenic mouse samples. After rRNA depletion, sequencing libraries were prepared using the KAPA RNA Hyper Prep Kit with RiboErase. Sequencing was performed according to the Illumina TruSeq Rapid version 2 protocol on the HiSeq 2500 with a single-read 51-bp and 7-bp index.

Low-quality reads and contaminants (including sequence adapters) were removed using Trimmomatic. On average, 20 million reads per sample passed the quality assessment and were aligned to the mm10 genome using hisat2 version 2.1.0 (ref. 56). Transcript abundance level (transcript count) was generated using HTSeq version 0.9.1 (ref. 57). The transcript counts were further processed using the R software environment for statistical computing and graphics (version 3.4.0). Data normalization was performed using an EDASeq R package, and differential expression analysis was performed using an EdgeR R package58, using the negative binomial general linear model (GLM) approach. Differentially expressed genes with false discovery rate (FDR) ≤ 0.05 (Benjamini–Hochberg multiple testing correction, expression level in control samples >1 counts per million (CPM)) were retained and used for further processing, GO and pathway analysis. RNA sequence data are deposited at the NCBI with GEO accession number PRJNA615329.
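The filtering applied after the differential expression test can be illustrated with a short sketch. It implements only the Benjamini–Hochberg adjustment and the CPM threshold in plain Python; it is not the EdgeR negative binomial GLM, and the gene names, P values and CPM values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values (FDR) in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical test results and mean expression in the control samples.
genes = ["g1", "g2", "g3", "g4"]
p_values = [0.01, 0.04, 0.03, 0.2]
control_cpm = [5.0, 3.0, 0.5, 10.0]

fdr = benjamini_hochberg(p_values)
retained = [g for g, q, c in zip(genes, fdr, control_cpm)
            if q <= 0.05 and c > 1]
```

In this invented example only `g1` passes both thresholds: `g2` and `g3` have an adjusted value just above 0.05, and `g3` additionally falls below 1 CPM in the controls.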

scRNA-seq analysis

For validation and visualization of the DCM profiles, we downloaded scRNA-seq data from Haber et al.21 (GSE92332). Visualization of the cells was done using Monocle3 version 0.2.0 (ref. 59) and UMAP version 0.1.4 (ref. 60). We first pre-processed the scRNA-seq count matrix using a principal component analysis (PCA) with 75 dimensions and corrected for biases using batch as alignment group and the number of genes per cell as residual model formula_str. UMAP was run on this pre-processed matrix with the following settings: min distance of 0.8, n_neighbors of 120 and the cosine metric. The first two UMAP components were plotted using the clustering labels from Haber et al. as cell labels, which were merged when annotated clusters contain similar cell types.

The correlation between the RNA-seq data and the scRNA-seq data was plotted using custom Python scripts. For all genes with at least ten reads across all cells in the scRNA-seq dataset, we calculated the average TPM across replicates using the RNA-seq data. For each cell, the Pearson correlation between the scRNA-seq counts and the average TPM values from the RNA-seq was calculated. The Pearson correlation per cell was then visualized on the UMAP. For validation of gene clusters, we colored the cells in the UMAP according to mean expression of these genes. For each cell, the sum of the TPMs for the genes of interest were extracted and divided by the number of genes to get an average TPM for this set of genes. The mean expressions for all cells were then converted to z-scores for plotting.
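The per-cell correlation and the gene-set scoring described above can be sketched in plain Python. The cell names, gene order and values are hypothetical, and the use of the population standard deviation for the z-scores is an assumption of this example.

```python
from math import sqrt
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def z_scores(values):
    """Convert values to z-scores using the population standard deviation."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


# Average TPM across bulk RNA-seq replicates for four hypothetical genes.
bulk_tpm = [10.0, 20.0, 30.0, 40.0]
# scRNA-seq values per cell for the same four genes, in the same order.
cells = {
    "cell1": [1.0, 2.0, 3.0, 4.0],
    "cell2": [4.0, 3.0, 2.0, 1.0],
    "cell3": [2.0, 2.0, 4.0, 4.0],
}

# One correlation value per cell, to be shown as a color on the UMAP.
correlation = {name: pearson(values, bulk_tpm) for name, values in cells.items()}

# Mean expression of a gene set (here the first two genes) per cell, as z-scores.
gene_set = [0, 1]
set_mean = [sum(values[i] for i in gene_set) / len(gene_set)
            for values in cells.values()]
set_z = z_scores(set_mean)
```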

For the RNA velocity analysis, velocyto version 0.17 was run on the BAM files provided by Haber et al. The resulting loom files were loaded into Python and analyzed using scanpy version 1.9.1 and scvelo version 0.2.4 (refs. 61,62). The cells from the different batches were merged, and the spliced and unspliced layers were pre-processed using scvelo.pp.filter_and_normalize (min_shared_counts = 20 and n_top_genes = 2,000) and scvelo.pp.moments (n_pcs = 30 and n_neighbors = 30). Then, RNA velocity was estimated using scvelo.tl.velocity with mode = ‘stochastic’ and plotted on the UMAP computed previously using scvelo.pl.velocity_embedding_stream with smooth = 0.8 and min_mass = 3. Moreover, the pseudotime was estimated using diffusion pseudotime (DPT)63. The dataset was normalized and log-transformed, and genes and cells were filtered (min_cells = 10, min_genes = 100). The root of the dataset was set to the stem cell cluster, after which scanpy.tl.dpt was run with default parameters to obtain the pseudotime of each cell, which was plotted on the UMAP.

Enhancer DMR calling and validation for ESCs

Potential enhancer regions were called by filtering all genome-wide DCM sites using BEDTools version 2.29.2 (ref. 55) based on the following filters: (1) not overlapping any known genes from Ensembl version 98, (2) more than 1-kb distance to the closest gene and (3) not overlapping any repeat region from the UCSC RepeatMasker track. For each of the resulting 4.1 million sites, the number of overlapping DCM reads was counted and normalized to TPM using the total number of DCM reads per sample. Differentially methylated sites between the dox-treated and control ESC samples were selected based on a Mann–Whitney test (P < 0.05) and fold change ≥4. Enhancer sites were merged into enhancer regions when they were less than 500 bp apart. The genomic regions around these candidate enhancer sites were visualised using deepTools version 3.5.0 (ref. 64) by plotting the TPM-normalized tracks. Moreover, overlap with publicly available ChIP-seq datasets from ESCs (ENCSR000CGN, ENCSR000CMW, ENCSR000CGQ, ENCSR000CGO, ENCSR000CFO, ENCSR000CFN, ENCSR000CFZ, ENCSR779CZG, ENCSR000CCD, ENCSR392DGA and ENCSR000CCC from the ENCODE portal) was plotted. The H3K122ac and H3K64ac tracks from Pradeepa et al.65 (SRX1560887, SRX1560888, SRX1560889 and SRX1560890) were retrieved from NCBI Sequence Read Archive using sra-tools version 11.0. and reanalyzed. Reads were mapped to mm10 using bowtie2 version 2.4.1, after which the BAM files were normalized using the ‘callpeak’ and ‘bdgcmp’ functions of MACS2 version 2.2.7.1 (ref. 66). We visualized the differences in DCM peak height by dividing all differentially methylated sites in three equally sized groups based on the average TPM of the dox-treated samples and plotting overlap with ChIP-seq tracks separately.
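The step in which neighboring enhancer sites are merged into regions can be sketched as below. This is an illustrative reimplementation in plain Python with hypothetical coordinates; the function name `merge_sites` is not from the pipeline used in this study.

```python
def merge_sites(sites, max_gap=500):
    """Merge (chrom, start, end) sites that lie less than max_gap bp apart."""
    regions = []
    for chrom, start, end in sorted(sites):
        last = regions[-1] if regions else None
        if last and last[0] == chrom and start - last[2] < max_gap:
            # Extend the current region to include this site.
            regions[-1] = (chrom, last[1], max(last[2], end))
        else:
            regions.append((chrom, start, end))
    return regions


# Hypothetical differentially methylated DCM sites (5-bp CCWGG intervals).
sites = [
    ("chr1", 1000, 1005),
    ("chr1", 1300, 1305),
    ("chr1", 2000, 2005),
    ("chr2", 1100, 1105),
]
enhancer_regions = merge_sites(sites)
```

The first two sites are 295 bp apart and become one region, whereas the third site lies 695 bp from that region and stays separate.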

The closest gene for each candidate enhancer was selected using BEDTools version 2.29.2 with only the genes that were significantly labelled by dox. Significantly labeled genes were grouped in three equally sized groups based on their fold change. The density of enhancers in the 20-kb region around these three gene groups and the non-significant genes were plotted using deepTools version 3.5.0 and a custom Python script. We plotted the normalized DCM count in the +dox samples for the genes close to enhancer DMRs together with the genes close to H3K27Ac peaks as a control. H3K27Ac peaks were downloaded from the ENCODE portal and processed similarly to the DCM DMRs by removing peaks overlapping gene bodies and <1 kb from genes. P values were calculated using a one-sided Wilcoxon rank-sum test. For visualization in the genome browser overviews, we extended the enhancer regions with 250 bp in both directions. Enhancers that overlapped after extension were merged into larger enhancer regions.

Enhancer DMR calling and validation for intestine

For each of the filtered DCM sites, the number of overlapping DCM reads was counted and normalized to TPM. Differentially methylated sites between all dox-treated and control intestine samples were selected based on (1) Mann–Whitney test (P < 0.05), (2) fold change ≥4 and (3) minimal ten overlapping reads. The average TPM-normalized tracks per day were used for visualization. We split the DMRs in seven groups based on the day with the maximum DCM signal. For each peak day group separately, the overlap with several ChIP-seq datasets was examined (SRX3920113, SRX3920114, SRX3920117, SRX3920105, SRX3920106, SRX3920107, SRX3920108, SRX5023289 and SRX5023290 from Chen et al.26). These datasets were reanalyzed as described for the ESC ChIP-seq data. Moreover, ChIP-seq datasets for H2AZ (SRX2339011, SRX2339012, SRX2339013, SRX2339022, SRX2339023 and SRX2339024 from Kazakevych et al.35), H2AK119ub (SRX856956, SRX856957, SRX856959 and SRX856960 from Chiacchiera et al.31), H3K27me3 (SRX2339102, SRX2339103, SRX2339104, SRX2339111, SRX2339112 and SRX2339113 from Kazakevych et al.35) and ATOH1 from Lo et al.67 (SRX1817263, SRX1817257, SRX1817249, SRX1817250, SRX1817251, SRX1817253, SRX1817254, SRX1817252 and SRX1817255) were also reanalyzed and plotted on the enhancer DMRs and the TSS of significantly labeled genes for each peak day separately.

The closest gene for each candidate enhancer was selected using BEDTools version 2.29.2. For both enhancers and their closest genes, we selected the day with the highest average TPM-normalized DCM count as peak day. The average TPM-normalized DCM counts per day were converted to z-scores for each region separately to visualize their patterns over time in heat maps. The density of enhancers per peak day in the 20-kb region around the genes per peak day was plotted using deepTools version 3.5.0 and a custom Python script. We visualized the correlation between peak day of enhancers and closest genes using the number of enhancers in the 3-kb region around each gene. For each combination of enhancer peak day and gene peak day, the normalized number was plotted.
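Peak-day assignment and the per-region z-score conversion used for the heat maps can be sketched as follows. The day labels and values are hypothetical, and the population standard deviation is an assumption of this example.

```python
from statistics import mean, pstdev


def peak_day(day_means):
    """Day with the highest average TPM-normalized DCM count."""
    return max(day_means, key=day_means.get)


def row_z_scores(day_means):
    """Z-scores across days for one region (all zeros if the profile is flat)."""
    values = list(day_means.values())
    m, s = mean(values), pstdev(values)
    return {day: (v - m) / s if s else 0.0 for day, v in day_means.items()}


# Hypothetical average TPM-normalized DCM counts per day for one enhancer.
enhancer = {1: 2.0, 2: 8.0, 3: 2.0}
enhancer_peak = peak_day(enhancer)
enhancer_z = row_z_scores(enhancer)
```

Because each region is scaled separately, regions with very different absolute DCM counts can be compared by the shape of their temporal pattern.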

Percentage of active enhancers labeled in intestine

To address what percentage of active enhancers was labeled by DCM in intestine, active intergenic enhancers were selected based on H3K27ac peaks. Villi ChIP-seq data from Saxena et al.23 were re-analyzed as described. The H3K27ac peaks called by MACS2 were filtered similarly to the DCM sites: (1) not overlapping a gene body and (2) >1 kb from a gene body. The intergenic H3K27ac peaks were plotted in a heat map using deepTools version 3.5.0, showing the overlap with several ChIP-seq datasets and the DCM data from −dox samples and the day 1 and day 2 +dox samples. The peaks were ordered according to the overlapping DCM signal ±1 kb of the peak center and split in four equally sized groups based on this ordering. The correlation at the H3K27ac peaks between the different datasets was examined by counting the number of reads overlapping each H3K27ac peak with ≥3 DCM sites. These read counts were normalized for the peak length or the number of DCM sites for the ChIP-seq and DCM datasets, respectively. For each combination of datasets, the Spearman correlation was calculated using these normalized counts.

Peaks were classified as labeled when at least one significant DCM site was overlapping the peak, whereas non-labeled peaks contained no significant DCM site. A set of random controls was generated by randomly permuting the H3K27ac peaks 100 times separately using BEDTools shuffle (excluding genic regions ±1 kb). The distance to the closest significant DCM site was plotted, and the number of overlapping DCM sites was counted. Based on the distance to the closest significant DCM site compared to the random control, H3K27ac peaks with a significant DCM site <750 bp were selected as active enhancers labeled by DCM.
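The distance-based selection can be illustrated with a short sketch. It assumes half-open peak intervals and a sorted list of significant DCM site positions on a single chromosome; the function name and the coordinates are hypothetical.

```python
from bisect import bisect_left


def distance_to_closest(peak_start, peak_end, site_positions):
    """Distance in bp from a peak to the closest site (0 if a site is inside).

    site_positions must be sorted and on the same chromosome as the peak.
    """
    i = bisect_left(site_positions, peak_start)
    if i < len(site_positions) and site_positions[i] < peak_end:
        return 0
    candidates = []
    if i > 0:
        candidates.append(peak_start - site_positions[i - 1])
    if i < len(site_positions):
        candidates.append(site_positions[i] - peak_end + 1)
    return min(candidates) if candidates else None


significant_sites = [500, 5000, 9000]
peaks = [(1000, 1400), (4900, 5100), (6500, 7000)]

distances = [distance_to_closest(s, e, significant_sites) for s, e in peaks]
# Peaks with a significant DCM site closer than 750 bp count as labeled.
labeled = [d is not None and d < 750 for d in distances]
```

The same function can be applied to the shuffled control peaks to obtain the background distance distribution.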

Motif analysis

To identify TF binding locations at the intestine enhancer regions, a motif analysis was performed using the R package chromVAR68. We counted the number of reads overlapping each enhancer region for each sample separately and provided the total number of DCM reads per sample to normalize for the sequencing depth. The candidate enhancer sites were extended with 250 bp in both directions to obtain regions for motif finding. The package motifmatchr was used to find motifs within these regions based on the motifs retrieved from the JASPAR 2018 database using a P value cutoff of 4 × 10−5 (ref. 69). The motif scores calculated by chromVAR were downloaded and further analyzed in Python. For each motif, the motif scores for all samples were plotted against the z-score of normalized gene DCM counts of the corresponding TF. The Pearson correlation between both scores was calculated, and genes with a correlation >0.3 or <−0.3 were retained for plotting. For motifs that are co-bound by two or more TFs, the gene with the highest correlation with the motif scores was used.

We compared the temporal patterns of the motifs occurring >100 times to the temporal patterns of the genes themselves. Motifs with an early or late maximum temporal signal strength were selected based on the following criteria: (1) the highest signal strength was found early (day 1 or 2) or late (day 6 or 8), respectively, and (2) the second or third highest strength also occurred early or late. The temporal patterns of the early and late motifs and their related genes were plotted separately. Candidate TFs for enterocytes and ISCs were selected as TFs with a maximum DCM gene body accumulation as well as maximum motif proportion at days 1–2 and days 6–8, respectively.
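The two selection criteria for early and late motifs can be written as a small classifier. This is a sketch with hypothetical signal values; the set of sampled days shown here is illustrative, and `classify_motif` is not a function from the analysis scripts.

```python
EARLY_DAYS = {1, 2}
LATE_DAYS = {6, 8}


def classify_motif(signal_by_day):
    """Return 'early', 'late' or None according to the two criteria."""
    ranked = sorted(signal_by_day, key=signal_by_day.get, reverse=True)
    top, runners_up = ranked[0], ranked[1:3]
    for label, days in (("early", EARLY_DAYS), ("late", LATE_DAYS)):
        # (1) highest signal in the window; (2) second or third highest too.
        if top in days and any(day in days for day in runners_up):
            return label
    return None


motif_a = {1: 0.9, 2: 0.7, 3: 0.2, 4: 0.1, 6: 0.0, 8: 0.05}
motif_b = {1: 0.1, 2: 0.0, 3: 0.2, 4: 0.3, 6: 0.8, 8: 0.9}
motif_c = {1: 0.9, 2: 0.1, 3: 0.5, 4: 0.6, 6: 0.2, 8: 0.0}

labels = [classify_motif(m) for m in (motif_a, motif_b, motif_c)]
```

The third motif peaks on day 1 but its next two strongest days are 4 and 3, so it meets only the first criterion and is not selected.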

WGBS

The following experiments were performed at GenomeScan B.V. following SOP176 draft version 8. (1) Concentration was determined using the QuantIT BR Kit. (2) To each normalized sample, 5 µl of 100× diluted Lambda Conversion Control (CC; SeqCap Epi Accessory Kit) was added. (3) The combined DNA + CC was fragmented to ~300 bp. (4) To >235 ng of DNA, 0.75 ng of GS spike in bisulfite conversion oligo (BCO) was added. (5) Library prep was performed with the NEBNext Ultra II DNA Kit and dsDNA adapters from Integrated DNA Technologies. (6) Bisulfite conversion was performed using the EZ-96 DNA Methylation-Lightning MagPrep Kit. (7) The converted libraries were amplified using the KAPA HiFi HotStart Uracil+ ReadyMix 2× using ten PCR cycles (samples 6–8, 12 cycles). (8) Concentration of the samples was determined using the QuantIT HS Kit. Size of the libraries was determined using the FA HS Kit. (9) Before the hybridization, the conversion ratio of the BCO control was determined using ddPCR. (10) Clustering and DNA sequencing using the NovaSeq 6000 was performed according to the manufacturerʼs protocols. A concentration of 1.1 nM of DNA was used. Image analysis, base calling and quality check were performed with the Illumina data analysis pipeline RTA 3.4.4 and Bcl2fastq version 2.20.

The unique molecular identifier (UMI) barcodes from each read were added to the read names using UMI-tools version 1.1.2 extract70. Trim Galore version 0.6.7 (wrapper of Cutadapt version 1.18 (ref. 71)) was run with default settings for adapter and quality trimming. The reads were bisulfite mapped using bismark version 0.23.1 (ref. 72) (--pbat and --bowtie2) and deduplicated using deduplicate_bismark based on the barcode information in the read names (--barcode). Methylation calls for all Cs were obtained using bismark_methylation_extractor with the settings --bedGraph and --CX to also consider cytosines in the non-CpG context. Moreover, coverage2cytosine (with --CX) was run to obtain genome-wide cytosine methylation reports. To obtain DCM methylation-specific bedGraphs, the bedGraph with all Cs was filtered for Cs overlapping DCM sites using BEDTools intersect. We evaluated the correlation between MeD-seq and WGBS. For all genes with at least ten DCM sites, the average WGBS methylation percentage was plotted against the average number of DCM MeD-seq reads (n = 3) normalized for the number of DCM sites. Genes active based on MeD-seq analysis were highlighted, and the Spearman correlation was reported.
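The gene-level comparison between WGBS and MeD-seq can be sketched in Python as follows. This is an illustration under stated assumptions, not the pipeline that was run: the input layout, the function names and the toy genes are hypothetical, and the Spearman correlation is computed as the Pearson correlation of tie-averaged ranks.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """Ranks starting at 1; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


def gene_level_points(genes, min_sites=10):
    """genes: {name: {"wgbs_pct": [% per DCM site], "medseq_reads": [reads per replicate]}}"""
    points = {}
    for name, gene in genes.items():
        n_sites = len(gene["wgbs_pct"])
        if n_sites < min_sites:  # keep genes with at least ten DCM sites
            continue
        avg_methylation = mean(gene["wgbs_pct"])
        reads_per_site = mean(gene["medseq_reads"]) / n_sites
        points[name] = (avg_methylation, reads_per_site)
    return points


# Toy input with hypothetical gene names.
genes = {
    "GeneA": {"wgbs_pct": [10.0] * 10, "medseq_reads": [100, 110, 90]},
    "GeneB": {"wgbs_pct": [40.0] * 20, "medseq_reads": [1000, 1000, 1000]},
    "GeneC": {"wgbs_pct": [25.0] * 12, "medseq_reads": [240, 240, 240]},
    "GeneD": {"wgbs_pct": [50.0] * 5, "medseq_reads": [500, 500, 500]},
}
points = gene_level_points(genes)
rho = spearman([p[0] for p in points.values()], [p[1] for p in points.values()])
```

GeneD is dropped because it has fewer than ten DCM sites; the remaining three genes are ranked identically by both measures in this toy input.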

To better understand the efficiency of DCM labeling, we analyzed how often DCM sites within the same read are co-methylated. We selected reads overlapping genes with at least two DCM sites, each with a methylation percentage above 0.0% across all reads. When reads had more than two DCM sites, we considered only the first and last DCM site. From these reads, we scored how often both sites were methylated, how often only one of the two was methylated and how often both were unmethylated. Moreover, two simulated control datasets were added to represent a fully unlinked and a fully linked situation. For the fully unlinked situation, we used the same reads as in the dataset but simulated the methylation status of the sites. For both sites separately, we extracted the average methylation percentage from all reads and generated a random number, which is either above (unmethylated) or below (methylated) this percentage. For the fully linked situation, we also simulated a dataset based on the reads. From the average methylation percentages of both sites, we took the lowest as the percentage of reads in which both sites are methylated and again generated a random number to obtain the simulated methylation status. If the sites were simulated to be unmethylated, the difference in average methylation percentages between both sites was used to decide whether one of the two sites was methylated.
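The two simulated controls can be sketched in Python as below. This is one interpretation of the description, not the original implementation: methylation levels are given as fractions rather than percentages, the function names are hypothetical, and the linked case uses a single uniform draw per read so that each site keeps its own average methylation level.

```python
import random
from collections import Counter


def classify(first_methylated, last_methylated):
    """Score a read by the status of its first and last DCM site."""
    if first_methylated and last_methylated:
        return "both"
    if first_methylated or last_methylated:
        return "one"
    return "none"


def simulate_unlinked(p_first, p_last, n_reads, rng):
    """Sites are drawn independently from their average methylation levels."""
    counts = Counter()
    for _ in range(n_reads):
        first = rng.random() < p_first
        last = rng.random() < p_last
        counts[classify(first, last)] += 1
    return counts


def simulate_linked(p_first, p_last, n_reads, rng):
    """Sites are maximally coupled: the lower level is the 'both' fraction."""
    p_low, p_high = sorted((p_first, p_last))
    counts = Counter()
    for _ in range(n_reads):
        u = rng.random()
        if u < p_low:
            counts["both"] += 1   # both sites methylated
        elif u < p_high:
            counts["one"] += 1    # only the more methylated site (p_high - p_low)
        else:
            counts["none"] += 1
    return counts


rng = random.Random(0)  # fixed seed so the sketch is reproducible
unlinked = simulate_unlinked(0.3, 0.5, 10_000, rng)
linked = simulate_linked(0.3, 0.5, 10_000, rng)
```

In the linked control, the fraction of reads with both sites methylated approaches the lower of the two levels (0.3 here), whereas in the unlinked control it approaches their product (0.15). Observed data can then be placed between these two extremes.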

CUT&Tag analysis

ISCs were isolated from LGR5-EGFP transgenic mice (LGR5-EGFP: B6-Lgr5tm1(cre/ERT2)Cle/J). Unless stated otherwise, steps were performed at 4 °C and centrifugation was for 5 minutes at 300g and 4 °C. The entire small intestine was collected, flushed with cold PBS and opened longitudinally, and villi were removed with a glass slide. The intestine was cut into 5-mm pieces and washed four times in cold PBS. After washing, the pieces were incubated twice in 10 mM EDTA for 15 minutes and 90 minutes at 4 °C. After EDTA incubation, crypts were mechanically separated from stromal tissue in cold PBS with a 10-ml serological pipette. The crypts were collected in the supernatant and centrifuged. The pellet was resuspended in advanced DMEM/F12 (ADF) and incubated with DNase for 10 minutes at room temperature. Next, the crypts were filtered through a 70-µm cell strainer and centrifuged for 5 minutes at 80g and 4 °C. Crypts were dissociated to single cells in TrypLE Select Enzyme (1×, Gibco, 12563011) for 3 minutes at 37 °C, and cells were disrupted every 60 seconds with a P1000 pipette. TrypLE was diluted with ADF, and cells were washed twice with 5% FCS in HBSS. Cells were incubated with antibodies (TER-119, BD Horizon, 563998; CD31, BD Horizon, 563356; CD45, BD Horizon, 563890; CD24-APC, BioLegend, 101814; CD117-PE, BioLegend, 105808) for 30 minutes at 4 °C and washed twice in 3.5 ml of 5% FCS in HBSS. Cells were filtered through a 40-µm cell strainer before proceeding to FACS. The FACS-sorted cells were centrifuged for 5 minutes at 200g and 4 °C and resuspended in CUT&Tag washing buffer.

To study the genome-wide distribution of H2A.Z and H2A.Zac in the ISCs and enterocytes, a CUT&Tag experiment was performed (Kaya-Okur et al.73). Whereas whole cells could be used for CUT&Tag of ISCs, nuclei had to be isolated for enterocytes because the FACS (Glut2) and CUT&Tag (H2A.Z and H2A.Zac) antibodies overlap in animal of origin. After the FACS procedure, enterocytes were centrifuged for 5 minutes at 100g and 4 °C, and the pellet was resuspended in TST buffer (0.5% Tween, 1% BSA, 10 mM Tris-HCl pH 7.5, 1 mM CaCl2, 146 mM NaCl and 41 mM MgCl2 in MQ). Subsequently, nuclei were collected by centrifuging for 10 minutes at 100g and 4 °C. Nuclei were incubated for 5 minutes on ice in 800 µl of TST buffer and centrifuged for 10 minutes at 100g and 4 °C. Nuclei were resuspended in CUT&Tag wash buffer.

CUT&Tag was performed following the published protocol with minor adaptations73. Per condition, 3.5 × 10⁴ ISCs and 1 × 10⁵ enterocyte nuclei were used as input. The same protocol was followed for both cells and nuclei. Samples were incubated overnight with 1:100 primary antibody (rabbit anti-histone H2A.Z, Abcam, ab4174; rabbit anti-acetyl histone H2A.Z, Merck, ABE1363; or rabbit anti-H3K27me3, Cell Signaling Technology, 9733) and for 1 hour with secondary antibody (rabbit α-mouse antibody, Abcam, ab46540). The pA-Tn5 adaptor complex (loaded pA-Tn5 transposase; Diagenode, C01070001) was incubated for 1 hour at room temperature. After DNA extraction, the pellet was resuspended in 10 µl of 0.1 mM EDTA by vortexing. Sequencing libraries were prepared with the published CUT&Tag amplification method. The libraries were sequenced on an Illumina HiSeq 2500 sequencer, generating paired-end reads of 50 bases in length. The reads were mapped to mm10 using bowtie2 version 2.4.1 (--end-to-end --very-sensitive --no-mixed --no-discordant --phred33 -I 10 -X 700). CPM-normalized bigWig tracks were made for visualization using deepTools version 3.5.0 bamCoverage (-bs 1 --normalizeUsing CPM).
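For readers unfamiliar with CPM scaling, the arithmetic behind a counts-per-million track can be illustrated in a few lines of Python. This is a simplified sketch of the normalization idea only, not deepTools' implementation; the function name and the numbers are hypothetical.

```python
def cpm_normalize(bin_counts, total_mapped_reads):
    """Scale per-bin read counts to counts per million mapped reads."""
    scale = 1_000_000 / total_mapped_reads
    return [count * scale for count in bin_counts]


# Three bins from a hypothetical library with 2 million mapped reads.
track = cpm_normalize([0, 5, 20], 2_000_000)
```

Scaling by library size makes tracks from samples with different sequencing depths, such as the ISC and enterocyte libraries, directly comparable when viewed side by side.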

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.