Intragenic DNA methylation prevents spurious transcription initiation

Abstract

In mammals, DNA methylation occurs mainly at CpG dinucleotides. Methylation of the promoter suppresses gene expression, but the functional role of gene-body DNA methylation in highly expressed genes has yet to be clarified. Here we show that, in mouse embryonic stem cells, Dnmt3b-dependent intragenic DNA methylation protects the gene body from spurious RNA polymerase II entry and cryptic transcription initiation. Using different genome-wide approaches, we demonstrate that this Dnmt3b function is dependent on its enzymatic activity and recruitment to the gene body by H3K36me3. Furthermore, the spurious transcripts can either be degraded by the RNA exosome complex or capped, polyadenylated, and delivered to the ribosome to produce aberrant proteins. Elongating RNA polymerase II therefore triggers an epigenetic crosstalk mechanism that involves SetD2, H3K36me3, Dnmt3b and DNA methylation to ensure the fidelity of gene transcription initiation, with implications for intragenic hypomethylation in cancer.

Figure 1: Dnmt3b co-localizes with H3K36me3 on the gene body and its loss reduces gene-body DNA methylation.
Figure 2: Dnmt3b knockout increases intragenic RNA Pol II spurious entry and transcription initiation events on gene body.
Figure 3: Dnmt3b is required to maintain transcription initiation fidelity.
Figure 4: H3K36me3-dependent maintenance of transcription initiation fidelity is mediated by Dnmt3b through its DNA methylation activity.
Figure 5: Transcripts produced from intragenic cryptic starting sites are polyadenylated, stable and ribosome-associated RNAs.

Accession codes

Primary accessions

Gene Expression Omnibus

References

  1. Robertson, K. D. DNA methylation, methyltransferases, and cancer. Oncogene 20, 3139–3155 (2001)

  2. Chen, Z. X. & Riggs, A. D. DNA methylation and demethylation in mammals. J. Biol. Chem. 286, 18347–18353 (2011)

  3. Neri, F. et al. Single-base resolution analysis of 5-formyl and 5-carboxyl cytosine reveals promoter DNA methylation dynamics. Cell Reports 10, 674–683 (2015)

  4. Neri, F. et al. TET1 is a tumour suppressor that inhibits colon cancer growth by derepressing inhibitors of the WNT pathway. Oncogene 34, 4168–4176 (2015)

  5. Okano, M., Bell, D. W., Haber, D. A. & Li, E. DNA methyltransferases Dnmt3a and Dnmt3b are essential for de novo methylation and mammalian development. Cell 99, 247–257 (1999)

  6. Bestor, T. H. The DNA methyltransferases of mammals. Hum. Mol. Genet. 9, 2395–2402 (2000)

  7. Schübeler, D. Function and information content of DNA methylation. Nature 517, 321–326 (2015)

  8. Jeltsch, A. & Jurkowska, R. Z. New concepts in DNA methylation. Trends Biochem. Sci. 39, 310–318 (2014)

  9. Neri, F. et al. Dnmt3L antagonizes DNA methylation at bivalent promoters and favors DNA methylation at gene bodies in ESCs. Cell 155, 121–134 (2013)

  10. Baubec, T. et al. Genomic profiling of DNA methyltransferases reveals a role for DNMT3B in genic methylation. Nature 520, 243–247 (2015)

  11. Morselli, M. et al. In vivo targeting of de novo DNA methylation by histone modifications in yeast and mouse. eLife 4, e06205 (2015)

  12. Edmunds, J. W., Mahadevan, L. C. & Clayton, A. L. Dynamic histone H3 methylation during gene induction: HYPB/Setd2 mediates all H3K36 trimethylation. EMBO J. 27, 406–420 (2008)

  13. Yoh, S. M., Lucas, J. S. & Jones, K. A. The Iws1:Spt6:CTD complex controls cotranscriptional mRNA biosynthesis and HYPB/Setd2-mediated histone H3K36 methylation. Genes Dev. 22, 3422–3434 (2008)

  14. Wagner, E. J. & Carpenter, P. B. Understanding the language of Lys36 methylation at histone H3. Nat. Rev. Mol. Cell Biol. 13, 115–126 (2012)

  15. Carrozza, M. J. et al. Histone H3 methylation by Set2 directs deacetylation of coding regions by Rpd3S to suppress spurious intragenic transcription. Cell 123, 581–592 (2005)

  16. Carvalho, S. et al. Histone methyltransferase SETD2 coordinates FACT recruitment with nucleosome dynamics during transcription. Nucleic Acids Res. 41, 2881–2893 (2013)

  17. Maunakea, A. K. et al. Conserved role of intragenic DNA methylation in regulating alternative promoters. Nature 466, 253–257 (2010)

  18. Maderious, A. & Chen-Kiang, S. Pausing and premature termination of human RNA polymerase II during transcription of adenovirus in vivo and in vitro. Proc. Natl Acad. Sci. USA 81, 5931–5935 (1984)

  19. Yankulov, K., Yamashita, K., Roy, R., Egly, J. M. & Bentley, D. L. The transcriptional elongation inhibitor 5,6-dichloro-1-β-d-ribofuranosylbenzimidazole inhibits transcription factor IIH-associated protein kinase. J. Biol. Chem. 270, 23922–23925 (1995)

  20. Bochnig, P., Reuter, R., Bringmann, P. & Lührmann, R. A monoclonal antibody against 2,2,7-trimethylguanosine that reacts with intact, class U, small nuclear ribonucleoproteins as well as with 7-methylguanosine-capped RNAs. Eur. J. Biochem. 168, 461–467 (1987)

  21. Deana, A., Celesnik, H. & Belasco, J. G. The bacterial enzyme RppH triggers messenger RNA degradation by 5′ pyrophosphate removal. Nature 451, 355–358 (2008)

  22. Carninci, P. et al. Genome-wide analysis of mammalian promoter architecture and evolution. Nat. Genet. 38, 626–635 (2006)

  23. Butler, J. E. F. & Kadonaga, J. T. The RNA polymerase II core promoter: a key component in the regulation of gene expression. Genes Dev. 16, 2583–2592 (2002)

  24. Clark, S. J., Harrison, J. & Molloy, P. L. Sp1 binding is inhibited by mCpmCpG methylation. Gene 195, 67–71 (1997)

  25. Douet, V., Heller, M. B. & Le Saux, O. DNA methylation and Sp1 binding determine the tissue-specific transcriptional activity of the mouse Abcc6 promoter. Biochem. Biophys. Res. Commun. 354, 66–71 (2007)

  26. Hogart, A. et al. Genome-wide DNA methylation profiles in hematopoietic stem and progenitor cells reveal overrepresentation of ETS transcription factor binding sites. Genome Res. 22, 1407–1418 (2012)

  27. Uchiumi, F., Miyazaki, S. & Tanuma, S. The possible functions of duplicated ets (GGAA) motifs located near transcription start sites of various human genes. Cell. Mol. Life Sci. 68, 2039–2051 (2011)

  28. Yu, M. et al. GA-binding protein-dependent transcription initiator elements. Effect of helical spacing between polyomavirus enhancer a factor 3(PEA3)/Ets-binding sites on initiator activity. J. Biol. Chem. 272, 29060–29067 (1997)

  29. Gowher, H. & Jeltsch, A. Molecular enzymology of the catalytic domains of the Dnmt3a and Dnmt3b DNA methyltransferases. J. Biol. Chem. 277, 20409–20414 (2002)

  30. Tani, H. & Akimitsu, N. Genome-wide technology for determining RNA stability in mammalian cells: historical perspective and recent advantages based on modified nucleotide labeling. RNA Biol. 9, 1233–1238 (2012)

  31. Ingolia, N. T., Lareau, L. F. & Weissman, J. S. Ribosome profiling of mouse embryonic stem cells reveals the complexity and dynamics of mammalian proteomes. Cell 147, 789–802 (2011)

  32. Maunakea, A. K., Chepelev, I., Cui, K. & Zhao, K. Intragenic DNA methylation modulates alternative splicing by recruiting MeCP2 to promote exon recognition. Cell Res. 23, 1256–1269 (2013)

  33. Yearim, A. et al. HP1 is involved in regulating the global impact of DNA methylation on alternative splicing. Cell Reports 10, 1122–1134 (2015)

  34. Jones, P. A. Functions of DNA methylation: islands, start sites, gene bodies and beyond. Nat. Rev. Genet. 13, 484–492 (2012)

  35. Jones, P. A. & Baylin, S. B. The fundamental role of epigenetic events in cancer. Nat. Rev. Genet. 3, 415–428 (2002)

  36. Gaudet, F. et al. Induction of tumors in mice by genomic hypomethylation. Science 300, 489–492 (2003)

  37. Feinberg, A. P. & Vogelstein, B. Hypomethylation distinguishes genes of some human cancers from their normal counterparts. Nature 301, 89–92 (1983)

  38. Kanu, N. et al. SETD2 loss-of-function promotes renal cancer branched evolution through replication stress and impaired DNA repair. Oncogene 34, 5699–5708 (2015)

  39. Fontebasso, A. M. et al. Mutations in SETD2 and genes affecting histone H3K36 methylation target hemispheric high-grade gliomas. Acta Neuropathol. 125, 659–669 (2013)

  40. Duns, G. et al. Histone methyltransferase gene SETD2 is a novel tumor suppressor gene in clear cell renal cell carcinoma. Cancer Res. 70, 4287–4291 (2010)

  41. Neri, F. et al. Genome-wide analysis identifies a functional association of Tet1 and Polycomb repressive complex 2 in mouse embryonic stem cells. Genome Biol. 14, R91 (2013)

  42. Mikkelsen, T. S. et al. Genome-wide maps of chromatin state in pluripotent and lineage-committed cells. Nature 448, 553–560 (2007)

  43. Incarnato, D., Neri, F., Diamanti, D. & Oliviero, S. MREdictor: a two-step dynamic interaction model that accounts for mRNA accessibility and Pumilio binding accurately predicts microRNA targets. Nucleic Acids Res. 41, 8421–8433 (2013)

  44. Incarnato, D., Neri, F., Anselmi, F. & Oliviero, S. Genome-wide profiling of mouse RNA secondary structures reveals key features of the mammalian transcriptome. Genome Biol. 15, 491 (2014)

  45. Incarnato, D., Krepelova, A. & Neri, F. High-throughput single nucleotide variant discovery in E14 mouse embryonic stem cells provides a new reference genome assembly. Genomics 104, 121–127 (2014)

  46. Kim, D. et al. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 14, R36 (2013)

  47. Trapnell, C. et al. Differential analysis of gene regulation at transcript resolution with RNA-seq. Nat. Biotechnol. 31, 46–53 (2013)

  48. Langmead, B., Trapnell, C., Pop, M. & Salzberg, S. L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009)

  49. Xi, Y. & Li, W. BSMAP: whole genome bisulfite sequence MAPping program. BMC Bioinformatics 10, 232 (2009)

  50. Chen, C.-Y. A., Ezzeddine, N. & Shyu, A.-B. Messenger RNA half-life measurements in mammalian cells. Methods Enzymol. 448, 335–357 (2008)

  51. Zhang, Y. et al. Model-based analysis of ChIP-seq (MACS). Genome Biol. 9, R137 (2008)

  52. Li, J. Y. et al. Synergistic function of DNA methyltransferases Dnmt3a and Dnmt3b in the methylation of Oct4 and Nanog. Mol. Cell. Biol. 27, 8748–8759 (2007)

  53. Core, L. J., Waterfall, J. J. & Lis, J. T. Nascent RNA sequencing reveals widespread pausing and divergent initiation at human promoters. Science 322, 1845–1848 (2008)

  54. Sharova, L. V. et al. Database for mRNA half-life of 19 977 genes obtained by DNA microarray analysis of pluripotent and differentiating mouse embryonic stem cells. DNA Res. 16, 45–58 (2009)

  55. Schwanhäusser, B. et al. Global quantification of mammalian gene expression control. Nature 473, 337–342 (2011)

Acknowledgements

We thank T. Baubec and D. Schübeler for providing the Dnmt3b construct. We thank S. Yamanaka for the anti-Dnmt3l antibody. We thank E. Guccione, R. Calogero and T. Bates for helpful suggestions and critical reading of the manuscript. This work was supported by the Associazione Italiana Ricerca sul Cancro (AIRC) IG 2014 Id15217.

Author information

Contributions

F.N. and S.O. conceived the study; S.R. and A.K. performed genome-wide experiments, cloning and cell treatments; F.N. and D.I. performed genome-wide experiments and data analysis; M.M. performed cloning and cell treatments; C.P. and G.B. performed RNA-seq; F.A. performed CAPIP-seq experiments; F.N. and S.O. wrote the paper with input from all authors.

Corresponding authors

Correspondence to Francesco Neri or Salvatore Oliviero.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Additional information

Reviewer Information Nature thanks P. Carninci and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Extended data figures and tables

Extended Data Figure 1 Generation of Dnmt3b−/− and mapping of the endogenous Dnmt3b in ES cells.

Dnmt3b−/− ES cell clones (B126 and B77) showed normal cell growth and alkaline phosphatase (AP) staining, as well as impaired silencing of Nanog expression by promoter DNA methylation during differentiation into embryoid bodies (EBs) with respect to the wild-type cell line, indicating the bona fide nature of the transgenic cell lines. a, Schematic of the region of the Dnmt3b gene targeted by TALEN zinc-fingers, and representative sequences on the two alleles of two Dnmt3b−/− clones, compared to wild-type (KO #1 = B77; KO #2 = B126). b, Western blot analysis of Dnmt3b protein in the two Dnmt3b−/− clones compared to wild-type. Dnmt1 and Dnmt3a2 levels are not affected by loss of Dnmt3b. Actin is used as loading control. Notably, the mRNA level (data not shown) of the Dnmt3b gene is also almost completely lost in Dnmt3b−/− cells. c, Growth curve of wild-type and Dnmt3b−/− ES cells over 3 days. d, Alkaline phosphatase staining of wild-type and Dnmt3b−/− ES cell colonies. e, RT–qPCR of Nanog levels in embryoid bodies derived from Dnmt3b−/− clones, compared to wild-type ES cells, and embryoid bodies. Error bars represent the standard deviation of at least three independent experiments. f, Sanger sequencing of bisulphite-treated genomic DNA from wild-type ES cells and embryoid bodies, and Dnmt3b−/− ES-cell-derived embryoid bodies, at the region of the Nanog promoter previously shown to be a target of Dnmt3b-mediated methylation upon differentiation (ref. 52). g, Histogram showing the quantity of the DNA recovered in ChIP experiments performed with different antibodies directed against Dnmt3b protein. h, Western blot analysis of Dnmt3b protein in wild-type and Dnmt3b−/− ES cells. Actin was used as a loading control. i, Histogram showing the quantity (ng) of the DNA recovered in ChIP experiments performed with anti-Dnmt3b antibody (Ab122932) in wild-type and Dnmt3b−/− ES cells. j, Genomic views of the mapped reads from different ChIP-seq datasets in ES cells. 
IgG and Dnmt3b ChIP-seq and WGBS are from the present work, bio-Dnmt3b from GSE57413, MeDIP-seq from GSE44644, GLIB-seq from GSE44566, histone modifications from GSE12241. k, Left, heat map representations of Dnmt3b binding and relevant histone modifications on a window of ±3 kb centred on the TSS of RefSeq genes, sorted by their expression level, according to RNA-seq data. Right, plots of Dnmt3b binding and relevant histone modifications on a window of ±3 kb centred on the TSS of RefSeq genes, clustered in the four quartiles of expression (q4 = upper quartile, the most expressed genes). l, m, Binding enrichment of IgG (in wild-type ES cells) as well as IgG and Dnmt3b (in Dnmt3b−/− ES cells) on the exons or introns partitioned in quartiles on the basis of the expression of the related gene. These panels represent control experiments for Fig. 1b. n, Hierarchical clustering of pairwise Pearson correlation of Dnmt3b and third-party ChIP-seq datasets in ES cells reveals a strong genome-wide association of Dnmt3b with H3K36me3 histone marks. o, Scatter plots comparing intragenic H3K36me3 and IgG/Dnmt3b enrichments (log2) in wild-type and Dnmt3b−/− cells. r, Pearson correlation. p, qPCR of ChIP analysis of Dnmt3b on the indicated regions. A specific enrichment can be observed on the gene body of active genes. Error bars represent the standard deviation of at least three independent experiments. Primers used are reported in Supplementary Table 1. q, Immunoprecipitation experiment (using a different antibody for Dnmt3b, Ab2851) in ES cells reveals the interaction of Dnmt3b with H3K36me3, but not H3K4me3, in agreement with ChIP-seq data. r, Violin plots of the methylation level of all CpGs in both wild-type and Dnmt3b−/− ES cells as determined by WGBS on the indicated genomic features. P values calculated with Wilcoxon rank-sum test.
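
The partitioning of genes into expression quartiles used in panels k–m can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function name expression_quartiles, the rank-based handling of ties and the toy RPKM values are all assumptions, since the legend does not state how ties or non-expressed genes were treated.

```python
def expression_quartiles(rpkm_by_gene):
    """Assign each gene to q1..q4 by expression rank (q4 = most expressed).

    Rank-based splitting is an assumption made for this sketch.
    """
    # Sort gene names by expression, lowest first; the name breaks ties.
    ranked = sorted(rpkm_by_gene, key=lambda g: (rpkm_by_gene[g], g))
    n = len(ranked)
    quartile = {}
    for i, gene in enumerate(ranked):
        # Integer arithmetic maps ranks 0..n-1 onto quartiles 1..4.
        quartile[gene] = "q%d" % (i * 4 // n + 1)
    return quartile


# Hypothetical RPKM values, for illustration only.
toy_rpkm = {"geneA": 0.2, "geneB": 1.5, "geneC": 8.0, "geneD": 40.0,
            "geneE": 0.0, "geneF": 3.1, "geneG": 12.0, "geneH": 95.0}
quartiles = expression_quartiles(toy_rpkm)
```

Binding enrichment on exons or introns would then be summarized separately for the features belonging to each quartile.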

Extended Data Figure 2 Dnmt3b loss increases intragenic RNA transcription initiation.

a, Scatter plots of the log2 RPKM gene values in the indicated samples. b, Genomic views of the RNA-seq mapped reads from the indicated samples. c, Box plots of the ratio between normalized RNA-seq read counts (RPKM) for the second and the first exon (top left), the third and the first exon (bottom left), the average of the intermediate exons (from the fourth to the penultimate) and the first exon (top right), the last and the first exon (bottom right), in wild-type (rep #2) and Dnmt3b−/− (rep #2 and clone B77) ES cells. P values calculated with Wilcoxon rank-sum test. d, Pie charts showing the percentage of transcripts with log2 fold change ≥1, ≤ −1 or between −1 and 1. e, RT–qPCR analysis of Ints2, Nodal, Gabpa and XpoI transcripts using primers targeting different exons to discriminate different isoforms in wild-type, Dnmt3b−/− (cl. B126) and Dnmt3b−/− (cl. B77) ES cells. All PCRs were normalized to β-actin and to the wild-type condition. Error bars represent the standard deviation of at least three independent experiments. P values calculated against the wild-type condition for each experiment using a t-test. **P < 0.001, *P < 0.01. Primers used are reported in Supplementary Table 1.
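
The exon-to-first-exon ratios of panel c and the fold-change bins of panel d can be illustrated with a short sketch. The RPKM definition is the standard one; everything else (function names, the pseudocount of 1 used before taking logarithms, and the toy read counts) is an assumption made for illustration and is not taken from the paper.

```python
import math


def rpkm(read_count, length_bp, total_mapped_reads):
    """Reads per kilobase of feature per million mapped reads."""
    return read_count * 1e9 / (length_bp * total_mapped_reads)


def exon_to_first_ratio(exon_rpkms, index):
    """Ratio of the RPKM of exon `index` (0-based) to that of the first exon."""
    first = exon_rpkms[0]
    if first == 0:
        return None  # ratio is undefined when the first exon has no signal
    return exon_rpkms[index] / first


def fold_change_class(rpkm_ko, rpkm_wt, pseudocount=1.0):
    """Bin a transcript as 'up' (log2 FC >= 1), 'down' (<= -1) or 'unchanged'.

    The pseudocount is an assumption; it only avoids log of zero.
    """
    log2_fc = math.log2((rpkm_ko + pseudocount) / (rpkm_wt + pseudocount))
    if log2_fc >= 1:
        return "up"
    if log2_fc <= -1:
        return "down"
    return "unchanged"


# Toy gene: (reads, exon length in bp) for the first, second and last exon.
total_reads = 1_000_000
exons = [(200, 100), (100, 100), (300, 150)]
exon_rpkms = [rpkm(c, length, total_reads) for c, length in exons]
second_over_first = exon_to_first_ratio(exon_rpkms, 1)
last_over_first = exon_to_first_ratio(exon_rpkms, -1)
```

A ratio that rises in the knockout relative to wild type for downstream exons is the kind of shift the box plots in panel c are designed to expose.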

Extended Data Figure 3 Dnmt3b loss does not extensively affect alternative promoter activation.

We investigated the activation or repression of alternative promoters on the subset of genes showing at least two annotated alternative promoters. On these genes, we measured the RPKM value of the first exon of all the isoforms transcribed from each of the alternative promoters in wild-type or Dnmt3b−/− cells. We observed that in Dnmt3b−/− cells the first exon tended to be less expressed regardless of the isoform, arguing against a general activation of intragenic promoters. Analysis of the ratio of the expression between the first promoter and the second downstream promoter identified four genes (of a total of 2,563 genes) with a reactivation of the intragenic promoter in Dnmt3b−/− cells. a, Schematic of the gene dataset used for alternative promoter analysis. The dataset is composed of a total of 2,563 genes showing at least two annotated alternative promoters, including 713 genes having at least three, 195 genes at least four and another 189 genes with multiple alternative promoters (from at least 5 to a maximum of 12). b, RPKM value of the first exon of all the isoforms transcribed from the alternative promoters in wild-type or Dnmt3b−/− ES cells. In Dnmt3b−/− cells the first exon tended to be less expressed regardless of the isoform, and none of the putative intragenic promoters (from the second to the twelfth) showed general activation. c, Analysis of the ratio of the expression of the first promoter over the second downstream promoter displayed high correlation between replicates and between wild-type and Dnmt3b−/− ES cells. Only four genes (of a total of 2,563 genes) showed a reactivation of the intragenic promoter in Dnmt3b−/− ES cells. 
Further analysis of the ratio between RPKM of the first exon and of the whole transcript for each class of alternative-promoter-transcribed genes did not show any evidence for possible reactivation of any class of transcript isoforms derived from intragenic promoters. d, e, Analysis of the ratio of the RPKM value of the first exon over the whole transcript for each class of alternative-promoter-transcribed genes showed high correlation between wild-type and Dnmt3b−/− ES cells and did not reveal evidence for possible reactivation of any class of transcript isoforms derived from intragenic promoters.

Extended Data Figure 4 Dnmt3b loss does not globally affect elongating Pol II or H3K36me3 deposition on the gene bodies, but increases intragenic Pol II spurious entry.

a, Genomic views of the mapped reads from the indicated different ChIP-seq data sets in wild-type and Dnmt3b−/− ES cells normally cultured or treated with the Pol II elongation inhibitor DRB. b, Hierarchical clustering of pairwise Pearson correlation of ChIP-seq experiments performed in this work, and third-party ChIP-seq datasets in ES cells. c, Heat map representations of the indicated ChIP-seq (in wild-type and Dnmt3b−/− ES cells) peaks with respect to annotated RefSeq genes, sorted by their expression level, according to RNA-seq data. Each gene was extended by 3 kb upstream of its TSS, and downstream of its TES. d, Plots of the H3K36me3 distribution in wild-type and Dnmt3b−/− ES cells. e, Binding enrichment of H3K36me3 on intermediate exons and introns in wild-type and Dnmt3b−/− ES cells. f, Binding enrichment of the indicated ChIP-seq experiments in wild-type and Dnmt3b−/− ES cells treated (or not) with DRB on the intermediate exons and introns subdivided into quartiles on the basis of expression level. This result demonstrated that only non-elongating Pol II is enriched on the bodies of the most expressed genes (q3 and q4) in Dnmt3b−/− ES cells. P values calculated with Wilcoxon rank-sum test; **P < 2.2 × 10−16. g, Genomic views of the mapped reads from the ChIP-seq analyses for H3K4me3 and H3ac in wild-type and Dnmt3b−/− ES cells. h, Hierarchical clustering of pairwise Pearson correlation of ChIP-seq experiments performed in this work, compared with ENCODE ChIP-seq datasets. i, Heat map representations of the indicated ChIP-seq (in wild-type and Dnmt3b−/− ES cells) peaks with respect to annotated RefSeq genes, sorted by their expression level, according to RNA-seq data. Each gene was extended by 3 kb upstream of its TSS, and downstream of its TES. j, Binding enrichment of the indicated ChIP-seq experiments in wild-type and Dnmt3b−/− ES cells on the first exons, intermediate exons and introns subdivided into quartiles on the basis of expression level. 
This result demonstrates that H3K4me3 and H3ac are enriched on the intermediate exons and introns of the most expressed genes in Dnmt3b−/− ES cells. P values calculated with Wilcoxon rank-sum test.
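
For readers unfamiliar with the Wilcoxon rank-sum test quoted throughout these legends, the sketch below shows one common form of it: the two-sided test with midranks for ties and a normal approximation. It is an illustrative implementation only. Which software or exact variant the authors used is not stated here, and the function name and toy values are assumptions.

```python
import math


def rank_sum_test(sample_a, sample_b):
    """Two-sided Wilcoxon rank-sum test using the normal approximation.

    Returns (z, p). Ties receive midranks and the variance is tie-corrected;
    no continuity correction is applied.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    n = n_a + n_b
    # Pool the values, remembering which sample each came from (0 = a, 1 = b).
    pooled = sorted([(v, 0) for v in sample_a] + [(v, 1) for v in sample_b])
    rank_sum_a = 0.0
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_a += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    mean = n_a * (n + 1) / 2.0
    var = n_a * n_b / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:
        return 0.0, 1.0  # all values identical: no evidence of a shift
    z = (rank_sum_a - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Toy enrichment values, for illustration only.
z_shift, p_shift = rank_sum_test([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
z_same, p_same = rank_sum_test([1, 2, 3], [1, 2, 3])
```

With the thousands of exons or introns per quartile compared in these panels, even modest shifts in the distributions yield very small P values.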

Extended Data Figure 5 CAPIP-seq enrichment of the 5′ of the RNAs shows that Dnmt3b loss increases intragenic spurious transcription initiation.

a, Schematic view of the CAPIP-seq protocol used. Total RNA is chemically fragmented and then subjected to immunoprecipitation using a specific anti-CAP antibody or a control anti-IgG antibody. Eluted RNA (as well as one-tenth of the starting material for input) is subjected to random-primer reverse transcription. The library is then completed, starting from second-strand generation. b, Scatter plots of the log2 RPKM of CAPIP-seq data (anti-CAP antibody) and input in wild-type and Dnmt3b−/− ES cells. c, Hierarchical clustering of pairwise Pearson correlation of CAPIP-seq-related sequencings in wild-type and Dnmt3b−/− ES cells. d, Genomic views of the total mapped reads from the indicated CAPIP-seq-related sequencings. Enrichment of the CAP signal is present on the 5′ of the RNA as a peak about 150 bp broader than the signal obtained by performing DECAP-seq. e, Plots of the CAPIP-seq mapped reads distribution in wild-type and Dnmt3b−/− ES cells with respect to annotated RefSeq genes, extended by 5 kb upstream of the TSS and downstream of the TES. f, Box plots of the log2 enrichment of the CAPIP-seq signal rep #2 (CAP immunoprecipitation signal over input) in wild-type and Dnmt3b−/− ES cells on the indicated genic features. P values calculated with Wilcoxon rank-sum test. g, Further analysis showing the increase of CAP localization from intragenic regions of the RNA. Intragenic ratio is calculated as the log2 ratio of cap signal gene-body enrichment in Dnmt3b−/− versus wild-type cells. The correlation between the two replicates is shown.
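
The intragenic ratio defined in panel g, and the replicate correlation reported with it, can be written out as a small sketch. The definitions follow the legend (enrichment as immunoprecipitation over input, intragenic ratio as the log2 ratio of knockout to wild-type enrichment). The function names, the pseudocount and the toy numbers are assumptions introduced for illustration.

```python
import math


def log2_enrichment(ip_signal, input_signal, pseudocount=1.0):
    """log2 of the cap immunoprecipitation signal over input for one feature."""
    return math.log2((ip_signal + pseudocount) / (input_signal + pseudocount))


def intragenic_ratio(ko_ip, ko_input, wt_ip, wt_input, pseudocount=1.0):
    """log2 ratio of gene-body cap enrichment, knockout versus wild type.

    Identical to log2_enrichment(knockout) minus log2_enrichment(wild type).
    """
    return (log2_enrichment(ko_ip, ko_input, pseudocount)
            - log2_enrichment(wt_ip, wt_input, pseudocount))


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Toy gene-body signals (hypothetical RPKM values).
ratio = intragenic_ratio(ko_ip=7.0, ko_input=1.0, wt_ip=3.0, wt_input=1.0)
# Toy per-gene intragenic ratios from two replicates.
replicate_r = pearson([0.5, 1.0, 1.5, 2.0], [0.6, 1.1, 1.4, 2.1])
```

A positive intragenic ratio indicates more capped signal over the gene body in the knockout than in wild type.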

Extended Data Figure 6 DECAP-seq method maps, at single-base resolution, TSSs on the gene body in ES cells.

a, Schematic representation of the workflow of the DECAP-seq technique, which is based on the enzymatic activity of RNA 5′ pyrophosphohydrolase (RppH) that, in Thermopol buffer, mediates decapping and pyrophosphate removal from the 5′ end of RNA to leave a 5′ monophosphate RNA (5′-P). The 5′-P RNA is then used for selective adaptor ligation by T4 RNA ligase to the originally capped RNA fragments, allowing single-base resolution mapping of the RNA capping sites. Treating the sample in the same way but without RppH enzyme generates a negative control (to detect technical background). A positive control is generated by treating the sample with T4 polynucleotide kinase (PNK) for 5′ phosphorylation of all RNA fragments. This method represents an affordable alternative to the use of the tobacco acid pyrophosphatase (TAP) enzyme, which has been used in several high-throughput techniques such as GRO-seq, CAP-seq and CIP-TAP (ref. 53), because EpiCentre Technologies (to our knowledge, the only company producing commercial TAP) has discontinued TAP and all kits containing it. b, Total RNA fragmentation was verified by using Fragment Analyzer (Advanced Analytical). c, Final DECAP-seq libraries were inspected on Fragment Analyzer before gel size selection. RppH-treated and untreated samples showed a double peak around 130 bp corresponding to the dimers of adaptor, but only the RppH-treated sample showed a higher enrichment (in the red box) corresponding to the decapped RNA fragments. The PNK-treated sample displayed a large peak around 200 bp. d, Final DECAP-seq libraries were quantified on Qubit (Invitrogen) after gel size selection and PCR enrichment. The library generated by treating RNA with RppH showed a fifty-fold higher concentration with respect to the library generated without RppH treatment (5 ng μl−1 versus 0.1 ng μl−1). 
e, Genomic views of the total DECAP-seq mapped reads from the indicated treatment on a gene (Actb) on the Crick DNA strand (− strand) and a gene (Rpl5) on the Watson DNA strand (+ strand). A pronounced sharp peak (red arrow) is present on the TSS only on the respective gene strand, thus reflecting both the cap- and strand-specificity of the method. Unstranded RNA-seq is shown as a reference example. f, Plot of total TSSs (identified by using DECAP-seq rep #2) distribution along genes in Dnmt3b−/− (blue line) compared with wild-type (red line) ES cells. g, Box plots showing the number of total TSSs per gene on RefSeq-annotated TSSs and on gene body in wild-type and Dnmt3b−/− ES cells. P values calculated with Wilcoxon rank-sum test. h, Histogram showing the average RPM of novel identified TSSs by DECAP-seq in both replicates of wild-type ES cells. i, j, Scatter plots of the log2 RPM values on canonical annotated TSSs (±100 bp) in both replicates of DECAP-seq samples in wild-type and Dnmt3b−/− ES cells. k, Venn diagrams of intragenic TSSs with a DECAP-seq signal RPM > 6 showing the single-base resolution overlap between the DECAP-seq experiment replicates. P values calculated with Hypergeometric Distribution test. l, Venn diagram of intragenic TSSs with a DECAP-seq signal RPM > 6 showing single-base resolution overlap between Dnmt3b−/− and wild-type ES cells (rep #2). m, Pie charts of the DECAP-seq read distribution on TSSs RPM > 6 in wild-type (left) and Dnmt3b−/− (right) cells (rep #2). In green are shown the novel TSSs that overlap with RefSeq-annotated TSSs. Yellow, all the common TSSs distributed on the gene body; pink, the sample-specific TSSs on the gene body. n, Box plot distribution of the enrichment of the CAPIP-seq and Pol II ChIP-seq signals calculated as the log2 ratio in Dnmt3b−/− versus wild-type cells on the novel identified TSSs and on an intragenic random dataset. 
Green, those overlapping with RefSeq-annotated TSSs; pink, those specifically found on the gene bodies of Dnmt3b−/− ES cells. P values calculated with Wilcoxon rank-sum test. o, Box plot distribution of the ratio between downstream and upstream exon expression levels with respect to the novel identified intragenic TSSs or an intragenic random dataset in Dnmt3b−/− cells. The exon expression levels were calculated by counting the reads from the RNA-seq experiments in Dnmt3b−/− or wild-type cells. P values calculated with Wilcoxon rank-sum test.
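
The TSS calling threshold (RPM > 6), the single-base overlap between samples and the hypergeometric test mentioned in panels k–m can be sketched as below. This is a hedged illustration rather than the authors' code: the function names, the representation of a TSS as a (chromosome, strand, coordinate) tuple, the choice of the candidate universe for the hypergeometric test, and all toy numbers are assumptions.

```python
from math import comb


def rpm(count, total_mapped_reads):
    """Reads per million mapped reads."""
    return count * 1e6 / total_mapped_reads


def called_tss(counts_by_position, total_mapped_reads, threshold=6.0):
    """Positions (chrom, strand, coordinate) whose RPM exceeds the threshold."""
    return {pos for pos, count in counts_by_position.items()
            if rpm(count, total_mapped_reads) > threshold}


def hypergeom_upper_tail(overlap, size_a, size_b, universe):
    """P(X >= overlap) when size_b positions are drawn without replacement
    from `universe` candidates, size_a of which belong to set A."""
    upper = min(size_a, size_b)
    numerator = sum(comb(size_a, k) * comb(universe - size_a, size_b - k)
                    for k in range(overlap, upper + 1))
    return numerator / comb(universe, size_b)


# Toy 5'-end read counts per position for two replicates (hypothetical).
rep1 = {("chr1", "+", 100): 10, ("chr1", "+", 250): 6, ("chr2", "-", 40): 7}
rep2 = {("chr1", "+", 100): 9, ("chr1", "+", 251): 12, ("chr2", "-", 40): 8}
tss_rep1 = called_tss(rep1, 1_000_000)
tss_rep2 = called_tss(rep2, 1_000_000)
shared = tss_rep1 & tss_rep2  # overlap at single-base resolution

# Small worked case: 2 shared out of sets of 4 and 3 in a universe of 10.
p_small = hypergeom_upper_tail(overlap=2, size_a=4, size_b=3, universe=10)
```

Because positions must match exactly, a TSS shifted by a single base between samples (chr1:250 versus chr1:251 in the toy data) does not count as shared.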

Extended Data Figure 7 DECAP-seq maps the internal TSSs in Dnmt3b−/− ES cells revealing their correlation with the binding of methylation-sensitive transcription factors.

a, Sequence binding motifs of the indicated transcription factors. b, Schematic representation of CpG localization and putative transcription factor binding elements on the regions (±50 bp) of some intragenic TSSs specific to Dnmt3b−/− ES cells. c, RT–qPCR analysis of CAPIP (top) and qPCR analysis of ChIP (middle) experiments on the indicated genomic regions in wild-type and Dnmt3b−/− ES cells. For CAPIP RT–qPCR the primers were designed downstream of the newly identified TSSs. Bottom panel represents the fold difference of the ratio between downstream and upstream exon expression levels with respect to the newly identified intragenic TSSs. For TSSs falling on exons, the downstream or upstream part of the same exon was considered as downstream or upstream exon if longer than 200 bp. P value was calculated against the wild-type condition using a t-test; **P < 0.01; *P < 0.05; n.s., not significant. d, Sanger bisulphite sequencing of intragenic TSSs previously described in wild-type and Dnmt3b−/− ES cells. e, qPCR analysis of ChIP experiments on the indicated genomic regions.

Extended Data Figure 8 SetD2 knockdown reduces H3K36me3 marks, Dnmt3b binding, intragenic DNA methylation, and spurious TSSs on the gene bodies.

a, RT–qPCR of SetD2 knockdown in wild-type and Dnmt3b−/− ES cells, using two independent shRNAs. Error bars represent the standard deviation of at least three independent experiments. b, Venn diagram showing the genome-wide number of H3K36me3 peaks in control and SetD2 knockdown ES cells. c, Plots of H3K36me3 distribution on genes in control and SetD2 knockdown cells show a decrease of H3K36me3 on the gene bodies of SetD2-silenced cells. d, Histograms of the percentage of Dnmt3b ChIP-seq peaks overlapping intronic and exonic regions of genes grouped into quartiles on the basis of expression in control or SetD2 knockdown cells. P value was calculated with a χ2 test; **P < 0.001. e, Genomic views of the mapped reads from H3K36me3 and Dnmt3b ChIP-seq datasets in control and two different SetD2 knockdown ES cells. f, qPCR analysis of H3K36me3 and Dnmt3b ChIP experiments and MeDIP analysis in control and SetD2 knockdown cells for the indicated genomic regions. A specific loss of Dnmt3b and DNA methylation is observed only on the gene body of active genes. Error bars represent the standard deviation of at least three independent experiments. P value was calculated against the wild-type condition for each experiment with a t-test; **P < 0.001. Primers used are reported in Supplementary Table 1. g, Scatter plots of the log2 RPKM gene values in the indicated samples. h, Genomic views of the RNA-seq-mapped reads from the indicated samples. i, Plot of total TSSs (identified by using DECAP-seq) distribution along genes in SetD2 knockdown (yellow line) compared with control knockdown (red line) ES cells. j, Box plots showing the number of total TSSs per gene on RefSeq-annotated TSSs and on gene bodies in control and SetD2 knockdown ES cells. P values calculated with Wilcoxon rank-sum test. k, Scatter plots of the log2 RPM values on canonical annotated TSSs (±100 bp) in control and SetD2 knockdown ES cells.
l, Venn diagram of intragenic TSSs with a DECAP-seq signal RPM > 6 showing single-base resolution overlap between control and SetD2 knockdown ES cells. m, n, Venn diagrams of intragenic TSSs having DECAP-seq signal RPM > 6 showing single-base resolution overlap between the indicated samples. P values calculated with Hypergeometric Distribution test. o, Pie charts of the DECAP-seq read distribution on TSSs RPM > 6 in control knockdown (top) and SetD2 (bottom) ES cells. In green are the novel TSSs that overlap with RefSeq-annotated TSSs; in yellow, all the common TSSs distributed on gene bodies; and in pink, the sample-specific TSSs on gene bodies.
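
The Venn diagrams in panels l to n rest on three steps: normalizing DECAP-seq counts to RPM, keeping positions with RPM > 6, and testing the single-base overlap with a hypergeometric test. A minimal sketch of these steps follows. The function names, the toy counts and, in particular, the choice of the background universe of candidate positions are assumptions for illustration; the legend does not specify how the background was defined.

```python
from math import comb

def rpm(counts, total_mapped_reads):
    """Convert raw 5'-end read counts per position to reads per million."""
    scale = 1_000_000 / total_mapped_reads
    return {pos: n * scale for pos, n in counts.items()}

def call_tss(counts, total_mapped_reads, min_rpm=6.0):
    """Return positions whose signal exceeds the RPM cutoff (RPM > 6)."""
    values = rpm(counts, total_mapped_reads)
    return {pos for pos, v in values.items() if v > min_rpm}

def hypergeom_overlap_p(universe, n_a, n_b, n_shared):
    """P(overlap >= n_shared) when sets of size n_a and n_b are drawn
    at random from `universe` candidate positions (assumed background)."""
    total = comb(universe, n_b)
    tail = sum(
        comb(n_a, k) * comb(universe - n_a, n_b - k)
        for k in range(n_shared, min(n_a, n_b) + 1)
    )
    return tail / total

# Hypothetical counts keyed by (chromosome, strand, coordinate); not real data.
control = {("chr5", "-", 100): 40, ("chr5", "-", 250): 12, ("chr5", "+", 900): 14}
knockdown = {("chr5", "-", 100): 38, ("chr5", "-", 250): 30, ("chr5", "+", 900): 3}

tss_control = call_tss(control, total_mapped_reads=2_000_000)
tss_knockdown = call_tss(knockdown, total_mapped_reads=2_000_000)
shared = tss_control & tss_knockdown  # single-base resolution overlap

p = hypergeom_overlap_p(
    universe=1000,  # assumed number of candidate intragenic positions
    n_a=len(tss_control),
    n_b=len(tss_knockdown),
    n_shared=len(shared),
)
print(sorted(shared), p)
```

Keying positions by chromosome, strand and coordinate keeps the overlap strand-specific, in line with the strand specificity of the method.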

Extended Data Figure 9 Internal transcription activation in Dnmt3b−/− ES cells shows the same intragenic TSSs as in SetD2 knockdown cells.

a, Genomic view of the indicated genes showing intragenic transcription initiation increase in Dnmt3b−/− and in shSetD2 wild-type cells. Below, Sanger bisulphite sequencing of shCTR (control) and shSetD2 wild-type ES cells on previously described intragenic TSSs. b, Genomic views of the mapped reads from H3K36me3 (in wild-type ES cells) and Dnmt3b ChIP-seq datasets (in mock or Dnmt3b-transfected Dnmt3b−/− ES cells). Both the wild-type and the catalytically inactive Dnmt3b(V725G) mutant showed intragenic binding enrichment. c, qPCR analysis of IgG and Dnmt3b ChIP experiments in mock or Dnmt3b (wild-type and V725G) transfected Dnmt3b−/− ES cells for the indicated intragenic regions. Error bars represent the standard deviation of at least three independent experiments. P value calculated against the mock condition using a t-test; **P < 0.001. Primers used are reported in Supplementary Table 1. d, Dot-blot analysis of genomic DNA isolated from mock or Dnmt3b (wild-type and V725G) transfected Dnmt3b−/− ES cells. Dot intensity quantification from three biological replicates revealed that wild-type Dnmt3b (but not the V725G mutant) significantly (P = 0.003) increased global DNA 5mC. P value calculated against the mock condition using a t-test. e, qPCR analysis of MeDIP experiments in mock or Dnmt3b (wild-type and V725G) transfected Dnmt3b−/− ES cells for the indicated intragenic regions. A significant intragenic increase of DNA methylation is evident in wild-type Dnmt3b (but not mutant) transfected Dnmt3b−/− ES cells. Error bars represent the standard deviation of at least three independent experiments. P value calculated against the mock condition using a t-test; **P < 0.001. Primers used are reported in Supplementary Table 1. f, Genomic views of the RNA-seq-mapped reads from the indicated samples. g, Scatter plots of the log2 RPKM gene values in the indicated samples. 
Of note, mock-treated ES cells showed higher correlation with Dnmt3b-mutant-transfected ES cells (r = 0.99) than with wild-type Dnmt3b-transfected ES cells (r = 0.95), suggesting that DNA methylation enzymatic activity is the major driver of the Dnmt3b-dependent transcriptome alterations. h, Western blot of Dnmt3b−/− ES cells transfected with mock, wild-type Dnmt3b, Dnmt3b(S277P) or Dnmt3b(VW-RR). β-Actin was used as protein loading control. i, j, qPCR analysis of ChIP and MeDIP experiments of the indicated regions in Dnmt3b mutant conditions. Specific impairment of Dnmt3b binding and DNA methylation is observed in both mutants compared with rescue using the wild-type Dnmt3b enzyme. Error bars represent the standard deviation of at least three independent experiments. Primers used are reported in Supplementary Table 1.
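
The correlation coefficients quoted for panel g compare log2 RPKM values between samples. A minimal sketch of such a comparison is shown below, assuming a Pearson correlation and a pseudocount of 1 before the log2 transform; neither choice is stated in the legend, and the RPKM values are invented for illustration.

```python
import math

def log2_rpkm(values, pseudocount=1.0):
    """log2-transform RPKM values; the pseudocount (an assumption) avoids log2(0)."""
    return [math.log2(v + pseudocount) for v in values]

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical per-gene RPKM values (not data from the paper).
mock = [10, 50, 3, 120, 0.5, 30]
mutant_rescue = [11, 48, 3.2, 115, 0.6, 31]
wild_type_rescue = [10, 50, 9, 60, 4, 30]

r_mutant = pearson_r(log2_rpkm(mock), log2_rpkm(mutant_rescue))
r_wild_type = pearson_r(log2_rpkm(mock), log2_rpkm(wild_type_rescue))
print(f"mock vs mutant: r = {r_mutant:.3f}; mock vs wild type: r = {r_wild_type:.3f}")
```

In this toy example the mutant-transfected profile stays closer to mock than the wild-type-transfected profile does, which is the pattern the legend describes.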

Extended Data Figure 10 Cryptic RNA transcripts are degraded in part by the RNA exosome complex.

a, RNA-seq profile of Dnmt3b−/− cells transfected with mock or Dnmt3b mutants. b, Scatter plots of the log2 RPKM gene values in the indicated samples. c, Box plot of the ratio between normalized RNA-seq read counts (RPKM) for the second, third, intermediate (average) and last exons to the first exon in Dnmt3b−/− ES cells transfected with mock, wild-type Dnmt3b or mutant Dnmt3b (S277P and VW-RR). P values calculated with Wilcoxon rank-sum test. d, Pie chart showing the percentage of transcripts with log2 fold change (FC) >1, < −1 or between −1 and 1. e, f, Histogram and western blot showing mRNA and protein levels of Dis3 and Rrp6 genes in control or Dis3/Rrp6 double knockdown (dKD) in Dnmt3b−/− ES cells. β-Actin was used as protein loading control. g, Box plots showing the number of total TSSs per gene on RefSeq-annotated TSSs and gene bodies in control or Dis3/Rrp6 dKD Dnmt3b−/− ES cells. P values calculated with Wilcoxon rank-sum test. h, Box plot of the normalized DECAP-seq read counts (RPM) on the intragenic TSSs in the indicated samples. P values calculated with Wilcoxon rank-sum test. i, Scatter plots of the log2 RPM values on canonical annotated TSSs (±100 bp) of the indicated samples. j, Venn diagrams of intragenic TSSs with a DECAP-seq signal RPM > 6 showing the single-base resolution overlap between the DECAP-seq experiment replicates performed in Dnmt3b−/− ES cells. P values calculated with Hypergeometric Distribution test. k, Pie charts of the DECAP-seq read distribution on TSSs RPM > 6 in control (left) and Dis3/Rrp6 dKD (right) Dnmt3b−/− ES cells. In green are shown the novel TSSs that overlap with RefSeq-annotated TSSs; in yellow, all the common TSSs distributed on gene bodies; and in pink, the sample-specific TSSs on gene bodies. l, Scatter plots of the log2 RPKM gene values in the indicated samples. m, Genomic views of the RNA-seq mapped reads from the indicated samples.
n, Box plots of the ratio between normalized poly(A)+ RNA-seq read counts (RPKM) for the second and the first exon, the third and the first exon, the average of the intermediates (from the fourth to the penultimate exons) and the first exon, and the last and the first exon in wild-type (rep #2) and Dnmt3b−/− (rep #2 and clone B77) ES cells. P values calculated with Wilcoxon rank-sum test. o, Pie-chart showing the percentage of transcripts with an intermediate to first exon ratio (in Dnmt3b−/− rep #2 and clone B77 poly(A)+ RNA-seq) versus an intermediate to first exon ratio (in wild-type rep #2 poly(A)+ RNA-seq) log2 fold change >1, < −1 or between −1 and 1. p, Box plots showing the number of total TSSs per gene on RefSeq-annotated TSSs and gene bodies identified by DECAP-seq in the indicated RNA compartments. P values calculated with Wilcoxon rank-sum test. q, Scatter plots of the log2 RPM values on canonical annotated TSSs (±100 bp) in the indicated RNA compartments. r, Venn diagram of the common intragenic TSSs (defined as having RPM > 6 in both Dnmt3b−/− and wild-type ES cells) in the indicated RNA compartments. s, Box plot of the normalized DECAP-seq read counts (RPM) on the common intragenic TSSs (RPM > 6) in the indicated RNA compartments.
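
The exon ratios in panels c and n and the fold-change classes in panels d and o can be sketched as follows. The function names, the requirement of at least five exons, the handling of zero values and the toy RPKM values are assumptions for illustration only; the definition of the intermediate exons (fourth to penultimate) and the log2 fold-change cutoffs of 1 and −1 come from the legend.

```python
import math

def exon_ratios(exon_rpkm):
    """Ratios of the second, third, intermediate and last exon to the first exon.

    `exon_rpkm` lists per-exon RPKM values in transcript order. The intermediate
    value is the mean of the fourth to the penultimate exons. Returns None when
    the transcript has fewer than five exons or the first exon has no signal.
    """
    if len(exon_rpkm) < 5 or exon_rpkm[0] <= 0:
        return None
    first = exon_rpkm[0]
    middle = exon_rpkm[3:-1]
    return {
        "second": exon_rpkm[1] / first,
        "third": exon_rpkm[2] / first,
        "intermediate": (sum(middle) / len(middle)) / first,
        "last": exon_rpkm[-1] / first,
    }

def classify_log2fc(ratio_knockout, ratio_wild_type, threshold=1.0):
    """Classify a transcript by the log2 fold change of its exon ratio."""
    if ratio_knockout <= 0 or ratio_wild_type <= 0:
        return None
    log2fc = math.log2(ratio_knockout / ratio_wild_type)
    if log2fc > threshold:
        return "up"
    if log2fc < -threshold:
        return "down"
    return "unchanged"

# Hypothetical per-exon RPKM values for one six-exon transcript (not real data).
wild_type = exon_ratios([10, 10, 10, 10, 10, 10])
knockout = exon_ratios([10, 10, 10, 20, 30, 10])

print(classify_log2fc(knockout["intermediate"], wild_type["intermediate"]))
```

A ratio above 1 for downstream exons relative to the first exon is the signature expected when transcription also initiates inside the gene body.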

Extended Data Figure 11 Loss of Dnmt3b generates partial intragenic starting RNAs that are as stable as canonical mRNAs.

a, Genomic views of the RNA-seq mapped reads from the indicated samples. Genes with slow, medium and fast decay are shown. b, Gene Ontology (GO) analysis of the subsets of the mRNAs with fast decay (half-life shorter than four hours) or slow decay (half-life longer than nine hours) in wild-type ES cells. The analysis revealed that fast-decay mRNAs are mainly involved in cell cycle and transcription biological processes, while slow-decay mRNAs are related to metabolism and translation. This result is in agreement with that previously observed in mouse ES cells54,55, supporting the bona fide nature of the experiment. c, Scatter plot of mRNA half-life (in hours) in wild-type and Dnmt3b−/− ES cells. d, e, Box plots of intron half-life (in hours) in wild-type and Dnmt3b−/− ES cells. Intron half-life is estimated by considering only the reads mapped to intronic regions. Intron half-life is generally shorter than mRNA half-life, suggesting lower stability of the RNAs containing intronic parts. Intron half-life calculated in Dnmt3b−/− ES cells is significantly (P = 0.0016) longer than in wild-type ES cells. P values calculated with Wilcoxon rank-sum test. f, g, Frequency distribution of introns and mRNA half-life among all introns in wild-type and Dnmt3b−/− ES cells. h, Genomic views of the RNA-seq mapped reads from the indicated samples. ART-seq reads are derived only from the coding sequences (CDS) of the mRNAs. RNA-seq is shown as a reference example. i, Scatter plots of the log2 RPKM gene values in the indicated samples. j, Box plot of the normalized ART-seq rep #2 read counts (RPKM) on the indicated RNA regions in wild-type and Dnmt3b−/− ES cells. k, Box plot of the normalized ART-seq read counts (RPKM, for both the biological replicates) on the introns in wild-type and Dnmt3b−/− ES cells. P values calculated with Wilcoxon rank-sum test.
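
The legend reports half-lives in hours but does not describe how they were estimated. One common approach, sketched here purely as an assumption, is to fit first-order decay to RNA abundance measured at several time points after transcription is blocked and to convert the decay constant into a half-life. The function names, time points and abundances are hypothetical; only the four-hour and nine-hour class boundaries come from the legend.

```python
import math

def half_life_hours(times_h, abundances):
    """Estimate half-life from a log-linear least-squares fit.

    Assumes first-order decay, A(t) = A0 * exp(-k * t), so that
    ln A(t) is linear in t and half-life = ln(2) / k.
    Returns None when the fit is impossible or shows no decay.
    """
    points = [(t, math.log(a)) for t, a in zip(times_h, abundances) if a > 0]
    if len(points) < 2:
        return None
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_l = sum(l for _, l in points) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    if sxx == 0:
        return None
    slope = sum((t - mean_t) * (l - mean_l) for t, l in points) / sxx
    if slope >= 0:
        return None
    return math.log(2) / -slope

def decay_class(half_life_h):
    """Fast (< 4 h), slow (> 9 h) or medium decay, using the legend's cutoffs."""
    if half_life_h < 4:
        return "fast"
    if half_life_h > 9:
        return "slow"
    return "medium"

# Hypothetical time course (hours) for a transcript that halves every 6 hours.
times = [0, 2, 4, 8]
abundance = [100 * 0.5 ** (t / 6) for t in times]

estimate = half_life_hours(times, abundance)
print(round(estimate, 2), decay_class(estimate))
```

Restricting the input to reads mapped to intronic regions would give the intron half-life described for panels d and e.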

Extended Data Figure 12 Models derived from the obtained results.

a, Scheme of the functional role of the Dnmt3b-dependent intragenic DNA methylation in ES cells. In wild-type cells, Dnmt3b is able to methylate gene bodies to favour a repressive chromatin environment that inhibits spurious entries of Pol II. In the absence of Dnmt3b, gene bodies are hypomethylated, leading to Pol II intragenic entries that generate intragenic transcription initiation. b, Epigenetic crosstalk between Pol II, SetD2 and Dnmt3b and relative H3K36me3 and 5mC chromatin modifications unveils how Pol II, through the transcription elongation process, triggers a safety mechanism to ensure its transcription initiation fidelity.

Supplementary information

Supplementary Figure 1

This file contains the uncropped scans with indication of the protein size. (PDF 1644 kb)

Supplementary Tables

This file contains Supplementary Tables 1 and 2. (PDF 113 kb)

About this article

Cite this article

Neri, F., Rapelli, S., Krepelova, A. et al. Intragenic DNA methylation prevents spurious transcription initiation. Nature 543, 72–77 (2017). https://doi.org/10.1038/nature21373

  • DOI: https://doi.org/10.1038/nature21373
