BANP opens chromatin and activates CpG-island-regulated genes

Abstract

The majority of gene transcripts generated by RNA polymerase II in mammalian genomes initiate at CpG island (CGI) promoters (refs. 1, 2), yet our understanding of their regulation remains limited. This is in part due to the incomplete information that we have on transcription factors, their DNA-binding motifs and which genomic binding sites are functional in any given cell type (refs. 3–5). In addition, there are orphan motifs without known binders, such as the CGCG element, which is associated with highly expressed genes across human tissues and enriched near the transcription start site of a subset of CGI promoters (refs. 6–8). Here we combine single-molecule footprinting with interaction proteomics to identify BTG3-associated nuclear protein (BANP) as the transcription factor that binds this element in the mouse and human genome. We show that BANP is a strong CGI activator that controls essential metabolic genes in pluripotent stem and terminally differentiated neuronal cells. BANP binding is repelled by DNA methylation of its motif in vitro and in vivo, which epigenetically restricts most binding to CGIs and accounts for differential binding at aberrantly methylated CGI promoters in cancer cells. Upon binding to an unmethylated motif, BANP opens chromatin and phases nucleosomes. These findings establish BANP as a critical activator of a set of essential genes and suggest a model in which the activity of CGI promoters relies on methylation-sensitive transcription factors that are capable of chromatin opening.

Fig. 1: BANP binds the CGCG element in vivo.
Fig. 2: BANP is sensitive to DNA methylation.
Fig. 3: BANP drives the expression of essential genes.
Fig. 4: BANP opens chromatin at CGIs.

Data availability

Next-generation sequencing data have been deposited at the Gene Expression Omnibus with accession number GSE155604. Mass spectrometry data have been deposited at the ProteomeXchange Consortium through the PRIDE partner repository with the identifier PXD024794.

References

  1. Mohn, F. & Schübeler, D. Genetics and epigenetics: stability and plasticity during cellular differentiation. Trends Genet. 25, 129–136 (2009).

  2. Deaton, A. M. & Bird, A. CpG islands and the regulation of transcription. Genes Dev. 25, 1010–1022 (2011).

  3. Lambert, S. A. et al. The human transcription factors. Cell 172, 650–665 (2018).

  4. Vaquerizas, J. M., Kummerfeld, S. K., Teichmann, S. A. & Luscombe, N. M. A census of human transcription factors: function, expression and evolution. Nat. Rev. Genet. 10, 252–263 (2009).

  5. Slattery, M. et al. Absence of a simple code: how transcription factors read the genome. Trends Biochem. Sci. 39, 381–399 (2014).

  6. Ernst, J. et al. Genome-scale high-resolution mapping of activating and repressive nucleotides in regulatory regions. Nat. Biotechnol. 34, 1180–1190 (2016).

  7. FitzGerald, P. C., Shlyakhtenko, A., Mir, A. A. & Vinson, C. Clustering of DNA sequences in human promoters. Genome Res. 14, 1562–1574 (2004).

  8. Pique-Regi, R. et al. Accurate inference of transcription factor binding from DNA sequence and chromatin accessibility data. Genome Res. 21, 447–455 (2011).

  9. Yang, J. G., Madrid, T. S., Sevastopoulos, E. & Narlikar, G. J. The chromatin-remodeling enzyme ACF is an ATP-dependent DNA length sensor that regulates nucleosome spacing. Nat. Struct. Mol. Biol. 13, 1078–1083 (2006).

  10. Lienert, F. et al. Identification of genetic elements that autonomously determine DNA methylation states. Nat. Genet. 43, 1091–1097 (2011).

  11. Pardo, C. E., Darst, R. P., Nabilsi, N. H., Delmas, A. L. & Kladde, M. P. Simultaneous single-molecule mapping of protein–DNA interactions and DNA methylation by MAPit. Curr. Protoc. Mol. Biol. 95, 21.22.1–21.22.18 (2011).

  12. Sönmezer, C. et al. Molecular co-occupancy identifies transcription factor binding cooperativity in vivo. Mol. Cell 81, 255–267 (2021).

  13. Makowski, M. M. et al. Global profiling of protein–DNA and protein-nucleosome binding affinities using quantitative mass spectrometry. Nat. Commun. 9, 1653 (2018).

  14. Saksouk, N. et al. Redundant mechanisms to form silent chromatin at pericentromeric regions rely on BEND3 and DNA methylation. Mol. Cell 56, 580–594 (2014).

  15. Dai, Q. et al. The BEN domain is a novel sequence-specific DNA-binding domain conserved in neural transcriptional repressors. Genes Dev. 27, 602–614 (2013).

  16. Dai, Q. et al. Common and distinct DNA-binding and regulatory activities of the BEN-solo transcription factor family. Genes Dev. 29, 48–62 (2015).

  17. Khan, A. & Prasanth, S. G. BEND3 mediates transcriptional repression and heterochromatin organization. Transcription 6, 102–105 (2015).

  18. Sathyan, K. M., Shen, Z., Tripathi, V., Prasanth, K. V. & Prasanth, S. G. A BEN-domain-containing protein associates with heterochromatin and represses transcription. J. Cell Sci. 124, 3149–3163 (2011).

  19. Rampalli, S., Pavithra, L., Bhatt, A., Kundu, T. K. & Chattopadhyay, S. Tumor suppressor SMAR1 mediates cyclin D1 repression by recruitment of the SIN3/histone deacetylase 1 complex. Mol. Cell. Biol. 25, 8415–8429 (2005).

  20. Sreenath, K. et al. Nuclear matrix protein SMAR1 represses HIV-1 LTR mediated transcription through chromatin remodeling. Virology 400, 76–85 (2010).

  21. Domcke, S. et al. Competition between DNA methylation and transcription factors determines binding of NRF1. Nature 528, 575–579 (2015).

  22. Baylin, S. B. & Jones, P. A. Epigenetic determinants of cancer. Cold Spring Harb. Perspect. Biol. 8, a019505 (2016).

  23. Berman, B. P. et al. Regions of focal DNA hypermethylation and long-range hypomethylation in colorectal cancer coincide with nuclear lamina-associated domains. Nat. Genet. 44, 40–46 (2011).

  24. Mahpour, A., Scruggs, B. S., Smiraglia, D., Ouchi, T. & Gelman, I. H. A methyl-sensitive element induces bidirectional transcription in TATA-less CpG island-associated promoters. PLoS ONE 13, e0205608 (2018).

  25. Wang, T. et al. Identification and characterization of essential genes in the human genome. Science 350, 1096–1101 (2015).

  26. McDonald, E. R., III et al. Project DRIVE: a compendium of cancer dependencies and synthetic lethal relationships uncovered by large-scale, deep RNAi screening. Cell 170, 577–592 (2017).

  27. Nabet, B. et al. The dTAG system for immediate and target-specific protein degradation. Nat. Chem. Biol. 14, 431–441 (2018).

  28. Muhar, M. et al. SLAM-seq defines direct gene-regulatory functions of the BRD4–MYC axis. Science 360, 800–805 (2018).

  29. Gaidatzis, D., Burger, L., Florescu, M. & Stadler, M. B. Analysis of intronic and exonic reads in RNA-seq data characterizes transcriptional and post-transcriptional regulation. Nat. Biotechnol. 33, 722–729 (2015).

  30. Dahlet, T. et al. Genome-wide analysis in the mouse embryo reveals the importance of DNA methylation for transcription integrity. Nat. Commun. 11, 3153 (2020).

  31. Thoma, E. C. et al. Ectopic expression of neurogenin 2 alone is sufficient to induce differentiation of embryonic stem cells into mature neurons. PLoS ONE 7, e38651 (2012).

  32. Buenrostro, J. D., Wu, B., Chang, H. Y. & Greenleaf, W. J. ATAC-seq: a method for assaying chromatin accessibility genome-wide. Curr. Protoc. Mol. Biol. 109, 21.29.1–21.29.9 (2015).

  33. Fu, Y., Sinha, M., Peterson, C. L. & Weng, Z. The insulator binding protein CTCF positions 20 nucleosomes around its binding sites across the human genome. PLoS Genet. 4, e1000138 (2008).

  34. Vierstra, J. et al. Global reference mapping of human transcription factor footprints. Nature 583, 729–736 (2020).

  35. Wilson, B. C. et al. Intellectual disability-associated factor Zbtb11 cooperates with NRF-2/GABP to control mitochondrial function. Nat. Commun. 11, 5469 (2020).

  36. Stielow, B. et al. The SAM domain-containing protein 1 (SAMD1) acts as a repressive chromatin regulator at unmethylated CpG islands. Sci. Adv. 7, eabf2229 (2021).

  37. Weber, M. et al. Distribution, silencing potential and evolutionary impact of promoter DNA methylation in the human genome. Nat. Genet. 39, 457–466 (2007).

  38. Iurlaro, M. et al. Mammalian SWI/SNF continuously restores local accessibility to chromatin. Nat. Genet. 53, 279–287 (2021).

  39. Schick, S. et al. Acute BAF perturbation causes immediate changes in chromatin accessibility. Nat. Genet. 53, 269–278 (2021).

  40. Hartl, D. et al. CG dinucleotides enhance promoter activity independent of DNA methylation. Genome Res. 29, 554–563 (2019).

  41. Mohn, F. et al. Lineage-specific polycomb targets and de novo DNA methylation define restriction and potential of neuronal progenitors. Mol. Cell 30, 755–766 (2008).

  42. Zhang, Y. et al. Rapid single-step induction of functional neurons from human pluripotent stem cells. Neuron 78, 785–798 (2013).

  43. Lowary, P. T. & Widom, J. New DNA sequence rules for high affinity binding to histone octamer and sequence-directed nucleosome positioning. J. Mol. Biol. 276, 19–42 (1998).

  44. Khan, A. et al. JASPAR 2018: update of the open-access database of transcription factor binding profiles and its web framework. Nucleic Acids Res. 46, D260–D266 (2018).

  45. Feng, Y. Q. et al. Site-specific chromosomal integration in mammalian cells: highly efficient CRE recombinase-mediated cassette exchange. J. Mol. Biol. 292, 779–785 (1999).

  46. Gaidatzis, D., Lerch, A., Hahne, F. & Stadler, M. B. QuasR: quantification and annotation of short reads in R. Bioinformatics 31, 1130–1132 (2015).

  47. Ostapcuk, V. et al. Activity-dependent neuroprotective protein recruits HP1 and CHD4 to control lineage-specifying genes. Nature 557, 739–743 (2018).

  48. Cox, J. et al. Andromeda: a peptide search engine integrated into the MaxQuant environment. J. Proteome Res. 10, 1794–1805 (2011).

  49. Cox, J. et al. Accurate proteome-wide label-free quantification by delayed normalization and maximal peptide ratio extraction, termed MaxLFQ. Mol. Cell. Proteomics 13, 2513–2526 (2014).

  50. Hubner, N. C. et al. Quantitative proteomics combined with BAC TransgeneOmics reveals in vivo protein interactions. J. Cell Biol. 189, 739–754 (2010).

  51. Tyanova, S. et al. The Perseus computational platform for comprehensive analysis of (prote)omics data. Nat. Methods 13, 731–740 (2016).

  52. Tusher, V. G., Tibshirani, R. & Chu, G. Significance analysis of microarrays applied to the ionizing radiation response. Proc. Natl Acad. Sci. USA 98, 5116–5121 (2001).

  53. Wang, Y. et al. Reversed-phase chromatography with multiple fraction concatenation strategy for proteome profiling of human MCF10A cells. Proteomics 11, 2019–2026 (2011).

  54. Perez-Riverol, Y. et al. The PRIDE database and related tools and resources in 2019: improving support for quantification data. Nucleic Acids Res. 47, D442–D450 (2019).

  55. Gräwe, C., Makowski, M. M. & Vermeulen, M. PAQMAN: protein-nucleic acid affinity quantification by mass spectrometry in nuclear extracts. Methods 184, 70–77 (2020).

  56. Corces, M. R. et al. An improved ATAC-seq protocol reduces background and enables interrogation of frozen tissues. Nat. Methods 14, 959–962 (2017).

  57. Cui, K. & Zhao, K. Genome-wide approaches to determining nucleosome occupancy in metazoans using MNase-seq. Methods Mol. Biol. 833, 413–419 (2012).

  58. Gaidatzis, D. et al. DNA sequence explains seemingly disordered methylation levels in partially methylated domains of mammalian genomes. PLoS Genet. 10, e1004143 (2014).

  59. Barisic, D., Stadler, M. B., Iurlaro, M. & Schübeler, D. Mammalian ISWI and SWI/SNF selectively mediate binding of distinct transcription factors. Nature 569, 136–140 (2019).

  60. Abdulrahman, W. et al. A set of baculovirus transfer vectors for screening of affinity tags and parallel expression strategies. Anal. Biochem. 385, 383–385 (2009).

  61. Huber, W. et al. Orchestrating high-throughput genomic analysis with Bioconductor. Nat. Methods 12, 115–121 (2015).

  62. Lawrence, M., Gentleman, R. & Carey, V. rtracklayer: an R package for interfacing with genome browsers. Bioinformatics 25, 1841–1842 (2009).

  63. Ginno, P. A. et al. A genome-scale map of DNA methylation turnover identifies site-specific dependencies of DNMT and TET activity. Nat. Commun. 11, 2680 (2020).

  64. Langmead, B., Trapnell, C., Pop, M. & Salzberg, S. L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009).

  65. Zhang, Y. et al. Model-based analysis of ChIP–seq (MACS). Genome Biol. 9, R137 (2008).

  66. Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576–589 (2010).

  67. Héberlé, É. & Bardet, A. F. Sensitivity of transcription factors to DNA methylation. Essays Biochem. 63, 727–741 (2019).

  68. Buck-Koehntop, B. A. et al. Molecular basis for recognition of methylated and specific DNA sequences by the zinc finger protein Kaiso. Proc. Natl Acad. Sci. USA 109, 15229–15234 (2012).

  69. Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47 (2015).

  70. Arnold, P. et al. Modeling of epigenome dynamics identifies transcription factors that mediate Polycomb targeting. Genome Res. 23, 60–73 (2013).

  71. Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 37, 907–915 (2019).

  72. Neumann, T. et al. Quantification of experimentally induced nucleotide conversions in high-throughput sequencing datasets. BMC Bioinformatics 20, 258 (2019).

  73. Danecek, P. et al. The variant call format and VCFtools. Bioinformatics 27, 2156–2158 (2011).

  74. Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet. J. 17, 10 (2011).

  75. Stadler, M. B. et al. DNA-binding factors shape the mouse methylome at distal regulatory regions. Nature 480, 490–495 (2011).

  76. Blattler, A. et al. Global loss of DNA methylation uncovers intronic enhancers in genes showing expression changes. Genome Biol. 15, 469 (2014).

  77. Xuan Lin, Q. X. et al. MethMotif: an integrative cell specific database of transcription factor binding motifs coupled with DNA methylation profiles. Nucleic Acids Res. 47, D145–D154 (2019).

  78. Hon, G. C. et al. Global DNA hypomethylation coupled to repressive chromatin domain formation and gene silencing in breast cancer. Genome Res. 22, 246–258 (2012).

  79. The ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).

  80. Venables, W. N. & Ripley, B. D. Modern Applied Statistics with S 4th edn (Springer, 2002).

  81. Hahne, F. & Ivanek, R. Visualizing genomic data using Gviz and Bioconductor. Methods Mol. Biol. 1418, 335–351 (2016).

  82. Alexa, A., Rahnenführer, J. & Lengauer, T. Improved scoring of functional groups from gene expression data by decorrelating GO graph structure. Bioinformatics 22, 1600–1607 (2006).

  83. Lawrence, M. et al. Software for computing and annotating genomic ranges. PLoS Comput. Biol. 9, e1003118 (2013).

  84. Fenouil, R. et al. CpG islands and GC content dictate nucleosome depletion in a transcription-independent manner at mammalian promoters. Genome Res. 22, 2399–2408 (2012).

Acknowledgements

We thank P. Papasaikas for help with the SLAM-seq analysis; M. Frederiksen and N. Leroy from the Novartis Institutes of Biomedical Research for providing the dTAG13 compound; and M. Lorincz and members of the D.S. laboratory for critical feedback on the manuscript. D.S. and N.T. acknowledge support from the Novartis Research Foundation, the Swiss National Science Foundation (310030B_176394 to D.S. and 31003A_179541 to N.T.) and the European Research Council under the European Union’s (EU) Horizon 2020 research and innovation programme grant agreements (ReadMe-667951 and DNAaccess-884664 to D.S. and CsnCRL-666068 and NucEM-884331 to N.T.). M.V. is part of the Oncode Institute, which is partly funded by the Dutch Cancer Society. R.S.G., A.K.M. and S.D. acknowledge EMBO Long-Term Fellowships. R.S.G. and L.I. acknowledge the EU Horizon 2020 Research and Innovation Program under the Marie Sklodowska-Curie grant (705354 to R.S.G. and 748760 to L.I.). A.K.M. acknowledges the Human Frontier Science Program. L.I. acknowledges the National Health and Medical Research Council CJ Martin Fellowship APP1148380. A.R.K. acknowledges support from the European Molecular Biology Laboratory, Deutsche Forschungsgemeinschaft (KR 5247/1-1) and a Swiss National Fund Ambizione grant (PZOOP3_161493).

Author information

Contributions

R.S.G., L.B. and D.S. conceived and planned the experiments. R.S.G. performed all experiments, performed SMF analysis and contributed to initial data analysis. L.B. performed comprehensive computational data analysis. C.G. and M.V. validated the affinity purification and performed PAQMAN analysis. A.K.M. performed and N.T. supervised protein purification and biochemistry assays. L.I. assisted with genomics and biochemistry assays. D.H. and V.I. performed mass spectrometry quantification and initial data processing. L.H. assisted with western blots, MNase-seq and cell line maintenance. S.D. performed and analysed the immunofluorescence experiments. M.P. and A.R.K. assisted in the establishment of the SMF method. S.A.S. advised on and oversaw the generation of next-generation-sequencing data. D.S. supervised the project. R.S.G., L.B. and D.S. interpreted the results and wrote the manuscript.

Corresponding author

Correspondence to Dirk Schübeler.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature thanks Eric Mendenhall and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 BANP binds the orphan CGCG element in the mouse genome.

a, Footprint created by REST bound to its motif (red) and CpG methylation around the bound motif (black). Motif indicated by grey rectangle in the middle. b, No footprint over the scrambled REST motif (red) and corresponding CpG methylation at this inserted construct (black). Individual biological replicates are shown (n = 2). Red line is the mean. c, ChIP–seq read counts for REST at two genomic loci with a REST motif (top and middle) and one without (bottom). SMF amplicon indicated in blue, REST motif in grey (ref. 12). d, Footprinting of the corresponding loci in c. A footprint of around 30 bp was detected over the REST motif (top and middle) compared to a site without a motif. The transcription factor footprint is distinguishable from the neighbouring nucleosome footprint by size: around 30 bp compared to around 150 bp in width. Individual biological replicates are shown (n = 4). Red line is the mean. e, f, Same as Fig. 1b but including CpG methylation. Individual biological replicates are shown (n = 2). Red line is the mean. g, Quantitative mass spectrometry (PAQMAN) determines affinity of BANP for the CGCG element to be around 18.5 nM. Binding curves were generated by fitting the parameters of the Hill equation to determine the relative equilibrium dissociation constant (K_d^app). Data are the mean of three experiments (n = 3); error bars represent standard error of the mean. h, Reproducibility of enrichments at peaks (n = 1,302) for three independent BANP ChIP–seq replicates (R1–R3) from wild-type mouse ES cells. Pearson correlation coefficients are indicated. i, Top motif found by de novo motif search in the top 500 peaks of each ChIP–seq replicate (R1–R3) using HOMER. j, 6mer enrichments as measured by Pearson residuals (Methods) at the top 500 peaks inside CpG islands (CGIs) and outside of CGIs (nonCGI). The five sub-6mers of the highest-scoring motif in i (TCTCGCGAGA), namely TCTCGC, CTCGCG, TCGCGA, CGCGAG and GCGAGA, are marked in red. R1–R3 indicate biological replicates. 
k, BANP motif instances of varying motif scores were predicted genome-wide (Methods) and the fraction of predicted motifs that overlap common peaks (peaks identified in all three replicates, Methods) was determined for equally spaced bins of motif scores (missing bins do not contain any predicted sites). The chosen cut-off of 12.5 is indicated by a dashed line. l, The fraction of common BANP peaks in varying bins of BANP enrichment that contain a predicted BANP motif using the cut-off of 12.5 defined in k. m, Reproducibility of BANP enrichments at predicted BANP motifs (n = 1207) as defined in k. Pearson correlation coefficients are indicated. R1–R3 indicate biological replicates. n, GO enrichment of genes that contain a bound BANP motif. The top 30 most significant GO categories are shown.
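
Panel g reports an apparent dissociation constant obtained by fitting the Hill equation. As a minimal sketch of what that quantity means (this is not the PAQMAN fitting procedure), the Python below evaluates the Hill function and recovers the constant from noise-free synthetic points by a coarse grid search. The function names, the Hill coefficient of 1, the concentration series and the candidate grid are all illustrative assumptions; only the value of 18.5 nM comes from the caption.

```python
def hill_fraction_bound(conc_nm, kd_nm, hill_n=1.0):
    """Fraction bound at concentration conc_nm according to the Hill equation."""
    if conc_nm <= 0:
        return 0.0
    return conc_nm ** hill_n / (kd_nm ** hill_n + conc_nm ** hill_n)


def fit_kd_grid(concs_nm, fractions, kd_candidates, hill_n=1.0):
    """Return the candidate Kd with the smallest sum of squared residuals."""
    def sse(kd):
        return sum((hill_fraction_bound(c, kd, hill_n) - f) ** 2
                   for c, f in zip(concs_nm, fractions))
    return min(kd_candidates, key=sse)


# Synthetic titration (assumed concentrations), generated with Kd = 18.5 nM.
concs = [1, 3, 10, 30, 100, 300]
observed = [hill_fraction_bound(c, 18.5) for c in concs]

# Candidate grid: 1.0 to 99.9 nM in 0.1 nM steps (an arbitrary choice).
candidates = [k / 10 for k in range(10, 1000)]
best_kd = fit_kd_grid(concs, observed, candidates)
```

At a concentration equal to the dissociation constant the function returns 0.5, which is the defining property of the half-saturation point. Real titration data would call for a proper nonlinear least-squares fit with uncertainty estimates rather than a grid search.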

Extended Data Fig. 2 BANP is methylation-sensitive in mouse and human cells.

a, BANP binding versus percentage methylation of the CpGs in the BANP motif. n indicates the number of motifs per bin. R indicates Pearson correlation coefficient. Black lines correspond to median, boxes to first and third quartile and whiskers to the maximum and minimum values of the distribution after removal of outliers, in which outliers are defined as more than 1.5 × (interquartile range) away from the box (Methods). b, Fraction of variance in BANP binding explained by a linear model that incorporates either motif score or methylation of the motif, or both (Methods). c, True versus predicted BANP ChIP–seq enrichments at predicted motifs (as defined in Extended Data Fig. 1k) for a linear model that uses motif score (left), methylation (middle) or both motif score and methylation (right) as predictors. Fraction of variance explained is indicated as R^2. d, PAQMAN using a methylated (Meth.) BANP motif reduces affinity by more than 16-fold compared to an unmethylated (unmeth.) motif. Binding curves were generated by fitting the parameters of the Hill equation to determine the relative equilibrium dissociation constant (K_d^app). Data are the mean of three experiments (n = 3); error bars represent standard error of the mean. e, Top motif found by HOMER in the top 500 peaks of each replicate (R1–R3) in DNMT TKO cells. f, Reproducibility of changes in BANP binding in DNMT TKO versus wild-type ES cells at predicted BANP motifs (n = 1,207). Pearson correlation coefficient is indicated. R1–R3 indicate biological replicates. g, Change in BANP binding in DNMT TKO versus wild-type ES cells compared to the methylation level of the motif (WGBS) in wild-type ES cells at predicted BANP motifs (n = 1,207). h, Distribution of motif scores as a function of change in BANP binding in DNMT TKO versus wild-type ES cells. Box plots as in a. Notches extend to ± 1.58 × (interquartile range/sqrt(n)). 
i, Single-locus examples of BANP binding in wild-type ES cells and DNMT TKO cells at promoters with a methylated BANP motif. Methylation of the CpGs in the motif is indicated by the colour of the circles above the motif. Colour range from white (0% methylation) to black (100% methylation). For Tex13b, the circle represents the average methylation of both CpGs as the coverage was too low to quantify each CpG separately. j, Expression changes versus changes in BANP binding between wild-type ES cells and DNMT TKO cells at genes with a predicted BANP motif in their promoter. For the definition of gene–motif pairings, see Methods. k, Superose 6 increase 10/300 GL size exclusion chromatography profile of full-length (FL) BANP protein. Peak fractions were analysed by SDS–PAGE and stained by Coomassie (inset) showing protein size and high purity. l, Electrophoretic mobility shift assay of full-length BANP binding to the unmethylated (left), methylated (middle) or scrambled (right) BANP motif (n = 2 replicates). m, Reproducibility of changes in BANP binding at peaks between the human cancer cell line HCT116 and HCC1954. R indicates the Pearson correlation coefficient. n, Top motif found by de novo motif search (HOMER) in the top 500 peaks of the first replicate of both cell types. Motifs found for the remaining replicates are very similar (data not shown). o, The fraction of peaks that contain a BANP motif as a function of peak strength in the first replicate of both cell types. Peaks were sorted by read counts and binned into groups of 250 peaks (each bar representing one group). Although we identified between around 14,000 and 24,000 peaks, only the top bins show a high fraction of peaks with a motif (results are very similar for the remaining replicates; data not shown). The additional peaks are likely to be false positives owing to an open-chromatin bias in the ChIP–seq data as shown for HCT116 in p. 
p, BANP binding versus DNaseI in HCT116 cells in 1-kb tiling windows of chromosome 1. There is a global correlation of ChIP–seq and DNaseI signal, which probably explains the large number of peaks without a BANP motif. q, GO enrichment of genes that contain a bound BANP motif. The 30 most significant GO categories are shown and GO categories were grouped as in Extended Data Fig. 1n. Similarities to the mouse GO analysis are indicated by coloured bars and arrows (Extended Data Fig. 1n). r, Differential methylation versus differential binding at BANP motifs bound in at least one of the two cell types. Single locus displayed in Fig. 2d is circled. R indicates Pearson correlation coefficient. s, t, BANP binding and DNA methylation in HCT116 and HCC1954 cells at a BANP motif at a differentially methylated CGI shore (s) or in a CGI that lies in a partially methylated domain (t). Methylation of the CpGs in the motif is indicated by the colour of the circles above the motif. Colour range from white (0% methylation) to black (100% methylation). u, Negative correlation between methylation in the BANP motif and BANP binding at CGI promoter sites in several human cancer cell lines of different origin, using DNaseI hypersensitivity as an indicator of BANP binding. Both methylation and DNaseI are shown relative to the average level across all cell types. Only sites that are bound in at least one cell line are shown (Methods). Pearson correlation coefficients are indicated.
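
The box plot conventions stated in panels a and h (whiskers at the extremes after removing points more than 1.5 × interquartile range from the box, notches at ± 1.58 × interquartile range/sqrt(n)) can be sketched as follows. This is an illustrative Python reimplementation under stated assumptions, not the plotting code used for the figures: the function name, the toy values and the choice of the "inclusive" quartile method are assumptions, and plotting software may compute quartiles slightly differently.

```python
import math
import statistics


def boxplot_stats(values):
    """Median, quartiles, whiskers, outliers and notch limits for one box."""
    data = sorted(values)
    # Quartiles by linear interpolation between order statistics (an assumed method).
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Points beyond the fences are outliers; whiskers span the remaining values.
    inliers = [v for v in data if low_fence <= v <= high_fence]
    half_notch = 1.58 * iqr / math.sqrt(len(data))
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "whiskers": (inliers[0], inliers[-1]),
        "outliers": [v for v in data if v < low_fence or v > high_fence],
        "notch": (median - half_notch, median + half_notch),
    }


# Toy values chosen for illustration only; they are not data from the study.
stats = boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 100])
```

In this toy example the value 100 lies beyond the upper fence, so it is reported as an outlier and the upper whisker stops at 8.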

Extended Data Fig. 3 Inducible BANP depletion by targeted degradation allows the loss of function of this essential gene to be studied.

a, Activation of a luciferase reporter gene by one, two or three copies of the BANP motif after transient transfection into mouse ES cells. b, BANP lethality score from genome-wide CRISPR screens across more than 500 cell lines. A gene with a score below −0.5 is considered a common essential gene (Broad Institute (https://depmap.org/portal/)). Black lines correspond to median, boxes to first and third quartile and whiskers to the maximum and minimum values of the distribution. c, Full-sized western blot of BANP in wild-type and DNMT TKO cell lines before and after addition of the dTAG demonstrates the reduced level of BANP due to tagging, and the absence of protein following induced degradation by the addition of the dTAG13 compound to the medium (n = 3 replicates). Arrowhead on right indicates the target protein. d, The cell-cycle phase distribution determined by BrdU incorporation followed by flow cytometry analysis of wild-type and DNMT TKO cells before and after endogenous tagging of the Banp gene (n = 2 replicates). e, Immunofluorescence visualizes BANP degradation in mouse ES cells (n = 2 replicates). a.u., arbitrary units. f, Quantification of cell death in wild-type and TKO cells after inducing BANP depletion by the addition of the dTAG13 compound. Individual data points are shown and the bars represent the mean of three biological replicates. g, Level of s4U incorporation at different time points of a BANP degradation time course, in which D stands for the time of induced BANP degradation and T for the time of incorporation. Percentage refers to the fraction of Ts converted to Cs. Although cells untreated with s4U show very low percentages (wild-type untreated R1–R3), the percentages increase with increasing incorporation time. Only genes with at least a total count of 50 reads overlapping Ts in all replicates are shown. R1–R3 indicate biological replicates. 
Black lines correspond to median, boxes to first and third quartile and whiskers to the maximum and minimum values of the distribution after removal of outliers, in which outliers are defined as more than 1.5 × (interquartile range) away from the box. n = 13,801 genes. h, Change in gene expression level of BANP-bound and unbound genes between wild-type and BANP degron-tagged cells. There is no consistent global difference between BANP bound and unbound genes. Same y range as Fig. 3c for comparison. Box plots as in g. Notches extend to ± 1.58 × (interquartile range/sqrt(n)). i, Change in RNA across a BANP degradation time course showing the response of unbound genes. Same y range as Fig. 3c for comparison. Box plots as in g. j, Beeswarm plot of expression changes (log2) after 6 h of dTAG treatment (versus untreated) for all genes with a bound BANP motif belonging to one of the groups of GO categories as defined in Extended Data Fig. 1n.
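The box-plot conventions stated in this legend (median line, quartile box, whiskers at the extremes after removing points more than 1.5 × interquartile range from the box, notches at ± 1.58 × interquartile range/sqrt(n)) can be illustrated with a short sketch. This is not the authors' code: the helper name `box_stats`, the quartile interpolation rule and the use of all values (outliers included) for n in the notch are assumptions.

```python
import math
import statistics

def box_stats(values):
    """Return the box-plot summary described in the legend (hypothetical helper)."""
    xs = sorted(values)
    # Quartiles by linear interpolation; the exact quantile rule used in the
    # paper is not stated, so this choice is an assumption.
    q1, median, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    # Points further than 1.5 x IQR from the box are treated as outliers.
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inliers = [x for x in xs if low_fence <= x <= high_fence]
    outliers = [x for x in xs if x < low_fence or x > high_fence]
    # Notch half-width: 1.58 x IQR / sqrt(n), with n taken as all values (assumption).
    half_notch = 1.58 * iqr / math.sqrt(len(xs))
    return {
        "median": median,
        "box": (q1, q3),
        "whiskers": (inliers[0], inliers[-1]),
        "notch": (median - half_notch, median + half_notch),
        "outliers": outliers,
    }

example = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 100])
print(example)
```

In this toy input the value 100 lies beyond the upper fence, so the whiskers stop at the largest remaining value rather than at the raw maximum.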

Extended Data Fig. 4 Correlation heat maps for the wild-type ES cell RNA-seq time course.

a–f, Pearson correlations between samples for all quantifiable genes on exon level (a), intron level (b) and in SLAM-seq (c). d–f, Same as a–c but for all quantifiable genes with a BANP motif in their promoter. To remove correlations due to varying gene lengths, counts in all three measures were converted to log2 RPKM values before determining the correlation coefficients. r1–r3 indicate biological replicates. Samples named as in Extended Data Fig. 3g. a–f illustrate high reproducibility between replicates. d–f indicate that changes on the transcriptional level at BANP target genes occur fast whereas changes on the mRNA level are delayed. Note that the correlation structure in SLAM-seq is also influenced by the varying incorporation times.
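The length normalization mentioned in this legend can be sketched as follows: counts are converted to log2 RPKM (reads per kilobase per million mapped reads) and the Pearson coefficient is then computed between samples. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names, the pseudocount, the use of the summed counts as library size and the toy numbers are all hypothetical.

```python
import math

def log2_rpkm(counts, lengths_bp, pseudocount=0.1):
    """Convert raw counts to log2 RPKM.

    The pseudocount and the use of sum(counts) as the library size are
    assumptions; the legend does not specify either.
    """
    total = sum(counts)
    return [
        math.log2(c * 1e9 / (length * total) + pseudocount)
        for c, length in zip(counts, lengths_bp)
    ]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Toy data: four genes of increasing length in two samples (made-up numbers).
lengths = [1000, 2000, 4000, 8000]
sample_a = [10, 40, 40, 160]
sample_b = [20, 40, 120, 160]

print("raw counts:", pearson(sample_a, sample_b))
print("log2 RPKM: ", pearson(log2_rpkm(sample_a, lengths),
                             log2_rpkm(sample_b, lengths)))
```

Because longer genes collect more reads in every sample, raw counts share a length-driven component; dividing by gene length before correlating removes that component.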

Extended Data Fig. 5 Reproducibility and comparison of exonic, intronic and SLAM-seq signal during a time course of BANP degradation.

Top three rows show reproducibility of log2 changes relative to untreated for exonic, intronic and SLAM-seq, respectively. The first two replicates are shown in each case (R1 and R2). Bottom three rows show the same changes comparing SLAM-seq to intronic, intronic to exonic and SLAM-seq to exonic signal, respectively. Average of all replicates is shown. Comparisons indicate a high degree of similarity between intronic and SLAM-seq signal and a delayed exonic response. In all figures, Pearson correlations were calculated on all genes with a promoter that overlaps a bound BANP motif, highlighted in red.

Extended Data Fig. 6 The downregulation of BANP target genes is also detected at the protein level.

a, Pearson correlations of log2 changes in protein levels at different time points relative to untreated for all quantifiable genes (left) or for all quantifiable genes with a BANP motif in their promoter (right). Reproducible changes can be clearly observed after 10 h. R1–R3 indicate biological replicates. b, Scatter plots showing the correlation between RNA (exonic) and protein levels across a BANP degradation time course (n = 8,128 genes). In all panels, Pearson correlations were calculated on all genes with a promoter that overlaps a bound BANP motif (n = 357 genes), shown in red. As the aim of these comparisons is to see how mRNA changes of BANP targets are reflected at the protein level, but BANP itself, which has a BANP motif in its promoter, has been degraded at the protein level, it was removed from the comparisons. Its protein level changes are shown in c (top). The 10-h time point is missing as it was not measured in RNA. c, Depletion of BANP (top) and an essential BANP target gene, TUBGCP5 (bottom), relative to untreated cells across the BANP degradation time course. d, Total proteome showing the downregulation of BANP target genes at the protein level over a BANP degradation time course. BANP itself was removed as in b. Black lines correspond to median, boxes to first and third quartile and whiskers to the maximum and minimum values of the distribution after removal of outliers, in which outliers are defined as more than 1.5 × (interquartile range) away from the box. Notches extend to ± 1.58 × (interquartile range/sqrt(n)). Bound: n = 357 genes, unbound: n = 7,861 genes. e, Western blot for the essential BANP target gene TUBGCP5 over a degradation time course (n = 2 replicates). Arrowhead on right indicates the target protein. f, Quantification of the TUBGCP5 protein level in e normalized to the loading control. Individual replicates are shown, and the bars represent the mean.

Extended Data Fig. 7 BANP regulates a similar set of genes in DNMT TKO cells and is necessary and sufficient to drive expression of TKO-specific bound genes.

a–d, Same as Extended Data Fig. 4a, b, d, e, but for a DNMT TKO RNA-seq time course. e, f, Same as top two rows in Extended Data Fig. 5, but for a DNMT TKO RNA-seq time course. g, Comparison of the RNA response (exonic level) to BANP removal in wild-type versus TKO cell lines. In all panels, Pearson correlations were calculated on all genes with a promoter that overlaps a bound (bound in either wild-type or DNMT TKO) BANP motif, which are shown in red. All annotated promoters were used in order not to bias the analysis towards promoters with Pol II signal in wild-type (Methods, ‘Annotations’). h, Expression changes (relative to wild-type cells) in the DNMT TKO degron cell line across a BANP degradation time course for the genes that gain binding and increase expression in DNMT TKO cells (Extended Data Fig. 2i, j). The three genes are inactivated in response to BANP removal in DNMT TKO cells, which is a combination of reduced BANP levels in the dTAG line (dTAG untreated) and induced degradation by the addition of the dTAG13 compound (dTAG 1–6 h). Initial expression levels of these genes in wild-type ES cells are below 0.1 RPKM and can thus be considered inactive. Bars show means of n = 6 for TKO, n = 2 for TKO dTAG 1 h and otherwise n = 3 biological replicates. Individual replicates are shown as dots. Error bars denote ±1 standard deviation.

Extended Data Fig. 8 BANP binding in neurons is mostly conserved compared to ES cells but also shows cell-type-specific binding.

a, Immunofluorescence of mouse ES cells and derived neurons stained with Hoechst and calcein-AM (n = 3 replicates). b, Reproducibility of BANP enrichments at predicted BANP motifs (n = 1,207) as defined in Extended Data Fig. 1k. Pearson correlation coefficients are indicated. R1–R3 indicate biological replicates. UI, ES cells with uninduced Ngn2 construct. c, Scatter plot of the change in BANP binding from ES cells to neurons at predicted BANP-binding sites. R1 and R2 indicate biological replicates. d, Change in RNA compared to the change in BANP binding between mouse ES cells and neurons at predicted BANP motifs (n = 1,207). The Pearson correlation coefficient is indicated.

Extended Data Fig. 9 BANP-bound genes are rapidly downregulated in neurons.

a, Western blot of BANP in neurons demonstrates absence of protein following induced degradation by the addition of the dTAG13 compound to the medium (n = 3 replicates, replicate 1 shown). Arrowhead on right indicates the target protein. b, c, Same as Extended Data Fig. 4a, b, d, e, but for a neuron RNA-seq time course. Ey wt, wild-type ES cells. wtBANP, wild-type ES cells with the BANP dTAG. UI, uninduced, I, induced. D, dTAG-treated. d, e, Same as top two rows in Extended Data Fig. 5, but for a neuron RNA-seq time course. f, Scatter plots of the change in gene expression after BANP degradation in ES cells versus neurons. R indicates Pearson correlation coefficient. Bound BANP motifs are shown in red.

Extended Data Fig. 10 Open chromatin and phased nucleosomes around BANP-bound motifs in CGIs is linked to gene activity.

a, Hierarchically clustered correlation heat map of ATAC-seq signal at predicted BANP motifs across a BANP degradation time course. The main change in signal occurs already in the first hour of degradation. b, Average ATAC-seq profiles around bound BANP motifs across the time course, illustrating (as in a) that the main change occurs within the first hour. For each time point, there are two replicates shown in the same colour. Signal smoothed over 51 nt. c, Accessibility change relative to untreated at bound BANP motifs after removal of BANP. Same as inset in Fig. 4b, but for all time points. Dots represent individual replicates, bars the mean of the two replicates. d, Scatter plots of MNase-seq signal at predicted BANP motifs in untreated, 1-h-BANP-degraded (1 h) and 4-h-BANP-degraded (4 h) cells (n = 2 replicates combined per condition). The main change occurs within the first hour (see also e). The Pearson correlation coefficient is indicated. e, Scatter plot comparing changes in MNase-seq signal at predicted BANP motifs after 1 h and 4 h of BANP degradation relative to untreated, indicating little change from 1 h to 4 h. Pearson correlation coefficient indicated. f, Nucleosome phasing around the top 100 BANP or CTCF-bound motifs in CGI promoters. Profiles are oriented in the 5′ to 3′ direction to the corresponding genes. This highly organized chromatin is at odds with previous suggestions of low nucleosomal density at CGIs84, which we speculate reflects inefficient amplification of GC-rich sequences in first-generation sequencing reagents. g, Changes in MNase-seq versus changes in ATAC-seq signal after 1 h of BANP degradation at BANP motifs. Sites that lose accessibility tend to gain nucleosome signal. Pearson correlation coefficient indicated. h, i, Changes in ATAC-seq (h) and MNase-seq (i) after 1 h of BANP degradation versus BANP binding strength. Loss in accessibility and gain in nucleosomal signal occurs mostly at bound sites.
Pearson correlation coefficient indicated. j–l, Change in expression at 6 h versus untreated (exonic) compared to BANP binding (j), change in accessibility (k) or nucleosomal signal (l) after 1 h of BANP degradation at predicted BANP motifs. For the definition of gene–motif pairings, see Methods. Exonic changes at 6 h were used as they are similar to intronic changes at 1 h (Extended Data Fig. 5), but allow for the quantification of a larger number of genes. Accessibility and expression changes are positively correlated, whereas nucleosomal signal and expression changes are negatively correlated. P values were determined via an approximate permutation test (two-sided, n = 458 in all cases, Methods). m, The changes in expression (RNA, P = 2.5 × 10^−16, robust F test, two-sided), accessibility (ATAC-seq, P = 0.015, robust F test, two-sided) and nucleosome positioning (MNase-seq, P = 1.3 × 10^−10, robust F test, two-sided) after removal of BANP increase significantly with increasing binding strength (Methods). Unbound, below twofold enriched (IP/IgG). Weak, log2 enrichment (IP/IgG) between 1 and 4. Strong, log2 enrichment larger than 4. Box plots as in Extended Data Fig. 6d. n, Linear model to predict changes in expression (exonic) after 6 h of BANP degradation versus untreated cells using BANP binding, ATAC-seq changes after 1 h of BANP degradation (ATAC-seq), MNase-seq changes after 1 h of BANP degradation (MNase-seq) and distance of the BANP motif to TSS, a binary variable that indicates whether the motif lies within 100 nt upstream of the TSS (Methods). Only bound motifs were used (log2(IP/IgG) ≥ 1) and, to be able to cleanly assign motifs to genes, only genes with promoters that contained one bound motif and for which the motif did not overlap with any other promoter were used (n = 321). The models were evaluated via fivefold cross-validation (Methods).
Left, fraction of variance explained using only BANP binding (binding), binding and distance to TSS (binding + TSSdist) or BANP binding, distance to TSS, MNase-seq and ATAC-seq signal (binding + TSSdist + chromatin). Coloured dots refer to the performance of each model in each partition of the cross-validation. Chromatin information increases the predictive power of the model, as is evident from the larger average fraction of explained variance of ‘Binding + TSSdist + chromatin’ (averaged over all five partitions) as well as the fact that ‘Binding + TSSdist + chromatin’ outperforms ‘Binding’ in all partitions and ‘Binding + TSSdist’ in 4 out of 5 partitions. Middle, inferred coefficients for the full model. Colours refer to the different partitions. Right, true expression changes versus predicted expression changes when using the average coefficients (averaged over all partitions) for prediction. In all panels, ATAC-seq, MNase-seq and ChIP–seq signal are quantified in a 201-bp window centred around the motif.
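The model comparison in n can be illustrated with a small sketch: fit a linear model on four of five partitions, score the fraction of variance explained on the held-out partition, and compare a binding-only model with one that adds a chromatin predictor. This is not the authors' implementation (see Methods for that). The function names, the ordinary least-squares fit, the held-out definition of explained variance and the synthetic data are all assumptions made for illustration.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_ols(rows, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)

def predict(coef, row):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))

def cross_validated_r2(rows, y, k=5, seed=0):
    """Fraction of variance explained on each held-out partition."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        coef = fit_ols([rows[i] for i in train], [y[i] for i in train])
        mean_y = sum(y[i] for i in test) / len(test)
        ss_res = sum((y[i] - predict(coef, rows[i])) ** 2 for i in test)
        ss_tot = sum((y[i] - mean_y) ** 2 for i in test)
        scores.append(1 - ss_res / ss_tot)
    return scores

# Synthetic stand-ins for binding strength and a chromatin change (not real data).
rng = random.Random(1)
binding = [rng.uniform(1, 6) for _ in range(100)]
chromatin = [rng.gauss(0, 1) for _ in range(100)]
change = [-0.5 * b + 0.8 * c + rng.gauss(0, 0.3) for b, c in zip(binding, chromatin)]

r2_binding = cross_validated_r2([(b,) for b in binding], change)
r2_full = cross_validated_r2(list(zip(binding, chromatin)), change)
print("binding only:       ", [round(s, 2) for s in r2_binding])
print("binding + chromatin:", [round(s, 2) for s in r2_full])
```

Using the same seed for both calls keeps the partitions identical, so the two models are compared partition by partition, mirroring the coloured dots described for the left panel.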

Supplementary information

Supplementary Tables

This file contains Supplementary Tables 1-6. Supplementary Table 1: Antibodies and dilutions used in this study. Supplementary Table 2: Motif sequences used for RMCE insertion and footprinting. A footprintable GpC was added to both ends of each motif to maximize the ability to detect a footprint (red). Supplementary Table 3: Oligonucleotide sequences used for affinity purification. Supplementary Table 4: Oligonucleotides used for PAQMAN assay. Supplementary Table 5: Motif sequences used for RMCE insertion and luciferase assays. Supplementary Table 6: BANP degradation time and s4U incorporation time for the SLAM-seq time course.

Reporting Summary

Peer Review File

About this article

Cite this article

Grand, R.S., Burger, L., Gräwe, C. et al. BANP opens chromatin and activates CpG-island-regulated genes. Nature 596, 133–137 (2021). https://doi.org/10.1038/s41586-021-03689-8


  • DOI: https://doi.org/10.1038/s41586-021-03689-8
