Abstract
The majority of gene transcripts generated by RNA polymerase II in mammalian genomes initiate at CpG island (CGI) promoters (refs. 1,2), yet our understanding of their regulation remains limited. This is in part due to the incomplete information that we have on transcription factors, their DNA-binding motifs and which genomic binding sites are functional in any given cell type (refs. 3–5). In addition, there are orphan motifs without known binders, such as the CGCG element, which is associated with highly expressed genes across human tissues and enriched near the transcription start site of a subset of CGI promoters (refs. 6–8). Here we combine single-molecule footprinting with interaction proteomics to identify BTG3-associated nuclear protein (BANP) as the transcription factor that binds this element in the mouse and human genome. We show that BANP is a strong CGI activator that controls essential metabolic genes in pluripotent stem and terminally differentiated neuronal cells. BANP binding is repelled by DNA methylation of its motif in vitro and in vivo, which epigenetically restricts most binding to CGIs and accounts for differential binding at aberrantly methylated CGI promoters in cancer cells. Upon binding to an unmethylated motif, BANP opens chromatin and phases nucleosomes. These findings establish BANP as a critical activator of a set of essential genes and suggest a model in which the activity of CGI promoters relies on methylation-sensitive transcription factors that are capable of chromatin opening.
References
Mohn, F. & Schübeler, D. Genetics and epigenetics: stability and plasticity during cellular differentiation. Trends Genet. 25, 129–136 (2009).
Deaton, A. M. & Bird, A. CpG islands and the regulation of transcription. Genes Dev. 25, 1010–1022 (2011).
Lambert, S. A. et al. The human transcription factors. Cell 172, 650–665 (2018).
Vaquerizas, J. M., Kummerfeld, S. K., Teichmann, S. A. & Luscombe, N. M. A census of human transcription factors: function, expression and evolution. Nat. Rev. Genet. 10, 252–263 (2009).
Slattery, M. et al. Absence of a simple code: how transcription factors read the genome. Trends Biochem. Sci. 39, 381–399 (2014).
Ernst, J. et al. Genome-scale high-resolution mapping of activating and repressive nucleotides in regulatory regions. Nat. Biotechnol. 34, 1180–1190 (2016).
FitzGerald, P. C., Shlyakhtenko, A., Mir, A. A. & Vinson, C. Clustering of DNA sequences in human promoters. Genome Res. 14, 1562–1574 (2004).
Pique-Regi, R. et al. Accurate inference of transcription factor binding from DNA sequence and chromatin accessibility data. Genome Res. 21, 447–455 (2011).
Yang, J. G., Madrid, T. S., Sevastopoulos, E. & Narlikar, G. J. The chromatin-remodeling enzyme ACF is an ATP-dependent DNA length sensor that regulates nucleosome spacing. Nat. Struct. Mol. Biol. 13, 1078–1083 (2006).
Lienert, F. et al. Identification of genetic elements that autonomously determine DNA methylation states. Nat. Genet. 43, 1091–1097 (2011).
Pardo, C. E., Darst, R. P., Nabilsi, N. H., Delmas, A. L. & Kladde, M. P. Simultaneous single-molecule mapping of protein–DNA interactions and DNA methylation by MAPit. Curr. Protoc. Mol. Biol. 95, 21.22.1–21.22.18 (2011).
Sönmezer, C. et al. Molecular co-occupancy identifies transcription factor binding cooperativity in vivo. Mol. Cell 81, 255–267 (2021).
Makowski, M. M. et al. Global profiling of protein–DNA and protein-nucleosome binding affinities using quantitative mass spectrometry. Nat. Commun. 9, 1653 (2018).
Saksouk, N. et al. Redundant mechanisms to form silent chromatin at pericentromeric regions rely on BEND3 and DNA methylation. Mol. Cell 56, 580–594 (2014).
Dai, Q. et al. The BEN domain is a novel sequence-specific DNA-binding domain conserved in neural transcriptional repressors. Genes Dev. 27, 602–614 (2013).
Dai, Q. et al. Common and distinct DNA-binding and regulatory activities of the BEN-solo transcription factor family. Genes Dev. 29, 48–62 (2015).
Khan, A. & Prasanth, S. G. BEND3 mediates transcriptional repression and heterochromatin organization. Transcription 6, 102–105 (2015).
Sathyan, K. M., Shen, Z., Tripathi, V., Prasanth, K. V. & Prasanth, S. G. A BEN-domain-containing protein associates with heterochromatin and represses transcription. J. Cell Sci. 124, 3149–3163 (2011).
Rampalli, S., Pavithra, L., Bhatt, A., Kundu, T. K. & Chattopadhyay, S. Tumor suppressor SMAR1 mediates cyclin D1 repression by recruitment of the SIN3/histone deacetylase 1 complex. Mol. Cell. Biol. 25, 8415–8429 (2005).
Sreenath, K. et al. Nuclear matrix protein SMAR1 represses HIV-1 LTR mediated transcription through chromatin remodeling. Virology 400, 76–85 (2010).
Domcke, S. et al. Competition between DNA methylation and transcription factors determines binding of NRF1. Nature 528, 575–579 (2015).
Baylin, S. B. & Jones, P. A. Epigenetic determinants of cancer. Cold Spring Harb. Perspect. Biol. 8, a019505 (2016).
Berman, B. P. et al. Regions of focal DNA hypermethylation and long-range hypomethylation in colorectal cancer coincide with nuclear lamina-associated domains. Nat. Genet. 44, 40–46 (2011).
Mahpour, A., Scruggs, B. S., Smiraglia, D., Ouchi, T. & Gelman, I. H. A methyl-sensitive element induces bidirectional transcription in TATA-less CpG island-associated promoters. PLoS ONE 13, e0205608 (2018).
Wang, T. et al. Identification and characterization of essential genes in the human genome. Science 350, 1096–1101 (2015).
McDonald, E. R., III et al. Project DRIVE: a compendium of cancer dependencies and synthetic lethal relationships uncovered by large-scale, deep RNAi screening. Cell 170, 577–592 (2017).
Nabet, B. et al. The dTAG system for immediate and target-specific protein degradation. Nat. Chem. Biol. 14, 431–441 (2018).
Muhar, M. et al. SLAM-seq defines direct gene-regulatory functions of the BRD4–MYC axis. Science 360, 800–805 (2018).
Gaidatzis, D., Burger, L., Florescu, M. & Stadler, M. B. Analysis of intronic and exonic reads in RNA-seq data characterizes transcriptional and post-transcriptional regulation. Nat. Biotechnol. 33, 722–729 (2015).
Dahlet, T. et al. Genome-wide analysis in the mouse embryo reveals the importance of DNA methylation for transcription integrity. Nat. Commun. 11, 3153 (2020).
Thoma, E. C. et al. Ectopic expression of neurogenin 2 alone is sufficient to induce differentiation of embryonic stem cells into mature neurons. PLoS ONE 7, e38651 (2012).
Buenrostro, J. D., Wu, B., Chang, H. Y. & Greenleaf, W. J. ATAC-seq: a method for assaying chromatin accessibility genome-wide. Curr. Protoc. Mol. Biol. 109, 21.29.1–21.29.9 (2015).
Fu, Y., Sinha, M., Peterson, C. L. & Weng, Z. The insulator binding protein CTCF positions 20 nucleosomes around its binding sites across the human genome. PLoS Genet. 4, e1000138 (2008).
Vierstra, J. et al. Global reference mapping of human transcription factor footprints. Nature 583, 729–736 (2020).
Wilson, B. C. et al. Intellectual disability-associated factor Zbtb11 cooperates with NRF-2/GABP to control mitochondrial function. Nat. Commun. 11, 5469 (2020).
Stielow, B. et al. The SAM domain-containing protein 1 (SAMD1) acts as a repressive chromatin regulator at unmethylated CpG islands. Sci. Adv. 7, eabf2229 (2021).
Weber, M. et al. Distribution, silencing potential and evolutionary impact of promoter DNA methylation in the human genome. Nat. Genet. 39, 457–466 (2007).
Iurlaro, M. et al. Mammalian SWI/SNF continuously restores local accessibility to chromatin. Nat. Genet. 53, 279–287 (2021).
Schick, S. et al. Acute BAF perturbation causes immediate changes in chromatin accessibility. Nat. Genet. 53, 269–278 (2021).
Hartl, D. et al. CG dinucleotides enhance promoter activity independent of DNA methylation. Genome Res. 29, 554–563 (2019).
Mohn, F. et al. Lineage-specific polycomb targets and de novo DNA methylation define restriction and potential of neuronal progenitors. Mol. Cell 30, 755–766 (2008).
Zhang, Y. et al. Rapid single-step induction of functional neurons from human pluripotent stem cells. Neuron 78, 785–798 (2013).
Lowary, P. T. & Widom, J. New DNA sequence rules for high affinity binding to histone octamer and sequence-directed nucleosome positioning. J. Mol. Biol. 276, 19–42 (1998).
Khan, A. et al. JASPAR 2018: update of the open-access database of transcription factor binding profiles and its web framework. Nucleic Acids Res. 46, D260–D266 (2018).
Feng, Y. Q. et al. Site-specific chromosomal integration in mammalian cells: highly efficient CRE recombinase-mediated cassette exchange. J. Mol. Biol. 292, 779–785 (1999).
Gaidatzis, D., Lerch, A., Hahne, F. & Stadler, M. B. QuasR: quantification and annotation of short reads in R. Bioinformatics 31, 1130–1132 (2015).
Ostapcuk, V. et al. Activity-dependent neuroprotective protein recruits HP1 and CHD4 to control lineage-specifying genes. Nature 557, 739–743 (2018).
Cox, J. et al. Andromeda: a peptide search engine integrated into the MaxQuant environment. J. Proteome Res. 10, 1794–1805 (2011).
Cox, J. et al. Accurate proteome-wide label-free quantification by delayed normalization and maximal peptide ratio extraction, termed MaxLFQ. Mol. Cell. Proteomics 13, 2513–2526 (2014).
Hubner, N. C. et al. Quantitative proteomics combined with BAC TransgeneOmics reveals in vivo protein interactions. J. Cell Biol. 189, 739–754 (2010).
Tyanova, S. et al. The Perseus computational platform for comprehensive analysis of (prote)omics data. Nat. Methods 13, 731–740 (2016).
Tusher, V. G., Tibshirani, R. & Chu, G. Significance analysis of microarrays applied to the ionizing radiation response. Proc. Natl Acad. Sci. USA 98, 5116–5121 (2001).
Wang, Y. et al. Reversed-phase chromatography with multiple fraction concatenation strategy for proteome profiling of human MCF10A cells. Proteomics 11, 2019–2026 (2011).
Perez-Riverol, Y. et al. The PRIDE database and related tools and resources in 2019: improving support for quantification data. Nucleic Acids Res. 47, D442–D450 (2019).
Gräwe, C., Makowski, M. M. & Vermeulen, M. PAQMAN: protein-nucleic acid affinity quantification by mass spectrometry in nuclear extracts. Methods 184, 70–77 (2020).
Corces, M. R. et al. An improved ATAC-seq protocol reduces background and enables interrogation of frozen tissues. Nat. Methods 14, 959–962 (2017).
Cui, K. & Zhao, K. Genome-wide approaches to determining nucleosome occupancy in metazoans using MNase-seq. Methods Mol. Biol. 833, 413–419 (2012).
Gaidatzis, D. et al. DNA sequence explains seemingly disordered methylation levels in partially methylated domains of mammalian genomes. PLoS Genet. 10, e1004143 (2014).
Barisic, D., Stadler, M. B., Iurlaro, M. & Schübeler, D. Mammalian ISWI and SWI/SNF selectively mediate binding of distinct transcription factors. Nature 569, 136–140 (2019).
Abdulrahman, W. et al. A set of baculovirus transfer vectors for screening of affinity tags and parallel expression strategies. Anal. Biochem. 385, 383–385 (2009).
Huber, W. et al. Orchestrating high-throughput genomic analysis with Bioconductor. Nat. Methods 12, 115–121 (2015).
Lawrence, M., Gentleman, R. & Carey, V. rtracklayer: an R package for interfacing with genome browsers. Bioinformatics 25, 1841–1842 (2009).
Ginno, P. A. et al. A genome-scale map of DNA methylation turnover identifies site-specific dependencies of DNMT and TET activity. Nat. Commun. 11, 2680 (2020).
Langmead, B., Trapnell, C., Pop, M. & Salzberg, S. L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009).
Zhang, Y. et al. Model-based analysis of ChIP–seq (MACS). Genome Biol. 9, R137 (2008).
Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576–589 (2010).
Héberlé, É. & Bardet, A. F. Sensitivity of transcription factors to DNA methylation. Essays Biochem. 63, 727–741 (2019).
Buck-Koehntop, B. A. et al. Molecular basis for recognition of methylated and specific DNA sequences by the zinc finger protein Kaiso. Proc. Natl Acad. Sci. USA 109, 15229–15234 (2012).
Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47 (2015).
Arnold, P. et al. Modeling of epigenome dynamics identifies transcription factors that mediate Polycomb targeting. Genome Res. 23, 60–73 (2013).
Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 37, 907–915 (2019).
Neumann, T. et al. Quantification of experimentally induced nucleotide conversions in high-throughput sequencing datasets. BMC Bioinformatics 20, 258 (2019).
Danecek, P. et al. The variant call format and VCFtools. Bioinformatics 27, 2156–2158 (2011).
Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet. J. 17, 10 (2011).
Stadler, M. B. et al. DNA-binding factors shape the mouse methylome at distal regulatory regions. Nature 480, 490–495 (2011).
Blattler, A. et al. Global loss of DNA methylation uncovers intronic enhancers in genes showing expression changes. Genome Biol. 15, 469 (2014).
Xuan Lin, Q. X. et al. MethMotif: an integrative cell specific database of transcription factor binding motifs coupled with DNA methylation profiles. Nucleic Acids Res. 47, D145–D154 (2019).
Hon, G. C. et al. Global DNA hypomethylation coupled to repressive chromatin domain formation and gene silencing in breast cancer. Genome Res. 22, 246–258 (2012).
The ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
Venables, W. N. & Ripley, B. D. Modern Applied Statistics with S 4th edn (Springer, 2002).
Hahne, F. & Ivanek, R. Visualizing genomic data using Gviz and Bioconductor. Methods Mol. Biol. 1418, 335–351 (2016).
Alexa, A., Rahnenführer, J. & Lengauer, T. Improved scoring of functional groups from gene expression data by decorrelating GO graph structure. Bioinformatics 22, 1600–1607 (2006).
Lawrence, M. et al. Software for computing and annotating genomic ranges. PLoS Comput. Biol. 9, e1003118 (2013).
Fenouil, R. et al. CpG islands and GC content dictate nucleosome depletion in a transcription-independent manner at mammalian promoters. Genome Res. 22, 2399–2408 (2012).
Acknowledgements
We thank P. Papasaikas for help with the SLAM-seq analysis; M. Frederiksen and N. Leroy from the Novartis Institutes of Biomedical Research for providing the dTAG13 compound; and M. Lorincz and members of the D.S. laboratory for critical feedback on the manuscript. D.S. and N.T. acknowledge support from the Novartis Research Foundation, the Swiss National Science Foundation (310030B_176394 to D.S. and 31003A_179541 to N.T.) and the European Research Council under the European Union’s (EU) Horizon 2020 research and innovation programme grant agreements (ReadMe-667951 and DNAaccess-884664 to D.S. and CsnCRL-666068 and NucEM-884331 to N.T.). M.V. is part of the Oncode Institute, which is partly funded by the Dutch Cancer Society. R.S.G., A.K.M. and S.D. acknowledge EMBO Long-Term Fellowships. R.S.G. and L.I. acknowledge the EU Horizon 2020 Research and Innovation Program under the Marie Sklodowska-Curie grant (705354 to R.S.G. and 748760 to L.I.). A.K.M. acknowledges the Human Frontier Science Program. L.I. acknowledges the National Health and Medical Research Council CJ Martin Fellowship APP1148380. A.R.K. acknowledges support from the European Molecular Biology Laboratory, Deutsche Forschungsgemeinschaft (KR 5247/1-1) and a Swiss National Fund Ambizione grant (PZOOP3_161493).
Author information
Contributions
R.S.G., L.B. and D.S. conceived and planned the experiments. R.S.G. performed all experiments, performed SMF analysis and contributed to initial data analysis. L.B. performed comprehensive computational data analysis. C.G. and M.V. validated the affinity purification and performed PAQMAN analysis. A.K.M. performed and N.T. supervised protein purification and biochemistry assays. L.I. assisted with genomics and biochemistry assays. D.H. and V.I. performed mass spectrometry quantification and initial data processing. L.H. assisted with western blots, MNase-seq and cell line maintenance. S.D. performed and analysed the immunofluorescence experiments. M.P. and A.R.K. assisted in the establishment of the SMF method. S.A.S. advised on and oversaw the generation of next-generation-sequencing data. D.S. supervised the project. R.S.G., L.B. and D.S. interpreted the results and wrote the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Peer review information Nature thanks Eric Mendenhall and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Extended Data Fig. 1 BANP binds the orphan CGCG element in the mouse genome.
a, Footprint created by REST bound to its motif (red) and CpG methylation around the bound motif (black). Motif indicated by grey rectangle in the middle. b, No footprint over the scrambled REST motif (red) and corresponding CpG methylation at this inserted construct (black). Individual biological replicates are shown (n = 2). Red line is the mean. c, ChIP–seq read counts for REST at two genomic loci with a REST motif (top and middle) and one without (bottom). SMF amplicon indicated in blue, REST motif in grey (ref. 12). d, Footprinting of the corresponding loci in c. A footprint of around 30 bp was detected over the REST motif (top and middle), in contrast to a site without a motif (bottom). The transcription factor footprint is distinguishable from the neighbouring nucleosome footprint by size: around 30 bp compared to around 150 bp in width. Individual biological replicates are shown (n = 4). Red line is the mean. e, f, Same as Fig. 1b but including CpG methylation. Individual biological replicates are shown (n = 2). Red line is the mean. g, Quantitative mass spectrometry (PAQMAN) determines affinity of BANP for the CGCG element to be around 18.5 nM. Binding curves were generated by fitting the parameters of the Hill equation to determine the relative equilibrium dissociation constant (K_d^app). Data are the mean of three experiments (n = 3), error bars represent standard error of the mean. h, Reproducibility of enrichments at peaks (n = 1,302) for three independent BANP ChIP–seq replicates (R1–R3) from wild-type mouse ES cells. Pearson correlation coefficients are indicated. i, Top motif found by de novo motif search in the top 500 peaks of each ChIP–seq replicate (R1–R3) using HOMER. j, 6mer enrichments as measured by Pearson residuals (Methods) at the top 500 peaks inside CpG islands (CGIs) and outside of CGIs (nonCGI). The five sub-6mers of the highest-scoring motif in i (TCTCGCGAGA), namely TCTCGC, CTCGCG, TCGCGA, CGCGAG and GCGAGA, are marked in red. R1–R3 indicate biological replicates.
k, BANP motif instances of varying motif scores were predicted genome-wide (Methods) and the fraction of predicted motifs that overlap common peaks (peaks identified in all three replicates, Methods) was determined for equally spaced bins of motif scores (missing bins do not contain any predicted sites). The chosen cut-off of 12.5 is indicated by a dashed line. l, The fraction of common BANP peaks in varying bins of BANP enrichment that contain a predicted BANP motif using the cut-off of 12.5 defined in k. m, Reproducibility of BANP enrichments at predicted BANP motifs (n = 1207) as defined in k. Pearson correlation coefficients are indicated. R1–R3 indicate biological replicates. n, GO enrichment of genes that contain a bound BANP motif. The top 30 most significant GO categories are shown.
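As an illustration of the curve fitting mentioned in panel g, the sketch below fits a two-parameter Hill equation to a titration by least squares. It is not the authors' PAQMAN pipeline: the legend states only that the parameters of the Hill equation were fitted, so the normalization to fraction bound, the brute-force grid search, the function names (`hill`, `fit_hill`) and the synthetic, noise-free data points are all assumptions made for this example.

```python
def hill(conc, kd, n):
    """Fraction bound at concentration `conc` (same unit as `kd`) under the Hill equation."""
    return conc ** n / (kd ** n + conc ** n)


def fit_hill(concs, bound, kd_grid, n_grid):
    """Return the (kd, n) pair on the grid that minimizes the sum of squared errors."""
    best = None
    for kd in kd_grid:
        for n in n_grid:
            sse = sum((hill(c, kd, n) - b) ** 2 for c, b in zip(concs, bound))
            if best is None or sse < best[0]:
                best = (sse, kd, n)
    return best[1], best[2]


# Hypothetical titration series in nM; the "measurements" are simulated
# from a curve with Kd = 18.5 nM and Hill coefficient 1 (no noise added).
concs = [1, 3, 10, 30, 100, 300]
bound = [hill(c, 18.5, 1.0) for c in concs]

kd_grid = [x / 10 for x in range(10, 1001)]  # 1.0 to 100.0 nM in 0.1 nM steps
n_grid = [x / 10 for x in range(5, 31)]      # Hill coefficients 0.5 to 3.0

kd_app, n_hill = fit_hill(concs, bound, kd_grid, n_grid)
print(f"apparent Kd = {kd_app} nM, Hill coefficient = {n_hill}")
```

With real, noisy replicate data a gradient-based nonlinear least-squares routine would usually replace the grid search; the grid is used here only to keep the example dependency-free.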
Extended Data Fig. 2 BANP is methylation-sensitive in mouse and human cells.
a, BANP binding versus percentage methylation of the CpGs in the BANP motif. n indicates the number of motifs per bin. R indicates Pearson correlation coefficient. Black lines correspond to median, boxes to first and third quartile and whiskers to the maximum and minimum values of the distribution after removal of outliers, in which outliers are defined as more than 1.5 × (interquartile range) away from the box (Methods). b, Fraction of variance in BANP binding explained by a linear model that incorporates either motif score or methylation of the motif, or both (Methods). c, True versus predicted BANP ChIP–seq enrichments at predicted motifs (as defined in Extended Data Fig. 1k) for a linear model that uses motif score (left), methylation (middle) or both motif score and methylation (right) as predictors. Fraction of variance explained is indicated as R². d, PAQMAN shows that methylation of the BANP motif (Meth.) reduces affinity by more than 16-fold compared to the unmethylated motif (unmeth.). Binding curves were generated by fitting the parameters of the Hill equation to determine the relative equilibrium dissociation constant (K_d^app). Data are the mean of three experiments (n = 3), error bars represent standard error of the mean. e, Top motif found by HOMER in the top 500 peaks of each replicate (R1–R3) in DNMT TKO cells. f, Reproducibility of changes in BANP binding in DNMT TKO versus wild-type ES cells at predicted BANP motifs (n = 1,207). Pearson correlation coefficient is indicated. R1–R3 indicate biological replicates. g, Change in BANP binding in DNMT TKO versus wild-type ES cells compared to the methylation level of the motif (WGBS) in wild-type ES cells at predicted BANP motifs (n = 1,207). h, Distribution of motif scores as a function of change in BANP binding in DNMT TKO versus wild-type ES cells. Box plots as in a. Notches extend to ± 1.58 × (interquartile range/sqrt(n)).
i, Single-locus examples of BANP binding in wild-type ES cells and DNMT TKO cells at promoters with a methylated BANP motif. Methylation of the CpGs in the motif is indicated by the colour of the circles above the motif. Colour range from white (0% methylation) to black (100% methylation). For Tex13b, the circle represents the average methylation of both CpGs as the coverage was too low to quantify each CpG separately. j, Expression changes versus changes in BANP binding between wild-type ES cells and DNMT TKO cells at genes with a predicted BANP motif in their promoter. For the definition of gene–motif pairings, see Methods. k, Superose 6 Increase 10/300 GL size exclusion chromatography profile of full-length (FL) BANP protein. Peak fractions were analysed by SDS–PAGE and stained by Coomassie (inset) showing protein size and high purity. l, Electrophoretic mobility shift assay of full-length BANP binding to the unmethylated (left), methylated (middle) or scrambled (right) BANP motif (n = 2 replicates). m, Reproducibility of changes in BANP binding at peaks between the human cancer cell lines HCT116 and HCC1954. R indicates the Pearson correlation coefficient. n, Top motif found by de novo motif search (HOMER) in the top 500 peaks of the first replicate of both cell types. Motifs found for the remaining replicates are very similar (data not shown). o, The fraction of peaks that contain a BANP motif as a function of peak strength in the first replicate of both cell types. Peaks were sorted by read counts and binned into groups of 250 peaks (each bar representing one group). Although we identified around 14,000–24,000 peaks, only the top bins show a high fraction of peaks with a motif (results are very similar for the remaining replicates, data not shown). The additional peaks are likely to be false positives owing to an open chromatin bias in the ChIP–seq data as shown for HCT116 in p.
p, BANP binding versus DNaseI in HCT116 cells in 1-kb tiling windows of chromosome 1. There is a global correlation of ChIP–seq and DNaseI signal, which probably explains the large number of peaks without a BANP motif. q, GO enrichment of genes that contain a bound BANP motif. The 30 most significant GO categories are shown and GO categories were grouped as in Extended Data Fig. 1n. Similarities to the mouse GO analysis are indicated by coloured bars and arrows (Extended Data Fig. 1n). r, Differential methylation versus differential binding at BANP motifs bound in at least one of the two cell types. Single locus displayed in Fig. 2d is circled. R indicates Pearson correlation coefficient. s, t, BANP binding and DNA methylation in HCT116 and HCC1954 cells at a BANP motif at a differentially methylated CGI shore (s) or in a CGI that lies in a partially methylated domain (t). Methylation of the CpGs in the motif is indicated by the colour of the circles above the motif. Colour range from white (0% methylation) to black (100% methylation). u, Negative correlation between methylation in the BANP motif and BANP binding at CGI promoter sites in several human cancer cell lines of different origin, using DNaseI hypersensitivity as an indicator of BANP binding. Both methylation and DNaseI are shown relative to the average level across all cell types. Only sites that are bound in at least one cell line are shown (Methods). Pearson correlation coefficients are indicated.
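The box plot conventions used in panels a and h (and reused in later Extended Data figures) can be written out as a short calculation. The sketch below is an assumption-laden illustration rather than the plotting code used for the figures: the quartiles are computed with the `inclusive` method of Python's `statistics.quantiles`, which may differ slightly from the hinge definition of the authors' plotting software, and the function name `boxplot_stats` and the input values are hypothetical.

```python
import math
import statistics


def boxplot_stats(values):
    """Summarize values following the legend: box = Q1..Q3, whiskers = extremes
    after removing points more than 1.5 x IQR from the box, and
    notch = median +/- 1.58 x IQR / sqrt(n)."""
    data = sorted(values)
    n = len(data)
    # Assumption: linear-interpolation quartiles ("inclusive" method).
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    kept = [v for v in data if low_fence <= v <= high_fence]
    half_notch = 1.58 * iqr / math.sqrt(n)
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "whisker_low": kept[0],
        "whisker_high": kept[-1],
        "notch": (median - half_notch, median + half_notch),
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Hypothetical enrichment values with one extreme point.
stats = boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 50])
print(stats)
```

In this toy input the value 50 lies beyond the upper fence, so the upper whisker stops at 9 instead of extending to the maximum of the raw data.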
Extended Data Fig. 3 Inducible BANP depletion by targeted degradation allows the loss of function of this essential gene to be studied.
a, Activation of a luciferase reporter gene by one, two or three copies of the BANP motif after transient transfection into mouse ES cells. b, BANP lethality score from genome-wide CRISPR screens across more than 500 cell lines. A gene with a score below −0.5 is considered a common essential gene (Broad Institute (https://depmap.org/portal/)). Black lines correspond to median, boxes to first and third quartile and whiskers to the maximum and minimum values of the distribution. c, Full-sized western blot of BANP in wild-type and DNMT TKO cell lines before and after addition of the dTAG demonstrates the reduced level of BANP due to tagging, and the absence of protein following induced degradation by the addition of the dTAG13 compound to the medium (n = 3 replicates). Arrowhead on right indicates the target protein. d, The cell-cycle phase distribution determined by BrdU incorporation followed by flow cytometry analysis of wild-type and DNMT TKO cells before and after endogenous tagging of the Banp gene (n = 2 replicates). e, Immunofluorescence visualizes BANP degradation in mouse ES cells (n = 2 replicates). a.u., arbitrary units. f, Quantification of cell death in wild-type and TKO cells after inducing BANP depletion by the addition of the dTAG13 compound. Individual data points are shown and the bars represent the mean of three biological replicates. g, Level of s4U incorporation at different time points of a BANP degradation time course, in which D stands for the time of induced BANP degradation and T for the time of incorporation. Percentage refers to the fraction of Ts converted to Cs. Although cells untreated with s4U show very low percentages (wild-type untreated R1–R3), the percentages increase with increasing incorporation time. Only genes with at least a total count of 50 reads overlapping Ts in all replicates are shown. R1–R3 indicate biological replicates.
Black lines correspond to median, boxes to first and third quartile and whiskers to the maximum and minimum values of the distribution after removal of outliers, in which outliers are defined as more than 1.5 × (interquartile range) away from the box. n = 13,801 genes. h, Change in gene expression level of BANP-bound and unbound genes between wild-type and BANP degron-tagged cells. There is no consistent global difference between BANP bound and unbound genes. Same y range as Fig. 3c for comparison. Box plots as in g. Notches extend to ± 1.58 × (interquartile range/sqrt(n)). i, Change in RNA across a BANP degradation time course showing the response of unbound genes. Same y range as Fig. 3c for comparison. Box plots as in g. j, Beeswarm plot of expression changes (log2) after 6 h of dTAG treatment (versus untreated) for all genes with a bound BANP motif belonging to one of the groups of GO categories as defined in Extended Data Fig. 1n.
Extended Data Fig. 4 Correlation heat maps for the wild-type ES cell RNA-seq time course.
a–f, Pearson correlations between samples for all quantifiable genes on exon level (a), intron level (b) and in SLAM-seq (c). d–f, Same as a–c but for all quantifiable genes with a BANP motif in their promoter. To remove correlations due to varying gene lengths, counts in all three measures were converted to log2 RPKM values before determining the correlation coefficients. r1–r3 indicate biological replicates. Samples named as in Extended Data Fig. 3g. a–f illustrate high reproducibility between replicates. d–f indicate that changes on the transcriptional level at BANP target genes occur fast whereas changes on the mRNA level are delayed. Note that the correlation structure in SLAM-seq is also influenced by the varying incorporation times.
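The length normalization described in this legend can be sketched as follows: read counts are converted to log2 RPKM before sample-to-sample Pearson correlations are computed. This is a minimal illustration, not the authors' code: the pseudocount, the use of the summed counts as library size, the helper names (`log2_rpkm`, `pearson`) and the example counts and feature lengths are all assumptions introduced here.

```python
import math


def log2_rpkm(counts, lengths_bp, pseudocount=0.1):
    """Convert per-gene read counts to log2 RPKM.

    RPKM = reads * 1e9 / (feature length in bp * total reads in the sample).
    The pseudocount (an assumption) avoids log2(0) for genes without reads.
    """
    total = sum(counts)  # assumption: library size = sum of the counted reads
    return [
        math.log2(c * 1e9 / (length * total) + pseudocount)
        for c, length in zip(counts, lengths_bp)
    ]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical feature lengths (bp) and read counts for five genes.
lengths = [1000, 2000, 1500, 4000, 800]
sample_a = [10, 200, 30, 400, 0]
sample_b = [2 * c for c in sample_a]  # same profile sequenced twice as deep

r = pearson(log2_rpkm(sample_a, lengths), log2_rpkm(sample_b, lengths))
print(f"Pearson r between samples: {r:.3f}")
```

Because RPKM divides by both feature length and library size, the two toy samples, which differ only in sequencing depth, give matching normalized values.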
Extended Data Fig. 5 Reproducibility and comparison of exonic, intronic and SLAM-seq signal during a time course of BANP degradation.
Top three rows show reproducibility of log2 changes relative to untreated for exonic, intronic and SLAM-seq, respectively. The first two replicates are shown in each case (R1 and R2). Bottom three rows show the same changes comparing SLAM-seq to intronic, intronic to exonic and SLAM-seq to exonic signal, respectively. Average of all replicates is shown. Comparisons indicate a high degree of similarity between intronic and SLAM-seq signal and a delayed exonic response. In all figures, Pearson correlations were calculated on all genes with a promoter that overlaps a bound BANP motif, highlighted in red.
Extended Data Fig. 6 The downregulation of BANP target genes is also detected at the protein level.
a, Pearson correlations of log2 changes in protein levels at different time points relative to untreated for all quantifiable genes (left) or for all quantifiable genes with a BANP motif in their promoter (right). Reproducible changes can be clearly observed after 10 h. R1–R3 indicate biological replicates. b, Scatter plots showing the correlation between RNA (exonic) and protein levels across a BANP degradation time course (n = 8,128 genes). In all panels, Pearson correlations were calculated on all genes with a promoter that overlaps a bound BANP motif (n = 357 genes), shown in red. As the aim of these comparisons is to see how mRNA changes of BANP targets are reflected at the protein level, but BANP itself, which has a BANP motif in its promoter, has been degraded at the protein level, it was removed from the comparisons. Its protein level changes are shown in c (top). The 10-h time point is missing as it was not measured in RNA. c, Depletion of BANP (top) and an essential BANP target gene, TUBGCP5 (bottom), relative to untreated cells across the BANP degradation time course. d, Total proteome showing the downregulation of BANP target genes at the protein level over a BANP degradation time course. BANP itself was removed as in b. Black lines correspond to median, boxes to first and third quartile and whiskers to the maximum and minimum values of the distribution after removal of outliers, in which outliers are defined as more than 1.5 × (interquartile range) away from the box. Notches extend to ± 1.58 × (interquartile range/sqrt(n)). Bound: n = 357 genes, unbound: n = 7,861 genes. e, Western blot for the essential BANP target gene TUBGCP5 over a degradation time course (n = 2 replicates). Arrowhead on right indicates the target protein. f, Quantification of the TUBGCP5 protein level in e normalized to the loading control. Individual replicates are shown, and the bars represent the mean.
Extended Data Fig. 7 BANP regulates a similar set of genes in DNMT TKO cells and is necessary and sufficient to drive expression of TKO-specific bound genes.
a–d, Same as Extended Data Fig. 4a, b, d, e, but for a DNMT TKO RNA-seq time course. e, f, Same as top two rows in Extended Data Fig. 5, but for a DNMT TKO RNA-seq time course. g, Comparison of the RNA response (exonic level) to BANP removal in wild-type versus TKO cell lines. In all panels, Pearson correlations were calculated on all genes with a promoter that overlaps a bound (bound in either wild-type or DNMT TKO) BANP motif, which are shown in red. All annotated promoters were used in order not to bias the analysis towards promoters with Pol II signal in wild-type (Methods, ‘Annotations’). h, Expression changes (relative to wild-type cells) in the DNMT TKO degron cell line across a BANP degradation time course for the genes that gain binding and increase expression in DNMT TKO cells (Extended Data Fig. 2i, j). The three genes are inactivated in response to BANP removal in DNMT TKO cells, which is a combination of reduced BANP levels in the dTAG line (dTAG untreated) and induced degradation by the addition of the dTAG13 compound (dTAG 1–6 h). Initial expression levels of these genes in wild-type ES cells are below 0.1 RPKM, so the genes can be considered inactive. Bars show means of n = 6 for TKO, n = 2 for TKO dTAG 1 h and n = 3 biological replicates otherwise. Individual replicates are shown as dots. Error bars denote ±1 standard deviation.
Extended Data Fig. 8 BANP binding in neurons is mostly conserved compared to ES cells but also shows cell-type-specific binding.
a, Immunofluorescence of mouse ES cells and derived neurons stained with Hoechst and calcein-AM (n = 3 replicates). b, Reproducibility of BANP enrichments at predicted BANP motifs (n = 1,207) as defined in Extended Data Fig. 1k. Pearson correlation coefficients are indicated. R1–R3 indicate biological replicates. UI, ES cells with uninduced Ngn2 construct. c, Scatter plot of the change in BANP binding from ES cells to neurons at predicted BANP-binding sites. R1 and R2 indicate biological replicates. d, Change in RNA compared to the change in BANP binding between mouse ES cells and neurons at predicted BANP motifs (n = 1,207). The Pearson correlation coefficient is indicated.
Extended Data Fig. 9 BANP-bound genes are rapidly downregulated in neurons.
a, Western blot of BANP in neurons demonstrates absence of protein following induced degradation by the addition of the dTAG13 compound to the medium (n = 3 replicates, replicate 1 shown). Arrowhead on right indicates the target protein. b, c, Same as Extended Data Fig. 4a, b, d, e, but for a neuron RNA-seq time course. Ey wt, wild-type ES cells. wtBANP, wild-type ES cells with the BANP dTAG. UI, uninduced, I, induced. D, dTAG-treated. d, e, Same as top two rows in Extended Data Fig. 5, but for a neuron RNA-seq time course. f, Scatter plots of the change in gene expression after BANP degradation in ES cells versus neurons. R indicates Pearson correlation coefficient. Bound BANP motifs are shown in red.
Extended Data Fig. 10 Open chromatin and phased nucleosomes around BANP-bound motifs in CGIs are linked to gene activity.
a, Hierarchically clustered correlation heat map of ATAC-seq signal at predicted BANP motifs across a BANP degradation time course. The main change in signal already occurs within the first hour of degradation. b, Average ATAC-seq profiles around bound BANP motifs across the time course, illustrating (as in a) that the main change occurs within the first hour. For each time point, there are two replicates shown in the same colour. Signal smoothed over 51 nt. c, Accessibility change relative to untreated at bound BANP motifs after removal of BANP. Same as inset in Fig. 4b, but for all time points. Dots represent individual replicates, bars the mean of the two replicates. d, Scatter plots of MNase-seq signal at predicted BANP motifs in untreated, 1-h-BANP-degraded (1 h) and 4-h-BANP-degraded (4 h) cells (n = 2 replicates combined per condition). The main change occurs within the first hour (see also e). The Pearson correlation coefficient is indicated. e, Scatter plot comparing changes in MNase-seq signal at predicted BANP motifs after 1 h and 4 h of BANP degradation relative to untreated, indicating little change from 1 h to 4 h. Pearson correlation coefficient indicated. f, Nucleosome phasing around the top 100 BANP- or CTCF-bound motifs in CGI promoters. Profiles are oriented in the 5′ to 3′ direction of the corresponding genes. This highly organized chromatin is at odds with previous suggestions of low nucleosomal density at CGIs84, which we speculate reflects inefficient amplification of GC-rich sequences in first-generation sequencing reagents. g, Changes in MNase-seq versus changes in ATAC-seq signal after 1 h of BANP degradation at BANP motifs. Sites that lose accessibility tend to gain nucleosome signal. Pearson correlation coefficient indicated. h, i, Changes in ATAC-seq (h) and MNase-seq (i) after 1 h of BANP degradation versus BANP binding strength. Loss in accessibility and gain in nucleosomal signal occur mostly at bound sites.
Pearson correlation coefficient indicated. j–l, Change in expression at 6 h versus untreated (exonic) compared to BANP binding (j), change in accessibility (k) or nucleosomal signal (l) after 1 h of BANP degradation at predicted BANP motifs. For the definition of gene–motif pairings, see Methods. Exonic changes at 6 h were used as they are similar to intronic changes at 1 h (Extended Data Fig. 5), but allow for the quantification of a larger number of genes. Accessibility and expression changes are positively correlated, whereas nucleosomal signal and expression changes are negatively correlated. P values were determined via an approximate permutation test (two-sided, n = 458 in all cases, Methods). m, The changes in expression (RNA, P = 2.5 × 10^−16, robust F test, two-sided), accessibility (ATAC-seq, P = 0.015, robust F test, two-sided) and nucleosome positioning (MNase-seq, P = 1.3 × 10^−10, robust F test, two-sided) after removal of BANP increase significantly with increasing binding strength (Methods). Unbound, below twofold enriched (IP/IgG). Weak, log2 enrichment (IP/IgG) between 1 and 4. Strong, log2 enrichment larger than 4. Box plots as in Extended Data Fig. 6d. n, Linear model to predict changes in expression (exonic) after 6 h of BANP degradation versus untreated cells using BANP binding, ATAC-seq changes after 1 h of BANP degradation (ATAC-seq), MNase-seq changes after 1 h of BANP degradation (MNase-seq) and distance of the BANP motif to TSS, a binary variable that indicates whether the motif lies within 100 nt upstream of the TSS (Methods). Only bound motifs were used (log2(IP/IgG) ≥ 1) and, to be able to cleanly assign motifs to genes, only genes with promoters that contained one bound motif and for which the motif did not overlap with any other promoter were used (n = 321). The models were evaluated via fivefold cross-validation (Methods).
Left, fraction of variance explained using only BANP binding (binding), binding and distance to TSS (binding + TSSdist) or BANP binding, distance to TSS, MNase-seq and ATAC-seq signal (binding + TSSdist + chromatin). Coloured dots refer to the performance of each model in each partition of the cross-validation. Chromatin information increases the predictive power of the model, as is evident from the larger average fraction of explained variance of ‘Binding + TSSdist + chromatin’ (averaged over all five partitions) as well as the fact that ‘Binding + TSSdist + chromatin’ outperforms ‘Binding’ in all partitions and ‘Binding + TSSdist’ in 4 out of 5 partitions. Middle, inferred coefficients for the full model. Colours refer to the different partitions. Right, true expression changes versus predicted expression changes when using the average coefficients (averaged over all partitions) for prediction. In all panels, ATAC-seq, MNase-seq and ChIP–seq signal are quantified in a 201-bp window centred around the motif.
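The evaluation scheme in n (a linear model scored by fivefold cross-validation) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the model is fitted by ordinary least squares, the fraction of variance explained is taken as 1 − SS_res/SS_tot on each held-out partition, the function names are hypothetical, and the data are synthetic.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ols(rows, y):
    """Ordinary least squares with an intercept, via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(coef, rows):
    """Apply fitted coefficients (intercept first) to predictor rows."""
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], r)) for r in rows]


def explained_variance(y_true, y_pred):
    """Assumed score: 1 - SS_res / SS_tot on the held-out partition."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def cross_validate(rows, y, k=5, seed=0):
    """Return one held-out score per partition of a k-fold split."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        coef = fit_ols([rows[i] for i in train], [y[i] for i in train])
        pred = predict(coef, [rows[i] for i in held_out])
        scores.append(explained_variance([y[i] for i in held_out], pred))
    return scores


# Synthetic toy data (not from the paper): three continuous predictors
# standing in for binding, ATAC-seq and MNase-seq changes, plus one binary
# predictor standing in for the distance-to-TSS indicator.
rng = random.Random(1)
rows = [
    [rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1), float(rng.random() < 0.5)]
    for _ in range(200)
]
y = [
    -0.5 * r[0] + 0.3 * r[1] - 0.2 * r[2] - 0.4 * r[3] + rng.gauss(0, 0.1)
    for r in rows
]
scores = cross_validate(rows, y)  # one score per partition
```

Comparing nested models, as in the left panel, would amount to calling cross_validate on different column subsets of rows with the same seed, so that every model is scored on identical partitions.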
Supplementary information
Supplementary Tables
This file contains Supplementary Tables 1-6. Supplementary Table 1: Antibodies and dilutions used in this study. Supplementary Table 2: Motif sequences used for RMCE insertion and footprinting. A footprintable GpC was added to both ends of each motif to maximize the ability to detect a footprint (red). Supplementary Table 3: Oligonucleotide sequences used for affinity purification. Supplementary Table 4: Oligonucleotides used for PAQMAN assay. Supplementary Table 5: Motif sequences used for RMCE insertion and luciferase assays. Supplementary Table 6: BANP degradation time and s4U incorporation time for the SLAM-seq time course.
About this article
Cite this article
Grand, R.S., Burger, L., Gräwe, C. et al. BANP opens chromatin and activates CpG-island-regulated genes. Nature 596, 133–137 (2021). https://doi.org/10.1038/s41586-021-03689-8
DOI: https://doi.org/10.1038/s41586-021-03689-8