DNA-dependent formation of transcription factor pairs alters their binding specificity

Jolma, Arttu; Yin, Yimeng; Nitta, Kazuhiro R.; Dave, Kashyap; Popov, Alexander; Taipale, Minna; Enge, Martin; Kivioja, Teemu; Morgunova, Ekaterina; Taipale, Jussi

doi:10.1038/nature15518

Letter
Published: 09 November 2015


Arttu Jolma¹, Yimeng Yin¹, Kazuhiro R. Nitta¹, Kashyap Dave¹, Alexander Popov², Minna Taipale¹, Martin Enge¹, Teemu Kivioja³, Ekaterina Morgunova¹ & Jussi Taipale¹,³

Nature volume 527, pages 384–388 (2015)

Abstract

Gene expression is regulated by transcription factors (TFs), proteins that recognize short DNA sequence motifs^1,2,3. Such sequences are very common in the human genome, and an important determinant of the specificity of gene expression is the cooperative binding of multiple TFs to closely located motifs^4,5,6. However, interactions between DNA-bound TFs have not been systematically characterized. To identify TF pairs that bind cooperatively to DNA, and to characterize their spacing and orientation preferences, we have performed consecutive affinity-purification systematic evolution of ligands by exponential enrichment (CAP-SELEX) analysis of 9,400 TF–TF–DNA interactions. This analysis revealed 315 TF–TF interactions recognizing 618 heterodimeric motifs, most of which have not been previously described. The observed cooperativity occurred promiscuously between TFs from diverse structural families. Structural analysis of the TF pairs, including a novel crystal structure of MEIS1 and DLX3 bound to their identified recognition site, revealed that the interactions between the TFs were predominantly mediated by DNA. Most TF pair sites identified involved a large overlap between individual TF recognition motifs, and resulted in recognition of composite sites that were markedly different from the motifs of the individual TFs. Together, our results indicate that the DNA molecule commonly plays an active role in cooperative interactions that define the gene regulatory lexicon.

**Figure 1: CAP-SELEX reveals DNA-mediated TF–TF interactions.**

**Figure 2: Overlapping composite TF motifs with novel specificity.**

**Figure 3: All identified TF–TF interactions.**

**Figure 4: Structural validation of TF–TF interactions.**

Accession codes

Primary accessions

European Nucleotide Archive: PRJEB7934

Protein Data Bank: 4XRM, 5BNG, 4XRS

Data deposits

Sequencing reads have been deposited in the European Nucleotide Archive (accession PRJEB7934). The atomic coordinates and diffraction data have been deposited in the Protein Data Bank (accessions 4XRM, 5BNG and 4XRS). All computer programs and scripts used are either published or available upon request.

References

1. Shlyueva, D., Stampfel, G. & Stark, A. Transcriptional enhancers: from properties to genome-wide predictions. Nature Rev. Genet. 15, 272–286 (2014)
2. Levo, M. & Segal, E. In pursuit of design principles of regulatory sequences. Nature Rev. Genet. 15, 453–468 (2014)
3. Slattery, M. et al. Absence of a simple code: how transcription factors read the genome. Trends Biochem. Sci. 39, 381–399 (2014)
4. Rodda, D. J. et al. Transcriptional regulation of Nanog by OCT4 and SOX2. J. Biol. Chem. 280, 24731–24737 (2005)
5. Panne, D., Maniatis, T. & Harrison, S. C. An atomic model of the interferon-beta enhanceosome. Cell 129, 1111–1123 (2007)
6. De Val, S. et al. Combinatorial regulation of endothelial gene expression by Ets and Forkhead transcription factors. Cell 135, 1053–1064 (2008)
7. Badis, G. et al. Diversity and complexity in DNA recognition by transcription factors. Science 324, 1720–1723 (2009)
8. Jolma, A. et al. DNA-binding specificities of human transcription factors. Cell 152, 327–339 (2013)
9. Vaquerizas, J. M., Kummerfeld, S. K., Teichmann, S. A. & Luscombe, N. M. A census of human transcription factors: function, expression and evolution. Nature Rev. Genet. 10, 252–263 (2009)
10. Najafabadi, H. S. et al. C2H2 zinc finger proteins greatly expand the human regulatory lexicon. Nature Biotechnol. 33, 555–562 (2015)
11. Emery, P. et al. A consensus motif in the RFX DNA binding domain and binding domain mutants with altered specificity. Mol. Cell. Biol. 16, 4486–4494 (1996)
12. Kurokawa, R. et al. Differential orientations of the DNA-binding domain and carboxy-terminal dimerization interface regulate binding site selection by nuclear receptor heterodimers. Genes Dev. 7, 1423–1435 (1993)
13. Mohibullah, N., Donner, A., Ippolito, J. A. & Williams, T. SELEX and missing phosphate contact analyses reveal flexibility within the AP-2α protein: DNA binding complex. Nucleic Acids Res. 27, 2760–2769 (1999)
14. Slattery, M. et al. Cofactor binding evokes latent differences in DNA binding specificity between Hox proteins. Cell 147, 1270–1282 (2011)
15. Grove, C. A. et al. A multiparameter network reveals extensive divergence between C. elegans bHLH transcription factors. Cell 138, 314–327 (2009)
16. Nitta, K. R. et al. Conservation of transcription factor binding specificities across 600 million years of bilateria evolution. eLife 4, e04837 (2015)
17. Kim, S. et al. Probing allostery through DNA. Science 339, 816–819 (2013)
18. Rhee, H. S. & Pugh, B. F. Comprehensive genome-wide protein–DNA interactions detected at single-nucleotide resolution. Cell 147, 1408–1419 (2011)
19. Yan, J. et al. Transcription factor binding in human cells occurs in dense clusters formed around cohesin anchor sites. Cell 154, 801–813 (2013)
20. Guturu, H., Doxey, A. C., Wenger, A. M. & Bejerano, G. Structure-aided prediction of mammalian transcription factor complexes in conserved non-coding elements. Phil. Trans. R. Soc. Lond. B 368, 20130029 (2013)
21. Wei, G. H. et al. Genome-wide analysis of ETS-family DNA-binding in vitro and in vivo. EMBO J. 29, 2147–2160 (2010)
22. Mirny, L. A. Nucleosome-mediated cooperativity between transcription factors. Proc. Natl Acad. Sci. USA 107, 22534–22539 (2010)
23. Wasson, T. & Hartemink, A. J. An ensemble model of competitive multi-factor binding of the genome. Genome Res. 19, 2101–2112 (2009)
24. Poss, Z. C., Ebmeier, C. C. & Taatjes, D. J. The Mediator complex and transcription regulation. Crit. Rev. Biochem. Mol. Biol. 48, 575–608 (2013)
25. Kagey, M. H. et al. Mediator and cohesin connect gene expression and chromatin architecture. Nature 467, 430–435 (2010)
26. Nishizawa, M. & Nagata, S. cDNA clones encoding leucine-zipper proteins which interact with G-CSF gene promoter element 1-binding protein. FEBS Lett. 299, 36–38 (1992)
27. Shen, W. F. et al. AbdB-like Hox proteins stabilize DNA binding by the Meis1 homeodomain proteins. Mol. Cell. Biol. 17, 6448–6458 (1997)
28. Williams, D. C., Jr, Cai, M. & Clore, G. M. Molecular basis for synergistic transcriptional activation by Oct1 and Sox2 revealed from the solution structure of the 42-kDa Oct1·Sox2·Hoxb1–DNA ternary transcription factor complex. J. Biol. Chem. 279, 1449–1457 (2004)
29. Cohen, S. X. et al. Structure of the GCM domain–DNA complex: a DNA-binding domain with a novel fold and mode of target site recognition. EMBO J. 22, 1835–1845 (2003)
30. Mo, Y., Vaessen, B., Johnston, K. & Marmorstein, R. Structure of the Elk-1–DNA complex reveals how DNA-distal residues affect ETS domain recognition of DNA. Nature Struct. Biol. 7, 292–297 (2000)
31. Jolma, A. et al. Multiplexed massively parallel SELEX for characterization of human transcription factor binding specificities. Genome Res. 20, 861–873 (2010)
32. Katainen, R. et al. CTCF/cohesin-binding sites are frequently mutated in cancer. Nature Genet. 47, 818–821 (2015)
33. Guo, Y. et al. Discovering homotypic binding events at high spatial resolution. Bioinformatics 26, 3028–3034 (2010)
34. Passner, J. M., Ryoo, H. D., Shen, L., Mann, R. S. & Aggarwal, A. K. Structure of a DNA-bound Ultrabithorax-Extradenticle homeodomain complex. Nature 397, 714–719 (1999)
35. Vincentelli, R. et al. High-throughput protein expression screening and purification in Escherichia coli. Methods 55, 65–72 (2011)
36. Keshava Prasad, T. S. et al. Human Protein Reference Database–2009 update. Nucleic Acids Res. 37, D767–D772 (2009)
37. Ravasi, T. et al. An atlas of combinatorial transcriptional regulation in mouse and man. Cell 140, 744–752 (2010)
38. Newman, J. R. & Keating, A. E. Comprehensive identification of human bZIP interactions with coiled-coil arrays. Science 300, 2097–2101 (2003)
39. Klemm, J. D. & Pabo, C. O. Oct-1 POU domain–DNA interactions: cooperative binding of isolated subdomains and effects of covalent linkage. Genes Dev. 10, 27–36 (1996)
40. Panne, D., Maniatis, T. & Harrison, S. C. Crystal structure of ATF-2/c-Jun and IRF-3 bound to the interferon-β enhancer. EMBO J. 23, 4384–4393 (2004)
41. Rigaut, G. et al. A generic protein purification method for protein complex characterization and proteome exploration. Nature Biotechnol. 17, 1030–1032 (1999)
42. Hallikas, O. et al. Genome-wide prediction of mammalian enhancers based on analysis of transcription-factor binding affinity. Cell 124, 47–59 (2006)
43. Winn, M. D. et al. Overview of the CCP4 suite and current developments. Acta Crystallogr. D 67, 235–242 (2011)
44. Moorman, C. et al. Hotspots of transcription factor colocalization in the genome of Drosophila melanogaster. Proc. Natl Acad. Sci. USA 103, 12027–12032 (2006)
45. Yip, K. Y. et al. Classification of human genomic regions based on experimentally determined binding sites of more than 100 transcription-related factors. Genome Biol. 13, R48 (2012)
46. Meireles-Filho, A. C., Bardet, A. F., Yanez-Cuna, J. O., Stampfel, G. & Stark, A. cis-regulatory requirements for tissue-specific programs of the circadian clock. Curr. Biol. 24, 1–10 (2014)
47. Löytynoja, A. & Goldman, N. An algorithm for progressive multiple alignment of sequences with insertions. Proc. Natl Acad. Sci. USA 102, 10557–10562 (2005)
48. Pape, U. J., Rahmann, S. & Vingron, M. Natural similarity measures between position frequency matrices with an application to clustering. Bioinformatics 24, 350–357 (2008)
49. Odrowaz, Z. & Sharrocks, A. D. The ETS transcription factors ELK1 and GABPA regulate different gene networks to control MCF10A breast epithelial cell migration. PLoS ONE 7, e49892 (2012)
50. Huang, Q. et al. A prostate cancer susceptibility allele at 6q22 increases RFX6 expression by modulating HOXB13 chromatin binding. Nature Genet. 46, 126–135 (2014)
51. Huang, Y. et al. Identification and characterization of Hoxa9 binding sites in hematopoietic cells. Blood 119, 388–398 (2012)
52. Penkov, D. et al. Analysis of the DNA-binding profile and function of TALE homeoproteins reveals their specialization and specific interactions with Hox genes/proteins. Cell Rep. 3, 1321–1333 (2013)
53. Lindblad-Toh, K. et al. A high-resolution map of human evolutionary constraint using 29 mammals. Nature 478, 476–482 (2011)
54. Korhonen, J., Martinmaki, P., Pizzi, C., Rastas, P. & Ukkonen, E. MOODS: fast search for position weight matrix matches in DNA sequences. Bioinformatics 25, 3181–3182 (2009)
55. Garber, M. et al. Identifying novel constrained elements by exploiting biased substitution patterns. Bioinformatics 25, i54–i62 (2009)
56. Blanchette, M. et al. Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res. 14, 708–715 (2004)
57. Savitsky, P. et al. High-throughput production of human proteins for crystallization: the SGC experience. J. Struct. Biol. 172, 3–13 (2010)
58. Bourenkov, G. P. & Popov, A. N. A quantitative approach to data-collection strategies. Acta Crystallogr. D 62, 58–64 (2006)
59. Kabsch, W. XDS. Acta Crystallogr. D 66, 125–132 (2010)
60. Collaborative Computational Project Number 4. The CCP4 suite: programs for protein crystallography. Acta Crystallogr. D 50, 760–763 (1994)
61. McCoy, A. J. et al. Phaser crystallographic software. J. Appl. Crystallogr. 40, 658–674 (2007)
62. Adams, P. D. et al. PHENIX: a comprehensive Python-based system for macromolecular structure solution. Acta Crystallogr. D 66, 213–221 (2010)
63. Emsley, P., Lohkamp, B., Scott, W. G. & Cowtan, K. Features and development of Coot. Acta Crystallogr. D 66, 486–501 (2010)
64. Fitzsimmons, D. et al. Pax-5 (BSAP) recruits Ets proto-oncogene family proteins to form functional ternary complexes on a B-cell-specific promoter. Genes Dev. 10, 2198–2211 (1996)
65. Kim, J. J. et al. Regulation of insulin-like growth factor binding protein-1 promoter activity by FKHR and HOXA10 in primate endometrial cells. Biol. Reprod. 68, 24–30 (2003)
66. Vinson, C. R., Hai, T. & Boyd, S. M. Dimerization specificity of the leucine zipper-containing bZIP motif on DNA binding: prediction and rational design. Genes Dev. 7, 1047–1058 (1993)
67. Williams, T. M., Williams, M. E. & Innis, J. W. Range of HOX/TALE superclass associations and protein domain requirements for HOXA13:MEIS interaction. Dev. Biol. 277, 457–471 (2005)
68. Melnikov, A. et al. Systematic dissection and optimization of inducible enhancers in human cells using a massively parallel reporter assay. Nature Biotechnol. 30, 271–277 (2012)
69. Raveh-Sadka, T. et al. Manipulating nucleosome disfavoring sequences allows fine-tune regulation of gene expression in yeast. Nature Genet. 44, 743–750 (2012)
70. Hochschild, A. & Ptashne, M. Cooperative binding of λ repressors to sites separated by integral turns of the DNA helix. Cell 44, 681–687 (1986)
71. Moretti, R. et al. Targeted chemical wedges reveal the role of allosteric DNA modulation in protein–DNA assembly. ACS Chem. Biol. 3, 220–229 (2008)
72. Aggarwal, A. K., Rodgers, D. W., Drottar, M., Ptashne, M. & Harrison, S. C. Recognition of a DNA operator by the repressor of phage 434: a view at high resolution. Science 242, 899–907 (1988)
73. Jordan, S. R. & Pabo, C. O. Structure of the lambda complex at 2.5 Å resolution: details of the repressor–operator interactions. Science 242, 893–899 (1988)
74. Rohs, R. et al. Origins of specificity in protein–DNA recognition. Annu. Rev. Biochem. 79, 233–269 (2010)

Acknowledgements

We thank J. Yan, B. Schmierer, E. Kaasinen, C. Daub, E. Haapaniemi and Å. Kolterud for their review of the manuscript, the Karolinska Institutet protein science facility for protein purification, and S. Augsten, L. Hu and A. Zetterlund for technical assistance. This work was supported by the Finnish Academy CoE in Cancer Genetics, the Center for Innovative Medicine, the Knut and Alice Wallenberg and Göran Gustafsson Foundations, and Vetenskapsrådet.

Author information

Authors and Affiliations

Department of Biosciences and Nutrition, Karolinska Institutet, SE 141 83, Sweden
Arttu Jolma, Yimeng Yin, Kazuhiro R. Nitta, Kashyap Dave, Minna Taipale, Martin Enge, Ekaterina Morgunova & Jussi Taipale
European Synchrotron Radiation Facility, Grenoble, 38043, France
Alexander Popov
Genome-Scale Biology Program, University of Helsinki, P.O. Box 63, FI-00014, Finland
Teemu Kivioja & Jussi Taipale

Contributions

A.J. and J.T. designed the experiments. A.J. and Y.Y. performed CAP-SELEX, K.D. performed ChIP-exo and M.T. performed the sequencing analyses. A.J., K.R.N., T.K., M.E. and J.T. wrote computer programs for the analyses. E.M. and A.P. performed X-ray crystallography, and E.M. solved the structures. A.J., K.R.N. and E.M. prepared illustrations, and A.J., Y.Y. and J.T. wrote the article. All authors contributed to data analysis and reviewed the manuscript.

Corresponding author

Correspondence to Jussi Taipale.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Extended data figures and tables

Extended Data Figure 1 CAP-SELEX data analysis and comparison to previous data.

a, Flowchart of CAP-SELEX data analysis. Left, a library of selection ligands with random sequences (yellow) is incubated with TFs. After CAP-SELEX, enriched individual TF motifs (1°; arrows) and composite motifs that are not simply combinations of the individual motifs (2°; green dots) are detected from the reads. To detect preferential spacings and orientations of the TF pair (3°), co-occurrences of the indicative 6-mer subsequences (arrowheads) are counted from the reads. The subsequences are then used to generate seeds for the PWM models (right). The heatmap (bottom right; scale divided by the highest observed count) shows the frequency of occurrence of the two 6-mers (CCGGAA, red arrowhead; CATTCC, black arrowhead) in all possible spacings (columns) and orientations (rows). Note that the 6-mer-based approach cannot model the composite site, but it identifies a strong case of cooperativity in which the ERG 6-mer CCGGAA is followed by the TEAD4 6-mer CATTCC with an 8 bp gap. The logo of the PWM for this site is also shown. b, Comparison between CAP-SELEX PWMs and previously characterized specificities for the indicated TF pairs. The method used in each previous study and its reference are also indicated. CAP-SELEX models also shown in Fig. 1 are indicated by asterisks. Note that four out of five CAP-SELEX models are similar to the previously identified consensus sequences. The exception is the ELK1–PAX5 consensus, which matches both the CAP-SELEX motif and the individual motifs for ELK1 and PAX5 poorly (not shown). c, CAP-SELEX PWMs for TF pairs known to interact at the protein level. The method used to identify each protein–protein interaction and its reference are also shown^{6,26,27,28,37,38,64,65,66,67}.
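The 6-mer co-occurrence counting in panel a can be illustrated with a short Python sketch. It is not the authors' pipeline: the function names (`find_hits`, `spacing_orientation_counts`, `normalise`) are hypothetical, reads are assumed to be plain A/C/G/T strings, the gap is taken to be the number of bases between the two 6-mers, and overlapping placements are simply skipped, whereas the real analysis may treat these cases differently.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]

def find_hits(read: str, kmer: str) -> List[Tuple[int, str]]:
    """(start, strand) of every exact kmer match on either strand of the read."""
    k, rc = len(kmer), revcomp(kmer)
    hits = []
    for i in range(len(read) - k + 1):
        window = read[i:i + k]
        if window == kmer:
            hits.append((i, "+"))
        if window == rc:  # a palindromic kmer is recorded on both strands
            hits.append((i, "-"))
    return hits

def spacing_orientation_counts(reads: Iterable[str], kmer1: str, kmer2: str) -> Counter:
    """Count kmer1/kmer2 co-occurrences keyed by (order, strands, gap).

    order: "1-2" if kmer1 is upstream, else "2-1"; strands: kmer1 strand + kmer2 strand;
    gap: number of bases between the two k-mers. Overlapping placements are skipped.
    """
    counts: Counter = Counter()
    for read in reads:
        for s1, st1 in find_hits(read, kmer1):
            for s2, st2 in find_hits(read, kmer2):
                if s1 <= s2:
                    order, gap = "1-2", s2 - (s1 + len(kmer1))
                else:
                    order, gap = "2-1", s1 - (s2 + len(kmer2))
                if gap >= 0:
                    counts[(order, st1 + st2, gap)] += 1
    return counts

def normalise(counts: Counter) -> Dict[tuple, float]:
    """Divide every cell by the highest count, mirroring the heatmap scaling."""
    top = max(counts.values(), default=0)
    return {key: n / top for key, n in counts.items()} if top else {}

# Toy read: CCGGAA, eight spacer bases, then CATTCC.
read = "G" + "CCGGAA" + "ACGTACGT" + "CATTCC" + "G"
print(spacing_orientation_counts([read], "CCGGAA", "CATTCC"))  # -> Counter({('1-2', '++', 8): 1})
```

Each (order, strands, gap) key corresponds to one heatmap cell: orientations are rows and gaps are columns. On the toy read, the single recorded configuration mirrors the ERG–TEAD4 arrangement with an 8 bp gap described above.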

Extended Data Figure 2 Pairwise interaction matrix between TFs.

Columns indicate TF1 proteins, and rows TF2 proteins, subjected to the first and second affinity purifications, respectively. Pairs of TFs with a single spacing and orientation preference are indicated in dark green, and pairs with multiple preferred configurations in light green. White boxes indicate pairs that displayed weak or no interaction, and grey boxes cases where robust preference data were not recovered. Previously known interacting TF pairs are indicated by a yellow outline (see Extended Data Fig. 1). Histograms show the number of interactions for each TF. Only TFs for which at least one clear interaction or independent binding was identified are included. The importance of including DNA in the interaction assay is highlighted by the fact that only four and five of the detected interactions are among those observed between 762 human or 877 mouse TF pairs identified using protein–protein interaction assays³⁷, or compiled from literature³⁶, respectively.

Extended Data Figure 3 CAP-SELEX reproducibility.

a, Replicate analysis of more than two hundred of the generated PWMs. The seeds that had been used to generate PWMs for the primary experiments were used to seed new PWMs from the replicates. Left, red bars show the percentage of PWM pairs that are similar at the indicated cut-offs (measured as SSTAT covariance^8,48). The highest threshold is the same as that used for identifying the dominating set of PWMs. Blue bars indicate the fraction of all replicate PWMs that are similar using the same cut-off. Right, dendrogram and barcode logos of all PWM pairs. The plot in the middle shows the fraction of reads included in the same models in replicates 1 and 2. b, Validation of the CAP-SELEX results obtained with shortened TF constructs (DBD+) by HT-SELEX with mixtures of full-length proteins (full length). Note that the same orientation and spacing are preferred in all but one of the cases. In that case (bottom), full-length proteins show the highest preference for a different spacing than that observed in CAP-SELEX; even so, the second-most preferred spacing is the one identified using CAP-SELEX.
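Panel a reports, for a series of cut-offs, the share of primary/replicate PWM pairs that count as similar. The sketch below shows only that bookkeeping. It deliberately swaps in a simple mean Euclidean column distance for the SSTAT covariance named in the caption (with a distance, smaller means more similar, the opposite direction to a covariance), and it assumes each replicate PWM is already aligned to its primary PWM and has the same length, which is plausible because both were seeded from the same subsequences but is an assumption here. All names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

PWM = List[Dict[str, float]]  # one {base: probability} dict per motif position

def column_distance(c1: Dict[str, float], c2: Dict[str, float]) -> float:
    """Euclidean distance between two PWM columns."""
    return math.sqrt(sum((c1[b] - c2[b]) ** 2 for b in "ACGT"))

def pwm_distance(p1: PWM, p2: PWM) -> float:
    """Mean column distance between two pre-aligned PWMs of equal length.

    Stand-in only: this is NOT the SSTAT covariance used in the paper.
    """
    if len(p1) != len(p2):
        raise ValueError("this sketch assumes pre-aligned PWMs of equal length")
    return sum(column_distance(a, b) for a, b in zip(p1, p2)) / len(p1)

def fraction_similar(pairs: Sequence[Tuple[PWM, PWM]],
                     cutoffs: Sequence[float]) -> Dict[float, float]:
    """Fraction of (primary, replicate) PWM pairs at or below each distance cut-off."""
    distances = [pwm_distance(a, b) for a, b in pairs]
    return {c: sum(d <= c for d in distances) / len(distances) for c in cutoffs}
```

Swapping in the published similarity measure would only change `pwm_distance` (and flip the comparison if the measure grows with similarity); the per-cut-off tally stays the same.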

Extended Data Figure 4 Long-range cooperativity.

Many experiments in which TFs bound sites that were relatively far apart showed preferential binding to sites separated by approximately nine to ten bases. Heatmaps (maximum count set to 1) show the frequency of occurrence of the representative 6-mers for TF pairs in all possible spacings (columns) and orientations (rows). a, Replicate experiments of the GCM1 (black arrowhead) and MAX (red ball) pair show very similar preferences for cooperatively bound representative 6-mers (see Supplementary Table 1). While one of the orientations shows preference for a single spacing, the second has two preferentially recognized regions separated by ~9 bp. b, The TEAD4–CEBPB pair shows a similar ~9 bp separation between three regions of preferred spacings (brackets). c, Very deep sequencing of the unselected input ligand does not show the same preference; instead, counts decrease linearly as a function of gap length (due to the decreasing number of available positions in the 40N random sequence). The mode of cooperativity seen in a and b appears similar to that reported by Kim et al.¹⁷. In addition to high-affinity sites, lower-affinity spacings and orientations between TF pairs could be employed in fine-tuned transcriptional responses (see refs 68, 69).
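The linear decline in the unselected input (panel c) follows from counting positions. A minimal sketch, assuming that the 40N region is the only randomized stretch, that its bases are independent and uniformly distributed, and that constant flanking sequence is ignored: a 6-mer, a gap of g bases and a second 6-mer occupy 12 + g bases, so they fit at 40 − (12 + g) + 1 = 29 − g start positions. The helper names are hypothetical.

```python
def placements(gap: int, k1: int = 6, k2: int = 6, region: int = 40) -> int:
    """Start positions available for a k1-mer, `gap` spacer bases and a k2-mer
    inside a random region of length `region` (constant flanks ignored)."""
    return max(region - (k1 + gap + k2) + 1, 0)

def expected_input_count(gap: int, n_reads: int, k1: int = 6, k2: int = 6,
                         region: int = 40) -> float:
    """Expected count of one specific ordered, oriented k-mer pair at this gap in
    n_reads unselected ligands, assuming independent, uniformly distributed bases."""
    return n_reads * placements(gap, k1, k2, region) * 0.25 ** (k1 + k2)

print([placements(g) for g in (0, 7, 14, 21, 28)])  # -> [29, 22, 15, 8, 1]
```

Because the expected input count is proportional to 29 − g, any departure from a straight line in a selected library, such as the ~9 bp spaced peaks in panels a and b, cannot come from this positional effect alone.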

Extended Data Figure 5 CAP-SELEX motifs are conserved and enhance prediction of in vivo peaks.

a, Pie chart showing the frequency of DNA-mediated, DNA-facilitated and potentially protein–protein interaction-mediated heterodimers in the CAP-SELEX data set. Cooperativity between TFs can result from direct contacts between the proteins (protein-mediated) or from DNA-facilitated protein contacts (DNA-facilitated), or it can arise indirectly from DNA-mediated interactions^{17,34,39,40,70,71}. The last type of cooperativity is caused by changes in DNA shape induced by DBD binding, and does not involve other domains or direct contact between the proteins^17,39,40. The dimers were classified manually into DNA-mediated, DNA-facilitated and potentially protein–protein interaction-mediated classes, based on the structural models shown in Supplementary Data 2. b, Conservation of the genomic sites recognized by the CAP-SELEX-identified heterodimeric motifs (left) compared to monomeric and homodimeric sites identified by HT-SELEX (right, motifs from ref. 8). For each motif, the ten thousand non-overlapping highest-affinity sites within human constrained non-coding regions recognized by the motif or one of its control motifs (see Methods for details) were selected and their conservation was tested. The fold enrichment (y axis), that is, the fraction of conserved sites among the motif sites divided by the fraction of conserved sites among the control motif sites, is shown as a function of the number of conserved motif sites among the top ten thousand sites (x axis). Motifs that are significantly conserved (multiple-testing-adjusted P < 0.05) are marked in green. The five motifs with the lowest P values are also indicated. Note that ~50% of the HT-SELEX and ~25% of the CAP-SELEX motifs are conserved above the significance threshold. c, Inclusion of heterodimeric motifs improves prediction of ChIP-seq peaks. Left, the error rate of prediction of ChIP-seq peak positions using either the monomer motifs and CAP-SELEX dimers (light grey), or the monomer motifs and control motifs in which the partner of the indicated TF is reversed but not complemented (dark grey). Note that inclusion of the correct heterodimeric motifs decreases the prediction error rate in the cases of HOXB13 and MEIS1. The relatively modest effect is likely due to the fact that only a subset of heterodimers was identified in our study, and that ChIP-seq peak positions are also influenced by other factors such as nucleosome binding and chromatin structure. Right, number of PWM matches as a function of distance from HOXB13 ChIP-exo peaks. Note that the original heterodimer motifs clearly outperform the control motifs.
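Two quantities in this figure can be written down directly. Panel b defines fold enrichment as a ratio of conserved fractions, and panel c uses control motifs in which the partner's half is reversed but not complemented. The sketch below implements both under stated assumptions: a PWM is a list of [A, C, G, T] probability columns, the boundary between the two TFs is supplied as a hypothetical `split` index (for the overlapping composite sites described in the paper, choosing that boundary is itself a judgment call), and the statistical test behind the adjusted P values is not modelled because the caption does not specify it. Function names are hypothetical.

```python
from typing import List, Sequence

Column = List[float]  # probabilities for A, C, G, T, in that order

def reversed_partner_control(pwm: Sequence[Column], split: int,
                             partner_on_right: bool = True) -> List[Column]:
    """Control PWM: the partner's columns are reversed in order but NOT complemented,
    so each column keeps its own base distribution."""
    if not 0 < split < len(pwm):
        raise ValueError("split must fall strictly inside the PWM")
    left = [list(c) for c in pwm[:split]]
    right = [list(c) for c in pwm[split:]]
    if partner_on_right:
        return left + right[::-1]
    return left[::-1] + right

def fold_enrichment(conserved_motif: int, total_motif: int,
                    conserved_control: int, total_control: int) -> float:
    """Fraction of conserved motif sites divided by fraction of conserved control sites."""
    return (conserved_motif / total_motif) / (conserved_control / total_control)
```

Because reversing without complementing leaves every column's base distribution unchanged, such a control keeps the information content of the original motif while, in general, no longer spelling the partner's site on either strand.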

Extended Data Figure 6 Heterodimers where the individual TF core recognition sites appear to overlap.

a, Composite site formation alters specificity at bases flanking the core TF site. TFs often directly read specific ‘core’ sequence motifs via hydrogen bonding to DNA bases. The sequences flanking this core are commonly read indirectly, through contacts to the sugar and phosphate backbone of DNA^72,73,74. The backbone contacts specify a preferred DNA conformation, which in turn leads to a preference for sequences that are optimal for stacking interactions between consecutive base pairs (reviewed in ref. 74). The figure summarizes the base positions whose specificity is affected in all composite sites identified in this study for the indicated TFs. Note that the bases of the core motif, which are recognized by direct hydrogen bonds, are not commonly affected by heterodimer formation. In contrast, specificity at flanking positions that are recognized by contacts to the sugar or phosphate backbone of DNA is commonly altered by binding of the heterodimer partner. Hydrogen-bond contacts were determined based on the indicated (refs 29 and 30) or homologous TF structures (see Supplementary Table 3). b, A base (arrow) can be contacted both from the major groove side (black dot; G contacted by GCM1) and from the minor groove side (white dot; C contacted by the DRGX homeodomain). c, A TF that can bind to a homodimeric site appears instead to bind as a heterodimer. A composite site is shown where HOXB2 appears to form a heterodimer with a monomer of RFX5. d, In some cases, the binding positions of the TFs cannot be unambiguously assigned based on the composite recognition sequence. In a, the annotation of hydrogen-bond contacts is as described in main Fig. 2; in b–d, the major groove contacts of the left and right TFs are indicated by black and red dots, respectively.

Extended Data Figure 7 Specificities of individual TFs and heterodimer pairs.

Dendrogram shows motif similarities between the representative heterodimer and monomer motifs. Heterodimer models are indicated by green bars. Barcode logos for each factor are also shown. The centre of the dendrogram shows the colour key for the TF families.

Extended Data Figure 8 Comparison of CAP-SELEX models to models inferred from conserved genomic sequences.

a, Motifs that are very similar to the CAP-SELEX motifs are enriched and conserved. A previous study by Guturu et al.²⁰ made structural models for pairs of TFs to identify sterically possible configurations and predict sites that could be bound by such complexes. Enrichment of those target sites in evolutionarily conserved non-coding regions over non-conserved control regions was then quantified to infer putative target sites for cooperatively binding TFs. The pie chart compares the 100 most significant target sites predicted by Guturu et al.²⁰ with all heterodimeric PWMs generated in this study. Fifteen PWMs showed clear similarity to our heterodimeric PWMs (upper right, dark green slice), 8 were partially similar (green) and a further 5 had enrichment for the site but below the threshold used in our study. We did not detect 25 motifs, and for 14 potential pairs, we identified a different spacing and orientation. This result is expected, as we did not test all potential TF–TF pairs, and many TFs that bind to similar monomer sites prefer different dimer spacings and orientations. Finally, of the 100 top motifs of Guturu et al.²⁰, 33 were not analysed in our study (14 were homodimeric and no possible pair was tested for 19; for example, three of the predicted pairs included a SMAD TF, and no SMAD TFs were tested in our study). b, Comparison of the 15 matching (top) and 8 partially matching (bottom, boxed) PWMs.
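The category counts in panel a can be checked to account for all 100 predicted motifs. The short sketch below does that bookkeeping; the dictionary keys are paraphrases of the caption, and the share of analysed motifs that were recovered is derived here rather than reported by the authors.

```python
# Outcome counts for the top 100 Guturu et al. target sites, as stated in the caption.
outcomes = {
    "clearly similar": 15,
    "partially similar": 8,
    "enriched, below threshold": 5,
    "not detected": 25,
    "different spacing/orientation": 14,
    "homodimeric, not analysed": 14,
    "no pair tested, not analysed": 19,
}

def summarise(counts: dict) -> dict:
    """Totals for the analysed and non-analysed motifs and the recovered share."""
    total = sum(counts.values())
    not_analysed = counts["homodimeric, not analysed"] + counts["no pair tested, not analysed"]
    analysed = total - not_analysed
    recovered = counts["clearly similar"] + counts["partially similar"]
    return {
        "total": total,
        "analysed": analysed,
        "not_analysed": not_analysed,
        "recovered": recovered,
        "recovered_share_of_analysed": recovered / analysed,
    }

print(summarise(outcomes))
```

With the caption's numbers, the categories sum to 100 and 67 motifs fall within the analysed set. Of these, 23 (about 34%) were clearly or partially matched.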

Extended Data Figure 9 Detailed view of MEIS1 and MEIS1–DLX3 structures.

a, Contacts between MEIS1 (cyan) and DNA. b, Contacts between DLX3 (magenta) and DNA. c, Comparison of the DNA structures in MEIS1 homodimer (cyan) and MEIS1–DLX3 heterodimer (magenta). Note that the DNA bound to the heterodimer is more distorted.

Supplementary information

Supplementary Tables

This file contains Supplementary Tables 1–5 as follows: 1) sequence information for protein and DNA molecules used in the assay; 2) PWM models; 3) hydrogen-bond contact-annotated PWMs; 4) numeric data; and 5) X-ray structure statistics. (XLSX 849 kb)

Supplementary Data 1 - Logos for all heterodimeric TF complexes and the corresponding monomers

Figures show the generated PWMs organized according to family-wise PWM similarity dendrograms (distance metric is SSTAT covariance). Families with a large number of PWMs have been split to show individual branches on their own pages; in such cases, the part of the branch shown is indicated by grey shading of the full dendrogram shown on the left side of each page. Similar motifs (< 30 distance units, dotted vertical line) recognized by pairs of paralogous TFs are indicated by orange colour of the rightmost branches, and PWMs generated from replicate experiments are indicated by green lines on the right side of the TF names. TF pairs displaying latent specificity or assimilation of binding specificities are indicated with blue and black boxes, respectively. Binding specificities of individual TFs were taken from our previous work⁸ or determined here using the previously described protocol (23 TFs; see Supplementary Table 2). (PDF 13217 kb)

Supplementary Data 2 - Analysis based on existing structures

Figures show structural models of the detected TF–TF pairs bound to DNA, or X-ray structures of the actual dimeric complex in the few cases where data were available (indicated with boxes). Structural models are based on DNA sequence alignment followed by superimposition of existing crystal structures of the same or orthologous TFs onto B-DNA models of the CAP-SELEX-determined consensus sequences. PDB entry names of the crystal structures used are indicated below the TF names. (PDF 27745 kb)
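Placing two crystal structures onto one ideal B-DNA model fixes their relative position by helix geometry alone. A minimal sketch computes the axial distance and the rotation about the helix axis between two positions n base pairs apart. It assumes textbook B-DNA values of about 10.5 bp per turn and about 3.4 Å rise per base pair; real DNA, and especially DNA distorted by bound TFs as in Extended Data Fig. 9, deviates from these values. The class name and the 60° tolerance used to call two sites "same face" are hypothetical choices.

```python
from dataclasses import dataclass

@dataclass
class BDNA:
    bp_per_turn: float = 10.5   # textbook approximation for ideal B-DNA
    rise_angstrom: float = 3.4  # approximate axial rise per base pair

    def axial_distance(self, n_bp: int) -> float:
        """Distance along the helix axis (angstrom) between positions n_bp apart."""
        return n_bp * self.rise_angstrom

    def angular_offset(self, n_bp: int) -> float:
        """Rotation about the helix axis (degrees, 0 to 360) between positions n_bp apart."""
        return (n_bp * 360.0 / self.bp_per_turn) % 360.0

    def same_face(self, n_bp: int, tolerance_deg: float = 60.0) -> bool:
        """True if the two positions lie within tolerance_deg of each other around the axis."""
        a = self.angular_offset(n_bp)
        return min(a, 360.0 - a) <= tolerance_deg
```

For example, sites 10 bp apart sit about 17° apart around the axis (nearly the same face), whereas sites 5 bp apart sit about 171° apart (roughly opposite faces).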

Source data

Source data to Fig. 1

About this article

Cite this article

Jolma, A., Yin, Y., Nitta, K. et al. DNA-dependent formation of transcription factor pairs alters their binding specificity. Nature 527, 384–388 (2015). https://doi.org/10.1038/nature15518

Received: 23 January 2015
Accepted: 24 August 2015
Published: 09 November 2015
Issue Date: 19 November 2015
DOI: https://doi.org/10.1038/nature15518
