DNA-dependent formation of transcription factor pairs alters their binding specificity


Gene expression is regulated by transcription factors (TFs), proteins that recognize short DNA sequence motifs1,2,3. Such sequences are very common in the human genome, and an important determinant of the specificity of gene expression is the cooperative binding of multiple TFs to closely located motifs4,5,6. However, interactions between DNA-bound TFs have not been systematically characterized. To identify TF pairs that bind cooperatively to DNA, and to characterize their spacing and orientation preferences, we have performed consecutive affinity-purification systematic evolution of ligands by exponential enrichment (CAP-SELEX) analysis of 9,400 TF–TF–DNA interactions. This analysis revealed 315 TF–TF interactions recognizing 618 heterodimeric motifs, most of which have not been previously described. The observed cooperativity occurred promiscuously between TFs from diverse structural families. Structural analysis of the TF pairs, including a novel crystal structure of MEIS1 and DLX3 bound to their identified recognition site, revealed that the interactions between the TFs were predominantly mediated by DNA. Most TF pair sites identified involved a large overlap between individual TF recognition motifs, and resulted in recognition of composite sites that were markedly different from the individual TF’s motifs. Together, our results indicate that the DNA molecule commonly plays an active role in cooperative interactions that define the gene regulatory lexicon.


Figure 1: CAP-SELEX reveals DNA-mediated TF–TF interactions.
Figure 2: Overlapping composite TF motifs with novel specificity.
Figure 3: All identified TF–TF interactions.
Figure 4: Structural validation of TF–TF interactions.

Accession codes

Data deposits

Sequencing reads are deposited in the European Nucleotide Archive (accession PRJEB7934). The atomic coordinates and diffraction data are deposited in the Protein Data Bank (accessions 4XRM, 5BNG and 4XRS). All computer programs and scripts used are either published or available upon request.


  1. Shlyueva, D., Stampfel, G. & Stark, A. Transcriptional enhancers: from properties to genome-wide predictions. Nature Rev. Genet. 15, 272–286 (2014)
  2. Levo, M. & Segal, E. In pursuit of design principles of regulatory sequences. Nature Rev. Genet. 15, 453–468 (2014)
  3. Slattery, M. et al. Absence of a simple code: how transcription factors read the genome. Trends Biochem. Sci. 39, 381–399 (2014)
  4. Rodda, D. J. et al. Transcriptional regulation of Nanog by OCT4 and SOX2. J. Biol. Chem. 280, 24731–24737 (2005)
  5. Panne, D., Maniatis, T. & Harrison, S. C. An atomic model of the interferon-beta enhanceosome. Cell 129, 1111–1123 (2007)
  6. De Val, S. et al. Combinatorial regulation of endothelial gene expression by Ets and Forkhead transcription factors. Cell 135, 1053–1064 (2008)
  7. Badis, G. et al. Diversity and complexity in DNA recognition by transcription factors. Science 324, 1720–1723 (2009)
  8. Jolma, A. et al. DNA-binding specificities of human transcription factors. Cell 152, 327–339 (2013)
  9. Vaquerizas, J. M., Kummerfeld, S. K., Teichmann, S. A. & Luscombe, N. M. A census of human transcription factors: function, expression and evolution. Nature Rev. Genet. 10, 252–263 (2009)
  10. Najafabadi, H. S. et al. C2H2 zinc finger proteins greatly expand the human regulatory lexicon. Nature Biotechnol. 33, 555–562 (2015)
  11. Emery, P. et al. A consensus motif in the RFX DNA binding domain and binding domain mutants with altered specificity. Mol. Cell. Biol. 16, 4486–4494 (1996)
  12. Kurokawa, R. et al. Differential orientations of the DNA-binding domain and carboxy-terminal dimerization interface regulate binding site selection by nuclear receptor heterodimers. Genes Dev. 7, 1423–1435 (1993)
  13. Mohibullah, N., Donner, A., Ippolito, J. A. & Williams, T. SELEX and missing phosphate contact analyses reveal flexibility within the AP-2α protein: DNA binding complex. Nucleic Acids Res. 27, 2760–2769 (1999)
  14. Slattery, M. et al. Cofactor binding evokes latent differences in DNA binding specificity between Hox proteins. Cell 147, 1270–1282 (2011)
  15. Grove, C. A. et al. A multiparameter network reveals extensive divergence between C. elegans bHLH transcription factors. Cell 138, 314–327 (2009)
  16. Nitta, K. R. et al. Conservation of transcription factor binding specificities across 600 million years of bilateria evolution. eLife 4, e04837 (2015)
  17. Kim, S. et al. Probing allostery through DNA. Science 339, 816–819 (2013)
  18. Rhee, H. S. & Pugh, B. F. Comprehensive genome-wide protein–DNA interactions detected at single-nucleotide resolution. Cell 147, 1408–1419 (2011)
  19. Yan, J. et al. Transcription factor binding in human cells occurs in dense clusters formed around cohesin anchor sites. Cell 154, 801–813 (2013)
  20. Guturu, H., Doxey, A. C., Wenger, A. M. & Bejerano, G. Structure-aided prediction of mammalian transcription factor complexes in conserved non-coding elements. Phil. Trans. R. Soc. Lond. B 368, 20130029 (2013)
  21. Wei, G. H. et al. Genome-wide analysis of ETS-family DNA-binding in vitro and in vivo. EMBO J. 29, 2147–2160 (2010)
  22. Mirny, L. A. Nucleosome-mediated cooperativity between transcription factors. Proc. Natl Acad. Sci. USA 107, 22534–22539 (2010)
  23. Wasson, T. & Hartemink, A. J. An ensemble model of competitive multi-factor binding of the genome. Genome Res. 19, 2101–2112 (2009)
  24. Poss, Z. C., Ebmeier, C. C. & Taatjes, D. J. The Mediator complex and transcription regulation. Crit. Rev. Biochem. Mol. Biol. 48, 575–608 (2013)
  25. Kagey, M. H. et al. Mediator and cohesin connect gene expression and chromatin architecture. Nature 467, 430–435 (2010)
  26. Nishizawa, M. & Nagata, S. cDNA clones encoding leucine-zipper proteins which interact with G-CSF gene promoter element 1-binding protein. FEBS Lett. 299, 36–38 (1992)
  27. Shen, W. F. et al. AbdB-like Hox proteins stabilize DNA binding by the Meis1 homeodomain proteins. Mol. Cell. Biol. 17, 6448–6458 (1997)
  28. Williams, D. C., Jr, Cai, M. & Clore, G. M. Molecular basis for synergistic transcriptional activation by Oct1 and Sox2 revealed from the solution structure of the 42-kDa Oct1·Sox2·Hoxb1–DNA ternary transcription factor complex. J. Biol. Chem. 279, 1449–1457 (2004)
  29. Cohen, S. X. et al. Structure of the GCM domain–DNA complex: a DNA-binding domain with a novel fold and mode of target site recognition. EMBO J. 22, 1835–1845 (2003)
  30. Mo, Y., Vaessen, B., Johnston, K. & Marmorstein, R. Structure of the Elk-1–DNA complex reveals how DNA-distal residues affect ETS domain recognition of DNA. Nature Struct. Mol. Biol. 7, 292–297 (2000)
  31. Jolma, A. et al. Multiplexed massively parallel SELEX for characterization of human transcription factor binding specificities. Genome Res. 20, 861–873 (2010)
  32. Katainen, R. et al. CTCF/cohesin-binding sites are frequently mutated in cancer. Nature Genet. 47, 818–821 (2015)
  33. Guo, Y. et al. Discovering homotypic binding events at high spatial resolution. Bioinformatics 26, 3028–3034 (2010)
  34. Passner, J. M., Ryoo, H. D., Shen, L., Mann, R. S. & Aggarwal, A. K. Structure of a DNA-bound Ultrabithorax-Extradenticle homeodomain complex. Nature 397, 714–719 (1999)
  35. Vincentelli, R. et al. High-throughput protein expression screening and purification in Escherichia coli. Methods 55, 65–72 (2011)
  36. Keshava Prasad, T. S. et al. Human Protein Reference Database–2009 update. Nucleic Acids Res. 37, D767–D772 (2009)
  37. Ravasi, T. et al. An atlas of combinatorial transcriptional regulation in mouse and man. Cell 140, 744–752 (2010)
  38. Newman, J. R. & Keating, A. E. Comprehensive identification of human bZIP interactions with coiled-coil arrays. Science 300, 2097–2101 (2003)
  39. Klemm, J. D. & Pabo, C. O. Oct-1 POU domain–DNA interactions: cooperative binding of isolated subdomains and effects of covalent linkage. Genes Dev. 10, 27–36 (1996)
  40. Panne, D., Maniatis, T. & Harrison, S. C. Crystal structure of ATF-2/c-Jun and IRF-3 bound to the interferon-β enhancer. EMBO J. 23, 4384–4393 (2004)
  41. Rigaut, G. et al. A generic protein purification method for protein complex characterization and proteome exploration. Nature Biotechnol. 17, 1030–1032 (1999)
  42. Hallikas, O. et al. Genome-wide prediction of mammalian enhancers based on analysis of transcription-factor binding affinity. Cell 124, 47–59 (2006)
  43. Winn, M. D. et al. Overview of the CCP4 suite and current developments. Acta Crystallogr. D 67, 235–242 (2011)
  44. Moorman, C. et al. Hotspots of transcription factor colocalization in the genome of Drosophila melanogaster. Proc. Natl Acad. Sci. USA 103, 12027–12032 (2006)
  45. Yip, K. Y. et al. Classification of human genomic regions based on experimentally determined binding sites of more than 100 transcription-related factors. Genome Biol. 13, R48 (2012)
  46. Meireles-Filho, A. C., Bardet, A. F., Yanez-Cuna, J. O., Stampfel, G. & Stark, A. cis-regulatory requirements for tissue-specific programs of the circadian clock. Curr. Biol. 24, 1–10 (2014)
  47. Löytynoja, A. & Goldman, N. An algorithm for progressive multiple alignment of sequences with insertions. Proc. Natl Acad. Sci. USA 102, 10557–10562 (2005)
  48. Pape, U. J., Rahmann, S. & Vingron, M. Natural similarity measures between position frequency matrices with an application to clustering. Bioinformatics 24, 350–357 (2008)
  49. Odrowaz, Z. & Sharrocks, A. D. The ETS transcription factors ELK1 and GABPA regulate different gene networks to control MCF10A breast epithelial cell migration. PLoS ONE 7, e49892 (2012)
  50. Huang, Q. et al. A prostate cancer susceptibility allele at 6q22 increases RFX6 expression by modulating HOXB13 chromatin binding. Nature Genet. 46, 126–135 (2014)
  51. Huang, Y. et al. Identification and characterization of Hoxa9 binding sites in hematopoietic cells. Blood 119, 388–398 (2012)
  52. Penkov, D. et al. Analysis of the DNA-binding profile and function of TALE homeoproteins reveals their specialization and specific interactions with Hox genes/proteins. Cell Reports 3, 1321–1333 (2013)
  53. Lindblad-Toh, K. et al. A high-resolution map of human evolutionary constraint using 29 mammals. Nature 478, 476–482 (2011)
  54. Korhonen, J., Martinmaki, P., Pizzi, C., Rastas, P. & Ukkonen, E. MOODS: fast search for position weight matrix matches in DNA sequences. Bioinformatics 25, 3181–3182 (2009)
  55. Garber, M. et al. Identifying novel constrained elements by exploiting biased substitution patterns. Bioinformatics 25, i54–i62 (2009)
  56. Blanchette, M. et al. Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res. 14, 708–715 (2004)
  57. Savitsky, P. et al. High-throughput production of human proteins for crystallization: the SGC experience. J. Struct. Biol. 172, 3–13 (2010)
  58. Bourenkov, G. P. & Popov, A. N. A quantitative approach to data-collection strategies. Acta Crystallogr. D 62, 58–64 (2006)
  59. Kabsch, W. XDS. Acta Crystallogr. D 66, 125–132 (2010)
  60. Collaborative Computational Project Number 4. The CCP4 suite: programs for protein crystallography. Acta Crystallogr. D 50, 760–763 (1994)
  61. McCoy, A. J. et al. Phaser crystallographic software. J. Appl. Crystallogr. 40, 658–674 (2007)
  62. Adams, P. D. et al. PHENIX: a comprehensive Python-based system for macromolecular structure solution. Acta Crystallogr. D 66, 213–221 (2010)
  63. Emsley, P., Lohkamp, B., Scott, W. G. & Cowtan, K. Features and development of Coot. Acta Crystallogr. D 66, 486–501 (2010)
  64. Fitzsimmons, D. et al. Pax-5 (BSAP) recruits Ets proto-oncogene family proteins to form functional ternary complexes on a B-cell-specific promoter. Genes Dev. 10, 2198–2211 (1996)
  65. Kim, J. J. et al. Regulation of insulin-like growth factor binding protein-1 promoter activity by FKHR and HOXA10 in primate endometrial cells. Biol. Reprod. 68, 24–30 (2003)
  66. Vinson, C. R., Hai, T. & Boyd, S. M. Dimerization specificity of the leucine zipper-containing bZIP motif on DNA binding: prediction and rational design. Genes Dev. 7, 1047–1058 (1993)
  67. Williams, T. M., Williams, M. E. & Innis, J. W. Range of HOX/TALE superclass associations and protein domain requirements for HOXA13:MEIS interaction. Dev. Biol. 277, 457–471 (2005)
  68. Melnikov, A. et al. Systematic dissection and optimization of inducible enhancers in human cells using a massively parallel reporter assay. Nature Biotechnol. 30, 271–277 (2012)
  69. Raveh-Sadka, T. et al. Manipulating nucleosome disfavoring sequences allows fine-tune regulation of gene expression in yeast. Nature Genet. 44, 743–750 (2012)
  70. Hochschild, A. & Ptashne, M. Cooperative binding of λ repressors to sites separated by integral turns of the DNA helix. Cell 44, 681–687 (1986)
  71. Moretti, R. et al. Targeted chemical wedges reveal the role of allosteric DNA modulation in protein–DNA assembly. ACS Chem. Biol. 3, 220–229 (2008)
  72. Aggarwal, A. K., Rodgers, D. W., Drottar, M., Ptashne, M. & Harrison, S. C. Recognition of a DNA operator by the repressor of phage 434: a view at high resolution. Science 242, 899–907 (1988)
  73. Jordan, S. R. & Pabo, C. O. Structure of the lambda complex at 2.5 Å resolution: details of the repressor–operator interactions. Science 242, 893–899 (1988)
  74. Rohs, R. et al. Origins of specificity in protein–DNA recognition. Annu. Rev. Biochem. 79, 233–269 (2010)



We thank J. Yan, B. Schmierer, E. Kaasinen, C. Daub, E. Haapaniemi and Å. Kolterud for their review of the manuscript, the Karolinska Institutet protein science facility for protein purification, and S. Augsten, L. Hu and A. Zetterlund for technical assistance. This work was supported by Finnish Academy CoE in Cancer Genetics, Center for Innovative Medicine, Knut and Alice Wallenberg and Göran Gustafsson Foundations and Vetenskapsrådet.

Author information




A.J. and J.T. designed the experiments. A.J. and Y.Y. performed CAP-SELEX, K.D. performed ChIP-exo, and M.T. the sequencing analyses. A.J., K.R.N., T.K., M.E. and J.T. wrote computer programs for the analyses. E.M. and A.P. performed X-ray crystallography, and E.M. solved the structures. A.J., K.R.N. and E.M. prepared illustrations and A.J., Y.Y. and J.T. wrote the article. All authors contributed to data analysis and reviewed the manuscript.

Corresponding author

Correspondence to Jussi Taipale.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Extended data figures and tables

Extended Data Figure 1 CAP-SELEX data analysis and comparison to previous data.

a, Flowchart of CAP-SELEX data analysis. Left, a library of selection ligands with random sequences (yellow) is incubated with TFs. After CAP-SELEX, enriched individual TF motifs (1°; arrows) and composite motifs that are not simply combinations of the individual motifs (2°; green dots) are detected from the reads. To detect preferential spacings and orientations of the TF pair (3°), co-occurrences of the indicative 6-mer subsequences (arrowheads) are counted from the reads. The subsequences are then used to generate seeds for the PWM models (right). Heatmap (bottom right; scale divided by highest observed count) shows the frequency of occurrence of the two 6-mers (CCGGAA, red arrowhead; CATTCC, black arrowhead) in all possible spacings (columns) and orientations (rows). Note that the 6-mer-based approach cannot model the composite site, but identifies a strong case of cooperativity where the ERG 6-mer CCGGAA is followed by the TEAD4 6-mer CATTCC with an 8 bp gap. The logo of the PWM for this site is also shown. b, Comparison between CAP-SELEX PWMs and previously characterized specificities for the indicated TF pairs. The method used in each previous study and its reference are also indicated. CAP-SELEX models also shown in Fig. 1 are indicated by asterisks. Note that four out of five of the CAP-SELEX models are similar to the previously identified consensus sequences. The exception is the ELK1–PAX5 consensus, which poorly matches both the CAP-SELEX motif and the individual motifs for ELK1 and PAX5 (not shown). c, CAP-SELEX PWMs for TF pairs known to interact at the protein level. The method used to identify each protein–protein interaction and its reference are also shown6,26,27,28,37,38,64,65,66,67.
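The 6-mer co-occurrence counting described in panel a can be illustrated with a minimal Python sketch. This is not the authors' program: the function names, the `max_gap` cutoff and the reduction to four relative orientations (by treating a pair seen on one strand as equivalent to its mirror image on the other strand) are assumptions made for illustration. Only the two 6-mers, CCGGAA and CATTCC, come from the caption.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_FLIP = {"+": "-", "-": "+"}


def revcomp(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_all(read, kmer):
    """Return start positions of every (possibly overlapping) occurrence."""
    hits = []
    start = read.find(kmer)
    while start != -1:
        hits.append(start)
        start = read.find(kmer, start + 1)
    return hits


def cooccurrence_counts(reads, kmer_a, kmer_b, max_gap=20):
    """Count non-overlapping co-occurrences of two k-mers in the reads.

    Keys are (strand of A, strand of B, gap), expressed as if A came first.
    A pair observed with B first is the same double-stranded configuration
    read from the opposite strand, so both strands are flipped.
    Assumes neither k-mer is its own reverse complement (hypothetical
    simplification; palindromic k-mers would be counted twice).
    """
    counts = Counter()
    variants_a = {"+": kmer_a, "-": revcomp(kmer_a)}
    variants_b = {"+": kmer_b, "-": revcomp(kmer_b)}
    k_a, k_b = len(kmer_a), len(kmer_b)
    for read in reads:
        for strand_a, seq_a in variants_a.items():
            for strand_b, seq_b in variants_b.items():
                for pos_a in find_all(read, seq_a):
                    for pos_b in find_all(read, seq_b):
                        if pos_b >= pos_a + k_a:      # A upstream of B
                            gap = pos_b - (pos_a + k_a)
                            key = (strand_a, strand_b, gap)
                        elif pos_a >= pos_b + k_b:    # B upstream of A
                            gap = pos_a - (pos_b + k_b)
                            key = (_FLIP[strand_a], _FLIP[strand_b], gap)
                        else:
                            continue                  # overlapping hits
                        if gap <= max_gap:
                            counts[key] += 1
    return counts


# Toy read: ERG 6-mer, 8 bp spacer, TEAD4 6-mer (flanks are arbitrary).
example_read = "AC" + "CCGGAA" + "ACGTACGT" + "CATTCC" + "TG"
example_counts = cooccurrence_counts(
    [example_read, revcomp(example_read)], "CCGGAA", "CATTCC"
)
```

Dividing every count by the highest observed count would give a heatmap scaled in the way the caption describes, with gaps as columns and orientations as rows.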

Extended Data Figure 2 Pairwise interaction matrix between TFs.

Columns indicate TF1 proteins, and rows TF2 proteins, subjected to the first and second affinity purifications, respectively. Pairs of TFs with a single spacing and orientation preference are indicated in dark green, and pairs with multiple preferred configurations in light green. White boxes indicate pairs that displayed weak or no interaction, and grey boxes cases where robust preference data was not recovered. Previously known interacting TF-pairs are indicated by a yellow outline (see Extended Data Fig. 1). Histograms show the counts for the interactions for each TF. Only TFs for which at least one clear interaction or independent binding was identified are included. The importance of including DNA in the interaction assay is highlighted by the fact that only four and five of the interactions detected are among those observed between 762 human or 877 mouse TF pairs identified using protein–protein interaction assays37, or compiled from literature36, respectively.

Extended Data Figure 3 CAP-SELEX reproducibility.

a, Replicate analysis of more than two hundred of the generated PWMs. The same seeds that had been used to generate PWMs for the primary experiments were used to seed new PWMs from the replicates. Left, red bars show the percentage of the PWM pairs that are similar at the indicated cut-offs (measured as SSTAT covariance8,48). The highest threshold is the same as that used for identifying the dominating set of PWMs. Blue bars indicate the fraction of all replicate PWMs that are similar using the same cut-off. Right, dendrogram and barcode logos of all PWM pairs. The plot in the middle shows the fraction of reads included in the same models in replicates 1 and 2. b, Validation of the CAP-SELEX analysis using shortened TF constructs (DBD+) by HT-SELEX using full-length protein mixtures (full length). Note that the same orientation and spacing is preferred in all but one of the cases. In one case (bottom), full-length proteins show the highest preference for a different spacing than that observed in CAP-SELEX; even in this case, the second-most preferred spacing is the one identified using CAP-SELEX.

Extended Data Figure 4 Long-range cooperativity.

Many experiments where TFs bound sites that were relatively far apart showed preferential binding to sites that are separated by approximately nine to ten bases. Heatmap representations (maximum count set to 1) show the frequency of occurrence of the representative 6-mers for TF pairs in all possible spacings (columns) and orientations (rows). a, Replicate experiments of the GCM1 (black arrowhead) and MAX (red ball) pair show very similar preferences for cooperatively bound representative 6-mers (see Supplementary Table 1). While one of the orientations shows a preference for a single spacing, the second has two preferentially recognized regions separated by ~9 bp. b, The TEAD4–CEBPB pair shows a similar ~9 bp separation between three regions of preferred spacings (brackets). c, Very deep sequencing of the unselected input ligand does not show the same preference; instead, counts decrease linearly as a function of gap length (due to the decreasing number of available positions in the 40N random sequence). The mode of cooperativity seen in a and b appears similar to that reported by Kim et al.17. In addition to high-affinity sites, lower affinity spacings and orientations between TF pairs could be employed in fine-tuned transcriptional responses (see refs. 68, 69).
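The linear decrease noted for the unselected input in panel c follows from simple counting, sketched below in Python. The sketch assumes that both 6-mers lie entirely within the 40 bp random region and ignores any contribution from constant flanking sequence; the function name `placements` is hypothetical.

```python
def placements(ligand_len, k_a, k_b, gap):
    """Number of start positions at which two k-mers separated by `gap`
    bases fit entirely inside a random region of length `ligand_len`."""
    span = k_a + gap + k_b
    return max(ligand_len - span + 1, 0)


# Available positions for two 6-mers in a 40N region, for gaps of 0 to 28 bp.
available = [placements(40, 6, 6, gap) for gap in range(29)]
```

Each additional base of gap removes exactly one placement, so for a uniformly random library the expected co-occurrence count falls on a straight line, in contrast to the periodic preference seen in panels a and b.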

Extended Data Figure 5 CAP-SELEX motifs are conserved and enhance prediction of in vivo peaks.

a, Pie chart showing the frequency of DNA-mediated, DNA-facilitated and potentially protein–protein interaction mediated heterodimers in the CAP-SELEX data set. Cooperativity between TFs can result from direct contacts between the proteins (protein-mediated), DNA-facilitated protein contacts (DNA-facilitated) or arise indirectly from DNA-mediated interactions17,34,39,40,70,71. The last type of cooperativity is caused by changes in DNA shape induced by DBD binding, and does not involve other domains or direct contact between the proteins17,39,40. The dimers were classified into DNA-mediated, DNA-facilitated and potentially protein–protein interaction mediated classes manually, based on structural models shown in Supplementary Data Set 2. b, Conservation of the genomic sites recognized by the CAP-SELEX identified heterodimeric motifs (left) compared to monomeric and homodimeric sites identified by HT-SELEX (right, motifs from ref. 8). For each motif, ten thousand non-overlapping highest affinity sites within human constrained non-coding regions recognized by the motif or one of its control motifs (see Methods for details) were selected and their conservation was tested. The fold enrichment (y axis), that is, the fraction of conserved sites among the motif sites divided by the fraction of conserved sites among the control motif sites, is shown as a function of the number of conserved motif sites among the top ten thousand sites (x axis). The motifs that are significantly conserved (multiple testing adjusted P value <0.05) are marked green. The five motifs with the lowest P values are also indicated. Note that ~50% of the HT-SELEX and ~25% of the CAP-SELEX motifs are conserved above the significance threshold. c, Inclusion of heterodimeric motifs improves prediction of ChIP-seq peaks. Left, the error rate of prediction of ChIP-seq peak positions using either the monomer motifs and CAP-SELEX dimers (light grey), or monomer motifs and control motifs where the partner of the indicated TF is reversed but not complemented (dark grey). Note that inclusion of the correct heterodimeric motifs decreases the prediction error rate in the cases of HOXB13 and MEIS1. The relatively modest effect is likely due to the fact that only a subset of heterodimers was identified in our study, and that ChIP-seq peak positions are also influenced by other factors such as nucleosome binding and chromatin structure. Right, number of PWM matches as a function of distance from HOXB13 ChIP-exo peaks. Note that using the original heterodimer motifs clearly outperforms the control motifs.
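The fold enrichment plotted in panel b is a ratio of two fractions, which the following Python sketch makes explicit. The function name, argument names and example numbers are hypothetical and are not taken from the study; only the definition of the ratio comes from the caption.

```python
def fold_enrichment(motif_conserved, motif_total,
                    control_conserved, control_total):
    """Fraction of conserved sites among motif sites divided by the
    fraction of conserved sites among control motif sites."""
    if motif_total <= 0 or control_total <= 0:
        raise ValueError("site totals must be positive")
    control_fraction = control_conserved / control_total
    if control_fraction == 0:
        raise ValueError("no conserved control sites: ratio is undefined")
    return (motif_conserved / motif_total) / control_fraction


# Made-up example: 2,000 of 10,000 motif sites conserved versus
# 1,000 of 10,000 control sites.
example_enrichment = fold_enrichment(2000, 10000, 1000, 10000)
```

A value above 1 means that sites matching the motif are conserved more often than sites matching its control; whether that difference is significant is a separate question addressed by the adjusted P value.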

Extended Data Figure 6 Heterodimers where the individual TF core recognition sites appear to overlap.

a, Composite site formation alters specificity at bases flanking the core TF site. TFs often directly read specific ‘core’ sequence motifs via hydrogen bonding to DNA bases. The sequences flanking this core are commonly read indirectly, through contacts to the sugar and phosphate backbone of DNA72,73,74. The backbone contacts specify a preferred DNA conformation, which then leads to a preference for a sequence that is optimal for stacking interactions between consecutive base pairs (reviewed in ref. 74). The figure shows a summary of base positions whose specificity is affected in all composite sites identified in this study for the indicated TFs. Note that the bases comprising the core motif that is recognized by direct hydrogen bonds to the DNA bases are not commonly affected by heterodimer formation. In contrast, specificity at flanking positions that are recognized by contacts to the sugar or phosphate backbone of DNA is commonly altered by binding of the heterodimer partner. Hydrogen bond contacts were determined based on the indicated (refs 29 and 30) or homologous TF structures (see Supplementary Table 3). b, A base (arrow) can be contacted both from the side of the major groove (black dot; G contacted by GCM1) and the minor groove (white dot; C contacted by DRGX homeodomain). c, A TF that can bind to a homodimeric site appears instead to bind as a heterodimer. A composite site is shown where HOXB2 appears to form a heterodimer with a monomer of RFX5. d, In some cases, the binding positions of the TFs cannot be unambiguously assigned based on the composite recognition sequence. In a, the annotation of hydrogen bond contacts is as described in main Fig. 2; in b–d, the major groove contacts of the left and right TFs are indicated by black and red dots, respectively.

Extended Data Figure 7 Specificities of individual TFs and heterodimer pairs.

Dendrogram shows motif similarities between the representative heterodimer and monomer motifs. Heterodimer models are indicated by green bars. Barcode logos for each factor are also shown. Centre of dendrogram shows the colour key for the TF families.

Extended Data Figure 8 Comparison of CAP-SELEX models to models inferred from conserved genomic sequences.

a, Motifs that are very similar to the CAP-SELEX motifs are enriched and conserved. A previous study by Guturu et al.20 made structural models for pairs of TFs to identify sterically possible configurations and predict sites that could be bound by such complexes. Enrichment of those target sites was then quantified in evolutionarily conserved noncoding regions over nonconserved control regions to infer putative target sites for cooperatively binding TFs. Pie chart shows a comparison of the top 100 most significant predicted target sites20 to all heterodimeric PWMs generated in this study. 15 PWMs showed clear similarity to our heterodimeric PWMs (upper right, dark green slice), 8 were partially similar (green) and a further 5 had enrichment for the site but under the threshold used in our study. We did not detect 25 motifs, and for 14 potential pairs, we identified a different spacing and orientation. This result is expected as we did not test all potential TF–TF pairs, and many TFs that bind to similar monomer sites prefer different dimer spacings and orientations. Finally, of the 100 Guturu et al.20 top motifs, 33 were not analysed in our study (14 were homodimeric and no possible pair was tested for 19; for example, three of the pairs were predicted for pairs with a SMAD TF, and no SMAD TFs were tested in our study). b, Comparison of the 15 (top) and 8 (bottom, boxed) matching and partially matching PWMs, respectively.
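As a quick consistency check, the pie-chart categories in panel a can be tallied in a few lines of Python. The category labels are paraphrased from the caption and the derived quantities are arithmetic on the stated counts only; the variable names are hypothetical.

```python
# Counts as stated in the caption for the top 100 predicted target sites.
categories = {
    "clearly similar": 15,
    "partially similar": 8,
    "enriched below threshold": 5,
    "not detected": 25,
    "different spacing/orientation": 14,
    "not analysed": 33,  # 14 homodimeric + 19 with no tested pair
}

total = sum(categories.values())
analysed = total - categories["not analysed"]
matching = categories["clearly similar"] + categories["partially similar"]
matching_fraction = matching / analysed
```

Under this reading the categories account for all 100 predictions, and 23 of the 67 predictions that could be evaluated (about a third) matched or partially matched a CAP-SELEX PWM.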

Extended Data Figure 9 Detailed view of MEIS1 and MEIS1–DLX3 structures.

a, Contacts between MEIS1 (cyan) and DNA. b, Contacts between DLX3 (magenta) and DNA. c, Comparison of the DNA structures in MEIS1 homodimer (cyan) and MEIS1–DLX3 heterodimer (magenta). Note that the DNA bound to the heterodimer is more distorted.

Supplementary information

Supplementary Tables

This file contains Supplementary Tables 1-5 as follows: 1) Sequence information for Protein and DNA molecules used in the assay; 2) PWM models; 3) Hydrogen bond contact annotated PWMs; 4) Numeric data and 5) X-ray Structure statistics. (XLSX 849 kb)

Supplementary Data 1 - Logos for all heterodimeric TF complexes and the corresponding monomers

Figures show the generated PWMs organized according to family-wise PWM similarity dendrograms (distance metric is SSTAT covariance). Families with a large number of PWMs have been split to show individual branches on their own pages; in such cases, the part of the branch shown is indicated by grey shading of the full dendrogram shown on the left side of each page. Similar motifs (< 30 distance units, dotted vertical line) recognized by pairs of paralogous TFs are indicated by orange colour of the rightmost branches, and PWMs generated from replicate experiments are indicated by green lines on the right side of the TF names. TF pairs displaying latent specificity or assimilation of binding specificities are indicated with blue and black boxes, respectively. Binding specificities of individual TFs were taken from our previous work8 or determined here using the previously described protocol (23 TFs; see Supplementary Table 2). (PDF 13217 kb)

Supplementary Data 2 - Analysis based on existing structures

Figures show structural models for the detected TF–TF pairs bound to DNA, or X-ray structures for the actual dimeric complex in the few cases where data were available (indicated with boxes). Structural models are based on DNA sequence alignment followed by superimposition of existing crystal structures of the same or orthologous TF onto B-DNA models of the consensus sequences determined by CAP-SELEX. PDB entry names of the crystal structures used are indicated below the TF names. (PDF 27745 kb)

Cite this article

Jolma, A., Yin, Y., Nitta, K. et al. DNA-dependent formation of transcription factor pairs alters their binding specificity. Nature 527, 384–388 (2015).
