Gene expression is regulated by transcription factors (TFs), proteins that recognize short DNA sequence motifs1,2,3. Such sequences are very common in the human genome, and an important determinant of the specificity of gene expression is the cooperative binding of multiple TFs to closely located motifs4,5,6. However, interactions between DNA-bound TFs have not been systematically characterized. To identify TF pairs that bind cooperatively to DNA, and to characterize their spacing and orientation preferences, we have performed consecutive affinity-purification systematic evolution of ligands by exponential enrichment (CAP-SELEX) analysis of 9,400 TF–TF–DNA interactions. This analysis revealed 315 TF–TF interactions recognizing 618 heterodimeric motifs, most of which have not been previously described. The observed cooperativity occurred promiscuously between TFs from diverse structural families. Structural analysis of the TF pairs, including a novel crystal structure of MEIS1 and DLX3 bound to their identified recognition site, revealed that the interactions between the TFs were predominantly mediated by DNA. Most of the TF-pair sites identified involved a large overlap between the individual TF recognition motifs and resulted in recognition of composite sites that were markedly different from the motifs of the individual TFs. Together, our results indicate that the DNA molecule commonly plays an active role in cooperative interactions that define the gene regulatory lexicon.
Sequencing reads have been deposited in the European Nucleotide Archive (accession PRJEB7934). The atomic coordinates and diffraction data have been deposited in the Protein Data Bank (accessions 4XRM, 5BNG and 4XRS). All computer programs and scripts used are either published or available upon request.
Shlyueva, D., Stampfel, G. & Stark, A. Transcriptional enhancers: from properties to genome-wide predictions. Nature Rev. Genet. 15, 272–286 (2014)
Levo, M. & Segal, E. In pursuit of design principles of regulatory sequences. Nature Rev. Genet. 15, 453–468 (2014)
Slattery, M. et al. Absence of a simple code: how transcription factors read the genome. Trends Biochem. Sci. 39, 381–399 (2014)
Rodda, D. J. et al. Transcriptional regulation of Nanog by OCT4 and SOX2. J. Biol. Chem. 280, 24731–24737 (2005)
Panne, D., Maniatis, T. & Harrison, S. C. An atomic model of the interferon-beta enhanceosome. Cell 129, 1111–1123 (2007)
De Val, S. et al. Combinatorial regulation of endothelial gene expression by Ets and Forkhead transcription factors. Cell 135, 1053–1064 (2008)
Badis, G. et al. Diversity and complexity in DNA recognition by transcription factors. Science 324, 1720–1723 (2009)
Jolma, A. et al. DNA-binding specificities of human transcription factors. Cell 152, 327–339 (2013)
Vaquerizas, J. M., Kummerfeld, S. K., Teichmann, S. A. & Luscombe, N. M. A census of human transcription factors: function, expression and evolution. Nature Rev. Genet. 10, 252–263 (2009)
Najafabadi, H. S. et al. C2H2 zinc finger proteins greatly expand the human regulatory lexicon. Nature Biotechnol. 33, 555–562 (2015)
Emery, P. et al. A consensus motif in the RFX DNA binding domain and binding domain mutants with altered specificity. Mol. Cell. Biol. 16, 4486–4494 (1996)
Kurokawa, R. et al. Differential orientations of the DNA-binding domain and carboxy-terminal dimerization interface regulate binding site selection by nuclear receptor heterodimers. Genes Dev. 7, 1423–1435 (1993)
Mohibullah, N., Donner, A., Ippolito, J. A. & Williams, T. SELEX and missing phosphate contact analyses reveal flexibility within the AP-2α protein: DNA binding complex. Nucleic Acids Res. 27, 2760–2769 (1999)
Slattery, M. et al. Cofactor binding evokes latent differences in DNA binding specificity between Hox proteins. Cell 147, 1270–1282 (2011)
Grove, C. A. et al. A multiparameter network reveals extensive divergence between C. elegans bHLH transcription factors. Cell 138, 314–327 (2009)
Nitta, K. R. et al. Conservation of transcription factor binding specificities across 600 million years of bilateria evolution. eLife 4, e04837 (2015)
Kim, S. et al. Probing allostery through DNA. Science 339, 816–819 (2013)
Rhee, H. S. & Pugh, B. F. Comprehensive genome-wide protein–DNA interactions detected at single-nucleotide resolution. Cell 147, 1408–1419 (2011)
Yan, J. et al. Transcription factor binding in human cells occurs in dense clusters formed around cohesin anchor sites. Cell 154, 801–813 (2013)
Guturu, H., Doxey, A. C., Wenger, A. M. & Bejerano, G. Structure-aided prediction of mammalian transcription factor complexes in conserved non-coding elements. Phil. Trans. R. Soc. Lond. B 368, 20130029 (2013)
Wei, G. H. et al. Genome-wide analysis of ETS-family DNA-binding in vitro and in vivo. EMBO J. 29, 2147–2160 (2010)
Mirny, L. A. Nucleosome-mediated cooperativity between transcription factors. Proc. Natl Acad. Sci. USA 107, 22534–22539 (2010)
Wasson, T. & Hartemink, A. J. An ensemble model of competitive multi-factor binding of the genome. Genome Res. 19, 2101–2112 (2009)
Poss, Z. C., Ebmeier, C. C. & Taatjes, D. J. The Mediator complex and transcription regulation. Crit. Rev. Biochem. Mol. Biol. 48, 575–608 (2013)
Kagey, M. H. et al. Mediator and cohesin connect gene expression and chromatin architecture. Nature 467, 430–435 (2010)
Nishizawa, M. & Nagata, S. cDNA clones encoding leucine-zipper proteins which interact with G-CSF gene promoter element 1-binding protein. FEBS Lett. 299, 36–38 (1992)
Shen, W. F. et al. AbdB-like Hox proteins stabilize DNA binding by the Meis1 homeodomain proteins. Mol. Cell. Biol. 17, 6448–6458 (1997)
Williams, D. C., Jr, Cai, M. & Clore, G. M. Molecular basis for synergistic transcriptional activation by Oct1 and Sox2 revealed from the solution structure of the 42-kDa Oct1·Sox2·Hoxb1–DNA ternary transcription factor complex. J. Biol. Chem. 279, 1449–1457 (2004)
Cohen, S. X. et al. Structure of the GCM domain–DNA complex: a DNA-binding domain with a novel fold and mode of target site recognition. EMBO J. 22, 1835–1845 (2003)
Mo, Y., Vaessen, B., Johnston, K. & Marmorstein, R. Structure of the Elk-1–DNA complex reveals how DNA-distal residues affect ETS domain recognition of DNA. Nature Struct. Mol. Biol. 7, 292–297 (2000)
Jolma, A. et al. Multiplexed massively parallel SELEX for characterization of human transcription factor binding specificities. Genome Res. 20, 861–873 (2010)
Katainen, R. et al. CTCF/cohesin-binding sites are frequently mutated in cancer. Nature Genet. 47, 818–821 (2015)
Guo, Y. et al. Discovering homotypic binding events at high spatial resolution. Bioinformatics 26, 3028–3034 (2010)
Passner, J. M., Ryoo, H. D., Shen, L., Mann, R. S. & Aggarwal, A. K. Structure of a DNA-bound Ultrabithorax-Extradenticle homeodomain complex. Nature 397, 714–719 (1999)
Vincentelli, R. et al. High-throughput protein expression screening and purification in Escherichia coli. Methods 55, 65–72 (2011)
Keshava Prasad, T. S. et al. Human Protein Reference Database–2009 update. Nucleic Acids Res. 37, D767–D772 (2009)
Ravasi, T. et al. An atlas of combinatorial transcriptional regulation in mouse and man. Cell 140, 744–752 (2010)
Newman, J. R. & Keating, A. E. Comprehensive identification of human bZIP interactions with coiled-coil arrays. Science 300, 2097–2101 (2003)
Klemm, J. D. & Pabo, C. O. Oct-1 POU domain–DNA interactions: cooperative binding of isolated subdomains and effects of covalent linkage. Genes Dev. 10, 27–36 (1996)
Panne, D., Maniatis, T. & Harrison, S. C. Crystal structure of ATF-2/c-Jun and IRF-3 bound to the interferon-β enhancer. EMBO J. 23, 4384–4393 (2004)
Rigaut, G. et al. A generic protein purification method for protein complex characterization and proteome exploration. Nature Biotechnol. 17, 1030–1032 (1999)
Hallikas, O. et al. Genome-wide prediction of mammalian enhancers based on analysis of transcription-factor binding affinity. Cell 124, 47–59 (2006)
Winn, M. D. et al. Overview of the CCP4 suite and current developments. Acta Crystallogr. D 67, 235–242 (2011)
Moorman, C. et al. Hotspots of transcription factor colocalization in the genome of Drosophila melanogaster. Proc. Natl Acad. Sci. USA 103, 12027–12032 (2006)
Yip, K. Y. et al. Classification of human genomic regions based on experimentally determined binding sites of more than 100 transcription-related factors. Genome Biol. 13, R48 (2012)
Meireles-Filho, A. C., Bardet, A. F., Yanez-Cuna, J. O., Stampfel, G. & Stark, A. cis-regulatory requirements for tissue-specific programs of the circadian clock. Curr. Biol. 24, 1–10 (2014)
Löytynoja, A. & Goldman, N. An algorithm for progressive multiple alignment of sequences with insertions. Proc. Natl Acad. Sci. USA 102, 10557–10562 (2005)
Pape, U. J., Rahmann, S. & Vingron, M. Natural similarity measures between position frequency matrices with an application to clustering. Bioinformatics 24, 350–357 (2008)
Odrowaz, Z. & Sharrocks, A. D. The ETS transcription factors ELK1 and GABPA regulate different gene networks to control MCF10A breast epithelial cell migration. PLoS ONE 7, e49892 (2012)
Huang, Q. et al. A prostate cancer susceptibility allele at 6q22 increases RFX6 expression by modulating HOXB13 chromatin binding. Nature Genet. 46, 126–135 (2014)
Huang, Y. et al. Identification and characterization of Hoxa9 binding sites in hematopoietic cells. Blood 119, 388–398 (2012)
Penkov, D. et al. Analysis of the DNA-binding profile and function of TALE homeoproteins reveals their specialization and specific interactions with Hox genes/proteins. Cell Reports 3, 1321–1333 (2013)
Lindblad-Toh, K. et al. A high-resolution map of human evolutionary constraint using 29 mammals. Nature 478, 476–482 (2011)
Korhonen, J., Martinmaki, P., Pizzi, C., Rastas, P. & Ukkonen, E. MOODS: fast search for position weight matrix matches in DNA sequences. Bioinformatics 25, 3181–3182 (2009)
Garber, M. et al. Identifying novel constrained elements by exploiting biased substitution patterns. Bioinformatics 25, i54–i62 (2009)
Blanchette, M. et al. Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res. 14, 708–715 (2004)
Savitsky, P. et al. High-throughput production of human proteins for crystallization: the SGC experience. J. Struct. Biol. 172, 3–13 (2010)
Bourenkov, G. P. & Popov, A. N. A quantitative approach to data-collection strategies. Acta Crystallogr. D 62, 58–64 (2006)
Kabsch, W. XDS. Acta Crystallogr. D 66, 125–132 (2010)
Collaborative Computational Project Number 4. The CCP4 suite: programs for protein crystallography. Acta Crystallogr. D 50, 760–763 (1994)
McCoy, A. J. et al. Phaser crystallographic software. J. Appl. Crystallogr. 40, 658–674 (2007)
Adams, P. D. et al. PHENIX: a comprehensive Python-based system for macromolecular structure solution. Acta Crystallogr. D 66, 213–221 (2010)
Emsley, P., Lohkamp, B., Scott, W. G. & Cowtan, K. Features and development of Coot. Acta Crystallogr. D 66, 486–501 (2010)
Fitzsimmons, D. et al. Pax-5 (BSAP) recruits Ets proto-oncogene family proteins to form functional ternary complexes on a B-cell-specific promoter. Genes Dev. 10, 2198–2211 (1996)
Kim, J. J. et al. Regulation of insulin-like growth factor binding protein-1 promoter activity by FKHR and HOXA10 in primate endometrial cells. Biol. Reprod. 68, 24–30 (2003)
Vinson, C. R., Hai, T. & Boyd, S. M. Dimerization specificity of the leucine zipper-containing bZIP motif on DNA binding: prediction and rational design. Genes Dev. 7, 1047–1058 (1993)
Williams, T. M., Williams, M. E. & Innis, J. W. Range of HOX/TALE superclass associations and protein domain requirements for HOXA13:MEIS interaction. Dev. Biol. 277, 457–471 (2005)
Melnikov, A. et al. Systematic dissection and optimization of inducible enhancers in human cells using a massively parallel reporter assay. Nature Biotechnol. 30, 271–277 (2012)
Raveh-Sadka, T. et al. Manipulating nucleosome disfavoring sequences allows fine-tune regulation of gene expression in yeast. Nature Genet. 44, 743–750 (2012)
Hochschild, A. & Ptashne, M. Cooperative binding of λ repressors to sites separated by integral turns of the DNA helix. Cell 44, 681–687 (1986)
Moretti, R. et al. Targeted chemical wedges reveal the role of allosteric DNA modulation in protein–DNA assembly. ACS Chem. Biol. 3, 220–229 (2008)
Aggarwal, A. K., Rodgers, D. W., Drottar, M., Ptashne, M. & Harrison, S. C. Recognition of a DNA operator by the repressor of phage 434: a view at high resolution. Science 242, 899–907 (1988)
Jordan, S. R. & Pabo, C. O. Structure of the lambda complex at 2.5 Å resolution: details of the repressor–operator interactions. Science 242, 893–899 (1988)
Rohs, R. et al. Origins of specificity in protein–DNA recognition. Annu. Rev. Biochem. 79, 233–269 (2010)
We thank J. Yan, B. Schmierer, E. Kaasinen, C. Daub, E. Haapaniemi and Å. Kolterud for their review of the manuscript, the Karolinska Institutet protein science facility for protein purification, and S. Augsten, L. Hu and A. Zetterlund for technical assistance. This work was supported by Finnish Academy CoE in Cancer Genetics, Center for Innovative Medicine, Knut and Alice Wallenberg and Göran Gustafsson Foundations and Vetenskapsrådet.
The authors declare no competing financial interests.
Extended data figures and tables
a, Flowchart of CAP-SELEX data analysis. Left, a library of selection ligands with random sequences (yellow) is incubated with TFs. After CAP-SELEX, enriched individual TF motifs (1°; arrows) and composite motifs that are not simply combinations of the individual motifs (2°; green dots) are detected from the reads. To detect preferential spacings and orientations of the TF pair (3°), co-occurrences of the indicative 6-mer subsequences (arrowheads) are counted in the reads. The subsequences are then used to generate seeds for the PWM models (right). The heatmap (bottom right; counts divided by the highest observed count) shows the frequency of occurrence of the two 6-mers (CCGGAA, red arrowhead; CATTCC, black arrowhead) at all possible spacings (columns) and orientations (rows). Note that the 6-mer-based approach cannot model the composite site, but it identifies a strong case of cooperativity in which the ERG 6-mer CCGGAA is followed by the TEAD4 6-mer CATTCC with an 8 bp gap. The logo of the PWM for this site is also shown. b, Comparison between CAP-SELEX PWMs and previously characterized specificities for the indicated TF pairs. The methods previously used to characterize these specificities, and the corresponding references, are also indicated. CAP-SELEX models also shown in Fig. 1 are indicated by asterisks. Note that four out of five of the CAP-SELEX models are similar to the previously identified consensus sequences. The exception is the ELK1–PAX5 consensus, which matches poorly both the CAP-SELEX motif and the individual motifs for ELK1 and PAX5 (not shown). c, CAP-SELEX PWMs for TF pairs known to interact at the protein level. The method used to identify each protein–protein interaction and its reference are also shown6,26,27,28,37,38,64,65,66,67.
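The 6-mer counting in panel a can be illustrated with a short Python sketch. It is not the authors' pipeline: it assumes uppercase reads containing only A, C, G and T, ignores overlapping placements (negative gaps), uses a hypothetical `max_gap` limit, and scans both strands of each read while keeping only configurations in which the first 6-mer (in either orientation) lies upstream. For two distinct, non-palindromic 6-mers, this counts each of the four relative orientations of a double-stranded site pair once. All function names are illustrative.

```python
from collections import Counter

# Complement table for uppercase DNA; reads are assumed to contain only A, C, G, T.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def kmer_positions(seq, kmer):
    """Start positions of every occurrence (overlaps allowed) of kmer in seq."""
    k = len(kmer)
    return [i for i in range(len(seq) - k + 1) if seq[i:i + k] == kmer]


def count_pair_configurations(reads, kmer_a, kmer_b, max_gap=20):
    """Count (orientation, gap) configurations of two k-mers across reads.

    Both strands of each read are scanned, and only placements in which
    kmer_a (forward "+" or reverse complement "-") lies upstream of kmer_b
    are kept, so each relative orientation of a double-stranded site pair
    is counted once (assuming the four k-mer strings are all distinct).
    The gap is the number of bases between the two k-mers; overlapping
    placements (negative gaps) are ignored in this sketch.
    """
    orientations = {
        "A+B+": (kmer_a, kmer_b),
        "A+B-": (kmer_a, revcomp(kmer_b)),
        "A-B+": (revcomp(kmer_a), kmer_b),
        "A-B-": (revcomp(kmer_a), revcomp(kmer_b)),
    }
    counts = Counter()
    for read in reads:
        for strand in (read, revcomp(read)):
            for label, (first, second) in orientations.items():
                for i in kmer_positions(strand, first):
                    for j in kmer_positions(strand, second):
                        gap = j - (i + len(first))
                        if 0 <= gap <= max_gap:
                            counts[(label, gap)] += 1
    return counts


def scale_to_max(counts):
    """Divide every count by the highest observed count, as in the heatmap."""
    top = max(counts.values(), default=0)
    return {key: value / top for key, value in counts.items()} if top else {}


# Toy read: ERG 6-mer CCGGAA, an 8-base spacer, then TEAD4 6-mer CATTCC.
read = "AA" + "CCGGAA" + "TTTTTTTT" + "CATTCC" + "AA"
print(count_pair_configurations([read], "CCGGAA", "CATTCC"))
```

On this toy read the sketch reports a single `("A+B+", 8)` configuration, and the reverse complement of the read gives the same result because both strands are scanned. Filling a (orientation × gap) grid from `scale_to_max` would give a heatmap of the kind described above.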
Columns indicate TF1 proteins, and rows TF2 proteins, subjected to the first and second affinity purifications, respectively. Pairs of TFs with a single spacing and orientation preference are indicated in dark green, and pairs with multiple preferred configurations in light green. White boxes indicate pairs that displayed weak or no interaction, and grey boxes cases where robust preference data were not recovered. Previously known interacting TF pairs are indicated by a yellow outline (see Extended Data Fig. 1). Histograms show the number of interactions for each TF. Only TFs for which at least one clear interaction or independent binding was identified are included. The importance of including DNA in the interaction assay is highlighted by the fact that only four and five of the interactions detected are among those observed between 762 human or 877 mouse TF pairs identified using protein–protein interaction assays37, or compiled from literature36, respectively.
a, Replicate analysis of more than two hundred of the generated PWMs. The same seeds that had been used to generate PWMs for the primary experiments were used to seed new PWMs from the replicates. Left, red bars show the percentage of PWM pairs that are similar at the indicated cut-offs (measured as SSTAT covariance8,48). The highest threshold is the same as that used for identifying the dominating set of PWMs. Blue bars indicate the fraction of all replicate PWMs that are similar using the same cut-off. Right, dendrogram and barcode logos of all PWM pairs. The plot in the middle shows the fraction of reads included in the same models in replicates 1 and 2. b, Validation of CAP-SELEX results obtained with shortened TF constructs (DBD+) by HT-SELEX using mixtures of full-length proteins (full length). Note that the same orientation and spacing are preferred in all but one of the cases. In that case (bottom), the full-length proteins show the highest preference for a different spacing than that observed in CAP-SELEX; even so, the second-most preferred spacing is the one identified using CAP-SELEX.
Many experiments in which TFs bound sites that were relatively far apart showed preferential binding to sites separated by approximately nine to ten bases. Heatmaps (maximum count set to 1) show the frequency of occurrence of the representative 6-mers for TF pairs at all possible spacings (columns) and orientations (rows). a, Replicate experiments of the GCM1 (black arrowhead) and MAX (red ball) pair show very similar preferences for cooperatively bound representative 6-mers (see Supplementary Table 1). While one of the orientations shows a preference for a single spacing, the other has two preferentially recognized regions separated by ~9 bp. b, The TEAD4–CEBPB pair shows a similar ~9 bp separation between three regions of preferred spacings (brackets). c, Very deep sequencing of the unselected input ligand does not show the same preference; instead, counts decrease linearly as a function of gap length (owing to the decreasing number of available positions in the 40N random sequence). The mode of cooperativity seen in a and b appears similar to that reported by Kim et al.17. In addition to high-affinity sites, lower-affinity spacings and orientations between TF pairs could be employed in fine-tuned transcriptional responses (see refs 68 and 69).
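The linear decline in panel c follows from counting placements. Two 6-mers separated by a gap of g bases occupy 12 + g positions, so they fit at 40 − (12 + g) + 1 = 29 − g starting positions in a 40-base random region. The Python sketch below turns this into an expected background count under simplifying assumptions that are ours, not the legend's: independent, equiprobable bases, no contribution from the constant flanks of the ligand, and both strands scanned (`strands=2`). The function names are hypothetical.

```python
def placements(random_len, k, gap):
    """Number of start positions at which two k-mers separated by `gap`
    bases fit entirely inside a random region of length random_len."""
    return max(random_len - (2 * k + gap) + 1, 0)


def expected_background(n_reads, gap, random_len=40, k=6, strands=2):
    """Expected count of one specific (orientation, gap) k-mer pair in
    unselected reads, assuming independent, uniformly distributed bases
    and ignoring the constant flanks of the ligand."""
    p_match = 0.25 ** (2 * k)  # chance that all 2k fixed bases match
    return n_reads * strands * placements(random_len, k, gap) * p_match


# The number of placements falls by one for each extra base of gap.
for gap in (0, 5, 10, 15, 20):
    print(gap, placements(40, 6, gap), round(expected_background(10_000_000, gap), 2))
```

Dividing observed counts by such a baseline is one possible way to make the ~9–10 bp periodicity of panels a and b stand out against the linear trend; whether the authors normalized their heatmaps in this way is not stated here.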
a, Pie chart showing the frequency of DNA-mediated, DNA-facilitated and potentially protein–protein-interaction-mediated heterodimers in the CAP-SELEX data set. Cooperativity between TFs can result from direct contacts between the proteins (protein-mediated) or from DNA-facilitated protein contacts (DNA-facilitated), or it can arise indirectly from DNA-mediated interactions17,34,39,40,70,71. The last type of cooperativity is caused by changes in DNA shape induced by DBD binding, and does not involve other domains or direct contact between the proteins17,39,40. The dimers were manually classified into DNA-mediated, DNA-facilitated and potentially protein–protein-interaction-mediated classes based on the structural models shown in Supplementary Data Set 2. b, Conservation of the genomic sites recognized by the CAP-SELEX-identified heterodimeric motifs (left) compared to monomeric and homodimeric sites identified by HT-SELEX (right, motifs from ref. 8). For each motif, the ten thousand highest-affinity non-overlapping sites within human constrained non-coding regions recognized by the motif or by one of its control motifs (see Methods for details) were selected, and their conservation was tested. The fold enrichment (y axis), that is, the fraction of conserved sites among the motif sites divided by the fraction of conserved sites among the control motif sites, is shown as a function of the number of conserved motif sites among the top ten thousand sites (x axis). Motifs that are significantly conserved (multiple-testing-adjusted P < 0.05) are marked green. The five motifs with the lowest P values are also indicated. Note that ~50% of the HT-SELEX and ~25% of the CAP-SELEX motifs are conserved above the significance threshold. c, Inclusion of heterodimeric motifs improves prediction of ChIP-seq peaks.
Left, error rates for predicting ChIP-seq peak positions using either the monomer motifs plus the CAP-SELEX dimer motifs (light grey) or the monomer motifs plus control motifs in which the partner of the indicated TF is reversed but not complemented (dark grey). Note that inclusion of the correct heterodimeric motifs decreases the prediction error rate for HOXB13 and MEIS1. The relatively modest effect is likely due to the fact that only a subset of heterodimers was identified in our study, and that ChIP-seq peak positions are also influenced by other factors, such as nucleosome binding and chromatin structure. Right, number of PWM matches as a function of distance from HOXB13 ChIP-exo peaks. Note that the original heterodimer motifs clearly outperform the control motifs.
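Two calculations in this legend can be sketched in Python. For panel b, the fold enrichment is the ratio of two conserved-site fractions; the legend does not name the multiple-testing correction, so the Benjamini–Hochberg procedure used here is an assumption. For panel c, the control motifs keep the indicated TF's half-site and reverse the partner's columns without complementing them. This preserves the base composition and information content of every column but generally does not reproduce the partner's actual site, whereas a true reverse complement is the same site read from the other strand. The split index between half-sites, the toy one-hot PWM and all counts below are hypothetical.

```python
def fold_enrichment(motif_conserved, motif_total, control_conserved, control_total):
    """Fraction of conserved sites among motif sites divided by the same
    fraction among control-motif sites (panel b)."""
    return (motif_conserved / motif_total) / (control_conserved / control_total)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in input order
    (an assumed correction; the legend only says 'multiple testing adjusted')."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest P value down to the smallest
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def reverse_complement_pwm(columns):
    """Reverse the column order and swap A<->T, C<->G: the same site on the other strand."""
    return [(t, g, c, a) for (a, c, g, t) in reversed(columns)]


def reversed_partner_control(columns, split, keep="left"):
    """Hypothetical control motif for panel c: keep the indicated TF's half
    and reverse, but do NOT complement, the partner's columns.

    columns -- heterodimer PWM as a list of (A, C, G, T) probability columns
    split   -- assumed boundary index between the two half-sites
    keep    -- which half belongs to the indicated TF ("left" or "right")
    """
    left, right = list(columns[:split]), list(columns[split:])
    return left + right[::-1] if keep == "left" else left[::-1] + right


def one_hot(seq):
    """Toy PWM with probability 1 for each base of seq."""
    return [tuple(1.0 if base == b else 0.0 for b in "ACGT") for base in seq]


def consensus(columns):
    """Most probable base in each column."""
    return "".join("ACGT"[max(range(4), key=col.__getitem__)] for col in columns)


toy = one_hot("CCGGAA" + "CATTCC")  # hypothetical toy dimer site
print(consensus(reversed_partner_control(toy, split=6)))
print(consensus(reverse_complement_pwm(toy[6:])))
print(fold_enrichment(3000, 10_000, 2000, 10_000))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

In the toy case the control keeps CCGGAA and turns the partner's CATTCC into CCTTAC, a sequence that differs from both the partner's site and its reverse complement, GGAATG.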
Extended Data Figure 6 Heterodimers where the individual TF core recognition sites appear to overlap.
a, Composite site formation alters specificity at bases flanking the core TF site. TFs often directly read specific ‘core’ sequence motifs via hydrogen bonding to DNA bases. The sequences flanking this core are commonly read indirectly, through contacts to the sugar and phosphate backbone of DNA72,73,74. The backbone contacts specify a preferred DNA conformation, which in turn favours sequences that are optimal for stacking interactions between consecutive base pairs (reviewed in ref. 74). The figure summarizes the base positions whose specificity is affected in all composite sites identified in this study for the indicated TFs. Note that the bases of the core motif, which are recognized by direct hydrogen bonds to the DNA bases, are not commonly affected by heterodimer formation. In contrast, specificity at flanking positions that are recognized by contacts to the sugar or phosphate backbone of DNA is commonly altered by binding of the heterodimer partner. Hydrogen-bond contacts were determined based on the indicated TF structures (refs 29 and 30) or on homologous TF structures (see Supplementary Table 3). b, A base (arrow) can be contacted both from the major groove (black dot; G contacted by GCM1) and from the minor groove (white dot; C contacted by the DRGX homeodomain). c, A TF that can bind to a homodimeric site appears instead to bind as a heterodimer. A composite site is shown in which HOXB2 appears to form a heterodimer with a monomer of RFX5. d, In some cases, the binding positions of the TFs cannot be unambiguously assigned based on the composite recognition sequence. In a, the annotation of hydrogen-bond contacts is as described in main Fig. 2; in b–d, the major-groove contacts of the left and right TFs are indicated by black and red dots, respectively.
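The legend does not state how positions with altered specificity were identified. One simple way to make the comparison concrete, sketched below in Python as an assumption rather than the authors' procedure, is to align a monomer PWM within a composite PWM and flag positions where the preferred base changes or the information content (relative to a uniform background) shifts by more than a threshold. The alignment `offset`, the 0.5-bit `threshold` and the toy matrices are hypothetical; which positions are hydrogen-bonded comes from structures, not from this calculation.

```python
import math


def information_content(column):
    """Information content in bits of an (A, C, G, T) probability column,
    relative to a uniform background (maximum 2 bits)."""
    return 2.0 + sum(p * math.log2(p) for p in column if p > 0)


def preferred_base(column):
    """Most probable base in a column (first one wins on ties)."""
    return "ACGT"[max(range(4), key=column.__getitem__)]


def altered_positions(monomer, composite, offset, threshold=0.5):
    """Indices of monomer-motif positions whose specificity differs in the
    composite site: the preferred base changes, or the information content
    changes by more than `threshold` bits. `offset` (where the monomer sits
    inside the composite PWM) and `threshold` are hypothetical inputs."""
    flagged = []
    for i, mono_col in enumerate(monomer):
        comp_col = composite[offset + i]
        delta = information_content(comp_col) - information_content(mono_col)
        if abs(delta) > threshold or preferred_base(comp_col) != preferred_base(mono_col):
            flagged.append(i)
    return flagged


# Toy monomer: weak flank, two-base core, uninformative flank.
monomer = [(0.4, 0.2, 0.2, 0.2), (1.0, 0.0, 0.0, 0.0),
           (0.0, 0.0, 0.0, 1.0), (0.25, 0.25, 0.25, 0.25)]
# Toy composite site: same core, but both flanks now specify a base.
composite = [(0.25, 0.25, 0.25, 0.25), (0.1, 0.1, 0.1, 0.7), (1.0, 0.0, 0.0, 0.0),
             (0.0, 0.0, 0.0, 1.0), (0.0, 0.0, 1.0, 0.0)]
print(altered_positions(monomer, composite, offset=1))
```

In this toy example the two core positions are unchanged and both flanking positions are flagged, mirroring the pattern described in panel a.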
Dendrogram shows motif similarities between the representative heterodimer and monomer motifs. Heterodimer models are indicated by green bars. Barcode logos for each factor are also shown. Centre of dendrogram shows the colour key for the TF families.
Extended Data Figure 8 Comparison of CAP-SELEX models to models inferred from conserved genomic sequences.
a, Motifs that are very similar to the CAP-SELEX motifs are enriched and conserved. A previous study by Guturu et al.20 made structural models of TF pairs to identify sterically possible configurations and predict sites that could be bound by such complexes. Enrichment of those target sites was then quantified in evolutionarily conserved non-coding regions relative to non-conserved control regions to infer putative target sites for cooperatively binding TFs. The pie chart compares the 100 most significant target sites predicted by Guturu et al.20 with all heterodimeric PWMs generated in this study. Fifteen PWMs showed clear similarity to our heterodimeric PWMs (upper right, dark green slice), 8 were partially similar (green), and a further 5 showed enrichment for the site, but below the threshold used in our study. We did not detect 25 motifs, and for 14 potential pairs we identified a different spacing and orientation. This result is expected, as we did not test all potential TF–TF pairs, and many TFs that bind to similar monomer sites prefer different dimer spacings and orientations. Finally, of the 100 top motifs of Guturu et al.20, 33 were not analysed in our study (14 were homodimeric, and no possible pair was tested for 19; for example, three of the motifs were predicted for pairs including a SMAD TF, and no SMAD TFs were tested in our study). b, Comparison of the 15 matching (top) and 8 partially matching (bottom, boxed) PWMs.
a, Contacts between MEIS1 (cyan) and DNA. b, Contacts between DLX3 (magenta) and DNA. c, Comparison of the DNA structures in MEIS1 homodimer (cyan) and MEIS1–DLX3 heterodimer (magenta). Note that the DNA bound to the heterodimer is more distorted.
This file contains Supplementary Tables 1–5: (1) sequence information for the protein and DNA molecules used in the assay; (2) PWM models; (3) hydrogen-bond-contact-annotated PWMs; (4) numeric data; and (5) X-ray structure statistics. (XLSX 849 kb)
Figures show the generated PWMs organized according to family-wise PWM similarity dendrograms (distance metric is SSTAT covariance). Families with large numbers of PWMs have been split so that individual branches are shown on their own pages; in such cases, the part of the branch shown is indicated by grey shading of the full dendrogram on the left side of each page. Similar motifs (< 30 distance units, dotted vertical line) recognized by pairs of paralogous TFs are indicated by orange colouring of the rightmost branches, and PWMs generated from replicate experiments are indicated by green lines to the right of the TF names. TF pairs displaying latent specificity or assimilation of binding specificities are indicated with blue and black boxes, respectively. Binding specificities of individual TFs were taken from our previous work8 or determined here using the previously described protocol (23 TFs; see Supplementary Table 2). (PDF 13217 kb)
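Highlighting motifs closer than 30 distance units amounts to cutting the dendrogram at that height. The Python sketch below assumes single linkage (the legend does not specify the linkage) and a precomputed, symmetric distance matrix; computing the SSTAT covariance itself is not shown, and the motif names and distances are invented for illustration. Linking motifs through a union–find structure gives the same groups as cutting a single-linkage tree below the cutoff.

```python
def single_linkage_groups(names, dist, cutoff=30.0):
    """Group motifs whose pairwise distance is below `cutoff`, chaining
    through intermediate motifs (equivalent to cutting a single-linkage
    dendrogram at that height). `dist` is a symmetric matrix of precomputed
    motif distances, assumed to come from a separate SSTAT calculation."""
    parent = list(range(len(names)))

    def find(i):
        # Follow parent pointers to the group representative (path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if dist[i][j] < cutoff:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    return sorted(groups.values())


names = ["motif_1", "motif_2", "motif_3", "motif_4"]  # hypothetical motifs
dist = [[0, 10, 40, 90],
        [10, 0, 25, 95],
        [40, 25, 0, 80],
        [90, 95, 80, 0]]
print(single_linkage_groups(names, dist))
```

Here motif_1 and motif_3 are 40 units apart but end up in one group because each is within 30 units of motif_2; checking which grouped motifs come from paralogous TFs would be a separate step.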
Figures show structural models of the detected TF–TF pairs bound to DNA, or X-ray structures of the actual dimeric complex in the few cases where such data were available (indicated with boxes). Structural models are based on DNA sequence alignment followed by superimposition of existing crystal structures of the same or orthologous TFs onto B-DNA models of the CAP-SELEX-determined consensus sequences. PDB entry names of the crystal structures used are indicated below the TF names. (PDF 27745 kb)
About this article
Cite this article
Jolma, A., Yin, Y., Nitta, K. et al. DNA-dependent formation of transcription factor pairs alters their binding specificity. Nature 527, 384–388 (2015). https://doi.org/10.1038/nature15518