  • Review Article

Determining the specificity of protein–DNA interactions

Key Points

  • Sequence-specific transcription factors (TFs) control gene expression.

  • New methods allow for the rapid and accurate determination of TF binding specificity.

  • Medium-throughput methods using microfluidic devices or surface plasmon resonance (SPR) can determine binding affinities directly.

  • Microarray methods, such as protein-binding microarray (PBM) and cognate site identifier (CSI), can lead to very high-throughput measurements of TF specificity with binding sites of up to about ten base pairs.

  • High-throughput versions of SELEX, with or without multiple rounds of selection, can provide accurate binding site specificities rapidly.

  • Bacterial one-hybrid methods can also be made high-throughput and give accurate binding site models.

  • Computational models can utilize the high-throughput data to make predictions beyond the data itself, provide information about the interaction and help in the design of factors with novel specificity.

Abstract

Proteins, such as many transcription factors, that bind to specific DNA sequences are essential for the proper regulation of gene expression. Identifying the specific sequences that each factor binds can help to elucidate regulatory networks within cells and how genetic variation can cause disruption of normal gene expression, which is often associated with disease. Traditional methods for determining the specificity of DNA-binding proteins are slow and laborious, but several new high-throughput methods can provide comprehensive binding information much more rapidly. Combined with in vivo determinations of transcription factor binding locations, this information provides more detailed views of the regulatory circuitry of cells and the effects of variation on gene expression.

Figure 1: Different methods for studying DNA–protein interactions.
Figure 2: Comparison of data and analysis from three methods.

Acknowledgements

We thank S. Wolfe for providing the plasmids and cells for the B1H experiments and useful comments on the manuscript. We thank members of the Stormo Laboratory for comments on the manuscript and especially R. Christensen, D. Granas, Z. Zuo and L. Schriefer for providing the data from the B1H selections with ZIF268.

Author information

Corresponding author

Correspondence to Gary D. Stormo.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Related links

FURTHER INFORMATION

Gary D. Stormo's homepage

Berkeley database of Drosophila TF specificities

Database of Drosophila TF DNA-binding specificities

JASPAR database of TF binding specificities

PAZAR database

SELEX-SAGE database

UniProbe database

Glossary

In vivo

In this context we use the term in vivo to refer to any experiments performed on living cells, whether within or outside a whole organism (sometimes referred to as ex vivo).

Chromatin immunoprecipitation

(ChIP). A technique that is used to identify the location of DNA-binding proteins and epigenetic marks in the genome. Genomic sequences containing the mark of interest are enriched by binding soluble DNA chromatin extracts (complexes of DNA and protein) to an antibody that recognizes the mark. ChIP can be followed by analysis of the precipitated DNA through hybridization to a microarray (ChIP–chip) or by sequencing of the precipitated DNA (ChIP–seq).

Motif

In this Review motif refers to a representation of the specificity of a transcription factor. For instance, a position weight matrix is a motif for a transcription factor, but there could be other ways to represent the specificity.

Microfluidic device

A device in which fluids are conveyed to samples in channels with diameters in the order of 1 μm; these chambers can be used to precisely and dynamically control the microenvironment to which cells are exposed.

Kd

The dissociation constant between two molecules (here, a transcription factor and a DNA sequence). It is the ratio of the off rate to the on rate, that is, the rate at which the complex dissolves divided by the rate at which it forms.

Consensus sequence

A single sequence (possibly degenerate) that represents the specificity of a transcription factor. It is usually the highest-affinity sequence and consists of the most frequent base at each position in a collection of binding sites. Degenerate symbols can also be used, for example 'R = A or G', if two (or more) bases are equivalent at a position.
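
As a rough illustration of how a consensus sequence and a position weight matrix (one kind of motif) relate to a collection of binding sites, the sketch below counts bases in a set of aligned sites. The example sites, the pseudocount, and the uniform background of 0.25 are assumptions made for illustration only; they are not data or parameters from this Review.

```python
import math
from collections import Counter

BASES = "ACGT"

def count_matrix(sites):
    """Count each base at each position of equal-length, aligned sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    return [Counter(s[i] for s in sites) for i in range(length)]

def consensus(sites):
    """Most frequent base per position (ties go to the earlier base in BASES)."""
    return "".join(max(BASES, key=lambda b: col[b]) for col in count_matrix(sites))

def position_weight_matrix(sites, pseudocount=1.0, background=0.25):
    """Log2-odds weights; pseudocount and background are assumed values."""
    total = len(sites) + 4 * pseudocount
    return [
        {b: math.log2(((col[b] + pseudocount) / total) / background) for b in BASES}
        for col in count_matrix(sites)
    ]

def score(pwm, seq):
    """Additive score: sum of per-position weights for the given sequence."""
    return sum(col[b] for col, b in zip(pwm, seq))

# Made-up aligned sites, for illustration only.
sites = ["GCGTGGGCG", "GCGTAGGCG", "GCGGGGGCG", "GCGTGGGCT"]
pwm = position_weight_matrix(sites)
best = consensus(sites)
```

With an additive matrix like this one, the consensus is by construction the top-scoring sequence; whether it is also the highest-affinity site depends on how well additivity holds for the factor in question.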

Gibbs standard free energy

The difference in free energy between the equilibrium state and the standard state (all reactants and products at 1 M concentration).

Gas constant

(Represented by R). The thermodynamic constant relating energy per mole to temperature.
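
The three preceding entries (Kd, Gibbs standard free energy and the gas constant) are linked by a standard thermodynamic relation, sketched below under stated assumptions. The sketch takes the binding free energy as ΔG° = RT ln(Kd / 1 M), so tighter binding gives a more negative value; the function names, the choice of kcal/mol units, and the default temperature of 298.15 K are illustrative choices, not taken from this Review.

```python
import math

# Gas constant in kcal/(mol*K); the unit choice is an assumption of this sketch.
R_KCAL = 1.987204e-3

def kd_from_rates(k_off, k_on):
    """Kd as the ratio of the off rate (1/s) to the on rate (1/(M*s))."""
    return k_off / k_on

def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard free energy of binding, RT ln(Kd / 1 M), in kcal/mol."""
    return R_KCAL * temperature_k * math.log(kd_molar)

def relative_affinity(kd_reference, kd_variant):
    """How much more weakly a variant site binds than the reference site."""
    return kd_variant / kd_reference

kd = kd_from_rates(k_off=1e-3, k_on=1e6)          # hypothetical rates
dg = binding_free_energy(kd)                       # about -12.3 kcal/mol
ddg = binding_free_energy(10 * kd) - dg            # tenfold weaker site
```

A tenfold increase in Kd corresponds to RT ln 10, roughly 1.4 kcal/mol at room temperature, which is a convenient rule of thumb when comparing binding sites.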

Microarray

A high-density array of DNA molecules, typically with each element of the array containing a different DNA sequence. Single-stranded DNA arrays are used to hybridize to labelled DNA (or RNA) sequences to determine the relative abundances of different sequences. Double-stranded DNA arrays are used in protein-binding microarray and cognate site identifier methods to determine the binding preferences of transcription factors or other DNA-binding molecules.

Gel filtration

A technique in which a permeable gel, such as polyacrylamide or agarose, is used to separate molecules and molecular complexes based on their size. Free DNA will migrate faster through a gel than the same DNA bound to a protein, so the protein–DNA complex can be separated from the free DNA.

Barcoding

The process of adding the same unique DNA sequence to the ends of DNA molecules from the same experiment, so that the resulting DNA reads can be traced back to the experiment that generated them, allowing several experiments to be sequenced simultaneously (multiplexed).

Multiplexing

The process of mixing several experimental samples together and sequencing them (or some other process) simultaneously. Each sample can be barcoded to recover the information about its experimental origin.
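
Barcoding and multiplexing can be pictured with a small demultiplexing sketch: reads from a pooled run are assigned back to their experiment by the barcode they carry. The sketch assumes the barcode sits at the start of each read, requires an exact match, and assumes no barcode is a prefix of another; the barcode sequences and sample names are hypothetical.

```python
from collections import defaultdict

def demultiplex(reads, barcodes):
    """Group reads by sample using an exact barcode match at the read start.

    barcodes maps barcode sequence -> sample name. The barcode is trimmed
    from each assigned read; unmatched reads are collected under 'unassigned'.
    """
    by_sample = defaultdict(list)
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                by_sample[sample].append(read[len(barcode):])
                break
        else:
            # No barcode matched this read.
            by_sample["unassigned"].append(read)
    return dict(by_sample)

# Hypothetical barcodes and reads, for illustration only.
barcodes = {"ACGT": "experiment_1", "TGCA": "experiment_2"}
reads = ["ACGTGCGTGGGCG", "TGCAGCGTAGGCG", "ACGTGCGGGGGCG", "NNNNGCGTGGGCG"]
groups = demultiplex(reads, barcodes)
```

Real pipelines usually tolerate sequencing errors in the barcode; exact matching is used here only to keep the idea visible.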

About this article

Cite this article

Stormo, G., Zhao, Y. Determining the specificity of protein–DNA interactions. Nat Rev Genet 11, 751–760 (2010). https://doi.org/10.1038/nrg2845

  • DOI: https://doi.org/10.1038/nrg2845
