Determining the specificity of protein–DNA interactions

Stormo, Gary D.; Zhao, Yue

doi:10.1038/nrg2845

Review Article
Published: 28 September 2010


Nature Reviews Genetics volume 11, pages 751–760 (2010)


Key Points

Sequence-specific transcription factors (TFs) control gene expression.
New methods allow for the rapid and accurate determination of TF binding specificity.
Medium-throughput methods using microfluidic devices or surface plasmon resonance (SPR) can determine binding affinities directly.
Microarray methods, such as protein-binding microarray (PBM) and cognate site identifier (CSI), can provide very high-throughput measurements of TF specificity for binding sites up to about ten base pairs long.
High-throughput versions of SELEX, with or without multiple rounds of selection, can provide accurate binding site specificities rapidly.
Bacterial one-hybrid methods can also be made high-throughput and give accurate binding site models.
Computational models can use the high-throughput data to make predictions beyond the data itself, provide insight into the protein–DNA interaction and help in the design of factors with novel specificity.
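The last key point concerns computational models built from high-throughput binding data. One of the simplest such models is the additive position weight matrix (PWM), which gives each base at each position an independent weight and sums the weights, so it can score any sequence, including sequences never observed in the experiment. The Python sketch below is a minimal illustration under stated assumptions: the function names (`build_pwm`, `score`, `best_hit`), the uniform 0.25 background, the pseudocount of 0.5 and the toy sites are all hypothetical choices for this example, not the specific models or data discussed in the Review.

```python
import math

BASES = "ACGT"
# Translation table used to build the reverse complement of a DNA string.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def build_pwm(sites, pseudocount=0.5, background=0.25):
    """Return a log-odds PWM (in bits) from equal-length, aligned binding sites.

    Assumes uppercase A/C/G/T only; the uniform background and the
    pseudocount are illustrative assumptions.
    """
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + 4 * pseudocount
    pwm = []
    for i in range(width):
        column = [s[i] for s in sites]
        # Smoothed base frequency at position i, relative to the background.
        pwm.append({
            b: math.log2((column.count(b) + pseudocount) / total / background)
            for b in BASES
        })
    return pwm


def score(pwm, window):
    """Additive score: sum of per-position weights, assuming independent positions."""
    if len(window) != len(pwm):
        raise ValueError("window length must equal PWM width")
    return sum(col[b] for col, b in zip(pwm, window))


def best_hit(pwm, seq):
    """Scan both strands and return (score, offset, strand) for the best window.

    For the '-' strand the offset is counted along the reverse complement.
    """
    width = len(pwm)
    revcomp = seq.translate(COMPLEMENT)[::-1]
    hits = []
    for strand, s in (("+", seq), ("-", revcomp)):
        for i in range(len(s) - width + 1):
            hits.append((score(pwm, s[i:i + width]), i, strand))
    return max(hits)


if __name__ == "__main__":
    # Hypothetical toy sites, already aligned and all seven bases long.
    toy_sites = ["TGACTCA", "TGACTCA", "TGAGTCA", "TGACTAA"]
    pwm = build_pwm(toy_sites)
    print(best_hit(pwm, "GGGTGACTCAGGG"))
```

With these toy sites, `best_hit` reports the `TGACTCA` window at offset 3 on the forward strand. The additivity assumption is the main design choice: it keeps the model small (four weights per position) but ignores any dependencies between positions.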

Abstract

Proteins that bind to specific DNA sequences, such as many transcription factors, are essential for the proper regulation of gene expression. Identifying the specific sequences that each factor binds can help to elucidate regulatory networks within cells and to explain how genetic variation can disrupt normal gene expression, which is often associated with disease. Traditional methods for determining the specificity of DNA-binding proteins are slow and laborious, but several new high-throughput methods can provide comprehensive binding information much more rapidly. Combined with in vivo determinations of transcription factor binding locations, this information provides more detailed views of the regulatory circuitry of cells and of the effects of variation on gene expression.

**Figure 1: Different methods for studying DNA–protein interactions.**

**Figure 2: Comparison of data and analysis from three methods.**

Acknowledgements

We thank S. Wolfe for providing the plasmids and cells for the B1H experiments and for useful comments on the manuscript. We thank members of the Stormo Laboratory for comments on the manuscript, and especially R. Christensen, D. Granas, Z. Zuo and L. Schriefer for providing the data from the B1H selections with ZIF268.

Author information

Authors and Affiliations

Department of Genetics, Washington University School of Medicine, 4444 Forest Park Blvd, St. Louis, Missouri 63110-8510, USA
Gary D. Stormo & Yue Zhao

Corresponding author

Correspondence to Gary D. Stormo.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Glossary

In vivo: In this context we use the term in vivo to refer to any experiments performed on living cells, whether within or outside a whole organism (sometimes referred to as ex vivo).
Chromatin immunoprecipitation: (ChIP). A technique that is used to identify the location of DNA-binding proteins and epigenetic marks in the genome. Genomic sequences containing the mark of interest are enriched by binding soluble DNA chromatin extracts (complexes of DNA and protein) to an antibody that recognizes the mark. ChIP can be followed by analysis of the precipitated DNA through hybridization to a microarray (ChIP–chip) or by sequencing of the precipitated DNA (ChIP–seq).
Motif: In this Review, 'motif' refers to a representation of the specificity of a transcription factor. For instance, a position weight matrix is a motif for a transcription factor, but the specificity can also be represented in other ways.
Microfluidic device: A device in which fluids are conveyed to samples through channels with diameters on the order of 1 μm; such devices can be used to precisely and dynamically control the microenvironment to which cells are exposed.
Kd: The dissociation constant between two molecules (here, a transcription factor and a DNA sequence). It is the ratio of the off rate (dissolution of the complex) to the on rate (formation of the complex).
Consensus sequence: A single sequence (possibly degenerate) that represents the specificity of a transcription factor. It is usually the highest-affinity sequence and consists of the most frequent base at each position in a collection of binding sites. Degeneracies can also be used, for example 'R = A or G', when two (or more) bases are equivalent.
Gibbs standard free energy: The difference in free energy between the equilibrium state and the standard state (all reactants and products at 1 M concentration).
Gas constant: (Represented by R). The thermodynamic constant relating energy per mole to temperature.
Microarray: A high-density array of DNA molecules, typically with each element of the array containing a different DNA sequence. Single-stranded DNA arrays are used to hybridize to labelled DNA (or RNA) sequences to determine the relative abundances of different sequences. Double-stranded DNA arrays are used in protein-binding microarray and cognate site identifier methods to determine the binding preferences of transcription factors or other DNA-binding molecules.
Gel filtration: A permeable gel, such as polyacrylamide or agarose, is used to separate molecules and molecular complexes based on their size. DNA migrates faster through a gel than the same DNA bound to a protein, so the protein–DNA complex can be separated from the free DNA.
Barcoding: The process of adding the same unique DNA sequence to the ends of DNA molecules from the same experiment, so that the resulting DNA reads can be traced back to the experiment that generated them, allowing several experiments to be sequenced simultaneously (multiplexed).
Multiplexing: The process of mixing several experimental samples together and sequencing them (or some other process) simultaneously. Each sample can be barcoded to recover the information about its experimental origin.
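The Kd, Gibbs standard free energy and gas constant entries above are linked by a standard thermodynamic relation: for a 1:1 protein–DNA complex with Kd expressed in molar (relative to the 1 M standard state), the standard binding free energy is ΔG° = RT ln Kd, and at a free protein concentration [P] the fraction of sites occupied is [P]/([P] + Kd). The Python sketch below is a minimal illustration of these relations, assuming simple 1:1 binding, protein in excess over DNA and a default temperature of 298.15 K; the function names and the example rate constants are hypothetical.

```python
import math

# Gas constant R in J mol^-1 K^-1.
R = 8.314462618


def kd_from_rates(k_off, k_on):
    """Kd as the ratio of off rate (s^-1) to on rate (M^-1 s^-1); result in molar."""
    return k_off / k_on


def delta_g_standard(kd_molar, temp_k=298.15):
    """Standard free energy of binding in kJ/mol: dG = R T ln(Kd / 1 M).

    Tighter binding (smaller Kd) gives a more negative dG.
    """
    return R * temp_k * math.log(kd_molar) / 1000.0


def kd_from_delta_g(dg_kj_per_mol, temp_k=298.15):
    """Inverse of delta_g_standard: Kd = exp(dG / (R T)), in molar."""
    return math.exp(dg_kj_per_mol * 1000.0 / (R * temp_k))


def fraction_bound(free_protein_molar, kd_molar):
    """Fraction of DNA sites occupied for simple 1:1 binding: [P] / ([P] + Kd).

    Assumes free protein is in excess, so free and total concentrations are similar.
    """
    return free_protein_molar / (free_protein_molar + kd_molar)


if __name__ == "__main__":
    kd = kd_from_rates(k_off=1e-3, k_on=1e6)  # hypothetical rates giving ~1 nM
    print(round(delta_g_standard(kd), 2))  # -51.37 kJ/mol at 298.15 K
    print(fraction_bound(1e-9, kd))  # [P] equal to Kd gives half occupancy
```

At 298.15 K, RT ln 10 is about 5.7 kJ/mol, so each ten-fold change in Kd shifts ΔG° by roughly that amount.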

About this article

Cite this article

Stormo, G., Zhao, Y. Determining the specificity of protein–DNA interactions. Nat Rev Genet 11, 751–760 (2010). https://doi.org/10.1038/nrg2845

Issue Date: November 2010