Key Points
- Sequence-specific transcription factors (TFs) control gene expression.
- New methods allow for the rapid and accurate determination of TF binding specificity.
- Medium-throughput methods using microfluidic devices or surface plasmon resonance (SPR) can determine binding affinities directly.
- Microarray methods, such as protein-binding microarray (PBM) and cognate site identifier (CSI), can lead to very high-throughput measurements of TF specificity with binding sites of up to about ten base pairs.
- High-throughput versions of SELEX, with or without multiple rounds of selection, can provide accurate binding site specificities rapidly.
- Bacterial one-hybrid methods can also be made high-throughput and give accurate binding site models.
- Computational models can utilize the high-throughput data to make predictions beyond the data itself, provide information about the interaction and help in the design of factors with novel specificity.
Abstract
Proteins, such as many transcription factors, that bind to specific DNA sequences are essential for the proper regulation of gene expression. Identifying the specific sequences that each factor binds can help to elucidate regulatory networks within cells and how genetic variation can cause disruption of normal gene expression, which is often associated with disease. Traditional methods for determining the specificity of DNA-binding proteins are slow and laborious, but several new high-throughput methods can provide comprehensive binding information much more rapidly. Combined with in vivo determinations of transcription factor binding locations, this information provides more detailed views of the regulatory circuitry of cells and the effects of variation on gene expression.
Acknowledgements
We thank S. Wolfe for providing the plasmids and cells for the B1H experiments and useful comments on the manuscript. We thank members of the Stormo Laboratory for comments on the manuscript and especially R. Christensen, D. Granas, Z. Zuo and L. Schriefer for providing the data from the B1H selections with ZIF268.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Glossary
- In vivo
-
In this context we use the term in vivo to refer to any experiments performed on living cells, whether within or outside a whole organism (sometimes referred to as ex vivo).
- Chromatin immunoprecipitation
-
(ChIP). A technique that is used to identify the location of DNA-binding proteins and epigenetic marks in the genome. Genomic sequences containing the mark of interest are enriched by binding soluble DNA chromatin extracts (complexes of DNA and protein) to an antibody that recognizes the mark. ChIP can be followed by analysis of the precipitated DNA through hybridization to a microarray (ChIP–chip) or by sequencing of the precipitated DNA (ChIP–seq).
- Motif
-
In this Review motif refers to a representation of the specificity of a transcription factor. For instance, a position weight matrix is a motif for a transcription factor, but there could be other ways to represent the specificity.
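As an illustration of one such representation, the following is a minimal sketch of building a log-odds position weight matrix from aligned binding sites and using it to score a candidate sequence. The function names, the pseudocount of 1, the uniform background of 0.25 and the example sites are all assumptions made for this illustration; they are not taken from the Review.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds position weight matrix (one dict per position).

    Assumes equal-length, pre-aligned sites and a uniform background;
    both are simplifications chosen for this sketch.
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    pwm = []
    for pos in range(length):
        # Start every base at the pseudocount so unseen bases get a finite weight.
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[pos].upper()] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2((counts[b] / total) / background) for b in BASES})
    return pwm

def score(pwm, seq):
    """Sum the per-position weights for a sequence of the same length as the PWM."""
    if len(seq) != len(pwm):
        raise ValueError("sequence length must match the PWM length")
    return sum(column[base] for column, base in zip(pwm, seq.upper()))

# Hypothetical aligned sites, invented for illustration only.
example_sites = ["GCGTGGGCG", "GCGTGGGCG", "GCGGGGGCG", "GCGTGGGAG"]
pwm = build_pwm(example_sites)
```

Summing independent per-position weights is the simplest choice; it treats each position as contributing additively, which is an approximation rather than a property of every transcription factor.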
- Microfluidic device
-
A device in which fluids are conveyed to samples in channels with diameters on the order of 1 μm; these devices can be used to precisely and dynamically control the microenvironment to which cells are exposed.
- Kd
-
The dissociation constant between two molecules (here, for a transcription factor and a DNA sequence). It is the ratio of the off rate (dissolution of the complex) to the on rate (formation of the complex).
- Consensus sequence
-
A single sequence (possibly degenerate) that represents the specificity of a transcription factor. It is usually the highest affinity sequence and would be the most frequent base at each position in a collection of binding sites. It is also possible to use degeneracies, for example, 'R = A or G', if two (or more) bases are equivalent.
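A minimal sketch of deriving a consensus from a collection of aligned binding sites follows. It takes the most frequent base at each position and emits an IUPAC degeneracy code when two or more bases are tied. Treating "equivalent" as an exact tie in counts is an assumption of this sketch, as are the function name and the example sites.

```python
from collections import Counter

# Standard IUPAC nucleotide codes, keyed by the set of bases they stand for.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}

def consensus(sites):
    """Return the (possibly degenerate) consensus of equal-length aligned sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    letters = []
    for pos in range(length):
        counts = Counter(site[pos].upper() for site in sites)
        if not set(counts) <= set("ACGT"):
            raise ValueError(f"unexpected character at position {pos}")
        top = max(counts.values())
        # Bases sharing the top count are treated as equivalent (sketch assumption).
        tied = frozenset(base for base, n in counts.items() if n == top)
        letters.append(IUPAC[tied])
    return "".join(letters)

# Hypothetical sites, invented for illustration only.
print(consensus(["AAT", "GAT"]))
```

A real analysis would more likely call two bases equivalent when their frequencies or affinities fall within some tolerance, not only when the counts match exactly.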
- Gibbs standard free energy
-
The difference in free energy between the equilibrium state and the standard state (all reactants and products at 1 M concentration).
- Gas constant
-
(Represented by R). The thermodynamic constant relating energy per mole to temperature.
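The three preceding terms are connected by the standard thermodynamic relation between the free energy of binding and the dissociation constant, ΔG° = RT ln(Kd), with Kd expressed relative to the 1 M standard state. The sketch below works through that arithmetic; the function names, the choice of kcal/mol units, the default temperature of 298.15 K and the example rate constants are assumptions made for illustration.

```python
import math

R_KCAL_PER_MOL_K = 1.987e-3  # gas constant in kcal/(mol*K)

def kd_from_rates(k_off, k_on):
    """Kd (M) as the off rate (1/s) divided by the on rate (1/(M*s))."""
    return k_off / k_on

def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard free energy of binding in kcal/mol: RT * ln(Kd / 1 M)."""
    return R_KCAL_PER_MOL_K * temperature_k * math.log(kd_molar)

def relative_free_energy(kd_variant, kd_reference, temperature_k=298.15):
    """Free energy difference between two sites: RT * ln(Kd_variant / Kd_reference)."""
    return R_KCAL_PER_MOL_K * temperature_k * math.log(kd_variant / kd_reference)

# Hypothetical rate constants, invented for illustration only.
kd = kd_from_rates(k_off=1e-3, k_on=1e6)      # about 1 nM
dg = binding_free_energy(kd)                   # about -12.3 kcal/mol at 298.15 K
ddg = relative_free_energy(10 * kd, kd)        # about +1.36 kcal/mol for a tenfold weaker site
print(f"Kd = {kd:.1e} M, dG = {dg:.2f} kcal/mol, ddG = {ddg:.2f} kcal/mol")
```

Because the relation is logarithmic, each tenfold change in Kd corresponds to the same increment of roughly 1.4 kcal/mol at room temperature, which is why specificity is often reported as energy differences relative to the best site.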
- Microarray
-
A high-density array of DNA molecules, typically with each element of the array containing a different DNA sequence. Single-stranded DNA arrays are used to hybridize to labelled DNA (or RNA) sequences to determine the relative abundances of different sequences. Double-stranded DNA arrays are used in protein-binding microarray and cognate site identifier methods to determine the binding preferences of transcription factors or other DNA-binding molecules.
- Gel filtration
-
A permeable gel, such as polyacrylamide or agarose, is used to separate molecules and molecular complexes based on their size. DNA will migrate faster through a gel than the same DNA that is bound to a protein, so the protein–DNA complex can be separated from the free DNA.
- Barcoding
-
The process of adding the same unique DNA sequence to the ends of DNA molecules from the same experiment, so that the resulting DNA reads can be traced back to the experiment that generated them, allowing several experiments to be sequenced simultaneously (multiplexed).
- Multiplexing
-
The process of mixing several experimental samples together and sequencing them (or some other process) simultaneously. Each sample can be barcoded to recover the information about its experimental origin.
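Barcoding and multiplexing, as defined in the two entries above, can be sketched in a few lines: reads from a pooled sequencing run are assigned back to their experiments by their barcode. This sketch assumes that the barcode sits at the start of each read, that all barcodes have the same length and that only exact matches are accepted. The function name, barcodes and sample labels are hypothetical.

```python
from collections import defaultdict

def demultiplex(reads, barcode_to_sample):
    """Group reads by sample using an exact-match barcode at the start of each read.

    Returns a dict mapping sample name to a list of reads with the barcode
    removed; reads with no matching barcode are kept intact under 'unassigned'.
    """
    lengths = {len(barcode) for barcode in barcode_to_sample}
    if len(lengths) != 1:
        raise ValueError("this sketch assumes all barcodes have the same length")
    barcode_len = lengths.pop()

    by_sample = defaultdict(list)
    for read in reads:
        tag, insert = read[:barcode_len], read[barcode_len:]
        sample = barcode_to_sample.get(tag)
        if sample is None:
            by_sample["unassigned"].append(read)
        else:
            by_sample[sample].append(insert)
    return dict(by_sample)

# Hypothetical barcodes and reads, invented for illustration only.
barcodes = {"ACGT": "experiment_1", "TGCA": "experiment_2"}
pooled_reads = ["ACGTGCGTGGGCG", "TGCATGACTCA", "ACGTGCGGGGGCG", "GGGGAAAAAAA"]
print(demultiplex(pooled_reads, barcodes))
```

Exact matching is the simplest policy; sequencing errors within the barcode would send a read to the unassigned group, so practical pipelines often tolerate a small number of mismatches.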
About this article
Cite this article
Stormo, G., Zhao, Y. Determining the specificity of protein–DNA interactions. Nat Rev Genet 11, 751–760 (2010). https://doi.org/10.1038/nrg2845
DOI: https://doi.org/10.1038/nrg2845