High-throughput SELEX–SAGE method for quantitative modeling of transcription-factor binding sites

Roulet, Emmanuelle; Busso, Stéphane; Camargo, Anamaria A.; Simpson, Andrew J.G.; Mermod, Nicolas; Bucher, Philipp

doi:10.1038/nbt718

Technical Report
Published: 08 July 2002


Nature Biotechnology volume 20, pages 831–835 (2002)

Abstract

The ability to determine the location and relative strength of all transcription-factor binding sites in a genome is important both for a comprehensive understanding of gene regulation and for effective promoter engineering in biotechnological applications. Here we present a bioinformatically driven experimental method to accurately define the DNA-binding sequence specificity of transcription factors. A generalized profile¹ was used as a predictive quantitative model for binding sites, and its parameters were estimated from in vitro–selected ligands using standard hidden Markov model training algorithms²,³. Computer simulations showed that several thousand low- to medium-affinity sequences are required to generate a profile of desired accuracy. To produce data on this scale, we applied high-throughput genomics methods to the biochemical problem addressed here. A method combining systematic evolution of ligands by exponential enrichment (SELEX)⁴ and serial analysis of gene expression (SAGE)⁵ protocols was coupled to an automated quality-controlled sequence extraction procedure based on Phred quality scores⁶. This allowed the sequencing of a database of more than 10,000 potential DNA ligands for the CTF/NFI transcription factor. The resulting binding-site model defines the sequence specificity of this protein with a high degree of accuracy not achieved earlier and thereby makes it possible to identify previously unknown regulatory sequences in genomic DNA. A covariance analysis of the selected sites revealed non-independent base preferences at different nucleotide positions, providing insight into the binding mechanism.
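The pipeline outlined in the abstract (quality-filtered sequence extraction, a quantitative binding-site model, and a covariance analysis) can be illustrated with a minimal Python sketch. This is not the authors' procedure: a plain log-odds position weight matrix stands in for the generalized profile and its hidden Markov model training, mutual information is used as one possible measure of non-independence between positions, and the quality cutoff `min_q`, the pseudocount, the uniform background, and all sequences are hypothetical toy values chosen for illustration.

```python
import math
from collections import Counter

BASES = "ACGT"


def phred_error_probability(q):
    """Phred definition: Q = -10 * log10(P_error), so P_error = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def passes_quality(quals, min_q=20):
    """Accept a read only if every base call reaches min_q (hypothetical cutoff)."""
    return all(q >= min_q for q in quals)


def log_odds_matrix(sites, pseudocount=1.0, background=0.25):
    """Per-position log2 odds of each base against a uniform background.

    A simplified stand-in for a trained profile: positions are treated
    as independent and all sites must have the same length.
    """
    total = len(sites) + 4 * pseudocount
    matrix = []
    for pos in range(len(sites[0])):
        counts = Counter(site[pos] for site in sites)
        matrix.append(
            {b: math.log2((counts[b] + pseudocount) / total / background)
             for b in BASES}
        )
    return matrix


def score(matrix, seq):
    """Sum the per-position weights for a candidate site."""
    return sum(column[base] for column, base in zip(matrix, seq))


def mutual_information(sites, i, j):
    """Mutual information (bits) between positions i and j of aligned sites."""
    n = len(sites)
    p_i = Counter(s[i] for s in sites)
    p_j = Counter(s[j] for s in sites)
    p_ij = Counter((s[i], s[j]) for s in sites)
    return sum(
        (c / n) * math.log2((c / n) / ((p_i[a] / n) * (p_j[b] / n)))
        for (a, b), c in p_ij.items()
    )


# Toy reads: (sequence, per-base Phred scores). Not real SELEX data.
reads = [
    ("TTGGC", [30, 32, 31, 30, 29]),
    ("TTGGC", [25, 25, 25, 25, 25]),
    ("TTGGA", [30, 30, 30, 30, 30]),
    ("TAGGC", [30, 8, 30, 30, 30]),   # rejected: one low-quality base call
    ("TTGGC", [40, 40, 40, 40, 40]),
]

sites = [seq for seq, quals in reads if passes_quality(quals)]
pwm = log_odds_matrix(sites)

for candidate in ("TTGGC", "TTGGA", "AAAAA"):
    print(candidate, round(score(pwm, candidate), 2))
```

Because the weight matrix assumes independent positions, a pair of positions with high `mutual_information` marks exactly the kind of dependency that such a model cannot represent, which is one reason a covariance analysis is a useful check on the fitted model.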

**Figure 1: CTF/NFI sequence-specific DNA–protein interaction profiles.**

**Figure 2: Use of a SELEX experiment with a SAGE-inspired multimerization step to construct a new CTF/NFI binding-site model.**

References

1. Bucher, P., Karplus, K., Moeri, N. & Hofmann, K. A flexible motif search technique based on generalized profiles. Comput. Chem. 20, 3–29 (1996).
2. Durbin, R., Eddy, S., Krogh, A. & Mitchison, G. Biological Sequence Analysis. Probabilistic Models of Proteins and Nucleic Acids (Cambridge Univ. Press, Cambridge, United Kingdom, 1998).
3. Ehret, G.B. et al. DNA binding specificity of different STAT proteins. Comparison of in vitro specificity with natural target sites. J. Biol. Chem. 276, 6675–6688 (2001).
4. Klug, S.J. & Famulok, M. All you wanted to know about SELEX. Mol. Biol. Rep. 20, 97–107 (1994).
5. Velculescu, V.E., Zhang, L., Vogelstein, B. & Kinzler, K.W. Serial analysis of gene expression. Science 270, 484–487 (1995).
6. Ewing, B. & Green, P. Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 8, 175–185 (1998).
7. Roulet, E., Fisch, I., Junier, T., Bucher, P. & Mermod, N. Evaluation of computer tools for the prediction of transcription factor binding sites on genomic DNA. In Silico Biol. 1, 21–28 (1998).
8. Roulet, E. et al. Experimental analysis and computer prediction of CTF/NF-I transcription factor DNA binding sites. J. Mol. Biol. 297, 833–848 (2000).
9. Berg, O.G. & von Hippel, P.H. Selection of DNA binding sites by regulatory proteins. Statistical-mechanical theory and application to operators and promoters. J. Mol. Biol. 193, 723–750 (1987).
10. Goodman, S.D., Velten, N.J., Gao, Q., Robinson, S. & Segall, A.M. In vitro selection of integration host factor binding sites. J. Bacteriol. 181, 3246–3255 (1999).
11. Fields, D.S., He, Y.Y., Al-Uzri, A.Y. & Stormo, G.D. Quantitative specificity of the Mnt repressor. J. Mol. Biol. 271, 178–194 (1997).
12. Vant-Hull, B., Payano-Baez, A., Davis, R.H. & Gold, L. The mathematics of SELEX against complex targets. J. Mol. Biol. 278, 579–597 (1998).
13. Meisterernst, M., Gander, I., Rogge, L. & Winnacker, E.L. A quantitative analysis of nuclear factor I/DNA interactions. Nucleic Acids Res. 16, 4419–4435 (1988).
14. Perier, R.C., Praz, V., Junier, T. & Bucher, P. The eukaryotic promoter database EPD. Nucleic Acids Res. 28, 302–303 (2000).
15. Man, T.K. & Stormo, G.D. Non-independence of Mnt repressor-operator interactions determined by a new quantitative multiple fluorescence relative affinity (QuMFRA) assay. Nucleic Acids Res. 29, 2471–2478 (2001).
16. Zhang, M.Q. & Marr, T.G. A weight array method for splicing signal analysis. Comput. Appl. Biosci. 9, 499–509 (1993).
17. Burge, C.B. & Karlin, S. Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268, 78–94 (1997).
18. Hughey, R. & Krogh, A. Hidden Markov models for sequence analysis. Extension and analysis of the basic method. Comput. Appl. Biosci. 12, 95–107 (1996).

Acknowledgements

We thank Victor Jongeneel for support and suggestions, Roman Chrast and Stylianos Antonarakis for help with the SAGE procedure, Khalil Kadaoui for assistance, and Alan McNair for helpful comments on the manuscript. The financial support of the Ludwig Institute for Cancer Research, the Etat de Vaud, and the Swiss National Science Foundation (grants 31-63933.00 and 31-59370.99) is gratefully acknowledged.

Author information

Authors and Affiliations

Laboratory of Molecular Biotechnology, Center for Biotechnology UNIL-EPFL, and Institute of Animal Biology, University of Lausanne, Lausanne, 1015, Switzerland
Emmanuelle Roulet, Stéphane Busso & Nicolas Mermod
Laboratory of Cancer Genetics, Ludwig Institute for Cancer Research, Sao Paulo, 01509-010, Brazil
Anamaria A. Camargo & Andrew J.G. Simpson
Swiss Institute for Experimental Cancer Research, Swiss Institute of Bioinformatics, Epalinges, 1066, Switzerland
Philipp Bucher

Corresponding authors

Correspondence to Nicolas Mermod or Philipp Bucher.

Supplementary information

Supplementary Table 1 (PDF 56 kb)

Supplementary Fig. 1 (GIF 62 kb)

Supplementary Fig. 2 (GIF 12 kb)

Supplementary Fig. 3 (GIF 21 kb)

About this article

Cite this article

Roulet, E., Busso, S., Camargo, A. et al. High-throughput SELEX–SAGE method for quantitative modeling of transcription-factor binding sites. Nat Biotechnol 20, 831–835 (2002). https://doi.org/10.1038/nbt718

Received: 28 September 2001
Accepted: 10 May 2002
Published: 08 July 2002
Issue Date: 01 August 2002
DOI: https://doi.org/10.1038/nbt718

