Nature Biotechnology 20, 831-835 (2002)
Published online: 8 July 2002 | doi:10.1038/nbt718
High-throughput SELEX–SAGE method for quantitative modeling of
transcription-factor binding sites
Emmanuelle Roulet1, Stéphane Busso1, Anamaria A. Camargo2, Andrew J.G. Simpson2, Nicolas Mermod1
& Philipp Bucher3
1 Laboratory of Molecular Biotechnology, Center for Biotechnology UNIL-EPFL, and Institute of Animal Biology, University of Lausanne, 1015 Lausanne, Switzerland.
2 Laboratory of Cancer Genetics, Ludwig Institute for Cancer Research, Sao Paulo, 01509-010, Brazil.
3 Swiss Institute for Experimental Cancer Research, Swiss Institute of Bioinformatics, 1066 Epalinges, Switzerland.
The ability to determine the location and relative strength of all
transcription-factor binding sites in a genome is important both for a
comprehensive understanding of gene regulation and for effective promoter
engineering in biotechnological applications. Here we present a
bioinformatically driven experimental method to accurately define the
DNA-binding sequence specificity of transcription factors. A generalized
profile [1] was used as a predictive quantitative model for binding
sites, and its parameters were estimated from in vitro–selected
ligands using standard hidden Markov model training algorithms [2, 3].
Computer simulations showed that several thousand low- to
medium-affinity sequences are required to generate a profile of desired
accuracy. To produce data on this scale, we applied high-throughput genomics
methods to the biochemical problem addressed here. A method combining
systematic evolution of ligands by exponential enrichment (SELEX) [4] and
serial analysis of gene expression (SAGE) [5]
protocols was coupled to an automated quality-controlled sequence extraction
procedure based on Phred quality scores [6]. This allowed the
sequencing of a database of more than 10,000 potential DNA ligands for the
CTF/NFI transcription factor. The resulting binding-site model defines the
sequence specificity of this protein with a degree of accuracy not
achieved before and thereby makes it possible to identify previously unknown
regulatory sequences in genomic DNA. A covariance analysis of the selected
sites revealed non-independent base preferences at different nucleotide
positions, providing insight into the binding mechanism.
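
To make the idea of a quantitative binding-site model concrete, the following is a minimal sketch of a much simpler stand-in: a log-odds position weight matrix estimated from aligned, equal-length selected sequences. It is not the generalized profile or the hidden Markov model training described above, and it assumes independent positions, which the covariance analysis suggests is only an approximation. The function names, the pseudocount, the uniform background, and the toy sequences are all assumptions made for illustration; none come from the paper.

```python
import math
from collections import Counter

BASES = "ACGT"


def build_weight_matrix(sites, pseudocount=1.0, background=None):
    """Estimate a log-odds weight matrix from aligned, equal-length sites.

    Returns one {base: log2(freq / background)} dict per position.
    Assumes every sequence contains only A, C, G and T.
    """
    if background is None:
        # Assumption: uniform base composition in the background.
        background = {b: 0.25 for b in BASES}
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")

    total = len(sites) + pseudocount * len(BASES)
    matrix = []
    for pos in range(length):
        counts = Counter(site[pos] for site in sites)
        column = {}
        for b in BASES:
            # The pseudocount keeps unseen bases from scoring -infinity.
            freq = (counts[b] + pseudocount) / total
            column[b] = math.log2(freq / background[b])
        matrix.append(column)
    return matrix


def score(matrix, seq):
    """Sum the per-position log-odds scores for one candidate site."""
    return sum(column[base] for column, base in zip(matrix, seq))


def best_site(matrix, dna):
    """Slide the matrix along dna; return (best score, start index)."""
    width = len(matrix)
    return max(
        (score(matrix, dna[i:i + width]), i)
        for i in range(len(dna) - width + 1)
    )


# Toy, invented sequences standing in for in vitro-selected ligands.
toy_sites = ["TTGGC", "TTGGC", "TTGGA", "CTGGC"]
toy_matrix = build_weight_matrix(toy_sites)
top_score, top_start = best_site(toy_matrix, "AAAATTGGCAAAA")
```

With only four toy sequences the scores mean little; the abstract's point is that several thousand low- to medium-affinity sequences are needed before the estimated parameters become reliable.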