Abstract
This protocol shows how to detect putative cis-regulatory elements, and regions enriched in such elements, with the regulatory sequence analysis tools (RSAT) web server (http://rsat.ulb.ac.be/rsat/). The approach applies to known transcription factors whose binding specificity is represented by position-specific scoring matrices, and it relies on the program matrix-scan. Detecting individual binding sites is known to return many false predictions. However, results can be strongly improved by estimating P values and by searching for combinations of sites (homotypic and heterotypic models). We illustrate the detection of sites and enriched regions with a case study: the upstream sequence of the Drosophila melanogaster gene even-skipped. The protocol is also applied to random control sequences to evaluate the reliability of the predictions. Each task requires a few minutes of computation time on the server, and the complete protocol can be executed in about one hour.
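The abstract combines two ideas: scoring every sequence position with a position-specific scoring matrix and attaching a P value to each score, then looking for windows that hold more sites than expected by chance. The following is a minimal Python sketch of both ideas. It is not RSAT's matrix-scan code, and every name in it is hypothetical. It assumes a Bernoulli (position-independent) background, natural-log weights rounded to 0.01 so that the exact null score distribution can be built by dynamic programming, and a simple binomial test for homotypic enrichment in sliding windows. Heterotypic models are not covered. The GATA-like count matrix is an invented toy example, not a real transcription factor matrix.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Hypothetical toy counts for a 4-column "GATA"-like motif (10 sites per column).
TOY_COUNTS = [
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 10, "C": 0, "G": 0, "T": 0},
]
UNIFORM = {b: 0.25 for b in ALPHABET}  # assumed equiprobable background


def log_odds_matrix(counts, background, pseudo=1.0):
    """Turn per-column counts into natural-log weights ln(f(b) / p(b))."""
    matrix = []
    for column in counts:
        total = sum(column[b] for b in ALPHABET) + pseudo
        # The pseudocount is spread over bases in proportion to the background.
        matrix.append({b: math.log((column[b] + pseudo * background[b]) / total / background[b])
                       for b in ALPHABET})
    return matrix


def score_distribution(matrix, background, scale=100):
    """Exact null distribution of scores (weights rounded to 1/scale) for
    sequences drawn from an independent (Bernoulli) background."""
    dist = {0: 1.0}
    for column in matrix:
        new = defaultdict(float)
        for s, p in dist.items():
            for b in ALPHABET:
                new[s + round(column[b] * scale)] += p * background[b]
        dist = dict(new)
    return dist


def p_value(score, dist):
    """P(S >= score) under the background model (score in rounded units)."""
    return sum(p for s, p in dist.items() if s >= score)


def scan(sequence, matrix, dist, scale=100, threshold=1e-3):
    """Score every window on both strands; keep hits with P value <= threshold."""
    width = len(matrix)
    seq = sequence.upper()
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if any(b not in ALPHABET for b in window):
            continue  # skip windows with N or other ambiguous letters
        reverse = window[::-1].translate(COMPLEMENT)
        # "D" = direct strand, "R" = reverse complement of the same window.
        for strand, site in (("D", window), ("R", reverse)):
            s = sum(round(col[b] * scale) for col, b in zip(matrix, site))
            pval = p_value(s, dist)
            if pval <= threshold:
                hits.append((i, strand, site, s / scale, pval))
    return hits


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k, n + 1))


def enriched_windows(hits, seq_len, width, window=200, step=50, threshold=1e-3):
    """Homotypic enrichment: compare the number of hits in each sliding window
    with a binomial expectation, treating positions and strands as independent
    trials with success probability at most `threshold` (an approximation)."""
    results = []
    n = 2 * (window - width + 1)  # candidate site positions on both strands
    for start in range(0, max(seq_len - window, 0) + 1, step):
        k = sum(1 for h in hits if start <= h[0] and h[0] + width <= start + window)
        results.append((start, k, binomial_upper_tail(k, n, threshold)))
    return results


if __name__ == "__main__":
    weights = log_odds_matrix(TOY_COUNTS, UNIFORM)
    dist = score_distribution(weights, UNIFORM)
    hits = scan("CCGATACC", weights, dist, threshold=0.01)
    print(hits)
    print(enriched_windows(hits, 8, len(weights), window=8, step=4, threshold=0.01))
```

On the 8-bp toy sequence `CCGATACC`, this reports a single direct-strand GATA hit at threshold 0.01. A perfect match to this 4-column matrix has P = 1/256 (about 0.0039), so the default threshold of 1e-3 would report nothing. This shows why isolated matches to short motifs carry little weight on their own.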
Acknowledgements
This work was supported by the Fonds pour la Formation à la Recherche dans l'Industrie et dans l'Agriculture, FRIA (J.-V.T. PhD grant), the Vrije Universiteit Brussel (Geconcerteerde Onderzoeksactie 29) (M.T.-C. PhD grant), and by the BioSapiens Network of Excellence funded under the sixth Framework program of the European Communities (LSHG-CT-2003-503265). The postdoctoral grant of M.D. was funded by the Belgian Program on Interuniversity Attraction Poles, initiated by the Belgian Federal Science Policy Office, project P6/25 (BioMaGNet).
About this article
Cite this article
Turatsinze, JV., Thomas-Chollier, M., Defrance, M. et al. Using RSAT to scan genome sequences for transcription factor binding sites and cis-regulatory modules. Nat Protoc 3, 1578–1588 (2008). https://doi.org/10.1038/nprot.2008.97
This article is cited by
- Comparative epigenomics reveals the impact of ruminant-specific regulatory elements on complex traits. BMC Biology (2022)
- Genetic regulation of post-translational modification of two distinct proteins. Nature Communications (2022)
- Dynamic changes in O-GlcNAcylation regulate osteoclast differentiation and bone loss via nucleoporin 153. Bone Research (2022)
- FANCD2 modulates the mitochondrial stress response to prevent common fragile site instability. Communications Biology (2021)
- Identification of a conserved set of cytokinin-responsive genes expressed in the fruits of Prunus persica. Plant Growth Regulation (2020)