Skip to main content

Thank you for visiting nature.com. You are using a browser version with limited support for CSS. To obtain the best experience, we recommend you use a more up to date browser (or turn off compatibility mode in Internet Explorer). In the meantime, to ensure continued support, we are displaying the site without styles and JavaScript.

Using RSAT to scan genome sequences for transcription factor binding sites and cis-regulatory modules

Abstract

This protocol shows how to detect putative cis-regulatory elements and regions enriched in such elements with the regulatory sequence analysis tools (RSAT) web server (http://rsat.ulb.ac.be/rsat/). The approach applies to known transcription factors, whose binding specificity is represented by position-specific scoring matrices, using the program matrix-scan. The detection of individual binding sites is known to return many false predictions. However, results can be strongly improved by estimating P value, and by searching for combinations of sites (homotypic and heterotypic models). We illustrate the detection of sites and enriched regions with a study case, the upstream sequence of the Drosophila melanogaster gene even-skipped. This protocol is also tested on random control sequences to evaluate the reliability of the predictions. Each task requires a few minutes of computation time on the server. The complete protocol can be executed in about one hour.

This is a preview of subscription content, access via your institution

Relevant articles

Open Access articles citing this article.

Access options

Buy article

Get time limited or full article access on ReadCube.

$32.00

All prices are NET prices.

Figure 1: Representations of the binding specificity for the Krüppel transcription factor of Drososphila melanogaster.
Figure 2: Graphical flowchart showing the links between regulatory sequence analysis tool (RSAT) programs used in the protocol.
Figure 3
Figure 4
Figure 5
Figure 6
Figure 7: A matrix-scan result for the detection of individual sites.
Figure 8: Feature-maps for the even-skipped example.

References

  1. Wasserman, W.W. & Sandelin, A. Applied bioinformatics for the identification of regulatory elements. Nat. Rev. Genet. 5, 276–287 (2004).

    Article  CAS  Google Scholar 

  2. van Helden, J. Regulatory sequence analysis tools. Nucleic Acids Res. 31, 3593–3596 (2003).

    Article  CAS  Google Scholar 

  3. van Helden, J., André, B. & Collado-Vides, J. A web site for the computational analysis of yeast regulatory sequences. Yeast 16, 177–187 (2000).

    Article  CAS  Google Scholar 

  4. Defrance, M., Janky, R., Sand, O. & van Helden, J. Using RSAT oligo-analysis and dyad-analysis tools to discover regulatory signals in nucleic sequences. Nat. Protoc. doi:10.1038/nprot.2008.98 (2008).

  5. Sand, O., Thomas-Chollier, M., Vervisch, E. & van Helden, J. Analyzing multiple data sets by interconnecting RSAT programs via SOAP Web services—an example with ChIP-chip data. Nat. Protoc. doi:10.1038/nprot.2008.99 (2008).

  6. Brohée, S., Faust, K., Lima-Mendez, G., Vanderstocken, G. & van Helden, J. Network Analysis Tools: from biological networks to clusters and pathways. Nat. Protoc. doi:10.1038/nprot.2008.100 (2008).

  7. Wingender, E. TRANSFAC, TRANSPATH and CYTOMER as starting points for an ontology of regulatory networks. In Silico Biol. 4, 55–61 (2004).

    CAS  PubMed  Google Scholar 

  8. Wingender, E., Dietze, P., Karas, H. & Knüppel, R. TRANSFAC: a database on transcription factors and their DNA binding sites. Nucleic Acids Res. 24, 238–241 (1996).

    Article  CAS  Google Scholar 

  9. Gama-Castro, S. et al. RegulonDB (version 6.0): gene regulation model of Escherichia coli K-12 beyond transcription, active (experimental) annotated promoters and Textpresso navigation. Nucleic Acids Res. 36, D120–D124 (2008).

    Article  CAS  Google Scholar 

  10. Huerta, A.M., Salgado, H., Thieffry, D. & Collado-Vides, J. RegulonDB: a database on transcriptional regulation in Escherichia coli. Nucleic Acids Res. 26, 55–59 (1998).

    Article  CAS  Google Scholar 

  11. Hertz, G.Z. & Hartzell, G.W. 3rd & Stormo, G.D. Identification of consensus patterns in unaligned DNA sequences known to be functionally related. Comput. Appl. Biosci. 6, 81–92 (1990).

    CAS  PubMed  Google Scholar 

  12. Hertz, G.Z. & Stormo, G.D. Identifying DNA and protein patterns with statistically significant alignments of multiple sequences. Bioinformatics 15, 563–577 (1999).

    Article  CAS  Google Scholar 

  13. Coessens, B. et al. INCLUSive: a web portal and service registry for microarray and regulatory sequence analysis. Nucleic Acids Res. 31, 3468–3470 (2003).

    Article  CAS  Google Scholar 

  14. Thijs, G. et al. A higher-order background model improves the detection of promoter regulatory elements by Gibbs sampling. Bioinformatics 17, 1113–1122 (2001).

    Article  CAS  Google Scholar 

  15. Kel, A.E. et al. MATCH: a tool for searching transcription factor binding sites in DNA sequences. Nucleic Acids Res. 31, 3576–3579 (2003).

    Article  CAS  Google Scholar 

  16. Frith, M.C., Li, M.C. & Weng, Z. Cluster-Buster: finding dense clusters of motifs in DNA sequences. Nucleic Acids Res. 31, 3666–3668 (2003).

    Article  CAS  Google Scholar 

  17. Philippakis, A.A., He, F.S. & Bulyk, M.L. Modulefinder: a tool for computational discovery of cis regulatory modules. Pac. Symp. Biocomput. 519–530 (2005).

  18. Sosinsky, A., Bonin, C.P., Mann, R.S. & Honig, B. Target Explorer: an automated tool for the identification of new target genes for a specified set of transcription factors. Nucleic Acids Res. 31, 3589–3592 (2003).

    Article  CAS  Google Scholar 

  19. Donaldson, I.J., Chapman, M. & Göttgens, B. TFBScluster: a resource for the characterization of transcriptional regulatory networks. Bioinformatics 21, 3058–3059 (2005).

    Article  CAS  Google Scholar 

  20. Donaldson, I.J. & Göttgens, B. TFBScluster web server for the identification of mammalian composite regulatory elements. Nucleic Acids Res. 34, W524–W528 (2006).

    Article  CAS  Google Scholar 

  21. Berman, B.P. et al. Computational identification of developmental enhancers: conservation and function of transcription factor binding-site clusters in Drosophila melanogaster and Drosophila pseudoobscura. Genome Biol. 5, R61 (2004).

    Article  Google Scholar 

  22. Berman, B.P. et al. Exploiting transcription factor binding site clustering to identify cis-regulatory modules involved in pattern formation in the Drosophila genome. Proc. Natl. Acad. Sci. USA 99, 757–762 (2002).

    Article  CAS  Google Scholar 

  23. Pierstorff, N., Bergman, C.M. & Wiehe, T. Identifying cis-regulatory modules by combining comparative and compositional analysis of DNA. Bioinformatics 22, 2858–2864 (2006).

    Article  CAS  Google Scholar 

  24. Aerts, S., Van Loo, P., Moreau, Y. & De Moor, B. A genetic algorithm for the detection of new cis-regulatory modules in sets of coregulated genes. Bioinformatics 20, 1974–1976 (2004).

    Article  CAS  Google Scholar 

  25. Loots, G.G. & Ovcharenko, I. rVISTA 2.0: evolutionary analysis of transcription factor binding sites. Nucleic Acids Res. 32, W217–W221 (2004).

    Article  CAS  Google Scholar 

  26. Levine, M. & Tjian, R. Transcription regulation and animal diversity. Nature 424, 147–151 (2003).

    Article  CAS  Google Scholar 

  27. Aerts, S. et al. Toucan: deciphering the cis-regulatory logic of coregulated genes. Nucleic Acids Res. 31, 1753–1764 (2003).

    Article  CAS  Google Scholar 

  28. Stanojevic, D., Small, S. & Levine, M. Regulation of a segmentation stripe by overlapping activators and repressors in the Drosophila embryo. Science 254, 1385–1387 (1991).

    Article  CAS  Google Scholar 

  29. Montgomery, S.B. et al. ORegAnno: an open access database and curation system for literature-derived promoters, transcription factor binding sites and regulatory variation. Bioinformatics 22, 637–640 (2006).

    Article  CAS  Google Scholar 

  30. Griffith, O.L. et al. ORegAnno: an open-access community-driven resource for regulatory annotation. Nucleic Acids Res. 36, D107–D113 (2008).

    Article  CAS  Google Scholar 

  31. Halfon, M.S., Gallo, S.M. & Bergman, C.M. REDfly 2.0: an integrated database of cis-regulatory modules and transcription factor binding sites in Drosophila. Nucleic Acids Res. 36, D594–598 (2008).

    Article  CAS  Google Scholar 

  32. Benson, D.A., Karsch-Mizrachi, I., Lipman, D.J., Ostell, J. & Wheeler, D.L. GenBank. Nucleic Acids Res. 35, D21–D25 (2007).

    Article  CAS  Google Scholar 

  33. Flicek, P. et al. Ensembl 2008. Nucleic Acids Res. 36, D707–D714 (2008).

    Article  CAS  Google Scholar 

  34. Sandelin, A., Alkema, W., Engström, P., Wasserman, W.W. & Lenhard, B. JASPAR: an open-access database for eukaryotic transcription factor binding profiles. Nucleic Acids Res. 32, D91–D94 (2004).

    Article  CAS  Google Scholar 

  35. Vlieghe, D. et al. A new generation of JASPAR, the open-access repository for transcription factor binding site profiles. Nucleic Acids Res. 34, D95–D97 (2006).

    Article  CAS  Google Scholar 

  36. Bailey, T.L. & Elkan, C. Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proc. Int. Conf. Intell. Syst. Mol. Biol. 2, 28–36 (1994).

    CAS  PubMed  Google Scholar 

  37. Gallo, S.M., Li, L., Hu, Z. & Halfon, M.S. REDfly: a regulatory element database for Drosophila. Bioinformatics 22, 381–383 (2006).

    Article  CAS  Google Scholar 

  38. Bina, M. The genome browser at UCSC for locating Genes, and much more! Mol. Biotechnol. 38, 269–275 (2008).

    Article  CAS  Google Scholar 

  39. Staden, R. Methods for calculating the probabilities of finding patterns in sequences. Comput. Appl. Biosci. 5, 89–96 (1989).

    CAS  PubMed  Google Scholar 

  40. Robin, S., Rodolphe, F. & Schbath, S. DNA, Words and Models—Statistics of Exceptional Words (Cambridge University Press, Cambridge, U.K., 2005).

    Google Scholar 

  41. van Helden, J., André, B. & Collado-Vides, J. Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies. J. Mol. Biol. 281, 827–842 (1998).

    Article  CAS  Google Scholar 

Download references

Acknowledgements

This work was supported by the Fonds pour la Formation à la Recherche dans l'Industrie et dans l'Agriculture, FRIA (J.-V.T. PhD grant), the Vrije Universiteit Brussel (Geconcerteerde Onderzoeksactie 29) (M.T.-C. PhD grant), and by the BioSapiens Network of Excellence funded under the sixth Framework program of the European Communities (LSHG-CT-2003-503265). The postdoctoral grant of M.D. was funded by the Belgian Program on Interuniversity Attraction Poles, initiated by the Belgian Federal Science Policy Office, project P6/25 (BioMaGNet).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Jacques van Helden.

Rights and permissions

Reprints and Permissions

About this article

Cite this article

Turatsinze, JV., Thomas-Chollier, M., Defrance, M. et al. Using RSAT to scan genome sequences for transcription factor binding sites and cis-regulatory modules. Nat Protoc 3, 1578–1588 (2008). https://doi.org/10.1038/nprot.2008.97

Download citation

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1038/nprot.2008.97

This article is cited by

Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.

Search

Quick links

Nature Briefing

Sign up for the Nature Briefing newsletter — what matters in science, free to your inbox daily.

Get the most important science stories of the day, free in your inbox. Sign up for Nature Briefing