  • Protocol
  • Published:

Using RSAT oligo-analysis and dyad-analysis tools to discover regulatory signals in nucleic sequences

Abstract

This protocol explains how to discover functional signals in genomic sequences by detecting over- or under-represented oligonucleotides (words) or spaced pairs thereof (dyads) with the Regulatory Sequence Analysis Tools (RSAT; http://rsat.ulb.ac.be/rsat/). Two typical applications are presented: (i) predicting transcription factor-binding motifs in promoters of coregulated genes and (ii) discovering phylogenetic footprints in promoters of orthologous genes. The steps of this protocol include purging genomic sequences to discard redundant fragments, discovering over-represented patterns and assembling them to obtain degenerate motifs, scanning sequences and drawing feature maps. The main strength of the method is its statistical grounding: the binomial significance provides efficient control over the rate of false positives. In contrast with optimization-based pattern discovery algorithms, the method supports the detection of under- as well as over-represented motifs. Computation times vary from seconds (gene clusters) to minutes (whole genomes). The execution of the whole protocol should take 1 h.
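
To make the idea of binomial over-representation concrete, the following is a minimal Python sketch and not the RSAT implementation. It counts overlapping words of length k on a single strand, derives each word's expected probability from independent nucleotide frequencies (an assumed Bernoulli background), and computes an upper-tail binomial P-value. The E-value (P-value multiplied by the number of possible words) and the score sig = -log10(E-value) are assumptions about how a significance score can be defined; the function names `binomial_upper_tail` and `discover_words` are hypothetical.

```python
from collections import Counter
from math import exp, lgamma, log, log1p, log10


def binomial_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p), summed in log space."""
    if k <= 0:
        return 1.0
    total = 0.0
    for i in range(k, n + 1):
        log_term = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                    + i * log(p) + (n - i) * log1p(-p))
        total += exp(log_term)
    return min(total, 1.0)


def discover_words(sequences, k, nucleotide_freqs=None):
    """Score every observed k-mer for over-representation.

    Assumptions of this sketch: single strand, overlapping occurrences,
    Bernoulli background (equiprobable nucleotides unless frequencies
    are supplied).
    """
    freqs = nucleotide_freqs or {base: 0.25 for base in "ACGT"}
    counts = Counter()
    positions = 0  # number of valid word positions (binomial trials)
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            if set(word) <= set("ACGT"):  # skip words with N or other symbols
                counts[word] += 1
                positions += 1

    n_tests = 4 ** k  # number of possible words, used as multi-testing factor
    results = []
    for word, occ in counts.items():
        prior = 1.0
        for base in word:
            prior *= freqs[base]
        p_value = binomial_upper_tail(occ, positions, prior)
        e_value = p_value * n_tests
        sig = -log10(e_value) if e_value > 0 else float("inf")
        results.append((word, occ, prior * positions, p_value, e_value, sig))

    # Most significant words first
    results.sort(key=lambda r: r[5], reverse=True)
    return results


if __name__ == "__main__":
    toy = ["TTACGTGATT", "GGACGTGACC", "CAACGTGAAG", "ATACGTGATA"]
    for word, occ, exp_occ, p_val, e_val, sig in discover_words(toy, 6)[:3]:
        print(word, occ, f"{exp_occ:.4f}", f"{p_val:.2e}", f"{e_val:.2e}", f"{sig:.2f}")
```

Under these assumptions, a word is reported as over-represented when its sig is positive, that is, when fewer than one such word would be expected by chance among all tested words. Detecting under-representation would use the lower tail P(X <= k) in the same way.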

Figure 1
Figure 2: The Web interface of oligo-analysis is divided in five sections separated by dashed lines.
Figure 3: Example of pattern discovery in promoters bound by the transcription factor Spo0A from Bacillus subtilis.
Figure 4: Comparison between oligo-analysis and dyad-analysis for the FNR regulon.
Figure 5: Example of pattern discovery in promoters of orthologous genes.
Figure 6: Detection of restriction sites in whole genomes.

Acknowledgements

This work was supported by the Belgian Program on Interuniversity Attraction Poles, initiated by the Belgian Federal Science Policy Office, project P6/25 (BioMaGNet), by the BioSapiens Network of Excellence funded under the sixth Framework program of the European Communities (LSHG-CT-2003-503265) and by the Fonds pour la Formation à la Recherche dans l'Industrie et dans l'Agriculture, FRIA (R.J. PhD grant). We acknowledge the students of the Licenciatura en Ciencias Genomicas (CCG-UNAM, Mexico) and the Instituto de Biotecnologia (IBT-UNAM, Mexico) for having tested the protocol and provided useful feedback.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Jacques van Helden.

Supplementary information

Supplementary Fig. 1

Feature-map of the significant oligonucleotides. Note that the precise values of the oligo-analysis results can slightly vary with successive versions of the genome stored at NCBI. (PDF 116 kb)

Supplementary Fig. 2

Feature-maps of the significant dyads detected in purE promoters of Enterobacteriales (PDF 58 kb)

About this article

Cite this article

Defrance, M., Janky, R., Sand, O. et al. Using RSAT oligo-analysis and dyad-analysis tools to discover regulatory signals in nucleic sequences. Nat Protoc 3, 1589–1603 (2008). https://doi.org/10.1038/nprot.2008.98

  • DOI: https://doi.org/10.1038/nprot.2008.98
