
  • Protocol

Analyzing multiple data sets by interconnecting RSAT programs via SOAP Web services—an example with ChIP-chip data

Abstract

This protocol shows how to access the Regulatory Sequence Analysis Tools (RSAT) via a programmatic interface in order to automate the analysis of multiple data sets. We describe the steps for writing a Perl client that connects to the RSAT Web services and implements a workflow to discover putative cis-acting elements in promoters of gene clusters. In the presented example, we apply this workflow to lists of transcription factor target genes resulting from ChIP-chip experiments. For each factor, the protocol predicts the binding motifs by detecting significantly overrepresented hexanucleotides in the target promoters and generates a feature map that displays the positions of putative binding sites along the promoter sequences. This protocol is addressed to bioinformaticians and biologists with programming skills (notions of Perl). Running time is 6 min on the example data set.
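The motif-discovery step named in the abstract, detecting significantly overrepresented hexanucleotides in a set of promoters, can be illustrated with a small stand-alone sketch. This is not the RSAT implementation and does not call the RSAT Web services. It assumes a uniform background model (each hexanucleotide expected with probability 1/4^6), counts on a single strand, and scores each word with a binomial upper-tail P-value multiplied by the number of possible words to give an E-value. The function names, the toy sequences and the E-value threshold of 1 are illustrative assumptions.

```python
from collections import Counter
from math import exp, lgamma, log


def count_kmers(sequences, k=6):
    """Count overlapping k-mers, skipping any window with a non-ACGT letter."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts


def log_binom_pmf(i, n, p):
    """Log of the binomial probability mass P(X = i) for X ~ Binomial(n, p)."""
    return (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
            + i * log(p) + (n - i) * log(1 - p))


def binomial_upper_tail(successes, trials, p):
    """P(X >= successes) for X ~ Binomial(trials, p), summed in log space."""
    total = sum(exp(log_binom_pmf(i, trials, p))
                for i in range(successes, trials + 1))
    return min(1.0, total)


def overrepresented(sequences, k=6, max_e_value=1.0):
    """Return (kmer, occurrences, e_value) tuples, most significant first."""
    counts = count_kmers(sequences, k)
    n_words = sum(counts.values())
    p_background = 1 / 4 ** k   # assumption: uniform background model
    n_tests = 4 ** k            # assumption: correct for all possible k-mers
    hits = []
    for kmer, occ in counts.items():
        e_value = binomial_upper_tail(occ, n_words, p_background) * n_tests
        if e_value < max_e_value:
            hits.append((kmer, occ, e_value))
    return sorted(hits, key=lambda hit: hit[2])


if __name__ == "__main__":
    # Toy "promoters" with one planted hexanucleotide (hypothetical data).
    promoters = ["AAGTTGACTCTTAC", "CCTGACTCGGA", "GTGACTCAAT"]
    for kmer, occ, e_value in overrepresented(promoters):
        print(kmer, occ, f"{e_value:.2e}")
```

On real promoter sets a uniform background is a poor choice, because word frequencies in non-coding DNA are strongly biased. A background calibrated on the genome's own upstream sequences would be the natural replacement for `p_background`.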


Figure 1
Figure 2
Figure 3
Figure 4
Figure 5: Feature map showing the positions of the significant hexanucleotides found in promoters of genes from the BAS1_YPD cluster.



Acknowledgements

This work was supported by the BioSapiens Network of Excellence funded under the sixth Framework program of the European Communities (LSHG-CT-2003-503265) (O.S. postdoc grant, E.V. research fellowship), the Vrije Universiteit Brussel (Geconcerteerde Onderzoeksactie 29) (M.T.-C. PhD grant) and by the Belgian Program on Interuniversity Attraction Poles, initiated by the Belgian Federal Science Policy Office, project P6/25 (BioMaGNet).

Author information

Corresponding author

Correspondence to Jacques van Helden.

Supplementary information

Supplementary Figure 1

Full version of the feature map showing the positions of the significant hexanucleotides found in promoters of genes from the BAS1_YPD cluster. (PDF 82 kb)


About this article

Cite this article

Sand, O., Thomas-Chollier, M., Vervisch, E. et al. Analyzing multiple data sets by interconnecting RSAT programs via SOAP Web services—an example with ChIP-chip data. Nat Protoc 3, 1604–1615 (2008). https://doi.org/10.1038/nprot.2008.99


  • DOI: https://doi.org/10.1038/nprot.2008.99
