Abstract

The prediction of regulatory elements is a problem where computational methods offer great hope. Over the past few years, numerous tools have become available for this task. The purpose of the current assessment is twofold: to provide some guidance to users regarding the accuracy of currently available tools in various settings, and to provide a benchmark of data sets for assessing future tools.

Acknowledgements

We thank Mathieu Blanchette, Ari Frank, Phil Green, Susan Hewitt, S.N. Maheshwari, Larry Ruzzo, Terry Speed, Gary Stormo and the organizers and participants of the 2002 Bellairs Workshop on Computational Biology for their important contributions to this project. Martin Tompa and Nan Li were supported by National Science Foundation (NSF) grant DBI-0218798 and by National Institutes of Health (NIH) grant R01 HG02602. Alexander Favorov, Andrei Mironov and Vsevolod Makeev were supported by Howard Hughes Medical Institute grant 55000309, Ludwig Cancer Research Institute grant CRDF RBO-1268-MO-02, Russian Fund of Basic Research grant 04-07-90270 and support from the Russian Academy of Sciences Presidium Program in Molecular and Cellular Biology, project no. 10. Yutao Fu, Martin C. Frith and Zhiping Weng were supported by NSF grant DBI-0116574 and NIH NHGRI grant 1R01HG03110. Giulio Pavesi and Graziano Pesole were supported by the Italian Ministry of University and Scientific Research's Fondo Italiano per la Ricerca di Base project 'Bioinformatica per la Genomica e la Proteomica' and by Telethon. Nicolas Simonis and Jacques van Helden were supported by European Communities grant QLRI-199-01333, by the Action de Recherches Concertées de la Communauté Française de Belgique and by the Government of the Brussels Region. Saurabh Sinha was supported by a Keck Foundation Fellowship. Gert Thijs and Bart De Moor were supported by Geconcerteerde Onderzoeks-Acties Mefisto-666 and Ambiorics, InterUniversity Attraction Pole V-22, and several funded projects of the Instituut voor de Aanmoediging van Innovatie door Wetenschap en Technologie in Vlaanderen, the Fonds voor Wetenschappelijk Onderzoek and the European Union. Zhou Zhu is a Howard Hughes Medical Institute predoctoral fellow. Zhou Zhu and George Church were supported by the Department of Energy and the Lipper Foundation.

Author information

Affiliations

  1. Department of Computer Science and Engineering, Box 352350, University of Washington, Seattle, Washington 98195-2350, USA.

    • Martin Tompa, Nan Li & William Stafford Noble
  2. Department of Genome Sciences, Box 357730, University of Washington, Seattle, Washington 98195-7730, USA.

    • Martin Tompa & William Stafford Noble
  3. Institute for Molecular Biosciences, University of Queensland, Brisbane, Australia.

    • Timothy L Bailey
  4. Department of Genetics and Lipper Center for Computational Genetics, Harvard Medical School, Boston, Massachusetts 02115, USA.

    • George M Church & Zhou Zhu
  5. ESAT-SCD, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, B-3001 Leuven, Belgium.

    • Bart De Moor & Gert Thijs
  6. Department of Computer Science and Engineering, University of California, San Diego, California 92093, USA.

    • Eleazar Eskin
  7. State Scientific Centre 'GosNIIGenetica,' 1st Dorozhny pr. 1, Moscow, 117545, Russia.

    • Alexander V Favorov, Vsevolod J Makeev & Andrei A Mironov
  8. Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Vavilova 32, Moscow 119991, Russia.

    • Alexander V Favorov & Vsevolod J Makeev
  9. Bioinformatics Program, Boston University, Boston, Massachusetts 02215, USA.

    • Martin C Frith, Yutao Fu & Zhiping Weng
  10. Center for Biomolecular Science and Engineering, University of California, Santa Cruz, California 95064, USA.

    • W James Kent
  11. Department of Bioengineering and Bioinformatics, Moscow State University, Lab. Bldg B, Vorobiovy Gory 1-33, Moscow 119992, Russia.

    • Andrei A Mironov
  12. Department of Computer Science and Communication (D.I.Co), University of Milan, Milan, Italy.

    • Giulio Pavesi
  13. Department of Biomolecular Science and Biotechnology, University of Milan, Milan, Italy.

    • Graziano Pesole
  14. INRIA Rocquencourt, Domaine de Voluceau B.P. 105, 78153 Le Chesnay, France.

    • Mireille Régnier & Mathias Vandenbogaert
  15. SCMB-Université Libre de Bruxelles, Campus Plaine, CP 263, Boulevard du Triomphe, 1050 Bruxelles, Belgium.

    • Nicolas Simonis & Jacques van Helden
  16. Center for Studies in Physics and Biology, The Rockefeller University, New York, New York 10021, USA.

    • Saurabh Sinha
  17. Department of Bioengineering, University of California, San Diego, California 92093, USA.

    • Christopher Workman
  18. Bioinformatics Program, University of California, San Diego, California 92093, USA.

    • Chun Ye

Authors

  1. Martin Tompa
  2. Nan Li
  3. Timothy L Bailey
  4. George M Church
  5. Bart De Moor
  6. Eleazar Eskin
  7. Alexander V Favorov
  8. Martin C Frith
  9. Yutao Fu
  10. W James Kent
  11. Vsevolod J Makeev
  12. Andrei A Mironov
  13. William Stafford Noble
  14. Giulio Pavesi
  15. Graziano Pesole
  16. Mireille Régnier
  17. Nicolas Simonis
  18. Saurabh Sinha
  19. Gert Thijs
  20. Jacques van Helden
  21. Mathias Vandenbogaert
  22. Zhiping Weng
  23. Christopher Workman
  24. Chun Ye
  25. Zhou Zhu

Competing interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to Martin Tompa.

Supplementary information

Zip files

  1. Supplementary Data: Generic
  2. Supplementary Data: MChain
  3. Supplementary Data: realUpstream

About this article

DOI

https://doi.org/10.1038/nbt1053