The prediction of regulatory elements is a problem for which computational methods hold great promise. Over the past few years, numerous tools have become available for this task. The purpose of the current assessment is twofold: to give users some guidance on the accuracy of currently available tools in various settings, and to provide a benchmark of data sets for assessing future tools.
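Accuracy in this setting is usually quantified by comparing predicted binding sites with annotated ones. The sketch below shows one common way to do that at the nucleotide level: count true positives (TP), false positives (FP), false negatives (FN) and true negatives (TN), then derive sensitivity, positive predictive value, a performance coefficient and a correlation coefficient. It is a minimal illustration under stated assumptions, not the assessment's own scoring code: sites are assumed to be half-open, 0-based `(sequence_id, start, end)` intervals, strand is ignored, and the names `site_positions` and `nucleotide_accuracy` are hypothetical.

```python
import math
from typing import Dict, Iterable, Set, Tuple

# Hypothetical representation: a site is (sequence_id, start, end), half-open, 0-based.
Site = Tuple[str, int, int]
Position = Tuple[str, int]


def site_positions(sites: Iterable[Site]) -> Set[Position]:
    """Expand half-open site intervals into the set of nucleotide positions they cover."""
    return {(seq, i) for seq, start, end in sites for i in range(start, end)}


def nucleotide_accuracy(
    predicted: Iterable[Site], known: Iterable[Site], total_length: int
) -> Dict[str, float]:
    """Score predicted sites against known sites nucleotide by nucleotide.

    Assumptions: total_length is the total number of positions in all input
    sequences; strand is ignored; overlapping sites count once per position.
    """
    pred = site_positions(predicted)
    true = site_positions(known)
    tp = len(pred & true)  # covered by both a prediction and a known site
    fp = len(pred - true)  # predicted but not annotated
    fn = len(true - pred)  # annotated but missed
    tn = total_length - tp - fp - fn  # covered by neither
    if tn < 0:
        raise ValueError("total_length is smaller than the number of scored positions")

    def ratio(num: float, den: float) -> float:
        # Return 0.0 instead of dividing by zero when a denominator is empty.
        return num / den if den else 0.0

    cc_den = math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "sensitivity": ratio(tp, tp + fn),
        "ppv": ratio(tp, tp + fp),
        "performance_coefficient": ratio(tp, tp + fn + fp),
        "correlation": ratio(tp * tn - fn * fp, cc_den),
    }


if __name__ == "__main__":
    # One 100-nt sequence: known site covers 10-19, the prediction is shifted to 15-24.
    stats = nucleotide_accuracy([("s1", 15, 25)], [("s1", 10, 20)], total_length=100)
    print(stats["sensitivity"], stats["ppv"])  # 0.5 0.5
    print(round(stats["correlation"], 3))      # 0.444
```

Scoring at the nucleotide level rewards partial overlaps proportionally; a site-level variant would instead count a known site as found when some prediction overlaps it sufficiently, which is a separate design choice not shown here.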
We thank Mathieu Blanchette, Ari Frank, Phil Green, Susan Hewitt, S.N. Maheshwari, Larry Ruzzo, Terry Speed, Gary Stormo and the organizers and participants of the 2002 Bellairs Workshop on Computational Biology for their important contributions to this project. Martin Tompa and Nan Li were supported by National Science Foundation (NSF) grant DBI-0218798 and by National Institutes of Health (NIH) grant R01 HG02602. Alexander Favorov, Andrei Mironov and Vsevolod Makeev were supported by Howard Hughes Medical Institute grant 55000309, Ludwig Cancer Research Institute grant CRDF RBO-1268-MO-02, Russian Fund of Basic Research grant 04-07-90270 and the Russian Academy of Sciences Presidium Program in Molecular and Cellular Biology, project no. 10. Yutao Fu, Martin C. Frith and Zhiping Weng were supported by NSF grant DBI-0116574 and NIH NHGRI grant 1R01HG03110. Giulio Pavesi and Graziano Pesole were supported by the Italian Ministry of University and Scientific Research's Fondo Italiano per la Ricerca di Base project 'Bioinformatica per la Genomica e la Proteomica' and by Telethon. Nicolas Simonis and Jacques van Helden were supported by the European Communities grant QLRI-199-01333, by the Action de Recherches Concertées de la Communauté Française de Belgique and by the Government of the Brussels Region. Saurabh Sinha was supported by a Keck Foundation Fellowship. Gert Thijs and Bart De Moor were supported by Geconcerteerde Onderzoeks-Acties Mefisto-666 and Ambiorics, InterUniversity Attraction Pole V-22, and several funded projects of the Instituut voor de aanmoediging van Innovatie door Wetenschap en Technologie in Vlaanderen, the Fonds voor Wetenschappelijk Onderzoek and the European Union. Zhou Zhu is a Howard Hughes Medical Institute predoctoral fellow. Zhou Zhu and George Church were supported by the Department of Energy and the Lipper Foundation.
The authors declare no competing financial interests.
Tompa, M., Li, N., Bailey, T. et al. Assessing computational tools for the discovery of transcription factor binding sites. Nat Biotechnol 23, 137–144 (2005). https://doi.org/10.1038/nbt1053