  • Analysis

Promoter prediction analysis on the whole human genome

Abstract

Promoter prediction programs (PPPs) are important for in silico gene discovery without support from expressed sequence tag (EST)/cDNA/mRNA sequences, in the analysis of gene regulation and in genome annotation. Contrary to previous expectations, a comprehensive analysis of PPPs reveals that no program simultaneously achieves sensitivity and a positive predictive value >65%. PPP performances deduced from a limited number of chromosomes or smaller data sets do not hold when evaluated at the level of the whole genome, with serious inaccuracy of predictions for non-CpG-island-related promoters. Some PPPs even perform worse than, or close to, pure random guessing.
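
The two measures named in the abstract can be illustrated with a short sketch. It uses the standard definitions of sensitivity, TP / (TP + FN), and positive predictive value, TP / (TP + FP). The function names and the counts below are hypothetical and chosen only for illustration; this preview does not state how the article itself decides that a prediction counts as a true positive.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of real promoters that were predicted: TP / (TP + FN)."""
    total = tp + fn
    return tp / total if total else 0.0


def positive_predictive_value(tp: int, fp: int) -> float:
    """Fraction of predictions that hit a real promoter: TP / (TP + FP)."""
    total = tp + fp
    return tp / total if total else 0.0


# Hypothetical counts, not taken from the article.
tp, fp, fn = 700, 500, 300

se = sensitivity(tp, fn)
ppv = positive_predictive_value(tp, fp)

# The abstract's criterion: both measures above 65% at the same time.
meets_both = se > 0.65 and ppv > 0.65
print(f"sensitivity={se:.3f} ppv={ppv:.3f} meets_both={meets_both}")
```

With these made-up counts, sensitivity exceeds 65% while the positive predictive value does not, so the program would fail the joint criterion even though one of its two scores looks acceptable in isolation.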

Acknowledgements

We are grateful to Riu Yamashita and Kenta Nakai for assisting in constructing and maintaining DBTSS.

Author information

Corresponding author

Correspondence to Vladimir B. Bajic.

Ethics declarations

Competing interests

The employer of Vladimir B. Bajic and Sin Lam Tan has licensed Dragon Promoter Finder and Dragon Gene Start Finder to Biobase, Germany. Vladimir B. Bajic receives royalties for these two programs.

Supplementary information

Supplementary Figure 1

Distribution of clustered predictions for seven analyzed PPPs.

Supplementary Table 1

Results of promoter prediction on human chromosomes 21 and 22.

Supplementary Table 2

Results of promoter prediction on human chromosomes 4, 21 and 22.

Supplementary Table 3

Results of promoter prediction on the whole human genome (HG) for different distance criteria.

Supplementary Methods

Supplementary Notes

About this article

Cite this article

Bajic, V., Tan, S., Suzuki, Y. et al. Promoter prediction analysis on the whole human genome. Nat Biotechnol 22, 1467–1473 (2004). https://doi.org/10.1038/nbt1032

  • DOI: https://doi.org/10.1038/nbt1032
