Analysis

Promoter prediction analysis on the whole human genome

Nature Biotechnology volume 22, pages 1467–1473 (2004)

Abstract

Promoter prediction programs (PPPs) are important for in silico gene discovery when no supporting expressed sequence tag (EST), cDNA or mRNA evidence is available, for the analysis of gene regulation and for genome annotation. Contrary to previous expectations, a comprehensive analysis of PPPs reveals that no program achieves both a sensitivity and a positive predictive value above 65%. Performance estimates derived from a limited number of chromosomes or from smaller data sets do not hold when PPPs are evaluated on the whole genome, and predictions of promoters not associated with CpG islands are seriously inaccurate. Some PPPs perform close to, or even worse than, random guessing.
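The two measures in the abstract are sensitivity (the fraction of annotated transcription start sites, TSSs, that a program recovers) and positive predictive value (the fraction of its predictions that are correct). The Python sketch below shows how both could be computed for one sequence. It assumes a hypothetical matching rule in which a prediction and an annotated TSS match when they lie within `max_dist` bases of each other; the function names and the distance value are illustrative assumptions, not the article's exact evaluation protocol.

```python
from bisect import bisect_left


def has_neighbor(pos, sorted_positions, max_dist):
    """True if some coordinate in sorted_positions lies within max_dist of pos."""
    # First coordinate >= pos - max_dist; it matches if it is also <= pos + max_dist.
    i = bisect_left(sorted_positions, pos - max_dist)
    return i < len(sorted_positions) and sorted_positions[i] <= pos + max_dist


def evaluate(predicted, annotated, max_dist):
    """Return (sensitivity, ppv) for predicted vs annotated TSS coordinates.

    Hypothetical matching rule: a prediction is correct if an annotated TSS
    lies within max_dist bases of it, and an annotated TSS is recovered if
    a prediction lies within max_dist bases of it.
    """
    preds = sorted(predicted)
    tss = sorted(annotated)
    # Annotated TSSs recovered by at least one prediction
    found = sum(has_neighbor(t, preds, max_dist) for t in tss)
    # Predictions that fall near at least one annotated TSS
    correct = sum(has_neighbor(p, tss, max_dist) for p in preds)
    sensitivity = found / len(tss) if tss else 0.0
    ppv = correct / len(preds) if preds else 0.0
    return sensitivity, ppv


if __name__ == "__main__":
    se, ppv = evaluate([1000, 5200, 9000, 20000], [1100, 5000, 15000], max_dist=500)
    print(f"sensitivity={se:.3f} ppv={ppv:.3f}")
    # The abstract's benchmark: do both measures exceed 65%?
    print("both > 0.65:", se > 0.65 and ppv > 0.65)
```

In this toy example two of the three annotated TSSs are recovered (sensitivity of about 0.667), but only two of the four predictions are correct (PPV of 0.5), so the program would not meet the 65% threshold on both measures at once. Widening `max_dist` raises both numbers, which is why results depend on the distance criterion chosen.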

Acknowledgements

We are grateful to Riu Yamashita and Kenta Nakai for assisting in constructing and maintaining DBTSS.

Author information

Affiliations

  1. Institute for Infocomm Research, 21 Heng Mui Keng Terrace, 119613 Singapore.

    Vladimir B. Bajic and Sin Lam Tan
  2. Human Genome Center, University of Tokyo, 4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639, Japan.

    Yutaka Suzuki and Sumio Sugano

Competing interests

The employer of Vladimir B. Bajic and Sin Lam Tan has licensed Dragon Promoter Finder and Dragon Gene Start Finder to Biobase, Germany. Vladimir B. Bajic receives royalties for these two programs.

Corresponding author

Correspondence to Vladimir B. Bajic.

Supplementary information

PDF files

  1. Supplementary Figure 1

    Distribution of clustered predictions for the seven analyzed PPPs.

  2. Supplementary Table 1

    Results of promoter prediction on human chromosomes 21 and 22.

  3. Supplementary Table 2

    Results of promoter prediction on human chromosomes 4, 21 and 22.

  4. Supplementary Table 3

    Results of promoter prediction on the whole human genome (HG) for different distance criteria.

  5. Supplementary Methods

  6. Supplementary Notes

About this article

DOI: https://doi.org/10.1038/nbt1032