GenePRIMP: a gene prediction improvement pipeline for prokaryotic genomes

Abstract

We present the 'gene prediction improvement pipeline' (GenePRIMP; http://geneprimp.jgi-psf.org/), a computational process that performs evidence-based evaluation of gene models in prokaryotic genomes and reports anomalies, including inconsistent start sites, missed genes and split genes. We found that manual curation of gene models using the anomaly reports generated by GenePRIMP improved their quality, and we demonstrate the applicability of GenePRIMP in improving finishing quality and in comparing different genome-sequencing and annotation technologies.
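
To make the anomaly categories named in the abstract concrete, the following is a minimal sketch of how two of them (split genes and missed genes) might be flagged from a list of gene models. It is not GenePRIMP's implementation: the `GeneModel` record, the `best_hit` field (standing in for precomputed similarity evidence), the `find_anomalies` function and the 300-bp `min_gap` threshold are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class GeneModel:
    """Hypothetical gene-model record; not GenePRIMP's data format."""
    gene_id: str
    start: int                      # 1-based, inclusive
    end: int                        # 1-based, inclusive
    strand: str                     # "+" or "-"
    best_hit: Optional[str] = None  # assumed: ID of the best-matching reference protein


def find_anomalies(genes: List[GeneModel], min_gap: int = 300) -> List[Tuple[str, str]]:
    """Return (anomaly_type, description) pairs for neighbouring gene models.

    Two illustrative rules, both assumptions of this sketch:
      * split gene: adjacent genes on the same strand share the same reference hit
      * missed gene: the intergenic region between neighbours is at least min_gap bp
    """
    ordered = sorted(genes, key=lambda g: g.start)
    reports: List[Tuple[str, str]] = []
    for left, right in zip(ordered, ordered[1:]):
        same_hit = left.best_hit is not None and left.best_hit == right.best_hit
        if same_hit and left.strand == right.strand:
            reports.append(("split_gene", f"{left.gene_id}+{right.gene_id}"))
        gap = right.start - left.end - 1  # number of bases strictly between the two genes
        if gap >= min_gap:
            reports.append(("missed_gene", f"{left.end + 1}-{right.start - 1}"))
    return reports
```

In this sketch, two neighbouring same-strand genes that both match reference `refA` would be reported as one candidate split gene, and a long unannotated stretch would be reported as a region to inspect for a missed gene. A real evidence-based evaluation would rely on sequence-similarity searches rather than the fixed rules assumed here.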

Figure 1: GenePRIMP analysis of gene calls in the M. palustris genome by three gene callers.
Figure 2: The GenePRIMP processing pipeline.

Acknowledgements

We acknowledge the help and support of I. Anderson, K. Mavromatis, X. Zhao and V. Markowitz. GenePRIMP was developed under the auspices of the US Department of Energy's Office of Science, Biological and Environmental Research Program and by the University of California, Lawrence Berkeley National Laboratory under contract DE-AC02-05CH11231, Lawrence Livermore National Laboratory under contract DE-AC52-07NA27344 and Los Alamos National Laboratory under contract DE-AC02-06NA25396. Validation and improvement of the system were supported by US National Institutes of Health Data Analysis and Coordination Center contract U01-HG004866. The work conducted by the US Department of Energy Joint Genome Institute is supported by the Office of Science of the US Department of Energy under contract DE-AC02-05CH11231.

Author information

N.N.I. and N.C.K. conceived the initial approach. N.N.I. and A.P. designed the system. A.P. implemented the GenePRIMP code base and web portal. S.D.H. contributed to the development of the web portal. N.N.I., N.M., G.O. and A.L. manually curated the genomes sequenced at the Department of Energy Joint Genome Institute and contributed to testing and validation.

Correspondence to Amrita Pati.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–9, Supplementary Table 1 and Supplementary Data 1–5 (PDF 3711 kb)

About this article

Cite this article

Pati, A., Ivanova, N., Mikhailova, N. et al. GenePRIMP: a gene prediction improvement pipeline for prokaryotic genomes. Nat Methods 7, 455–457 (2010) doi:10.1038/nmeth.1457
