Accurate and universal delineation of prokaryotic species

Journal name: Nature Methods
Volume: 10
Pages: 881–884
DOI: 10.1038/nmeth.2575

The exponentially increasing number of sequenced genomes necessitates fast, accurate, universally applicable and automated approaches for the delineation of prokaryotic species. We developed specI (species identification tool; http://www.bork.embl.de/software/specI/), a method to group organisms into species clusters based on 40 universal, single-copy phylogenetic marker genes. Applied to 3,496 prokaryotic genomes, specI identified 1,753 species clusters. Of 314 discrepancies with a widely used taxonomic classification, >62% were resolved by literature support.
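To make the idea of grouping genomes into species clusters concrete, here is a minimal sketch of identity-based clustering. It is not specI's implementation: the abstract states neither the cutoff nor the clustering algorithm, so the single-linkage strategy, the function name `cluster_by_identity`, the toy identities and the 0.95 cutoff are all assumptions chosen for illustration.

```python
from itertools import combinations


def cluster_by_identity(genomes, identity, cutoff):
    """Single-linkage clustering: link two genomes if identity >= cutoff.

    `identity` maps frozenset({a, b}) to a pairwise identity in [0, 1];
    in a real pipeline this would come from marker-gene alignments.
    """
    parent = {g: g for g in genomes}

    def find(g):
        # Walk up to the cluster representative, shortening the path as we go.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in combinations(genomes, 2):
        # Pairs without a recorded identity are treated as unrelated (0.0).
        if identity.get(frozenset((a, b)), 0.0) >= cutoff:
            parent[find(a)] = find(b)

    clusters = {}
    for g in genomes:
        clusters.setdefault(find(g), set()).add(g)
    return sorted(sorted(members) for members in clusters.values())


# Toy data (hypothetical): four genomes and a placeholder cutoff.
genomes = ["g1", "g2", "g3", "g4"]
identity = {
    frozenset(("g1", "g2")): 0.99,
    frozenset(("g2", "g3")): 0.97,
    frozenset(("g1", "g3")): 0.94,
    frozenset(("g3", "g4")): 0.80,
}
clusters = cluster_by_identity(genomes, identity, cutoff=0.95)
print(clusters)
```

With single linkage, g1 and g3 end up in the same cluster through g2 even though their direct identity is below the cutoff; other linkage criteria would behave differently on this input.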

Figures

  1. Comparative performance assessment of specI.

    (a) Performance of the 40 pMGs (combined pMGs) and 16S rRNA gene species-level clustering cutoffs (cutoff estimated) as well as the classical 16S rRNA gene 97% nucleotide sequence identity cutoff (97% ID cutoff) in terms of average precision and recall using indicated clustering algorithms. The cutoffs are reported in the inset. (b) Accuracy of species placements compared to type strains using specI (two implementations, using different alignment algorithms; Online Methods), Amphora2 and a 16S rRNA nearest-neighbor classifier (16S NN classifier; which included an optimized 99% ID cutoff) based on 130 holdout genomes whose taxonomy is not disputed. (c) Empirical runtimes of specI, Amphora2 and ANI calculations for analysis of 100 randomly chosen genomes. For ANI calculations, only the MUMmer alignment step of JSpecies was benchmarked (Online Methods). (d) Large-scale application of the species-level clustering method to 3,496 high-quality genomes. The genomes were either in agreement with the NCBI Taxonomy data or showed discrepancies. Where discrepancies occurred, the species-level clustering was either supported by the literature or implied splits or merges of named species. Genomes that were not taxonomically identified were subdivided into different types according to their species clusters.

  2. Phylogenetic trees and species-level clustering of Prochlorococcus displaying discrepancies with the NCBI Taxonomy data.

    Trees were independently built from concatenated alignments of the 40 universal single-copy phylogenetic marker genes and from the 16S rRNA gene (see Supplementary Figs. 5 and 6 for additional examples). Species-level clusterings and phylogenetic trees of the combined 40 pMGs and the 16S rRNA gene suggest that Prochlorococcus marinus ecotypes form individual species. Cl., specI cluster.
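Figure 1a reports clustering performance as precision and recall against a reference taxonomy. The sketch below shows one common way to compute these for a clustering, by counting genome pairs. The pairwise definition, the function name `pairwise_precision_recall` and the toy labels are assumptions for illustration; the paper's exact definition is given in its Online Methods and may differ.

```python
from itertools import combinations


def pairwise_precision_recall(predicted, reference):
    """Compare a clustering with reference species labels, pair by pair.

    predicted: dict mapping genome -> predicted cluster id
    reference: dict mapping genome -> reference species name
    """
    tp = fp = fn = 0
    for a, b in combinations(sorted(predicted), 2):
        same_pred = predicted[a] == predicted[b]
        same_ref = reference[a] == reference[b]
        if same_pred and same_ref:
            tp += 1  # pair correctly placed together
        elif same_pred:
            fp += 1  # pair merged although the reference separates it
        elif same_ref:
            fn += 1  # pair split although the reference joins it
    # With no pairs to judge, report 1.0 by convention.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


# Toy data (hypothetical): one cluster merges genomes from two species.
predicted = {"g1": "c1", "g2": "c1", "g3": "c1", "g4": "c2"}
reference = {"g1": "A", "g2": "A", "g3": "B", "g4": "B"}
precision, recall = pairwise_precision_recall(predicted, reference)
print(precision, recall)
```

In this toy case, merging g3 into the cluster of species A lowers precision, while separating g3 from g4 lowers recall, which is the trade-off a clustering cutoff has to balance.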

Author information

Affiliations

  1. European Molecular Biology Laboratory, Heidelberg, Germany.

    • Daniel R Mende,
    • Shinichi Sunagawa,
    • Georg Zeller &
    • Peer Bork
  2. Max Delbrück Centre for Molecular Medicine, Berlin, Germany.

    • Peer Bork

Contributions

P.B., D.R.M., S.S. and G.Z. designed the study. D.R.M. developed and implemented the program, D.R.M. and G.Z. performed the experiments, D.R.M., S.S. and G.Z. analyzed the data, and D.R.M., S.S., G.Z. and P.B. wrote the manuscript.

Competing financial interests

The authors declare no competing financial interests.

Supplementary information

PDF files

  1. Supplementary Text and Figures (1,611 KB)

    Supplementary Figures 1–8, Supplementary Tables 1–3, 5–7, 15, 17, 19 and 20, and Supplementary Note

Excel files

  1. Supplementary Table 4 (373 KB)

    NCBI Taxonomy information for type strains listed in the List of Prokaryotic names with Standing in Nomenclature (LPSN; http://www.bacterio.net/) that could be linked to NCBI, including their sequencing status

  2. Supplementary Table 8 (19 KB)

    ANIb values of Prochlorococcus marinus

  3. Supplementary Table 9 (14 KB)

    ANIm values of Prochlorococcus marinus

  4. Supplementary Table 10 (14 KB)

    ANIb values of the Serratia and Rahnella clades

  5. Supplementary Table 11 (14 KB)

    ANIm values of the Serratia and Rahnella clades

  6. Supplementary Table 12 (15 KB)

    ANIb values of the Buchnera clade

  7. Supplementary Table 13 (15 KB)

    ANIm values of the Buchnera clade

  8. Supplementary Table 14 (508 KB)

    Cluster assignments for the 3,496 genomes used in this study

  9. Supplementary Table 16 (96 KB)

    Literature-based reclassifications of species assignments in the NCBI Taxonomy database

  10. Supplementary Table 18 (28 KB)

    Assignments to known species of genomes that were previously not assigned to a named species, made using the species clustering strategy presented in this publication
