The exponentially increasing number of sequenced genomes necessitates fast, accurate, universally applicable and automated approaches for the delineation of prokaryotic species. We developed specI (species identification tool; http://www.bork.embl.de/software/specI/), a method to group organisms into species clusters based on 40 universal, single-copy phylogenetic marker genes. Applied to 3,496 prokaryotic genomes, specI identified 1,753 species clusters. Of 314 discrepancies with a widely used taxonomic classification, >62% were resolved by literature support.
We thank the members of the Bork group for helpful discussions and Y. Yuan and members of the European Molecular Biology Laboratory information technology core facility for managing the high-performance computing resources. We acknowledge funding provided by the CancerBiome project (European Research Council project reference 268985), the 'METACARDIS' project (FP7-HEALTH-2012-INNOVATION-I-305312) and the International Human Microbiome Standards project (HEALTH-F4-2010-261376).
NCBI Taxonomy information for type strains in the List of Prokaryotic Names with Standing in Nomenclature (LPSN; http://www.bacterio.net/) that could be linked to NCBI, including their sequencing status
ANIb values of Prochlorococcus marinus
ANIm values of Prochlorococcus marinus
ANIb values of the Serratia and Rahnella clades
ANIm values of the Serratia and Rahnella clades
ANIb values of the Buchnera clade
ANIm values of the Buchnera clade
Cluster assignments for the 3,496 genomes used in this study
Literature-based reclassifications of species assignments in the NCBI Taxonomy database
Assignments of genomes that were previously not assigned to a named species to known species, using the species clustering strategy presented in this publication