Sir

Many prokaryotic strains used for genome sequencing projects are poorly documented and not generally available.

With improvements in sequencing technology and growing recognition of the value of microbial genome sequence data, the number of microbial genome-sequencing projects is increasing rapidly. There are 56 completed prokaryotic genome sequences (10 strains of Archaea and 46 of Bacteria), and another 210 in progress (see, for example, http://www.tigr.org and http://www.integratedgenomics.com).

Of these 266 projects, some of which are performed on organisms not available in pure culture (some endosymbionts, for example), only 51 represent the type strain of the species. (A type strain is made up of living cultures of an organism descended from the nomenclatural type.) Of the rest, 138 represent non-type strains (a non-type strain is often selected only because it happens to be close at hand); 31 projects concern symbionts and environmental (uncultured) strains; 32 do not specify a strain; and 14 represent prokaryotic species with invalid species names. (Validly named bacterial species are either on the 1980 approved lists of names or were validated after 1980, by taxonomic description in the International Journal of Systematic and Evolutionary Microbiology or by validation in that journal; an invalid name has no standing in nomenclature and may be changed subsequently.) Only 115 represent the type species of the genus (the typus of the genus included when the genus name was originally validly published) and only 123 are deposited in public culture collections. Mandatory deposition of the type strain of any validly described (culturable) prokaryotic species in a major public culture collection guarantees the availability of the strain and allows cross-referencing of published data.

When there were only a few projects, these taxonomic and preservation issues were not so evident. With the explosion in sequencing and the sequencing of multiple strains of a species (including Escherichia coli, Staphylococcus aureus and other major pathogens), questions of strain identity and safekeeping assume more importance. Deposition of strains in public collections with long-term funding is the only way to ensure their maintenance and their continuing availability to the scientific community. As things stand, it is a real possibility that a strain for which a wealth of genomic data has been generated may become “extinct” through loss of viability.

We propose that the following standards should be adopted by the entire community. First, genome-sequencing project lists and databases should include the name of the strain sequenced and its associated culture collection accession number(s), as well as its origin. Second, the type strain of a species should be used for sequencing unless other factors make this inappropriate. Third, strains for which genome sequences have been, or are being, generated should be deposited in at least two major public biological resource centres, such as the American Type Culture Collection, the German Collection of Microorganisms and Cell Cultures, the Pasteur Institute Collection or the Japanese Collection of Microorganisms.
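As an illustration only, the three standards could be checked against a project-list entry along the following lines. This is a minimal sketch, not an existing database schema: the record layout, the field names, the function `missing_items` and the convention that an accession number begins with the collection acronym are all assumptions made for the example, and the species name and accession numbers shown are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GenomeProjectRecord:
    """Hypothetical entry in a genome-sequencing project list."""
    species: str
    strain: Optional[str] = None            # name of the strain sequenced
    accessions: List[str] = field(default_factory=list)  # e.g. "ATCC 00000"
    origin: Optional[str] = None            # where the strain was isolated
    is_type_strain: bool = False
    non_type_reason: Optional[str] = None   # why the type strain was not used


def missing_items(record: GenomeProjectRecord) -> List[str]:
    """Return the items from the proposed standards that the record lacks."""
    problems = []
    # First standard: strain name, accession number(s) and origin are listed.
    if not record.strain:
        problems.append("strain name")
    if not record.origin:
        problems.append("origin")
    # Third standard: deposits in at least two collections. Assumes each
    # accession starts with the collection acronym, as in "DSM 00000".
    collections = {a.split()[0] for a in record.accessions if a.strip()}
    if len(collections) < 2:
        problems.append("deposits in at least two public collections")
    # Second standard: type strain unless other factors make it inappropriate.
    if not record.is_type_strain and not record.non_type_reason:
        problems.append("reason for not using the type strain")
    return problems


if __name__ == "__main__":
    # Placeholder values, not a real species or real accession numbers.
    entry = GenomeProjectRecord(
        species="Examplea hypothetica",
        strain="X1",
        accessions=["ATCC 00000"],
        origin="soil",
        is_type_strain=True,
    )
    print(missing_items(entry))
```

An entry with a single deposit, as in the usage example, would be flagged only for the third standard; the check says nothing about whether the collections named are in fact major public resource centres, which would need a curated list.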