Main

Microbiologists unite! This is a frequent plea in editorial columns such as this, in which the Editors extol the benefits, and indeed necessity, of improving communication and collaboration between microbiologists working in the same or different areas. This month, rather than repeat this plea, we thought that we would take a look at an excellent example of such communication in action.

It is perhaps surprising now to look back at the early days of microbial genome sequencing and realize that Escherichia coli K-12 (Ref. 1) was not the first microbial genome sequence to be completed; in fact, it was the seventh. At the time, Richard Moxon and Chris Higgins remarked (Ref. 2) that “we have reached the end of the beginning”. Many years earlier, its small size, rapid doubling time and basic growth requirements had made this Gram-negative microorganism the ideal model system for the early pioneers of modern bacterial genetics, particularly following Lederberg's discovery of recombination. More than 50 years on, E. coli continues to be regarded as an excellent model system, not only by those who classify themselves as microbiologists, but also by cell biologists, geneticists and biochemists.

Sequencing technologies have of course moved on since then and, with high-throughput, relatively low-cost whole-genome sequencing now commonplace, more than 150 microbial genomes have been sequenced, including three pathogenic strains of E. coli. However, the publication of a genome sequence does not signal the end of the hard work. Without comprehensive and accurate functional annotation, genome sequence data cannot be used to their best advantage.

As Fred Blattner stated in 1997, “Annotation is an ongoing task whose goal is to make the genome sequence more useful by correlating it with other knowledge” (Ref. 1). With this in mind, in mid-November last year, a group of 15 leading members of the E. coli bioinformatics community from the United States, Japan and Europe organized their own E. coli K-12 MG1655 genome annotation workshop. The objective was first to pool and coordinate the parallel collections of data from each laboratory, much of which is unpublished, and then to refine the annotated gene products and the boundaries of genes and pseudogenes.

This collaborative effort was a great success; in the course of just four and a half days, the participants managed to work through and reliably assign a function and correct boundaries to 25% of the E. coli coding sequences. Indeed, the collaboration was so fruitful that the participants agreed to carry on back in their laboratories and, in the short time period that has passed since they met, they have worked through more than 50% of the coding sequences. These new data will be deposited with GenBank and made publicly available.

It is also gratifying to learn that the collaborative atmosphere was such that it was agreed that Professor Takashi Horiuchi of the National Institute for Basic Biology in Japan will deposit the sequence his group obtained when they re-sequenced a region of the MG1655 genome. That sequence contains 341 differences from the original publication, and it was also agreed that these changes will be used to produce a revised GenBank entry for Blattner's original MG1655 deposition. The new sequence data and annotation will be incorporated into a new E. coli database to be launched later in 2004. Existing E. coli databases, such as Echobase (http://www.ecoli-york.org), will then feed their data into this new unified resource to prevent fragmentation and overlap of current efforts, and to improve the annotation of this model organism.

It has been observed in the past that the use of E. coli as a basic research tool in modern molecular biology is so widespread that it is difficult to define a fixed 'community' of E. coli researchers. This might indeed be the case, but as this recent workshop shows, the E. coli bioinformatics community is not only a cohesive group, but a cohesive group that is prepared to work together for the benefit of the wider research community. Perhaps for E. coli, the beginning of the end is now in sight.