Sir

When the Pseudomonas aeruginosa genome project began in 1997, one question facing us was how best to annotate genes with descriptions and other information. We decided to take a community approach, in an attempt to improve the quality of annotation and to draw on the expertise of all those researchers working with this versatile pathogenic bacterium.

Our results support aspects of ‘open annotation’ approaches for the human genome, as described in Correspondence1 and News2. However, our experience has suggested that certain precautions must be taken.

For the community project, termed PseudoCAP3, we recruited volunteers from the Pseudomonas research community, and later from other fields, to submit annotations of genes or gene families with which they were familiar, under the direction of a single project moderator. Unlike the annotation jamboree for the Drosophila genome project4, all communications with, and submissions by, the volunteer participants were made exclusively through the Internet4.

The PseudoCAP annotations were overlaid on a genome viewer console developed by PathoGenesis, containing layers of other automatically generated analyses and literature reference information.

This resource, coupled with a critical, conservative annotation approach, was used to generate the final genome annotations, which were also classified according to whether they were based on (1) functional studies in P. aeruginosa; (2) high homology to functionally studied genes in other organisms; (3) low homology to functionally studied genes; or (4) homology to hypothetical genes (see the accompanying paper in this issue5).

We were pleasantly surprised at the enthusiasm for PseudoCAP: 61 participants made 1,741 submissions. Most of the later participants did not work on Pseudomonas but were researchers who wanted to examine genes of a particular function. Judging from this response, an adequate number of annotators could probably be recruited for other community annotation projects.

Given the experimental nature of our approach, we allowed participants to submit whatever information they wished; as a result, variation in the quality of annotations led to numerous inconsistencies. Review of all annotations by a core group was therefore essential. For future projects, we recommend that community participants be required to define clearly their annotation methods and their criteria for assigning any particular functional description, and to adopt a consistent, searchable format. Otherwise inconsistencies will not be easily detected, and useful information (for example, a list of all annotations based on a certain type of functional study) will not be readily retrievable.

Final annotations for the genome project were based almost exclusively on functional studies of the gene in question, or on close homology of the encoded protein to functionally studied proteins. This method involved significant manual intervention, much of which could be automated if a sequence database containing only functionally studied proteins were created. SwissProt and the National Center for Biotechnology Information's RefSeq are beginning to develop, in part, along these lines.

In the meantime, we recommend that genome projects consider a community-aided annotation approach, coupled with critical, conservative annotation by a core group of project annotators. If such community involvement takes place through the Internet in a formal, well-publicized setting, annotations can continue to be updated and corrected after a genome sequence is published.