To the Editor

Genome sequencing and annotation projects are no longer restricted to large sequencing institutes. With high-throughput next-generation sequencing technologies, draft genome sequences are now being determined at an unprecedented pace. However, more and more genome sequences are poorly annotated. To bridge this gap, there is an increasing need for easy-to-use, online genome-annotation tools1. Moreover, genome projects are being conducted by smaller consortia, and thus genome annotation should benefit maximally from community annotation efforts2 that reflect the knowledge available in the scientific community3. Therefore, we developed ORCAE (http://bioinformatics.psb.ugent.be/orcae/, formerly known as BOGAS), a wiki-style annotation portal. ORCAE offers public access to a wide variety of plant, fungal and animal genomes (Supplementary Table 1).

The basic setup of ORCAE closely resembles that of wiki systems such as MediaWiki, and the information page for each gene can be seen as a 'topic' page of a traditional text wiki. We developed ORCAE with a gene-centric vision in mind: the gene information pages (Fig. 1 and Supplementary Figs. 1 and 2) are the central pages in the system (Supplementary Methods), in contrast to chromosome-centric systems such as GNPannot4. The system is well suited to letting multiple people simultaneously curate the initial automatic predictions, a typical setup in many genome projects, where curators with diverse expertise join forces to improve the genome annotation. ORCAE offers functionality to browse, annotate and manage eukaryotic genome annotation projects. It can be used both to present published genomes to the scientific community and to coordinate manual annotation and curation efforts in ongoing annotation projects. This dual role is one of ORCAE's main advantages, because users have a single access point where annotations can be both queried and improved.

Figure 1: Gene page in the ORCAE resource.

Through the extensive use of graphical representations, a clear overview of the data is provided to assist users in easily assessing the quality and accuracy of the offered annotations for a given gene locus.

For publicly available genomes, anyone can view all gene pages. Restricted genome projects are accessible only to registered users. Editing privileges are limited to participants in the particular genome project, although nonmembers can be granted editing privileges when approved by the consortium (Supplementary Methods).
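The access rules above can be summarized in a minimal sketch. This is only an illustration of the stated policy, not ORCAE's actual implementation; the `User` and `Project` classes, their fields, and the `can_view` and `can_edit` functions are hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Project:
    name: str
    public: bool  # True for publicly available genomes


@dataclass
class User:
    name: str
    # Hypothetical bookkeeping: projects the user participates in, and
    # projects whose consortium approved the user as an outside editor.
    member_of: Set[str] = field(default_factory=set)
    approved_for: Set[str] = field(default_factory=set)


def can_view(user: Optional[User], project: Project) -> bool:
    """Public projects are open to all; restricted ones need a registered user.

    An anonymous visitor is represented by user=None (an assumption).
    """
    return project.public or user is not None


def can_edit(user: Optional[User], project: Project) -> bool:
    """Only project participants or consortium-approved nonmembers may edit."""
    if user is None:
        return False
    return project.name in user.member_of or project.name in user.approved_for


# Usage with made-up project and user names.
public_genome = Project("public_genome", public=True)
restricted_genome = Project("restricted_genome", public=False)
member = User("alice", member_of={"restricted_genome"})
guest_editor = User("bob", approved_for={"public_genome"})
```

Keeping viewing and editing as separate checks mirrors the text, where read access depends on the project's status and write access depends on consortium membership or approval.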

The portal keeps track of all modifications to both functional and structural annotations. This is done in a wiki-compliant manner: all edits are stored in the database, and thus a history of modifications is available for each locus (Supplementary Methods). This allows ORCAE to eliminate a bottleneck found in other online annotation tools, such as yrGATE5, where a curator needs to approve modifications before they become available. Indeed, in the spirit of community annotation, the quality of the presented annotation is the responsibility of the whole community. However, to ensure a certain level of quality, several automatic checks have been put in place that draw on all existing knowledge of the gene structure and function (for example, whether a KOG or EC number exists) (Supplementary Methods). Finally, ORCAE is highly dynamic: upon modification of a gene model, all the available information is immediately updated and presented on the gene page.
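The wiki-style bookkeeping described above can be sketched as an append-only revision list per locus. This is a minimal illustration under stated assumptions, not ORCAE's database schema: the `Revision` and `LocusHistory` classes, the `run_checks` function, and the annotation keys `kog` and `ec_number` are all hypothetical, and the single check shown is only a stand-in for the automatic checks described in the Supplementary Methods.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional, Tuple


def run_checks(annotation: Dict[str, str]) -> Tuple[str, ...]:
    """Hypothetical automatic check: flag loci with neither a KOG nor an EC number."""
    warnings = []
    if not annotation.get("kog") and not annotation.get("ec_number"):
        warnings.append("no KOG or EC number assigned")
    return tuple(warnings)


@dataclass(frozen=True)
class Revision:
    editor: str
    timestamp: datetime
    annotation: Dict[str, str]
    warnings: Tuple[str, ...]


@dataclass
class LocusHistory:
    locus_id: str
    revisions: List[Revision] = field(default_factory=list)

    def edit(self, editor: str, annotation: Dict[str, str]) -> Revision:
        """Store the edit immediately; earlier revisions are never overwritten."""
        revision = Revision(
            editor=editor,
            timestamp=datetime.now(timezone.utc),
            annotation=dict(annotation),  # copy so later changes do not alter history
            warnings=run_checks(annotation),
        )
        self.revisions.append(revision)
        return revision

    @property
    def current(self) -> Optional[Revision]:
        """The latest revision is what the gene page would display."""
        return self.revisions[-1] if self.revisions else None


# Usage with a made-up locus identifier and editors.
history = LocusHistory("gene0001")
history.edit("alice", {"function": "protein kinase"})
history.edit("bob", {"function": "protein kinase", "ec_number": "2.7.11.1"})
```

In this sketch an edit becomes visible as soon as it is appended, with no approval step, and the checks attach warnings to a revision rather than blocking it, which matches the community-responsibility model the text describes.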