The BIOSAPIENS Network of Excellence exists to provide infrastructure to support laboratories across Europe in annotating genome data using both bioinformatics tools and experimental data. The annotations generated by the network will be made available in the public domain and easily accessible on the web.

The first draft of the human sequence, published in 2001, was followed by other related genomes and the detailed resequencing of the human genome. This explosion in genomic information was achieved in a remarkably short time period. However, a DNA sequence must be interpreted in terms of the RNA and proteins that it encodes, and the promoters and regulatory regions that control transcription and translation. A genomic sequence also provides a convenient ‘coordinate’ reference onto which functional information can be mapped.

Annotation can be described as the process of ‘defining the biological role of a molecule in all its complexity’ and mapping this knowledge onto the relevant gene products encoded by genomes (see Table 1). This involves both experimental and computational approaches and, indeed, absolutely requires their integration. As such, this effort will occupy the majority of biologists (experimental and theoretical) for the foreseeable future. The network provides the necessary expertise and infrastructure to allow distributed annotation. These expert annotations will be made available to everyone on the web.

Table 1 BioSapiens genome annotation levels

Currently, we are far from solving even simple annotation tasks in silico. There are two key problems in annotation. Firstly, many of the current computational tools need further development and careful validation against experimental data. Secondly, the integrated results from all this work need to be made available to experimenters in ways that will guide their future work.

In this very dynamic scientific area, scientists will provide annotations with their best current methods while continuing to develop the underlying databases and algorithms that will be incorporated into future annotation pipelines.

Together with this systematic effort to develop general annotations we have selected important biological areas of application in which in-depth computational analysis might be required. Our goal is to provide a focus for the network and offer a wide variety of Bioinformatics methods in order to advance the understanding of diseases. The objective of this network is to extensively and thoroughly analyze the genomic information related to various diseases, and to create a publicly available resource for each of them. This resource will combine existing information with data produced within the network, and allow discussion with experimental scientists working in the corresponding areas.

The first two selected topics are Down's syndrome (chromosome 21) and the viruses HCV and HIV; other diseases will follow later.

In the case of HCV, we will use all the available methods developed by the network partners to try to assign a function to the less well-characterized proteins. There are at least three proteins that are potential antiviral targets for which no structural or functional information is available, and the network will strongly focus on their analysis. In the case of HIV, the aim is both to correlate the variability of the virus with its ability to escape drug therapy, and to use predictive methods to identify immunogenic regions that are potential candidates for vaccine development.

For the study of human chromosome 21 (Hsa21) we will not only reannotate the full chromosome with the integrated BioSapiens battery of methods, but also experimentally assay the function of conserved nongenic sequences and contribute to the sequencing of 21p, which is part (∼6%) of the human genome that remains to be sequenced. In addition, we will analyze the HapMap results for this chromosome, as well as the results of the ‘Perlegen’ resequencing effort, generate 21q CGH BAC arrays covering the long arm of human chromosome 21, and analyze transcriptome results in human and mouse models.

How to distribute the annotations generated by BioSapiens?

For the success of the project it is essential to make the complex annotations generated by BioSapiens accessible to experimental biologists. For this task we have adopted the Distributed Annotation System (DAS) technology. DAS was first created for the visualization of annotations of the human genome, with the results represented in the sophisticated Ensembl system (www.ensembl.org), which is the entry point most commonly used by experimental biologists to access human genome information. In the context of BioSapiens we also have other clients for viewing annotations related to protein sequences (DASTY) and three-dimensional protein structures (SPICE, extending the version originally developed by the Sanger Institute) (see Figure 1). The annotations served by the DAS servers can be viewed from any site worldwide using any of these three easy-to-use systems (DAS clients), and users can additionally select the annotations appropriate to their biological questions. In this first year of the network our goal has been to establish the DAS infrastructure and to test the DAS access systems (clients) before releasing them to the public domain. We now intend to focus on the development of new tools and, very importantly, on setting up the evaluation, scoring and consensus systems that will provide not only the best annotations but also a scientifically sound evaluation of their reliability, helping users to assess the significance of the predictions.
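To make the client and server roles concrete, the following is a minimal sketch in Python of what a DAS client does: it builds a 'features' request for a sequence segment, parses the XML reply, and keeps only the annotations from the methods the user has selected. It is an illustration under stated assumptions, not the code of DASTY, SPICE or Ensembl. The server address, source name, protein identifier, method names and sample reply are all hypothetical, and the element names follow our reading of the DASGFF format of the DAS 1.x specification. No network request is made; the reply is an embedded example.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

def features_url(server, source, segment, start=None, stop=None):
    """Build a DAS 'features' request URL (server and source are hypothetical)."""
    seg = segment if start is None else f"{segment}:{start},{stop}"
    query = urlencode({"segment": seg}, safe=":,")
    return f"{server.rstrip('/')}/das/{source}/features?{query}"

# Hypothetical reply in DASGFF style; a real server would return this over HTTP.
SAMPLE_REPLY = """<DASGFF>
  <GFF version="1.0" href="http://example.org/das/demo/features">
    <SEGMENT id="P12345" start="1" stop="100">
      <FEATURE id="f1" label="predicted helix">
        <TYPE id="helix">helix</TYPE>
        <METHOD id="demo_helix_predictor">demo_helix_predictor</METHOD>
        <START>10</START>
        <END>25</END>
        <SCORE>0.9</SCORE>
      </FEATURE>
      <FEATURE id="f2" label="binding site">
        <TYPE id="site">site</TYPE>
        <METHOD id="demo_site_predictor">demo_site_predictor</METHOD>
        <START>40</START>
        <END>42</END>
        <SCORE>-</SCORE>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>"""

def parse_features(xml_text):
    """Turn a DASGFF-style reply into a list of plain dictionaries."""
    rows = []
    for seg in ET.fromstring(xml_text).iter("SEGMENT"):
        for feat in seg.iter("FEATURE"):
            score = feat.findtext("SCORE", default="-").strip()
            rows.append({
                "segment": seg.get("id"),
                "id": feat.get("id"),
                "label": feat.get("label"),
                "type": feat.findtext("TYPE", default="").strip(),
                "method": feat.findtext("METHOD", default="").strip(),
                "start": int(feat.findtext("START")),
                "end": int(feat.findtext("END")),
                # "-" is treated here as "no score given"
                "score": None if score == "-" else float(score),
            })
    return rows

def select_by_method(rows, methods):
    """Keep only the annotations produced by the methods the user selected."""
    return [r for r in rows if r["method"] in methods]

if __name__ == "__main__":
    print(features_url("http://example.org", "demo", "P12345", 1, 100))
    for row in select_by_method(parse_features(SAMPLE_REPLY), {"demo_helix_predictor"}):
        print(row["label"], row["start"], row["end"], row["score"])
```

Because every server answers in the same format, a client can merge the replies of several independent annotation servers onto one sequence coordinate system, which is what allows the annotations to remain distributed across the participating laboratories.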

Figure 1

The BioSapiens DAS portal. Screenshot of the BioSapiens web portal showing functional annotations for a given protein sequence, collected from the predictions provided by the various teams of the network. The annotations are displayed simultaneously along the sequence of the protein and at their positions in the corresponding three-dimensional structure of that protein. Using this type of system, biologists will be able to analyze genomes at different levels of complexity by freely combining the annotations provided by the DAS servers developed by the BioSapiens network. Figure kindly provided by Brendan Vaughan (EBI-EMBL).

The network's Scientific Advisory Committee is organized to stimulate participation in biological projects and, through its members' critical views, to help organize the activities of the network better.

Barry Honig, Columbia University (chair); Siv Andersson, University of Uppsala; Janan Eppig, The Jackson Laboratory; Ian Harrow, Pfizer; Veronica van Heyningen, MRC Human Genetics Unit, Edinburgh; Minoru Kanehisa, Kyoto University; Jonathan Knowles, Hoffmann-La Roche; Carlos Martinez-A, CSIC, Madrid; Iain Mattaj, EMBL; Trudy McKay, North Carolina State University; Gert-Jan van Ommen, University of Leiden; Kai Simons, Max Planck Institute of Molecular Cell Biology and Genetics; Mathew Woodwark, AstraZeneca.

Collaboration with the community

The Network has established a permanent ‘European School of Bioinformatics’ (see http://www.BioSapiens.info) to train bioinformaticians and to encourage best practice in the exploitation of genome annotation data by biologists. We have organized two basic training workshops: one in Verona, Italy, followed by the related Advanced Workshop on ‘Molecular Interactions’ (sponsored by ESF), and one in Nijmegen, the Netherlands, followed by a workshop on 7TM receptors.

Like other networks, BioSapiens is designed to create a focus of collaboration and development in bioinformatics in Europe. We have supported a number of sectorial meetings, such as CAPRI: Critical Assessment of Predictions on Interactions; CASP6: Critical Assessment of Techniques for Protein Structure Prediction; the Bologna Winter School in Bioinformatics; and the meeting on ‘Genome Annotation’ at ISMB/ECCB2004. This year we will contribute to the organization of: the annual European Conference for Computational Biology (next meeting in Madrid in September 2005, www.eccb05.org); the II meeting on ‘Genome Annotation’; future editions of the Bologna Winter School in Bioinformatics, which will host a permanent ‘BioSapiens corner’; and the BioSapiens workshop at the 2nd ESF Conference: Functional Genomics and Disease (www.functionalgenomics.org.uk). We will also host specific meetings with scientists working on Down's syndrome (chromosome 21) and a symposium, ‘Molecular and computational Biology of HIV and HCV’, jointly with the viRgil NoE, which focuses on HIV.

The web portal

All the network information and activities, together with the various web DAS clients displaying the full range of genome annotations, are being progressively incorporated into the BioSapiens Portal (http://www.biosapiens.info), along with information on how to install the DAS systems locally.

The portal is also the entrance to the activities organized by BioSapiens, and to the interaction with the groups participating in the network.