E-BioSci: a European approach to handling biological information
For most researchers, being online is the electronic equivalent of discovering Eldorado. They surf between databases and consult journal articles that were previously unavailable, or were obtainable only via the tortuous process of interlibrary loans. They are now able to download full text, data sets and images, and this empowerment has made them thirsty for more. They now have expectations, justified or otherwise, that access to information should be free and that one, or at most two, mouse clicks should suffice to retrieve the data they seek.
In the biomedical sciences, Medline and PubMed in the United States provide a set of tools that can be used to search for information and retrieve it from a collection of abstracts linked to DNA and protein sequence databases. There is, however, plenty of scope for refinement. Examples include faster and/or fuzzy search algorithms, further innovation in full-text searching, and the development of better discriminative criteria for interlinking and establishing relationships between documents, as well as linking publications with data in a variety of other formats, including structures, images and animations.
Harold Varmus's 1999 E-BioMed/PubMed Central proposal addressed some of these needs. It envisaged the establishment of what has since been termed a GenBank of the literature: a centralized, freely accessible, comprehensive collection of full-text publications that would facilitate searching and improve the connectivity of the published literature with genome sequence, structural and other data.
Unfortunately, there was controversy over certain aspects of Varmus's proposal, and this prevented it from being implemented in its original form. Questions arose over the absence, or optional presence, of peer review, the need to physically transfer publishers' material to a central site, and the lack of realism involved in aiming to distribute content owned by others without imposing any charge. More recently, as discussed elsewhere in this forum, important questions relating to interconnectivity and unrestricted database access have been pushed even further into the background by the Public Library of Science's call for a boycott in its fight for open access to published literature.
Inexplicably, this call for a boycott focuses on primary journal publications and ignores a potentially much more serious problem: the growing trend towards limited access to, and usage of, database information. This trend was further strengthened by the 1996 EU database protection directive that amended European legislation governing the extraction and re-utilization of database information. Another aspect of this problem was highlighted recently when the publisher of the journal Science accepted Celera Genomics' terms for the release of its human genome sequence data. The information will not be placed in public data banks, and access via Celera's own site will be restricted to those who agree not to redistribute the information. The implications of this latter condition will depend on how the term "redistribute" is interpreted. They could extend to limitations being placed on the freedom to publish studies based on the data, to carry out large-scale bioinformatic analysis and to incorporate derived data into other databases.
In response to the NIH initiative, EMBO decided to lead a collaborative effort to establish a European-based digital information resource network with a global role. During a series of discussions with interested parties, including research organizations, libraries, learned societies, publishers, individual research scientists and representatives of numerous EU member states, the shortcomings of the NIH proposal were identified and the E-BioSci concept was formulated. E-BioSci has been defined as an extensive, networked platform that will combine the skills and content already present or being developed in various centres in Europe. It will work in harmony with other global initiatives such as PubMed Central and with publishers and other information providers. Although this networked structure is superficially complex, it accurately reflects the European dimension of the project. It also offers potential advantages in terms of speed of access and the provision of backup and secure storage facilities, and it will allow queries to be put in different language formats. The initiative recently received an important boost with an EU commitment to funding.
E-BioSci provides an extensive set of links through the biological information chain.
The E-BioSci network will:
E-BioSci will act as an information portal and provide hosting services for electronic publications. The aim will be to provide a platform for the dissemination of material that has previously undergone peer review and authentication by an independent body. E-BioSci need not be the sole repository of such material and authors can submit their reviewed and authenticated manuscripts to as many sites as they wish.
This emphasis on quality assessment and control distinguishes E-BioSci both from other e-publishing initiatives, including those modelled on the Los Alamos Physics Archive (e.g. the eprint-based Cogprints server), and from commercial services such as those offered by BioMed Central. Authors rely on the perceived quality of their publications when making funding applications and advancing their careers and, in the absence of reliable and widely accepted alternatives, they are likely to be reluctant to abandon a tried-and-trusted assessment model. From the reader's point of view, editorial control is, at least in part, a guarantee that technical standards have been met, that the conclusions are adequately supported by the experimental data and that the presentation meets acceptable standards of clarity.