Sir

The guaranteed and sustained public availability of primary, fundamental, experimental scientific data is a matter of considerable concern. Such data include (but are not limited to) nucleotide sequences of biological organisms, amino-acid sequences of proteins, three-dimensional structures of biological molecules, and other data produced by genomics and proteomics studies.

In Correspondence (Nature 417, 222; 2002; doi:10.1038/417222b), D. Agosti and N. F. Johnson stress the importance of open access to taxonomic data, noting that the situation for basic taxonomic data is much worse than for genomic data. But even for genomic and structural data there are no internationally agreed mechanisms for ensuring continuing open access to data, and no strict rules for their deposition in public archival databases. These pressing issues have recently been considered by the Inter-Union Bioinformatics Group (IUBG), which contains, under the umbrella of the International Council for Science (ICSU), representatives from several international unions: the International Union for Pure and Applied Biophysics, the International Union of Biochemistry and Molecular Biology, the International Union of Pure and Applied Chemistry, the International Union of Crystallography and the ICSU Committee on Data for Science and Technology. The IUBG report of May 2002 is available at http://md.chem.rug.nl/~berends/IUBG-FinalReport.html or via http://www.IUPAB.org.

In the fields of genomics, proteomics and macromolecular structures, the primary scientific data, which form an essential part of a scientific publication, are not included in detail in publications, but are deposited in databases. It has always been the practice that those who claim scientific advances in their published work support their claim by making the objective data on which their claim is based openly available. Therefore, such data must be available on at least the same basis as the publication itself, if the common standards of scientific integrity are to be maintained.

The databases concerned are at present maintained by institutions that do not have the support status of national libraries. It is not yet generally recognized at government level that the archiving of such data needs protection similar to the archiving of literature; the responsibilities to maintain the collections and to safeguard their integrity and accessibility into the distant future are neither clearly defined nor internationally agreed.

The IUBG report contains four explicit statements and seven recommendations. It recommends: first, that the international scientific unions identify key archival databases and have an active role in standardization; second, that publishers require authors to deposit their primary data in a key archival database; third, that funding agencies insist on such deposition and actively support primary-data repositories; and fourth, that legislators ensure that laws on intellectual property rights allow the fair use of data for scientific and educational purposes.

The aim of the IUBG report is to stir up the scientific community worldwide. The US government has taken the lead by supporting GenBank and the Protein Data Bank, but the maintenance of archival databases is a supranational activity. At present, funding models differ from one database to another and between the United States, Europe and Japan. None has an explicit long-term commitment. The obligation to deposit data must be followed worldwide. There must be a single international archive for each class of data, even if it is distributed over more than one site, and data must remain uniform in format. There is an urgent need for international agreements to stabilize the situation and to guarantee cooperation, consistency and funding.