Bioinformatics researchers shouldn't need coercion to act responsibly and collegially.
The policy on release of unpublished data from large genome centres has generated considerable discussion and some confusion, as your editorial “Sacrifice for the greater good?” makes plain (Nature 421, 875; 2003). In our view, data sets from large, centralized, expensive genome data-collection projects should be freely available to the entire scientific community, immediately and with no restrictions or conditions.
Our position is that pre-publication release of large genome data sets is a special case, and not a principle that should be applied “throughout the world of biology”, as was asserted in your editorial. Large genome sequencing has become an expensive, factory-style operation, in which economies of scale can only be realized by very large centres. Large data production centres, established and supported by the scientific community, represent a different model of science from traditional investigator-initiated projects. We argue that they need to operate under different rules.
The broader scientific community supports the highly centralized model represented by the US large-scale genome centres (funded via the National Institutes of Health and the Department of Energy) on the condition that everyone in this community gets equal access to the data. If this is the case, everyone wins: large data sets are generated at the lowest cost and greatest speed, and scientific work progresses on multiple fronts without delay. In contrast, if genome centres restrict their data and get preferential access to it, then some members of the community will no longer support monopolistic funding models (in which large centres sequence one genome after another without peer review of each project). Instead, they will demand the right to compete with these empires, especially for the most scientifically desirable genomes. Other scientists, especially bioinformaticians, will seek to relocate to the centres to gain the advantage of early data access. Data restrictions will therefore promote factionalization where we should be seeking efficiencies of scale, and centralization where we should be promoting diversity.
We agree with your editorial that the proposed new policy, recently released for comment by the US genome centres (see http://www.genome.gov/page.cfm?pageID=10506537), is ambiguous. It states that genome sequence data “should be available for all to use without restriction”. This statement, which notably uses the word “should” rather than “must”, is qualified by a lengthy discussion of conditions, including a reference to the sequence producers' interest in “the first peer-reviewed published analysis of the results of the sequencing project”. This reflects the real concern of the genome centres that prepublication data access may put the scientists there at a competitive disadvantage. We understand these concerns, but we believe that the qualifying discussion cannot coexist with the principle of “without restriction”. We propose that these qualifications are simply dropped to avoid confusion among data consumers, journals and journal reviewers. The Human Genome Project has been a spectacularly successful demonstration that the “Bermuda rules” of free access without restriction do work, for everyone.
As bioinformaticians, we have an important role in this process. We reaffirm our own commitment to respecting the goals of the scientists at the genome centres, who should be consulted as part of any large-scale analysis of unpublished genome data, and included as collaborators where appropriate. It is a serious problem that these invaluable centres feel compelled to coerce such simple scientific courtesy from our community. We encourage our peers in bioinformatics to act responsibly, cooperatively and collegially, to help to assure open, unrestricted, immediate release from large community-driven data-collection projects.
Salzberg, S., Birney, E., Eddy, S. et al. Unrestricted free access works and must continue. Nature 422, 801 (2003). https://doi.org/10.1038/422801a