Policies on release of biological data should reflect reality, to the benefit of all.
Fierce controversy surrounds the issue of data sharing by authors of publications in major scientific journals. Private-sector authors are discouraged or shut out from publishing in the open literature because of demands for unlimited openness that are unacceptable to their corporate employers. We believe that ways should and can be found to maximize the amount of information that is openly shared.
In February last year, Nature1 and Science2 published the famous papers reporting the 'draft' sequence of the 3.2-billion-base-pair human genome. The Nature paper was by the publicly funded International Human Genome Sequencing Consortium; the Science paper was from Celera Genomics Corporation. Although the consortium's sequencing data were all rapidly released and deposited in GenBank as soon as available, Celera posted its data on its own website (http://www.celera.com) on publication, limiting free access to 1 million base pairs per day. Academic researchers requesting the entire sequence are required to sign a licensing agreement; those in the private sector have to negotiate a fee as a prerequisite for access.
In April this year, Science published two papers3,4 reporting the draft genome sequences for two subspecies of rice, Oryza sativa. One, from Syngenta International, has limitations on data access essentially identical to those of Celera's human genome sequence. Much controversy has resulted from these constraints, with criticisms directed both at the private companies and at Science. Some critics believe that any limits on data access violate norms of standard scientific practice rooted in openness and unrestricted access to all data underlying a publication in the open literature. Eric Lander, first author of the human genome consortium's sequence paper, for example, believes that “if you choose to publish a claim, you must release all the 'integral data' supporting it, as determined by editors and peer reviewers”5. We have a different, more nuanced view, which we believe better acknowledges the realities of science as it is currently practised and funded.
Research funded by a private company is the intellectual property of that company, whereas data and other results derived from public (federal) funding are accessible not only to academic and other non-profit institutions but also to private companies. The challenge is to suggest how the private sector can be persuaded to share more data, to the benefit of all.
One possibility that has been proposed in various contexts6,7 is to start a timer on the deposition of certain data whereby a journal or other depository agrees to restrict access to the source data underlying a paper for a specified duration; or the data could be lodged with a trustee who ensures that the data were indeed deposited at the agreed time. Careful stipulations would be required both for how long the timer is set to run and for precisely when it starts, but the idea is to permit a set duration for commercial exploitation (including filing of patent applications) on inventions derived from the data. The US Patent and Trademark Office allows up to one year before a provisional patent application is converted to a utility patent application, giving an applicant time to perform additional research towards clarifying the value of the invention while retaining the early priority date; thus one year might be a reasonable time for such a timer to run. This is similar to past practices at databases such as the Protein Data Bank.
The responsibility for implementing this scheme could rest with the journal or with a respected non-profit foundation (for example, the Institute for Scientific Information or the Federation of American Societies for Experimental Biology). In consultation with GenBank (or a relevant public repository), the journal (or foundation) could provide access to the necessary files after the timer expires. It would be very useful to know the consequences of varying timer periods, as well as how much privately held data actually contribute to the commercial viability of a company, which could be investigated in a pilot study.
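The bookkeeping a journal or trustee would need for such a timer is modest. The following is a minimal sketch in Python, not a proposed implementation: the `Deposit` class, its field names and the example dates are hypothetical, and it assumes the timer starts on a single agreed date (here, the publication date) and runs for a whole number of years.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Deposit:
    """A data set lodged with a journal or trustee under a timed embargo (hypothetical model)."""
    label: str
    timer_start: date        # assumed start point, e.g. the publication date
    embargo_years: int = 1   # one year, the duration suggested above as reasonable

    def release_date(self) -> date:
        """Date on which the trustee would open the data to everyone."""
        target_year = self.timer_start.year + self.embargo_years
        try:
            return self.timer_start.replace(year=target_year)
        except ValueError:
            # Timer started on 29 February and the target year is not a leap year.
            return date(target_year, 3, 1)

    def is_public(self, today: date) -> bool:
        """True once the timer has run out."""
        return today >= self.release_date()


# Illustrative dates only.
deposit = Deposit("example genome assembly", timer_start=date(2002, 4, 5))
print(deposit.release_date())                # 2003-04-05
print(deposit.is_public(date(2002, 12, 1)))  # False
print(deposit.is_public(date(2003, 4, 5)))   # True
```

The two stipulations discussed above, when the timer starts and how long it runs, appear here as the only two parameters, which is what would make varying them in a pilot study straightforward.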
As time goes by, data lose value as new discoveries (and new technologies for reacquisition of the same data) are made, and as science inevitably proceeds unpredictably into new areas. The 'timer' mechanism would allow a company to publish valuable data that would otherwise remain private, while offering some protection for a limited duration for it to use the data exclusively. This role might be uncomfortable for journals and trustees, so it is important to explore fully a mechanism that all sides would have confidence in. An added concern could arise if implications for national or international security (for example, potential detection signatures in a pathogen's genomic sequence) emerged while the data were held on deposit before publication.
Another possibility would be to strengthen, perhaps even to codify in law, the research exemption by which a researcher using patented technologies is usually free of patent infringement fears if the research is fundamental and exploratory. This would not significantly affect the status quo, although its degree of protection would require careful legal definition and might be difficult to negotiate. However, such a law would permit the advancement of science using new technologies — such as PCR (polymerase chain reaction) or expressed sequence tag (EST) sequences as probes — without fear that the inventor of the technology would 'reach through' to claim intellectual property rights on new discoveries and thus discourage the original research. This protection would not extend to the use of someone else's patented technology for an economic return, only for fundamental research intended for open publication and dissemination. The basic researcher would benefit from the use of new, powerful tools, and the entire scientific community, public and private, would benefit from the new knowledge gained.
There is an urgent need to find ways of giving incentives to the private sector, which now controls vast amounts of valuable data that have no obvious short-term commercial value but are of great potential research value. Most of the human genome sequence (about 98% of it) is non-coding; derestricting access to this part of it would not seem to threaten Celera's stated goal of discovering candidate drugs based on those portions of the genome that encode expressed proteins. Additionally, the sheer volume of data from high-throughput sequencing centres challenges even the most advanced and sophisticated labs to mine it for value within a reasonable time frame.
Can incentives be defined to induce Celera, Syngenta and companies like them to relax access restrictions on this part of their data? These might take the form of bilateral exchange programmes through which academic scientists visit private-sector labs, contributing their knowledge and learning to a company's science. University departments could offer opportunities, perhaps including adjunct appointments, to private-sector scientists to work in academic labs, providing legitimacy and academic status for work done by private researchers. A company could be enticed to collaborate by more direct funding of academic research that supports its ventures. For example, the Keck Graduate Institute in California has graduate students conducting company-sponsored research under confidentiality agreements in exchange for publication rights. Whatever the specifics, the benefits are in both directions: academic expertise and legitimacy would become more available to companies; private-sector research would become more accessible to academic scientists. Ideally, this becomes a 'win-win' situation for both sectors. Although such interactions have been common in other disciplines, the practice in biology is limited.
There are other precedents for successful public–private collaboration. The SNP Consortium8 and the IMAGE Consortium9 both involved partnerships that placed into the public domain valuable genomic sequence information (the first on single-base-pair variants useful for trait mapping; the second on complementary DNA sequences representing expressed human genes). The commercial partners became valued contributors, having made the assessment that the value of restricting the data was less than the expected benefits of making them freely available. This is not without hazards, given the unpredictability of some commercial ventures (such as the corporate takeover and disbanding of Research Genetics, the distributor of one set of clone resources) and the long-term sustenance needs of databases. Although most research resources should be subject to the rules of free enterprise, there might be circumstances in which publicly funded agencies step in to fill gaps.
Whatever policy or policies are promulgated by scientific communities, it is the journals that, as a practical matter, must enforce them. The trade-off has always been the prestige of publishing in Nature, Science or other high-profile journals in exchange for openness and unrestricted access to the relevant data. What expectations of the journal review process are reasonable? In an era when the source data for a publication might be the complete multi-million base-pair sequence for an organism, which is obviously beyond the capacity of any journal to print, access must be via websites and the Internet.
A matter of trust
Can reviewers be temporary 'trustees' of the data under review, being permitted total access to the data in a paper submitted by commercial scientists to make a recommendation for publication subject to maintaining the confidentiality of the data? The notion of a trustee would require careful deliberation and a pilot experiment should explore its efficacy. It could help the scientific community to confront the key question of whether some constraints on data access are preferable to not seeing the data at all. We believe they are. Is the academic scientific community willing to forgo the principle of universal free access? If not, then an increasing fraction of the 60% or so of genomics research that is conducted in the private sector will remain unavailable to academic and government scientists. In our view, that is too high a price to pay.
Our case is reinforced by the fact that most genomics companies do not publish their data. The many companies sequencing complementary DNAs and identifying single-nucleotide polymorphisms, for example, have information that would be immensely valuable to academic researchers if it were publicly available. But the companies patent genes as they are characterized and sell access to their databases under agreements that protect data as trade secrets. That is, of course, their right. But we need to create incentives for them to publish data within proprietary constraints, rather than clamouring for policies that push them towards non-disclosure.
The steady progress of science is founded on the tradition that individual scientists assemble knowledge 'brick-by-brick'. We believe that full and unrestricted access to fundamental research data should remain a guiding star of science because centuries of experience suggest that it is the best way to achieve progress and realize science's many benefits for everyone. However, we must also accept current realities.
Science has never been the exclusive province of the academic world. Yet today, the proportion of high-quality science in the private sector (the invention of PCR technology and the development of cre-lox recombinase gene knockout technology, to cite just two examples) is impressive as never before. The potential for productive collaboration has never been greater. We should not bemoan this development but welcome it. Private-sector science has its legitimate interests. The burden of argument is on the academic sector to attract and justify greater openness on the part of private-sector science by stating clearly what benefits this will bring to companies. Above all, it requires openness to new approaches, bereft of fundamentalism, regarding access to data that governments did not fund and cannot claim to own.
We thank many colleagues for helpful comments, and in particular Robert Cook-Deegan for suggestions on earlier drafts. The views expressed here are those of the authors and do not reflect policy of either the US Department of Energy or the US Government.
About this article
The Pharmacogenomics Journal (2002)