A little-noticed proposal promises to have a huge impact on how science is done in the ‘big data’ era. In September, the US National Institutes of Health (NIH) released draft guidelines on the sharing of genomic data. The guidelines, which have been in the works for five years, are a necessary and valuable update to the agency’s stance on how researchers who receive its funds must share data produced by projects that use array-based and high-throughput technologies. They cover a huge swathe of research, including sequencing human and non-human genomes, genes and gene variants, as well as transcriptomic, epigenomic and gene-expression data.
The issues related to collecting and sharing such data are complex, and the guidelines touch on most of the controversial topics in large-scale biology research today.
One of the key issues is when to share. The draft policy says that researchers must have shared their data by the time those data are published in a formal manuscript. However, there are earlier release deadlines for some data types, such as raw sequence data from non-human organisms and the initial analysis of some human sequence data, both of which must be shared within six months of submission to an approved repository. This is a thorny issue, and the NIH cannot please everybody. Some researchers favour the early release of more data, whereas others fear that releasing data ahead of publication will leave them vulnerable to being scooped.
Another major issue is how to protect the identity of those whose data are shared, especially as it is now clear that it is possible to identify people from anonymous data (see Nature http://doi.org/px4; 2013). The guidelines say that researchers should tell study participants that their data “may be shared broadly for future research purposes”, and let them know whether the data will be shared through an open- or controlled-access mechanism. They ask researchers to gain explicit consent from patients who agree to share their data through open-access mechanisms. And, importantly, they set a new bar for informed consent on de-identified materials, including cell lines and clinical specimens. Research on such materials has historically been exempt from informed-consent requirements, but the guidelines ask researchers to obtain consent for future research on these materials, too.
This is a potentially major step, and one that this publication supports (see Nature 486, 293; 2012). It is true that some researchers who have relied on clinical specimens will see it as an impediment to valuable research. But similarly, some advocates of more transparent informed-consent rules will not like the fact that the guidelines allow researchers to opt out of this requirement if they give “compelling scientific reasons” for so doing.
A third aspect relates to how long the data should be shared for. Researchers who rely on controlled-access data sets often complain about periodically having to renew their requests for access. The guidelines maintain this standard, offering access to such data for one year at a time. This is unlikely to please those who have argued that legitimate scientists should be able to access larger tranches of data and for longer periods of time — although the NIH has responded that scientists who take this position are sometimes not aware of the restrictions on all of the data sets that they plan to use (see Nature 497, 172–174; 2013).
Once finalized, the regulations will become part of a patchwork of international research regulation on the sharing of genomic data. The United Kingdom, for instance, is still deciding how much information from its 100K Genome Project will be released and whether researchers will be able to access both sequencing results and the relevant personal health records. At the same time, US open-genomics evangelist George Church is expanding his Personal Genome Project to Canada and Europe, raising questions such as whether the project will be able to access records from centralized health systems (see go.nature.com/izmgpo). By contrast, informed-consent regulations in other parts of the world are still being developed, leaving a question mark over whether the United States will become an easier place for genomics researchers to work than elsewhere.
In that context, the US proposals will have a major impact on the work of Nature’s readers. Yet, according to the NIH’s Office of Science Policy, as of 7 November, just 18 comments had been received on the guidelines. That is a poor response to such an important issue. The policy will affect many more scientists than that, and Nature urges them to submit their responses to the proposals before the deadline of 20 November.