New NIH guidelines may have been influenced in part by the controversy over HeLa cells, derived from cancer patient Henrietta Lacks in 1951. Credit: Science Photo Library / Dr Torsten Wittmann

Scientists who work on genomics and are funded by the US National Institutes of Health (NIH) must post their data online so that others can build on the information, the agency has said in an update to its guidelines.

The change, which expands the remit of an earlier data-sharing policy, is not expected to drastically alter research practices — many genomics researchers are accustomed to sharing their data. But the latest policy, released on 27 August, gives clearer instructions for gaining the informed consent of study participants. The NIH will now require researchers to tell study participants that their data may be broadly shared for future research.

Informed consent will be required not just for genomic data, but also for cell lines or clinical specimens such as tissue samples, even when they are stripped of information that directly identifies the source. That extension, which the NIH has not previously required, is a “big step”, says Ellen Wright Clayton, a bioethicist and lawyer at Vanderbilt University in Nashville, Tennessee.

The agency has been working on the changes for several years, as new technologies have rapidly made it easier and cheaper for researchers to gather, analyse and share genomic data. Even anonymized, or ‘de-identified’, data can sometimes be traced to the individuals who provided them (ref. 1).

“I have the somewhat jaundiced expectation that, one way or another, human-subjects research data and biosamples are increasingly being used in the broadest ways,” says Hank Greely, director of the Center for Law and the Biosciences at Stanford University in California. “People should be told what they are agreeing to — warning them about this broad use is a good idea.”

The use of patients' biological information for previously undisclosed purposes has been the subject of several high-profile cases. In 2010, the Havasupai tribe of Arizona won a US$700,000 settlement against Arizona State University when blood samples originally provided for a study on diabetes were used in mental-illness research and population studies.

And in 2013, a controversy erupted over the publication of the genome of HeLa cells, a tissue-culture cell line derived from Henrietta Lacks in 1951. Lacks’s surviving relatives had not been consulted before the data were released, and NIH director Francis Collins met with them to discuss their concerns and agree on guidelines for how scientists can gain access to the data.

“The lessons have been learned that you need to get consent, and that you cannot rely even on de-identified information remaining anonymous,” says Yaniv Erlich, a computational biologist at the Whitehead Institute for Biomedical Research in Cambridge, Massachusetts, who led a study showing the possibility of re-identifying individuals from anonymous data sets (ref. 1).

Some researchers say that the NIH could have done more to investigate different types of informed consent. “We hoped the NIH would recognize dynamic consent, which allows an individual to broaden their consent in the future, or have choices [to give permission for particular research uses] along the way,” says Heather Pierce, senior director of science policy at the Association of American Medical Colleges in Washington DC. “But the NIH did acknowledge that it will be watching these new areas,” she says.

The final rules expand a 2007 policy that required sharing of data from genome-wide association studies, the search for genetic variants linked to traits and disease. Starting in January 2015, the same policy will apply to many other types of genomic data, such as information from studies of how genes are expressed, large-scale RNA sequencing (the ‘transcriptome’) and also non-human genomes. “Many researchers have already been submitting that type of data into our genomic repositories,” says Dina Paltoo, director of the genetics, health and society programme at the NIH Office of Science Policy.

The rules differ little from a draft policy released last year. Most comments on that draft supported data sharing, says Paltoo, but there was detailed argument about when data should be shared. The NIH stuck with its original view that data must be accessible by the time a study is published, and earlier in some cases. The policy excludes studies on fewer than 100 genomes. Erlich worries that this means data from research involving very rare diseases might not be shared. But Paltoo says that individual NIH institutes could extend the requirements to more specialized types of studies.

Access to sensitive data, such as those linked with medical information, will be controlled. In 2007, the NIH created a central repository for this purpose, the Database of Genotypes and Phenotypes (dbGaP), where investigators can upload sensitive data, classified according to the re-use consent given by study participants. The database now includes more than 300 studies, with 2 petabytes of information from 800,000 research subjects, says Paltoo. Researchers apply to a central committee to gain access to particular data. Over six years, more than 2,200 investigators have made more than 17,500 access requests, Paltoo and her colleagues reported on 27 August in Nature Genetics (ref. 2).

Wider changes to privacy and consent for genomics data may come in future years. In 2011, the US Office of Human Research Protections, which oversees human research ethics, proposed revisions to the ‘Common Rule’, a 23-year-old policy that governs the protection of human research subjects. One proposal was that written consent be obtained for the research use of all biological specimens, says Paltoo, who adds that this would be consistent with the NIH rules. But researchers are still waiting for the Common Rule changes to progress.