Nature | Editorial

The ups and downs of data sharing in science

Pooling clinical details helps doctors to diagnose rare diseases — but more sharing is needed.

When doctors in Ottawa saw a child with an unusual developmental disorder last year, they were stumped. Their patient had an abnormally small head and face and had been slow to develop. They sequenced the child’s genome hoping to find a genetic explanation, but came up with too many possible candidate genes to pinpoint a likely culprit. This still happens a lot in medicine: people with rare problems go undiagnosed. And that’s one reason behind a big push in science in recent years — the pooling and sharing of clinically relevant information.

In the Ottawa case, the doctors got lucky. They were able to search a database that contained information about other patients with undiagnosed diseases, and when they did so they found a second person with similar symptoms — and an identical mutation in one gene, EFTUD2. The finding allowed the Ottawa doctors to diagnose their patient with a disease called mandibulofacial dysostosis with microcephaly, and to begin to understand why mutations in EFTUD2 cause the disease’s symptoms.

That’s the upside of the new era of data sharing. But there is a possible downside too: invasion of privacy. Massive genetic studies in countries such as the United States, Qatar, Saudi Arabia and Brazil are collecting genetic data on millions of people, so there is a chance that a person’s identity could be dragged from those data, especially if they are linked to clinical information, such as medical history. The risk is that someone who volunteers their DNA could see their medical problems opened to public scrutiny.

This is a legitimate concern for many researchers, and is one reason why data sharing is easier said than done. Others include the lingering sense of ownership, and the career benefits offered to those who have privileged access. Those concerns relate to the standard model of data sharing, in which different groups of scientists deposit their results into centralized databases. This model has had some success, but researchers have already encountered problems, such as how to grant and control access to the pooled information.

Pooling it in the first place becomes more difficult as the data sets get larger and the underlying techniques more varied. Imagine the difficulty of finding a specific book by gathering all the contents of a dozen different national libraries and then devising a way to integrate the numerous ways in which they are filed, tracked, recorded and made available. It would be much easier to ask each library whether it holds that book. What if data sharing in science could go the same way?

The diagnosis of the Ottawa child shows that it can. The doctors tapped into a system that is part of the Matchmaker Exchange, which allows researchers to query multiple databases of information on patients with undiagnosed rare diseases. A doctor can feed the system information about a patient’s symptoms and genetic make-up, and then ask it whether other people have them too. (Normally, it’s hard for doctors to find other patients with similar rare diseases; often they learn about such cases by word of mouth.)

The Matchmaker Exchange exemplifies a subtle shift in how researchers think about data sharing — and one that more scientists should engage with. It was created by the Global Alliance for Genomics and Health, a 3-year-old organization with more than 700 members from 70 countries that aims to help researchers, doctors and patients to make scientific progress by sharing data (see Global Alliance for Genomics and Health Science 352, 1278–1280; 2016).

The alliance is creating technological tools that allow researchers to find out where data that are relevant to their patients are held around the world. It aims to make data not just shareable but discoverable, too. Doing this allows those who produce the data to keep more control of the information. It also streamlines searches. For example, researchers looking for a diagnosis want to know the symptoms that other doctors have seen in people with particular genetic traits. Thus they just want to know who might have seen these mutations and what symptoms might have been observed in patients who have them; they don’t want to comb through all the existing databases of genetic information themselves.

Of course, there are still many instances in which accumulating and sharing large amounts of data, on particular genetic traits for example, is essential and valuable. The gene-testing company Myriad Genetics is locked in a tussle with doctors and patients who want it to open up its massive database of information on variations in the BRCA1 and BRCA2 genes, which are linked to a higher risk of breast and ovarian cancer. (Another alliance project, the BRCA Exchange, seeks to provide easily searchable interpretations of BRCA variants that have been shared by groups outside Myriad.)

But in other cases, data access works best, for both sides, when the requests for information are targeted at specific traits. And as the technology to permit that improves, so will smart sharing.

Journal name: Nature
Volume: 534
Pages: 435–436
DOI: 10.1038/534435b
