Cloud processing of DNA sequence data promises to speed up discovery of disease-linked gene variants. Credit: David Paul Morris/Bloomberg via Getty

The dream for tomorrow’s medicine is to understand the links between DNA and disease — and to tailor therapies accordingly. But scientists working to realize such ‘personalized’ or ‘precision’ medicine have a problem: how to keep genetic data and medical records secure while still enabling the massive, cloud-based analyses needed to make meaningful associations. Now, tests of an emerging form of data encryption suggest that the dilemma can be solved.

At a workshop on 16 March hosted by the University of California, San Diego (UCSD), cryptographers analysed test genetic data. Working with small data sets, and using a method known as homomorphic encryption, they could find disease-associated gene variants in about ten minutes. More-realistic tasks, such as finding a disease-linked variant in a stretch of DNA a few hundred-thousandths the size of the whole genome, still kept computers bogged down for hours, but experts in cryptography were encouraged.

“This is a promising result,” says Xiaoqian Jiang, a computer scientist at UCSD who helped to set up the workshop. “But challenges still exist in scaling it up.”

Physicians and researchers think that understanding how genes influence disease will require genetic and health data to be collected from millions of people. They have already started planning projects, such as US President Barack Obama’s Precision Medicine Initiative and Britain’s 100,000 Genomes Project. Such a massive task will probably require harnessing the processing power of networked cloud computers, but online security breaches in the past few years illustrate the dangers of entrusting huge, sensitive data sets to the cloud. Administrators at the US National Institutes of Health’s database of Genotypes and Phenotypes (dbGaP), a catalogue of genetic and medical data, are so concerned about security that they forbid users of the data from storing it on computers that are directly connected to the Internet.

Homomorphic encryption could address those fears by allowing researchers to deposit only a mathematically scrambled, or encrypted, form of data in the cloud. It involves encrypting data on a local computer, then uploading that scrambled data to the cloud. Computations on the encrypted data are performed in the cloud and an encrypted result is then sent back to a local computer, which decrypts the answer. If would-be thieves were to intercept the encrypted data at any point along the way, the underlying data would remain safe.
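The encrypt, upload, compute and decrypt round trip can be sketched with a toy version of the Paillier cryptosystem, a well-known scheme that supports only addition on encrypted values. This is an illustration under stated assumptions, not the method used by the workshop teams: the primes, the function names and the carrier data are all hypothetical, and the key is far too small to be secure.

```python
import math
import secrets

# Toy Paillier cryptosystem (additively homomorphic).
# P and Q are hypothetical demo primes, far too small for real use.
P, Q = 1789, 1861


def keygen(p=P, q=Q):
    """Return the public modulus n and the private pair (lam, mu)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse; valid for generator g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Local computer: scramble the integer m using only the public key."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # fresh randomness per ciphertext
        if math.gcd(r, n) == 1:
            break
    return (1 + m * n) * pow(r, n, n2) % n2


def cloud_sum(n, ciphertexts):
    """Cloud: add the hidden values by multiplying ciphertexts.
    No private key is needed, so the cloud never sees the plaintexts."""
    total = 1  # a trivial encryption of zero
    for c in ciphertexts:
        total = total * c % (n * n)
    return total


def decrypt(n, private_key, c):
    """Local computer: recover the result with the private key."""
    lam, mu = private_key
    u = pow(c, lam, n * n)
    return (u - 1) // n * mu % n


# Hypothetical data: 1 if a participant carries a given variant, else 0.
carriers = [1, 0, 1, 1, 0, 0, 1, 0]
n, private_key = keygen()
uploaded = [encrypt(n, bit) for bit in carriers]   # done locally
encrypted_count = cloud_sum(n, uploaded)           # done in the cloud
carrier_count = decrypt(n, private_key, encrypted_count)  # done locally
```

Here `carrier_count` equals the number of carriers in the list, even though the code standing in for the cloud handled only ciphertexts. The sketch needs Python 3.8 or later for the modular inverse via `pow`.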

“If we can show that these techniques work, then it will give increased reassurance that this high-volume data will be computed on and stored in a way that protects individual privacy,” says Lucila Ohno-Machado, a computer scientist at UCSD and a workshop organizer.

Homomorphic data encryption, first proposed in 1978, differs from other types of encryption in that it would allow the cloud to manipulate scrambled data — in essence, the cloud would never actually ‘see’ the numbers it was working with. And, unlike other encryption schemes, it would give the same result as calculations on unencrypted data.
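The claim that scrambled data can be manipulated and still yield the right answer can be checked on a small scale with textbook RSA, which happens to preserve multiplication. The sketch below is only an illustration: the tiny primes and helper names are hypothetical, unpadded RSA is not secure, and a scheme that preserves one operation is much weaker than the general-purpose schemes discussed in this article.

```python
# Textbook RSA with tiny hypothetical parameters (insecure, no padding).
p, q = 61, 53
n = p * q                            # public modulus
e = 17                               # public exponent
d = pow(e, -1, (p - 1) * (q - 1))    # private exponent (Python 3.8+)


def rsa_encrypt(m):
    return pow(m, e, n)


def rsa_decrypt(c):
    return pow(c, d, n)


a, b = 6, 7

# Multiply the two ciphertexts without ever decrypting them.
encrypted_product = rsa_encrypt(a) * rsa_encrypt(b) % n

# Decrypting gives the same answer as multiplying the plaintexts.
recovered = rsa_decrypt(encrypted_product)
```

The value of `recovered` matches `a * b`, provided the true product is smaller than the modulus `n`.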

But it remained largely a theoretical concept until 2009, when cryptographer Craig Gentry at the IBM Thomas J. Watson Research Center in Yorktown Heights, New York, proved that it was possible to carry out almost any type of computation on homomorphically encrypted data. This was done by transforming each data point into a piece of encrypted information, or ciphertext, that was larger and more complex than the original bit of data. A single bit of unencrypted data would be encrypted into a ciphertext of a few megabytes, about the size of a digital photograph. It was a breakthrough, but calculations could take 14 orders of magnitude longer than the same work on unencrypted data. Gentry had rendered the approach possible, but it remained impractical.

Since then, cryptographers have developed systems to address these issues, for instance by encrypting many pieces of data together so that they can be processed in parallel, or by encrypting real numbers directly into single ciphertexts, rather than first converting them into bits. Such innovations take less time and memory than encrypting and carrying out computations on each piece of data separately. As a result, homomorphic encryption now runs 150,000 times faster than it did in 2009, says Shai Halevi, a cryptographer at the IBM research centre. “The same calculation that took a day-and-a-half in 2012 now takes us five minutes to do,” he says. “Now is the time to ask, is this fast enough to be usable?”
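The idea of packing many pieces of data together so that one operation processes all of them can be illustrated without any encryption at all. The sketch below uses the Chinese remainder theorem to fold several small numbers into one large integer, so that a single addition or multiplication acts on every 'slot' at once. It is a plaintext-only analogy: the moduli and function names are hypothetical, and batched homomorphic schemes are commonly described as using a related trick inside the ciphertext, with details that differ from this sketch.

```python
from math import prod

# Hypothetical slot moduli: pairwise coprime (here, distinct primes).
MODULI = (101, 103, 107, 109)
BIG = prod(MODULI)


def pack(values):
    """Fold one value per slot into a single integer x such that
    x % MODULI[i] == values[i] (Chinese remainder theorem)."""
    x = 0
    for value, m in zip(values, MODULI):
        rest = BIG // m
        x += value * rest * pow(rest, -1, m)  # Python 3.8+ modular inverse
    return x % BIG


def unpack(x):
    """Read every slot back out of the packed integer."""
    return [x % m for m in MODULI]


a = pack([1, 2, 3, 4])
b = pack([5, 6, 7, 8])

# One big-integer operation updates all four slots in parallel.
slot_sums = unpack((a + b) % BIG)
slot_products = unpack((a * b) % BIG)
```

Each slot result is correct only modulo that slot's modulus, so the values here are kept small enough not to wrap around.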

At the 16 March iDASH Privacy & Security Workshop 2015 (iDASH stands for Integrating Data for Analysis, Anonymization and Sharing), five teams revealed homomorphic encryption schemes that could examine data from 400 people within about 10 minutes, and could pick out a disease-linked variant from among 311 spots at which the genome is known to vary. It took up to 30 minutes to similarly analyse small chunks of genome a little larger than the size of a typical gene, about 5,000 DNA base pairs. For larger stretches of sequence data — 100,000 base pairs, or about 0.003% of the overall genome — analysis was not always possible, or took hours, and consumed up to 100 times more memory than computing unencrypted data. Even so, cryptographers say that the results indicate major progress: “Our challenge shows this is not impossible, compared to three years ago, when people were thinking this computation was infeasible,” Jiang says.

But some data custodians remain sceptical about encryption. Steven Sherry, chief of the reference collections section at the US National Center for Biotechnology Information in Bethesda, Maryland, manages dbGaP. He says that cryptography, even if it worked, would not necessarily protect data on researchers’ computers or give them enough analytical flexibility. He instead favours restricting access to a small circle of scientists and asking them to certify that they will abide by rules and regulations on how the data can be used. “We haven’t looked at cryptographic methods, because it hasn’t been demonstrated to us that they’re both secure and useful,” Sherry says.

But some cryptographers are already deploying limited homomorphic encryption systems in biomedicine. Administrators of an HIV study and a hospital biobank in Switzerland are using such a scheme: they need to perform only a small set of relatively simple operations on the data, so do not mind that the system cannot do complex operations quickly. They are worried about the potential release of genetic data, says Jean-Pierre Hubaux at the Swiss Federal Institute of Technology in Lausanne, who is developing the system: “People are really afraid that in the near future some scandal will occur and they will be criticized for not having all the proper precautions in place.”