Unmasked: can data from genetic databases be truly anonymous?

Questions have been raised about how confidential genetic databases used for health care and research can really be.

A letter in Science by Russ Altman and colleagues at Stanford University shows that as few as 75 single-nucleotide polymorphisms (SNPs) could identify an anonymous subject (Lin, Z. et al. Science 305, 183 (2004)).
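A back-of-the-envelope calculation suggests why so few SNPs can be enough. The sketch below is an illustration only and is not the analysis used by Lin et al.: it assumes statistically independent biallelic SNPs in Hardy-Weinberg equilibrium that all share one minor-allele frequency, and the function names and the population figure are hypothetical.

```python
def genotype_match_probability(maf: float) -> float:
    """Chance that two unrelated people share a genotype at one biallelic SNP.

    Assumes Hardy-Weinberg genotype frequencies (p^2, 2pq, q^2).
    """
    p, q = maf, 1.0 - maf
    genotype_freqs = (p * p, 2.0 * p * q, q * q)
    # Two people match if both carry the same genotype, whichever one it is.
    return sum(f * f for f in genotype_freqs)


def profile_match_probability(n_snps: int, maf: float = 0.5) -> float:
    """Chance that two unrelated people match at every one of n_snps SNPs.

    Assumes the SNPs are independent (an idealisation; linked SNPs are not).
    """
    return genotype_match_probability(maf) ** n_snps


def expected_coincidental_matches(n_snps: int, population: int,
                                  maf: float = 0.5) -> float:
    """Expected number of other people in `population` sharing one profile."""
    return population * profile_match_probability(n_snps, maf)


if __name__ == "__main__":
    # Hypothetical population size, chosen only to give a sense of scale.
    print(expected_coincidental_matches(n_snps=75, population=6_000_000_000))
```

Under these assumptions the per-SNP match probability at a minor-allele frequency of 0.5 is 0.375, so the chance of a coincidental match shrinks geometrically with each SNP added. Real SNPs are neither independent nor equally informative, so the figure is an idealised bound rather than an estimate.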

“We looked into the literature, and we're convinced it is not technically feasible to protect privacy,” says Altman. Efforts such as creating an 'Enigma Code' by randomly changing a small percentage of SNPs for each subject in the database would not protect privacy. Other technical methods that should in theory increase confidentiality — including disclosure control methods, such as data suppression and adding noise, or disregarding exact genomic locations of SNPs — would also fail in practice.
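The weakness of random perturbation can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the authors' method: genotypes are simulated rather than real, SNPs are independent with a minor-allele frequency of 0.5, the 10% perturbation rate is arbitrary, and the attacker is assumed to hold an unperturbed reference genotype for each subject. All function names are hypothetical.

```python
import random


def random_profile(rng: random.Random, n_snps: int) -> list[int]:
    """Simulated genotypes coded 0/1/2, drawn with weights 1:2:1."""
    return [rng.choices((0, 1, 2), weights=(1, 2, 1))[0] for _ in range(n_snps)]


def perturb(rng: random.Random, profile: list[int], fraction: float) -> list[int]:
    """Return a copy with a fixed fraction of SNPs changed to another genotype."""
    noisy = list(profile)
    n_changed = int(len(profile) * fraction)
    for i in rng.sample(range(len(profile)), n_changed):
        noisy[i] = rng.choice([g for g in (0, 1, 2) if g != noisy[i]])
    return noisy


def best_match(query: list[int], database: list[list[int]]) -> int:
    """Index of the database profile agreeing with the query at most SNPs."""
    def agreement(candidate: list[int]) -> int:
        return sum(a == b for a, b in zip(query, candidate))
    return max(range(len(database)), key=lambda i: agreement(database[i]))


def reidentification_rate(n_subjects: int = 200, n_snps: int = 75,
                          fraction: float = 0.1, seed: int = 0) -> float:
    """Share of perturbed records matched back to the right subject."""
    rng = random.Random(seed)
    reference = [random_profile(rng, n_snps) for _ in range(n_subjects)]
    released = [perturb(rng, profile, fraction) for profile in reference]
    hits = sum(best_match(noisy, reference) == i
               for i, noisy in enumerate(released))
    return hits / n_subjects


if __name__ == "__main__":
    print(reidentification_rate())
```

In this toy setting a perturbed record still agrees with its own source at about 90% of SNPs, whereas unrelated simulated profiles agree at well under half, so a simple nearest-match search would be expected to recover almost every subject. Perturbing enough SNPs to close that gap would also degrade the data for research.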

“We came to the conclusion that we need a social attempt to control this as there won't be a technical solution,” says Altman.

The authors call for strict guidelines for genetic databases, backed by legal protections and penalties should these safeguards fail. Their suggestions include tight monitoring of users who access research databases, and the addition of specific wording on genetic information to rules governing health-oriented data. One such rule is the United States' Health Insurance Portability and Accountability Act (HIPAA), which forbids the sharing of identifiable data without patient consent but currently does not single out pharmacogenomic data.

Isolating genetics from other medical information, however, runs counter to some other recommendations, including those from the Nuffield Council on Bioethics report entitled Pharmacogenetics: Ethical Issues.

Klaus Lindpaintner, head of medical genomics at Roche and a member of the working committee for the Nuffield Council on Bioethics report, thinks that genetic tests are not fundamentally different from other forms of medical data. “There's a lot of good to be done with genetic tests, and putting it in a different category creates a lot of hurdles for medical science,” says Lindpaintner. “This approach will do more harm than good.” Lindpaintner advocates establishing broad antidiscrimination laws that protect people who are ill or who carry a genetic predisposition to disease, perhaps following the model in which people with HIV receive protection under the Americans with Disabilities Act.

Mildred Cho, associate director of the Stanford Center for Biomedical Ethics at Stanford University, California, does, however, believe there is a difference. “Genetic information is much more searchable, it can serve as a link to medical information, and it is easily computerized,” says Cho. “It is harder to search medical text than DNA sequence information.”

“Lots of information in medical records could be used to identify individuals,” says Morris Foster, associate professor of anthropology at the University of Oklahoma, United States. “We think genetics ID us and more powerfully [than other means]; this is perception. A social security number could be more harmful to you [economically] than your genotype.”

But researchers running databases cannot be sued on the basis of a perception, says Bartha Knoppers, professor of law at the Center for Public Law Research at the University of Montreal, Canada. Database developers can protect themselves, at least against frivolous lawsuits, by taking reasonable precautions and putting comprehensive privacy-protection guidelines into practice. Knoppers believes one step in protecting researchers is to make sure there are data-protection officers with training in science and ethics on the front lines of every database “to protect data and make decisions so researchers don't have to.”