Peter Border asks how we can protect our personal genomic data while making them available for research.
The $1,000 Genome: The Revolution in DNA Sequencing and the New Era of Personalized Medicine
The Human Genome Project cost US$3 billion and took 13 years. Today, sequencing machines can churn through a whole human genome in days for a few thousand dollars, making personal genomics increasingly affordable. Two books, Kevin Davies's The $1,000 Genome and Misha Angrist's Here is a Human Being, chart the growth of personal genomics and examine its implications.
Davies, a biomedical journalist and founding editor of Nature Genetics, focuses on the scientific advances that are enabling more of us to map our genes. Angrist, a geneticist at Duke University, North Carolina, examines the personal and political consequences. Both agree that more scientific work is needed to improve the usefulness of genome data for health care, and that this will require sharing data with researchers. Neither says much about how this might be achieved in practice, or how people's concerns about genomic privacy and security can be overcome.
Davies focuses on the developments that replaced automated versions of Sanger sequencing with second-generation, high-throughput machines. He explains the science that underpins the massively parallel, miniaturized approaches used and gives a glimpse of future developments. He acquaints us with DNA microscopes, nanopores and ion sensors, techniques that are vying to form the basis of third-generation sequencers that might deliver whole genomes in a few minutes for a few hundred dollars.
Angrist is a participant in the Personal Genome Project, an ambitious plan to sequence human genomes and make them freely available on the Internet. The project has two underlying principles. First, genome information is most useful when linked to phenotypic information about an individual, such as medical records or family history. Second, it is misleading and unrealistic to guarantee participants' genomic privacy. As Angrist points out, DNA is the ultimate identifier, particularly when it is linked to detailed personal information.
Attempts to sanitize genome data are fraught with difficulty. Angrist discusses James Watson's genome, one of the first to be made public. Watson stipulated that he did not want to know whether he was at increased risk of Alzheimer's disease, so he asked for the sequence data for the relevant gene, APOE, to be removed before publication. But this proved futile; researchers showed that his ApoE4 status could easily be deduced from the sequence on either side of that gene.
Participants in the Personal Genome Project were given the opportunity to leave out some of their genome data, but most did not. They took part with the expectation that their genomes would become public currency and were warned of the implications. For instance, genetic information might be used to infer paternity, to adversely affect employment or insurance situations, or even to make synthetic DNA that could be planted at a crime scene. Angrist describes how he agonized over going public with his own genetic and medical record, including a family history of breast cancer that might raise future concerns for his two daughters.
Each book discusses the growth of 'spit-kit' companies that offer genome analysis through direct-to-consumer testing. For a few hundred dollars, companies such as 23andMe, Navigenics and Pathway Genomics will analyse a sample of your DNA and provide a report. They also allow you to compare your genome against a maintained online database of sequences and associated traits, from the trivial to the potentially life-changing.
The debate continues as to whether consumer genetic tests should be more closely regulated, with questions about the validity of the tests and how they should be offered. Are the links between the genetic markers and the associated traits robust enough to allow reliable probabilistic assessments? Can these lead to useful health interventions or lifestyle changes? Are consumers or their physicians sufficiently informed to interpret genome information in a meaningful way? Neither book draws hard and fast conclusions.
Everyone agrees that the coming deluge of genome data will need to be linked to information about an individual's health, environment and family background. However, the use of genome data in research raises privacy and confidentiality issues. The Personal Genome Project model that Angrist describes represents one end of a spectrum. Its participants are advocates of biomedical research, actively involved in the field and able to understand the possible consequences of having their data posted online.
Direct-to-consumer testing is nearer the other end of the spectrum. Each consumer's genome information sits on a secure server maintained by the testing company, and it is up to each person how they share it. The popularity of this approach suggests that people trust the companies to hold their information and like being in control of who can see it. The downside is that the information is less accessible to researchers.
The challenge will be to reconcile people's concerns about genomic privacy and security with the need to give researchers and clinicians access to the data. If personal genomics is offered largely through consumer testing, the onus will be on the companies to engage their customers in research projects that improve understanding of how our genomes influence health. Some have started to do this. For instance, 23andWe, the research arm of 23andMe, recruits customers into research projects on pharmacogenomics, Parkinson's disease, sarcoma and aspects of pregnancy.
Alternatively, national or regional health-care systems concerned with disease prevention may offer genomic tests on a more systematic basis. A possible model might be the UK Biobank, which has recruited more than 500,000 people in a study to see how health is affected by lifestyle, environment and genes. Participants divulge medical and lifestyle information and donate biological samples from which researchers can generate gene sequences. The success of the project to date is largely a result of creating a robust, independent framework for ethics and governance, recruiting participants through trusted intermediaries (their physicians) and sharing the data with researchers in a form that hides the identity of individuals.
No amount of data will be useful if you can't interpret what they mean. We are reaching the point at which the cost of interpreting genome information will exceed the cost of generating it, so the challenge ahead will be to make more sense of the data we already have. We will also have to answer the question of whether genome data are personal, in that they are paid for and controlled by individuals, or whether such data are medical, being funded by and accessible to health-care systems.