Offering new consent options to study participants will accelerate the release of valuable and varied clinically annotated genomic data.
Few would dispute that greater availability of clinically annotated human genome data speeds up translational research. However, the release of genomic data, even stripped of direct identifiers, is not without risk to patient privacy. Over the past decade, a series of publications has demonstrated that it is possible, using publicly accessible data, to re-identify the individual to whom the de-identified genomic data belongs.
This realization prompted the development of controlled-access databases, which house de-identified, individual-level genomic and phenotypic data, and the appointment of data-access committees to review requests for access to them. These committees grant access only to those researchers whose stated aims comply with the consent forms signed by each individual donor. Examples of controlled-access databases include the database of Genotypes and Phenotypes (dbGaP), which is the mandated repository for all human genetic studies funded by the US National Institutes of Health (NIH), and the European Genome-phenome Archive (EGA), among others.
Committee oversight of these databases undoubtedly reduces the likelihood that someone with nefarious aims can obtain sensitive data, but these controls are not without trade-offs. The process of submitting genomic and phenotypic data to dbGaP is so complex that it can take weeks to months to complete, depending on the backlog and number of back-and-forth discussions required to satisfy data-consent and curation requirements. Such delays complicate the process of submitting papers that are based on the genomic and phenotypic data to peer-reviewed journals. Many journals, including this one, require that all data that form the basis for the conclusions of a paper be made accessible to readers no later than at the time of publication. Therefore, delays at the dbGaP submission stage can translate into delays in publication.
Controlled access may also impede the actual use of the data. Some have suggested that when related data sets are compared, those in controlled-access databases are used less frequently than their counterparts in open-access venues (Science 339, 275–276, 2013).
These trade-offs might be worthwhile when they are the only way to protect the privacy preferences of study participants. However, a recent study from the Baylor College of Medicine indicates that controlled access is not actually the preference of all study participants (Sci. Data 3, 160010, 2016). Among a cohort of people with cancer in Texas, the majority consented to full open release of their tumor and normal tissue whole-genome sequencing data; the only restriction was that data users could not attempt to re-identify the participants. A subset of consenting patients was then educated about the risks and benefits of such open-data sharing and tested for comprehension of these risks and benefits; a majority of these patients reaffirmed open-access consent. Annotated whole-genome and/or exome sequences from a subset of these individuals were released in February.
Of course, one must be careful in generalizing the findings of this study to other groups of individuals. Consent preferences presumably vary widely with health and socioeconomic status, geography and culture. But other organizations are also experimenting with new consent procedures and access options. Although it reports phenotypic, not genotypic, data from healthy individuals and those with Parkinson's disease, the Sage Bionetworks mPower Parkinson's disease study enables participants to choose—in a dynamic way with a new electronic interface—their personal data-sharing preferences. Instead of applying to a data-access committee, data requestors who satisfactorily answer a short list of logical questions will be designated as 'qualified researchers' and granted access to the data. As outlined in a recent Nature Biotechnology Commentary by Sage Bionetworks–affiliated authors, the goals of this process are to increase participants' control over their own data (they can change their consent preferences in a dynamic manner later) and to lower the barriers to data access by qualified researchers (Nat. Biotech., http://dx.doi.org/10.1038/nbt.3516, 2016).
Less-direct benefits of faster and wider data access that are enabled by new consent options should not be ignored. Any approach that accelerates wider availability of more diverse clinically annotated genomic samples will hasten the development of analytical methods capable of extracting diagnostic and therapeutic insights from samples that are representative of the full spectrum of clinical scenarios. Much of the currently available clinically annotated genotypic data is collected by consortia that, with meritorious intentions, impose strict requirements on the quality and features of the samples in their databases. For example, The Cancer Genome Atlas (TCGA) accepts only samples obtained during resection (not biopsy) of a primary tumor (not a metastasis) from a patient who did not receive any neoadjuvant treatment. Moreover, the sample must be composed primarily of tumor cells (at least 60% of the cell nuclei must be from tumor cells). So TCGA data may not be useful to a researcher who is seeking to train, validate and standardize algorithms that analyze immune infiltrates in metastases from patients treated with immune-checkpoint inhibitors. However, these and other varied samples will probably be encountered during the application of genomic technology to personalized medicine in 'real-time' clinical scenarios. Thus, ready availability of validated methods that are capable of analyzing them will be essential.
Patient awareness of the existence and promise of genomic technology is rapidly increasing. So, too, should patients' awareness of the risks and benefits of different data-access options.