Main

An increasing number of investigators are prospectively collecting and storing biological specimens for genomic analysis. Investigators engaged in this research are strongly encouraged to comply with genomic data sharing policies, which have historically called for the rapid public release of all generated DNA data.1–3 Making data publicly accessible is cost-efficient and maximizes the scientific utility of genomic information. Deidentification, the removal of all personally identifying information before public release, has been the traditional means of protecting the privacy of individuals participating in genomic research. However, it has been shown that individuals can be uniquely identified on the basis of as few as 30–80 statistically independent single-nucleotide polymorphisms,4 and it is now even possible to identify an individual from pooled or aggregated DNA data.5 These findings raise concerns about the privacy of research participants and have led to the creation of controlled access, or restricted, scientific databases.6,7

Some have criticized this shift in data access policy as overly protective,8 and some projects enroll only participants who agree to full public data release.9 We have argued that all data sharing decisions involve an unavoidable trade-off between protecting privacy and advancing research and that, because individuals may vary in their judgments about this trade-off, decisions about DNA data release ought to be made by research participants during the informed consent process.10 However, a major policy concern is that giving participants control over data sharing decisions will lead to excessive anxiety about privacy and a reluctance to share data, to the detriment of research. We conducted a single-blinded, randomized controlled trial of three types of consent, each affording a different level of control over data sharing, to assess their impact on enrollment into an underlying genomic study and on participants' data sharing preferences.

MATERIALS AND METHODS

Study participants and procedures

Participants were adult (18 years or older) patients (n = 205), parents/guardians of pediatric patients (n = 103), and family members acting as matched case controls (n = 28) who were recruited to one of six ongoing genomic studies (pediatric brain cancer, pediatric brain controls, pediatric autism, adult/pediatric epilepsy, adult/pediatric liver cancer, and adult pancreatic cancer) at Baylor College of Medicine (BCM) in Houston, Texas, between January 2008 and August 2009.

Participants eligible for the randomized consent study were proficient in English and were enrolled under a waiver of consent granted by the BCM Institutional Review Board. Participants considering enrollment in one of the genomic studies were randomized to one of three experimental consent types by a centralized, web-based randomization program using permuted blocks stratified by genomic study. Genomic study Principal Investigators (PIs) who could not use the online randomization system were provided with sealed, prerandomized envelopes, each containing an assignment. Informed consent for the genomic study was obtained face to face by the genomic study PI, a research nurse, or a medical resident, using one of the three experimental consent documents. The consent process varied slightly depending on the design of the underlying genomic study; however, the overall process did not differ by randomized consent type.
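The allocation scheme described above (permuted blocks within each genomic study stratum) can be sketched in a few lines of Python. This is a minimal illustration, not the centralized web-based program the study used: the class name, the arm labels, and the block size (each consent type appearing twice per block of six) are all assumptions, since the block size is not reported here. Keeping a separate pending block per stratum is what keeps the allocation balanced within each genomic study rather than only overall.

```python
import random
from collections import defaultdict

# Hypothetical labels for the three experimental consent types.
ARMS = ("traditional", "binary", "tiered")


class StratifiedBlockRandomizer:
    """Permuted-block randomization with an independent block sequence per stratum.

    Assumption: each block contains every arm `reps_per_block` times
    (block size = len(arms) * reps_per_block); the study's block size is not reported.
    """

    def __init__(self, arms=ARMS, reps_per_block=2, seed=None):
        self.arms = tuple(arms)
        self.reps_per_block = reps_per_block
        self.rng = random.Random(seed)
        # Assignments still left in the current block, keyed by stratum (genomic study).
        self._pending = defaultdict(list)

    def _new_block(self):
        # Build a balanced block, then shuffle it so the order is unpredictable.
        block = list(self.arms) * self.reps_per_block
        self.rng.shuffle(block)
        return block

    def assign(self, stratum):
        """Return the next consent type for a participant in `stratum`."""
        if not self._pending[stratum]:
            self._pending[stratum] = self._new_block()
        return self._pending[stratum].pop()
```

For example, `StratifiedBlockRandomizer(seed=1).assign("autism")` returns one of the three labels; every complete block of six assignments within a given study then contains exactly two of each consent type, while the order inside each block remains unpredictable to the person enrolling participants.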

After providing informed consent for the genomic study, participants were debriefed by a designated research coordinator from this consent study. Those who were ineligible or declined participation in the genomic study but had seen or signed one of our experimental consent documents were debriefed by the genomic study PI or research nurse; most refusals were due to general research concerns (e.g., fear of blood draw) or lack of time. One individual reportedly refused participation in the genomic study specifically because of concerns about data sharing and was debriefed by a consent study coordinator. Debriefing took place either in a private hospital room during an inpatient stay or in a waiting or examination room during a follow-up clinic visit. Twenty-seven participants did not return for a follow-up visit and were debriefed by phone or US mail. During the debriefing, participants were given information about the consent study and the randomization process, a detailed review of the data sharing options in each experimental consent document, and an opportunity to change their data sharing choice.

Eligible participants were invited to take part in a structured follow-up interview to assess their understanding and comfort in decision making and to examine their preferences and attitudes regarding data sharing. To prevent bias, those who agreed to the interview were not shown the other consent forms or data sharing options until partway through the interview. Analysis of interview responses will be reported elsewhere. All materials and methods for this study were reviewed and approved by the BCM Institutional Review Board.

Study instruments

Three experimental consent templates were developed on the basis of a review of the informed consent literature and refined with input from an interdisciplinary panel of experts at BCM and from focus group research conducted by two of the authors (A.L.M. and A.M.G.).11 The experimental consent templates were adapted for each genomic study. All the consent documents contained specific information about the respective genomic study, including purpose, risks, benefits, compensation, and access to health records. Data sharing was explained in each consent document (see Supplemental Digital Content 1, http://links.lww.com/GIM/A187, which contains excerpted text on data sharing from each consent type). Participants were told that personally identifying information (e.g., their name) would never be released. The risk of data sharing was described as a small potential for breaches of privacy if DNA were traced back to the individual. Participants were cautioned that this risk could increase in the future. It was noted that a researcher's obligation to protect privacy and confidentiality in restricted databases offers participants an extra layer of protection. Benefits of data sharing were characterized as advancing medical research by speeding it up and by allowing other investigators to use the data to answer future research questions.

Each experimental consent document offered some combination of the following three data release options: (1) public data release (release of genetic and clinical information into both publicly accessible [open access through the internet] and restricted [accessible only to approved researchers] scientific databases), (2) restricted release (release of genetic and clinical information into restricted databases only), and (3) no release (accessible only to the genomic study PI and his or her staff) (Table 1).

Table 1 Consent form data release options

Those who signed the traditional consent agreed by default to release their genetic and clinical information into both publicly accessible and restricted scientific databases. The binary consent allowed participants to choose between full public data release and no release. Tiered consent presented all three options: participants could choose public data release, restricted release only, or no release.
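Because each consent type offers a superset of the previous one's choices, the options summarized in Table 1 can be encoded as a small lookup. The sketch below is an illustrative encoding based only on the text above; the enum, the dictionaries, and `validate_choice` are hypothetical names, not part of any system used in the study.

```python
from enum import Enum


class Release(Enum):
    PUBLIC = "public"          # public data release
    RESTRICTED = "restricted"  # restricted release only
    NONE = "none"              # no release


# Databases each option releases into; "no release" keeps data with the study PI and staff.
DATABASES = {
    Release.PUBLIC: frozenset({"public", "restricted"}),
    Release.RESTRICTED: frozenset({"restricted"}),
    Release.NONE: frozenset(),
}

# Options offered by each experimental consent type.
OPTIONS = {
    "traditional": (Release.PUBLIC,),  # public release by default; no choice offered
    "binary": (Release.PUBLIC, Release.NONE),
    "tiered": (Release.PUBLIC, Release.RESTRICTED, Release.NONE),
}


def validate_choice(consent_type: str, choice: Release) -> Release:
    """Return `choice`, or raise ValueError if `consent_type` does not offer it."""
    if choice not in OPTIONS[consent_type]:
        raise ValueError(f"{choice.value!r} release is not offered by {consent_type} consent")
    return choice
```

Only the tiered form accepts `Release.RESTRICTED`; `validate_choice("binary", Release.RESTRICTED)` raises `ValueError`.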

The primary outcomes were (a) the rate of refusal and withdrawal within each consent type and (b) the difference in data sharing choices between the three randomized groups.

Data analysis

Participant characteristics were described using frequencies for categorical variables and means or medians for continuous variables. Differences between groups were tested with χ2 tests for categorical variables and one-way analysis of variance for continuous variables. Differences in postdebriefing data release selections were examined with multinomial logistic regression, which accommodates polytomous rather than dichotomous outcomes, adjusting for potential confounders. Results are presented as odds ratios (ORs) with 95% confidence intervals (CIs) comparing restricted release or no release with public data release. Participant age and the time lapse between consent and debriefing were treated as continuous variables. All other factors in the multivariable analysis were categorical: the sociodemographic variables gender, race and ethnicity, marital status, religious affiliation, education, and income, and the participant characteristics randomized consent type and consentee relationship (adults providing consent for themselves vs. parents providing consent for their child). For all tests, a two-tailed significance level of P < 0.05 was used. All analyses were conducted using SPSS 17 (SPSS, Inc., Chicago, IL) or SAS 9.2 (SAS Institute Inc., Cary, NC).
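For readers less familiar with the method, the multinomial (baseline-category) logistic model implied by this description can be written as follows, with public data release as the reference category. The notation is ours, not the authors': $Y_i$ is participant $i$'s final selection and $\mathbf{x}_i$ the covariate vector.

$$
\log\frac{\Pr(Y_i = \text{restricted} \mid \mathbf{x}_i)}{\Pr(Y_i = \text{public} \mid \mathbf{x}_i)} = \beta_{0,R} + \mathbf{x}_i^{\top}\boldsymbol{\beta}_R,
\qquad
\log\frac{\Pr(Y_i = \text{none} \mid \mathbf{x}_i)}{\Pr(Y_i = \text{public} \mid \mathbf{x}_i)} = \beta_{0,N} + \mathbf{x}_i^{\top}\boldsymbol{\beta}_N,
$$

$$
\mathrm{OR}_{R,j} = e^{\beta_{R,j}}, \qquad \mathrm{OR}_{N,j} = e^{\beta_{N,j}}.
$$

An OR of 1 therefore means that covariate $j$ does not shift the odds of choosing that option over public release; for the continuous covariates (age and time lapse), each OR refers to a one-unit increase.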

RESULTS

Three hundred seventy-eight individuals were approached for recruitment into one of the six genomic studies; 42 were deemed ineligible or chose not to enroll and were removed from the randomization. A total of 349 experimental consent documents were randomized to 336 individual participants (Fig. 1). Most of the participants were either consenting adult patients or parents or guardians of pediatric patients. Two of the genomic studies (autism and epilepsy) also enrolled patients' family members to serve as matched case controls. Parents of pediatric patients who enrolled as matched case controls made two consent choices, one for their child who was the primary subject (i.e., parental consent) and one for themselves as a matched case control (i.e., adult/self-consent); these cases were treated as a single participant making two distinct decisions (n = 13). All participating members of the same family were randomized to the same experimental consent document (n = 18 families comprising 34 individuals making 47 distinct decisions).

Fig. 1

Study CONSORT diagram.

Thirteen participants were deemed ineligible: five had limited English proficiency; four died during the course of the study and could not be debriefed; three were lost to follow-up (one of whom had consented both on behalf of a child and as a matched case control, for a total of four distinct data release decisions lost to follow-up); and one did not select a data release option. The remaining 323 individual participants were enrolled into the consent study, and 335 distinct data sharing decisions were analyzed.

The median age of participants was 48.5 years (range: 18–86 years). Most participants were women (57.3%) and non-Hispanic white (56.1%). The majority reported being married (63.7%) and Christian (81.3%), and roughly two thirds (67.8%) reported completing at least 1 year of college (Table 2).

Table 2 Participant characteristics by randomized consent type

Consent type and data sharing decisions

All eligible participants randomized to traditional consent agreed to participate in the genomic study and, by default, to public data release. Of participants randomized to binary consent, most (84.9%) chose public data release, and the remainder (15.1%) opted out of data sharing (no release). Of those randomized to tiered consent, the majority (66.4%) agreed to public data release, fewer than a fifth (19.5%) chose restricted release, and the remainder (14.1%) chose no release.

After debriefing, participants were given an opportunity to change their data release option; the majority (67.8%) kept their original choice. Of those who changed, only three chose an option less restrictive than their original choice (i.e., changed from no release to restricted release). Those randomized to tiered consent were less likely to change (21.2%) than those randomized to binary (37.7%) or traditional consent (37.9%) (χ2 test, P = 0.01).

A majority of participants (53.1%) chose public data release as their final data sharing decision, a third (33.1%) chose restricted release, and the remainder (13.7%) chose no release (Table 3). Both the final data sharing decision and whether it differed from the original selection were significantly associated with randomized consent type (final decision χ2 test, P = 0.02; changed decision χ2 test, P = 0.01). Participants randomized to traditional consent were the most likely to choose public data release as their final decision (62.1%) and the least likely to choose no release: only 6% of them chose not to release their data at all, compared with nearly 20% of those randomized to either binary or tiered consent. Participants randomized to tiered consent were the least likely to change their data sharing decision after debriefing (21.2%, compared with 37.9% for traditional consent and 37.7% for binary consent).
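The χ2 tests reported here compare counts in a contingency table, for example three consent types by three final selections. A minimal standard-library sketch of Pearson's test of independence follows; it is not the SPSS or SAS procedure the authors ran, and the example tables in the usage note are arbitrary rather than study data (the study's counts appear in Table 3). A 3 × 3 table has (3 − 1)(3 − 1) = 4 degrees of freedom, and for even df the χ2 tail probability has a closed form, which avoids needing a statistics library.

```python
import math


def chi2_independence(table):
    """Pearson chi-square test of independence for an r x c table of counts.

    Assumes every row and column total is positive. Returns
    (statistic, degrees_of_freedom, p_value). The p-value uses the closed-form
    chi-square survival function, available only for even df, so odd df
    raises NotImplementedError.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    if df % 2:
        raise NotImplementedError("closed-form p-value requires even df")
    # For df = 2k: P(X > x) = exp(-x/2) * sum_{m=0}^{k-1} (x/2)^m / m!
    half = stat / 2
    p_value = math.exp(-half) * sum(half ** m / math.factorial(m) for m in range(df // 2))
    return stat, df, p_value
```

For instance, the arbitrary table `[[10, 20, 30], [30, 20, 10]]` gives a statistic of 20 on 2 df with P = e^(−10), while a table whose rows are proportional, such as `[[1, 2, 3], [2, 4, 6], [3, 6, 9]]`, gives a statistic of 0 and P = 1.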

Table 3 Pre- and postdebriefing data release selections by randomized consent type, genomic study, and consentee relationship

Other factors influencing data sharing decisions

Hispanic participants were significantly less likely than non-Hispanic white participants to choose public data release (restricted release: OR, 2.94; CI, 1.16–7.43; no release: OR, 3.94; CI, 1.05–1.76). Unmarried participants, including those who were divorced, widowed, separated, or never married, were more likely to choose restricted data release (OR, 2.40; CI, 1.05–5.44). When choosing between restricted and public data release, participants with some college or a college degree were also more likely to choose restricted data release (some college: OR, 3.52; CI, 1.02–12.14; college graduate: OR, 4.67; CI, 1.35–16.12) (Table 4).
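The ORs and CIs above are on the exponentiated scale of the regression coefficients. Assuming Wald-type intervals (the usual default in multinomial regression output, though the software settings are not reported), a CI is symmetric around the coefficient on the log scale, so its limits imply the coefficient and standard error. The sketch below shows both directions; the function names are hypothetical.

```python
import math

# 97.5th percentile of the standard normal distribution (two-sided 95% interval).
Z_975 = 1.959963984540054


def odds_ratio_ci(beta, se, z=Z_975):
    """Exponentiate a log-odds coefficient and its Wald CI: (OR, lower, upper)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def beta_se_from_ci(lower, upper, z=Z_975):
    """Recover the log-scale coefficient and SE implied by a reported Wald CI."""
    beta = (math.log(lower) + math.log(upper)) / 2  # midpoint on the log scale
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se
```

For example, `beta_se_from_ci(1.05, 5.44)` gives exp(beta) ≈ 2.39, which agrees with the OR of 2.40 reported above for unmarried participants up to rounding. Because the OR of a Wald interval is the geometric mean of its limits, this is also a quick consistency check for a reported OR and CI pair.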

Table 4 Multinomial logistic regression analysis of factors associated with participants' final data release selection

Genomic study was also significantly associated with final data release selection (Table 3). Participants in studies conducting pediatric research (autism, brain cancer, and brain control) were more restrictive in their final data release choices than those in studies targeting mostly adult populations (liver and pancreatic cancers) (χ2 test, P = 0.04). To determine whether these differences could be explained by consentee relationship, parental consent decisions (n = 113) were compared with adult/self-consent decisions (n = 221). Consentee relationship was significantly associated with final data release selection (χ2 test, P < 0.001). After controlling for other variables, consentee relationship remained a significant predictor: participants providing parental consent were significantly less likely to choose public data release than adults consenting for themselves (restricted release: OR, 3.56; CI, 1.57–8.08; no release: OR, 4.78; CI, 1.46–15.64) (Table 4). All participants who made decisions both for themselves (adult/self-consent) and on behalf of their child (parental consent) (n = 12) made the same data sharing choice for themselves as for their child.

Another possible explanation for the difference between genomic studies could be the amount of time that lapsed between obtaining informed consent and debriefing the study participant. Most of the participants from the autism, brain cancer, and brain control studies were debriefed immediately after the informed consent process, whereas some individuals from the pancreatic cancer and liver cancer studies were not debriefed until months later (at a subsequent postoperative visit). However, when we controlled for other factors, time lapse between consent and debrief was not found to be a significant predictor of one's final data release selection (restricted release: OR, 1.00; CI, 1.00–1.01; no release: OR, 1.00; CI, 0.99–1.01) (Table 4).

Refusal and withdrawal rates

All the genomic studies reported high enrollment rates (autism, 85.7%; brain cancer, 80.9%; brain control, 61.5%; epilepsy, 85.7%; liver cancer, 97%; pancreatic cancer, 98.1%). Variations in enrollment rates across genomic studies reflected individual recruitment methods and the populations under study rather than issues with the consent process or data sharing concerns. Only 20 individuals overall declined participation: four randomized to traditional consent, six to binary, three to tiered, and seven who declined before being randomized to a consent type. Most who declined reported general research-related concerns (e.g., fear of blood draw or lack of time). Only one participant, randomized to binary consent, specifically cited apprehension about data sharing.

DISCUSSION

When given a choice about genomic data sharing, just over half of the participants in this study chose public data release. Genomic data generated during research have traditionally been treated as a community resource and made widely available through publicly accessible scientific databases. Some human DNA data are still available to the general public (e.g., http://hapmap.ncbi.nlm.nih.gov/, http://www.1000genomes.org/page.php), but most data are now available only to approved researchers through controlled access databases (e.g., http://www.ncbi.nlm.nih.gov/gap). This policy shift was prompted by evidence of the potential vulnerability of deidentified DNA data4,5 and related concerns about participant privacy.10,12 Privacy risks have been carefully considered,13 but until now, few empirical data on stakeholder perspectives have been available to inform these policy decisions.

Studies have shown that participants are apprehensive about potential privacy invasions when participating in genomic research.11,14,15 This is the first study to examine in a randomized fashion how these concerns impact research enrollment and actual data sharing decisions. Our findings indicate that, despite privacy concerns, the majority of research participants are “information altruists”16 with respect to the public release of their genomic data.

Another important observation is that parents are less inclined to consent to the public release of their child's DNA data. Still, most are willing to share the data with the broader scientific community through controlled access databases. Additional research on the attitudes of pediatric participants toward data sharing will be important for future policy development.

Our finding that non-Hispanic white participants were less restrictive in their data sharing choices than Hispanic participants is consistent with other studies that have found that members of minority groups are less likely to participate in research and are more distrustful of study investigators.17 Educational programs that aim to increase minority participation in research should specifically address concerns about genomic data sharing.

This study has several limitations. All participants were recruited within a clinical setting. They may have formed a trusting relationship with study investigators, which could have influenced their willingness to participate and their data release choice. Additionally, all the genomic studies were conducted at BCM in Houston, Texas. These findings may not be generalizable to other participant populations or geographic regions. Although all the genomic studies used the same consent language, the consenting process varied across studies with respect to the consent process facilitator, length of time spent with the participants, and the timing of the consenting process (i.e., whether consent was obtained at a preoperative visit, just before surgery, or during the postoperative period). These are some of the factors that may help explain the variations in data sharing choices among participants from individual studies.

One primary outcome was whether offering greater control over DNA data release during the informed consent process affected the rate of refusal of, or withdrawal from, the genomic research. However, common recruitment practices may not have allowed us to assess these parameters accurately. Participants eligible for the genomic studies were often approached informally by the study PI, a clinical research nurse, or a surgical resident, and only those who expressed interest were formally recruited. All the genomic studies reported high enrollment rates, but these data may not capture individuals who were approached but not formally recruited. This informal screening typically occurred before randomization or exposure to the assigned consent document.

CONCLUSION

There are great scientific benefits to the public availability of genomic data, as expressed by Collins:18 “free and open access to genome data has had a profoundly positive effect on progress.” However, data sharing policies must balance these scientific benefits against ethical obligations to participants. This study raises the important question of whether existing policies achieve an appropriate balance or are overly restrictive. In this study, more than half of the participants consented to the public release of their DNA data, and no one declined enrollment because participation was conditioned on public data release. This suggests that mandating full public data release would maximize data availability. However, such a mandate would not be consistent with the preferences of the 47% of study participants who chose a more restrictive data sharing option.

Providing options through tiered consent respects participants' preferences without significantly impeding research. Those randomized to tiered consent were less likely to change their data release choice after debriefing, which suggests that offering options supports participant autonomy by allowing participants to make decisions consistent with their preferences. The primary purpose of some large-scale genomic studies (e.g., the 1000 Genomes Project) is to create a community resource. For those studies, where unrestricted data sharing is an essential component of the research, traditional consent may be most appropriate. However, in studies where data sharing is desired but not required, tiered consent can provide a mechanism to respect individuals' preferences without imposing an excessive burden on researchers. Participants in this study were generally accepting of broad but controlled data sharing; other groups may be less willing to share their data. Respecting the preferences of individuals within these groups will help secure the public's trust and broaden the diversity of participation in genomic research.

Additional research is needed to assess the costs and benefits of providing participants with control over data release through tiered consent; to determine whether data sharing decisions, attitudes, and preferences differ among disease, geographic, ethnic, and socioeconomic populations; and to better understand any discrepancy between participants' stated preferences, reported concerns, and actual decisions.