Main

Recent technological advances have decreased the expense and increased the feasibility of genome-wide association studies (GWAS), and still more comprehensive genomic investigation, in the form of whole-exome research and full genome re-sequencing, is on the horizon. Because the contribution of individual gene variants to common diseases tends to be small, and because more definitive mutations tend to be quite rare, these forms of research require large sample sizes—in some cases, tens of thousands of participants.1,2 Sharing study data within the research community is an attractive solution to the problem of amassing sufficient datasets; it also promises to increase research efficiencies, maximizing the utility of existing datasets while minimizing participant burden. These benefits have informed policies of the National Institutes of Health (NIH) aimed at promoting data sharing.3,4

However, making such data available to the research community generates tension between two important goods: advancing scientific goals and protecting the privacy interests of study participants.5–9 Because every person's DNA is unique, the traditional means of safeguarding research participants' privacy (de-identification of study data and biospecimens) does not guarantee protection.10–14 In addition, trade-offs exist between de-identification and other possible participant concerns, such as the ability to receive individual research findings or the ability to withdraw from research participation.9,15,16

Numerous previous studies have characterized potential participants' views about willingness to participate in biobanks and related forms of population-based genomic research and how informed consent ought to be handled.17–33 There are also some published reports regarding participants' views about research access to medical record data.34–39 However, relatively little is known about participants' and the general public's attitudes and perceptions regarding newer data-sharing mechanisms, such as the Federal database of Genotypes and Phenotypes (dbGaP), designed to make large amounts of genotypic and phenotypic information about individual participants available to any qualified researcher. McGuire et al.40 have described research participants' preferences regarding informed consent for public release of data, whereas Kaufman et al.41,42 have investigated public opinions regarding a large prospective genetic cohort study being contemplated by the National Human Genome Research Institute. Lemke et al.43 recently reported the results of focus groups with biorepository participants and members of the general public, in which they found “varying views” with respect to data sharing. This study was designed to explore the perceptions, beliefs, and attitudes of research participants and possible future participants regarding GWAS and repository-based research. In this article, we report study findings with respect to participants' views about data sharing.

The study was performed as part of the electronic Medical Records and Genomics (eMERGE) Network, a research consortium funded by the National Human Genome Research Institute and the National Institute of General Medical Sciences to explore the feasibility of using electronic medical record (EMR) data to derive reliable phenotypic data for use in GWAS. Our project, a partnership between the Group Health Research Institute, the University of Washington, and the Fred Hutchinson Cancer Research Center, is using an existing dataset from the Adult Changes in Thought (ACT) Study to perform proof-of-concept GWAS of dementia, carotid artery atherosclerotic disease, and adverse events associated with statin use. The project also includes an aim specifically targeted at understanding the ethical and social implications of such research, with the ultimate goal of informing policy development.

The ACT Study is a cohort study of aging and dementia and the successor to a model Alzheimer's Disease Patient Registry funded since 1986 by the National Institute on Aging.44 The cohort study began in 1994 with the enrollment of a randomly selected population of 2,581 persons over age 65 who were known not to have dementia at the time of enrollment. The project has been focused on detection of markers and risk factors for Alzheimer disease and related dementias, as well as age-related cognitive decline. Related studies address the relationship of mild cognitive impairment and insulin resistance45 and population-based pharmacoepidemiologic neuropathology.46 Because of ongoing replacement sampling, approximately 2,000 living members are currently enrolled; another 2,000 study participants have died after enrolling in the study. Participants are followed over time, completing periodic study interviews and cognitive tests at study visits every other year. In addition, a rich array of clinical and pharmacy data about each participant is available through the Group Health EMR and other electronic data systems.

METHODS

Between March and August 2008, we conducted a series of 10 focus group discussions at Group Health Cooperative, a large health maintenance organization based in the Seattle metropolitan area. Two separate sessions were held with representatives of each of 5 selected populations: (a) current research participants in the ACT Study, (b) individuals with decision-making authority on behalf of incapacitated ACT participants, (c) Group Health members aged 18–34 years, (d) Group Health members aged 35–50 years, and (e) Group Health members aged >50 years who were not in the ACT Study. Because the composition of any given focus group can affect group dynamics in unpredictable ways, we held two sessions within each population (e.g., Group A participants were in either Session A1 or Session A2). To be eligible to participate in any of the sessions, individuals needed to be able to communicate in English and attend in person. For Groups A, C, D, and E, current enrollment in Group Health was also required. To be eligible for Group B participation, individuals had to recall having given consent on behalf of the ACT Study participant under their care.

Study design

The ACT Study participants (Group A) were included in this investigation because this is the study group being used in the eMERGE project, and we wanted to understand ACT participants' thoughts and questions about GWAS and data sharing for use in future communications. In addition, ACT participants represent individuals who have enrolled in a long-term study that includes a genetic component, rather than general members of the public who may or may not be willing to participate in such research.

Our study setting also afforded an opportunity to explore the perceptions of surrogate decision-makers (Group B) with respect to sharing study data. The ACT Study population includes many participants who have experienced cognitive decline since enrollment or died while being followed for the study. When an ACT participant has been diagnosed with dementia, their continued study participation is authorized (or not) by a legally authorized representative or surrogate. Participants in Group B either held current decision-making authority for a living ACT participant or had previously been responsible for a participant who was deceased at the time of the focus group session.

Inclusion of the three age-stratified groups (Groups C, D, and E) was designed to address potential concerns about the generalizability of our findings. In addition, we wanted to understand whether differences in beliefs, attitudes, and perceptions about data sharing may be correlated with age. Prospective observational studies with young adults represent a valuable research resource for high-throughput genomic investigations, but little is known about this group's attitudes toward such research. In particular, controversy exists over the question of whether younger adults' adoption of web-based communications and social networking tools has resulted in a lower level of concern regarding personal privacy.47,48 We were unable to find peer-reviewed reports that considered research participation and wide data sharing in this light.

Focus groups enable researchers to observe how opinions about the issues under study coalesce or diverge within a relatively homogenous group.49 These guided discussions are an effective and time-efficient means of gathering data for the purposes of policy development and public education, particularly when questions of acceptability are salient and the subject under investigation is complex.50,51 Importantly, for this study, focus groups can elicit information from people who may be intimidated by or unwilling to participate in interviews, who have trouble responding to written surveys, who feel they “have nothing to say,” or who may not believe they have sufficient subject-area expertise to share their thoughts about the topic of interest. This method is also well suited to gathering potentially critical feedback, because individuals may feel more comfortable sharing negative comments when they are part of a larger group.52 All plans and study instruments for the focus groups were approved by the Group Health Human Subjects Review Committee and were developed in accordance with accepted methods for this type of research.53,54 Written informed consent was obtained from all focus group participants.

Focus group pilot

To test the planned recruitment approach and refine the draft discussion guide, a pilot focus group was convened in March 2008 with five Group Health members >50 years of age. Light refreshments and a participation incentive of $50 were provided, and participants received paid parking or taxi service. The session lasted 2 hours, followed by 1 hour of debriefing and discussion with focus group participants. Substantive revisions were made to the focus group guide based on the trial discussion and on participants' feedback. Changes included starting with a few open-ended questions to assess the group's familiarity with basic genetic concepts and health research more generally; providing education on genetics, GWAS, and informed consent as needed; sharpening the hypothetical examples posed for discussion; and reordering the discussion topics to promote participants' comprehension and facilitate the flow of conversation more effectively.

Recruitment

Prospective participants in each of the five population groups were randomly identified using Group Health automated records. Before recruitment, ACT Study staff screened the list of candidates for Group A and removed those who had experienced cognitive decline, would have difficulty traveling to downtown Seattle, or were otherwise inappropriate to contact in the study timeframe (e.g., current hospitalization, recent death of a spouse). The initial recruitment contact was a letter that described the study, explained what would be involved in participating, and told potential participants to expect a follow-up call inviting them to participate. Up to three attempts were made to contact candidates by telephone to ascertain their willingness to take part. Those who agreed to participate were then contacted by telephone or e-mail to schedule the focus group sessions and provide logistical information. A packet of written materials, including the consent form, study description, and directions to the Group Health Research Institute, was mailed to all participants before their scheduled session. Participants were offered the same payments and reimbursements as for the pilot session.

A total of 969 letters were mailed to prospective participants. We were unable to contact 293 of these individuals by telephone. Of the 676 who were successfully contacted, 124 (18%) were ineligible. Ineligible candidates were those who had disenrolled from Group Health (23% of ineligibles), were mentally or physically unable to participate (18%), had moved out of the area (15%), had language barriers (7%), had died (5%), or—for Group B participants—did not recall having given consent for the ACT Study (18%). Another 14% were classified as “other.” Of those who were contacted and eligible to participate, 355 (64%) declined and 197 (36%) agreed to participate. Reasons reported for declining participation were time/too busy (39%), lack of interest (24%), location inconvenient (6%), timing inconvenient (5%), and caring for a sick family member (1%). The remaining 25% declined to state a reason. Coordinating the schedules of those who wished to be in the study led to a total of 79 participants (14% of contacted eligible candidates) being recruited into the focus groups.
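The recruitment figures above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch in Python, not part of the study's methods: the counts are taken from the paragraph above, while the function and variable names (`pct`, `recruitment_funnel`) are hypothetical and introduced only for illustration.

```python
def pct(part: int, whole: int) -> int:
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


def recruitment_funnel(letters_mailed: int, not_reached: int, ineligible: int,
                       declined: int, agreed: int, enrolled: int) -> dict:
    """Derive the recruitment funnel from the raw counts reported in the text."""
    contacted = letters_mailed - not_reached   # reached by telephone
    eligible = contacted - ineligible          # contacted and eligible
    # Sanity check: everyone eligible either declined or agreed.
    if declined + agreed != eligible:
        raise ValueError("declined + agreed does not equal the eligible count")
    return {
        "contacted": contacted,
        "eligible": eligible,
        "ineligible_pct": pct(ineligible, contacted),  # share of those contacted
        "declined_pct": pct(declined, eligible),       # share of eligible
        "agreed_pct": pct(agreed, eligible),           # share of eligible
        "enrolled_pct": pct(enrolled, eligible),       # share of eligible
    }


# Counts as reported in the recruitment paragraph.
funnel = recruitment_funnel(letters_mailed=969, not_reached=293, ineligible=124,
                            declined=355, agreed=197, enrolled=79)

for name, value in funnel.items():
    print(f"{name}: {value}")
```

Under these inputs, the derived counts (676 contacted, 552 eligible) and the rounded percentages agree with the values reported in the text. Note that the denominators differ: ineligibility is expressed relative to those contacted, whereas the decline, agreement, and enrollment rates are relative to contacted eligible candidates.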

Demographic characteristics for the five groups are shown in Table 1. Focus group participants ranged in age from 18 to 89, with the mean age of ACT Study participants (Group A) approximately 20 years older than the oldest group of general Group Health members (Group E), 80.4 versus 62.7. Surrogate decision-makers for ACT participants were approximately the same age as Group E members, whereas Groups C and D were, on average, 40 and 20 years younger, respectively. Overall, focus group participants were evenly balanced with regard to sex, with more men participating in Groups D and E and more women participating in Group B. Focus group participants were overwhelmingly likely (89% overall) to identify their race as white, and the majority (60%) reported annual household incomes exceeding $50,000. Focus group participants were also, as a group, very highly educated, with 83% reporting postsecondary levels of education. Forty-two percent of all participants reported taking part in health research in the past; excluding the ACT Study participants in Group A, 25% of focus group participants reported prior research involvement. Group Health has not routinely collected data on enrolled members' race/ethnicity, socioeconomic status, or educational attainments. The demographic data reported here were collected as part of this study; comparable data are not available for those we could not reach or who declined to participate. General demographic information, however, suggests that focus group participants were representative of Group Health members: 85% of current enrollees are white and 84% have at least some college education (K. Ehrlich, personal communication).

Table 1 Demographics

Focus group discussions

The focus group discussions were held at the Group Health Research Institute during early evening hours between May and August 2008. Each session lasted 2 hours and included 5–9 participants. Two members of the research team (S.M.F. and S.B.T.) cofacilitated the group discussions, and another (J.M.B.) took notes and provided logistical support. The first 15 to 25 minutes of each session were spent introducing the aims and mechanics of GWAS, with particular emphasis on the need for large datasets (and thus the importance of data sharing), the nature and comprehensiveness of genetic data generated in the course of GWAS, and the role of EMR-derived phenotypic data. The full discussion guide is available from the authors on request.

To frame the data-sharing discussion, we asked participants to consider a series of hypothetical scenarios in which they were to imagine that they were taking part in an ongoing genetic study. The data-sharing portion of the discussion guide is shown in Table 2. Participants were asked to consider questions such as whether they would want to limit research access to their EMR data, whether it would be acceptable for their de-identified genetic information to be shared outside Group Health, and under what circumstances (if any) they would wish to be contacted by the research team. All sessions were audio recorded and transcribed for analysis. Transcripts were proofed against the audio recordings, which were then destroyed.

Table 2 Data-sharing portion of discussion guide

Analysis

Immediately after each session, the on-site investigators debriefed in person and one of us was assigned to draft field notes. The other team members who attended the session added their impressions, and then the complete field note was reviewed by another team member (W.B.) who was not present in the sessions. This step gave us the opportunity to determine whether there were any gaps in the discussion guide and offered an interim “reality check” on consistency of approach across the 10 sessions.55 As more sessions were completed, a comparative element was included in the field notes; this form of collaborative memoing helped us to identify emerging themes and concepts and to reach analytic consensus within the research team.56,57 When the transcripts had been proofed and the sessions completed, we closely read and re-read the transcripts and our field notes, writing margin notes and conceptual memos as we went. We then performed a summative content analysis with the aim of describing participants' views regarding data sharing.58,59

RESULTS

The focus groups were designed to elicit participants' views on a number of issues that can arise in the conduct of GWAS, including when re-consent and the return of individual research findings may be appropriate. This report focuses on our results with regard to wide data sharing; other findings are being prepared for publication. Summary findings are presented in Table 3. All quoted text in this section represents participants' words.

Table 3 Summary of major findings

Overall, participants endorsed the value of data sharing and, while they recognized some risks, most considered the potential benefit of high-throughput genomic research to outweigh the possible harms. As one participant put it, “At the same time as I can see some tremendous assets to having [dbGaP], because you can really do something powerful, I think there's always risk. In this case, I tend to think, well, with that potential of where we are in terms of understanding the genome, maybe that's a benefit and maybe, if it's securely regulated and actually looked after, maybe that's a risk worth taking.”

Acceptability of wide data sharing and willingness to participate

Most participants saw the pooling of research resources as a reasonable approach to enhancing efficiency, avoiding duplication of effort, hastening the development of outcomes that would benefit public health, and creating a reference of “historical value” for future generations. Participants told us, “I think there does have to be an open exchange of information in order for some of these really significant things to happen for people's benefit,” “I think some very interesting things may turn up because of that. That vast amount of information has got to have some really positive effects for everybody,” and “I think the whole thing's just a marvelous idea.” One participant remarked, “To me, the more information researchers have, the better, as long as you [can protect against discrimination]. I mean, that's what research is, and you're crippling it by not allowing them to share. And they can't make advances, you know, if they can't—I mean, they can advance quicker [if they share], I would think. I would hope.” Participants believed that the value of such resources lies in 1) the completeness and accuracy of the data and 2) its accessibility to many different researchers investigating many different questions.

We asked participants whether knowing that study data would be deposited in dbGaP would affect their willingness to participate in a genetic study. Most did not see data sharing as a reason not to participate, and some said that it would encourage them to sign up. As one person commented, “It would be another reason to do it.” Some told us that they would be gratified to know that their contribution would continue to be useful: “It's rewarding to know that I didn't just dabble a bit, got in the one study, but … roses keep on growing.” Others saw a practical benefit to participating in a study that would maximize the utility of their contributions. The longitudinal and ever-growing nature of dbGaP was also viewed favorably by most participants. This was especially so in Groups A and E, in which older participants spoke of the continued use of their research data as a “legacy, living on in the lab,” and a way to contribute to society even after death. Another participant said, “It makes me a little less mortal. Not immortal, but a little less mortal.”

There was general agreement among surrogate decision-makers (Group B) that the ACT Study participants they represented would have had no problem with having their study data sent to dbGaP. “I'm the power of attorney for a family friend who is in the late states of dementia, and she actually volunteered for this study, recognizing that her father had dementia, she wanted to participate and, you know, have her body donated in any and all research to gain more information. So I think she would have been very supportive of this.” Another said, “I know my mother would say fine. She definitely would have gone for it.” In another exchange, one participant said, “My aunt would have helped,” and another responded, “My mother-in-law too.” Surrogates were able to separate their own views and preferences from those of their charge: some Group B participants said that although they personally might have misgivings about GWAS participation, these concerns were not shared by the ACT Study participants they knew. As described below, the surrogate decision-makers' perceptions of ACT enrollees' views were consistent with what we heard directly from ACT Study participants (Group A).

A minority of focus group participants considered having research data deposited in dbGaP as a reason not to take part in GWAS. These individuals saw research involving data sharing as a qualitatively different, and riskier, activity compared with other kinds of studies. One participant remarked, “It's a leap of faith to go from a bunch of researchers to a Federal database, and it's not one—if I knew, I would never have signed up for that [hypothetical] study if I thought even any of that information was going to go off …”

Who should have access to data

ACT Study participants (Group A) were largely in favor of data sharing with researchers outside of Group Health in the name of efficiency. The surrogate decision-makers in Group B, most of whom were not Group Health members, were more cautious about the potential sharing of their own data, and they expected to be informed if their loved one's information were to be widely shared. The youngest group we spoke with, Group C, expressed a range of opinions, from no concern about data sharing to requiring detailed information as part of the informed consent process (and the possibility that data sharing would be a reason not to participate in research). In Group D, there was some disagreement about whether any sharing not specifically described in the consent form was acceptable, even within Group Health. Participants in Group E generally felt that data sharing was a good thing and noted that even international sharing should be encouraged, both because “the same diseases affect us here in this country that affect people around the world” and because of an expectation of reciprocity: “If everybody keeps secrets … They may know something that will save my great-grandkids, and if I don't share mine, why should they share theirs? So it's in everybody's interest to have as much information [as possible] out there in the pool.”

Participants generally agreed that sharing with other Group Health investigators and close collaborators (such as those at local academic institutions) would be acceptable, as would sharing with nonprofit, public-interest research organizations (e.g., the American Cancer Society). Such organizations were viewed as “more legitimate,” because participants believed that these kinds of institutions conduct “pure science” aimed at benefiting the general public and advancing knowledge, rather than generating financial returns. (Some participants identified exceptions to this rule, e.g., corporately funded nonprofit organizations, such as research institutes funded by the tobacco industry, whose financial interests could be advanced or impeded by certain research results.) A few people expressed misgivings about the potential for insurance discrimination to occur within Group Health, which has functions in clinical care, insurance, and research.

Current research participants, who generally expressed altruistic motivations for research participation as well as strong trust in Group Health, were willing to rely on Group Health's internal review processes and trusted Group Health to “be selective” about granting access to outside entities. For most participants, concerns began to arise as they considered more “distant” users of the data. Many participants expressed misgivings about sharing data with for-profit entities; in half of the sessions (A2, B1, B2, D1, D2), participants raised the issue before we asked about it. These participants often perceived a mismatch between the altruistic motivations of research participants and the fiscal goals of for-profit companies, as reflected in this comment from a person who had participated in a breast cancer study: “I gave all my medical records, I signed permission—‘Use anything you want.’ It was in a Group Health context. Yes, they could have gone to (a local cancer research institute), yes, they could have gone to (a local research university), yes! Could they have gone to (a large pharmaceutical company)? No!”

We also heard that some participants felt that genetic information should not be patentable, and that it would be unethical to use public resources in “profit-seeking” activities. Although our questions were generic (we asked about sharing with “for-profit organizations”), participants in all groups expressed distrust of the motives, ethics, and research and marketing practices of pharmaceutical companies. Some thought it was unfair that research participants could be made to “pay twice” (or more) for commercial products resulting from the use of their data, once through their study contributions, and again through their taxes, pocketbooks, or insurance. There were counterbalancing opinions on this point, with some noting that industry partners are needed to translate research results into tangible products: “I don't see how you could avoid giving this out to for-profit companies. If this study is of any use at all, they are going to have to make it available to a wide group of experimenters, and there are no wide groups of experimenters that don't have something to do with for-profit companies.” Several participants commented that perhaps for-profit users could be required to pay NIH or Group Health for data access.

Governance concerns

Although some participants trusted the Federal government to manage dbGaP and similar repositories in a responsible manner, others worried about the potential for abuse. Distrust with regard to the possibility of Federal agencies' obtaining research data for purposes other than research was expressed in every session. In some sessions, strong trust in Group Health was contrasted with a lack of trust in the Federal government. As one participant stated, “This is the privacy issue: that there's no failsafe, as far as I'm concerned. And I would trust researchers, but I don't trust the insurance industry, and I don't trust the government.” Participants voiced two kinds of concerns. One was the potential for inappropriate use of data by law enforcement or national security agencies, and the other was the possibility of a “tyrannical government” using such data for eugenics or other objectionable purposes: “I don't really have a problem with it as it stands now, however, the future thought of Big Brother watching you and the government getting involved in doing all these things is scary, just because I think … trust in the government isn't real high right now, and if they were to, I mean, if government really got involved and insisted on doing this stuff, I mean, I could see where they could genetically do everything they wanted to do. And it's scary.”

Participants saw a need for trustworthy governance to ensure that both practical and ethical goals—advancing science and protecting research participants—would be achieved. As one participant noted, “I think the key is finding the right balance between letting science and research go along and make great discoveries and not throttling them back with public policy issues. Ideally, we could kind of work them together so that science could move ahead and the Congress and other bodies could work alongside to make sure the protections are there.” Another said, “I would want to do more than trust [the managers of the data repository]. I would hope that the Group Health institution and the NIH and others would also be very aggressive about safeguards.” A related concern had to do with what participants saw as the inevitability of changes in law and regulations. “You just don't know what your ‘yes’ really means down the line. We've all grown up realizing how nothing seems to be sacred, and how the most secure information somehow gets found and used and abused,” according to one participant.

The obligations of users of shared data came up without prompting from the facilitators, with concerns about whether such users would be held to the same standards as Group Health: “My question would be, do the rules that the first group signed on with, apply to the group that gets handed the new information? If we sign consent information forms and all that kind of stuff, what's the obligation of the second group to follow those guidelines?”

Inclusion of data from participants' electronic medical records

One focus of the eMERGE Network is to assess the feasibility of GWAS using EMR-derived phenotypes, requiring the sharing of some clinical information with the dbGaP repository. Participants understood the research value of such information. However, many participants saw medical record information as potentially more sensitive than genetic data, in part because of the potentially stigmatizing nature of information that could be contained in the EMR, such as reproductive health information and mental health history: “I can see [that] your personal health record, if it's a carte blanche to share anything that's in there, a lot of us might have reservations.” Some participants were uncomfortable with the idea that information they had shared confidentially with their health care provider could be made available to researchers.

Participants in Groups A and E agreed that there were no specific parts of the medical record that they would like to withhold, and some mentioned that sharing the entire medical record without reservation would be of greatest utility to research. Group B participants had a range of opinions regarding the use of EMR data. Some were comfortable with allowing open access to the complete record, whereas others questioned the utility of such access. Some individuals were agreeable to making their records freely available, whereas others would not personally consent to such access. (“I doubt that I would participate in a study that involved universal access to my health care records.”) Participants in Group C, the youngest group we spoke with, advocated direct control over how much and which parts of their medical records would be available for research purposes (“I think it would also be helpful to have some way of my being able to take control of that process and being able to check boxes of, like, ‘It's fine to have this information, but you don't have permission, when it comes to, like, you don't have the right to my reproductive health [information], but you do to my blood pressure.’”) Others noted that it could be difficult to operationalize such an approach, particularly given that researchers may not know in advance which variables would prove to be needed for a future study. In Group D, several participants thought that the sharing of physician's notes would not be appropriate; even the most altruistic participant thought that the text of patient-provider conversations held in confidence should be off limits.

Notwithstanding these concerns, most participants were generally willing to have some EMR data shared for research purposes, provided that the data were fairly limited (e.g., if only “strictly scientific things … like diagnostic codes and medications” were included), well defined (e.g., if they knew ahead of time what data would be extracted for research use), and de-identified (with links to personal identifiers maintained at Group Health): “If it's anonymized, and Group Health is the protector, I wouldn't have a problem.”

Privacy and confidentiality concerns

Although we did not raise the issue of privacy directly, it was an underlying theme throughout the discussions, and most participants had at least some privacy concerns. Participants in Groups A and E were substantially less worried about privacy and confidentiality than those in other groups. Many of those we talked with, however, said that they believed the potential benefits of wide data sharing outweighed potential risks: “I guess it comes down to a balance. How much good is expected from it, against that extreme risk that might—might—happen. You can't weigh that. I'd say you can't even weigh it today, let alone 5 years from now. So you just kinda take it on faith, and do it.”

A recurring theme in all groups was the inevitability of “protected” data being accidentally released (several participants mentioned stolen laptops containing confidential data) or otherwise accessed by unauthorized persons. Focus group participants simply did not believe that data security can be guaranteed, despite researchers' good intentions: “There's no realistic way of controlling [the data], once you share it. Let's face it.” and “Unless we go back to working out of a shoebox, there's no security at all.” At the same time, most people felt that the risk of breach of confidentiality is commonplace in modern life, as demonstrated by this exchange in Session B1:

  • Speaker 1: “As soon as there's a database, and it's on a computer, sooner or later there is a thing where bank records all of a sudden get lost, or somebody steals them, or somebody hacks them, or somebody's personal computer gets stolen out of their home, and all of a sudden it's gone. And it's bank records or it's hospital records, and this happens several times a year, it's in the paper. There's ten thousand records that were supposed to be private, are now unaccounted for.”

  • Speaker 2: “However, having said that, and knowing that this happens, we don't stop using banks! No, we don't. We don't stop those kinds of things. We do everything we can to divorce our personal information from uses we haven't authorized, but we still, just because of the complexity of life, are involved with insurance companies and banks and employers—and, hopefully, health research.”

Some participants believed that health information would be a less attractive target for ill-intentioned individuals than other kinds of data (such as financial records or credit-card information). To a number of participants, a confidentiality breach regarding banking information or other personal information that could be used for purposes of identity theft would be a greater cause for concern than would unauthorized access to their de-identified genetic information.

Some participants saw the large size of typical GWAS (and associated databases) as conferring a certain degree of privacy protection, citing “safety in numbers” as reducing the risk that they would be personally identified or harmed as a result of research participation. “You know, there's something that feels more comfortable about a huge study. You're kind of lost in that huge sea of information, and it really seems like fewer risks.” “It seems to me, as you increase the amount of data, your individuality is really getting more and more lost. You are just a much smaller part of a large data pool.” But not all participants agreed with this idea: “If I were to share my DNA and medical record, I would add one drop to this ocean of statistics. But if something were to go wrong, that would have a great effect on my life.”

Participants felt that robust privacy protections would be necessary to ensure the quality of the data to be deposited, for two reasons. First, enrollment would be higher and the ultimate value of the resource maximized if potential participants believed that appropriate steps would be taken to protect the data (acknowledging that such protections are not absolute). One person commented, “Don't you think that if the safeguards get lessened, people will stop saying, ‘Ok, I'll give my DNA?’ They'll stop being a part of the study if they perceive it isn't safe, and then our information will be just kind of dead-ended.” Second, research participants may be tempted to hedge or withhold potentially important self-reported information if they do not trust that it would be kept confidential: “One of the thoughts that comes to mind is the validity of the data somewhat could depend on the confidentiality, because a person might be a little hesitant to be really honest and outright if they felt uncomfortable about it.”

DISCUSSION

Participants in the focus groups understood the rationale for wide data sharing, especially in the context of GWAS and related genomic approaches, and believed that making de-identified study data available to the research community is a social good that should be pursued. Advantages identified by participants fell into three broad categories: increased research efficiency, benefit to patients and society, and respect for research participants. The value of maximizing research efficiency was embraced by all the groups, and participants favored efforts to reduce unnecessary duplication of effort, control costs, promote collaboration, and make the most of available resources. Participants also expressed the belief that broad data access would increase the potential for meaningful findings to be uncovered, and for health benefits to be realized in a more timely fashion. Most saw altruism as the primary reason anyone would agree to participate as a study subject; because of this, they saw researchers' maximizing the use of subjects' contributions as a respectful recognition and realization of subjects' goals.

Privacy and confidentiality concerns were also common, although they were not necessarily a deal-breaker when it came to willingness to participate. Although some participants considered the possibility of breach sufficient reason not to take part in genetic research, most considered the risk to be relatively small and worth taking in view of the potential benefits of such investigations. Several participants perceived the risks involved in data sharing to be substantially less concerning than the (often unavoidable) privacy risks they encounter in daily life. This finding raises interesting questions about whether such research may—at least in the minds of possible study participants—be classifiable as “minimal risk” under current regulatory definitions.60

Although younger people might have been expected to be more comfortable with the technology that allows data sharing and thus to express fewer privacy concerns, we found that older people were least worried about the potential for loss of confidentiality. Younger participants were more inclined to desire direct control over which data could be shared. This could reflect greater interest in privacy per se, or perhaps greater knowledge of the technological feasibility of user-controlled privacy settings, such as those used in online social networking applications. Older participants, by contrast, told us that they had nothing to hide and little to fear. These findings may be consistent with survey results reported by Kaufman et al., in which younger respondents (<60 years of age) were more concerned that research data could be used against them.41

A few limitations of this study should be noted. Although our participants were fairly representative of Group Health Cooperative membership, that population tends to be slightly older, more highly educated, and less racially and ethnically diverse than the population of the Northwestern United States more generally and of the United States as a whole. Even within that context, our focus group participants were very well educated: nearly three quarters (73%) had earned at least a bachelor's degree, and more than a third (37%) held advanced degrees. Some degree of selection bias was unavoidable, as Group Health Cooperative members who are unlikely to support genetic research and associated data sharing may have been unwilling to participate in these focus groups. It may also be the case that Group Health Cooperative members are generally more favorably disposed toward research, given the cooperative's history as a consumer-governed organization and the Group Health Research Institute's long-standing and locally well-publicized track record of health research. Many of our participants (25% excluding ACT Study enrollees) reported that they had taken part in health research in the past. However, rather than limiting the generalizability of our findings, the unique features of the study population may make this an informative “extreme case.”61 If a population with very high trust in both the researchers and the research institution nonetheless has significant reservations about data sharing, groups that do not have this kind of relationship with researchers may well be more likely to have such concerns.

In contrast with the academic community's emphasis on human subjects protections and regulatory controls, focus group participants viewed trust between researchers and subjects as central. Although they did consider informed consent important, most focus group participants seemed to believe that this tool is but a small part of the governing relationship between researchers and subjects, especially in a context in which future recipients and downstream uses of data may be difficult to predict. Considering the potential harm that could result from breach of confidentiality, for example, participants did not want—and did not place much if any credence in—assurances that their information would be kept private. Paradoxically, strong assurances regarding privacy were seen by some participants as generating less trust in researchers. They did, however, want researchers to promise that they would protect study data to the best of their ability, provide an honest accounting of any breaches that may occur, and make their “best effort” to mitigate any negative effects. This shift from legalistic guarantees to personal commitments signals a fundamental difference in how the relationship between researchers and study participants could be construed.62,63

Our findings point toward the need to investigate governance models that enact the values of stewardship. To develop research practices that foster trust and trustworthiness, more dialogue between the research community and the lay public is needed, and the issue of trust (or lack of trust) in the Federal government must also be addressed. These engagements should explore ways of maintaining participant and public trust around potentially contentious issues, including data access by for-profit entities, procedures for Federal oversight and accountability, the return of individual research findings to participants, and the conditions under which researchers should re-contact participants to seek permission for a new use of existing study data. Such efforts will help the research community to proactively address participant expectations for the coming era of high-throughput population-based genomic research.