Introduction

The recent drive to create biobanks and data repositories to store genetic, phenotypic, and other health-related data generated from and for biomedical research is intended to encourage and facilitate data sharing for secondary research purposes.1,2,3 Researchers are increasingly prompted by funders, research organizations, and journals to make their data easily accessible.3,4 Furthermore, data sharing helps to optimize investment from public funds and strengthen the statistical power and rigor of research.2,4,5,6,7,8,9,10,11 Researchers have an ethical duty to make better use of data and avoid unnecessary repetition, particularly in genetic studies, which extract human biological samples for analysis.1,3,12 While sharing research data is encouraged, researchers have an obligation to meet rigorous ethical and regulatory requirements to provide accurate and timely data, and to safeguard participants’ privacy and confidentiality.12,13,14,15,16 These requirements have been strengthened by the European Union (EU) General Data Protection Regulation17 (GDPR) that came into effect across Europe in May 2018.

A key challenge is that funding bodies do not provide resources to sustain the governance of data sharing; therefore, the onus is on research projects to develop and implement data sharing policies2,18 that conform to legislation throughout the data collection phase and thereafter. Data access and security are tasks typically managed throughout the duration of a study. Maintaining the same level of oversight after project completion is challenging without adequate governance and funding in place. In response to this, federal or centralized databases such as the database of Genotypes and Phenotypes (dbGaP)19 and European Genome-Phenome Archive (EGA)20 have pioneered submission and access policies for management of open and controlled genome study data sets. Also, the Public Population Project in Genomics and Society (P3G) is a consortium providing policy support to the international scientific community about how data sharing can be facilitated and managed.13 While these mechanisms are useful and well formulated, they may not be appropriate for some research, which has collected diverse types of data and has specific priorities for data sharing and monitoring.4,13,18,21 Consulting researchers’ and scientists’ views about data sharing governance and practices is important to understand how data should be managed after the end of research projects. Importantly, research participants should also be given the opportunity for involvement in data sharing decisions, including which aspects of study data should be safeguarded and to indicate their preferred level of control over data sharing.10,14,22 Not engaging creates a lack of connection with the participants and reduces their control over their data being shared for future research.23

There is growing literature about public and research participants’ engagement in data sharing decision making,7,23,24,25,26,27 though this has mainly focused on what data donors’ preferences would be in hypothetical contexts. Haga and O’Daniel28 found that public concerns about data sharing focused on privacy and mistrust in how samples are used and who has access. McGuire et al.29 investigated consent for data sharing and reported that research participants were happy to share genetic data with restricted access, indicating a disconnect between open data sharing policies and participants’ preferences for privacy and autonomy. Finally, Ludman et al.30 found that participants’ views about submitting data into dbGaP were dependent on their ability to have control over sharing their data, and that other methods such as opt-out or notification-only for public release were less acceptable than reconsenting for submission of their genomic data and having restricted or controlled access.

These studies have mostly been conducted in North America and have focused largely on improving participation rates and the initial consent process rather than understanding participants’ views about how data access should be reviewed and managed when studies end. The current study provided an opportunity to gain perspectives about future data access and governance from participants in European countries that are economically and culturally different.

Here, we report the results of a survey conducted with participants recruited to the Diabetes Research on Patient Stratification (DIRECT) Study, a European project funded by Innovative Medicines Initiative Joint Undertaking (IMI-JU).31 DIRECT is a European consortium of 25 partners from 10 countries, which recruited patients with and at high risk of developing type II diabetes (T2D).

As part of the DIRECT project, vast amounts of health and genetic data have been collected with the aim to identify biomarkers for the development, progress, and treatment responses of patients with and without T2D. A major focus for the project was the generation of a secure database so researchers in the consortium could access data easily for analysis. Significant effort was devoted to devising a data access policy that would allow secure and fair sharing between consortium members, with strict rules about how that would be supported and what was permissible. However, this did not extend to the postproject use of the data because it was not clear at the beginning of the DIRECT 7-year project what would be required. This raises the question about whose responsibility it would be to manage access to the data set and ensure that data is shared legally and ethically in future. Although research participants had consented to the broad use of their data postproject from the outset, it was decided that it would be ethical to elicit the views of participants about the details of the future governance structure. This aligned with the increased legal requirements under the GDPR, which at the time had not come into force. We conducted an engagement survey about data sharing governance with participants to explore the following: the importance of data access governance factors; preferences for which data types may be shared and with whom; and who should be involved in managing data access beyond the project.

Materials and methods

Study design and recruitment

A cross-sectional survey was conducted on a subset of the wider project sample between September 2015 and March 2016. In total, 7264 participants were enrolled into at least one of seven studies of the DIRECT project. Details of the patient cohort are described elsewhere.31 The study sample consisted of patients diagnosed with T2D and individuals at high risk of the disease but who were receiving no treatment for diabetes. From the participants enrolled into the project, a subset of 1082 were approached through ten specialized diabetes clinics and university study centers across four of the collaborating countries: the United Kingdom, Denmark, Sweden, and The Netherlands. Survey respondents were adults aged 18 to 80, of white European descent, and had already given a broad consent for data to be collected, stored, and used by both DIRECT researchers and researchers in the wider scientific community who had been granted access.

Participants were recruited in one of two ways. First, during their follow-up appointments, as part of the main DIRECT study. Upon arrival for their appointment, they were given a blank envelope containing the survey, an information sheet, and invitation letter. If participants decided to take part, they were instructed to place the completed survey inside the blank envelope, seal it, and place it in a designated collection box. Second, participants who had already completed all their DIRECT study visits were mailed the invitation letter, information sheet, and survey, with a prepaid envelope for return. Consent was implied with the return of a completed survey.

Survey development

The survey underwent an iterative development process, through review with wider consortium members, specialist diabetes nurses, and T2D patients. They provided feedback about the survey’s content, questions, and structure. The final survey was composed of three sections with 24 items (see supplementary information survey): section 1 comprised questions relating to respondents’ motivations to take part in medical research and their experiences since being involved in the DIRECT project; section 2 related to opinions about data sharing; and section 3 collected respondents’ sociodemographic information. Response categories varied by item and included multiple-choice answers, as well as 5-point Likert scales assessing happiness, importance, and support categories for given statements. Items about risks and benefits to sharing genetic information and attitudes toward privacy and advancing research were used from an existing survey.32 Results reported here relate to questions specifically about support for sharing data; what and with whom data can be shared, and how it should be managed; and sociodemographic characteristics. The study was approved by institutional review boards in Denmark (The Secretariat of the Scientific Ethics Committees for the Capital Region Protocol no. H-1-2011-166 Note no. 50965, and H-1-2012-100 Note no. 50694), Sweden (Regional Ethics Examination Board in Lund Dnr 2015/815 and Dnr 2015/843), The Netherlands (Medical Ethics Review Committee Vrije Universiteit Medical Centre Protocol 2012.222), and the United Kingdom (Newcastle and North Tyneside 1 Research Ethics Committee 12/NE/0132; East of Scotland Research Ethics Service 11/ES/0046; and 12/ES/0034).

Data analysis

Descriptive summaries of responses were calculated as frequencies and percentages, and results stratified by country are provided in the supplementary materials. All variables in the analyses were categorical. The Likert scale for item “How do you feel about your nonidentifiable data being shared for medical research?” was collapsed from 5 points into binary responses “Supportive” versus “Not supportive.” Response options in the categorical variables were collapsed due to small numbers in the extreme options, and to balance the distributions as much as possible. Extreme options were grouped but middle categories were retained to minimize the information lost through collapsing. The “I don’t know” and “Prefer not to say” options were treated as missing because of minimal (less than 5%) or zero counts. Chi-square analyses compared sociodemographic characteristics stratified by country. Univariate binary logistic regressions were conducted to assess associations between sociodemographic characteristics and support for data sharing, and multivariate binary logistic regressions were conducted for estimates of support for sharing data in future research by sociodemographic characteristics. The explanatory variables entered into the regression models were age, gender, country, education level, self-rated knowledge of genetics, diabetes status, previously worked in health or medicine, and self-reported health. Results are presented as odds ratios (ORs) with 95% confidence interval (CI) and significance level p ≤ 0.05. Due to small percentages of missing data, only complete cases were analyzed for the results presented. Internal consistency (Cronbach’s α) is reported for the following item scales: happiness to share different data types, happiness to share with different research groups, and the importance of data governance factors (see Tables 3 and 4). All analyses were performed using SPSS 22 (SPSS, Inc., Chicago, IL).
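The data preparation steps described above can be illustrated with a minimal Python sketch. It is not the SPSS procedure used in the study. The numeric coding of the support item (1 = most supportive, with codes 1 and 2 counted as “Supportive”) and all function names are assumptions made for illustration only; the text does not state the cut point used for the binary collapse. The Cronbach’s α function uses the standard formula based on item variances and the variance of the summed scale.

```python
from statistics import variance

# Assumed coding (not given in the text): 1 = very supportive ... 5 = not at all supportive.
SUPPORTIVE_CODES = {1, 2}
# Options the analysis treated as missing.
TREATED_AS_MISSING = {"I don't know", "Prefer not to say"}


def collapse_support(response):
    """Collapse a 5-point answer to 'Supportive' / 'Not supportive'; None if missing."""
    if response is None or response in TREATED_AS_MISSING:
        return None
    return "Supportive" if response in SUPPORTIVE_CODES else "Not supportive"


def complete_cases(rows):
    """Keep only respondents with no missing value on any variable."""
    return [row for row in rows if all(value is not None for value in row)]


def cronbach_alpha(rows):
    """Cronbach's alpha for complete cases; each row holds one respondent's item scores."""
    k = len(rows[0])
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Toy data, invented for illustration: three respondents, three items.
toy_scale = complete_cases([[1, 2, 3], [2, 3, 4], [3, 4, 5], [None, 2, 2]])
alpha = cronbach_alpha(toy_scale)
```

The toy rows are not survey data; they only show that incomplete cases are dropped before α is computed, mirroring the complete-case approach described above.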

Results

Demographics and response rate

Completed surveys were received from participants attending clinic follow-up visits and by post. A total of 855 surveys were returned. The combined response rate for the United Kingdom, Denmark, and Sweden was 79%. The highest response rate came from The Netherlands (94%), followed by the United Kingdom (86%) and Denmark (70%); the lowest came from Sweden (59%). Nearly three-quarters of respondents were aged 61 or over (73%) and 43% were female. In terms of education, 41% had vocational or professional qualifications, 37% had secondary education, and 19% had degree-level education. Seventy percent had been diagnosed with T2D at the time of completing the survey (Table 1). When asked “How would you rate your knowledge of genetics?” 837 respondents rated their knowledge on average as fair (median = 3), where 1 = very good, 2 = good, 3 = fair, 4 = poor, 5 = very poor.

Table 1 Participant characteristics by countrya

Support for data sharing

Participants were asked about their level of support to share their de-identified data for medical research after the project had ended. The survey found that 97% were either very or fairly supportive of their data being shared. Country of origin was independently associated with support for sharing data (χ² = 11.58, df = 3, p = 0.009; see supplementary Table S1 for descriptive results and Table S2 for the univariate results), but there were no other significant associations between other sociodemographic factors and support for data sharing. After adjusting for all other factors, Dutch respondents were significantly less likely to support sharing of their de-identified records for future research compared with UK respondents (OR = 0.211; 95% CI, 0.073 to 0.608; p = 0.004). Respondents with vocational or professional qualifications were more likely to support data sharing compared with those with high school (or lower) education (OR = 3.82; 95% CI, 1.243 to 11.734; p = 0.019) (Table 2). Other education level groups were not significantly associated with support for data sharing.
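For readers who wish to see how the reported statistics relate to one another, the following Python sketch recovers the log-odds coefficient, standard error, and two-sided p value from each reported OR and 95% CI, and computes the upper-tail probability for the reported chi-square statistic. It assumes the intervals are Wald intervals on the log-odds scale, which the text does not state explicitly, and the function names are hypothetical. It is a consistency check on the published figures, not a reproduction of the SPSS analysis.

```python
import math

Z_95 = 1.959964  # two-sided 95% critical value of the standard normal


def wald_from_or(odds_ratio, ci_low, ci_high):
    """Recover (beta, SE, z, two-sided p) from an OR and its 95% Wald CI."""
    beta = math.log(odds_ratio)
    # A Wald CI is exp(beta +/- Z_95 * SE), so its width on the log scale is 2 * Z_95 * SE.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return beta, se, z, p


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with exactly 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# Values as reported in the text.
_, _, _, p_dutch = wald_from_or(0.211, 0.073, 0.608)
_, _, _, p_vocational = wald_from_or(3.82, 1.243, 11.734)
p_country = chi2_sf_df3(11.58)
```

Under the Wald assumption, the three recovered p values round to the reported 0.004, 0.019, and 0.009, respectively, so the published ORs, intervals, and p values are mutually consistent.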

Table 2 Multivariate binary logistic regression of factors associated with support for sharing de-identified data for medical researcha,b (N = 798)c

Level of happiness for sharing different types of data and with different research groups

Respondents were asked about their level of happiness to share different types of data (Table 3). More than 85% rated being happy or very happy to share medical history, genetic information, blood test results, and lifestyle information, compared with 64% of respondents being happy to share their personal information.

Table 3 Happiness to share different types of data and with different groups for medical researcha

Respondents were asked to rate their level of happiness to share study data with different research groups (Table 3). Overall, respondents were most happy to share data with research teams in universities across Europe (90% happy or very happy), and least happy to share with commercial companies such as drug companies (56%). While there is support to share data across different types of research groups, our findings show a moderate willingness to do so. Very few respondents indicated being “very happy” to have their data shared across groups, though these were more than those who were “very unhappy.” The percentage of “happiness” to share with different research groups, stratified by country, is outlined in supplementary Figure S1.

Data governance and data access committee preferences

Respondents were asked how important the listed data governance factors were when their data are being shared with other research teams. The factors rated as highest importance were “The database is highly secure” (89%; extremely important and fairly important), followed by “Members of the DIRECT project can monitor how my data is being used” (74%). For all data governance factors, the median value = 2 (where 1 = Extremely important, 2 = Fairly important, 3 = Neither important or unimportant, 4 = Fairly unimportant, and 5 = Not at all important). This indicated that respondents believed that governance factors were paramount; for example, that all data access requests must be reviewed by appointed experts, and that someone from the DIRECT consortium should monitor use of data (Table 4).

Table 4 Importance of data governance factors when data is shareda, b

When asked who should be involved in any DIRECT poststudy Data Access Committee (DAC), respondents selected from multiple options. Figure S2 (supplementary information) summarizes those whom respondents considered important for the DAC. The most frequently selected types of person or group to be involved were a DIRECT researcher (687 counts), diabetes doctor or nurse (582), diabetes patient representative (496), DIRECT participants (268), lawyer (212), and lay person (116). There were nine selections for “Other”; however, the survey did not ask respondents to provide further information.

Discussion

Eliciting views of study participants at the individual or group level about whether, how, and with whom their data should be shared after a study has ended could help inform future data access decisions.5,10 This study preceded the GDPR. However, it aligns with the legal requirements for the use, storage, and management of data that have undergone a significant change. The GDPR strengthens individuals' rights and imposes greater obligations of accountability and transparency on researchers. New protections for individuals are the right to object to how their data is processed, the right to withdraw consent, and the right to be contacted in case of data breaches (Art. 7[3] and 34) (ref. 17). To be able to exercise these rights participants must receive information about the use of the data. Beyond the GDPR, offering participants' engagement in data management decisions further strengthens transparency and accountability.

While many studies have investigated the choices participants would hypothetically make to inform future data sharing for research policy and practice,28,30 this study engaged with participants to help develop the postproject data sharing strategy for the data obtained from their participation. This approach is not commonly found in the literature. Participants had undergone various data collection phases and were familiar with how the research was conducted and what data were collected; they were well suited to provide insights on whether and how their data should be shared. Further qualitative research may provide insights about the reasons for respondents’ selections. Ongoing engagement and involvement are also key if research participants are to become integral stakeholders in data sharing governance.

The survey results indicate that overall respondents were supportive of sharing their de-identified data outside the consortium. This result mirrors North American studies investigating perspectives about de-identified data sharing, and reiterates the conditions under which respondents are prepared to allow data to be shared, which included that data are held and disseminated securely,28 donors have control over what data are shared,25 donors should be asked if their data can be shared,12 and that data are shared for the benefit of wider society.23 Interestingly, Dutch respondents were significantly less likely to be supportive of sharing their de-identified data compared to UK respondents. Similarly, a special Eurobarometer survey investigated European citizens’ attitudes toward the impact of digitization in daily life, and asked about their willingness to share anonymized data with different research groups.33 It reported that Dutch respondents were less willing than Swedish and Danish respondents to share anonymized data with public authorities and public sector companies for medical research.33 Further investigation is needed to understand why people in The Netherlands appear to be more protective of their data.

The survey set out to determine which types of data respondents were content to share outside of the DIRECT consortium and to consider whether their views varied by data type. The results indicated that respondents were moderately happy to share most types of information, with least support for sharing personal information. These findings echo previous literature investigating willingness to consent to share data.25,29,34 Other studies also show support for data sharing, for medical research, as long as data are de-identified.8,9,10,34

In our sample, respondents were happier to share data with universities and least happy to share with commercial companies. The reasons for this were not explored and warrant further investigation. In comparison with these results, it has been suggested that public concerns relating to sharing data with government or the pharmaceutical industry were centered around beliefs about past reported misuses of public data, specifically of vulnerable and minority communities.28 Additionally, McGuire et al.35 reported that participants in focus groups, investigating data sharing from genome-wide association studies, entrusted home institutions and local investigators to protect privacy and their data relative to federal control. Similarly, a report commissioned by the Wellcome Trust in the UK26 found that willingness to share data is influenced by trust in the institution and the extent to which patients are informed about who their data are being shared with, and what type of data, particularly in relation to commercial entities.

The survey was a mechanism for involving DIRECT participants in response to growing research about providing data donors with the choice to share and exercise control over their data; data privacy was an important aspect of their consent to share data, particularly for use in future research and the potential of for-profit organizations to access their data.10 It has been claimed there is a need, however, to redress the balance between privacy and openness of data that may be sensitive in nature, such as in genomic studies;15 engaging with stakeholders to determine their recommendations on how to proceed may facilitate this. Kim et al.23 found that patients were more likely to consent to share health-care data through specific health-care networks that they considered secure and that already protected privacy. Privacy and security are high priorities for patients and research participants, particularly when it is proposed that data may be shared with commercial companies26 and governmental organizations.28,30 The likelihood that patients and participants consent to use of their data depends on public trust in gatekeepers and the potential users of personal data. It has been argued that retention and use of existing data is more ethical and less wasteful,36 not least because it eliminates the burden of participants providing data repeatedly, provided participants are supportive of such practices and trusting of the research initiative. Involving patients and public in research data access and governance decisions facilitates transparency and trustworthiness of practices and is in line with the GDPR.

This study found that respondents are happy for their data to be shared beyond the original research, so long as measures are in place to protect them and provide control. However, that is increasingly difficult to do if there are no resources specifically allocated for this. Leadership from funding bodies should be key in enabling harmonization of data access and sharing.4,5,37 Shabani et al.5,37 stated that funding bodies should lead the way in which data access is arranged and that the creation of centralized DACs should be considered. In the absence of this support, databases such as dbGaP and EGA have paved the way to build mechanisms for data management and sharing. However, such guidance may be too generalized and broad for specific research projects, prompting individual research bodies to develop localized strategies relevant to their original research.4,13,18,21

DIRECT is an example of a large prospective project, spanning multiple European countries, that has taken significant steps to promote data sharing in a secure environment within the consortium and wishes this to be upheld once the project has ended. How this should be implemented by projects of this type needs to be addressed within the wider scientific community, and importantly, should include the involvement of study participants. This is particularly so if researchers are committed to data being shared and reused to maximize its use while simultaneously safeguarding the privacy of data subjects.

In conclusion, considering the survey findings and supporting literature, one solution might be to form a DAC to be responsible for data sharing and management beyond the project.4,16,37,38,39,40 However, as the survey results highlighted, nuances in the preferences of respondents on how data should be managed and shared would need to be accommodated within any policy and practice directives. The challenge now is to identify the mechanisms by which respondent preferences can be followed through and offer controllability. The next stage of DIRECT’s data governance strategy development will be to identify appropriate persons and mechanisms to be responsible for data sharing and management, identify repositories to store data, and consider how to facilitate DIRECT participants’ involvement in all future data sharing decisions.

Strengths and limitations

This is a novel large European cross-sectional study to understand data sharing preferences of patients enrolled in the DIRECT project. It was important to investigate participants’ perspectives from different European countries to enable development of data governance strategies relevant for this region. While this was a homogeneous self-selecting sample, the survey was the first step toward engagement with study participants to incorporate their views about governance of their data in the future. Although there was a large response rate, several limitations existed. The nature of self-reported studies introduces response bias in that it is difficult to establish whether respondents answered truthfully. However, to minimize this, participants were informed in the study materials that their information would be kept confidential and that they would remain anonymous. While the reliability of the item scales was within the acceptable range, the initial piloting of the survey only elicited content validity, and further validation is required. The average age of respondents was older than the general population, as T2D is more prevalent in people middle-aged and above. Additional demographic data such as comorbidities, occupation, and income were not collected on this occasion. The eligibility criteria for the DIRECT project did not include patients from ethnicities other than white European descent, due to the particular design and aims of the wider project. The sample therefore may not be representative of the overall populations in the countries investigated and precludes any conclusion regarding data governance attitudes more widely. Future research could replicate this study in different cross-cultural groups, proportionately represented in the populations of the countries involved. 
The survey was not designed to include open responses: because it was translated into four languages, free-text answers would have introduced complexities such as back-translation into English, which was beyond the remit of this study.