Introduction

International research consortia in the field of biomedicine collect large amounts of information consisting of different data types from participants who are often located in different countries. A key tenet that facilitates ongoing and future research is data sharing. Data sharing is viewed as good practice for advancing biomedical research, as it maximises the use of biological samples and other types of data, reduces participant burden, and, by pooling data, helps to improve the statistical power of research [1,2,3]. Data sharing within research consortia and externally is encouraged and is increasingly being adopted and enabled through advanced storage and sharing technologies. Supported by research data governance strategies, the optimisation of data sharing has been an important focus for the Open Science Agenda [4].

Empirical research to date has focused on three key areas: willingness of research participants and the public to share their data [5,6,7,8]; how to deliver broad informed consent to enable the sharing of data [9,10,11,12]; and patients’ happiness to share their own clinical data for research purposes [13,14,15,16]. Central to these studies has been the need to address privacy and confidentiality of data donors and any fear of data misuse. A balance, therefore, must be struck with permissions to use data for research. As more data is captured including genomic, phenotypic and other health-related data, safeguarding study participants’ privacy and confidentiality requires robust governance mechanisms.

From ethical and legal standpoints, data protection and informed consent policies can support data sharing practice to avoid privacy mishaps. However, the mechanisms most commonly adopted in large consortia (such as a broad consent model) do not go far enough to address individual participants’ attitudes and perceptions about data sharing governance and practice [5, 12, 17,18,19]. Studies employing broad consent approaches regarding the release of de-identified data for future research may not be sufficiently ethical. There may be inconsistencies in the information provided at the time about how data may be shared, and this approach further removes the ability of data donors to control what happens to their data after the end of the agreed research period [12, 15].

The question, therefore, arises as to how to design data governance that integrates study participants’ preferences, and the first step is to engage with them. Whilst some studies have explored the perspectives of patients, the public, and research participants on research consent types and on how and with whom data should be shared [9, 11, 12], to date, little is known about research participants’ views and preferences about how their biomedical data, particularly genetic and phenotypic data, from one research project should be shared for future and separate research [5].

While there is increasing recognition of the need to engage and involve research participants in data governance plans in international consortia, the studies highlighted here have largely been conducted in North America, with a focus on hypothetical data sharing scenarios and improving broad consent at the initial stage of projects. Furthermore, the challenge in engaging research participants about the management of data sharing is compounded when international consortia collect data from people in different countries, where cultural and legal differences can affect readiness and ability to share data [5]. Wanting or having control over data sharing also varies within diverse populations in relation to privacy concerns [1]. Participants in large consortia projects are often not consulted about their opinions on how their data should be governed during and after the end of the research project. Differences in attitudes and preferences between culturally dissimilar countries in Europe have been least studied within the context of future research data sharing [5].

Therefore, this study aimed to investigate research participants’ beliefs about the importance of protecting their privacy, advancing research quickly and controlling future data sharing beyond the end of the research project, in a subset of participants in four European countries enrolled in the DIRECT (Diabetes Research on Patient Stratification) project.

Materials and methods

Study population and recruitment

Participants were sampled from a subset of those enrolled in the DIRECT studies. In total, 1082 participants attending follow-up appointments for other DIRECT studies at study centres in Denmark, Sweden, The Netherlands, and the UK were invited to complete the cross-sectional survey. The overall DIRECT project participant sample and recruitment are described in detail elsewhere [20, 21]. Study participants were eligible to take part in the survey if they were aged 18 years and older, of white European descent and able to consent to participate. This study was approved by institutional review boards in: Denmark (The Secretariat of the Scientific Ethics Committees for the Capital Region Protocol no. H-1-2011-166 Note no. 50965, and H-1-2012-100 Note no. 50694), Sweden (Regional Ethics Examination Board in Lund Dnr 2015/815 and Dnr 2015/843), The Netherlands (Medical Ethics Review Committee Vrije Universiteit Medical Centre Protocol 2012.222), and the UK (Newcastle and North Tyneside 1 Research Ethics Committee 12/NE/0132; East of Scotland Research Ethics Service 11/ES/0046; and 12/ES/0034).

Survey measures

Survey items analysed in this study were selected from a wider patient engagement survey that assessed: DIRECT participants’ willingness to participate in medical research; support for data sharing; preferences for control of different types of data; who data are shared with; and, preferences for future data sharing governance. Sociodemographic characteristics and self-reported knowledge of genetics and health status were also collected. The survey was developed with DIRECT diabetes clinicians, participants with Type 2 diabetes (T2D) and consortium researchers through iterative review and adjustment to question items [21].

Respondents were asked to rate their agreement with four statements measuring beliefs about whether it was important to advance research quickly, whether privacy should be protected, and whether respondents perceived that there were risks or benefits to sharing their genetic information [22]. The outcome variables assessed participants’ ratings of the importance of deciding which data are shared and with whom (importance of control), and were measured by the questions “How important is it that you decide what types of data are shared” and “How important is it that you decide who your data is shared with?”. The survey also measured respondents’ happiness to share different types of data. Similarly, respondents were asked to rate their happiness to share their de-identified data with different research groups. Participant characteristics were binary or categorical in nature. The explanatory variables were recoded into smaller categorical variables due to low numbers of responses in some categories, except the items measuring happiness to share different types of data and with different research groups, which were treated as continuous. The outcome variables were collapsed into binary variables for ease of interpretation.
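The recoding of an outcome rating into a binary variable can be sketched as follows. This is a minimal illustration in Python, not the procedure used in the study (which was carried out in SPSS). Only the labels "fairly important" and "extremely important" are taken from the text; the remaining response labels and the function names are assumptions made for the example.

```python
from typing import Optional

# Only "fairly important" and "extremely important" come from the text;
# the remaining labels are hypothetical placeholders.
IMPORTANT = {"fairly important", "extremely important"}
NOT_IMPORTANT = {"not at all important", "not very important", "neither"}


def collapse_importance(response: Optional[str]) -> Optional[int]:
    """Map a rating to 1 (important), 0 (not important) or None (missing)."""
    if response is None:
        return None
    label = response.strip().lower()
    if label in IMPORTANT:
        return 1
    if label in NOT_IMPORTANT:
        return 0
    return None  # unrecognised answers are treated as missing


def complete_cases(rows: list) -> list:
    """Keep only rows (dicts) in which no variable is missing."""
    return [row for row in rows if all(value is not None for value in row.values())]
```

Treating unrecognised answers as missing, rather than guessing a category, keeps the binary outcome conservative; rows with any missing value can then be dropped with `complete_cases` before modelling.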

Data analysis

Descriptive statistics were calculated as frequencies and percentages, and chi-square tests for independence assessed associations between categorical variables. Univariate (see Supplementary Tables S2 and S3) and multivariate logistic regressions were conducted to assess which explanatory variables predicted the odds of importance for control over (1) types of data shared, and (2) who data are shared with. These outcome variables were binary (important versus not important). The continuous explanatory variables entered into the logistic regressions were the four items measuring beliefs and perceptions about data sharing, happiness to share different types of data and with whom data can be shared. Between-country differences were assessed in the multivariate logistic regressions adjusted for all other variables (see Table 2). The multivariate logistic regressions were adjusted for the categorical variables: age, gender, country, education level, self-rated knowledge of genetics, diabetes status, previously worked in health or medicine, and self-reported health (see Tables 3 and 4). All univariate and multivariate models were fitted to complete cases, as not all respondents answered all of the questions; only a small number of cases were missing. All analyses were also stratified by country to assess associations within countries and compare findings. The logistic regression results are reported as odds ratios (ORs) with 95% confidence intervals (CI) and significance level p < 0.05. The reference group in all regression models comparing the countries was the UK, as the largest number of responses was received from this participant group. The analyses were performed using SPSS version 22 (SPSS, Inc., Chicago, IL).
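For readers less familiar with how odds ratios and their confidence intervals relate to logistic regression output, the conversion can be sketched as below. This is an illustrative Python sketch that assumes Wald-type intervals; the paper does not state the interval method, and the function name and the coefficient values in the usage line are hypothetical rather than taken from the study.

```python
import math


def odds_ratio_with_ci(beta: float, se: float, z_crit: float = 1.96):
    """Convert a logistic regression coefficient and its standard error
    into an odds ratio with a Wald-type confidence interval.

    z_crit = 1.96 corresponds to a two-sided 95% interval.
    """
    odds_ratio = math.exp(beta)
    lower = math.exp(beta - z_crit * se)
    upper = math.exp(beta + z_crit * se)
    return odds_ratio, lower, upper


# Hypothetical coefficient and standard error, for illustration only.
or_, lower, upper = odds_ratio_with_ci(beta=0.5, se=0.2)
print(f"OR = {or_:.2f}, 95% CI ({lower:.2f}, {upper:.2f})")
```

An odds ratio above 1 indicates higher odds of rating control as important relative to the reference group (here, the UK for country comparisons), and an interval that excludes 1 corresponds to significance at the 0.05 level under this approximation.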

Results

Sample characteristics

In total, 1082 DIRECT project participants were approached and 855 participated in the engagement survey from University research centres and Diabetes clinics in the four countries. The combined response rate for all countries was 79%. The majority (73%) of participants were aged 61 and over, 57% were male, 70% had been diagnosed with T2D, 60% had education qualifications above secondary school, and 20% had held a job related to health or medicine at some point in their career (Supplementary Table S1). Sixty-three per cent of 835 respondents rated their health as ‘very good’ or ‘good’ versus 30% rating it as ‘fair’. Forty-five per cent rated their knowledge of genetics as ‘fair’ versus 39% that rated it as either ‘poor’ or ‘very poor’.

Beliefs about research and privacy, and risk-benefit assessments to sharing data

Eighty-nine per cent of respondents either strongly agreed or agreed that it is important to advance research as quickly as possible; however, all respondents were already participating in research as they had agreed to enrol onto a study within the DIRECT project. Seventy-seven per cent overall also agreed that protecting privacy was important to them, and this was consistent across all countries when stratified. Eighty-seven per cent of respondents strongly agreed or agreed that there were benefits to sharing their genetic information for research; in contrast, only 46% agreed that there were risks to sharing their genetic information. There were no other significant differences in respondents’ beliefs about privacy or advancing research, and benefits to sharing their data by knowledge of genetics. When stratified, country of origin was significantly associated with all belief statements except the importance of protecting privacy, for which there was no significant difference in proportions between countries (Table 1).

Table 1 Beliefs about advancing research and protecting privacy, and risk-benefit assessments of sharing genetic information
Table 2 Multivariate logistic regressions: differences between countries in the importance for respondents of deciding what data types are shared and who data is shared with (important versus not important)

Importance of control for participants to share data

Forty-two percent of respondents rated having control of what types of data should be shared as either fairly or extremely important, and when stratified by country the results were: 41% in Denmark, 36% in Sweden, 36% in The Netherlands, and 45% in the UK (Fig. 1). However, after adjusting for all variables in the multivariate logistic regressions, none of the countries were significantly more or less likely to want control compared to the UK (see Table 4). Forty-three percent of respondents rated that having control over who their data is shared with was either fairly or extremely important to them, and by country the results were: 42% in Denmark, 44% in Sweden, 46% in The Netherlands, and 42% in the UK (Fig. 2). There were no significant differences in the importance of control for deciding who to share data with compared to the UK after adjusting for all other variables (see Table 4).

Fig. 1 Importance of control over types of data shared from the DIRECT project

Fig. 2 Importance of control over who data is shared with

Examining associations for importance for participants to control types of data shared

In univariate binary logistic regression models (Supplementary Table S2), our findings suggested that questions about: the importance of protecting privacy; beliefs that there are risks to sharing genetic information; and happiness to share: (a) genetic information, (b) blood test results, (c) lifestyle information, and (d) personal information, were all significant predictors of the importance of control. There were no significant differences between countries compared to the UK in whether deciding the types of data shared was important vs not important (supplementary Table S2).

The pooled country results (Table 3 and Supplementary Table S4) suggested that agreeing that it is important to protect privacy was significantly associated with beliefs concerning control over which data are shared (OR = 1.86, CI (1.38–2.51), p < 0.001). Happiness to share lifestyle information and happiness to share personal information were each significantly associated with the importance of controlling which data are shared (OR = 0.5, CI (0.29–0.84), p < 0.01, and OR = 0.64, CI (0.52–0.80), p < 0.01, respectively). There were no other significant associations between the covariates and importance of control. When results were stratified by country, similar results were found in the Danish cohort, though results in the UK and Dutch cohorts did not reach significance. The sample size for the Swedish cohort was too small to compute the results for comparison (Supplementary Table S4).
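As a reading aid, a reported odds ratio and its 95% confidence interval can be checked for consistency with the reported p-value. The Python sketch below does this for the privacy association above (OR = 1.86, CI 1.38–2.51), assuming a Wald-type interval that is symmetric on the log-odds scale and a normal approximation for the test statistic. These are assumptions for illustration, not a description of the authors' procedure, and the function name is hypothetical.

```python
import math


def wald_from_reported(odds_ratio: float, lower: float, upper: float,
                       z_crit: float = 1.96):
    """Recover the log-odds, standard error, z statistic and two-sided
    p-value implied by a reported OR and its 95% confidence interval."""
    log_or = math.log(odds_ratio)
    # The interval spans 2 * z_crit standard errors on the log scale.
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = log_or / se
    # Two-sided p-value under the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return log_or, se, z, p


# Values reported in the text for the importance of protecting privacy.
log_or, se, z, p = wald_from_reported(1.86, 1.38, 2.51)
print(f"log(OR) = {log_or:.3f}, SE = {se:.3f}, z = {z:.2f}, p < 0.001: {p < 0.001}")
```

Under these assumptions the implied z statistic is roughly 4, which agrees with the reported p < 0.001. The same check applied to an odds ratio below 1 gives a negative z, reflecting lower odds of rating control as important.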

Table 3 Multivariate logistic regression: importance for respondents of deciding what data types are shared (important versus not important)
Table 4 Multivariate logistic regression: importance for respondents of deciding who data is shared with (important versus not important)

Examining associations for importance for participants to control who data is shared with

Results from univariate logistic regressions found that importance of control was predicted by belief in protecting privacy, agreement that there are benefits to sharing genetic information, and happiness to share with commercial companies and charities. This was consistent in the results stratified by country (Supplementary Table S3). The adjusted model showed that there was no significant association between country and the importance for respondents of deciding who data are shared with (Supplementary Table S5). Table 4 shows that respondents who placed greater importance on protecting privacy were more likely to indicate that having control over data sharing was important (OR = 2.26, CI (1.67–3.1), p < 0.001). This was consistent across all countries, except Sweden, which did not yield significant results due to a very small sample (Supplementary Table S5). Across all countries, belief that there were benefits to sharing genetic information was also significantly associated with indicating that control over data sharing was important (OR = 1.64, p = 0.03). Disagreement that there were risks to sharing genetic information was associated with decreased likelihood of rating control as important (OR = 0.74, CI (0.59–0.91), p < 0.01). Happiness to share data with commercial companies and with charities was significantly associated with rating control as important (OR = 0.43, CI (0.32–0.56), p < 0.01, and OR = 0.57, CI (0.39–0.84), p < 0.01, respectively). These results were similar across countries, except Sweden, where results were not significant.

Discussion

The current study aimed to assess desire for control over data sharing in relation to motivations (measured by attitudes/beliefs) about advancing research and protecting privacy, and willingness to share data. Where previous research has investigated improving informed consent through tiered choices [8, 12, 22], this study sought to obtain a more granular overview of study participants’ judgements about data sharing, and whether there were differences between individual participants across four European countries. When given the choice to have control, fewer than 50% of respondents indicated that having control over what data is shared and with whom was important.

The study findings suggest that control over what data types are shared was less important to respondents than deciding who data are shared with. The importance for control over de-identified data sharing found in this study is consistent with other research, which has highlighted that when data are de-identified, fewer respondents expect the need to have control in sharing of their data [14].

Whilst we found that overall desire for control of de-identified data was moderate (<50%), when assessing associations between happiness to share different data types and with research groups, importance of control varied with different options. How participants valued control over data sharing was associated with unhappiness to share data with global universities, commercial companies, and charities that conduct research. A report commissioned by the Wellcome Trust in the UK found that willingness to share data is influenced by trust in the institution and the extent to which patients are informed about who their data are being shared with, and what aspects, particularly in relation to commercial entities [23]. Therefore, the desire for control over data sharing with particular types of organisations may reflect uncertainty about the risks and benefits of sharing data with these groups. We did not investigate trustworthiness of different research groups; however, participants’ support for data sharing to advance research in this study is likely to be determined by the actions of researchers and data repositories, who will need to provide a rationale for why data may be shared with separate research groups, particularly with the commercial sector [5]. Offering participants the choice about data sharing may develop trust in the research and researchers. D’Abramo et al. [8] found that attitudes to data sharing become restrictive when more information and options are provided about permitting data in biobanks to be publicly available. Similarly, McGuire et al. [12] reported that participants preferred having multiple data sharing options but were less likely to consent to public data release after being given options. Conversely, Bell et al. [18], having surveyed healthy people about hypothetical data sharing preferences, found that participants wanted to have control over uses of their data to varying degrees, including demographics, lab results, and sensitive information, such as mental health and genetic information. Participants were more willing to share data if they were given choices about what they wanted to share, and a high proportion wanted to know more about the public and private researchers requesting access to their data [18]. Whether the extent of control participants have over data shared affects future research participation requires further investigation.

Further cultural factors may affect preferences for control. Gaskell et al. [5] found in their pan-EU study that willingness to participate in biobank research was affected by beliefs about the risk of misuse of data, and that the public in southern European countries were less likely to participate than those in north-western countries. The study reported here was specifically situated in Western Europe and involved high-income countries with good health systems and nations that are viewed as socially inclusive. Designing international consortia data governance would benefit from understanding cultural attributes, if research aims to be inclusive of participants in data sharing decision making. As these findings show, some aspects of data sharing are consistently agreed upon, such as importance of privacy, whereas others are not (differences between countries in deciding with whom it is acceptable to share data). Furthermore, differences found between countries in this study show the diversity of perspectives about data sharing in different populations. Danish respondents indicated higher odds of importance to control data types shared, and Dutch respondents showed higher odds of importance to control who data are shared with. This means that large consortia sourcing data from culturally diverse countries may find it challenging to consistently oversee how data are shared and managed for future research.

Maintaining privacy is central for governance of data sharing in research; results from this study show that privacy is key to the likelihood of wanting control over sharing data. However, there may be ambiguity in understanding what privacy means across different populations [24]. The discourse about privacy being important now needs to shift to how it can be facilitated and in what context data donors require control over data sharing. Lemke et al. [7] argued that participants wanted control over release of genetic information and that mechanisms to protect privacy needed to be provided. In consideration of this, it is ethically important to provide research participants with options to control sharing their study data, even after its anonymization. This could be facilitated by having simple mechanisms for choosing preferences, and further research about which (and how many) choices are needed, so as not to overburden participants [12]. With divergences in attitudes to control data sharing, how the availability of control mechanisms is facilitated will require addressing. The first step is for international consortia to communicate and engage with participants to assess preferences for data sharing [5, 11, 12, 25].

Engaging and involving study participants in research decisions is very timely. One proposed solution to facilitate participant engagement with future data sharing decisions is Dynamic Consent [26]. This approach provides participants with an electronic record of their consent decisions, which can be reviewed and updated at any time. This would allow those who wanted greater control to be more directly involved in real-time decision-making, and could potentially provide an infrastructure to support participants beyond the lifetime of a specific project [27]. Based on the findings of this study, this may be a relevant solution to manage future involvement, and thus would be appropriate to investigate further.

Strengths and limitations

This was a unique study in that it looked at participants already enrolled in research to elicit their views about how their data should be shared for future research. Few studies have investigated data sharing choices of patients from different countries in Europe. However, there are also a number of limitations that must be discussed. Firstly, the results are not generalisable to other patient or healthy populations or countries. Countries included were in the north-western region of Europe, and there may be marked differences in data sharing opinions in other European countries, and among non-white population groups. Given respondents’ socio-demographic and personal characteristics, participation may have been influenced by their already being enrolled in DIRECT studies, and data sharing opinions referred to data that would be de-identified. In addition to this, the cross-sectional nature of the study design meant that it was difficult to ascertain whether respondents’ views would change over time and with more information about data sharing options, as we did not investigate the level of awareness respondents had about data sharing for future research. Also, collapsing the Likert survey questions from five-point scales to binary variables removes nuances in opinions of respondents about a given issue. Respondents’ views could potentially have been influenced by their level of confidence in the effectiveness of de-identification of their data in protecting privacy [14].

Conclusions

As it is responsible practice to obtain informed consent from participants to share their data [12], it should also be responsible practice to involve participants in decisions about how their data should be governed. Our findings indicate that what research participants expect in terms of control over data sharing needs to be considered and aligned with sharing for future research and re-use of data [2]. There is a balance to be struck between protecting privacy and benefits to biomedical research from data sharing [11, 17, 28]. Contributing to the data sharing governance literature, this study argues for moving research participants from passive participation in biomedical research to having their opinions about data sharing and control of de-identified biomedical data considered. Our findings show that even with de-identified data, respondents prioritise privacy above all else. However, this does not shut data sharing down, and this is consistent across all countries investigated. Though some differences between countries in attitudes towards data sharing and need for control were found, it is important not to presume that participants do not wish to be kept informed about study procedures moving forward. These findings will aid the development of future data sharing policy for the DIRECT consortium. While this study was conducted prior to the introduction of the General Data Protection Regulation (GDPR) in Europe, it aligned with the GDPR’s emphasis on understanding the preferences of those whose personal data is processed through the lens of privacy by design. While consortia must adhere to regulatory governance, they can additionally develop specific data governance practices as appropriate by adopting evidence-based and well-supported engagement and involvement guidelines and policies.