Perceptions and behavior of clinical researchers and research support staff regarding data FAIRification

The FAIR Data Principles are being rapidly adopted by many research institutes and funders worldwide. This study aimed to assess the awareness and attitudes of clinical researchers and research support staff regarding data FAIRification. A questionnaire was distributed to researchers and support staff in six Dutch University Medical Centers and Electronic Data Capture platform users. 164 researchers and 21 support staff members completed the questionnaire. 62.8% of the researchers and 81.0% of the support staff are currently undertaking at least some effort to achieve any aspect of FAIR, 11.0% and 23.8%, respectively, address all aspects. Only 46.6% of the researchers add metadata to their datasets, 39.7% add metadata to data elements, and 35.9% deposit their data in a repository. 94.7% of the researchers are aware of the usefulness of their data being FAIR for others and 89.3% are, given the right resources and support, willing to FAIRify their data. Institutions and funders should, therefore, develop FAIRification training and tools and should (financially) support researchers and staff throughout the process.


Introduction
Since their publication in 2016, the FAIR Data Principles are being rapidly adopted by many research institutes and funders 1 . The principles state that data should be FAIR, both for humans and machines 2 , and the need to organize and manage data according to the principles is becoming more important in the data-driven landscape of medicine 3 . There are many ongoing efforts that directly or indirectly contribute to the objective of making FAIR Data a reality 4 .
Researchers are increasingly asked or obliged to make their data (more) FAIR, either by funders (e.g. Horizon Europe 5 and The Dutch Research Council (NWO) 6 ) or their institutions (e.g., the Leiden University Medical Center 7 and Radboud University 8 ). Research institutes often support individual researchers in the FAIRification of their data by offering help from trained research support staff (e.g., data stewards) that assist them during the FAIRification process 3 . However, since FAIR is not a standard and the principles do not offer explicit guidance on the methodologies that should be used for data FAIRification, there are many different approaches to FAIRify data (i.e., make data FAIR) 9,10 . Various workflows have been published to guide researchers and researcher support staff during the FAIRification process [11][12][13] . For the majority of the steps in these workflows, different types of expertise are required, and these steps should, therefore, be carried out in a multidisciplinary team guided by FAIR data steward(s), instead of by individual researchers or research support staff 12,14 . We, therefore, identify a gap between the expectations of funders and institutions and researchers' and research support staff 's abilities to make data FAIR.
In order to determine what resources are needed to bridge this gap, more information about the knowledge and perceptions of researchers and research support staff (staff supporting researchers with data management and data stewardship tasks and questions, e.g., data stewards and data managers) regarding data FAIRification is needed. To our knowledge, this information has not been collected before. The aim of this study was, therefore, to gain insight into the current efforts of researchers and research support staff to share data and to make the data they collect more FAIR, through an online questionnaire. In addition, this study focuses on the perceptions of researchers regarding data FAIRification and their familiarity with their organization's Research Data Management policy.

Methods
Development and testing of the questionnaire. To determine the perceptions and current efforts of researchers and research support staff, an online English questionnaire was developed. The questionnaire consisted of six parts: 1) Demographics, 2) Familiarity with data sharing, 3) Familiarity with the organization's research data management policy, 4) Familiarity with the FAIR Data Principles, 5) Effort spent in making data FAIR, and 6) Statements regarding FAIR Data. Five stakeholders involved in clinical research from different institutes (two PhD candidates, one post-doctoral researcher, one data management consultant, and one data steward) tested the questionnaire and we modified the questionnaire based on their feedback.
Effort spent in making data FAIR. Researchers and research support staff were asked to report their current efforts in making data Findable, Accessible, Interoperable, and Reusable. Since there is no established quantitative measure available for self-reporting FAIRification efforts, we used a four-point Likert scale for every aspect of FAIR (no effort, very little effort, some effort, a lot of effort). Box 1 lists the descriptions of the aspects of FAIR that were used throughout the questionnaire.
Statements regarding FAIR data. Since data FAIRification is largely a standardization process, where one makes sure that data are exposed in a standardized manner and accompanied by structured and standardized metadata, we used an adapted version of Joukes et al. 's model 15 Fig. 1 Adopted part of the model described in 15 that focuses on working processes and human attitudes. Adapted parts are highlighted. Box 1. Description of the aspects of FAIR used in the questionnaire, based on 2 .

Findable
Data should be easy to find by both humans and computer systems based on a description of the dataset (metadata).

Accessible
Data should be stored for the long term such that they can be easily accessed with welldefined license and access conditions. This does not necessarily mean open access.

Interoperable
Data should be ready to be combined with other datasets by humans as well as computer systems.

Reusable
Data should be ready to be used for future research and to be processed further using computational methods.
professionals and is based on two models that use the TAM as underlying model, which aims to explain how users come to accept and use a technology. The model was used to both gain insight into the perceptions of researchers and research support staff regarding data FAIRification (or technology in the TAM) and to predict how these perceptions influence the behavior of researchers. We adopted the part of the model that focuses on working processes and human attitudes (Fig. 1). Experienced usefulness was added as an additional factor for Perceived usefulness, for if FAIR Data aided in previous research projects, it might influence the perceived usefulness. Additionally, Governmental influence was replaced by External influence, for there are more organizations that can influence the Subjective norm of a researcher or supporting staff (e.g., Funders). Institutional trust was removed from the model, since we hypothesized that Situational normality and Structural assurance for data FAIRification are not affiliated with the trust of researchers and staff in their institution. Descriptions of each factor in the model are listed in Table 1. Each question was a statement that could be answered on a five-point Likert scale (strongly disagree (1), disagree (2), neutral (3), agree (4), strongly agree (5)). All statements and their associated factors are listed in Supplementary Table 1.
Participants and data collection. Participants were recruited through an invitation in a popup in EDC platform Castor EDC 16 and by email in six out of the seven Dutch University Medical Centers. We used a generic invitation text focusing on Research Data Management instead of FAIR Data, in order to ensure that potential participants that did not know about FAIR were also included. Researchers and staff were eligible to participate in our study if they: 1) set up databases for clinical research (e.g., PhD candidates), or 2) support clinical researchers with the creation of these databases (e.g., data stewards). Users of the EDC platform that have previously built a database in the system, among which researchers and research support staff, received a popup after logging in to the system asking them to participate in the study. Research data desk coordinators of five University Medical Centers sent out invitation emails to a random group of researchers and research support staff in their center. PhD student associations of three University Medical Centers sent out invitation emails to their members. In order to enhance the response rate, a lottery-based incentive of ten 50 gift cards was offered. Data was collected in Castor EDC 16 between November 27, 2020, and February 27, 2021.
Ethical aspects. The study design was submitted to the Medical Ethics Review Committee of the Academic Medical Center in Amsterdam, the Netherlands, and was exempt from review (reference W20_499#20.550). Consent of respondents was required before the questionnaire could be opened.
Data analysis. Statistical analyses were performed using R (version 4.0.2) 17 . Questionnaires with complete answers to mandatory questions were included in the analysis. A p-value below 0.05 was considered significant. The p-values were adjusted for multiple testing using Bonferroni correction.
Perceptions of researchers and research support staff. The scores per factor of researchers with knowledge of FAIR and researchers without knowledge of FAIR were compared with the Mann-Whitney U test.
Behavior of researchers. Similar to Joukes et al. 15 , we performed SEM using the PLS method. PLS is a causal-predictive approach to SEM and is designed to provide causal explanations 18 . We used PLS because it is designed to provide causal explanations and predictions 18 and works efficiently with small sample sizes 19 . In PLS terminology, the statements about FAIR Data in our questionnaire are the observed variables that are linked to www.nature.com/scientificdata www.nature.com/scientificdata/ the latent variables, which are the factors in our model. The plspm R package 20 (version 0.4.9) was used for the analysis. Data from researchers was used, for researchers are responsible for (FAIR) data management throughout their research project, while support staff only assist them in this process.

Model concepts Explanation
In order to determine the performance of the model we used the performance measures and their target values described by Hair et al. 21,22 . Indicator reliability was derived from the outer loadings of the observed variables. A loading > 0.708 indicates that the observed variable explains more than 50% of the latent variable's variance and is, therefore, recommended. Discriminant validity was assessed using Henseler et al. 's HTMT ratio 23 , where an HTMT ≤ 0.90 (or ≤ 0.85 if the latent variables are conceptually more distinct) indicates discriminant validity. Internal consistency reliability was calculated using Dillon Goldstein's rho, where a value > 0.60 is considered acceptable in exploratory research. The convergent validity of each latent variable was calculated using the AVE. An acceptable AVE is ≥ 0.50, which indicates that the latent variable explains at least 50% of the variance of its items.

Results
In total, 312 respondents opened the questionnaire. 215 respondents completed the questionnaire and were included in the analysis (Fig. 2). Of these, 76.3% were conducting research themselves and 9.8% had a supporting function ( Table 2). The remaining 14% had another profession (e.g., doctors and nurses). The majority of the respondents were working in an academic hospital (84.2%). Of all respondents, 60.5% had heard of the FAIR Data Principles and 44.7% reported that they knew their definition.
Data sharing. Of all researchers, 79.5% stated that they have shared "raw" research data with others before ( Table 3). The most common ways to share data were via a shared network drive in the researcher's institution (56.2%) and via email (43.8%). 35.9% of the researchers that have shared data, shared data via data repositories (28.1% and 11.7% for an internal and external data repository respectively). 14.1% of the researchers that share data have published their data as a scientific publication, either as stand-alone data publication (2.34%) or appendix or supplementary information appended to a publication (14.1%). Other sharing methods that researchers mentioned in free text include a secure file transfer service (4.69%), an Electronic Data Capture system (1.56%), FTP transfer (0.78%), and an institution's Digital Research Environment (0.78%).
92.2% of the researchers that share data do so with coworkers that are working on the same research project. 18.0% of the researchers that share data do so with others that are not working on the same research project, both known and not known by the researcher (12.5% and 7.81% respectively). Other recipients of the data include (reviewers of) scientific journals (1.56%) and students that are supervised by the researcher (0.78%).

Research Data Management policy.
A majority of the researchers (85.4%) and support staff (90.5%) stated that their institution had a policy regarding Research Data Management (Table 4). 1.83% of the researchers stated that their institution had no such policy. 12.8% of the researchers and 9.52% of the support staff were not aware of such a policy or were unsure. 86.4% of the researchers that were aware of their institution's policy knew whom to contact if they had questions regarding this policy. For policy-aware research support staff, 100% knew whom to contact in case of questions. 55.7% of the researchers indicated that FAIR Data was incorporated in this www.nature.com/scientificdata www.nature.com/scientificdata/ policy, 2.86% indicated that it was not part of the policy and 41.4% did not know if FAIR was part of the policy. For research support staff, this was 73.7%, 5.26%, and 21.1% respectively.
Effort spent in making research data FAIR. All respondents were given a short explanation of the FAIR Data Principles and each aspect of FAIR. After reading this explanation, 81.1% of the researchers and 81.0% of the research support staff reported that they have spent effort (very little to a lot) to make their research data -or the research data they work with -more Findable, Accessible, Interoperable, or Reusable (any aspect of FAIR). Regarding all aspects of FAIR, this was 25.6% and 33.3%, respectively. 62.8% of the researchers and 81.0% of the research support staff achieved any aspect of FAIR with some or a lot of effort. For all aspects this was 11.0% and 23.8%, respectively. Figure 3 shows the effort spent per aspect of FAIR of both groups. Of all researchers, 51.8% spent effort to make their data more Findable, 57.3% spent effort to make their data more Accessible, 47.0% spent effort to make their data more Interoperable, and 60.4% spent effort to make their data more Reusable. For research support staff this was 52.4%, 57.1%, 71.4%, and 66.7% respectively.
Perceptions of researchers and research support staff. Table 5 lists the scores per latent variable (factor in the model) for researchers and research support staff. In six cases there was a significant difference in score between researchers with knowledge of the FAIR Data Principles and researchers without this knowledge (Awareness, Compatibility, External influence, Facilitating conditions, Interpersonal influence, Structural assurance). Research support staff had higher scores for all factors compared to all researchers, with the exception of Experienced usefulness and Perceived risk. Supplementary Table 2 lists the mean scores per observed variable (question in the questionnaire) for researchers and research support staff. The distribution of the scores can be found in Supplementary Table 3.  www.nature.com/scientificdata www.nature.com/scientificdata/ Attitude of researchers. Of all researchers, 94.7% stated that their research data being FAIR would be useful for others and 73.3% stated that it would be useful for themselves. With regard to other people's research data being FAIR, 90.1% of the researchers found this useful for others and 85.5% of the researchers found this useful for themselves.
Self-efficacy of researchers. Figure 4 shows the self-reported self-efficacy of the researchers in making data FAIR. 35.1% of the researchers state that they are able to make their data more Findable, Accessible, Interoperable, and Reusable by themselves. Regarding the individual aspects, F, A, I, and R, this is 41.2%, 40.4%, 36.6%, and 48.1%, respectively. 81.6% of the researchers state that they are able to make their data more FAIR when they receive help. For the individual FAIR aspects this is 81.0%, 81.0%, 80.2%, and 80.2%, respectively.
Intention to act of researchers. 89.3% of all researchers state that they would intend to make their research data FAIR when they have the possibility and resources to do so. 59.5% would still make their research data FAIR when they have to invest an extra amount of their own time and 27.5% would still do so when they have to spend an extra amount of their budget. If researchers would have to spend both an extra amount of their time and budget, 24.4% of them stated that they would still intend to make their data FAIR.    Table 6), which indicates that these latent variables explain at least 50% of the variance of the associated observed variables. All latent variables had a HTMT ratio ≤ 0.85 (Supplementary Table 7), indicating that there are no discriminant validity problems. Researcher's awareness of whom to contact or where to find help regarding data FAIRification (Awareness and Facilitating conditions) have higher loadings on the latent variable Structural assurance. Being able to make data more FAIR (in general), Findable, Accessible, and Interoperable by yourself (Self-efficacy) have higher loadings on the latent variable Perceived Ease of Use. Lastly, being able to make data more Reusable by yourself (Self-efficacy) has a higher loading on the latent variable Compatibility. Figure 5 shows the model with path coefficients and coefficients of determination. Path coefficients indicate the effect of a latent variable that is assumed to be a cause on another variable. The coefficients of determination  Table 5. Factors from the model (latent variables) and their scores for researchers and researcher support staff. Scores on a five-point rating scale (strongly disagree (1), disagree (2), neutral (3), agree (4), strongly agree (5)). *p < 0.05, **p < 0.001, ***p < 0.001. † Only applies if the researcher (all, with FAIR knowledge, and without FAIR knowledge) or research support staff member has spent effort to make data more FAIR. N = 133, N = 61, N = 72, N = 17, respectively.
www.nature.com/scientificdata www.nature.com/scientificdata/ explain the strength of the linear relationship between two latent variables. Seven out of 17 path coefficients were significant. The coefficients of determination (R 2 ) ranged from 0.043 (Perceived behavior control) to 0.329 (Subjective norm).

Discussion
To gain insight into clinical researchers' and research support staff 's knowledge and perceptions of data FAIRification and experience with making data FAIR, we conducted an online questionnaire. We found that 60.5% of the respondents had heard of the FAIR Data Principles and, after explaining the FAIR Data Principles to the respondents, 62.8% of the researchers and 81.0% of the support staff have spent at least some effort achieving any aspect of FAIR for their data. For all aspects of FAIR this was 11.0% and 23.8%, respectively. Nearly all researchers (94.5%) found the FAIRification of their data useful for others and 89.3% would like to make their data FAIR when they have the opportunity and resources. However, the majority of the researchers (81.6%) stated that they need help to do so.

FAIRification efforts.
A remarkable amount of researchers (81.1%) and support staff (81.0%) stated that they have spent very little or more effort into achieving any aspect of FAIR for their research data. However, we believe that very little effort spent in the FAIRification of data should be interpreted as no effort, for one may be more inclined to say that they have at least spent a bit of effort in making data FAIR, instead of no effort at all. When very little FAIRification efforts are neglected, we see that only a minority of the researchers (11.0%) and support staff (33.3%) spent 'real' effort on achieving all the different aspects of FAIR. Regarding these aspects, most researchers spent their efforts on improving the data's 1) Reusability, followed by 2) Findability and Accessibility (equal), and 3) Interoperability. For research support staff this is 1) Reusability, 2) Interoperability, 3) Accessibility, and 4) Findability. Since Reusability is a result of the other three aspects, it is logical that most effort is spent on that aspect. The effort spent on Interoperability is coherent with the fact that it is common that support staff assist researchers in making data more Interoperable. Data sharing. Most researchers have shared data before, but only a minority (18.0%) shared data with others not working on the same research project. Moreover, researchers prefer sharing data via shared network drives and email over the use of data repositories and data publications, which hampers data findability and introduces security risks. In addition, only a minority of the researchers stated that they have spent some effort in increasing the findability of their data and a minority stated that they add metadata to their datasets and data elements. To achieve a FAIR Data ecosystem, data should be accompanied by metadata, and both the data and metadata should be findable in both a human-and machine-readable manner 2 . In the cases where the data cannot be shared directly, which might often be the case in medical research due to the sensitive nature of health data, the sharing of metadata can already be useful. Furthermore, de-identified and anonymized datasets that aim to preserve the patient's privacy can be used for sharing, or data sharing agreements can be made to ensure the data is handled safely 24 . Research institutes should offer support and the technical infrastructure for researchers to make their metadata or de-identified and anonymized datasets available, as well as to support them in the drafting of the aforementioned agreements.
Self-efficacy and need for support. Nearly all researchers are willing to make their data FAIR, when given the right resources to do so. One of these resources includes help from experts, for the self-reported self-efficacy of researchers increases from 35.1% to 81.6% when help is offered to them. Budget and time are important as well. When researchers do have to spend their own budget, time, or both, on data FAIRification, their intention to act decreases tremendously. Researchers should, therefore, be made more aware of the fact that Research Data Management, including the FAIRification of their data, should be budgeted in a study 5 . Funders play an important role here, since they can explicitly point this out to researchers and, if necessary, allocate extra funding for them. Additionally, institutes should allocate more time and budget to FAIR Research Data Management and appoint support staff, such as data stewards and semantic data experts, that can assist researchers in the FAIRification process. We notice that support staff is increasingly recruited in institutes, however, our results also show that support staff still need training for their self-reported self-efficacy and perceived ease of use is low.
Behavior. With an R 2 of 0.127 the predictive power of the model regarding the behavior of researchers is low.
However, the paths marked as significant in our model are relevant in practice. Researchers that believe that FAIR Data is useful for them do have a more positive attitude regarding data FAIRification. When researchers have a positive attitude towards FAIR, they are more willing to start making their data FAIR. The cross-loadings indicate that observed variables, currently associated with Awareness, Facilitating conditions, and Self-efficacy, could be removed to improve the accuracy of the model. We, however, believe that latent variables in our model can be related to one another. For example, making data FAIR by yourself (Self-efficacy) also explains if the FAIRification process is clear and understandable (Perceived ease of use). Future research should aim to determine, in depth, what steps researchers currently take to make their data FAIR, how effective they are in that (Self-efficacy), and how easy they find the steps they take (Perceived ease of use).
Since making data FAIR is still quite new to researchers, and training and tools for FAIRification still have to be developed, we believe that focusing on the researcher's intention to act is, at this point, more meaningful than the final behavior of the researcher. The behavior of the researchers is currently rather low: only a minority of them add metadata to datasets or data elements, a minority includes FAIRification plans in a DMP, and only a www.nature.com/scientificdata www.nature.com/scientificdata/ small amount of researchers deposits data in a repository. In addition, an analysis of the free-text questionnaire responses, described in 25 , showed that the majority of researchers' FAIRification efforts are focused on human readability. Only a minority is focused on achieving machine readability, for example by submitting metadata to a data repository or using standardized terminologies for describing data elements 25 . However, these metadata are essential to making research data findable and reusable for others 26 , as they allow the data owner to describe the degree to which any piece of data is available for reuse 9 .
We believe that once there are more tools, documentation, and help available that guide researchers through the FAIRification process (Structural assurance and Facilitating conditions), researchers will feel more empowered to add and publish metadata, include FAIRification plans in DMPs, and deposit data, and thus make their data more FAIR (Perceived ease of use and Self-efficacy), and will, therefore, have an increased intention to act. Once researchers are empowered and receive support to perform data FAIRification, the behavioral aspect in the model will also change.
Strengths and limitations. This study has several strengths. First, we used an existing, validated model based on two established Technology Acceptance Model-based models. Second, we directly invited researchers that are responsible for data management, and thus FAIRification, via an EDC platform and Research Data Management departments in University Medical Centers. Lastly, we did not include any reference to FAIR into the invitation emails and popup, which made sure that potential participants that did not know about the FAIR Data Principles were also inclined to participate in the survey.
However, this study does have some limitations. The few adaptions made to the model were not validated, although the original model has been validated before. In addition, our study was carried out in The Netherlands. Although we believe our results are generalizable to other countries, there may be differences between the Dutch and other research communities. Because the FAIR Principles had their origins in the Netherlands and adherence is required by the Dutch Research Council 6 , it is reasonable to assume that the Netherlands has a higher rate of acceptance and awareness of the Principles than other countries. To comply with the General Data Protection Regulation, potential respondents were not directly contacted by the research team and were invited via a popup in an EDC platform or email by contact persons in the University Medical Centers. We are, therefore, unable to determine a response rate for our questionnaire and unable to verify the background of the respondents. Lastly, the questionnaires that were partially filled in were excluded from the analysis, which could have introduced selection bias.
Findings in relation to other studies. Our findings are consistent with Trifan and Oliveira's overview of the adoption and impact of the FAIR principles in the area of biomedical and life-science research 27 . Similar to our results, they found that the adoption of the FAIR Data Principles is an ongoing process within the biomedical community. A part of the factors that drive or inhibit researchers to share and reuse research data identified by Zuiderwijk et al. 28 can also be observed in our SEM results: the attitude of researchers (their personal and intrinsic motivation) is the main driver for researcher's intention to act. In addition, we found a significant difference in the interpersonal influence of researchers that have FAIR knowledge compared to researchers without this knowledge. Facilitating conditions, another factor identified by Zuiderwijk et al. 28 , also significantly differed between researchers with FAIR knowledge and without FAIR knowledge, which can indicate that in institutions that facilitate FAIRification, researchers are more aware of what FAIR is. However, this factor does not have a significant effect to the intention to act in our model. This can be due to the fact that making data FAIR is a relatively new concept to researchers, and that there are limited tools and help available to guide them through the FAIRification process. Lastly, our recommendation to focus more on training and tooling to make data FAIRification more accessible and comprehensible for researchers resonates with Boeckhout et al. 's advice to facilitate and organize data sharing using metadata standards and platforms 29 . Unanswered and new questions. While setting up the questionnaire, we discovered that there were no concrete self-reporting instruments available for clinical researchers that assess the effort one took to make their data FAIR. The FAIR assessment instruments that are available are rather technical, such as the FAIR metrics 30 , or generic, such as the Self-Assessment Tool to Improve the FAIRness of Your Dataset 31 . Since there was a lack of a better alternative, we, therefore, used a 4-point Likert scale in our study. We believe that each research community should develop comprehensible checklists and tooling for researchers in that community that include domain-specific information and standards.
We found a high percentage of researchers that spent at least some effort on the FAIRification of their data. However, in our experience, researchers still struggle with the FAIRification process and do not make their data FAIR at all. Future research should focus on identifying the different types of FAIRification efforts that researchers spend and determining if they actually contribute to data and metadata that is more FAIR. Lastly, since our questionnaire did not focus on a specific type of research data (e.g., genomics, imaging, or Case Report Form data), we were unable to compare FAIRification efforts and efficacy across different research contexts or data types. This comparison, however, is helpful for tailoring training, tools, and support to different groups, and future research should focus on this as well.

Conclusion
Clinical researchers and research support staff are currently undertaking efforts to make collected data more FAIR, but the majority of them do not add human-readable and machine-readable metadata to their datasets and data elements, nor deposit their data in a repository, hampering reusability. However, a majority of the researchers are aware of the potential and usefulness of their data being FAIR for others and stated that they www.nature.com/scientificdata www.nature.com/scientificdata/ would ensure that their data is FAIR if they receive the right resources and help to do so. Institutions, funders, and other organizations involved in clinical research should, therefore, make training researchers and research support staff in Research Data Management and data FAIRification a priority, as well as motivating and (financially) supporting them to FAIRify and share their data. Additionally, tools should be developed that guide the researchers and support staff through the FAIRification process in a user-friendly manner.

Data availability
The questionnaire used for and the data generated or analysed during this study, with the exception of free-text questionnaire responses are available through Figshare 32 .

Code availability
Our R scripts, used for the analysis of the data, are available on GitHub (https://github.com/martijnkersloot/ perceptions_analysis).