Introduction

The past decade has witnessed an increase in international data sharing across biomedical research consortia, spurred on by funders and journals seeking to make research data available as rapidly as possible, and driven in part by the need for extremely large data sets to detect patterns of health and disease.1 The Global Alliance for Genomics and Health (Global Alliance2) illustrates this international drive. It is an international coalition dedicated to improving human health by maximizing the potential of genomic medicine through effective and responsible data sharing founded on its Framework for Responsible Sharing of Genomic and Health-Related Data.3

Most public research data resources in genomics have both open and controlled access categories. Open access is typified by the HapMap4 and 1000 Genomes projects,5 whereas controlled access is used, for example, by the International Cancer Genome Consortium,6 with some data stored in the Database of Genotypes and Phenotypes7 or in the European Genome-phenome Archive.8 A controlled access system mandates review by a Data Access Compliance Office (DACO). Although controlled access has succeeded in providing greater access to data, plans for greater integration of data sets and informatics platforms for data-intensive science might well be thwarted without an intermediate category that would allow easier access to some data hitherto categorized as ‘sensitive’, and therefore controlled, without further qualification or nuance.

Within the Global Alliance, we are developing the concept of ‘registered access’, a novel data access tier that would fall between the now well-established ‘open access’ and ‘controlled access’ (also referred to as ‘managed access’) tiers.8, 9, 10 While not eliminating the need to control access to sensitive or identifiable data, our aim is to move beyond the current binary open/controlled approach and, in a more proportionate manner, both protect the privacy of participants and patients and further the research to which they contribute their data. We are also responding to the needs of the Global Alliance ‘Demonstration Projects’, scientific initiatives being accelerated to demonstrate the value of data sharing, namely the Beacon Project (http://www.ga4gh.org/#/beacon), Matchmaker Exchange11 and the BRCA Challenge (http://brcaexchange.org). The need for an intermediate category of data and an intermediate data access tier stems from two main considerations. First, the controlled access mechanism is considered too onerous and lengthy a process for some types of data that are being shared and brought together by the Demonstration Projects but that nonetheless require a level of protection for reasons of privacy. Second, and relatedly, the degree of oversight required of researchers using controlled access data sets is greater than we envisage would be justified for the researchers, clinicians and others who may need access to registered access data. A registered access tier therefore offers the prospect of rapid access, for a wide range of users, to all data shared in this way.

Several genomic projects and databases have used registration-based systems for access to data. These include the Asthma Gene Database, MedGene and PharmGKB;12 projects participating in the Matchmaker Exchange, such as DECIPHER13 and PhenomeCentral;14 and, more recently, the Simons Foundation Autism Research Initiative (https://www.nextcode.com/ssc/). Experts participating in the 2012 National Human Genome Research Institute workshop on establishing a central resource of data from genome sequencing projects recommended further development of such approaches to data access.15

The Registered Access Model that we describe here is based on our analysis of the applicable research ethics and other legal and administrative frameworks. Its approval process would be considerably simpler than that of controlled access, as some of the multiple steps of the standard controlled access review procedure, for example additional scientific and ethics review, would be either streamlined or removed. We therefore propose a three-stage approval process for registered access comprising Authentication, Attestation and Authorization.

Limitations to controlled access

We start by considering the general criteria that are usually checked by Data Access Committees (DACs) and DACOs in the controlled access process and reflect on their impact on data access. These criteria are listed in Table 1 and require a combination of information provided by applicants (see Supplementary Table S1) and assurances provided by the applicants’ host institutions, which assume legal liability for the applicants’ use of controlled access data.

Table 1 Criteria reviewed by DACs and DACOs in controlled data access

Different types of DACOs exist, and their roles may vary depending on their available resources, the expertise of their members and the size and nature of the data resource they serve. For instance, the Public Population Project in Genomics and Society offers DACO services through which customized DACOs can be created, with the resources and policies required to ensure a complete review of applications for access to controlled data sets, in conformity with the goals and policies of the project and with the research participants’ consents. In other cases, however, DACOs may operate with more limited resources, which constrains the controlled access review they can perform.16 Furthermore, some steps of controlled access review pose challenges and may not be necessary for every data access review.

In principle, given the inexhaustible nature of data, it can be argued that a minimal set of criteria should be envisaged to foster more rapid access to and use of data sets. In this regard, depending on the sensitivity of the data, the need for DACs to review the scientific merit of research proposals is questionable. Indeed, funding or research organizations are better positioned to carry out scientific review of research proposals. With the exception of a few large institutes, DACs often operate with limited financial and human resources, rendering a thorough scientific review difficult if not impossible. Furthermore, in the absence of clearly delineated criteria and procedures for such reviews, the objectivity of data access decisions could be undermined.17, 18

The controlled access model can also serve to prevent controversial research uses through DAC review of research proposals.19 Culturally or politically sensitive topics are cited as conceivable, though infrequent, examples of controversial uses.16 It can be argued, however, that such review falls within the scope of ethics review, a task generally outside the remit of DACs. DACs often refrain from adding another layer of ethics review, regarding it as the data users’ responsibility to satisfy the requirements for ethics approval.20 To this end, DACs sometimes require an official ethics approval document from applicants’ home institutions,21 which play an effective role in ensuring that research conducted in their facilities has received ethics approval from competent bodies.

The scope of proposed data uses is also reviewed to ensure consistency with the data provider’s objectives and policies and with the original consent of research participants.22 Reviewing this scope is not always straightforward; for example, DACs do not always have access to the consent forms that were used, or sufficient resources to interpret them when needed.16 Alternatively, data-use limitations could be stated more explicitly in consent forms and articulated within the ethics approvals for data collections. Ethics committees could play a role in controversial cases or where there is ambiguity. Consent-based conditions of data use could also be conveyed more clearly to data users through standardized consent codes.23

Registered access authentication and authorization

Bearing in mind these limitations of controlled access, we propose that review in registered access would be concerned mainly with the qualifications of applicants for access to data. This level of review would require an assessment of the likely ethical and legal risks of data misuse (based on consent, identification risk and data sensitivity). For example, in the Beacon Project and Matchmaker Exchange, data uses may be constrained by how, and how much, data can be queried. Several controlled access review criteria that address ethics review, grouped under ‘Ethics’ in Supplementary Table S1, may therefore no longer apply. Our Registered Access Model is also particularly suited to data resources from which data are not ‘distributed’, thereby addressing the concerns underlying the third category of controlled access review, ‘Security’ (see Supplementary Table S1).

Registered access in health research can also draw guidance from national statistics institutes that provide researchers with access to microdata. Their access processes reflect strict confidentiality requirements, as access is authorized through legislation rather than consent. Secure access to statistical microdata has been modeled along five dimensions: safe data, person (researcher), project, infrastructure and output.24 A registered access process would focus primarily on ensuring that the data user is trusted. By verifying that a data user is bona fide, one can to some extent infer that the project, security infrastructure and output will also be safe. In a data sharing context where the risk of identifying participants, or the sensitivity of the data, is low, additional review of these other dimensions may be redundant, or at least disproportionate. In other words, where the data are ‘safe’ and the data user is ‘trusted’, other aspects of secure access need not be heavily scrutinized.
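The decision rule in this paragraph can be expressed as a small function. This is a minimal sketch under stated assumptions, not part of the Registered Access Model itself: the enum mirrors the five dimensions cited above, while the function `dimensions_to_review`, and the reduction of ‘safe data’ and ‘trusted user’ to two yes/no inputs, are hypothetical.

```python
from __future__ import annotations

from enum import Enum


class Dimension(Enum):
    """Five dimensions along which secure access to microdata has been modeled."""
    DATA = "safe data"
    PERSON = "person (researcher)"
    PROJECT = "project"
    INFRASTRUCTURE = "infrastructure"
    OUTPUT = "output"


def dimensions_to_review(data_is_safe: bool, user_is_trusted: bool) -> set[Dimension]:
    """Return the dimensions that still warrant scrutiny (hypothetical rule).

    Assumption: both inputs are judgments made elsewhere, eg, from
    identification risk and data sensitivity (data) and from bona fide
    verification of the applicant (user).
    """
    if not data_is_safe:
        # Sensitive or identifiable data: every dimension is reviewed,
        # as in controlled access.
        return set(Dimension)
    if not user_is_trusted:
        # Registered access focuses on the person: verify the user first.
        return {Dimension.PERSON}
    # Safe data and a trusted user: project, infrastructure and output
    # are inferred to be safe, so no further review is flagged.
    return set()
```

Collapsing risk into a single boolean is a deliberate simplification; in practice, judging whether data are ‘safe’ would itself draw on consent, identification risk and data sensitivity.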

A second lesson from the literature on access to microdata is the importance of the accountability relationship between data steward and researcher. National statistics organizations typically rely on statutory penalties and data access contracts to hold national researchers legally accountable, but the enforceability of both comes into question when data are shared across borders. Administrative accountability, which relies on host institutions to impose administrative sanctions for non-compliance, remains a meaningful form of accountability. Data stewards may in turn hold host institutions accountable, through reputation and contract, for the quality and integrity of their activities. Trust in and verification of the researcher, then, could largely be a matter of verifying the host institution, either by limiting registered access to a trusted network of institutions or by scrutinizing the credentials of institutions case by case. In genomic research, host institution verification is likely to be established case by case, as no formal accreditation bodies for ‘safe institutions’ or ‘safe researchers’ exist. In the controlled access model, this verification is reinforced by DACO review of researcher competence and local ethics compliance. In short, a robust method of institutional verification or accreditation may be an important factor in a responsible registered access scheme. On the other hand, it raises questions about the barriers that increasing reliance on institutional affiliation might create.

Furthermore, registered access could rely on a simple self-declaration system (ie, an attestation) for matters such as the compliance of the research with local ethical standards and procedures, adherence to consent restrictions on the scope of data use and the obligation not to reidentify anonymized data. Simple unilateral contractual commitments required of the data user can promote the direct accountability of data users without making the access process significantly more burdensome for them. These commitments could easily cover many of the requirements imposed through access agreements: that the data user agrees to comply with all applicable regulatory requirements (eg, has ethics approval if applicable); understands and respects data use limits; will not attempt to reidentify the participants; and will take reasonable steps to protect the data from unauthorized access and delete them when the approval period has expired. This approach clarifies responsibilities and imposes a backstop of contractual accountability comparable to that found within a controlled access framework.
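An attestation can be pictured as a record of affirmed commitments whose completeness is checked before access is granted. The sketch below is hypothetical: the class, its field names and the helper `missing_commitments` are illustrative paraphrases of the four commitments listed in this paragraph, not the wording of Table 2 or of any existing registration system.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Attestation:
    """Self-declared commitments of a data user (hypothetical field names)."""
    complies_with_regulations: bool   # eg, holds ethics approval where applicable
    respects_data_use_limits: bool    # understands and respects data use limits
    will_not_reidentify: bool         # no attempt to reidentify participants
    protects_and_deletes_data: bool   # secures data; deletes it when approval expires


def missing_commitments(attestation: Attestation) -> list[str]:
    """Return, in declaration order, the commitments the user has not affirmed."""
    return [f.name for f in fields(attestation) if not getattr(attestation, f.name)]
```

An attestation is complete when `missing_commitments` returns an empty list; keeping each commitment as a separate field makes it easy to tell a user exactly which statement is still outstanding.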

We therefore propose that the registered access criteria shown in Table 2 should constitute the general and basic framework for the Registered Access Model.

Table 2 Proposed registered access criteria

Perhaps the most challenging aspect of introducing an intermediate data access tier will be defining suitable inclusion criteria for registered data users, who we envisage will include researchers in academia and industry and different groups of clinical care professionals (eg, doctors, genetic counselors). It may be necessary to leave the specification of the required level of ‘competence’ to the groups implementing registered access, or to establish a few standardized levels. A few key pieces of administrative information will be important for authentication. However, it is currently unclear what should be required to demonstrate bona fide researcher or clinical care professional status. Evidence of academic publication has typically been requested in controlled access review (Dyke et al. in press),25 but a goal of registered access is to broaden access to a greater number of researchers and to provide quick and easy access to less sensitive data. Requiring an academic publication record, even a minimal one, would preclude access by many, including a large proportion of clinical professionals and students.

Registered access attestation

Registered access would not involve executing a Data Transfer or Access Agreement (DAA) between data providers and data users, as is required in controlled access. Because most interactions in the data sharing environment now take place online, registered access could provide an appropriate basis for using online agreements to set the terms of use for data users who wish to access registered data. Until now, paper-based applications, DAC meetings and DAAs have been the primary basis governing contractual relations between data providers, data users and their institutions. This can be administratively burdensome for both data users and institutions.

Clickwrap online agreements, that is, web-based agreements requiring the end user to manifest consent by clicking an ‘I agree’ checkbox at the end of a contract, are well documented and used for a variety of online transactions such as purchasing flights. Although they constitute legally binding agreements in many jurisdictions, some of which have specific laws applicable to online contracts, their validity and enforceability vary between jurisdictions, and they may not be enforceable in all countries. Such agreements have been used to set the terms and conditions of use for open access genomic databases.26 For example, the HapMap project initially used an open source data access policy in the form of a clickwrap licence agreement, which it abandoned as a requirement for access once all of the data had been placed in the public domain.27, 28 While not yet the standard medium for DAAs, such online agreements could arguably allow a more balanced approach to access agreements by enabling rapid, open and efficient access to data. They may also provide users with clear, upfront instructions on the use of the data. The standard registered access Attestation statements listed in Table 2 (see 1c–3c) are an example of the conditions that would form the core of a registered access agreement. This simple form of agreement would be strengthened by more detailed terms and conditions available on the website that registers the user’s attestation.

Registered access could provide an interesting case for implementing such agreements. For instance, an efficient means of enforcing a clickwrap agreement when a breach or misuse is discovered is to deny database access to the user, who has already been identified and authorized.27 Registered access could be further enhanced by limiting registration to 1 year, so that authorization must be renewed annually.
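Enforcement by denial of access and annual renewal fit naturally into a single registration record. This sketch assumes, hypothetically, that a registration begins when the user accepts the clickwrap agreement, lasts 365 days (one possible reading of ‘1 year’) and can be revoked when a breach is found; the `Registration` class and its methods are illustrative names only.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta

# Assumption: '1 year' is modeled as a fixed 365-day period.
REGISTRATION_PERIOD = timedelta(days=365)


@dataclass
class Registration:
    user_id: str            # identified, authorized user (hypothetical identifier)
    agreed_on: date         # date the clickwrap agreement was accepted
    revoked: bool = False   # set once a breach or misuse is discovered

    @property
    def expires_on(self) -> date:
        return self.agreed_on + REGISTRATION_PERIOD

    def has_access(self, today: date) -> bool:
        """Access requires an unrevoked registration within its validity period."""
        return not self.revoked and self.agreed_on <= today < self.expires_on

    def revoke(self) -> None:
        """Enforce the agreement by denying the identified user further access."""
        self.revoked = True

    def renew(self, today: date) -> None:
        """Annual renewal: accepting the agreement again restarts the period."""
        if self.revoked:
            raise PermissionError("a revoked registration cannot be renewed")
        self.agreed_on = today


reg = Registration("user-001", date(2023, 1, 1))
print(reg.has_access(date(2023, 6, 1)))  # True
print(reg.has_access(date(2024, 1, 1)))  # False: the 365-day period has ended
```

Because the user is already identified at registration, revocation needs no further investigation of who the user is; the record simply stops granting access, and renewal is refused.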

The registered access Authorization process would include verifying that the Attestation has been completed. Depending on the other elements requested, we envisage that an officer, rather than a committee, would be responsible for a formal rather than substantive review for Authorization, referring applicants who fall outside the standard registration criteria to a controlled access review process.
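The officer’s formal review, including referral to controlled access, can be sketched as a short decision function over the outcomes of the earlier stages. Everything here is an assumption for illustration: the `Application` fields, the `Decision` outcomes and the set of standard user categories (loosely drawn from the groups of potential users named earlier in this text) would in practice be defined by each group implementing registered access.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    AUTHORIZED = "authorized"
    INCOMPLETE = "returned to applicant"  # an earlier stage was not completed
    REFERRED = "referred to controlled access review"


@dataclass(frozen=True)
class Application:
    authenticated: bool         # Authentication: identity/affiliation verified
    attestation_complete: bool  # Attestation: all statements affirmed
    user_category: str          # self-reported type of data user


# Hypothetical standard registration criteria: accepted user categories.
STANDARD_CATEGORIES = frozenset(
    {"academic researcher", "industry researcher", "doctor", "genetic counselor"}
)


def authorize(app: Application) -> Decision:
    """Formal (not substantive) Authorization check by a single officer."""
    if not (app.authenticated and app.attestation_complete):
        return Decision.INCOMPLETE
    if app.user_category not in STANDARD_CATEGORIES:
        # Outside standard registration criteria: route to controlled access.
        return Decision.REFERRED
    return Decision.AUTHORIZED
```

Because the review is formal rather than substantive, the function checks only whether the required elements are present; it makes no judgment about the merits of the proposed research.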

Conclusions

Improving access to health-related data must involve a careful calibration of protections, bearing in mind the public benefits of health research and indeed the rights of scientists and citizens alike to participate in, and to benefit from, scientific research.29, 30

Registered access is likely to be suitable as a mechanism for access to less sensitive, low-risk data, such as non-stigmatizing health-related data from non-vulnerable individuals who would expect, or have consented to, data sharing for the purposes envisaged.31 It could also be a valuable tool for providing tiered access to different types of data users, including researchers and clinicians, for providing access to multiple data sets and for facilitating data discovery. We aim to develop the Registered Access Model further through implementation and customization with the Global Alliance Demonstration Projects, paying particular attention to the requirements for its clinical use.

Although it is not the primary aim, formalising our understanding of registered access may also help to improve and streamline the controlled access process, if only by reducing pressure on DACOs and the controlled access system. Most importantly, by providing clarity to ethics governance bodies and other research partners, and thus enabling this novel data access tier, a formal understanding of registered access will allow projects for which a lesser degree of data access review is warranted to benefit from registered access.