Introduction

The increased availability of genetic tests has made the assessment of their performance crucial for clinical and public health practice. However, the evaluation of genetic tests, especially predictive ones, is not straightforward. The main challenge is the lack of scientific evidence on which to base such evaluations [1]. Generating scientific evidence on genetic tests is made difficult by several factors, including their complexity, their rapid development and marketing, their widespread impact on families and society, and the lack of standardized outcomes for their evaluation [1]. For predictive genetic tests, perhaps the greatest challenge is to perform high-quality, randomized controlled trials to demonstrate that the test confers an improvement in survival or quality of life [2]. Moreover, the lack of evidence on effectiveness in turn hampers the evaluation of cost-effectiveness.

Despite this, several frameworks have been proposed for the evaluation of genetic tests, but it is unclear how and in what respects they differ. The importance of a transparent and well-planned evaluation strategy is twofold. On the one hand, it would avoid the uncontrolled implementation of technologies without proven benefits, which can lead to inappropriate management of patients and detrimental effects on patient health, as well as a waste of resources and loss of public confidence in the medical profession. On the other hand, in line with the requirement for public health programs to maximize population health benefits, a reliable evaluation strategy would support the implementation of those currently available tests with proven effectiveness and cost-effectiveness [3].

To guide the appropriate translation of genomics into clinical practice, Italy developed a National Plan for Public Health Genomics, which, to our knowledge, is the first specific policy example of its kind in Europe. It has various strategic objectives, including the development of a well-planned evaluation strategy for genetic tests [4]. Our systematic review was conducted as part of a project financed by the Italian Ministry of Health to implement this plan and aimed to identify and compare the existing evaluation frameworks for genetic tests, taking into account their methodology and evaluation criteria.

Materials and methods

This review was performed according to the Cochrane Handbook for Systematic Reviews of Interventions and the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [5, 6].

Selection criteria

We included any document that describes an original evaluation framework for genetic tests, defined as a structured process for the collection of the scientific evidence needed to assess the performance of a genetic test, from the laboratory to clinical practice. We excluded partial evaluation frameworks, defined as those covering fewer than three evaluation components (analytic validity, clinical effectiveness, etc.). We limited our search to frameworks specifically created for the evaluation of genetic tests.

Search methods

Two reviewers (EP and CD) searched the bibliographic databases PubMed, Scopus, ISI Web of Knowledge, and Google Scholar, as well as the wider web through Google, for English-language articles published between January 1990 and April 2017. The search terms were grouped into two strings: String A, “genetic testing OR genetic test OR genomic test OR genomic technology OR pharmacogenetic test” AND “evaluation OR assessment OR evaluating OR assessing OR evaluate OR assess” AND “framework OR criteria OR tool OR model OR process OR methods OR evidence based” OR “analytic validity OR clinical validity OR clinical utility”; and String B, “genetic testing OR genetic test OR genomic test OR genomic technologies OR pharmacogenetic test OR public health genomics OR pharmacogenetics OR pharmacogenomics” AND “health technology assessment” (see Supplementary Information for the full electronic search for each database). This search was supplemented by exploring the websites of government agencies and research organizations involved in the evaluation of genetic tests (see Supplementary Information for a list of the respective websites) and by scanning the reference lists of all the relevant articles retrieved. Moreover, experts of the Italian Network of Public Health Genomics were asked, through a Delphi procedure, to share the evaluation frameworks they were aware of.
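Purely as an illustration of how the term groups above combine, the sketch below assembles String A into a single Boolean query under one plausible reading of its grouping (test terms AND evaluation terms AND (framework terms OR validity terms)); this grouping, the helper name, and the snippet itself are assumptions for illustration, not the search code actually used, and the exact field tags and database syntax are reported in the Supplementary Information.

```python
# Illustrative sketch only (not the review's actual search strategy): assemble
# the String A term groups into one Boolean query, assuming the grouping
# "test terms AND evaluation terms AND (framework terms OR validity terms)".

test_terms = ["genetic testing", "genetic test", "genomic test",
              "genomic technology", "pharmacogenetic test"]
evaluation_terms = ["evaluation", "assessment", "evaluating",
                    "assessing", "evaluate", "assess"]
framework_terms = ["framework", "criteria", "tool", "model",
                   "process", "methods", "evidence based"]
validity_terms = ["analytic validity", "clinical validity", "clinical utility"]


def or_group(terms):
    """Join a list of search terms into one parenthesised OR group."""
    return "(" + " OR ".join('"{}"'.format(t) for t in terms) + ")"


# String A: test terms AND evaluation terms AND (framework terms OR validity terms)
string_a = " AND ".join([
    or_group(test_terms),
    or_group(evaluation_terms),
    "(" + or_group(framework_terms) + " OR " + or_group(validity_terms) + ")",
])

print(string_a)
```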

Study selection

The two reviewers (EP and CD) removed duplicates and screened the titles and abstracts of all retrieved records. Studies that clearly did not meet the eligibility criteria were excluded. Full texts of potentially relevant studies were examined for inclusion in the systematic review, and the reasons for exclusion were recorded. Disagreements were resolved by discussion.

Data collection and analysis

Two reviewers (EP and EDA) extracted the following information about the retrieved frameworks: authors; country; year of publication; reference institution; framework name; type of target test; reference frameworks; methodology (format, sources of evidence, quality of the evidence, grading of recommendations, research priorities); practical application; purpose; primary audience; evaluation components (see Supplementary Information for a definition of each category of information extracted). A narrative synthesis of the evaluation frameworks identified was performed, comparing their general features, evaluation components, and methodological aspects.

Results

Study selection

After removal of duplicates, the initial search yielded 6027 records (Fig. 1). Screening by title and abstract identified 289 records for full-text analysis, from which 30 records were selected. Reasons for exclusion were: documents not describing a framework for the evaluation of genetic tests; appraisals of individual genetic tests using an original framework described elsewhere; partial evaluation frameworks; documents focusing on only one evaluation component; broader frameworks of implementation research not proposing an original evaluation process; guidelines on the evaluation of genetic tests; reviews and commentaries of evaluation frameworks and evaluation criteria for genetic tests; and full text not available. Six records were added to the initial 30 from the reference lists of relevant articles retrieved. In total, 36 records were included in the systematic review, describing 29 frameworks for the evaluation of genetic tests (some records describe the same framework) [7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42]. The Delphi procedure did not add any new frameworks to those already retrieved.

Fig. 1 PRISMA flow diagram of the review process

Frameworks retrieved

The systematic search identified 29 frameworks from various countries (USA, n = 12; Canada, n = 4; Europe, n = 9; Australia, n = 2; international, n = 2) published between 2000 and 2017 (Table 1).

Table 1 Frameworks used to evaluate genetic tests retrieved by the systematic review process

The majority are based on the ACCE Framework (whose name derives from the evaluation components used: analytic validity, clinical validity, clinical utility, and ethical, legal, and social implications) (n = 13) [7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24], on the Health Technology Assessment (HTA) process (n = 6) [25,26,27,28,29,30,31], or on both (n = 2) [32, 33]. The remaining frameworks refer to the Wilson and Jungner screening criteria (n = 3) [34,35,36,37] or to a mixture of preexisting frameworks, not necessarily specific to genetic tests, although the ACCE framework is often included (n = 5; Table 1) [38,39,40,41,42].

Seventeen frameworks deal with genetic tests in general [7,8,9,10,11,12,13,14,15,16,17,18,19, 22, 26, 27, 32, 36,37,38,39,40,41,42]; five refer to genetic susceptibility tests [20, 24, 33,34,35]; three to pharmacogenetic tests [23, 30, 31]; two to predictive genetic tests (including susceptibility and presymptomatic tests) [25, 28]; one to the new technologies of personalized medicine [29]; and one to newborn screening [21] (see Supplementary Information for definitions of the types of test) (Table 1). Most of the frameworks pursue a wider aim than simply summarizing evidence, for example, supporting provision and coverage decisions or guiding clinical practice; the intended primary audience consists mainly of decision and policy makers (Table 1).

In addition to the two frameworks created as appraisal tools for individual genetic tests (HTA Pharmacogenetics and HTA Susceptibility Test) [31, 33], 11 frameworks were used to generate reports that are available on the web (all open access, except Hayes GTE reports) (Table 1) [43,44,45,46,47,48,49,50,51,52,53,54]. The most productive frameworks are the Evaluation of Genomic Applications in Practice and Prevention (EGAPP) Process (11 evidence reports and 10 recommendations), the GFH Card (46 cards), the Clinical Utility Gene Card (about 136 cards), and the NHS UKGTN Gene Dossier (476 dossiers) [44, 45, 47,48,49].

Evaluation components

The most represented evaluation components in the retrieved frameworks are analytic validity (included in 93% of retrieved frameworks), clinical validity (96%), clinical utility (100%), economic evaluation (100%), and ethical, legal, and social implications (ELSI) (76%). The analysis of these components is usually introduced by an overview of the disease and the test under study (86%). Evaluation components frequently missing from the evaluation frameworks are organizational aspects (lacking in 48% of retrieved frameworks), delivery models (73%), and the patient/citizen’s point of view (93%) (Table 2).

Table 2 Evaluation components and methodological aspects considered in the retrieved evaluation frameworks

Analytic validity is the ability of the test to accurately and reliably measure the genotype of interest [55]. It is considered in markedly different ways in 27 retrieved frameworks (Table 2) [7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28, 30,31,32,33,34, 36,37,38,39,40,41,42]. It is most frequently addressed in terms of sensitivity and specificity, but assay robustness and quality assurance, including internal and external control programs, are often considered (e.g., ACCE, EGAPP) [7, 11, 12]; some frameworks (e.g., Expanded ACCE, SynFRAME) extend the concept using more criteria [15, 39].

Clinical validity is the ability of the test to accurately and reliably detect or predict a clinical condition [55]. It is considered in 28 retrieved frameworks (Table 2) [7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28, 30,31,32,33,34,35,36,37,38,39,40,41,42]. The majority identify clinical validity as test performance and measure it in terms of sensitivity, specificity, and positive and negative predictive value. Other frameworks (e.g., Expanded ACCE, Complex Diseases) preface the evaluation of the performance of the test with explicit evaluation of the scientific validity, that is, the evidence of gene–disease association, which is usually expressed as an odds ratio or relative risk [15, 20].
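For reference, the performance measures named above have standard epidemiological definitions based on a 2 × 2 comparison of test results against the presence or absence of the condition, where TP, FP, TN, and FN denote true positives, false positives, true negatives, and false negatives; the formulas below are these standard definitions, not expressions taken from any particular retrieved framework:

\[
\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad
\text{Specificity} = \frac{TN}{TN + FP},
\]
\[
\text{PPV} = \frac{TP}{TP + FP}, \qquad
\text{NPV} = \frac{TN}{TN + FN}.
\]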

Clinical utility, in its narrowest sense, compares the risks and benefits of testing and provides evidence of clinical usefulness for the integrated package of care in terms of measurable health outcomes [56]. It is considered in all 29 frameworks retrieved, albeit with a certain heterogeneity (Table 2) [7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42]. Some frameworks (e.g., HTA Personalized Health Care, HTA Pharmacogenetics) embrace the narrow definition of clinical utility and consider only efficacy, effectiveness, and safety [29, 31]. Others extend the concept to include aspects otherwise evaluated independently from clinical utility, such as organizational aspects, cost-effectiveness analysis, and ELSI (e.g., ACCE, Expanded ACCE, ACHDNC) [7, 15, 21]. This broadening of the perception of benefits reaches its greatest extent in the concept of personal utility, adopted in some frameworks (e.g., Complex Diseases, ACHDNC): the full range of personal effects that the test may have on patients, such as an improved understanding of the disease or the enabling of reproductive choices and risk-reducing behaviors [20, 21, 57].

The economic evaluation of genetic tests involves the comparative analysis of both the costs and the consequences of the various tests under study [58]. All 29 frameworks retrieved consider the economic dimension, but not always in great detail (Table 2) [7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42]. Many models address the cost-effectiveness of the test under study only in the most general terms (e.g., ACCE), and only a few involve precise quantitative and qualitative evaluations of cost-effectiveness and cost-utility evidence (e.g., Codependent Technologies, SynFRAME) [7, 30, 39]; other frameworks consider only the financial aspects, either in terms of the cost of the intervention or the related savings (e.g., NHS UKGTN Gene Dossier, PACNPGT) [8,9,10, 25].
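For context, a full cost-effectiveness or cost-utility analysis typically centers on the incremental cost-effectiveness ratio (ICER), which compares a test-based strategy with its comparator; in a cost-utility analysis the effect term is expressed in quality-adjusted life years (QALYs). This is the conventional health-economics formulation, given here only as background rather than as a formula prescribed by any of the retrieved frameworks:

\[
\text{ICER} = \frac{C_{\text{test}} - C_{\text{comparator}}}{E_{\text{test}} - E_{\text{comparator}}},
\]

where \(C\) denotes the expected costs and \(E\) the expected health effects of each strategy.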

The ELSI evaluation component is concerned with the moral value that society confers on the proposed interventions, the related legal norms, and the impact on the social life of the patient and his or her family [55]. ELSI are considered in 22 retrieved frameworks, either analyzed independently (e.g., ACCE, Andalusian) or integrated into other components of evaluation, such as clinical utility, as psychosocial outcomes of testing (e.g., EGAPP, AETMIS HTA) (Table 2) [7,8,9,10,11,12,13,14,15, 20,21,22, 25,26,27,28,29, 32,33,34,35,36,37,38,39,40, 42].

We defined a delivery model for the provision of genetic tests as the broad context in which genetic tests are offered to individuals and families with, or at risk of, genetic disorders. It includes the health care programs (any type of health intervention preceding and following a genetic test), the clinical pathways (the patient flow through different professionals during the testing process), and the level of care (e.g., primary or specialist care) at which the test is delivered [59]. Although a complete description of delivery models is lacking in all the frameworks retrieved, we ascribed this component to eight of them (Table 2): the three screening frameworks, as they include the concept of a health care program [34,35,36,37], and five other frameworks, which mention some of its elements, albeit not in detail [13, 24, 29, 30, 38].

Organizational aspects include the human, material, and economic resources needed to implement the genetic program as well as the consequences of the implementation on the organizations involved and the whole health care system. Although they do not include a thorough feasibility analysis, 15 retrieved frameworks attempt to estimate the resources required to start up and maintain a particular genetic testing service (Table 2) [7,8,9,10, 13, 14, 24,25,26,27,28,29,30, 32, 34,35,36,37,38].

The perspective of patients provides experiential evidence that can be used in the evaluation process [60]. Only two of the retrieved frameworks evaluate the direct experience of the patient and other affected individuals, for example, by their participation in surveys (Table 2) [15, 33]. One of these (Expanded ACCE) includes the patient perspective as part of the clinical utility component, whereas the other (HTA Susceptibility Test) assigns it its own dedicated section.

Methodological aspects

The most frequently used formats are the key questions format (12 frameworks) [7, 11,12,13,14,15, 20,21,22,23, 32, 41, 42], the card format (five frameworks) [8,9,10, 16,17,18,19, 24, 25], and the checklist format (two frameworks) [30, 38]. The other frameworks have a less structured format and resemble general manuals (Table 2) [26,27,28,29, 31, 33,34,35,36,37, 39, 40].

With respect to the process of evidence review, 13 frameworks provide some indication, albeit in scant detail, of the sources referred to [7, 11,12,13,14,15, 21, 22, 25, 30, 31, 33, 39, 42]; 12 frameworks provide an evaluation of the quality of evidence, although the criteria adopted are not always stated clearly [11,12,13,14, 21, 22, 25, 30, 31, 33, 35, 39, 42]; and finally, 12 frameworks attempt to deal with evidence gaps through the formulation of research priorities (Table 2) [7, 11, 12, 14, 21, 23, 25, 28, 31, 34, 38, 40, 42].

Only five of the retrieved frameworks provide, or at least suggest, criteria for making recommendations based on the evidence collected [11, 12, 21, 28, 32, 42]. The most frequently used criteria are the magnitude of the net benefit and the level of certainty of the evidence.

Discussion

Our review identified three main approaches to the evaluation of genetic testing: the ACCE model, the HTA process, and the Wilson and Jungner screening criteria. The most popular is the ACCE model, developed in 2000 by the US Centers for Disease Control and Prevention [7]. In 2004, it was further developed to become the EGAPP initiative, which makes recommendations for clinical and public health practice [11, 12]. The UK Genetic Testing Network and the Andalusian Agency for Health Technology Assessment re-elaborated the ACCE model to guide the introduction of new genetic tests into their respective public health systems, creating the 2004 NHS UKGTN Gene Dossier and the 2006 Andalusian Framework, respectively [8,9,10, 13]. In 2007, the ACCE model was reworked again: an expanded version of ACCE, supported by the PHG Foundation, added health quality measures to the evaluation process, whereas a more streamlined version shortened the systematic review process for emerging genetic tests [14, 15]. In 2010, the ACCE model was applied to specific types of genetic test through the Complex Disease Framework and the ACHDNC Newborn Screening Framework of the Advisory Committee on Heritable Disorders in Newborns and Children [20, 21]. The ACCE model also inspired two related European frameworks, the 2008 GFH Indication Criteria of the German Society of Human Genetics and the 2010 Clinical Utility Gene Card of EuroGentest [16,17,18,19], the latter of which in turn inspired the 2017 Australian Clinical Utility Card [24]. In 2011, the ECRI Institute used the EGAPP process to develop a set of analytical frameworks for different testing scenarios and stakeholder perspectives [22]. Finally, the 2015 Companion tests Assessment Tool (CAT) used the ACCE model as a filter mechanism to determine which tests, in which specific areas, required evaluation [23]. Our results show that the ACCE framework is the main conceptual framework for the evaluation of genetic tests and the most used in practice, as it inspired frameworks that have been very productive in terms of evidence reports, such as the EGAPP, the Clinical Utility Gene Card, and the NHS UKGTN Gene Dossier. Some attempts have been made to merge the ACCE and HTA models, for example, the 2009 framework for genetic tests used by the private American company Hayes and the 2012 framework for susceptibility tests financed by the Italian Ministry of Education, Universities and Research [32, 33].

Due to the widespread use of the ACCE framework, the most frequently employed evaluation components are analytic validity, clinical validity, clinical utility, and ELSI. Although these evaluation components clearly address the technical value of a genetic test, less attention is given to the wider context. Thus, although the clinical context is given sufficient consideration in some cases, in particular in the NHS UKGTN evaluation process, whose testing criteria define the appropriate clinical situations for use of a given test [60], the broader context for implementation of a genetic test is often disregarded. In fact, even where an economic evaluation is performed, it is usually rather superficial; similarly, the analysis of delivery models and organizational aspects, where present, is usually not well structured. These context-related evaluation components are more often considered by the HTA-based evaluation frameworks. The direct experience of patients is almost totally neglected. Nevertheless, since patients are the direct beneficiaries of a genetic technology, their perspective could help in understanding its value [61]. Finally, the criteria for making recommendations on the clinical implementation of tests are rarely explored.

Since decision makers are the main audience of the evaluation process, the lack of attention to the context-related evaluation components (delivery models, economic evaluation, and organizational aspects) and to the recommendation-making process is arguably the main limitation of the retrieved frameworks. The analysis of the context of implementation, which is peculiar to HTA, is critical for securing an efficient and equitable allocation of health care resources and services. An EU-funded research project named HIScreenDiag [62], which closed in 2011, aimed to assess genetic tests using the HTA methodology of the European Network for Health Technology Assessment, which includes a detailed analysis of the economic, organizational, and delivery aspects [63]. Moreover, the adoption of an evidence-grading system such as the Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach, which scores the strength of recommendations after taking into account aspects such as patient values and resource use, would help move the evaluation process from evidence to implementation and would make it more comprehensive [64]. Finally, because these frameworks were mainly developed to address single-gene testing, we might question how appropriate they are for tests based on next-generation sequencing (NGS). However, frameworks that have been adapted for NGS, such as the NHS UKGTN Gene Dossier and the Clinical Utility Gene Card, have proved effective [65, 66].

In contrast to our systematic review, the majority of reviews in the literature on evaluation frameworks for genetic tests have a narrative structure. The only three systematic reviews we retrieved are described as methods for the construction of an evaluation framework (ECRI, SynFRAME, Practical Framework), so their methodology is not reported in full [22, 39, 40].

One limitation of our work might be a failure to retrieve some of the studies published in the gray literature. To maximize the sensitivity of the search, we used broad search terms; these yielded results with low specificity, but we compensated for this during the selection process. Moreover, the comparison between the retrieved frameworks could have been affected by the fact that not all frameworks clearly defined their evaluation components, especially with respect to delivery models and organizational aspects.

In conclusion, the ACCE model proves to be a foundation for the technical appraisal of genetic tests. However, this model is not entirely satisfactory. We suggest the adoption of a broader HTA approach, including the assessment of the context-related evaluation dimensions (delivery models, economic evaluation, and organizational aspects). This approach would maximize population health benefits, facilitate decision-making, and address the main challenges of implementing genetic tests, particularly in universal health care systems, where economic sustainability is a major issue.