Main

The completion of the Human Genome Project has generated enthusiasm for translating genome discoveries into testing applications that have potential to improve health care and usher in a new era of “personalized medicine.”1–4 For the last decade, however, questions have been raised about the appropriate evidentiary standards and regulatory oversight for this translation process.5–10 The US Preventive Services Task Force (USPSTF) was the first established national process to apply an evidence-based approach to the development of practice guidelines for genetic tests, focusing on BRCA1/2 testing (to assess risk for heritable breast cancer) and on HFE testing for hereditary hemochromatosis.11,12 The Centers for Disease Control and Prevention-funded ACCE Project piloted an evidence evaluation framework of 44 questions that defines the scope of the review (i.e., disorder, genetic test, clinical scenario) and addresses the previously proposed6,7 components of evaluation: Analytic and Clinical validity, Clinical utility and associated Ethical, legal and social implications. The ACCE Project examined available evidence on five genetic testing applications, providing evidence summaries that could be used by others to formulate recommendations.13–16 Systematic reviews on genetic tests have also been conducted by other groups.17–20

Genetic tests tend to fit less well within “gold-standard” processes for systematic evidence review for several reasons.21–24 Many genetic disorders are uncommon or rare, making data collection difficult. Even greater challenges are presented by newly emerging genomic tests with potential for wider clinical use, such as genomic profiles that provide information on susceptibility for common complex disorders (e.g., diabetes, heart disease) or drug-related adverse events, and tests for disease prognosis.25,26 The actions or interventions that are warranted based on test results, and the outcomes of interest, are often not well defined. In addition, the underlying technologies are rapidly emerging, complex, and constantly evolving. Interpretation of test results is also complex, and may have implications for family members. Of most concern, the number and quality of studies are limited. Test applications are being proposed and marketed based on descriptive evidence and pathophysiologic reasoning, often lacking well-designed clinical trials or observational studies to establish validity and utility, but advocated by industry and patient interest groups.

THE EGAPP INITIATIVE

The Evaluation of Genomic Applications in Practice and Prevention (EGAPP) Working Group (EWG) is an independent panel established in April, 2005, to develop a systematic process for evidence-based assessment that is specifically focused on genetic tests and other applications of genomic technology. Key objectives of the EWG are to develop a transparent, publicly accountable process, minimize conflicts of interest, optimize existing evidence review methods to address the challenges presented by complex and rapidly emerging genomic applications, and provide clear linkage between the scientific evidence and the subsequently developed EWG recommendation statements. The EWG is currently composed of 16 multidisciplinary experts in areas such as clinical practice, evidence-based medicine, genomics, public health, laboratory practice, epidemiology, economics, ethics, policy, and health technology assessment.27 This nonfederal panel is supported by the EGAPP initiative launched in late 2004 by the National Office of Public Health Genomics (NOPHG) at the Centers for Disease Control and Prevention (CDC). In addition to supporting the activities of the EWG, EGAPP is developing data collection, synthesis, and review capacity to support timely and efficient translation of genomic applications into practice, evaluating the products and impact of the EWG's pilot phase, and working with the EGAPP Stakeholders Group on topic prioritization, information dissemination, and product feedback.28 The EWG is not a federal advisory committee, but rather aims to provide information to clinicians and other key stakeholders on the integration of genomics into clinical practice. The EGAPP initiative has no oversight or regulatory authority.

SCOPE AND SELECTION OF GENETIC TESTS AS TOPICS FOR EVIDENCE REVIEW

Much debate has centered on the definition of a “genetic test.” Because of the evolving nature of the tests and technologies, the EWG has adopted the broad view articulated in a recent report of the Secretary's Advisory Committee on Genetics, Health, and Society10:

“A genetic test involves the analysis of chromosomes, deoxyribonucleic acid (DNA), ribonucleic acid (RNA), genes, or gene products (e.g., enzymes and other proteins) to detect heritable or somatic variations related to disease or health. Whether a laboratory method is considered a genetic test also depends on the intended use, claim or purpose of a test.”

Based on resource limitations, EGAPP focuses on tests having wider population application (e.g., higher disorder prevalence, higher frequency of test use), those with potential to impact clinical and public health practice (e.g., emerging prognostic and pharmacogenomic tests), and those for which there is significant demand for information. Tests currently eligible for EGAPP review include those used to guide intervention in symptomatic (e.g., diagnosis, prognosis, treatment) or asymptomatic individuals (e.g., disease screening), to identify individuals at risk for future disorders (e.g., risk assessment or susceptibility testing), or to predict treatment response or adverse events (e.g., pharmacogenomic tests) (Table 1). Though the methods developed for systematic review are applicable, EGAPP is not currently considering diagnostic tests for rare single gene disorders, newborn screening tests, or prenatal screening and carrier tests for reproductive decision-making, as these tests are being addressed by other processes.10,29–39

Table 1 Categories of genetic test applications and some characteristics of how clinical validity and utility are assessed

EGAPP-commissioned evidence reports and EWG recommendation statements are focused on patients seen in traditional primary or specialty care clinical settings, but may address other contexts, such as direct web-based offering of tests to consumers without clinician involvement (e.g., direct-to-consumer or DTC genetic testing). EWG recommendations may vary for different applications of the same test or for different clinical scenarios, and may address testing algorithms that include preliminary tests (e.g., family history or other laboratory tests that identify high risk populations).

Candidate topics (i.e., applications of genetic tests in specific clinical scenarios to be considered for evidence review) are identified through horizon scanning in the published and unpublished literature (e.g., databases, web postings), or nominated by EWG members, outside experts and consultants, federal agencies, health care providers and payers, or other stakeholders.40 Like the USPSTF,23 the EWG does not have an explicit process for ranking topics. EGAPP staff prepares background summaries on each potential topic which are reviewed and given preliminary priorities by an EWG Topics Subcommittee, based on specific criteria and aimed at achieving a diverse portfolio of topics that also challenge the evidence review methods (Table 2). Final selections are determined by vote of the full EWG. EGAPP is currently developing a more systematic and transparent process for prioritizing topics that is better informed by stakeholders.

Table 2 Criteria for preliminary ranking of topics

REVIEW OF THE EVIDENCE

Evidence review strategies

When topics are selected for review by the EWG, CDC's NOPHG commissions systematic reviews of the available evidence. These reviews may include meta-analyses and economic evaluations. New topics are added on a phased schedule as funding and staff capacity allow. All EWG members, review team members, and consultants disclose potential conflicts of interest for each topic considered. Following the identification of the scope and the outcomes of interest for a systematic review, key questions and an analytic framework are developed by the EWG, and later refined by the review team in consultation with a technical expert panel (TEP). The EWG assigns members to serve on the TEP, along with other experts selected by those conducting the review; these members constitute the EWG “topic team” for that review. Based on the multidisciplinary nature of the panel, selection of EWG topic teams aims to include expertise in evidence-based medicine and scientific content.

For five of eight testing applications selected by the EWG to date, CDC-funded systematic evidence reviews have been conducted in partnership with the Agency for Healthcare Research and Quality (AHRQ) Evidence-based Practice Centers (EPCs).41 Based on expertise in conducting comprehensive, well-documented literature searches and evaluation, AHRQ EPCs represent an important resource for performing comprehensive reviews on applications of genomic technology. However, comprehensive reviews are time and resource intensive, and the numbers of relevant tests are rapidly increasing. Some tests have multiple applications and require review of more than one clinical scenario.7,10

Consequently, the EWG is also investigating alternative strategies to produce shorter, less expensive, but no less rigorous, systematic reviews of the evidence needed to make decisions about immediate usefulness and highlight important gaps in knowledge. A key objective is to develop methods to support “targeted” or “rapid” reviews that are both timely and methodologically sound.13,17–20,42 Candidate topics for such reviews include situations when the published literature base is very limited, when it is possible to focus on a single evaluation component (e.g., clinical validity) that is most critical for decision-making, and when information is urgently needed on a test with immediate potential for great benefit or harm. Three such targeted reviews are being coordinated by NOPHG-based EGAPP staff in collaboration with technical contractors, and with early participation of expert core consultants who can identify data sources and provide expert guidance on the interpretation of results.43 Regardless of the source, a primary objective for all evidence reviews is that the final product is a comprehensive evaluation and interpretation of the available evidence, rather than summary descriptions of relevant studies.

Structuring the evidence review

“Evidence” is defined as peer-reviewed publications of original data or systematic review or meta-analysis of such studies; editorials and expert opinion pieces are not included.23,44 However, EWG methods allow for inclusion of peer-reviewed unpublished literature (e.g., information from Food and Drug Administration [FDA] Advisory Committee meetings), and for consideration on a case-by-case basis of other sources, such as review articles addressing relevant technical or contextual issues, or unpublished data. Topics are carefully defined based on the medical disorder, the specific test (or tests) to be used, and the specific clinical scenario in which it will be used.

The medical “disorder” (a term chosen as more encompassing than “disease”) should optimally be defined in terms of its clinical characteristics, rather than by the laboratory test being used to detect it. Terms such as condition or risk factor generally designate intermediate or surrogate outcomes or findings, which may be of interest in some cases; for example, identifying individuals at risk for atrial fibrillation as an intermediate outcome for preventing the clinical outcome of cardiogenic stroke. In pharmacogenomic testing, the disorder, or outcome of interest, may be a reduction in adverse drug events (e.g., avoiding severe neutropenia among cancer patients to be treated with irinotecan via UGT1A1 genotyping and dose reduction in those at high risk), optimizing treatment (e.g., adjusting initial warfarin dose using CYP2C9 and VKORC1 genotyping to more quickly achieve optimal anticoagulation in order to avoid adverse events), or more effectively targeting drug interventions to those patients most likely to benefit (e.g., herceptin for HER2-overexpressing breast cancers).

Characterizing the genetic test(s) is the second important step. For example, the American College of Medical Genetics defined the genetic testing panel for cystic fibrosis in the context of carrier testing as the 23 most common CFTR mutations (i.e., present at a population frequency of 0.1% or more) associated with classic, early onset cystic fibrosis in a US pan-ethnic study population. This allowed the subsequent review of analytic and clinical validity to focus on a relatively small subset of the 1000 or more known mutations.45 Rarely, a nongenetic test may be evaluated, particularly if it is an existing alternative to mutation testing. An example would be biochemical testing for iron overload (e.g., serum transferrin saturation, serum ferritin) compared with HFE genotyping for identification of hereditary hemochromatosis.

A clear definition of the clinical scenario is of major importance, as the performance characteristics of a given test may vary depending on the intended use of the test, including the clinical setting (e.g., primary care, specialty settings), how the test will be applied (e.g., diagnosis or screening), and who will be tested (e.g., general population or selected high risk individuals). Preliminary tests should also be considered as part of the clinical scenario. For example, when testing for Lynch syndrome among newly diagnosed colorectal cancer cases, it may be too expensive to sequence two or more mismatch repair genes (e.g., MLH1, MSH2) in all patients. For this reason, preliminary tests, such as family history, microsatellite instability, or immunohistochemical testing, may be evaluated as strategies for selecting a smaller group of higher risk individuals to offer gene sequencing.

METHODS

Methods of the EWG for reviewing the evidence share many elements of existing processes, such as the USPSTF,23 the AHRQ Evidence-based Practice Center Program,46 the Centre for Evidence Based Medicine,47 and others.44,48–53 These include the use of analytic frameworks with key questions to frame the evidence review; clear definitions of clinical and other outcomes of interest; explicit search strategies; use of hierarchies to characterize data sources and study designs; assessment of quality of individual studies and overall certainty of evidence; linkage of evidence to recommendations; and minimizing conflicts of interest throughout the process. Typically, however, the current evidence on genomic applications is limited to evaluating gene-disease associations, and is unlikely to include randomized controlled trials that evaluate test-based interventions and patient outcomes. Consequently, the EWG must rigorously assess the quality of observational studies, which may not be designed to address the questions posed.

In this new field, direct evidence to answer an overarching question about the effectiveness and value of testing is rarely available. Therefore, it is necessary to construct a chain of evidence, beginning with the technical performance of the test (analytic validity) and the strength of the association between a genotype and disorder of interest. The strength of this association determines the test's ability to diagnose a disorder, assess susceptibility or risk, or provide information on prognosis or variation in drug response (clinical validity). The final link is the evidence that test results can change patient management decisions and improve net health outcomes (clinical utility).

To address some unique aspects of genetic test evaluation, the EWG has adopted several aspects of the ACCE model process, including formal assessment of analytic validity; use of unpublished literature for some evaluation components when published data are lacking or of low quality; consideration of ethical, legal, and social implications as integral to all components of evaluation; and use of questions from the ACCE analytic framework to organize collection of information.13 Important concepts that underlie the EGAPP process and add value include (1) providing a venue for multidisciplinary independent assessment of collected evidence; (2) conducting reviews that maintain a focus on medical outcomes that matter to patients, but also consider a range of specific family and societal outcomes when appropriate54; (3) developing and optimizing methods for assessing individual study quality, adequacy of evidence for each component of the analytic framework, and certainty of the overall body of evidence; (4) focusing on summarization and synthesis of the evidence and identification of gaps in knowledge; and (5) ultimately, providing a foundation for evidentiary standards that can guide policy decisions. Although evidentiary standards will necessarily vary depending on test application (e.g., for diagnosis or to guide therapy) and the clinical situation, the methods and approaches described in this report are generally applicable; further refinement is anticipated as experience is gained.

The analytic framework and key questions

After the selection and structuring of the topic to be reviewed, the EWG Methods Subcommittee drafts an analytic framework for the defined topic that explicitly illustrates the clinical scenario, the intermediate and health outcomes of interest, and the key questions to be addressed. Table 1 provides generic examples of clinical scenarios. However, analytic frameworks for genetic tests differ based on clinical scenario, and must be customized for each topic. Figure 1 shows the example of an analytic framework used to develop the first EWG recommendation, Testing for Cytochrome P450 Polymorphisms in Adults with Nonpsychotic Depression Prior to Treatment with Selective Serotonin Reuptake Inhibitors (SSRIs); numbers in the figure refer to the key questions listed in the legend.55,56

Fig. 1

Analytic framework and key questions for evaluating one application of a genetic test in a specific clinical scenario: Testing for Cytochrome P450 Polymorphisms in Adults With Non-Psychotic Depression Treated With Selective Serotonin Reuptake Inhibitors (SSRIs); modified from reference 56. The numbers correspond to the following key questions:

1. Overarching question: Does testing for cytochrome P450 (CYP450) polymorphisms in adults entering selective serotonin reuptake inhibitor (SSRI) treatment for nonpsychotic depression lead to improvement in outcomes, or are testing results useful in medical, personal, or public health decision-making?

2. What is the analytic validity of tests that identify key CYP450 polymorphisms?

3. Clinical validity: A, How well do particular CYP450 genotypes predict metabolism of particular SSRIs? B, How well does CYP450 testing predict drug efficacy? C, Do factors such as race/ethnicity, diet, or other medications, affect these associations?

4. Clinical utility: A, Does CYP450 testing influence depression management decisions by patients and providers in ways that could improve or worsen outcomes? B, Does the identification of the CYP450 genotypes in adults entering SSRI treatment for nonpsychotic depression lead to improved clinical outcomes compared to not testing? C, Are the testing results useful in medical, personal, or public health decision-making?

5. What are the harms associated with testing for CYP450 polymorphisms and subsequent management options?

The first key question is an over-arching question to determine whether there is direct evidence that using the test leads to clinically meaningful improvement in outcomes or is useful in medical or personal decision-making. In this case, EGAPP uses the USPSTF definition of direct evidence, “…a single body of evidence establishes the connection…” between the use of the genetic test (and possibly subsequent tests or interventions) and health outcomes.23 Thus, the overarching question addresses clinical utility, and specific measures of the outcomes of interest. For genetic tests, such direct evidence on outcomes is most commonly not available or of low quality, so a “chain of evidence” is constructed using a series of key questions. EGAPP follows the convention that the chain of evidence is indirect if, rather than answering the overarching question, two or more bodies of evidence (linkages in the analytic framework) are used to connect the use of the test with health outcomes.23,57

After the overarching question, the remaining key questions address the components of evaluation as links in a possible chain of evidence: analytic validity (technical test performance), clinical validity (the strength of association that determines the test's ability to accurately and reliably identify or predict the disorder of interest), and clinical utility (balance of benefits and harms when the test is used to influence patient management). Determining whether a chain of indirect evidence can be applied to answer the overarching question requires consideration of the quality of individual studies, the adequacy of evidence for each link in the evidence chain, and the certainty of benefit based on the quantity (i.e., number and size) and quality (i.e., internal validity) of studies, the consistency and generalizability of results, and understanding of other factors or contextual issues that might influence the conclusions.23,57 The USPSTF has recently updated its methods and clarified its terminology.57 Because this approach is both thoughtful and directly applicable to the work of EGAPP, the EWG has adopted the terminology; an additional benefit will be to provide consistency for shared audiences.

Evidence collection and assessment

The review team considers the analytic framework, key questions, and any specific methodological approaches proposed by the EWG. As previously noted, the report will focus on clinical factors (e.g., natural history of disease, therapeutic alternatives) and outcomes (e.g., morbidity, mortality, quality of life), but the EWG may request that other familial, ethical, societal, or intermediate outcomes also be considered for a specific topic.54 The EWG may also request information on other relevant factors (e.g., impact on management decisions by patients and providers) and contextual issues (e.g., cost-effectiveness, current use, or feasibility of use).

Methods for individual evidence reviews will differ in small ways based on the reviewers (AHRQ EPC or other review team), the strategy for review (e.g., comprehensive, targeted/rapid), and the topic. These differences will be transparent because all evidence reviews describe methods and follow the same general steps: framing the specific questions for review; gathering technical experts and reviewers; identifying data sources, searching for evidence using explicit strategies and study inclusion/exclusion criteria; specifying criteria for assessing quality of studies; abstracting data into evidence tables; synthesizing findings; and identifying gaps and making suggestions for future research.

All draft evidence reports are distributed to the TEP and other selected experts for technical review. After consideration of reviewer comments, EPCs provide a final report that is approved and released by AHRQ and posted on the AHRQ website; the EPC may subsequently publish a summary of the evidence. Non-EPC review teams submit final reports to CDC and the EWG, along with the comments from the technical reviewers and how they were addressed; the EWG approves the final report. Final evidence reports (or links to AHRQ reports) are posted on the www.egappreviews.org web site. When possible, a manuscript summarizing the evidence report is prepared to submit for publication along with the clinical practice recommendations developed by the EWG.56

Grading quality of individual studies

Table 3 provides the hierarchies of data sources for analytic validity, and of study designs for clinical validity and utility, designated for all as Level 1 (highest) to Level 4. Table 4 provides a checklist of questions for assessing the quality of individual studies for each evaluation component based on the published literature.5,13,23,48,58,59 Different reviewers may provide a quality rating for individual studies that is based on specified criteria, or derived using a more quantitative algorithm. The EWG ranks individual studies as Good, Fair, or Marginal based on critical appraisal using the criteria in Tables 3 and 4. The designation Marginal (rather than Poor) acknowledges that some studies may not have been “poor” in overall design or conduct, but may not have been designed to address the specific key question in the evidence review.

Table 3 Hierarchies of data sources and study designs for the components of evaluation
Table 4 Criteria for assessing quality of individual studies (internal validity)55

Components of evaluation

Analytic validity

EGAPP defines the analytic validity of a genetic test as its ability to accurately and reliably measure the genotype (or analyte) of interest in the clinical laboratory, and in specimens representative of the population of interest.13 Analytic validity includes analytic sensitivity (detection rate), analytic specificity (1-false positive rate), reliability (e.g., repeatability of test results), and assay robustness (e.g., resistance to small changes in preanalytic or analytic variables).13 As illustrated by the “ACCE wheel” figure (http://www.cdc.gov/genomics/gtesting/ACCE.htm), these elements of analytic validity are themselves integral elements in the assessment of clinical validity.13,42 Many evidence-based processes assume that evaluating clinical validity will address any analytic problems, and do not formally consider analytic validity.23 The EWG has elected to pursue formal evaluation of analytic validity because genetic and genomic technologies are complex and rapidly evolving, and validation data are limited. New tests may not have been validated in multiple sites, for all populations of interest, or under routine clinical laboratory conditions over time. More importantly, review of analytic validity can also determine whether clinical validity can be improved by addressing test performance.
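The two rate-based elements of analytic validity defined above can be illustrated with a short calculation. The following is a minimal sketch, assuming a validation panel in which each sample's true genotype is already known; the function name and the panel counts are hypothetical and are not taken from any EGAPP review.

```python
def analytic_performance(true_pos, false_neg, true_neg, false_pos):
    """Return (analytic sensitivity, analytic specificity) from validation counts.

    true_pos / false_neg: variant-positive samples called correctly / missed.
    true_neg / false_pos: variant-negative samples called correctly / miscalled.
    """
    with_variant = true_pos + false_neg
    without_variant = true_neg + false_pos
    if with_variant == 0 or without_variant == 0:
        raise ValueError("need at least one sample with and one without the variant")
    sensitivity = true_pos / with_variant              # detection rate
    false_positive_rate = false_pos / without_variant
    specificity = 1 - false_positive_rate              # 1 - false positive rate
    return sensitivity, specificity


# Hypothetical panel: 200 variant-positive and 300 variant-negative samples.
sens, spec = analytic_performance(true_pos=198, false_neg=2, true_neg=297, false_pos=3)
print(f"analytic sensitivity = {sens:.3f}, analytic specificity = {spec:.3f}")
```

Reliability and assay robustness are not captured by these two rates and would need separate repeat-testing data.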

Test kits or reagents that have been cleared or approved by the FDA may provide information on analytic validity that is publicly available for review (e.g., FDA submission summaries).60 However, most currently available genetic tests are offered as laboratory developed tests not currently reviewed by the FDA, and information from other sources must be sought and evaluated. Different genetic tests may use a similar methodology, and information on the analytic validity of a common technology, as applied to genes not related to the review, may be informative. However, general information about the technology cannot be used as a substitute for specific information about the test under review. Based on experience to date, access to specific expertise in clinical laboratory genetics and test development is important for effective review of analytic validity.

Table 3 (column 1) provides a quality ranking of data sources that are used to obtain unbiased and reliable information about analytic validity. The best information (quality Level 1) comes from collaborative studies using a single large, carefully selected panel of well-characterized samples (both cases and controls) that are blindly tested and reported, with the results independently analyzed. At this time, such studies are largely hypothetical, but an example that comes close is the Genetic Testing Quality Control Materials Program at CDC.61 As part of this program, samples precharacterized for specific genetic variants can be accessed from Coriell Cell Repositories (Camden, NJ) by other laboratories to perform in-house validation studies.62 Data from proficiency testing schemes (Levels 1 or 2) can provide some information about all three phases of analytic validity (i.e., analytic, pre- and postanalytic), as well as interlaboratory and intermethod variability. ACCE questions 8 through 17 are helpful in ensuring that all aspects of analytic validity have been addressed.42

Table 4 (column 1) lists additional criteria for assessing the quality of individual studies on analytic validity. Assessment of the overall quality of evidence for analytic validity includes consideration of the quality of studies, the quantity of data (e.g., number and size of studies, genes/alleles tested), and the consistency and generalizability of the evidence (also see Table 5, column 1). The consistency of findings can be assessed formally (e.g., by testing for homogeneity), or by less formal methods (e.g., providing a central estimate and range of values) when sufficient data are lacking. One or more internally valid studies do not necessarily provide sufficient information to conclude that analytic validity has been established for the test. Supporting the use of a test in routine clinical practice requires data on analytic validity that are generalizable to use in diverse “real world” settings.
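Both ways of assessing consistency mentioned above can be sketched in a few lines. The example below is illustrative only: it assumes each study reports a count of correct calls out of a total, uses the median with minimum and maximum as the informal summary, and uses a Pearson chi-square test of homogeneity of proportions as one possible formal test. The source does not specify which homogeneity test the EWG applies, and the function names and study counts are hypothetical.

```python
from statistics import median


def central_estimate_and_range(estimates):
    """Informal summary: median with the lowest and highest study estimate."""
    return median(estimates), min(estimates), max(estimates)


def homogeneity_chi_square(events, totals):
    """Pearson chi-square statistic for equality of k proportions.

    Returns (statistic, degrees_of_freedom). Compare the statistic with a
    chi-square distribution on k - 1 degrees of freedom to obtain a p-value.
    """
    if len(events) != len(totals) or len(events) < 2:
        raise ValueError("need matching counts from at least two studies")
    pooled = sum(events) / sum(totals)
    if pooled in (0.0, 1.0):
        # Every study reports the same extreme result: no heterogeneity to test.
        return 0.0, len(events) - 1
    statistic = sum(
        (x - n * pooled) ** 2 / (n * pooled * (1 - pooled))
        for x, n in zip(events, totals)
    )
    return statistic, len(events) - 1


# Hypothetical analytic sensitivity results from three laboratories.
events, totals = [98, 145, 190], [100, 150, 200]
print(central_estimate_and_range([x / n for x, n in zip(events, totals)]))
print(homogeneity_chi_square(events, totals))
```

With few or small studies the formal test has little power, which is one reason the less formal summary may be preferred when data are sparse.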

Table 5 Grading the quality of evidence for the individual components of the chain of evidence (key questions)57

Clinical validity

EGAPP defines the clinical validity of a genetic test as its ability to accurately and reliably predict the clinically defined disorder or phenotype of interest. Clinical validity encompasses clinical sensitivity and specificity (integrating analytic validity), and predictive values of positive and negative tests that take into account the disorder prevalence (the proportion of individuals in the selected setting who have, or will develop, the phenotype/clinical disorder of interest). Clinical validity may also be affected by reduced penetrance (i.e., the proportion of individuals with a disease-related genotype or mutation who develop disease), variable expressivity (i.e., variable severity of disease among individuals with the same genotype), and other genetic (e.g., variability in allele/genotype frequencies or gene-disease association in racial/ethnic subpopulations) or environmental factors. ACCE questions 18 through 25 are helpful in organizing information on clinical validity.42
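The dependence of predictive values on disorder prevalence follows directly from Bayes' theorem and can be made concrete with a brief calculation. The sketch below is illustrative; the function name and the sensitivity, specificity, and prevalence values are hypothetical and do not describe any particular test.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied in a setting with the given prevalence."""
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    if true_pos + false_pos == 0 or true_neg + false_neg == 0:
        raise ValueError("predictive value undefined: no positive or no negative results")
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Same hypothetical test in a general population and in a high-risk group.
for label, prevalence in (("general population", 0.01), ("high-risk group", 0.20)):
    ppv, npv = predictive_values(sensitivity=0.90, specificity=0.95, prevalence=prevalence)
    print(f"{label}: PPV = {ppv:.2f}, NPV = {npv:.3f}")
```

The positive predictive value of the same test is much lower in the low-prevalence setting, which is why the clinical scenario and the population to be tested must be defined before clinical validity is assessed.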

Table 3 (column 2) provides a hierarchy of study designs for assessing quality of individual studies.13,23,44,46–48,50,53,63 Published checklists for reporting studies on clinical validity are reasonably consistent, and Table 4 (column 2) provides additional criteria adopted for grading the quality of studies (e.g., execution, minimizing bias).5,13,23,44,46–51,53,58,59,63 As with analytic validity, the important characteristics defining overall quality of evidence on clinical validity include the number and quality of studies, the representativeness of the study population(s) compared with the population(s) to be tested, and the consistency and generalizability of the findings (Table 5). The quantity of data includes the number of studies, and the number of total subjects in the studies. The overall consistency of clinical validity estimates can be determined by formal methods such as meta-analysis. Minimally, estimates of clinical sensitivity and specificity should include confidence intervals.63 In pilot studies, initial estimates of clinical validity may be derived from small data sets focused on individuals known to have, versus not have, a disorder, or from case/control studies that may not represent the wide range or frequency of results that will be found in the general population. Although important to establish proof of concept, such studies are insufficient evidence for clinical application; additional data are needed from the entire range of the intended clinical population to reliably quantify clinical validity before introduction.
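As an illustration of reporting a confidence interval with a sensitivity or specificity estimate, the sketch below computes a Wilson score interval for a proportion. The choice of the Wilson method is an assumption made for this example; the source does not prescribe an interval method, and the function name and counts are hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95% coverage)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical pilot study: 45 of 50 affected individuals test positive.
low, high = wilson_interval(45, 50)
print(f"clinical sensitivity = {45 / 50:.2f} (95% CI {low:.2f} to {high:.2f})")
```

The width of the interval from a small pilot data set shows why such studies, although useful as proof of concept, do not reliably quantify clinical validity.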

Clinical utility

EGAPP defines the clinical utility of a genetic test as the evidence of improved measurable clinical outcomes, and its usefulness and added value to patient management decision-making compared with current management without genetic testing. If a test has utility, it means that the results (positive or negative) provide information that is of value to the person, or sometimes to the individual's family or community, in making decisions about effective treatment or preventive strategies. Clinical utility encompasses effectiveness (evidence of utility in real clinical settings), and the net benefit (the balance of benefits and harms). Frequently, it also involves assessment of efficacy (evidence of utility in controlled settings like a clinical trial).

Tables 3 and 4 (column 3) provide the hierarchy of study designs for clinical utility, and other criteria for grading the internal validity of studies (e.g., execution, minimizing bias) adopted from other published approaches.13,23,46–48,57 Paralleling the assessment of analytic and clinical validity, the three important quality characteristics for clinical utility are quality of individual studies and the overall body of evidence, the quantity of relevant data, and the consistency and generalizability of the findings (Table 5). Another criterion to be considered is whether implementation of testing in different settings, such as clinician ordered versus direct-to-consumer, could lead to variability in health outcomes.

Grading the quality of evidence for the individual components in the chain of evidence (key questions)

Table 5 provides criteria for assessing the quality of the body of evidence for the individual components of evaluation, analytic validity (column 2), clinical validity (column 3), and clinical utility (column 4).23,44,47,48,64 The adequacy of the information to answer the key questions related to each evaluation component is classified as Convincing, Adequate, or Inadequate. This information is critical to assess the “strength of linkages” in the chain of evidence.57 The intent of this approach is to minimize the risk of being wrong in the conclusions derived from the evidence. When the quality of evidence is Convincing, the observed estimate or effect is likely to be real, rather than explained by flawed study methodology; when Adequate, the observed results may be influenced by such flaws. When the quality of evidence is Inadequate, the observed results are more likely to be the result of flaws in study methodology rather than an accurate assessment; availability of only Marginal quality studies always results in Inadequate quality.

Based on the evidence available, the overall level of certainty of net health benefit is categorized as High, Moderate, or Low.57 High certainty is associated with consistent and generalizable results from well-designed and conducted studies, making it unlikely that estimates and conclusions will change based on future studies. When the level of certainty is Moderate, some data are available, but limitations in data quantity, quality, consistency, or generalizability reduce confidence in the results, and, as more information becomes available, the estimate or effect may change enough to alter the conclusion. Low certainty is associated with insufficient or poor quality data, results that are not consistent or generalizable, or lack of information on important outcomes of interest; as a result, conclusions are likely to change based on future studies.

Translating evidence into recommendations

Based on the evidence report, the EWG's assessment of the magnitude of net benefit and the certainty of evidence, and consideration of other clinical and contextual issues, the EWG formulates clinical practice recommendations (Table 6). Although the information will have value to other stakeholders, the primary intended audience for the content and format of the recommendation statement is clinicians. The information is intended to provide transparent, authoritative advice, inform targeted research agendas, and underscore the increasing need for translational research that supports the appropriate transition of genomic discoveries to tests, and then to specific clinical applications that will improve health or add other value in clinical practice.

Table 6 Recommendations based on certainty of evidence, magnitude of net benefit, and contextual issues

Key factors considered in the development of a recommendation are the relative importance of the outcomes selected for review, the benefits (e.g., improved clinical outcome, reduction of risk) that result from the use of the test and subsequent actions or interventions (or if not available, maximum potential benefits), the harms (e.g., adverse clinical outcome, increase in risk or burden) that result from the use of the test and subsequent actions/interventions (or if not available, largest potential harms), and the efficacy and effectiveness of the test and follow-up compared with currently used interventions (or doing nothing). Simple decision models or outcomes tables may be used to assess the magnitudes of benefits and harms, and estimate the net effect. Consistent with the terminology used by the USPSTF, the magnitude of net benefit (benefit minus harm) may be classified as Substantial, Moderate, Small, or Zero.57
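A simple outcomes table of the kind mentioned above can be sketched as follows. This is a hypothetical model, not the EWG's method: the function name `outcomes_table`, its parameters, and every number in the example are assumptions chosen for illustration. Benefits and harms are expressed in the same unit (expected adverse events averted or caused) so that they can be subtracted, and no thresholds for Substantial, Moderate, Small, or Zero are implied.

```python
def outcomes_table(cohort_size, prevalence, sensitivity, specificity,
                   benefit_per_true_positive, harm_per_false_positive):
    """Expected outcomes when a cohort is tested (all inputs hypothetical).

    benefit_per_true_positive: expected adverse events averted per
        true-positive individual through the follow-up intervention.
    harm_per_false_positive: expected adverse events caused per
        false-positive individual through unnecessary follow-up.
    """
    affected = cohort_size * prevalence
    unaffected = cohort_size - affected

    true_positives = affected * sensitivity
    false_negatives = affected - true_positives
    false_positives = unaffected * (1.0 - specificity)

    benefits = true_positives * benefit_per_true_positive
    harms = false_positives * harm_per_false_positive

    return {
        "true_positives": true_positives,
        "false_negatives": false_negatives,
        "false_positives": false_positives,
        "events_averted": benefits,
        "events_caused": harms,
        "net_effect": benefits - harms,  # benefit minus harm
    }


if __name__ == "__main__":
    # Illustrative inputs only; none are drawn from an evidence report.
    table = outcomes_table(cohort_size=1000, prevalence=0.10,
                           sensitivity=0.90, specificity=0.95,
                           benefit_per_true_positive=0.50,
                           harm_per_false_positive=0.02)
    for outcome, value in table.items():
        print(f"{outcome:>16}: {value:8.2f}")
```

Where measured benefits or harms are unavailable, the same table can be rerun with the maximum potential benefit and the largest potential harm to bound the net effect.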

Considering contextual factors

Contextual issues include clinical factors (e.g., severity of disorder, therapeutic alternatives), availability of diagnostic alternatives, current availability and use of the test, economics (e.g., cost, cost-effectiveness, and opportunity costs), and other ethical and psychosocial considerations (e.g., insurability, family factors, acceptability, equity/fairness). Cost-effectiveness analysis is especially important when a recommendation for testing is made. Contextual issues that are not included in preparing EGAPP recommendation statements are values or preferences, budget constraints, and precedent. Societal perspectives on whether use of the test in the proposed clinical scenario is ethical are explored before commissioning an evidence review.

The ACCE analytic framework considers as part of clinical utility the assessment of a number of additional elements related to the integration of testing into routine practice (e.g., adequate facilities/resources to support testing and appropriate follow-up, plan for monitoring the test in practice, availability of validated educational materials for providers and consumers).13 The EWG considers that most of these elements constitute information that should not be included in the consideration of clinical utility, but may be considered as contextual factors in developing recommendation statements and in translating recommendations into clinical practice.

Recommendation language

Standard EGAPP language for recommendation statements uses the terms: Recommend For, Recommend Against, or Insufficient Evidence (Table 6). Because the types of emerging genomic tests addressed by EGAPP are more likely to have findings of Insufficient Evidence, one of three additional qualifiers may be added. Based on the existing evidence and consideration of contextual issues and modeling, Insufficient Evidence could be considered “Neutral” (not possible to predict with current evidence), “Discouraging” (discouraged until specific gaps in knowledge are filled or not likely to meet evidentiary standards even with further study), or “Encouraging” (likely to meet evidentiary standards with further studies or reasonable to use in limited situations based on existing evidence while additional evidence is gathered).

As a hypothetical example of how the various components of the review are brought together to reach a conclusion, consider the model of a pharmacogenetic test proposed for screening individuals who are entering treatment with a specific drug. The intended use is to identify individuals who are at risk for a serious adverse reaction to the drug. The analytic validity and clinical validity of the test are established and adequately high. However, the specific adverse outcomes of interest are often clinically diagnosed and treated as part of routine management, and clinical studies have not been conducted to show the incremental benefit of the test in improving patient outcomes. Because there is no evidence to support improvement in health outcome or other benefit of using the test (e.g., more effective, more acceptable to patients, or less costly), the EWG would consider the recommendation to be Insufficient Evidence (neutral). In a second scenario, a genetic test is proposed for testing patients with a specific disorder to provide information on prognosis and treatment. Clinical trials have provided good evidence for benefit to a subset of patients based on the test results, but more studies are needed to determine the validity and utility of testing more generally. The EWG is likely to consider the recommendation to be Insufficient Evidence (encouraging).

Products and review

Draft evidence reports are distributed by the EPC or other contractor for expert peer review. Objectives for peer review of draft evidence reports are to ensure accuracy, completeness, clarity, and organization of the document; assess modeling, if present, for parameters, assumptions, and clinical relevance; and identify scientific or contextual issues that need to be addressed or clarified in the final evidence report. In general, the selection of reviewers is based on expertise, with consideration given to potential conflicts of interest.

When a final evidence report is received by the EWG, a writing team begins development of the recommendation statement. Technical comments are solicited from test developers on the evidence report's accuracy and completeness, and are considered by the writing team. The recommendation statement is intended to summarize current knowledge on the validity and utility of an intended use of a genetic test (what we know and do not know), consider contextual issues related to implementation, provide guidance on appropriate use, list key gaps in knowledge, and suggest a research agenda. Following acceptance by the full EWG, the draft EGAPP recommendation statement is distributed for comment to peer reviewers selected from organizations expected to be impacted by the recommendation, the EGAPP Stakeholders Group, and other key target audiences (e.g., health care payers, consumer organizations). The objectives of this peer review process are to ensure the accuracy and completeness of the evidence summarized in the recommendation statement and the transparency of the linkage to the evidence report, improve the clarity and organization of information, solicit feedback from different perspectives, identify contextual issues that have not been addressed, and avoid unintended consequences. Final drafts of recommendation statements are approved by the EWG and submitted for publication in Genetics in Medicine. Once published, the journal provides open access to these documents, and the link is also posted on the www.egappreviews.org web site. Announcements of recommendation statements are distributed by email to a large number of stakeholders and the media. The newly established EGAPP Stakeholders Group will advise on and facilitate dissemination of evidence reports and recommendation statements.

Summary

This document describes methods developed by the EWG for establishing a systematic, evidence-based assessment process that is specifically focused on genetic tests and other applications of genomic technology. The methods aim for transparency, public accountability, and minimization of conflicts of interest, and provide a framework to guide all aspects of genetic test assessment, beginning with topic selection and concluding with recommendations and dissemination. Key objectives are to optimize existing evidence review methods to address the challenges presented by complex and rapidly emerging genomic applications, and to establish a clear linkage between the scientific evidence, the conclusions/recommendations, and the information that is subsequently disseminated.

In combining elements from other internationally recognized assessment schemes in its methods, the EWG seeks to maintain continuity in approach and nomenclature, avoid confusion in communication, and capture existing expertise and experience. The panel's methods differ from others in some respects, however, by calling for formal assessment of analytic validity (in addition to clinical validity and clinical utility) in its evidence reviews, and including (on a selective basis) nontraditional sources of information such as gray literature, unpublished data, and review articles that address relevant technical or contextual issues. The methods and process of the EWG remain a work in progress and will continue to evolve as knowledge is gained from each evidence review and recommendation statement.

Future challenges include modifying current methods to achieve more rapid, less expensive, and targeted evidence reviews for test applications with limited literature, without sacrificing the quality of the answers needed to inform practice decisions and research agendas. A more systematic horizon scanning process is being developed to identify high priority topics more effectively, in partnership with the EGAPP Stakeholders Group and other stakeholders. Additional partnerships will need to be created to develop evidentiary standards and build additional evidence review capacity nationally. Finally, the identification of specific gaps in knowledge in the evidence offers the opportunity to raise awareness among researchers, funding entities, and review panels, and thereby focus future translational research agendas.