Main

Clinical validity—the association between genotype and clinical phenotype—is now available for an increasing number of genomic applications. In contrast, clinical utility—the improvement in patient outcomes and the balance of risks and benefits—remains largely unknown for most genomic applications. Implementing tests with uncertain clinical utility can waste health-care resources through variable or unnecessary use of those tests. In the worst case, individuals are harmed when they or their health-care providers act on the test results and receive ineffective or potentially harmful treatments, or when the results cause anxiety or discrimination. Furthermore, clinical utility may be quite specific, as when it is limited to subgroups with certain genotypes.1 To maximize the clinical relevance of existing and as-yet-unknown genomic applications, it is crucial to ensure that clinically valid tests also have high clinical utility before they become widely used.

Clinical utility may be unclear for numerous reasons, including the relative lack of regulatory requirements for test manufacturers.2 Furthermore, the research community has not aggressively prioritized either the translation of new discoveries into practical use or the generation of evidence with respect to these applications.3 The field is also changing so quickly that evidence becomes rapidly outdated. In some cases, there may be little incentive for private-sector investment in molecular diagnostics because of a lack of value-based reimbursement. Finally, existing paradigms for generating and evaluating evidence may be too slow, too costly, too unwieldy, or too unrepresentative to provide useful evidence to decision makers in a timely manner.4,5,6,7

Comparative effectiveness research (CER) is intended to create evidence for decision making, and to find out “what works” in health care. Although many definitions of CER have been proposed,8,9,10,11,12 we use the Institute of Medicine’s definition:10 “CER is the generation and synthesis of evidence that compares the benefits and harms of alternative methods to prevent, diagnose, treat, and monitor a clinical condition or to improve the delivery of care. The purpose of CER is to assist consumers, clinicians, purchasers, and policy makers to make informed decisions that will improve health care at both the individual and population levels.” Some also use the term “patient-centered outcomes research” to refer to this type of research, although this concept will ultimately carry its own definition.

Concerns over the growing costs of health care13,14,15 have made the use of CER a practical necessity, one now enabled by $1.1 billion in funding from the American Recovery and Reinvestment Act and by the creation of the Patient-Centered Outcomes Research Institute (http://pcori.org) under the 2010 Patient Protection and Affordable Care Act. Other developments that make CER timely are the new genetic test registry (http://www.ncbi.nlm.nih.gov/gtr/) at the National Institutes of Health, congressional hearings held in July 2010 in response to concerns over direct-to-consumer genetic testing, and possible changes at the Food and Drug Administration to consider genetic tests as medical devices, which would require regulatory approval before marketing.

It is critical that all stakeholders (including consumers, insurers, policymakers, and clinicians) possess tools to assess the clinical utility of genomic applications. We describe CER approaches to answer questions about cancer genomic applications, and the potential challenges and opportunities associated with each. We provide case studies of genomic applications to illustrate the types of questions decision makers are facing, and describe potential CER study designs and methods that can be used to address them.

Methods

We searched PubMed for recent literature on CER and searched the citations of these articles to identify additional publications relevant to CER. We also considered additional articles that were not identified through this search but were known to the authors. We selected the following methodology categories for consideration: evidence synthesis, prospective comparative clinical trials, observational research, health economics and decision modeling, and stakeholder engagement. We developed descriptions of these approaches as applied to CER based on literature reviews and the authors’ experience. We then identified a series of case studies of breast cancer genomic applications to clarify CER questions and possible methods to address them. We selected breast cancer because of the public health relevance of the disease, and because of the plethora of genomic applications currently in clinical practice. We used the ACCE framework (analytic validity, clinical validity, clinical utility, and ethical, legal, and social implications)16 as a starting point to identify and organize the information we would abstract on the case studies. Finally, we identified particular challenges for using these CER approaches to conduct genome-based research.

Results

Our results are presented in three sections: (i) identification of the key questions for CER applications in cancer genomics, (ii) illustration of the key questions using examples from breast cancer genomic applications, and (iii) general methodological approaches to addressing the key questions.

Key questions

We framed our analysis using key questions in four areas, which are drawn from the ACCE framework16 and other models.17

  • Is there a significant association between the results of the genomic application and clinical phenotype? (clinical validity)

  • Does the genomic application provide correct information? (analytic validity)

  • Does the genomic application provide clinically significant information? (clinical utility)

  • Does the genomic application lead to improved patient outcomes as compared with the alternative? (comparison or added clinical value)

Illustration of key questions using cancer genomic applications as examples

Genomic applications can span the entire range of disease, from risk identification to diagnosis and patient management. Table 1 shows examples of both conventional and genomic applications in the context of breast cancer for each test category. We provide summary tables of example key questions for breast cancer genomic applications that address risk assessment (Table 2) and treatment decisions (Table 3).

Table 1 Test categories and relationship to breast cancer disease status
Table 2 Risk assessment genomic applications: summary of current evidence for breast cancer case studies
Table 3 Pharmacogenomic applications: summary of current evidence for breast cancer case studies

Clinical validity is the association between the predictor (e.g., genotype, profile, or family history status) and the clinical phenotype. Predictors are identified by investigating targeted pathways, by candidate-gene analysis, or through agnostic genome-wide study designs. Methodological problems arising from multiple testing, heterogeneity, the “winner’s curse” (the tendency for the first report of a significant association to show a larger effect size than later replication studies), small sample sizes, and other concerns make interpretation challenging.18,19,20 Further, the attributable risk may be small because of low frequency or low penetrance, or the measured variant may only be linked to, rather than be, the functional variant. For example, initial studies reported an association between CYP2D6 variants and the risk of disease recurrence in women taking tamoxifen (Table 3).21 A systematic evidence review, however, found the evidence to be inconsistent.22 Preliminary results from recent retrospective analyses of large randomized controlled trials (RCTs) including about 5,000 women23,24 found no association between CYP2D6 variants and breast cancer recurrence.

Analytic validity refers to characteristics of the test itself, including reproducibility (will the same test performed on the same sample produce the same result?), the lower limit of detection (the smallest quantity of the target that can be reliably detected), and analytic specificity (the ability to measure the target and only the target). A proficiency testing program (the exchange of quality-control material for analysis and comparison across laboratories) may be the best approach to address these concerns. For example, when HER2 testing (Table 3) was first used in breast cancer clinical trials, it was estimated that up to 20% of test results may have been incorrect, and laboratories with lower testing volumes were the most likely to report incorrect findings.25,26 A proficiency testing program has since been implemented for HER2.27

Clinical utility concerns whether the information provided by the genomic application is actionable, and involves evaluating the balance of risks and benefits of the available actions. BRCA1/2 testing (Table 2) is one example. Mutation carriers are at increased risk of developing breast and ovarian cancer and can receive more effective breast cancer screening through the choice of screening modality or interval, can undergo surgeries that reduce risk by 85–100%, or can select chemoprevention. Women at high risk in families with known mutations who undergo testing and are found not to carry deleterious BRCA1/2 mutations can receive significant psychosocial benefit and avoid these interventions. In contrast, the clinical utility of gene expression profiles is less clear.28 A key area of uncertainty is how women and their physicians will make treatment decisions based on test results in the intermediate-risk category. Two prospective RCTs—TAILORx and RxPONDER—are under way to evaluate how risk profile scores affect patient management, treatment decisions, and subsequent outcomes.29,30

Added clinical value17 asks whether the application provides better clinical, patient, or economic outcomes than the alternative, which could be another intervention or usual care. A critical issue is how to define and measure “better,” which could include measures of predictive accuracy, quality of life, survival, or other outcomes, including testing costs, acceptability, or feasibility. Recently, a genetic risk prediction model for breast cancer that includes 10 well-validated single-nucleotide polymorphisms was published (Table 2).31 The predictive power of this genetic model is only slightly better (by about 4%) than that of the widely used Gail model,32 which uses nongenetic factors to predict risk. Because both models have similar, modest discriminatory accuracy (approximately 60%), and because the Gail model can be used without the expense of genetic testing, the added clinical value of risk prediction based on single-nucleotide polymorphism profiles is low.
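To make the comparison of predictive accuracy concrete, the sketch below computes the area under the receiver operating characteristic curve (AUC) for two hypothetical risk scores on a simulated cohort. The data, effect sizes, and use of scikit-learn are illustrative assumptions and are not derived from the cited studies; the parameters are merely chosen to yield an AUC difference of roughly the magnitude described above.

```python
# Illustrative sketch: comparing the discriminatory accuracy (AUC) of two
# hypothetical breast cancer risk models on the same simulated cohort.
# Simulated data only; not derived from the Gail-model or SNP-model studies.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)
n = 10_000
cases = rng.binomial(1, 0.05, n)                       # simulated cancer events

# Risk scores: the "clinical + SNP" score is slightly more informative.
score_clinical = 0.8 * cases + rng.normal(0, 2.0, n)   # nongenetic factors only
score_genetic = 1.0 * cases + rng.normal(0, 2.0, n)    # nongenetic factors + SNPs

print(f"AUC, clinical factors only: {roc_auc_score(cases, score_clinical):.2f}")
print(f"AUC, clinical + SNP profile: {roc_auc_score(cases, score_genetic):.2f}")
```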

General methods for comparative effectiveness research

The key questions and methodological challenges described earlier, coupled with the need for CER to inform a diverse group of stakeholders, will require a range of innovative strategies, including both evidence synthesis and evidence generation (Table 4).

Table 4 Opportunities in comparative effectiveness research

Synthesis of existing evidence

Evidence synthesis begins with identifying topics through processes such as horizon scanning,33 which searches the published literature and gray-literature databases (e.g., meeting abstracts, commercial websites, newsletters, or business news) for emerging genomic applications. Horizon scanning may also examine curated databases of published literature such as the HuGE Navigator, the GAPP Knowledge Base, and the Pharmacogenomics Knowledge Base. Gray-literature sources are useful for identifying emerging genomic applications because of the lag in reporting on these topics in the peer-reviewed literature; they may be supplemented by a query process from users that serves as an early indicator of burgeoning clinical interest. Once new topics are identified, rapid topic briefs, or short reviews, are used to assess the feasibility of a full systematic review.

Topics for full systematic reviews are often identified through a public nomination process, and the reviews are then commissioned through an existing body such as the U.S. Preventive Services Task Force, the Evaluation of Genomic Applications in Practice and Prevention Working Group, or the Agency for Healthcare Research and Quality Effective Health Care Program. The scope of the review is defined by the analytic framework and key questions, and the reviewers conduct a broad but systematic search to identify evidence. They develop inclusion and validity criteria for the evidence and abstract the needed data, which are then synthesized and summarized in a narrative. Quantitative approaches such as meta-analysis may provide summary estimates of critical measures across studies. Although full systematic reviews are comprehensive, they may not be timely, which is a critical issue in summarizing evidence in genomics.
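As a hedged illustration of the quantitative synthesis mentioned above, the following minimal sketch pools study-level log relative risks with fixed-effect inverse-variance weighting; the study estimates and standard errors are hypothetical and not drawn from any review cited here.

```python
# Minimal sketch: fixed-effect inverse-variance meta-analysis of
# hypothetical study-level log relative risks (not real data).
import math

log_rr = [0.18, 0.05, 0.30, -0.02]   # per-study log relative risks
se     = [0.10, 0.08, 0.15, 0.12]    # their standard errors

weights = [1 / s**2 for s in se]     # inverse-variance weights
pooled = sum(w * y for w, y in zip(weights, log_rr)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))

lo, hi = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
print(f"Pooled RR = {math.exp(pooled):.2f} "
      f"(95% CI {math.exp(lo):.2f}-{math.exp(hi):.2f})")
```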

Generation of new evidence

Clinical trials. Explanatory RCTs are used to evaluate the efficacy of a medical intervention. They are often viewed as the ideal approach to protect against bias. However, this study design also has limitations.34,35 Explanatory RCTs are typically restricted to selected patients, but real-world populations can differ markedly in age, race, comorbid conditions, concomitant medication use, and environmental factors. The generally small sample size of RCTs may underrepresent some patient groups, a particular concern when evaluating genomic-based subgroups. Randomization requires a prospective design, and so RCTs tend to focus on questions of short-term efficacy and safety using intermediate (surrogate) end points. Finally, because RCT protocols are often far removed from routine practice, they may not accurately predict real-world effectiveness.

Innovative strategies in the design of clinical trials seek to overcome these limitations. Pragmatic clinical trials36,37 address the issue of relevance by assessing the effectiveness of the intervention in routine practice by using wide patient inclusion criteria, allowing variation in the treatment protocol, and assessing outcomes relevant to everyday life. However, these studies typically require much larger sample sizes to compensate for heterogeneity in the patient population and the treatment protocol, and longer time frames to assess patient-relevant outcomes.

To fund and implement studies with larger sample sizes, collaborations among researchers, health-care systems, and payers will be critical. One policy framework for conducting such collaborations is coverage with evidence development: a conditional reimbursement decision by a payer, with an explicit linkage between payment and data collection intended to reduce uncertainty about the intervention.38,39 The Centers for Medicare and Medicaid Services recently issued a coverage with evidence development policy for warfarin pharmacogenomic testing, under which the Centers will pay for testing if the patient is enrolled in an RCT designed to measure bleeding events.40

Cluster randomized trials are another alternative experimental design, in which units such as communities, medical clinics or hospitals, or families, rather than individuals, are randomized to the intervention arms. This design is often used when the intervention aims to change the behavior of a group or a provider, or to change the organization of services. It can also be used to reduce contamination (e.g., “spillover” effects of a mass educational campaign) or to improve the feasibility of a study. Cluster randomized trials require more sophisticated analytic approaches and larger sample sizes because individual observations within a cluster are not independent.41,42 However, the design may still be cost-efficient.43 Cluster randomized trials have been used to assess the impact of decision support tools implemented at the provider level, particularly those involving genetic risk assessment based on family history.44,45,46
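The sample-size penalty for clustering can be made concrete with the standard design-effect calculation, DE = 1 + (m − 1) × ICC, where m is the average cluster size and ICC is the intraclass correlation. The sketch below applies it to illustrative numbers; all values are assumptions, not figures from the cited trials.

```python
# Minimal sketch: sample-size inflation for a cluster randomized trial.
# The design effect DE = 1 + (m - 1) * ICC inflates the sample size
# needed under individual randomization (values are illustrative).
import math

n_individual = 400     # participants needed if individuals were randomized
cluster_size = 20      # average participants per clinic (m)
icc = 0.05             # intraclass correlation within clinics

design_effect = 1 + (cluster_size - 1) * icc
n_cluster_trial = math.ceil(n_individual * design_effect)
n_clusters = math.ceil(n_cluster_trial / cluster_size)

print(f"Design effect: {design_effect:.2f}")
print(f"Participants needed: {n_cluster_trial} across ~{n_clusters} clinics")
```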

Bayesian or adaptive trial designs can accelerate the pace of evidence generation by using the data accumulated from previously enrolled patients to alter the study midway, based on interim results. An adaptive design can incorporate genomic profiles by changing the process of randomizing patients to treatment arms as the trial progresses, based on the accumulated data for each profile.47 Despite their potential advantages, these trials have not gained widespread acceptance because of nonstandard methods and resistance among Food and Drug Administration regulators.

One example of an adaptive design is the I-SPY 2 project.48 This is a phase II RCT in the neoadjuvant setting for women with locally advanced breast cancer. Patients are randomized to treatment arms based on their biomarker profile. Initially, patients with a given biomarker profile have an equal chance of being randomized to each treatment arm. Over time, the randomization ratio (i.e., the vector of probabilities that a patient will be randomized to each treatment arm) for each biomarker profile is adjusted depending on the experience of previously randomized patients with that profile. Thus, future patients are more likely to be randomized to treatment arms in which patients with similar biomarker profiles achieved a better response.
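The updating of randomization ratios can be sketched with a simple Bayesian response-adaptive scheme: within one biomarker profile, each arm’s response rate is given a Beta posterior, and the next patient is assigned with probability proportional to each arm’s estimated chance of being best. This is an illustrative simplification under assumed response counts, not the actual I-SPY 2 algorithm.

```python
# Illustrative sketch of Bayesian response-adaptive randomization within one
# biomarker profile (Beta-Bernoulli model; NOT the actual I-SPY 2 algorithm).
import random

arms = {"arm_A": {"responses": 8, "n": 20},
        "arm_B": {"responses": 14, "n": 20},
        "arm_C": {"responses": 10, "n": 20}}

def randomization_probabilities(arms, draws=10_000):
    """Approximate P(arm is best) by sampling each arm's Beta posterior."""
    wins = {name: 0 for name in arms}
    for _ in range(draws):
        samples = {name: random.betavariate(1 + a["responses"],
                                            1 + a["n"] - a["responses"])
                   for name, a in arms.items()}
        wins[max(samples, key=samples.get)] += 1
    return {name: w / draws for name, w in wins.items()}

probs = randomization_probabilities(arms)
next_arm = random.choices(list(probs), weights=list(probs.values()))[0]
print(probs, "-> next patient assigned to", next_arm)
```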

Observational studies. Observational study designs are a valuable and complementary approach to RCTs.34,35,49 These designs are especially useful when it would be unethical or infeasible to conduct an RCT. For example, Habel and colleagues (2006)50 conducted a retrospective case–control study to evaluate the association between the Oncotype DX Recurrence Score and a long-term outcome, the risk of breast cancer death. Previous studies based on RCTs could not evaluate this outcome and instead used shorter-term outcomes, such as rates of distant recurrence, as the primary measures.51,52 The primary limitation of observational study designs is the possibility of confounding bias due to unexplained differences between exposure groups, which are not controlled through randomization. One option is to use risk-adjustment approaches such as propensity scores or instrumental variables. Unlike randomization, however, these approaches cannot control for unmeasured or imperfectly measured covariates, so residual confounding may still be present. Observational designs are less subject to bias when there is no relationship between treatment assignment and treatment response, and they can contribute important information about unanticipated, real-world impacts that complements RCTs.
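As an illustration of the propensity-score option mentioned above, the sketch below simulates confounded observational data, fits a propensity model with scikit-learn, and estimates the treatment effect with inverse-probability-of-treatment weighting; all variables and parameter values are hypothetical assumptions.

```python
# Minimal sketch of propensity-score adjustment via inverse probability of
# treatment weighting (IPTW) on simulated data; covariates are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5_000
age = rng.normal(60, 10, n)
comorbidity = rng.binomial(1, 0.3, n)

# Treatment assignment depends on measured covariates (confounding).
p_treat = 1 / (1 + np.exp(-(-3 + 0.04 * age + 0.8 * comorbidity)))
treated = rng.binomial(1, p_treat)

# Outcome depends on covariates and a true treatment effect of -0.5.
outcome = 2 + 0.03 * age + 0.6 * comorbidity - 0.5 * treated + rng.normal(0, 1, n)

# Fit the propensity-score model and form IPTW weights.
X = np.column_stack([age, comorbidity])
ps = LogisticRegression(max_iter=1000).fit(X, treated).predict_proba(X)[:, 1]
weights = np.where(treated == 1, 1 / ps, 1 / (1 - ps))

effect = (np.average(outcome[treated == 1], weights=weights[treated == 1])
          - np.average(outcome[treated == 0], weights=weights[treated == 0]))
print(f"IPTW-adjusted treatment effect: {effect:.2f} (true effect -0.5)")
```

Note that, as the surrounding text emphasizes, this adjustment only balances the covariates included in the propensity model; an unmeasured confounder would still bias the estimate.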

The use of large administrative health-care databases containing routinely collected data may offer significant advantages for an observational design. The large population size enables the study of infrequent events. Such databases are also representative of routine care, making it possible to study real-world effectiveness and utilization patterns. The data are available at relatively low cost and without the long delays associated with gathering data for a new, prospectively recruited study. Electronic data from integrated health-care systems with a defined population and electronic medical records (EMRs) allow broad consideration of the patient’s health status, and over time, EMRs and associated databases will make it feasible to consider long-term outcomes. Challenges with the use of EMRs for research include the following: (i) much of the data is in unstructured notes, requiring manual abstraction or natural language processing; (ii) there is a lack of harmonization across systems because of multiple or missing data standards; (iii) longitudinal data may be discontinuous for patients depending on the source or on access to health insurance; and (iv) data quality is variable because of the multitude of providers who enter data into the system. A specific limitation is the lack of clinically derived genomic information, or of the ability to access it easily.53 Although these challenges currently limit the research use of EMR data from the majority of health-care providers in the United States, there are examples of systems that are already able to use EMR data for research and that have biorepositories linked to EMRs to facilitate retrospective study designs.

Decision modeling and health economics. Evidence-based bodies have generally relied on RCTs to inform their guideline development when weighing relative benefits and harms. Decision modeling provides a framework to formally incorporate indirect and direct evidence from various sources, to evaluate likely outcomes, and to quantify uncertainty. The advantages of this approach are a structured, transparent framework for assessing the available evidence, and, critically, for quantifying the uncertainty of evidence and its potential impact on patient outcomes. Challenges include timeliness of implementation, development of models acceptable to stakeholders, problems with assumptions and model transparency, and the development of formal guidelines or recommendations based on modeling analyses. Recent work indicates that stakeholders such as clinicians, health-care payers, and guidelines groups are open to using such approaches in genomics if the process is transparent and there is not an overreliance on the model results to drive recommendations.54

Another CER approach is value-of-research analysis, also called value-of-information analysis, which is used to decide which technologies should be selected for additional research and how those studies should be designed optimally. The concept behind value of research is that additional research reduces our uncertainty about which intervention to use in clinical practice.55 Reducing uncertainty is valuable because it lowers the chance that a less optimal strategy is selected; for this reason, studies that yield “negative” results are still valuable. Impacts on patients’ morbidity and mortality are assessed, as well as health-care costs. These approaches are just beginning to be applied to research prioritization decisions in health care and must be shown to be feasible as well as useful before widespread implementation. The value-of-research paradigm may be particularly useful in genomics, where the pace of innovation creates a need to prioritize investment in expensive comparative studies.56
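A minimal sketch of the value-of-information idea follows, under assumed (illustrative) distributions for the incremental costs and quality-adjusted life-years of a genomic-guided strategy: the per-patient expected value of perfect information (EVPI) is the expected net-benefit gain from always choosing the truly best strategy rather than the strategy that is best on average under current uncertainty.

```python
# Minimal sketch: per-patient expected value of perfect information (EVPI)
# for choosing between usual care and a genomic-guided strategy.
# All parameter distributions are illustrative, not from real studies.
import numpy as np

rng = np.random.default_rng(1)
n_sims = 100_000
wtp = 100_000  # willingness to pay per quality-adjusted life-year ($)

# Uncertain incremental QALYs and costs of the genomic strategy vs usual care.
d_qaly = rng.normal(0.02, 0.03, n_sims)   # could plausibly be negative
d_cost = rng.normal(3_000, 1_000, n_sims)

# Net monetary benefit of each strategy, relative to usual care.
nmb = np.column_stack([np.zeros(n_sims),          # usual care (reference)
                       wtp * d_qaly - d_cost])    # genomic-guided strategy

evpi = np.mean(nmb.max(axis=1)) - nmb.mean(axis=0).max()
print(f"Per-patient EVPI: ${evpi:,.0f}")
```

A study costing less than the population-level EVPI (per-patient EVPI multiplied by the number of patients affected by the decision) is, under this framework, a candidate for prioritization.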

Cost-effectiveness analysis is the standard approach to formally assessing the incremental value of health-care technologies.57 These analyses can incorporate a variety of outcomes including clinical events, life expectancy, quality-adjusted life expectancy, and health-care costs. Applying cost-effectiveness analysis to genomics can be challenging. First, the general lack of comparative effectiveness data makes evaluation of comparative value problematic, and uncertainty must be carefully assessed. Second, the value patients and clinicians place on knowing genetic information (the “value of knowing”) is difficult to measure and to incorporate into policy decisions.58,59 Contingent valuation (willingness-to-pay) approaches have been used;60 more recently, discrete-choice experiments to assess patient preferences have offered significant promise.61
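For concreteness, the sketch below computes an incremental cost-effectiveness ratio (ICER) and incremental net monetary benefit for a hypothetical genomic test strategy compared with usual care; all costs, quality-adjusted life-year estimates, and the willingness-to-pay threshold are illustrative assumptions.

```python
# Minimal sketch: incremental cost-effectiveness ratio (ICER) and net
# monetary benefit (NMB) for a hypothetical genomic test strategy.
cost_test, qaly_test = 28_000.0, 10.25    # discounted cost ($) and QALYs
cost_usual, qaly_usual = 22_000.0, 10.15
wtp = 100_000                             # willingness to pay per QALY ($)

icer = (cost_test - cost_usual) / (qaly_test - qaly_usual)
nmb_gain = wtp * (qaly_test - qaly_usual) - (cost_test - cost_usual)

print(f"ICER: ${icer:,.0f} per QALY gained")                   # 60,000
print(f"Incremental NMB at ${wtp:,}/QALY: ${nmb_gain:,.0f}")   # 4,000
```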

Stakeholder engagement. Given CER’s explicit purpose of producing useful information for decision making, there has been increasing recognition of the importance of including stakeholders such as patients, clinicians, payers, and policymakers in CER activities. The Institute of Medicine recommended specifically that this work “should fully involve consumers, patients, and caregivers in … strategic planning, priority setting, research proposal development, peer review, and dissemination.”10 The rationale is that such involvement will lead to a focus on questions of most relevance to end users.62 Stakeholder involvement should increase the chances that study designs will reflect the specific questions of decision makers, and the greater relevance of the research questions will also facilitate use of results in decision making. Recent work by Deverka and colleagues is one example of an approach to involve stakeholders in assessing the current state of evidence.

Although the need for stakeholder engagement is widely recognized, the published literature on this topic is limited, and there are few formal evaluations of these methods.63 Some qualitative synthesis has identified several recurring themes, including the importance of developing trust and shared understanding through sustained interaction and devoting adequate time and resources to training and preparation.64 The need for valid methods for engaging patients, consumers, and clinicians has been identified as a critical CER methods research priority.65

Discussion

The complexity of developing sufficient evidence for the clinical utility of cancer genomic applications offers opportunities for innovative applications of CER-based approaches. Diagnostic tests such as BRCA1/2 genotyping or Oncotype DX generate information rather than direct therapeutic benefit, so study designs must take into account the subsequent therapeutic decisions through which the test affects clinical outcomes. Another challenge is to identify and address all important subgroups. In the adaptive clinical trial design of the I-SPY 2 project, the subgroups are identified ahead of time, but in other contexts it may be preferable to consider retrospective study designs when the subgroups are not known beforehand. The rapid pace of innovation in genomics means that studies must be extremely efficient and informed by stakeholder needs if the evidence is to remain timely and relevant. Potential solutions to these problems include adaptive clinical trials, retrospective studies using EMRs, and decision-modeling approaches to assess indirect evidence. The variable definitions of, and paucity of data on, clinical utility present another challenge. For example, the concept of personal utility, or the value of knowing the information, is clearly relevant for some decision makers and settings (e.g., direct-to-consumer marketing) but may not be relevant in a clinical context,60 and the metrics for measuring personal utility are not well established.58,66 However, stakeholder engagement and approaches to assessing patient preferences, such as conjoint analysis, may offer a way forward. In the following, we summarize the implications for CER in cancer clinical genomics.

We believe a more comprehensive approach is needed to resolve questions about the clinical utility of genomic applications. Specifically, research is needed that considers a broader set of outcome measures and that is conducted in settings relevant to more real-world clinical decisions than have been considered in the past. For example, Table 2 highlights some of the limitations of our knowledge about clinical utility for existing applications in the context of breast cancer. A multitude of stakeholders should have a role in evidence generation: health systems are needed to provide data and facilitate pragmatic trials, providers are needed to use genomic tests in the context of evidence generation, and test developers are needed to make tests available for collaborative study. Such an undertaking, however, will be resource intensive. A more comprehensive approach should therefore set clear priorities for CER, to ensure that limited resources are used to resolve the most compelling questions. It should also engage stakeholders to ensure that pressing topics are studied in real-world environments, establish approaches for rapid evidence synthesis, and quantitatively assess the value of prioritized research, considering both the health and well-being of patients and the decision-making needs of other stakeholders.

Second, it may be necessary to establish an evidentiary framework that clearly defines evidence standards, particularly for clinical utility. Existing frameworks in genomic medicine primarily build on the ACCE framework16 or the stages of translational medicine,67 and there is no regulatory requirement that applies to all genetic tests. A primary limitation of existing frameworks is that they provide no standard threshold for what constitutes “necessary and sufficient” evidence. What is urgently needed now is to establish appropriate evidentiary thresholds for the adoption of genomic tests; this will require dialogue and interaction between evidence appraisers and end users to develop consensus. Furthermore, these thresholds need to include appropriate study design criteria, recognize that an RCT is not desirable or feasible in every circumstance, and establish when (not if) observational study designs and evidence of underlying biological mechanisms contribute to the evidentiary framework.68 Beyond study designs, an evidentiary framework needs to cogently articulate the minimal evidence necessary before clinical application is warranted, taking into consideration the type of genomic application and its clinical context.

Third, strategies that are rapid, timely, and efficient are needed, given the fast pace of discovery in genomic-based approaches.69 Innovative methods capable of addressing whole-genome sequencing, such as evidence heuristics for classifying genes and variants, together with decision-modeling frameworks, will help address this need.70,71 New strategies will involve transforming the research infrastructure into “learning systems” that allow continual additions to the evidence base. This approach will achieve greater efficiency through efforts such as establishing biorepositories or registries, linking EMR data or administrative databases to genomic information, creating quality-assured clinical data repositories, and improving standardized coding schemes for genomic applications.

Finally, any reforms of the evidentiary framework should uphold rigorous standards for the statistical validity of the research.72 Although some study designs carry a risk of greater uncertainty, we can make strategic choices about when such increased uncertainty is acceptable. We should improve the integrity and conduct of all study designs by using guidelines such as the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement, the Consolidated Standards of Reporting Trials (CONSORT) statement, Strengthening the Reporting of Genetic Association Studies (STREGA), and Genetic Risk Prediction Studies (GRIPS). We can also describe how threats to validity are assessed when grading evidence, or require preregistration of the analysis plan for observational studies, as is currently done for RCTs, to reduce biases (including selective outcome reporting) and errors, such as those generated by multiple testing.

Conclusion

Informed decision making in cancer clinical genomics, supported by the development and application of comparative effectiveness research, could accelerate the implementation of valuable genomic applications while preventing harmful or low-value applications from persisting in clinical care and causing waste or patient harm.

Disclosure

D.L.V. reports serving as a consultant for Medco, Novartis Molecular Diagnostics, and Genentech, and is supported by the following genomics-related research grants: P50HG003374, RC2CA148570, and UO1GM092676 from the National Institutes of Health and U18GD000005 from the Centers for Disease Control and Prevention.