Introduction

Despite enthusiasm that advances in genomics will translate into broad clinical applications—for example, to make early disease diagnosis, to improve risk prediction, and to target therapies—the reality is that few genomic tests (i.e., assays that evaluate variation in single or multiple genes or measure gene expression and products) come to market with an adequate evidence base to support their clinical use. The evidence gaps are particularly apparent in oncology, where great strides are being made in the molecular classification of cancer subtypes,1 but concerns have been raised that genomic tests are being adopted without adequate regulatory oversight or evidence of net benefit in populations or health-care delivery systems. Although not confined to genomic tests, the problem of insufficient evidence for clinical and policy decision making is largely predictable given the much larger research investment in early-phase translational research studies relative to later-phase studies designed to inform evidence-based guidelines and real-world practice.2,3 In addition, the evidence gap is likely to widen given that an analysis of the National Cancer Institute (NCI) cancer genomics research portfolio shows that less than 2% of the funded research in this area is translational beyond “bench to bedside”.3 Furthermore, stakeholders such as clinicians, payers, regulators, and researchers have different perspectives regarding how evidence for genomic tests should be generated and what evidence is needed before they are introduced into clinical practice. 
Initiatives sponsored by the Agency for Healthcare Research and Quality, the Institute of Medicine, and the Secretary’s Advisory Committee on Genetics, Health, and Society have produced reports describing the complex reasons for the existing evidence gaps and recommendations for overcoming these barriers.4,5,6 In addition, the failure to convene multistakeholder, multidisciplinary guideline development groups can undermine the quality of information available for patient, provider, and policy decision making.7

Recognizing this problem, the NCI funded seven centers to conduct comparative effectiveness research in genomics and personalized medicine8 as part of the American Recovery and Reinvestment Act–sponsored Grand Opportunity challenge grants.9 The overarching purpose of those projects is to generate and synthesize evidence that will assist patients, clinicians, payers, and policy-makers to make informed decisions that will improve health at both the individual and population levels. The NCI recently supported a workshop bringing together a diverse group of stakeholders to facilitate discussion about how evidence is interpreted and what level of evidence is needed before a genomic test is adopted in clinical practice. This report summarizes the results from this workshop and offers an overview of common themes that emerged regarding evidence needs for cancer genomic tests.

Materials and Methods

A planning group consisting of representatives of the Grand Opportunity grantees and NCI program staff assumed responsibility for developing the meeting agenda, selecting the case studies, and developing the process for stakeholder engagement. Potential stakeholder representatives were chosen to cover the perspectives of patient advocate/consumer, payer, health-care provider, industry, policy-maker/regulator, or researcher with known expertise in genomics and personalized medicine. An attempt was made to have balanced representation across diverse stakeholder groups.

To facilitate discussion, the meeting focused on evidence reviews for three case studies: (i) Oncotype DX testing to guide management of women with breast cancer that has spread to lymph nodes, (ii) testing of colorectal cancer patients and family members for mutations associated with Lynch syndrome (LS), and (iii) epidermal growth factor receptor (EGFR) mutation testing to guide treatment decisions for non–small cell lung cancer (NSCLC).

Case selection and development

The planning group selected cases with varying levels of evidence and varying degrees of use, expecting them to reveal a diversity of stakeholder perspectives on how evidence is used in real-world decision-making. All three cases involved genomic tests for cancer that had been introduced for clinical use within the past 10 years but that have different applications: one prognostic, one screening, and one predictive. In addition, the planning group members commissioned experts to develop summaries of each test (Supplementary Appendixes A–C online) modeled after entries in PLoS Currents: Evidence on Genomic Tests.10,11,12,13

Survey development

The planning group also developed a short survey (Supplementary Appendix D online) to assess stakeholders’ level of confidence in the current evidence base for each of the three cases primarily in terms of whether use of the test improves health outcomes. In addition, the survey addressed whether stakeholders felt that the test and its proposed application should be used in clinical practice or if use should be restricted in some manner because of a lack of evidence. Finally, stakeholders were asked whether the costs of the tests should be reimbursed by health insurers. The surveys were administered via a Web-based password-protected database.14 Respondents were also given the option to provide free-text comments to explain their responses. Survey administration and analysis of results were conducted by the Center for Medical Technology Policy in Baltimore, Maryland.

Stakeholder engagement

Stakeholders received an electronic copy of the case summaries along with a link to the Web-based survey ~3 wk prior to the meeting. A full-day meeting was held in January 2011 that was led by a professional facilitator to promote open dialogue and encourage full participation. Each case presentation was accompanied by a review of the premeeting survey results by a member of the planning group followed by a facilitator-led discussion to uncover key factors explaining the variation in stakeholder responses and their expectations regarding evidence thresholds. Although there have been recent workshops in this area,4 the workshop attendees were asked to respond as if they were members of an informal decision-making body that was examining currently available genomic tests and making clinical or coverage recommendations. After each discussion, stakeholders answered the same survey questions using an audience-response system.15 An additional question was added for the meeting asking stakeholders for their recommendation for how each of the tests should be used in clinical practice at the current time. The response options were based on choices found in a previously published risk–benefit policy matrix that includes explicit considerations of uncertainty.16

The purpose of repeating the survey was to see how the meeting discussion about the cases and stakeholder interaction might alter participants’ positions. During the second half of the meeting, the facilitator led an in-depth discussion of the evidentiary framework for each case, with the goal of making recommendations to the NCI for future research that would better meet the information needs of decision-makers for these cases as well as translating other genomic tests into clinical practice.

Results

Twenty-two stakeholders participated in the meeting. The list of stakeholders and their affiliations is shown in the Acknowledgments section. The groups represented include patient advocates/consumers, payers, industry, policy-makers/regulators, health-care providers, and researchers. Given the large amount of data, complete survey results are provided for Oncotype DX testing only; we highlight a subset of the results for the other two case examples.

Oncotype DX testing

This test uses a proprietary algorithm to analyze the expression of 21 genes within a tumor to determine a recurrence score (a number from 0 to 100) that is used to estimate the probability of breast cancer relapse at 10 y. Scores are grouped into three risk categories: low (0–17), intermediate (18–30), or high (31 or above). Studies have demonstrated that patients in the high-risk category show benefit from adjuvant chemotherapy, whereas those in the low-risk category do not.17
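The grouping of recurrence scores into risk categories can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and the cutoffs (low below 18, intermediate 18 to 30, high 31 or above) are stated here as an assumption about the commonly cited thresholds. The proprietary algorithm that produces the score itself is not reproduced.

```python
def recurrence_risk_category(score: int) -> str:
    """Map a recurrence score (0-100) to a risk category.

    Assumed cutoffs, for illustration only:
    low < 18, intermediate 18-30, high >= 31.
    """
    if not 0 <= score <= 100:
        raise ValueError("recurrence score must be between 0 and 100")
    if score < 18:
        return "low"
    if score <= 30:
        return "intermediate"
    return "high"


if __name__ == "__main__":
    # Print the category on either side of each assumed cutoff.
    for example_score in (17, 18, 30, 31):
        print(example_score, recurrence_risk_category(example_score))
```

Writing the cutoffs as non-overlapping comparisons makes the boundary cases explicit: a score of 30 falls in the intermediate group and a score of 31 in the high-risk group.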

Oncotype DX has been recommended for clinical use in lymph node-negative breast cancer patients by most, although not all, professional and technology assessment groups.18,19,20 The vast majority of payers (95%) reimburse for the test in those patients. The evidence is much less established for patients with cancer that involves lymph nodes, although the Centers for Medicare and Medicaid Services and some private insurers pay for the test for these patients.

Of the 22 respondents to the initial survey, 16 expressed low to intermediate confidence that there is sufficient evidence to determine whether using Oncotype DX for node-positive breast cancer would be useful for guiding decisions about adjuvant chemotherapy. Some stakeholders expressed concerns that, without evidence from prospective studies, patients would perceive recommendations to forgo chemotherapy as too risky. Others doubted that physicians would be willing to recommend forgoing chemotherapy for lymph node–positive patients, even for those with low recurrence scores. Some stakeholders commented that prospective studies might have difficulty accruing patients and might even be viewed as unethical by those who felt the evidence base was relatively strong.

Online comments indicated that most stakeholders felt more research was needed to assess the long-term outcomes of patients managed with the test. Some suggested that there was some benefit to providing prognostic information even if it did not influence clinical decision making.

With respect to how the test should be used in clinical practice, 11 stakeholders responded that Oncotype DX testing should be considered in patients’ and physicians’ decision making, whereas 7 responded that the test should not be used for patients with node-positive breast cancer. One commented that Oncotype DX could provide additional information in the risk–benefit analysis for women with comorbidities for whom chemotherapy might pose a higher risk. Another commented that the test should not be used in clinical practice without further clinical trials to assess clinical utility in patients with node-positive breast cancer.

When asked to rate how the test should be reimbursed by insurers, five responded that the test should not be covered, eight responded that it should be covered with restrictions, and four responded that it should be covered without restrictions. Thus, the lack of evidence was still a key rationale for stakeholder ratings, and prospective studies were suggested; some respondents wanted to allow patients and providers the opportunity to evaluate the test in the context of individual treatment decisions.

During the discussion, several issues emerged. These included a desire to have a better understanding of the underlying pathophysiology of breast cancer subtypes, the complexities of implementing test results in real-world settings given the probabilistic nature of test results, the growing importance of Food and Drug Administration approval of tests from payer and guideline committee perspectives, the lack of comparative effectiveness research data for traditional biomarkers as well as Oncotype DX, and concerns about the harms to patients of misclassification based on poorly validated test results. This translated into lower levels of confidence that there was sufficient evidence to determine whether using the test will guide adjuvant chemotherapy decisions and improve health outcomes for patients when stakeholders were resurveyed during the meeting. Although postdiscussion opinion shifted slightly in favor of considering the test for clinical use, more stakeholders responded that the test should not be covered or should be covered with restrictions (Figure 1a).

Figure 1

Stakeholder answers (during the meeting) to “At this time, what recommendation would you make regarding” (a) Oncotype DX; (b) Lynch syndrome; and (c) epidermal growth factor receptor.

LS screening

With LS screening and cascade testing, it is proposed that a newly diagnosed patient with colorectal cancer undergo a screening test, which could involve microsatellite instability testing or immunohistochemistry of the tumor sample. In addition, MLH1 methylation and/or BRAF testing are proposed as intermediate rule-out tests, or a combination of screening tests is proposed, to improve the efficiency of the screening cascade. A positive screening test is followed by diagnostic testing in which the four major MMR (mismatch repair) genes are sequenced to identify the exact mutation segregating in the family. Relatives are then tested for the specific mutation in the family, and carriers are offered increased surveillance for colorectal as well as other cancers (e.g., endometrial, ovarian, gastric cancers). Although several organizations (including the National Comprehensive Cancer Network, the American College of Gastroenterology, and the American Cancer Society) have provided guidance regarding genetic testing for LS, they each recommend screening and testing in different subsets of patients. The Evaluation of Genomic Applications in Practice and Prevention Working Group is the only group that recommends screening for LS in all newly diagnosed colorectal cancer cases.

Some concerns were raised in the LS case discussion about the quality and completeness of the underlying evidence and the lack of standardization of the tests across laboratories. In addition, one stakeholder noted that the uptake of the test among family members of a known mutation carrier is not known, a factor that can have a substantial impact on the effectiveness and efficiency of an LS screening program. However, as compared to the previous case, stakeholders agreed that most of the issues surrounding use of LS cascade testing were more practical than scientific. There was concern about insurance coverage for the test and stakeholder responses reflected a more liberal perspective toward coverage policy that paralleled their rankings of greater clinical use. Given the complicated nature of this case (testing both probands and family members), ~20% of the stakeholders did not answer all of the premeeting survey questions. However, following the meeting discussion, there were no missing data and a shift toward increasing confidence in the evidence supporting clinical utility was observed. The perceived significance of ethical, legal, or social issues around LS cascade testing (i.e., insurance coverage and who would disclose such information) in both the proband and the relatives increased after the discussion. Informed consent, family dynamics, stigma, and the clear communication of information were noted as particular concerns.

There was strong support for insurance coverage for LS testing in the probands, with the voting approximately evenly divided between coverage with and without restrictions. With respect to coverage of LS testing for the relatives, there was similarly strong support for coverage; however, the majority of stakeholders (n = 17) favored coverage of LS testing with restrictions (e.g., eligibility or cost-sharing).

EGFR mutation testing

EGFR mutation testing is a predictive test for guiding use of erlotinib in patients with a poor prognosis and advanced NSCLC. Approximately 10–20% of patients with advanced NSCLC have a somatic mutation in EGFR and therefore in theory would realize greater benefit from erlotinib. Recommendations vary among guidelines organizations regarding testing indications.

The American Society of Clinical Oncology recently recommended that patients with NSCLC who are being considered for first-line therapy with an EGFR tyrosine kinase inhibitor should have their tumors tested for EGFR mutations.21 Other organizations have recommended that erlotinib be considered for first-line therapy if the patient is known to carry EGFR mutations, but they have not recommended routine mutation analysis.22,23

The stakeholders expressed uncertainty with respect to the problem of selecting subpopulations for EGFR mutation testing. Canadian medical oncologists and the National Comprehensive Cancer Network have recommended limiting this testing to patients with advanced NSCLC and nonsquamous histology, whereas the Expert Panel of the Italian Association of Thoracic Oncology recommended this analysis for populations with the highest prevalence of EGFR mutations.

The panel also discussed potential harms associated with testing all NSCLC patients for EGFR mutations. A patient advocate questioned whether patients would be denied erlotinib or coverage for it if their test results were negative for EGFR mutations. Other stakeholders clarified that recommendations regarding mutation analysis do not advocate depriving patients so much as directing patients to more appropriate treatments. The number of false-positives and the time needed for test results were cited as potential harms, as physicians would have to wait for results to make a treatment decision, and the number of patients that would have to be tested to identify just one carrier could constitute a burden on the provider.
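The testing burden mentioned above can be made concrete with simple arithmetic. The sketch below is illustrative only: the function name and the `sensitivity` parameter are hypothetical, and it assumes that the expected number of patients tested per carrier identified is the reciprocal of prevalence multiplied by test sensitivity. With the 10–20% mutation prevalence cited for advanced NSCLC and an idealized, perfectly sensitive test, roughly 5 to 10 patients would need to be tested to find one carrier.

```python
import math


def number_needed_to_test(prevalence: float, sensitivity: float = 1.0) -> int:
    """Expected number of patients tested to identify one mutation carrier.

    Assumption for illustration: each tested patient is detected as a
    carrier with probability prevalence * sensitivity, so the expected
    count is the reciprocal, rounded up to a whole patient.
    """
    if not 0 < prevalence <= 1:
        raise ValueError("prevalence must be in (0, 1]")
    if not 0 < sensitivity <= 1:
        raise ValueError("sensitivity must be in (0, 1]")
    return math.ceil(1 / (prevalence * sensitivity))


if __name__ == "__main__":
    # Prevalence range cited in the text for advanced NSCLC.
    for prevalence in (0.10, 0.20):
        print(prevalence, number_needed_to_test(prevalence))
```

A test with lower sensitivity would raise these numbers, which is one reason the analytic performance of the assay matters for the burden on providers.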

Several other issues emerged during the facilitated discussion period. For example, potentially clinically relevant variability in the analytic and clinical validity of EGFR mutation analysis was not clear to all stakeholders on the panel prior to the meeting. The relatively low sensitivity of EGFR mutation analysis was an issue that was debated. Finally, some stakeholders noted that the reliability of testing depends on the amount of tissue available, and in many cases, patients must undergo a second biopsy and further testing before a treatment decision is made.

Stakeholders focused on EGFR mutation analysis in a first-line setting for patients with advanced NSCLC. Their survey responses reveal that, as with LS testing, stakeholders overall tended to have more confidence in the evidence base for EGFR testing prior to the meeting. During the course of the discussion, stakeholders expressed less confidence in the adequacy of evidence. The postdiscussion vote did not substantively change the “cover with restrictions” recommendation for clinical use, but it did reduce the variance around this response.

Recommendations for clinical use and research based on an evidentiary framework

After the results of each case study had been presented, stakeholders were shown a published framework11 that incorporated risk–benefit profiles associated with both the test and the treatment implied by test results in determining whether and how genomic tests should be used in clinical practice. The most frequent recommendation for Oncotype DX was “Use with Evidence Development,” followed by “Do Not Use, Conduct Additional Research.” By contrast, the majority of stakeholders chose the categories of “Appropriate for Clinical Use” or “Consider Use in Clinical Practice” for LS and EGFR testing, reflecting the higher level of confidence expressed by the stakeholders (Figure 1a–c). However, none of the tests unanimously met the “Appropriate for Clinical Use” standard, suggesting that additional evidence was needed.

Stakeholders were then asked to describe the types of evidence and studies they would need to make a more informed decision regarding each genomic test. The panel agreed that decision-makers and researchers must first identify relevant clinical and policy questions and then prioritize them before deciding which study approach to pursue. For example, specific research questions suggested by stakeholders included obtaining additional insights regarding the point at which providers consider information actionable, further examination of the effects of testing on patient survival and quality of life, and translation of data from efficacy to effectiveness. The stakeholders also stressed the importance of expanding the methods beyond randomized controlled trials for filling the evidence gaps. Some emerging themes that were suggested included greater use of observational data, conducting studies in community settings, and focusing on outcomes of most relevance to patients (Table 1). Although some discussion of study design options occurred, the goal of the workshop was not to thoroughly evaluate the various possible study designs (see companion paper by Goddard et al.,24 which discusses the pros and cons of those approaches). Finally, the stakeholders noted that although all three tests are commercially available, none are approved by the Food and Drug Administration. They were concerned about possible variability in test accuracy and noted that Food and Drug Administration approval of the tests is important from both payer and guideline committee perspectives.

Table 1 Suggestions for future research

Discussion

Stakeholder engagement is an essential process precisely because different stakeholders come to the table with different perspectives about evidentiary thresholds.25 This workshop used a mixed-methods approach to obtain insights into how stakeholders in genomic testing evaluated the current evidence base for three clinically available tests and then translated their assessments into hypothetical clinical and coverage recommendations. Three different case examples were chosen to illustrate a broad range of relevant issues, including cancer prevention in unaffected individuals and issues related to prognostic and predictive testing in patients with a known diagnosis of cancer. Similar to evidentiary bodies that routinely rely on indirect evidence to determine improvements in health outcomes, the workshop participants seemed willing to make inferences in the absence of direct clinical utility data, recognizing that it will take many years to obtain this type of information, particularly for rare disorders. Also not surprisingly, stakeholder responses to the published evidence base often reflected the diverse positions of clinical guideline committees and technology assessment panels.

A recent Institute of Medicine report, “Clinical Practice Guidelines We Can Trust,” noted that numerous factors undermine the quality of guidelines, including the failure to convene multistakeholder, multidisciplinary guideline development groups, as well as the current limitations in the scientific evidentiary base.26 This workshop was designed to determine the degree of variability of knowledge of and attitudes toward evidence supporting cancer genomic tests among stakeholders from diverse interest groups. Somewhat surprisingly, workshop participants strongly recommended a broader set of methods for evidence generation than traditional randomized controlled trials, with the goal of generating evidence in a manner that balances relevance, timeliness, and feasibility. Researchers noted that with observational or pragmatic trials, diagnostic testing can be done in community-based health-care settings rather than in highly specialized cancer or academic centers to determine how data translate to real-world settings. There was also a general impression among stakeholders that the recent federal focus on comparative effectiveness research and the establishment of the Patient-Centered Outcomes Research Institute27 would help to create a favorable environment for the conduct of additional research to fill evidence gaps.

The stakeholders generally supported allowing tests to be introduced into clinical practice while collecting data on test use and impact on patient outcomes. With respect to breast cancer and Oncotype DX testing, the observational data being collected by the National Comprehensive Cancer Network and the inclusion of risk score data in the Surveillance, Epidemiology, and End Results Program data are two examples of such an effort. Another strategy for postregulatory evidence generation that was discussed was the concept of coverage with evidence development, a policy tool that allows payers to offer conditional coverage for promising new technologies while additional data are being generated to understand the technology’s relative benefits and safety. This type of information is critical for assessing a new genomic test’s added value as compared with standard diagnostic tests.

We note several limitations of the data collected at this workshop. Although we attempted to provide concise summaries of the current evidence base for each test, there was still a large amount of information for stakeholders to assimilate, and this may explain why a substantial proportion (~20%) of stakeholders did not complete all of the premeeting survey questions. We also did not pilot-test the survey, so some additional variation in the premeeting responses may be attributable to ambiguity in the wording of the questions or the absence of a standard definition of net benefit. Moreover, the survey did not assess the stakeholders’ level of knowledge prior to the meeting; therefore, we cannot determine the degree to which the facilitated discussion may have variably influenced their follow-up responses. What is clear is that discussion with test experts led many stakeholders to alter their perspective as to whether a test was ready for clinical use or reimbursement, most likely based on a deeper understanding of the complexities of the evidence gaps and clinical integration challenges. Due to the limited size and composition of the stakeholder group, it was not possible to quantify the changes in perspective postdiscussion or to evaluate how the results may have been different if other stakeholders such as members of the public had been included at the workshop. We attempted to balance diverse perspectives (those of patients, payers, clinicians, policy-makers, and researchers) with sufficient technical familiarity to enable full participation in the survey and meeting discussion. Although the results would likely have been different if members of the general public had been included, we would have needed to spend more time preparing them to participate in the survey and workshop. Another limitation was the time allotted during the meeting to discuss each case. Some stakeholders expressed concern about being asked to respond to the audience-response system questions based on a large amount of information presented in a short period of time, particularly when asked to extend their interpretation of the data to the policy framework question.

In summary, although there is a willingness by stakeholders to accept indirect evidence of clinical utility—such as using a genomic test in clinical practice while additional evidence is being collected or even simply allowing physicians and patients to determine when to use the test on a case-by-case basis—stakeholders rely on evidence reviews and clinical guideline committee recommendations in their assessment of the appropriateness of genomic tests for clinical integration. However, there is a nuanced interpretation of the adequacy of the evidence base among stakeholders. In addition, the data quality from the current evidence varies greatly across various sources, which is not unique to genomic tests. Clearly, there is a need to develop a better understanding of the information needs of postregulatory decision-makers such as clinicians, patients, and payers so that more useful studies are designed over time. Practically speaking, there will continue to be a need to synthesize the evidence associated with genomic tests and have these evidence reviews available in a centralized location.

Disclosure

The authors declare no conflict of interest.