Main

Breast cancer is the leading incident cancer among women of all major ethnicities in the United States and is the second highest source of cancer mortality.1 Adjuvant chemotherapy has been shown to increase recurrence-free and overall survival,2 but also may produce significant toxicity in the patient. Chemotherapy may cause short-term complications such as alopecia, nausea/vomiting, and myelosuppression and may lead to longer term complications such as permanent ovarian failure in premenopausal patients.3,4 Current NIH clinical guidelines5 recommend adjuvant chemotherapy for women with tumors larger than 1 cm or lymph node involvement. Additionally, tumor markers such as HER2 and histologic grade are used for risk assessment.6,7 Despite widespread use, these criteria are imprecise predictors of distant recurrence.8

Gene expression profiling (GEP) utilizing DNA microarrays9 or RT-PCR10 has been proposed as an alternative approach to identify patients for adjuvant chemotherapy,11 potentially sparing low-risk patients from this treatment. There are predominantly two gene expression profiles currently marketed for clinical use in breast cancer. One of these profiles, MammaPrint, was developed by van't Veer and colleagues at the Netherlands Cancer Institute and is marketed in Europe (and soon will be marketed in the United States) by Agendia.12 The other test, Oncotype DX, is marketed by Genomic Health, Inc. in the United States. The assay marketed by Agendia utilizes a 70-gene microarray-based profile performed on fresh frozen tissue and is intended for patients younger than 55 years with Stage I invasive breast cancer or Stage II node negative invasive breast cancer. In contrast, the test marketed by Genomic Health uses a 21-gene profile utilizing RT-PCR for expression analysis on paraffin-embedded tissue and is intended for patients with node-negative, estrogen receptor–positive (ER+) disease who are taking tamoxifen. Both companies are conducting clinical trials in the United States and Europe to validate their respective expression profiles. Also, there are gene expression profiles in development, such as the one by Ipsogen, a biotechnology company based in France.13 In considering the adoption of GEP in clinical practice, a quantitative evaluation of the clinical and economic outcomes and impact on patient quality of life can clarify the potential tradeoffs compared to current practice. Despite the potential importance of this type of analysis to clinical decision-making about GEP, there have been few such evaluations.14

The objective of this study was to estimate the cost-effectiveness of the Netherlands Cancer Institute GEP assay versus NIH guidelines for the identification of early stage breast cancer patients who would benefit from adjuvant chemotherapy, and to assess the implications of our findings for practice guidelines. The profile we used for our analysis has shown promise for accurate prediction of high-risk women (compared to NIH guidelines) in early studies.9,15 We directly evaluated test characteristics in our analysis to assess the accuracy and prognostic value of identifying high-risk women using GEP.

MATERIALS AND METHODS

We utilized decision analytic techniques to compare GEP and NIH clinical guidelines as prognostic tests to evaluate the risk of distant recurrence for women with early stage breast cancer. Our analysis considered a hypothetical cohort of premenopausal women averaging 44 years of age newly diagnosed with Stage I/II breast cancer. The demographic and clinical characteristics (e.g., age and nodal status) of this target population were chosen to be similar to those of the Netherlands Cancer Institute cohort, upon which our estimates of test performance (sensitivity, specificity, and positive predictive value) are based, and for which we derived empirical estimates for some model parameters.9 Briefly, the Netherlands Cancer Institute cohort comprised a consecutive series of 295 breast cancer patients 52 years or younger who had mastectomy or breast conserving surgery and radiotherapy (if indicated). Fifty-one percent of the cohort had lymph node–positive disease, 77% had ER+ disease, 47% had tumors larger than 2 cm, and 40% had high grade tumors.

The initial component of our model consisted of a decision tree (Fig. 1) that modeled the prognostic categorization and treatment of women with early stage breast cancer during the 6-month period following breast cancer diagnosis. At the time of diagnosis, the use of either prognostic test led to identification of women at high risk (poor prognosis) or at low risk (good prognosis). The results of GEP are a continuous measure, which was dichotomized as “good prognosis” or “poor prognosis” based on a designated test cutoff from the first validation study of GEP15 that yielded a 10% false-positive rate. The gold standard for the estimation of test performance was the actual distant recurrence events in the Netherlands Cancer Institute cohort. The outcomes of correctly classified cases and misclassified cases (i.e., those women with a poor prognosis tumor who were classified as good prognosis and vice versa) were taken into account in the model. For example, misclassified cases in the good prognosis groups are reflected by a recurrence risk greater than zero in those groups. Test sensitivity and specificity were not explicitly included as parameters in the model because test performance was based on empirical analyses of recurrence risk for the good and poor prognosis groups for GEP and the NIH guidelines.

Fig. 1

Diagram depicts decision tree portion of decision model for the time period from diagnosis to 6 months subsequent.

We assumed all women identified as “poor prognosis” received chemotherapy, all women identified as “good prognosis” did not receive chemotherapy, and all women lived at least 6 months after diagnosis. The long-term outcomes of these patients were then projected over their remaining lifetime using a Markov model (Fig. 2). A Markov model consists of mutually exclusive clinical outcome states (depicted as ovals), among which a patient can move each time period, in this case defined as 1 year.16 The possible clinical outcome states included in our model were: “no evidence of disease,” “distant recurrence,” and “death” (Fig. 2). Whether or not they had received chemotherapy as part of their treatment following diagnosis per GEP or NIH guidelines, all women began the time period 6 months after diagnosis responding to treatment (“no evidence of disease”). From “no evidence of disease,” a woman could continue to respond to treatment, experience distant recurrence, or die of a non–breast cancer–related cause. Distant recurrence was considered to be progressive disease; that is, once a woman experienced distant recurrence, she could no longer transition to the “no evidence of disease” state. Additionally, we assumed that once a woman experienced distant recurrence, her costs and mortality risk were the same as those of all other women with distant recurrence, regardless of whether she originally received adjuvant chemotherapy for the primary tumor.

Fig. 2

Diagram depicts Markov model portion of decision model for the time period from 6 months after diagnosis until death, which shows all possible clinical outcomes and transitions between them.
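The state structure of the Markov model can be illustrated with a minimal cohort simulation. This is a sketch under stated assumptions, not the model used in the analysis: the function name and all three transition probabilities are hypothetical placeholders held constant over time, whereas the analysis drew its probabilities from Kaplan-Meier estimates and US life tables.

```python
def run_markov_cohort(p_recur, p_die_other, p_die_after_recur, years):
    """Trace a cohort through three states using one-year cycles.

    States: no evidence of disease (NED), distant recurrence, death.
    Distant recurrence is progressive: there is no transition back to NED.
    All probabilities are annual and, as a simplification, constant.
    """
    if p_recur + p_die_other > 1.0:
        raise ValueError("transitions out of NED exceed 1")
    ned, recur, dead = 1.0, 0.0, 0.0  # everyone starts in NED
    trace = [(ned, recur, dead)]
    for _ in range(years):
        new_recurrences = ned * p_recur
        deaths_from_ned = ned * p_die_other          # non-breast-cancer death
        deaths_from_recur = recur * p_die_after_recur
        ned = ned - new_recurrences - deaths_from_ned
        recur = recur + new_recurrences - deaths_from_recur
        dead = dead + deaths_from_ned + deaths_from_recur
        trace.append((ned, recur, dead))
    return trace


# Hypothetical inputs for illustration only (not values from Table 1).
trace = run_markov_cohort(p_recur=0.04, p_die_other=0.002,
                          p_die_after_recur=0.25, years=40)
```

Each tuple in `trace` gives the share of the cohort in each state at the start of a cycle; weighting those shares by state-specific costs and utilities would yield the cost and QALY totals that a model of this kind reports.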

We adopted a societal perspective for this analysis and followed the recommendations of the Panel on Cost-Effectiveness in Health and Medicine.17 We programmed the model in a Microsoft Excel spreadsheet and validated it using decision analysis software (TreeAge. Release 4.0. Williamstown, MA: TreeAge Software, Inc., 2004). Study procedures were approved by the University of Washington Institutional Review Board.

Model parameters

The model parameters included probabilities of clinical events, costs, and utilities. Utilities are used to adjust survival for quality of life, and range from 0 (death) to 1 (perfect health).17 The value for the main analysis, range of values for the sensitivity analyses, and data sources for each parameter are shown in Table 1.

Table 1 Parameters used in decision model

Probabilities of clinical events

We estimated the probabilities of clinical events from empirical analyses performed on patient-level data from the Netherlands Cancer Institute cohort, which was also the population for a validation study of GEP.9 We estimated the probabilities of a patient being identified as good prognosis or poor prognosis, the baseline risk of distant recurrence for each prognosis group (positive predictive value for poor prognosis groups, 1 − negative predictive value for good prognosis groups), and the mortality associated with distant recurrence from Kaplan-Meier survival analyses. In the analysis of risk of distant recurrence, distant metastasis as a first event was defined as a failure, and patients who experienced locoregional recurrence, a second primary cancer (including contralateral breast cancer), or death from causes other than breast cancer were censored. The Netherlands Cancer Institute cohort was a consecutive, population-based group of 295 breast cancer patients ≤ 52 years of age. The median duration of follow-up was 6.7 years. The cohort had approximately equal proportions of women with lymph node negative and positive disease in the GEP and NIH good and poor prognosis groups, and the majority of the cohort (63%) did not receive adjuvant chemotherapy. Women in the GEP and NIH guidelines poor prognosis groups who received chemotherapy appeared to have more favorable survival compared to women who did not, which could have produced bias in the estimates of recurrence risk for GEP and the NIH guidelines. However, because there were approximately equal proportions of women who received chemotherapy in the GEP and the NIH guidelines poor prognosis groups, the recurrence risk estimates in these groups should be equivalently biased, and the bias should not affect the comparative recurrence risk estimates. The probability of death from causes other than breast cancer was obtained from US life tables.18

Risk reduction from adjuvant chemotherapy

Results from the Early Breast Cancer Trialists' Collaborative Group meta-analysis of adjuvant systemic therapy2 suggest that for women < 50 years old with early stage breast cancer, chemotherapy reduces the 10-year risk of distant recurrence by 35%. We applied this risk reduction to the yearly recurrence risks for the poor prognosis groups (positive predictive value), which ensures that the chemotherapy benefit in reducing recurrence risk is applied only to those women in whom the disease would recur in the absence of chemotherapy. The way we incorporated the risk reduction from adjuvant chemotherapy in the model arises from the simple assumption that administering chemotherapy to a woman in whom the disease will not recur cannot reduce her risk of recurrence. Because GEP was designed and validated to identify women at high risk of distant recurrence, and because the prognosis of women with distant as compared to locoregional recurrence is less favorable,19 we did not consider the risk of local or regional recurrence. In the Netherlands Cancer Institute cohort, 4% of the women had locoregional recurrence, versus 34% with distant recurrence.
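The application of the risk reduction can be sketched as a single multiplicative adjustment to the yearly recurrence risk. The 35% figure comes from the text; the function name and the example baseline risk are hypothetical, and treating the reduction as a constant proportional effect in every year is an assumption of this sketch.

```python
RELATIVE_RISK_REDUCTION = 0.35  # chemotherapy effect reported in the text


def annual_recurrence_risk(baseline_risk, receives_chemo,
                           rrr=RELATIVE_RISK_REDUCTION):
    """Return the yearly distant-recurrence risk after any chemotherapy effect.

    The reduction is applied only when chemotherapy is given; women who do
    not receive chemotherapy keep their baseline risk.
    """
    if not 0.0 <= baseline_risk <= 1.0:
        raise ValueError("baseline_risk must be a probability")
    if receives_chemo:
        return baseline_risk * (1.0 - rrr)
    return baseline_risk


# Hypothetical baseline risk of 6% per year, for illustration only.
treated = annual_recurrence_risk(0.06, receives_chemo=True)
untreated = annual_recurrence_risk(0.06, receives_chemo=False)
```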

Economic costs

We included direct medical and nonmedical (time and transportation) costs in our calculations. A different GEP assay, targeted to lymph node–negative, ER+ women recently has been marketed for clinical use in the U.S. at a price of $3460.20 We used this price as an estimate for the cost of the GEP assay studied in our analysis, and varied the cost in sensitivity analyses. We assumed that it was costless to use NIH clinical guidelines, as tumor size and lymph node status, which form the basis for the guidelines, are routinely collected at pathology review.

Direct medical costs for breast cancer care were based on estimates from the published literature and empirical analyses. We assigned costs to distant recurrence and adjuvant chemotherapy and did not assign a cost for “no evidence of disease.” To assign costs to distant recurrence, we incorporated point estimates for the annual/episodic costs of breast cancer treatment from the literature. We assigned a one-time cost to distant recurrence that represents the total cost from onset of distant recurrence until death. We considered studies that (1) were conducted in a U.S. health care setting, (2) estimated costs for the relevant clinical outcomes, (3) were published after 1990, and (4) explicitly described their methods. We found a limited number of studies of adjuvant chemotherapy costs in breast cancer; most costs were estimated for older chemotherapy regimens and there was substantial variation in reported results ($5,000–$16,000). We thus undertook an attributable cost analysis using managed care reimbursement data21 and statistical methods that account for censored data.22,23 This approach estimated chemotherapy attributable costs as the difference between the cancer attributable costs of women with chemotherapy and women without chemotherapy. Briefly, we identified cases from a linked database of claims records from a health plan covering persons under age 65. Women with breast cancer in this plan were identified using the Surveillance, Epidemiology, and End Results (SEER) registry. Resource prices were based on reimbursements from the managed care organization. Women with chemotherapy in the database received a variety of agents and may have also received supportive care agents and/or hospitalization for complications from the chemotherapy. Because we computed incremental costs, the costs of adjuvant chemotherapy and distant recurrence were the only breast cancer treatment costs included in the model.
Thus, we did not evaluate other treatment costs for the primary tumor, namely surgery and radiation therapy. All costs are represented in 2003 US dollars.

We estimated direct nonmedical costs by multiplying the estimated number of hours spent traveling to treatment facilities and in treatment (personal communication, Jeannine S. McCune, June 6, 2003) by the average hourly wage for women ($13.48/hour) in the U.S.24 Transportation costs were estimated based on typical miles traveled per visit (20 miles), cost per mile (36 cents, per IRS automobile mileage reimbursement rates), and cost of parking ($2). The nominal cost of lost wages and transportation ($300) was overwhelmed in the analysis by the considerable costs of GEP and chemotherapy, so a more accurate estimate of these costs was not necessary.
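The nonmedical cost arithmetic can be sketched as follows. The wage, mileage, per-mile rate, and parking fee are taken from the text; the function name is hypothetical, and the number of hours and visits must be supplied by the caller because the text does not report them. Charging parking once per visit is an assumption.

```python
HOURLY_WAGE = 13.48        # average hourly wage for women (from the text)
MILES_PER_VISIT = 20       # typical miles traveled per visit (from the text)
COST_PER_MILE = 0.36       # IRS mileage reimbursement rate (from the text)
PARKING_PER_VISIT = 2.00   # cost of parking (from the text)


def nonmedical_cost(total_hours, visits):
    """Lost-wage cost plus transportation cost, in dollars.

    total_hours: hours spent traveling and in treatment (not given in the text)
    visits: number of treatment visits (not given in the text)
    """
    time_cost = total_hours * HOURLY_WAGE
    transport_per_visit = MILES_PER_VISIT * COST_PER_MILE + PARKING_PER_VISIT
    return time_cost + visits * transport_per_visit
```

Under these inputs each visit contributes $9.20 in transportation and parking, and each hour contributes $13.48 in lost wages.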

Quality of life (Utilities)

Utility weights were based on estimates reported in the literature. Utilities refer to the preferences individuals or society may have for any particular set of health outcomes and, in our study, allow for the adjustment of survival for quality of life.25 A utility may range from 0.0 (death) to 1.0 (perfect health). We considered those studies that were published after 1990 and used the standard gamble or time tradeoff methods to estimate utilities.17 We included studies that estimated utilities for clinical outcomes in the model, with explicitly described methods and results that were stated numerically. While it would have been preferable to utilize estimates obtained from community-based samples,17 few such studies were available.26 We assigned utilities to the following clinical outcomes: 6 months after diagnosis, no chemotherapy; 6 months after diagnosis, with chemotherapy; “no evidence of disease”; and “distant recurrence.” The utility value for “6 months after diagnosis, with chemotherapy” is lower than the utility value for “6 months after diagnosis, no chemotherapy” because of the detrimental effects of potential toxicity from chemotherapy on quality of life. Additionally, although the utility value assigned to distant recurrence is quite low (0.3), we believe it is an appropriate value for a progressive disease state and is internally consistent with other utility values used in the model. Furthermore, utility for distant recurrence is not an influential parameter in our model, so potential bias in this parameter would not have a substantial impact on results.

Data analysis

Costs and quality-adjusted life years (QALYs) for future years were discounted at 3% per year. The incidence of distant recurrence, the incidence of breast cancer death, total QALYs, and direct medical and nonmedical costs were calculated for GEP and the NIH guidelines. To validate the model, the overall and recurrence-free survival outcomes derived from the model were compared with results from the Netherlands Cancer Institute cohort. Finally, the incremental cost-utility ratio, which is interpreted as the additional cost to provide one additional QALY, was calculated where appropriate. The numerator of the ratio is the difference in total costs between GEP and the NIH guidelines and the denominator is the difference in QALYs.
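The discounting and ratio calculation described here can be sketched in a few lines. The 3% rate and the definition of the ratio come from the text; the function names are hypothetical, and leaving the first year undiscounted (t = 0) is an assumption, since the text does not state the convention used.

```python
def discounted_total(yearly_values, rate=0.03):
    """Sum a stream of yearly costs or QALYs, discounting year t by (1 + rate)**t.

    Assumption: the first entry is year 0 and is not discounted.
    """
    return sum(v / (1.0 + rate) ** t for t, v in enumerate(yearly_values))


def incremental_cost_utility_ratio(cost_new, qaly_new, cost_ref, qaly_ref):
    """Additional cost per additional QALY of the new strategy vs. the reference.

    Returns None when the QALY difference is zero, where the ratio is undefined.
    """
    delta_cost = cost_new - cost_ref
    delta_qaly = qaly_new - qaly_ref
    if delta_qaly == 0:
        return None
    return delta_cost / delta_qaly
```

A ratio is straightforward to interpret only when the new strategy is both more costly and more effective; when one strategy has both lower costs and lower QALYs, the two differences are usually reported separately.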

Sensitivity analyses and alternative testing strategies

We conducted one-way sensitivity analyses to evaluate the effect of varying individual probabilities, costs, and utilities on model results, while holding the others fixed. All parameters in the model were included in the sensitivity analyses (Table 1). In addition, multiway, probabilistic sensitivity analyses were performed using Monte Carlo simulation27 and @Risk software (Palisade Corporation, Newfield, NY). For each simulation, the probabilities, costs, and utilities were randomly drawn from probability distributions that represented the uncertainty of each of the model parameters and a cost-utility ratio was calculated. Ten thousand simulations were conducted to ensure convergence of results, thus removing “first-order” uncertainty, or within-individual predictive uncertainty.28 We used logistic normal distributions for probabilities and utilities27 and lognormal distributions for costs.28 A mean cost-utility ratio and an “uncertainty interval” containing 95% of the values from the simulation were calculated.28 The uncertainty interval provides an estimate of the overall uncertainty in the model due to uncertainty in all of the model parameters, or “second-order” uncertainty.28
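The probabilistic sensitivity analysis can be illustrated with a toy Monte Carlo loop using only the Python standard library. This is a sketch under stated assumptions and not a reconstruction of the @Risk model: the outcome function is a deliberately simplified incremental cost, the chemotherapy cost median and every spread parameter are hypothetical, and only the two treatment proportions (61% and 96%) and the assay price ($3460) come from the text.

```python
import math
import random


def logit(p):
    return math.log(p / (1.0 - p))


def draw_probability(rng, median, sigma):
    """Logistic-normal draw: normal on the logit scale, mapped back to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-rng.gauss(logit(median), sigma)))


def draw_cost(rng, median, sigma):
    """Lognormal draw: always positive, with the given median."""
    return rng.lognormvariate(math.log(median), sigma)


def toy_incremental_cost(rng):
    """Simplified GEP-minus-NIH cost: assay price plus the change in chemotherapy spending."""
    p_chemo_gep = draw_probability(rng, 0.61, 0.10)  # share treated under GEP
    p_chemo_nih = draw_probability(rng, 0.96, 0.10)  # share treated under NIH
    cost_gep = draw_cost(rng, 3460.0, 0.10)          # assay price from the text
    cost_chemo = draw_cost(rng, 20000.0, 0.10)       # hypothetical median
    return cost_gep + (p_chemo_gep - p_chemo_nih) * cost_chemo


def simulate(n_draws=10_000, seed=0):
    """Return the mean and a 95% uncertainty interval of the simulated outcome."""
    rng = random.Random(seed)
    draws = sorted(toy_incremental_cost(rng) for _ in range(n_draws))
    mean = sum(draws) / n_draws
    lower = draws[(25 * n_draws) // 1000]        # 2.5th percentile
    upper = draws[(975 * n_draws) // 1000 - 1]   # 97.5th percentile
    return mean, lower, upper
```

Seeding the generator makes a run reproducible; the interval is read directly from the sorted draws rather than from a normal approximation, which matches the description of an interval containing 95% of the simulated values.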

Because baseline recurrence risks in the GEP good and poor prognosis groups were not independent, we did not vary these parameters but instead evaluated the range of results that could be achieved using the current (Netherlands Cancer Institute) assay by changing the test cutoff to identify a tumor as poor prognosis.9 Changing the test cutoff alters the recurrence risk for each prognosis group. Finally, in addition to evaluating the testing of all women with either GEP or the NIH guidelines, we also considered the following alternative testing strategies entailing combined use of the results from GEP and NIH guidelines, or not using the results from NIH guidelines: (1) GEP for women identified as poor prognosis using NIH guidelines followed by chemotherapy for women who are identified as poor prognosis on both the NIH guidelines and GEP; (2) GEP for women identified as good prognosis using NIH guidelines followed by chemotherapy for women identified as poor prognosis on either the NIH guidelines or GEP; and (3) identifying 100% of women as candidates for adjuvant chemotherapy in lieu of using results from the NIH guidelines.
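The testing strategies listed above reduce to simple decision rules, sketched here for a single patient. The strategy labels and the function name are hypothetical; the rules follow the descriptions in the text, with the second return value indicating whether the GEP assay would need to be run (and therefore paid for).

```python
def apply_strategy(strategy, nih_poor, gep_poor):
    """Return (receives_chemo, gep_assay_used) for one patient.

    nih_poor / gep_poor: True if the patient is classified as poor prognosis
    by the NIH guidelines / by GEP.
    """
    if strategy == "nih_only":
        return nih_poor, False
    if strategy == "gep_only":
        return gep_poor, True
    if strategy == "gep_confirms_nih_poor":
        # Strategy (1): GEP only for NIH poor prognosis; treat if poor on both.
        return nih_poor and gep_poor, nih_poor
    if strategy == "gep_for_nih_good":
        # Strategy (2): GEP only for NIH good prognosis; treat if poor on either.
        return nih_poor or gep_poor, not nih_poor
    if strategy == "treat_all":
        # Strategy (3): every woman is a candidate for chemotherapy.
        return True, False
    raise ValueError(f"unknown strategy: {strategy}")
```

Strategy (1) can only reduce the number of women treated relative to the NIH guidelines alone, and strategy (2) can only increase it, which is why their value depends on the sensitivity the NIH guidelines already achieve.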

RESULTS

The NIH guidelines identified 96% of the cohort as high risk and thus candidates for chemotherapy, whereas GEP identified 61% of patients as high risk (Table 2). This prognostic categorization yielded sensitivities of 98% for the NIH guidelines and 84% for GEP. Specificities were 51% for GEP and 5% for the NIH guidelines. Accounting for the 35% risk reduction in distant recurrence resulting from chemotherapy, utilization of the NIH guidelines to identify and treat high-risk women with chemotherapy prevented 34% of women from experiencing distant recurrence compared to 29% for GEP. When the negative impact on life expectancy and quality of life from chemotherapy and distant recurrence were included, the NIH guidelines and GEP yielded 10.08 versus 9.86 QALYs, respectively. Total costs were $32,636 for the NIH guidelines and $29,754 for GEP. Because GEP produced lower QALYs and lower costs than the NIH guidelines, calculation of an incremental cost-utility ratio was not conducted.17

Table 2 Performance, costs, and outcomes of gene expression profiling vs. NIH guidelines
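The reported proportions can be cross-checked with two standard identities. This is a consistency sketch rather than the authors' calculation: it uses the crude 34% distant recurrence proportion from the Methods as the prevalence, whereas the analysis itself relied on Kaplan-Meier estimates, and it assumes the share of recurrences averted equals sensitivity times the 35% risk reduction. The function names are hypothetical.

```python
PREVALENCE = 0.34                # crude share with distant recurrence (Methods)
RELATIVE_RISK_REDUCTION = 0.35   # chemotherapy effect (Methods)


def fraction_classified_high_risk(sensitivity, specificity, prevalence=PREVALENCE):
    """True positives plus false positives, as a share of the whole cohort."""
    return sensitivity * prevalence + (1.0 - specificity) * (1.0 - prevalence)


def positive_predictive_value(sensitivity, specificity, prevalence=PREVALENCE):
    """Share of women classified as high risk who would actually recur."""
    return (sensitivity * prevalence
            / fraction_classified_high_risk(sensitivity, specificity, prevalence))


def fraction_of_recurrences_averted(sensitivity, rrr=RELATIVE_RISK_REDUCTION):
    """Only correctly identified (and hence treated) women benefit from chemotherapy."""
    return sensitivity * rrr


nih_flagged = fraction_classified_high_risk(0.98, 0.05)
gep_flagged = fraction_classified_high_risk(0.84, 0.51)
nih_averted = fraction_of_recurrences_averted(0.98)
gep_averted = fraction_of_recurrences_averted(0.84)
```

Under these assumptions the sketch gives roughly 0.96 and 0.61 for the shares classified as high risk and roughly 0.34 and 0.29 for the shares of recurrences averted, in line with the figures reported for the NIH guidelines and GEP.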

We validated the model by comparing its overall and recurrence-free survival outcomes with results from the Netherlands Cancer Institute cohort. We found that the model predicted approximately 7% fewer distant recurrence events than were observed among the women in the Netherlands Cancer Institute cohort. This difference is a result of accounting for censoring in the primary data by utilizing Kaplan-Meier survival analyses to estimate risks of distant recurrence and breast cancer mortality. If censoring, or incomplete follow-up time, were not accounted for, risks of distant recurrence would be overestimated.

We also considered the following alternative testing strategies entailing the combined use of results from the NIH guidelines and GEP or not using results from the NIH guidelines: (1) GEP as a confirmatory test for high-risk women identified by NIH guidelines; (2) GEP as a test to identify high-risk women missed by NIH guidelines; and (3) identifying 100% of women as candidates for adjuvant chemotherapy in lieu of using results from the NIH guidelines. Because the NIH guidelines had high sensitivity (98%), and the QALYs lost from failing to identify and treat women who would experience distant recurrence in the absence of chemotherapy were a strong driver of outcomes, the combined testing strategies were dominated by use of the NIH guidelines alone (data not shown) and were not considered further. Identifying 100% of the Netherlands Cancer Institute cohort as candidates for adjuvant chemotherapy, rather than using the NIH guidelines, did not substantially change QALYs but increased costs relative to the main analysis of GEP versus NIH guidelines (GEP vs. identification of all women: difference in QALYs = −0.17, difference in costs = −$3942).

Results of sensitivity analyses

The tornado diagrams (Figs. 3 and 4) display the results of the one-way sensitivity analyses for the most influential parameters. QALYs were most sensitive to the test cutoff to identify a tumor as poor prognosis. In order for the GEP test to produce equivalent QALYs to the NIH guidelines, GEP sensitivity would need to be 95% or greater, while maintaining specificity (51%). Regardless of the test cutoff used to identify a poor prognosis tumor, the GEP assay studied in our analysis, at its current level of performance, does not attain a sensitivity of this magnitude. Total cost was most influenced by the costs of GEP and chemotherapy, and GEP was cost-saving over the entire range of variation of these parameters. The multi-way probabilistic sensitivity analysis demonstrated a 95% uncertainty interval of −0.32 to −0.09 for the difference in QALYs and −$4686 to −$1042 for the difference in costs.

Fig. 3

Diagram depicts the most influential model parameters in determining QALYs from most to least influential, and the effect of varying these parameters on total QALYs.

Fig. 4

Diagram depicts the most influential model parameters in determining costs from most to least influential, and the effect of varying these parameters on total costs.

DISCUSSION

We evaluated the potential clinical, patient, and economic benefits of a GEP assay versus the NIH guidelines to identify premenopausal women with early stage breast cancer for adjuvant chemotherapy. GEP identified 35 percentage points fewer women for chemotherapy than the NIH guidelines (61% vs. 96%), but the resultant quality of life benefits are outweighed by the decrease in life expectancy due to GEP's lower sensitivity. GEP's specificity was 10-fold higher than that of the NIH guidelines, leading to lower overall costs. If GEP's sensitivity were to increase to at least 95% and its specificity (51%) were maintained, it would improve quality of life by allowing some women to safely avoid chemotherapy while at the same time not missing women whose survival is compromised by avoiding therapy. Additionally, our results suggest that although GEP is a major expense compared to the use of NIH guidelines, the NIH guidelines may incur enough additional chemotherapy costs to overwhelm the test costs of GEP, making the NIH guidelines more costly overall. Furthermore, although the NIH guidelines identified the preponderance of the cohort (96%) as high-risk, our examination of alternative testing strategies suggests that identifying the entire cohort as high-risk in lieu of using the NIH guidelines would cost more and produce roughly equivalent QALYs compared to the main analysis of GEP versus the NIH guidelines.

There are several limitations to our analysis. First, in addition to the NIH guidelines, there are other guidelines applied in the United States that could have been utilized for the comparison, such as the National Comprehensive Cancer Network criteria.29 The National Comprehensive Cancer Network criteria are more conservative than the NIH guidelines, so had we compared them with the GEP assay instead, they would tend to yield higher costs and higher sensitivity than the NIH guidelines. Second, there is substantial uncertainty in several of our model parameters, most notably in the performance of the GEP assay. Although approximately equal proportions of women in the GEP and NIH guidelines poor prognosis groups received adjuvant chemotherapy, the type and duration of chemotherapy were not specified, which may have introduced bias in the estimates of recurrence risk. For example, if women in the GEP poor prognosis group received more intensive chemotherapy than women in the NIH guidelines poor prognosis group, it is possible that their recurrence risk was underestimated to a greater extent than for women in the NIH guidelines poor prognosis group, enhancing GEP's performance. In the good prognosis groups, a greater proportion of women received chemotherapy in the GEP group versus the NIH guidelines group (in fact no women had chemotherapy in the NIH guidelines good prognosis group), which would produce a bias in favor of GEP. However, because we found that GEP produces lower QALYs than NIH guidelines, the presence of these potential biases would not change decisions about GEP based on our analysis. Additionally, although 77% of patients in the Netherlands Cancer Institute cohort were ER+, only 18% of these patients received adjuvant hormonal therapy (e.g., tamoxifen).
Because adjuvant hormonal therapy is the current standard of care for women with ER+ tumors,5 if the cost-utility model were applied to current practice conditions, adjuvant hormonal therapy would reduce the baseline risk of distant recurrence in ER+ women and thus would decrease the absolute incremental benefit of adjuvant chemotherapy according to the proportion of ER+ patients in each prognosis group for GEP and the NIH guidelines. Because information on individual ER status for the GEP and NIH guidelines group was not available, we could not ascertain the direction of potential bias. Analogously, an increase in the efficacy of adjuvant chemotherapy for the GEP poor prognosis group would clearly favor the GEP strategy and there has been the suggestion that gene expression profiles may be able to predict response to chemotherapy.30 Third, for our analysis we used one of several gene expression profiles that are marketed or in development for clinical use, and our results do not necessarily apply to these other tests. Like other profiles, the profile we used has not been extensively validated, but is in the process of further validation.

Fourth, we assumed all women in the poor prognosis groups received chemotherapy. In practice, some physicians may not follow the NIH guidelines in recommending chemotherapy or patients may refuse chemotherapy; thus, the use of chemotherapy may be lower than the values used in our analysis: 96% and 61% for the NIH guidelines and GEP, respectively.31,32 A similar decrease in compliance for GEP and the NIH guidelines would not likely have a dramatic effect on results. However, it is possible that compliance by patients and physicians may be different with GEP due to the perception of the value of genomic information: such issues merit further study.

Fifth, it is possible that the adjuvant chemotherapy cost we used in our model is an inflated estimate, as women who receive adjuvant chemotherapy are also more likely to receive radiation treatment.33 To address uncertainty about the true cost of chemotherapy, we varied the cost of chemotherapy in the sensitivity analysis from $17,930 to $26,037, which would account for up to an 18% overestimate of the true cost of adjuvant chemotherapy. Based on a previous report,33 we believe the amount by which we varied the cost of adjuvant chemotherapy in the sensitivity analysis should account for any potential bias in our chemotherapy cost estimate. Furthermore, although the cost of chemotherapy was an influential parameter in the model, the cost savings of GEP compared to NIH guidelines persisted over the entire range of uncertainty over which we varied chemotherapy cost.

Sixth, our cost-utility model may not fully capture all the elements that could influence physician-patient decision-making. Individuals may have different attitudes toward and perceptions of risk,34 which will affect their valuation of GEP and whether or not to undergo or recommend chemotherapy. The withholding of chemotherapy may cause undue anxiety for some patients, producing disutility for GEP as it relates to that patient. Health care providers may also be uneasy with a new risk assessment tool that results in fewer patients receiving chemotherapy. There is a burgeoning literature investigating how oncologists and patients address the benefits and risks of chemotherapy and the interpretation of prognostic information in breast cancer.35–46 Many of these studies suggest that the communication of risks and benefits from provider to patient in cancer care needs improvement. When genomic technologies are determined to have clinical utility in cancer care, physicians will need to know how to communicate information about risks and benefits in ways that promote informed patient decision-making. As GEP is developed for application in oncology and other clinical specialties, it will add to the genetics expertise required by practicing physicians, and thus to genetics educational needs.47–49

This analysis is an illustrative example of how genomic assays used in cancer care may be evaluated based on their accuracy and the downstream outcomes that result from it. Regardless of which GEP assay and clinical guidelines are used for comparison, how the natural history of breast cancer is modeled, and the values assigned to model parameters, it is evident that (1) test performance is crucial in determining outcome; and (2) the quality-adjusted life expectancy benefits of being spared from unwarranted chemotherapy are overwhelmed by the decreased life expectancy due to failing to identify and treat a woman who would experience distant recurrence. Although GEP has significant potential to provide clinical benefit, our study identifies current limitations in GEP test properties and suggests additional refinement and validation are needed before use in clinical practice.

As gene expression profiles are validated in well-designed clinical studies based on standardized protocols, their value as a prognostic factor may be enhanced. Currently, there are international trials underway for the use of this technology.50,51 Ultimately, the most informative studies may involve the evaluation of the incremental prognostic value of GEP over standard pathologic predictors, such as ER status, tumor grade, and proliferation markers such as Ki-67 and mitotic count, routinely collected at the time of diagnosis. As additional gene expression profiles are developed, decision modeling will continue to provide a means to assess outcomes and identify key test properties.52–66