Direct comparison of PET/CT and MRI to predict the pathological response to neoadjuvant chemotherapy in breast cancer: a meta-analysis

Both PET/CT and breast MRI are used to assess pathological complete response to neoadjuvant chemotherapy (NAC) in patients with breast cancer. This meta-analysis aimed to compare the utility of PET/CT and breast MRI using head-to-head comparative studies. Literature databases were searched for publications dated no later than July 2016. Eleven studies with a total of 527 patients were included. For PET/CT, the pooled sensitivity (SEN) was 0.87 (95% confidence interval (CI): 0.71–0.95) and the pooled specificity (SPE) was 0.85 (95% CI: 0.70–0.93). For MRI, the pooled SEN was 0.79 (95% CI: 0.68–0.87) and the pooled SPE was 0.82 (95% CI: 0.72–0.89). In the conventional contrast-enhanced (CE)-MRI subgroup, PET/CT outperformed conventional CE-MRI with a higher pooled sensitivity (0.88 (95% CI: 0.71–0.95) vs. 0.74 (95% CI: 0.60–0.85), P = 0.018). In the early evaluation subgroup, PET/CT was superior to MRI with a notably higher pooled specificity (0.94 (95% CI: 0.78–0.98) vs. 0.83 (95% CI: 0.81–0.87), P = 0.015). The diagnostic performance of MRI is similar to that of PET/CT for the assessment of breast cancer response to NAC. However, PET/CT is more sensitive than conventional CE-MRI and more specific if the second imaging scan is performed before 3 cycles of NAC.


Results
The database search initially identified 401 potential literature citations, and 3 additional records were obtained by searching the grey literature (Fig. 1). After reviewing the titles and abstracts, 373 of the studies were excluded because they were not relevant. After reading the full texts, we excluded 19 of the remaining 31 articles for the following reasons: 6 articles lacked sufficient information to enable completion of a 2 × 2 contingency table, 9 articles were not available, the reference standard in 2 articles was clinical response, and 2 articles were not published in English. After this final screening, 12 published studies met our inclusion criteria. Ultimately, a total of 11 studies were included in our quantitative synthesis; 1 study was excluded because it assessed the axillary lymph node response to NAC. The data extracted from these individual studies are summarised in Table 1, Table 2, Table S1, and Table S2.
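The selection counts reported above can be cross-checked with a few lines of arithmetic. The sketch below only restates the numbers given in the text; the variable names are illustrative and do not come from the study.

```python
# Study-selection counts as reported in the Results text.
identified = 401                  # records from the database search
grey_literature = 3               # additional records from the grey literature
excluded_on_title_abstract = 373  # excluded after title/abstract review

# Reasons for exclusion after full-text review (counts from the text).
full_text_exclusions = {
    "insufficient data for a 2 x 2 table": 6,
    "full text not available": 9,
    "reference standard was clinical response": 2,
    "not published in English": 2,
}
axillary_only = 1  # study that assessed axillary lymph node response

full_text_assessed = identified + grey_literature - excluded_on_title_abstract
met_criteria = full_text_assessed - sum(full_text_exclusions.values())
in_quantitative_synthesis = met_criteria - axillary_only

print(full_text_assessed, met_criteria, in_quantitative_synthesis)
# prints: 31 12 11
```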
According to QUADAS-2, the overall quality of the 12 studies was moderate. The distribution of the study design is shown in Fig. 2.
As there was significant heterogeneity in both pooled analyses (MRI: I² = 92.8%, P < 0.001; PET/CT: I² = 97.2%, P < 0.001), we used a random-effects coefficient binary regression model. Deeks' funnel plot asymmetry test (P = 0.160 and P = 0.804, respectively) showed no evidence of notable publication bias in the analysis of either MRI or PET/CT (Fig. 5).
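To make the I² statistic concrete, the following is a minimal sketch of how it can be derived from Cochran's Q under inverse-variance weighting. It assumes the common definition I² = max(0, (Q − df)/Q) × 100; the function name and the example values are hypothetical, and the study itself obtained its estimates with Stata rather than with this code.

```python
def cochran_q_and_i2(effects, variances):
    """Return (Q, df, I2 in percent) for per-study effects and variances."""
    # Inverse-variance weights: more precise studies count more.
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the pooled effect.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    # I2: share of total variation attributed to between-study heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, i2


# Hypothetical effect sizes (e.g. log diagnostic odds ratios), not study data.
q, df, i2 = cochran_q_and_i2([1.0, 2.0, 3.0], [0.1, 0.1, 0.1])
print(round(q, 2), df, round(i2, 1))
```

With the I² > 50% threshold stated in the Methods, values such as the reported 92.8% and 97.2% indicate notable heterogeneity, which is why a random-effects model was chosen.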

Discussion
Although MRI and PET/CT already play daily clinical roles in determining whether to continue, change, or abandon NAC for breast cancer, previous meta-analyses and systematic reviews have yielded inconsistent findings 7,8,24 when assessing these imaging modalities alone or together (Table 4). Several recent head-to-head comparative studies have also yielded inconsistent findings 13,17,21,22 . Because head-to-head comparisons provide the best measurements of the diagnostic accuracy of two different techniques 25,26 , we focused exclusively on direct comparative studies that evaluated both MRI and PET/CT in the same cohort of patients. Compared with the previous meta-analysis by Liu 24 , our research is strengthened by a more careful selection of articles and the inclusion of two direct comparative studies 6,15 that may have been missed in that analysis.
The results of our meta-analysis showed that MRI and PET/CT have similarly high sensitivities (0.79 vs. 0.87) and specificities (0.82 vs. 0.85). However, among previous meta-analyses, the study focusing on MRI by Michael et al. 8 had a much higher pooled sensitivity (0.92 vs. 0.81) than the study focusing on PET/CT by Mghanga et al. 7 , whereas the opposite was observed for pooled specificity (0.60 vs. 0.79). In addition, the AUCs of the two studies were identical (0.88 vs. 0.88). We speculate that this pattern of high sensitivity with low specificity, or vice versa, may be caused by a threshold effect originating from the use of different diagnostic cut-off values in various studies. Because of this threshold effect, ROC curve and AUC analyses are more informative than the pooled sensitivity and pooled specificity alone. The AUC in our study (0.87 vs. 0.93) is consistent with these meta-analyses, which suggests that the diagnostic performance of MRI is similar to that of PET/CT for the assessment of breast cancer response to NAC.
Traditionally, tumour response has been monitored by conventional CE-MRI alone with standard anatomic response criteria (Response Evaluation Criteria in Solid Tumors (RECIST) and RECIST 1.1) during the course of treatment 27 . Several studies 13,14,20,23 attempted to compare the predictive roles of MRI and PET/CT during NAC using a pre-specified cut-off according to international standards (RECIST vs. PERCIST). Therefore, we performed a subgroup analysis of different diagnostic cut-off values. In the pre-specified cut-off subgroup, PET/CT outperformed MRI in assessing pathologic response to NAC, with a higher pooled sensitivity (0.79 vs. 0.61) and a comparable pooled specificity (0.81 vs. 0.83). However, this trend was not observed in the ROC analysis subgroup. We also performed a subgroup analysis of different MRI modalities. In the conventional CE-MRI subgroup, PET/CT was more effective than MRI in assessing pathologic response to NAC, with a higher pooled sensitivity (0.88 vs. 0.74) and an identical pooled specificity (0.82 vs. 0.82). However, in the functional MRI (perfusion MR, DWI, or MRS) subgroup, PET/CT appeared to be equivalent to MRI, with a lower pooled sensitivity (0.78 vs. 0.88), a higher pooled specificity (0.92 vs. 0.82), and a similar AUC (0.93 vs. 0.89). These results suggest that PET/CT is more accurate than conventional CE-MRI and that PERCIST criteria may be more appropriate than RECIST criteria for monitoring breast cancer response to NAC. A possible explanation is the general limitation of anatomic MRI techniques, which are unable to distinguish potential residual tumour from fibrotic scar tissue in stable disease 14 .
Because the delay between the initiation of therapy and changes in tumour size is usually longer than 2 cycles of NAC 28 , several studies 18–20,22 have attempted to investigate earlier predictors associated with angiogenesis, metabolism, or cellularity that may change before tumour shrinkage in the breast cancer response to NAC. Moreover, there is no consensus on the optimal timing of second imaging for evaluation of the response to NAC. Therefore, we performed a subgroup analysis of different evaluation time points of second imaging. In the early evaluation subgroup, PET/CT was superior to MRI in assessing pathologic response to NAC, with a notably higher pooled specificity (0.94 vs. 0.83) and a similar pooled sensitivity (0.71 vs. 0.73). By contrast, in the post evaluation subgroup, the pooled sensitivity, specificity and AUC of PET/CT were very similar to those of MRI. Our results support the previous conjecture that PET/CT is superior to MRI in assessing response before 3 cycles of NAC but not after 3 cycles of NAC.
Although breast surgical resection after NAC is based on a combination of clinical and imaging assessments of the response to treatment, the axillary nodal stage continues to play a crucial role in clinical decisions. Hieken et al. 15 reported that PET/CT has higher sensitivity (0.86 vs. 0.59) than MRI in assessing the axillary lymph node response to NAC. However, this result must be interpreted with caution because only one study of this type is available. More clinical studies are required to confirm this result, which would indicate that PET/CT has a greater advantage in assessing both breast cancer and axillary lymph node response to NAC than MRI.
The performance of PET/CT or MRI alone has been shown to differ considerably among breast cancer subtypes. Therefore, tailoring the imaging technique to the subtype may further improve performance in NAC monitoring 29 . However, after reviewing the 12 included articles, we identified only two studies that reported breast cancer subtypes (Table S2). One head-to-head comparative study suggested that PET/CT might be better than conventional CE-MRI for early prediction of pCR in luminal B subtype breast cancer 17 . The second study showed that pCR was associated with the reduction in SUVmax on PET/CT as well as the reduction in largest diameter on MRI in triple-negative tumours, but not in HER2-positive and ER-positive/HER2-negative tumours 19 . Although current evidence is not sufficient to draw recommendations, these results may be clinically useful and generate hypotheses for further research.
Some intrinsic limitations of our study should be considered when interpreting our results. First, the sample sizes of comparative studies available in the literature are relatively small, which may contribute to an overestimation of diagnostic accuracy 26 . However, a systematic review 30 focused on meta-analysis studies from the Cochrane Database showed that the number of studies eligible for meta-analysis is typically small in all medical areas and for all outcomes and interventions covered by the Cochrane Reviews. Second, there may be publication bias in this meta-analysis. Our meta-analysis was based only on published and full-text articles, which tend to report positive or significant results rather than negative or non-significant results. Although the quality of published data in peer-reviewed journals is generally considered superior to that of unpublished data 31 , the inclusion of only published studies may lead to reporting bias. Third, accuracy estimates are affected by various factors, such as the definition of pCR and the breast cancer phenotype 8 . As the data available to investigate these factors are limited, we did not assess them in our analyses.
In conclusion, a limited number of head-to-head studies indicate that the diagnostic performance of MRI is similar to that of PET/CT for the assessment of breast cancer response to NAC. However, for monitoring breast cancer response to NAC, PET/CT is more sensitive than anatomic MR imaging, and PERCIST criteria may be more appropriate than RECIST criteria. Moreover, PET/CT is superior to MRI in assessing response between 1 and 3 cycles of NAC but not after 3 cycles of NAC. In the future, large-scale, well-designed, head-to-head trials are necessary to compare the predictive value of these two imaging techniques and to consider more factors (such as the definition of pCR and the phenotype of breast cancer).

Materials and Methods
We used the Preferred Reporting Items for Systematic Reviews and Meta-Analyses statement 32 to improve the reporting of our research (Fig. 1).
Search Strategy. A structured approach was followed to identify the patient population, interventions, comparators, outcomes, and study design (PICOS criteria) 32 . Two observers (Lihua Chen and Qifang Yang) performed the literature search of data sources independently (PUBMED, EMBASE, Web of Science, and the Cochrane Library). The search strategy (Appendix A) included both subject headings (MeSH terms) and keywords for the target condition (breast cancer), the imaging techniques under investigation (MRI and PET/CT), and the interventions (neoadjuvant therapy). We limited our search to publications with the search term in the title or abstract of the article and a publication date no later than July 2016. Review articles, letters, comments, case reports, and unpublished articles were excluded. Extensive cross-checking of the reference lists of all retrieved articles was performed.
Criteria for inclusion in the study. Studies were eligible if the following PICOS criteria were met: (a) the patient population consisted of patients with histologically confirmed primary breast cancer; (b) the imaging response for pre-NAC and post-NAC was monitored with both MRI and FDG-PET; (c) histopathologic analysis was available as a reference standard; (d) the study outcome described pCR or near-pCR to NAC; and (e) the study design was described as a direct comparative study or randomised controlled trial.
Non-English and non-Chinese articles were excluded if a full-text translation or evaluation could not be obtained. Both prospective and retrospective studies were included.
We excluded studies if a 2 × 2 table could not be extracted from the data, if there were fewer than 10 patients, or if multiple reports were published for the same study population. In the latter case, the most detailed or recent publication was extracted.

Selection of Articles.
Articles were selected by two authors (Lihua Chen and Qifang Yang) independently.
The two authors initially screened the titles and abstracts of the search results and retrieved all potentially

Quality Assessment and Data Extraction.
For each included study, the methodological quality was evaluated independently by two observers (Lihua Chen and Qifang Yang) using the standard quality assessment of diagnostic studies (QUADAS-2) checklist, which was specifically developed for systematic reviews of diagnostic accuracy studies 33–35 . In addition, the relevant information was also extracted from each study, including the author, year of publication, description of the study population, study nation, study design characteristics,

Table 3. Accuracy estimates for subgroup analyses. pSEN = pooled sensitivities; pSPE = pooled specificities; *P < 0.05.

Meta-analysis.
We constructed forest plots to show the variations of the SEN and SPE estimates together with 95% confidence intervals (CI) for each imaging test in each study. We calculated the SEN, SPE, positive likelihood ratio (PLR), negative likelihood ratio (NLR) and diagnostic odds ratio (DOR) values with their 95% CIs. We constructed hierarchical summary ROC (HSROC) curves to estimate SEN and SPE 36 . Standard χ² testing and the inconsistency index (I-squared, I²) were used to assess the heterogeneity of the individual studies using Stata software (Stata Corporation, College Station, TX, USA). P < 0.1 or I² > 50% suggested notable heterogeneity 37 . If notable heterogeneity was detected, the test performance was summarised using a random-effects coefficient binary regression model; otherwise, a fixed-effects coefficient binary regression model was used 25 .
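As an illustration of the per-study accuracy measures listed above, the sketch below derives SEN, SPE, PLR, NLR and DOR from a single 2 × 2 table. The counts are hypothetical, the function names are illustrative, and the Wilson score interval is an assumption made here for the example; the text does not state which confidence interval method was used, and the pooled estimates in this study came from Stata models rather than from this calculation.

```python
import math


def diagnostic_metrics(tp, fp, fn, tn):
    """Accuracy measures from one 2 x 2 table (assumes no zero cells)."""
    sen = tp / (tp + fn)      # sensitivity: true positives / all reference-positive
    spe = tn / (tn + fp)      # specificity: true negatives / all reference-negative
    plr = sen / (1 - spe)     # positive likelihood ratio
    nlr = (1 - sen) / spe     # negative likelihood ratio
    dor = plr / nlr           # diagnostic odds ratio = (tp * tn) / (fp * fn)
    return {"SEN": sen, "SPE": spe, "PLR": plr, "NLR": nlr, "DOR": dor}


def wilson_ci(successes, total, z=1.96):
    """Wilson score interval for a proportion (assumed method, ~95% for z=1.96)."""
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


# Hypothetical counts for one study and one imaging test, not study data.
tp, fp, fn, tn = 20, 5, 4, 30
metrics = diagnostic_metrics(tp, fp, fn, tn)
sen_low, sen_high = wilson_ci(tp, tp + fn)
print({name: round(value, 3) for name, value in metrics.items()})
print(round(sen_low, 3), round(sen_high, 3))
```

A study with a zero cell would need a continuity correction before the ratios can be computed, which this sketch deliberately leaves out.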
Subgroup analyses were performed as follows: (a) comparisons of studies using different cut-off values: ROC analysis subgroup (cut-off obtained by ROC analysis) or pre-specified subgroup (cut-off set by pre-specified criteria, MRI with anatomic response criteria, and PET/CT with metabolic response criteria); (b) comparisons of studies using different MRI modalities: conventional CE-MRI subgroup (longest diameter or tumour volume) or functional MRI subgroup (parameter of quantitative perfusion MR, DWI, or MRS); and (c) comparisons of studies with different evaluation time points of second imaging: early evaluation subgroup (second imaging scan before 3 cycles) or post evaluation subgroup (second imaging scan after 3 cycles).
The presence of publication bias was assessed by a Deeks' funnel plot and an asymmetry test. Publication bias was considered present if there was a nonzero slope coefficient (P < 0.05), which suggests that only small studies reporting high accuracy had been published 38,39 .

Table 4. Summary of meta-analyses focused on MRI and PET/CT for the assessment of breast cancer response to NAC. PSEN = pooled sensitivities; PSPE = pooled specificities; DOR = diagnostic odds ratio; NR = not reported.
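The Deeks' funnel plot asymmetry test mentioned in the Methods can be sketched as a weighted linear regression of the log diagnostic odds ratio on the inverse square root of the effective sample size (ESS), weighted by ESS. The code below is a minimal illustration under stated assumptions: the function name is hypothetical, the 0.5 continuity correction is applied to every cell of every table, and the input tables are invented. It returns the slope and its standard error; the P value for the slope would come from a t distribution with k − 2 degrees of freedom, which is not computed here.

```python
import math


def deeks_regression(tables, correction=0.5):
    """Weighted regression of ln(DOR) on 1/sqrt(ESS); needs >= 3 studies.

    tables: iterable of (tp, fp, fn, tn) counts, one tuple per study.
    Returns (slope, intercept, se_slope).
    """
    xs, ys, ws = [], [], []
    for tp, fp, fn, tn in tables:
        # Continuity correction on all cells (an assumption of this sketch).
        a, b, c, d = (v + correction for v in (tp, fp, fn, tn))
        ln_dor = math.log((a * d) / (b * c))
        n_pos, n_neg = tp + fn, fp + tn
        ess = 4 * n_pos * n_neg / (n_pos + n_neg)  # effective sample size
        xs.append(1 / math.sqrt(ess))
        ys.append(ln_dor)
        ws.append(ess)

    sw = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / sw
    y_bar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Weighted residual variance and the standard error of the slope.
    rss = sum(w * (y - intercept - slope * x) ** 2
              for w, x, y in zip(ws, xs, ys))
    se_slope = math.sqrt(rss / (len(xs) - 2) / sxx)
    return slope, intercept, se_slope


# Hypothetical 2 x 2 tables (tp, fp, fn, tn), not data from the included studies.
example = [(18, 4, 3, 25), (30, 9, 6, 40), (12, 2, 5, 15), (45, 10, 8, 60)]
slope, intercept, se_slope = deeks_regression(example)
print(slope, intercept, se_slope)
```

A slope close to zero relative to its standard error corresponds to a symmetric funnel plot, which is the pattern the reported P values (0.160 and 0.804) point to.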