Introduction

Nasopharyngeal cancer (NPC) is a common subtype of head-and-neck cancer prevalent in Asia and Africa, with about 129,000 new cases reported in 2018 (ref. 1). NPC is widely treated using intensity-modulated radiotherapy and/or chemotherapy. Although early-stage NPC is adequately treated using radiotherapy alone, chemotherapy is required for locoregionally advanced disease1. However, patients with locoregionally advanced NPC have a high probability of recurrence with poor prognosis, even after chemoradiotherapy2. Thus, early prediction of treatment response and identification of patients with treatment resistance are clinically relevant for individualized treatment planning.

The International Union Against Cancer/American Joint Committee on Cancer TNM staging system is used to classify NPC3. Because of its superior soft-tissue resolution and diagnostic performance in T staging compared with other modalities, MRI is the imaging modality of choice for primary tumor evaluation4. Furthermore, MRI is valuable for tumor staging and post-treatment tumor response evaluation.

However, conventional MRI has limitations regarding the early prediction of treatment response in NPC5. Advanced imaging techniques, including diffusion-weighted imaging (DWI) and perfusion imaging, are usually applied for the evaluation of treatment response and residual or recurrent tumor6. Importantly, DWI is a common imaging technique for assessing tissue microstructure by measuring tissue water diffusivity7. The apparent diffusion coefficient (ADC), a quantitative parameter calculated from DWI, reflects tumor microstructure, and pretreatment ADC has shown promise in tumor staging and predicting treatment response8,9,10. The benefits of DWI for predicting treatment response and prognosis in NPC have been reported5,6,9,11,12,13,14,15,16,17,18,19; however, the reported sensitivities and specificities were variable. Thus, the current study sought to address this variability by pooling the available evidence.
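For readers less familiar with how ADC is obtained from DWI, the following is a minimal sketch of the standard two-point mono-exponential calculation, S_b = S_0 · exp(−b · ADC). The function name and the signal values are hypothetical and chosen only for illustration; the included studies used a range of b-values and scanner-side fitting routines that are not reproduced here.

```python
import math


def adc_two_point(s0: float, sb: float, b: float) -> float:
    """Mono-exponential ADC (mm^2/s) from the signal at b = 0 (s0)
    and the signal at a single non-zero b-value (sb, b in s/mm^2)."""
    if s0 <= 0 or sb <= 0 or b <= 0:
        raise ValueError("signals and b-value must be positive")
    # S_b = S_0 * exp(-b * ADC)  =>  ADC = ln(S_0 / S_b) / b
    return math.log(s0 / sb) / b


# Hypothetical voxel: signal falls from 1000 to 450 at b = 800 s/mm^2.
adc = adc_two_point(1000.0, 450.0, 800.0)
print(f"ADC = {adc * 1e3:.2f} x 10^-3 mm^2/s")
```

A stronger signal drop at a given b-value yields a higher ADC, which is why densely cellular tissue, where diffusion is restricted, tends to show low ADC.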

The purpose of this systematic review and meta-analysis was to assess the predictive performance of pretreatment ADC for treatment response in patients with NPC.

Materials and methods

This systematic review and meta-analysis was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines20. The institutional review board of our institution approved this study.

Literature search strategy

A search of the PubMed-MEDLINE and EMBASE databases was performed up to July 22, 2021, to identify relevant original articles on the use of DWI MRI for predicting locoregional treatment response in NPC treated with neoadjuvant chemotherapy, definitive chemoradiation therapy, or radiation therapy. The following search terms were used: [(nasopharyngeal)] AND [(carcinoma) OR (carcinomas) OR (cancer) OR (cancers) OR (squamous cell carcinoma)] AND [(chemoradiation) OR (chemoradiotherapy) OR (radiotherapy) OR (radiation therapy)] AND [(“diffusion weighted”) OR (“diffusion-weighted”) OR (dw-mri) OR (DWI) OR (“apparent diffusion coefficient”) OR (ADC) OR (“intravoxel incoherent motion”) OR (IVIM)]. Only studies published in English were included. We defined ‘predictive’ as a biomarker of the treatment response to therapy and ‘prognostic’ as a biomarker of the final survival outcome21. The search was limited to studies involving human patients. The bibliographies of the selected articles were further screened to identify other potentially relevant articles.

Inclusion and exclusion criteria

The inclusion criteria were: (1) population: patients with histologically proven NPC who underwent neoadjuvant chemotherapy, definitive chemoradiation, or radiation therapy; (2) index test: DWI MRI with pretreatment ADC of the primary NPC reported; (3) reference standard: treatment outcome as determined by histology, clinical/imaging follow-up, or a combination of these; (4) outcomes: locoregional failure after neoadjuvant chemotherapy, definitive chemoradiation, or radiation therapy reported in sufficient detail; and (5) study design: all observational studies (retrospective or prospective).

The exclusion criteria were: (1) case reports, review articles, editorials, letters, and conference abstracts; (2) insufficient data on locoregional failure and control; (3) did not provide ADC values of primary NPC; (4) insufficient detail to produce 2 × 2 tables; and (5) overlapping patients and data. For population overlap, the study with the larger cohort was included. Two reviewers (blinded and blinded) independently selected appropriate study reports using a standardized form. Disagreement was resolved by reaching a consensus after discussion with a third reviewer (blinded).
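To illustrate why a reconstructable 2 × 2 table is required for inclusion, the sketch below derives per-study sensitivity and specificity, with Wilson score 95% CIs, from the four cell counts. The counts are hypothetical, the function names are introduced here for illustration only, and "positive" simply denotes the index test (pretreatment ADC relative to a study's threshold) calling the target condition.

```python
import math


def wilson_ci(successes: int, total: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a binomial proportion."""
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


def accuracy_from_2x2(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Sensitivity and specificity (with 95% CIs) from a 2 x 2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "sens_ci": wilson_ci(tp, tp + fn),
        "specificity": tn / (tn + fp),
        "spec_ci": wilson_ci(tn, tn + fp),
    }


# Hypothetical study: 30 patients with the target condition, 30 without.
result = accuracy_from_2x2(tp=26, fp=9, fn=4, tn=21)
print(result)
```

Without all four cells, neither proportion nor its variance can be computed, so such a study cannot contribute to the pooled estimates.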

Data extraction

The following information was extracted into a standardized form: (1) study characteristics: first author, year of publication, affiliation, patient enrollment period, number of patients, and study design; (2) clinical information: age, cancer stages, endpoints, treatment, criteria for treatment response, and follow-up period; and (3) MRI acquisition parameters: manufacturer, model, field strength (tesla), repetition time, echo time, field of view, matrix, b-values, ADC threshold values, ADC change between treatment, and method of delineating the region of interest.

Quality assessment

Two reviewers (blinded and blinded) with seven years of experience in head and neck diagnostic radiology independently extracted the data and performed the quality assessment. The included studies were evaluated using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) criteria22 for predictive studies and the Quality in Prognosis Studies (QUIPS) tool for prognostic studies23. Any disagreement was resolved by consensus.

Data synthesis and statistical analyses

Using random-effects modeling, pooled sensitivity and specificity with 95% CIs were generated from individual predictive studies; similarly, pooled overall survival (OS), local relapse-free survival (LRFS), and distant metastasis-free survival (DMFS) with 95% CIs were generated from prognostic studies. For predictive studies, hierarchical summary receiver operating characteristic (HSROC) curves with 95% confidence and prediction regions were graphically visualized. Publication bias was evaluated via Deeks’ funnel plot; Deeks’ asymmetry test was used to determine the statistical significance of publication bias24. Between-study heterogeneity was evaluated using Cochran’s Q test with statistical significance at P < 0.05 (ref. 25); the degree of heterogeneity was based on the Higgins inconsistency index (I²), where I² values of 0–40%, 30–60%, 50–90%, and 75–100% indicate insignificant, moderate, substantial, and considerable heterogeneity, respectively26.
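The quantities named above (Cochran's Q, I², and a random-effects pooled estimate) can be sketched in a few lines. This is a simplified univariate DerSimonian-Laird pooling of logit-transformed sensitivities, not the bivariate/HSROC model fitted in STATA and R for this study; the per-study counts are hypothetical and the function names are introduced here for illustration.

```python
import math


def logit_and_var(events: int, total: int) -> tuple:
    """Logit of a proportion and its approximate variance,
    using a 0.5 continuity correction to avoid division by zero."""
    a = events + 0.5
    b = total - events + 0.5
    return math.log(a / b), 1 / a + 1 / b


def dersimonian_laird(effects: list, variances: list) -> tuple:
    """Return (pooled effect, SE, Cochran's Q, I^2 in %, tau^2)."""
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    return pooled, math.sqrt(1 / sum(w_re)), q, i2, tau2


def expit(x: float) -> float:
    """Inverse logit."""
    return 1 / (1 + math.exp(-x))


# Hypothetical (true positives, patients with the target condition) per study.
studies = [(26, 30), (40, 44), (15, 25), (33, 36), (18, 28)]
ys, vs = zip(*(logit_and_var(tp, n) for tp, n in studies))
pooled, se, q, i2, tau2 = dersimonian_laird(list(ys), list(vs))
print(f"pooled sensitivity = {expit(pooled):.3f}")
print(f"95% CI = {expit(pooled - 1.96 * se):.3f} to {expit(pooled + 1.96 * se):.3f}")
print(f"Q = {q:.2f}, I^2 = {i2:.1f}%, tau^2 = {tau2:.3f}")
```

I² expresses the share of the observed variation that exceeds what chance alone would produce (Q minus its degrees of freedom, relative to Q), which is why it is truncated at 0% when Q falls below the degrees of freedom.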

The threshold effect, a positive correlation between sensitivity and the false-positive rate, was visually assessed by the inspection of coupled forest plots of pooled sensitivity and specificity in which an inverted V-shape would indicate a threshold effect. Additionally, Spearman’s correlation coefficient of sensitivity and false-positive rates was calculated with a value of > 0.6 regarded to indicate a threshold effect27.
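The Spearman check described above can be sketched as follows: rank the per-study sensitivities and false-positive rates (1 − specificity), correlate the ranks, and flag a threshold effect when the coefficient exceeds 0.6. The per-study values are hypothetical and the helper names are introduced here for illustration only.

```python
import math


def average_ranks(values: list) -> list:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: list, y: list) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x: list, y: list) -> float:
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def suggests_threshold_effect(sens: list, spec: list, cutoff: float = 0.6) -> bool:
    """True when sensitivity and false-positive rate correlate above the cutoff."""
    fpr = [1 - s for s in spec]
    return spearman(sens, fpr) > cutoff


# Hypothetical per-study sensitivities and specificities.
sens = [0.90, 0.72, 0.95, 0.80, 0.88]
spec = [0.55, 0.80, 0.75, 0.60, 0.82]
rho = spearman(sens, [1 - s for s in spec])
print(f"rho = {rho:.2f}, threshold effect suggested: {suggests_threshold_effect(sens, spec)}")
```

A strongly positive coefficient would mean that studies gained sensitivity mainly by accepting more false positives, that is, by using different cut-offs rather than differing in underlying accuracy.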

For predictive studies, subgroup bivariate meta-regression analyses were performed to determine the causes of heterogeneity across the studies according to the following covariates: (1) year of publication (after 2016 vs. before 2016), (2) study design (retrospective or prospective), (3) follow-up of patients (reported vs. not reported), (4) number of patients (> 60 vs. ≤ 60), (5) MRI field strength (3.0-T vs. 1.5-T), (6) number of b-values used (≥ 4 vs. < 4), (7) radiologists blinded to outcome (blinded vs. not reported), (8) type of treatment received (concurrent chemoradiation therapy only vs. inclusion of induction or neoadjuvant chemotherapy), (9) proportion of patients with advanced T stage (T3/4) (> 70% vs. ≤ 70%), (10) proportion of patients with advanced N stage (N2/3) (> 70% vs. ≤ 70%), and (11) region of interest selection (single section vs. volume).

All statistical analyses were performed using STATA version 16.0 (StataCorp, College Station, TX, USA) and R version 3.6.2 (R Foundation for Statistical Computing, Vienna, Austria). P < 0.05 was considered statistically significant.

Results

Literature search

The initial search yielded 186 articles; 56 duplicates were removed. After screening the titles and abstracts, 22 articles were considered potentially eligible after excluding 108 articles for the following reasons: non-English articles (n = 22), case reports (n = 9), not in the field of interest (n = 56), non-original articles (i.e., reviews, letters, editorials, conference abstracts) (n = 18), and non-human studies (n = 3). After full-text review, nine articles were further excluded because they were not in the field of interest (n = 1)28, had overlapping study populations (n = 2)10,29, had insufficient information for the reconstruction of 2 × 2 tables (n = 1)30, insufficient detail of pretreatment ADC (n = 3)31,32,33, or region of interest was on lymph nodes only (n = 2)34,35. Thirteen original studies were included for qualitative synthesis5,6,8,9,11,12,13,14,15,16,17,18,19. Ultimately, 12 original articles were included for quantitative synthesis5,6,9,11,12,13,14,15,16,17,18,19. Studies with the primary aim of predicting treatment response5,6,8,9,11,12,14,15,17,19 and predicting prognosis of patients13,16 were evaluated separately in the meta-analysis; one study predicted both treatment response and prognosis18 (Fig. 1).
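The selection counts reported above can be checked with simple arithmetic. The sketch below only re-adds the numbers stated in the text; the dictionary labels are shortened paraphrases of the stated exclusion reasons.

```python
identified = 186
duplicates = 56
screened = identified - duplicates

# Exclusions at title/abstract screening, as reported in the text.
screening_exclusions = {
    "non-English": 22,
    "case reports": 9,
    "not in the field of interest": 56,
    "non-original articles": 18,
    "non-human studies": 3,
}

# Exclusions at full-text review, as reported in the text.
fulltext_exclusions = {
    "not in the field of interest": 1,
    "overlapping study populations": 2,
    "no reconstructable 2 x 2 table": 1,
    "insufficient detail of pretreatment ADC": 3,
    "region of interest on lymph nodes only": 2,
}

full_text_reviewed = screened - sum(screening_exclusions.values())
qualitative = full_text_reviewed - sum(fulltext_exclusions.values())

assert screened == 130
assert sum(screening_exclusions.values()) == 108
assert full_text_reviewed == 22
assert sum(fulltext_exclusions.values()) == 9
assert qualitative == 13
print(screened, full_text_reviewed, qualitative)
```

The counts are internally consistent: 130 records screened, 22 reviewed in full text, and 13 retained for qualitative synthesis.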

Figure 1 Flow diagram depicting the study eligibility criteria.

Study characteristics

The total number of patients in all the studies was 2192 (715 for predictive studies5,6,8,9,11,12,14,15,17,19, 634 for prognostic studies13,16, and 843 for the predictive and prognostic study18) (Table 1). The number of patients in individual studies ranged from 36 to 843. The mean age of the patients ranged from 42.2 to 52 years. Nine studies were retrospective8,9,11,12,13,16,17,18,19, and four were prospective5,6,14,15. All studies originated from China except one from Israel11. T stages were reported in all studies, while two studies did not report the N stage8,19; eight studies did not report M stages8,11,13,14,16,17,18,19. In four studies, concurrent chemoradiation was the only treatment modality6,11,14,15, whereas other studies used mixed treatment regimens in addition to radiation therapy, including induction chemotherapy5,8,18 and neoadjuvant chemotherapy9,12,17. The characteristics of the MRI examinations of the included studies are summarized in Table 2.

Table 1 The clinical characteristics of the included studies.
Table 2 MRI characteristics of the included studies.

Quality assessment

For QUADAS-2, all studies showed low risks of bias in the flow and timing and patient selection domains; two studies had unclear risks of bias and unclear concerns regarding applicability in the domains of index test and reference standard5,6. For QUIPS, one study had a moderate risk of selection bias (Supplementary Figure 1)13.

Predictive performance of DWI-MRI

Among predictive studies investigating treatment response predictions, DWI showed a pooled sensitivity of 87% (95% CI 75–94%) and specificity of 70% (95% CI 56–80%) (Fig. 2). Between-study heterogeneities were present according to the Q test (P < 0.01); particularly, the I² statistic revealed substantial heterogeneity in the pooled sensitivity (I² = 68.5%) and specificity (I² = 92.3%). However, visual assessment of the coupled forest plot showed no threshold effect, and Spearman’s correlation coefficient of sensitivity and false-positive rates also indicated the lack of a threshold effect (−0.48 [95% CI −0.85 to 0.22]). In the HSROC curve, a large difference was observed between the areas of the 95% confidence and prediction regions, suggesting between-study heterogeneities (Fig. 3). Based on the slope coefficient of Deeks’ funnel plot, the publication bias was low (P = 0.24) (Supplementary Figure 2). In three studies that reported the thresholds of ADC changes between treatment8,12,17, the sensitivities, specificities, and AUCs for predicting treatment response ranged from 64 to 94%, from 56.3 to 72%, and from 0.675 to 0.833, respectively (Table 2).
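As a reading aid, the pooled point estimates can be converted into likelihood ratios and a diagnostic odds ratio. These derived values were not reported by the authors; the sketch uses only the pooled sensitivity (87%) and specificity (70%) stated above, ignores their confidence intervals, and the pretest probability in the example is hypothetical.

```python
def likelihood_ratios(sens: float, spec: float) -> tuple:
    """Positive LR, negative LR, and diagnostic odds ratio."""
    lr_pos = sens / (1 - spec)
    lr_neg = (1 - sens) / spec
    return lr_pos, lr_neg, lr_pos / lr_neg


def post_test_probability(pretest: float, lr: float) -> float:
    """Bayes update on the odds scale."""
    odds = pretest / (1 - pretest) * lr
    return odds / (1 + odds)


lr_pos, lr_neg, dor = likelihood_ratios(0.87, 0.70)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}, DOR = {dor:.1f}")

# Hypothetical pretest probability of 30% for the target condition.
print(f"post-test probability after a positive result: {post_test_probability(0.30, lr_pos):.2f}")
```

With these point estimates the positive likelihood ratio is about 2.9 and the negative likelihood ratio about 0.19, so a negative result shifts the probability more than a positive one does.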

Figure 2 Coupled forest plots illustrating pooled sensitivity and specificity of pretreatment ADC for predicting treatment response in patients with nasopharyngeal carcinoma. Horizontal lines indicate 95% CIs of each study.

Figure 3 Hierarchical summary receiver operating characteristic (HSROC) curves of pretreatment ADC for predicting treatment response in patients with nasopharyngeal carcinoma.

Subgroup bivariate meta-regression analyses of predictive studies

Table 3 shows the results of subgroup bivariate meta-regression analyses for determining the causes of between-study heterogeneity. Studies with no heterogeneity (I² = 0%) had the following characteristics: (1) inclusion of > 60 patients; (2) at least four b-values for ADC mapping; (3) > 70% patients with advanced T stage (T3/4); (4) ≤ 70% patients with advanced N stage (N2/3); and (5) region of interest selection as either single section or volume. Prospective studies showed little heterogeneity (I² = 3.7%).

Table 3 Subgroup meta-regression analyses for identifying heterogeneity.

Prognostic performance of DWI-MRI

The HRs of OS, LRFS, and DMFS with respect to low ADC were evaluated in three studies13,16,18 (Fig. 4). The pooled HRs were 1.42 (95% CI 1.09–1.85) for OS, 2.31 (95% CI 1.42–3.74) for LRFS, and 1.35 (95% CI 1.05–1.74) for DMFS. Because no heterogeneity was present in OS and DMFS and only three studies were included, subgroup analysis was not performed.
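Pooling hazard ratios is done on the log scale, with each study's standard error recovered from its reported 95% CI. The sketch below shows inverse-variance (fixed-effect) pooling, which is what a random-effects model reduces to when the estimated between-study variance is zero. The three per-study HRs are hypothetical; only the pooled OS estimate of 1.42 (95% CI 1.09–1.85) is taken from the text, and the function names are introduced here for illustration.

```python
import math


def log_hr_and_se(hr: float, lo: float, hi: float, z: float = 1.96) -> tuple:
    """Log hazard ratio and its SE, recovered from a 95% CI."""
    return math.log(hr), (math.log(hi) - math.log(lo)) / (2 * z)


def pool_inverse_variance(log_hrs: list, ses: list, z: float = 1.96) -> tuple:
    """Inverse-variance pooled HR with its 95% CI."""
    w = [1 / se ** 2 for se in ses]
    pooled = sum(wi * y for wi, y in zip(w, log_hrs)) / sum(w)
    se = math.sqrt(1 / sum(w))
    return math.exp(pooled), math.exp(pooled - z * se), math.exp(pooled + z * se)


# Hypothetical per-study HRs for low vs. high pretreatment ADC: (HR, lower, upper).
hypothetical = [(1.30, 0.85, 1.99), (1.55, 1.02, 2.36), (1.40, 0.90, 2.18)]
ys, ses = zip(*(log_hr_and_se(*s) for s in hypothetical))
hr, lo, hi = pool_inverse_variance(list(ys), list(ses))
print(f"pooled HR = {hr:.2f} (95% CI {lo:.2f} to {hi:.2f})")

# Reported pooled OS estimate: standard error on the log scale.
y_os, se_os = log_hr_and_se(1.42, 1.09, 1.85)
print(f"log HR = {y_os:.3f}, SE = {se_os:.3f}")
```

Because the CI of a hazard ratio is symmetric on the log scale, the recovered standard error reproduces the reported interval when exponentiated, which serves as a quick consistency check on extracted data.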

Figure 4 Forest plots for HRs of (A) overall survival, (B) local relapse-free survival, and (C) distant metastasis-free survival.

Discussion

This systematic review and meta-analysis assessed the predictive performance of pretreatment ADC for treatment response in patients with NPC. For the prediction of treatment response in NPC, pretreatment ADC showed a pooled sensitivity of 87% and specificity of 70%. However, there was significant between-study heterogeneity regarding pooled sensitivity (I² = 68.5%) and specificity (I² = 92.3%). In the subgroup bivariate meta-regression analysis to investigate the source of heterogeneity, the studies that included a larger proportion of advanced T stage (> 70%), lower proportion of advanced N stage (≤ 70%), larger number of patients (> 60), used multiple b-values (≥ 4) for ADC mapping, and region of interest selection as either single section or volume had no heterogeneity (I² = 0%). Additionally, low pretreatment ADC values were associated with worse OS, LRFS, and DMFS. Therefore, pretreatment ADC had a good predictive performance for treatment response, suggesting its potential role in guiding the treatment strategy in locoregionally advanced NPC.

In the subgroup analysis, no between-study heterogeneity was observed for studies consisting of patients with predominantly advanced T stages. Advanced T stage NPC usually manifests with skull base involvement or intracranial extension. Based on conventional MRI, the evaluation of skull base involvement in NPC has good diagnostic performance4,36; however, it has limitations for differentiating tumor involvement from reactive inflammation, leading to difficulties in accurately delineating the primary tumor37. In contrast, DWI reflects tissue water diffusivity7, with low pretreatment ADC values indicating biological features of tumor cells, such as hypoxia and higher cell density16,38,39. DWI performed better than conventional MRI for assessing the tumor microstructure40, making it useful for the evaluation of advanced T-stage primary NPC. Moreover, because of the large amount of adipose tissue in the clivus, the diagnostic threshold of ADC in the skull base is higher than that of the nasopharynx41. Therefore, low ADC values of NPC in the skull base (i.e., indicating advanced T stage) may provide a more consistent predictive performance than those in the nasopharynx.

In patients with advanced N stages, low pretreatment ADC values of primary tumors provide good predictive performance for detecting treatment response. The ADC value is reportedly inversely correlated with histological tumor grade42,43,44. High-grade tumors have higher cellular density with lower ADC values than low-grade tumors45. Moreover, patients with poorly differentiated or undifferentiated tumors (i.e., higher cellularity) are more likely to have metastatic cervical lymph nodes than patients with well-differentiated tumors42. Therefore, low ADC values in primary tumors may be associated with more frequent metastatic cervical lymph nodes, making low ADC a potential poor prognostic factor in NPC.

Traditionally, ADC is used to reflect tissue water diffusivity; however, it is no longer considered a true diffusion coefficient7. The concept of ADC is based on the Einstein equation, which assumes Gaussian diffusion, that is, water molecules moving freely as in a glass of water46. However, water molecules in cancer tissue interact with cell membranes and surrounding macro- or micro-molecules, making their diffusion non-Gaussian. To address this, intravoxel incoherent motion (IVIM) was introduced to reflect tissue perfusion and blood microcirculation in complex biological environments46. Consistent with our findings, tissue perfusion as measured by the bi-exponential fitting of multiple small b-values could presumably provide a more consistent result than the traditional mono-exponential fitting of ADC.
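For reference, the two signal models contrasted in this paragraph are commonly written as follows. The notation is the conventional one and is an assumption here, since the symbols are not defined in this article: S(b) is the signal at diffusion weighting b, S_0 the signal at b = 0, D the tissue diffusion coefficient, D* the pseudo-diffusion coefficient attributed to microcirculation, and f the perfusion fraction.

```latex
% Mono-exponential model used for conventional ADC mapping
S(b) = S_0 \, e^{-b \cdot \mathrm{ADC}}

% Bi-exponential IVIM model
\frac{S(b)}{S_0} = f \, e^{-b D^{*}} + (1 - f) \, e^{-b D}
```

Because D* is typically much larger than D, the perfusion term decays quickly and contributes mainly at small b-values, which is why several low b-values are needed to separate the two components.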

This study had some limitations. First, all except one study were conducted in China, which might limit the generalizability of the results. Second, substantial heterogeneity was observed in the pooled specificity of treatment response prediction. However, the potential causes of between-study heterogeneity were explored via subgroup meta-regression. Third, we did not include studies investigating other prognostic factors, including clinical factors and molecular biomarkers. Recently, plasma Epstein-Barr virus DNA has also been proposed as a prognostic factor and screening tool in NPC47,48. Further studies using a combination of imaging techniques and molecular biomarkers for demonstrating treatment response in NPC may show more promising results for early treatment strategies. Fourth, for prognostic studies, subgroup analysis was not performed due to the small number of studies. However, the between-study heterogeneity was not substantial. Finally, the number of included studies was small, which might have led to unstable results, particularly for the prognostic studies.

In conclusion, pretreatment ADC value was a good predictor of treatment response in NPC, providing clinically relevant information for developing early treatment strategies. The benefit of multiple b-values for the prediction of treatment response will need to be investigated further.