Diffusion-weighted MRI for predicting treatment response in patients with nasopharyngeal carcinoma: a systematic review and meta-analysis

Early prediction of treatment response in nasopharyngeal carcinoma is clinically relevant for optimizing treatment strategies. This meta-analysis was performed to evaluate whether apparent diffusion coefficient (ADC) from diffusion-weighted imaging (DWI) can predict treatment response of patients with nasopharyngeal carcinoma. A systematic search of PubMed-MEDLINE and Embase was performed to identify relevant original articles until July 22, 2021. We included studies which performed DWI for predicting locoregional treatment response in nasopharyngeal carcinoma treated with neoadjuvant chemotherapy, definitive chemoradiation, or radiation therapy. Hazard ratios were meta-analytically pooled using a random-effects model for the pooled estimates of overall survival, local relapse-free survival, distant metastasis-free survival and their 95% CIs. ADC showed a pooled sensitivity of 87% (95% CI 72–94%) and specificity of 70% (95% CI 56–80%) for predicting treatment response. Significant between-study heterogeneity was observed for both pooled sensitivity (I2 = 68.5%) and specificity (I2 = 92.2%) (P < 0.01). The pooled hazard ratios of low pretreatment ADC for assessing overall survival, local relapse-free survival, and distant metastasis-free survival were 1.42 (95% CI 1.09–1.85), 2.31 (95% CI 1.42–3.74), and 1.35 (95% CI 1.05–1.74), respectively. In patients with nasopharyngeal carcinoma, pretreatment ADC demonstrated good predictive performance for treatment response.


Materials and methods
This systematic review and meta-analysis was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines 20. The institutional review board of our institution approved this study.
Literature search strategy. A search of the PubMed-MEDLINE and EMBASE databases was performed to identify relevant original articles on the use of DWI MRI for predicting locoregional treatment response in NPC treated with neoadjuvant chemotherapy, definitive chemoradiation therapy, or radiation therapy up until July 22, 2021. The following search terms were used: [(nasopharyngeal)] AND [(carcinoma) OR (carcinomas) OR (cancer) OR (cancers) OR (squamous cell carcinoma)] AND [(chemoradiation) OR (chemoradiotherapy) OR (radiotherapy) OR (radiation therapy)] AND [("diffusion weighted") OR ("diffusion-weighted") OR (dwmri) OR (DWI) OR ("apparent diffusion coefficient") OR (ADC) OR ("intravoxel incoherent motion") OR (IVIM)]. Only studies published in English were included. We defined 'predictive' as a biomarker of the treatment response to therapy and 'prognostic' as a biomarker of the final survival outcome 21. The search was limited to studies involving human patients. The bibliographies of the selected articles were further screened to identify other potentially relevant articles.
Inclusion and exclusion criteria. The inclusion criteria were: (1) population: patients with histologically proven NPC who underwent neoadjuvant chemotherapy, definitive chemoradiation, or radiation therapy; (2) index test: DWI MRI with provision for pretreatment ADC of primary NPC; (3) reference standard: the reference standards of the treatment outcome as determined by histology, clinical/imaging follow-up, or a combination of these; (4) outcomes: locoregional failure after neoadjuvant chemotherapy, definitive chemoradiation, or radiation therapy reported in sufficient detail; and (5) study design: all observational studies (retrospective or prospective).
The exclusion criteria were: (1) case reports, review articles, editorials, letters, and conference abstracts; (2) insufficient data on locoregional failure and control; (3) did not provide ADC values of primary NPC; (4) insufficient detail to produce 2 × 2 tables; and (5) overlapping patients and data. For population overlap, the study with the larger cohort was included. Two reviewers (blinded and blinded) independently selected appropriate study reports using a standardized form. Disagreement was resolved by reaching a consensus after discussion with a third reviewer (blinded).
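As an illustration of why a complete 2 × 2 table is required for inclusion, the sketch below derives sensitivity and specificity, each with a Wilson score 95% interval, from the four cell counts. The function names and the counts are hypothetical and are not taken from any included study; the Wilson interval is one common choice and is not stated in the text as the method used here.

```python
from math import sqrt


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


def accuracy_from_2x2(tp, fp, fn, tn):
    """Sensitivity and specificity from the four cells of a 2 x 2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "sensitivity_ci": wilson_interval(tp, tp + fn),
        "specificity_ci": wilson_interval(tn, tn + fp),
    }


# Hypothetical counts for a single study (illustration only).
result = accuracy_from_2x2(tp=40, fp=12, fn=10, tn=38)
print(result)
```

A study that reports only an overall accuracy or an AUC cannot be decomposed in this way, which is why such reports fall under exclusion criterion (4).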
Data extraction. The following information was extracted into a standardized form: (1) study characteristics: first author, year of publication, affiliation, patient enrollment period, number of patients, and study design; (2) clinical information: age, cancer stages, endpoints, treatment, criteria for treatment response, and follow-up period; and (3) MRI acquisition parameters: manufacturer, model, field strength (tesla), repetition time, echo time, field of view, matrix, b-values, ADC threshold values, ADC change between treatments, and method of delineating the region of interest.
Quality assessment. Two reviewers (blinded and blinded) with seven years of experience in head and neck diagnostic radiology independently extracted the data and performed the quality assessment. The included studies were evaluated using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) criteria 22 for predictive studies and the Quality in Prognosis Studies (QUIPS) tool 23 for prognostic studies. Any disagreement was resolved by consensus.

Data synthesis and statistical analyses.
Using random-effects modeling, pooled sensitivity and specificity with 95% CIs were generated from individual predictive studies; similarly, pooled hazard ratios (HRs) for overall survival (OS), local relapse-free survival (LRFS), and distant metastasis-free survival (DMFS) with 95% CIs were generated from prognostic studies. For predictive studies, hierarchical summary receiver operating characteristic (HSROC) curves with 95% confidence and prediction regions were graphically visualized. Publication bias was evaluated via Deeks' funnel plot; Deeks' asymmetry test was used to determine the statistical significance of publication bias 24. Between-study heterogeneity was evaluated using Cochran's Q test with statistical significance at P < 0.05 25; the degree of heterogeneity was based on the Higgins inconsistency index (I²), where an I² of 0–40%, 30–60%, 50–90%, and 75–100% indicates insignificant, moderate, substantial, and considerable heterogeneity, respectively 26.
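The relationship between Cochran's Q, I², and a random-effects pooled estimate can be sketched as follows. This is a minimal univariate DerSimonian-Laird sketch under stated assumptions: the between-study variance estimator is an assumption rather than something named in the text, and the bivariate and HSROC models used for the predictive studies are not reproduced. Inputs are per-study effect estimates on an additive scale (for example logit sensitivity or log HR) with their standard errors; the function name and the example values are hypothetical.

```python
from math import sqrt


def random_effects_pool(estimates, std_errors):
    """DerSimonian-Laird random-effects pooling with Cochran's Q and I^2."""
    weights = [1.0 / se**2 for se in std_errors]          # inverse-variance weights
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, estimates)) / sum_w

    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1

    # Higgins I^2 (percent), truncated at zero.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # Between-study variance tau^2 (method of moments), truncated at zero.
    c = sum_w - sum(w**2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    re_weights = [1.0 / (se**2 + tau2) for se in std_errors]
    pooled = sum(w * y for w, y in zip(re_weights, estimates)) / sum(re_weights)
    se_pooled = sqrt(1.0 / sum(re_weights))
    return {
        "pooled": pooled,
        "ci": (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled),
        "q": q,
        "df": df,
        "i2": i2,
        "tau2": tau2,
    }


# Hypothetical effect estimates and standard errors (illustration only).
summary = random_effects_pool([0.0, 1.0], [0.1, 0.1])
print(summary)
```

When the studies agree closely, Q falls below its degrees of freedom, I² and tau² are truncated to zero, and the random-effects estimate coincides with the fixed-effect one.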
The threshold effect, a positive correlation between sensitivity and the false-positive rate, was visually assessed by the inspection of coupled forest plots of pooled sensitivity and specificity in which an inverted V-shape would indicate a threshold effect. Additionally, Spearman's correlation coefficient of sensitivity and false-positive rates was calculated with a value of > 0.6 regarded to indicate a threshold effect 27 .
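The Spearman check described above can be sketched in a few lines. The code ranks the per-study sensitivities and false-positive rates (1 − specificity), correlates the ranks, and flags a threshold effect when the coefficient exceeds 0.6. The function names and the input values are hypothetical; only the 0.6 cutoff comes from the text.

```python
from math import sqrt


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def threshold_effect(sensitivities, specificities, cutoff=0.6):
    """Spearman's rho between sensitivity and false-positive rate, plus a flag."""
    fpr = [1.0 - s for s in specificities]
    rho = pearson(average_ranks(sensitivities), average_ranks(fpr))
    return rho, rho > cutoff


# Hypothetical per-study values (illustration only).
rho_up, flag_up = threshold_effect([0.6, 0.7, 0.8, 0.9], [0.9, 0.8, 0.7, 0.6])
rho_down, flag_down = threshold_effect([0.6, 0.7, 0.8, 0.9], [0.6, 0.7, 0.8, 0.9])
print(rho_up, flag_up, rho_down, flag_down)
```

A strongly positive coefficient means that studies gain sensitivity only by accepting more false positives, which is the signature of differing ADC cutoffs rather than of genuine differences in accuracy.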
Results
Quality assessment. For QUADAS-2, all studies showed low risks of bias in the flow and timing and patient selection domains; two studies had unclear risks of bias and unclear concerns regarding applicability in the index test and reference standard domains 5,6. For QUIPS, one study had a moderate risk of selection bias (Supplementary Figure 1).
Predictive performance of DWI-MRI. Among predictive studies investigating treatment response predictions, DWI showed a pooled sensitivity of 87% (95% CI 75–94%) and specificity of 70% (95% CI 56–80%) (Fig. 2). Between-study heterogeneity was present according to the Q test (P < 0.01); in particular, the I² statistic revealed substantial heterogeneity in the pooled sensitivity (I² = 68.5%) and specificity (I² = 92.3%). However, visual assessment of the coupled forest plot showed no threshold effect, and Spearman's correlation coefficient of sensitivity and false-positive rates also indicated the lack of a threshold effect (−0.48 [95% CI −0.85 to 0.22]). In the HSROC curve, a large difference was observed between the areas of the 95% confidence and prediction regions, suggesting between-study heterogeneity (Fig. 3). Based on the slope coefficient of Deeks' funnel plot, there was no significant evidence of publication bias (P = 0.24) (Supplementary Figure 2). In three studies that reported the thresholds of ADC changes between treatments 8,12,17, the sensitivities, specificities, and AUCs for predicting treatment response ranged from 64 to 94%, 56.3 to 72%, and 0.675 to 0.833, respectively (Table 2). Table 3 shows the results of subgroup bivariate meta-regression analyses for determining the causes of between-study heterogeneity. Studies with no heterogeneity (I² = 0%) had the following characteristics: (1) inclusion of > 60 patients; (2) at least four b-values for ADC mapping; (3) > 70% patients with advanced T stage (T3/4); (4) ≤ 70% patients with advanced N stage (N2/3); and (5) region of interest selection as either single section or volume. Prospective studies showed little heterogeneity (I² = 3.7%).

Table 3. Subgroup bivariate meta-regression analyses of predictive studies.
Prognostic performance of DWI-MRI. The HRs of OS, LRFS, and DMFS with respect to low ADC were evaluated in three studies 13,16,18 (Fig. 4)
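Hazard ratios are pooled on the logarithmic scale, so each reported HR and its 95% CI must first be converted to a log HR and a standard error. The sketch below shows that conversion and its inverse, using the pooled OS estimate quoted in the abstract (HR 1.42, 95% CI 1.09–1.85) purely as a worked input. The function names are hypothetical, and a symmetric normal-theory interval on the log scale is assumed.

```python
from math import exp, log


def log_hr_and_se(hr, lower, upper, z=1.96):
    """Convert an HR with its 95% CI to (log HR, standard error of log HR)."""
    return log(hr), (log(upper) - log(lower)) / (2 * z)


def hr_with_ci(log_hr, se, z=1.96):
    """Back-transform a log HR and its standard error to (HR, lower, upper)."""
    return exp(log_hr), exp(log_hr - z * se), exp(log_hr + z * se)


# Pooled OS estimate from the abstract, used here only as a worked input.
log_hr, se = log_hr_and_se(1.42, 1.09, 1.85)
hr, lower, upper = hr_with_ci(log_hr, se)
print(round(log_hr, 3), round(se, 3), round(hr, 2), round(lower, 2), round(upper, 2))
```

The round trip reproduces the reported interval to within rounding, which is a quick consistency check to run before per-study HRs enter an inverse-variance pooling step.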

Discussion
This systematic review and meta-analysis assessed the predictive performance of pretreatment ADC for treatment response in patients with NPC. For the prediction of treatment response in NPC, pretreatment ADC showed a pooled sensitivity of 87% and specificity of 70%. However, there was significant between-study heterogeneity regarding pooled sensitivity (I² = 68.5%) and specificity (I² = 92.3%). In the subgroup bivariate meta-regression analysis to investigate the source of heterogeneity, the studies that included a larger proportion of patients with advanced T stage (> 70%), a lower proportion with advanced N stage (≤ 70%), a larger number of patients (> 60), multiple b-values (≥ 4) for ADC mapping, and region of interest selection as either single section or volume had no heterogeneity (I² = 0%). Additionally, low pretreatment ADC values were associated with worse OS, LRFS, and DMFS. Therefore, pretreatment ADC had a good predictive performance for treatment response, suggesting its potential role in guiding the treatment strategy in locoregionally advanced NPC.
www.nature.com/scientificreports/
In the subgroup analysis, no between-study heterogeneity was observed for studies consisting of patients with predominantly advanced T stages. Advanced T stage NPC usually manifests with skull base involvement or intracranial extension. Conventional MRI has good diagnostic performance for evaluating skull base involvement in NPC 4,36; however, it has limitations in differentiating tumor involvement from reactive inflammation, leading to difficulties in accurately delineating the primary tumor 37. In contrast, DWI reflects tissue water diffusivity 7, with low pretreatment ADC values indicating biological features of tumor cells, such as hypoxia and higher cell density 16,38,39. DWI performed better than conventional MRI for assessing the tumor microstructure 40, making it useful for the evaluation of advanced T-stage primary NPC. Moreover, because of the large amount of adipose tissue in the clivus, the diagnostic threshold of ADC in the skull base is higher than that in the nasopharynx 41. Therefore, low ADC values of NPC in the skull base (i.e., indicating advanced T stage) may provide a more consistent predictive performance than those in the nasopharynx.
In patients with advanced N stages, low pretreatment ADC values of primary tumors provide good predictive performance for detecting treatment response. The ADC value is reportedly inversely correlated with histological tumor grade 42-44. High-grade tumors have higher cellular density with lower ADC values than low-grade tumors 45. Moreover, patients with poorly differentiated or undifferentiated tumors (i.e., higher cellularity) are more likely to have metastatic cervical lymph nodes than patients with well-differentiated tumors 42. Therefore, low ADC values in primary tumors may be associated with more frequent metastatic cervical lymph nodes, making low ADC a potential poor prognostic factor in NPC.
Table 2. MRI characteristics of the included studies. NA, not applicable; ROI, region of interest; ADC, apparent diffusion coefficient; FOV, field of view; AUC, area under the curve for predicting treatment response. *Single section = the ROI was drawn at the largest cross-sectional area of the tumor; Volume = the mean of all ADC values obtained from all sections involving tumor. **Not included in quantitative data synthesis.
Traditionally, ADC is used to reflect tissue water diffusivity; however, it is no longer considered a true diffusion coefficient 7. The concept of ADC is based on the Einstein equation, which assumes Gaussian diffusion, that is, water molecules moving freely as in a glass of water 46. However, in cancer tissue, water molecules interact with cell membranes and surrounding macro- or micromolecules, making diffusion non-Gaussian. To resolve this, intravoxel incoherent motion (IVIM) was introduced to reflect tissue perfusion and blood microcirculation in complex biological environments 46. Consistent with our findings, tissue perfusion as measured by the bi-exponential fitting of multiple small b-values could presumably provide a more consistent result than the traditional monoexponential fitting of ADC.
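The difference between the two fitting approaches can be illustrated with the standard signal models: monoexponential decay, S(b) = S0·exp(−b·ADC), and the bi-exponential IVIM form, S(b) = S0·[f·exp(−b·D*) + (1 − f)·exp(−b·D)]. The sketch below simulates an IVIM signal and computes two-point monoexponential ADC values from different b-value pairs. The tissue parameters, b-values, and function names are hypothetical and are not taken from any included study.

```python
from math import exp, log


def ivim_signal(b, s0, f, d_star, d):
    """Bi-exponential IVIM signal at b-value b (s/mm^2).

    f: perfusion fraction; d_star: pseudo-diffusion coefficient (mm^2/s);
    d: true tissue diffusion coefficient (mm^2/s).
    """
    return s0 * (f * exp(-b * d_star) + (1.0 - f) * exp(-b * d))


def two_point_adc(s_low, s_high, b_low, b_high):
    """Monoexponential ADC (mm^2/s) from signals at two b-values."""
    return log(s_low / s_high) / (b_high - b_low)


# Hypothetical tissue parameters (illustration only).
S0, F, D_STAR, D = 1000.0, 0.10, 0.010, 0.0008


def simulated(b):
    return ivim_signal(b, S0, F, D_STAR, D)


adc_0_800 = two_point_adc(simulated(0), simulated(800), 0, 800)
adc_200_800 = two_point_adc(simulated(200), simulated(800), 200, 800)
print(adc_0_800, adc_200_800, D)
```

Under these assumed parameters the ADC computed from b = 0 and 800 s/mm² exceeds the ADC from b = 200 and 800 s/mm², and both exceed D, because the perfusion component decays mainly at low b-values. This is one reason the choice and number of b-values can change the measured ADC between studies.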

This study had some limitations. First, all but one of the included studies were published in China, which might limit the generalizability of the results. Second, substantial heterogeneity was observed in the pooled specificity of treatment response prediction. However, the potential causes of between-study heterogeneity were explored via subgroup meta-regression. Third, we did not include studies investigating other prognostic factors, including clinical factors and molecular biomarkers. Recently, plasma Epstein-Barr virus DNA has also been proposed as a prognostic factor and screening tool in NPC 47,48. Further studies combining imaging techniques and molecular biomarkers to demonstrate treatment response in NPC may show more promising results for early treatment strategies. Fourth, for prognostic studies, subgroup analysis was not performed because of the small number of studies. However, the between-study heterogeneity was not substantial. Finally, the number of included studies was small, which might have led to unstable results, particularly for the prognostic studies.
In conclusion, pretreatment ADC value was a good predictor of treatment response in NPC, providing clinically relevant information for developing early treatment strategies. The benefit of multiple b-values for the prediction of treatment response will need to be investigated further.