Diagnostic performance of magnetic resonance imaging for colorectal liver metastasis: A systematic review and meta-analysis

The prognosis of colorectal cancer (CRC) is largely dependent on the early detection of hepatic metastases. Although magnetic resonance imaging (MRI) offers the advantages of no ionizing radiation and multiple scanning sequences, its efficacy in the detection of colorectal liver metastases (CRLM) is not yet clear. We performed this meta-analysis to address this issue. PubMed, Embase, and the Cochrane Library were searched for studies reporting the diagnostic performance of MRI for CRLM. Descriptive and quantitative data were extracted. The quality of the identified studies was evaluated, and a random effects model was used to obtain the pooled diagnostic estimates. Meta-regression and subgroup analyses were performed to investigate potential contributors to heterogeneity. Seventeen studies (published from 1996 to 2018) were included for analysis, comprising 1121 patients with a total of 3279 liver lesions. The pooled sensitivity, specificity, and diagnostic odds ratio were 0.90 (95% confidence interval (CI): 0.81–0.95), 0.88 (0.80–0.92), and 62.19 (23.71–163.13), respectively. The overall weighted area under the curve was 0.94 (0.92–0.96). Using two or more imaging planes and a quantitative/semiquantitative interpretation method showed higher diagnostic performance, although only the latter reached statistical significance (P < 0.05). Advanced scanning sequences with diffusion-weighted imaging (DWI) and liver-specific contrast media tended to increase the sensitivity for CRLM detection. We therefore conclude that contemporary MRI has high sensitivity and specificity for screening CRLM, especially with advanced scanning sequences. Using two or more imaging planes and adopting a quantitative/semiquantitative imaging interpretation may further improve diagnosis. However, the MRI results should be interpreted with caution because of substantial heterogeneity among studies.

Detection of colorectal liver metastases (CRLM) at an early stage is crucial for improving survival because it helps to select patients who will benefit from curative liver surgery and to spare those who are not appropriate surgical candidates 7. The diagnostic methods currently used for the evaluation of CRLM are heterogeneous. Among the available imaging methods, such as computed tomography (CT), positron emission tomography (PET) combined with CT, and ultrasonography (US), magnetic resonance imaging (MRI) has an advantage in liver metastasis (LM) detection because of its superior soft tissue resolution, multiple scan sequences, innovative MRI techniques, and the use of hepatocyte-specific contrast agents 8. Recent studies have shown that liver MRI performs excellently in detecting CRLM, with both high sensitivity and high specificity 9-11. However, controversy exists regarding the role of MRI and whether it can replace other imaging methods in the diagnosis of CRLM.
Therefore, we undertook this meta-analysis to evaluate the possible benefit of contemporary MRI to differentiate metastatic liver lesions from nonmetastatic liver lesions in patients with CRC.

Materials and Methods
Search strategy. A comprehensive online literature search was performed for studies evaluating MRI for screening hepatic metastases in CRC patients. We searched PubMed, EMBASE, and the Cochrane Library.
Study selection. Studies were included when the following criteria were met: (1) patients diagnosed with CRC; (2) MRI used as the evaluation tool for detection of hepatic metastasis; (3) histopathology (surgery, biopsy), intraoperative ultrasonography/manual palpation, or clinical/imaging follow-up used as the reference standard for comparison; (4) sufficient data provided to reconstruct 2 × 2 tables of true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN); (5) analysis performed on a per-lesion basis; (6) no fewer than 10 patients included; (7) English as the publication language; and (8) original article as the publication type. When replicated data were presented in different studies, only the study with the largest sample size (i.e., the number of patients) was included. Only full-length articles were included. Case reports, reviews, conference abstracts, and letters, as well as papers using animal models, were excluded. Articles in which MRI was used for the evaluation of hepatic metastasis in CRC but which focused on treatment response rather than on diagnostic performance were also excluded.
Data extraction and quality assessment. Data from the eligible studies were extracted independently by two of the authors (Mao and Zhao); a third author (Liao) resolved any disagreement pertaining to the extraction of data, and the final consensus was reached via discussion. For each report, the relevant information was extracted, including the name of the first author, journal, country of origin, year of publication, studied population, study design (prospective or retrospective), patient enrollment procedure, scanner type, type of machine, magnetic field strength, scanning sequences, type of contrast agent (CA) used, number of imaging planes, minimum slice thickness, imaging interpretation method of positive MRI test, and reference standard. Values for TP, FP, FN, and TN findings for the MRI test were also recorded from each study. To assess the methodological quality and applicability of the included studies, the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool was used 12. Each eligible study was evaluated by two of the authors (Mao and Zhao) independently, and discrepancies were resolved by discussion with a third author (Liao).

Statistical analysis.
We implemented all analyses on a per-lesion data basis. Based on the 2 × 2 tables, a bivariate model was used to obtain the weighted summary estimates of sensitivity and specificity, which were the main outcome measures, and a hierarchical summary receiver operating characteristic (HSROC) model was used to establish summary receiver operating characteristic (SROC) curves with 95% confidence intervals (CI) and prediction regions 13-15. The pooled positive likelihood ratio (PLR) and negative likelihood ratio (NLR), as well as the diagnostic odds ratio (DOR), which is a metric that integrates both sensitivity and specificity in its calculation 16, for MRI detection of CRLM were also calculated. When several MRI sequences were separately evaluated for CRLM detection, the results of the most advanced sequence or most comprehensive protocol were selected for analysis. The data from each study were pooled by a fixed or random effects model based on the degree of heterogeneity. Heterogeneity across the included studies was assessed by Cochran's Q test and the Higgins I² statistic. Substantial heterogeneity was considered present when P < 0.05 for Cochran's Q test or I² > 50% 17. Publication bias was assessed by visual judgment of the Deeks' funnel plot and the P value derived from Deeks' asymmetry test 18. In addition, meta-regression analysis, with a test standard of α = 0.10, was performed to explore the possible sources of heterogeneity among individual studies on pooled diagnostic performance, followed by subsequent subgroup analysis for those suggested variables.
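As an illustration of how the accuracy measures named above follow from a single 2 × 2 table, the following minimal Python sketch computes sensitivity, specificity, PLR, NLR, and DOR. The class name `TwoByTwo` and the counts are hypothetical: the counts were chosen only so that sensitivity and specificity mirror the pooled values of 0.90 and 0.88, and they are not data from any included study. The pooled estimates in this article come from a bivariate model, not from a single table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Per-lesion 2 x 2 table for one study (hypothetical helper class)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def plr(self) -> float:
        # Positive likelihood ratio: sensitivity / (1 - specificity)
        return self.sensitivity / (1.0 - self.specificity)

    @property
    def nlr(self) -> float:
        # Negative likelihood ratio: (1 - sensitivity) / specificity
        return (1.0 - self.sensitivity) / self.specificity

    @property
    def dor(self) -> float:
        # Diagnostic odds ratio: (TP x TN) / (FP x FN), equal to PLR / NLR
        return (self.tp * self.tn) / (self.fp * self.fn)


# Hypothetical counts, not taken from any included study.
example = TwoByTwo(tp=90, fp=12, fn=10, tn=88)
print(f"sensitivity={example.sensitivity:.2f}, specificity={example.specificity:.2f}")
print(f"PLR={example.plr:.2f}, NLR={example.nlr:.3f}, DOR={example.dor:.1f}")
```

Because the DOR equals PLR/NLR, it summarizes both error types in one number, which is why it is reported alongside the likelihood ratios.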
All analyses were conducted using Stata version 13.0 (StataCorp, College Station, TX), with P < 0.05 being considered statistically significant.
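The heterogeneity statistics described in this section can be sketched in a few lines of Python. This is a simplified univariate illustration on the log DOR scale, assuming inverse-variance weights and a DerSimonian-Laird estimate of the between-study variance; it is not the bivariate/HSROC model fitted in Stata for this article. The function names `log_dor` and `heterogeneity` are hypothetical, and the P value for Cochran's Q (a chi-square test with k − 1 degrees of freedom) is omitted.

```python
import math
from typing import Dict, Sequence, Tuple


def log_dor(tp: float, fp: float, fn: float, tn: float) -> Tuple[float, float]:
    """Return (ln DOR, variance); add 0.5 to every cell if any cell is zero."""
    cells = [tp, fp, fn, tn]
    if 0 in cells:
        cells = [c + 0.5 for c in cells]
    a, b, c, d = cells
    return math.log((a * d) / (b * c)), sum(1.0 / x for x in cells)


def heterogeneity(effects: Sequence[float], variances: Sequence[float]) -> Dict[str, float]:
    """Cochran's Q, I^2 (%), DerSimonian-Laird tau^2, and pooled estimates."""
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I^2: share of total variation attributed to between-study heterogeneity
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    c = sum_w - sum(w * w for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Random effects weights add tau^2 to each within-study variance
    re_weights = [1.0 / (v + tau2) for v in variances]
    random = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    return {"Q": q, "df": df, "I2": i2, "tau2": tau2, "fixed": fixed, "random": random}


# Two hypothetical studies with effects 0 and 2 and unit variances.
result = heterogeneity([0.0, 2.0], [1.0, 1.0])
print(result)
```

With an I² above 50%, the rule stated above would classify the heterogeneity as substantial and favor the random effects estimate.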

Results
Selected studies. The initial search identified 443 potential articles, of which 117 were excluded as duplicates. A further 214 articles were excluded after review of the titles and abstracts. Full-text reviews were conducted on the remaining 112 articles, and 95 studies were rejected. Ultimately, 17 studies evaluating the diagnostic performance of MRI for LM detection in patients with CRC were considered for further analyses. In total, 1121 patients with 3279 liver lesions were included in this meta-analysis. The screening process of the identified articles and reasons for exclusion are shown in Fig. 1. The size of the study population varied from 15 to 184 patients, with the total number of liver lesions ranging from 37 to 533. Of all studies, three stated that they were prospectively designed 11,19,20, eleven were retrospectively designed 9,10,21-29, and the design of the remaining three was unclear 30-32. The principal characteristics of the included studies are summarized in Table 1 and Table 2.
Quality assessment and evaluation of publication bias. The quality of the 17 included articles according to the QUADAS-2 assessment tool was considered moderate (see Fig. 2). Ten out of the 17 studies satisfied at least five of the seven QUADAS-2 domains and were considered high quality. For the patient selection domain, three studies were considered to have a high risk of bias due to nonconsecutive enrollment procedures 22,24,29. Two studies were considered to have high concern for applicability; one study only included patients with histologically uniform primary tumor and with oligometastasis (<5 metastases) 24, and one study addressed only those LM that responded to preoperative chemotherapy 28. For the index test domain, there was a high risk of bias in two studies, as it was not clear whether the interpretation of MRI was blinded to the reference standard 9,29. None of the studies were considered to have high concern for applicability. For the reference standard domain, there was no high risk of bias or high concern for applicability in any study. For the flow and timing domain, one study was considered to have a high risk of bias as different reference standards were adopted within the study, and the interval between MRI scan and reference standard was unclear 22. Note that a majority of the included studies did not use a single reference standard within the study because of the limited feasibility of histologically verifying all hepatic lesions. Deeks' funnel plot asymmetry test 18 was used to evaluate publication bias, as shown in Fig. 3. The slope was flat, and no publication bias was found (P = 0.987).
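The Deeks' funnel plot asymmetry test referred to above regresses the log DOR of each study on the inverse square root of its effective sample size (ESS), weighted by ESS; a slope near zero suggests no small-study effect. The following Python sketch shows the regression step only, under stated assumptions: the function name `deeks_regression` and the three example tables are hypothetical, a 0.5 continuity correction is applied only when a cell is zero, and the P value (from a t distribution with k − 2 degrees of freedom applied to slope/SE) is left out. It is not the Stata routine used in this article.

```python
import math
from typing import Sequence, Tuple


def deeks_regression(
    tables: Sequence[Tuple[float, float, float, float]]
) -> Tuple[float, float, float]:
    """Weighted regression of ln DOR on 1/sqrt(ESS); returns (slope, intercept, se_slope).

    Each table is (tp, fp, fn, tn). Needs at least three studies with differing ESS.
    """
    xs, ys, ws = [], [], []
    for tp, fp, fn, tn in tables:
        n_dis, n_non = tp + fn, fp + tn
        ess = 4.0 * n_dis * n_non / (n_dis + n_non)  # effective sample size
        cells = [tp, fp, fn, tn]
        if 0 in cells:
            cells = [c + 0.5 for c in cells]
        a, b, c, d = cells
        xs.append(1.0 / math.sqrt(ess))
        ys.append(math.log((a * d) / (b * c)))
        ws.append(ess)
    sum_w = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / sum_w
    y_bar = sum(w * y for w, y in zip(ws, ys)) / sum_w
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum(w * (y - intercept - slope * x) ** 2 for w, x, y in zip(ws, xs, ys))
    se_slope = math.sqrt(rss / (len(xs) - 2) / sxx)
    return slope, intercept, se_slope


# Hypothetical tables with identical DOR (81) but different sizes: the slope is ~0.
slope, intercept, se_slope = deeks_regression(
    [(9, 1, 1, 9), (18, 2, 2, 18), (90, 10, 10, 90)]
)
print(slope, intercept, se_slope)
```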
Diagnostic accuracy of MRI in detecting CRLM. The pooled sensitivity and specificity of all 17 studies for MRI to detect hepatic metastases in patients with CRC were calculated based on the random effects method since significant statistical heterogeneity did exist (I² = 96.1% for sensitivity; I² = 90.6% for specificity). As shown in Fig. 4, the pooled sensitivity and specificity were 0.90 (95% CI: 0.81-0.95) and 0.88 (95% CI: 0.80-0.92), respectively. Additionally, the DOR, PLR, and NLR were 62.19 (95% CI: 23.71-163.13), 7.21 (95% CI: 4.38-11.86), and 0.12 (95% CI: 0.06-0.23), respectively. The SROC curve of MRI for the diagnosis of CRLM was calculated by sensitivity against specificity (Fig. 5). The curve represented the overall test performance of all included studies. The 95% confidence and prediction regions displayed large variances among studies, further indicating that substantial heterogeneity existed among the studies. The overall weighted area under the SROC curve (AUC) was 0.94 (95% CI: 0.92-0.96).
Meta-regression and subgroup analysis. As indicated by the meta-regression analysis, a majority of covariates, including publication year, study design, patient enrollment procedure, magnetic field strength, scanning sequences, minimum slice thickness, lesion size, and reference standard, were not strongly associated with accuracy. The factors that potentially showed significant contributions to the heterogeneity, at a test standard of α = 0.10, were number of imaging planes, whether contrast enhancement was used, type of CA, region, and imaging interpretation method (Table 3). Subgroup analyses were then performed for the variables identified by meta-regression, restricted to subgroups containing no fewer than four studies.
The subgroups of no contrast enhancement and of superparamagnetic iron oxide (SPIO) as the type of CA were excluded from the subgroup analysis because their small sample sizes (2 and 3 studies, respectively) made the results unstable. Relative to their comparator subgroups, the use of 2 or more imaging planes, studies from East Asia, and imaging interpreted with quantitative/semiquantitative methods demonstrated higher pooled sensitivity or specificity, although a statistically significant between-subgroup difference was found only for imaging interpretation method in pooled specificity (P < 0.05, see Table 4). Empirically, we also performed an additional subgroup analysis to see whether sequences including DWI and hepatocellular phase enhancement images performed better in CRLM detection. The results are displayed in Fig. 6. The six studies (Fig. 6b) with both DWI and dynamic contrast-enhanced (DCE) sequences (including only studies that used hepatophilic contrast media as the contrast agent) showed higher sensitivity (0.94 vs 0.85) but lower specificity (0.88 vs 0.89). However, neither of these differences was statistically significant (P > 0.05 for both sensitivity and specificity). The performance of studies using liver-specific contrast media (LSCM, such as Eovist) was also compared with that of studies using non-LSCM. As shown in Fig. 7, the ten studies (Fig. 7b) with LSCM showed higher sensitivity (0.94 vs 0.83) but lower specificity (0.87 vs 0.94). Neither of these differences was statistically significant (both P > 0.05).

Discussion
Accurate detection of CRLM is vital for CRC patients because of its predictive significance for treatment and survival 33. The prognosis of CRLM is largely dependent on the resectability of hepatic metastases 34. Although preoperative CT is the first-line imaging modality for metastatic liver lesions, it can result in either missed metastases or unnecessary operations 33. MRI is becoming the current standard in liver metastasis detection since it has displayed superior sensitivity to CT with no radiation hazard 35-38. For example, in the study by Floriani et al. 37, the per-lesion sensitivities of MRI and CT in the detection of CRLM were 86.3% and 82.6%, respectively, significantly favoring MRI by the calculated odds ratio (0.66). However, contrary evidence also existed among their findings: CT during arterioportography (CTAP) was comparable with MRI, with either extracellular or liver-specific contrast media, in terms of performance for CRLM detection, though this finding might be unstable because of a small sample size. A more recent study by Choi 35 [...] for sensitivity, 87.3% vs 73.5% for specificity). This view was also supported by another meta-analysis by Vreugdenburg et al. 38, which showed that contrast-enhanced MRI had a higher sensitivity in detecting CRLM than contrast-enhanced CT on both a per-lesion (odds ratio = 1.29, P < 0.001) and a per-patient basis (odds ratio = 1.21, P = 0.010). Based on the current evidence and the absence of ionizing radiation, MRI should be recommended as the first-line screening modality where it is economically affordable.
One of the advantages of MRI is the availability of multiple scan sequences. Currently, diffusion-weighted magnetic resonance images (DWI) and hepatobiliary phase images with CAs such as gadoxetic acid are among the most sensitive sequences in liver lesion detection. In this meta-analysis, we evaluated the diagnostic performance of contemporary MRI for the detection of hepatic metastasis in patients with CRC. To the best of our knowledge, it is also the first meta-analysis to provide a comprehensive performance profile of MRI in the diagnosis of CRLM.
Despite its superiority to CT, the accuracy of MRI for detecting LM in CRC patients has been controversial according to the literature, and MRI has also displayed a higher FP rate than PET/CT in a recent meta-analysis with gadoxetate disodium-enhanced MRI 35. In this meta-analysis, we used DOR, PLR, and NLR as our measures of diagnostic accuracy. Generally, PLR greater than 10.0 and NLR less than 0.1 indicate a good diagnostic test. The DOR is the ratio of the odds of a positive test result in metastatic lesions to the odds of a positive test result in nonmetastatic lesions (equivalently, PLR/NLR) and ranges from 0 to infinity. A higher DOR value indicates higher accuracy. Our results showed that DOR, PLR, and NLR were moderate (62.19, 7.21, and 0.12, respectively). The pooled per-lesion sensitivity and specificity of the 17 included studies were 0.90 (95% CI: 0.81-0.95) and 0.88 (95% CI: 0.80-0.92), respectively, with an AUC of the SROC curve of 0.94 (95% CI: 0.92-0.96). The pooled sensitivity of the current study was higher than that in a previous meta-analysis with patients not previously treated (0.80, 95% CI: 0.75-0.85) 36 or with patients after neoadjuvant chemotherapy (0.86, 95% CI: 0.70-0.94) 39 and was comparable to that in most recent meta-analyses, which have focused on more advanced sequences such as DWI and contrast-enhanced MRI 35,40. The results indicated that MRI could be used as a reliable screening tool in clinical practice for CRC patients with a suspicion of LM. However, we also observed notable heterogeneity in the homogeneity test for both sensitivity and specificity. Therefore, considering the moderate values of DOR, PLR, and NLR and the substantial heterogeneity, the MRI screening results should be interpreted cautiously as a whole, and the source of heterogeneity needs to be explored to understand the potential factors that may influence the pooled diagnostic performance 41,42.
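To make the clinical meaning of the pooled likelihood ratios more concrete, the following Python sketch converts a pre-test probability into a post-test probability through the odds form of Bayes' theorem (post-test odds = pre-test odds × likelihood ratio). The PLR and NLR are the pooled values reported above; the pre-test probability of 0.30 and the function name `post_test_probability` are hypothetical and serve only as an illustration.

```python
def post_test_probability(pretest: float, likelihood_ratio: float) -> float:
    """Convert a pre-test probability into a post-test probability."""
    if not 0.0 < pretest < 1.0:
        raise ValueError("pretest must lie strictly between 0 and 1")
    pre_odds = pretest / (1.0 - pretest)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


PLR, NLR = 7.21, 0.12   # pooled values reported in this meta-analysis
PRETEST = 0.30          # hypothetical pre-test probability, not from the article

after_positive = post_test_probability(PRETEST, PLR)
after_negative = post_test_probability(PRETEST, NLR)
print(f"after a positive MRI: {after_positive:.3f}")
print(f"after a negative MRI: {after_negative:.3f}")
```

Under this assumed pre-test probability, a positive MRI raises the probability of metastasis to roughly three in four, and a negative MRI lowers it to about one in twenty, which is consistent with describing the likelihood ratios as moderate rather than conclusive.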
Using meta-regression analysis with α = 0.10, the factors that may be contributors to the heterogeneity were the number of imaging planes, type of CA, region, whether contrast enhancement was used, and imaging interpretation method. Specifically, studies that used two or more planes, originated in Asia, and used quantitative/semiquantitative methods showed greater sensitivity and specificity than those that used only one imaging plane, originated in Europe or America, and used qualitative methods. However, the subgroup difference was only statistically significant for the imaging interpretation method in specificity (Table 4). It was unexpected that lesion size did not affect the diagnostic performance of MRI in detecting CRLMs. Contemporary MRI has deficits in the detection of small metastatic liver lesions, especially those smaller than 3 mm 8. In a previous meta-analysis 43, the accuracy of MRI in detecting CRLM differed significantly between lesions smaller and larger than 10 mm. We noted that one study in our meta-analysis included only lesions larger than 10 mm and had a moderate sensitivity (0.72, 95% CI: 0.59-0.83) and specificity (0.79, 95% CI: 0.54-0.94) 23. Given that only a small portion of the included studies provided a separate dataset by lesion size, the detection difference according to lesion size could be easily masked. The results also showed that the sequences with DWI did not display superiority over sequences without DWI in the detection of CRLM. This failure to find a significant difference may be attributed to the fact that most of the included studies adopted contrast-enhanced sequences, either with Gd-related CA or SPIO, since DWI and contrast-enhanced imaging have comparable sensitivities in the detection of CRLM 40. Although no publication bias was found in the included studies using Deeks' funnel plot, the publication language was limited to English in this meta-analysis, which could have introduced a potential bias.
For the subgroup analyses, although there was no significant difference between studies that combined LSCM enhancement with DWI and those that did not, our results did reveal a trend that scanning protocols including both diffusion-weighted and hepatocellular phase images outperformed those without (94% vs 85% for sensitivity). Hepatocyte-deficient tumors such as CRLM normally have lower signal intensity against a higher-intensity liver background during the hepatobiliary phase, which makes the tumors conspicuous in the liver parenchyma and leads to elevated sensitivity 24,44. Our result conservatively supported this point. The limited number of studies may have reduced the statistical power to detect a significant difference. We also compared studies using LSCM enhancement, such as Eovist, with those using conventional CAs for the detection of CRLM. Although again no significant difference was found, the results suggested that LSCM, compared with conventional CAs, increased the true positive rate at the cost of also increasing the false positive rate. In a previous meta-analysis in which the results favored MRI 37, the difference between MRI and CT in detecting CRLM was greater when LSCM were administered than when conventional CAs were used. This could be indirect evidence supporting the superiority of LSCM. However, no direct and robust evidence has been established yet. Based on our results, we conservatively recommend the use of LSCM in the diagnosis of suspected CRLM.
Additionally, our results showed that MRI detection accuracy for CRLM was greater in studies from Asia than in those from Europe or America, although this difference was statistically nonsignificant. It is unclear why the results favored studies from Asia; however, the four studies from Asia were among the most recently published papers (one in 2015 9, one in 2017 31, and the other two in 2018 10,28). Although our results did not show a year-related trend when the studies before and after 2010 were compared, a previous meta-analysis did indicate that MRI sensitivity in the detection of CRLM in studies after 2004 was significantly increased compared with those before 2004 36. Our result also favored studies that used two or more imaging planes rather than those that used only one plane. This result is consistent with that of a recent study focusing on the diagnostic performance of MRI in bone metastases from prostate cancer 45. Another finding was the better performance of the quantitative/semiquantitative method for imaging interpretation than the qualitative method, especially for specificity, indicating that a quantitative/semiquantitative method used for the interpretation of imaging may reduce the FP rate.
Some limitations of this meta-analysis should be acknowledged. First, per-patient-based analyses were not performed in our study because of the limited data on a per-patient basis. Although a per-lesion-based analysis could be more accurate and could provide crucial information, such as lesion size, number and location, which are required for developing a therapeutic strategy, it is still important to differentiate patients with metastatic lesions from those without metastatic lesions. Second, although there was no significant publication bias in this study, selective reporting biases could exist since reviews, conference abstracts, and letters to the editor, as well as data published in languages other than English, were excluded. Moreover, the power of the funnel plots might have been low due to the limited number of included studies in this meta-analysis. Third, notable heterogeneity was observed among the included studies. Although we investigated possible sources of heterogeneity by meta-regression analysis, the exploration of heterogeneity may still have been inadequate since the variables collected from the included studies were limited. Additionally, comparisons between some of the subgroups were unavailable because of the limited sample size. Finally, a majority of the included studies were retrospectively designed and used multiple reference standards, which can be considered limitations and may have biased the results.

Conclusions
In conclusion, our meta-analysis shows that MRI demonstrated high sensitivity and specificity for the detection of LM from CRC. Studies using multiple imaging planes for the assessment showed higher diagnostic accuracy than those using only one plane. Imaging interpretation with quantitative/semiquantitative methods was superior to qualitative methods.
Advanced techniques such as scanning with DWI and liver-specific CAs tend to be more sensitive. Nonetheless, given the notable heterogeneity and inherent limitations, large-scale, prospectively designed trials are needed to verify the clinical value of MRI, especially the added value of DWI and liver-specific CA-enhanced MRI.