Non-Gaussian models of diffusion weighted imaging for detection and characterization of prostate cancer: a systematic review and meta-analysis

The importance of Diffusion Weighted Imaging (DWI) in prostate cancer (PCa) diagnosis have been widely handled in literature. In the last decade, due to the mono-exponential model limitations, several studies investigated non-Gaussian DWI models and their utility in PCa diagnosis. Since their results were often inconsistent and conflicting, we performed a systematic review of studies from 2012 examining the most commonly used Non-Gaussian DWI models for PCa detection and characterization. A meta-analysis was conducted to assess the ability of each Non-Gaussian model to detect PCa lesions and distinguish between low and intermediate/high grade lesions. Weighted mean differences and 95% confidence intervals were calculated and the heterogeneity was estimated using the I2 statistic. 29 studies were selected for the systematic review, whose results showed inconsistence and an unclear idea about the actual usefulness and the added value of the Non-Gaussian model parameters. 12 studies were considered in the meta-analyses, which showed statistical significance for several non-Gaussian parameters for PCa detection, and to a lesser extent for PCa characterization. Our findings showed that Non-Gaussian model parameters may potentially play a role in the detection and characterization of PCa but further studies are required to identify a standardized DWI acquisition protocol for PCa diagnosis.


Diffusion Weighted MRI Mathematical Models in PCa
The most commonly used mathematical models used to fit DWI signal are listed below: 1. Mono-exponential Model (ME) 18 In all the equations, S(b) is the measured signal intensity at a certain b, S 0 the signal intensity at b = 0 and b is a factor that measures the degree of diffusion weighting applied. In the ME model, i.e. the standard gaussian model, ADC is the Apparent Diffusion Coefficient, an average value related to diffusion. In the IVIM model f is the perfusion fraction, D the molecular diffusion coefficient and D* the pseudo-diffusion coefficient. D fast and D slow are, respectively, fast and slow diffusion coefficients of the BE model and f fast and f slow their amplitudes. DDC is the distributed diffusion coefficient of the SE model, and α is the heterogeneity index. Finally, in the DKI model, D K is the diffusion coefficient corrected for kurtosis, and K is the kurtosis coefficient. See Supplementary Material-S1-for more details on Non-Gaussian DWI models.

Search strategy and selection criteria. A systematic search for relevant published studies examining
and, if any, comparing mono-exponential DWI and Non-Gaussian DWI models in PCa diagnosis was conducted. The most relevant scientific electronic databases (PubMed, Cochrane Library, MEDLINE, ScienceDirect, Google Scholar) were comprehensively explored and used to build the search. Only studies published after 2012 were selected. The search strategy included the key terms listed in Supplementary materials -S2-. The literature search was restricted to English language publications and studies of human subject.
Two reviewers, after having independently screened identified titles and abstracts, assessed the full text of the articles that evaluated mono-exponential DWI and at least one Non-Gaussian model, between IVIM, BE, SE, DKI, for diagnosis of PCa (detection and/or characterization of aggressiveness) and were not review articles. For articles meeting these criteria with full text available, the following further selection criteria had to be fulfilled: presence of PCa histopathological confirmation (either from biopsy or radical prostatectomy), of information about DW-MRI protocol, of a maximum b-value at least equal to 1500 s/mm² (if DKI was performed). Moreover, articles were excluded if computed b-values were used, if they concerned only model fitting quality or differentiation between PCa and benign lesions, not evaluating NT. The above-mentioned selection procedure was employed to perform a systematic review, providing a qualitative synthesis of currently used Non-Gaussian DWI models in PCa diagnosis.
In order to perform a meta-analysis, a further exclusion procedure was performed: studies were removed if they focused only on correlation of parameters with GS, and not on their ability to differentiate low-GS from high-GS tumors; if number of analyzed region of interest (ROI) was not mentioned; if data were reported in the form of mean and Inter Quartile Range (IQR), median and IQR, or median and 95% Confidence Interval (CI); if mean and standard deviation of parameters were not reported (or could not be calculated); if b = 0 mm 2 /s was not included in the DWI protocol. Moreover, if there was a high heterogeneity in fitting functions/procedures used for biexponential models, statistical analyses had not been conducted.
planning of the study. The articles were classified according to the Non-Gaussian models examined and the diagnostic purpose they had, as reported in Table 1.
Consequently, our work was organized in accordance with Fig. 2: for each Non-Gaussian model, a qualitative analysis followed by a meta-analysis (when a sufficient number of articles was available after the application of selection criteria) was performed both to assess the difference of the mean value of IVIM, BE, DKI and SE parameters between NT and PCa (PCa detection) and to assess the parameters mean differences between low GS and intermediate/high GS PCa (characterization of PCa aggressiveness).

Meta-analyses methods.
Meta-analyses were conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement 25 (See Supplementary Materials-S3-for PRISMA Checklist).
Although there were not enough data for performing an assessment of diagnostic accuracy, the quality of studies included in meta-analysis was evaluated, using the QUADAS-2 26  The Cochrane Collaboration). The quality of each study was evaluated by two reviewers independently and any disagreement was resolved by consensus. If meta-analysis included a number of studies superior to 10 27,28 , publication bias was assessed by visually inspecting a funnel plot. Mean and Standard Deviation (SD) of diffusion model parameters were extracted from each selected article. To analyze the differences between groups, Weighted Mean Differences (WMD) and 95% CI were calculated. The overall result was considered statistically significant if the test for overall effect returns a probability value lower than 0.05.
Heterogeneity among included studies was assessed using the Q statistic of the chi-square value test and the inconsistency index of Higgings 29 I 2 , with values of 25%, 50%, and 75% considered as low, moderate, and high, respectively. If the P-value of heterogeneity test was less than 0.1 or the I 2 -value was greater than 50%, the summary estimate was analyzed by a random-effects model 30 . Otherwise, a fixed-effects model was applied.
All statistical computations were performed using RevMan (version 5.3, The Cochrane Collaboration).

Results
The PRISMA flow diagram of included studies according to the inclusion and exclusion criteria is reported in Fig. 3.
Results of qualitative analysis. 29 studies fulfill all the inclusion criteria and were involved in the qualitative analysis: their characteristics are summarized in Tables 2 and 3. In all selected studies, ADC value proved to be a useful tool for discriminating both NT from PCa and lowfrom high-GS tumors, with lower values in tumors than in NT and in high-GS tumors than in low-GS tumors. On the other hand, findings on Non-Gaussian model features are not always equally consistent. www.nature.com/scientificreports www.nature.com/scientificreports/ Studies on iViM. 14 studies included IVIM comparison with ADC: 6 on detection [31][32][33][34][35][36] , 4on characterization [37][38][39][40] , and 4 on both [41][42][43][44] . Most studies on detection found that D was significantly smaller in PCa when compared to NTs 31-36,41-44 , but Mazzoni et al. 34 revealed a dependence on b-value range, showing statistical significance only using a maximum b value of 800 s/mm 2 . Results on f, when significant, revealed a lower value in PCa than in NT [31][32][33][34]36,43 , with the exception of study by Ueda et al. 35 who, conversely, found f significantly higher in peripheral zone (PZ) cancer than in normal PZ tissue, although they found no significance when performing the same analysis in transitional zone (TZ). Results on D* were found to be not significant in many studies [33][34][35][36] . Pesapane et al. 43 found that D* was significantly higher in tumor tissue when compared to NT, in accordance with Kuru et al. 41 and Valerio et al. 42 . While Linear Discriminant Analysis performed by Pesapane et al. 43 and Valerio et al. 42 showed that the additional use of IVIM increased performances of conventional T2/DWI for PCa detection, ROC analysis performed by Kuru et al. proved that none of IVIM parameters yielded a clear added value, such as in Feng et al. 36 . Whereas, Martin et al. 32 found D and f to be statistically significant, obtaining also a slightly higher AUC for these two parameters than for ADC. In PCa aggressiveness characterization, D seems to be the most performing IVIM parameter [37][38][39][40][41][42] , resulting lower in high-than in low-GS tumors in accordance with ROC analysis performed by Yang et al. 38 . About D*, only Valerio et al. 42 found its value to be significantly higher in high-than in low-GS PCa, while f was considered unable to discriminate between GS grade in all selected studies. Pesapane et al. 43 identified all three IVIM parameters as not useful for characterization on PCa aggressiveness. In addition, some studies carried out a correlation analysis between the IVIM parameters and the GS: they revealed www.nature.com/scientificreports www.nature.com/scientificreports/ a significative negative correlation between D and GS [38][39][40] , while no correlation was found for f and D*. Merisaari et al. 44 used machine learning techniques in order to evaluate if GS prediction could be improved using ADC in combination with IVIM, but they concluded that ADC continued to be the best-performing parameter for this scope.  High GS regions were reported as sum of all PCa, Low GS or High GS regions respectively, regardless of the affected prostate zone (peripheral zone, transition zone, central gland); the number of NT regions were reported as sum of all NT regions, regardless of the affected prostate zone and the patient on which the ROI was placed (e.g. healthy volunteer, PCa patient). The column "Reason for exclusion from MA" reported the reason why the study under investigation, or which analyzed model, was excluded from the meta-analysis. As regards IVIM and BE models, they were excluded from meta-analysis due to heterogeneity in fitting functions/procedures among selected studies and this was indicated by the sentences "Study on IVIM model" or "Study on BE model", if the study included only these models, "Exclusion of IVIM model" or "Exclusion of BE model", if the study reported other Non-Gaussian DWI models that were, instead, retained into meta-analysis.  48 . According to Mazaheri et al. 47 and Liu et al. 49 , DDC and αin PCa were significantly lower than in NT, showing that SE model parameters could provide additional information to ME model. These results are consistent with those of Feng et al. 36 who, however, also performed a ROC analysis showing good performances for both parameters, but no significant performances improvement with respect to ADC. Jambor et al. 46 found similar results for DDC, but not for α. Even ROC analysis performed by Toivonen et al. 48 reveals significance for DDC, both in PCa detection and characterization, showing also a negative correlation between DDC and GS, even if performances are not such to substitute the ME model. In the study by Liu et al. 50 DDC was found to be the only SE parameter able to differentiate low-from high-grade tumors, showing a significant negative correlation with GS. Merisaari et al. 44 found a statistically significant correlation for both DDC and αwith GS (negative and positive respectively), a high AUC for low-versus high-GS classification and an AUC improvement combining DDC and α by means of machine learning. Nevertheless, none of these parameters or their combination considerably outperformed ADC parameter of ME model. All the studies revealed a significantly higher K value in tumor tissue than in NT, with increasing trend in high GS lesions, but there were different opinions regarding the added value offered by the use of DKI for PCa diagnosis. Preliminary findings by Rosenkrantz et al. 54 indicated higher capability of DKI model for both distinguishing benign from malignant PCa lesions and low-from high-grade PCa lesions when compared to ADC, by means of ROC analyses. They found K significantly higher and D K significantly lower in PCa than in NT, and the same behavior was observed when low-and high-GS lesions were compared. The findings on PCa detection were successively confirmed by Tamura et al. 51 , Jambor et al. 46 , Suo et al. 55 , Mazaheri et al. 47 , and Mazzoni et al. 34 who found that K was the best-performing parameter. Feng et al. 36 , Roethke et al. 56 , and Langkilde et al. 15 , although their statistical significance test results were in accordance with the above-mentioned studies, revealed not sufficiently higher performance of D K or K compared to ADC at ROC analysis, in accordance with Toivonen et al. 48 . A similar conclusion was made my Tamada et al. 57 , who studied only K parameter. In the last 4 papers the authors, together with Wang et al. 53 , found similar results also in PCa characterization. Suo et al. 55 found significant differences among low-and high-grade lesions for K value, but not for D K value. Conversely, results by Barrett et al. 58 were not significant, showing poor capability for both D K and K at separating low-and high-grade lesions. Correlation analysis for the DKI parameters, evaluating their relationship with GS, revealed a significatively positive correlation between K and GS 48,52,53,55 , and a significantly negative correlation between D K and GS 48,52 . Suo et al. 55 , however, found no significant correlation between D K value and GS.

Results of meta-analyses.
In order to perform the meta-analyses indicated in Fig. 2, 17 studies were fully excluded from previous selection. Data from the same study were considered multiple times in the following cases: different b-value ranges were used to perform acquisition/processing of DW-MRI images; PCa tissue was compared with different zones of NT (e.g. PCa vs normal TZ and PCa vs normal PZ); NT was compared with different PCa zones (e.g. NT vs PCa in TZ and NT vs PCa in PZ); NT was compared with PCa with different GS (e.g. NT vs PCa with GS = 6 and NT vs PCa with GS ≥ 7); low grade PCa was compared to more than one PCa with higher GS. Data from the study of Langkilde et al. 15 were extracted only for the analyses including b = 0 s/mm 2 .
Finally, as for PCa detection, studies concerning DKI and SE were respectively 32, and 10. For the SE, 1770 regions (694 positive) were analyzed; for the DKI, 5167 regions (2136 positive) were analyzed for K and 3331 (1439 positive) for D K .
As for characterization, the meta-analysis was performed only for DKI model, because there were no sufficient SE data available. DKI was addressed by 9 studies, with 897 lesions (300 low-grade) included for K and 440 (154 low-grade) for D K .
The overall quality of included studies in meta-analyses, for both PCa detection and characterization, was considered good for our purposes (See Supplementary Materials -S4-).
Concerning IVIM and BE models, the high heterogeneity in applied fitting functions and procedures (see Table 2) led us to decide not to perform any quantitative analysis on these models. performance of models in pca detection. The random-effect analysis was used for all SE and DKI parameters. The heterogeneity was high (I 2 > 50% and P < 0.0001). D K was significantly higher in NT, while K was significantly higher in PCa tissue. These results are highlighted by the forest plot in Figs 4 and 5. performance of models in pca characterization. D K was found to be significantly higher in PCa lesions with low GS, while K was significantly higher in those with high GS. These results are highlighted by the forest plot in Fig. 6.
Kuru et al. 41 3    www.nature.com/scientificreports www.nature.com/scientificreports/ Publication bias. Since the number of included studies was superior to 10 only in meta-analyses based on detection and involving DKI parameters (D K and K), publication bias was evaluated plotting funnel plots only in these cases. As shown in Supplementary Materials -S5-, all plots suggested a low risk of publication bias.

Discussions
In this study we investigated the potential value of the most commonly used Non-Gaussian DWI models for detection and characterization of prostate cancer. ME DWI has been shown to be a useful tool for PCa diagnosis, not only for detection of prostatic lesions, but also for characterization of disease aggressiveness 8,9,16,17 . However, in the last years, due to the limited assumption at the basis of this approach, many Non-Gaussian diffusion models have been proposed to better depict diffusion signal 19,20,23,24 and more and more studies investigated their diagnostic role and clinical value on PCa 13,15,31,[33][34][35][36][37][38][39][40][41][42][43][44][45][46][47][48][49][50][52][53][54][55][56][57][58][59][60][61] . Due to the large amount of studies and data contained therein, and the not always clear results regarding the benefits of using Non-Gaussian DWI for PCa diagnosis, we performed a systematic review followed by a meta-analysis, reviewing their parameters and trying to understand if they could provide additional information. We initially examined all found studies concerning both Non-Gaussian and Gaussian models for detection and characterization of prostate cancer from 2012 onwards. This qualitative analysis involved 29 studies: results and conclusions of selected studies varied from each other, often showing inconsistence and not a clear idea about the actual usefulness and the added value that Non-Gaussian model parameters may led to standard DWI.
In particular, among IVIM parameters, D seemed to be the most useful for PCa diagnosis, while there was no clarity on the effective significance and usefulness of the parameters related to perfusion, D* and f. DKI parameters, K and D K , according to a large number of examined studies were considered to be useful for PCa detection, but a clear evidence about DKI better performance had not still be reached.
Compared to those on IVIM and DKI, a smaller quantity of studies on BE and SE was present in literature, so it was even more complicated to assess the ability of these models to help in PCa detection and characterization. Among SE parameters, only DDC seemed to show significance, while conflicting views were expressed on α. About BE, there were conflicting or not significant results for D slow , D fast , f fast , and also a not clear interpretation of their physiological basis 45 .
In this scenario, it would be desirable to reach a common view on the use of Non-Gaussian DWI models in addiction or in substitution of the standard DWI protocol for PCa diagnosis. With our systematic review, we summarized the available data reported in previously selected studies, and, if possible, for each Non-Gaussian model, we quantitatively evaluated the capability to differentiate between NT and PCa and between low-and intermediate/high-GS lesions. The high heterogeneity in applied functions and fitting procedures due to a not yet existing consensus on the best processing approach to fit biexponential models 62 led us to not perform any statistical analysis on IVIM and BE models. Moreover, the small quantity of studies on SE models did not allow us to conduct statistical analysis on the power of α and DDC to classify lesions on the basis of GS. Our results showed that, concerning PCa detection, K, D K , DDC and α showed statistical significance, allowing to distinguish prostate cancer from NT. Lower values of D K and DDC found in PCa could be linked to the destruction of the prostatic structure-like acini together with a higher cellular and stroma density typical of PCa tissues 49,55 . The significantly higher value of K in PCa could be associated with the increased microstructural complexity of prostate cancer 54,55 . On the contrary, α was found to be significantly lower in PCa and this is in accordance to its association with histological heterogeneity, clearly present in prostate cancer tissue 49 . It should be considered that studies involving SE were fewer than those selected for DKI and so, although the results on DDC and α showed statistical www.nature.com/scientificreports www.nature.com/scientificreports/ significance, more studies would be required to validate this result. Given the high number of included studies, and the significative performance of both its parameters, DKI model claimed to be helpful for the detection of PCa.
Among outcomes regarding characterization of PCa aggressiveness, only K and D K , showed statistical significance, differentiating lesions according to GS. The significantly greater value of D K in intermediate/high-lesions highlighted reduced diffusion of water in these PCa than in those characterized by a GS equal or lower than 6, and it could be linked to the progressive increased cellularity of malignant lesions 54 . The significantly greater value of Kurtosis (K) in intermediate/high-lesions highlight reduced diffusion of water in these lesions, and it could be linked to the progressive increased microstructural complexity of malignant lesions [52][53][54][55] . The computed heterogeneities were lower than the ones found in the analyses on PCa detection, but it is not so reliable due to the lower number of studies involved 63 .
The obtained high heterogeneities, both in PCa detection and characterization, may be caused by several factors that match to the limitations of the study, which are highlighted in the following subparagraph.
Limitations of the study. This meta-review suffers from several limitations which deserve to be discussed.
As concern patient characteristics, we did not consider patient age and did not draw distinction between different kind of NT ROIs (e.g. placed on prostate of healthy volunteers or on healthy prostate zones of PCa patients) or between the different examined anatomic zone (TZ, PZ, CG). Also, we have not considered the different pathologic reference standard used across studies, be it biopsy or radical prostatectomy specimens, which may lead to assign different Gleason scores for the same prostatic lesion 64,65 .
As to characteristics of DWI protocol, because of the lack of a standard b-value range for PCa, b-values used to acquire DWI signal differed across selected studies, probably contributing to increase heterogeneity due to the supposed dependence of the computed parameters on the adopted b-values 34,36 . Specifically for IVIM parameters, the high variability found across studies can also be explained by the effect of a non-constant Echo Time (TE) 66,67 and of a variable number of selected b-values 68 .
Moreover, the different applied diffusion time, which was found to be related to ADC and fractional anisotropy (FA) by Bourne et al. 69 could have influenced results, but only four of selected studies reported information on diffusion time parameters 15,35,44,47 (see Supplementary Material -S6-for more details). Further studies aimed at finding a standard DW-MRI protocol for PCa are required.
It could be also interesting to explore the huge sources of heterogeneity and their effects on parameters performances in PCa diagnosis using a meta-regression and subgroup analyses, in order to perform a diagnostic test accuracy (DTA) meta-analysis and obtain quantitative comparison of each Non-Gaussian models with mono-exponential model, as done, for example, by Si et al. 70 for DKI. However, selected studies, especially those covering PCa characterization and BE or SE, did not reported sufficient data to build contingency tables, calculate quantitative measures and construct summary ROC curves. Moreover, too few studies on PCa characterization provided correlation coefficient values between parameters and GS to perform a correlation meta-analysis.
With respect to data analysis characteristics, we have not made any differences between 2D and 3D ROIs. In addition, we included studies that used both ROI-based parameters extraction approach (i.e., fitting procedure over the mean signal intensity values in previously traced ROIs) and voxel-based approach (i.e., voxel-by-voxel www.nature.com/scientificreports www.nature.com/scientificreports/ fitting procedure and subsequent ROI measurement on the resulting parametric map 59 ). Further, there was neither an established fitting procedure to obtain parameters belonging to the same Non-Gaussian model and this made it impossible to perform any statistical analysis on certain models such as IVIM and BE. Larger studies are required to find a standard fitting function/procedure. Moreover, it is not to be neglected the lack of access to data in the selected studies, which prevents the external validation of obtained results by independent researchers 71 . Only Feng et al. 36 took a step toward data sharing, providing, in the Supporting Material of their work, a full table including the estimated DWI parameters for each patient included in the study.
Data availability issue is gaining an increasing importance in research and in particular in MR community 72 . Some goals of data sharing should be the creation of reference findings/algorithms/implementations that can be used for comparison when publishing new methods or the possibility to compare individual results, for example in terms of consistency, computation time and hardware/programming language requirements. To our knowledge, among research teams dealing with prostate cancer diagnosis, only the groups of Chaddad et al. 73 and Jambor et al. 74 make their data fully available. With respect to our meta-analysis, the access to external data would have allowed not only to go beyond the traditional meta-analysis of intervention venturing into more complex meta-analytic approaches 75 , but it would have also helped to shed light in the issue of standardization of DWI protocols and fitting functions/procedures for Non-Gaussian DWI parameters. The absence of standardization directly affects repeatability and reproducibility for DWI parameters that are fundamental to allow a fair and robust comparison among studies, leading to a more powerful meta-analysis 71 . In addition, repeatability and reproducibility are also essential in clinical practice for a correct treatment planning, response monitoring and to use DWI as clinical biomarker for cancer diagnosis [76][77][78][79][80][81][82] . To our knowledge, there are only few studies on Non-Gaussian DWI models for PCa diagnosis approaching these issues. Merisaari et al. 44 evaluated five different IVIM fitting methods using a population of 81 patients who underwent DWI examination twice, revealing poor reproducibility of IVIM parameters. Jambor et al. 46 addressed for repeatability of ADC, SE, DKI and BE parameters involving a smaller patient population of 24 patients who underwent four DWI examinations, showing low repeatability for BE fitted parameters and a higher and similar repeatability among ADC, SE and DKI fitted parameters. To our knowledge, current literature lacks of studies concerning reproducibility of Non-Gaussian DWI parameters for PCa diagnosis across different centers, scanners, and readers. This is a direct consequence of the above-mentioned lack of standardization in DWI protocol, fitting functions/procedures, and also automatic tools for ROI placement 44 .
Nevertheless, the strength of this work was that we assessed not only tumor detection, as in other reviews 70 , but also PCa characterization in terms of aggressiveness. In addition, multiple Non-Gaussian models were considered.

conclusions
The role of Non-Gaussian diffusion models in the detection and characterization of PCa aggressiveness and, consequently, in making clinical decisions, remains questionable. In this context, this article, highlighting the challenges that have emerged when comparing studies between each other, may serve as a starting point for future studies evaluating Non-Gaussian DWI performances to give a more precise biophysical interpretation of their parameters with the objective of identifying a standardized DW-MRI protocol in PCa diagnosis.