Introduction

Accurate segmentation of the prostate and the prostatic zones on T2w MRI constitutes the first step for a plethora of medical image analysis applications where clinically useful information needs to be extracted from the region of interest (ROI). Some of the most common applications are cancer detection and aggressiveness characterization, early prediction of recurrence, detection of metastases and assessment of treatment effectiveness, among others1. As the medical imaging domain moves toward increasingly fine scales, with information being extracted from single voxels or pixels, the demands on segmentation accuracy are growing2. Particularly with the rise of radiomics analyses, any variability in the segmentation of the ROI will affect the numerical output of the features, thereby introducing bias into the evaluation of quantitative imaging biomarkers3,4. Furthermore, in the era of MRI-guided radiotherapy, precise organ and tumor delineation is of paramount importance as it may directly affect clinical outcomes5,6. Nevertheless, manual delineation of ROIs is not only a time-consuming and labor-intensive task but also heavily dependent on the radiologist’s experience7.

To date, a plethora of deep learning (DL) fully convolutional neural network (CNN) pipelines have emerged to alleviate the burden of manual annotation in various radiological applications by automating and speeding up the segmentation process8. Most commonly, the backbone of such models is the U-Net architecture9. Several comprehensive reviews address emerging DL applications for medical image segmentation10,11,12,13. Despite the state-of-the-art performance of novel architectures, segmentation of the prostate and, particularly, of the prostatic zones remains one of the most compelling research areas14,15.

Given the already large number of parameters included in state-of-the-art prostate segmentation models, image preprocessing, either by using denoising filters or contrast enhancement techniques, aims to increase model performance by emphasizing key image characteristics relevant to the specific learning task16. An indispensable part of image preprocessing is image enhancement, which aims to improve the visual quality of the image by modifying the intensity values of individual pixels so that anatomical structures can be recognized more easily by humans and machines. This is achieved by means of adequate gray-scale transformations17 that disentangle the intensity distributions arising from adjacent regions with similar gray-level intensities18. Therefore, by sharpening the boundaries between different tissues19, contrast enhancement has emerged as a powerful method for improving the accuracy of DL segmentation models20.

In the literature, several studies have demonstrated the effectiveness of image preprocessing in reducing the ambiguity of CNN judgments and in supporting the feature extraction process21. There is a plethora of image enhancement techniques, many of which propose modifications of the Histogram Equalization (HE) algorithm or combinations of existing methodologies, such as contrast limited adaptive histogram equalization (CLAHE)22. The CLAHE algorithm constitutes one of the most popular and well-cited image enhancement techniques, as it appears to be particularly effective in medical imaging applications23,24,25. An improvement of CNN models’ performance on a variety of tasks, including object and texture classification, has been reported after the application of image processing techniques26,27. This extends to the medical imaging domain as well, where some authors have compared the effectiveness of image enhancement techniques for improving the quality of different imaging modalities (i.e. X-rays, CT, MRI) and for different clinical applications, such as lung, bone and vessel segmentation23,28,29, but also for disease detection and classification30. For instance, Rahman et al.29 evaluated the impact of various lung segmentation CNN algorithms and image enhancement techniques, including gamma correction, HE, CLAHE, image complement and the Balance Contrast Enhancement Technique (BCET), on COVID-19 detection using X-ray images. The effect of image enhancement on liver segmentation from CT images, cervical cancer segmentation from T2W MR images, and vessel segmentation from 2D fundus images has also been investigated, suggesting that image enhancement prior to CNN model training leads to significant improvement in model performance31,32,33.

In this work, we propose an extension of the CLAHE method with the aim to improve the performance of state-of-the-art CNN models for segmenting the prostate gland and the prostatic zones. The performance of the proposed Region Adaptive CLAHE (RACLAHE) pipeline was compared against four prominent histogram-based image enhancement techniques, while the influence of the preprocessing methods on segmentation performance was assessed through the implementation of five well-established CNN models.

Overall, the main contributions of this study are the following:

  • We propose an image enhancement method that consistently improves the performance of CNN segmentation models in T2 MR images of the prostate.

  • We demonstrate, through feature map-driven visual explanations, that the proposed method is capable of enhancing the image features that are most relevant to the segmentation task.

  • We introduce a quantitative and qualitative feature importance metric to provide insights regarding DL segmentation models’ performance, thereby enhancing their explainability.

  • To the best of our knowledge, this is the first study to quantitatively and qualitatively evaluate the effectiveness of image enhancement methods employing CNN models for prostate segmentation on MR images.

Results

Datasets description

The impact of four well-known histogram-based image enhancement methods, along with the proposed region-adaptive technique, was investigated for improving the segmentation of the prostate’s whole gland (WG), transitional zone (TZ) and peripheral zone (PZ), using two publicly available datasets. One dataset was used for model training and the other to externally test the models’ performance. For model training, 204 patients from the Prostate-X dataset34,35 were used, along with the corresponding masks for the WG, TZ and PZ. The dataset consists of 3206 frames from Siemens T2-weighted MR scans (TrioTim, Skyra models). For model testing, the Prostate 3-T36 dataset was employed, which included 30 patients and 421 frames with the associated annotations for all three regions, acquired from Siemens T2-weighted MRIs (Skyra model). To better examine the aforementioned prostatic areas, a descriptive analysis was conducted to quantify the inter- and intra-patient volume variations of the different prostatic regions (Supplementary Fig. S1).

Evaluation of preprocessing methods

The metrics used for the evaluation of the proposed method were the Dice Score index (DS), the Rand Error Index (REI), the Sensitivity, the Balanced Accuracy (BA), the Hausdorff Distance (HD), and the Average Surface Distance (ASD). Tables 1, 2 and 3 show the segmentation performance of the five DL models for the prostate’s WG, the PZ and the TZ, respectively, using the different image enhancement methods. For comparison, the models’ performance was also computed using the original images, without applying any enhancement. The tables also indicate whether the proposed RACLAHE performed significantly better than the other preprocessing methods. The corresponding boxplots of DS, Sensitivity and HD are provided in Supplementary Figs. S2–S4 for the WG, PZ and TZ, respectively.

Table 1 Whole gland (WG) segmentation performance.
Table 2 Peripheral zone (PZ) segmentation performance.
Table 3 Transitional zone (TZ) segmentation performance.

Although, for WG segmentation, the most robust networks tended to perform best without any image preprocessing (i.e. U-Net++, U-Net3+, USE-NET), the proposed RACLAHE algorithm was able to improve the Sensitivity and BA in most cases, as shown in Table 1. AGCWD was efficient in improving U-Net and U-Net++ but degraded U-Net3+. With AGCCPF, only the performance of U-Net was improved, achieving results similar to RACLAHE, whereas U-Net++, U-Net3+ and USE-NET were slightly degraded. The CLAHE algorithm marginally outperformed RACLAHE for the ResU-net model but degraded other networks such as U-Net3+. RLBHE had the lowest performance of all methods, with remarkably high variability. It is worth noting that the USE-NET model was the best performing network and remained invariant to image preprocessing. Even without any preprocessing, USE-NET achieved better scores for WG segmentation than all other models (i.e. AUC = 0.88 ± 0.12).

On the other hand, for prostate PZ segmentation, the RACLAHE algorithm consistently improved the performance of the majority of DL models, as shown in Table 2. The only exception was the ResU-net, for which AGCWD and AGCCPF achieved superior performance. As in the WG segmentation task, the models’ performance was degraded when RLBHE was used. The CLAHE algorithm also degraded the models’ performance, except for USE-NET. Overall, the best performance was achieved with the ResU-net model (DS = 0.75 ± 0.17 for AGCCPF). Regarding the segmentation of the prostate’s TZ, shown in Table 3, the proposed RACLAHE algorithm was the only preprocessing method that consistently and significantly improved the performance of all five networks. The best results for TZ segmentation were obtained with RACLAHE combined with the USE-NET model (DS = 0.81 ± 0.16). Overall, the average improvement across DL models in terms of DS was 3%, 8% and 9% for WG, PZ and TZ segmentation, respectively.

Inter-model performance and variability

The stability of each preprocessing method was computed in terms of the method-specific average performance and the variance in performance across all models. The inter-model performance, defined as the mean score of each method across all models, shows how each approach impacts DL segmentation models in general and is described as follows:

$$Performance \left(m,filt\right)=mean\left({p}_{ResU-Net}\left(m,filt\right),{p}_{U-Net}\left(m,filt\right),{p}_{U-Net3+}\left(m,filt\right), {p}_{U-Net++}\left(m,filt\right),{p}_{USE-NET}\left(m,filt\right) \right),$$
(1)

where \(m\) is the metric, \(filt\) is the histogram processing method, and \(p\) is the performance (mean score) of each model. This metric assists in identifying the best method, as it reveals which method performs most effectively across models. In addition, the inter-model variability shows how far the mean values of each model deviate from \(Performance (m,filt)\). It represents the standard deviation across models (\(std\)) and is given by Eq. (2):

$$Variability \left(m,filt\right)=std\left({p}_{ResU-Net}\left(m,filt\right),{p}_{U-Net}\left(m,filt\right),{p}_{U-Net3+}\left(m,filt\right), {p}_{U-Net++}\left(m,filt\right),{p}_{USE-NET}\left(m,filt\right) \right).$$
(2)
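To make Eqs. (1) and (2) concrete, the following minimal sketch computes the inter-model performance and variability of one metric for one preprocessing method; the model scores are hypothetical placeholders, not results from this study.

```python
import numpy as np

# Hypothetical mean Dice scores of the five models for one preprocessing method.
scores = {"ResU-Net": 0.74, "U-Net": 0.71, "U-Net3+": 0.73,
          "U-Net++": 0.72, "USE-NET": 0.81}

values = np.array(list(scores.values()))
performance = values.mean()   # Eq. (1): inter-model performance
variability = values.std()    # Eq. (2): inter-model variability
print(f"Performance = {performance:.3f}, Variability = {variability:.3f}")
```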

Figure 1a presents the normalized performance of each preprocessing method, while Fig. 1b presents the normalized variability across the segmentation models. For a given preprocessing method, the results were normalized with respect to the minimum and maximum performance for each metric. For instance, in Fig. 1a the best-performing preprocessing method in terms of DS has a value of 1 while the worst has a value of 0. For the distance-based metrics (HD, ASD), where lower values indicate better performance, the inverse of these values was computed to ensure consistency in scale. In Fig. 1b, the variability of the mean performance across models is shown for each preprocessing method and metric. After normalization, the lowest variability across models is indicated by the value 0 and the highest variability by the value 1.
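A small sketch of this normalization scheme is given below; the per-method scores are placeholders, and distance-based metrics are inverted before min-max scaling so that 1 always denotes the best method.

```python
import numpy as np

def normalize(method_scores, lower_is_better=False):
    """Min-max normalize one metric across preprocessing methods (Fig. 1)."""
    s = np.asarray(method_scores, dtype=float)
    if lower_is_better:                 # e.g. HD, ASD
        s = 1.0 / s
    return (s - s.min()) / (s.max() - s.min())

print(normalize([0.74, 0.71, 0.78]))                      # e.g. Dice scores
print(normalize([6.1, 4.9, 5.4], lower_is_better=True))   # e.g. HD in mm
```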

Figure 1

(a) Inter-model performance for each metric and histogram-based preprocessing technique. The best preprocessing method for each metric is depicted in yellow and the worst in dark green. (b) Inter-model variability for each metric and histogram-based preprocessing technique. The best method for each metric is depicted in white and the worst in dark red.

Considering Fig. 1a, for WG and PZ segmentation, RACLAHE outperformed all the other preprocessing methods in terms of Sensitivity, BA, DS and REI, while it had the second-best performance, after AGCWD, in terms of HD and ASD. With respect to the variability in performance across DL models, shown in Fig. 1b, RACLAHE had the lowest standard deviation for WG and PZ segmentation, apart from the HD and REI metrics, respectively. Regarding TZ segmentation, the proposed method was superior for all the metrics, with the lowest inter-model variability in performance.

Explainability of models’ predictions

In order to explain the effect of each preprocessing method on the DL models, we sought to quantify how the features each model deems important for the task diverge from the density maps of the ground truth binary masks. Specifically, the density map of the ground truth masks is given as:

$${GT }_{Density \,map}=\sum_{i,j=0}^{i,j=256}\left\{\sum_{k=0}^{k=Nsl}G{T}_{ij}\left(k\right)\right\},$$
(3)

where \({GT }_{Density \,map}\) is the density map extracted after the pixel-wise aggregation of all binary \(GT\) masks, and \(G{T}_{ij}(k)\) represents the pixel-wise aggregation over the total number of binary ground truth images \(Nsl\) at a certain pixel position \(i, j\). The density map of important features is given as:

$${FM }_{Density \,map}=\sum_{i,j=0}^{i,j=256}\left\{\sum_{k=0}^{k=Nsl}F{M}_{ij}\left(k\right)\right\},$$
(4)

where \({FM }_{Density \,map}\) is the density map of the features that a model utilizes for its decision, extracted after the pixel-wise aggregation of those feature maps, and \(F{M}_{ij}(k)\) represents the pixel-wise aggregation over the total number \(Nsl\) of feature maps extracted by a model at a certain pixel position \(i, j\). The indices \(i, j\) range from 0 to 256, corresponding to the spatial dimensions of the density maps. A comprehensive scheme is presented in Fig. 2, while the mean squared error and the absolute subtraction are used as explainability metrics for quantitative and visual assessment. The absolute pixel-wise difference map between Eqs. (3) and (4) is given by Eq. (5):

Figure 2

The explainability assessment pipeline. Density maps for the GT binary masks and the feature maps are extracted via pixel-wise aggregation. Mean squared error and absolute pixel-wise subtraction are performed on the density maps for quantitative and visual inspection.

$$DMap=\left[\begin{array}{ccc}\left|FM\, Density\, ma{p}_{0,0}-GT\, Density\, ma{p}_{0,0}\right|& \cdots & \left|FM\, Density\, ma{p}_{0,256}-GT\, Density\, ma{p}_{0,256}\right| \\ \vdots & \ddots & \vdots \\ \left|FM\, Density\, ma{p}_{256,0}-GT\, Density\, ma{p}_{256,0}\right| & \cdots & \left|FM\, Density\, ma{p}_{256,256}-GT\, Density\, ma{p}_{256,256}\right|\end{array}\right].$$
(5)
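The computation of Eqs. (3)–(5) amounts to pixel-wise aggregation and subtraction, as in the following sketch; the mask and feature-map arrays are random placeholders standing in for the real data.

```python
import numpy as np

rng = np.random.default_rng(0)
gt_masks = rng.integers(0, 2, size=(100, 256, 256))   # binary GT masks (Nsl = 100)
feature_maps = rng.random(size=(100, 256, 256))       # feature maps (e.g. Grad-CAM)

gt_density = gt_masks.sum(axis=0)                 # Eq. (3): GT density map
fm_density = feature_maps.sum(axis=0)             # Eq. (4): FM density map
dmap = np.abs(fm_density - gt_density)            # Eq. (5): absolute difference map
mse = np.mean((fm_density - gt_density) ** 2)     # quantitative explainability metric
```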

The gradient-weighted class activation mapping (Grad-CAM)37 technique was used to extract the feature maps (FM) from a given layer of a DL network. In fact, the performance of a model is tightly linked to the feature maps extracted throughout the forward–backward propagation process. Figure 3 presents the significant features processed by the USE-NET model under the influence of each preprocessing method, for WG, TZ and PZ segmentation. The RACLAHE method appears to improve the accuracy of boundary estimation, assisting the model to focus on relevant pixels. Red areas indicate that the model is certain that those pixels belong to the object of interest, yellow areas imply that the model is less certain about these pixels, while blue areas denote that these pixels make no contribution to the model’s final decision. Figure 4 provides a visual representation of the pixel-wise absolute differences (DMap) between the ground truth density map (GTdensity map) and the density map of features relevant to the model’s decision (FMdensity map). A near-zero value signifies that a pixel’s significance for the model is equal to its GT density. It is evident that the RACLAHE method yields more pixels with values close to the GT compared to the other methods. This suggests that RACLAHE can more efficiently assist the DL models in focusing on the meaningful features showcased by the GTdensity map, reducing their uncertainty.
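A minimal Grad-CAM sketch for a Keras segmentation model is shown below; the choice of layer and of summing the foreground predictions as the target score are assumptions for illustration, not the exact settings of this study.

```python
import numpy as np
import tensorflow as tf

def grad_cam(model, image, conv_layer_name):
    """Return a [0, 1] heatmap for one image from the chosen conv layer."""
    grad_model = tf.keras.Model(
        model.inputs, [model.get_layer(conv_layer_name).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[np.newaxis, ...])
        score = tf.reduce_sum(preds)              # aggregate foreground score
    grads = tape.gradient(score, conv_out)
    weights = tf.reduce_mean(grads, axis=(1, 2))  # global-average-pool the gradients
    cam = tf.nn.relu(
        tf.reduce_sum(weights[:, None, None, :] * conv_out, axis=-1))[0]
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()
```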

Figure 3

Weighted heatmaps for the USE-NET model and each preprocessing method used. Columns correspond to the prostatic zones and rows to the evaluated methods.

Figure 4

The visual assessment after the absolute pixel-wise subtraction of \({GT }_{Density \,map}\) and \({FM }_{Density \,map}\) (Eq. 5), for each preprocessing method applied on the USE-NET network.

Discussion

Despite the widespread usage of image enhancement techniques, they are often adopted on the basis of scant literature evidence and are blindly utilized in a plethora of clinical applications. In this regard, the present work addresses two highly relevant issues in the field of medical image preprocessing, namely, how well image enhancement methods generalize across different segmentation algorithms and how to reliably select the most appropriate preprocessing method for a given task. For the former, we estimated the variability in segmentation performance across models for each of the preprocessing methods under evaluation. For the latter, we introduced a feature-driven approach to explain models’ predictions, enabling both the qualitative assessment of model performance through visual explanations, as presented in Fig. 4, and the quantitative assessment via the mean squared error and the absolute subtraction between the feature maps and the GT maps. To the best of our knowledge, this is the first study to evaluate the impact and generalizability of image enhancement methods on DL models for prostate and prostatic zone segmentation.

In contrast to other popular histogram-based image enhancement approaches, which displayed low stability and generalizability across different DL models, the success of RACLAHE lies in the consistent, model-invariant improvement achieved for segmenting the prostate and the prostatic zones. The proposed method’s novelty lies in the combination of an automated DL region proposal model and a local enhancement technique. RACLAHE mainly focuses on (a) the effective automated identification of the region of interest, separating the task-relevant region features from the redundant ones, (b) the enhancement of the relevant region to emphasize the pixels pertinent to the segmentation task, and (c) the harmonization of the enhanced region with the remaining region, restoring the image to its original dimensions so that it is presented in a more natural way both to clinical practitioners and to the CNN models. Specifically, RACLAHE was the only technique that did not deteriorate the models’ performance in any of the experiments conducted, as shown in Tables 1, 2 and 3. In most cases, a superior performance was achieved compared to no preprocessing or, at worst, the performance remained the same. Regarding the stability of image enhancement methods across different DL models, RACLAHE was found to be more stable, reducing the variation of results across models and improving the overall inter-model average scores, as shown in Fig. 1.

Another important contribution of this work is the integration of saliency maps as a performance indicator, providing a unique opportunity to interrogate the effect of different preprocessing methods on model behavior. We provide feature map-driven visual explanations to assist in the selection of the most appropriate preprocessing method by highlighting the image features that guide the segmentation task. Herein, the Grad-CAM37 technique was employed to visually present a class-discriminative localization map (heatmap or saliency map) that highlights the most important pixels for a particular class. These heatmaps were coupled with a probabilistic ground truth feature importance map to extract meaningful indications regarding a model’s performance, both qualitatively and quantitatively, as shown in Fig. 2. We extended this methodology to compare different image enhancement methods and to quantify the contribution of each feature to the DL models’ decisions. As depicted in Fig. 4 and Supplementary Table S1, RACLAHE was able to enhance specific areas within the images that are strongly associated with the ROI. It is worth noting that the most significant differences were observed for the prostate’s WG and TZ. The proposed approach provides better insights into the performance of biomedical imaging applications, as it represents a natural way of comparing the ground truth samples with the predicted samples, as indicated in Figs. 2 and 4.

This work has some limitations. The impact of image enhancement was not assessed for 3D segmentation tasks, which would permit evaluating its effectiveness in every spatial plane. Nonetheless, this type of analysis requires a sufficiently large amount of data. Additionally, regarding the RACLAHE method, no hyperparameter tuning was performed to optimally define the cutoff for the CLAHE component; therefore, the default parameters automatically chosen by the algorithm were used. As proposed by Campos et al.38, hyperparameter optimization can be achieved by means of machine learning techniques.

In summary, the outcomes of this study indicate that image enhancement using RACLAHE can improve the segmentation efficacy of CNN models in a model-agnostic manner, thereby contributing to the establishment of a concrete image preprocessing pipeline for effective automatic prostate segmentation. Future research may concentrate on establishing the superiority of the proposed method against additional image enhancement techniques and segmentation algorithms, and on evaluating its generalizability in other population datasets and clinical scenarios. In addition, the application of RACLAHE prior to CNN model training led to the generation of more accurate saliency maps. These probabilistic pixel-wise representations reflect a more natural way to visually explain the outcomes of a model. Ultimately, the explainability module should render the models’ predictions more trustworthy among clinicians, further supporting the decision-making process.

Methods

Histogram-based enhancement methods

The four image enhancement techniques used for comparison are described below.

Adaptive gamma correction with weighting distribution (AGCWD)

AGCWD is a histogram modification approach for enhancing and correcting images. The main attribute that differentiates this approach from the power-law transformation is the automatic selection of the gamma factor based on a weighting distribution. Specifically, the authors39 used a weighting distribution to compute the cumulative distribution function and thereby specify the gamma parameter. This hyperparameter was set to 1, the default value suggested by the authors in the original work; this setting brightens low pixel intensities while leaving high intensities intact.
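A hedged sketch of the AGCWD transform for an 8-bit grayscale image, following the formulation above with the weighting exponent left at 1, is given below; this is an illustrative reimplementation, not the authors' original code.

```python
import numpy as np

def agcwd(img, alpha=1.0):
    """Adaptive gamma correction with weighting distribution (sketch, uint8 input)."""
    pdf = np.bincount(img.ravel(), minlength=256) / img.size
    pdf_w = pdf.max() * ((pdf - pdf.min()) / (pdf.max() - pdf.min())) ** alpha
    cdf_w = np.cumsum(pdf_w) / pdf_w.sum()
    gamma = 1.0 - cdf_w                           # per-intensity gamma factor
    levels = np.arange(256) / 255.0
    lut = np.round(255.0 * levels ** gamma).astype(np.uint8)
    return lut[img]                               # dark pixels brightened most
```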

Adaptive gamma correction with color preserving framework (AGCCPF)

AGCCPF employs a two-step processing approach. First, it improves the contrast and brightness of a given image by modifying the probability distribution of pixel intensities and then applying gamma correction. In the second stage, it restores color using a color-preserving framework. This method is an upgrade of AGCWD due to its ability, as the authors claim40, to retain information better than AGCWD. A histogram modification function controls the level of contrast enhancement by combining the input histogram with the uniform histogram (i.e. that of a histogram-equalized image) to produce the resultant histogram. The difference between the input, uniform and resultant histograms for an image is presented in Fig. S5 in the Supplementary Materials.

Range limited Bi-histogram equalization (RLBHE)

RLBHE considers both contrast enhancement and brightness preservation as valuable factors in the output image. First, Otsu’s41 single-threshold approach is employed to threshold the histogram, achieving better contrast enhancement while avoiding over-enhancement. Second, the range of the equalized output is limited to ensure that the mean output brightness is almost equal to the mean input brightness, preserving the initial information of the input image. Third, each partition of the histogram is equalized independently.
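The core idea can be sketched as a bi-histogram equalization split at Otsu's threshold; the brightness-preserving range-limiting step is omitted here for brevity, so this is a simplified illustration rather than the full RLBHE.

```python
import numpy as np
from skimage.filters import threshold_otsu

def bi_histogram_equalize(img):
    """Equalize each Otsu-split histogram partition within its own range (uint8)."""
    t = int(threshold_otsu(img))
    out = img.copy()
    for lo, hi, mask in [(0, t, img <= t), (t + 1, 255, img > t)]:
        vals = img[mask]
        hist = np.bincount(vals, minlength=256).astype(float)
        cdf = np.cumsum(hist) / hist.sum()
        lut = np.round(lo + (hi - lo) * cdf).astype(np.uint8)
        out[mask] = lut[vals]
    return out
```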

Contrast limited adaptive histogram equalization (CLAHE)

The CLAHE algorithm22 is considered more stable for contrast enhancement operations due to its local application on the frame. This approach comprises two steps. First, the initial image is divided into the 8 × 8 windows that compose it. Second, the histogram equalization algorithm is applied to each window independently of the others. In this way, the histogram equalization method does not take the global features of the image into consideration and, therefore, optimizes the intensity levels within a neighborhood around the center of each window. Compared to the aforementioned techniques, CLAHE has several advantages, the main one being the reduced contribution of outliers. In histogram equalization, outliers play an important role, as the tuning of the histogram is affected by extreme values. With the partition into windows, however, extreme values are scaled within a neighborhood region and are therefore smoothed.
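With scikit-image (one of the packages listed in the Methods), CLAHE can be applied as follows; the kernel size is chosen here so that a 256 × 256 frame is covered by an 8 × 8 grid of windows, and the clip limit is the library default.

```python
import numpy as np
from skimage import exposure

frame = np.random.rand(256, 256)        # placeholder T2w slice, values in [0, 1]
enhanced = exposure.equalize_adapthist(
    frame, kernel_size=(32, 32),        # 8 x 8 grid of 32-pixel windows
    clip_limit=0.01)                    # default contrast-clipping limit
```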

The proposed image enhancement method: RACLAHE

Conventionally, the CLAHE algorithm is applied globally to the entire frame of the image. The algorithm applies the histogram equalization method within a close neighborhood around a central pixel. Whereas histogram equalization is applied frame-wise, the CLAHE algorithm is applied patch-wise, further enhancing the contrast of sub-regions within the frame. The proposed RACLAHE method utilizes the CLAHE algorithm along with the steps described below to transform selected features so that they are more interpretable for the model. The pipeline is visually presented in Fig. 5, and the RACLAHE operation proceeds as follows.

Figure 5

The RACLAHE algorithm. From the initial 256 × 256 frame, an area of \(\{134\pm 15\} \times \{134\pm 15\}\) pixels containing the region of interest is selected (a). This reduced dimensional space provides more targeted features to be enhanced and simplifies the problem, while introducing some bias to the model regarding the area from which to identify features (b).

Let \(Z\) be the space in which the intensity feature values of each frame lie, \(F{M}_{Z}\in Z, 0\le F{M}_{Z}\le Frame\, width\times Frame\, height\). Each frame is passed through a U-Net-like DL structure9,42 that proposes a reduced-size area containing the prostate gland. Specifically, the initial space \(Z\) is reduced to a subspace \(Q\subset Z\), and features \(F{M}_{Q}\in Q\) are selected by reducing the dimensionality of the \(Z\) space to the \(Q\) space. The relation between these two spaces is presented in Eq. (6) and the operation is illustrated in Fig. 5a.

$$Q\simeq 0.25 Z\pm 0.12 Z.$$
(6)

The frame is then divided into two subframes with features \(F{M}_{Q}\) and \(F{M}_{Z}-F{M}_{Q}\), corresponding to the proposed area that contains the whole gland and to the remaining area, respectively; this process is presented in Fig. 5b. The CLAHE algorithm is then applied to the features \(F{M}_{Q}\) (proposed area). Specifically, the \(F{M}_{Q}\) pixel intensity features are divided into \(8\times 8\) patches, the number of which in each \(F{M}_{Q}\) is approximately 196. Then, the probability of occurrence of each unique pixel intensity value \({P}^{patch}({i}_{F{M}_{Q}})\) is given as:

$${P}^{patch}\left({i}_{F{M}_{Q}}\right)=\frac{Num\left({i}_{F{M}_{Q}}\right)}{TotNum},0\le {i}_{F{M}_{Q}}\le L{D}^{patch},$$
(7)

where \(Num({i}_{F{M}_{Q}})\) is the number of occurrences of pixel intensity \({i}_{F{M}_{Q}}\) within the patch, \(TotNum\) is the total number of pixels in the patch, and \(L{D}^{patch}\) is the range of intensity values inside each patch. Consequently, the cumulative distribution for each patch is calculated:

$$CD{F}^{patch}({i}_{F{M}_{Q}})=\sum\limits _{k=0}^{{i}_{FMQ}}{P}^{patch}\left(k={i}_{F{M}_{Q}}\right).$$
(8)

The histogram-equalized patch is obtained by Eq. (9), making use of Eqs. (7) and (8):

$$EqHis{t}^{patch}=round\left(\left(L{D}^{patch}-1\right)\times {CDF}^{patch}\left({i}_{F{M}_{Q}}\right)\right).$$
(9)
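A worked example of Eqs. (7)–(9) on a single hypothetical patch:

```python
import numpy as np

patch = np.random.randint(0, 256, size=(8, 8))                # placeholder 8 x 8 patch
ld = int(patch.max()) - int(patch.min()) + 1                  # LD^patch: value range
pdf = np.bincount(patch.ravel(), minlength=256) / patch.size  # Eq. (7)
cdf = np.cumsum(pdf)                                          # Eq. (8)
eq_patch = np.round((ld - 1) * cdf[patch]).astype(np.uint8)   # Eq. (9)
```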

The enhanced area of Fig. 5b is constructed from the aggregation of the histogram-equalized patches and is obtained as:

$$F{{M}_{Q}}^{trans}=Enhanced\, Area=\sum \limits_{t=0}^{Patches}EqHis{t}^{t} ,$$
(10)

where \(Patches\) denotes the total number of patches within \(F{M}_{Q}\) and \(F{{M}_{Q}}^{trans}\) indicates the enhanced area. Finally, the resulting RACLAHE image, shown in Fig. 5b, is given by Eq. (11):

$$RACLAHE =F{{M}_{Q}}^{trans}+F{M}_{Z}-F{M}_{Q}.$$
(11)
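An end-to-end sketch of the pipeline is given below. The U-Net-like region proposal model is stubbed with a fixed central crop of \(134\times 134\) pixels, and scikit-image's CLAHE stands in for the per-patch equalization of Eqs. (7)–(10); both are illustrative assumptions.

```python
import numpy as np
from skimage import exposure

def region_proposal(frame):
    """Placeholder for the U-Net-like model: a fixed central 134 x 134 crop."""
    size = 134
    r0 = (frame.shape[0] - size) // 2
    c0 = (frame.shape[1] - size) // 2
    return r0, c0, size

def raclahe(frame):
    r0, c0, size = region_proposal(frame)               # propose ROI (Fig. 5a)
    roi = frame[r0:r0 + size, c0:c0 + size]             # FM_Q
    roi_enh = exposure.equalize_adapthist(roi, kernel_size=(8, 8))  # Eqs. (7)-(10)
    out = frame.copy()                                  # FM_Z - FM_Q left intact
    out[r0:r0 + size, c0:c0 + size] = roi_enh           # Eq. (11): recombine
    return out

enhanced = raclahe(np.random.rand(256, 256))            # placeholder frame
```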

Model development

Five CNN algorithms were implemented to evaluate the impact of the preprocessing methods on segmenting the prostate and the prostatic zones, namely U-Net9, ResUNet43, U-Net3+44, U-Net++45 and USE-NET46; a brief description of each is given in the Supplementary Materials. For model training, the Prostate-X dataset was split into training and validation sets, with 85% of the image frames used for training and the remaining 15% for validation. The splits were kept the same for all the experiments run in this study (i.e. for the different models and preprocessing methods). The initial learning rate was kept at 0.0001, the batch size and number of epochs were 16 and 120, respectively, and the Adam optimizer was used for weight updating throughout the training process. The sigmoid focal cross-entropy47 was utilized as the loss function due to its effectiveness in handling unbalanced data. The early stopping technique was used to stop model training when the validation performance stopped improving. The segmentation performance of each model and preprocessing method was evaluated externally on an independent dataset (Prostate 3-T). The GPU used for the experiments was an NVIDIA Quadro P6000 with driver version 441.66, and the Python packages used were numpy 1.21.2, keras-unet-collection 0.1.11, scikit-image 0.18.3, SciPy 1.7.1, Tensorflow 2.2.0 and Tensorflow-addons 0.11.2. The original code and the docker image for RACLAHE and all the experiments are available from the authors upon request.
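A sketch of this training configuration in Keras is shown below; the model builder is a trivial stand-in for the five architectures, the EarlyStopping patience is left at its default as an assumption, and the fit call is commented out since the data arrays are not defined here.

```python
import tensorflow as tf
import tensorflow_addons as tfa

def build_model(input_shape=(256, 256, 1)):
    """Trivial stand-in for any of the five segmentation architectures."""
    inputs = tf.keras.Input(shape=input_shape)
    x = tf.keras.layers.Conv2D(16, 3, padding="same", activation="relu")(inputs)
    outputs = tf.keras.layers.Conv2D(1, 1, activation="sigmoid")(x)
    return tf.keras.Model(inputs, outputs)

model = build_model()
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss=tfa.losses.SigmoidFocalCrossEntropy())   # focal loss47
early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss",
                                              restore_best_weights=True)
# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           batch_size=16, epochs=120, callbacks=[early_stop])
```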

Performance assessment

Several metrics were implemented to thoroughly evaluate the performance of the proposed method and of the existing histogram modification methods. Specifically, the common segmentation metrics DS, REI, Sensitivity, BA, HD and ASD were computed, owing to the complementary information they provide about model performance. DS and REI are metrics related to the overlap between the predicted and the true annotation of the object of interest. Sensitivity and BA, on the other hand, provide information about the ability of the model to identify an area of interest under high class imbalance between background and foreground pixels. HD and ASD are one-dimensional measurements that connect the results with real-world units (SI system): both quantify the distance between two points, one on the ground truth boundary and the other on the predicted boundary. Herein, the 95% HD was employed to avoid extreme values that may not be indicative of real model performance. All the performance metrics were computed on the external testing dataset. The Wilcoxon rank-sum test (two-sided) was used to compare the proposed RACLAHE technique with all the other methods, and a p-value ≤ 0.05 was considered to indicate a significant difference in performance.
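Two of the reported metrics and the statistical test can be sketched as follows; boundary extraction via erosion and isotropic 1 mm pixel spacing are simplifying assumptions, and the per-case score arrays are random placeholders.

```python
import numpy as np
from scipy.ndimage import binary_erosion
from scipy.spatial import cKDTree
from scipy.stats import ranksums

def dice(gt, pred):
    """Dice score between two binary masks."""
    inter = np.logical_and(gt, pred).sum()
    return 2.0 * inter / (gt.sum() + pred.sum())

def hd95(gt, pred):
    """95th-percentile Hausdorff distance between two non-empty binary masks."""
    def boundary(m):
        m = m.astype(bool)
        return np.argwhere(m & ~binary_erosion(m))   # surface pixels
    a, b = boundary(gt), boundary(pred)
    d_ab = cKDTree(b).query(a)[0]                    # GT boundary -> prediction
    d_ba = cKDTree(a).query(b)[0]                    # prediction -> GT boundary
    return np.percentile(np.concatenate([d_ab, d_ba]), 95)

# Two-sided Wilcoxon rank-sum test on per-case scores (placeholder data).
rng = np.random.default_rng(0)
raclahe_ds, other_ds = rng.random(30), rng.random(30)
stat, p_value = ranksums(raclahe_ds, other_ds)
```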