Fully automated quantification of in vivo viscoelasticity of prostate zones using magnetic resonance elastography with Dense U-net segmentation

Magnetic resonance elastography (MRE) for measuring viscoelasticity heavily depends on proper tissue segmentation, especially in heterogeneous organs such as the prostate. Using trained network-based image segmentation, we investigated whether MRE data suffice to extract anatomical and viscoelastic information for automatic tabulation of zonal mechanical properties of the prostate. Overall, 40 patients with benign prostatic hyperplasia (BPH) or prostate cancer (PCa) were examined with three magnetic resonance imaging (MRI) sequences: T2-weighted MRI (T2w), diffusion-weighted imaging (DWI), and MRE-based tomoelastography, yielding six independent sets of imaging data per patient (T2w, DWI, apparent diffusion coefficient, MRE magnitude, shear wave speed, and loss angle maps). Combinations of these data were used to train Dense U-nets with manually segmented masks of the entire prostate gland (PG), central zone (CZ), and peripheral zone (PZ) in 30 patients and to validate them in 10 patients. Dice score (DS), sensitivity, specificity, and Hausdorff distance were determined. We found that segmentation based on MRE magnitude maps alone (DS, PG: 0.93 ± 0.04, CZ: 0.95 ± 0.03, PZ: 0.77 ± 0.05) was more accurate than magnitude maps combined with T2w and DWI_b (b = 1400 s/mm2 diffusion-weighted images) (DS, PG: 0.91 ± 0.04, CZ: 0.91 ± 0.06, PZ: 0.63 ± 0.16) or T2w alone (DS, PG: 0.92 ± 0.03, CZ: 0.91 ± 0.04, PZ: 0.65 ± 0.08). Automatically tabulated MRE values were not different from ground-truth values (P > 0.05). In conclusion, MRE combined with Dense U-net segmentation allows tabulation of quantitative imaging markers without manual analysis and independent of other MRI sequences and can thus contribute to PCa detection and classification.


Methods
Subjects. The imaging data used in this study were analyzed retrospectively; they had been acquired in a previously reported population of BPH and PCa patients who underwent PI-RADS-compatible mpMRI and MRE 12 . Our local ethical review board approved this study, and all patients gave written informed consent. Forty patients were included, 26 with a PI-RADS score of 2 consistent with BPH. The remaining fourteen men underwent biopsy with subsequent Gleason scoring. The decision to perform a biopsy was made by the referring urologist based on the clinical assessment and the PI-RADS score. These fourteen men had a suspicious focal lesion (PI-RADS 4: n = 2, maximum tumor diameters of 10 and 11 mm; PI-RADS 5: n = 12, maximum tumor diameters ranging from 16 to 66 mm). In these 14 men, prostate biopsy revealed PCa Gleason scores of 3 + 3 (ISUP class 1, n = 1), 3 + 4 (ISUP class 2, n = 2), 4 + 3 (ISUP class 3, n = 2), 4 + 4 (ISUP class 4, n = 4), and ≥ 4 + 5 (ISUP class 5, n = 5). From the 40 volume datasets of these patients, 30 were randomly selected for training. Each volume dataset included 25 slices, yielding a total of 25 (slices) × 30 (volumes) × 1 (input combination) = 750 training images for each of the 14 input combinations of the IMs and 25 (slices) × 30 (volumes) × 14 (all input combinations) = 10,500 images for the UM. In the latter case, all 14 input combinations were included in a single large training set. The remaining 10 datasets, with a total of 25 (slices) × 10 (volumes) = 250 slices, were used for testing.

MR Imaging. The imaging data were acquired on a 3-Tesla MRI scanner (Magnetom Skyra; Siemens Healthineers, Erlangen, Germany) using both the 18-channel phased-array surface coil and a spine array coil. All patients underwent a clinically indicated mpMRI examination of the prostate in accordance with PI-RADS version 2 (2015) 42 , which included T2-weighted sequences (T2w) in the axial and coronal planes and DWI. Here we used b-values of 0, 50, 500, 1000, and 1400 s/mm2 (b = 1400 s/mm2 was used as the diffusion-weighted image, henceforth referred to as DWI_b). ADC maps were automatically generated by the MRI scanner via monoexponential fitting of all b-values. After the clinical MRI examination, patients underwent multifrequency MRE-based tomoelastography using a single-shot spin-echo sequence with three excitation frequencies of 60, 70, and 80 Hz 12 . MRE magnitude images (mag) display signal intensities which are T2-weighted with a strong T2* effect. All imaging parameters are summarized in Table 1. MRE data processing was based on multifrequency wavefield inversion for generating frequency-compounded SWS and ϕ maps. SWS was reconstructed by single-derivative finite-difference operators (k-MDEV) 43 , while ϕ was obtained by second-order, Laplacian-based, direct inversion (MDEV) 44 . Although k-MDEV is noise-resistant and well suited for SWS visualization, it is limited regarding the quantification of viscosity-related parameters such as ϕ 43 . Hence, MDEV was used for ϕ recovery. Both k-MDEV and MDEV pipelines are publicly available at bioqic-apps.charite.de 45 . Two experienced radiologists (PA with more than 10 years and FB with 2 years of experience in PI-RADS-based PCa detection and classification) segmented and revised the prostate and its zones based on T2w-MRI, DWI, and MRE. The resulting masks of the CZ (which includes the transition zone), PZ, and PG were used as ground truth (GT) for training and validation of CNNs. Overall, six independent sets of imaging data were further used for segmentation analysis: T2w, DWI_b, ADC, mag, SWS, and ϕ.
Image preparation and augmentation. Since T2w, DWI, and MRE images had different resolutions, all images were resampled to a common in-plane resolution of 0.5 × 0.5 mm (isotropic pixel edge length). Images of the same size and resolution were obtained by positioning a cropping window with a size of 256 × 256 pixels at the center of each 3D imaging volume. For image augmentation, 9 random elastic deformations were applied to the original images to increase the number of training sets. As described in 46 , elastic deformation is driven by two main parameters: σ, which represents the elasticity coefficient, and α, a scaling factor that controls the amplitude of the deformation. The two parameters were set to α = 21 and σ = 512. Examples of augmented images are provided in Fig. 2.
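The augmentation step above can be sketched with scipy following the random-displacement-field recipe of the cited reference 46: smooth a random field with a Gaussian, scale it, and resample the image along the displaced coordinates. The exact mapping of the reported α and σ onto the amplitude and smoothing arguments below is an assumption, and the demo parameter values are illustrative, not the study's settings.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def elastic_deform(image, alpha, sigma, rng=None):
    """Random elastic deformation of a 2D image.

    alpha scales the displacement amplitude; sigma is the Gaussian
    smoothing width of the random displacement field (in pixels).
    """
    rng = np.random.default_rng(rng)
    # Smooth per-axis random displacement fields, then scale by alpha.
    dy = gaussian_filter(rng.uniform(-1, 1, image.shape), sigma, mode="constant") * alpha
    dx = gaussian_filter(rng.uniform(-1, 1, image.shape), sigma, mode="constant") * alpha
    y, x = np.meshgrid(np.arange(image.shape[0]), np.arange(image.shape[1]), indexing="ij")
    # Resample the image at the displaced coordinates (bilinear interpolation).
    coords = np.array([y + dy, x + dx])
    return map_coordinates(image, coords, order=1, mode="reflect")

image = np.linspace(0.0, 1.0, 64 * 64).reshape(64, 64)
augmented = elastic_deform(image, alpha=21.0, sigma=5.0, rng=42)
print(augmented.shape)  # (64, 64)
```

Repeating the call with different random seeds yields the 9 independent deformations per image described above.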
Dense U-net. A previously developed Dense U-net 40 based on the state-of-the-art U-net convolutional network architecture 38 was used. The network comprised two main parts, an encoder and a decoder, between which skip connections linked feature maps of similar resolution. The normal stacks of convolutional layers at each stage were substituted with two densely connected blocks. Each dense block consisted of four convolutional layers with 3 × 3 kernel size followed by a transitional layer. The structure of the network is illustrated in Fig. 3.
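The dense connectivity pattern of these blocks can be illustrated with a minimal numpy sketch: each layer receives the concatenation of the block input and all previous layer outputs. Here `toy_layer` is a hypothetical stand-in for a 3 × 3 convolution (it just reduces channels and re-expands them), and the channels-first layout is an illustrative assumption, not the published implementation.

```python
import numpy as np

def toy_layer(x, growth=4):
    """Stand-in for a conv layer: maps any number of input channels
    to `growth` output channels (here via scaled channel means)."""
    return np.stack([x.mean(axis=0) * (k + 1) for k in range(growth)])

def dense_block(x, n_layers=4, growth=4):
    """Dense connectivity: layer i sees the concatenation of the block
    input and outputs of layers 1..i-1; the block output concatenates all."""
    features = [x]
    for _ in range(n_layers):
        y = toy_layer(np.concatenate(features, axis=0), growth)
        features.append(y)
    return np.concatenate(features, axis=0)

x = np.ones((8, 16, 16))   # 8 input channels, 16 x 16 feature map
out = dense_block(x)
print(out.shape)           # (24, 16, 16): 8 input + 4 layers * 4 new channels
```

The channel count grows linearly with depth (8 + 4 × 4 = 24), which is the feature-reuse property that motivates replacing plain convolutional stacks with dense blocks.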
Network training. Stochastic gradient descent with a learning rate of 10⁻³, a momentum of 0.9, and a decay of 10⁻⁶ was used in this study to train and test all proposed models. We used cross-entropy (CE) loss as the main loss function and performed pixel-wise comparison of ground-truth images and the resulting masks. Two main approaches were tested: (1) Individual models (IMs), where we trained and tested a separate model for each combination of sequences/maps or each individual sequence/map input. (2) Unified model (UM) without MRE-MRI co-registration (i.e., a single model for T2w, DWI_b, ADC, mag, SWS, and ϕ images, trained on masks that were manually and separately segmented from T2w and mag images), yielding a re-arranged dataset in which all image combinations were taken into account during training, validation, and testing. Fourteen input combinations were used for training and testing the IMs and the UM, more specifically mag+SWS+ϕ, SWS+mag, SWS+ϕ, mag+ϕ, mag, ϕ, SWS, T2w+ADC+DWI_b, T2w+ADC, T2w+DWI_b, ADC+DWI_b, T2w, ADC, and DWI_b. While each of these input combinations was used to train an individual model in the IM approach, in the UM, each input combination represented a subset and was combined with all other subsets to generate a single, large dataset of all 14 input combinations for training and testing. Figure 4 shows how imaging data were used as inputs for the IMs and the UM.
Evaluation. We evaluated all resulting segmentations against manually delineated ground-truth masks using standard evaluation statistics: mean Dice score (DS) ± standard deviation (SD), sensitivity (Sen), specificity (Spc), and Hausdorff distance (HD) as a contour consistency measure, with a t-test and p < 0.05 indicating a statistically significant difference.
The Dice score is a similarity measure that quantifies the overlap between predicted masks and ground-truth labels, allowing straightforward comparison of segmentation performance 47 .
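These overlap-based metrics on binary masks can be sketched in a few lines of numpy (an illustrative sketch, not the code used in the study):

```python
import numpy as np

def dice_score(pred, gt, eps=1e-7):
    """Dice similarity coefficient: 2|A ∩ B| / (|A| + |B|)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)

def sensitivity(pred, gt):
    """True-positive rate: TP / (TP + FN)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    return np.logical_and(pred, gt).sum() / gt.sum()

def specificity(pred, gt):
    """True-negative rate: TN / (TN + FP)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    return np.logical_and(~pred, ~gt).sum() / (~gt).sum()

pred = np.array([[1, 1], [0, 0]])
gt = np.array([[1, 0], [1, 0]])
print(dice_score(pred, gt))  # ≈ 0.5: one overlapping pixel, 2 + 2 foreground
```

The small `eps` keeps the score defined when both masks are empty; a Dice score of 1.0 means perfect overlap, 0.0 none.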
The Hausdorff distance is the maximum distance between two edge points from two different sets (predicted mask and ground truth). It is expressed in millimeters (mm) and was defined as follows:

HD(A, B) = max{ max_{i∈A} min_{j∈B} d(i, j), max_{i∈B} min_{j∈A} d(i, j) }

where d(i, j) is the Euclidean distance between two points from different sets A and B.
Implementation details. All models were trained on individual or combinations of MRI/MRE images in a slice-wise fashion, where all images had a size of 256 × 256 and an in-plane resolution of 0.5 × 0.5 mm. We used the SimpleITK library for image preprocessing 48,49 and the Keras library (version 2.4.3) with the TensorFlow back-end (version 2.3.1, Google Inc.) 50 as the main library for model implementation, training, and testing.
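Under this definition, the symmetric Hausdorff distance between two binary masks can be computed with scipy by extracting edge points via morphological erosion and scaling pixel indices by the 0.5 × 0.5 mm spacing (an illustrative sketch, not the study's implementation):

```python
import numpy as np
from scipy.ndimage import binary_erosion
from scipy.spatial.distance import directed_hausdorff

def edge_points(mask, spacing):
    """Coordinates (in mm) of a mask's edge pixels (mask minus its erosion)."""
    mask = mask.astype(bool)
    edge = mask & ~binary_erosion(mask)
    return np.argwhere(edge) * np.asarray(spacing)

def hausdorff_mm(mask_a, mask_b, spacing=(0.5, 0.5)):
    """Symmetric Hausdorff distance between two binary masks, in mm:
    the maximum of the two directed Hausdorff distances."""
    a, b = edge_points(mask_a, spacing), edge_points(mask_b, spacing)
    return max(directed_hausdorff(a, b)[0], directed_hausdorff(b, a)[0])

a = np.zeros((10, 10), bool); a[2:6, 2:6] = True
b = np.zeros((10, 10), bool); b[2:6, 4:8] = True
print(hausdorff_mm(a, b))  # 1.0: a 2-pixel shift at 0.5 mm pixel spacing
```

Unlike the Dice score, which saturates for large masks, the Hausdorff distance is sensitive to single outlying boundary points, which is why it is used here as a contour consistency measure.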
All models were trained on a Titan Xp GPU with 12 GB video memory (CUDA version 10.1) and a batch size of 25 images. Training time of the Dense U-net was around 8.5 hours. The computation time during testing for a single 3D volume (of around 25 slices) was approximately 1.5 s.

Ethical approval. All procedures and experimental protocols were reviewed and approved by the Ethics Committee of Charité – Universitätsmedizin Berlin. All methods were carried out in accordance with relevant guidelines and regulations.

Results
This section presents the performance of our Dense U-net applied to MRI and MRE images. The Dense U-net was applied in two forms, the UM and IMs, which were evaluated using DS, HD, and MRE values of the automatically segmented prostate zones. Figures 5 and 6 illustrate segmentation results for each of the two training approaches. The boundary of automatically segmented regions closely follows the boundary of manual segmentations.
Unlike IMs, the UM can process any map or combination of maps, which facilitates clinical applications. Compared with IMs, the UM had higher DS in PG and CZ and lower DS in PZ. Assessed for the different prostate zones, DS of the UM ranged from 0.77 ± 0.11 (DWI_b) to 0.92 ± 0.04 (mag, SWS, ϕ), from 0.65 ± 0.08 (DWI_b) to 0.86 ± 0.06 (mag, SWS, ϕ), and from 0.28 ± 0.10 (DWI_b) to 0.57 ± 0.05 (mag) for PG, CZ, and PZ, respectively. The smallest HD of 1.15, 1.45, and 1.81 mm for PG, CZ, and PZ, respectively, was found for mag. Images are presented in Fig. 6, and the results are summarized in Table 3. Unlike IMs, the UM showed a significantly higher DS when using MRE data in comparison with MRI data (p < 0.001, p < 0.05, and p < 0.05 for PG, CZ, and PZ, respectively). Overall, IMs yielded significantly more accurate results than the UM in terms of both DS and HD, with p-values < 0.01 for PG, CZ, and PZ. This is shown in Fig. 7, illustrating that DS was higher and HD lower for IMs than for the UM. Figure 8 shows a case where model segmentations were inaccurate compared with the ground-truth masks. Quantitative analysis of pixel values in PG, CZ, and PZ, for both IMs and the UM, showed no significant difference (p > 0.05) between ground-truth and automated prostate segmentation. Group mean values are presented in Tables 4 and 5.

Discussion
Our study shows that Dense U-net segmentation of prostate zones based on MRE data allows automated tabulation of quantitative imaging markers for the total prostate and for the central and peripheral zones.
Our results show that IMs performed excellently across all maps and sequences, with high values for DS and low values for HD. In our experiments, segmentation was most reliable when we used T2w and MRE magnitude images, which provide sufficiently rich anatomical detail for automated prostate segmentation, while quantitative parameter maps such as SWS, ϕ, and ADC lack such detail. Figure 9 depicts a variety of maps in one patient, demonstrating that the anatomy of prostate boundaries is well preserved on T2w and MRE magnitude images while it is less clearly visible on ADC, DWI_b, SWS, and ϕ maps.
For IMs, we found no significant difference between DS of MRE and DS of MRI in PG and PZ, while in CZ, DS of MRE was higher than that of MRI. This may be explained in part by blurring due to the larger slice thickness in MRI (3 mm) than MRE (2 mm). Furthermore, MRI slice volumes covered the entire prostate gland, the seminal vesicles, and the periprostatic tissues, while MRE volumes were solely focused on the prostate gland given the smaller slice thickness. In contrast to IMs, the UM could process any input combination of MRI/MRE maps without a need for retraining or fine-tuning the network. Similar to IMs, the UM also favored input combinations with rich anatomical detail, such as T2w and magnitude images. In all experiments, HD of the UM ranged from 1.15 to 5.29 mm, which is significantly inferior to IMs. Moreover, by showing the changes in DS and HD for all possible input combinations, Fig. 7 illustrates that both IMs and the UM performed well, with IMs being slightly better. Nevertheless, given the robustness of the UM combined with its good segmentation results, it is recommended to primarily apply IMs and to use the UM as a second opinion for automated prostate segmentation.

Figure 8. Illustration of inaccurate segmentations: left column, ground truth; middle column, predicted masks; right column, overlap between ground truth and predicted masks. Rows from top to bottom show images of PG, PZ, and CZ, respectively.

Table 4. Values obtained in PG, CZ, and PZ regions of interest using ground-truth and predicted masks. "mask" and "predicted" in the column headers refer to the ground truth and model output, respectively. T-test results between ground-truth and predicted masks for PG, CZ, and PZ are reported in the right part of the table (based on segmentation results for the IMs using magnitude images as input).

Table 5. Values obtained in PG, CZ, and PZ regions of interest using ground-truth and predicted masks. "mask" and "predicted" in the column headers refer to the ground truth and model output, respectively. T-test results between ground-truth and predicted masks for PG, CZ, and PZ are also reported in the right part of the table (based on segmentation results from the UM with magnitude images as input).

Figure 9. Different maps of the same patient. Second and fourth rows show zoomed-in images of the first and third rows, respectively. Dotted red and blue lines highlight part of the segmentation for the prostate and PZ, respectively; the rest of the delineation was omitted to show the actual pixel intensities.

Exploiting MRE magnitude images for automatic segmentation instead of high-resolution in-plane T2-weighted MRI had the benefit of not requiring image registration while making full use of the anatomic and viscoelastic information contained in a single MRE dataset. This potentially stabilizes quantitative parameter extraction, as any co-registration artifacts are avoided. As the ultimate proof of valid segmentations, viscoelasticity values averaged within volumes of PG, CZ, and PZ obtained from CNNs were not different from those obtained with manually segmented masks (see supplemental information). Thus, for the first time, we used the information of the magnitude signal in prostate MRE, showing that it was fully sufficient for accurate segmentation, which may greatly enhance MRE of the prostate in the future. Figure 8 shows a case where the model failed to achieve accurate segmentation. Inaccuracies appear to be attributable to under-segmentation and discontinuity. Under-segmentation is visible in both the entire prostate gland (first row) and the CZ (last row), where the model did not properly locate and delineate boundaries. Discontinuity can be seen in the PZ (middle row), where the model produced a mask with several unconnected neighboring areas. Many factors can contribute to inaccurate segmentation, including boundary ambiguity, partial volume effects, and tissue heterogeneity. Therefore, radiologists typically use 3D information, which is subjectively interpolated by eye to the ambiguous image slice. However, even including adjacent slices for training in a 2.5D approach 51 or using full 3D models does not necessarily lead to better segmentation performance, owing to partial volume effects 52 .
We will implement the proposed CNN-based segmentation on a server currently used for MRE data processing (https://bioqic-apps.charite.de) in order to make it available to the research community. Our ultimate aim is to accomplish fully automated tabulation of prostate SWS and ϕ values based on multifrequency MRE data. Once installed, the Dense U-net will be trained on other sets of MRE data acquired with other scanners and MRE sequences in order to generalize its applicability for prostate segmentation. In a next step, we aim at automated segmentation of other organs, including the liver, kidneys, and pancreas, based on MRE magnitude images.
Although MRE of the prostate is not yet part of standard-of-care imaging, the proposed segmentation tool may help to further integrate this innovative imaging marker into routine clinical practice. We are confident that MRE will add diagnostic value to the PI-RADS system in the future, as shown in 53 . In addition, when enhanced with the segmentation option presented here, MRE may provide quantitative values on the mechanical consistency of prostate subzones as a reference for fully automated tumor classification.
Our study has limitations, including the small number of patients, which is attributable to the fact that we performed a proof-of-concept exploration of CNNs for MRE-based prostate segmentation. For further improvement of segmentation quality and generalization to other domains, future studies should use multicenter MRI/MRE data acquired with different imaging protocols. Finally, we assembled all data for training in a way that avoids image alignment and registration procedures. While this approach ensured robust results based on single sets of data, we cannot rule out that combinations of MRI and MRE images (e.g., magnitude and T2w) would have resulted in slightly better DS and HD scores.
In summary, magnitude images of prostate MRE were used for automated segmentation of prostate subzones based on trained CNNs. As such, MRE data provide all information needed for extracting viscoelasticity parameters and for delineating the prostate regions for which those values are of interest for tabulation and automated classification of suspicious prostate lesions. The Dense U-net achieved excellent segmentation results using both IMs and the UM and yielded MRE parameters that were not different from ground truth. Compared with standard image segmentation based on T2w images, MRE magnitude images proved sufficient, as demonstrated by excellent Dice score and Hausdorff distance results. Quantitative maps of multiparametric MRI, including those of DWI and viscoelasticity, did not provide adequate anatomic information for learning-based prostate segmentation. Prostate MRE combined with Dense U-nets allows tabulating quantitative imaging markers without manual analysis and independent of other MRI sequences and thus has the potential to contribute to imaging-based PCa detection and classification.