## Introduction

Prostate cancer (PCa) is one of the most commonly diagnosed malignant neoplasms among men1. Multiparametric magnetic resonance imaging (mpMRI) has steadily gained importance for both timely diagnosis and accurate characterization of PCa lesions, which play a key role in every step of PCa patient management2,3.

With the goal of standardizing the acquisition and reporting of prostate mpMRI examinations, the European Society of Urogenital Radiology (ESUR) developed the Prostate Imaging-Reporting and Data System (PI-RADS) in 2013 and then updated it in 2015 (PI-RADS v2) and 2019 (PI-RADS v2.1)4.

PI-RADS assessment is based on a 5-point scale expressing the probability that a combination of findings on the mpMRI sequences, namely T2-weighted imaging (T2), diffusion-weighted imaging (DWI), and dynamic contrast-enhanced MRI (DCE-MRI), correlates with the presence of clinically significant cancer for each detected prostatic lesion. The PI-RADS score ranges from 1 to 5, indicating a very low and a very high likelihood, respectively, that a lesion is malignant. The PI-RADS classification has played a crucial role in PCa management since its development and has proven to be a powerful tool for identifying prostatic lesions and characterizing their aggressiveness5,6,7,8.

Although encouraging results have been reported in the literature on the role of PI-RADS in the diagnosis and characterization of PCa, the system is still affected by several limitations, primarily associated with the interpretation of PI-RADS category 3 lesions, namely those lesions on prostate MRI that are termed ‘intermediate’ or ‘equivocal on the presence of clinically significant cancer’4.

In this context, we aimed to investigate the usefulness of radiomics for the detection of PCa (Gleason score, GS ≥ 6) in PI-RADS category 3 lesions and in PI-RADS 3 lesions upgraded to PI-RADS 4 (upPI-RADS 4) in the peripheral zone (PZ).

## Methods and materials

### Patient population

We performed a retrospective analysis of all mpMRI data from patients who underwent mpMRI of the prostate between April 2013 and September 2018 because of an elevated prostate-specific antigen (PSA) level and/or clinical suspicion of PCa, followed by biopsy. mpMRI images and histopathology records were collected at H.S. Maria delle Grazie, Italy, and informed consent was given before the magnetic resonance (MR) examination24. The study was conducted in accordance with the Declaration of Helsinki, and the study protocol was approved by the Ethics Committee of the Istituto Nazionale Tumori “Fondazione G. Pascale” (protocol number 1/20). PI-RADS 1, 2 and 5 lesions were excluded. Examinations in which a PI-RADS 3 lesion was present together with a PI-RADS 4 or 5 lesion were also excluded. Biopsy results were used as the gold standard.

### MRI protocol

Routine clinical mpMRI acquisition includes T2, DCE-MRI, and DWI. The DWI protocol includes an apparent diffusion coefficient (ADC) map generated at the time of acquisition. Patients were injected with the contrast agent gadoteridol (Gd-HP-DO3A; ProHance, Bracco Diagnostics, Princeton, NJ, USA) at a dose of 0.1 mL/kg before DCE-MRI acquisition. All patients were imaged on a MAGNETOM Avanto scanner (Siemens Healthcare, Erlangen, Germany) at 1.5 T with both an endorectal coil and a phased-array pelvic coil24. More details on the technical parameters of the MRI sequences are given in Supplementary Table S1.

### Biopsy protocol

All prostatic biopsies were transrectal ultrasound (TRUS)-guided and performed under anesthesia using an 18-gauge tru-cut needle. Each patient underwent both systematic biopsy, with an average of 12 random samples of the entire prostate gland, and targeted biopsy, with at least three samples taken from each lesion identified by MRI. The number of random samples could vary with the size of the prostate gland, and the number of targeted samples with the size of each lesion. Targeted sampling was performed with MRI/TRUS fusion, using either the cognitive technique or dedicated software coupled with ultrasound platforms from various companies24. The gross description included the number and core lengths of the needle biopsies. The specimens were fixed in 10% buffered formalin and routinely processed. Thin sections of four microns were cut and stained with hematoxylin and eosin (H&E). Additional sections were cut for possible immunohistochemical staining to demonstrate the loss of basal cells in small foci of cancer (p63 and high-molecular-weight keratin), combined with antibodies against markers overexpressed in prostatic cancer (anti-AMACR/p504S). One senior pathologist (with more than 10 years of experience in prostate specimen interpretation), who was blinded to the MRI reports, reviewed the pathological slides and classified tumors according to the 4th WHO classification, further grading them by Gleason score and Grade Group28,29,30. The final report also included the tumor extent in each needle biopsy and the percentage of core involvement by tumor.

### Image preprocessing and 3D ROI segmentation

ADC images were non-rigidly coregistered to the T2 images using Elastix (v. 4.9.0) to correct for the spatial distortion typical of DWI acquisition. Subtraction DCE-MRI images were all resliced to the T2 images. Two experienced radiologists drew, in consensus, 3D regions of interest (ROIs) on the biopsied PI-RADS 3 and upPI-RADS 4 lesions, while also looking at the coregistered b = 1000 volume. Lesion segmentation was performed on T2 images using in-house software for region labeling. During segmentation, the radiologists were blinded to both the histological results and all clinical information related to the retrospective prostate mpMRI examinations. Prior to radiomic feature extraction, the T2 image intensities were normalized. Specifically, intensities were centered at the mean and scaled by the standard deviation of all gray values in the original image31,32,33,34.

#### Feature selection

Features were z-normalized before feature selection. Specifically, each feature was normalized as $$z=(x-\bar{x})/s$$, where $$x$$, $$\bar{x}$$, and $$s$$ are the feature value, its mean, and its standard deviation, respectively37,38. Because of the relatively small patient sample and the high dimensionality of the feature set, we then performed feature selection to identify the features most related to biopsy outcome and use them to construct the prediction models. Feature selection comprised two steps. In the first step, the feature set was restricted through univariate analysis, using the nonparametric Wilcoxon rank-sum test to assess the statistical significance of each feature with respect to the outcome (PCa vs non-PCa). The significantly different features (p < 0.05) were retained and further reduced in the second step using the Minimum Redundancy Maximum Relevance (mRMR) algorithm. mRMR selects an optimal set of features by considering both their relevance for outcome prediction and the redundancy between them, using mutual information (MI) to measure both. At each step of the mRMR selection process, the feature with the highest predictor importance score is added to the selected feature set; the score is defined as the MI between the outcome and the candidate feature minus the average MI between the candidate feature and the previously selected features39,40. The five features with the highest predictor importance scores were finally used to construct the radiomics prediction models. Feature selection procedures were implemented in MATLAB R2019b (The MathWorks Inc., Natick, MA, USA).
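The greedy mRMR step described above can be illustrated with a minimal Python sketch. It is not the MATLAB implementation used in the study: the function name `mrmr_select`, the precomputed MI values, and the toy numbers are all hypothetical, and the estimation of MI itself is assumed to have been done beforehand.

```python
from statistics import fmean


def mrmr_select(relevance, redundancy, n_select):
    """Greedy mRMR ranking using the MI-difference score described above.

    relevance[i]     : MI between feature i and the outcome (assumed precomputed).
    redundancy[i][j] : MI between features i and j (assumed precomputed).
    Returns a list of (feature index, importance score) in selection order.
    """
    remaining = set(range(len(relevance)))
    selected = []
    while remaining and len(selected) < n_select:
        chosen = [j for j, _ in selected]
        best_idx, best_score = None, float("-inf")
        for i in sorted(remaining):
            # Average MI with the already selected features (0 for the first pick).
            penalty = fmean([redundancy[i][j] for j in chosen]) if chosen else 0.0
            score = relevance[i] - penalty
            if score > best_score:
                best_idx, best_score = i, score
        selected.append((best_idx, best_score))
        remaining.remove(best_idx)
    return selected


# Hypothetical MI values for four features; the diagonal is never used.
relevance = [0.40, 0.38, 0.10, 0.30]
redundancy = [
    [0.00, 0.35, 0.02, 0.05],
    [0.35, 0.00, 0.03, 0.04],
    [0.02, 0.03, 0.00, 0.01],
    [0.05, 0.04, 0.01, 0.00],
]
ranking = mrmr_select(relevance, redundancy, n_select=3)
```

In this toy example, feature 1 is almost as relevant as feature 0 but highly redundant with it, so the less relevant but more independent feature 3 is picked second.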

#### Multivariable model building and analysis

For each classification task, the reduced feature set was used to build logistic regression models of order 1 to 5 that would best predict the presence of PCa, using an imbalance-adjusted bootstrap resampling (IABR) approach on 1000 bootstrap samples41. Specifically, 1000 bootstrap samples were randomly drawn with replacement from the available dataset and used as training sets, while the instances that did not appear in a given bootstrap sample formed the corresponding testing set42. For the imbalance adjustment, each positive instance was replicated a number of times equal to the number of negative instances, and each negative instance a number of times equal to the number of positive instances. This made the probability of picking a positive or a negative instance in the bootstrap sample the same41.
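One possible reading of the resampling scheme above is sketched below in Python. The function name `iabr_split`, the toy labels, and the choice of drawing as many instances as the original dataset contains are assumptions made for illustration; the sketch also does not handle degenerate draws (for example, a training set containing a single class).

```python
import random


def iabr_split(labels, rng):
    """Return (train_idx, test_idx) for one imbalance-adjusted bootstrap draw."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    # Replicate each positive len(neg) times and each negative len(pos) times,
    # so both classes fill len(pos) * len(neg) slots of the sampling pool.
    pool = pos * len(neg) + neg * len(pos)
    # Draw with replacement; the sample size (original dataset size) is an assumption.
    train = rng.choices(pool, k=len(labels))
    # Instances never drawn (out-of-bag) form the testing set.
    in_bag = set(train)
    test = [i for i in range(len(labels)) if i not in in_bag]
    return train, test


# Hypothetical imbalanced dataset: 15 positive and 5 negative lesions.
labels = [1] * 15 + [0] * 5
rng = random.Random(0)
splits = [iabr_split(labels, rng) for _ in range(1000)]

# Fraction of positive instances across all training draws (expected near 0.5).
drawn = [labels[i] for train, _ in splits for i in train]
positive_fraction = sum(drawn) / len(drawn)
```

Without the replication step, about 75% of the training draws in this toy dataset would be positive; with it, the two classes are drawn with equal probability.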

For each model order, the combination of features maximizing the 0.632+ bootstrap area under the receiver operating characteristic curve (AUC) across the 1000 bootstrap training and testing samples was identified43,44. Once the optimal combination of features had been identified for model orders 1 to 5, IABR on 1000 samples was performed again for all models to evaluate their prediction performance.
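The text does not spell out the 0.632+ AUC formula. As a hedged illustration, the sketch below follows a commonly cited formulation of the 0.632+ bootstrap adapted to the AUC, in which the apparent (training) AUC and the out-of-bag (testing) AUC are blended with a weight that depends on the relative overfitting rate. The exact variant used in the study may differ, and the function name and the AUC pairs are hypothetical.

```python
from statistics import fmean


def auc_632_plus(auc_train, auc_test):
    """0.632+ AUC for one bootstrap sample (one common formulation, assumed here).

    auc_train : apparent AUC measured on the training data.
    auc_test  : AUC measured on the out-of-bag testing data.
    """
    # Relative overfitting rate R, bounded between 0 and 1.
    if auc_test <= 0.5:
        r = 1.0
    elif auc_train > auc_test:
        r = (auc_train - auc_test) / (auc_train - 0.5)
    else:
        r = 0.0
    # The weight grows from 0.632 (no overfitting) to 1 (severe overfitting).
    alpha = 0.632 / (1.0 - 0.368 * r)
    # The testing AUC is floored at chance level before blending.
    return (1.0 - alpha) * auc_train + alpha * max(0.5, auc_test)


# Hypothetical (training AUC, testing AUC) pairs from three bootstrap samples.
pairs = [(0.90, 0.70), (0.85, 0.78), (0.80, 0.45)]
estimate = fmean([auc_632_plus(tr, te) for tr, te in pairs])
```

When the testing AUC is at or below chance, the estimate for that sample falls back to 0.5, so a model that only fits the training data cannot obtain an optimistic score.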

## Results

### Patients characteristics

Univariate analysis revealed 36 and 43 statistically significant features for the PI-RADS 3 and upPI-RADS 4 classification tasks, respectively. The statistically significant features are reported in Supplementary Tables S4 and S5. By applying the mRMR method to these features, the five highest-ranked features were selected to build the prediction models. Bar plots of the predictor importance score for the top five features selected by mRMR for each classification task are shown in Fig. 4.

Multivariable logistic regression models of order 1 to 5 were obtained; their prediction performance for the two classification tasks is reported in Supplementary Table S6 and shown in Fig. 5. By inspecting the curves in Fig. 5 and the values in Supplementary Table S6, we determined that the simplest multivariable models with the best prediction performance were the second-order model for the PI-RADS 3 task and the first- and third-order models for the upPI-RADS 4 task. For PI-RADS 3 lesion detection, the second-order model was chosen because of its slightly higher mean sensitivity, specificity and accuracy (80%, 51% and 71%, respectively) compared with the first-order model (76%, 42% and 65%, respectively), with comparable AUC (74% for the first-order model and 76% for the second-order model). For models of order 3 to 5, prediction performance worsened. For the upPI-RADS 4 classification task, the first-order model showed higher performance (AUC = 89%, sensitivity = 87%, specificity = 62%, accuracy = 82%) than the higher-order models. However, promising results were also obtained with the third-order model.