Impact of different scanners and acquisition parameters on robustness of MR radiomics features based on women’s cervix

MR radiomics of cervical lesions using data from a single scanner has achieved promising results, but clinical translation remains challenging. Real-world multi-center settings involve multiple scanners and non-uniform scanning parameters, so the influence of these conditions on the robustness of MR radiomics features (RFs) of the female cervix should be identified first. In this study, 9 healthy female volunteers were enrolled and 3 kiwis were selected as references. Each underwent T2-weighted imaging on three different 3.0-T MR scanners with uniform acquisition parameters, and on one MR scanner with various scanning parameters. A total of 396 RFs were extracted from their images with and without decile intensity normalization. The reproducibility of RFs was evaluated by the coefficient of variation (CV) and the quartile coefficient of dispersion (QCD). Representative features were selected using hierarchical cluster analysis, and their discriminative ability was estimated by ROC analysis in distinguishing the junctional zone from the outer muscular layer of the healthy cervix in a retrospective cohort of patients with leiomyoma (n = 58). This study showed that only a few RFs of the female cervix were robust across different MR scanners and acquisition parameters, and that their robustness might be improved by decile intensity normalization.


Scientific Reports | (2020) 10:20407 | https://doi.org/10.1038/s41598-020-76989-0 | www.nature.com/scientificreports/
influenced by different scanners and acquisition parameters before pooling multi-center data on cervical tissue to prospectively validate the value of radiomics derived from a single scanner. T2-weighted imaging (T2WI) is a stable and essential sequence in cervical MR examinations according to the protocol for staging and evaluation of cervical cancer proposed by the European Society of Urogenital Radiology in 2010 17. Although radiomics analysis of cervical tissue has been widely performed on T2W images 6,18-20, the non-quantitative nature of T2WI underlines the need to investigate its reproducibility in a multi-center scenario. Thus, the purpose of the current study was to quantitatively identify the influence of different scanners and acquisition parameters on the robustness of T2WI radiomics features (RFs) of the female cervix, which might have implications for further radiomics studies on cervical lesions.

Results
Inter-MR analysis. The percentages of reproducible RFs obtained from the three MR scanners are summarized in Table 1. Regarding the influence of MR scanners on the robustness of RFs, the proportion of reproducible RFs ranged from 51.5% (204 of 396) in G.0 (kiwis) to only 24.2% (96 of 396) in G.3 (volunteers), using QCD and CV cutoff values of 15 and 0.1, respectively. After filtering based on CV < 0.1 for all kiwis and volunteers in G.0-G.3, only 23.5% (93 of 396) of the RFs were reproducible in all groups.
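The reproducibility filter behind these counts can be sketched in Python as follows. This is a minimal sketch, not the authors' code (they computed CV and QCD with the R package DescTools). Assumptions: CV is the sample standard deviation divided by the absolute mean; QCD is (Q3 - Q1)/(Q3 + Q1) expressed as a percentage, so that cutoffs such as 10 or 15 are on a comparable scale; and the input holds one row per scanner for a single kiwi or volunteer. The feature values are hypothetical.

```python
import numpy as np


def coefficient_of_variation(values: np.ndarray) -> np.ndarray:
    """CV per feature (column): sample SD divided by the absolute mean."""
    mean = values.mean(axis=0)
    sd = values.std(axis=0, ddof=1)
    return sd / np.abs(mean)


def quartile_coefficient_of_dispersion(values: np.ndarray) -> np.ndarray:
    """QCD per feature in percent (assumed scaling): 100 * (Q3 - Q1) / (Q3 + Q1)."""
    q1, q3 = np.percentile(values, [25, 75], axis=0)
    return 100.0 * (q3 - q1) / np.abs(q3 + q1)


def reproducible_features(values: np.ndarray, cv_cut: float = 0.1,
                          qcd_cut: float = 10.0) -> np.ndarray:
    """Boolean mask of features whose CV and QCD are both below the cutoffs.

    values: shape (n_scans, n_features), one row per scanner (inter-MR) or per
    acquisition setting (intra-MR) for a single kiwi or volunteer.
    """
    return ((coefficient_of_variation(values) < cv_cut)
            & (quartile_coefficient_of_dispersion(values) < qcd_cut))


# Hypothetical values of three features measured on three scanners.
scans = np.array([[100.0, 1.0, 10.0],
                  [101.0, 2.0, 10.0],
                  [99.0, 3.0, 10.0]])
mask = reproducible_features(scans)
print(f"{mask.sum()} of {mask.size} features reproducible")  # 2 of 3 features reproducible
```

A feature counts as reproducible for a subject only when both indices fall below their cutoffs; the reported percentages are fractions of the 396 features (for example, 204 of 396 is 51.5%).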
Intra-MR analysis. The percentages of reproducible RFs under different scanning parameters are summarized in Table 2. When TR was modified, the proportion of reproducible RFs varied widely, from 91.4% (362 of 396, G.0, kiwis) to only 37.1% (147 of 396, G.1, volunteers), using CV cutoff values of 0.15 and 0.1, respectively. For each acquisition parameter (TR, TE, ST, or AM), fewer than 50% of RFs were reproducible in each volunteer group based on CV < 0.1. Moreover, images with a larger AM, thicker ST, shorter TE, or longer TR yielded more reproducible RFs, although the differences were not significant (p > 0.05) (Supplementary Materials Table S1).
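The comparison of higher- and lower-value settings (a t-test over mean CV values, per the Statistical analysis section) can be sketched with SciPy's `scipy.stats.ttest_ind`. The mean CV values below are hypothetical placeholders, not data from Table 2. The source does not state whether an independent or a paired test was used; `scipy.stats.ttest_rel` would be the paired alternative if the same groups are compared under both settings.

```python
import numpy as np
from scipy import stats

# Hypothetical mean CV values (one per group, G.0-G.3) for the longer-TR and
# shorter-TR settings; placeholders only, not values reported in the study.
mean_cv_longer_tr = np.array([0.04, 0.09, 0.08, 0.10])
mean_cv_shorter_tr = np.array([0.06, 0.08, 0.11, 0.10])


def compare_settings(higher, lower, alpha=0.05):
    """Independent two-sample t-test on the mean CVs of two settings.

    Returns the t statistic, the two-sided p value, and whether p < alpha.
    """
    result = stats.ttest_ind(higher, lower)
    return result.statistic, result.pvalue, result.pvalue < alpha


t_stat, p_value, significant = compare_settings(mean_cv_longer_tr, mean_cv_shorter_tr)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}, significant: {significant}")
```

With only four groups per setting, a consistent trend in mean CV can easily fail to reach p < 0.05, which is one plausible reading of the non-significant differences reported above.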
Feature selection and effects of intensity normalization. Based on CV < 0.1 and QCD < 10, we obtained 43 features that were reproducible in both inter-MR and intra-MR analyses for every kiwi and every volunteer without intensity normalization, including 4 histogram, 9 Form Factor, 14 GLCM, 15 RLM, and 1 GLZSM features (Supplementary Materials Fig. S1). After hierarchical cluster analysis, 8 representative features were selected according to their CV values in the volunteers' inter-MR analysis: Compactness1/Sphericity (Form Factor), Spherical Disproportion (Form Factor), GLCM Entropy_angle90_offset4, GLCM Entropy_All-Direction_offset7, GLCM Entropy_angle135_offset1, histogram Energy/histogram Entropy, Run Length Nonuniformity_angle90_offset1, and Maximum 3D Diameter (Form Factor) (Fig. 1a). With image intensity normalization, 60 reproducible features were obtained, with histogram features showing the greatest increase (from 4 to 20) (Supplementary Materials Fig. S1). After hierarchical cluster analysis, we selected 10 representative features: Small Area Emphasis (GLZSM), Percentile50/Quantile0.5 (histogram), and the same eight representative features obtained without intensity normalization (Fig. 1b). For the representative RFs common to both conditions, intensity normalization yielded lower CV values, and the area under the ROC curve values of these representative features in discriminating the cervical junctional zone from the outer muscular layer were also higher (Fig. 2).
Table 1. The number of reproducible features (out of a total of 396) in the inter-MR analysis for volunteers and kiwis. The values displayed in the table are means within each group. "Mean ± Standard Deviation" was calculated from the mean CV or mean QCD values within each group. G.0 represents the three kiwis, while G.1-G.3 represent the three groups of volunteers scanned on the 6th-10th, 11th-15th, and 16th-20th day of the menstrual cycle, respectively.
Geometric features were not included in this comparison because they are unchanged by intensity normalization.

Discussion
In this study, we evaluated the reproducibility of radiomics features across different MR scanners and scanning parameters. We found that a large proportion of RFs were non-reproducible in both inter- and intra-MR analyses. Both the reproducibility and the discriminative power of RFs were improved with intensity normalization. A previous study analyzed the influence of CT scanners and acquisition parameters on the reproducibility of RFs based on non-biological phantoms 12. However, results observed in non-biological phantoms might not apply to human images acquired in similar experiments. Unlike analyses based solely on non-biological phantoms, our results, which were based on kiwi phantoms and real human tissue, may better reflect the reality of clinical radiomics. In this observational study, fewer reproducible features were obtained from volunteers than from fresh kiwis. Because kiwis are stable, they are free of the intrinsic variability caused by the anatomy, positioning, or physiological changes of the cervix. The natural degeneration of kiwis could be ignored, since the whole scanning process across the three scanners lasted at most two hours. Beyond that, kiwis are rich in water and have naturally structured textures, which produce good T2W images and make them appear suitable for comparing different MR protocols and scanners 21,22. Therefore, kiwis were used as the reference for feature selection and intensity normalization.
This preliminary study used scanning acquisition settings similar to those seen in patient scans. If the variability had been found to be small, the scanning protocol could have served as a baseline for future patient studies. Nevertheless, our study showed considerable variability of the features across different scanners even with consistent scanning parameters, which might be caused by differences in the fundamental design of the scanners. The percentage of reproducible RFs obtained from inter-scanner analysis was lower than that from intra-scanner analysis, consistent with a previous study 12. We also found that signal intensity varied greatly across the three scanners in this study, which cannot be addressed by unifying MR scanning parameters.
Table 2. The number of reproducible features (out of a total of 396) in the intra-MR analysis with different acquisition parameters for volunteers and kiwis. TR, repetition time; TE, echo time; ST, slice thickness; AM, acquisition matrix. G.0 represents the three kiwis, while G.1-G.3 represent the three groups of volunteers scanned on the 6th-10th, 11th-15th, and 16th-20th day of the menstrual cycle, respectively. The values displayed in the table are means within each group. "Mean ± Standard Deviation" was calculated from the mean CV values within each group. MR scanner: Philips Medical Systems (Ingenia 3.0 T, Philips Healthcare, Best, The Netherlands).
In routine MR diagnostic studies, there is large variability in slice thickness, image pixel size, TR, TE, echo train length, and bandwidth resulting from user preferences, protocol requirements, manufacturers' settings, etc. These parameters determine the voxel size, grey levels, and signal-to-noise ratio. Therefore, evaluating their impact on MR radiomics features is of paramount importance. In this study, we found that all four parameters (AM, ST, TE, and TR) can affect the reproducibility of radiomics features. We also found that a larger AM, thicker ST, shorter TE, and longer TR produced more reproducible RFs, although the differences were not significant. However, texture features of all categories become increasingly sensitive to variations in acquisition parameters as spatial resolution increases (larger AM), unless the spatial resolution is sufficiently high 13. In addition, thinner-slice images have yielded better diagnostic performance than thicker-slice images, which might be caused by the larger partial volume effect in thicker slices 23. Thus, a future study balancing reproducibility and diagnostic performance might be necessary. The large variation in signal intensity across different scanners calls for calibration.
Intensity normalization is a preprocessing step in MR radiomics analysis and is vital for successful deep learning-based MR image synthesis 24; it is especially important for reducing intensity differences among non-quantitative images in a multi-center scenario. Various intensity normalization methods have been proposed, including Z-score, piecewise linear histogram matching (the decile method), fuzzy C-means-based, Gaussian mixture model-based, kernel density estimate-based, and WhiteStripe normalization, which have met with varying degrees of success and have their respective limitations 24,25. Discussing all of them is beyond the scope of this research, which addresses the impact of different scanners and acquisition parameters on the robustness of MR radiomics features. Although Z-score normalization is used in many radiomics studies, it mainly standardizes data to make them comparable and does not change the shape of the grey-level histogram of the images. In contrast, the decile method 26 adjusts the intensity distribution itself, which helps not only to produce consistent images but also to maintain the differences between tissues across different scanners and scan parameters. We also chose the decile method because of its ease of computation, customizability, and speed while maintaining high accuracy, as verified on multi-site, multi-scanner brain MRI data 25. In this study, we demonstrated that the decile approach is also effective in the cervix, reducing intra- and inter-scanner variations while improving the ability to stratify tissues.
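A short numerical illustration of why Z-score normalization cannot reshape a histogram: it is a linear map, so a histogram-shape descriptor such as skewness is unchanged, and a nonlinear difference between scanners survives it. The gamma-distributed intensities and the square-root "scanner response" below are hypothetical, chosen only to make the effect visible; this sketch is not the decile method itself.

```python
import numpy as np


def z_score(image):
    """Z-score normalization: subtract the mean and divide by the SD."""
    return (image - image.mean()) / image.std()


def skewness(values):
    """Third standardized moment, a simple descriptor of histogram shape."""
    centered = values - values.mean()
    return np.mean(centered ** 3) / values.std() ** 3


rng = np.random.default_rng(0)
# Hypothetical right-skewed intensities from scanner A.
scanner_a = rng.gamma(shape=2.0, scale=100.0, size=10_000)
# Scanner B differs only by gain and offset: Z-scoring removes the difference.
scanner_b = 1.5 * scanner_a + 40.0
# Scanner C has a nonlinear (square-root) response: Z-scoring cannot undo it.
scanner_c = np.sqrt(scanner_a)

for name, img in [("A", scanner_a), ("B", scanner_b), ("C", scanner_c)]:
    print(name, round(skewness(z_score(img)), 2))
```

Scanners A and B end up with identical Z-scored histograms, whereas C keeps a visibly different skewness; a landmark-based method such as the decile approach remaps the distribution itself and can, in principle, absorb such monotonic differences.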
There are several limitations in this study. First, we used only established and commonly used radiomics features, excluding those derived from wavelet and Laplacian of Gaussian transformations. To our knowledge, deriving these filtered features requires setting hyperparameters such as the convolution kernel size, yet no standard kernel size has been established so far. In addition, a previous study found that the most reproducible features were among those calculated on non-transformed images, whereas filtered features showed the largest discrepancies 27. Thus, an analysis based on non-transformed images could achieve the purpose of this study without exhaustively testing all image features. Second, only three MR scanners, all at 3 T field strength, were used. Nevertheless, our preliminary study quantitatively demonstrated some objective factors affecting the application of MR radiomics in a real-world medical scenario. Lastly, only the T2WI sequence was evaluated in this study. Other commonly used images, such as the apparent diffusion coefficient map, could be investigated in future studies.
In conclusion, only a few T2WI-derived RFs of the female cervix were robust across different MR scanners and acquisition parameters, and their robustness might be improved by decile intensity normalization.

Methods
Phantoms (Kiwis). Before scanning the volunteers, we performed a phantom examination to serve as the reference for identifying reproducible RFs and for image intensity normalization across multiple scanners and non-uniform scanning parameters. The phantom was selected based on the following criteria: biological, rich in water, suitable size (approximately 3 cm × 4 cm × 5 cm), a certain degree of hardness, and stable textural characteristics. Kiwis met these criteria, and three of them (green varietals, volume 70-75 cm³, NESPAR, Greece) were selected and designated as group 0 (G.0) (Fig. 3). The kiwis were kept at 7 °C in a thermostat before and between the experiments.
Volunteers. This prospective observational study of healthy women aimed to identify RFs that are robust across three different scanners and across non-uniform scanning parameters within one scanner. It was approved by the Institutional Review Board of Renji Hospital, Shanghai Jiao Tong University School of Medicine, and written informed consent was obtained from all volunteers before the MRI examinations. All procedures were performed in accordance with relevant guidelines and regulations. The inclusion criteria were healthy women with regular menstrual cycles (24-35 days) 28 and negative gynecologic examination findings (gynecologic ultrasonography, serum tumor markers, cytology, and HPV DNA testing). A total of 9 women were included in our study (mean age, 25 years; age range, 22-30 years). Because the menstrual cycle could affect the appearance of the cervix, the volunteers were divided into three groups according to their stage of the menstrual cycle. Volunteers on the 6th-10th, 11th-15th, and 16th-20th day of the cycle (counted from the first day of regular bleeding) were assigned to groups 1, 2, and 3 (G.1, G.2, and G.3), respectively, with three participants in each group.
MR data acquisition. The scanning parameters of the leiomyoma patients are shown in Table 3. Their . These systems were the most commonly used in radiomics studies on cervical lesions 5-7,18,19,29-33. Thus, kiwis and volunteers were also scanned on these three scanners in the current study.
To simulate clinical practice, the scanning protocols for kiwis and volunteers were based on the clinical scanning parameters of the leiomyoma patients. The whole study workflow for volunteers and kiwi phantoms is shown in Fig. 4. For the inter-MR process, we adjusted the scanning parameters to be consistent across the three MR scanners. Each volunteer and kiwi phantom was scanned sequentially on the three scanners with a short interval (less than 30 min) between scanners, using a dedicated phased-array abdominal coil.
Table 3.
We performed T2-weighted imaging without fat suppression for both kiwis and volunteers, and the whole scanning process took less than two hours for each subject. When acquiring axial images of the kiwis, we used a custom-made adaptive holder to fix each kiwi in ultrasound gel and immobilize it during scanning (Fig. 3). In addition, volunteers were asked to fast for 4-6 h and received butylscopolamine bromide (20 mg, intramuscularly) before scanning on each scanner to avoid variation of the cervix caused by intestinal movement. The scanning orientation for volunteers was based on the major axis of the cervix, including planes parallel (sagittal) and perpendicular (axial) to it. The perpendicular plane crossing the margin of the cervical opening was taken as the baseline.
Image preprocessing. A preprocessing pipeline was applied to all T2-weighted images, including bias field correction, isotropic voxel resampling, registration, intensity normalization, and grey-level discretization.
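The isotropic voxel resampling step in this pipeline can be sketched with `scipy.ndimage.zoom`, using order-3 spline interpolation as a stand-in for cubic interpolation. This is a sketch under assumptions: the authors' actual tools (N4ITK, SPM12, Matlab, and the GE A.K. software) are not reproduced, and the voxel spacing in the example is hypothetical.

```python
import numpy as np
from scipy import ndimage


def resample_isotropic(volume, spacing_mm, new_spacing_mm=1.0):
    """Resample a 3D volume to isotropic voxels with cubic (order-3) spline interpolation.

    spacing_mm: current voxel size along each axis, e.g. (0.5, 0.5, 4.0).
    """
    zoom_factors = np.asarray(spacing_mm, dtype=float) / new_spacing_mm
    return ndimage.zoom(volume, zoom=zoom_factors, order=3)


# Hypothetical volume: 0.5 mm in-plane pixels and 4 mm slices.
volume = np.random.default_rng(0).normal(size=(40, 40, 10))
iso = resample_isotropic(volume, spacing_mm=(0.5, 0.5, 4.0))
print(iso.shape)  # (20, 20, 40)
```

Resampling every image to the same 1 mm grid before feature extraction keeps texture offsets (for example, the GLCM offsets named in the Results) tied to the same physical distance on every scanner.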
To identify the effects of intensity normalization, data with and without intensity normalization were generated separately. First, bias field correction was performed on all images using N4ITK 34. Second, volumetric regions were isotropically resampled to the in-plane resolution (voxel size = 1 mm × 1 mm × 1 mm) using cubic interpolation. Third, co-registration 35,36 was carried out via SPM12 (https://www.fil.ion.ucl.ac.uk/spm/software/spm12/) to correct motion artifacts between different scanners or over a long scanning process on one scanner. Next, the decile-based piecewise linear approach was used for intensity normalization 26,37. To eliminate the high and unstable signal intensity of urine, the bladder was excluded from the images before normalization. Intensity normalization was performed in Matlab software (https://www.mathworks.com) by rescaling the intensity range of each input image (source) to match that of the reference image. For the randomly selected reference, grey-value landmarks were defined at the 5th, 10th, 20th, 30th, 40th, 50th, 60th, 70th, 80th, 90th, and 95th percentiles; the minimum and maximum grey values were discarded because of noise. Normalized values were obtained using cubic interpolation. Finally, grey-level discretization inside the ROI was applied to reduce computational time and to improve the signal-to-noise ratio of the texture outcome 38. This discretization step is built into the Artificial Intelligent Kit (A.K.) offered by GE Healthcare; the ROI data were first reduced to 256 grey levels via histogram equalization before feature extraction.
To segment the normal cervix of volunteers, the whole body of the cervix (including the endocervix, the junctional zone, and the outer muscular layer) was selected, and liquid in the cervical canal was excluded. For patients, the junctional zone and the outer muscular layer of the cervix were delineated separately (Fig. 5).
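The decile normalization and the 256-level discretization described above can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the authors' Matlab implementation or the A.K. software: landmarks are the listed percentiles, optionally computed inside masks (for example, a mask excluding the bladder); intensities between landmarks are mapped piecewise linearly with `np.interp` rather than by the cubic interpolation the authors mention; values outside the 5th-95th percentile range are extrapolated along the end segments (our choice); and the equalization assigns each voxel a level proportional to its intensity rank.

```python
import numpy as np

# Percentile landmarks listed in the text (minimum and maximum excluded).
LANDMARKS = [5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95]


def piecewise_linear_map(x, src_marks, ref_marks):
    """Map intensities so that src_marks land on ref_marks.

    Linear between landmarks (np.interp); the first and last segments are
    extended beyond the end landmarks (np.interp alone would clip there).
    Assumes src_marks is strictly increasing.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(np.interp(x, src_marks, ref_marks), dtype=float)
    lo_slope = (ref_marks[1] - ref_marks[0]) / (src_marks[1] - src_marks[0])
    hi_slope = (ref_marks[-1] - ref_marks[-2]) / (src_marks[-1] - src_marks[-2])
    below, above = x < src_marks[0], x > src_marks[-1]
    y[below] = ref_marks[0] + lo_slope * (x[below] - src_marks[0])
    y[above] = ref_marks[-1] + hi_slope * (x[above] - src_marks[-1])
    return y


def decile_normalize(source, reference, source_mask=None, reference_mask=None):
    """Match the source's percentile landmarks to those of the reference.

    Masks restrict where landmarks are computed; the mapping is applied to
    the whole source image.
    """
    src_vals = source[source_mask] if source_mask is not None else source.ravel()
    ref_vals = reference[reference_mask] if reference_mask is not None else reference.ravel()
    src_marks = np.percentile(src_vals, LANDMARKS)
    ref_marks = np.percentile(ref_vals, LANDMARKS)
    return piecewise_linear_map(source, src_marks, ref_marks)


def equalize_to_levels(roi_values, n_levels=256):
    """Discretize ROI intensities into n_levels bins by rank (histogram equalization)."""
    roi_values = np.asarray(roi_values).ravel()
    ranks = np.searchsorted(np.sort(roi_values), roi_values, side="left")
    return (ranks * n_levels) // roi_values.size


# Hypothetical example: the source differs from the reference by gain and offset.
rng = np.random.default_rng(1)
reference = rng.gamma(2.0, 50.0, size=(8, 8, 8))
source = 2.0 * reference + 10.0
normalized = decile_normalize(source, reference)
levels = equalize_to_levels(normalized)
```

Because the mapping is defined by landmarks rather than by a single mean and SD, differently shaped histograms can be pulled toward the reference, which is the property the Discussion contrasts with Z-score normalization.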
All ROIs were first delineated by a junior radiologist with 4 years of experience in gynecological imaging and then validated by a senior radiologist with 16 years of experience in gynecological imaging. Disagreements were resolved by consensus. The ROIs on each slice were combined to derive a 3D volume of interest (VOI).
where Q1 and Q3 are the first and third quartiles 41, respectively.
The selection workflow for representative robust RFs included five steps. In the first and second steps, inter- and intra-MR reproducible RFs were selected by CV < 0.1 and QCD < 10 for all kiwis. In the third and fourth steps, inter-MR and intra-MR analyses of all volunteers were performed to further select, from the RFs retained after the second step, those with CV < 0.1 and QCD < 10. Finally, hierarchical cluster analysis was performed to group similar features among the reproducible features retained after the fourth step. In each cluster, the RF with the lowest CV value in the volunteers' inter-MR analysis was taken as the representative robust RF (Supplementary Materials Fig. S1).
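The final clustering step can be sketched with SciPy's hierarchical clustering. The source names R's `hclust` and `rect.hclust` but not the distance or linkage, so the 1 - |Pearson r| distance and complete linkage (the `hclust` default) used here are assumptions, as is cutting the tree into a fixed number of clusters. The feature values and CVs are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def pick_representatives(feature_matrix, cv_values, n_clusters):
    """Cluster features and keep the lowest-CV feature of each cluster.

    feature_matrix: shape (n_samples, n_features), values of the robust RFs.
    cv_values: shape (n_features,), CVs from the volunteers' inter-MR analysis.
    Returns the sorted column indices of the representative features.
    """
    corr = np.corrcoef(feature_matrix, rowvar=False)
    dist = np.clip(1.0 - np.abs(corr), 0.0, None)  # assumed distance: 1 - |r|
    dist = (dist + dist.T) / 2.0                     # enforce exact symmetry
    np.fill_diagonal(dist, 0.0)
    tree = linkage(squareform(dist, checks=False), method="complete")
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    representatives = []
    for label in np.unique(labels):
        members = np.flatnonzero(labels == label)
        representatives.append(int(members[np.argmin(cv_values[members])]))
    return sorted(representatives)


# Hypothetical data: features 0/1 are redundant, as are features 2/3.
x = np.arange(1.0, 7.0)
y = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
features = np.column_stack([x, 2 * x + 1, y, 3 - y])
cvs = np.array([0.05, 0.03, 0.08, 0.02])
print(pick_representatives(features, cvs, n_clusters=2))  # [1, 3]
```

Using the absolute correlation treats perfectly anti-correlated features as redundant, which matches the intent of keeping one representative per group of similar features.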

Regions of interest (ROIs).
Statistical analysis. Statistical analysis was performed in R software v.3.5.0 (https://www.R-project.org) and IBM SPSS software v.23. CV and QCD were calculated with the DescTools package. The hierarchical cluster analysis was performed with the hclust and rect.hclust functions. For each scanning parameter, the higher-value and lower-value settings were compared using a t-test on the mean CV values. p < 0.05 was considered statistically significant. Receiver operating characteristic (ROC) curve analysis was carried out to evaluate the ability of representative RFs to discriminate the cervical junctional zone from the outer muscular layer in leiomyoma patients with a healthy cervix. Boxplots were used to show the difference in reproducibility of the representative features with and without intensity normalization.
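For each representative RF, the ROC analysis reduces to an area under the curve for junctional-zone versus outer-muscular-layer values. A minimal sketch computes it through the Mann-Whitney identity: the AUC equals the probability that a randomly chosen junctional-zone value exceeds a randomly chosen outer-layer value, with ties counting one half. The sample values are hypothetical, and the authors' R or SPSS procedure is not reproduced.

```python
import numpy as np


def roc_auc(positive_scores, negative_scores):
    """Area under the ROC curve via the Mann-Whitney identity.

    Counts, over all (positive, negative) pairs, how often the positive value
    is larger (ties count 0.5), divided by the number of pairs.
    """
    pos = np.asarray(positive_scores, dtype=float)[:, None]
    neg = np.asarray(negative_scores, dtype=float)[None, :]
    wins = np.sum(pos > neg) + 0.5 * np.sum(pos == neg)
    return float(wins / (pos.size * neg.size))


# Hypothetical values of one representative RF in the two tissue regions.
junctional_zone = [0.62, 0.71, 0.58, 0.80, 0.66]
outer_muscular_layer = [0.41, 0.55, 0.60, 0.47, 0.52]
auc = roc_auc(junctional_zone, outer_muscular_layer)
print(round(max(auc, 1.0 - auc), 3))  # 0.96
```

Taking max(AUC, 1 - AUC) reports discriminative ability regardless of whether the feature is higher or lower in the junctional zone; if the direction is fixed in advance, the raw AUC should be reported instead.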

Data availability
All data generated or analysed during this study are included in this published article (and its Supplementary Information files).
License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.