A radiomics approach for automated diagnosis of ovarian neoplasm malignancy in computed tomography

This paper develops a two-dimensional (2D) radiomics approach based on computed tomography (CT) to differentiate between benign and malignant ovarian neoplasms. A retrospective study was conducted from July 2017 to June 2019 on 134 patients with surgically verified benign or malignant ovarian tumors. The patients were randomly divided in a 7:3 ratio into a training set (n = 95) and a test set (n = 39). The ITK-SNAP software was used to delineate the regions of interest (ROIs) on the plain CT slice showing the largest lesion diameter. Texture features were extracted with the Artificial Intelligence Kit (AK) software. In the training set, the best features were selected using the maximum-relevance minimum-redundancy (mRMR) criterion followed by the least absolute shrinkage and selection operator (LASSO). A radiomics model was then built for classification via multivariate logistic regression. Finally, we evaluated the overall performance of our method using receiver operating characteristic (ROC) analysis and the DeLong test, and tested it on an external validation sample of patients with ovarian neoplasms. The radiomics prediction model was built from 14 selected features. The radiomic signature was highly discriminative, with an area under the ROC curve (AUC) of 0.88 in the training set and 0.87 in the test set. The radiomics nomogram also showed good calibration and discrimination in both the training (AUC = 0.95) and test (AUC = 0.96) sets. In external validation, both the radiomic signature (AUC = 0.83) and the radiomics nomogram (AUC = 0.95) performed well. Decision curve analysis indicated the clinical usefulness of the nomogram, in the sense that it can inform major clinical decisions such as ordering or forgoing further tests, treatments, or invasive procedures.
Our radiomics model based on plain CT images shows high diagnostic efficiency and can help identify and predict benign and malignant ovarian neoplasms.


Scientific Reports | (2021) 11:8730 | https://doi.org/10.1038/s41598-021-87775-x | www.nature.com/scientificreports/

… diversity and heterogeneity 3. Subjective evaluation is often inconsistent because reader expertise varies widely. Radiomics is a new subfield of radiology that has recently emerged as an alternative to traditional qualitative diagnosis 4,5. In radiomics, imaging data are quantified with a variety of advanced image-processing methods; in particular, data-characterization algorithms are used to derive a large number of quantitative features from radiographic images 4,5. Furthermore, numerous studies have shown that CT-based radiomics performs well in differentiating benign from malignant lesions in several organs, including the kidneys, lungs, and liver 6. Our work is based on the hypothesis that CT-based radiomic features extracted from primary ovarian tumor lesions can serve as imaging biomarkers that non-invasively differentiate benign from malignant tumors.

Materials and methods
General information. This retrospective study was approved by the Ethics Committee of Jiangxi Provincial People's Hospital Affiliated to Nanchang University. Informed consent was obtained from all patients, and the study was carried out in accordance with the relevant guidelines and regulations. We retrospectively reviewed surgical and radiological data collected between 2017 and 2020. Patients were included if they met the following criteria: (a) female patients with histopathologically verified ovarian tumors; (b) no history of previous or current malignancy other than the ovarian tumor; (c) preoperative high-resolution procedures for ovarian cancer staging; and (d) preoperative pelvic CT within half a month before surgery. Of the patients meeting these criteria, 38 were excluded for the following reasons: (1) radiotherapy, chemotherapy, or chemoradiotherapy before the CT examination (n = 20); (2) a diagnosis of inflammatory disease (n = 11); and (3) low-quality images (n = 7). We also enrolled 26 eligible patients from another hospital for external validation. In total, 160 patients (134 from our hospital and 26 from the external hospital) were included in the study.
CT image acquisition. CT images were acquired on a SOMATOM Definition CT scanner. Automatic tube current modulation was used with the following scanning parameters: a tube voltage of 120 kVp, a tube current–time product of 150 mAs, a section thickness of 5 mm, a reconstruction interval of 1 mm, and a slice gap of 1 mm.
Region-of-interest segmentation. All ROIs were segmented on baseline DICOM images using ITK-SNAP (version 3.6.0). Two radiologists (hereafter readers A and B, with 5 and 15 years of experience in abdominal radiology, respectively) independently performed manual ROI segmentation on the slice showing the largest lesion diameter 7 (Fig. 1).
Feature extraction. We extracted features from 134 ROIs (62 benign and 72 malignant) using the Artificial Intelligence Kit for life sciences (version 3.0.1.A, GE Healthcare). For each ROI, a total of 396 features were computed, including histogram, form-factor, and texture features (gray-level co-occurrence matrix (GLCM), gray-level run-length matrix (RLM), and gray-level size-zone matrix (GLSZM) features). GLCM and RLM features were calculated in four directions (0°, 45°, 90°, 135°) and at three displacements (1, 4, 7) to describe patterns in the spatial distribution of voxel intensities. Details are given in Supplement S1.

Feature preprocessing. Before feature selection, three preprocessing steps were performed: (1) outliers were replaced by the median of the same feature; (2) the patients were divided into a training set (n = 95) and a test set (n = 39) at an approximate ratio of 7:3; and (3) Z-score normalization was applied to remove differences in the value scales of the extracted features. Both the training and test datasets were normalized using the mean and standard deviation computed from the training dataset alone; that is, the training mean was subtracted from each feature value and the difference was divided by the training standard deviation.
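The extraction and preprocessing steps above can be sketched in Python with NumPy. This is a minimal illustration, not the AK software's implementation: it assumes the ROI has already been quantized to integer gray levels, builds a symmetric, normalized GLCM for one direction and displacement, and derives three GLCM features (energy taken as the angular second moment, inverse difference moment, and correlation). It then applies the preprocessing. The text does not say how outliers were defined, so the IQR rule used here is an assumption, and all function names are hypothetical.

```python
import numpy as np

# Unit (row, column) steps for the four GLCM directions named in the text.
DIRECTIONS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def glcm(roi, levels, angle, distance):
    """Symmetric, normalized GLCM of a 2D ROI already quantized to 0..levels-1."""
    dr, dc = (step * distance for step in DIRECTIONS[angle])
    rows, cols = roi.shape
    counts = np.zeros((levels, levels))
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[roi[r, c], roi[r2, c2]] += 1
    counts = counts + counts.T  # count each neighbor pair in both orders
    total = counts.sum()
    return counts / total if total > 0 else counts


def glcm_features(p):
    """Energy (angular second moment), inverse difference moment and correlation."""
    i, j = np.indices(p.shape)
    energy = float(np.sum(p ** 2))
    idm = float(np.sum(p / (1.0 + (i - j) ** 2)))
    mu_i, mu_j = np.sum(i * p), np.sum(j * p)
    sd_i = np.sqrt(np.sum((i - mu_i) ** 2 * p))
    sd_j = np.sqrt(np.sum((j - mu_j) ** 2 * p))
    if sd_i == 0 or sd_j == 0:
        corr = 1.0  # convention for a single-gray-level ROI (assumption)
    else:
        corr = float(np.sum((i - mu_i) * (j - mu_j) * p) / (sd_i * sd_j))
    return {"energy": energy, "idm": idm, "correlation": corr}


def replace_outliers_with_median(x, k=1.5):
    """Replace values outside [Q1 - k*IQR, Q3 + k*IQR] by that feature's median."""
    x = np.array(x, dtype=float)  # rows = patients, columns = features
    q1, q3 = np.percentile(x, [25, 75], axis=0)
    iqr = q3 - q1
    outlier = (x < q1 - k * iqr) | (x > q3 + k * iqr)
    return np.where(outlier, np.median(x, axis=0), x)


def fit_zscore(train):
    """Mean and standard deviation estimated on the training rows only."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    return mean, np.where(std == 0, 1.0, std)  # guard against constant features


def apply_zscore(x, mean, std):
    """Subtract the training mean and divide by the training standard deviation."""
    return (x - mean) / std
```

Looping `glcm` over the four directions and the displacements 1, 4 and 7 gives 12 matrices per ROI. In the preprocessing part, `fit_zscore` sees only the training rows, so no test-set statistics leak into the normalization.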

Feature selection and model construction.
Feature selection and model construction were performed on the training dataset. First, we tested the robustness and reproducibility of the image features. Because the features were extracted from manually segmented ROIs, we retained only those features that were robust to differences in manual segmentation between radiologists 8. For each feature, the Spearman rank correlation coefficient was calculated between feature set 1 (from reader A) and feature set 2 (from reader B). Features with correlation coefficients greater than 0.8 were regarded as robust, since a coefficient of 0.8 indicates high consistency and repeatability 9. Second, we employed the maximum-relevance minimum-redundancy (mRMR) algorithm to select features that are maximally correlated with the benign/malignant label while minimally redundant with one another. Next, the least absolute shrinkage and selection operator (LASSO) 10,11 was used to further select the most useful features by tuning the penalty parameter λ; the optimal λ was chosen by the minimum criterion under tenfold cross-validation. The radiomics signature (Radscore) was then calculated for each case as a linear combination of the selected features weighted by their respective coefficients.
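The inter-reader robustness filter and the mRMR step can be sketched as follows. This is a hypothetical illustration rather than the authors' code. Relevance and redundancy are measured here with the absolute Pearson correlation and combined as a difference; this is a common simplified mRMR variant, whereas the original criterion is usually defined with mutual information. Spearman's ρ is computed from average ranks, and the LASSO step is omitted. `radscore` shows only the final linear combination; its intercept and coefficients are placeholders for those a LASSO fit would produce.

```python
import numpy as np


def average_ranks(x):
    """Ranks starting at 1, with tied values sharing their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and x[order[j + 1]] == x[order[i]]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman(a, b):
    """Spearman's rho as the Pearson correlation of average ranks."""
    return float(np.corrcoef(average_ranks(a), average_ranks(b))[0, 1])


def robust_features(reader_a, reader_b, threshold=0.8):
    """Column indices whose reader A vs reader B Spearman rho exceeds the threshold."""
    return [k for k in range(reader_a.shape[1])
            if spearman(reader_a[:, k], reader_b[:, k]) > threshold]


def mrmr(x, y, n_select):
    """Greedy mRMR: maximize |r(feature, y)| minus mean |r| with already-selected features."""
    n_features = x.shape[1]
    relevance = np.array([abs(np.corrcoef(x[:, k], y)[0, 1]) for k in range(n_features)])
    selected = [int(np.argmax(relevance))]
    while len(selected) < min(n_select, n_features):
        best_k, best_score = None, -np.inf
        for k in range(n_features):
            if k in selected:
                continue
            redundancy = np.mean([abs(np.corrcoef(x[:, k], x[:, s])[0, 1]) for s in selected])
            if relevance[k] - redundancy > best_score:
                best_k, best_score = k, relevance[k] - redundancy
        selected.append(best_k)
    return selected


def radscore(x, coef, intercept):
    """Radscore = intercept + sum of selected features weighted by their coefficients."""
    return intercept + x @ coef
```

In this sketch, the robust columns would be passed to `mrmr`, a penalized logistic model would then shrink the mRMR shortlist to the final features, and `radscore` would be evaluated with the fitted coefficients.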
The radiomics nomogram construction and evaluation. Univariate logistic regression was used to identify predictors of ovarian tumor malignancy. The candidate predictors included clinical factors (age, ascites, and tumor boundary), a biomarker (CA125) 12,13, and the Radscore 7,14. Supplementary Methods S1 summarizes the details of the high-performance predictors. Multivariate logistic regression was used to combine these predictors into a more robust prediction model of ovarian tumor malignancy, which was presented as the radiomics nomogram 15. Calibration was evaluated with a calibration curve, and model fit was examined with the Hosmer–Lemeshow test 16. The diagnostic performance of the nomogram was assessed with receiver operating characteristic (ROC) analysis. The nomogram was used to compute a malignancy probability score for each patient, and all patients were assigned to low- or high-probability groups using the cut-off value derived from the ROC curve. The clinical significance of the nomogram was assessed in patients with ovarian tumors of different degrees of malignancy. Decision curve analysis (DCA) was performed to evaluate the clinical usefulness of the nomogram 17.
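The Hosmer–Lemeshow statistic used to check model fit can be computed as below. The sketch assumes the common decile-of-risk grouping (`groups=10`, equal-sized groups after sorting by predicted probability). It compares the statistic with a χ² distribution on `groups - 2` degrees of freedom. To stay dependency-free it uses the closed-form χ² survival function, which holds only for an even number of degrees of freedom. Function names are hypothetical, and statistical packages may form the groups slightly differently.

```python
import math


def chi2_sf_even_df(x, df):
    """P(X >= x) for a chi-square variable with an even number of degrees of freedom."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive, even df")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i!
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(y_true, probs, groups=10):
    """Hosmer-Lemeshow statistic over equal-sized groups of sorted predicted risk."""
    pairs = sorted(zip(probs, y_true))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        expected = sum(p for p, _ in chunk)  # E_g: sum of predicted risks
        observed = sum(y for _, y in chunk)  # O_g: number of malignant cases
        mean_p = expected / len(chunk)
        denom = len(chunk) * mean_p * (1.0 - mean_p)
        if denom > 0:
            stat += (observed - expected) ** 2 / denom
    return stat, chi2_sf_even_df(stat, groups - 2)
```

A p-value above 0.05 means the test finds no significant disagreement between predicted and observed malignancy rates. This is evidence of acceptable calibration rather than proof of it.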
Statistical analysis. Data normality was verified using the Kolmogorov–Smirnov test. Analyses were carried out using SPSS 23.0 and R (version 3.4.4). The probability scores of the benign and malignant samples were compared using the t-test (for normally distributed data) or the Mann–Whitney U test (for skewed data), and are expressed as mean ± standard deviation (x̄ ± s). The χ² test was used to compare count data between the two groups. The predictive efficacy of the models was assessed with standard diagnostic indicators: accuracy, sensitivity, specificity, and the area under the ROC curve (AUC) 18.
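The diagnostic indicators listed above can be computed from predicted labels and scores with plain Python. This minimal sketch uses hypothetical function names and codes malignant cases as 1. It also shows the link between the Mann–Whitney U statistic and the AUC. The AUC equals U/(n₁n₀), the probability that a randomly chosen malignant case receives a higher score than a randomly chosen benign one, with ties counted as one half.

```python
def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy with malignant coded as 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }


def auc_mann_whitney(y_true, scores):
    """AUC = U / (n1 * n0), counting tied malignant/benign score pairs as 1/2."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return u / (len(pos) * len(neg))
```

For example, with labels `[0, 0, 1, 1]` and scores `[0.1, 0.4, 0.35, 0.8]`, three of the four malignant/benign pairs are ordered correctly, so the AUC is 0.75. Thresholding the scores at 0.5 then gives a sensitivity of 0.5 and a specificity of 1.0.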
Ethical approval. All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards. The study was approved by the Ethics Committee of Jiangxi Provincial People's Hospital Affiliated to Nanchang University.
Informed consent. Informed consent was obtained from all individual participants included in the study.

Results
Patient characteristics. As indicated in Table 1, no significant differences were detected between the training and test sets with respect to age, ascites, boundary, or biomarker expression (CA125). However, patients with benign and malignant tumors differed significantly in age, ascites, CA125, and the radiomic signature (all p < 0.05); the other differences were not significant (Table 2).
Performance outcomes for the clinical prediction model. The clinical prediction model for distinguishing benign from malignant ovarian neoplasms yielded the following performance metrics. In the training set, the AUC was 0.82 (95% CI 0.73–0.91), and the sensitivity, specificity, and accuracy were 76.5%, 88.6%, and 82.1%, respectively. In the test set, the AUC was 0.82 (95% CI 0.68–0.96), and the sensitivity, specificity, and accuracy were 71.4%, 88.9%, and 79.5%, respectively (Table 3).
Construction and assessment of the radiomic signature. Multivariate logistic regression was used to construct the radiomic signature. After feature selection, 14 features were retained and used to build the radiomic signature model (Fig. 2, Table 3). The Radscores differed significantly between benign and malignant samples in both the training and test sets, indicating that the radiomic signature correlates well with the differential diagnosis of the tumors (Fig. 3).

Construction and assessment of the radiomic nomogram. Univariate logistic regression showed that age, ascites, CA125, and the radiomic signature were significant predictors of ovarian tumor malignancy. As shown in Table 2 and Fig. 4, multivariate logistic regression with these predictors was used to construct a more robust prediction model and the nomogram. The calibration curves of the radiomics nomogram showed excellent agreement between predicted and actual tumor types in both patient sets. The AUCs of the nomogram in the training, test, and external validation sets were 0.95, 0.96, and 0.95, respectively. The accuracy, specificity, and sensitivity were 90.5%, 90.2%, and 90.9% for the training set; 89.7%, 94.7%, and 85.0% for the test set; and 88.0%, 84.6%, and 91.7% for the external validation set, respectively (Table 3, Fig. 5). According to the DeLong test, the AUCs of the clinical model differed significantly from those of the nomogram in both the training and test sets (Table 4), indicating that the nomogram performed well in both sets. In addition, the Hosmer–Lemeshow test showed no statistically significant lack of fit in the training and test sets (p > 0.05), supporting the good calibration of the nomogram.
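The DeLong comparison of two correlated AUCs, such as the clinical model and the nomogram evaluated on the same patients, can be sketched with NumPy as follows. The sketch follows the structural-component form of DeLong's method. It is an illustration under that assumption, not the software the authors used. The hypothetical `delong_test` returns both AUCs, the z statistic, and a two-sided p-value.

```python
import math

import numpy as np


def _placements(pos, neg):
    """psi = 1 if a malignant score beats a benign one, 0.5 if tied, else 0."""
    diff = pos[:, None] - neg[None, :]
    return (diff > 0) + 0.5 * (diff == 0)


def delong_test(y_true, scores_a, scores_b):
    """Two-sided DeLong test for the difference between two correlated AUCs."""
    y = np.asarray(y_true)
    aucs, v10, v01 = [], [], []
    for s in (np.asarray(scores_a, dtype=float), np.asarray(scores_b, dtype=float)):
        psi = _placements(s[y == 1], s[y == 0])
        aucs.append(psi.mean())
        v10.append(psi.mean(axis=1))  # structural components of malignant cases
        v01.append(psi.mean(axis=0))  # structural components of benign cases
    m, n = len(v10[0]), len(v01[0])
    # 2x2 covariance of the two AUC estimates (np.cov uses ddof=1).
    cov = np.cov(np.vstack(v10)) / m + np.cov(np.vstack(v01)) / n
    var_diff = cov[0, 0] + cov[1, 1] - 2.0 * cov[0, 1]
    if var_diff <= 0:
        raise ValueError("variance of the AUC difference is zero")
    z = (aucs[0] - aucs[1]) / math.sqrt(var_diff)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return float(aucs[0]), float(aucs[1]), float(z), p
```

Because both models are scored on the same patients, the covariance term matters. Treating the two AUCs as independent would overstate the variance of their difference and make the test too conservative.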
The nomogram was also used to estimate probability scores for the ovarian tumors, and patients were categorized into low- and high-probability groups using the Youden index 19 (cut-off value 0.391), determined from the training-set nomogram. The numbers of benign and malignant samples differed significantly between the high- and low-probability groups (p < 0.0001). Figure 6 shows the DCA plot of the radiomics nomogram: for threshold probabilities ranging from 0 to 0.9, the radiomics nomogram yields a higher net benefit than the clinical model and than the "treat all" and "treat none" strategies.
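The Youden cut-off and the decision-curve net benefit can be sketched as follows. In this hypothetical sketch, a patient is called high-probability (or treated) when the predicted score is at or above the threshold. The Youden index J = sensitivity + specificity − 1 is maximized over the observed scores. The net benefit at threshold probability p_t is TP/n − (FP/n)·p_t/(1 − p_t), and "treat none" is fixed at zero. The sketch does not reproduce the paper's cut-off of 0.391, which came from the authors' training-set nomogram.

```python
def youden_cutoff(y_true, scores):
    """Cut-off maximizing J = sensitivity + specificity - 1 (score >= cut-off is positive)."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, y_true) if s >= t and y == 1)
        tn = sum(1 for s, y in zip(scores, y_true) if s < t and y == 0)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:  # keeps the lowest cut-off among ties
            best_t, best_j = t, j
    return best_t, best_j


def net_benefit(y_true, scores, pt):
    """Net benefit of treating patients whose predicted risk is >= threshold pt."""
    n = len(y_true)
    tp = sum(1 for s, y in zip(scores, y_true) if s >= pt and y == 1)
    fp = sum(1 for s, y in zip(scores, y_true) if s >= pt and y == 0)
    return tp / n - (fp / n) * pt / (1.0 - pt)


def net_benefit_treat_all(y_true, pt):
    """Reference strategy that treats everyone; "treat none" has net benefit 0."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * pt / (1.0 - pt)
```

A decision curve is obtained by evaluating `net_benefit` for each model, together with the two reference strategies, over a grid of thresholds. A model is useful at a given threshold when its curve lies above both references.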

Discussion
Since radiomics was introduced in 2012, it has been widely used to investigate ovarian tumors. Zhang et al. 7 reported that MRI-based radiomic features correlate strongly with the classification of ovarian endometrioid carcinoma (OEC) and with patient prognosis. Park et al. 8 showed that machine learning (ML) models using age and texture features from contrast-enhanced CT achieved high sensitivity and moderate specificity for detecting malignant lesions 20. However, no single CT-based texture feature is currently available to distinguish benign from malignant ovarian tumors. This study therefore explored texture features derived from plain CT scans, which have wide clinical applicability.
Among the 14 selected features, the gray-level size-zone matrix features had the largest values. This reflects intensity heterogeneity within the lesions and underscores the marked heterogeneity of ovarian tumors 21,22. Form-factor features describe the 3D size and shape of the tumor 23. In this study, no form-factor features were selected, which is consistent with our focus on 2D plain CT images; it also suggests no statistically significant association between tumor type and size. The gray-level co-occurrence matrix (GLCM) features describe how often pairs of voxels with given gray-level values occur at a specified offset. The selected features included three types of GLCM features (energy, inverse difference moment, and Haralick correlation), which further characterize the heterogeneity of local tumor regions 22,24. The gray-level run-length matrix (RLM) features reflect texture coarseness and directionality, since the value of the long-run emphasis … 25. The selected features included five RLM features, three of which ranked near the top: low-run gray-level emphasis, long-run gray-level emphasis, and gray-level nonuniformity.

Texture analysis of CT imaging data has shown promising results in various tumor types for predicting pathological features, prognosis, and response to therapy 6. Meng et al. 14 suggested that CT-based radiomics has clear potential for differentiating sarcomatous renal cell carcinoma (SRCC) from clear cell renal cell carcinoma (CCRCC). Dong et al. 15 used a deep-learning approach to construct a robust predictive model based on preoperative CT images, tumor histology, and cancer grade in patients with cervical cancer.
This model achieved reasonable accuracy in predicting lymph node status in cervical cancer. In our work, we built a 2D CT-based radiomics nomogram for distinguishing benign from malignant ovarian tumors. The nomogram achieved AUCs of 0.95 and 0.96 in the training and test sets, respectively. It provided good calibration and discrimination of ovarian tumors and proved to be a reliable and effective method for screening malignant ovarian lesions.
We chose 2D CT-based texture signatures for the analysis of ovarian tumors. Because 2D ROIs are easy to delineate and 2D signatures have lower complexity and faster computation, 2D features are recommended for clinical practice 26. The proposed signatures performed surprisingly well in distinguishing benign from malignant ovarian tumors. All ovarian lesions initially emerge as small tumors that grow steadily over time, so any estimate of tumor volume depends on when the image is acquired. Therefore, small or intermediate tumor volume may not be a reliable biomarker 27. Future studies should use larger datasets and more clinical features, which should reduce the dependence of the radiomic model on individual clinical features 4,28.
This study has several limitations. First, the ovarian tumor ROIs were segmented manually, which inherently introduces inter- and intra-observer variability, as is usually the case for other cancer types. Moreover, the applicability of the developed signature is limited to scans acquired with the same scanner and scanning parameters. In future work, we will assess the reproducibility of radiomic features across acquisition and reconstruction parameters using data with reference values (phantom scans) to obtain a generalizable radiomic model. Second, because of the retrospective design, potential selection bias may limit the reproducibility and comparability of the results. Third, borderline tumors were deliberately excluded from this research, which may introduce bias. Last but not least, besides expanding the sample, state-of-the-art techniques (e.g., fully automated image segmentation, feature dimensionality reduction, deep learning, and multi-objective optimization) could be exploited to improve classification performance.