Value of radiomics-based two-dimensional ultrasound for diagnosing early diabetic nephropathy

Despite efforts to diagnose diabetic nephropathy (DN) using biochemical data or ultrasound imaging separately, a significant gap exists regarding the development of integrated models combining both modalities for enhanced early DN diagnosis. Therefore, we aimed to assess the ability of machine learning models containing two-dimensional ultrasound imaging and biochemical data to diagnose early DN in patients with type 2 diabetes mellitus (T2DM). This retrospective study included 219 patients, divided into a training or test group at an 8:2 ratio. Features were selected using minimum redundancy maximum relevance and random forest-recursive feature elimination. The predictive performance of the models was evaluated using the area under the receiver operating characteristic curve (AUC) for sensitivity, specificity, Matthews Correlation Coefficient, F1 score, and accuracy. K-nearest neighbor, support vector machine, and logistic regression models could diagnose early DN, with AUC values of 0.94, 0.85, and 0.85 in the training cohort and 0.91, 0.84, and 0.84 in the test cohort, respectively. Early DN diagnosing using two-dimensional ultrasound-based radiomics models can potentially revolutionize T2DM patient care by enabling proactive interventions, ultimately improving patient outcomes. Our integrated approach showcases the power of artificial intelligence in medical imaging, enhancing early disease detection strategies with far-reaching applications across medical disciplines.


Inclusion and exclusion criteria
The inclusion criteria were (1) patients with uncomplicated T2DM; (2) patients with DN (categories G1A1-G3A3) diagnosed by a nephrologist or patients with DN (stages I-III) diagnosed by renal biopsy using the 2012 Kidney Disease: Improving Global Outcomes (KDIGO) clinical practice guideline for the evaluation and management of chronic kidney disease (CKD) based on the cause, GFR, and albuminuria system and the Renal Pathology Society's consensus classification of DN (RPS) 21,22 ; and (3) 2D ultrasound images and biochemical data were collected during the study period.
The exclusion criteria were (1) patients with CKD not caused by T2DM; (2) patients with kidney tumors, stones, and cysts; (3) patients with low-quality images; and (4) patients with missing data.

Ultrasound features and diagnostic accuracy
Radiomics extracts many imaging features, including first-order features, shape, texture, gray level co-occurrence matrix, and gray level size zone matrix, which can be analyzed by machine learning algorithms 23 .Quantitative features obtained from ultrasound images include echogenicity, shapes, shear wave velocity, intrarenal artery peak systolic velocity, end-diastolic velocity, resistive index, and organ size 24,25 .We used random forest (RF) to determine the optimal combination of radiomics features to distinguish between healthy and diseased kidneys.Moreover, we performed five-fold cross-validation over the entire dataset to determine optimal parameters.The diagnostic accuracy of each model was evaluated using the area under the receiver operating characteristic curve (AUC) for sensitivity, specificity, Matthews Correlation Coefficient (MCC), F1 score, and accuracy.

Kidney segmentation
Ultrasound images and clinical data were imported into the Darwin research platform using "Darwin research platform" software developed by Yi Zhun Medical (https:// www.yizhun-ai.com).The flowchart of image processing is illustrated in Fig. 2. Regions of interest (ROIs) were selected by a physician with 5 years of experience in ultrasound.The largest coronal section of the kidney was collected, and the ROI was placed on the kidney (Fig. 3).Another sonographer with 10 years of experience in ultrasound examined the segmentation results.Disagreements were resolved by consensus.

Radiomics feature extraction and selection
Feature extraction was performed using the radiomics software package (Python version 3.6.9).The software implements algorithms that calculate a wide range of features, including first-order statistics, shape-based features, texture-based features (e.g., GLCM, GLRLM, or wavelet-based features), and higher-order statistical features.After extracting a large number of radiomics features, a subset of relevant features is typically selected to reduce dimensionality and improve model performance.Machine learning-based techniques (e.g., recursive feature elimination or random forest feature importance) are commonly used to identify the most informative features 18 .In total, 1183 features were obtained.To reduce data overfitting and locate the most relevant features, the top 10 features in the training set were identified using minimum redundancy maximum relevance (mRMR).
The classifier evaluates feature values and determines the optimal combination of features through repeated iterations.For this purpose, the optimal feature combination was determined based on accuracy using random forest-recursive feature elimination (RF-RFE).

Radiomics model construction
The patients were randomly divided into a training set (N = 175) and test set (N = 44) (ratio: 8:2).The top 10 features were selected for further examination.Using a standardization method, all selected features were scaled and transformed into the (0, 1) range for preprocessing.After selecting stable and optimal radiomic features using the feature selection, three separate models based on these classifiers were built: KNN, SVM, and LR.The classifier is generally selected according to the data characteristics, sample size, etc.To avoid overfitting the data, three algorithms that performed well in small data sets were used for verification, and their parameter adjustment results are presented in Supplementary Table S2.Grid search was used to determine the optimal hyperparameters based on a given set of values, such as the best k value for KNN was trained in the range of 3-10; the specific parameters are listed in Supplementary Table S2.Five Κ-fold cross-validation was used to improve model performance.
Feature extraction based on 2D ultrasound images and clinical information yielded radiomics quality scores, which were calculated from the weighted sum of coefficients of the selected features.The radiomics quality score of our study is presented in Supplementary Table S3.The flowchart of model development is presented in Fig. 1.

Model evaluation
After building the three models (KNN, SVM, and LR), the diagnostic accuracy of the models was evaluated using the AUC for sensitivity, specificity, MCC, F1 score, and accuracy.

Statistical analyses
The AUCs were compared using the Delong test.All statistical analyses were performed using SPSS Statistics version 21.0 (IBM Corporation, Armonk, NY, USA).p-values < 0.05 were considered statistically significant.Data normality was evaluated using the Kolmogorov-Smirnov test.Normally distributed continuous variables were compared using Student's t-test and are expressed as means and standard deviations.Categorical variables were compared using the chi-square test and are expressed as frequencies.1).We observed no significant differences in age, sex, TC, TG, HDL-C, LDL-C, Apo-A, Apo-B, BUN, CREA, BUN/CREA, UA, or GLU between the two cohorts (Table 2).

Feature extraction and selection
A total of 1302 radiomics features were extracted from the ultrasound images of each patient.After applying mRMR to the extracted features, the top 10 features were obtained in the training cohort (Fig. 4A).RF-RFE selected the resultant features and determined the best combination of performance features (Fig. 4B).Principal component analysis (PCA) was used to extract the principal components of each group of features and perform dimensionality reduction so that each group of cases could be divided by the eigenvalues.The results are shown in Fig. 4C.The selected features could accurately distinguish between positive and negative cases in the PCA and successfully classify cases with different labels on the left and right sides.The colors in the heat maps correspond to the values of the selected features (Fig. 4D).

Comparison of the models
The KNN, SVM, and LR radiomics models and respective AUCs are listed in Table 3.The AUCs of these models were 0.94, 0.85, and 0.85 in the training cohort and 0.91, 0.84, and 0.84 in the testing cohort (Fig. 5A-C).The calibration curve assesses the agreement between predicted probabilities and observed outcomes from a predictive model: Calibration curves of the models are presented In Fig. www.nature.com/scientificreports/performance (Fig. 5a-c).Decision curve analysis decision (DCA) helps evaluate the net benefit of a diagnostic model over a range of decision thresholds.According to the DCA, the use of the radiomics model to predict DN provided a greater net benefit than the "treat all" or "treat none" strategy over a wide range of threshold probabilities, indicating the high clinical applicability of the radiomics model (Fig. 6D, E).Moreover, in the internal five-fold cross-validation, the radiomics model still showed good diagnostic performance (Fig. 6A-C).The results of the Delong test indicated that the KNN model containing ultrasound and clinical features performed better than the other classifiers for early DN diagnosis than the other models.Diagnostic performance in the training and test cohorts is summarized in Table 3.In addition, we attempted to build diagnostic model using ultrasound and clinical biochemical data separately.The results of its feature extraction and selection, AUCs, ROC curve, calibration curves of the models, and the DCA are presented in Supplementary Figs.S1-6 and Tables S4, 5.

Discussion
In this study, we created a radiometric model based on 2D ultrasonography and clinical features to detect early DN.The diagnostic abilities of the KNN, SVM, and LR models containing ultrasound and clinical features were ideal with AUC values all greater than 0.8 in the training and test sets.In particular, the AUC value of this combined model diagnosing early DN based on KNN radiomics reached 0.94, and the DeLong's test showed the statistical significance of its diagnostic ability.Further, we used clinical data and ultrasound images alone to build diagnostic models.Interestingly, most of their AUC values also exceeded 0.7, especially for the clinical data, showing a great diagnostic advantage.However, comprehensive analysis of AUCs, sensitivity, specificity, MCC, F1 score, accuracy, and DCA revealed that the diagnostic ability of the combined model based on radiomics ultrasound and clinical data was better than that of the clinical data alone or ultrasound images.Thus, we conclude that clinical indicators can complement and reinforce the diagnostic significance of ultrasound indicators.
Long-term hyperglycemia and osmotic diuresis in patients with diabetes can cause glomerular enlargement and hyperfiltration, damaging glomerular capillary endothelial cells, increasing the mesangial matrix, and leading to glomerulosclerosis.Furthermore, hyperglycemia and osmotic diuresis can cause the production and accumulation of extracellular matrix and induce renal fibrosis [26][27][28] .Although proteinuria and low eGFR can help detect DN, their use as a clinical early diagnostic factor is debatable 29,30 .Ultrasonography is a noninvasive, simple, and low-cost method used clinically to evaluate renal morphology and echogenicity in patients with T2DM 31 .However, many patients with T2DM undergo diagnosis, treatment, and prognostic screening in primary care hospitals, posing a challenge to sonographers' skills and the recognition of ultrasound imaging.In this context, radiomics extracts imaging features to support diagnostic decision making is significant 12 .
Ke et al. 32 reported that two color Doppler ultrasound markers, lower intrarenal arterial end-diastolic blood flow velocity and higher arterial resistance index, could accurately diagnose early DN.Bandara et al. 14 conducted an analysis of the radiomics characteristics of chronic kidney disease based on ultrasound.Their results indicate that Doppler ultrasound has an effective, quantitative index for the diagnosis of early DN, and radiomics is rich and sensitive to extract information on ultrasound images.The application of machine learning and radiomics permitted us to construct excellent diagnostic models from 2D ultrasound images and clinically accessible biochemical indicators.For patients with T2DM, 2D ultrasound and biochemical examinations are routine management schemes, which makes our radiomics model more suitable for the popularization of medical practice.Lee et al. 15 conducted an analysis of machine learning to assist diagnosis of chronic kidney disease based on ultrasound imaging and found that ultrasound images alone could achieve an AUC of 0.81.Our findings are consistent with their observations, although there were some differences in population selection and disease stage.According to previous reports, the rate of DN misdiagnosis based on clinical information alone is 49.2% 33 .Zou et al. 34 used machine learning algorithms to build a risk prediction model of ESRD in patients with TD2M and DN and demonstrated that cystatin C, serum albumin, hemoglobin, 24-h urinary protein excretion, and estimated glomerular filtration rate (eGFR) could accurately predict the risk of ESRD.Those results indicate that machine learning based solely on effective clinical data can also diagnose diseases to a large extent.Qu et al. 35 reported that serum CREA levels had the greatest impact on predicting ESRD in machine learning models.Similarly, in our radiomics model construction, CREA was the largest contribution among all features.However, as in the ultrasound-based multimodal radiomics model developed by Ge et al., for the fibrosis detection of chronic kidney disease, the radiomics model combined with clinical features can offer better diagnostic value 16 .
Classifiers play an important role in machine learning.In our study, three commonly used classifiers were applied to build diagnostic models, and the results revealed that KNN showed the best classification performance.KNN is easier to understand than in all the algorithms, but the category distribution in the sample is very sensitive.The average AUC value of the model was 0.87 after five-fold cross-validation.Compared to the training set with an AUC value of 0.94, despite a slight trend towards an unbalanced data distribution, it still showed excellent diagnostic performance.
Radiomics uses statistical algorithms to quantify medical images.The machine learning section was used for the outcome prediction in the subsequent steps.The combined model we built was better than ultrasound alone or clinical biochemical index model.This may be because the features based on the ultrasound images have a strong specificity for the corresponding disease, such as the texture features that can reflect the disease.Combined with the clinical data, the possible interference of some common diseases is greatly excluded, which creates a good diagnostic performance of our model.In terms of clinical utility, the DCA showed that the radiomics model provided a higher overall net gain.
This study has some limitations.First, the study entailed a single-center design and a small sample, which may not be representative of the entire population.Second, given the retrospective design, we analyzed only 2D ultrasound images and limited clinical data, which inevitably led to selection bias.Nonetheless, we intend to perform a prospective study involving ultrasound elastography imaging and omics data in the future.Third, not all clinical characteristics of patients were included in the analysis.Therefore, larger studies with more features are necessary to construct multiparameter machine learning models and improve the diagnosis of DN.
In conclusion, we established a noninvasive method for diagnosing early DN and demonstrated that the models containing ultrasound imaging and biochemical markers helped diagnose early DN more accurately than the other models.A simple diagnostic tool is beneficial for establishing personalized clinical intervention programs to delay disease progression.Nonetheless, the present findings are only an alternative to clinicians and cannot replace the gold standard of diagnostic biopsy for renal injury.

Figure 1 .
Figure 1.Flow diagram of the included participants.

Figure 2 .
Figure 2. The general workflow of this study.

Figure 3 .
Figure 3. Representative plot of region of interest segmentation in patients with diabetic nephropathy (A) and diabetic patients without renal complications (B).(a) Raw grayscale image, (b) segmentation of the kidney region of interest.

Figure 4 .
Figure 4.The top 10 properties of the combined model (A).The RFE-RF feature selection and distribution of distinct cases on PCA.Step by step, RFE-RF (B) was used to determine the optimal feature combination, and the combinations with the highest accuracy would be put into the models.PCA (C) demonstrated that the selected features could separate cases in each group intuitively based on their feature values.Heat maps of the features chosen (D).The maps' colors represented the value of the specified characteristics.

Figure 5 .
Figure 5. Performance of the radiomics model.Receiver operating characteristic (ROC) curves based on the radiomics models of the KNN (A), SVM (B), and LR (C) classifier, respectively.Calibration curves based on the radiomics model of KNN (a), SVM (b), LR (c) classifier, respectively.The gray diagonal dashed line indicates the perfect predictions, and the solid line indicates the model performance.The solid line is closer to the dashed line, indicating better calibration.

Figure 6 .
Figure 6.Five-fold cross-validation over the entire dataset results based on different classifiers and decision curve analyses for the combined model.The red line indicates the combined model, the gray line indicates the hypothesis that all patients had diabetic nephropathy, and the black line indicates the hypothesis that no patient had diabetic nephropathy.(A) KNN, (B) SVM, (C) LR, (D) Training set, and (E) Test set.

Table 1 .
5. The gray dashed line and the solid line indicate complete and estimated prediction, respectively.These lines are close to each other, indicating good Baseline data for the normal kidney group and the diabetic nephropathy group.

Table 2 .
Baseline data for the training cohorts and the validation cohorts.

Table 3 .
Diagnostic performance of different radiomics classifiers in the training cohorts and the validation cohorts of the combined model.p-values are from the DeLong's test; we compared the models under the different classifiers.