Machine learning regression algorithms to predict short-term efficacy after anti-VEGF treatment in diabetic macular edema based on real-world data

The objective of this retrospective study was to predict the short-term efficacy of anti-vascular endothelial growth factor (VEGF) treatment in diabetic macular edema (DME) using machine learning regression models. Real-world data from 279 DME patients who received anti-VEGF treatment at Ineye Hospital of Chengdu University of TCM between April 2017 and November 2022 were analyzed. Eight machine learning regression models were established to predict four clinical efficacy indicators. The accuracy of the models was evaluated using mean absolute error (MAE), mean square error (MSE) and the coefficient of determination (R²). Multilayer perceptron had the highest R² and lowest MAE among all models. Regression tree and lasso regression had similar R², with lasso having lower MAE and MSE. Ridge regression, linear regression, support vector machine and polynomial regression had lower R² and higher MAE. Support vector machine had the lowest MSE, while polynomial regression had the highest MSE. Stochastic gradient descent had the lowest R² and high MAE and MSE. The results indicate that machine learning regression algorithms are valuable and effective in predicting the short-term efficacy of anti-VEGF treatment in DME patients, and that lasso regression is the most effective ML algorithm for developing predictive regression models.


Source of data and participants
Patients diagnosed with DME at Ineye Hospital of Chengdu University of Traditional Chinese Medicine from April 2017 to November 2022 were included. After data preparation, ML was used to predict the short-term efficacy of anti-VEGF treatment in DME patients.
The inclusion criteria were: (1) clinical diagnosis of DME based on the Diabetic Retinopathy Preferred Practice Pattern 2019 27; (2) receipt of at least one anti-VEGF treatment; (3) age between 18 and 85 years; (4) follow-up period of no more than 3 months. The exclusion criteria were: (1) the presence of other eye disorders that may affect VA, such as glaucoma, age-related macular degeneration (AMD) or retinal detachment; (2) lack of clinical data; (3) opacity of the refractive media obscuring the macula; (4) undergoing intraocular surgery.

Data preprocessing
BCVA was converted from decimal notation to LogMAR notation for statistical analysis. Count fingers, hand movement and light perception were recorded as 2.0, 2.3 and 2.6 LogMAR, respectively 29. No light perception corresponds to infinity in LogMAR notation; for calculation purposes, it was recorded as 100. A total of 514 missing values were imputed in IBM SPSS Statistics 27 using the regression method with residual adjustment.
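The conversion described above can be sketched as follows. This is a minimal illustration, assuming the standard relation LogMAR = log10(1 / decimal acuity); the category labels ("CF", "HM", "LP", "NLP") and the function name are hypothetical and are not taken from the study's data coding.

```python
import math

# LogMAR values for non-numeric acuity categories, as stated in the text.
# The dictionary keys are hypothetical labels chosen for this sketch.
SPECIAL_LOGMAR = {
    "CF": 2.0,     # count fingers
    "HM": 2.3,     # hand movement
    "LP": 2.6,     # light perception
    "NLP": 100.0,  # no light perception: finite placeholder instead of infinity
}


def decimal_to_logmar(bcva):
    """Convert a decimal acuity (e.g. 0.5) or a category label to LogMAR."""
    if isinstance(bcva, str):
        return SPECIAL_LOGMAR[bcva.upper()]
    if bcva <= 0:
        raise ValueError("decimal acuity must be positive")
    # Standard relation: LogMAR = log10(1 / decimal acuity).
    return math.log10(1.0 / bcva)


for value in (1.0, 0.5, 0.1, "CF", "NLP"):
    print(value, "->", round(decimal_to_logmar(value), 2))
```

Note that a placeholder of 100 lies far outside the range of measured LogMAR values, so any such rows would weigh heavily on squared-error metrics.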

Machine learning models and training
We established eight ML regression models: linear regression, polynomial regression, ridge regression, lasso regression, support vector machine (SVM), regression tree, multilayer perceptron (MLP) and stochastic gradient descent (SGD) regression. Python 3.9.0 and Scikit-learn 1.2.0 were used for training and testing, and Matplotlib 3.5.2 was used to draw the figures. The dataset was split into a training set (75%) and a testing set (25%).
Linear models are regression methods that assume a linear relationship between the features and the target. Ordinary least squares is a linear model that minimizes the sum of squared errors between the observed and predicted targets 30. Polynomial regression models y as an nth-degree polynomial in x and can therefore capture nonlinear relationships. It has been influential in the history of regression analysis, particularly with respect to design and inference 31. Ridge regression is a method of estimating the coefficients of multiple-regression models in scenarios where the independent variables are highly correlated 32. It regularizes ill-posed problems and reduces multicollinearity in linear regression 33. In general, the method provides improved efficiency in parameter estimation problems in exchange for a tolerable amount of bias 34. Lasso is a linear model with sparse coefficients. It reduces the number of features and can recover the exact non-zero coefficients under certain conditions 30. SVM is a young and practical branch of statistical learning theory. It maps low-dimensional nonlinear problems into a high-dimensional space via a nonlinear mapping whose explicit form is not required 35,36. Classification and regression trees, first proposed by Breiman et al. 38, are nonparametric regression methods that recursively partition the feature space into rectangular areas 37. They use binary recursive partitioning to split the data into smaller groups along each branch 37. MLP is a feedforward artificial neural network (ANN) with full connectivity and at least three layers: input, hidden and output. Each node except the input nodes is a nonlinear neuron. MLP uses backpropagation for supervised learning 39,40. It differs from a linear perceptron in its multiple layers and nonlinear activation, and it can handle data that are not linearly separable 41. SGD is a simple yet very efficient approach to fitting linear models. It is particularly useful when the number of samples (and the number of features) is very large, and it allows online/out-of-core learning 30,42.
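A minimal sketch of how the eight model families and the 75%/25% split could be set up in Scikit-learn is given below. It is not the study's code: the data are synthetic (279 rows with 10 hypothetical features), and the hyperparameters, the feature scaling steps and the random seeds are all assumptions made for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression, Ridge, SGDRegressor
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the clinical table (279 rows, 10 hypothetical features).
X, y = make_regression(n_samples=279, n_features=10, noise=10.0, random_state=0)

# 75% training / 25% testing, as in the text; the seed is an assumption.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# All hyperparameters below are illustrative, not the study's settings.
models = {
    "linear": LinearRegression(),
    "polynomial": make_pipeline(PolynomialFeatures(degree=2), LinearRegression()),
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.1),
    "svm": make_pipeline(StandardScaler(), SVR(kernel="rbf")),
    "tree": DecisionTreeRegressor(random_state=0),
    "mlp": make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0),
    ),
    "sgd": make_pipeline(
        StandardScaler(), SGDRegressor(max_iter=1000, random_state=0)
    ),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    # score() returns the coefficient of determination on the given data.
    print(f"{name:>10}: test R2 = {model.score(X_test, y_test):.3f}")
```

Scaling is wrapped around SVM, MLP and SGD here because these estimators are sensitive to feature scale; whether the study scaled its inputs is not stated.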

Evaluating the performance of prediction models
Mean absolute error (MAE), mean square error (MSE) and the coefficient of determination (R²) score were used as the evaluation metrics to assess the accuracy of the prediction models.
MAE is a risk metric that corresponds to the expected value of the absolute error loss, or l1-norm loss. MSE is a risk metric that corresponds to the expected value of the squared (quadratic) error or loss. The R² value represents the proportion of variance (of y) that can be explained by the independent variables in the model, and thus indicates goodness of fit. Lower values of MAE and MSE indicate higher accuracy of the prediction model. Conversely, a higher R² score suggests a better fit: the best possible score is 1.0, and the score can be negative because a model can be arbitrarily worse 30,42.
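For a sample of n observations $y_i$ ($i = 1, 2, \ldots, n$) and n corresponding model predictions $\hat{y}_i$, the MAE, MSE and R² score are

$$
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|, \qquad
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2, \qquad
R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2},
$$

where $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$ is the mean of the observations.
A weighted sum of the MSE, MAE and R² scores for BCVA, CAT, CST and CV was calculated for each model to tally the final score.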
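As a small worked illustration of the three metrics, the sketch below computes them directly from their standard definitions and checks the results against the corresponding Scikit-learn functions. The observed and predicted values are made up for the example and do not come from the study.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Made-up observations and predictions, for illustration only.
y_true = np.array([0.30, 0.52, 0.10, 1.00, 0.70])
y_pred = np.array([0.35, 0.40, 0.20, 0.90, 0.75])

residuals = y_true - y_pred

# Metrics computed from their definitions.
mae = np.mean(np.abs(residuals))
mse = np.mean(residuals ** 2)
r2 = 1.0 - np.sum(residuals ** 2) / np.sum((y_true - y_true.mean()) ** 2)

# The hand-computed values should agree with Scikit-learn's implementations.
assert np.isclose(mae, mean_absolute_error(y_true, y_pred))
assert np.isclose(mse, mean_squared_error(y_true, y_pred))
assert np.isclose(r2, r2_score(y_true, y_pred))

print(f"MAE = {mae:.4f}, MSE = {mse:.4f}, R2 = {r2:.4f}")
```

Because R² compares the model against always predicting the mean of the observations, it becomes negative whenever the model's squared error exceeds that of the mean predictor.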

Data correlation
Regression coefficients (coef) are estimates of the unknown population parameters and describe the relationship between a predictor variable and the response. They can be obtained from four of the models (linear regression, ridge regression, lasso regression and SGD). The larger the absolute value of coef, the higher the correlation between the predictor variable and the response variable. A positive coef value indicates a positive correlation with the predicted outcome, while a negative value indicates a negative correlation.
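The coefficient-based reading described above can be sketched as follows, using lasso as an example of the four linear models. The data and feature names are synthetic and hypothetical. The features are standardized first, which is an assumption of this sketch: coefficient magnitudes are only comparable across predictors when the predictors share a common scale.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Hypothetical feature names and synthetic data, for illustration only.
feature_names = [f"feature_{i}" for i in range(6)]
X, y = make_regression(
    n_samples=279, n_features=6, n_informative=3, noise=5.0, random_state=0
)

# Put all predictors on the same scale before comparing coefficients.
X_std = StandardScaler().fit_transform(X)

# alpha is an illustrative regularization strength, not the study's value.
model = Lasso(alpha=1.0).fit(X_std, y)

# Rank predictors by the absolute value of their fitted coefficient.
ranked = sorted(
    zip(feature_names, model.coef_), key=lambda pair: abs(pair[1]), reverse=True
)

for name, coef in ranked:
    if coef > 0:
        direction = "positive"
    elif coef < 0:
        direction = "negative"
    else:
        direction = "dropped by lasso"
    print(f"{name}: coef = {coef:+.3f} ({direction})")
```

Coefficients that lasso shrinks exactly to zero correspond to predictors removed from the model, which is the variable-selection behaviour of the method.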

Ethics approval
This study was reviewed and approved by the Academic Committee and the Ethics Committee of Ineye Hospital of Chengdu University of Traditional Chinese Medicine (Ethics number: 2022yh-023).

Characteristics of dataset
A total of 279 eyes were included in our research; the overall characteristics of the dataset are shown in Table 1.

Model performance on various clinical indicators
The results are shown in Table 2. For BCVA, the best model is regression tree, which has an MAE and MSE of zero and an R² of one for both sets. The worst model is SVM, which has the highest MAE and MSE for both sets. The other models have similar performance on the training set but vary on the testing set. For CST, the best model is lasso regression, which has the lowest MAE and MSE and the highest R² for both sets. The worst model is polynomial regression, which has very high MAE and MSE and a negative R² for the testing set. The other models have similar performance on both sets. For CV, the best model is MLP, which has the lowest MAE and MSE and the highest R² for both sets. The worst model is polynomial regression, which has very high MAE and MSE and a negative R² for the testing set. The other models have similar performance on both sets. For CAT, the best model is MLP, which has the lowest MAE and MSE and one of the highest R² for both sets. The worst model is polynomial

Model performance
The visualization of prediction results is shown in Fig.

Discussion
In this study, we developed eight ML regression models to predict the therapeutic effect of anti-VEGF treatment in DME patients. Our results demonstrate that ML regression algorithms have a high potential for efficacy prediction in diseases. We observed that the overall performance of the eight ML models was better on BCVA than on the other three clinical indicators. A possible reason for this is that BCVA measurements were performed independently by a standard procedure, while the other clinical indicators were obtained by two OCT machines that varied in algorithm, clarity and scan interval. Among all models, the MLP and lasso regression models outperformed the others. MLP is a type of ANN, which is essential for building nonlinear relationship models in high-dimensional datasets 43. MLP adds one or more hidden layers on top of the single-layer neural network, which can be iterated. Therefore, it has a greater capacity to learn and generalize, to fit multiple classes of functions, and to predict nonlinear data. Lasso regression has been widely applied in general regression models to predict the risk of likely outcomes 44. Lasso regression performs both variable selection and regularization to improve the prediction accuracy and interpretability of the resulting model 45. Thus, lasso regression might be more appropriate for our regression task. Our results indicate that CAT and macular area thickness have a significant impact on BCVA prediction. DME can severely impair VA when retinal leakage accumulates in the OPL, which increases the overall macular thickness and may disturb the normal path of light from the inner retinal surface to the outer segments 46. Moreover, DME is also associated with vascular changes that have been linked to the degree of visual impairment 47-50. Meanwhile, CST prediction is more related to the overall macular thickness. In pathological conditions, altered organization and stability of junction proteins of RMG (e.g., ZO-1) can lead to cyst formation 46,51. RMG cell density is about five times higher in the fovea than in the periphery. Therefore, due to its thinness, the fovea is more susceptible to edema and has a greater influence on macular thickness 52. The prediction of CAT and CV was more correlated with the thickness of the outer macular ring and ganglion cell layer because of their anatomical characteristics. The outer macula has a thick layer of ganglion cells, while the central macular recess is the thinnest part of the entire macula 46. This may result in a higher likelihood of volume change in the outer macula affecting the thickness of different areas of the macula.
We also found that the type of anti-VEGF drug was highly correlated with the predicted outcomes of BCVA, CST, CAT and CV. Conbercept had a greater impact on these outcomes than ranibizumab and aflibercept. Both aflibercept and conbercept are recombinant decoy receptors of VEGF 53. They block free VEGF-mediated signalling through their cognate receptors, thus inhibiting the pro-inflammatory, hyperpermeable and pro-angiogenic effects of VEGF in a similar way. However, aflibercept showed greater therapeutic effects than conbercept, including more VA improvement and anatomical recovery. It also had advantages over mAb drugs (e.g., bevacizumab and ranibizumab) in enhancing VA and reducing macular edema 54. Nevertheless, aflibercept is the most expensive of the anti-VEGF drugs and may have low patient acceptability due to its high cost. Conbercept is a novel multi-targeted anti-VEGF drug that has demonstrated excellent efficacy in treating patients with DME and neovascular AMD. The induction of domain 4 distinguishes conbercept from aflibercept and enhances its VEGF binding capacity. Due to this structural change, conbercept may act steadily in the vitreous humour 55. Conbercept requires fewer injections, has a lower risk of injection-related complications, and is more cost-effective than ranibizumab 54,56. Most of the included patients chose conbercept. This may have been driven by financial considerations and better outcomes. However, it also introduces some degree of bias. The better efficacy for DME and this bias may both have contributed to the higher predictive relevance of conbercept compared with other anti-VEGF drugs.
Several studies 57-60 used classification algorithms with fewer than six models. We used regression algorithms and established eight models. However, the evaluation indicators of classification and regression algorithms are different, so we cannot directly compare their accuracy. We hope that future research will explore regression algorithms further. Moreover, most of these studies used public datasets as their data sources, while we used a self-built database based on real-world data. Therefore, our model may be more consistent with the current state of the population and have epidemiological significance. Some limitations should be noted. First, we had limited access to patient information, such as blood glucose, blood pressure, insulin use, smoking history and family history 61, which was not available in our outpatient system. This may have led to less input for model training and reduced model accuracy. We imputed missing data using regression methods, but the imputed values may differ from the actual data and affect the prediction results. Second, polynomial regression and regression trees showed overfitting due to a large number of parameters and complex structure. Both are prone to overfitting because they increase the complexity and degrees of freedom of the model, which may then learn too many features that do not reflect the general patterns of the data.
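The overfitting behaviour of regression trees mentioned above can be illustrated with a short sketch on synthetic data. It is not a reproduction of the study's results: the data, the noise level, the seeds and the depth limit of 3 are all assumptions. An unpruned tree keeps splitting until each training sample is fitted exactly, so its training error says little about its error on unseen data.

```python
from sklearn.datasets import make_regression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Noisy synthetic data, for illustration only.
X, y = make_regression(n_samples=279, n_features=10, noise=20.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

results = {}
for label, depth in (("unpruned", None), ("max_depth=3", 3)):
    tree = DecisionTreeRegressor(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    # Store (training MAE, testing MAE) for each tree.
    results[label] = (
        mean_absolute_error(y_train, tree.predict(X_train)),
        mean_absolute_error(y_test, tree.predict(X_test)),
    )
    print(f"{label:>12}: train MAE = {results[label][0]:.2f}, "
          f"test MAE = {results[label][1]:.2f}")
```

A large gap between training and testing error, as in the unpruned case, is the usual sign of overfitting; limiting depth or pruning trades some training accuracy for a smaller gap.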
The application of artificial intelligence (AI) in the medical field is currently not mature enough. Machine learning (ML), as a subset of AI, is commonly used for classification tasks, while regression tasks, which are commonly used for quantitative analysis, are rarely addressed. However, ML regression algorithms have potential in clinical settings and the public health field, such as developing treatment plans and predicting post-operative complications 62,63. For example, for diabetic macular edema (DME) patients, clinicians can select the most suitable anti-VEGF drug by considering ML-predicted indicators and drug prices. The improved model can be better applied in medical institutions in underdeveloped areas and in primary hospitals, where it can enhance the diagnostic efficiency and accuracy of clinicians and help reduce the economic burden on patients. With the rapid development of computer science, deep learning has emerged as an extension of ML that is commonly used for image-based prediction and classification tasks in the medical field. Combining ML regression algorithms with deep learning can greatly expand the type and amount of data available to make more accurate clinical predictions. This will help improve the level of medical care in underdeveloped areas and greatly reduce the pressure on clinicians. In general, AI will inevitably change the existing mode of clinical practice in the future. Although AI may not replace physicians, the "AI + physician" mode of diagnosis and treatment may not be far away.

Conclusion
ML regression algorithms are effective in predicting the short-term efficacy of anti-VEGF treatment in DME patients and are valuable in clinical and public health settings. Our results show that BCVA has the best prediction result compared with CST, CAT and CV. Furthermore, our analysis suggests that the lasso regression algorithm is the most effective ML technique for developing predictive regression models.

Figure 1. Model performance on the test set. (a) Performance of models with MAE as evaluation index. (b) Performance of models with MSE as evaluation index. (c) Performance of models with R² as evaluation index.

Figure 2. Visualization of prediction results. (a) Performance of each model when predicting BCVA, (b) performance of each model when predicting CST, (c) performance of each model when predicting CV, (d) performance of each model when predicting CAT.

Table 1. Baseline characteristics.

Table 2. Model performance on various clinical indicators.