Machine learning-based models for predicting clinical outcomes after surgery in unilateral primary aldosteronism

Unilateral subtype of primary aldosteronism (PA) is a common surgically curable form of endocrine hypertension. However, more than half of the patients with PA who undergo unilateral adrenalectomy suffer from persistent hypertension, which may discourage those with PA from undergoing adrenalectomy even when appropriate. The aim of this retrospective cross-sectional study was to develop machine learning-based models for predicting postoperative hypertensive remission using preoperative predictors that are readily available in routine clinical practice. A total of 107 patients with PA who achieved complete biochemical success after adrenalectomy were included and randomly assigned to the training and test datasets. Predictive models of complete clinical success were developed using supervised machine learning algorithms. Of 107 patients, 40 achieved complete clinical success after adrenalectomy in both datasets. Six clinical features associated with complete clinical success (duration of hypertension, defined daily dose (DDD) of antihypertensive medication, plasma aldosterone concentration (PAC), sex, body mass index (BMI), and age) were selected based on predictive performance in the machine learning-based model. The predictive accuracy and area under the curve (AUC) for the developed model in the test dataset were 77.3% and 0.884 (95% confidence interval: 0.737–1.000), respectively. In an independent external cohort, the performance of the predictive model was found to be comparable with an accuracy of 80.4% and AUC of 0.867 (95% confidence interval: 0.763–0.971). The duration of hypertension, DDD of antihypertensive medication, PAC, and BMI were non-linearly related to the prediction of complete clinical success. The developed predictive model may be useful in assessing the benefit of unilateral adrenalectomy and in selecting surgical treatment and antihypertensive medication for patients with PA in clinical practice.

www.nature.com/scientificreports/ Several previous studies have reported preoperative predictors of clinical outcomes such as age, sex, body mass index (BMI), duration of hypertension, use of preoperative antihypertensive drugs, medical history of diabetes, plasma aldosterone concentration (PAC), target organ damage and adrenal nodule size on imaging [6][7][8][9][10][11][12] . Accordingly, the likelihood of persistent hypertension after unilateral adrenalectomy is complicated, and the predictors are not consistent across studies, which may impede clinical application. This may be because it is not well understood how predictors affect clinical outcomes.
Machine learning provides techniques that can automatically build computational models of complex relationships between observable variables and relevant objective variables by processing the available data, thus maximizing performance criteria 13 . Traditionally, generalized linear models have been used as statistical methods to make predictions. However, in clinical practice, the relationship between observable and objective variables is often non-linear. Machine learning models based on trees are the most popular non-linear models in use today 14 . Indeed, we and others have recently developed machine learning models for the diagnostic prediction of PA [15][16][17] . If the relationship between preoperative predictors and clinical outcomes is non-linear, the machine learning approach may be more appropriate for predicting postoperative outcomes in PA.
This study aimed to develop optimal machine learning-based models for predicting postoperative hypertensive remission in patients with PA using preoperative predictors readily available in routine clinical practice and to evaluate the relationship between predictors and clinical outcomes.

Materials and methods
Patients studied and diagnosis of PA. In this retrospective cross-sectional study, we consecutively included 123 patients with PA as a cohort used for modeling, who were referred to Kyushu University Hospital or Sapporo City General Hospital, underwent unilateral adrenalectomy, and had a postoperative follow-up of at least 6 months between January 2007 and November 2020. The exclusion criteria were patients with PA who failed to achieve complete biochemical success after adrenalectomy, assumed to be mainly due to misinterpretation of lateralization or incomplete resection of aldosterone-producing tissues, and patients with a postoperative follow-up shorter than 6 months. This study was part of the Kyushu Adrenal Network Database for Advanced medicine (Q-AND-A) study 18 . All patients were diagnosed with PA according to the guidelines of the Japan Endocrine Society 19 and the Japanese Society of Hypertension 20 , with case detection, confirmatory tests, and subtype classification. At least one diagnostic confirmation test (saline infusion test or captopril challenge test) was performed based on the methods and criteria described in the above guidelines. More than two weeks prior to the diagnosis, antihypertensive drugs were routinely changed to calcium channel blockers and/ or α-adrenergic blockers. Oral potassium supplementation was generally administered to patients with hypokalemia. For subtype diagnosis, adrenal venous sampling (AVS) with adrenocorticotropic hormone stimulation was performed and evaluated as described previously 21 . Unilateral adrenalectomy was performed upon request in patients with unilateral subtype of PA as determined by AVS and in those with clinically determined unilateral subtype of PA if AVS failed or AVS results were difficult to interpret. This retrospective study was approved by the institutional review board of Kyushu University (approval number: 2021-395) and was performed in accordance with relevant guidelines. Informed consent was obtained from the patients upon admission to our hospitals.
Definition of biochemical and clinical outcomes. Biochemical and clinical outcomes were classified as complete, partial, or absent success and were defined according to the PASO study 5 . Briefly, patients with complete clinical success comprise those with normalized blood pressure levels without the use of antihypertensive medication after unilateral adrenalectomy. Patients with normalization of hypokalemia (if present preoperatively) and normalization of aldosterone-to-renin ratio were classified as having complete biochemical success.
Model development by machine learning. Predictions were generated using a gradient-boosting machine model built with decision-tree base learners 22 . The advantages of using a decision tree are its ability to act strongly against outliers in the data and handle discrete features and missing values. We used the gradientboosting predictor trained with the LightGBM 23 . Compared with other gradient-boosting predictors, LightGBM has better performance in terms of computational speed, memory consumption, and communication costs for parallel learning. Hyperparameters of the LightGBM were optimized using a grid search; the optimized values included max depth, learning rate, min data in leaf, num leaves, and n estimators. We selected clinical and biochemical data that are readily available in routine clinical practice as features among previously reported predictors 6-12 , including age, sex, BMI, duration of hypertension, defined daily dose (DDD) of antihypertensive medication, medical history of diabetes and PAC. We used a holdout method and randomly assigned patients with PA to two groups for analysis: training (80%) and test datasets (20%). Predictive models trained using stratified five-fold cross-validation on the training dataset were applied to the test dataset and their performance was evaluated. To evaluate predictive models, accuracy, positive predictive value, negative predictive value, and areas under the receiver operating characteristic curve (AUC) were calculated. Recently, in addition to the statistical validity of machine learning predictive models, their interpretability has been emphasized in the usefulness of the models 24 . We applied a Shapley additive explanations (SHAP) algorithm to our predictive model to obtain interpretations of the features that drive patient-specific predictions to mitigate the issue of black-box predictions 14,25 . SHAP is one of the well-established explainability methods for machine learning models, which are often difficult to interpret. SHAP is a model-agnostic representation of feature importance where the impact of each feature on a particular prediction is represented using Shapley values inspired by cooperative game theory. A Shapley value states, given the current set of feature values, how much a single feature in the context of its interaction with other features contributes to the difference between the actual prediction and the mean prediction. That is, the sum of the Shapley values for all features plus the mean prediction equals the actual Assay methods. The PAC used in the development of the model was measured using a commercial radioimmunoassay kit (SPAC-S Aldosterone; Fujirebio, Tokyo, Japan). The reference range of PAC for this kit in the supine position was 3.0-15.9 ng/dL. All other biochemical features were assayed in plasma or serum using standard methods.

Statistical analysis.
Clinical characteristics are presented as medians with interquartile ranges or counts with frequencies. Data between two groups were compared using the Mann-Whitney U test and Fisher's exact test. Statistical analysis was performed using EZR statistical software (Saitama Medical Center, Jichi Medical University, Saitama, Japan), which is a graphical user interface for R (The R Foundation for Statistical Computing, Vienna, Austria, version 3.5.2) 28 . Statistical significance was set at P < 0.05.

Results
Clinical characteristics of the patients studied. A total of 107 patients with PA who achieved complete biochemical success after adrenalectomy were included for modeling in this study (Fig. 1). Based on the PASO study, of the 107 patients studied, 40, 38, and 29 patients achieved complete clinical success, partial clinical success, and absent clinical success, respectively. The clinical and biochemical characteristics of the patients in the modeling cohort are presented in Table 1. Patients who achieved complete clinical success were younger, more frequently female, had lower BMI, shorter duration of hypertension, and lower DDD of antihypertensive drugs than those who achieved partial and absent clinical success. In the following analysis, 85 (32 complete clinical success and 53 partial plus absent clinical success) and 22 (8 complete clinical success and 14 partial plus absent clinical success) patients were randomly assigned to the training and test datasets, respectively. We included 51 out of 58 patients with PA in the external cohort and excluded 7 patients who failed to achieve complete bio-  www.nature.com/scientificreports/ chemical success after adrenalectomy from further analysis (Fig. 1). Clinical and biochemical characteristics of the modeling and external cohorts are shown in Table 2.
Machine learning-based model for predicting complete clinical success. We first developed a machine learning-based model using seven clinical features. Figure 2 shows the mean absolute SHAP value of each feature in the model. In this study, a higher SHAP value for a feature was interpreted as a higher probability of complete clinical success after unilateral adrenalectomy. The rank order of the mean absolute SHAP values across the seven features in the developed model reflects their importance. For the prediction of complete clinical success, the rank order was the duration of hypertension, followed by DDD of antihypertensive medication and PAC, whereas the medical history of diabetes did not contribute at all to the prediction in this model (mean absolute SHAP value = 0). To improve the predictive performance, reduce the effect of the curse of dimensionality, and shorten learning time 29 , we removed the medical history of diabetes from the list of features and redeveloped the model. The predictive accuracy, positive predictive value, negative predictive value, and AUC for the model developed using the six features in the test dataset were 77.3%, 80.0%, 76.5%, and 0.884 (95% confidence interval: 0.737-1.000), respectively. Figure 3a shows the receiver operating characteristic curve of the model.
Validation of the predictive model in the external cohort. Our predictive model with the six features was validated using the independent external cohort. As a result, the predictive accuracy, positive predictive value, negative predictive value, and AUC were 80.4%, 81.8%, 79.3%, and 0.867 (95% confidence interval: 0.763-0.971), respectively. Figure 3b shows the receiver operating characteristic curve of the model. Furthermore, this performance was roughly comparable to that obtained with the test dataset, suggesting that the effect of overfitting in our model was minimal.

Interpretation and evaluation of the machine learning-based model. The SHAP summary plot
and SHAP dependence plots of the model developed using the six features were used for interpreting the relationship between the values of each feature and predicting complete clinical success (Fig. 4). As shown in Fig. 4a, predicting complete clinical success was found to be associated with a shorter duration of hypertension, lower DDD of antihypertensive medication, higher PAC, female sex, and lower BMI. However, there was no consistent trend in age. The duration of hypertension, DDD of antihypertensive medication, and BMI were negatively www.nature.com/scientificreports/ and non-linearly related to the prediction of complete clinical success (Fig. 4b). An almost linear negative correlation was observed when the duration of hypertension was less than 7 years, and a constant low probability of complete clinical success was observed when the duration was more than 7 years. Similarly, an almost linear negative correlation was observed when DDD of antihypertensive medication was less than 3, and a constant low probability of complete clinical success was observed when DDD was more than 3. PAC contributed to the prediction of complete clinical success at 37 ng/dL and above, whereas the effect was smaller at 67 ng/dL and above. BMI contributed to the prediction of complete clinical success at 22 kg/m 2 or less, whereas it showed a constant low probability of 22 kg/m 2 or more. SHAP decision plots offer a detailed view of a model's inner workings and show how models make decisions. In Fig. 5, we synthesized prediction lines for patients in the training dataset who were misclassified by our predictive model. When the more influential feature, hypertension duration, the individual prediction lines changed significantly, leading to misclassification of the final risk score. Of the six features used in the developed model, the duration of hypertension is often difficult to estimate accurately and may have misestimated in the prediction of complete clinical success for some patients.

Discussion
In this study, we developed a machine learning-based model for predicting complete clinical success in patients with PA using preoperative clinical data readily available in routine clinical practice. Our predictive model was about 80% accurate on both the test dataset and the external cohort. Altogether, these observations suggest that our model has a high predictive value for complete clinical success after adrenalectomy. We found six features of high predictive importance in the model and clarified the relationship between these features and clinical  www.nature.com/scientificreports/ outcomes. Because more than half of patients with PA who undergo surgical treatment do not achieve complete clinical success, our model can provide useful information for clinicians and patients with PA in making treatment decisions. This model should also help clinicians identify patients who need close follow-up, because patients who do not achieve complete clinical success after unilateral adrenalectomy may have residual hypertension and require continuous monitoring after surgery.
Although unilateral subtype of PA can be cured by unilateral adrenalectomy, biochemical and clinical success are not often coincide 5 , which may discourage patients with unilateral subtype of PA from undergoing adrenalectomy even when appropriate. Indeed, of those with unilateral subtype of PA determined by AVS, 21.8% in Japanese centers and 8.6% in European centers reportedly did not undergo adrenalectomy 30 . Thus, the application of this model to patients with unilateral subtype of PA can help correlate their clinical data effectively with clinical outcomes, thereby leading to a reduction in the number of those who are reluctant to undergo adrenalectomy.
Several predictors related to postoperative clinical outcomes in PA have been reported [6][7][8][9][10][11][12] . However, the reported predictors were not often consistent across studies. This may be partly because previous studies assumed the linearity of the features and used generalized linear models. In this study, we comprehensively incorporated clinical data that are readily available in routine clinical practice among predictors reported to be related to clinical outcomes in the non-linear machine learning algorithm. As a decision tree-based model, LightGBM also has the advantage of being robust against multicollinearity 31 . When features were selected to improve the performance of the developed predictive model, six important features were found in the following order: duration of hypertension, DDD of antihypertensive medication, PAC, sex, BMI, and age. Due to the different test datasets used in each study, it is difficult to make a direct comparison of the accuracy between our developed predictive model and previously reported scoring models. However, in the test dataset of each study, the predictive accuracy, positive predictive value, and negative predictive value of our predictive model were 77.3%, 80.0%, and 76.5%, respectively, compared to 65.7%, 75.0%, and 63.6%, respectively, for the aldosteronoma resolution score (ARS) (≥ 4 points) 12 and 73.0%, 64.4%, and 80.0%, respectively, for the primary aldosteronism surgical outcome (PASO) score (> 16 points) 7 . The performance of our predictive model compared favorably with those reported previously. We could confirm the usefulness of applying a non-linear machine learning algorithm in predicting clinical outcomes in PA.
The advantage of our study is the use of machine learning to reveal the non-linear relationship between clinical features and complete clinical success, and the use of SHAP values to uncover the black box of the machine learning-based model. Visualizing the change in SHAP values can help understand how these features affect the output of the predictive model (Fig. 4). In this study, the duration of hypertension, DDD of antihypertensive medication, PAC, and BMI showed a non-linear relationship with complete clinical success. Our predictive model can provide the importance of the features involved in the prediction by showing as a SHAP decision plot how the model uses the features to make decisions for each patient used. This visual description allows the clinician to immediately and easily interpret the output of the model for each patient. For example, Fig. 6 shows the SHAP decision plots for two cases in the dataset. In case (a), although the SHAP values for PAC and BMI were negative, the prediction of complete clinical success was derived with a particularly large effect of duration of hypertension and DDD. In case (b), the SHAP values of all six features were negative, predicting that complete clinical success would not be achieved. As predicted by the model, postoperatively, blood pressure levels normalized in case (a), while hypertension remained in case (b). The duration of hypertension was the most relevant feature to complete clinical success. In addition, the persistence of hypertension due to excessive aldosterone secretion over a long-time period could have detrimental and irreversible effects on the cardiovascular system. The mechanistic relationship between the duration of hypertension and clinical outcomes must await further investigation.
There are several limitations to this study. First, because this study retrospectively investigated patients with PA who underwent unilateral adrenalectomy, prospective studies are required to assess the validity of our predictive model. Second, the model developed cannot distinguish between patients with partial clinical success and those with absent clinical success, although they both require postoperative antihypertensive medication and continuous blood pressure control. Third, patients with PA who failed to achieve complete biochemical success were excluded from this study, and the performance of the predictive model when they are included is unknown In conclusion, we developed a machine learning-based model to predict complete clinical success after unilateral adrenalectomy. Moreover, our predictive model is based on readily available clinical features, which may be used with ease in routine clinical practice. The developed predictive model may be useful in assessing the benefit of unilateral adrenalectomy and in selecting surgical treatment and antihypertensive medication for patients with PA in clinical practice.

Data availability
The datasets generated and analyzed during the current study is not publicly available but are available from the corresponding author on reasonable request. The processed codes are available from GitHub: https:// github. com/ hiroki-kane/ Clini cal-Outco mes-of-PA.   cases (a, b) in the dataset. The prediction line represented the pathway leading to the estimated probability of complete clinical success for each patient in our predictive model.