Dialysis adequacy predictions using a machine learning method

Dialysis adequacy is an important survival indicator in patients on chronic hemodialysis. However, measuring dialysis adequacy from blood samples is inconvenient and has disadvantages. This study used machine learning models to predict dialysis adequacy in chronic hemodialysis patients from data measured repeatedly during hemodialysis. This study included 1333 hemodialysis sessions corresponding to the monthly examination dates of 61 patients. Patient demographics were collected, and clinical parameters were measured continuously by the hemodialysis machine; 240 measurements were collected from each hemodialysis session. Machine learning models (random forest and extreme gradient boosting [XGBoost]) and deep learning models (convolutional neural network and gated recurrent unit) were compared with multivariable linear regression models. The mean absolute percentage error (MAPE), root mean square error (RMSE), and Spearman’s rank correlation coefficient (Corr) were calculated for each model using fivefold cross-validation as performance measurements. The XGBoost model had the best performance among all methods (MAPE = 2.500; RMSE = 2.906; Corr = 0.873). The deep learning models with convolutional neural network (MAPE = 2.835; RMSE = 3.125; Corr = 0.833) and gated recurrent unit (MAPE = 2.974; RMSE = 3.230; Corr = 0.824) had similar performances. The linear regression models had the lowest performance of all models (MAPE = 3.284; RMSE = 3.586; Corr = 0.770). Machine learning methods can accurately infer hemodialysis adequacy using continuously measured data from hemodialysis machines.


Results
Hemodialysis sessions. This study included 1333 hemodialysis sessions corresponding to the monthly examination dates of 61 patients where URR was measured. The mean blood flow was 265.2 mL/min (SD, 41.4), the mean dialysate flow was 571.0 mL/min (SD, 116.3), the mean dialyzer surface area was 1.8 m² (SD, 0.2), the mean URR was 77.7% (SD, 5.3), and the mean total ultrafiltration volume was 2209.0 mL (SD, 826.5) (Table 1). The fivefold cross-validation method divided the data into five approximately equal-sized portions (the minimum and maximum numbers of participants per portion were 12 and 13, respectively). The total number of data points was 319,920.
Feature importance. Feature importance was calculated for the random forest and XGBoost models to investigate which covariates affect the URR prediction the most (Fig. 2). Pre-dialysis weight was the most important covariate for predicting URR in both models, followed by height and gender. Artificial features extracted from the blood flow rate (i.e., the mean and intercept of the linear regression) had higher importance than other artificial features.
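As an illustration of the artificial features mentioned above, the following minimal sketch summarizes one per-minute series by its mean and by the intercept of a least-squares line. It is not the authors' implementation: the function name `summarize_series`, the use of the minute index as the regressor, the returned slope, and the example blood flow trace are all assumptions made for illustration.

```python
from statistics import fmean


def summarize_series(values):
    """Return (mean, slope, intercept) for a per-minute series.

    Assumption: the line is fitted by ordinary least squares against
    the minute index 0, 1, ..., n-1. The slope is returned only because
    the same fit provides it; the text names the mean and intercept.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two measurements")
    t_mean = (n - 1) / 2
    y_mean = fmean(values)
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return y_mean, slope, intercept


# Hypothetical blood flow trace (mL/min): ramps up for 10 minutes, then steady.
blood_flow = [200 + 5 * min(t, 10) for t in range(240)]
mean_bf, slope_bf, intercept_bf = summarize_series(blood_flow)
print(f"mean={mean_bf:.1f}, slope={slope_bf:.4f}, intercept={intercept_bf:.1f}")
```

Features of this kind turn a 240-point series into a few numbers that tree-based models and linear regression can take as ordinary columns.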

Model performances.
Sensitivity analyses. Sensitivity analyses were conducted to confirm the fivefold cross-validation results; in these analyses, the data were split in units of sessions instead of patients. After randomizing the sessions, the linear regression, ML, and DL models were trained, and the sensitivity analysis results were similar to the primary results (Table 3). The ML and DL models still performed better than the linear regression model. Sensitivity analysis was also performed on data that eliminated URR outliers to determine how outliers affected model fitting. Sessions with URR values greater than the 95th percentile or less than the 5th percentile were removed. The model performances are summarized in Table 3. The models had better performances after eliminating outliers. However, the performance differences among models were similar before and after outlier removal.
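The difference between splitting by patient and splitting by session, and the percentile-based outlier removal, can be sketched as follows. This is a hedged illustration rather than the study code: the function names, the fixed seed, the evenly spread patient IDs, and the nearest-rank percentile definition are assumptions.

```python
import random
from collections import Counter


def patient_level_folds(session_patient_ids, k=5, seed=0):
    """Give every session of the same patient the same fold label."""
    patients = sorted(set(session_patient_ids))
    random.Random(seed).shuffle(patients)
    fold_of_patient = {p: i % k for i, p in enumerate(patients)}
    return [fold_of_patient[p] for p in session_patient_ids]


def session_level_folds(n_sessions, k=5, seed=0):
    """Assign sessions to folds at random, ignoring patient identity."""
    order = list(range(n_sessions))
    random.Random(seed).shuffle(order)
    folds = [0] * n_sessions
    for position, session_index in enumerate(order):
        folds[session_index] = position % k
    return folds


def trim_outliers(values, lower_pct=5, upper_pct=95):
    """Return indices of values inside the [lower, upper] percentile band.

    Assumption: percentiles use the nearest-rank definition.
    """
    ordered = sorted(values)
    n = len(ordered)

    def nearest_rank(pct):
        rank = max(1, (pct * n + 99) // 100)  # integer ceil(pct * n / 100)
        return ordered[rank - 1]

    low, high = nearest_rank(lower_pct), nearest_rank(upper_pct)
    return [i for i, v in enumerate(values) if low <= v <= high]


# Hypothetical layout: 1333 sessions spread evenly over 61 patient IDs.
patient_ids = [i % 61 for i in range(1333)]
by_patient = patient_level_folds(patient_ids)
by_session = session_level_folds(len(patient_ids))

# Number of distinct patients that land in each fold of the patient-level split.
patients_per_fold = Counter(dict(zip(patient_ids, by_patient)).values())
print(sorted(patients_per_fold.values()))
```

With a patient-level split, no patient contributes sessions to both the training and validation portions; a session-level split drops that guarantee, which is why it serves as a sensitivity check rather than as the primary analysis.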

Discussion
Current guidelines recommend checking dialysis adequacy once per month because dialysis adequacy is related to the prognosis of end-stage kidney disease patients 3. However, determining adequacy is challenging owing to the cost and blood exposure. The prediction model used parameters that determine hemodialysis efficiency, such as blood flow and dialysate flow rates, dialysis time, and the dialyzer type [8][9][10]. However, it is difficult to predict dialysis adequacy from these parameters with traditional statistical methods because the relationships between these parameters and urea clearance are not linear, and the parameters frequently change during hemodialysis with fluctuations in blood pressure or other symptoms. This study showed that ML and DL models using continuous measurements obtained during hemodialysis predicted dialysis adequacy. Furthermore, making such predictions from repeated hemodialysis machine measurements has significant implications. For example, there is no additional cost because the adequacy predictions are based on measurements obtained from any hemodialysis machine, making this approach useful when remote monitoring is required, such as with at-home hemodialysis.
DL has been mainly used for image processing, although recently it has also been used for predicting laboratory results or the short-term prognosis of patients based on continuously measured data. Additionally, large-scale intensive care unit datasets, such as the Medical Information Mart for Intensive Care III 11 and the eICU Collaborative Research Database 12, and intra- or post-operative vital sign data are now available for use in research 13. Various studies have also used DL to investigate hemodialysis. Akl et al. 14 suggested decades ago that neural networks could achieve artificially intelligent dialysis control, and studies on intradialytic hypotension predictions [15][16][17][18], the optimal dry weight setting 19, and anemia control 20 for hemodialysis have been presented. DL in research has also expanded to other kidney diseases to predict acute kidney injury outcomes 21,22 and hyperkalemia 23. Despite challenges, such as data cleansing costs, the required modeling resources, and algorithm validations, the DL approach is expected to improve the prognosis of hemodialysis patients in the future.
There are some limitations to our study. First, despite a relatively large number of hemodialysis sessions, this study was conducted on a small number of patients. This may be why the DL models performed worse than the random forest and XGBoost models in this study. A large, prospective study is needed to validate our model. Second, some factors influencing the blood urea nitrogen level during hemodialysis were not considered (e.g., the catabolic status, the exact residual renal function, and access recirculation). However, this study was based on outpatient clinic data with few acutely ill patients, and ultrafiltration (a factor affecting the blood urea nitrogen level) was included in our model. Therefore, the effect of the catabolic status was minimized. Finally, URR has been used as a standard method to measure the hemodialysis dose 4. However, the current guidelines do not recommend using URR for hemodialysis adequacy. Nevertheless, URR is widely used in clinics because it is easy to calculate and has a similar sensitivity to urea reduction compared with other methods 24. Models that predict spKt/V require verification in the future.
In conclusion, ML can accurately infer hemodialysis adequacy through repeatedly measured data during hemodialysis sessions. We expect to be able to develop personalized hemodialysis profile recommendation models through prospective data collection soon.

Materials and methods
Study population. The data were extracted from the Severance Hospital hemodialysis database, which stores information about each hemodialysis session. A total of 21,004 sessions from 75 outpatients aged over 19 years, automatically recorded in the Therapy Data Management System from May 2015 to September 2020, were screened. Among these patients, the 61 who were examined regularly for dialysis adequacy were finally selected, and clinical information including dialysis adequacy was additionally collected. The study was performed following the Declaration of Helsinki principles, and the Severance Hospital institutional review board approved this study (no. 4-2021-0056) and waived informed consent as only de-identified, previously collected data were accessed.
Data collection and measurements. Demographic and anthropometric data (including sex and age) corresponding to the hemodialysis date were collected from electronic medical records. Blood pressure, the vascular access type (arteriovenous fistula, arteriovenous graft, or catheter), and the dialyzer type (surface area) were recorded at the initiation of each hemodialysis session. Data, including blood flow and ultrafiltration rates, bicarbonate and sodium levels, dialysate flow, venous and arterial pressures, and the dialysate temperature, were measured every minute from the start of each session unless problems or interventions occurred. Monitoring software linked to each dialysis machine recorded the hemodialysis measurements in real time and collected 240 measurements (about 4 h) from each session; missing values were completed using an interpolation method. URR (the percentage decrease in blood urea concentration during hemodialysis) was measured as an indicator of dialysis adequacy. All hemodialysis sessions included in this study used the Fresenius 5008S (Fresenius Medical Care, Bad Homburg, Germany) hemodialysis device.
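The two calculations described above, URR as a percentage decrease and the completion of missing per-minute values, can be sketched as follows. The text does not state which interpolation method was used, so linear interpolation with nearest-value filling at the edges is an assumption, as are the function names and example values.

```python
def urea_reduction_ratio(pre_urea, post_urea):
    """URR in percent: relative fall in blood urea over one session."""
    if pre_urea <= 0:
        raise ValueError("pre-dialysis urea must be positive")
    return (pre_urea - post_urea) / pre_urea * 100.0


def fill_missing_linear(series):
    """Fill None gaps in a per-minute series.

    Assumption: interior gaps are filled linearly between the nearest
    observed neighbours; leading and trailing gaps copy the nearest
    observed value.
    """
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled = list(series)
    for i in range(known[0]):
        filled[i] = series[known[0]]
    for i in range(known[-1] + 1, len(series)):
        filled[i] = series[known[-1]]
    for left, right in zip(known, known[1:]):
        step = (series[right] - series[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = series[left] + step * (i - left)
    return filled


# Hypothetical values: urea falls from 100 to 25 (same units), so URR is 75%.
print(urea_reduction_ratio(100, 25))
print(fill_missing_linear([None, 10, None, None, 16, None]))
```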
Model building. Linear regression was considered the base model for performance comparisons with ML and DL algorithms. Random forest 25 and XGBoost 26 were chosen for the ML algorithms. The convolutional neural network and gated recurrent unit 27 architectures were chosen for the DL algorithms to extract features from time-varying covariates. The DL algorithms were trained with a batch size of 128, the Adam optimizer 28, and the root mean squared error (RMSE) loss function. The detailed architectures of the DL algorithms are illustrated in Supplementary Figure S1 and Supplementary Figure S2. For the ML algorithms, the hyperparameters were optimized to minimize the RMSE through a random search with fivefold cross-validation. All selected hyperparameters are described in Supplementary Table S1.
Covariates were normalized to have values between 0 and 1 in the DL algorithms, which can automatically extract features from time-varying covariates. In contrast, the linear regression and ML algorithms require a