An Equation Based on Fuzzy Mathematics to Assess the Timing of Haemodialysis Initiation

In order to develop an equation that integrates multiple clinical factors, including signs and symptoms associated with uraemia, to assess the initiation of dialysis, we conducted a retrospective cohort study including 25 haemodialysis centres in Mainland China. Patients with ESRD (n = 1281) who commenced haemodialysis from 2008 to 2011 were enrolled in the development cohort, whereas 504 patients who began haemodialysis between 2012 and 2013 were enrolled in the validation cohort. An artificial neural network model was used to select variables, and a fuzzy neural network model was then constructed using factors affecting haemodialysis initiation as input variables and 3-year survival as the output variable. A logistic model was set up using the same variables. The equation’s performance was compared with that of the logistic model and conventional eGFR-based assessment. The area under the bootstrap-corrected receiver-operating characteristic curve of the equation was 0.70, and those of two conventional eGFR-based assessments were 0.57 and 0.54. In conclusion, the new equation based on fuzzy mathematics, covering laboratory and clinical variables, is more suitable for assessing the timing of dialysis initiation in a Chinese ESRD population than eGFR, and may be a helpful tool to quantitatively evaluate the initiation of haemodialysis.

Maintenance haemodialysis is the main renal replacement therapy used for patients with end-stage renal disease (ESRD). The timing of haemodialysis initiation is a vital factor in reducing the complications and mortality associated with dialysis [1][2][3][4]. Early initiation of haemodialysis leads to an accelerated decline in residual renal function, poor quality of life, and waste of medical resources. In contrast, late initiation of haemodialysis increases the incidence of complications, causes higher mortality, and increases treatment costs. Over the past three decades, several studies have attempted to determine the optimal time to initiate dialysis; however, no consensus has been reached.
From the 1970s to the 1990s, studies showed that a high initial Kt/V level for urea (K, dialyzer clearance of urea; t, dialysis time; V, volume of distribution of urea) can improve dialysis outcomes [5][6][7]. In the USA, the mean estimated GFR (eGFR) at dialysis initiation gradually increased from 1996 to 2008. In particular, the proportion of patients beginning haemodialysis with an eGFR of >10 mL/min increased from 20% to 52%, whereas the proportion beginning haemodialysis with an eGFR of ≥15 mL/min increased from 4% to 17% 8. However, subsequent observational studies based on these registries, although they included a large number of patients, produced controversial results [9][10][11][12][13][14]. In particular, the IDEAL study in 2010 showed that survival did not significantly differ between ESRD patients with early and late dialysis initiation 14.
One reason for the conflicting results was that the definitions of "early" and "late" were based on serum creatinine-based GFR estimations, including equations such as the Modification of Diet in Renal Disease (MDRD) study equation 15 . The equations for GFR estimation do not consider essential clinical factors such as nutrition, diabetes mellitus, and signs and symptoms of uraemia (e.g., volume overload, gastrointestinal tract symptoms, and anaemia), which may affect dialysis initiation. Therefore the updated 2015 Kidney Disease Outcomes Quality Initiative (KDOQI) guidelines recommend that the decision to initiate dialysis should be based on an assessment of multiple factors, such as signs and symptoms of, for example, uraemia, volume overload, and heart failure, and not only the eGFR level 16 . However, these clinical factors are subjectively assessed by doctors and depend on their individual experience. Hence, the quantitative assessment of these aforementioned clinical factors may enhance the accuracy of the assessment of the timing of haemodialysis initiation, especially for doctors who lack extensive clinical experience.
However, it is difficult to quantify these 'fuzzy' clinical factors, especially the non-linear relationships between these factors and outcomes, using traditional statistical methods 17 . In the present study, we adopted 'fuzzy' methods instead of traditional statistical methods. In past decades, the development of fuzzy mathematics has impacted the fields of modelling because it can describe vague statements 18 . Artificial neural networks (ANNs), as one of the widely used techniques of fuzzy mathematics, have the advantage of being able to detect complex, non-linear problems 19 . Generally, ANNs consist of multiple layers; hence the information transfers from the input layer to the output layer of the neuronal network layer by layer. They can be calibrated using almost any type of input data (i.e., assumed risk factors), and the output can be one-dimensional or high dimensional (i.e., outcomes) and can simultaneously consider all possible interactions between those risk factors. Therefore, ANNs have been used to predict technique survival for peritoneal dialysis, and the results showed ANNs have higher accuracy than logistic regression models 20,21 . Marshall et al. created an ANN model to predict GFR, which showed better results than algebraic formulas. These studies confirmed that ANNs could deal with dialysis datasets 22 . In our previous study, we established an improved ANN model termed the kernel logistic neural network-restricted Boltzmann machine (KLNN-RBM) to solve complex variable screening problems efficiently 23 .
Moreover, fuzzy neural networks combine the strength of fuzzy logic in processing vague and uncertain information with the good learning ability of neural networks. The Takagi-Sugeno (T-S) type fuzzy neural network is the most widely used modelling method among fuzzy neural networks 24. This technique has been applied in biological and clinical fields for modelling, especially in the processing of time-delay datasets (i.e., risk factors and outcomes), and has produced satisfactory results 25,26. Therefore, the KLNN-RBM and the T-S type fuzzy neural network appear to be viable methods for variable selection and for modelling the assessment of dialysis initiation, respectively.
To the best of our knowledge, this is the first report of the use of fuzzy mathematics to develop a novel equation to assess dialysis initiation, which we termed the "dialysis initiation based on fuzzy mathematics equation" (DIFE). Furthermore, we compared the DIFE with the conventional eGFR-based assessment and showed that the DIFE evaluates dialysis initiation more accurately. Our results suggest that the DIFE offers a novel method to assess the initiation of dialysis, and may have implications for decision-making related to dialysis initiation in clinical practice.

Methods
Study design and participants. This retrospective cohort study encompassed 25 haemodialysis centres covering seven geographical regions of Mainland China. All the study centres serve as quality control centres of blood purification or were recommended by these centres in each province, municipality, or autonomous region. People's Hospital of Zhengzhou University. We obtained written informed consent from each patient, and personal information was protected during data collection. All the study methods were performed in accordance with relevant guidelines and regulations.
Patients with ESRD who began maintenance haemodialysis between January 1, 2008, and September 30, 2013, were enrolled. Patients were 18-85 years of age, were diagnosed with chronic kidney disease (CKD), had two successive eGFR measurements of ≤30 mL/min/1.73 m² within 3 months before haemodialysis initiation, and had received haemodialysis for a minimum of 3 months. The exclusion criteria were as follows: patients who were diagnosed with acute kidney injury (AKI); those who underwent or were scheduled to undergo peritoneal dialysis or kidney transplantation; those with a severe disease that significantly affected survival (e.g., malignant tumours, hepatic cirrhosis); and those who died accidentally from unexpected causes, including traffic accidents and suicide.
Of the 1802 patients who began haemodialysis between January 1, 2008, and September 30, 2013, 17 patients with missing serum albumin and serum phosphate data were excluded. The enrolled patients were divided into two cohorts according to the start time of haemodialysis. A total of 1281 patients who started haemodialysis between January 1, 2008, and December 31, 2011, were included as the development cohort, whereas 504 patients who started haemodialysis between January 1, 2012, and September 30, 2013, were retained within the validation cohort (Fig. 1).
Data collection. Data from the time spanning the initiation of haemodialysis to the outcomes were extracted from the inpatient/outpatient records at the haemodialysis centres by the investigators of each centre. To assess the precision of the determination of clinical signs and symptoms from the medical records, three nephrologists (Y.L., X.H., and L.W.) independently reviewed the data of a random sample of 100 records. The documentation used for abstraction included inpatient medical records for haemodialysis initiation, first-time haemodialysis records, laboratory data, and surgical records for first access. Baseline demographic data and clinical data included sex, birth date, date of haemodialysis initiation, first access, death date, primary disease, and comorbidity. The clinical signs and symptoms at the time of haemodialysis initiation were grouped as follows: heart failure, vomiting, uremic encephalopathy, and oedema grade 2+ and 3+. Determinations of the clinical signs and symptoms were made based on the clinical guidelines by nephrologists during the outpatient or inpatient consultations 16.
Laboratory data collected within 3 months prior to haemodialysis initiation included haemoglobin, serum albumin, blood urea nitrogen, serum creatinine, serum potassium, serum sodium, free calcium, and serum phosphorus levels. In all the clinical laboratories of the study centres, serum creatinine levels were measured using the sarcosine oxidase method. The eGFR (mL/min/1.73 m²) at the initiation of haemodialysis was computed using the Chinese modified MDRD study equation 7: $\mathrm{eGFR} = 170 \times (\text{serum creatinine})^{-0.999} \times (\text{age})^{-0.176} \times (\text{blood urea nitrogen})^{-0.170} \times (\text{serum albumin})^{0.318}$ (×0.762 if the patient is female; ×1.202 if the patient is of Chinese descent). All participating investigators were nephrologists and had received uniform study training.
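The eGFR equation quoted above can be written as a short function. The following is a minimal sketch rather than the authors' implementation. It assumes the conventional units of the MDRD study equation (serum creatinine and blood urea nitrogen in mg/dL, serum albumin in g/dL), which the text does not state explicitly; the function and parameter names are hypothetical, and the unit helpers use the conversion factors given in the note to Table 1.

```python
def scr_umol_to_mg_dl(scr_umol_l):
    """Serum creatinine: umol/L -> mg/dL (factor 88.4 from the Table 1 note)."""
    return scr_umol_l / 88.4


def bun_mmol_to_mg_dl(bun_mmol_l):
    """Blood urea nitrogen: mmol/L -> mg/dL (factor 0.357 from the Table 1 note)."""
    return bun_mmol_l / 0.357


def egfr_c_mdrd(scr_mg_dl, age_years, bun_mg_dl, alb_g_dl, female, chinese=True):
    """eGFR (mL/min/1.73 m^2) from the equation quoted in the text.

    Assumption: inputs are in mg/dL (creatinine, urea nitrogen) and g/dL (albumin).
    """
    egfr = (170.0
            * scr_mg_dl ** -0.999
            * age_years ** -0.176
            * bun_mg_dl ** -0.170
            * alb_g_dl ** 0.318)
    if female:
        egfr *= 0.762   # sex coefficient from the equation
    if chinese:
        egfr *= 1.202   # coefficient for patients of Chinese descent
    return egfr
```

Keeping the unit conversions in separate helpers makes it explicit which unit system the exponents were fitted for; passing SI values directly would silently give a wrong eGFR.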
Outcomes. The primary outcome was all-cause mortality within 3 years after haemodialysis initiation. The dates and reasons of death were obtained from the medical records of the study centres. Survival, expressed in months, was defined as the time from the start date of haemodialysis to the date of death (for patients who died within 3 years after initiation) or as 36 months for the surviving patients.
Equation variable selection using the KLNN-RBM model. Based on the KDOQI guidelines 16 and our previous study 27, 13 candidate variables were considered for inclusion in the DIFE, including age, sex, serum creatinine level, blood urea nitrogen level, serum albumin level, blood haemoglobin level, serum potassium level, serum phosphorus level, heart failure, vomiting, oedema grade 2+ and 3+, uremic encephalopathy, and diabetes mellitus. Sex, diabetes mellitus, heart failure, vomiting, oedema grade 2+ and 3+, and uremic encephalopathy were used as binary variables, and were transformed using dummy variable encoding (e.g., female = 1, male = 0; yes = 1, no = 0).
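The survival definition and the dummy coding described above can be sketched as follows. This is an illustration only: the record field names and the average month length used to convert days to months are assumptions, not details given in the text.

```python
from datetime import date

FOLLOW_UP_MONTHS = 36.0
DAYS_PER_MONTH = 365.25 / 12  # assumption: average calendar month length


def survival_months(start, death=None):
    """Months from haemodialysis start to death, capped at 36 months.

    Patients with no recorded death are assigned the full 36 months.
    """
    if death is None:
        return FOLLOW_UP_MONTHS
    months = (death - start).days / DAYS_PER_MONTH
    return min(months, FOLLOW_UP_MONTHS)


def encode_binary(record):
    """Dummy-code the binary variables (female = 1, male = 0; yes = 1, no = 0).

    The keys below are hypothetical field names for the six binary variables.
    """
    flags = ["diabetes", "heart_failure", "vomiting", "oedema", "encephalopathy"]
    encoded = {"sex": 1 if record["sex"] == "female" else 0}
    for name in flags:
        encoded[name] = 1 if record[name] == "yes" else 0
    return encoded
```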
Using the KLNN-RBM model, the variables adopted to establish the DIFE equation were identified in the development cohort. We first initialized the weight and bias parameters using the RBM, and then optimized the parameters using a modified maximum likelihood estimation and stochastic gradient descent method to obtain higher classification accuracy. The structure of the KLNN-RBM model is shown in Fig. 2. The inputs of the KLNN-RBM model were the candidate variables, whereas the output was the patient survival time after haemodialysis initiation (<12 months, 12-36 months, and >36 months). In other words, variable selection in the development cohort was treated as a three-class classification problem. To determine reproducibility, twenty different numerical simulations were processed independently, and for each numerical simulation, 10-fold cross-validation was employed. Both the number of RBM iterations and the number of stochastic gradient descent iterations were set to 100. The mean of the simulation results was taken as the classification accuracy for each combination of the candidate variables. The combination with the best classification accuracy was selected as the final set of variables.
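The selection procedure described above (twenty independent runs of 10-fold cross-validation per variable combination, then choosing the combination with the best mean accuracy) can be sketched as below. Two points are assumptions: a nearest-centroid classifier stands in for the KLNN-RBM, which is not reproduced here, and the combinations are enumerated exhaustively, whereas the text does not say how they were generated. All function names are hypothetical.

```python
import random
from itertools import combinations
from statistics import mean


def fit_centroids(X, y):
    """Stand-in classifier (NOT the KLNN-RBM): one mean vector per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Return the label of the closest centroid (squared Euclidean distance)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def kfold_accuracy(X, y, k, rng):
    """Accuracy of one k-fold cross-validation run on a shuffled index."""
    idx = list(range(len(X)))
    rng.shuffle(idx)
    correct = 0
    for i in range(k):
        fold = idx[i::k]
        held_out = set(fold)
        train = [j for j in idx if j not in held_out]
        model = fit_centroids([X[j] for j in train], [y[j] for j in train])
        correct += sum(predict(model, X[j]) == y[j] for j in fold)
    return correct / len(X)


def repeated_cv_accuracy(X, y, columns, repeats=20, k=10, seed=0):
    """Mean accuracy over `repeats` independent k-fold runs for one subset."""
    rng = random.Random(seed)  # same seed per subset, so splits are comparable
    Xs = [[row[c] for c in columns] for row in X]
    return mean(kfold_accuracy(Xs, y, k, rng) for _ in range(repeats))


def best_subset(X, y, n_features, repeats=20, k=10):
    """Exhaustively score every non-empty subset and return the best one."""
    candidates = [cols for r in range(1, n_features + 1)
                  for cols in combinations(range(n_features), r)]
    return max(candidates,
               key=lambda cols: repeated_cv_accuracy(X, y, cols, repeats, k))
```

With 13 candidate variables an exhaustive search covers 8191 subsets, which is feasible for a cheap classifier but may be costly for a network that is retrained in every fold.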
Equation development using the T-S type fuzzy neural network. The final variables were used to establish a T-S type fuzzy neural network in the development cohort. The weighting function multiplier W was introduced for numerical indicators to establish a new haemodialysis initiation evaluation equation. After removing the outliers, 10-fold cross-validation was used for network training, and the parameters of the model were adjusted by particle swarm optimization (PSO) 28 . Based on a survival time of more or less than 36 months, patients in the development cohort were divided into two groups to determine the threshold and to evaluate the performance of the equation. Patients who survived for ≥36 months were assigned to the good survival group, whereas patients who survived for <36 months were assigned to the poor survival group. The sensitivity, specificity, and diagnostic accuracy of the 3-year mortality prediction after haemodialysis initiation were used to evaluate the equation's performance. The candidate threshold with the best performance was determined as the final threshold. Meanwhile, we developed a logistic model with the same variables as the DIFE using the development cohort.
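For readers unfamiliar with T-S type models, the inference step of a generic first-order Takagi-Sugeno system is sketched below. It is not the DIFE: the rule structure, the Gaussian membership functions, and all parameter names are assumptions made for illustration, and in the study the parameters were tuned by PSO rather than set by hand.

```python
import math


def gaussian_membership(x, centre, width):
    """Degree (0..1) to which x belongs to a fuzzy set centred at `centre`."""
    return math.exp(-((x - centre) ** 2) / (2.0 * width ** 2))


def ts_predict(x, rules):
    """First-order Takagi-Sugeno inference.

    Each rule is a dict with:
      'centres', 'widths': one Gaussian membership function per input
      'coeffs': linear consequent [bias, slope_1, ..., slope_n]
    The output is the firing-strength-weighted average of the consequents.
    """
    strengths, outputs = [], []
    for rule in rules:
        strength = 1.0
        for xi, c, s in zip(x, rule["centres"], rule["widths"]):
            strength *= gaussian_membership(xi, c, s)  # product t-norm
        bias, *slopes = rule["coeffs"]
        strengths.append(strength)
        outputs.append(bias + sum(p * xi for p, xi in zip(slopes, x)))
    total = sum(strengths)
    if total == 0.0:
        raise ValueError("no rule fires for this input")
    return sum(w * out for w, out in zip(strengths, outputs)) / total
```

Because each rule contributes a linear function weighted by a smooth membership degree, the overall mapping is non-linear but changes gradually between regions of the input space, which is the property that motivates using this model for vague clinical factors.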

Equation validation.
We divided the patients of the validation cohort into predicted timely and late start groups based on the DIFE threshold and plotted the Kaplan-Meier curves. Kaplan-Meier curves stratified by different eGFR thresholds (5, 6, 7, 8, and 9 mL/min/1.73 m²) were also created. The performance of the DIFE was validated based on the bootstrap-corrected receiver-operating characteristic (ROC) area of the 3-year mortality prediction after haemodialysis initiation, and by comparing this with the logistic model and the conventional eGFR-based measurements, wherein eGFR was calculated by both the Chinese modified MDRD study (C-MDRD) equation and the CKD-EPI equation.
Statistical analysis. Continuous variables are expressed as the mean ± standard deviation or median (interquartile range). Categorical variables are expressed as frequencies (percentages). For the comparison of patient baseline data between the two cohorts, continuous variables were evaluated using the Mann-Whitney U test, and categorical variables were evaluated using the χ² test.
Statistical analysis was conducted using SPSS software (version 19.0; IBM, Armonk, NY, USA). All statistical tests were two-sided. Statistical significance was set at p < 0.05. All ANN models were constructed using MATLAB software (version 2011b; The MathWorks Inc., Natick, MA, USA), whereas R (version 3.4.1) open source software (R Foundation for Statistical Computing; www.Rproject.org) was used for validation.

Results
Cohort description. The baseline characteristics of the development cohort (n = 1281) and the validation cohort (n = 504) are presented in Table 1. Significant differences were observed in the body mass index (BMI), heart failure, vomiting, diabetes mellitus, haemoglobin level, blood urea nitrogen level, serum creatinine level, potassium level, calcium level, and phosphate level at the time of haemodialysis initiation between the two cohorts.
Within the first 3 years of haemodialysis, 156 patients died in the development cohort, and 57 patients died in the validation cohort. The 3-year mortality rate in the development cohort was 12.2 deaths per 100 patient-years, whereas the corresponding rate in the validation cohort was 11.3 deaths per 100 patient-years.
Equation variable selection using the KLNN-RBM model. A total of 13 candidate variables were considered as the input variables in the KLNN-RBM model, including five variables (age, sex, serum creatinine level, blood urea nitrogen level, and serum albumin level) as part of the MDRD study equation 7 29 and eight potential clinical factors (i.e., blood haemoglobin level, serum potassium level, serum phosphorus level, heart failure, vomiting, oedema grade 2+ and 3+, uremic encephalopathy, and diabetes mellitus). Classification accuracies obtained by the combination of different candidate variables are listed in Appendix Table 1. The best classification accuracy (64.30%) was achieved by using the following nine candidate variables: age, sex, serum creatinine level, blood urea nitrogen level, serum albumin level, haemoglobin level, serum phosphorus level, diabetes mellitus, and heart failure.
Table 1. Baseline characteristics of the development cohort and validation cohort. Note: The conversion factor for the serum creatinine level in mg/dL to µmol/L is × 88.4; the conversion factor for the blood urea nitrogen level in mg/dL to mmol/L is × 0.357; the conversion factor for the haemoglobin and serum albumin levels in g/dL to g/L is × 10. Abbreviation: eGFR, estimated glomerular filtration rate. *Calculated using the Chinese modified Modification of Diet in Renal Disease equation 7.

Equation development using a T-S type fuzzy neural network model. A T-S type fuzzy neural network model was established in the development cohort using the nine selected variables. W in the equation was a function of six numerical variables (i.e., serum creatinine level, age, serum albumin level, haemoglobin level, blood urea nitrogen level, and phosphorus level), and served as a multiplier representing the effects of these variables on the patients' outcomes. The performance of the equation at the candidate thresholds is listed in Table 3, and the threshold was set at 30. The logistic model was set up with the same nine variables, with a calibration value of 8.06 (P = 0.428) (shown in Table 2).

Performance of the equation in the validation cohort.
We tested the accuracy of the equation in the validation cohort. With a threshold of 30, which remained the best among the different candidate thresholds, the diagnostic accuracy rate of the equation was 72.42%, the specificity was 75.84%, and the sensitivity was 45.61%. The 3-year mortality rates in the good survival group and poor survival group were 8.38 deaths per 100 patient-years and 19.40 deaths per 100 patient-years, respectively. The validation accuracies were similar to those in the development cohort and showed robust performance with the DIFE (listed in Table 3). Moreover, we evaluated the performance of the C-MDRD equation in the validation cohort based on the eGFR thresholds (5, 6, 7, 8, and 9 mL/min/1.73 m²). The best sensitivity, specificity, and diagnostic accuracy rates for the conventional assessment of 3-year mortality in the validation cohort were 19.3%, 77.0%, and 70.6%, respectively, when the eGFR threshold was 5 mL/min/1.73 m²; however, the values were all lower than those obtained using the DIFE (Table 3).
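The three diagnostic rates reported above follow directly from a 2 × 2 table of predicted group against 3-year outcome. The sketch below uses counts that are inferred, not reported: they are the integer counts consistent with the 504 validation patients, the 57 deaths, and the published percentages, treating death within 3 years as the positive class and a DIFE value below the threshold as a positive prediction.

```python
def diagnostic_rates(tp, fn, tn, fp):
    """Return sensitivity, specificity and accuracy as percentages."""
    sensitivity = 100.0 * tp / (tp + fn)
    specificity = 100.0 * tn / (tn + fp)
    accuracy = 100.0 * (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Inferred counts (assumption): 57 deaths and 447 survivors among 504 patients.
TP, FN, TN, FP = 26, 31, 339, 108

sens, spec, acc = diagnostic_rates(TP, FN, TN, FP)
print(f"{sens:.2f} {spec:.2f} {acc:.2f}")   # 45.61 75.84 72.42

# Crude proportion of deaths in each predicted group, in percent.
good_group = 100.0 * FN / (FN + TN)   # predicted good survival: 31 of 370
poor_group = 100.0 * TP / (TP + FP)   # predicted poor survival: 26 of 134
print(f"{good_group:.2f} {poor_group:.2f}")  # 8.38 19.40
```

Under these inferred counts, the crude proportions of deaths in the two predicted groups reproduce the 8.38 and 19.40 figures given in the text.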

Comparison of the equation with the logistic model and the conventional eGFR-based assessments in the validation cohort.
We found that the area under the receiver-operating characteristic curve (AUC) was 0.70 (95% confidence interval [CI], 0.64-0.76) for the DIFE and 0.60 (95% CI, 0.53-0.68) for the logistic model; the P value was 0.021 after 2000 bootstrap resamples. When compared with the conventional eGFR-based assessments, the AUC was 0.55 for the C-MDRD study equation (95% CI, 0.47-0.63) and 0.53 for the CKD-EPI equation (95% CI, 0.45-0.62); the P values were 0.013 and 0.006 after 2000 bootstrap resamples (Fig. 3). Furthermore, the Kaplan-Meier curves for the predicted timely and late start groups, based on the DIFE threshold, indicated a greater cumulative incidence of death in the predicted late start group (χ² = 212.1, P < 0.001; Fig. 4). In contrast, there was no significant difference between the two groups defined by eGFR, regardless of the eGFR threshold (P > 0.05; Appendix Fig. 1).
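The AUC values above can be read as the probability that a randomly chosen patient who died is ranked as higher risk than a randomly chosen survivor. The sketch below computes the AUC in that rank-based form and a percentile bootstrap confidence interval. It is a simplification under stated assumptions: the study's exact bootstrap correction and its test for comparing two AUCs are not reproduced, and the function names are hypothetical. A higher score is taken to mean higher risk, so a score for which lower values indicate worse prognosis (as the worked example for the DIFE suggests) would be negated before use.

```python
import random


def auc(scores, labels):
    """P(score of a positive > score of a negative); ties count one half."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(scores, labels, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample patients with replacement."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        lab = [labels[i] for i in idx]
        if 0 < sum(lab) < n:  # a resample needs both outcomes
            stats.append(auc([scores[i] for i in idx], lab))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```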

Discussion
The GFR is commonly used to assess renal function and was also considered as a critical factor to evaluate the dialysis initiation time. However, eGFR alone is not sufficient to assess the dialysis initiation time 16,30,31 . Hence, we developed a novel equation, termed the DIFE, based on fuzzy mathematics and ANNs, which integrated multiple affecting factors, to assess dialysis initiation.
In this nation-wide retrospective cohort, all the participants were enrolled from major haemodialysis centres in Mainland China. Nephrologists at these centres have considerable clinical experience and can make precise subjective judgments regarding dialysis initiation. Therefore, the new equation, developed based on data from these centres, should support a precise assessment of the dialysis start time in China.
Table 2. Hazard Ratios for the Logistic Model in the Development Cohort. Note: The conversion factor for the serum creatinine level in mg/dL to µmol/L is × 88.4; the conversion factor for the blood urea nitrogen level in mg/dL to mmol/L is × 0.357; the conversion factor for the haemoglobin and serum albumin levels in g/dL to g/L is × 10.
Some observational studies and meta-analyses suggested that several clinical factors at dialysis initiation (e.g., haemoglobin value, serum phosphorus level, and heart failure) were associated with the prognosis of patients with ESRD, and some of these variables were already present in certain predictive models for CKD, renal failure, and risk of death [32][33][34][35][36][37][38][39][40] . Moreover, the KDOQI guidelines recommended that the decision to initiate dialysis should be based on an assessment of multiple clinical factors, including demographic factors, renal function, nutritional status, clinical signs and symptoms, and comorbidity 16 . Hence, we included these factors in the KLNN-RBM model as candidate variables, among which the signs and symptoms variable was the most frequently documented and routinely obtained.
The kernel logistic regression models and restricted Boltzmann machines are both proven techniques to handle feature selection and dimensionality reduction, and to determine the initial value of the model in classification situations. These two techniques have been used in disease diagnosis, gene screening, and other biological/ clinical studies [41][42][43][44][45] . However, to optimize the initial parameters and promote the classification capability of the logistic model, we combined a kernel logistic neural network with RBM and established the KLNN-RBM model. In our previous study, we validated KLNN-RBM in a single centre prospective cohort of dialysis patients. The results showed that the KLNN-RBM achieved higher accuracy compared with traditional logistic regression 27 . We also used six University of California Irvine (UCI) Machine Learning Repository datasets to test the performance of KLNN-RBM. The UCI Machine Learning Repository is a widely used primary source of machine learning data sets for the empirical analysis of machine learning algorithms, which has some biological and clinical datasets. The results also showed that the KLNN-RBM could achieve higher accuracy for the binary classification and multi-class classification problems 23 . These suggested that the KLNN-RBM model is an appropriate method for candidate variable screening.
In the present study, the results showed that the combination of sex, age, serum creatinine, blood urea nitrogen, serum albumin, haemoglobin, serum phosphorus, diabetes mellitus, and heart failure as equation variables resulted in the best accuracy. The clinical factors employed within the DIFE equation were consistent with those in the clinical guidelines and other cohort studies 10,16,34,35 . Based on these studies, we successfully determined the quantitative combination of these clinical factors for the first time. We initially compared the results of DIFE with a logistic model that had the same variables as the DIFE, and the DIFE demonstrated better model discrimination (the AUC of the DIFE was 0.70 vs. the AUC of the logistic model of 0.60). This result indicated that when integrating multiple variables, especially including some subjective judgments, fuzzy mathematics could be a more appropriate modelling method than traditional statistical methods. We also compared the AUC of the DIFE with those of two conventional eGFR-based assessments, and neither result was better than that of the DIFE (0.55 for the C-MDRD equation and 0.53 for the CKD-EPI equation). Therefore, the DIFE was more accurate to assess dialysis initiation than the conventional eGFR-based assessments. Furthermore, the Kaplan-Meier curves between the predictive timely and late start groups showed a significant difference based on the DIFE equation (P < 0.001); however, the C-MDRD equation and the CKD-EPI equation did not show any difference in the start time of dialysis for any eGFR value (P > 0.05). This result indicated that the DIFE could be a more suitable assessment method for the timing of dialysis initiation compared with the eGFR based conventional assessment.
The following example shows the manner in which the DIFE can provide a quantified assessment of dialysis initiation. The decision for haemodialysis initiation was unclear in two ESRD patients with the same eGFR of 10 mL/min/1.73 m² (clinical data shown in Table 4). If the other clinical factors were not considered, the eGFR value alone could not be used to make this decision. However, the DIFE equation yielded markedly different values: the value was 29.35 in patient A and 42.16 in patient B. Thus, the DIFE could directly identify that patient A required haemodialysis immediately, whereas patient B could undergo some preparation (e.g., vascular access placement) and undergo haemodialysis later. Liu reported that the major challenge for the management of patients with CKD in China is the lack of nephrology specialists 46. Patients with ESRD in China are usually treated by primary physicians who lack specialist training in nephrology, and patients may receive inappropriate decisions on dialysis initiation. The Chinese government is implementing reforms in the medical education system to produce well-trained primary health-care providers; however, this will take time given the country's vast need. The DIFE could help primary physicians to quantitatively assess the initiation of dialysis through the DIFE value. Furthermore, for convenient application, we developed a mobile phone application of the DIFE. The physician can obtain the DIFE value directly by entering the variables into a mobile phone.
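The decision rule in the example above amounts to comparing a DIFE value with the threshold of 30. The sketch below illustrates only that comparison; how the DIFE value itself is computed is not reproduced. The function name, the returned wording, and the handling of a value exactly equal to the threshold are assumptions.

```python
DIFE_THRESHOLD = 30.0  # threshold reported in the Results


def assess_initiation(dife_value, threshold=DIFE_THRESHOLD):
    """Map a DIFE value to a suggested course of action.

    Assumption: values below the threshold indicate that haemodialysis should
    start now; a value exactly at the threshold is treated as "prepare".
    """
    if dife_value < threshold:
        return "initiate haemodialysis now"
    return "prepare (e.g., vascular access) and initiate later"


print(assess_initiation(29.35))  # patient A
print(assess_initiation(42.16))  # patient B
```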
The present study had certain limitations. First, the DIFE was derived from retrospective data from patients who had already undergone haemodialysis and may have excluded patients who died prior to haemodialysis initiation, which could lead to survivor bias. Second, because of some incomplete data in the retrospective cohorts, some clinical indicators, such as malnutrition symptoms and subjective global assessment (SGA), could not be included as candidate variables. We are conducting a prospective, multicentre, randomized, controlled trial concerning nutritional status, including SGA and malnutrition symptoms, to verify and improve the equation (the ClinicalTrials.gov ID is NCT03385902). Third, although neural network techniques have been developed for a wide range of applications in recent years, and outperformed the conventional statistical models in assessing dialysis initiation in this study, how to optimize the structure and parameters of neural networks remains a challenge.
In the present study, we developed a novel fuzzy neural network model to evaluate the optimal time of haemodialysis initiation in a Chinese ESRD population. The variables in the equation included the clinical indicators of haemodialysis initiation, and the equation was found to perform better than conventional eGFR-based assessments. This equation may be a helpful tool to quantitatively evaluate the initiation of haemodialysis.

Data Availability
The datasets generated during the current study are available from the corresponding author on reasonable request.