Warfarin maintenance dose prediction for patients undergoing heart valve replacement: a hybrid model with genetic algorithm and back-propagation neural network

Warfarin is the most commonly recommended anticoagulant for patients undergoing heart valve replacement. However, because of its narrow therapeutic window and large inter-individual variation in dose, warfarin dosing needs more advanced prediction methods. We used data on patients who had undergone heart valve replacement, collected through a multicenter registered clinical system across China, and divided them into three groups (training group: 10673 cases; internal validation group: 3558 cases; external validation group: 1463 cases) to construct a hybrid model combining a genetic algorithm with a back-propagation neural network (BP-GA). To test the model's prediction accuracy, we used the mean absolute error (MAE), the root mean squared error (RMSE) and the ideal predicted percentage, both overall and within dose subgroups. In both the internal and the external validation groups, the total ideal predicted percentage was over 58%, and the intermediate-dose subgroup was predicted best. Moreover, the model showed higher prediction accuracy and lower MAE and RMSE values in the external validation group than in the internal validation group (p < 0.05). In conclusion, the BP-GA model is promising for predicting warfarin maintenance dose.

Participants. The participants were patients undergoing heart valve replacement, extracted from the database "Chinese Low Intensity Anticoagulant Therapy after Heart Valve Replacement" (CLIATHVR), which was collected from April 1st, 2011 to December 31st, 2015 through a multicenter registered clinical system in 35 medical centers across China.
The inclusion criteria were: (1) Chinese; (2) age over 18 years; (3) receiving warfarin regularly as the only oral anticoagulant after heart valve replacement, with the INR (international normalized ratio) as the monitoring index; (4) an INR that fluctuated by less than 0.2 units over three consecutive measurements and stayed within 1.5-2.5 during later follow-up. Included patients had to meet all of the above criteria.
The exclusion criteria were: (1) severe liver or kidney dysfunction before or after the operation; (2) concomitant use of non-steroidal anti-inflammatory drugs or other drugs affecting the anticoagulation effect; (3) anticoagulant complications (thrombosis; embolism; bleeding; death) during anticoagulant therapy (our objective was to predict the warfarin maintenance dose, that is, the dose at which warfarin achieves its anticoagulant effect without complications). Patients in any of the above situations were excluded.
Included Variables. The input variables. The input variables were extracted by two methods: the analysis of covariance and the enrollment of mandatory variables.
The specific steps were as follows. First, for data cleaning, we used clinical professional knowledge to screen the 706 items in the database for latent input variables. Then the analysis of covariance was used to extract the primary input variables that had statistical significance (Type I, α = 0.05). Finally, we enrolled the mandatory variables that are clinically relevant to warfarin, whether or not they showed a statistical difference.
The output variable. Warfarin maintenance dose was the output variable; it was identified when the INR stayed within the target range of 1.5-2.5 and fluctuated by less than 0.2 units over three successive measurements.
Data set. We divided the eligible cases into three groups: Group A (Training group), Group B (the internal validation group) and Group C (the external validation group).
Following the distribution method described by Steyerberg 17 and Lópe 18 , we assigned the cases from medical centers that enrolled fewer than 200 cases to Group C (centers with few cases are often not specialized cardiothoracic or tertiary hospitals and have low compatibility). The remaining cases were randomly divided into Group A and Group B at a ratio of 3:1. Group A was used to build the model, and Groups B and C were used to verify its prediction accuracy in internal and external validation, respectively.
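As an illustration only, the grouping rule can be sketched in Python. The function name `split_cases`, the `(case_id, center_id)` tuple layout and the fixed random seed are assumptions made for this sketch; the study does not describe how the random split was implemented.

```python
import random
from collections import Counter

def split_cases(cases, min_center_size=200, train_ratio=0.75, seed=0):
    """Split cases into training (A), internal (B) and external (C) groups.

    `cases` is a list of (case_id, center_id) tuples (hypothetical layout).
    """
    center_sizes = Counter(center for _, center in cases)
    # Group C: every case from a center that enrolled fewer than 200 cases.
    group_c = [c for c in cases if center_sizes[c[1]] < min_center_size]
    remaining = [c for c in cases if center_sizes[c[1]] >= min_center_size]
    # Groups A and B: random 3:1 split of the remaining cases.
    rng = random.Random(seed)
    rng.shuffle(remaining)
    cut = round(len(remaining) * train_ratio)
    return remaining[:cut], remaining[cut:], group_c
```

With the 14231 cases that remain after Group C (1463 cases) is set aside, a 3:1 cut gives round(14231 × 0.75) = 10673 training cases and 3558 internal validation cases, which agrees with the reported group sizes.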
Model construction. The introduction of BP-GA model. The three-layer BP neural network (input layer, hidden layer and output layer) has been widely used in medicine because it handles nonlinear relationships well and has good error tolerance. However, a BP neural network easily falls into a local optimum, and its convergence performance is weak 19,20 . GA (genetic algorithm) follows the principle of evolution: it searches the whole solution space and takes the best-evolved individual as the optimal solution. Hence, it can obtain a global optimal solution with which to optimize the BP neural network. In our study, the process of constructing the BP-GA model included two basic parts, as depicted in Figure 1.
Part one: BP neural network modeling. This was a process of feed-forward information transfer and error back propagation. Specifically, given the sample data for the input and output layers, the network was trained until the error between the actual output and the target output reached its minimum. In our study, the final independent variables were the neurons in the input layer, and the single neuron in the output layer was the warfarin maintenance dose. The number of neurons in the hidden layer was chosen from the range $\sqrt{m+n}+\alpha$ (α was an integer constant from 1 to 10; m and n were the numbers of neurons in the input layer and output layer, respectively), and was fixed at the value for which the error between the actual output and the target output was smallest. The error was measured by the MAE (the mean absolute error between the predicted dose and the actual dose). The smaller the MAE, the more accurate the prediction model.
Part two: The weights and thresholds optimization by GA algorithm. Although the network structure had been constructed, its initial weights and thresholds were generated randomly, and improper initial weights and thresholds would worsen the prediction accuracy. Hence, it was necessary to optimize the weights and thresholds with the GA.
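The hidden-layer search in part one can be sketched as follows, assuming the commonly used empirical rule h = sqrt(m + n) + α with α = 1, ..., 10. Rounding to the nearest integer and the callable `validation_mae` (hidden size in, MAE in mg/d out) are assumptions of this sketch, not details given in the study.

```python
import math

def hidden_size_candidates(m, n, alphas=range(1, 11)):
    """Candidate hidden-layer sizes from the rule sqrt(m + n) + alpha."""
    base = math.sqrt(m + n)
    # Rounding to the nearest integer is an assumption of this sketch.
    return [round(base + a) for a in alphas]

def pick_hidden_size(candidates, validation_mae):
    """Return the candidate hidden size with the smallest MAE.

    `validation_mae` is a hypothetical callable: hidden size -> MAE.
    """
    return min(candidates, key=validation_mae)
```

For a network with m = 12 inputs and n = 1 output, this rule gives candidate sizes from 5 to 14; each candidate would be trained and scored, and the size with the lowest MAE kept.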
The optimization process consisted of the following steps.
Step 1: A group of individuals made up each generation, and each individual was described as a chromosome. A chromosome consisted of a series of real numbers, which represented the connection weights between the hidden layer and the input layer, the connection weights between the output layer and the hidden layer, and the thresholds in the hidden and output layers.
Step 2: The fitness function was the reciprocal of the absolute error between the predicted and actual outputs of each individual. As shown in eq. (1), a larger fitness value corresponds to greater viability of the chromosome.
$$F = \frac{1}{\sum_{i=1}^{n} \left| y_i - o_i \right|} \qquad (1)$$
(n denotes the number of neurons in the output layer; $y_i$ and $o_i$ represent the predicted and actual outputs of the ith neuron, respectively.)
Step 3: The optimized weights and thresholds were obtained by performing the following operations: selection, crossover and mutation.
1. Roulette-wheel selection was used as the selection strategy: each individual was selected with a probability ($P_i$) based on the size of its fitness value, as shown in eq. (2).
$$P_i = \frac{f_i}{\sum_{j=1}^{N} f_j} \qquad (2)$$
($f_i$ denotes the fitness value of the ith individual and N denotes the population size. The larger an individual's fitness, the greater its chance of being selected.)
2. In the crossover operation, two individuals were selected from the group according to a given probability, and parts of their codes were exchanged to produce two new individuals.
3. In the mutation operation, individuals were selected according to a given mutation probability, and the mutation position on the chromosome was determined randomly.
Next, steps 1 to 3 were iterated until a near-optimal fitness value was obtained or the preset maximum number of generations was reached. The output of this process was the individual with the best fitness, whose weights and thresholds were used as the final weights and thresholds of the BP neural network.
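The loop over steps 1 to 3 can be sketched in Python as a minimal real-coded genetic algorithm. This is not the GAOT implementation used in the study: single-point crossover, replacing one random gene in the range [-1, 1] on mutation, keeping the best individual seen so far, and all function names are assumptions of this sketch. The `fitness` argument is any callable that returns a positive value, for example the reciprocal of the absolute prediction error.

```python
import random

def roulette_select(population, fitnesses, rng):
    """Pick one individual with probability f_i / sum(f)."""
    pick = rng.uniform(0, sum(fitnesses))
    running = 0.0
    for individual, f in zip(population, fitnesses):
        running += f
        if running >= pick:
            return individual
    return population[-1]  # guard against floating-point round-off

def crossover(a, b, rng):
    """Single-point crossover: swap the tails of two chromosomes (length >= 2)."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:], b[:point] + a[point:]

def mutate(chromosome, rng, low=-1.0, high=1.0):
    """Replace the gene at one random position with a new random value."""
    child = list(chromosome)
    child[rng.randrange(len(child))] = rng.uniform(low, high)
    return child

def evolve(fitness, length, pop_size=50, generations=100,
           crossover_rate=0.95, mutation_rate=0.09, seed=0):
    """Return the fittest chromosome found within `generations` iterations."""
    rng = random.Random(seed)
    population = [[rng.uniform(-1, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        fitnesses = [fitness(ind) for ind in population]
        next_gen = []
        while len(next_gen) < pop_size:
            p1 = roulette_select(population, fitnesses, rng)
            p2 = roulette_select(population, fitnesses, rng)
            if rng.random() < crossover_rate:
                p1, p2 = crossover(p1, p2, rng)
            for child in (p1, p2):
                if rng.random() < mutation_rate:
                    child = mutate(child, rng)
                next_gen.append(list(child))
        population = next_gen[:pop_size]
        # Keep the best individual seen so far (elitism, an assumption here).
        best = max(population + [best], key=fitness)
    return best
```

The default arguments mirror the settings reported for this study (population 50, 100 generations, crossover rate 0.95, mutation rate 0.09). In a BP-GA setting, `fitness` would decode the chromosome into network weights and thresholds and score the resulting predictions.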
The parameters and software used in our study. The BP neural network was constructed with the Neural Network Toolbox of MATLAB R2010b. Its parameters were set according to engineering experience as follows: the number of training iterations was 1000, the error target was 0.001 and the learning rate was 0.1.
The GA was constructed with the GAOT Toolbox in MATLAB R2010b. The parameters were set as follows: the population size was 50, the number of generations was 100, the crossover rate was 0.95, and the mutation rate was 0.09.

Model validation.
We used the MAE and the root mean squared error (RMSE: the square root of the mean squared error between the predicted dose and the actual dose) to measure the prediction accuracy. In addition, following the method defined by Klein et al. 7 , the ideal predicted percentage (the percentage of patients whose absolute error between the predicted dose and the actual dose was within 20% of the actual dose) was used to test the clinical utility of BP-GA. The smaller the MAE and RMSE, the better the prediction accuracy; the larger the ideal predicted percentage, the better the clinical utility.
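These measures can be written compactly in Python. The sketch below is an illustration rather than the study's own code: the function name, the dictionary keys, and the choice to count an error of exactly 20% as ideal are assumptions. The under- and overestimate percentages follow the definitions given in the table notes.

```python
import math

def prediction_metrics(predicted, actual, tolerance=0.20):
    """MAE, RMSE and ideal / under / over-estimated percentages (doses in mg/d)."""
    errors = [p - a for p, a in zip(predicted, actual)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # Ideal: absolute error within 20% of the actual dose.
    ideal = sum(abs(e) <= tolerance * a for e, a in zip(errors, actual))
    # Underestimate / overestimate: error beyond 20% below / above the actual dose.
    under = sum(e < -tolerance * a for e, a in zip(errors, actual))
    over = sum(e > tolerance * a for e, a in zip(errors, actual))
    return {
        "mae": mae,
        "rmse": rmse,
        "ideal_pct": 100.0 * ideal / n,
        "under_pct": 100.0 * under / n,
        "over_pct": 100.0 * over / n,
    }
```

The three percentages always sum to 100, because every prediction falls into exactly one of the three categories.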
Dose subgroup analysis was also conducted to decrease the clinical heterogeneity. Patients were divided into low-, intermediate- and high-dose subgroups based on the 25th and 75th percentiles of the actual warfarin maintenance dose.
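A possible way to form the dose subgroups is sketched below. The quartile method (`inclusive`) and the convention that doses equal to a cut point fall into the intermediate group are assumptions; the text does not state how boundary values were handled.

```python
import statistics

def dose_subgroups(actual_doses):
    """Split doses into low / intermediate / high groups at the quartiles."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(actual_doses, n=4, method="inclusive")
    groups = {"low": [], "intermediate": [], "high": []}
    for dose in actual_doses:
        if dose < q1:
            groups["low"].append(dose)
        elif dose > q3:
            groups["high"].append(dose)
        else:
            # Doses equal to a cut point are treated as intermediate (assumption).
            groups["intermediate"].append(dose)
    return groups
```

Each subgroup can then be scored separately with the same accuracy measures used for the whole validation group.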

Statistical analysis.
The independent-sample t-test was used to assess the statistical difference between each pair of groups: training and internal validation groups, training and external validation groups, and internal and external validation groups. The difference in predicted percentage between the internal and external validation groups was analyzed by the chi-square test. All analyses were two-sided with a significance level of 0.05 and were performed in SPSS 20.0.
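For readers without SPSS, the comparison of ideal predicted percentages between two groups can be reproduced with a Pearson chi-square test on a 2×2 table. The sketch below uses only the Python standard library; omitting the continuity correction is an assumption, since the text does not say which variant was used, and the counts in the usage line are hypothetical.

```python
import math

def chi_square_2x2(ideal_a, total_a, ideal_b, total_b):
    """Pearson chi-square test (no continuity correction) for two proportions."""
    table = [[ideal_a, total_a - ideal_a], [ideal_b, total_b - ideal_b]]
    n = total_a + total_b
    row_totals = [total_a, total_b]
    col_totals = [table[0][0] + table[1][0], table[0][1] + table[1][1]]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: 90 of 100 ideal predictions versus 50 of 100.
chi2, p = chi_square_2x2(90, 100, 50, 100)
```

A p value below 0.05 would indicate that the two groups differ in their ideal predicted percentage.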

Results
Participants' characteristics. As the flow diagram in Figure 2 shows, we finally included 15694 eligible cases in the analysis; the training group, internal validation group and external validation group contained 10673, 3558 and 1463 cases, respectively. The basic characteristics of the participants are shown in Table 1. Overall, most patients were 40-65 years old (mean age = 50.24 years). The male-to-female ratio was close to 9:11. Most eligible patients were Han-Chinese, and the mean warfarin maintenance dose was 2.73 ± 0.73 mg/d.
There was no statistical difference (p > 0.05) in characteristics between the training group and the internal validation group. There was a difference (p < 0.05) between the training group and the external validation group in relevant demographic and clinical features such as age and weight. Compared with the internal validation group, the patients in the external validation group differed statistically (p < 0.05) in many characteristics, such as weight, BSA (body surface area), APTT (activated partial thromboplastin time) and steady-state INR.
Included variables. The independent variables. First, in the process of data cleaning, 45 items were screened as latent independent variables after removing the items that were not related to warfarin dose or whose completeness was under 50%.
Then, the 10 primary input variables were selected by the analysis of covariance (η² ≥ 0.01 and p < 0.05), as shown in Table 2; η² represents the contribution of a given input variable to the output variable. Considering the requirement of model conciseness, η² was set as more than 0.001. The 10 input variables were age, EF (ejection fraction), left ventricular diastolic diameter, operation history, albumin, urea nitrogen, creatinine, preoperative APTT one day before valve replacement, timing of first anticoagulant and warfarin origin. Operation history indicated whether the patient had undergone other surgery before heart valve replacement, and warfarin origin indicated whether the warfarin was made in China or abroad. We used 1 to represent warfarin made in China (Qilu Pharma or Shanghai Xinyi Pharma) and 2 to represent warfarin made abroad (Orion Corporation Orion Pharma).
Next, height and weight were added as mandatory variables, because Gu et al. 21 found that three variables (age, weight and height) can explain 76.8% of the total warfarin dose variation. Hence, in the end, 12 independent variables were selected.
The output variable. Warfarin maintenance dose was the output variable.
Model construction. The primary BP neural network was as follows: m was 12 because we finally selected 12 input variables, n was 1 because the only output variable was the warfarin maintenance dose, and the number of neurons in the hidden layer was 9, which gave the minimum MAE. As shown in Figure 3, the fitness of the BP-GA model reached its best and stable value at the 23rd generation of training, which showed that the genetic algorithm can be used to optimize the weights and thresholds. The prediction diagrams are shown in Figures 4 and 5.
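To make the size of the search concrete, the sketch below counts the genes in a chromosome for a 12-9-1 network and decodes a chromosome into weights and thresholds. The gene ordering, the tanh hidden activation and the linear output are assumptions made for illustration; the text does not specify them.

```python
import math

def chromosome_length(m, h, n):
    """Genes: input-hidden weights, hidden thresholds,
    hidden-output weights, output thresholds."""
    return m * h + h + h * n + n

def decode(chromosome, m, h, n):
    """Slice a flat chromosome into (w1, b1, w2, b2); ordering is assumed."""
    if len(chromosome) != chromosome_length(m, h, n):
        raise ValueError("chromosome length does not match the network shape")
    pos = 0
    w1 = [chromosome[pos + i * m: pos + (i + 1) * m] for i in range(h)]
    pos += m * h
    b1 = chromosome[pos: pos + h]
    pos += h
    w2 = [chromosome[pos + k * h: pos + (k + 1) * h] for k in range(n)]
    pos += h * n
    b2 = chromosome[pos: pos + n]
    return w1, b1, w2, b2

def forward(x, w1, b1, w2, b2):
    """Feed-forward pass: tanh hidden layer, linear output (assumed)."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    return [sum(w * hj for w, hj in zip(row, hidden)) + b
            for row, b in zip(w2, b2)]
```

Under these assumptions a 12-9-1 network has 12 × 9 + 9 + 9 × 1 + 1 = 127 genes, so each GA individual is a vector of 127 real numbers.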

Model validation.
In the analysis of the total ideal predicted percentage (Table 3), BP-GA showed a predicted percentage of over 58% in both validation groups. Moreover, it showed higher prediction accuracy (p < 0.05) in the external validation group than in the internal validation group. The MAE was also lower in the external group (internal group: 0.383 mg/d; external group: 0.370 mg/d), with a statistical difference (p < 0.05). Likewise, the RMSE in the external group was lower than that in the internal group (internal group: 0.664 mg/d; external group: 0.656 mg/d).
In the dose subgroup analysis (Table 4), BP-GA had the best predicted percentage in the intermediate-dose subgroup in both internal and external validation. The predicted percentage in the intermediate-dose subgroup was higher (p < 0.05) in the external group than in the internal group (internal group: 77.90%; external group: 84.20%). In addition, in both internal and external validation, the BP-GA model showed over 98% over-prediction in the low-dose subgroup and over 98% under-prediction in the high-dose subgroup.

Discussion
The summary of main results. Our study has three important features. First, it was based on a clinical registered system of 27012 cases using warfarin after heart valve replacement. Second, we used BP-GA, an artificial intelligence method, to build a model based on 15694 eligible patients from the database. Third, the average warfarin maintenance dose was 2.73 mg/d, which was less than the previous IWPC 7 maintenance dose of 4 mg/d, and the target INR range 22 was 1.5-2.5, which was lower than the western standard (INR 2.0-3.0) 23 . These features indicated that Chinese people were more sensitive to warfarin and should be given low-intensity anticoagulation.
In summary, there were statistical differences (p < 0.05) between the training and external validation groups and between the internal and external validation groups, which showed that the external validation group came from a sample with different demographic and clinical characteristics. In terms of MAE, RMSE and total ideal predicted percentage, the BP-GA model showed good prediction accuracy in both internal and external validation. Furthermore, in the dose subgroup analysis, the BP-GA model showed the best prediction accuracy in the intermediate-dose subgroup. The prediction accuracy in the external validation was higher than that in the internal validation, which suggested that BP-GA was a useful model with high external validity.
The plausibility of final independent variables. In this study, we used two approaches (the analysis of covariance and the enrollment of mandatory variables) to select the final independent variables. In the end, 12 independent variables (age, EF, left ventricular diastolic diameter, operation history, albumin, urea nitrogen, creatinine, preoperative APTT one day before valve replacement, timing of first anticoagulant, warfarin origin, weight and height) were selected. A previous study 21 showed that three variables (age, weight and height) can explain 76.8% of the total warfarin dose variation. Masayasu et al. 24 found that EF and left ventricular diastolic diameter were related to thrombus formation; thus, they are also related to the use of warfarin. Creatinine and urea nitrogen are typical laboratory indicators of kidney function, and Nita et al. 25 found that kidney function influenced warfarin responsiveness. Albumin is one of the major carrier proteins in the body and constitutes approximately half of the protein found in blood plasma; Osama et al. 26 found that one of its major binding sites, "Sudlow I", includes a binding pocket for warfarin, so albumin is also related to the warfarin dose. In our study, APTT was the preoperative APTT measured one day before valve replacement. Kucuk M et al. 27 found that a low preoperative APTT value may be an indicator of thrombosis in patients who have undergone heart surgery; it therefore affects postoperative anticoagulation and may also affect the use of warfarin after heart valve replacement. Dong et al. 28,29 reported that the maintenance dose of warfarin made in China differed from that of imported brands. Lip et al. 30 reported that time in therapeutic range and medical history were related to warfarin bleeding events; hence, operation history and the timing of first anticoagulant may also influence the use of warfarin. Therefore, the final set of included variables was plausible.
The comparison between the existing models. The total ideal predicted percentage in the external validation of our BP-GA model was 62.8%, which was higher than that (48.46%) in Yu et al. 31 . Yu et al. 31 was an appropriate reference because it also performed external validation, in 130 Han-Chinese patients after heart valve replacement. However, there were obvious differences between our study and Yu et al. 31 , which may explain the difference in predicted percentage. First, the consistency between the training group and the external validation group differed. Our training group and external validation group shared a single disease type (undergoing heart valve replacement), a single ethnicity (Chinese) and the same INR target range (1.5-2.5). In contrast, Yu et al. 31 used the existing authoritative IWPC 7 model, whose training group was multi-ethnic and multi-disease with a target INR value (2.5-3.0) that differed from that of its own external validation group. Second, the sample size of the training group differed. Our BP-GA model was constructed from a training group of 10673 cases, which was larger than the training group of the IWPC model (4043 cases) used in Yu et al. 31 . This is in accord with the fact that a larger sample size achieves better prediction performance 32 . Third, compared with the IWPC model, the BP-GA model had a strong generalization ability to address nonlinear relationships. When prediction performance was assessed by MAE, our BP-GA model had an MAE of less than 0.40 mg/d, better than that of Li et al. 9 (over 0.60 mg/d), who used seven models (SVR, ANN, RT, MLR, RFR, BRT and MARS) to predict the warfarin maintenance dose of 1295 Chinese people. Our model may have shown better prediction accuracy because of the larger sample size and the new artificial intelligence model used in our study.
However, when comparing with Fu-hua model 33 , which was constructed by the training group of Chinese [...]
Table 3. Comparison of the total prediction accuracy of the BP-GA model. Note: BP-GAiv: the internal validation group. BP-GAev: the external validation group. *P < 0.05 (independent-sample t-test of the MAE and RMSE between the internal and external validation groups, respectively). **P < 0.05 (chi-square test of the ideal percentage between the internal and external validation groups). Ideal: the percentage of patients whose absolute error between predicted dose and actual dose was within 20% of the actual dose. Underestimate: the percentage of patients whose predicted dose was less than the actual dose and whose absolute error was more than 20% of the actual dose. Overestimate: the percentage of patients whose predicted dose was more than the actual dose and whose absolute error was more than 20% of the actual dose.
[...] 31 were in line in that they both showed the best predicted percentage in the intermediate-dose subgroup, which had the largest sample size in the training group. Furthermore, compared with Yu et al. 31 , the prediction accuracy of our BP-GA model in the intermediate subgroup was better. The reason may lie in the case distribution across subgroups. In our study, the proportion of cases in the intermediate-dose subgroup (75.7%) was higher than that of the IWPC 7 model (53%). This also explains why our BP-GA model showed weak prediction accuracy in the low- and high-dose subgroups (over-prediction in the low-dose subgroup and under-prediction in the high-dose subgroup). Specifically, compared with the intermediate-dose subgroup, the low- and high-dose subgroups were small, so the BP-GA model captured more of the characteristics of the intermediate-dose subgroup. As a result, the predicted dose was close to the intermediate dose in both the low- and high-dose subgroups.
Hence, the model showed over-prediction in the low-dose subgroup and under-prediction in the high-dose subgroup. Over-estimation would lead to overdosing of warfarin, which is related to severe bleeding symptoms (hematuria; bleeding from mucous membranes of the nose or gums; ecchymosis on the extremities; bleeding from the gastrointestinal tract; massive liver hematoma; diffuse alveolar hemorrhage) 36-40 , and warfarin-related hemorrhages result in thousands of emergency department visits and hospital admissions annually 41 . Under-estimation would lead to underdosing of warfarin, which is associated with insufficient anticoagulation and thrombosis 42 , a main cause of death and disability worldwide 43 . Hence, future studies need to find a proper way, such as stratified training, to improve the prediction accuracy in the low- and high-dose subgroups. This also suggests that genetic factors should be considered when predicting the maintenance dose in the low- and high-dose subgroups, whereas there is no such need in the intermediate-dose subgroup, which has the largest sample size. This would lessen the medical burden, particularly given that genotype testing is expensive and is not covered by medical reimbursement in China.

Limitation and Future
Some limitations of this study need to be addressed. First, we did not add genetic information or other clearly influential variables, such as drug combination and diet, to the BP-GA model, which may have affected the prediction accuracy. Second, this was a retrospective study, so the BP-GA model only described existing phenomena, and we may have discarded important information when we deleted the items whose completeness was under 50%. Before the BP-GA model can be applied clinically as a prediction model, a prospective study is needed to validate its prediction accuracy and to improve the completeness of follow-up. Third, selecting variables by the analysis of covariance and by compulsory enrollment may have left out important features that have a latent non-linear relationship with the outcome, and may thus have lowered the prediction accuracy of the BP-GA model. Hence, future studies need a more appropriate method of variable selection. Finally, the INR varies from day to day after the initiation of therapy, so it would be better to use the INR of a fixed day as a potential input variable. However, our study was retrospective and based on an existing database, in which the frequency of INR measurement and the day on which measurement started after valve replacement were not fixed. It was therefore difficult to collect the INR of a fixed day, and we did not test INR in the variable selection. In a following prospective study, we recommend collecting the INR value of a fixed day to improve prediction accuracy.
Table 4. Comparison of the models' predicted percentage in the dose subgroups. Note: BP-GA iv: the internal validation group; BP-GA ev: the external validation group. **P < 0.05 (chi-square test of the ideal percentage between the internal and external validation groups). Ideal: the percentage of patients whose absolute error between predicted dose and actual dose was within 20% of the actual dose. Underestimate: the percentage of patients whose predicted dose was less than the actual dose and whose absolute error was more than 20% of the actual dose. Overestimate: the percentage of patients whose predicted dose was more than the actual dose and whose absolute error was more than 20% of the actual dose.

Conclusion
In conclusion, the BP-GA model is a promising model for predicting warfarin maintenance dose in patients undergoing valve replacement: in both the total and the dose subgroup analyses, BP-GA showed high prediction accuracy, particularly in the external validation group, which represented the conditions of real clinical practice.