Application of Machine-Learning Models to Predict Tacrolimus Stable Dose in Renal Transplant Recipients

Tacrolimus has a narrow therapeutic window and considerable variability in clinical use. Our goal was to compare the performance of multiple linear regression (MLR) and eight machine learning techniques in pharmacogenetic algorithm-based prediction of tacrolimus stable dose (TSD) in a large Chinese cohort. A total of 1,045 renal transplant patients were recruited, 80% of which were randomly selected as the “derivation cohort” to develop dose-prediction algorithm, while the remaining 20% constituted the “validation cohort” to test the final selected algorithm. MLR, artificial neural network (ANN), regression tree (RT), multivariate adaptive regression splines (MARS), boosted regression tree (BRT), support vector regression (SVR), random forest regression (RFR), lasso regression (LAR) and Bayesian additive regression trees (BART) were applied and their performances were compared in this work. Among all the machine learning models, RT performed best in both derivation [0.71 (0.67–0.76)] and validation cohorts [0.73 (0.63–0.82)]. In addition, the ideal rate of RT was 4% higher than that of MLR. To our knowledge, this is the first study to use machine learning models to predict TSD, which will further facilitate personalized medicine in tacrolimus administration in the future.

Scientific RepoRts | 7:42192 | DOI: 10.1038/srep42192 fewer dose modifications 12 . A more recent trial indicated that CYP3A4 activity and CYP3A5 genotype can explain 56-59% variability in tacrolimus dose and clearance 9 . These successes in clinic revealed the possibility to improve clinical outcomes of tacrolimus therapy by taking pharmacogenomic factors into account.
However, most of the algorithms for predicting tacrolimus dose were based on relatively small clinical population, and the predictive accuracy was usually uncertain. Moreover, proposed algorithms are mostly based on multiple linear regression (MLR) methods, which have some well-known limitations that may impair prediction accuracy. For example, MLR model assumes independence between variants, and the relationship between the dependent and independent variables is always complex and non-linear 20 . Therefore, MLR may not be the most applicable model for accurate prediction of drug outcomes.
Machine learning techniques, compared with traditional statistical models, have many advantages including high power and accuracy, ability to model non-linear effects, interpretation of large genomic data sets, robustness to parameter assumptions and dispense with normal distribution test 21 . Recently it has been widely used in predicting warfarin dose 22 . For example, Random Forest Regression (RFR), Boosted Regression Tree (BRT) and Support Vector Regression (SVR) models were utilized to predict warfarin maintenance dose in African Americans, with much higher accuracies than previous models 21 . Artificial neural networks (ANNs) algorithm reached high accuracy in predicting warfarin maintenance dose, more than 70% of patients in the low (≤ 21 mg/) and median dose (21-49 mg) subgroups have been correctly identified 23 . In our previous work, eight machine learning algorithms were compared with MLR in predicting warfarin dosing, results showing Bayesian additive regression trees (BART), multivariate adaptive regression splines (MARS) and SVR significantly outperformed other models; machine learning methods also performed better than MLR in the low-and high-dose ranges 20 .
To our knowledge, the development and application of machine learning algorithms to predict tacrolimus dose has not been reported. We therefore conducted this research to investigate the clinical and genetic factors significantly associated with tacrolimus stable dose as well as to identify the most feasible algorithm for prediction of the dose requirement of Chinese renal transplantation recipients.

Results
Basic Characteristics of the Study Cohorts. In total, 1,045 renal transplant recipients were enrolled in the trial, whose basic characteristics are shown in Table 1. Continuous variables are shown as mean ± standard deviation, and categorical variables are shown as number (%). No significant difference was found in the demographic, clinical and genetic data between the derivation cohort (n = 838) and the validation cohort (n = 207). All the tested SNPs were in Hardy-Weinberg equilibrium except ABCB1129TC and ABCB12677GTA, and the genotype frequencies were in accordance with previously reported data of Chinese population (Supplementary Table S1).
In the derivation cohort, the mean TSD among these patients was 3.48 ± 1.28 mg/day. Patients had an average age of 36.19 ± 10.62 years old, and 71.2% were males. The most common complication was hypertension with 561 patients (66.9%), while 19 patients were diagnosed diabetes (2.3%). The most common combined medication was calcium channel blocker (544 patients, 64.9%), in addition, 377 patients were given metoprolol (45.0%), and 246 were given omeprazole (29.4%). Among all these patients, 50.6% were carriers of CYP3A5*3 GG genotype, 9.2% were carriers of AA genotype, and 16 unknown (1.9%). The detected results of other genotypes are illustrated in Supplementary Table S1.

Identification of Clinical and Genetic Factors Significantly Associated with TSD.
In order to construct a predictive model that only contains important factors, we have first investigated the relationship between each factor and the STD of patients. Univariate analysis was used to test all the clinical and genetic factors, resulted in four factors that were significantly associated with TSD: whether has hypertension, whether has diabetes, whether taking omeprazole and CYP3A5 genotype. Among them, CYP3A56986AG is the most significant influencing factor, with a P value of 2.2*10 −16 in the F-test (Supplementary Table S2).
Tacrolimus stable dose was also compared among patients with different ABCB1 genotypes. The comparison was carried out for the three polymorphisms (1236 C/T, 2677 G/A/T and 3435 C/T) most studied in the literature. No statistically significant difference was found (data not shown).
Next, multivariable regression model was used to test the above four variates. It was shown that diabetes was not a significant factor in the model and thereafter excluded. The remaining three factors (hypertension, use of omeprazole and CYP3A5 genotype) were used to construct the MLR and 8 machine learning models. (Supplementary Table S3).

Overall Comparison of Predictive Algorithms.
In order to determine the overall predictive accuracy, approaches including MAE and ideal rate (predictive dose fell within 20% of the actual dose) were applied. Among all the machine learning models, RT performed best in both derivation [0.71 (0.67-0.76)] and validation cohorts [0.73 (0.63-0.82)] (not statistically significant) (Fig. 1). Although MLR had similar MAE with RT, the ideal rate of RT was 4% higher than that of MLR. On the other hand, the lowest MAE was also seen using RFR model. However, as RT is a simpler and more easily understood model that fits clinical use, it was chosen for this study. The developed RT model is shown as Fig. 2.
Clinical Relevance. In general, RT algorithm provided more accurate prediction of TSD than other 8 algorithms. The performance of the three dose ranges, low dose (≤ 2.5 mg/day), intermediate dose (2.5-4 mg/day), and high dose (> 4 mg/day), were compared as shown in Tables 2 and 3. Patient doses in the intermediate range were best predicted compared with the actual stable dose in both derivation and validation cohorts (MAE = 0.50 and 0.48 mg/day, respectively) ( Table 2). For patients who required 2.5 mg/day or less (24.4% of the total 693 patients), 38.5% of the predicted dosage fell into ideal dose range (20% of the actual dose). While for the patients who required 4 mg/day or more (20.8% of the total patients), 44.4% of the prediction dropped into ideal dose.

Discussion
Compared with traditional dosing strategies in clinic, the current study was successful in providing a novel approach that can predict TSD more accurately and conveniently. In general, the performances of the 9 algorithms were similar in predicting TSD. While the best performance was observed in RT model in this study, comprehensive evaluation of these algorithms in various studies is needed to come to a final conclusion. It should also be noted that the current study was performed in Chinese, studies in other ethnic groups may come to different results.
The most influential factor in this study was CYP3A5 genotype. The SNP 6986 A > G on the CYP3A5 gene results in absence of function protein. Carriers of homozygous 6986 G allele (designated as CYP3A5*3) have no CYP3A5 activity, which impair the whole-blood concentration of tacrolimus 24 and subsequently the time required to reach target concentration 3 . None of the included ABCB1 SNPs were found any significant impact on the algorithm. In fact, previous researches checking the association between ABCB1 genotypes and tacrolimus dosage have come to conflicting results 24-26 . Our results indicated that the intermediate dose range exhibited better accuracy (lower MAE and higher ideal rate) than that in the high-and low-dose ranges. Nevertheless, patients in this dose range are less likely to benefit from statistical models based on pharmacogenomics. In practice, patients who require extreme dose administrations (or whom grouped in the high-and low-dose ranges) are more likely to face overdose or underdose and hence suffer from adverse clinical consequence 22 . Therefore, better prediction of extreme dose ranges are needed to present real help to those patients.
Whilst machine learning techniques demonstrated their capability in solving inferential problems by self-adjust their structure when encounter errors, as well as dealing with numerous variables simultaneously 20 , we should be noted that they are still far from omnipotent in clinical use. The relationship between dependent variables and independent variables are very complicated in all these statistical algorithms, and the existence of  gene-gene and gene-environment interactions bring more challenge to the researchers [27][28][29] . Inclusion of larger number of genotypic variables in a predictive model may be helpful to obtain a better performance, but this may lead to addition of redundant data and may hinder its application in clinical practice 21 . The complicated situation of real patients should be well considered, as additional comorbidity and interacting drugs are always the case, which may not be completely included in the models 30 . Therefore, even the statistical models are utilized to increase the predictive accuracy of TSD, continuous monitoring of drug concentration is still needed at the moment.
There are some limitations in this study, no other potentially important factors were included, such as smoking, alcohol consumption and other genetic factors; secondly, data regarding tacrolimus initial doses or adverse reactions were not gathered in the study, only data about stable therapeutic doses were considered; in addition, using of p-value threshold to select significant SNPs may be not enough to generate most complementary SNP set 21 .

Patients. Stable tacrolimus-treated renal recipients at The Third Xiangya Hospital of Central South University
and Peking University Health Science Center between Oct 2012 and Sep 2014 were considered for enrollment. All patients were Chinese with a minimum age of 18 years old. The clinical research admission was approved by Chinese Clinical Trial Registry (registration number: ChiCTR-RNC-12002894). The study protocol was approved by the Ethics Committee of Institute of Clinical Pharmacology, Central South University (CTXY-120030-2), all methods were performed in accordance with the relevant guidelines and regulations, and written informed consent was obtained from all patients.
The demographic and clinical information of the subjects were obtained from their clinical records as well as clinical and telephone follow-ups. Information of combined diseases such as hypertension and diabetes and concomitant medications such as omeprazole, metoprolol and calcium channel blockers were collected. All the patients received tacrolimus, together with mycophenolate and glucocorticoid after transplantation. Tacrolimus was initiated at 0.05 mg/Kg every 12 h then dose adjusted to target trough concentration of 6-10 ng/ml for the first month, and then 6-8 ng/ml afterwards. Tacrolimus concentration was monitored daily during the hospital stay and in the follow-up visits. Doses were adjusted by 25% each time when it was out of the above target range. Tacrolimus stable dose (TSD) was defined as the total daily dose after 3 months of transplantation, and at least three consecutive blood concentrations were within target range and within 20% of each other 19 .   Model Building and Statistical Analyses. The overall modeling process is illustrated as Fig. 3. Generally, 80% (838 patients) of the eligible patients were randomly selected as the "derivation cohort" to develop dose-prediction algorithm. The remaining 20% of the patients (207 patients) constituted the "validation cohort", which was used to test the final selected algorithm. Meanwhile, to obtain robust results, 100 times of resampling were run to minimize the overfitting problem. Next, univariate and stepwise multivariate linear regression (MLR) was used to select covariates related to tacrolimus stable dose. Covariates with statistical significance (CYP3A5 genotype, hypertension and use of omeprazole) were used to develop algorithms within derivation cohort (train set). The performances of the algorithms were evaluated and compared using the mean absolute error (MAE) and the mean percentage of patients whose predicted dose fell within 20% of the actual dose (ideal rate) in the remaining 20% of patients (test set). The MAE is defined as the average of the absolute value of actual dose minus predicted dose, while the percentage of patients within 20% of the actual dose was selected by us since this definition has been widely applied 22 . Descriptive statistics was utilized to determine means and standard deviations, frequency and percentage distributions. Chi-square test was used to assess deviations of allele frequencies from Hardy-Weinberg equilibrium. MLR and eight machine learning techniques, namely, support vector regression (SVR), artificial neural network (ANN), regression tree (RT), random forest regression (RFR), boosted regression tree (BRT), multivariate adaptive regression splines (MARS), lasoo regression (LAR), Bayesian additive regression trees (BART) were applied in tacrolimus dose prediction. All analyses in this study were implemented using R (Version 3.2.2) 31 with related packages or our custom written functions. We used the RSNNS package for ANN 32 , rpart package for RT 33 , gbm package for BRT 34 , e1071 package for SVR 35 , randomForest package for RFR 36 , earth package for MARS 37 , glmnet package for LAR 38 and bartMachine package for BART 39 . Default parameters were used (Supplementary Table S4).
In the validation cohort, the MAE and ideal rate of the pharmacogenomic algorithm were calculated overall, also in terms of tacrolimus dose range, which was divided into three categories based on the 25% and 75% quartiles of TSD: low dose (≤ 2.5 mg/day), intermediate dose (2.5-4 mg/day), and high dose (> 4 mg/day).