Development of a system to support warfarin dose decisions using deep neural networks

The first aim of this study was to develop a prothrombin time international normalized ratio (PT INR) prediction model. The second aim was to develop a warfarin maintenance dose decision support system as a precise warfarin dosing platform. Data from 19,719 inpatients at three institutions were analyzed. The PT INR prediction algorithm included dense and recurrent neural networks and was designed to predict the 5th-day PT INR from data of days 1–4. Data from patients in one hospital (n = 22,314) were used to train the algorithm, which was tested with the datasets from the other two hospitals (n = 12,673). The performance of 5th-day PT INR prediction was compared with 2000 predictions made by 10 expert physicians. A generator of individualized warfarin dose-PT INR tables, which simulated the repeated administration of varying doses of warfarin, was developed based on the prediction model. The algorithm outperformed the physicians in the proportion of predictions within ± 0.3 of the actual value (machine learning algorithm: 10,650/12,673 cases (84.0%), expert physicians: 1647/2000 cases (81.9%), P = 0.014). In the individualized warfarin dose-PT INR tables generated by the algorithm, the 8th-day PT INR predictions were within 0.3 of the actual value in 450/842 cases (53.4%). An artificial intelligence-based warfarin dosing algorithm using a recurrent neural network outperformed expert physicians in predicting future PT INRs. The accuracy of an individualized warfarin dose-PT INR table generator constructed on this algorithm was acceptable.

There have been several studies that used mathematical modeling or machine learning to assist in warfarin dosing [11][12][13][14][15][16][17][18]. Previous research has mainly depended on genetic or drug interaction information to increase the accuracy of warfarin dose prediction. However, these methods are insufficiently precise, and genetic analysis is impractical due to its cost. Each patient has their own metabolism with its own specific pharmacokinetics and pharmacodynamics, so predicting warfarin doses based only on constant variables such as sex, age, race, body mass index, and genetic test results is clearly limited in reliability.
This study had two goals. The first goal was to develop an individualized PT INR prediction model that learns from patients' previous responses to warfarin. The second goal was to use that prediction model to build an individualized clinical decision support system for setting warfarin maintenance doses, that is, a precision warfarin dosing platform driven by predicted PT INRs. The decision support system was designed to be easy to use in clinical practice.

Methods
Data preparation. The complete architecture of the study is shown in Fig. 1. Only retrospectively collected data were used in this study. Patients included in the study had to be inpatients who were taking warfarin and were at least 18 years old. Baseline patient characteristics were sex, age, body weight, and height. PT INR values and warfarin dose information were collected. Data were collected via the electronic health information systems of Severance Hospital (SEVH, from 2008 to 2018), Sejong General Hospital (SGH, from 2010 to 2018), and Seoul National University Bundang Hospital (SNUBH, from 2003 to 2018) (Fig. 2). SEVH and SNUBH are tertiary referral hospitals and SGH is a specialized cardiovascular intervention and surgery institute.
Data was pre-processed using R 3.6.1 (R Foundation for Statistical Computing, Vienna, Austria). The machine learning algorithm was developed and tested with Python 3.5.6 (Anaconda Inc., Austin, TX, USA), including the Keras 2.2.2 and Tensorflow 1.10.0 libraries. Categorical variables are expressed as the value with the percentage in parentheses while continuous variables are expressed as mean ± standard deviation.
The pre-processing of the data is summarized in Fig. 3. During pre-processing, outliers or obviously unreliable data were excluded, including observations from patients with weight < 35 kg or > 120 kg, height < 130 cm or > 220 cm, patients who took two or three doses of warfarin per day, PT INRs > 10.0, and daily warfarin doses > 20 mg (see S1a of Supplementary Appendix for inclusion and exclusion criteria). Body surface area (BSA) was calculated using the Mosteller formula 19. Raw data were checked line by line. To avoid inconsistency or inaccuracy in warfarin doses, we carefully checked the nursing records to confirm whether each dose was actually taken by the patient, and excluded doses that were not actually given. Only data for 5 sequential days were converted into a single-line format using automated coding (see S1b of Supplementary Appendix for dataset baseline characteristics).
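The pre-processing steps above can be illustrated with a minimal Python sketch. It is not the authors' R code: the function names, the record layout, and the use of overlapping sliding windows are assumptions made for illustration. Only the exclusion thresholds, the Mosteller formula, and the 5-sequential-day requirement come from the text.

```python
import math
from datetime import date, timedelta


def mosteller_bsa(height_cm, weight_kg):
    """Body surface area (m^2) by the Mosteller formula."""
    return math.sqrt(height_cm * weight_kg / 3600.0)


def is_plausible(weight_kg, height_cm, inr, dose_mg, doses_per_day=1):
    """Apply the exclusion thresholds listed in the text to one observation."""
    return (
        35 <= weight_kg <= 120
        and 130 <= height_cm <= 220
        and inr <= 10.0
        and dose_mg <= 20
        and doses_per_day == 1
    )


def five_day_windows(records):
    """Return every run of 5 calendar-consecutive days for one patient.

    `records` is a date-sorted list of (date, inr, dose_mg) tuples.
    Overlapping windows are an assumption; the text only states that
    data for 5 sequential days were converted into a single line.
    """
    windows = []
    for i in range(len(records) - 4):
        chunk = records[i:i + 5]
        consecutive = all(
            (chunk[k + 1][0] - chunk[k][0]) == timedelta(days=1)
            for k in range(4)
        )
        if consecutive:
            windows.append(chunk)
    return windows
```

For example, a patient with six consecutive daily records yields two windows under this assumption, while a record series with a gap inside every candidate window yields none.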
The 5th-day PT INR prediction model. The structure of the 5th-day PT INR prediction model is shown in Fig. 4. All variables were standardized to have a mean of 0 and a standard deviation of 1. Sex, age, weight, height, and BSA were then fed into the layers of a dense neural network. The time-series variables, namely the PT INR and warfarin dose on days 1-4, were reshaped into 4 × 2 tensors and fed into the layers of a recurrent neural network with long short-term memory (LSTM). The outputs of the dense neural network and the recurrent neural network were concatenated and passed through further layers of a dense neural network. Finally, the model produced a single value for the predicted 5th-day PT INR. Rectified linear unit (ReLU) activation functions were used in the dense neural network. Mean absolute error was used as the loss and the Adam optimizer was used for parameter tuning. The model was trained and validated with the SEVH dataset and was tested with the SGH and SNUBH datasets. The number of training epochs was determined by the point at which overfitting began, defined as a constant increase in validation loss (the validation set was set to 20% of the training set). As there was no improvement in mean absolute error when we added more than 5 LSTM layers, we used 5 LSTM layers to avoid overfitting. For node numbers, we tested 16, 32, 64, 128, and 256 nodes in each layer, and 32 nodes were chosen because a higher number of nodes resulted in overfitting. In the dense layers for baseline patient characteristics, we limited the number of nodes to 16 because there were only five input variables. In the dense layers for the concatenated tensors, we fixed the number of nodes at 32 and limited the number of layers to 4 after tests based on manual adjustment.
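The two-branch structure described above could be sketched roughly as follows. This is an illustrative reconstruction, not the authors' code: it uses the `tensorflow.keras` functional API rather than the standalone Keras 2.2.2 named in the text, the number of dense layers in the baseline branch (2) is an assumption because the text gives only the node count, and all constant, layer, and function names are hypothetical.

```python
# Values reported in the text unless marked "assumed".
N_STATIC = 5                  # sex, age, weight, height, BSA
N_DAYS, N_SERIES = 4, 2       # days 1-4 x (PT INR, warfarin dose)
LSTM_LAYERS, LSTM_UNITS = 5, 32
STATIC_DENSE_LAYERS = 2       # assumed; the text gives only the node count
STATIC_UNITS = 16
MERGED_DENSE_LAYERS, MERGED_UNITS = 4, 32


def lstm_return_sequences(n_layers):
    """Stacked LSTM layers must emit full sequences, except the last one."""
    return [i < n_layers - 1 for i in range(n_layers)]


def build_model():
    # Imported lazily so the constants can be inspected without TensorFlow.
    from tensorflow import keras
    from tensorflow.keras import layers

    # Dense branch for the standardized baseline characteristics.
    static_in = keras.Input(shape=(N_STATIC,), name="static")
    x = static_in
    for _ in range(STATIC_DENSE_LAYERS):
        x = layers.Dense(STATIC_UNITS, activation="relu")(x)

    # LSTM branch for the 4 x 2 time-series tensor.
    series_in = keras.Input(shape=(N_DAYS, N_SERIES), name="series")
    y = series_in
    for keep_seq in lstm_return_sequences(LSTM_LAYERS):
        y = layers.LSTM(LSTM_UNITS, return_sequences=keep_seq)(y)

    # Concatenate both branches and regress a single 5th-day PT INR value.
    z = layers.concatenate([x, y])
    for _ in range(MERGED_DENSE_LAYERS):
        z = layers.Dense(MERGED_UNITS, activation="relu")(z)
    out = layers.Dense(1, name="inr_day5")(z)

    model = keras.Model(inputs=[static_in, series_in], outputs=out)
    model.compile(optimizer="adam", loss="mean_absolute_error")
    return model
```

In such a sketch, the 20% validation split mentioned in the text would correspond to passing `validation_split=0.2` to `model.fit`, with the epoch count chosen where validation loss starts to rise.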
Machine learning model predictions compared to expert physician predictions. The performance metrics of the machine learning model and expert physicians are compared in Table 1 (see S6 of Supplementary Appendix for performance in each hospital). A histogram of the differences between predicted and actual 5th-day PT INRs is shown in Fig. 6a. The predictions made by the machine learning model were within 0.3 of the actual value in 84.0% of cases (10,650/12,673). This performance was statistically significantly better than that of the 10 expert physicians whose predictions were within 0.3 of the actual value in 81.9% of cases (1647/2000) (P = 0.014). The predictions made by the machine learning model were more than 0.5 away from … (Fig. 6B).
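The "within 0.3 of the actual value" criterion used in this comparison can be expressed as a short Python sketch. The function names and the toy values are hypothetical, and treating the boundary as inclusive (a difference of exactly 0.3 counts as a hit) is an assumption, since the text does not say.

```python
def within_tolerance(predicted, actual, tol=0.3):
    """Count predictions whose absolute error is at most `tol`.

    Returns (hits, total, proportion). The inclusive boundary is assumed.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    hits = sum(1 for p, a in zip(predicted, actual) if abs(p - a) <= tol)
    return hits, len(actual), hits / len(actual)


def mean_absolute_error(predicted, actual):
    """Average absolute difference, the loss used to train the model."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Toy PT INR values for illustration only, not study data.
toy_predicted = [2.1, 2.5, 3.0, 1.8]
toy_actual = [2.0, 3.0, 3.2, 1.7]
hits, total, proportion = within_tolerance(toy_predicted, toy_actual)
```

The same function with `tol=0.5` gives the proportion of predictions that fall outside the wider band as `1 - proportion`.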

Discussion
There are various warfarin dose suggestion models that use artificial intelligence (AI) 17,20,21. However, these models are all limited in that they were constructed to predict optimum initial starting or maintenance warfarin doses using only cross-sectional data, an approach that does not reflect the actual human decision-making process. In reality, physicians examine patterns in PT INRs and warfarin doses to determine the optimum steady-state warfarin dose which will maintain the PT INR in the target range when administered repeatedly. Therefore, this study was conducted to develop a model that mimics the human decision-making process. Training a machine learning model to predict the optimum warfarin dose directly would be inappropriate, because the dose is an independent variable while the PT INR is a dependent variable. The optimum warfarin dose is determined when repeated fixed doses of warfarin maintain the PT INR in the target range. However, each patient has their own target range and there is little actual data about repeated administration of fixed warfarin doses. This lack of data makes the reliability of a directly predicted optimum warfarin dose questionable. Therefore, the model developed in this study was designed to predict future PT INRs based on previous patient data.
The machine learning algorithm developed in this study examined the pattern of changes in PT INRs to determine how it was influenced by warfarin doses over the course of 4 days, in order to predict the 5th-day PT INR as accurately as possible. LSTM, a type of recurrent neural network, was used to discover patterns in sequential PT INR-dose data. LSTM draws on past data when making calculations and is widely used for machine learning on sequential data, for example in natural language processing. The algorithm developed in this study was trained to adapt to individual pharmacokinetic and pharmacodynamic characteristics by examining how fluctuations in a person's PT INR correlated with their warfarin doses over the 4 days. Several other studies divided patients according to dose ranges and developed a separate algorithm for each, and so could not cover a wide dose range 11,17,21. In contrast, the algorithm developed in this study does not require patients to be classified by dose because it was trained on the whole warfarin dose range and PT INR responses. Theoretically, patients would not need to be classified according to race or ethnicity either if the training data include such characteristics.
Before performing the chain calculation, which was the main interest of this study, the predictions of the 5th-day PT INR had to be verified to be substantially accurate. The accuracy of this prediction is very important because any inaccuracies may be amplified during chain calculations. Therefore, the accuracy of the predictions produced by this study's algorithm was compared with that of expert physicians with at least 3 years of warfarin prescription experience. Ten expert physicians answered a total of 2000 prediction questions that were drawn from the machine learning test dataset. The physicians commented that, in clinical practice, they try to determine the optimum warfarin dose rather than predict the next-day PT INR, so they were not accustomed to predicting the next day's PT INR. Interestingly, the expert physicians were slightly, but statistically significantly, less accurate than the machine learning algorithm. This difference was smaller than was hypothesized.
The algorithm developed in this study examines data from the preceding 4 days to predict the 5th-day PT INR. Therefore, a chain calculation, in which 'virtual' future warfarin doses are fed back in as inputs, was possible (Fig. 5). For example, this algorithm could predict the PT INRs of days 5-8 based on repeated fixed doses of warfarin administered on days 4-7. Because the input doses can be varied, individualized warfarin dose-PT INR tables can be generated. As in the real world, the predicted PT INR values converged on a static value on days 5-8 under the assumption that a repeated fixed dose of medication is administered. To evaluate the accuracy of the 8th-day PT INR predicted from the data of days 1-4, it was assumed that a fixed dose of warfarin is administered on days 4-7. The data used in this study were then searched for runs of 8 sequential days in which the same warfarin dose was administered on days 4-7. The accuracy of the 8th-day PT INR obtained by chain calculation was lower than that of the predicted 5th-day PT INR. However, approximately 50% of the predictions were still within 0.3 of the actual value. This level of accuracy was similar to the accuracy of pharmacogenetic algorithms in other studies. Unlike those algorithms, the algorithm in this study did not require the input of genetic information and did not have a limited target range that cannot be changed 22,23.
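The chain calculation can be sketched in Python as a rolling forecast over a 4-day window. In this sketch, `predict_next` is a hypothetical stand-in for the trained 5th-day model, `toy_predictor` is an arbitrary linear rule with no clinical meaning, and the function names and table layout are assumptions for illustration.

```python
def chain_predict(inrs, doses, fixed_dose, predict_next, horizon=4):
    """Predict PT INRs for days 5..(4 + horizon) under a repeated fixed dose.

    `inrs` and `doses` hold the observed values for days 1-4. The day-4
    dose is replaced by `fixed_dose`, matching the assumption in the text
    that the fixed dose is given on days 4-7.
    """
    inr_history = list(inrs)
    dose_history = list(doses[:3]) + [fixed_dose]
    predictions = []
    for _ in range(horizon):
        # The model always sees the most recent 4 days of INRs and doses.
        next_inr = predict_next(inr_history[-4:], dose_history[-4:])
        predictions.append(next_inr)
        inr_history.append(next_inr)      # predicted INR becomes an input
        dose_history.append(fixed_dose)   # 'virtual' future dose
    return predictions


def dose_inr_table(inrs, doses, candidate_doses, predict_next):
    """Map each candidate dose to its predicted PT INRs for days 5-8."""
    return {
        dose: chain_predict(inrs, doses, dose, predict_next)
        for dose in candidate_doses
    }


def toy_predictor(inrs, doses):
    """Arbitrary placeholder rule; NOT the study's neural network."""
    return 0.8 * inrs[-1] + 0.1 * doses[-1]


table = dose_inr_table(
    [1.0, 1.2, 1.5, 1.8], [5.0, 5.0, 5.0, 5.0], [3.0, 5.0], toy_predictor
)
```

Each row of `table` corresponds to one candidate dose, and its last element is the predicted 8th-day PT INR. Because each step feeds a prediction back in as an input, any error in the one-step model can compound along the chain.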
Physicians can refer to the dosing tables generated by the algorithm developed in this study as an aid in determining a patient's optimum warfarin dose. Using this table will likely reduce fluctuations in warfarin doses by helping physicians determine the target PT INR range more precisely and select warfarin doses more accurately from the beginning of administration. An interesting finding in this study was that the accuracy of the predicted 8th-day PT INR differed according to the hospital that the data came from. The baseline characteristics of the datasets from each hospital were similar, but the chain calculation may have amplified differences between the SGH and SNUBH datasets. The machine learning algorithm developed in this study could be more accurate if it were trained with more variables, such as whether the patient is taking other drugs that can interact with warfarin. However, training it with these extra variables might make the model impractical to use because of the complexity of the data input process. Therefore, this study was conducted to develop an algorithm that included the clinically essential variables that physicians depend on when making warfarin dose decisions. Clinicians around the world can run this algorithm on their own data through the website aiwarfarin.org.
This study had several limitations. The algorithm has not yet been prospectively tested. Its accuracy must be improved before it is used in a prospective trial. Another limitation is that the algorithm was not trained on data that included missed administrations, represented as 0 mg doses. This study also did not account for the addition or discontinuation of other drugs that can interact with warfarin. PT INRs are influenced by a number of factors, so the algorithm's prediction accuracy may ultimately not be able to exceed a certain level.
Table 1. The performance of the machine learning algorithm and expert physicians in predicting the 5th-day prothrombin time international normalized ratio (PT INR). The predictions were based on data from days 1-4.

Conclusion
A machine learning algorithm using a recurrent neural network outperformed expert physicians in predicting PT INRs. The individualized warfarin dose-PT INR table generator which was developed based on this algorithm was substantially accurate. If this table generator is integrated into a health information system, it can help physicians reduce errors in warfarin prescription. A prospective study must be conducted to validate the efficacy of this warfarin dose decision platform.

Data availability
Data are available upon reasonable request.
Received: 4 May 2021; Accepted: 9 July 2021
Table 2. Prediction of 8th-day prothrombin time international normalized ratio (PT INR) based on data from days 1-4 and assuming that a fixed dose of warfarin was administered from day 4 to 7. These predictions were compared to real data for patients who were administered the same dose of warfarin from day 4 to 7.