In 2012, the Centers for Medicare and Medicaid Services (CMS) launched the Hospital Readmissions Reduction Program (HRRP) to reduce the risk of readmission in patients hospitalized for acute myocardial infarction, pneumonia, and heart failure. In 2014, it was extended to include readmissions for elective total knee and total hip replacements as well as chronic obstructive pulmonary disease (COPD) exacerbations1. Based on 2013–2016 Medicare data, the 30-day observed readmission rate for pneumonia was 16.9%, and for COPD was 19.8% from 2014 to 20152. A 30-day rehospitalization rate of 17.6% among Medicare beneficiaries in 2005 equated to an estimated expense of $17 billion3.

The Department of Health and Human Services identified effective transition of care as a major quality improvement goal4,5. To reduce unnecessary readmissions, providers should identify patients at high risk of readmission early and facilitate timely post-discharge support. However, readmission risk models have historically performed poorly6,7.

The objectives of this study are four-fold: (1) to extract day one admission data from a clinical data warehouse and apply machine learning techniques to develop a readmission risk prediction app for patients hospitalized with a primary diagnosis of COPD, using a minimal number of input variables available on day one of the hospitalization; (2) to validate the app in a second population; (3) to initiate in-hospital interventions and early post-discharge care planning based on the app’s designation of ‘high risk of readmission’; and (4) to prospectively evaluate the subsequent readmission rates.


Parameter selection and data characteristics

Figure 1 shows the time period and data source of the data collected for each task in the model development process. Variables for univariate analysis, except the Rothman Index, were derived from 3005 COPD patients admitted from January 2013 to June 2014. Supplementary Table S1 outlines the results of the univariate analysis for each parameter among those patients. A training sample of 332 COPD patient visits from January 2013 to June 2014, for which the Rothman Index was available, was then collected. Four parameters were highly predictive (p value < 0.01) of subsequent readmission according to the Wald test: the number of in-patient admissions in the previous 6 months, the number of medications administered on admission day, insurance status, and the Rothman Index on hospital day one.

Figure 1. The time period and data source of the data collected for each task in the model development process.

A summary of the training dataset is provided in Table 1 and Supplementary Table S2. The raw dataset was pre-processed prior to modeling. The numerical ‘Rothman Index at the day of admission’ score was binned into four groups, labeled 0–3, using the cut points that maximize the difference in readmission ratio between the groups8. The variable ‘insurance status’ has five categories (“Medicare”, “Managed Care”, “Self-Pay”, “Other” and “unknown”) and was recoded with indicator variables. This retrospective dataset was used to train the artificial neural network and logistic regression models. A separate prospective dataset was then used for validation. The relative importance of the input variables, calculated using Garson’s algorithm9, is given in Supplementary Table S3.
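As a rough illustration of the pre-processing described above, the following Python sketch bins a Rothman Index value into groups 0–3 and expands insurance status into indicator variables. The cut points (40, 65, 80), the direction of the group labels, the function names, and the use of "unknown" as the reference category are all placeholders assumed for illustration; the study derived its own cut points from the readmission ratios and built its models in R.

```python
from bisect import bisect_right

# Hypothetical cut points. The study chose cut points that maximize the
# difference in readmission ratio between groups and does not list them here.
ROTHMAN_CUTS = [40.0, 65.0, 80.0]

# The five insurance categories named in the text.
INSURANCE_LEVELS = ["Medicare", "Managed Care", "Self-Pay", "Other", "unknown"]


def bin_rothman(score, cuts=ROTHMAN_CUTS):
    """Map a numeric Rothman Index to a group label 0-3 (0 = lowest scores)."""
    return bisect_right(cuts, score)


def encode_insurance(status, levels=INSURANCE_LEVELS):
    """Return one 0/1 indicator per level, omitting the last (reference) level."""
    if status not in levels:
        raise ValueError(f"unexpected insurance status: {status!r}")
    return [1 if status == level else 0 for level in levels[:-1]]


def build_features(prior_admissions, n_medications, insurance, rothman):
    """Assemble one model input row from the four day-one parameters."""
    return [
        prior_admissions,
        n_medications,
        *encode_insurance(insurance),
        bin_rothman(rothman),
    ]


# Example with made-up patient values.
row = build_features(2, 11, "Self-Pay", 70.0)
print(row)
```

Dropping one category as the reference level is one common way to avoid redundant indicator columns; whether the study did so is not stated.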

Table 1 Full dataset, Characteristics, and variables used in readmission prediction of COPD patients.

Validation of the models on the separate data

The final artificial neural network model achieved an average AUC of 0.683 (± 0.009) by three-fold cross-validation on the training data, while the average AUC for the logistic regression model was 0.640 (± 0.038), indicating that the artificial neural network model has better predictive power than the logistic regression model for readmission of COPD patients.

The predictive results were validated prospectively in 172 patients with COPD (July 2014–December 2014) before the clinical intervention plan was implemented. ROC curves were compared to evaluate the predictive power of the two models on the COPD validation samples (Fig. 2). We also trained a logistic regression model and an artificial neural network model on the 3005 COPD patients using the partial parameter set without the Rothman Index, and validated them on 705 patients with COPD (July 2014–December 2014). The two dashed ROC curves indicate the performance of the two models without the Rothman Index. On the COPD validation data, the AUCROC without the Rothman Index was 0.655 for logistic regression and 0.680 for the artificial neural network; with all four parameters, it was 0.701 for logistic regression and 0.767 for the artificial neural network. These results suggest that, with either parameter set, the artificial neural network predicts readmission of COPD patients better than logistic regression. The artificial neural network model using the four parameters, with a percentile score of 50% as the threshold for high-risk patients, achieved a sensitivity of 0.75 (95% CI 0.61–0.86), a specificity of 0.67 (95% CI 0.55–0.77), and a positive predictive value (PPV) of 0.61 (95% CI 0.48–0.73) on the validation data. The calibration curves show poorer alignment (accuracy) for the logistic regression model than for the artificial neural network model on the validation data (Supplementary Figure S1).

Figure 2. Comparison of the logistic regression and artificial neural network models by receiver operating characteristic (ROC) curves for the COPD validation samples. Partial parameters include the number of in-patient admissions in the previous 6 months, the number of medications prescribed on admission day, and insurance status; the Rothman Index is added to form the full set of parameters used in the model.

Clinical app implementation

The artificial neural network model was implemented in the Re-Admit app (Supplementary Figure S2). This smartphone application communicates with the clinical data repository. The four input parameters are automatically converted into the proper inputs for the artificial neural network model, and the percentile rank of the risk score calculated by the model is then instantly returned to the interface of the app.

Clinical intervention results

Based on data obtained from VIZIENT (Irving, Texas, U.S.A.), among 847 COPD patient admissions from January 2015 to December 2016, before the app and intervention plan, there were 73 readmissions within 30 days (total readmission rate of 8.6%). Of 1778 patient admissions between January 2017 and July 2020 that were evaluated by the Re-Admit app and given intervention according to their scores, there were 111 30-day readmissions (total readmission rate of 6.2%). Figure 3 shows the detailed readmission rates in the high- and low-risk groups in each month from January 2015 to July 2020. In 2015–2016 (before the app and intervention), the average readmission rate was 3.9% in the low-risk group and 15.2% in the high-risk group. After the app and intervention were introduced (January 2017–July 2020), the average readmission rate in the low-risk group was 3.6% (not significantly different from 3.9%), while in the high-risk group identified by the app, which subsequently received in-hospital clinical interventions as described, the average readmission rate fell to 7.9%, a 48% relative decrease. Figure 4 shows the readmission rate trend from January 2015 to July 2020.
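The percentages quoted in this paragraph can be checked directly from the reported counts. The short Python sketch below recomputes the two overall readmission rates and the relative reduction in the high-risk group; the function names are illustrative only.

```python
def rate(events, total):
    """Readmission rate as a percentage."""
    return 100.0 * events / total


def relative_reduction(before, after):
    """Relative decrease from `before` to `after`, as a percentage."""
    return 100.0 * (before - after) / before


pre_rate = rate(73, 847)       # January 2015 to December 2016
post_rate = rate(111, 1778)    # January 2017 to July 2020
high_risk_drop = relative_reduction(15.2, 7.9)

print(f"pre-intervention rate:  {pre_rate:.1f}%")            # 8.6%
print(f"post-intervention rate: {post_rate:.1f}%")           # 6.2%
print(f"high-risk relative reduction: {high_risk_drop:.0f}%")  # 48%
```

Note that the 48% figure is a relative reduction; the absolute reduction in the high-risk group is 15.2 − 7.9 = 7.3 percentage points.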

Figure 3. Comparison of readmission rates in the high-risk versus low-risk patient groups before and after intervention. (a) Before intervention (2015–2016), the readmission rate was 15.2% in the high-risk patient group and 3.9% in the low-risk patient group. (b) After intervention (2017–July 2020), the readmission rate was 7.9% in the high-risk patient group and 3.6% in the low-risk patient group. In both panels, the high-risk patients are those identified by the Re-Admit app with risk scores larger than 50%.

Figure 4. The trend of the readmission rate from 2015 to July 2020. The trend line was smoothed by averaging the readmission rate over 6 months. The regression lines show that the readmission rate increased slightly in 2015–2016 and has decreased since 2017, when the follow-up care plan intervention started.


In this study, we constructed a neural network model for predicting readmission risk in patients with COPD on admission day one. The algorithm, deployed in a smartphone application, allowed the COPD Readmission Prevention Committee at Houston Methodist Hospital to predict readmission risk using only four variables conveniently and accurately. Previously published patient readmission prediction tools required multiple variables, relied on discharge parameters10, or were simply unreliable11. This risk assessment model appropriately identified high readmission risk patients with COPD. Clinical interventions successfully reduced 30-day readmissions in COPD.

Previous comparisons of logistic regression and artificial neural network models applied to patient readmission prediction have varied considerably, possibly reflecting different primary admission diagnoses12,13,14,15. This study found that the artificial neural network model has better predictive performance for patient readmission risks on the first day of patient admission than logistic regression models. The neural network, which can be used to simulate complex nonlinear relationships, has a more sophisticated modeling structure than logistic regression. Thus, few assumptions are required before constructing the model. Furthermore, the neural network can continue to increase in efficacy as new datasets are added.

The Re-Admit app also leverages the composite Rothman Index as one of its prediction factors of readmission likelihood. The index has been validated for assessing in-hospital mortality risk and subsequent post-discharge one-year mortality. It is widely utilized in many hospital systems, and routinely available from day one and throughout the patient’s hospital stay. However, it has not been used in any other readmission risk prediction models to date. The assessment index’s expanding acceptance and integration into the EMR of health systems nationwide and beyond translates to ease of adoption and deployment of Re-Admit in the future. We are investigating a small subset of parameters in the Rothman Index to construct the artificial neural network model when the Rothman Index is not available.

An advantage of the Re-Admit predictive model is the utilization of a small set of input parameters—four variables obtained on the first day of admission, compared to similar models requiring as many as twenty variables for similar predictive capability16. Existing methods for readmission prediction require information not yet available upon admission, e.g., length of hospital stay—hence the prediction can only be made on or after discharge—often too late for meaningful intervention. By identifying at-risk patients on day one of hospital admission, discharge planning/out-patient transition resources can be focused on those at greatest risk of readmission. This early intervention may reduce costs and lead to better patient outcomes for patients with COPD.

Limitations & strengths

This study was performed on the derivation population in the Houston Methodist System (a large urban eight-hospital system) in a single region (urban Southwest US), and its findings need to be validated in other health systems with different patient demographics or clinical practices. Even if the Houston Methodist Re-Admit app is not immediately generalizable to other hospitals and systems, the artificial neural network can be retrained so that the four variables provide a comparable degree of readmission risk prediction tailored to that institution or region. A disadvantage of the artificial neural network and its variants is that more coefficients must be trained than for logistic regression models, so the training requirement is more demanding. Moreover, variables that were significant in our patient population may or may not be relevant in other groups; thus, ongoing prospective validation in other health systems is necessary.

One limitation is the use of the Rothman Index, which may restrict the generalization of the model. Many other hospitals may use other severity index products rather than the Rothman Index. Where the commercial Rothman Index is not available, it can be replaced with a similar index used in the electronic medical record (EMR) of the local hospital. We could also create our own indicator that performs the same function as the Rothman Index using patient data in the EMR. Since our health system uses EPIC, we plan to evaluate the EPIC risk score as a replacement. For generalization, several regulatory steps need to be completed, including: (1) obtaining IRB approval, (2) securing the support and engagement of the Readmission Risk Committee, (3) validating the Re-Admit app in both retrospective and prospective studies, (4) submitting the results to the Systems Quality Control Committee for approval for use in routine clinical operation, (5) integrating the app into the electronic health record system of the local hospital, and (6) implementing a process to certify the Re-Admit app on an annual basis using criteria defined by the Systems Quality Control Committee or Council.

The interventions chosen to mitigate readmission included specialist notification of risk status and formal pulmonary consultations. Pharmacist and respiratory therapist training, medication reviews and interventions, early post-discharge clinic/office appointments, and home health care coordination were executed weekly during concurrent review of inpatients by the COPD Readmission Reduction Committee. CONNECT phone calls, an automated telephonic outreach program, were also instituted to query the patients’ needs, concerns and questions immediately post discharge.


In summary, the Re-Admit app, which uses an artificial neural network-based predictive model, successfully classified patients with a primary diagnosis of COPD by readmission risk. The risk stratification was performed accurately on day one of the admission, allowing patients to be stratified early for interventions. Focused in-hospital teaching and care-transition initiatives were directed at the patients identified on day one as being at high risk of readmission. This has led to decreased 30-day readmissions for our COPD patients, improving outcomes and health care savings. This research also illustrates the emerging paradigm of smart optimization of clinical care pathways driven by rigorously validated AI apps to improve outcomes.


Data collection and statistical analysis

Medical records of patients with COPD as a primary diagnosis were queried from the METEOR (Methodist Environment for Translational Enhancement and Outcomes Research) clinical data warehouse of all eight Houston Methodist Hospitals17. As the Rothman Index score is not stored as raw data, the score was manually extracted from the monitoring panel. The Rothman Index (PeraHealth, Charlotte, NC) is a regularly updated integrated health score using a range of twenty-six physiological measures, including lab test results, vital signs, and nursing assessments18. It is an automated, proprietary third-party algorithm embedded within commercial electronic medical record systems. The Rothman Index has been shown in multiple trials to be a valuable metric for predicting mortality in hospitalized patients. It has not, however, been utilized as a component for predicting readmission. Supplementary Figure S3 shows the twenty-six variables and the corresponding Rothman Index for a patient in the monitoring panel.

The available parameters considered to potentially impact readmission included demographics, index admission type, day one data on severity of illness, comorbidities, laboratory data, medications on admission, Rothman Index on admission, procedures, and chief complaint for admission19,20. Univariate analysis was employed for each parameter to assess its association with subsequent readmission21. A Wald test from logistic regression was conducted on each parameter, and the parameters were ranked by p-value. Parameters highly predictive of subsequent readmission were identified from the training sample and subsequently validated against a second population of patients with COPD.
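To make the univariate step concrete, the following Python sketch fits a one-predictor logistic regression by Newton-Raphson and reports the Wald statistic and two-sided p-value for the slope. It is an illustrative sketch, not the study's analysis code (which was run in R); the function names and the toy data are hypothetical. The toy predictor is binary, so the fitted slope can be checked against the closed-form log odds ratio.

```python
import math


def _score_and_information(b0, b1, xs, ys):
    """Gradient of the log-likelihood and the information matrix entries."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        w = p * (1.0 - p)
        g0 += y - p
        g1 += (y - p) * x
        h00 += w
        h01 += w * x
        h11 += w * x * x
    return g0, g1, h00, h01, h11


def wald_test(xs, ys, iterations=25):
    """Return (slope, standard error, z statistic, two-sided p-value)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0, g1, h00, h01, h11 = _score_and_information(b0, b1, xs, ys)
        det = h00 * h11 - h01 * h01
        # Newton-Raphson step: beta += inverse(information) * gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    _, _, h00, h01, h11 = _score_and_information(b0, b1, xs, ys)
    se = math.sqrt(h00 / (h00 * h11 - h01 * h01))
    z = b1 / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return b1, se, z, p_value


# Hypothetical toy data: 50 patients with the risk factor (30 readmitted)
# and 50 without it (10 readmitted). These are not study data.
xs = [1] * 50 + [0] * 50
ys = [1] * 30 + [0] * 20 + [1] * 10 + [0] * 40
slope, se, z, p_value = wald_test(xs, ys)
print(f"slope={slope:.3f} se={se:.3f} z={z:.2f} p={p_value:.2g}")
```

Repeating such a test for each candidate parameter and sorting by p-value gives the kind of ranking described above.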

All aspects of this study were carried out in accordance with relevant guidelines and regulations, including preserving patients' privacy. The Houston Methodist Hospital Institutional Review Board approved this study, which is one of several process quality improvement projects commissioned by Houston Methodist Hospital management to improve the quality of patient care. Retrospectively accrued patient data were used in developing the model, and the Houston Methodist IRB granted a waiver of patients’ informed consent.

Model building and training

The task of predicting early patient readmissions was formulated as a binary classification task: readmission (yes/no) to any Houston Methodist system hospital within 30 days. Two models were constructed: logistic regression22,23 and an artificial neural network24,25.

Training and testing the neural network

An artificial neural network has the advantage of being able to model complex nonlinear functions. A classic neural network includes an input layer, one or more hidden layers, and an output layer. Each layer contains nodes, or neurons. Each node in a hidden layer is a mathematical function that transfers information from input to output. Connections between nodes in adjacent layers are assigned weights. The logistic regression model can be considered a simple form of the neural network with only one node in the hidden layer and one output unit. A model of the components in a simple neural network is presented in Supplementary Figure S4.

The first hidden layer of the artificial neural network can be described mathematically as:

$$y_{i}^{\left( 1 \right)} = f\left( \sum_{j = 1}^{m_{0}} w_{i,j}^{\left( 1 \right)} \, y_{j}^{\left( 0 \right)} + w_{i,0}^{\left( 1 \right)} \right)$$

where \(y_{j}^{\left(0\right)}\) is the jth input in the input layer, \(m_{0}\) is the number of inputs, \(w_{i,j}^{\left(1\right)}\) is the connection weight from the jth input node to the ith node in the first hidden layer, \(w_{i,0}^{\left(1\right)}\) is the bias of the ith node, and \(f(x)\) is the activation function, a predefined function such as the hyperbolic tangent, sigmoid function, softmax function, or Gaussian function. The information of each layer is passed to the next layer according to this formula until the output layer is reached.
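The layer-by-layer computation described by the formula can be sketched in a few lines of Python. This is a minimal illustration assuming a sigmoid activation; the function names and all weight values are made up for the example and are not the trained Re-Admit coefficients.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def layer_forward(inputs, weights, biases, activation=sigmoid):
    """One layer: out[i] = f(sum_j weights[i][j] * inputs[j] + biases[i])."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def network_forward(inputs, layers):
    """Pass the inputs through each (weights, biases) pair in turn."""
    values = inputs
    for weights, biases in layers:
        values = layer_forward(values, weights, biases)
    return values


# Hypothetical network: 2 inputs -> 2 hidden nodes -> 1 output.
hidden = ([[0.5, -0.3], [0.1, 0.8]], [0.0, -0.2])
output = ([[1.2, -0.7]], [0.1])
score = network_forward([1.0, 2.0], [hidden, output])[0]
print(f"output score: {score:.3f}")
```

With a sigmoid on the output node, the score lies between 0 and 1 and can be read as a raw risk score before any conversion to percentiles.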

The best artificial neural network model was selected using three-fold cross-validation; the same procedure was used to train the logistic regression model for comparison purposes. For each round of three-fold cross-validation, the neural network model was trained a hundred times with different randomly initialized coefficients, because a poorly initialized neural network is easily trapped in a local optimum. The network with the best prediction performance on the test data set was selected as the best model for each round. The final performance was estimated as the average performance over the three folds. This process was repeated four times, with 2, 3, 4, or 5 nodes in the hidden layer, to determine the optimal number of hidden nodes. Once the number of nodes was settled, we compared the artificial neural network model with the logistic regression model based on their average performance in three-fold cross-validation. Furthermore, the output scores of the final neural network model on the whole training sample were converted to percentiles to serve as the final risk score for readmission. Percentiles equal to or greater than 50% are considered high risk for readmission.
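The conversion of a raw model output into a percentile-based risk score can be sketched as follows. The definition of percentile rank used here (the share of training scores at or below the new score) is one common convention and is an assumption, as are the function names and the toy training scores.

```python
from bisect import bisect_right


def percentile_rank(score, sorted_training_scores):
    """Percentage of training scores that are less than or equal to `score`."""
    n_at_or_below = bisect_right(sorted_training_scores, score)
    return 100.0 * n_at_or_below / len(sorted_training_scores)


def is_high_risk(score, sorted_training_scores, threshold=50.0):
    """Flag scores whose percentile is at or above the threshold."""
    return percentile_rank(score, sorted_training_scores) >= threshold


# Hypothetical model outputs on a training sample (not study data).
training_scores = sorted([0.05, 0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.80])

print(percentile_rank(0.35, training_scores))  # 50.0
print(is_high_risk(0.35, training_scores))     # True
print(is_high_risk(0.30, training_scores))     # False
```

Expressing risk as a percentile of the training distribution makes the 50% threshold equivalent to flagging roughly the upper half of training-sample scores.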

The neural network prediction model was trained using the ‘neuralnet’ package26 in R27. The neural network was trained using resilient backpropagation (RPROP)28 with weight backtracking.
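For readers unfamiliar with resilient backpropagation, the following Python sketch shows the core update rule with weight backtracking on a toy two-parameter quadratic. It is a generic illustration of the technique, not the internals of the ‘neuralnet’ package; the function names, the step-size constants (commonly used values of 1.2 and 0.5), and the test function are assumptions chosen for the example.

```python
import math


def rprop_plus(grad_fn, weights, n_steps=200, step_init=0.1,
               eta_plus=1.2, eta_minus=0.5, step_min=1e-6, step_max=50.0):
    """Minimize a function with Rprop and weight backtracking.

    grad_fn(weights) must return the list of partial derivatives.
    Only the sign of each derivative is used to choose the direction.
    """
    w = list(weights)
    step = [step_init] * len(w)
    prev_grad = [0.0] * len(w)
    prev_delta = [0.0] * len(w)
    for _ in range(n_steps):
        grad = grad_fn(w)
        for i, g in enumerate(grad):
            agreement = g * prev_grad[i]
            if agreement < 0:
                # Sign flipped: the last move overshot a minimum. Shrink the
                # step, undo the last move, and skip adaptation next time.
                step[i] = max(step[i] * eta_minus, step_min)
                w[i] -= prev_delta[i]
                prev_delta[i] = 0.0
                prev_grad[i] = 0.0
                continue
            if agreement > 0:
                # Same sign twice in a row: grow the step size.
                step[i] = min(step[i] * eta_plus, step_max)
            # Move against the gradient sign by the current step size.
            prev_delta[i] = -math.copysign(step[i], g) if g != 0 else 0.0
            w[i] += prev_delta[i]
            prev_grad[i] = g
    return w


def quadratic_grad(w):
    """Gradient of (w0 - 3)^2 + (w1 + 1)^2, minimized at (3, -1)."""
    return [2.0 * (w[0] - 3.0), 2.0 * (w[1] + 1.0)]


solution = rprop_plus(quadratic_grad, [0.0, 0.0])
print([round(v, 3) for v in solution])
```

Because each weight keeps its own step size and only gradient signs are used, the method is insensitive to the scale of the gradients, which is one reason it is a popular choice for small networks.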

Validation of the models

For validation purposes, the logistic regression and artificial neural network algorithms were applied prospectively to predict readmission in COPD patients admitted during separate time periods. The performance of prediction of readmissions was then compared between the neural network model and the logistic regression model. The area under the receiver operating characteristic curve (AUCROC), sensitivity, specificity, and positive predictive value (PPV) were calculated to compare the readmission prediction of the two models. Model calibration was evaluated using plots of predicted versus observed 30-day COPD readmission rate.
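The comparison metrics named above can be computed from predicted scores and observed outcomes as in the Python sketch below. The AUC is computed through its rank interpretation (the probability that a randomly chosen readmitted patient receives a higher score than a randomly chosen non-readmitted patient, with ties counted as one half). The function names and the eight toy patients are hypothetical.

```python
def auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def confusion_metrics(flagged, labels):
    """Sensitivity, specificity, and PPV from high-risk flags and outcomes."""
    tp = sum(1 for f, y in zip(flagged, labels) if f and y == 1)
    fp = sum(1 for f, y in zip(flagged, labels) if f and y == 0)
    fn = sum(1 for f, y in zip(flagged, labels) if not f and y == 1)
    tn = sum(1 for f, y in zip(flagged, labels) if not f and y == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
    }


# Hypothetical scores and 30-day readmission outcomes (1 = readmitted).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
flagged = [s >= 0.5 for s in scores]

print(auc(scores, labels))                 # 0.8125
print(confusion_metrics(flagged, labels))  # all three equal 0.75 here
```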

Implementation of high readmission risk protocols for COPD cohort

Since 2017, patients with a primary diagnosis of COPD were evaluated weekly by the hospital’s Readmissions Reduction Committee, which reviewed the patients’ diagnosis and care plan in the context of the readmission risk as determined by the predictive model. Patients identified as high risk for readmission received the following: specialist consultation or notification of risk status, medical educational visits by clinical pharmacists and respiratory therapists, and home health and early physician follow-up visits scheduled prior to discharge. Implementation of transition telephone calls following discharge (CONNECT) ensured that all aspects of discharge planning were proceeding properly.

Data from VIZIENT on all COPD patients admitted between January 2015 and July 2020 were evaluated to compare readmission rates before and after the introduction of app prediction and subsequent interventions. VIZIENT is an external dataset providing readmission data inclusive of readmissions outside the Houston Methodist system. Ultimately, the Re-Admit app is made available on the smartphones of clinicians at the bedside.

Statistical analysis

Model performance was evaluated based on the AUC with standard variance, and on the 95% CIs of sensitivity, specificity, and positive predictive value (PPV). The average AUC of the models on the training set was computed by three-fold cross-validation. The 95% CIs of the metrics for the neural network model on the validation dataset were computed with 2000 bootstrap resamples. Calibration analysis was performed to compare the alignment of the logistic regression model and the artificial neural network model on the validation data.
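A percentile bootstrap with 2000 resamples, as mentioned above, can be sketched in Python as follows. This is a simplified illustration: it bootstraps sensitivity by resampling only the readmitted patients, and the counts (30 of 40 correctly flagged), the seed, and the function names are hypothetical rather than taken from the study.

```python
import random


def mean(values):
    return sum(values) / len(values)


def bootstrap_ci(values, statistic, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for `statistic(values)`."""
    rng = random.Random(seed)
    n = len(values)
    estimates = sorted(
        statistic([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical data: among 40 readmitted patients, 30 were flagged high risk.
# 1 = correctly flagged, 0 = missed. Sensitivity is the mean of this list.
hits = [1] * 30 + [0] * 10

point_estimate = mean(hits)
lower, upper = bootstrap_ci(hits, mean)
print(f"sensitivity {point_estimate:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

The same function can be reused for specificity or PPV by passing the corresponding list of 0/1 indicators.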