Predictive Analytics for Retention in Care in an Urban HIV Clinic

Consistent medical care among people living with HIV is essential for both individual and public health. HIV-positive individuals who are ‘retained in care’ are more likely to be prescribed antiretroviral medication and achieve HIV viral suppression, effectively eliminating the risk of transmitting HIV to others. However, in the United States, less than half of HIV-positive individuals are retained in care. Interventions to improve retention in care are resource intensive, and there is currently no systematic way to identify patients at risk for falling out of care who would benefit from these interventions. We developed a machine learning model to identify patients at risk for dropping out of care in an urban HIV care clinic using electronic medical records and geospatial data. The machine learning model has a mean positive predictive value of 34.6% [SD: 0.15] for flagging the top 10% highest risk patients as needing interventions, performing better than the previous state-of-the-art logistic regression model (PPV of 17% [SD: 0.06]) and the baseline rate of 11.1% [SD: 0.02]. Machine learning methods can improve the prediction ability in HIV care clinics to proactively identify patients at risk for not returning to medical care.

and behaviors. In contrast, a more fine-grained machine learning approach to identify individuals at risk for falling out of care can overcome these shortcomings by building models tailored to individual features, rather than just group characteristics.
Machine learning methods are particularly well suited for early warning systems that inform interventions for patient retention because they (1) are optimized for future predictive accuracy, (2) can detect non-linear complex interactions (as opposed to traditional methods), (3) are able to rank and prioritize individuals according to risk score rather than group risk, and (4) combine multi-source data at different levels of granularity. Traditional methods (e.g., differential equation modeling or agent-based modeling) focus on understanding HIV transmission in aggregate rather than at the individual level and are not optimized for prediction. Accordingly, the aim of this study was to develop a machine learning predictive model of retention in HIV care among individuals in an urban HIV care clinic using electronic medical record (EMR) data, geospatial data, and US Census data. Our machine learning models are scalable, adaptive, and produce patient-level dynamic predictions.

Methods
Study sample. HIV-positive individuals 18 years of age and older who attended at least one medical appointment at the University of Chicago adult HIV care clinic between January 1, 2008 and May 31, 2015 were included in the study. The University of Chicago adult HIV care clinic is located on the south side of Chicago, a major U.S. HIV epicenter 27 . For all eligible patients, the following data were collected from the EMR: demographics, insurance information, other medical conditions, medications, HIV care provider, substance use history, and laboratory test results. Appointment attendance history, including attended, cancelled, and missed visits, was collected from the beginning of the study period up to one year after study enrollment (through May 31, 2016). Both billing diagnoses and clinician-assigned diagnoses documented in the "problem list" section of the EMR were collected. All medical encounters within the University of Chicago were collected, including outpatient appointments in the HIV clinic, all other outpatient appointments, hospitalizations, and Emergency Department visits. Laboratory test results collected included HIV viral load, lymphocyte subset data (e.g., CD4 count), sexually transmitted infection (STI) test results, and toxicology test results. Patients' addresses were geocoded, and the travel distance and travel time to the clinic, as well as the crime rate along the travel route, were calculated. Geocoding methods have been previously described 28 . Using data from the American Community Survey (US Census Bureau), characteristics of a patient's community at the census tract level, including racial composition, fraction of population on Supplemental Nutrition Assistance Program, commute characteristics, and education levels, were collected 29 . Patients were censored, meaning the machine learning system no longer generated a prediction for the patient for a given window of time, when they transferred care to another clinic or died.
Predictor variables. Using the data described above, we generated a set of ~800 predictor variables (features) to be considered for inclusion in the machine learning models. Prior literature was used to guide feature creation, including factors previously shown to be associated with retention in HIV care, such as age, CD4 count, substance use, psychiatric illness, and prior visits [8][9][10][11][12][13][14][15][16][17] . Categories of features included demographics, diagnoses, location-based features, laboratory test results, medical visits, and specific providers seen. For each feature, measures were aggregated by time (e.g., count for the past six months, standard deviation for the past year, etc.) or by time and space (e.g., the number of thefts in the patient's residential census tract in the past six months). We explored a range of values for the time (6 months, 1 year, 3 years, all history) and space (by zipcode and census tract) aggregations as well as different aggregation functions (mean, minimum, maximum, standard deviation). Categorical variables (such as race) were converted to dummy (indicator) variables. The full list of features is given in the appendix (eTable 1).
This methodology allows the machine learning model to use the time and space aggregation of the feature that is most predictive of the final outcome. For example, if more recent (6 month) viral loads are better correlated with retention in care than viral loads from several years ago, the method will use the average viral load in the past six months rather than average viral load for the past three years.
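The time-window aggregation described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the Triage feature configuration the authors used: the viral load values, the `window_features` helper, the feature naming scheme, and the day counts chosen for each window are all hypothetical.

```python
from datetime import date, timedelta
from statistics import mean, pstdev

# Hypothetical viral load history for one patient: (test date, copies/mL).
viral_loads = [
    (date(2011, 6, 1), 5000.0),
    (date(2014, 1, 15), 800.0),
    (date(2014, 11, 20), 200.0),
    (date(2015, 2, 10), 40.0),
]

# Look-back windows in days (approximate lengths); None means all history.
WINDOWS = {"6m": 182, "1y": 365, "3y": 1095, "all": None}
AGGREGATES = {"mean": mean, "min": min, "max": max, "sd": pstdev}


def window_features(events, as_of, prefix):
    """Aggregate values observed strictly before `as_of` over each window."""
    features = {}
    for window_name, days in WINDOWS.items():
        start = date.min if days is None else as_of - timedelta(days=days)
        values = [value for day, value in events if start <= day < as_of]
        features[f"{prefix}_{window_name}_count"] = len(values)
        for agg_name, aggregate in AGGREGATES.items():
            # An empty window yields None so the value can be imputed later.
            features[f"{prefix}_{window_name}_{agg_name}"] = (
                aggregate(values) if values else None
            )
    return features


features = window_features(viral_loads, as_of=date(2015, 3, 1), prefix="viral_load")
print(features["viral_load_6m_mean"], features["viral_load_3y_mean"])
```

Every window and aggregate becomes its own column, so a downstream model is free to weight the six-month mean more heavily than the three-year mean if the recent values turn out to be more predictive.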
Missing data. Features with missing data had values imputed with the choice of value depending on the variable (e.g., a missing birth date resulted in an age assignment of the mean age of the population). For more details, see appendix (eTable 1). We also included a flag for whether or not the value was imputed as an additional predictor variable, allowing the model to use the missingness of a predictor variable as a predictor itself.
Study outcomes. Two outcomes were studied: (1) retention in care and (2) access to care. Retention in care was defined as attending at least 2 HIV care visits greater than 90 days apart within a 12-month period 30 . This definition of retention is from the Health Resources and Services Administration HIV/AIDS Bureau (HRSA HAB). While there is no true "gold standard" of retention in care, this definition has been shown to be correlated with patient health outcomes including HIV viral suppression 31 . Access to care (also referred to as a 6 month gap) is defined as having a single HIV care visit within a 6 month period 31,32 . This metric is used by public health departments for the purposes of surveillance 27 . The outcome was predicted at the time of each patient's HIV care appointment, replicating the workflow (and data available) in the clinic, in which the patient arrives for their appointment and then receives a risk score. This predicted risk score can then inform and prioritize interventions to improve future retention in care.
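The two outcome definitions can be turned into appointment-level labels roughly as follows. This is a minimal sketch, not the authors' code: it assumes each label looks forward from the index appointment, treats "greater than 90 days" as a strict inequality, and approximates 6 months as 183 days. The function names and visit dates are hypothetical.

```python
from datetime import date, timedelta


def retained_after(index_visit, attended_visits, min_gap_days=90, window_days=365):
    """True if an attended visit falls more than `min_gap_days` and no more
    than `window_days` after the index visit."""
    earliest = index_visit + timedelta(days=min_gap_days)
    latest = index_visit + timedelta(days=window_days)
    return any(earliest < visit <= latest for visit in attended_visits)


def accessed_after(index_visit, attended_visits, window_days=183):
    """True if any attended visit falls within `window_days` after the index visit."""
    latest = index_visit + timedelta(days=window_days)
    return any(index_visit < visit <= latest for visit in attended_visits)


# Hypothetical attended HIV care visits for one patient.
visits = [date(2014, 1, 10), date(2014, 3, 1), date(2014, 8, 15), date(2016, 2, 1)]

# One (visit, retained, accessed) label triple per appointment.
labels = [
    (visit, retained_after(visit, visits), accessed_after(visit, visits))
    for visit in visits
]
for row in labels:
    print(row)
```

In this toy history the August 2014 visit is labeled as not retained because the next attended visit comes more than 12 months later. A real pipeline would also need to handle censoring and appointments too close to the end of data collection, which this sketch ignores.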
Model training, validation, and selection. We tested the performance of five machine learning models in comparison to the current methods used by HIV clinicians for predicting retention and access to care. The methods comprised regularized logistic regressions (L1 and L2), gradient boosting decision trees, decision trees, extra trees, and random forests. The five machine learning models were chosen to cover a large spectrum of possible classifiers, spanning linear, tree, tree ensemble, and boosting models. Using Triage 33 , ~100 hyperparameter combinations for each model were tested, then fit to each training set 34 . Validation was performed using temporal cross validation 35 . Temporal cross validation was used instead of k-fold cross-validation to account for serial correlation and temporal patterns in the data and to correctly replicate the modeling workflow in deployment. The data were divided into sets of model building cohorts and validation cohorts (alternatively, training sets and test sets), each of which is split by time (eFig. 1). This allows models to be developed on all appointments occurring before the year of prediction and tested on appointments occurring during the year of prediction. Model reporting complies with the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) reporting guidelines 36 .
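The temporal split can be illustrated with a small generator. This is a sketch under assumptions rather than the Triage temporal configuration used in the study: it assumes prediction years are calendar years, and the `temporal_splits` helper and the appointment records are hypothetical.

```python
from datetime import date


def temporal_splits(appointments, first_test_year, last_test_year):
    """Yield (test_year, train, test) with train strictly before the test year."""
    for test_year in range(first_test_year, last_test_year + 1):
        cutoff = date(test_year, 1, 1)
        train = [a for a in appointments if a["date"] < cutoff]
        test = [a for a in appointments if a["date"].year == test_year]
        yield test_year, train, test


# Hypothetical appointment records.
appointments = [
    {"patient": "A", "date": date(2010, 5, 3)},
    {"patient": "B", "date": date(2011, 2, 14)},
    {"patient": "A", "date": date(2012, 7, 9)},
    {"patient": "C", "date": date(2013, 1, 22)},
]

splits = list(temporal_splits(appointments, 2012, 2013))
for test_year, train, test in splits:
    print(test_year, len(train), len(test))
```

Unlike k-fold cross-validation, no appointment in a training set is later than any appointment in the matching test set. Because the retention outcome looks 12 months ahead, a production split would presumably also need to drop training appointments whose outcome window extends past the cutoff; that detail is omitted here.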
Model performance was evaluated using the positive predictive value (PPV) with a population threshold of 10% (i.e., appointments were ranked by their scores and the top 10% of those were classified as high risk of retention failure). The PPV is the percentage of individuals identified by the model as at risk who go on to drop out of care (i.e., the number of true positives divided by the number of predicted positives). In order to use retention resources efficiently, the system should minimize false positives, which would minimize wasting resources on patients who would not drop out of care. The choice of threshold was driven by the authors' clinic's capacity for intervention: 10% of the population is approximately 150 appointments a year. We chose not to use Area Under the Curve (AUC), which is often reported for predictive models, because it captures overall performance across every threshold and is therefore not appropriate for our resource-limited setting. To prioritize a small number of individuals for intervention, positive predictive value ensures the model selected will minimize false positives within the intervention set.
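As a concrete illustration of the metric, PPV at a 10% population threshold can be computed as below. This is a minimal sketch with made-up scores and outcomes; the `ppv_at_top_fraction` helper is hypothetical, and ties in the score are broken arbitrarily by sort order.

```python
import math


def ppv_at_top_fraction(scores, outcomes, fraction=0.10):
    """PPV among the top `fraction` of appointments ranked by risk score.

    `outcomes` holds 1 if the patient went on to drop out of care, else 0.
    """
    k = max(1, math.floor(len(scores) * fraction))
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    true_positives = sum(outcome for _, outcome in ranked[:k])
    return true_positives / k


# Hypothetical scores for 20 appointments; three were followed by dropout.
scores = [i / 20 for i in range(20)]
outcomes = [1 if i in (3, 10, 19) else 0 for i in range(20)]

ppv = ppv_at_top_fraction(scores, outcomes)  # top 2 of 20 appointments
baseline = sum(outcomes) / len(outcomes)     # expected PPV if flagged at random
print(ppv, baseline)
```

The baseline rate is the PPV one would expect from flagging appointments at random, which is why it serves as the floor against which the models are compared.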
For each model type, we chose the hyperparameters that consistently had high performance across the validation sets (i.e., each of the time periods). Specifically, we chose the model that was most frequently within 5 percentage points of the PPV of the best possible model over each time period (e.g., if the best possible PPV for a time period was 0.80, all models above 0.75 PPV were selected). This ensures that the final model selected is one that is both stable and has high performance.
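The selection rule can be sketched as a simple count of how often each candidate lands near the best model. This is an illustration under assumptions, not Triage's model selection code: the tolerance is taken as 0.05 in absolute PPV (matching the 0.80 versus 0.75 example), the model names and PPV values are invented, and ties are resolved by dictionary order.

```python
def pick_stable_model(ppv_by_model, tolerance=0.05):
    """Return the model most often within `tolerance` of the per-period best PPV.

    `ppv_by_model` maps a model name to its PPV in each validation period.
    """
    n_periods = len(next(iter(ppv_by_model.values())))
    near_best = {name: 0 for name in ppv_by_model}
    for period in range(n_periods):
        best = max(ppvs[period] for ppvs in ppv_by_model.values())
        for name, ppvs in ppv_by_model.items():
            if ppvs[period] > best - tolerance:
                near_best[name] += 1
    chosen = max(near_best, key=near_best.get)
    return chosen, near_best


# Hypothetical PPV at 10% for three candidates over three validation periods.
ppv_by_model = {
    "random_forest_a": [0.37, 0.35, 0.33],
    "random_forest_b": [0.40, 0.20, 0.34],
    "decision_tree": [0.25, 0.36, 0.20],
}

chosen, near_best = pick_stable_model(ppv_by_model)
print(chosen, near_best)
```

Here `random_forest_a` is never the single best model, yet it is selected because it stays close to the best in every period, which is the stability property the rule is meant to reward.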

Performance evaluation.
Predictions were made at the appointment level to simulate deployment in a clinical setting. For any prediction at the time of appointment, the training data and predictor variables included only information known before that point in time. We compared the PPV at 10% of the machine learning models to a logistic regression model based on the literature-identified features 37 , referred to as the 'previous-state-of-the-art' model. This 'previous-state-of-the-art' model uses the factors that clinicians might use to predict whether a patient will be retained in care based on previously published literature 37,38 . These features included demographics, age, race, gender, diagnosis of psychiatric illness, substance use history, viral load, and time since HIV diagnosis. We also compare our model results with the prior (the fraction of individuals who are not retained in care or not accessing care).

Bias evaluation.
Machine learning models deployed in this setting with many at-risk groups involved have the potential to disproportionately affect some sub-groups and exacerbate disparities. We audited our models using Aequitas 39 to ensure that prediction errors do not disproportionately impact certain protected classes (e.g., racial minorities).
While bias can be measured in many ways, we focus on metrics that measure disproportionate false negatives, since failing to detect people at risk for retention failure is presumably more harmful than detecting false positives in these groups. A patient at risk for retention failure who does not receive an intervention loses opportunities for underlying challenges to be addressed (e.g., transportation might be a challenge and a case worker might be able to help navigate public transit). On a group level, a group can be negatively impacted if they systematically do not receive an intervention when it is needed. To measure this impact, we use the False Omission Rate (FOR), which is defined as the number of false negatives divided by the number of negative predictions (equivalently, 1 minus the Negative Predictive Value). Falsely identifying a patient as at risk carries less negative impact to the patient, though the clinic intervention becomes less efficient when clinic staff intervene on patients who are falsely identified as high risk.
Given the racial composition of the patient population, we focused our attention on auditing models for parity in FOR by race. Specifically, we considered a model to be disparate if its FOR ratio of Black vs White is either less than 0.9 or greater than 1.1.
Implementation. The system was built using Triage, an open source machine learning tool 33 , for building features, running models across a large hyperparameter space, model selection, and model evaluation. The data and results are stored in a PostgreSQL database. We used Python's scikit-learn package for the machine learning models. The configuration file used to specify features and models can be found on GitHub 34 .
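The FOR parity check from the bias evaluation can be written out in plain Python as follows. This is a sketch of the arithmetic only and does not use the Aequitas API; the helper names and the per-group records are hypothetical, and the example assumes every group has at least one unflagged appointment and a nonzero reference FOR.

```python
def false_omission_rate(records):
    """False negatives divided by predicted negatives.

    `records` is a list of (flagged, dropped_out) pairs, one per appointment.
    """
    unflagged = [dropped_out for flagged, dropped_out in records if not flagged]
    if not unflagged:
        return None
    return sum(unflagged) / len(unflagged)


def for_ratio(records_by_group, group, reference):
    """FOR of `group` relative to the FOR of `reference`."""
    return false_omission_rate(records_by_group[group]) / false_omission_rate(
        records_by_group[reference]
    )


def is_disparate(ratio, low=0.9, high=1.1):
    """Apply the 0.9 to 1.1 parity band to a FOR ratio."""
    return ratio < low or ratio > high


# Hypothetical audit data: (flagged as high risk, dropped out of care).
records_by_group = {
    "Black": [(False, 1)] * 3 + [(False, 0)] * 7 + [(True, 1)] * 2,
    "White": [(False, 1)] * 2 + [(False, 0)] * 6 + [(True, 0)] * 1,
}

ratio = for_ratio(records_by_group, "Black", "White")
print(round(ratio, 2), is_disparate(ratio))
```

With these invented numbers the FOR is 0.30 for Black patients and 0.25 for White patients, so the ratio of about 1.2 falls outside the parity band and the model would be considered disparate.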

Ethical review of study and waiver of consent. This study was approved by the University of Chicago Institutional Review Board (IRB). The IRB waived the need for informed consent as part of the study approval. Research was carried out in accordance with the ethical standards in the Declaration of Helsinki.

Results
Over the study period, 713 patients attended at least one HIV care appointment (Table 1), accounting for 11,445 total visits. Of these appointments, 8-12% were not followed by a subsequent appointment at least 90 days later within a 12-month period, indicating a lack of retention in care for that time period (eFig. 2). Approximately 10% of the appointments did not have a subsequent appointment in a six-month period (access to care).
Model evaluation. Retention in care. The previous-state-of-the-art model had an average PPV of 14.1% [SD: 0.04] throughout the study period for the top 10% of predicted risk individuals, an improvement of 100% ables, sharing 80% of the top 20 predictors. A patient's history of past retention in care and previous HIV care encounters are important predictors for the machine learning models for both retention in care and access to care in 6 months (Fig. 2). In general, the previous-state-of-the-art model found demographic features important: its top features were race and ethnicity, as well as days since first appointment and number of days since diagnosis. The best Random Forest model initially found these features predictive, but as the system collected more data it found the patient's medical history (retention history, appointments, and history of lab tests) to be more predictive. (Fig. 3) (Fig. 4). The previous-state-of-the-art model had FOR of 0.26 [SD 0.05] and 0.29 [SD 0.08] for Black and White patients respectively. When selecting for models with minimal overall FOR disparity, there is a tradeoff: the average PPV of the lower disparity models is 18% and 22% lower for retention in care and accessing care respectively. The FOR ratios are calculated over a relatively small sample. The predicted positive group is approximately 120 appointments per year, which are split into different racial categories. As a result, this metric is susceptible to variation from small population size.

Characteristics | Value
Age at first visit in study period, Mean (SD) | 47.3 (13.6)
Average crime rate on route from residence to clinic, Mean (SD) | 0.11 (0.03)

Discussion
This study demonstrates the potential of machine learning models to identify individual patients at the highest risk for falling out of HIV care, allowing busy HIV care clinics to direct limited resources toward patients who need them the most. To our knowledge, this is the first use of machine learning to understand retention in care among individuals living with HIV. Clinicians have difficulty predicting patients' risk for missing appointments, and may be subject to bias in determining which patients would benefit from resource-intensive retention interventions 40 . Our machine-learning model had a higher PPV and was less biased than the previous-state-of-the-art logistic regression model. Furthermore, while most prior literature regarding retention in care examines factors associated with retention at a single point in time, our model dynamically predicts retention longitudinally. Patients' appointment attendance patterns change over time, with patients often transitioning in and out of care 41 . The method we developed provides a retention risk score at the visit level and recalculates the score at each subsequent visit, incorporating new data that becomes available as well as characteristics that change over time (e.g., prior appointment attendance, HIV viral load, substance use patterns, change of address, etc.).
We modeled two different definitions of healthcare utilization: retention in care and access to care. Both definitions are used in practice and described in the literature. Overall, the machine learning model for access to care had a greater performance improvement over the previous-state-of-the-art model compared to the model for retention in care. Therefore, the model for access to care may be more efficient to implement in practice for the same amount of intervention resources. This will have to be decided upon based on a clinic's priorities for intervention.
We found that the most important predictor variables in the machine learning models for both retention and access were based on previous retention history and clinic visit history (e.g., total number of attended appointments). This is in keeping with prior literature that has shown that patients' history of missing appointments is predictive of future missed appointments. Pence et al. reported that the most important predictor of future missed visits among HIV-positive patients is prior missed visits 42 . Other studies have found that low initial CD4 count and elevated HIV viral load are risk factors for poor retention 11,13 . We found that the existence of CD4 or viral load tests acts as a proxy for the existence of an appointment and is thus more relevant to retention than the exact values of the laboratory tests.

Figure caption (fragment): The 10% threshold was chosen to match the resources the clinic has for launching an intensive intervention. The machine learning models shown below are the best performing model (blue) and the best performing model of an alternate model type (for retention in care, a decision tree, and for access to care in 6 months, a logistic regression).

(2020) 10:6421 | https://doi.org/10.1038/s41598-020-62729-x | www.nature.com/scientificreports
Other factors that have been reported in prior literature to be related to retention, including race and age, did not figure prominently in our model. However, these were important predictors in models built on earlier time periods, indicating that when other historical information is not available, these factors can be useful predictors. Additionally, our population was 82% African American and had a mean age of 48 years. We may not have had sufficient numbers of other races or young patients for these factors to influence retention outcomes in our model. Of note, geospatial factors including travel time to clinic, neighborhood crime rate, and neighborhood characteristics were not among the most important predictive features in the models. This may be because many of our patients live in neighborhoods with similar characteristics (i.e., high poverty, similar crime rates) on the south side of Chicago. When our methodology is applied to a different and more socioeconomically diverse patient population, these features may rank higher in importance. To our knowledge, this is the first use of bias auditing of predictive models in an HIV care setting. Further work is needed to understand how to mitigate the risk of exacerbating disparities.
Our study has several limitations. EMR data regarding patients' diagnoses, medications, etc. may be inaccurate if providers do not accurately document and update patient data at each visit. Prior studies have shown wide variability in accuracy of billing diagnoses and incomplete problem list documentation in the EMR 43,44 . We attempted to limit inaccuracy due to poor documentation by incorporating multiple fields from the EMR. For example, patients with a history of substance abuse were detected not only by examining billing diagnoses for substance abuse, but also by collecting clinician-assigned diagnoses in the problem list, social history documentation of substance abuse, and toxicology screen results. Additionally, our EMR database only stores each patient's most recent home address. Therefore, we were unable to account for changes in patients' home address or living situations in our geospatial analyses. Furthermore, certain factors that may have an important impact on retention in care may not be captured within structured fields of the EMR (e.g., life stressors, social support, and child care or other responsibilities). In the future, we plan to incorporate natural language processing of unstructured clinical notes into our model to detect these factors.
Other sites can replicate the process presented here for extracting electronic data and incorporating them into machine learning systems using the in-house framework 33 and our open source code. The vast majority of outpatient medical practices in the U.S. utilize EMRs 45 , allowing them to replicate our process. Our open source code is available at https://github.com/dssg/hiv-retention-public.
In summary, we have created a machine learning system to predict which patients are most likely not to be retained in care. The system builds a longitudinal and panoramic view of the patient, incorporating different types of data at different levels of granularity, and it outperforms the previous-state-of-the-art model while being more adaptable, scalable, and fair. Future areas of study include incorporating the model into the EMR so that it can be used in real time to direct retention resources to patients most at risk for falling out of care.
Conclusions
Retention in care is crucial for individual and public health, yet the majority of people living with HIV in the United States are not retained in care. This study demonstrates that a machine learning framework to derive an optimal model to identify individuals at risk for falling out of care has the potential to improve retention. Our machine learning model was compared to a logistic regression model and shown to have superior performance, be more adaptive, and have less disparate impact on minorities. Such a model will allow more precise prioritization of retention resources to patients likely to benefit most.

Figure caption (fragment): Over time, the disparity in FOR for both our best performing machine learning models reduces. The machine learning model that is selected for best stable performance (blue) is better performing than the previous state of the art model (red). The best decision tree model (orange) has slightly lower performance and similar FOR ratios. The remaining models (black) were chosen for minimal disparity.

Figure caption (fragment): show the first and third quartiles; the thin lines show the 5% and 95% percentiles. The purple band is the band of minimal disparity in FOR, i.e., the ratio of the FOR for Black vs White races is within [0.9, 1.1]. Note that the x-axis goes from 0 to 0.4 to highlight the performance of the models. (bottom) Over time, the disparity in FOR for both our best performing machine learning models reduces. The machine learning model that is selected for best stable performance (blue) is better performing than the previous state of the art model (red). The best logistic regression model (green) has slightly lower performance and similar FOR ratios. The remaining models (black) were chosen for minimal disparity.