Introduction

Suicide among children and adolescents is one of the most critical public health concerns1,2,3,4,5. As the second leading cause of death among children, adolescents and young adults between ages 10–24 years, suicide claims over 6000 young lives every year in the US alone6. There has been an alarming increase in suicide rates among those 10–24 years of age7. Compared to 2000, suicide rates were 2–3 times higher in 2016 for this population8. The burden of suicide attempts is many folds higher than suicide deaths. In 2017, about 1 in 6 adolescents and young adults seriously considered attempting suicide and 1 in 13 attempted suicide9. Every year approximately 700,000 adolescents seek healthcare after an attempted suicide10.

In clinical practice, predicting the risk of suicide accurately within an actionable time frame is critical11,12,13,14. However, most of the clinical risk assessment tools available to clinicians are not sufficiently accurate to identify high-risk patients14,15,16,17. While the Columbia Suicide Severity rating scale presents an effective alternative for older clinical assessment tools, it may not be possible to screen every patient with this tool during the clinical encounter. Hence, it is evident that clinical practitioners need more than just clinical assessment tools to identify patients at risk of suicide. Recent efforts to apply machine learning with electronic health record (EHR) data to predict suicide risk in adult populations18,19,20,21,22,23 have not only confirmed the importance of prominent risk factors for suicidal behavior identified in prior research24,25,26,27 but also identified other characteristics leading to improved accuracy in suicide prediction compared to previous efforts19,28,29. However, there is scarcity of advanced risk prediction models that can be applied in clinical practice to the general population of children, adolescents, and young adults. Walsh et al.30 constructed a predictive model for adolescents who required no in-person assessment using EHR data with good prediction performance in a limited population in the Southern US. However, there is a need for a more comprehensive risk predictive models that consider a wider range of clinical factors beyond mental health comorbidities and medications and generate quantifiable risk scores that can be applied to determine the patient’s risk level for suicidal behavior.

In this retrospective study, we examined a range of demographic, diagnostic, laboratory, and medication-related factors derived from EHRs to identify significant predictors for suicidal behavior among children and adolescents receiving inpatient and outpatient care at a major children’s hospital to compare and contrast predictors across different time windows to examine the potential for variability in the factors associated with short- and longer-term risk. This is one of the few analyses in the suicide risk modeling literature to explicitly differentiate models predicting near term vs. longer term risk.

Methods

Ethics statement

Data analyzed in this study were deidentified by removing protected personal health information. Our study was approved by the University of Connecticut Health Center Institutional Review Board and Weill Cornell Medical College Institutional Review Board.

Cohort

We analyzed data from the Connecticut Children’s Medical Center (CCMC) EHR database, which contains information on 641,708 visits among 129,485 patients from 1 October 2011, to 30 September 2016. We included in the analysis all encounters related to outpatient, emergency room, or inpatient care among patients between 10 and 18 years of age at the time of the first suicide diagnosis (for whom has the suicide attempts) or the last visit (for whom has no suicide attempt). Patients who have no encounter related to outpatient, emergency room, or inpatient care were considered as missing longitudinal data and hence were excluded for analysis. In addition, patients whose first recorded visits due to suicide attempts were excluded since the lack of longitudinal information. To identify suicide attempts, we used an algorithm including both the external cause of injury codes of International Classification of Diseases, Ninth Revision (ICD-9) and other ICD-9 code combinations that are indicative of suicidal behaviors (as shown in Supplementary Table 1)31,32. Since our data included diagnostic codes in ICD-10 format, we converted the ICD-9 codes from the algorithm to ICD-10, using a public toolkit, AHRQ MapIT33. Figure 1 summarizes the data screening process to define the study population and suicide-positive and -negative groups for this study.

Fig. 1: Study flow outlining exclusion criteria.
figure 1

CCMC Connecticut Children’s Medical Center, ICD-10 International Classification of Diseases, Tenth Revision.

Measures and outcomes

To gain insight on both short- and long-term suicide risk and how the risk factors evolve through different periods of observation, we developed a series of models predicting the risk of future suicide attempts with varying prediction windows. We constructed models for prediction windows, including 0, 7, 14, 30, 60, 90, 180, 270, and 365 days. The model for each prediction window only uses data equal to or more distant in time than the length of the window. For example, the model for a prediction window of 60 days aims to predict the occurrence of a future suicide attempt that happens 60 days or longer following the encounter. Consequently, the model is trained using patient’s data that are captured at least 60 days prior to their end points, where the end point for a suicide-positive subject is defined as the time of his/her first suicide attempt, and the end point of a negative subject is defined as the time of his/her last recorded clinical visit. The model trained with the 0 day prediction window is expected to predict the suicide risk any time after the latest encounter. See Supplementary Fig. 1 and Table 2 for an illustration.

We utilized a broad range of variables as candidate predictors, including patients’ demographic characteristics, diagnosis codes, prescribed medications, and laboratory test data. Demographic characteristics include age, sex, race, and ethnicity. All diagnosis information was presented in ICD-10 format. For medication data, we used a publicly available toolkit MedEx34 based on RxNorm35 to standardize the prescribed medications. Laboratory data include the name of the laboratory test and its results. The predictors were constructed as binary variables, each of which indicates the absence or presence of a particular factor related to diagnoses, medications, or laboratory tests. We also included all the two-way interactions among these binary variables as candidate predictors.

Predictor screening

Predictor screening was performed based on testing the marginal correlation of each predictor and the occurrence of suicide attempts. Specifically, each predictor and the occurrence of suicide attempts produced a 2 × 2 contingency table. If all the cells were bigger than 5, then a Chi-square test was used to test the association, else Fisher’s exact test was used. To adjust for multiple testing, we implemented p value correction using the Benjamini–Hochberg procedure36 to control the false discovery rate (FDR) at 10%. As such, all the predictors with an adjusted p value smaller than 0.1 were used as candidate predictors to build the predictive models. We also applied the marginal screening procedure on the interaction variables (see Supplementary Appendix 1 for more details).

Model development and predictor selection

For each prediction window, we randomly divided the data into 90% training and 10% testing sets. As shown in Supplementary Fig. 2, we first performed the predictor screening to select the informative candidate predictors. Next, we developed the predictive model using the training set. A logistic regression classifier was built via a sequential forward selection procedure37 for selecting predictors to minimize the prediction error. The use of the sequential forward selection procedure allows us to watch the order in which predictors are added, and hence can provide valuable information about the quality of the candidate predictors. We concluded the selection procedure when predictive performance, as judged by a fivefold cross-validation, reached the peak. We repeated all the above procedures 10 times to validate the effectiveness of our predictive model and identify the predictors with the strongest association with suicide attempts. The level of importance of each predictor was measured by its frequency of being selected among the 10 predictive models, based on which the final set of predictors was determined. Supplementary Appendix 2 provides more details about the method for constructing predictive model for each prediction window.

Model evaluation

In order to estimate the proposed predictive models, we implemented logistic regression with L1 regularization for comparisons. The L1 logistic regression models were trained based on candidate predictors passed the univariate screening. To evaluate the predictive performance of the models, we examined out-of-sample performance metrics, including area under the receiver operator characteristic curve (AUC), sensitivity, specificity, and positive predictive value (PPV). AUC is a broad metric of discrimination performance in the machine learning community that ranges from 0.5 (random guessing) to 1.0 (perfect prediction). Since our dataset is highly imbalanced, i.e., 180 suicide-positive subjects vs. 41,541 suicide-negative subjects, we calculated sensitivities when setting specificities to 90% and 95%, respectively. We also calculated PPV, which is the probability that predicted high-risk patients have actual suicide attempts. For each prediction window, performance metrics were calculated based on prediction results over the testing set.

To further evaluate the usefulness of our predictive models, we grouped the selected predictors into seven categories: demographics, depression-related factors (including both diagnoses and medications), other mental health-related factors, routine tests, drug tests, pregnancy-related factors (among females), and other factors. We implemented logistic regression analysis on female and male cohorts, respectively, to estimate the contribution coefficient for each category.

Results

Following application of our inclusion and exclusion criteria, 180 (0.43%) patients with suicide attempts were labeled as positive subjects, and 41,541 patients without suicide attempts were labeled as negative subjects. Table 1 illustrates the demographic characteristics of the study population.

Table 1 Demographics of 41,941 patients of the study cohort.

Model performance

Patients with no clinical records before the prediction window were excluded for training the model. Hence as the prediction window increases, the number of patients eligible for analysis decrease (see Table 2). Though we analyzed the effects of two-way interactions among the predictors on the risk of suicidal behavior, we ultimately excluded them for building predictive models due to poor predictive performance when they were included. This was mainly due to the general sparseness of the data which was exacerbated by noise introduced by the interaction terms.

Table 2 Overall performance of the predictive models.

Statistics summarizing the ability of our models to predict the suicide risk, including receiver operating characteristic (ROC) curves, AUC, sensitivity at predefined specificity levels, and PPV, are presented in Table 2, and Supplementary Figs. 3 and 4. The proposed models predicted suicidal behavior with an overall AUC > 0.80 across all prediction time windows. The model performed similarly in terms of AUC for 0- to 270-day prediction windows (AUC = 0.84–0.86). The predictive performance declines for the one-year prediction window (AUC = 0.81, 95% confidence interval [CI] 0.76–0.86) since fewer patients had clinical records 1 year before the observation point. For all prediction windows, the model detected 53–62% of suicide cases with 90% specificity. Consistent with the low prevalence (from 0.43 to 0.95%) of suicidal behavior in the studied cohort, the PPVs across all prediction windows ranged from 3 to 6% for 90% specificity, and from 4 to 8% for 95% specificity (see Table 2). Overall, predictive performances of the proposed models were higher than those of the baseline L1 penalized logistic regression models (see Table 2 and Supplementary Fig. 4).

Predictor importance

Figure 2 depicts predictor importance, as measured by the frequency a specific predictor was enrolled by the sequential forward selection procedure. Characteristics of predictors are listed in Supplementary Tables 311. The importance of predictors varied across the prediction windows. ICD-10 code R45 (symptoms and signs involving emotional state) and F32 (depressive episode), and gender are the most common risk factors across all prediction windows. Age is also a significant factor, with patients between 10 and 12 years old less likely to attempt suicide than older patients. In addition, antidepressant medications, including sertraline and escitalopram, and urine culture tests are risk factors that show relatively high importance across most prediction windows.

Fig. 2: Predictor importance over all prediction windows.
figure 2

Importance of predictors of which summation of frequencies over all prediction windows is no less than 0.5. Predictor importance is measured by frequency that specific predictor is enrolled by the sequential forward selection procedure. Each predictor is shown with its associated type in square brackets embedded as D demographics, I International Classification of Diseases, Tenth Revision (ICD-10) diagnostic codes, M medication, T Lab Test. An asterisk concatenating two variables indicates interaction predictor. In addition, each Lab Test is shown with its associated result in braces embedded as U unspecified, H high, A abnormal.

To aid in the interpretation of factors having different impacts across short- and longer-term time windows, we grouped the selected predictors into seven categories and calculated the contribution coefficient for each category in identifying suicide risk among both female and male patients (see Fig. 3). Details of each predictor category are listed in Supplementary Table 12. The radar charts presented in Fig. 3 show that, in general, demographics, depression-related factors, and other mental health-related factors were important predictors across all prediction windows. However, as the prediction window lengthens to greater than 180 days several diagnostic factors and laboratory tests are much less useful in predicting suicide risk. In particular, when the prediction window is larger than 270 days, the effects of depression-related and other mental health-related factors became much smaller. For female patients, the effects of female-specific predictors, i.e., pregnancy-related factors, vanish when prediction window is larger than 180 days. Supplementary Figure 5 reveals that the information available for estimating suicide risk, as reflected in the percent of patients in each predictor category having a particular characteristic, declines substantially over time.

Fig. 3: Categorized predictor contribution over all prediction windows.
figure 3

The selected predictors were grouped into seven categories: demographics, depression-related factors (including both diagnoses and medications), other mental health-related factors, routine tests, drug tests, pregnancy-related factors (among females), and other factors. For each prediction window, the value of each predictor category was the normalized cumulative predictor contribution coefficient derived by logistic regression analysis on female patients (red) or male patients (blue), respectively.

Discussion

Accurate risk prediction plays an important role in the intervention of suicide attempts among children and adolescents. The traditional clinical risk assessment tools have been demonstrated to be not sufficiently accurate to identify high-risk patients14,15,16,17. Recently machine learning approaches have been applied to EHR data for the prediction of suicide risk in either adult populations18,19,20,21,22 or children and adolescent population30. Although the previous studies have archived advanced predictive performances compared to the traditional methods, there remains a need for a more comprehensive risk prediction models that utilize clinical data available to produce quantifiable risk scores that can be applied to estimate the patient’s risk level.

In this analysis we have shown that a combination of patient demographic characteristics, diagnoses, procedures, medications, and laboratory tests can be used to construct accurate machine learning models predicting the risk of suicide attempts among pediatric patients receiving inpatient and outpatient care in a children’s medical center. This distinguishes our analysis from virtually all other efforts at suicide risk prediction among both children and adolescents, which have typically had risk horizons of several years to maximize the cases available for analysis11. Notably, our models demonstrated good performance over very short time windows, indicating that the detection of short-term risk of suicidal behavior in this population is attainable. In addition, the proposed models accounted for improvements in both short- and long-term prediction, compared to the traditional logistic regression with L1 regularization. The PPVs observed from the proposed models constitute a 5- to 10-fold improvement in suicide attempt prediction compared to the base rate (see Supplementary Table 13), which is superior to that reported in a similar study by Barak-Corren et al.19.

Another major finding in this analysis (see Supplementary Figs. 3 and 4, and Table 2) is that time matters how well the models perform. The models performed well in time windows ranging between 7 and 180 days, with declining performance observed 9 months and 1 year prior to the attempt. While this is likely due to the fact that, major identified risk factors, such as depression24,38,39, other mental disorders27,40,41,42, and substance abuse25,43,44,45 are proximate risks for suicidal behavior; on the other hand, it is also due to the volume and content of the information available in varying time windows. Supplementary Figure 5 shows that percent of patients whose medical records contained evidence of the identified risk factors falls precipitously over time, indicating that there is a dearth of information with which to construct an accurate suicide risk model among pediatric patients as the time from the previous medical encounter increases beyond 6 months.

In addition, time also matters the attendant risk factors (see Figs. 2 and 3), i.e., contributions of the identified predictors to risk detection vary across prediction windows. In particular, as shown in Figs. 2 and 3, except for demographics, depression-, and other mental disorder-related factors, most predictors shift their importance in risk prediction across prediction windows. This validated the previous finding that majority of the risk factors don’t continuously contribute to the suicide risk30,46. Even though identification of the concrete role of an individual risk factor over time is difficult, we did detect predictor importance pattern reflecting short- and long-term risk of suicide behavior. Our produced risk factor panels, as shown in Fig. 3, leads to the potential of assisting the clinical practitioners in assessing risk levels of the patients. First, female’s short-term risk factors (0- to 90-day prediction windows) come from a broader spectrum, as with a higher area within the curve of each radar chart. The important short-term risk factors of female include demographics, depression-, and other mental disorder-related factors. In contrast, the most important risk factors of male are mental disorder-related factors. The findings suggest that, when estimating short-term suicide risk, risk factors of female and male populations could be emphasized differently. In addition, for both female and male, impacts of the non-demographic risk factors for long-term risk estimation (270- and 365-day prediction windows) decrease due to the dearth of information (Supplementary Fig. 5). Besides, depression- and other mental disorder-related factors are also important in predicting long-term risk.

Strengths and limitations

This study provides a number of critical insights to inform clinical practice. First, we have shown that information that is routinely collected in clinical encounters and maintained in structured clinical records can be used to create accurate predictive models of the risk of suicidal behavior among children and adolescents. Nothing drastically novel was observed among the factors emerging as significant predictors of suicide risk, which is a good thing: it means that the information needed to identify at risk patients is readily available and just requires a mechanism to incorporate it into clinical care. Second, not only do we find that short-term risk of suicidal behavior can be detected, but that longer periods between clinical encounters results in less accurate prediction of suicide risk. This indicates that high-risk patients, whether identified through risk algorithms or by clinical history (e.g., a prior attempt), would benefit from ongoing clinical monitoring.

The present study has several limitations. First, the data are restricted to a single clinical setting with a limited number of suicidal events, potentially limiting both its power and generalizability. Among the 41,721 patients eligible for analysis, only 180 (0.43%) are cases with records of suicide attempt. We did not introduce specific strategies to address the positive-negative imbalance of suicide attempt. One possible consequence of this limitation is that certain risk factors, which are associated with suicide risk but appear infrequently in this patient population, would not be identified. This highlights the need of methods addressing imbalanced clinical data analysis47,48. Second, data for this analysis were collected from a single institution, i.e., CCMC EHR database. This may lead to the possibility that patients identified as negative subjects due to an absence of suicide records actually have been treated for suicide attempts at other institutions. In addition, drawing training and testing data from the same dataset also downgrades the power and generalizability of our models. Third, mining the text of clinical notes has been demonstrated to enrich the predictive models of suicide attempts21,22,49, while in this analysis we did not have access to patients’ clinical notes. Future work may incorporate clinical notes to combine with structured EHR data to enhance our predictive models, but it should be noted that de-identification in clinical text is more complex than that in the structured EHR data50,51 and hence needs more attention. Finally, among the 41,721 eligible patients, 19,941 (47.80%) received health insurance through Medicaid, which is slightly higher than the national average for Medicaid coverage among children (38%)52. This may lead to bias and potentially limit the generalizability of our findings for commercially insured patients.

Conclusions

Our study demonstrated the feasibility of creating predictive models of suicide risk of children and adolescents by using demographics, comorbidity diagnosis codes, laboratory test results, and medications from clinical records. Such models showed good predictive performances for estimation of short-term and long-term risks and identified significant predictors which may assist in clinical practices.