Introduction

Adolescent suicide is a major public health issue and accounts for an estimated 6% of all causes of death among young people worldwide1. In Korea, the rate of death due to suicide is significantly higher than that in other developed countries2. Moreover, the most common cause of death in Korean adolescents is suicide, whereas among adolescents in other developed countries, road traffic accidents are the most common cause of death3,4. In fact, the rate of suicide attempts among Korean adolescents has continuously increased; in 2016, the suicide rate in individuals aged 12–18 years was 7.9 per 100,000, which was twice that in 20015.

Suicide attempt is the strongest known clinical predictor of completed suicide6 and is the main risk factor for continued suicidal behaviour and future death due to suicide in adolescents7. The ratio of completed suicide to suicide attempt is approximately 1:108, and approximately 10–15% of those who attempt suicide eventually die by suicide9. Therefore, early detection of adolescents who have attempted suicide, and their identification as a high-risk group for suicide, is extremely important. Because adolescents are more impulsive and emotionally unstable than adults, they are more likely to attempt suicide unexpectedly; this makes it difficult to detect adolescent suicide early10. Moreover, considering that many adolescents hide or under-report their mental health problems, or disclose only part of their concerns11,12, identifying their suicidal behaviours becomes more difficult. According to a large survey, 24–36% of adolescents reported having suicidal thoughts in the past year, but very few went on to seek further help for their problems13.

Several studies have demonstrated various risk factors for adolescent suicide; these include substance use, childhood abuse and parental psychopathology14,15. Other significant factors include problems in relationships with family or friends16, access to the means of self-harm17 and personality factors18,19. However, considering the complex mechanism of suicide occurrence, applying each risk factor individually may be biased or inaccurate for predicting and managing the suicidal behaviours of patients in the real clinical setting. Therefore, the overall combined effect of these risk factors needs to be considered for them to be applicable in actual clinical or public health settings.

A recent large-sample study on adults suggested a prediction model for suicide death using Cox regression, support vector machine and deep learning20. However, there remains a great lack of research on the process of selecting risk factors, the respective impact of each factor and the overall impact of these factors by a complex mechanism. Furthermore, there is no predictive model for suicide attempt in adolescents and most of the previous studies on adolescent suicide attempt focused largely on psychiatric patients or biased samples. The present study proposed a risk stratification model for adolescent suicide attempts using community-based national representative samples to collect data, including sociodemographic variables, risk behaviours and psychological variables that are readily available in everyday life.

Methods

Study population and source of data

The present study was performed using data obtained from the 2012–2016 Korea Youth Risk Behaviour Web-based Survey (KYRBWS), which was established in 2005 by the Centers for Disease Control and Prevention in South Korea and is an ongoing annual nationwide cross-sectional survey that uses a stratified multi-stage cluster strategy among middle- and high-school students21. The KYRBWS, which assesses the prevalence of health risk behaviours among adolescents, contains more than 100 questions that are divided into multiple sections on sociodemographic characteristics, health-related behaviours and mental and physical health22. After the survey had been fully explained, the participants provided written informed consent to participate in the survey and were assigned identification numbers. All participants were guaranteed anonymity and completed the anonymous self-administered web-based questionnaire in a school computer room. All data used in this study were completely anonymized before access and were analyzed anonymously. The KYRBWS was approved by the Institutional Review Board of the Korea Centers for Disease Control and Prevention (Statistics Korea, approval No. 11758). All methods in the study were carried out in accordance with relevant guidelines and regulations.

The samples in this study were derived from the following datasets: 8th KYRBWS 2012 (N = 76,980, 96.4% response rate); 9th KYRBWS 2013 (N = 75,149, 96.4% response rate); 10th KYRBWS 2014 (N = 75,149, 97.2% response rate); 11th KYRBWS 2015 (N = 70,362, 96.7% response rate) and 12th KYRBWS 2016 (N = 67,983, 96.4% response rate). We excluded participants with incomplete information on self-questionnaires. Finally, we included 247,222 subjects for the present analysis and for the development of the suicide index model.

Procedures and statistical analysis

Considering the most important risk factor correlated with adolescent suicide is a previous suicide attempt14, we set suicide attempt as an outcome variable of the suicide index model. Suicide attempt as the outcome variable was defined as a positive response to the question ‘In the past year, have you ever attempted suicide?’. The subjects were asked to respond with either (1) no, I never attempted suicide or (2) yes, I have attempted suicide.

For the suicide index model, we used a logistic regression model with several predictor variables, and the independent variables were selected in two steps. As a first step, a psychiatrist outlined the candidate variables that could influence suicide attempt. Through a literature-based search, we screened the following covariates that were previously demonstrated to be related to the risk of suicide attempt: age, sex, breakfast consumption, experience of violence, sleep duration, perceived stress, feelings of sadness, current cigarette smoking, current alcohol drinking, chronic allergic disease, perceived health status, perceived academic record, residential area, household economic status, paternal/maternal education level and living with biological or adoptive parent. The references used in the literature-based search for variables are listed in Supplementary Table 1. As a second step, a computer scientist determined the input variables for the final suicide index model through a statistical method. In this step, we performed binary logistic regression between each candidate variable and suicide attempt in order to select the covariates that were significantly related to suicide attempt. The variables accepted in this binary analysis were then re-evaluated by backward stepwise logistic regression.

Statistical analysis was performed using the Statistical Package for the Social Sciences (SPSS ver. 20.0; IBM Corp., Armonk, NY). Data distribution was assessed to be normal, thereby allowing the use of Student’s t-test, a parametric test, for continuous variables such as age and sleep duration. For categorical variables, the Chi-square test was used to compare the frequencies between groups. P < 0.05 was considered statistically significant.

We used the 2012–2015 KYRBWS data as the training dataset and the 2016 KYRBWS data as the validation (testing) dataset. We used a generalised linear model (GLM) for classification and to estimate the probability of suicide attempt among adolescents. The final GLM was constructed from the training dataset and was validated on the validation dataset.

We measured the area under the receiver operating characteristic (ROC) curve and the F-measure on the test dataset to assess the performance of the final model; these were graphically illustrated as the ROC curve and the F-score plot, respectively. R (ver. 3.4.1) was used for constructing the GLM, and iteratively reweighted least squares was used for optimising the parameters.
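To illustrate the fitting procedure named above, the following is a minimal Python sketch of iteratively reweighted least squares for a logistic GLM with one predictor and an intercept. It is written as Newton updates, which coincide with IRLS for the logit link. The function name `fit_logistic_irls` and the toy data are assumptions made for illustration; this is not the R code used in the study, and the real model has 13 predictors.

```python
import math

def sigmoid(z):
    """Inverse logit link."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic_irls(x, y, iterations=25, tol=1e-10):
    """Fit y ~ w0 + w1 * x for binary y by Newton/IRLS updates (hypothetical helper)."""
    w0, w1 = 0.0, 0.0
    for _ in range(iterations):
        # Gradient g = X^T (y - mu) and weighted cross-product H = X^T W X.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            mu = sigmoid(w0 + w1 * xi)
            weight = mu * (1.0 - mu)  # IRLS working weight
            g0 += yi - mu
            g1 += (yi - mu) * xi
            h00 += weight
            h01 += weight * xi
            h11 += weight * xi * xi
        # Solve the 2x2 system H * delta = g in closed form.
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        w0 += d0
        w1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return w0, w1

# Toy data (not KYRBWS data): 5/100 events when x = 0, 25/100 events when x = 1.
x = [0] * 100 + [1] * 100
y = [1] * 5 + [0] * 95 + [1] * 25 + [0] * 75
w0, w1 = fit_logistic_irls(x, y)
print(f"intercept = {w0:.4f}, slope = {w1:.4f}")
```

With a single binary predictor the maximum-likelihood solution is known in closed form (the intercept is the log-odds in the x = 0 group and the slope is the log odds ratio), which makes the toy case convenient for checking the updates.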

Results

The general characteristics of the training and testing datasets are presented in Tables 1 and 2, respectively. The mean age was younger in the suicide group than in the non-suicide group in both the training dataset (14.80 years vs. 15.15 years; P < 0.001) and testing dataset (14.95 years vs. 15.19 years; P < 0.001). In both the training and testing datasets, female adolescents were more likely to attempt suicide than male adolescents; less frequent breakfast consumption, more experience of violence, shorter sleep duration, higher perceived stress, more cigarette smoking and more alcohol drinking were significantly more common in the suicide group than in the non-suicide group. In both the training and testing datasets, compared with the non-suicide group, the suicide group had more participants with ≥2 chronic allergic diseases, who perceived their health status as poor and academic record as low, who lived in a rural area, who had low household economic status and who lived with one parent or with others who were not their parents.

Table 1 Baseline characteristics of the training dataset.
Table 2 Baseline characteristics of the testing dataset.

Table 3 shows the results of univariate and multivariate logistic regression analyses of the covariates for suicide attempt. On univariate logistic regression, all the selected variables that were demonstrated in the literature to be related to suicide had significant results and were included in the backward stepwise logistic regression (Table 3). In model 1, the number of chronic allergic diseases, residential area and parental education level had non-significant associations with suicide attempt; the other variables were entered into the next stepwise logistic regression for model 2. Finally, 13 variables were determined as input features for the suicide risk stratification model (Table 3).

Table 3 Backward-stepwise logistic regression for selecting the input variables.

The coefficients of the generalised linear model that included the 13 input features are presented in Table 4. The top three coefficients that were positively related with an increased suicide attempt rate were feeling of sadness, experience of violence and perceived stress. In addition, the bottom three coefficients that were negatively related with suicide attempt were age, breakfast consumption and sleep duration.

Table 4 Regression coefficients of the final model for predicting suicide attempt.

As suggested by previous studies23,24, the suicide risk concentration was analyzed by tier of predicted probability, together with the odds ratio (OR) for suicide attempt in each tier of the calculated probabilities (Table 5). In group 1, the expected and observed suicide attempt ratios were 0.5% and 7.0%, respectively. Considering the bottom 10% of the predicted probability as the reference group, the top 0.5% of the predicted probability had an OR of 400. Further, the percentage of suicide attempts gradually increased towards the top tier, except for groups 2 and 3. This result indicated the feasibility of risk-stratified preventive interventions using the tiers of predicted probability of suicide attempt calculated by our model.
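The tier-based comparison described above can be sketched in Python as follows. The helper names `tier_counts` and `odds_ratio` and all numbers are hypothetical and chosen only to show the arithmetic; because the toy sample has ten subjects, the reference group here is the bottom 60% rather than the bottom 10% used in the study.

```python
def tier_counts(labels, scores, start_frac, end_frac):
    """Return (events, members) for the slice [start_frac, end_frac) of subjects
    ranked from the highest to the lowest predicted probability."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n = len(ranked)
    members = ranked[int(round(start_frac * n)):int(round(end_frac * n))]
    events = sum(label for _, label in members)
    return events, len(members)

def odds_ratio(events_tier, n_tier, events_ref, n_ref):
    """Odds of the outcome in a tier divided by the odds in the reference group."""
    a, b = events_tier, n_tier - events_tier
    c, d = events_ref, n_ref - events_ref
    if b == 0 or c == 0:
        raise ValueError("odds ratio undefined when a cell in the denominator is zero")
    return (a * d) / (b * c)

# Toy values (not KYRBWS data): 1 = attempted suicide, 0 = did not.
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]

top = tier_counts(labels, scores, 0.0, 0.4)        # top 40% of predicted risk
reference = tier_counts(labels, scores, 0.4, 1.0)  # remaining 60% as reference
print("top tier:", top, "reference:", reference)
print("OR:", odds_ratio(top[0], top[1], reference[0], reference[1]))
```

A tier with no non-events, or a reference group with no events, leaves the odds ratio undefined; the sketch raises an error in that case rather than applying a continuity correction.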

Table 5 Suicide risk concentration by tier of predicted probability calculated by the generalised linear model (n = 42,814, testing dataset).

From these parameters, we obtained the suicide index (Appendix), which was a score between 0 and 1 that indicates the probability of a suicide attempt. Assessment of the performance of this model on the testing dataset yielded an area under the receiver operating characteristic curve (AUC) of 0.85 (Supplementary Fig. 1). Furthermore, the cut-off for the suicide index was determined as 0.12, or 12%, based on the maximum value of the F-measure (Fig. 1).
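The AUC can be read as the probability that a randomly chosen adolescent who attempted suicide receives a higher suicide index than a randomly chosen adolescent who did not. A minimal Python sketch of that pairwise definition is given below; `roc_auc` is a hypothetical helper name and the labels and scores are toy values, not study data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.
    Ties count as half a correct pair. Quadratic in sample size, so this form
    is meant for illustration rather than for a survey-sized dataset."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Toy values: 1 = attempted suicide, 0 = did not; scores play the role of the index.
labels = [0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.01, 0.02, 0.05, 0.10, 0.08, 0.20, 0.30, 0.60]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```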

Figure 1 F-score curve of the test dataset (KYRBWS 2016). The suicide model generated the maximum F-score of 0.23 for participants attempting suicide.
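Selecting a cut-off by maximising the F-measure amounts to sweeping candidate thresholds and keeping the one with the highest harmonic mean of precision and recall. The Python sketch below shows this under stated assumptions: the helper names, the candidate grid of 0.01 to 0.99 and the toy data are all illustrative, and the grid actually used in the study is not reported here.

```python
def f_measure(labels, scores, threshold):
    """F1 score when subjects with score >= threshold are flagged as high risk."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def best_cutoff(labels, scores, candidates):
    """Return (threshold, F) for the first candidate with the highest F-measure."""
    best_t, best_f = None, -1.0
    for t in candidates:
        f = f_measure(labels, scores, t)
        if f > best_f:
            best_t, best_f = t, f
    return best_t, best_f

# Toy values (not KYRBWS data).
labels = [0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.01, 0.02, 0.05, 0.10, 0.08, 0.20, 0.30, 0.60]
candidates = [i / 100 for i in range(1, 100)]
cutoff, f_max = best_cutoff(labels, scores, candidates)
print(f"cut-off = {cutoff:.2f}, maximum F = {f_max:.2f}")
```

When attempts are rare, as in the survey, precision stays low even at the best threshold, which is consistent with a modest maximum F-score alongside a high AUC.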

Discussion

Using the annual national representative data from the 2012 to 2016 KYRBWS, we proposed the use of a suicide index as a predictive model for adolescent suicide attempt. We calculated the combined effects of the risk factors, rather than simply measuring the separate risk for suicide attempt with each factor. This model provided relevant discrimination between those who had attempted suicide over the past year and those who had not.

The AUC of 0.85 obtained for the suicide index in this study can be compared with that of other prediction models that were previously used in the field of psychiatry [e.g. suicide deaths of adults (AUC 0.68), new-onset psychosis (AUC 0.79) and bipolar disorder (AUC 0.76)]20,25,26. The 13 simple predictors are easy to obtain in a primary care or school health setting. Therefore, this simple risk calculator can be easily used every day and can provide a tool not only for clinicians, but also for teachers, family members and friends to rapidly identify those who are at risk for suicidal behaviour. Such tools can be particularly useful in settings, such as homes and schools, where it is difficult to contact mental health professionals immediately.

Although risk calculators have been widely used in other medical fields, the development of prediction models in psychiatry has had some limitations. A number of predictive models have been identified in a recent review article, but most of these models were for depression and psychosis27. In addition, a small number of suicide prediction studies were conducted based on mental pathology, such as depression and behavioural disorders2,20. In contrast, this study followed a structured, stepwise development process, and the resulting model, being derived from a general population sample, may better capture the complex mechanism of adolescent suicide attempts.

In this study, the combination of the top three suicide attempt-related variables (i.e. feeling of sadness, experience of violence and perceived stress) generated an AUC of 0.82. Addition of the other variables selected from the literature and by statistical validation improved the AUC to 0.85. These results indicated that information that can be readily obtained from questionnaires, but is usually ignored, could help predict the risk for suicidal behaviours in adolescents.

Depressive mood is one of the most important risk factors of adolescent suicide. A study on 1,176 non-suicidal subjects and 109 subjects with previous suicide attempts or suicidal ideation showed that depressive disorder was significantly associated with the risk of suicide (odds ratio, 11.4)28. The present study investigated the risk of suicide attempt by replacing the diagnosis of depression with information on feelings of sadness, which can be more readily collected in everyday life, and also found a substantially elevated risk (odds ratio, 6.294). Further, in the final model of this study, the single variable of sadness feelings generated an AUC of 0.773, indicating that, by itself, this variable was useful in determining the risk of suicidal behaviour.

Previous studies have demonstrated that adolescents who had been victims of violence during the past year were more likely than those who had not to experience negative mental health outcomes, including suicidal ideation (odds ratio, 5.41)29. However, there were substantial differences among the studies in terms of response rate, design, confounders that were controlled for and the use of self-reports of violence victimisation30. In the suicide index model in this study, the association of the variable experience of violence with suicide attempt (odds ratio, 3.52 and AUC, 0.63) further confirmed that this variable is an important risk factor for adolescent suicide. Consistent with a previous study31, the current study also demonstrated that subjects who had a high stress level had a higher probability of suicide attempt than those who had a low stress level (odds ratio, 2.392).
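In a logistic model, an odds ratio is the exponential of the corresponding coefficient. The short Python sketch below applies this to the three largest coefficients listed in the example case at the end of the paper, on the assumption that the odds ratios quoted above correspond to those coefficients; small differences would be expected because the tabulated coefficients are rounded to three decimals.

```python
import math

# Coefficients copied from the example case table (rounded to three decimals).
COEFFICIENTS = {
    "Feeling of sadness": 1.840,
    "Experience of violence": 1.261,
    "Perceived stress": 0.872,
}

# Odds ratio for a one-unit increase in each predictor: OR = exp(w).
odds_ratios = {name: math.exp(w) for name, w in COEFFICIENTS.items()}

for name, value in odds_ratios.items():
    print(f"{name}: OR = {value:.3f}")
```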

The main task of a risk prediction index model is to identify an optimal set of risk factors that best predicts the outcome. Many of the variables that predicted adolescent suicide attempt in this study, such as age, sex and substance use, were also reported by previous studies, meta-analyses and review articles. However, some factors that were identified in the literature as risk factors for suicide attempt, including chronic allergic disease, residential area and parental education level, were not included in the suicide index, because these variables were not associated with the risk for suicide attempt after adjusting for the other covariates.

Considering that a past suicide attempt is one of the most important risk factors for future suicides, our model could be used to screen for high-risk groups. It has been suggested that some suicide attempts may be preventable, if the problem of under-treatment can be overcome by the sensitive insights of health professionals and the public32. In this respect, building an individualised risk calculator may be an important practical tool for clinicians and public health professionals to assess the suicide risk in adolescents who had a suicide attempt in the past year and to plan further evaluations and necessary early interventions. Changes in the risk score can be monitored over time to provide a risk trajectory for suicidal adolescents and to evaluate the effectiveness of an intervention program. In future studies, we recommend the development of an adjunctive risk calculator based on this algorithm in the form of web-based tools, smartphone apps, nomograms or score charts33.

The key strength of the model generated in this study, which was based on a large national representative sample, was that it can predict the possibility of a suicide attempt using basic characteristics and demographic factors. The survey had a very high response rate and almost no missing data; these may have been due to the systematic support of national organizations and the technical advantages of online survey methods34. National surveillance systems for monitoring adolescent health risk behaviours have been implemented in many countries, and the reliability of the KYRBWS questionnaire has been validated over time in a number of previous studies35. Moreover, we validated the risk stratification model on a separate dataset, on which it showed high predictive performance. This study included participants from a general adolescent population, rather than clinical patients; therefore, the results of this study could be broadly applicable to the general adolescent population. The results of this study provided a practical tool that requires only a simple set of input variables, which can be easily obtained in daily life by the people around an adolescent.

The present findings should be considered in light of some limitations. First, we used a cross-sectional design and the risk for future suicide was estimated indirectly from a past suicide attempt. Future work should investigate the risk for suicide attempt using longitudinal data and build a risk calculator that can directly predict future suicide attempts in adolescents. Second, our risk model was purposefully derived from a general population of adolescents, and the study lacked information on other potential covariates, such as clinical diagnosis, psychiatric symptoms and family history of psychiatric illness or suicide, which were not available in the KYRBWS data. Third, although we had an adequate number of participants to build a risk model to predict suicide attempts, this study did not include data on adolescents outside of school, who account for approximately 1.8% of those aged 12–17 years in Korea. Therefore, there may be limitations in applying the present risk model to all adolescents in society. Fourth, it is possible that respondents answered questions about their mental state and other characteristics after a suicide attempt. Therefore, the indicated level of psychological distress might partly result from the suicide attempt, rather than be a contributing factor. Further research is required to rule out the possibility that this inflated the statistical association between such covariates and the dependent variable, and thereby to clarify causality. Such research could involve prospective models, which are beyond the data and design of the current study. In a similar context, we hope the present study may be a cornerstone for further extended research exploring several types of potentially relevant covariates, such as clinical records, as performed in a previous study20.

To the best of our knowledge, this was the first study to use a systematic, stepwise approach based on a large sample to create a prediction model for adolescent suicide attempt. We built our risk stratification model using the results from a recent meta-analysis15 and found that the combination of basic characteristics in daily life provided clinically relevant discrimination between those who had attempted suicide in the past year and those who had not. Replication of these findings and longitudinal research are warranted before the risk calculator can be used confidently by clinicians, teachers, friends and family members. The clinical application of the model presented in this study includes the development of websites or applications that apply the model weights in a risk calculator. Furthermore, further studies are needed to provide appropriate guidelines for groups screened as being at risk of suicide. Nevertheless, this risk calculator can be a practical tool for assessing the risk for suicidal behaviour and for early interventions in adolescents with high suicidal risk.

Case Study

Suicidal behaviour classification and an example from the GLM model

The following formula represents the GLM model:

$$Risk=\sum _{i=1}^{m}\,{w}_{i}{x}_{i}+{w}_{0}$$
$$\mu =f\,(Risk)=\frac{1}{1+\exp \,(\,-\,Risk)}$$

where μ indicates the mean of the distribution of the suicide index; Risk represents the linear predictor, which is a weighted sum of the covariates (xi) plus the bias (w0); and f represents an activation function, the inverse of which is the link function showing the relationship between the linear predictor and the suicide index. In the example case below, the risk of a suicide attempt based on the GLM model is calculated as follows:

Example case

| Variables | Parameters w_i | Values x_i (Score) |
| --- | --- | --- |
| Age | −0.275 | 15 |
| Sex | 0.399 | Female (2) |
| Breakfast consumption | −0.112 | Having breakfast (1) |
| Experience of violence | 1.261 | Having experience (1) |
| Sleep duration | −0.086 | 6 hours |
| Perceived stress | 0.872 | Perceiving stress (1) |
| Feeling of sadness | 1.840 | Feeling of sadness (1) |
| Current cigarette smoking | 0.701 | Having experience (1) |
| Current alcohol drinking | 0.409 | Having experience (1) |
| Perceived health status | 0.326 | Fair/Poor (3) |
| Perceived academic record | 0.088 | Low (3) |
| Household economic status | 0.038 | Low (3) |
| Living with biological or adoptive parent | 0.180 | Living with others (3) |
| Bias | −6.648 | |

$$\begin{array}{rcl}Risk & = & \sum _{i=1}^{m}\,{w}_{i}{x}_{i}+{w}_{0}\\ & = & (-0.275)\times 15+0.399\times 2+(-0.112)\times 1+1.261\times 1\\ & & +\,(-0.086)\times 6+0.872\times 1+1.840\times 1+0.701\times 1\\ & & +\,0.409\times 1+0.326\times 3+0.088\times 3+0.038\times 3\\ & & +\,0.180\times 3-6.648\\ & = & -3.624\end{array}$$
$$\mu =f\,(Risk)=\frac{1}{1+\exp \,(-(-3.624))}=\frac{1}{1+\exp \,(3.624)}=0.025983\approx 2.6\% $$

The predicted probability of a suicide attempt for this person was 0.026 (2.6%), which is below the cut-off of 0.12 and therefore does not place this person in the high-risk group (μ > 0.12).
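The calculation for the example case can be reproduced with a few lines of Python. This is a minimal sketch that uses the parameters and scores listed in the example case table and the 0.12 cut-off reported in the Results; the names `EXAMPLE_CASE`, `BIAS`, `CUTOFF` and `suicide_index` are hypothetical and introduced only for illustration.

```python
import math

# (parameter w_i, score x_i) pairs copied from the example case table.
EXAMPLE_CASE = {
    "Age": (-0.275, 15),
    "Sex": (0.399, 2),
    "Breakfast consumption": (-0.112, 1),
    "Experience of violence": (1.261, 1),
    "Sleep duration": (-0.086, 6),
    "Perceived stress": (0.872, 1),
    "Feeling of sadness": (1.840, 1),
    "Current cigarette smoking": (0.701, 1),
    "Current alcohol drinking": (0.409, 1),
    "Perceived health status": (0.326, 3),
    "Perceived academic record": (0.088, 3),
    "Household economic status": (0.038, 3),
    "Living with biological or adoptive parent": (0.180, 3),
}
BIAS = -6.648   # w_0
CUTOFF = 0.12   # cut-off for the high-risk group reported in the Results

def suicide_index(case, bias):
    """Return (Risk, mu): the linear predictor and its logistic transform."""
    risk = sum(w * x for w, x in case.values()) + bias
    mu = 1.0 / (1.0 + math.exp(-risk))
    return risk, mu

risk, mu = suicide_index(EXAMPLE_CASE, BIAS)
print(f"Risk = {risk:.3f}, suicide index = {mu:.4f}, high risk: {mu > CUTOFF}")
```

Keeping the weights in a single table-like structure makes it straightforward to recompute the index when a single answer changes, for example to follow a risk trajectory over time.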