Development of a suicide index model in general adolescents using the South Korea 2012–2016 national representative survey data

Suicide is a leading cause of death among adolescents and a major public health concern. Here we developed a risk stratification model for adolescent suicide attempts using sociodemographic characteristics, risk behaviours and psychological variables. Participants were 247,222 subjects in the Korea Youth Risk Behavior Web-based Survey (KYRBS). We developed a suicide index based on the suicide risk estimated in the generalized linear model and proposed the risk stratification model using the R language to measure the probability of suicide attempt among adolescents. Among the study population, the annual rate of suicide attempt was approximately 4%. The model provided good prediction for suicide attempt (AUC = 0.85). The important univariate risk factors for the outcome were dimensional measures of age, sex, breakfast consumption, experience of violence, sleep duration, perceived stress, feeling of sadness, current cigarette smoking, current alcohol drinking, perceived general health, perceived academic record, household economic status and living with biological or adoptive parents. Our suicide index model allowed the identification of adolescents who are at a high risk for suicide. This tool may promote the prevention of adolescent suicide and can be particularly useful in everyday settings where it is difficult to contact mental health professionals immediately.

Scientific Reports | (2019) 9:1846 | https://doi.org/10.1038/s41598-019-38886-z
The overall combined effect of these risk factors needs to be considered in order for them to be applicable to the actual clinical setting or public health site.
A recent large-sample study on adults suggested a prediction model for suicide death using Cox regression, support vector machine and deep learning 20 . However, there remains a great lack of research on the process of selecting risk factors, the respective impact of each factor and the overall impact of these factors by a complex mechanism. Furthermore, there is no predictive model for suicide attempt in adolescents and most of the previous studies on adolescent suicide attempt focused largely on psychiatric patients or biased samples. The present study proposed a risk stratification model for adolescent suicide attempts using community-based national representative samples to collect data, including sociodemographic variables, risk behaviours and psychological variables that are readily available in everyday life.
Methods
Study population and source of data. The present study used data from the 2012–2016 Korea Youth Risk Behavior Web-based Survey (KYRBS), which was established in 2005 by the Centers for Disease Control and Prevention in South Korea and is an ongoing annual nationwide cross-sectional survey that uses a stratified multi-stage cluster strategy among middle- and high-school students 21 . The KYRBS, which assesses the prevalence of health risk behaviours among adolescents, contains more than 100 questions that are divided into multiple sections on sociodemographic characteristics, health-related behaviours and mental and physical health 22 . After the survey had been fully explained, the participants provided written informed consent and were given identification numbers. All participants were guaranteed anonymity and then completed the self-administered web-based questionnaire in a school computer room. All data used in this study were completely anonymized before access and were analyzed anonymously.
Procedures and statistical analysis. Considering that the most important risk factor correlated with adolescent suicide is a previous suicide attempt 14 , we set suicide attempt as the outcome variable of the suicide index model. Suicide attempt was defined as a positive response to the question 'In the past year, have you ever attempted suicide?'. The subjects were asked to respond with either (1) no, I never attempted suicide or (2) yes, I have attempted suicide.
For the suicide index model, we used a logistic regression model whose independent variables were selected in two steps. As a first step, a psychiatrist outlined the candidate variables that could influence suicide attempt. Through a literature-based search, we screened the following covariates that had previously been shown to be related to the risk of suicide attempt: age, sex, breakfast consumption, experience of violence, sleep duration, perceived stress, feelings of sadness, current cigarette smoking, current alcohol drinking, chronic allergic disease, perceived health status, perceived academic record, residential area, household economic status, paternal/maternal education level and living with biological or adoptive parents. Detailed information about the references used in the literature-based search is provided in Supplementary Table 1. As a second step, a computer scientist determined the input variables for the final suicide index model through a statistical method. In this step, we performed binary logistic regression between each candidate variable and suicide attempt in order to select the covariates that were significantly related to suicide attempt. The variables accepted in this analysis were then re-evaluated by backward stepwise logistic regression.
Statistical analysis was performed using the Statistical Package for the Social Sciences (SPSS ver. 20.0; IBM Corp., Armonk, NY). The data distribution was assessed to be normal, which allowed the use of Student's t-test, a parametric test, for continuous variables such as age and sleep duration. For categorical variables, the Chi-square test was used to compare the frequencies between groups. P < 0.05 was considered statistically significant.
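The group comparison for a categorical variable can be illustrated with a minimal Python sketch of a Pearson Chi-square test on a 2x2 table. The study itself used SPSS; the function name `chi_square_2x2`, the absence of a continuity correction and the counts below are assumptions made purely for illustration and are not KYRBS data.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom the upper-tail probability equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical counts (not study data): rows are the two groups,
# columns are (exposed, not exposed) for one binary covariate.
stat, p_value = chi_square_2x2(30, 70, 10, 90)
significant = p_value < 0.05  # threshold stated in the text
```

For tables larger than 2x2 the degrees of freedom change, so the closed-form tail probability used here would no longer apply.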
We designated the 2012–2015 KYRBS and 2016 KYRBS data as the training and validation datasets, respectively. We fitted a generalised linear model (GLM) for classification and to estimate the probability of suicide attempt among adolescents. The final GLM was constructed from the training dataset and validated on the validation dataset.
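A binomial GLM of this kind is commonly fitted by iteratively reweighted least squares (IRLS). The following is a minimal pure-Python sketch of that technique, assuming a logit link; the function names (`sigmoid`, `solve`, `fit_logistic_irls`) and the toy data are illustrative assumptions, and this is not the authors' R code.

```python
import math

def sigmoid(z):
    """Inverse of the logit link: maps a linear predictor to a probability."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_logistic_irls(rows, labels, n_iter=25, tol=1e-8):
    """Fit a logistic GLM by IRLS; an intercept column is prepended to each row."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        # Accumulate X^T W X and X^T W z, where z is the working response.
        xtwx = [[0.0] * p for _ in range(p)]
        xtwz = [0.0] * p
        for x, y in zip(X, labels):
            eta = sum(b * v for b, v in zip(beta, x))
            mu = sigmoid(eta)
            w = max(mu * (1.0 - mu), 1e-10)
            z = eta + (y - mu) / w
            for i in range(p):
                xtwz[i] += x[i] * w * z
                for j in range(p):
                    xtwx[i][j] += x[i] * w * x[j]
        new_beta = solve(xtwx, xtwz)
        converged = max(abs(a - b) for a, b in zip(new_beta, beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta

# Toy data (not KYRBS): one binary covariate; 2 of 10 unexposed and
# 5 of 10 exposed subjects have the outcome.
toy_rows = [[0]] * 10 + [[1]] * 10
toy_labels = [0] * 8 + [1] * 2 + [0] * 5 + [1] * 5
beta = fit_logistic_irls(toy_rows, toy_labels)
toy_odds_ratio = math.exp(beta[1])  # exponentiated slope is the odds ratio
```

In practice the model would be fitted on the training years and then applied unchanged to the held-out year, which is the split described in the text.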
We measured the area under the receiver operating characteristic (ROC) curve and the F-measure on the test dataset to assess the performance of the final model, which were graphically illustrated as the ROC curve and the F-score plot, respectively. R (ver. 3.4.1) was used to construct the GLM, and iteratively reweighted least squares was used to optimise the parameters.
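The two performance measures can be sketched in pure Python as follows. The AUC is computed here as the proportion of positive-negative pairs in which the positive case receives the higher score, and the F-measure is taken to be the F1 score (an assumption, since the text does not state the weighting). The function names and the toy scores are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def f_measure(scores, labels, cutoff):
    """F1 score when scores >= cutoff are classified as positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def best_cutoff(scores, labels):
    """Return the observed score that maximises F1 when used as the cut-off."""
    return max(sorted(set(scores)), key=lambda c: f_measure(scores, labels, c))

# Toy predicted probabilities and outcomes (not study data).
toy_scores = [0.02, 0.05, 0.08, 0.10, 0.15, 0.30, 0.40, 0.60]
toy_labels = [0, 0, 0, 1, 0, 1, 0, 1]
auc = roc_auc(toy_scores, toy_labels)
cutoff = best_cutoff(toy_scores, toy_labels)
```

The pairwise AUC loop is quadratic in the sample size, so a rank-based formulation would be preferable for a dataset as large as the one described here.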

Results
The general characteristics of the training and testing datasets are presented in Tables 1 and 2, respectively. The mean age was younger in the suicide group than in the non-suicide group in both the training dataset (14.80 years vs. 15.15 years; P < 0.001) and testing dataset (14.95 years vs. 15.19 years; P < 0.001). In both the training and testing datasets, adolescent women were more likely to attempt suicide than were men; less consumption of breakfast, more experience of violence, less sleep duration, more perceived stress, more cigarette smoking and more alcohol drinking were significantly observed in the suicide group, compared with the non-suicide group. In both the training and testing datasets, compared with the non-suicide group, the suicide group had more participants with ≥2 chronic allergic diseases, who perceived their health status as poor and their academic record as low, who lived in a rural area, who had low household economic status and who lived with one parent or with others who were not their parents. Table 3 shows the results of univariate and multivariate logistic regression analyses of the covariates for suicide attempt. On univariate logistic regression, all the selected variables that had been shown in the literature to be related to suicide had significant results and were included in the backward stepwise logistic regression (Table 3). In model 1, the number of chronic allergic diseases, residential area and parental education level had non-significant associations with suicide attempt; the other variables were entered into the next stepwise logistic regression for model 2. Finally, 13 variables were determined as input features for the suicide risk stratification model (Table 3).
The coefficients of the generalised linear model that included the 13 input features are presented in Table 4. The top three coefficients that were positively related with an increased suicide attempt rate were feeling of sadness, experience of violence and perceived stress. In addition, the bottom three coefficients that were negatively related with suicide attempt were age, breakfast consumption and sleep duration.
As suggested by previous studies 23, 24 , the suicide risk concentration was analyzed by the tier of predicted probability and the OR between suicide attempt and the tier of the calculated probabilities (Table 5). In group 1, the expected and observed suicide attempt ratios were 0.5% and 7.0%, respectively. Considering the bottom 10% of the predicted probability as the reference group, the top 0.5% of the predicted probability had an OR of 400. Further, the percentage of suicide attempts gradually increased towards the top group, except for groups 2 and 3. This result indicated the feasibility of risk-stratified preventive interventions using the tiers of predicted probability of suicide attempt calculated by our model. From these parameters, we obtained the suicide index (Appendix), which was a score between 0 and 1 that indicates the probability of a suicide attempt. Assessment of the performance of this model on the testing dataset yielded an area under the receiver operating characteristic curve (AUC) of 0.85 (Supplementary Fig. 1). Furthermore, the cut-off for the suicide index was determined as 0.12 or 12% based on the maximum value of the F-measure (Fig. 1).
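The tier-based comparison can be sketched in Python as follows: subjects are ranked by predicted probability, a top tier and a bottom reference tier are cut out, and the odds ratio of the outcome between the two is computed. The helper names, the tier boundaries (top 25%, bottom 50%) and the toy data are assumptions chosen for illustration and do not reproduce Table 5.

```python
def rank_by_risk(probs, labels):
    """Pair each predicted probability with its outcome and sort from highest to lowest risk."""
    return sorted(zip(probs, labels), key=lambda t: t[0], reverse=True)

def events_and_total(ranked, start_frac, stop_frac):
    """Count (events, subjects) in the slice of the ranking between two fractions."""
    n = len(ranked)
    chunk = ranked[int(n * start_frac):int(n * stop_frac)]
    return sum(y for _, y in chunk), len(chunk)

def odds_ratio(events, total, ref_events, ref_total):
    """Odds of the outcome in a tier divided by the odds in the reference tier."""
    odds = events / (total - events)
    ref_odds = ref_events / (ref_total - ref_events)
    return odds / ref_odds

# Toy data (not study data): 20 subjects already listed from highest to lowest risk.
toy_probs = [i / 20 for i in range(20, 0, -1)]
toy_labels = [1, 1, 0, 1, 0,  0, 1, 0, 0, 0,  0, 0, 1, 0, 0, 0, 0, 0, 0, 0]

ranked = rank_by_risk(toy_probs, toy_labels)
top = events_and_total(ranked, 0.0, 0.25)       # highest-risk quarter
reference = events_and_total(ranked, 0.5, 1.0)  # lowest-risk half as reference
tier_or = odds_ratio(top[0], top[1], reference[0], reference[1])
```

The odds ratio is undefined when a tier contains no events or only events, so very small tiers would need a correction or a wider reference group.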

Discussion
Using the annual nationally representative data from the 2012 to 2016 KYRBS, we proposed the use of a suicide index as a predictive model for adolescent suicide attempt. We calculated the combined effects of the risk factors, rather than simply measuring the separate risk for suicide attempt associated with each factor. This model provided relevant discrimination between those who had attempted suicide over the past year and those who had not.
The AUC of 0.85 obtained for the suicide index in this study compares favourably with those of other prediction models previously used in the field of psychiatry [e.g. suicide deaths of adults (AUC 0.68), new-onset psychosis (AUC 0.79) and bipolar disorder (AUC 0.76)] 20,25,26 . The 13 simple predictors are easy to obtain in a primary care or school health setting. Therefore, this simple risk calculator can be easily used every day and can provide a tool for not only clinicians, but also teachers, family members and friends to rapidly identify those at high risk for suicide.
Although risk calculators have been widely used in other medical fields, the development of prediction models in psychiatry has had some limitations. A number of predictive models were identified in a recent review article, but most of these models were for depression and psychosis 27 . In addition, a small number of suicide prediction studies were conducted based on mental pathology, such as depression and behavioural disorders 2,20 . In contrast, this study emphasised a stepwise, logical development process, and the generalisability of the model may help to explain the complex mechanism of adolescent suicide.
In this study, the combination of the top three suicide attempt-related variables (i.e. feeling of sadness, experience of violence and perceived stress) generated an AUC of 0.82. Addition of the other variables selected from the literature and by statistical validation improved the AUC to 0.85. These results indicated that public information, which can be readily obtained from questionnaires but is usually ignored, could help predict the risk for suicidal behaviours in adolescents.
Depressive mood is one of the most important risk factors for adolescent suicide. A study on 1,176 non-suicidal subjects and 109 subjects with previous suicide attempts or suicidal ideation showed that depressive disorder was significantly associated with the risk of suicide (odds ratio, 11.4) 28 . The present study investigated the risk of suicide attempt by replacing the diagnosis of depression with information on feelings of sadness, which can be more readily collected in everyday life, and generated a similar risk (odds ratio, 6.294). Further, in the final model of this study, the single variable of feelings of sadness generated an AUC of 0.773, indicating that, by itself, this variable was useful enough to determine the risk of suicidal behaviour.
Previous studies have demonstrated that, compared with those who had not, adolescents who had been victims of violence during the past year were more likely to experience negative mental health outcomes, including suicidal ideation (odds ratio, 5.41) 29 . However, there were substantial differences among the studies in terms of response rate, design, confounders that were controlled for and the use of self-reports of being a victim of violence 30 . In the suicide index model in this study, the association of being a victim of violence with suicide attempt (odds ratio, 3.52 and AUC, 0.63) further confirmed that this variable is an important risk factor for adolescent suicide. Consistent with a previous study 31 , the current study also demonstrated that subjects who had a high stress level had a higher probability of suicide attempt, compared with those who had a low stress level (odds ratio, 2.392).
The main task of a risk prediction index model is to identify an optimal set of risk factors that best predicts the outcome. Many of the variables that predicted adolescent suicide attempt in this study, such as age, sex and substance use, were also reported by previous studies, meta-analyses and review articles. However, some factors that were reported in the literature to be risk factors for suicide attempt, including chronic allergic disease, residential area and parental education level, were not included in the suicide index, because these variables were not associated with the risk for suicide attempt after adjusting for the covariates.
Considering that a past suicide attempt is one of the most important risk factors for future suicides, our model could be used to screen for high-risk groups. It has been suggested that some suicide attempts may be preventable, if the problem of under-treatment can be overcome by the sensitive insights of health professionals and the public 32 . In this respect, building an individualised risk calculator may be an important practical tool for clinicians and public health professionals to assess the suicide risk in adolescents who had a suicide attempt in the past year and to plan further evaluations and necessary early interventions. Changes in the risk score can be monitored over time to provide a risk trajectory for suicidal adolescents and to evaluate the effectiveness of an intervention program. In future studies, we recommend development of an adjunctive risk calculator based on this algorithm in the form of web-based tools, smartphone apps, nomograms or score charts 33 .
The key strength of the model generated in this study, which was based on a large nationally representative sample, is that it can predict the possibility of a suicide attempt using basic characteristics and demographic factors. The survey had a very high response rate and almost no missing data; these may have been due to the systematic support of national organizations and the technical advantages of online survey methods 34 . National surveillance systems for monitoring adolescent health risk behaviours have been implemented in many countries, and the reliability of the KYRBS questionnaire has been validated over time in a number of previous studies 35 . Moreover, the high performance of the risk stratification model and its validation on a separate survey year support its predictive power. This study included participants from a general adolescent population, rather than from clinical patients; therefore, the results of this study could be broadly applicable to the general adolescent population. The results of this study provided a practical tool that requires only simple input variables, which can be easily obtained in daily life by the people around an adolescent.
The present findings should be considered in light of some limitations. First, we used a cross-sectional design and the risk for future suicide was estimated indirectly from a past suicide attempt. Future work should investigate the risk for suicide attempt using longitudinal data and build a risk calculator that can directly predict future suicide attempts in adolescents. Second, our risk model was purposefully derived from a general population of adolescents, and the study lacked information on other potential covariates, such as clinical diagnosis, psychiatric symptoms and psychiatric/suicide family history, which were not available in the KYRBS data.
Third, although we had an adequate number of participants to build a risk model to predict suicide attempts, this study did not include data on adolescents who are out of school, who account for approximately 1.8% of those aged 12–17 years in Korea. Therefore, there may be limitations in applying the present risk model to all adolescents in society. Fourth, it is possible that respondents answered questions about their mental state and other characteristics after a suicide attempt. Therefore, the indicated level of psychological distress might partly result from the suicide attempt, rather than be a contributing factor. Further research is required to rule out the possibility that this inflated the statistical association between such covariates and the dependent variable, and to clarify the direction of causality. Such research could involve prospective models, which are beyond the data and design of the current study. In a similar context, we hope the present study may be a cornerstone for further extended research exploring several types of potentially relevant covariates, such as clinical records, as performed in a previous study 20 .
To the best of our knowledge, this was the first study that used a stepwise logical process based on a large sample to create a prediction model for adolescent suicide attempt. We built our risk stratification model using the results from a recent meta-analysis 15 and found that the combination of basic characteristics in daily life provided clinically relevant discrimination between those who had attempted suicide in the past year and those who had not. Replication of these findings and longitudinal research are warranted, in order for the risk calculator to be used confidently by clinicians, teachers, friends and family members. The clinical application of the model presented in this study includes the development of websites or applications that apply the model weights in this risk calculator. Furthermore, further studies are needed to provide appropriate guidelines for the screened suicide risk groups. Nevertheless, this risk calculator can be a practical tool for assessing the risk for suicidal behaviour and for early interventions in adolescents with high suicidal risk.
where μ indicates the mean of the distribution of the suicide index; Risk represents the linear predictor, which is a weighted sum of the covariates (x_i); and f represents an activation function, the inverse of which is the link function showing the relationship between the linear predictor and the suicide index. In an example case (Table 5), the suggested probability of the person's suicide was reported as 2.6, which belongs to the high-risk group (μ > 0.12).
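The definitions of μ, Risk, x_i and f given above are consistent with the standard form of a binomial GLM. A hedged reconstruction is shown here, assuming a logit link, 13 covariates, an intercept β_0 and weights β_i; the symbols β_0 and β_i and the explicit logistic form of f are assumptions, since the text names only μ, Risk, x_i and f.

```latex
\mu = f(\mathrm{Risk}), \qquad
\mathrm{Risk} = \beta_0 + \sum_{i=1}^{13} \beta_i x_i, \qquad
f(z) = \frac{1}{1 + e^{-z}}
```

Under this assumed logistic form, the high-risk criterion μ > 0.12 is equivalent to Risk > ln(0.12/0.88) ≈ −1.99.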