A machine learning approach for predicting suicidal thoughts and behaviours among college students

Abstract


Suicidal thoughts and behaviours are prevalent among college students. Yet little is known about screening tools to identify students at higher risk. We aimed to develop a risk algorithm to identify the main predictors of suicidal thoughts and behaviours among college students within one year of baseline assessment. We used data collected in 2013–2019 from the French i-Share cohort, a longitudinal population-based study including 5066 volunteer students. To predict suicidal thoughts and behaviours at follow-up, we used random forests models with 70 potential predictors measured at baseline, including sociodemographic and familial characteristics, mental health and substance use. Model performance was measured using the area under the receiver operating characteristic curve (AUC), sensitivity, and positive predictive value. At follow-up, 17.4% of girls and 16.8% of boys reported suicidal thoughts and behaviours. The models achieved good predictive performance: AUC, 0.8; sensitivity, 79% for girls, 81% for boys; and positive predictive value, 40% for girls and 36% for boys. Among the 70 potential predictors, four showed the highest predictive power: 12-month suicidal thoughts, trait anxiety, depression symptoms, and self-esteem. We identified a parsimonious set of mental health indicators that accurately predicted one-year suicidal thoughts and behaviours in a community sample of college students.

Introduction


College students are vulnerable to mental health problems and suicidal thoughts and behaviours (STB)1,2. In a large study in eight countries, the 12-month prevalence rates were 17.2% for suicidal ideation, 8.8% for suicidal planning, and 1.0% for suicide attempt3. Factors that may contribute to the increased risk of STB in this population include the transition from high school to university, increasing workload, increased psychosocial stress and academic pressures, and adaptation to a new environment4. Avoiding the onset or aggravation of STB requires early detection of students at risk, to help them access mental health services or engage them in coaching strategies5,6. However, identifying students with STB is challenging due to limited resources on campus7, and because college students may be reluctant to share information about their mental health. Effective screening would require (1) the identification of characteristics that predict STB; and (2) minimally intrusive questions integrated into short assessments that are easy to administer to large populations. Most previous studies of STB prediction in students have been based on logistic regression models that account for a limited number of predictors, and have provided association measures8,9. However, the identification of factors associated with STB does not necessarily imply that they could help predict future STB10,11; for that, models specifically designed for prediction are needed. Moreover, some variables used in previous studies (e.g., psychiatric assessment)12 are impractical to assess in a large population of students as they require the expertise of a trained clinician. As pointed out in a recent paper summarizing 50 years of research on STB, further research should shift from identification of risk factors associated with STB to focus on developing predictive algorithms using machine learning methods13. Such methods enable the inclusion of several risk and protective factors, while accounting for their potential interactions14,15, which is consistent with the shared concept that STB result from complex interactions between social, psychiatric, psychological, and environmental factors16.

In this study we applied a machine learning method to develop an algorithm to predict STB in the next 12 months after baseline assessment using a large longitudinal cohort of French university students. All analyses were stratified by gender as recommended16,17,18.

Methods


Study design and participants

Our study sample comprised participants in the ongoing internet-based Students’ Health Research Enterprise (i-Share) project, a prospective population-based study on students’ health which was launched in some French universities in 2013. Students were informed about the purpose and aims of the study through flyers, communications in classes or social media. To be eligible, students must be registered at a university or higher education institute, be at least 18 years of age, and be able to read and understand French. Volunteers provided online informed consent. The enrolment procedure has been previously described19.

The i-Share protocol was approved by the “Commission nationale de l’informatique et des libertés” (CNIL, National Commission of Informatics and Liberties) (number: DR-2013-019), which ensures that data collection does not violate freedom, rights, or human privacy. The study follows the principles of the Declaration of Helsinki and the collection, storage and analysis of the data comply with the General Data Protection Regulation (EU GDPR). At enrolment (i.e., baseline assessment), self-administered online questionnaires collected sociodemographic characteristics, physical and mental health parameters, personal and familial history, living conditions, lifestyle habits, and substance use. One year later, students were invited by email to complete a follow-up questionnaire. Three reminder emails were sent at 14, 28, and 33 days following the invitation. For the present longitudinal study, we used data from a sample of students who were included in the i-Share cohort study between February 2013 and September 2019, who participated in the follow-up, and for whom data on STB were available.

Baseline information was available for 15,667 students. These students were invited to participate in the follow-up, and 5255 agreed to participate (33.5% response rate). At baseline, compared to the students who participated in the follow-up, the non-participants reported slightly more 12-month suicidal ideation (n = 2285, 22.0% vs. n = 1151, 21.9%; p = 0.0004) and more lifetime suicide attempts (n = 682, 6.6% vs. n = 298, 5.7%; p = 0.0004). Additionally, the non-participants were more likely to be boys (n = 2963, 25.9% vs. n = 1099, 20.9%; p < 0.0001). We did not observe differences between participants and non-participants for the year of study or parental depression history (Supplementary Table 1). Among the respondents, 189 (3.6%) were excluded because they did not answer the STB-related questions.


The one-year follow-up questionnaire included questions about suicidal thoughts and suicide attempts during the last 12 months. Participants who reported having occasional or frequent suicidal thoughts and/or suicide attempts were coded as positive for STB.

We considered baseline assessments of 70 potential predictors (Supplementary Table 2). These variables included socio-demographic characteristics (e.g., age, year of study, scholarship, and accommodation type), lifestyle habits (e.g., time spent on screens and sleep quality), familial characteristics (e.g., perceived parental support, parental divorce, and parental history of depression), physical health (e.g., handicap and perceived health), and substance use (e.g., tobacco and alcohol use). Baseline characteristics also included history of diagnosed psychiatric disorders, lifetime suicide attempts and suicidal thoughts during the 12 months preceding inclusion (hereafter called baseline STB). We measured several mental health parameters with validated scales: depression symptoms using the 9-item Patient Health Questionnaire (PHQ-9)20; trait anxiety using the Spielberger State-Trait Anxiety Inventory (STAI-YB)21; self-esteem using the Rosenberg scale22; perceived stress using the Perceived Stress Scale (PSS-4)23; and impulsivity using the Barratt Impulsivity Scale (BIS-11)24.

Childhood adversities are not investigated in the baseline questionnaire. To account for these important potential predictors in our models, a subsample of 1911 participants was administered a supplementary questionnaire adapted from the Childhood Trauma Questionnaire (CTQ)25. This questionnaire included 17 variables assessing experiences of sexual abuse, physical or psychological maltreatment, or neglect (Supplementary Table 2).

Statistical analyses

We first described the study sample overall and by gender. Continuous variables are expressed as mean ± standard error. Categorical variables are described as proportions.

Prediction of one-year STB

To predict STB we used a random forests model, which is a non-parametric ensemble machine learning method applicable to both classification and regression prediction26. This technique is broadly used due to its high performance and robustness, and because it enables the use of variables independently of type and distribution27. Random forests are based on the aggregation of a set of decision trees created from repeated bootstrap samples of the initial sample28. In each bootstrap sample, a decision tree is created using about two-thirds of the observations. The remaining one-third, termed the out-of-bag sample, is used to obtain an unbiased performance measure of the created algorithm. This evaluation of prediction performance yields a measure termed the out-of-bag error, which represents the overall error of the algorithm in terms of outcome prediction. The out-of-bag sample is also used to calculate the relative importance of each variable for the prediction. To this end, the values of a given variable are randomly permuted in the out-of-bag sample, and any resulting change in the out-of-bag error reflects the variable’s importance in the prediction. Finally, all individual decision trees are aggregated to create the final predictor algorithm. To carry out these analyses, we used SAS and R, including the randomForest and caret R packages. Missing data on the predictors (2%) were handled using the R missForest algorithm29, which is specifically designed to deal with missing data in random forest models.
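The bootstrap, out-of-bag and permutation steps described above can be illustrated with a minimal sketch. It is not the authors' randomForest/caret analysis: it uses only the Python standard library, replaces full decision trees with one-split "stumps", and runs on synthetic data in which feature 0 is informative and feature 1 is noise. All function names (`fit_stump`, `forest_oob`, `make_data`) are hypothetical and introduced only for illustration.

```python
import random


def fit_stump(rows, labels):
    """Return the (feature, threshold) split with the fewest training errors."""
    best = None
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows}):
            errors = sum((r[j] > t) != bool(y) for r, y in zip(rows, labels))
            if best is None or errors < best[0]:
                best = (errors, j, t)
    return best[1], best[2]


def predict(stump, row):
    feature, threshold = stump
    return 1 if row[feature] > threshold else 0


def error_rate(stump, rows, labels):
    return sum(predict(stump, r) != y for r, y in zip(rows, labels)) / len(rows)


def forest_oob(X, y, n_trees=40, seed=0):
    """Grow stumps on bootstrap samples; return (OOB error, importances, OOB fraction)."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    votes = [[0, 0] for _ in range(n)]   # out-of-bag votes for class 0 / class 1
    importance = [0.0] * p
    oob_fraction = 0.0
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]      # n draws with replacement
        in_bag = set(boot)
        oob = [i for i in range(n) if i not in in_bag]   # rows this tree never saw
        oob_fraction += len(oob) / n / n_trees
        stump = fit_stump([X[i] for i in boot], [y[i] for i in boot])
        oob_rows, oob_labels = [X[i] for i in oob], [y[i] for i in oob]
        base = error_rate(stump, oob_rows, oob_labels)
        for i in oob:
            votes[i][predict(stump, X[i])] += 1
        for j in range(p):
            # Permute feature j among the OOB rows and measure the extra error.
            column = [r[j] for r in oob_rows]
            rng.shuffle(column)
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(oob_rows, column)]
            importance[j] += (error_rate(stump, permuted, oob_labels) - base) / n_trees
    scored = [i for i in range(n) if sum(votes[i]) > 0]
    oob_error = sum((votes[i][1] > votes[i][0]) != bool(y[i]) for i in scored) / len(scored)
    return oob_error, importance, oob_fraction


def make_data(n=150, seed=1):
    """Synthetic data (an assumption): feature 0 drives the outcome, feature 1 is noise."""
    rng = random.Random(seed)
    X = [[rng.random(), rng.random()] for _ in range(n)]
    y = [1 if row[0] + rng.gauss(0, 0.1) > 0.5 else 0 for row in X]
    return X, y


X, y = make_data()
oob_error, importance, oob_fraction = forest_oob(X, y)
print(f"OOB error: {oob_error:.3f}")
print(f"Mean share of rows left out of bag: {oob_fraction:.3f}")
print(f"Permutation importance per feature: {[round(v, 3) for v in importance]}")
```

In this toy setting the share of rows left out of each bootstrap sample should sit near one-third, and permuting the informative feature should raise the out-of-bag error far more than permuting the noise feature, which is the logic behind the variable ranking.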


For the main analyses, the 70 potential predictors were included in the model. We then performed two secondary analyses. First, we re-estimated our models in a subsample of participants who did not report STB at baseline to better identify new cases30. Second, we re-estimated our models in the subsample including data on childhood adversity.

Evaluation of model performance

We evaluated the prediction quality of our models in the testing sample using the following performance metrics: (1) out-of-bag error, obtained using the out-of-bag sample of the training set, which represents the overall error in the prediction (ranges from 0%, indicating that all individuals are correctly classified, to 100%, indicating that no individual is correctly classified); (2) area under the curve (AUC)31, which measures discrimination performance by plotting the true positive rate against the false positive rate (ranges from 0.5, indicating prediction by chance, to 1, indicating perfect prediction); (3) sensitivity, representing the proportion of actual cases (i.e. students reporting STB) identified by the algorithm; and (4) the positive predictive value, describing the proportion of algorithm-predicted cases that are actual cases. To prevent these performance estimates from being overfitted and to increase the generalizability of the prediction model, we estimated these indices through cross-validation. We therefore randomly split the initial dataset into 10 folds, created the model using 9 of the 10 folds, and tested it on the remaining fold. We repeated this process until all the folds had been used as test sets. All the values needed for the prediction metrics, i.e. the real outcome, predicted outcome and probabilities of belonging to each class of the outcome, were calculated in each test sample and stored in an independent file; the final prediction metrics were then obtained from all the stored values, and we reported the mean value of the out-of-bag error across the 10 models.
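As a rough illustration of this evaluation scheme, the sketch below pools out-of-fold predictions from a 10-fold split and then computes AUC, sensitivity and positive predictive value once on the pooled values. It uses only the Python standard library. The classifier is a deliberately trivial stand-in (a midpoint between class means on a single synthetic predictor), not a random forest, and every function name and the 0.0 decision threshold are assumptions made for the example.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Shuffle 0..n-1 and deal the indices into k roughly equal folds."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return [order[i::k] for i in range(k)]


def pooled_out_of_fold_scores(X, y, fit, score, k=10, seed=0):
    """Fit on k-1 folds, score the held-out fold, and pool all held-out scores."""
    pooled = [None] * len(X)
    for fold in k_fold_indices(len(X), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in fold:
            pooled[i] = score(model, X[i])
    return pooled


def sensitivity_and_ppv(y_true, scores, threshold):
    """Sensitivity = TP / (TP + FN); positive predictive value = TP / (TP + FP)."""
    predicted = [s >= threshold for s in scores]
    tp = sum(p and t == 1 for p, t in zip(predicted, y_true))
    fn = sum((not p) and t == 1 for p, t in zip(predicted, y_true))
    fp = sum(p and t == 0 for p, t in zip(predicted, y_true))
    return tp / (tp + fn), tp / (tp + fp)


def auc(y_true, scores):
    """Probability that a random case outscores a random non-case (ties count 1/2)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def fit_midpoint(X, y):
    """Toy stand-in for a classifier: midpoint between the two class means."""
    cases = [x for x, t in zip(X, y) if t == 1]
    controls = [x for x, t in zip(X, y) if t == 0]
    return (sum(cases) / len(cases) + sum(controls) / len(controls)) / 2


def score_midpoint(model, x):
    """Positive scores lean towards 'case', negative towards 'non-case'."""
    return x - model


# Synthetic data (an assumption): roughly 17% cases, one noisy predictor.
rng = random.Random(42)
y = [1 if rng.random() < 0.17 else 0 for _ in range(300)]
X = [rng.gauss(1.0 if t else 0.0, 1.0) for t in y]

scores = pooled_out_of_fold_scores(X, y, fit_midpoint, score_midpoint)
sens, ppv = sensitivity_and_ppv(y, scores, threshold=0.0)
print(f"AUC {auc(y, scores):.2f}, sensitivity {sens:.2f}, PPV {ppv:.2f}")
```

Because every observation is scored exactly once by a model that never saw it, the pooled metrics are not inflated by training on the test cases. The sketch assumes each training split contains both classes and that at least one observation is predicted positive.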

All models were carried out in accordance with the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) statement for prediction model development32.

Results


Description of the sample

The final study population comprised 5066 students, including 4005 (79.1%) girls and 1061 (20.9%) boys. Of the 5066 participants, 874 (17.3%) students reported experiencing STB in the past 12 months (17.1% reported suicidal ideation and 0.7% suicide attempts). The STB prevalence did not significantly differ between girls (n = 696; 17.4%) and boys (n = 178; 16.8%). Among the 874 students who reported STB, 61.3% (n = 536) reported 12-month suicidal thoughts (with or without history of lifetime suicide attempts), and 14.6% (n = 128) reported a lifetime suicide attempt at baseline.

The main baseline characteristics did not significantly differ according to gender (Table 1). The mean participant age was 20.7 years (SD 2.6). Over one-third of the sample (n = 1932; 38.1%) was in their first year of university education. The majority of the students lived alone in an apartment (n = 1544; 30.5%) or at their parents’ home (n = 1495; 29.5%), and 17.5% (n = 884) described their current economic situation as difficult or very difficult. The most prevalent indicators of childhood adversity were maternal depression history (n = 1536; 30.3%) and parental divorce or separation (n = 1484; 29.3%). At baseline, one in five students reported 12-month suicidal thoughts (n = 1072; 21.2%) and 5.4% (n = 275) reported a lifetime suicide attempt.

Table 1 Sample characteristics at baseline.

Prediction of suicidal thoughts and behaviours

Among girls, the predictive model had an out-of-bag error of 24.6%, suggesting the overall misclassification of a quarter of the female participants. Among boys, the out-of-bag error was 28.1%. The model showed an AUC of 0.84 (95% CI 0.83–0.86) for girls, indicating a discrimination 68% better than chance, and 0.82 (95% CI 0.79–0.86) for boys (Fig. 1). The sensitivity was 0.79 for girls and 0.81 for boys, indicating that the model correctly predicted 79–81% of the actual cases (Table 2). The positive predictive values were 0.40 and 0.36 for girls and boys, respectively, meaning that 40% and 36% of predicted cases were actual cases. Analysis of the variables’ importance for the prediction, as measured by the mean decrease in accuracy, revealed that the following four variables were the most predictive in both girls and boys: 12-month suicidal thoughts at baseline, self-esteem, trait anxiety, and depression symptoms (Fig. 2).
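The reported figures can be cross-checked with a short back-of-the-envelope calculation. The sketch below inverts Bayes' rule to obtain the specificity implied by the reported sensitivity, positive predictive value and follow-up prevalence, and rescales the AUC so that chance is 0 and perfect discrimination is 1. Two caveats apply: the implied specificities are derived here from rounded inputs and are not values reported in the text, and reading "68% better than chance" as (AUC − 0.5)/0.5 is an assumption that happens to match the numbers. The function names are hypothetical.

```python
def ppv_from(sensitivity, specificity, prevalence):
    """Bayes' rule: share of predicted cases that are actual cases."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def implied_specificity(sensitivity, ppv, prevalence):
    """Invert Bayes' rule to get the specificity consistent with a given PPV."""
    false_pos_rate = sensitivity * prevalence * (1 - ppv) / (ppv * (1 - prevalence))
    return 1 - false_pos_rate


def gain_over_chance(auc):
    """Rescale AUC so that 0.5 (chance) maps to 0 and 1.0 (perfect) maps to 1."""
    return (auc - 0.5) / 0.5


# (sensitivity, PPV, follow-up prevalence) as reported in the text, rounded.
reported = {"girls": (0.79, 0.40, 0.174), "boys": (0.81, 0.36, 0.168)}

for group, (sens, ppv, prev) in reported.items():
    spec = implied_specificity(sens, ppv, prev)
    # Round trip: plugging the implied specificity back in recovers the PPV.
    assert abs(ppv_from(sens, spec, prev) - ppv) < 1e-9
    print(f"{group}: implied specificity of about {spec:.2f}")

print(f"AUC 0.84 rescaled: {gain_over_chance(0.84):.2f}")
```

Under these assumptions the implied specificity is roughly 0.75 for girls and 0.71 for boys, which illustrates why the positive predictive value stays well below the sensitivity when fewer than one in five students is a case.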

Figure 1

Area-under-the-curve plots of the sensitivity and specificity of random forests predictive models for suicidal thoughts and behaviours, stratified by gender.

Table 2 Predictive performance metrics.

Figure 2

Ranking of the importance of baseline variables in a random forests model for predicting one-year suicidal thoughts and behaviours, stratified by gender.

Secondary analyses

We repeated these analyses in a subsample of participants who did not report STB at baseline, and found that the predictive performances were lower than in the main analyses. For girls (n = 3114) and boys (n = 832), respectively, the AUC was 0.72 and 0.74, and the sensitivity was 0.63 and 0.62. Variable importance for the prediction differed between girls and boys. For girls, the main predictive variables were depression symptoms, self-esteem, trait anxiety, and academic stress (Fig. 3). For boys, the main predictor was self-esteem, followed by trait anxiety (Fig. 3). We then fitted our random forests models among the 1497 girls and 414 boys who answered the childhood adversity questionnaire. The predictive performances were similar for girls (AUC 0.82; sensitivity of 79%) and boys (AUC 0.75; sensitivity of 76%). In girls, the four main predictive variables were baseline suicidal thoughts, depression symptoms, self-esteem, and trait anxiety. In boys, the four top predictors were 12-month suicidal thoughts, perceived stress, trait anxiety and self-esteem (Supplementary Figure 1). Thus, in both genders, childhood adversity variables did not contribute to STB prediction.

Figure 3

Ranking of the importance of baseline variables in a random forests model for predicting one-year suicidal thoughts and behaviours, removing participants with baseline suicidal thoughts and behaviours, stratified by gender.

Discussion


Using random forests models in this large sample of college students, we found that four main baseline variables predicted STB at 12 months: suicidal thoughts at baseline, trait anxiety, depression symptoms, and self-esteem. The model including these variables showed good predictive performance (AUC = 0.8) estimated using cross-validation. In secondary analyses in a subsample excluding participants who reported STB at baseline, the main predicting variables were depressive symptoms, self-esteem, and academic stress for girls and mainly self-esteem for boys. The predictors differed by gender only among participants who did not report STB at baseline. Finally, childhood adversity variables did not contribute to STB prediction.

To our knowledge, only two prior studies have developed STB predictive models in students and reported comparable predictive performances to our study. One study used the random forests method to predict suicide attempts among medical students, using a cross-sectional design33. The other study used a logistic regression model to develop a risk-screening algorithm for persistence of suicidal behaviours during college34.

STB prediction was not influenced by childhood trauma or perceived parental support, which are usually strongly associated with STB in young adults35,36. These results are in line with previous studies34,37. This finding highlights that association does not necessarily mean prediction11, and that proximal risk factors of STB may be better than distal or early life ones for predicting one-year STB38. We can also assume that important predictors such as depression symptoms are the downstream consequences of higher adversity during childhood, and as they are more recent, they could be overshadowing the importance of early adversity in STB prediction. Furthermore, following the diathesis stress model of suicide, the predictors we found (anxiety, depression) might affect more vulnerable individuals who have experienced childhood adversities39.

We identified a small number of major predictors that ensured high accuracy in STB prediction. These predictors, derived from short and commonly used questionnaires, may help develop a large-scale screening tool for university students. For example, they could be integrated into a short online screening administered upon college entrance. An online questionnaire may prove acceptable to students, and would provide an alternative to mental health assessment by a physician for students who are often reluctant to disclose sensitive personal information in face-to-face interviews40,41.

The quantitatively most important predictor was suicidal thoughts at baseline34,42. Likewise, anxiety and depression were often comorbid with STB in students43. Interestingly, self-esteem emerged as one of the main predictors of STB. Low self-esteem is known to be a part of social anxiety, and to overlap with depression, both of which are associated with STB44. Self-esteem, which is an important marker of psychological vulnerability in young adults45,46,47, has also been found to be associated with suicidality48. Our study showed that self-esteem is an independent and prominent predictive marker of STB and should therefore be used in a screening tool.

Overall, our results suggest that baseline suicidal ideation combined with three validated psychological scales (Rosenberg scale for self-esteem, STAI-YB Spielberger scale for trait anxiety, PHQ-9 for depression) is informative enough to identify students who will present STB at the one-year assessment.

Key strengths of this study are the large sample of students and the longitudinal design. Since there are many different paths to STB, accurate STB prediction requires the consideration of a complex combination of a large number of factors13. The i-Share baseline questionnaire includes a large number of variables, which enabled analyses with a large number of potential STB predictors (70 in the main analyses and 87 for the secondary analyses). Our analyses were conducted following the current recommendations and best methods for prediction analysis, especially the use of different samples for creating the predictors and then for calculating the predictive performance, which prevents the performance measures from being overfitted10,11. The variables identified as main predictors of STB were consistent across main and secondary analyses, suggesting robust and consistent findings. Some limitations should nevertheless be acknowledged when interpreting the results. First, the follow-up response rate (33.5%) was moderate, as is common in longitudinal studies with students49, and differences were observed between respondents and non-respondents in the follow-up. These differences were not major (proportions were similar) and should have a limited impact when identifying STB predictors. Nevertheless, caution is needed regarding the external validity of our results and the possibility of generalizing conclusions to all students and to all settings. Second, girls were over-represented in our sample (79%) compared to the 50–60% of female students in France50, and our sample might not be representative of the whole student population. Third, the self-reported questionnaires could lead to information and recall bias, particularly if participants under-reported their frequency of STB due to concerns about social desirability. However, such under-reporting is likely to be reduced by the use of an online questionnaire. Additionally, and more importantly, relying on other data (e.g., clinical assessment) would defeat our aim of finding easily assessable predictors of STB in large university student samples. Fourth, given the adaptation of the CTQ used in this study, we could not create subsamples ‘with’ and ‘without’ childhood adversity. Thus, we could not explore in depth whether the identified predictors had a greater effect in individuals with childhood adversities. Finally, we could not strictly separate analyses between suicidal ideation and suicide attempts due to the small number of one-year suicide attempts in our sample, even after combining the genders (n = 35).

In conclusion, we identified a parsimonious number of predictors that can be used to accurately identify students who will present STB within one year of the predictor assessment. Pending replication of these results in other studies, these predictors may be used to develop a screening tool to be routinely used among university students. For example, a web-based screening tool could represent a promising approach for identifying students at suicide risk and referring them to counselling and mental health services.

Data availability

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

References


  1. Auerbach, R. P. et al. Mental disorders among college students in the World Health Organization World Mental Health Surveys. Psychol. Med. 46, 2955–2970 (2016).

  2. Auerbach, R. P. et al. WHO World Mental Health Surveys International College Student Project: prevalence and distribution of mental disorders. J. Abnorm. Psychol. 127, 623–638 (2018).

  3. Mortier, P. et al. Suicidal thoughts and behaviors among first-year college students: results from the WMH-ICS project. J. Am. Acad. Child Adolesc. Psychiatry 57, 263–273 (2018).

  4. Blasco, M. J. et al. Predictive models for suicidal thoughts and behaviors among Spanish University students: rationale and methods of the UNIVERSAL (University & mental health) project. BMC Psychiatry 16, 122 (2016).

  5. Auerbach, R. P. et al. Mental disorder comorbidity and suicidal thoughts and behaviors in the World Health Organization World Mental Health Surveys International College Student initiative. Int. J. Methods Psychiatr Res. 28, e1752 (2019).

  6. Duffy, A. et al. Mental health care for university students: a way forward?. Lancet Psychiatry 6, 885–887 (2019).

  7. Ream, G. L. The interpersonal-psychological theory of suicide in college student suicide screening. Suicide Life Threat Behav. 46, 239–247 (2016).

  8. Liu, C. H., Stevens, C., Wong, S. H. M., Yasui, M. & Chen, J. A. The prevalence and predictors of mental health diagnoses and suicide among U.S. college students: implications for addressing disparities in service use. Depress Anxiety 36, 8–17 (2019).

  9. Wilcox, H. C. et al. Prevalence and predictors of persistent suicide ideation, plans, and attempts during college. J. Affect. Disord. 127, 287–294 (2010).

  10. Poldrack, R. A., Huckins, G. & Varoquaux, G. Establishment of best practices for evidence for prediction: a review. JAMA Psychiat. 77, 534–540 (2020).

  11. Bzdok, D., Varoquaux, G. & Steyerberg, E. W. Prediction, not association, paves the road to precision medicine. JAMA Psychiat. 78, 127–128 (2021).

  12. Ryan, E. P. & Oquendo, M. A. Suicide risk assessment and prevention: challenges and opportunities. Focus (Am. Psychiatr. Publ.) 18, 88–99 (2020).

  13. Franklin, J. C. et al. Risk factors for suicidal thoughts and behaviors: a meta-analysis of 50 years of research. Psychol. Bull. 143, 187–232 (2017).

  14. Walsh, C. G., Ribeiro, J. D. & Franklin, J. C. Predicting suicide attempts in adolescents with longitudinal clinical data and machine learning. J. Child Psychol. Psychiatry 59, 1261–1270 (2018).

  15. Van Mens, K. et al. Predicting future suicidal behaviour in young adults, with different machine learning techniques: a population-based longitudinal study. J. Affect Disord. 15, 169–177 (2020).

  16. Turecki, G. et al. Suicide and suicide risk. Nat. Rev. Dis. Primers 5, 74 (2019).

  17. Gray, A. L., Hyde, T. M., Deep-Soboslay, A., Kleinman, J. E. & Sodhi, M. S. Sex differences in glutamate receptor gene expression in major depression and suicide. Mol. Psychiatry 20, 1139 (2015).

  18. Ruud, N., Løvseth, L. T., Ro, K. I. & Tyssen, R. Comparing mental distress and help-seeking among first-year medical students in Norway: results of two cross-sectional surveys 20 years apart. BMJ Open 10, e036968 (2020).

  19. Macalli, M. et al. Perceived parental support in childhood and adolescence and suicidal ideation in young adults: a cross-sectional analysis of the i-Share study. BMC Psychiatry 18, 373 (2018).

  20. Kroenke, K., Spitzer, R. L. & Williams, J. B. The PHQ-9: validity of a brief depression severity measure. J. Gen. Intern. Med. 16, 606–613 (2001).

  21. Laux, L., Glanzmann, P., Schaffner, P. & Spielberger, C. Das State-Trait Angstinventar: STAI (Beltz, 1981).

  22. Rosenberg, M. Society and the Adolescent Self-Image (Princeton University Press, 1965).

  23. Cohen, S., Kamarck, T. & Mermelstein, R. A global measure of perceived stress. J. Health Soc. Behav. 24, 385–396 (1983).

  24. Barratt, E. S. Impulsiveness and aggression. In: Violence and mental disorder: Developments in risk assessment 61–79 (The University of Chicago Press, 1994).

  25. Paquette, D., Laporte, L., Bigras, M. & Zoccolillo, M. Validation of the French version of the CTQ and prevalence of the history of maltreatment. Sante Ment Que 29, 201–220 (2004).

  26. Shatte, A. B. R., Hutchinson, D. M. & Teague, S. J. Machine learning in mental health: a scoping review of methods and applications. Psychol. Med. 49, 1426–1448 (2019).

  27. Kaur, A. & Kaur, K. An empirical study of robustness and stability of machine learning classifiers in software defect prediction. In El-Alfy E-SM, Thampi SM, Takagi H, Piramuthu S, Hanne T, eds. Advances in Intelligent Informatics. Advances in Intelligent Systems and Computing 383–397. (Springer, 2015).

  28. Breiman, L. Random forests. Mach Learn. 45, 5–32 (2001).

  29. Stekhoven, D. J. & Bühlmann, P. MissForest—non-parametric missing value imputation for mixed-type data. Bioinformatics 28, 112–118 (2012).

  30. Zhang-James, Y. et al. Machine-learning prediction of comorbid substance use disorders in ADHD youth using Swedish registry data. J. Child Psychol. Psychiatry 61, 1370–1379 (2020).

  31. Singh, J. P., Desmarais, S. L. & Van Dorn, R. A. Measurement of predictive validity in violence risk assessment studies: a second-order systematic review. Behav. Sci. Law 31, 55–73 (2013).

  32. Moons, K. G. M. et al. Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): explanation and elaboration. Ann. Intern. Med. 162, W1-73 (2015).

  33. Shen, Y. et al. Detecting risk of suicide attempts among Chinese medical college students using a machine learning algorithm. J. Affect Disord. 273, 18–23 (2020).

  34. Mortier, P. et al. A risk algorithm for the persistence of suicidal thoughts and behaviors during college. J. Clin. Psychiatry 78, e828–e836 (2017).

  35. Angelakis, I., Austin, J. L. & Gooding, P. Association of childhood maltreatment with suicide behaviors among young people: a systematic review and meta-analysis. JAMA Netw. Open 3, e2012563 (2020).

  36. Macalli, M., Côté, S. & Tzourio, C. Perceived parental support in childhood and adolescence as a tool for mental health screening in students: a longitudinal study in the i-Share cohort. J. Affect Disord. 266, 512–519 (2020).

  37. Baldwin, J. R. et al. Population vs individual prediction of poor health from results of adverse childhood experiences screening. JAMA Pediatr. (2021).

  38. Navarro, M. C. et al. Machine learning assessment of early life factors predicting suicide attempt in adolescence or young adulthood. JAMA Netw. Open 1, e211450 (2021).

  39. Belsky, J. & Pluess, M. Beyond diathesis stress: differential susceptibility to environmental influences. Psychol. Bull. 135, 885–908 (2009).

  40. Ribeiro, J. D. et al. Self-injurious thoughts and behaviors as risk factors for future suicide ideation, attempts, and death: a meta-analysis of longitudinal studies. Psychol. Med. 46, 225–236 (2016).

  41. Zalsman, G. et al. Suicide prevention strategies revisited: 10-year systematic review. Lancet Psychiatry 3, 646–659 (2016).

  42. King, C. A. et al. Online suicide risk screening and intervention with college students: a pilot randomized controlled trial. J. Consult. Clin. Psychol. 83, 630–636 (2015).

  43. Eisenberg, D., Gollust, S. E., Golberstein, E. & Hefner, J. L. Prevalence and correlates of depression, anxiety, and suicidality among university students. Am. J. Orthopsychiatry. 77, 534–542 (2007).

  44. Fergusson, D. M., Beautrais, A. L. & Horwood, L. J. Vulnerability and resiliency to suicidal behaviours in young people. Psychol. Med. 33, 61–73 (2003).

  45. Arsandaux, J. et al. Pathways from ADHD symptoms to suicidal ideation during college years: a longitudinal study on the i-Share cohort. J. Atten. Disord. (2020).

  46. Arsandaux, J., Galéra, C. & Salamon, R. The association of self-esteem and psychosocial outcomes in young adults: a 10-year prospective study. Child Adolesc. Ment. Health (2020).

  47. Arsandaux, J., Michel, G., Tournier, M., Tzourio, C. & Galéra, C. Is self-esteem associated with self-rated health among French college students? A longitudinal epidemiological study: the i-Share cohort. BMJ Open 9, e024500 (2019).

  48. Harpin, V., Mazzone, L., Raynaud, J. P., Kahle, J. & Hodgkins, P. Long-term outcomes of ADHD: a systematic review of self-esteem and social function. J. Atten. Disord. 20, 295–305 (2016).

  49. Ebert, D. D. et al. Prediction of major depressive disorder onset in college students. Depress Anxiety 36, 294–304 (2019).

  50. Ministère de l’enseignement supérieur, de la recherche et de l’innovation. (2017).


We would like to thank the ‘Invest for the future’ program (reference ANR-10-COHO-05), the Nouvelle-Aquitaine Regional Council (Conseil Régional Nouvelle-Aquitaine) (Grant N°4370420), the Bordeaux ‘Initiatives d’excellence’ (IdEx) program of the University of Bordeaux (ANR-10-IDEX-03-02), Public Health France (Santé Publique France) (contract 19DPPP023-0), the National Cancer Institute (INCA) (grant INCA_11502), and the Medical Research Foundation (FRM). M. Macalli was supported by a PhD grant from the Nouvelle-Aquitaine Regional Council (Grant N° 17 EURE-0019). M. Macalli and M. Navarro were supported by the PhD Digital Public Health Graduate School Program, funded within the framework of the PIA3 (Investment for the Future) (Project reference: 17-EURE-0019). The authors are grateful to the coordinating team of the i-Share project: Clothilde Pollet, Edwige Pereira, Marie Mougin, Elena Milesi, Aude Pouymayou and Garance Perret.

Author information



M.M., M.N. and C.T. designed the study. M.M. and M.N. conducted the statistical analysis. M.M., M.N. and C.T. wrote the first draft of the manuscript. All authors contributed to editing and commenting on the final version.

Corresponding authors

Correspondence to Melissa Macalli or Christophe Tzourio.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit

About this article

Cite this article

Macalli, M., Navarro, M., Orri, M. et al. A machine learning approach for predicting suicidal thoughts and behaviours among college students. Sci Rep 11, 11363 (2021).
