  • Article

Building machine learning prediction models for well-being using predictors from the exposome and genome in a population cohort

Abstract

Effective personalized well-being interventions require the ability to predict who will thrive and who will not, and an understanding of the underlying mechanisms. Here, using longitudinal data from a large population cohort (the Netherlands Twin Register, collected 1991–2022), we aim to build machine learning prediction models for adult well-being from the exposome and genome, and to identify the most predictive factors (N between 702 and 5874). The specific exposome was captured by parent and self-reports of psychosocial factors from childhood to adulthood, the genome was described by polygenic scores, and the general exposome was captured by linkage of participants’ postal codes to objective, registry-based exposures. In an independent test set, the genome was not predictive of well-being (R² = −0.007 [−0.026–0.010]), whereas the general exposome (R² = 0.047 [0.015–0.076]) and especially the specific exposome (R² = 0.702 [0.637–0.753]) were. Adding the genome (P = 0.334) and general exposome (P = 0.695) independently or jointly (P = 0.029) beyond the specific exposome did not improve prediction. Risk/protective factors such as optimism, personality, social support and neighborhood housing characteristics were most predictive. Our findings highlight the importance of longitudinal monitoring and the promise of different data modalities for well-being prediction.
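The abstract reports the coefficient of determination (R²) on an independent test set, including a slightly negative value for the genome. As a minimal, dependency-free sketch of how such a value can arise, the Python below computes out-of-sample R² and a percentile bootstrap interval. The function names, the toy data and the choice of a row-level percentile bootstrap are assumptions made for illustration; the abstract does not state how the bracketed intervals were obtained, and simple row resampling ignores the family clustering present in a twin register.

```python
import random


def r_squared(y_true, y_pred):
    """Out-of-sample R^2 = 1 - SS_res / SS_tot, with SS_tot taken around the mean of y_true."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def bootstrap_interval(y_true, y_pred, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for R^2 (hypothetical helper, rows resampled independently)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        yt = [y_true[i] for i in idx]
        yp = [y_pred[i] for i in idx]
        if len(set(yt)) > 1:  # skip resamples in which the outcome has zero variance
            stats.append(r_squared(yt, yp))
    stats.sort()
    lower = stats[int((1 - level) / 2 * len(stats))]
    upper = stats[int((1 + level) / 2 * len(stats)) - 1]
    return lower, upper


# Toy outcomes and three sets of predictions (illustrative values only).
y = [1.0, 2.0, 3.0, 4.0, 5.0]
perfect = r_squared(y, y)                         # predictions equal the outcomes
baseline = r_squared(y, [3.0] * 5)                # always predicting the test-set mean
worse = r_squared(y, [5.0, 4.0, 3.0, 2.0, 1.0])   # predictions worse than the mean
```

In this toy case `perfect` is 1.0, `baseline` is 0.0 and `worse` is negative: a model scores below zero on held-out data whenever its squared error exceeds that of simply predicting the mean outcome.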

Fig. 1: Model performance (R²) of unimodal and multimodal analyses.
Fig. 2: Feature importance of the specific exposome.
Fig. 3: Feature importance of the general exposome.
Fig. 4: Model performance (R²) based on single waves of phenotypic (specific exposome) data.
Fig. 5: Model performance (R²) based on the type of longitudinal phenotypic (specific exposome) data.

Data availability

Because the Netherlands Twin Register is a national prospective cohort study, its data cannot be made publicly available for privacy reasons, but they are available to legitimate researchers via the data access procedure at https://tweelingenregister.vu.nl/information_for_researchers/working-with-ntr-data. Data from the Geoscience and Health Cohort Consortium (GECCO) can be requested via the data access request form at https://www.gecco.nl/exposure-data-1/.

Code availability

Python scripts for the machine learning models can be found at https://osf.io/zphw8/.

Acknowledgements

D.H.M.P. is funded by an Amsterdam Public Health AI and Machine learning grant and ERC consolidation grant (WELL-BEING 771057, M. Bartels). The NTR data collection was supported by the following grants: NWO large investment grant (NTR: 480-15-001/674), ZonMW Addiction program (31160008), Spinozapremie (NWO/SPI 56-464-14192), Twin-family database for behavior genetics and genomics studies (NWO 480-04-004), genetic influences on stability and change in psychopathology from childhood to young adulthood (NWO/ZonMW 91210020), Genetic and Family influences on Adolescent psychopathology and Wellness (NWO 463-06-001), A twin-sib study of adolescent wellness (NWO-VENI 451-04-034), The US National Institute of Mental Health as part of the American Recovery and Reinvestment Act of 2009: Genomics of Developmental Trajectories in Twins (1RC2MH089995-01), Determinants of Adolescent Exercise Behavior (NIH-1R01DK092127-01), and part of the genotyping and analyses were funded by the Genetic Association Information Network (GAIN) of the Foundation for the US National Institutes of Health (NIMH, MH081802). M.B. is funded by an NWO VICI grant (VI.C.211.054). Geo-data were collected as part of the Geoscience and Health Cohort Consortium (GECCO), which was financially supported by the Netherlands Organisation for Scientific Research (NWO), the Netherlands Organisation for Health Research and Development (ZonMw) and Amsterdam UMC. More information on GECCO can be found at www.gecco.nl. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript. We thank all Netherlands Twin Register participants who provided data for this study. GECCO (Geoscience and Health Cohort Consortium) is acknowledged for gathering and combining existing data into the GECCO repository and maintaining the infrastructure necessary for these data. We thank A. Wagtendonk in particular for providing the data for the present study.

Author information

Contributions

D.H.M.P. designed the study, with input from P.C.H., C.H.V. and M.B. D.H.M.P. analyzed the data, with support from P.C.H. for the machine learning models. D.H.M.P. designed the figures and tables and drafted the paper. L.L. and C.E.M.v.B. were responsible for providing the NTR data and supporting their use; R.P. was responsible for the polygenic scores. All authors contributed to and approved the final version of the paper.

Corresponding author

Correspondence to Dirk H. M. Pelt.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Mental Health thanks Elham Assary, Jurriaan Hoekstra and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Flow chart of full data preparation and machine learning pipeline – unimodal specific exposome.

Note. N,f = sample size, number of features; XGBoost = extreme gradient boosting; SVM = support vector machine. Dotted lines represent transformations/selections applied to the test set based on the training set.
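The note's point that test-set transformations are derived from the training set can be illustrated with a small sketch. The Python below learns per-feature statistics from training rows only and then applies them unchanged to both sets. Mean imputation and z-scoring are assumptions chosen for illustration, as are the function names and toy values; the note does not specify which transformations or selections the pipeline uses.

```python
from statistics import mean, pstdev


def fit_preprocessor(train_rows):
    """Learn per-feature mean and SD from the training rows only (None marks a missing value)."""
    n_features = len(train_rows[0])
    params = []
    for j in range(n_features):
        observed = [row[j] for row in train_rows if row[j] is not None]
        mu = mean(observed)
        sd = pstdev(observed) or 1.0  # fall back to 1.0 for constant features
        params.append((mu, sd))
    return params


def apply_preprocessor(rows, params):
    """Impute missing values with the training mean, then z-score with training statistics."""
    return [
        [((mu if x is None else x) - mu) / sd for x, (mu, sd) in zip(row, params)]
        for row in rows
    ]


# Toy data (illustrative values only): two features, some entries missing.
train = [[1.0, 10.0], [3.0, None], [5.0, 30.0]]
test = [[None, 20.0], [7.0, 40.0]]

params = fit_preprocessor(train)            # statistics come from the training set only
train_z = apply_preprocessor(train, params)
test_z = apply_preprocessor(test, params)   # test rows never influence the statistics
```

Because `fit_preprocessor` never sees the test rows, no information from the held-out participants leaks into the fitted transformation, which is what keeps the test-set performance estimate honest.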

Extended Data Fig. 2 Flow chart of full data preparation and machine learning pipeline – unimodal genome.

Note. N,f = sample size, number of features; XGBoost = extreme gradient boosting; SVM = support vector machine. Dotted lines represent transformations/selections applied to the test set based on the training set. * Participants had either all or no genomic data available. ** 13 polygenic scores, 10 principal components, 6 platform dummies.

Extended Data Fig. 3 Flow chart of full data preparation and machine learning pipeline – unimodal general exposome.

Note. N,f = sample size, number of features; XGBoost = extreme gradient boosting; SVM = support vector machine. Dotted lines represent transformations/selections applied to the test set based on the training set.

Supplementary information

Supplementary Information

Supplementary Figs. 1–6 and Materials 1–3.

Reporting Summary

Supplementary Tables

Supplementary Tables 1–10.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Pelt, D.H.M., Habets, P.C., Vinkers, C.H. et al. Building machine learning prediction models for well-being using predictors from the exposome and genome in a population cohort. Nat. Mental Health (2024). https://doi.org/10.1038/s44220-024-00294-2

  • DOI: https://doi.org/10.1038/s44220-024-00294-2
