Abstract
Effective personalized well-being interventions require both the ability to predict who will thrive and an understanding of the underlying mechanisms. Here, using longitudinal data from a large population cohort (the Netherlands Twin Register, collected 1991–2022), we aim to build machine learning prediction models for adult well-being from the exposome and genome, and to identify the most predictive factors (N between 702 and 5874). The specific exposome was captured by parent reports and self-reports of psychosocial factors from childhood to adulthood, the genome was described by polygenic scores, and the general exposome was captured by linking participants’ postal codes to objective, registry-based exposures. In an independent test set, the genome was not predictive of well-being (R² = −0.007 [−0.026, 0.010]), whereas the general exposome (R² = 0.047 [0.015, 0.076]) and especially the specific exposome (R² = 0.702 [0.637, 0.753]) were. Adding the genome (P = 0.334) or the general exposome (P = 0.695) separately, or both jointly (P = 0.029), to the specific exposome did not improve prediction. Risk and protective factors such as optimism, personality, social support and neighborhood housing characteristics were most predictive. Our findings highlight the importance of longitudinal monitoring and the promise of different data modalities for well-being prediction.
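The negative R² reported for the genome may look surprising, so a minimal sketch of how out-of-sample R² behaves can help. It assumes the usual definition, 1 − SS_res/SS_tot, with the test-set mean as the baseline (the convention used by scikit-learn's `r2_score`); whether the authors computed it exactly this way is not stated here. All numbers below are made up for illustration and are not study data.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical held-out well-being scores (illustrative only).
y_test = [1.0, 2.0, 3.0, 4.0, 5.0]

# Predicting the mean for everyone: the reference point, R^2 = 0.
mean_baseline = [3.0] * 5
# Predictions that track the outcome closely: R^2 near 1.
good_pred = [1.5, 2.0, 3.0, 4.0, 4.5]
# Predictions unrelated to the outcome: worse than the mean, so R^2 < 0.
poor_pred = [3.5, 2.5, 3.0, 3.5, 2.5]

r2_baseline = r_squared(y_test, mean_baseline)
r2_good = r_squared(y_test, good_pred)
r2_poor = r_squared(y_test, poor_pred)

print(r2_baseline, r2_good, r2_poor)
```

Under this definition, a value below zero means the model's test-set predictions had a larger squared error than simply predicting the mean for every participant.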
Data availability
Because they are part of a national prospective cohort study, the Netherlands Twin Register data cannot be made publicly available for privacy reasons, but they are available to legitimate researchers via the data access procedure at https://tweelingenregister.vu.nl/information_for_researchers/working-with-ntr-data. Data from the Geoscience and Health Cohort Consortium (GECCO) can be requested via the data access request form at https://www.gecco.nl/exposure-data-1/.
Code availability
Python scripts for the machine learning models can be found at https://osf.io/zphw8/.
Acknowledgements
D.H.M.P. is funded by an Amsterdam Public Health AI and Machine learning grant and ERC consolidation grant (WELL-BEING 771057, M. Bartels). The NTR data collection was supported by the following grants: NWO large investment grant (NTR: 480-15-001/674), ZonMW Addiction program (31160008), Spinozapremie (NWO/SPI 56-464-14192), Twin-family database for behavior genetics and genomics studies (NWO 480-04-004), genetic influences on stability and change in psychopathology from childhood to young adulthood (NWO/ZonMW 91210020), Genetic and Family influences on Adolescent psychopathology and Wellness (NWO 463-06-001), A twin-sib study of adolescent wellness (NWO-VENI 451-04-034), The US National Institute of Mental Health as part of the American Recovery and Reinvestment Act of 2009: Genomics of Developmental Trajectories in Twins (1RC2MH089995-01), Determinants of Adolescent Exercise Behavior (NIH-1R01DK092127-01), and part of the genotyping and analyses were funded by the Genetic Association Information Network (GAIN) of the Foundation for the US National Institutes of Health (NIMH, MH081802). M.B. is funded by an NWO VICI grant (VI.C.211.054). Geo-data were collected as part of the Geoscience and Health Cohort Consortium (GECCO), which was financially supported by the Netherlands Organisation for Scientific Research (NWO), the Netherlands Organisation for Health Research and Development (ZonMw) and Amsterdam UMC. More information on GECCO can be found at www.gecco.nl. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript. We thank all Netherlands Twin Register participants who provided data for this study. GECCO (Geoscience and Health Cohort Consortium) is acknowledged for gathering and combining existing data into the GECCO repository and maintaining the infrastructure necessary for these data. We thank A. Wagtendonk in particular for providing the data for the present study.
Author information
Contributions
D.H.M.P. designed the study, with input from P.C.H., C.H.V. and M.B. D.H.M.P. analyzed the data, with support from P.C.H. for the machine learning models. D.H.M.P. designed the figures and tables and drafted the paper. L.L. and C.E.M.v.B. were responsible for providing the NTR data and supporting their use, and R.P. was responsible for the polygenic scores. All authors contributed to and approved the final version of the paper.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Mental Health thanks Elham Assary, Jurriaan Hoekstra and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Flow chart of full data preparation and machine learning pipeline – unimodal specific exposome.
Note. N,f = sample size, number of features, XGBoost = extreme gradient boost, SVM = support vector machine. Dotted lines represent transformations/selections in test set based on train set.
Extended Data Fig. 2 Flow chart of full data preparation and machine learning pipeline – unimodal genome.
Note. N,f = sample size, number of features, XGBoost = extreme gradient boost, SVM = support vector machine. Dotted lines represent transformations/selections in test set based on train set. * Participants either had all or no genomic data available. ** 13 polygenic scores, 10 principal components, 6 platform dummies.
Extended Data Fig. 3 Flow chart of full data preparation and machine learning pipeline – unimodal general exposome.
Note. N,f = sample size, number of features, XGBoost = extreme gradient boost, SVM = support vector machine. Dotted lines represent transformations/selections in test set based on train set.
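The dotted lines in these flow charts denote steps whose parameters are learned on the train set and then reused on the test set. A minimal sketch of that idea follows, using standardization as a stand-in transformation. The specific transformations and selections in the authors' pipeline are not listed here, so the helper names `fit_scaler` and `apply_scaler` and all values are hypothetical.

```python
from statistics import mean, pstdev


def fit_scaler(train_values):
    """Learn centering and scaling parameters from the train set only."""
    return mean(train_values), pstdev(train_values)


def apply_scaler(values, params):
    """Apply previously learned parameters to any set, train or test."""
    mu, sd = params
    return [(v - mu) / sd for v in values]


# Hypothetical feature values (illustrative only, not study data).
train_feature = [2.0, 4.0, 6.0, 8.0]
test_feature = [5.0, 11.0]

# Parameters come from the train set; the test set never influences them.
params = fit_scaler(train_feature)
train_scaled = apply_scaler(train_feature, params)
test_scaled = apply_scaler(test_feature, params)

print(params, test_scaled)
```

Estimating the parameters on the train set alone keeps information from the test set out of model fitting, so the test-set performance remains an honest estimate of out-of-sample prediction.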
Supplementary information
Supplementary Information
Supplementary Figs. 1–6 and Materials 1–3.
Supplementary Tables
Supplementary Tables 1–10.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Pelt, D.H.M., Habets, P.C., Vinkers, C.H. et al. Building machine learning prediction models for well-being using predictors from the exposome and genome in a population cohort. Nat. Mental Health (2024). https://doi.org/10.1038/s44220-024-00294-2
DOI: https://doi.org/10.1038/s44220-024-00294-2