Main

Depressive symptoms are a type of chronic mental health condition with complex etiology, and major depressive disorder (MDD) is the clinical disorder diagnosed when depressive symptoms reach a threshold of severity and duration. Depressive symptoms and MDD impose a serious public health burden. The updated Global Burden of Diseases study showed that the age-standardized prevalence of MDD was 4% (3,951 per 100,000 people) in Western Europe, higher than the global level, and underlined the heavy burden on people aged between 15 and 24 (ref. 1). Among adolescents, a 2021 systematic review indicated that, across studies between 2001 and 2020, the pooled prevalence of self-reported depressive symptoms was 34% and that of MDD was 5%, and that the prevalence is increasing2. The COVID-19 pandemic exacerbated the already growing trend of hardship. Given a growing body of evidence on the environmental effect on depressive symptoms and MDD3,4, more systematic investigation is urgently needed, especially among youth.

The concept of the ‘exposome’, which depicts the dynamic totality of the environment that an individual experiences, was raised in 2005 (ref. 5). The exposome is divided into three parts (specific external, general external, and internal exposomes), and the external exposome can be further subdivided into familial, social, and built exposomes, among others. Instead of studying a single exposure or a small group of exposures, an exposome study aims to investigate the overall effect of the environment, although complexities such as interaction and ubiquity unavoidably increase the difficulty6. An exposome- (environmental, exposure) wide association study (ExWAS), like other ‘WAS’ studies, denotes an agnostic and systematic method for hypothesis generation, which is comparatively well suited to the exposome’s spatiotemporal variability and multi-level structure7. Several ExWAS studies have targeted mental health8,9,10, and Choi et al.11 used clinical incident depression as the outcome and identified multiple modifiable factors. Because depressive symptoms are an early warning sign of MDD, focusing on them in adolescence or young adulthood could make it easier to guide translational intervention as early as possible, which would be more cost effective.

Despite the benefits of the exposome approach, there are some other hindrances. First, with current techniques, we cannot measure every possible exposure (far from reaching ‘1-genome’), and the exposome keeps updating, expanding, and enriching. Moreover, some studies have emphasized exposures’ non-genetic properties, which ignores how the environment interacts with genetics through multiple mechanisms among many traits, including depression12,13. Medda and colleagues, on the basis of the Italian Twin Registry, demonstrated the substantial genetic role in exogenous metallomics, where the estimates of standardized genetic variance, as a proportion of the total variance of the measured exposures, ranged from 0.15 (arsenic) to 0.79 (zinc)14. As a natural experiment, twin and family studies provide a method to evaluate genetic and environmental relationships between traits and exposures. This design decomposes the variance of traits into additive genetic (A), dominant genetic (D), common environmental (C), and unique environmental (E) components, which contain the distinct features of the exposome as the overall environmental effect. Such indirect evidence of genetic effects, based on the genetic relationships of family members, is an efficient way to demonstrate the presence (or absence) of genetic effects. Thus, the combination of exposome and twin studies could advance our knowledge of the complexities between genes and environments, improve our understanding of existing deficiencies in exposome measures, and produce further research questions. A natural extension is then to include measured genotypes, either targeting specific genes such as those involved in the metabolism of external compounds or more broad-based genome-wide approaches to derive polygenic scores of genetic susceptibility.
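For orientation, the variance decomposition used in the classical twin design can be written out as below. These are the standard textbook expectations for twins reared together, not a statement of the specific model fitted in this study; the symbols $P$ (phenotype) and $\sigma^2$ (variance components) are notation introduced here for illustration.

```latex
\begin{aligned}
\operatorname{Var}(P) &= \sigma^2_A + \sigma^2_D + \sigma^2_C + \sigma^2_E, \\
\operatorname{Cov}_{\mathrm{MZ}}(P_1, P_2) &= \sigma^2_A + \sigma^2_D + \sigma^2_C, \\
\operatorname{Cov}_{\mathrm{DZ}}(P_1, P_2) &= \tfrac{1}{2}\sigma^2_A + \tfrac{1}{4}\sigma^2_D + \sigma^2_C.
\end{aligned}
```

Because monozygotic and dizygotic pairs differ only in the coefficients on the genetic terms, comparing their covariances separates genetic from environmental sources of resemblance. With twin pairs alone, C and D cannot be estimated in the same model, which is why models such as ACE, ADE, and AE are compared.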

In this study, based on the FinnTwin12 cohort, we aim to (1) comprehensively and systematically determine exposures that are significantly associated with depressive symptoms and MDD in late adolescence and early adulthood through three ExWASes and (2) estimate to what extent the exposome and depressive symptoms share the same genetic and environmental risk factors.

Results

Characteristics of the study, participants, and exposures

Figure 1 shows the flowchart of the analysis pipeline, which consisted of three ExWASes and the following bivariate twin modeling. Per the FinnTwin12 cohort, there were 3,025, 1,236, and 4,127 individual twins included in three separate ExWASes with the outcomes of general behavior inventory (GBI) score in young adulthood (primary), the incidence of MDD in young adulthood, and GBI score at age 17, respectively. The characteristics of each ExWAS are shown in Table 1.

Fig. 1: Flowchart of the analysis pipeline.

Flowchart of the analysis pipeline demonstrating the path from the choice of exposures to ExWAS analysis and ending with bivariate twin modeling. The full path was used for depressive symptoms (GBI) at two ages. Only ExWAS was completed for MDD.

Table 1 Characteristics of ExWASes

For individual twins included in ExWASes of all outcomes (Table 2), the majority were female and from dizygotic (DZ) pairs, and their parental education levels were limited (less than high school). At age 17, 25.4% of individual twins reported being current smokers, and 82.6% were full-time students and not working. In young adulthood, 25.3% of individual twins reported that they were currently smoking, and 51.4% had a full-time job. The mean GBI scores at age 17 and in young adulthood were 5.0 (s.d.: 4.9) and 4.4 (s.d.: 4.7), respectively, and the correlation between the two measures was 0.49. The incidence of lifetime MDD in young adulthood was 12.3%.

Table 2 Characteristics of included twins according to the ExWAS

Exposures’ code names, description, and statistics based on twins included in the ExWAS of GBI in young adulthood (before imputation) are presented in Supplementary Table 1. There are 12 domains of exposures, colored in the following plots: air pollution, building, blue and green spaces, population density, geocoordinates, prenatal exposures, passive smoking, family and parents, friend and romantic relationships, school and teachers, stressful life events, and social indicators. In principal component analysis (PCA), the first principal component (PC1) accounted for only 10.93% and 10.66% of the total variability of all included exposures in young adulthood and at age 17, respectively (Extended Data Fig. 1). From the scatter plots of PC1 and PC2, we identified some potential clusters of exposures from the domains of building, blue and green spaces, and social indicators via visual assessment.

ExWAS of depressive symptoms and MDD in young adulthood

The adjusted coefficient and –log10(P value) of all exposures included for both adult outcomes are presented in Supplementary Table 2. There were 40 significant P values in 29 exposures, which were associated with log-transformed GBI score in young adulthood, identified from 385 exposures (Fig. 2a). There were 24, 2, and 3 exposures belonging to the domains of family and parents, friend and romantic relationships, and school and teachers, respectively. For the most protective exposure, compared with twins who felt their home environment was completely unfair, quite unfair, or somewhat unfair at age 17 (unfair_A17), twins who felt it was not at all unfair at age 17 were associated with a 0.40 lower log-transformed GBI score (95% confidence interval (CI): −0.50, −0.31) (Fig. 2b). For the most harmful exposure, compared with twins who were completely satisfied with their relationships with friends at age 14 (sat_friend_A14), twins who felt somewhat satisfied, mainly not satisfied, or not at all satisfied at age 14 were associated with a 0.42 higher log-transformed GBI score (95% CI: 0.29, 0.55) (Fig. 2b). By contrast, none of the exposures showed a significant association with MDD (Extended Data Fig. 2).

Fig. 2: Association results between exposure and log-transformed GBI score in young adulthood, adjusted for covariates (individual twin n = 3,025), using generalized linear regression.

a, Manhattan association plot for exposures in relation to log-transformed GBI score in young adulthood. The y axis shows statistical significance as –log10(P value) for the adjustment for multiple testing. b, Forest plot of the adjusted beta for significant exposures in descending order from top to bottom (from harmful to protective). The center dot and bar represent the effect size (coefficient of linear regression) and 95% CI, and the sizes of the dots represent the relative effect size. The color legend applies to both a (Manhattan association plot) and b (forest plot). The adjusted covariates were sex, zygosity, parental education, smoking in young adulthood, work status in young adulthood, secondary-level school in young adulthood, and age when twins provided the GBI assessment in young adulthood.

ExWAS of depressive symptoms at age 17

The adjusted coefficient and –log10(P value) for the age 17 outcome are presented in Supplementary Table 2. There were 71 significant P values in 46 exposures, which were significantly associated with log-transformed GBI score, identified from 286 exposures (Extended Data Fig. 3a). There were 32, 6, 4, and 4 exposures belonging to the domains of family and parents, friend and romantic relationships, school and teachers, and stressful life events, respectively. For the most harmful exposure, compared with twins who were completely satisfied with their success at work or studies at age 17 (sat_studywork_A17), twins who felt mainly not satisfied or not at all satisfied at age 17 were associated with a 0.65 higher log-transformed GBI score (95% CI: 0.55, 0.74) (Extended Data Fig. 3b). For the most protective exposure, the same as the result in young adulthood, compared with twins who felt their home environment was completely unfair, quite unfair, or somewhat unfair at age 17 (unfair_A17), twins who felt it was not at all unfair at age 17 were associated with a 0.50 lower log-transformed GBI score (95% CI: −0.57, −0.43) (Extended Data Fig. 3b). In total, 27 exposures were significantly associated with log-transformed GBI scores both in young adulthood and at age 17, and 22 of these exposures belong to the domain of family and parents.

Twin modeling of depressive symptoms with exposome scores

Before the bivariate modeling, the best-fit univariate AE model (which had the lowest Akaike information criterion compared with the ADE and E models) indicated that E explained 61% of the variance of depressive symptoms in males and 45% in females at age 17, and these proportions decreased slightly to 59% and 42%, respectively, in young adulthood (Supplementary Table 3). The exposome score was created by confirmatory factor analysis (CFA) based on the significant exposures from ExWASes. The standardized root mean square residuals of the models in young adulthood and at age 17 were 0.100 and 0.078, respectively, indicating acceptable model fit. MDD was not included in the CFA or the following twin modeling due to the smaller sample size and no significant exposure being identified. Then we used the exposome score to conduct bivariate twin modeling between the exposome score and depressive symptoms. Given the sex differences in the prevalence of depressive symptoms, the differences in heritability, and the fact that sex-limited bivariate models also indicated significant sex differences (Supplementary Table 4) at both age points, we ran the bivariate models separately for males and females.

Figure 3 and Supplementary Table 5 show the path coefficients for the model for exposome score and log-transformed GBI score in young adulthood (mean age: 23.9). Unique environmental factors accounted for 23% and 13% of the covariances in males and females, respectively, while additive genetic factors accounted for 77% in males and 87% in females. In males, standardized variances of Eexposome and EGBI were 0.32 (95% CI: 0.26, 0.39) and 0.51 (95% CI: 0.42, 0.62); the numbers reduced to 0.25 (95% CI: 0.21, 0.30) and 0.50 (95% CI: 0.42, 0.58) in females. The remaining share of variance was accounted for by additive genetic effects.

Fig. 3: Bivariate Cholesky AE model for the exposome score and log-transformed GBI score in young adulthood (twin pair n = 846).

A, standardized variance of additive genetic effect; E, standardized variance of unique environmental effect. The a and e stand for pathway coefficients from A and E, respectively, to both the exposome score and log-transformed GBI score. The 95% CIs of standardized variances and pathway coefficients are presented in Supplementary Table 4.

Extended Data Fig. 4 and Supplementary Table 5 show the path coefficients for the model for exposome score and log-transformed GBI score at age 17. Unique environmental factors accounted for 31% and 13% of the covariances in males and females, respectively. Additive genetic factors accounted for 69% in males and 87% in females. The standardized variances of Eexposome at age 17 are similar to Eexposome in young adulthood regardless of sex. The standardized variance of Eexposome is 0.26 (95% CI: 0.22, 0.30) and 0.22 (95% CI: 0.19, 0.25) and of EGBI is 0.64 (95% CI: 0.55, 0.73) and 0.44 (95% CI: 0.38, 0.50) in males and females, respectively. The remaining share of variance was accounted for by additive genetic effects.
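The covariance shares and standardized variances reported for the bivariate Cholesky AE models follow from the path coefficients by simple arithmetic. The sketch below illustrates that arithmetic in Python under stated assumptions: the function names and the path values are hypothetical and chosen only so that each trait has unit variance; they are not the estimates from Supplementary Table 5, and the naming a11, a21, a22, e11, e21, e22 is the conventional Cholesky labeling rather than something specified in the text.

```python
def covariance_shares(a11, a21, e11, e21):
    """Split the exposome-GBI covariance of a bivariate Cholesky AE model.

    a11, e11: paths from the first A and E factors to the exposome score.
    a21, e21: paths from those same factors to the GBI score.
    Returns (share due to A, share due to E). The shares are interpretable
    as proportions only when both contributions have the same sign.
    """
    cov_a = a11 * a21  # covariance carried by the shared genetic factor
    cov_e = e11 * e21  # covariance carried by the shared unique-environment factor
    total = cov_a + cov_e
    return cov_a / total, cov_e / total


def standardized_e_variance(a_paths, e_paths):
    """Proportion of a trait's variance explained by E in an AE model."""
    var_a = sum(p * p for p in a_paths)
    var_e = sum(p * p for p in e_paths)
    return var_e / (var_a + var_e)


# Hypothetical path coefficients (illustration only, not study estimates).
a11, e11 = 0.8, 0.6            # exposome score: 0.64 + 0.36 = 1
a21, a22 = 0.5, 0.5            # GBI: genetic paths
e21, e22 = 0.1, 0.7            # GBI: unique-environment paths

share_a, share_e = covariance_shares(a11, a21, e11, e21)
e_exposome = standardized_e_variance([a11], [e11])
e_gbi = standardized_e_variance([a21, a22], [e21, e22])

print(f"Covariance due to A: {share_a:.0%}, due to E: {share_e:.0%}")
print(f"Standardized E variance: exposome {e_exposome:.2f}, GBI {e_gbi:.2f}")
```

In an AE model the standardized A variance of each trait is one minus its standardized E variance, which corresponds to the statement that the remaining share of variance was accounted for by additive genetic effects.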

Post hoc mixed model repeated measures

On the basis of the longitudinal design and 27 significant exposures selected by both ExWASes of log-transformed GBI score, after adjusting for covariates and baseline effect, all the exposures were still significantly associated with log-transformed GBI score in young adulthood. The results are presented in Supplementary Table 6.

Discussion

Using data on depressive symptoms and diagnosed MDD from the FinnTwin12 study and a wide range of exposures from multiple sources, we applied a two-stage analysis to first screen the exposome and then estimate the environmental sources of correlation between the exposome and depressive symptoms via twin modeling. First, multiple exposures by self-report have been identified across domains of family and parents, friend and romantic relationships, school and teachers, and stressful life events, which were significantly associated with depressive symptoms in young adulthood and at age 17. By contrast, none of the exposures correlated with the incidence of MDD in young adulthood. Second, after generating an exposome score based on significantly associated exposures, the best-fitting bivariate AE models indicated that unique environmental effects accounted for a marked fraction of the covariance between the exposome score and depressive symptoms. This environmental fraction was higher in males than in females, suggesting a notable sex difference. Our result implies that environmental effects are more impactful compared with genetic effects in males than in females.

Influence from the familial component of the social exposome, especially from the familial atmosphere, was demonstrated by our evidence as having the most substantial impact on depressive symptoms in late adolescence and early adulthood and their trajectory. A large Chinese survey also found that familial factors such as cohesion, conflict, and control correlated with the occurrence of depressive symptoms among university students15. Other studies have revealed the connection of family triangulation (parent–child coalition and alliance) and satisfaction with depressive symptoms from childhood to late adolescence across countries16,17. Fairness (largest protective effect size of GBI at both age points), as a dimension of parentification, was demonstrated as a unique predictor of mental health symptoms18. These existing conventional investigations were consistent with ours, while our ExWAS more systematically evaluated a wide range of exposures and reduced the chance of type I error without any pre-identified hypothesis. Moreover, instead of traditional scales for assessing familial and interpersonal relationships, we treated each scale component as an ‘independent’ exposure in models, which helped us to identify new correlations, detect the relative importance, and prepare for further analysis of more intricate relationships between different components and depressive symptoms.

Results from bivariate twin modeling reveal a complex relationship among genes, environments, and depressive symptoms. Although the unique environmental factor explained a notable amount of the covariance between the exposome score and depressive symptoms, the additive genetic factor explained relatively more. Many significant exposures were chosen under the guidance of the exposome paradigm, but this does not necessarily imply a purely environmental effect. Many familial influences are considered ‘inheritable factors’ between generations to a certain extent, according to the intergenerational transmission theory. Such effects can be transmitted from parents to children through shared genes but also by shared environments. Early studies have found that life satisfaction or family violence from parents and families of origin had an important impact on the development of subsequent similar familial environments among offspring19,20. Moreover, we should consider the existence of gene–environment interaction (G×E), in which a genotype has different effects on disease risk in persons with different environmental exposures21. Choi et al.11 stratified the ExWAS by polygenic risk scores of major depression and found that some significant factors in the full sample became null in the genetically at-risk sample. Another study suggested multiple modulation pathways from exposure to DNA methylation through numerous tests, an approach regarded as G×E-WAS22. In addition, previous twin studies found geographic confounding in the assessment of A, C, and E variances, possibly attributable to differences in genetic ancestry. Results from the Netherlands Twin Register found that 1.8% of the variance in children’s height was captured by regional clustering23. In the Netherlands, PCA on genome-wide data showed strong genetic differentiation between the north and south, between the east and west, and between the middle band and the rest of the country24. In the Finnish population, a substantial difference in population structure is also observed between the eastern and western parts of the country25. In brief, hidden heritable and genetic factors critically influence the association between the exposome and the depressive phenotype through various mechanisms, which may have biased our findings toward weak associations.

Furthermore, exposures from the more external domains, particularly in the physical exposome, also showed, at most, weak connections with depressive symptoms. While it may be the case that the relative importance of the physical exposome is much less than that of the social and familial exposome with respect to depressive symptoms, there are possibly other explanations. First, a more complex structure of the exposome, such as the interaction or correlation between individual exposures and the external exposome, may exist. Some previous exposome analyses have indicated this26,27, but the ExWAS design cannot characterize it. For example, the social exposome partly explains the physical exposome, and the two cannot be completely separated. Building on our findings, we aim in future work to investigate these complicated effects on the depressive phenotype using more pluralistic approaches such as machine learning. Second, Finland has been ranked very high in the beneficial environmental effect on the child by UNICEF (United Nations Children’s Fund), providing environments with low air pollution, high greenness, safe water, and other constructive aspects relatively equally to most residents in childhood and adolescence28. This could explain the null results for external living environments, owing to a lack of individual variation in exposures. Another matter contributing to large familial effects is the overlap between interpersonal relationships and depressive symptoms. In a Swedish twin study among females, interpersonal relationships contributed between 18% and 31% of the variance for depressive symptoms29. Some personality disorders are tightly connected with interpersonal relationships; for example, the liability factors of borderline, avoidant, and paranoid personality disorders overlapped substantially with those of MDD within particular clusters among Norwegian young adults30. This overlap may have led to an overestimation of the importance of interpersonal relationships.

For social indicators, besides the critical period, various risk models such as accumulation or trajectory may exist, which may also explain the null results. Morrissey and Kinderman confirmed the hypothesis that accumulation of adverse financial hardship negatively affects mental health, but not the hypothesis of critical periods31, while our risk model is the ‘critical period’. Another study demonstrated the complicated effect among changes in racial composition, neighborhood socioeconomic status, and depressive symptoms32. The social indicators derived from Statistics Finland’s (stat.fi/tilastotieto) registers are at the postal code or municipality level, which leads to some concern about the inaccurate measurement of an individual’s exposure (information bias).

Several previous ExWAS studies linking the exposome to mental health had results that were partly similar to and partly different from ours. van de Weijer et al.10 identified several social indicators such as safety and income being linked to mental well-being, but the links were weak in our analysis. This may be due to using different outcomes, the older age in their samples, and different statistical methods between the two countries’ authorities10. Although the ExWAS of Choi et al. was on the general population in the United Kingdom, they also found that a higher frequency of visits with family/friends reduced the odds of depression incidence, and Mendelian randomization reinforced the causality of this association11. However, we do not have many variables in common with Choi et al.11, who included many lifestyle factors (specific external exposome), while we have more general external exposome variables. Another ExWAS on psychotic experiences identified many stressful life-event factors, a result that was similar to our study8. Despite the divergent findings, the accumulation of ExWAS findings from different countries, populations, and age groups enhances our understanding of growing concepts of the exposome on depression, as well as broad mental health. The inclusion of a large number of exposures about interpersonal and person–societal relationships is also an important addition to the existing evidence. Notably, some of the information was provided by the parents, not only the twins. Furthermore, some scientists have raised the concept of an ‘eco-exposome’ to thoroughly assess the internal exposome, including molecules affected by exogenous exposures33, which could be assimilated into further research.

The sex difference is notable. Our previous study found that male twins tend to stay together longer, implying more exposure to any familial impact34. In a Swedish study, family structure, conflict, and child disclosure of information to parents were associated with offending behavior in boys, while only one factor was salient in girls35. Another British study found that boys in detrimental familial environments were increasingly disadvantaged in school achievement compared with girls36. The evidence hints that males are more easily affected by the family environment, which could explain the higher contribution of E on the covariance between the exposome and depressive symptoms in males. This inference is not certain, and there is contrary evidence37. Moreover, sex differences exist in many biological mechanisms regarding how the body neurophysiologically reflects the external environment. Several sex-differentially expressed neurotransmitters or hormones, such as progesterone in females, are involved in systemic dysregulation, inducing depression38. Furthermore, environmental endocrine-disrupting chemicals are able to alter neurodevelopment with sex-specific effects at very early developmental stages39. In the future, integrating with the internal exposome such as metabolites and other omics will help us advance the study of sex-difference mechanisms on the relationship between the exposome and depressive phenotype.

As a part of the European Human Exposome Network, our overarching goal is to evaluate the impact of the exposome on human health across various age groups and with respect to multiple outcomes. The present analysis represents one individual analysis, and by pooling our collective efforts, important implications for clinical practice can be drawn in the future. Our findings suggest that studies on the familial component of social exposome should be noticed and investigated in the improvement of current therapy. It does not mean that we should ignore the physical exposure group, due to ubiquity, even though their relevance is not salient40. In addition, it is imperative to incorporate the consideration of familial effects and genetic liability at the same time for a more thorough understanding in future studies.

There are some other limitations in our study. First, compared with other ExWASes, our sample size is relatively small. Although Chung et al. indicated that a sample size between 1,795 and 3,625 participants is adequate when using the Bonferroni correction41, we did not stratify the ExWAS by sex due to the sample size being reduced by half. Second, we did not further assess the causality. Causal inferences are critical for further policymaking and intervention. Mendelian randomization in larger samples is a future direction. Third, the ExWAS, CFA, and twin modeling were all performed on the basis of the FinnTwin12 cohort, which raises concerns about model overfitting and leakage. Different models with different purposes, hypotheses, and methodologies in two stages reduce the risk of overfitting and leakage. ExWAS was used to identify salient exposure, while CFA and twin modeling were used to explore. The observational unit was each twin pair in twin modeling, while in ExWAS and CFA, it is each individual twin. Replication on other twin cohorts and in family datasets is warranted.

Conclusion

This study applied a two-stage analysis. First, in ExWAS, we identified that exposures from family and parents, friend and romantic relationships, school and teachers, and stressful life events were significantly associated with depressive symptoms in late adolescence and young adulthood. The family and parent exposures were the most influential. Second, twin modeling between the exposome and depressive symptoms uncovered a complex relationship among genes, environments, and depressive symptoms with sex differences. The findings underline the importance of systematic evaluation of the environmental effects on depressive symptoms and recommend the consideration of genetic effects in future studies.

Methods

Study participants

The participants came from the FinnTwin12 cohort, which is a nationwide prospective cohort among all Finnish twins born between 1983 and 1987. First, the overall epidemiological study consisted of all 5,184 twins who responded (age 11–12) at wave 1, and there were three general follow-up waves at ages 14, 17, and in young adulthood (mean age: 21.9). Moreover, 1,035 families with 2,070 twins were invited to take part in an intensive study with psychiatric interviews, some biological samples, and additional questionnaires42. At age 14 (wave 2), 1,854 twins participated. They were then invited to participate again as young adults (wave 4) of the study. Psychiatric interviews in young adulthood were completed for 1,347 twins in the intensive study, including assessment of MDD using the Semi-Structured Assessment for Genetics of Alcohol based on Diagnostic and Statistical Manual of Mental Disorders IV criteria43,44. The twins also completed questionnaires on health, health behaviors, work, and multiple psychological scales. The flowchart of the general FinnTwin12 cohort is presented in Extended Data Fig. 5. An updated review has been published45.

The ethics committee of the Department of Public Health of the University of Helsinki and the Institutional Review Board of Indiana University approved the FinnTwin12 study protocol from the start of the cohort. The ethical approval of the ethics committee of the Helsinki University Central Hospital District (HUS) is the most recent and covers the most recent data collection (wave 4) (HUS/2226/2021). The HUS reviews the study annually, and 2023’s statement is number 4/2023, dated 1 February 2023. All participants and their parents/legal guardians gave informed written consent to participate in the study. The authors assert that all procedures contributing to this work comply with the ethical standards of the relevant national and institutional committees on human experimentation and with the Helsinki Declaration of 1975, as revised in 2008.

Measures

The primary outcome is the short-version GBI scores in young adulthood. It is a self-reported inventory to evaluate the occurrence of depressive symptoms, which is composed of ten Likert-scale questions46. The total score ranges from 0 to 30, and a higher score implies more depressive symptoms occurred. There are two secondary outcomes: GBI scores at age 17 and incidence of MDD in young adulthood.

In total, we curated 385 environmental exposures under the concept of the Equal-Life project47 from multiple sources and grouped them into 12 domains. Air pollution exposures came from the annual average air quality of each observation station from the Finnish Meteorological Institute. Domains of building, blue and green spaces, population density, and a part of geocoordinates were from Equal-Life enrichment. Their description can be found in a previous study48 and is presented in Supplementary Note 1. Exposures from prenatal exposures, passive smoking, family and parents, friend and romantic relationships, school and teachers, and stressful life events domains were from FinnTwin12 questionnaires by self-report or parent report and are described in a published review45. Social indicators were from Statistics Finland and are described in Supplementary Note 1. Except for FinnTwin12 questionnaires, exposures from other sources were linked to individual twins via EUREF-FIN geocoordinates. The full residential history of the twins from birth onward until 2020 was obtained as geocoordinates and dates of moving in and out of specific addresses from the Digital and Population Data Services Agency in Finland34. The types of exposures are continuous, binary, and categorical. Considering the temporality, we included repeated exposures for the critical-period risk model, and Extended Data Fig. 6 presents the timeline of the study. There are three exposure inclusion criteria: (1) twins have available residential history, (2) twins and their family completed at least one questionnaire at any wave, and (3) the percentage of missing values is less than 20% in ExWAS. The code names of each exposure were developed from the description as closely as possible, and their domains, sources, and dates are presented in Supplementary Table 1. The missing patterns of each exposure in each ExWAS are presented in Supplementary Table 7.

For analysis of outcomes in young adulthood, we a priori identified seven covariates: sex (male, female), zygosity (monozygotic (MZ), DZ, unknown), parental education (limited, intermediate, high)49, smoking (never, former, occasional, current), work status (full-time, part-time, irregular, not working), secondary-level school (vocational, senior high school, none), and age. The latter four variables were reported by twins as young adults (wave 4). For analysis of the outcome at age 17, sex, zygosity, parental education, and smoking (reported at age 17) remained. Study and working status (neither study nor work, only study, only work) was included instead because most participants were in school at age 17 (wave 3). The inclusion of covariates, besides sex, zygosity, and age, was based on the previous literature, which shows correlations with the environment and depressive symptoms50,51,52. Parental education was adjusted for to represent the family resources and resilience49.

Data pre-processing and descriptive statistics

Participants missing information on the outcome or covariates were excluded from the analyses for the corresponding age. Because the GBI score was skewed, we added one to the score and log-transformed it. Categorical exposures were regrouped where appropriate, and we then used multivariate imputation by chained equations to replace missing exposure values. As a dimension-reduction technique, PCA was used to quantify the proportion of the total variability of all included exposures attributable to each PC and to visually assess potential clusters of correlated exposures in the two-dimensional space of the first and second components. PCA was conducted only for the outcomes of GBI at age 17 and in young adulthood, not for the incidence of MDD.
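The effect of the transformation can be sketched as follows. The scores are hypothetical, the natural logarithm is an assumption (the base is not stated here), and the moment-based skewness estimator is one of several in use.

```python
import math


def log_transform_gbi(score):
    """Return log(score + 1), which is defined for a score of zero."""
    return math.log1p(score)


def skewness(values):
    """Moment-based sample skewness: third central moment / variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


raw_scores = [0, 0, 1, 2, 3, 5, 8, 30]  # hypothetical right-skewed scores
log_scores = [log_transform_gbi(s) for s in raw_scores]

skew_before = skewness(raw_scores)
skew_after = skewness(log_scores)
```

For right-skewed data such as these, the skewness after transformation is smaller than before, which is the motivation for the transformation.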

Exposome-wide association study

To conduct the ExWAS, a generalized linear regression model with a Gaussian distribution (essentially linear regression) was fitted separately for each exposure, with the log-transformed GBI score as the outcome. We used Bonferroni correction by the number of effective tests (calculated by PCA) to adjust for multiple testing while accounting for correlation between exposures53. Models were adjusted for covariates, and the clustering of twins within families was controlled for using robust standard errors. For the outcome of the incidence of MDD, the distribution was changed to binomial. The number of included exposures was smaller for secondary outcomes because of the third exposure inclusion criterion, and the sample size varied; thus, the P-value thresholds varied. Because categorical exposures yield one P value per non-reference category, the number of P values was higher than the number of exposures. We used the rexposome package in the R environment (version 4.2.3)54.
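The one-exposure-at-a-time logic can be sketched in a few lines. This is a deliberately simplified illustration, not the rexposome implementation: it omits covariates and cluster-robust standard errors, uses a normal approximation for the P value, and runs on made-up data with hypothetical exposure names and a hypothetical number of effective tests.

```python
import math


def simple_regression_p(x, y):
    """Regress y on a single exposure x (with intercept); return (slope, p_value).

    The p-value uses a normal approximation to the t statistic.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    p_value = math.erfc(abs(slope / se) / math.sqrt(2))
    return slope, p_value


def exwas(exposures, outcome, alpha, n_effective_tests):
    """Return names of exposures whose p-value passes the Bonferroni threshold."""
    threshold = alpha / n_effective_tests
    hits = []
    for name, values in exposures.items():
        _, p_value = simple_regression_p(values, outcome)
        if p_value < threshold:
            hits.append(name)
    return hits


# Made-up outcome with a linear trend plus alternating noise.
outcome = [0.1 * i + 0.2 * (-1) ** i for i in range(20)]
exposures = {
    "exposure_a": list(range(20)),                   # tracks the trend
    "exposure_b": [(2 * i) % 5 for i in range(20)],  # cycles, weakly related
}
hits = exwas(exposures, outcome, alpha=0.05, n_effective_tests=2)
```

Dividing alpha by the number of effective tests rather than by the raw number of exposures is what makes the correction less conservative when exposures are correlated.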

We further calculated power using the R package WebPower (R environment, version 4.2.3) for the ExWASes of the log-transformed GBI score at both age points. These calculations were based on the smallest absolute effect size among the significant results (0.12 in young adulthood and 0.10 at age 17), the sample size (3,025 in young adulthood and 4,127 at age 17), the number of predictor variables in a single model (8 in young adulthood and 6 at age 17), and the significance thresholds (3.09 × 10−4 in young adulthood and 3.63 × 10−4 at age 17). The estimated power was 1 for the ExWASes both in young adulthood and at age 17, indicating adequate sample sizes in this study.
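The reported thresholds can be related back to the number of effective tests with simple arithmetic. The sketch below assumes a family-wise alpha of 0.05, which is not stated explicitly in this section, so the implied counts are illustrative rather than reported values.

```python
def bonferroni_threshold(alpha, n_effective_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_effective_tests


def implied_effective_tests(alpha, threshold):
    """Number of effective tests implied by a reported threshold."""
    return alpha / threshold


ALPHA = 0.05  # assumed family-wise error rate

m_young = implied_effective_tests(ALPHA, 3.09e-4)  # young adulthood
m_age17 = implied_effective_tests(ALPHA, 3.63e-4)  # age 17
```

Under this assumption the thresholds correspond to roughly 162 and 138 effective tests, respectively.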

Generating exposome score

Based on the significant exposures selected from the ExWAS, CFA was used to estimate an exposome score in preparation for the subsequent twin modeling. In line with the concept of the environment’s totality, we specified a one-factor structure for the exposome. CFA assumes that the correlations between exposures arise from the exposome score and tests this theory-driven assumption within a structural equation modeling framework. We used maximum likelihood to estimate the score and the standardized root mean square residual to evaluate the model fit55. The cluster effect was controlled for as before. Because categorical exposures comprise multiple subgroups, we included the whole exposure variable when at least one subgroup was significant compared with the reference in the ExWAS. The coefficients of the significant exposures are presented in Supplementary Tables 8 and 9 for the outcomes of GBI in young adulthood and at age 17, respectively. In addition, we conducted exploratory factor analysis (EFA) estimated by maximum likelihood with 100 optimizations; however, the large number of retained factors indicated potential overfitting of the EFA. The CFA and EFA were performed using Stata 18.0 (StataCorp) with the sem command.
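To make the fit index concrete, the following sketch computes a standardized root mean square residual for a one-factor model. It assumes standardized variables, so both matrices are correlation matrices, and it uses one common definition of the index that includes the diagonal; the loadings are hypothetical, and software packages may differ in detail.

```python
import math


def one_factor_implied_corr(loadings):
    """Correlation matrix implied by a standardized one-factor model."""
    p = len(loadings)
    return [[1.0 if i == j else loadings[i] * loadings[j] for j in range(p)]
            for i in range(p)]


def srmr(observed, implied):
    """Root mean square of residual correlations over the lower triangle,
    diagonal included (both inputs must be correlation matrices)."""
    p = len(observed)
    total = 0.0
    for i in range(p):
        for j in range(i + 1):
            total += (observed[i][j] - implied[i][j]) ** 2
    return math.sqrt(total / (p * (p + 1) / 2))


loadings = [0.8, 0.6, 0.5]  # hypothetical standardized loadings
implied = one_factor_implied_corr(loadings)

# Hypothetical observed matrix: identical except one correlation is 0.06 higher.
observed = [row[:] for row in implied]
observed[1][0] += 0.06
observed[0][1] += 0.06

fit = srmr(observed, implied)
```

A value of zero means the one-factor structure reproduces the observed correlations exactly; larger values indicate correlations that the single exposome factor does not explain.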

Twin modeling

In twin modeling, the genetic effect is usually divided into additive (A) and dominant (D) genetic effects56. Since MZ twins are roughly genetically identical and DZ twins share roughly half of their segregating genes, the correlation of A is set to 1.0 and 0.5 and that of D is set to 1.0 and 0.25 within MZ and DZ twin pairs, respectively. The epistatic effect is a part of A. The environmental effect is also divided into two components: the common environment (C), whose correlation is assumed to be 1.0 regardless of zygosity, and the unique environment (E; no correlation), which includes unmeasured errors. The twin model assumes the absence of assortative mating among the parents for the trait under study and equal effects of the environment by zygosity.
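These correlation settings translate directly into expected twin correlations. The sketch below, with hypothetical variance components, shows how the within-pair correlations follow from the A, D, C, and E components under the assumptions stated above.

```python
def expected_twin_correlations(a2, d2=0.0, c2=0.0, e2=0.0):
    """Expected MZ and DZ correlations from variance components.

    A is shared 1.0 (MZ) / 0.5 (DZ), D is shared 1.0 / 0.25,
    C is shared 1.0 / 1.0, and E is not shared.
    """
    total = a2 + d2 + c2 + e2
    r_mz = (a2 + d2 + c2) / total
    r_dz = (0.5 * a2 + 0.25 * d2 + c2) / total
    return r_mz, r_dz


# Hypothetical ADE components summing to 1.
r_mz, r_dz = expected_twin_correlations(a2=0.3, d2=0.2, e2=0.5)
```

With these values the MZ correlation is 0.5 and the DZ correlation is 0.2, so dominance pushes the MZ correlation above twice the DZ correlation.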

The intrapair correlations of GBI in DZ pairs (ρ = 0.22 in young adulthood and 0.16 at age 17) and MZ pairs (ρ = 0.52 in young adulthood and 0.51 at age 17) supported starting with an ADE model rather than an ACE model (ρMZ > 2ρDZ). Because we used only the twin pair design, rather than the extended family design, we could not fit an ACDE model. The saturated twin model was fitted to test the assumptions of equal means and variances across twin order and zygosity, by constraining means and variances, and to detect sex differences via sex limitation. In the saturated model (Supplementary Table 10), the Akaike information criterion and the likelihood ratio test between models suggested that the assumptions were largely met. Results of the sex-limitation saturated model (Supplementary Table 10) indicated a notable sex difference.
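The model choice can be reproduced from the reported correlations with a short sketch. The moment-based component estimates below are rough values derived by solving the expected-correlation equations; they are our illustration and are not the maximum likelihood estimates reported in the study.

```python
def suggest_model(r_mz, r_dz):
    """ADE if the MZ correlation exceeds twice the DZ correlation, else ACE."""
    return "ADE" if r_mz > 2 * r_dz else "ACE"


def ade_moment_estimates(r_mz, r_dz):
    """Solve r_mz = a2 + d2 and r_dz = 0.5*a2 + 0.25*d2 for standardized a2, d2, e2."""
    a2 = 4 * r_dz - r_mz
    d2 = 2 * r_mz - 4 * r_dz
    e2 = 1 - r_mz
    return a2, d2, e2


model_young = suggest_model(0.52, 0.22)  # young adulthood
model_age17 = suggest_model(0.51, 0.16)  # age 17
a2, d2, e2 = ade_moment_estimates(0.52, 0.22)
```

Both age points satisfy ρMZ > 2ρDZ (0.52 > 0.44 and 0.51 > 0.32), which is the pattern expected when dominance rather than a common environment contributes to twin resemblance.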

Finally, to assess how much of the variance of depressive symptoms the current exposome score explains, we fitted a bivariate Cholesky AE model to the exposome score and the log-transformed GBI score (Extended Data Fig. 7) at both age points; this model decomposes the phenotypic correlation and gives the proportion (%) attributable to genetic and environmental factors57. Two latent factors (Aexposome and Eexposome) influence both the exposome score (a11 and e11) and the log-transformed GBI score (a21 and e21), and another two latent factors (AGBI and EGBI) influence only the log-transformed GBI score (a22 and e22). The overall correlation between the exposome score and GBI could be calculated as \({a}_{11}\times {a}_{21}+{e}_{11}\times {e}_{21}\). The variances attributable to Aexposome, Eexposome, AGBI, and EGBI were calculated as \({a}_{11}^{2}+{a}_{21}^{2}\), \({e}_{11}^{2}+{e}_{21}^{2}\), \({a}_{22}^{2}\), and \({e}_{22}^{2}\), respectively. We also re-assessed the sex difference via an additional sex-limited saturated bivariate twin model.
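The decomposition of the phenotypic correlation can be sketched from the path coefficients. The values below are hypothetical and chosen so that both phenotypes have unit variance; the function name and output keys are our own, and the phenotypic variances follow the standard identities for a Cholesky AE model.

```python
import math


def cholesky_ae_summary(a11, a21, a22, e11, e21, e22):
    """Summarize a bivariate Cholesky AE model from its path coefficients."""
    var_exposome = a11 ** 2 + e11 ** 2
    var_gbi = a21 ** 2 + a22 ** 2 + e21 ** 2 + e22 ** 2
    cov_genetic = a11 * a21        # covariance transmitted via Aexposome
    cov_environment = e11 * e21    # covariance transmitted via Eexposome
    cov_total = cov_genetic + cov_environment
    return {
        "phenotypic_correlation": cov_total / math.sqrt(var_exposome * var_gbi),
        "genetic_share": cov_genetic / cov_total,
        "environmental_share": cov_environment / cov_total,
    }


# Hypothetical standardized paths (each phenotype has variance 1).
summary = cholesky_ae_summary(
    a11=0.6, a21=0.3, a22=0.5,
    e11=0.8, e21=0.2, e22=math.sqrt(0.62),
)
```

Here the phenotypic correlation is 0.34, of which about 53% runs through the shared genetic factor and 47% through the shared unique environmental factor. The shares are interpretable as percentages only when both contributions have the same sign.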

Only full MZ and DZ twin pairs were included in the twin modeling. We dropped the opposite-sex DZ pairs and stratified the univariate and bivariate twin models by sex. The characteristics of included and excluded individual twins in the twin modeling are presented in Supplementary Table 11, and we did not observe a large difference, suggesting low selection bias risk due to sex, zygosity, and twin pair. Age, reported in the young adulthood survey, was adjusted in univariate and bivariate models for the outcome in young adulthood. We used the OpenMx package in the R environment (version 4.2.3) 58.

Post hoc mixed models for repeated measures

On the basis of the exposures significantly associated with GBI at both time points, we fitted mixed models for repeated measures (MMRM) as a post hoc analysis to further explore effects on the trajectory of depressive symptoms. This method models the log-transformed GBI in young adulthood as a function of both the exposures of interest (fixed effects) and the ‘baseline’ log-transformed GBI at age 17 (random effect)59. The sample size and covariates of the MMRM were the same as in the ExWAS of the log-transformed GBI score in young adulthood. The cluster effect was controlled for using robust standard errors. Multiple testing was controlled using the false discovery rate (a Q value < 0.05 was considered statistically significant). These post hoc analyses were performed using Stata 18.0 (StataCorp).
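The false discovery rate step can be illustrated with a minimal sketch. It assumes the Benjamini-Hochberg procedure, which is the most common way to obtain Q values but is not named explicitly here, and it uses made-up P values.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (Q values) in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone Q values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


p_values = [0.01, 0.04, 0.03, 0.20]  # hypothetical P values
q_values = benjamini_hochberg(p_values)
significant = [q < 0.05 for q in q_values]
```

In this toy example only the first test remains significant at Q < 0.05, even though three of the raw P values are below 0.05.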

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.