The effect of environment on depressive symptoms in late adolescence and early adulthood: an exposome-wide association study and twin modeling

The exposome represents the totality of environmental effects, but systematic evaluation between it and depressive symptoms is scant. Here we sought to comprehensively identify the association of the exposome with depressive symptoms in late adolescence and early adulthood and determine genetic and environmental covariances between them. Based on the FinnTwin12 cohort (3,025 participants in young adulthood and 4,127 at age 17), the exposome-wide association study (ExWAS) design was used to identify significant exposures from 12 domains. Bivariate Cholesky twin models were fitted to an exposome score and depressive symptoms. In ExWASes, 29 and 46 exposures were significantly associated with depressive symptoms in young adulthood and at age 17, respectively, and familial exposures were the most influential. Twin models indicated considerable genetic and environmental covariances between the exposome score and depressive symptoms with sex differences. The findings underscore the systematic approach of the exposome and the consideration of relevant genetic effects. In this exposome-wide association study using the FinnTwin12 cohort, Wang et al. show that familial component of social exposure has a significant association with depressive symptoms in late adolescence and early adulthood.

Depressive symptoms are a type of chronic mental health condition with complex etiology, and major depressive disorder (MDD) is the clinical disorder diagnosed when depressive symptoms reach a threshold of severity and duration. Depressive symptoms and MDD lead to a serious public health burden. The updated Global Burden of Diseases study showed that the age-standardized prevalence of MDD was 4% (3,951 per 100,000 people) in Western Europe, higher than the global level, and underlined the heavy burden on people aged between 15 and 24 (ref. 1). Among adolescents, a 2021 systematic review indicated that the pooled prevalence of self-reported depressive symptoms was 34% and of MDD was 5% in studies conducted between 2001 and 2020, and that the prevalence is increasing 2. The COVID-19 pandemic exacerbated the already growing trend of hardship. Given a growing body of evidence on the environmental effect on depressive symptoms and MDD 3,4, more systematic investigation is urgently needed, especially among youth.
The concept of the 'exposome', which depicts the dynamic totality of the environment that an individual experiences, was raised in 2005 5. The exposome is divided into three parts (specific external, general external, and internal exposomes), and the external exposome could be further subdivided into the familial, social, built exposome, and so on. Instead of studying a single or small group of exposures, an exposome study aims to investigate the overall effect of the environment while, unavoidably, complexities such as interaction or ubiquity increase the difficulty 6. An exposome-(environmental, exposure) wide association study (ExWAS), like other 'WAS' studies, denotes an agnostic and systematic method for hypothesis generating, which is comparatively appropriate to the exposome's spatiotemporal variabilities and multilevel structure 7. Several ExWAS studies have targeted mental health 8–10, and Choi et al. 11 used clinical incident depression as the outcome and identified multiple modifiable factors. As depressive symptoms are an early warning sign of MDD, focusing on them in adolescence or young adulthood could make it easier to guide translational intervention as early as possible, which would be more cost effective.
Despite the benefits of the exposome approach, there are some other hindrances. First, under the current technique, we cannot measure every possible exposure (far from reaching '1-genome'), and the exposome keeps updating, expanding, and enriching. Moreover, some studies have emphasized exposures' non-genetic properties, which ignores how the environment interacts with genetics through multiple mechanisms among many traits, including depression 12,13. Medda and colleagues, on the basis of the Italian Twin Registry, demonstrated the substantial genetic role in exogenous metallomics, where the estimations of standardized genetic variance, as a proportion of total variance of the measured exposures, ranged from 0.15 (arsenic) to 0.79 (zinc) 14. As a natural experiment, twin and family studies provide a method to evaluate genetic and environmental relationships between traits and exposures. This design decomposes the variance of traits into additive genetic (A), dominant genetic (D), common environmental (C), and unique environmental (E) components, which contain the distinct features of the exposome as the overall environmental effect. Such indirect evidence of genetic effects based on genetic relationships of family members is an efficient way to demonstrate the presence (or lack of) genetic effects. Thus, the combination of exposome and twin studies could advance our knowledge of the complexities between genes and environments, improve our understanding of existing deficiencies in exposome measures, and produce further research questions. A natural extension is then to include measured genotypes, either targeting specific genes such as those involved in the metabolism of external compounds or more broad-based genome-wide approaches to derive polygenic scores of genetic susceptibility.
In this study, based on the FinnTwin12 cohort, we aim to (1) comprehensively and systematically determine exposures that are significantly associated with depressive symptoms and MDD in late adolescence and early adulthood through three ExWASes and (2) estimate to what extent the exposome and depressive symptoms share the same genetic and environmental risk factors.
Article https://doi.org/10.1038/s44220-023-00124-x

Characteristics of the study, participants, and exposures
Figure 1 shows the flowchart of the analysis pipeline, which consisted of three ExWASes and subsequent bivariate twin modeling. From the FinnTwin12 cohort, there were 3,025, 1,236, and 4,127 individual twins included in three separate ExWASes with the outcomes of general behavior inventory (GBI) score in young adulthood (primary), the incidence of MDD in young adulthood, and GBI score at age 17, respectively. The characteristics of each ExWAS are shown in Table 1.
For individual twins included in ExWASes of all outcomes (Table 2), the majority were female and from dizygotic (DZ) pairs, and their parental education levels were limited (less than high school). At age 17, 25.4% of individual twins reported being current smokers, and 82.6% were full-time students and not working. In young adulthood, 25.3% of individual twins reported that they were currently smoking, and 51.4% had a full-time job. The mean GBI scores at age 17 and in young adulthood were 5.0 (s.d.: 4.9) and 4.4 (s.d.: 4.7), respectively.
Fig. 1: Flowchart of the analysis pipeline demonstrating the path from the choice of exposures to ExWAS analysis and ending with bivariate twin modeling. The full path was used for depressive symptoms (GBI) at two ages. Only ExWAS was completed for MDD.
Exposures' code names, description, and statistics based on twins included in the ExWAS of GBI in young adulthood (before imputation) are presented in Supplementary Table 1. There are 12 domains of exposures, colored in the following plots: air pollution, building, blue and green spaces, population density, geocoordinates, prenatal exposures, passive smoking, family and parents, friend and romantic relationships, school and teachers, stressful life events, and social indicators. In principal component analysis (PCA), the first principal component (PC1) accounted for only 10.93% and 10.66% of the total variability of all included exposures in young adulthood and at age 17, respectively (Extended Data Fig. 1). From the scatter plots of PC1 and PC2, we visually identified some potential clusters of exposures from the domains of building, blue and green spaces, and social indicators.

ExWAS of depressive symptoms and MDD in young adulthood
The adjusted coefficient and −log10(P value) of all exposures included for both adult outcomes are presented in Supplementary Table 2. There were 40 significant P values in 29 exposures associated with log-transformed GBI score in young adulthood, identified from 385 exposures (Fig. 2a). There were 24, 2, and 3 exposures belonging to the domains of family and parents, friend and romantic relationships, and school and teachers, respectively. For the most protective exposure, compared with twins who felt their home environment was completely unfair, quite unfair, or somewhat unfair at age 17 (unfair_A17), twins who felt it was not at all unfair at age 17 had a 0.40 lower log-transformed GBI score (95% confidence interval (CI): −0.50, −0.31) (Fig. 2b). For the most harmful exposure, compared with twins who were completely satisfied with their relationships with friends at age 14 (sat_friend_A14), twins who felt somewhat satisfied, mainly not satisfied, or not at all satisfied at age 14 had a 0.42 higher log-transformed GBI score (95% CI: 0.29, 0.55) (Fig. 2b). By contrast, none of the exposures showed a significant association with MDD (Extended Data Fig. 2).

ExWAS of depressive symptoms at age 17
The adjusted coefficient and −log10(P value) for the age 17 outcome are presented in Supplementary Table 2. There were 71 significant P values in 46 exposures significantly associated with log-transformed GBI score, identified from 286 exposures (Extended Data Fig. 3a). There were 32, 6, 4, and 4 exposures belonging to the domains of family and parents, friend and romantic relationships, school and teachers, and stressful life events, respectively. For the most harmful exposure, compared with twins who were completely satisfied with their success at work or studies at age 17 (sat_studywork_A17), twins who felt mainly not satisfied or not at all satisfied at age 17 had a 0.65 higher log-transformed GBI score (95% CI: 0.55, 0.74) (Extended Data Fig. 3b). The most protective exposure was the same as in young adulthood: compared with twins who felt their home environment was completely unfair, quite unfair, or somewhat unfair at age 17 (unfair_A17), twins who felt it was not at all unfair at age 17 had a 0.50 lower log-transformed GBI score (95% CI: −0.57, −0.43) (Extended Data Fig. 3b). There were 27 exposures significantly associated with log-transformed GBI scores both in young adulthood and at age 17, and 22 of them belong to the domain of family and parents.

Twin modeling of depressive symptoms with exposome scores
Before the bivariate modeling, the best-fit univariate AE model (which had the lowest Akaike information criterion compared with the ADE and E models) indicated that E explained 61% of the variance of depressive symptoms in males and 45% in females at age 17, and the numbers slightly reduced to 59% and 42%, respectively, in young adulthood (Supplementary Table 3). The exposome score was created by confirmatory factor analysis (CFA) based on the significant exposures from ExWASes. The standardized root mean square residuals of the models in young adulthood and at age 17 were 0.100 and 0.078, respectively, indicating acceptable model fit. MDD was not included in the CFA or the following twin modeling owing to the smaller sample size and because no significant exposure was identified. We then used the exposome score to conduct bivariate twin modeling with depressive symptoms. Given the sex differences in the prevalence of depressive symptoms, the differences in heritability, and the fact that sex-limited bivariate models also indicated significant sex differences (Supplementary Table 4) at both age points, we ran the bivariate models separately for males and females.
Figure 3 and Supplementary Table 5 show the path coefficients for the model for exposome score and log-transformed GBI score in young adulthood (mean age: 23.9). Unique environmental factors accounted for 23% and 13% of the covariances in males and females, respectively, while additive genetic factors accounted for 77% in males and 87% in females. In males, standardized variances of E_exposome and E_GBI were 0.32 (95% CI: 0.26, 0.39) and 0.51 (95% CI: 0.42, 0.62); the numbers reduced to 0.25 (95% CI: 0.21, 0.30) and 0.50 (95% CI: 0.42, 0.58) in females. The remaining share of variance was accounted for by additive genetic effects.
Extended Data Fig. 4 and Supplementary Table 5 show the path coefficients for the model for exposome score and log-transformed GBI score at age 17. Unique environmental factors accounted for 31% and 13% of the covariances in males and females, respectively. Additive genetic factors accounted for 69% in males and 87% in females. The standardized variances of E_exposome at age 17 were similar to those in young adulthood regardless of sex. In males and females, respectively, the standardized variance of E_exposome was 0.26 (95% CI: 0.22, 0.30) and 0.22 (95% CI: 0.19, 0.25), and that of E_GBI was 0.64 (95% CI: 0.55, 0.73) and 0.44 (95% CI: 0.38, 0.50). The remaining share of variance was accounted for by additive genetic effects.
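As a reading aid, the covariance shares reported above can be related to the path coefficients of a bivariate Cholesky AE model. The notation below is a conventional labeling assumed here for illustration (a_11, a_21, a_22 for additive genetic paths and e_11, e_21, e_22 for unique environmental paths, with the exposome score ordered first); it is not taken from the paper's tables.

```latex
\begin{aligned}
\operatorname{Var}(\text{exposome}) &= a_{11}^{2} + e_{11}^{2},\\
\operatorname{Var}(\text{GBI}) &= a_{21}^{2} + a_{22}^{2} + e_{21}^{2} + e_{22}^{2},\\
\operatorname{Cov}(\text{exposome},\text{GBI}) &= a_{11}a_{21} + e_{11}e_{21},\\
\text{share}_{A} &= \frac{a_{11}a_{21}}{a_{11}a_{21} + e_{11}e_{21}},
\qquad
\text{share}_{E} = \frac{e_{11}e_{21}}{a_{11}a_{21} + e_{11}e_{21}}.
\end{aligned}
```

Under this decomposition the two shares sum to one, which matches the reported pairs (for example, 23% and 77% in males in young adulthood).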

Post hoc mixed model repeated measures
On the basis of the longitudinal design and the 27 significant exposures selected by both ExWASes of log-transformed GBI score, after adjusting for covariates and baseline effect, all the exposures were still significantly associated with log-transformed GBI score in young adulthood. The results are presented in Supplementary Table 6.

Discussion
Using data on depressive symptoms and diagnosed MDD from the FinnTwin12 study and a wide range of exposures from multiple sources, we applied a two-stage analysis to first screen the exposome and then estimate the environmental sources of correlation between the exposome and depressive symptoms via twin modeling. First, multiple self-reported exposures were identified across the domains of family and parents, friend and romantic relationships, school and teachers, and stressful life events that were significantly associated with depressive symptoms in young adulthood and at age 17. By contrast, none of the exposures correlated with the incidence of MDD in young adulthood. Second, after generating an exposome score based on significantly associated exposures, the best-fitting bivariate AE models indicated that unique environmental effects accounted for a marked fraction of the covariance between the exposome score and depressive symptoms. This environmental fraction was higher in males than in females, suggesting a notable sex difference. Our result implies that, relative to genetic effects, environmental effects are more impactful in males than in females.
Our evidence indicates that the familial component of the social exposome, especially the familial atmosphere, had the most substantial impact on depressive symptoms in late adolescence and early adulthood and on their trajectory. A large Chinese survey also found that familial factors such as cohesion, conflict, and control correlated with the occurrence of depressive symptoms among university students 15. Other studies have revealed the connection of family triangulation (parent-child coalition and alliance) and satisfaction with depressive symptoms from childhood to late adolescence across countries 16,17. Fairness (largest protective effect size of GBI at both age points), as a dimension of parentification, was demonstrated as a unique predictor of mental health symptoms 18. These existing conventional investigations were consistent with ours, while our ExWAS more systematically evaluated a wide range of exposures and reduced the chance of type I error without any pre-identified hypothesis. Moreover, instead of traditional scales for assessing familial and interpersonal relationships, we treated each scale component as an 'independent' exposure in models, which helped us to identify new correlations, detect the relative importance, and prepare for further analysis of more intricate relationships between different components and depressive symptoms.
Results from bivariate twin modeling reveal a complex relationship among genes, environments, and depressive symptoms. Although the unique environmental factor explains a notable amount of covariance between exposome score and depressive symptoms, the additive genetic factor explained relatively more. Many significant exposures were chosen under the guidance of the exposome paradigm, but this does not necessarily imply a pure environmental effect. Many familial influences are considered 'inheritable factors' between generations to a certain extent, according to the intergenerational transmission theory. Such effects can be transmitted from parents to children through shared genes but also by shared environments. Early studies have found that life satisfaction or family violence from parents and origin families had an important impact on the development of subsequent similar familial environments among offspring 19,20. Moreover, we should consider the existence of the gene-environment interaction (G×E), which suggests the different effects of a genotype on disease risk in persons with different environmental exposures 21. Choi et al. 11 stratified the ExWAS by polygenic risk scores of major depression and found that some significant factors in the full sample became null in the genetically at-risk sample. Another study suggested the multiple modulation pathways by exposure to DNA methylation, through numerous testing, regarded as the G×E-WAS 22. In addition, previous twin studies found geographic confounding in the assessment of A, C, and E variances, possibly attributable to differences in genetic ancestry. Results from the Netherlands Twin Register found that 1.8% of the variance in children's height was captured by regional clustering 23. In the Netherlands, PCA on genome-wide data showed strong genetic differentiations between the north and south, between the east and west, and between the middle band and the rest of the country 24. In the Finnish population, a substantial difference in population structure is also observed between the eastern and western parts of the country 25. In brief, the hidden heritable and genetic factors critically influence the association between the exposome and depressive phenotype through various mechanisms, which potentially leads to a propensity toward weak associations in our findings.
Furthermore, exposures from the more external domains, particularly in the physical exposome, also showed, at most, weak connections with depressive symptoms. While it may be the case that the relative importance of the physical exposome is much less than that of the social and familial exposome with respect to depressive symptoms, there are possibly other explanations. First, a more complex structure of the exposome, such as the interaction or correlation between individual exposures and external exposome, may exist. Some previous exposome analyses have indicated this 26,27, but the ExWAS design cannot characterize it. For example, the social exposome partly explains the physical exposome, and the two cannot be completely separated. In the future, we aim to build on our findings by investigating the complicated effects on the depressive phenotype with more pluralistic approaches such as machine learning. Second, Finland has been ranked very high in the beneficial environmental effect on the child by UNICEF (United Nations Children's Fund), providing environments with low air pollution, high greenness, safe water, and other constructive aspects relatively equally to most residents in childhood and adolescence 28. This could explain the null results for external living environments, owing to a lack of individual variation in exposures. Another matter contributing to large familial effects is the overlap between interpersonal relationships and depressive symptoms. In a Swedish twin study among females, interpersonal relationships contributed between 18% and 31% of the variance for depressive symptoms 29. Some personality disorders are tightly connected with interpersonal relationships; for example, borderline, avoidant, and paranoid personality disorders' liability factors overlapped substantially with MDD's, in particular clusters, among Norwegian young adults 30. This overlap may have led to an overestimation of the importance of interpersonal relationships. For social indicators, besides the critical period, various risk models such as accumulation or trajectory may exist, which may also explain the null results. Morrissey and Kinderman confirmed the hypothesis that accumulation of adverse financial hardship negatively affects mental health, but not the hypothesis of critical periods 31, while our risk model is the 'critical period'. Another study demonstrated the complicated effect among changes in racial composition, neighborhood socioeconomic status, and depressive symptoms 32. The social indicators derived from Statistics Finland's (stat.fi/tilastotieto) registers are at the postal code or municipality level, which leads to some concern about the inaccurate measurement of an individual's exposure (information bias).
Several previous ExWAS studies linking the exposome to mental health had some similar or heterogeneous results to ours. van de Weijer et al. 10 identified several social indicators such as safety and income being linked to mental well-being, but the links were weak in our analysis. This may be due to using different outcomes, the older age in their samples, and different statistical methods between the two countries' authorities 10. Although the ExWAS of Choi et al. was on the general population in the United Kingdom, they also found that a higher frequency of visits with family/friends reduced the odds of depression incidence, and Mendelian randomization reinforced the causality of this association 11. However, we have few variables in common with Choi et al. 11, who included many lifestyle factors (specific external exposome), whereas we have more general external exposome variables. Another ExWAS on psychotic experiences identified many stressful life-event factors, a result that was similar to our study 8. Despite the divergent findings, the accumulation of ExWAS findings from different countries, populations, and age groups enhances our understanding of growing concepts of the exposome on depression, as well as broad mental health. The inclusion of a large number of exposures about interpersonal and person-societal relationships is also an important addition to the existing evidence. Notably, some of the information was provided by the parents, not only the twins. Furthermore, some scientists have raised the concept of an 'eco-exposome' to thoroughly assess the internal exposome, including molecules affected by exogenous exposures 33, which could be assimilated into further research.
The sex difference is notable. Our previous study found that male twins tend to stay together longer, implying more exposure to any familial impact 34. In a Swedish study, family structure, conflict, and child disclosure of information to parents were associated with offending behavior in boys, while only one factor was salient in girls 35. Another British study found that boys in detrimental familial environments were increasingly disadvantaged in school achievement compared with girls 36. The evidence hints that males are more easily affected by the family environment, which could explain the higher contribution of E to the covariance between the exposome and depressive symptoms in males. This inference is not certain, and there is contrary evidence 37. Moreover, sex differences exist in many biological mechanisms regarding how the body neurophysiologically reflects the external environment. Several sex-differentially expressed neurotransmitters or hormones, such as progesterone in females, are involved in systemic dysregulation, inducing depression 38. Furthermore, environmental endocrine-disrupting chemicals are able to alter neurodevelopment with sex-specific effects at very early developmental stages 39. In the future, integrating the internal exposome, such as metabolites and other omics, will help us advance the study of sex-difference mechanisms in the relationship between the exposome and depressive phenotype. As a part of the European Human Exposome Network, our overarching goal is to evaluate the impact of the exposome on human health across various age groups and with respect to multiple outcomes. The present analysis represents one individual analysis, and by pooling our collective efforts, important implications for clinical practice can be drawn in the future. Our findings suggest that the familial component of the social exposome deserves attention and investigation in the improvement of current therapy. This does not mean that we should ignore the physical exposures, given their ubiquity, even though their relevance is not salient 40. In addition, it is imperative to incorporate the consideration of familial effects and genetic liability at the same time for a more thorough understanding in future studies.
There are some other limitations in our study. First, compared with other ExWASes, our sample size is relatively small. Although Chung et al. indicated that a sample size between 1,795 and 3,625 participants is adequate when using the Bonferroni correction 41, we did not stratify the ExWAS by sex because doing so would halve the sample size. Second, we did not further assess the causality. Causal inferences are critical for further policymaking and intervention. Mendelian randomization in larger samples is a future direction. Third, the ExWAS, CFA, and twin modeling were all performed on the basis of the FinnTwin12 cohort, which raises concerns about model overfitting and leakage. Using different models with different purposes, hypotheses, and methodologies in the two stages reduces the risk of overfitting and leakage. ExWAS was used to identify salient exposures, while CFA and twin modeling were used to explore. The observational unit was each twin pair in twin modeling, while in ExWAS and CFA, it was each individual twin. Replication in other twin cohorts and in family datasets is warranted.

Conclusion
This study applied a two-stage analysis. First, in ExWAS, we identified that exposures from family and parents, friend and romantic relationships, school and teachers, and stressful life events were significantly associated with depressive symptoms in late adolescence and young adulthood. The family and parent exposures were the most influential. Second, twin modeling between the exposome and depressive symptoms uncovered a complex relationship among genes, environments, and depressive symptoms with sex differences. The findings underline the importance of systematic evaluation of the environmental effects on depressive symptoms and recommend the consideration of genetic effects in future studies.
[Figure caption, truncated in source] ... A and E, respectively, to both the exposome score and log-transformed GBI score. The 95% CIs of standardized variances and pathway coefficients are presented in Supplementary Table 4.

Study participants
The participants came from the FinnTwin12 cohort, which is a nationwide prospective cohort among all Finnish twins born between 1983 and 1987. First, the overall epidemiological study consisted of all 5,184 twins who responded (age 11-12) at wave 1, and there were three general follow-up waves at ages 14, 17, and in young adulthood (mean age: 21.9). Moreover, 1,035 families with 2,070 twins were invited to take part in an intensive study with psychiatric interviews, some biological samples, and additional questionnaires 42. At age 14 (wave 2), 1,854 twins participated. They were then invited to participate again as young adults (wave 4) of the study. Psychiatric interviews in young adulthood were completed for 1,347 twins in the intensive study, including assessment of MDD using the Semi-Structured Assessment for Genetics of Alcohol based on Diagnostic and Statistical Manual of Mental Disorders IV criteria 43,44. The twins also completed questionnaires on health, health behaviors, work, and multiple psychological scales. The flowchart of the general FinnTwin12 cohort is presented in Extended Data Fig. 5. An updated review has been published 45.

Measures
The primary outcome is the short-version GBI score in young adulthood. It is a self-reported inventory to evaluate the occurrence of depressive symptoms, which is composed of ten Likert-scale questions 46. The total score ranges from 0 to 30, and a higher score implies more depressive symptoms occurred. There are two secondary outcomes: GBI scores at age 17 and incidence of MDD in young adulthood.
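The scoring described above can be sketched as a simple sum. This is a minimal illustration only: the function name `gbi_total` is hypothetical, and the 0-3 coding of each item is an assumption inferred from ten items and a 0-30 total, not a coding stated in the text.

```python
def gbi_total(items):
    """Sum ten item responses into a total depressive-symptom score.

    Assumption: each Likert item is coded 0, 1, 2, or 3, which is the
    coding that yields the stated 0-30 range for ten items.
    """
    items = list(items)
    if len(items) != 10:
        raise ValueError("expected ten item responses")
    if any(i not in (0, 1, 2, 3) for i in items):
        raise ValueError("each response must be 0, 1, 2, or 3")
    return sum(items)


# Illustrative responses only; not data from the study.
example_total = gbi_total([0, 1, 2, 3, 0, 1, 2, 3, 1, 1])
```

A higher total then corresponds to more reported depressive symptoms, as in the description above.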
In total, we curated 385 environmental exposures under the concept of the Equal-Life project 47 from multiple sources and grouped them into 12 domains. Air pollution exposures came from the annual average air quality of each observation station from the Finnish Meteorological Institute. Domains of building, blue and green spaces, population density, and a part of geocoordinates were from Equal-Life enrichment. Their description can be found in a previous study 48 and is presented in Supplementary Note 1. Exposures from the prenatal exposures, passive smoking, family and parents, friend and romantic relationships, school and teachers, and stressful life events domains were from FinnTwin12 questionnaires by self-report or parent report and are described in a published review 45. Social indicators were from Statistics Finland and are described in Supplementary Note 1. Except for FinnTwin12 questionnaires, exposures from other sources were linked to individual twins via EUREF-FIN geocoordinates. The full residential history of the twins from birth onward until 2020 was obtained as geocoordinates and dates of moving in and out of specific addresses from the Digital and Population Data Services Agency in Finland 34. The types of exposures are continuous, binary, and categorical. Considering the temporality, we included repeated exposures for the critical-period risk model, and Extended Data Fig. 6 presents the timeline of the study. There are three exposure inclusion criteria: (1) twins have available residential history, (2) twins and their family completed at least one questionnaire at any wave, and (3) the percentage of missing values is less than 20% in ExWAS. The code names of each exposure were developed from the description as closely as possible, and their domains, resources, and dates are presented in Supplementary Table 1. The missing patterns of each exposure in each ExWAS are presented in Supplementary Table 7.
For analysis of outcomes in young adulthood, we a priori identified seven covariates: sex (male, female), zygosity (monozygotic (MZ), DZ, unknown), parental education (limited, intermediate, high) 49, smoking (never, former, occasional, current), work status (full-time, part-time, irregular, not working), secondary-level school (vocational, senior high school, none), and age. The latter four variables were reported by twins as young adults (wave 4). For analysis of the outcome at age 17, sex, zygosity, parental education, and smoking (reported at age 17) remained. Study and working status (neither study nor work, only study, only work) was included as most participants were in school at age 17 (wave 3). The inclusion of covariates, besides sex, zygosity, and age, was based on the previous literature, which shows correlations with the environment and depressive symptoms 50–52. Parental education was adjusted for to represent the family resources and resilience 49.

Data pre-processing and descriptive statistics
Participants missing information on outcome or covariates were excluded from the corresponding age's analyses. Due to the skewness of the GBI score, we added one to the GBI score and log-transformed it. Appropriate regrouping was conducted for categorical exposures, and then we used multivariate imputation by chained equations to replace the missing values of exposures. As a dimension reduction technique, principal component analysis (PCA) was used to measure the proportion of total variability of all included exposures attributed to each principal component (PC) and to visually assess potential clusters of correlated exposures on the basis of the two-dimensional coordinates of the first and second components. It was conducted only for the outcomes of GBI at age 17 and in young adulthood, not for the incidence of MDD.
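The transformation and PCA steps described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the natural logarithm is assumed for the log transform, the exposure matrix is assumed to be numeric and already imputed, and the function names and simulated data are hypothetical.

```python
import numpy as np

def log_transform_gbi(gbi):
    """Add one to the raw GBI score and take the natural log (the base is an assumption)."""
    return np.log1p(np.asarray(gbi, dtype=float))

def pca_summary(exposures):
    """PCA on standardized exposures: explained-variance shares and loadings on PC1 and PC2."""
    X = np.asarray(exposures, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardize each exposure
    corr = (Z.T @ Z) / (len(Z) - 1)                    # correlation matrix
    eigvals, eigvecs = np.linalg.eigh(corr)            # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]                  # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()                # share of total variability per PC
    loadings = eigvecs[:, :2]                          # exposure coordinates on PC1 and PC2
    return explained, loadings

# Simulated data: two correlated exposures plus two independent ones.
rng = np.random.default_rng(1)
shared = rng.normal(size=500)
exposures = np.column_stack([
    shared + 0.3 * rng.normal(size=500),
    shared + 0.3 * rng.normal(size=500),
    rng.normal(size=500),
    rng.normal(size=500),
])
explained, loadings = pca_summary(exposures)
gbi_log = log_transform_gbi([0, 3, 12])
```

Plotting the two columns of `loadings` against each other gives the kind of two-dimensional view in which correlated exposures sit close together; here the first two simulated exposures load jointly on the first component.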

Exposome-wide association study
To conduct the ExWAS, a generalized linear regression model with Gaussian distribution (essentially linear regression) for the outcomes of log-transformed GBI score was repeatedly performed for each exposure. We used Bonferroni correction by the number of effective tests (calculated by PCA) to adjust for multiple testing and account for correlation between exposures [53]. Covariates were adjusted for, and the clustering of twins within families was controlled for using robust standard errors. For the outcome of the incidence of MDD, the distribution was switched to binomial. The number of included exposures for secondary outcomes was smaller due to the third exposure inclusion criterion, and the sample size varied; thus, the P-value thresholds varied. Due to categorical exposures, the number of P values was higher than the number of exposures. We used the rexposome package in the R environment (version 4.2.3) [54].
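The ExWAS logic above (one covariate-adjusted regression per exposure, family-clustered robust standard errors, and a Bonferroni threshold based on the effective number of tests) can be sketched in Python. This is a minimal illustration, not the rexposome implementation. The effective-test estimator shown is one common eigenvalue-based choice (in the style of Li and Ji) and is an assumption, as are alpha = 0.05, the normal approximation for P values, the function names, and the simulated data. Categorical exposures, which would yield several P values each, are not handled.

```python
import math
import numpy as np

def effective_tests(exposures):
    """Eigenvalue-based effective number of tests (Li and Ji style; assumed estimator)."""
    lam = np.abs(np.linalg.eigvalsh(np.corrcoef(exposures, rowvar=False)))
    return float(np.sum((lam >= 1).astype(float) + (lam - np.floor(lam))))

def cluster_robust_ols(y, X, clusters):
    """OLS coefficients with cluster-robust (sandwich) standard errors."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(clusters):            # sum score outer products per family
        idx = clusters == g
        score = X[idx].T @ resid[idx]
        meat += np.outer(score, score)
    cov = XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(cov))

def exwas(y, exposures, covariates, clusters, alpha=0.05):
    """Fit one adjusted model per exposure; return (index, beta, se, p) rows and the threshold."""
    n = len(y)
    results = []
    for j in range(exposures.shape[1]):
        X = np.column_stack([np.ones(n), exposures[:, j], covariates])
        beta, se = cluster_robust_ols(y, X, clusters)
        z = beta[1] / se[1]
        p = math.erfc(abs(z) / math.sqrt(2))  # two-sided P value, normal approximation
        results.append((j, float(beta[1]), float(se[1]), p))
    return results, alpha / effective_tests(exposures)

# Simulated twins: 300 families of two, five exposures, one binary covariate.
rng = np.random.default_rng(42)
family = np.repeat(np.arange(300), 2)
n = family.size
exposures = rng.normal(size=(n, 5))
covariates = rng.integers(0, 2, size=(n, 1)).astype(float)
family_effect = rng.normal(scale=0.7, size=300)[family]
y = 0.5 * exposures[:, 0] + 0.2 * covariates[:, 0] + family_effect + rng.normal(size=n)

results, threshold = exwas(y, exposures, covariates, family)
significant = [j for j, _, _, p in results if p < threshold]
```

Dividing alpha by the effective rather than the raw number of tests keeps the correction from becoming overly conservative when exposures are correlated, which is the stated motivation for the PCA-based count.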
We further calculated power using the R package WebPower (R environment, version 4.2.3) for ExWASes of the log-transformed GBI score at both age points. These calculations were based on the smallest absolute effect size among significant results (0.12 in young adulthood and 0.10 at age 17), sample size (3,025

Fig. 1 | Flowchart of the analysis pipeline. The flowchart demonstrates the path from the choice of exposures to ExWAS analysis, ending with bivariate twin modeling. The full path was used for depressive symptoms (GBI) at two ages. Only ExWAS was completed for MDD.

Fig. 2 | Association results between exposure and log-transformed GBI score in young adulthood, adjusted for covariates (individual twin n = 3,025), using generalized linear regression. a, Manhattan association plot for exposures in relation to log-transformed GBI score in young adulthood. The y axis shows statistical significance as −log10(P value) for the adjustment for multiple testing. b, Forest plot for the adjusted beta for significant exposures in descending order from top to bottom (from harmful to protective). The center

Fig. 3 | Bivariate Cholesky AE model for the exposome score and log-transformed GBI score in young adulthood (twin pair n = 846). A, standardized variance of additive genetic effect; E, standardized variance of unique environmental effect. The a and e stand for pathway coefficients from ...

The ethics committee of the Department of Public Health of the University of Helsinki and the Institutional Review Board of Indiana University approved the FinnTwin12 study protocol from the start of the cohort. The ethical approval of the ethics committee of the Helsinki University Central Hospital District (HUS) is the most recent and covers the most recent data collection (wave 4) (HUS/2226/2021). The HUS reviews the study annually, and 2023's statement is number 4/2023, dated 1 February 2023. All participants and their parents/legal guardians gave informed written consent to participate in the study. The authors assert that all procedures contributing to this work comply with the ethical standards of the relevant national and institutional committees on human experimentation and with the Helsinki Declaration of 1975, as revised in 2008.

Extended Data Fig. 2 | Association results between exposure and incidence of MDD, adjusted for covariates (individual twin n = 1,236), using generalized binomial regression. The adjusted covariates were: sex, zygosity, parental education, smoking in young adulthood, work status in young adulthood, secondary-level school in young adulthood, and age when twins provided the GBI assessment in young adulthood.

Extended Data Fig. 3 | Association results between exposure and log-transformed GBI score at age 17, adjusted for covariates (individual twin n = 4,127), using generalized linear regression. Panel A is a Manhattan association plot for exposures in relation to log-transformed GBI score at age 17. The y axis shows statistical significance as −log10(P value) for the adjustment for multiple testing. Panel B presents the adjusted beta for significant exposures in descending order from top to bottom (from harmful to protective). In panel B, the center dot and bar present the effect size (coefficient of linear regression) and 95% confidence interval, and the size of each dot reflects the relative effect size. The color legend applies to both panel A (Manhattan association plot) and panel B (forest plot). The adjusted covariates were: sex, zygosity, parental education, smoking at age 17, and study and working status at age 17.

Extended Data Fig. 4 | Bivariate Cholesky AE model for the exposome score and log-transformed GBI score at age 17 (twin pair n = 1,000). A stands for standardized variance of additive genetic effect. E stands for standardized variance of unique environmental effect. MZ and DZ stand for monozygotic and dizygotic twin pairs, respectively. The a and e stand for pathway coefficients from A and E, respectively, to both the exposome score and log-transformed GBI score. The 95% confidence intervals of standardized variances and pathway coefficients are presented in Extended Supplementary Table
https://doi.org/10.1038/s44220-023-00124-x

Extended Data Fig. 6 | Calendar timeline of included exposures and outcomes. Information was recorded from 1983, during the pregnancy of the twins' mothers, until 2015.

in young adulthood and 4,127 at age 17), number of predictor variables in a single model (8 in young adulthood and 6 at age 17), and significance thresholds (3.09 × 10⁻⁴ in young adulthood and 3.63 × 10⁻⁴ at age 17). Power was 1 for the ExWASes both in young adulthood and at age 17, indicating adequate sample sizes in this study.
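As a rough cross-check of these power figures, the calculation can be approximated by Monte Carlo simulation in Python. This sketch does not reproduce WebPower: it assumes the reported effect sizes are treated as Cohen's f², that the test is the overall F test with all predictors, and that the noncentrality parameter is f² × n. The function name and simulation settings are hypothetical.

```python
import numpy as np

def regression_power_mc(n, n_predictors, f2, alpha, n_sim=1_000_000, seed=0):
    """Monte Carlo power of the overall regression F test (assumed setup, see lead-in)."""
    rng = np.random.default_rng(seed)
    u = n_predictors                 # numerator degrees of freedom
    v = n - n_predictors - 1         # denominator degrees of freedom
    lam = f2 * (u + v + 1)           # noncentrality parameter, equal to f2 * n
    # Critical value: empirical (1 - alpha) quantile of the central F distribution.
    f_crit = np.quantile(rng.f(u, v, size=n_sim), 1.0 - alpha)
    # Power: share of noncentral F draws that exceed the critical value.
    f_alt = rng.noncentral_f(u, v, lam, size=n_sim)
    return float(np.mean(f_alt > f_crit))

# Inputs taken from the text above.
power_young_adulthood = regression_power_mc(n=3025, n_predictors=8, f2=0.12, alpha=3.09e-4)
power_age_17 = regression_power_mc(n=4127, n_predictors=6, f2=0.10, alpha=3.63e-4)
print(power_young_adulthood, power_age_17)
```

Under these assumptions the noncentrality parameters are large (roughly 363 and 413), so the simulated power is effectively 1 at both ages, consistent with the values reported above.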