The impact of the initial COVID-19 outbreak on young adults’ mental health: a longitudinal study of risk and resilience factors

Few studies assessing the effects of COVID-19 on mental health include prospective markers of risk and resilience necessary to understand and mitigate the combined impacts of the pandemic, lockdowns, and other societal responses. This population-based study of young adults includes individuals from the Neuroscience in Psychiatry Network (n = 2403) recruited from English primary care services and schools in 2012–2013 when aged 14–24. Participants were followed up three times thereafter, most recently during the initial outbreak of the COVID-19 pandemic when they were aged between 19 and 34. Repeated measures of psychological distress (K6) and mental wellbeing (SWEMWBS) were supplemented at the latest assessment by clinical measures of depression (PHQ-9) and anxiety (GAD-7). A total of 1000 participants, 42% of the original cohort, returned to take part in the COVID-19 follow-up; 737 completed all four assessments [mean age (SD), 25.6 (3.2) years; 65.4% female; 79.1% White]. Our findings show that the pandemic led to pronounced deviations from existing mental health trajectories relative to the levels expected over approximately seven years. About three in ten young adults reported clinically significant depression (28.8%) or anxiety (27.6%) under current NHS guidelines; two in ten met clinical cut-offs for both. About 9% reported levels of psychological distress likely to be associated with serious functional impairments that substantially interfere with major life activities, an increase of three percentage points compared with pre-pandemic levels. Deviations from personal trajectories were not necessarily restricted to conventional risk factors; however, individuals with pre-existing health conditions suffered disproportionately during the initial outbreak of the COVID-19 pandemic. Resilience factors known to support mental health, particularly in response to adverse events, were at best mildly protective of individual psychological responses to the pandemic.
Our findings underline the importance of monitoring the long-term effects of the ongoing pandemic on young adults’ mental health, an age group at particular risk for the emergence of psychopathologies. Our findings further suggest that maintaining access to mental health care services during future waves, or potential new pandemics, is particularly crucial for those with pre-existing health conditions. Even though resilience factors known to support mental health were only mildly protective during the initial outbreak of the COVID-19 pandemic, it remains to be seen whether these factors facilitate mental health in the long term.


Supplementary Materials 1
Primary Outcome Measures
To verify the underlying factor structure of our primary outcome measures, we used confirmatory factor analysis (CFA) on the NSPN baseline sample (data available: n [K6] = 2376; n [SWEMWBS] = 2368). Analyses were performed using the R package MplusAutomation [Version 0.8], 1 which acts as a programming interface to MPLUS, a powerful statistical modelling software primarily known for its latent variable modelling capabilities. 2 We treated item-level data as categorical, analysing the polychoric correlation matrices and using mean- and variance-adjusted weighted least squares (WLSMV) as the estimator. We assessed goodness-of-fit using comparative as well as absolute fit indices. [3][4][5] More specifically, we used the Comparative Fit Index (CFI) as well as the Tucker-Lewis Index (TLI) and considered values of ≥0.95 as good fit. We further assessed the Standardised Root Mean Squared Residual (SRMR) and the Root Mean Squared Error of Approximation (RMSEA) with a cut-off of <0.08, preferably lower. Chi-square test results are also reported in the tables below. We did not, however, rely on chi-square test results to assess model fit due to their sensitivity to sample size. We further examined modification indices where appropriate to identify potential improvements in model fit. Reliability was calculated using coefficient omega. Please note that guidelines for model fit indices were developed under normal-theory maximum likelihood with continuous data. To date, there is a lack of investigation into appropriate fit indices for models with categorical data, and applying conventional cut-offs can sometimes indicate better model fit than is truly the case. 6 Exercising caution, as well as assessing a range of different fit indices, is therefore important when evaluating approximate fit with categorical data.

Figure 5: Standardised factor loadings of SWEMWBS items.
Confirmatory Factor Analysis (CFA)

Table 1 shows reasonably good model fit for both the K6 and the SWEMWBS. We carefully examined modification indices (MI), adding covariance between error terms where appropriate, but also limiting such modifications to an absolute minimum due to the small number of items per scale. Modification indices for the one-factor K6 model indicated that model fit would be improved by adding covariance between the error terms for item 1 (feeling nervous) and item 3 (feeling restless; MI = 52.57), both referring to similar anxiety-related symptoms. Similarly, adding covariance between the error terms for item 1 (feeling optimistic) and item 2 (feeling useful) within the one-factor SWEMWBS model would significantly improve model fit (MI = 326.89). Fit and reliability indices for both unmodified and modified models are displayed in Table 1. Overall, both modified models show reasonable, although not perfect, fit, supporting our treatment of both scales as unidimensional constructs for the purpose of our study.
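For reference, coefficient omega for a one-factor model can be computed directly from the standardised loadings. The sketch below is an illustrative Python snippet with made-up loadings for a six-item scale; our actual estimates were obtained via MPLUS on the polychoric correlation matrices.

```python
import numpy as np

def coefficient_omega(loadings):
    """Coefficient omega for a one-factor model from standardised
    loadings: omega = (sum of loadings)^2 / ((sum of loadings)^2 +
    sum of residual variances), with residual variance = 1 - loading^2."""
    loadings = np.asarray(loadings, dtype=float)
    common = loadings.sum() ** 2
    residual = np.sum(1.0 - loadings ** 2)
    return common / (common + residual)

# Hypothetical standardised loadings for a six-item scale such as the K6
lam = [0.75, 0.70, 0.80, 0.65, 0.72, 0.68]
print(round(coefficient_omega(lam), 3))  # → 0.864
```

Higher loadings shrink the residual term, so omega approaches 1 as the items become purer indicators of the single factor.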

Pandemic-related Risk Factors
Pre-existing Health Conditions: Do you have any of the following medical conditions? (Note: Tick those that apply; references to clinically diagnosed refer to any diagnosis by a healthcare professional.)

Resilience factor scores were estimated based on the full information sample at baseline, following a similar methodological approach to that discussed by Fritz and colleagues. [7][8][9][10] We used confirmatory factor analysis (CFA) to assess factorial validity as well as reliability for each scale. Where appropriate, for instance, where little consensus about a scale's factor structure exists, we compared several models to determine the model of best fit within our sample. We followed the same procedure as discussed in Supplementary Materials 1, i.e., treating item-level data as categorical, analysing polychoric correlation matrices, and using mean- and variance-adjusted weighted least squares (WLSMV) as the estimator. If several models were tested, we used maximum likelihood estimation to calculate Akaike's Information Criterion (AIC) and the Bayesian Information Criterion (BIC) for model selection. The following sections provide an overview of the performed analyses, including a list of items used for each resilience factor.
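As a reminder of how the two selection criteria trade off fit against complexity, the following Python sketch computes AIC and BIC from a model's log-likelihood. The log-likelihoods, parameter counts, and sample size shown are purely hypothetical and are not the values from our analyses.

```python
import math

def aic(loglik, k):
    # Akaike Information Criterion: penalises each free parameter by 2
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    # Bayesian Information Criterion: the penalty grows with log(sample size)
    return k * math.log(n) - 2 * loglik

# Hypothetical log-likelihoods for a four- vs five-factor solution
n = 2368  # illustrative sample size
four_factor = {"loglik": -15120.0, "k": 42}
five_factor = {"loglik": -15105.0, "k": 47}

for name, m in [("four-factor", four_factor), ("five-factor", five_factor)]:
    print(name, round(aic(m["loglik"], m["k"]), 1),
          round(bic(m["loglik"], m["k"], n), 1))
```

Because BIC's per-parameter penalty (log n) exceeds AIC's (2) for any realistic sample size, the two criteria can disagree, which is why we report both.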

Alabama Parenting Questionnaire (APQ)
The Alabama Parenting Questionnaire was originally developed by Paul Frick in the early 1990s, measuring five dimensions of parenting: parental involvement (10 items), positive parenting (6 items), poor supervision (10 items), inconsistent discipline (6 items), and corporal punishment (3 items). 11 The scale deployed within the NSPN 2400 cohort study is a short version of the original 35-item version. It consists of nine items of the short form reported by Elgar and colleagues as well as the entire original corporal punishment scale and three items from the parental involvement scale. 12 Exploratory as well as confirmatory factor analyses commonly show support for a five-factor solution that is largely consistent with the a priori scale structure; however, four-factor solutions, in which items of the parental involvement and positive parenting sub-scales load onto the same factor, have also been reported. 13 We were particularly interested in the latter two sub-scales as potential resilience factors for our network analysis, as both have been reported as modifiable resilience factors in Fritz and colleagues' systematic review but have not been included in further investigations. 7 To examine whether items of these two sub-scales should be treated as two distinct resilience factors, or whether they could be combined into one positive and involved parenting resilience factor, we tested both a four-factor (Model A) and a five-factor (Model B) solution.
Both AIC and BIC were slightly lower for the five-factor solution. Model fit indices were similar for both the four- and five-factor solution. Total omega was above 0.80 for both models (Model A: ω total = 0.84; Model B: ω total = 0.81). In the five-factor model, omega coefficients were 0.20 and 0.14 for the positive parenting and parental involvement sub-scales respectively. More variance, however, was captured in the four-factor model, where both sub-scales were combined into one factor and omega increased to 0.49. Given the high correlation of both sub-scales in the five-factor model (r = 0.87) as well as the higher variance captured within the four-factor model, we decided to estimate individual factor scores based on the latter, i.e., we included a combined resilience factor, which we labelled 'positive and involved parenting', rather than treating the two sub-scales as distinct factors.

Cambridge Friendship Questionnaire (CFQ)
To assess friendship support, we used five items of the Cambridge Friendship Questionnaire. 15 These were the same items as used by Fritz and colleagues. [8][9][10] We estimated individual factor scores based on a fitted one-factor model with added covariance between the error terms for item 1 ("Are you happy with the number of friends you have got at the moment?") and item 5 ("Overall, how happy are you with your friendships?") due to their similar wording. The model showed very good comparative fit and reliability but rather poor absolute fit (n = 2367).

Family Assessment Device (FAD)
The original McMaster Family Assessment Device assesses family functioning on seven different sub-scales: problem solving (5 items), communication (6 items), roles (8 items), affective responsiveness (6 items), affective involvement (7 items), behaviour control (9 items), and general functioning (12 items). 16 The general functioning sub-scale, however, has since been evaluated as a single index measure and has also been deployed as such in the NSPN 2400 cohort study. Fritz and colleagues have previously used this scale to assess family support (5 items) and family cohesion/climate (7 items). [8][9][10] We examined model fit for a one-factor solution (Model A), a two-factor solution as used by Fritz and colleagues (Model B), as well as a bi-factor model with two specific factors (Model C). AIC and BIC were lowest for the bi-factor model (Model C), which also showed good comparative fit and reasonable absolute fit. McDonald's hierarchical omega was high (ω hierarchical = 0.90; ω total = 0.95). We therefore estimated individual factor scores for general family functioning based on this model.

Rosenberg Self-Esteem Scale (RSES)
We assessed self-esteem using the Rosenberg Self-Esteem Scale, which measures positive (5 items) and negative (5 items) feelings about the self. 17 Even though the psychometric properties of the scale have been widely researched, there is little agreement about its factorial validity. The most commonly reported factor structures include one-factor or two-factor solutions. Fritz and colleagues established two one-factor models for positive and negative self-esteem for their network models but noted that both factors measure topologically similar concepts. [8][9][10] In recent years, bi-factor solutions measuring global self-esteem with two method factors have shown promising results. 18 We therefore tested four different models: a one-factor model (Model A), a two-factor model with uncorrelated (Model B) and correlated (Model C) factors, as well as a bi-factor model with two specific method factors (Model D).
Both Model C and Model D showed similar AIC and BIC values as well as similar comparative and absolute model fit. Conceptually, however, a global measure of self-esteem seems more appropriate. A general factor explained the majority of the variance within our sample (ω hierarchical = 0.84; ω total = 0.96), with specific factors showing rather low reliability. Our results are in line with previous findings comparing more conventional models with bi-factor models. It has since been suggested that the RSES is likely heavily influenced by method effects due to item wording. 18 We estimated factor scores based on the bi-factor model, using a global self-esteem factor rather than two distinct factors measuring positive and negative aspects of the concept.

Summary Descriptives
As mentioned in the introduction to this section, we used the full information sample at baseline to estimate resilience factor scores (n > 2300). Exact sample sizes used to estimate individual resilience factors have been reported alongside the CFA results in the preceding subsections. For the purpose of this study, however, we were particularly interested in individuals who participated in all assessments and completed either of the primary outcome measures at all four time points. Below we present summary statistics of the computed resilience factor scores for both longitudinal analysis samples. We further list all items used to compute these scores. Note: All resilience factors are coded so that higher scores indicate higher protection. Abbreviations: APQ = Alabama Parenting Questionnaire, ABC = Antisocial Behaviour Checklist, CFQ = Cambridge Friendship Questionnaire, FAD = Family Assessment Device, RSES = Rosenberg Self-Esteem Scale.

Alabama Parenting Questionnaire
• Your parents ask you about your day in school. [PI]
• Your parents help you with your homework. [PI]
• Your parents compliment you when you have done something well. [PP]
• Your parents praise you for behaving well. [PP]

Antisocial Behaviour Checklist [4-point Likert scale]

Items of interest
• I deliberately damaged property (e.g., broke windows, wrote graffiti, started fires).
• I deliberately hurt or threatened someone (e.g., bullying or fighting).
• I have carried or used a weapon in a fight (e.g., a knife or a stick).
• I have deliberately hurt or been cruel to an animal (e.g., a pet).

Rosenberg Self-Esteem Scale

Items of interest
• I was satisfied with myself.
• I felt I had a number of good qualities.
• I was able to do things as well as most people.
• I felt I did not have much to be proud of. [R]
• I certainly felt useless at times. [R]
• I felt that I was as good as anyone else.
• I wished I could have more respect for myself. [R]
• I felt that I was a failure. [R]
• I took a positive attitude towards myself.

Item-Level Resilience Factors
We further included two item-level resilience factors. Expressive suppression was measured by one item ("You hide your feelings or emotions from others"; 3-point Likert scale) from the Antisocial Process Screening Device. 19 The same item has been used by Fritz and colleagues. [8][9][10] Ruminative brooding was measured by one item from the Leyton Obsessional Inventory. 20 Please note that one further item of this scale has been used by Fritz and colleagues, as well as a further five items from a scale not available in the NSPN 2400 cohort. As both items of the Leyton Obsessional Inventory were strongly correlated, we chose the more generally worded of the two available items ("I worried a lot if I did something not exactly the way I liked." [chosen] compared to "I kept thinking about things that I had done because I wasn't sure that they were the right things to do."; 4-point Likert scale). Item-level data for both expressive suppression and ruminative brooding were not standardised but entered as categorical variables in the network models. As previously reported, all resilience factors, including item-level factors, were coded so that higher scores represent higher resilience.

Supplementary Materials 4c
Additional Descriptives

Figure 6 and Figure 7 provide a breakdown of pre-existing health conditions as well as pandemic-related adverse experiences; both risk factors were binarised for the linear regression analyses. Figure 6 shows that the majority of participants reported clinically diagnosed anxiety and/or depression. Of those who reported more than one pre-existing health condition, anxiety and depression were most often reported together. Pandemic-related adverse experiences were driven by a major cut in household income and/or loss of a job, followed by losing someone close due to COVID-19 or any other cause. The exact wording of items is provided in Supplementary Materials 2.

Regularised Partial Correlation Networks
We estimated network models separately for the pandemic-related distress (K6) and mental wellbeing (SWEMWBS) responses as measured by the respective extended residuals. Both models are visualised as network graphs in the article (Figure 5), where nodes (circles or squares) represent variables, in our case the extended residual of interest and the resilience factors, and edges (lines) represent conditional dependencies, estimated as partial correlations, between two respective variables.
A key element of network estimation is not only the identification of conditional dependencies, but also the identification of edges that are truly zero. A fully connected network, for instance, is hard to interpret and not very helpful in gaining insight into predictive and potentially causal relationships. To estimate a parsimonious and interpretable network, we used the R-package qgraph [version 1.6.9]. 22 We estimated both networks via the Extended Bayesian Information Criterion (EBIC) graphical least absolute shrinkage and selection operator (LASSO), using polychoric correlations for item-level resilience factors (i.e., low expressive suppression/low ruminative brooding, see Supplementary Materials 3). 23 The graphical LASSO sets edges that are likely to be spurious to exactly zero, resulting in a sparse network. 24 It estimates a collection of networks from which the optimal network is chosen by minimising the model selection criterion, i.e., the EBIC. The latter requires a hyperparameter gamma (γ) which controls the degree to which simpler models are preferred; it is usually set between 0 and 0.5, where higher values favour simpler models. We used the default setting of γ = 0.5, ensuring a more conservative approach.
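For readers unfamiliar with EBIC-based model selection, the following sketch illustrates the principle on simulated data. Our actual networks were estimated with the R-package qgraph; here we use the graphical_lasso routine from scikit-learn as a stand-in, and the variables, sample size, and tuning-parameter grid are all hypothetical.

```python
import numpy as np
from sklearn.covariance import graphical_lasso

rng = np.random.default_rng(42)

# Simulated data standing in for the factor scores; one shared component
# induces correlations between the five placeholder variables.
n, p = 500, 5
shared = rng.normal(size=(n, 1))
X = 0.6 * shared + 0.8 * rng.normal(size=(n, p))
S = np.corrcoef(X, rowvar=False)

def ebic(precision, S, n, gamma=0.5):
    """Extended BIC for a Gaussian graphical model. Relative to the
    ordinary BIC, it adds a gamma-weighted penalty on the number of
    edges; constants are dropped as they cancel across models."""
    p = S.shape[0]
    loglik = (n / 2.0) * (np.linalg.slogdet(precision)[1]
                          - np.trace(S @ precision))
    edges = np.count_nonzero(np.triu(precision, k=1))
    return -2.0 * loglik + edges * np.log(n) + 4.0 * edges * gamma * np.log(p)

# Fit a small path of increasingly sparse networks and keep the one
# minimising the EBIC (gamma = 0.5, matching the conservative default)
alphas = [0.01, 0.05, 0.1, 0.2, 0.4]
best = min((ebic(graphical_lasso(S, alpha=a)[1], S, n), a) for a in alphas)
print("selected regularisation parameter:", best[1])
```

The gamma hyperparameter works exactly as described above: raising it increases the per-edge penalty, so sparser networks are preferred.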
Whilst the article focuses on the network structure as a whole, here we present additional analyses of centrality indices, including node betweenness, closeness, and strength.
Such centrality indices can provide information on the relative importance of nodes within a specific network structure. Betweenness, for instance, measures how often a specific node acts as a bridge along the shortest path between two other nodes. Closeness, on the other hand, indicates how close a node is to all other nodes within the network. Node strength is measured by the sum of the edge weights connected to a node. The use of centrality indices, however, is increasingly de-emphasised, as their underlying assumptions, originating from social network theory, may not necessarily correspond to relationships between psychological variables; considerable care is therefore needed in the interpretation of results. 25 We further used non-parametric bootstrapping (N boot = 2000) to assess the robustness of the estimated network parameters and descriptive statistics. Accuracy and stability checks were conducted using the R-package bootnet [version 1.4.3]. 26 For further details, we refer the reader to Sacha Epskamp and Eiko Fried's tutorial on regularised partial correlation networks. 27 In the following pages, we report these additional analyses for both networks.

Figure 8: Nodes surrounded by a circle denote factor scores; nodes surrounded by a square denote single item-level (categorical) scores. Edge colour refers to either positive (blue) or negative (red) relations; within the faded network, the colour fades the weaker the edge weight. Note: We used the cor_auto function of the R-package qgraph to automatically compute the appropriate correlation matrix based on polychoric and Pearson correlations.

Figure 9: Lasso regularised network including pre-pandemic resilience factors and the pandemic-related distress response measured by K6 extended residual scores, where higher scores indicate higher-than-expected psychological distress during the first national lockdown compared with expected levels over seven years (n = 632). For a legend, see Figure 8.
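As an illustration of the strength index described above, consider the following sketch with a hypothetical edge-weight matrix; the node labels mirror abbreviations used in our figures, but the weights themselves are invented for demonstration only.

```python
import numpy as np

# Hypothetical partial-correlation (edge-weight) matrix for four nodes;
# labels and weights are illustrative, not the study's estimates.
labels = ["slf", "fam", "pip", "dis"]
W = np.array([
    [0.00,  0.30, 0.25, -0.20],
    [0.30,  0.00, 0.40, -0.05],
    [0.25,  0.40, 0.00,  0.00],
    [-0.20, -0.05, 0.00, 0.00],
])

# Node strength: sum of absolute weights of the edges attached to a node
strength = np.abs(W).sum(axis=0)
for lab, s in zip(labels, strength):
    print(lab, round(float(s), 2))
```

Betweenness and closeness additionally require shortest-path computations over the weighted graph, which is why they tend to be less stable than strength in sparse psychological networks.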
Note: All resilience factors are coded so that higher scores indicate higher protection. Nodes surrounded by a circle denote factor scores; nodes surrounded by a square denote single item-level (categorical) scores. Edge colour refers to either positive (blue) or negative (red) relations; within the faded network, the colour fades the weaker the edge weight. For a legend, see Figure 9.
Summary No. 1: The association network in Figure 8 shows that all resilience factors are positively (blue edges), and mostly strongly (edge thickness), correlated. The negative correlations (red edges) between resilience factors and the pandemic-related distress response as measured by the K6 extended residuals mean that higher-than-expected psychological distress was related to lower resilience scores, and vice versa. The lasso regularised network model is discussed in detail within the main article.

Figure 10: Accuracy of edge-weights and their non-parametric bootstrapped 95% confidence intervals for the lasso regularised network including pre-pandemic resilience factors and pandemic-related distress responses as measured by the K6 extended residuals. Edges of the network are displayed on the y-axis, ordered from lowest (bottom) to highest (top) edge (for a legend of variable labels, see Figure 8 or Figure 9). The grey area around the red (observed sample) and black (bootstrapped means) dots indicates the confidence interval; the more reliable an edge, the smaller its confidence interval.
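The principle behind these bootstrapped confidence intervals can be sketched as follows. This illustrative snippet bootstraps a single correlation, a stand-in for one network edge, on simulated data; the actual analyses used bootnet on the full regularised networks, and the sample and effect size below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulated pair of variables standing in for one network edge;
# the underlying association (~0.37) is illustrative only.
n = 632
x = rng.normal(size=n)
y = 0.4 * x + rng.normal(size=n)

def boot_ci(x, y, n_boot=2000, level=0.95):
    """Non-parametric bootstrap: resample cases with replacement and
    take percentile bounds of the recomputed correlations."""
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, len(x), len(x))
        stats[b] = np.corrcoef(x[idx], y[idx])[0, 1]
    lo, hi = np.quantile(stats, [(1 - level) / 2, 1 - (1 - level) / 2])
    return lo, hi

lo, hi = boot_ci(x, y)
print(f"95% CI: [{lo:.2f}, {hi:.2f}]")
```

Narrow intervals indicate reliably estimated edges; wide or overlapping intervals are the reason the ordering of edge-weights in Figure 10 should be interpreted with care.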

Summary No. 2:
As can be seen in Figure 10, the confidence intervals for many of the estimated edge-weights overlap, suggesting that these edge-weights likely do not differ significantly from each other, which, in turn, is confirmed by the edge-weight difference test depicted in Figure 11. Whilst the confidence intervals are sizeable in some cases (indicating that their order should be interpreted with care), there is little overlap between the intervals of negatively and positively weighted edges, indicating that the sign of these effects is robust within the data.

Figure 12: Centrality indices shown as standardised z-scores for the lasso regularised network including pre-pandemic resilience factors and pandemic-related distress responses as measured by the K6 extended residuals; for a legend of variable labels, see Figure 8 or Figure 9; for standardised raw scores, see Table 13; for centrality stability, see Figure 13.

Centrality indices show that some nodes differ quite substantially in their estimates. Self-esteem, for instance, has the highest betweenness, closeness, and strength. Using a case-dropping subset bootstrap, we can further examine the stability of these indices; without such checks, they would be difficult to interpret. Figure 13 shows that node strength seems to be the most stable of the three assessed centrality indices. This is confirmed by the correlation stability coefficient (CS-coefficient). The CS-coefficient expresses the highest proportion of cases that can be discarded whilst retaining, with 95% confidence, a correlation with the original centrality estimate above a given threshold, here the default of 0.7. Preferably, it should be above 0.5. 26
Node betweenness (CS(cor = 0.7) = 0.44) was below this threshold; node closeness (CS(cor = 0.7) = 0.52) only just met this relatively arbitrary cut-off and should be interpreted with care; node strength (CS(cor = 0.7) = 0.75), however, was high, suggesting sufficient stability.

Note: Nodes surrounded by a circle denote factor scores; nodes surrounded by a square denote single item-level (categorical) scores. Edge colour refers to either positive (blue) or negative (red) relations. We used the cor_auto function of the R-package qgraph to automatically compute the appropriate correlation matrix based on polychoric and Pearson correlations. For a legend, see Figure 14.

Note: For a legend, see Figure 15.
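The CS-coefficient logic can be sketched as follows. This is an illustrative Python analogue of bootnet's case-dropping bootstrap: for simplicity, strength is computed here from plain correlations on simulated data rather than from a regularised partial-correlation network, and the drop proportions and bootstrap count are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated multivariate data standing in for the analysis sample;
# a single common component makes the variables intercorrelated.
n, p = 632, 6
L = rng.uniform(0.4, 0.8, size=p)
data = np.outer(rng.normal(size=n), L) + rng.normal(size=(n, p))

def strength(X):
    # Node strength from the (unregularised) correlation matrix
    R = np.corrcoef(X, rowvar=False)
    np.fill_diagonal(R, 0.0)
    return np.abs(R).sum(axis=0)

full = strength(data)

def cs_coefficient(X, drops=(0.1, 0.25, 0.5, 0.75), n_boot=200):
    """Case-dropping bootstrap: CS is the largest proportion of cases
    that can be dropped while the correlation between subset and
    full-sample strength stays above 0.7 with 95% confidence."""
    cs = 0.0
    for d in drops:
        keep = int(round(len(X) * (1 - d)))
        cors = [np.corrcoef(full, strength(
                    X[rng.choice(len(X), keep, replace=False)]))[0, 1]
                for _ in range(n_boot)]
        if np.quantile(cors, 0.05) >= 0.7:
            cs = d
        else:
            break
    return cs

print("CS(cor = 0.7):", cs_coefficient(data))
```

Because strength aggregates over all of a node's edges, it typically survives case-dropping better than betweenness or closeness, mirroring the pattern reported above.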

Summary No. 1:
Similar to what has been discussed in Supplementary Materials 6a, the association network in Figure 14 shows that all resilience factors are positively (blue edges), and mostly strongly (edge thickness), correlated. The negative correlations (red edges) between resilience factors and the pandemic-related mental wellbeing response as measured by the SWEMWBS extended residuals mean that lower-than-expected mental wellbeing was related to higher resilience scores, and vice versa. The lasso regularised network model is discussed in detail within the main article.

Figure 16: Accuracy of edge-weights and their non-parametric bootstrapped 95% confidence intervals for the lasso regularised network including pre-pandemic resilience factors and pandemic-related mental wellbeing responses as measured by the SWEMWBS extended residuals. Edges of the network are displayed on the y-axis, ordered from lowest (bottom) to highest (top) edge (for a legend of variable labels, see Figure 14 or Figure 15). The grey area around the red (observed sample) and black (bootstrapped means) dots indicates the confidence interval; the more reliable an edge, the smaller its confidence interval.

Figure 17: Differences of edge-weights for the lasso regularised network including pre-pandemic resilience factors and pandemic-related mental wellbeing responses as measured by the SWEMWBS extended residuals. Edges are listed on both the x- and y-axis (for a legend of variable labels, see Figure 14 or Figure 15); grey boxes indicate edges that are not significantly different from each other, whilst black boxes indicate edges that are significantly different from each other (significance level α = 0.05; please note that the test does not control for multiple comparisons). The diagonally coloured boxes refer to the edge colour of the network graph, where positive relations are blue and negative relations are red.

Summary No. 2:
Similar to what has been discussed in Supplementary Materials 6a, the confidence intervals for many of the estimated edge-weights overlap (see Figure 16). This suggests that these edge-weights likely do not differ significantly from each other, which, in turn, is confirmed by the edge-weight difference test depicted in Figure 17. Further, it can be seen that the edge between self-esteem (slf) and the extended residual (wlb) differs significantly from any other edge, hence showing high accuracy, particularly when compared to the other two edges connecting the extended residual, i.e., family functioning (fam) and positive and involved parenting (pip).

Figure 18: Centrality indices shown as standardised z-scores for the lasso regularised network including pre-pandemic resilience factors and pandemic-related mental wellbeing responses as measured by the SWEMWBS extended residuals; for a legend of variable labels, see Figure 14 or Figure 15; for standardised raw scores, see Table 16; for centrality stability, see Figure 19.

Centrality indices show that some nodes differ quite substantially in their estimates. Again, similar to what has been discussed in Supplementary Materials 6a, self-esteem has the highest betweenness, closeness, and strength. Using a case-dropping subset bootstrap, we further examined their stability. Figure 19 shows that node strength seems to be the most stable of the three assessed centrality indices; however, both betweenness and closeness appear relatively stable as well. This is confirmed by the correlation stability coefficient.

Note: Nodes surrounded by a circle denote factor scores; nodes surrounded by a square denote single item-level (categorical) scores. Edge colour refers to either positive (blue) or negative (red) relations; within the faded network, the colour fades the weaker the edge weight.
Note: We used the cor_auto function of the R-package qgraph to automatically compute the appropriate correlation matrix based on polychoric and Pearson correlations. For a legend, see Figure 20.

Supplementary Materials 7b
Further Correlational Analysis (SWEMWBS)

Observed SWEMWBS Association Network

Note: Nodes surrounded by a circle denote factor scores; nodes surrounded by a square denote single item-level (categorical) scores. Edge colour refers to either positive (blue) or negative (red) relations; within the faded network, the colour fades the weaker the edge weight. We used the cor_auto function of the R-package qgraph to automatically compute the appropriate correlation matrix based on polychoric and Pearson correlations. For a legend, see Figure 21.