Phenotypically independent mental health profiles are genetically related

Daniel Roelfs1,*, MSc, Dag Alnæs1, PhD, Oleksandr Frei1, PhD, Dennis van der Meer1,2, PhD, Olav B. Smeland1, PhD, Ole A. Andreassen1, PhD, Lars T. Westlye1,3, PhD, Tobias Kaufmann1,*, PhD

1 NORMENT, KG Jebsen Centre for Neurodevelopmental Disorders, Division of Mental Health and Addiction, Oslo University Hospital & Institute of Clinical Medicine, University of Oslo, Oslo, Norway
2 School of Mental Health and Neuroscience, Faculty of Health, Medicine and Life Sciences, Maastricht University, Maastricht, The Netherlands
3 Department of Psychology, University of Oslo, Oslo, Norway

* Correspondence: Daniel Roelfs & Tobias Kaufmann, Ph.D.
Email: daniel.roelfs@medisin.uio.no, tobias.kaufmann@medisin.uio.no
Postal address: OUS, PO Box 4956 Nydalen, 0424 Oslo, Norway
Telephone: +47 23 02 73 50, Fax: +47 23 02 73 33

Counts: Main: 3545 words | Abstract: 244 words | Tables: 0 | Figures: 4 | Supplementary Tables: 4 | Supplementary Figures: 6


Introduction
Psychiatric disorders are highly polygenic, exhibiting a multitude of significantly associated genetic variants with small effect sizes. Recent large-scale genome-wide association studies (GWAS) have identified a large number of single-nucleotide polymorphisms (SNPs) associated with psychiatric disorders such as schizophrenia (SCZ) 1, bipolar disorder (BD) 2, major depression (MD) 3, attention deficit hyperactivity disorder (ADHD) 4, autism spectrum disorders (ASD) 5, post-traumatic stress disorder (PTSD) 6, and anxiety (ANX) 7. In addition to substantial polygenicity, previous findings have documented genetic overlap between disorders 8-11, even in the absence of genetic correlations, as recently demonstrated for schizophrenia and educational attainment 12,13. Adding to the complexity, psychiatric disorders also overlap with multiple complex traits, such as BMI 14 and cardio-metabolic diseases 15. Taken together, the landscape of current psychiatric genetics suggests highly complex patterns of associations and unclear specificity for many common psychiatric disorders.

While GWAS have helped to disentangle parts of the genetic architecture of psychiatric disorders, these methods alone are not sufficient to address some of the challenges in psychiatric genetics. One of those challenges is the lack of clinical demarcation between psychiatric disorders. For example, patients with the same diagnosis may not necessarily exhibit common symptoms 16, and patients with different diagnoses may show highly overlapping clinical phenotypes 17. The notion that mental disorders like schizophrenia and bipolar disorder reflect biologically heterogeneous categories is also supported by neuroimaging studies 18,19.
Nonetheless, a majority of large-scale genetic studies use a classical case-control design based on a categorical operationalization of disease without stratifying by other measures such as symptoms, functioning or symptom severity. Likewise, control groups are rarely screened for subthreshold symptoms. For example, in the case of psychosis, approximately 6% of the general population are reported to have a psychotic experience in their lifetime, and only a minority of that group will develop a diagnosed psychiatric illness such as schizophrenia or bipolar disorder 20. Finally, the likelihood of inducing selection bias when drawing cases and controls from different populations is high and may introduce confounds in case-control designs 21. Thus, whereas studies using the classical case-control design have been instrumental and have produced a strong body of discoveries in psychiatric genetics, these designs have limitations that may prevent us from discovering signals more closely related to the clinical characteristics of the disorder. In addition, case-control designs require immense effort and resources, given that the high polygenicity of common psychiatric disorders requires vast sample sizes to detect effects 22.

Recent large-scale population level efforts such as the UK Biobank 23 now provide alternatives for the study of psychiatric disorders. The mental health data available in UK Biobank includes data from more than 150,000 individuals and covers questions on current and previous symptoms in different psychiatric
All rights reserved. No reuse allowed without permission.
The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
This version posted May 19, 2020.

the National Health Service National Research Ethics Service (ref. 11/NW/0382). Participants with a diagnosed psychiatric or neurological disorder (F or G ICD10 diagnosis) were excluded from the analysis, except for those with nerve, nerve root and plexus disorders (categories G50 to G59). In addition, we excluded participants with more than 10% missing answers in the mental health questionnaires.

health online assessment, we removed questions that asked specifically about symptoms occurring in the past two weeks to remove potential short-term temporal effects. Furthermore, we excluded questions where more than 10% of the responses were missing (1 question excluded). In the resulting set of 43 questions (Suppl. Table 1), we imputed missing data using k-nearest neighbor imputation with k = 3 with the bnstruct package 30 in R 31 and z-standardized the data (Suppl. Fig. 2).
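To make the imputation and standardization step concrete, the following is a minimal Python sketch of k-nearest neighbor imputation with k = 3 followed by z-standardization. It is an illustration only and not the bnstruct implementation used in the study: the function names `knn_impute` and `z_standardize`, the distance measure (root-mean-square difference over jointly observed questions), the use of the donor mean, and the toy data are all assumptions.

```python
import numpy as np

def knn_impute(data, k=3):
    """Fill each NaN with the mean of that column over the k nearest rows.

    Assumption: distance is the root-mean-square difference over the columns
    that are observed in both rows; bnstruct may use a different measure.
    """
    data = np.asarray(data, dtype=float)
    filled = data.copy()
    missing = np.isnan(data)
    for i in np.where(missing.any(axis=1))[0]:
        shared = ~missing[i] & ~missing          # columns observed in row i and in each row
        diff = np.where(shared, data - data[i], 0.0)
        n_shared = shared.sum(axis=1)
        dist = np.sqrt((diff ** 2).sum(axis=1) / np.maximum(n_shared, 1))
        dist[n_shared == 0] = np.inf             # no overlap, so rows cannot be compared
        dist[i] = np.inf                         # a row is never its own neighbor
        order = np.argsort(dist)
        for j in np.where(missing[i])[0]:
            # nearest rows that actually have an answer for question j
            donors = [r for r in order if not missing[r, j] and np.isfinite(dist[r])][:k]
            if donors:
                filled[i, j] = data[donors, j].mean()
    return filled

def z_standardize(data):
    """Center each column to mean 0 and scale it to unit sample standard deviation."""
    data = np.asarray(data, dtype=float)
    return (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)

# Toy questionnaire: 5 participants, 3 questions, one missing answer.
answers = np.array([
    [1.0, 2.0, 3.0],
    [1.0, 2.0, np.nan],
    [1.0, 2.0, 5.0],
    [2.0, 2.0, 4.0],
    [9.0, 9.0, 9.0],
])
imputed = knn_impute(answers, k=3)
z = z_standardize(imputed)
```

In this toy example the missing answer is filled from the three most similar participants (rows 0, 2 and 3), and the outlying fifth participant does not contribute.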

The resulting data covering 43 questions from 136,678 individuals were decomposed using independent component analysis (ICA). Using icasso 32 in MATLAB and by visually inspecting the loadings of the questions on the components, we estimated that a model order of 13 independent components yields the best clustering solution where the resulting components are stable and highly interpretable. The PCA identified 13 components with an eigenvalue larger than 1, and stability (Iq) was effectively 1. A model order lower than 13 would group together questions into components which we preferred to keep separate. A model order larger than 13 was not reasonable as it would yield components that largely reflect single items. The individual scores for each of the 43 questions were subsequently residualized for age (both linear and quadratic term), sex, and the first 20 genetic principal components. Next, we decomposed the items into 13 independent components using the fastICA algorithm as implemented in R 33. Fig. 1B depicts how each of the 43 items loaded on the components, indicating independent components (ICs) that captured questions on sexual abuse (IC1), psychosis (IC2), anxiety, depression and mental distress (IC3), a diagnosis with a life-threatening illness (IC4), social instability (IC5), traumatic experiences (IC6), stress in the past month (IC7), experiences of feeling loved (IC8), thoughts around self-harm behavior (IC9), general happiness (IC10), addiction behavior and manic experiences (IC11), experiences of emotional abuse (IC12), and alcohol abuse (IC13).
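Two of the steps described above, residualizing the items for covariates and counting principal components with an eigenvalue larger than 1, can be sketched in a few lines of Python. This is a hedged illustration on simulated data, not the authors' R/MATLAB pipeline: the function names, the ordinary least-squares residualization, the mean-centering of age, and the toy dimensions (6 items rather than 43) are assumptions, and the ICA decomposition itself is not reproduced here.

```python
import numpy as np

def residualize(items, covariates):
    """Remove the linear effect of the covariates (plus an intercept) from each item."""
    items = np.asarray(items, dtype=float)
    design = np.column_stack([np.ones(len(items)), covariates])
    beta, *_ = np.linalg.lstsq(design, items, rcond=None)
    return items - design @ beta

def n_eigenvalues_above_one(items):
    """Count eigenvalues of the item correlation matrix that exceed 1."""
    corr = np.corrcoef(items, rowvar=False)
    return int((np.linalg.eigvalsh(corr) > 1.0).sum())

# Simulated stand-ins for the real data (all values are made up).
rng = np.random.default_rng(0)
n = 500
age = rng.uniform(40, 70, n)
age_c = age - age.mean()                   # centering only improves numerical stability
sex = rng.integers(0, 2, n)
genetic_pcs = rng.normal(size=(n, 20))     # placeholder for the first 20 genetic PCs
covariates = np.column_stack([age_c, age_c ** 2, sex, genetic_pcs])

items = rng.normal(size=(n, 6)) + 0.05 * age[:, None]   # toy items with an age effect
residuals = residualize(items, covariates)
n_above_one = n_eigenvalues_above_one(residuals)
```

After residualization the item scores are uncorrelated with age, sex and the genetic principal components by construction, so the subsequent decomposition cannot pick up variance explained by those covariates.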
loadings of IC1, IC2, IC5, IC9, IC10, IC11, and IC12 were inverted so that all components showed the same direction of effect (higher component score indicating increased scoring on the items).
The distribution of IC2 indicated very few non-zero scores (Suppl. Fig. 3). This component loaded mostly on psychosis questions (Fig. 1B), indicating that only a few of the included healthy individuals had symptoms in this domain. We therefore conducted an additional supplemental analysis in which we dichotomized IC2 such that loadings lower than 1 were labeled as "no/few symptoms", and loadings equal to or higher than 1 were labeled as "with symptoms".
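The dichotomization rule can be written down directly. The sketch below is an illustration with made-up values; the function name `dichotomize` and the example scores are assumptions, while the cut-off of 1 and the two labels follow the text.

```python
import numpy as np

def dichotomize(values, threshold=1.0):
    """Label values below the threshold 'no/few symptoms', all others 'with symptoms'."""
    values = np.asarray(values, dtype=float)
    return np.where(values >= threshold, "with symptoms", "no/few symptoms")

# Made-up IC2 values for five participants.
labels = dichotomize([-0.2, 0.0, 0.99, 1.0, 3.4])
```

A value exactly equal to 1 falls into the "with symptoms" group, matching the "equal to or higher than 1" wording.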

Processing of genetic data
From the UK Biobank v3 imputed genetic data, we removed SNPs with an imputation quality score below 0.5, with a minor allele frequency below 0.001, missing in more than 5% of individuals, or that failed the Hardy-Weinberg equilibrium test at p < 1e-9. We also removed individuals with more than 10% missing data. We performed a genome-wide association analysis (GWAS) on each of the 13 independent components in PLINK 2 34,35. Using a publicly available conversion toolbox for GWAS summary statistics (github.com/precimed/python_convert), we removed the MHC region and calculated a z-score for every SNP (8,165,726 SNPs after QC). We utilized linkage-disequilibrium score regression 10,36 to estimate genetic correlations between each of the independent components, and between the components and publicly available GWAS summary statistics for SCZ 1, BD 2, MD 37, ADHD 4, ASD 5, PTSD 6, ANX 7, as well as intelligence 38 and educational attainment 39 (Suppl. Table 2). For all aforementioned GWASs, we used those versions that did not include UK Biobank participants. From the MD GWAS, we removed participants from the 23andMe dataset as well, leaving only cases with a diagnosed major depressive disorder (MDD). Prior to estimating genetic correlations, we set the criteria that only ICs with a heritability more than 1.96 times its standard error, and for which visual quality control of the corresponding Q-Q plots indicated genetic signal, would be included in the analysis. These quality control steps were implemented to ensure that we did not make inferences on data in which genetics explained too little variance. Partitioned heritability 40 was estimated using the LDSC toolbox 36 and Q-Q plots were generated using custom scripts in R.
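The SNP filters and the heritability threshold described in this section amount to simple threshold rules, sketched below in Python. This is not the PLINK 2 or LDSC workflow used in the study: the `Snp` record, its field names, and both function names are hypothetical, and only the numeric cut-offs are taken from the text.

```python
from dataclasses import dataclass

@dataclass
class Snp:
    """Hypothetical per-SNP summary record (field names are assumptions)."""
    info: float      # imputation quality score
    maf: float       # minor allele frequency
    missing: float   # fraction of individuals with a missing genotype
    hwe_p: float     # Hardy-Weinberg equilibrium test p-value

def passes_qc(snp):
    """Keep a SNP only if it clears all four thresholds given in the text."""
    return (
        snp.info >= 0.5
        and snp.maf >= 0.001
        and snp.missing <= 0.05
        and snp.hwe_p >= 1e-9
    )

def heritability_passes(h2, se, z=1.96):
    """True when the heritability estimate exceeds z times its standard error."""
    return h2 > z * se

good_snp = Snp(info=0.9, maf=0.2, missing=0.01, hwe_p=0.5)
poorly_imputed_snp = Snp(info=0.4, maf=0.2, missing=0.01, hwe_p=0.5)

# Heritability of IC2 as reported in the Results (h2 = 0.0089, SE = 0.0043).
ic2_included = heritability_passes(0.0089, 0.0043)
```

With the values reported for IC2, the threshold is 1.96 x 0.0043 = 0.0084, so the estimate of 0.0089 clears it only narrowly.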
Finally, we processed the GWAS summary statistics of each

Code and GWAS summary statistics will be made publicly available via GitHub (github.com/norment) upon acceptance of the manuscript. Furthermore, the derived independent components (individual level data) will be made available to the UK Biobank upon acceptance (derived variable return) to allow their use in future UK Biobank studies.
Table 3 for additional statistics). Heritability was generally low, yet all components yielded a heritability that was higher than 1.96 times the standard error. IC13, capturing questions on alcohol abuse, had the highest heritability (h2 = 0.0763, SE = 0.0055), closely followed by IC3, capturing anxiety, depression, and mental distress (h2 = 0.0744, SE = 0.0052). The lowest heritability among the components was for IC2, reflecting psychosis questions (h2 = 0.0089, SE = 0.0043), likely owing to the low number of individuals with psychosis symptoms (Suppl. Fig. 3). We therefore performed a supplemental analysis to investigate if dichotomization of this IC would benefit the analysis (Suppl. Fig. 4). In brief, as dichotomization only slightly improved heritability estimates, we kept

Fig. 3. Genetic correlation between the independent components and disorders and cognitive traits
For each disorder, the associations with ICs are sorted by decreasing absolute genetic correlation such that the most leftward box reflects the strongest association between a given disorder and the 13 ICs.
Numbers in brackets under each IC label denote the genetic correlation (rg). The size of the boxes reflects the standard error. Significant correlations (p < FDR) are indicated with a black border.
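The caption refers to significance after FDR correction without naming the procedure. As an illustration, the sketch below assumes the widely used Benjamini-Hochberg procedure and a significance level of 0.05; both are assumptions, as are the function name and the example p-values.

```python
import numpy as np

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    adjusted_sorted = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(adjusted_sorted, 0.0, 1.0)
    return adjusted

# Made-up p-values for five genetic correlations.
p_fdr = benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.6])
significant = p_fdr < 0.05   # assumed significance level
```

In this toy example the third and fourth tests have unadjusted p-values below 0.05 but are no longer significant after correction (adjusted value 0.05125).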
We assessed genetic correlations between each of the 13 ICs and a set of psychiatric disorders as well as cognitive traits. Out of 117 comparisons, 70 were significant after FDR correction, which amounts to 60%. Fig. 3 depicts all genetic correlations with ICs, sorted separately for each disorder or cognitive trait (sorted by absolute genetic correlation). We found that in most cases the strongest genetic correlation was with the IC most closely related to that disorder or trait. For example, anxiety correlated most strongly with IC3, which reflects anxiety, depression, and mental distress (genetic correlation rg = 0.70, pFDR < .00027). SCZ was most highly correlated with IC2, which represents psychosis questions (rg = 0.54, pFDR = .001). The highest genetic correlation of BD was with IC11, which represents addiction and mania (rg = 0.5, pFDR = 6.5e-12). For PTSD, the component reflecting traumatic experience (IC6) only ranked sixth among the sorted associations, yet the two ICs showing the strongest association with PTSD reflected anxiety, depression, and mental distress (IC3; rg = 0.53, pFDR = .0017) and a diagnosis with a life-threatening illness (IC4; rg = 0.51, pFDR = .080), both of which are closely related to PTSD. ASD correlated most strongly with IC2 (reflecting psychosis; rg = 0.40, pFDR = .031) and ADHD correlated most strongly with IC8 (felt loved; rg = -0.51, pFDR = 4.7e-21). Educational attainment and intelligence both showed their strongest correlation, which was negative, with the IC reflecting social instability (IC5, rg = -0.74 and rg = -0.76, respectively; both pFDR < 2.5e-74). In general, the strongest associations among all ICs, either positive or negative, were with MDD, while the weakest associations were with educational attainment.

Next, we assessed the genetic correlations between the ICs.
Independent components are statistically independent by design, and thus on the phenotype level the ICs were not correlated with each other (Fig. 4, lower half; correlations essentially zero). However, approximately half of the IC pairs were nonetheless significantly genetically correlated with each other (51%, p < FDR). IC3 (anxiety, depression, mental distress) was genetically correlated with 10 other ICs. IC9 (self-harm) was correlated with 9 other ICs, and IC6 (traumatic experiences) and IC8 (felt loved) were each genetically correlated with 8 other ICs. IC11 (addiction/mania) and IC12 (emotional abuse) were each genetically correlated with 7 other ICs. IC1 (sexual abuse) and IC5 (social instability) were both genetically correlated with 6 other ICs. IC2 (psychosis) was correlated with 5 other ICs. IC4 (diagnosed with life-threatening illness) and IC13 (alcohol abuse) were both genetically correlated with 4 other ICs. Finally, IC7 (stress last month) and IC10 (general happiness) were both genetically correlated with 3 other ICs. Every IC had at least one significant genetic correlation with another IC. The analysis therefore revealed a large number of genetic correlations despite statistical (phenotypic) independence of the symptom profiles.
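The reported proportions can be checked with simple counting. The short Python sketch below recomputes them from the numbers given in the text; the number of significant IC pairs (40) is not stated explicitly and is derived here from the per-IC counts, under the assumption that each significant pair appears once in the count of each of its two ICs.

```python
from math import comb

n_ics = 13
n_traits = 9   # SCZ, BD, MD, ADHD, ASD, PTSD, ANX, intelligence, educational attainment

# IC-by-trait comparisons: 70 significant correlations were reported.
n_comparisons = n_ics * n_traits
share_trait = 70 / n_comparisons

# Number of other ICs each IC was significantly correlated with, as listed in the text.
partners = {
    "IC3": 10, "IC9": 9, "IC6": 8, "IC8": 8, "IC11": 7, "IC12": 7, "IC1": 6,
    "IC5": 6, "IC2": 5, "IC4": 4, "IC13": 4, "IC7": 3, "IC10": 3,
}
n_pairs = comb(n_ics, 2)                            # unique IC pairs
n_significant_pairs = sum(partners.values()) // 2   # each pair is counted twice
share_pairs = n_significant_pairs / n_pairs
```

The per-IC counts sum to 80, which corresponds to 40 of 78 unique pairs and agrees with the reported 51%; likewise 70 of 117 comparisons rounds to the reported 60%.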
In the present study, we decomposed mental health questionnaire data from more than 130,000 individuals into phenotypically distinct mental health profiles (independent components). We found that variations in mental health in healthy individuals (without a neurological or psychiatric diagnosis) were genetically correlated with psychiatric disorders and cognitive traits. The strongest correlations were observed between components and disorders with known symptoms in a similar domain (e.g. psychosis symptoms with schizophrenia), but the large number of significant correlations between disorders and mental health profiles suggested limited specificity. Indeed, we found a large proportion of significant genetic correlations between the phenotypically uncorrelated profiles, suggesting overlapping genetic architectures underlying distinct symptoms. The implications of our findings are twofold. First, our results support pleiotropy in psychiatric disorders beyond overlapping symptoms (e.g. BD and MDD both involving depressive episodes), suggesting that even distinct psychiatric symptoms are genetically overlapping. Second, our findings support the view that normal variability in mental health within healthy individuals may inform the study of the biology of psychiatric disorders.

While pleiotropy between major psychiatric disorders has been widely established 9-11 (reproduced in Suppl. Fig. 1), the sources underlying pleiotropy remain largely unknown. Specifically, disorders oftentimes overlap in symptomatology, and therefore the degree to which the observed genetic correlations between disorders reflect phenotypic overlap between disorders remains to be investigated. Our approach of decomposing mental health data into distinct profiles allowed us to study genetic correlations in a sample with known phenotypic correlations and to assess how these profiles correlate with the genetics of different diagnoses. We observed that most disorders correlated most strongly with the independent components capturing a related phenotype. For example, the strongest association with IC3, which reflects variance in anxiety, depression, and mental distress, was with ANX, and the strongest association with IC2 (psychosis) was with SCZ. Therefore, the ranking of association strengths suggested a certain degree of specificity.
However, that degree was strongly limited, as most of the disorders and components were significantly genetically correlated. For example, MDD showed significant correlations with all but one component, ASD correlated with all but four components, and ANX and ADHD were correlated with all but five components, though correlation strengths were overall lower than with MDD, possibly due to lower sample size. There were also significant associations between components and cognitive traits, although these were overall weaker than those with disorders. About half of the genetic correlations with intelligence and educational attainment pointed in the opposite direction, considerably more than for the psychiatric disorders, reflecting higher cognitive ability with fewer psychiatric symptoms. Importantly, when looking at the correlations between mental health profiles, we found that almost half of the genetic correlation matrix between ICs yielded significant genetic correlations despite a lack of phenotypic correlations (independence of the components). This suggests that some of the same genes are involved in the genetics of distinct mental health profiles and may indirectly support pleiotropy independent of phenotypic overlap in psychiatric disorders. Whereas more research is needed before conclusions on the sources underlying the observed pleiotropy can be drawn, one possible explanation for the significant correlations between the ICs could be that, since each independent component captures a facet of mental health, there may be a number of SNPs that are involved across mental health symptoms. These SNPs may be involved in overall mental health, from psychological well-being to psychosis symptoms. Our analysis of significant SNPs in FUMA did not
Notable strengths of the present study include the use of data-driven decomposition of mental health data in a large sample of healthy individuals and its application to study pleiotropy in psychiatric genetics. Its main limitations include the low heritability of the resulting independent components, and the limited number of individuals with psychosis symptoms yielding a suboptimal distribution in IC2 (Suppl. Fig. 3). First, it is important to note that all ICs passed quality control. Heritability of all ICs exceeded our pre-defined heritability threshold of 1.96 times its standard error, and all Q-Q plots indicated genetic signal (Suppl. Fig. 5). Furthermore, a trait with low heritability can still produce good genetic signal when a small number of genetic variants are involved but each has a large effect 13. For example, while IC2 had the lowest heritability among the ICs, it showed one of the strongest genetic signals, and together with IC7 and IC8 it ranked second in terms of the number of loci discovered in FUMA, following IC13 (alcohol abuse), which showed the highest heritability, the strongest genetic signal on the Q-Q plot and the largest number of significant loci and mapped genes. Second, although sample size and symptom distributions factored into the results, these are mostly reflected in the standard error of genetic associations, not in a lack of effect. For example, the ANX 7 (n = 21,761) and PTSD 6 (n = 9,537) GWASs have relatively little power, as reflected in the larger standard errors in genetic correlations with these disorders, but nonetheless the strongest associations with these disorders were with components that match symptoms of the disorders (both correlated most strongly with IC3, reflecting anxiety/depression/mental distress).
Likewise, the suboptimal symptom distribution in IC2 and the corresponding low heritability are reflected in large standard errors of the resulting genetic correlations, but nonetheless IC2, reflecting psychosis, was most strongly associated with SCZ. Supplemental analysis with dichotomized IC2 also confirmed that the distribution alone is unlikely to explain the observed associations (Suppl. Fig. 4).

Conclusion
In the present study, we revealed genetic overlap between statistically independent mental health profiles and provided evidence that variations in mental health in healthy individuals relate genetically to psychiatric disorders and cognitive traits. These findings suggest that pleiotropy between psychiatric disorders cannot simply be explained by overlapping symptoms but may rather point to similar biological underpinnings of distinct symptoms. Our results underscore the potential of data-driven approaches to the study of mental
Fig. 1. Genetic correlation between the disorders and cognitive traits
Numbers inside the boxes denote correlation (rg). The size of the boxes reflects the standard error. Significant correlations (p < FDR) are indicated with a black border. In line with previous reports 9,25, the weakest correlation was between PTSD and ANX (rg = -0.004, SE = 0.3408) and the strongest between ANX and MDD (rg = 0.8441, SE = 0.1724).
Suppl. Fig. 5. Q-Q plots
Q-Q plots showing the genetic signal from each of the ICs. IC13 showed the strongest signal. IC2 showed a strong signal despite having the lowest h2. None of the GWAS summary statistics showed any noticeable inflation.