The overlap of genetic susceptibility to schizophrenia and cardiometabolic disease can be used to identify metabolically different groups of individuals

Understanding why individuals with severe mental illness (Schizophrenia, Bipolar Disorder and Major Depressive Disorder) have increased risk of cardiometabolic disease (including obesity, type 2 diabetes and cardiovascular disease), and identifying those at highest risk of cardiometabolic disease, are priority areas for researchers. For individuals with European ancestry we explored whether genetic variation could identify sub-groups with different metabolic profiles. Loci associated with schizophrenia, bipolar disorder and major depressive disorder in previous genome-wide association studies, and that were also implicated in cardiometabolic processes and diseases, were selected. In the IMPROVE study (a high cardiovascular risk sample) and UK Biobank (general population sample), multidimensional scaling was applied to genetic variants implicated in both psychiatric and cardiometabolic disorders. Visual inspection of the resulting plots was used to identify distinct clusters. Differences between these clusters were assessed using chi-squared and Kruskal-Wallis tests. In IMPROVE, genetic loci associated with both schizophrenia and cardiometabolic disease (but not bipolar disorder or major depressive disorder) identified three groups of individuals with distinct metabolic profiles. This grouping was replicated within UK Biobank, with somewhat less distinction between metabolic profiles. This work focused on individuals of European ancestry and is unlikely to apply to more genetically diverse populations. Overall, this study provides proof of concept that common biology underlying mental and physical illness may help to stratify subsets of individuals with different cardiometabolic profiles.


Results
The IMPROVE and UK Biobank studies. The demographic characteristics of IMPROVE and of UK Biobank subsets 1 (UKB1) and 2 (UKB2) are provided in Table 1. At baseline, individuals in IMPROVE (a European high cardiovascular-risk cohort) were older, more overweight and more likely to have T2D or hypertension, or to be taking antihypertensive or lipid-lowering medication, than those in the UKB subsets (self-reported white British general population cohort). UKB1 and UKB2 were very similar, with a lower frequency of hypertension at follow-up in UKB1 (51.5%) compared to UKB2 (62.0%) but slightly larger carotid intima-media thickness (cIMT, indicative of vessel wall remodelling) measures in UKB2 than in UKB1. Despite different proportions of UKB1 and UKB2 completing the mental health questionnaire, the frequencies of BD, MDD and GAD were similar. Figure 1 provides a schematic overview of the analysis procedure.

SCZ-CM loci can identify metabolically distinct groups of individuals in IMPROVE.
In IMPROVE, using single nucleotide polymorphisms (SNPs) with a minor allele frequency (MAF) > 1% that are implicated in both SCZ and CMD (SCZ-CMD), plotting the first two multi-dimensional scaling components (C1 and C2) revealed three groups of individuals (by visual inspection) (Fig. 2a). Separation was predominantly due to C1, and whilst C1 is nominally significantly correlated with latitude (rho = −0.036, p = 0.0339), the clustering is not being driven by latitude (Supplementary Fig. 1). SNPs with MAF as low as 1% might differ across populations (even within the same ancestry grouping), therefore robustness to the MAF threshold was also assessed. Using MAF > 5% showed additional groups (Fig. 2b), whereas MAF > 10% showed similar groups to MAF > 1% (Fig. 2c). Assignment to groups was consistent using MAF > 1% and MAF > 10% (Supplementary Table 1). The three groups appear to have modest differences in cardiometabolic profiles (Table 2): Group 3 had a significantly lower frequency of hypertension (group 3: 74% vs groups 1 or 2: 80% or 81% respectively, P = 0.004) and lower fastest progression of cIMT (group 3: 0.156 mm vs groups 1 or 2: 0.176 mm or 0.166 mm, P = 0.002). This is surprising given the (non-significant) higher rates of smoking in this group. Group 2 had (non-significantly) lower rates of T2D than the other groups (group 2: 25% vs groups 1 or 3: 28%). Similar groups were observed using T-distributed Stochastic Neighbour Embedding (tSNE) or principal component analyses (PCA, Supplementary Methods), with the majority of individuals being consistently grouped together (Supplementary Figs. 2 and 3, respectively). This result appears specific to the SCZ-CMD SNP subset; no separation into groups was observed when using MDD-CMD SNPs, irrespective of the MAF filter used (Fig. 2d-f). For BD-CMD SNPs (Fig. 2g-i), grouping is apparent at MAF > 1%, but not when MAF > 5% or 10% were considered.
Validation of method and sensitivity testing of clustering in UKB1. In order to assess whether MDS analysis of SCZ-CMD SNPs could reproducibly identify three groups of individuals, validation of the method was attempted in UKB1. Firstly, to directly replicate the analysis conducted in IMPROVE (Fig. 3a,b), the post-filtering SNPs from IMPROVE were used (Fig. 3c); however, the grouping is not convincing as there is little separation between the groups. Secondly, to assess robustness of the method to differences in MAF and LD structure between populations, the SCZ-CMD SNPs were filtered for MAF and LD in UKB1. As noted in Fig. 1, the majority of SNPs included in the two approaches were the same. Unsurprisingly, the SNPs that differed were mainly those with MAF < 10%. Using SCZ-CMD SNPs and conducting MAF and LD filtering in UKB1, nine groups are evident when using SNPs with MAF > 1% (Fig. 3d), whereas three groups are observed when using SNPs with MAF > 10% (Fig. 3e). When comparing the metabolic profiles of the three groups, no significant differences were seen (Fig. 4a and Table 3). This is unsurprising, given that it is a smaller cohort with a lower cardiovascular burden.

Validation of metabolic differences between clusters in UKB2.
In an attempt to replicate the clustering and validate the metabolic differences between groups, the larger UKB2 subset was analysed. As filtering with MAF > 10% and 1% gave similar clusters, filtering with MAF > 10% was applied as it is more likely to generalise to other populations. Again, three major groups were identified (Fig. 4b), similar to those identified in IMPROVE and UKB1. Additional clusters between the major three groups were apparent, but they account for ~7% of the studied population and were omitted from the groups. Consistent with the IMPROVE study, clinically modest (and statistically significant) differences were observed in baseline SBP and DBP adjusted for blood-pressure medication, and in the frequency of hypertension and T2D (Table 3). These effects were not observed at follow-up, potentially due to lifestyle or medication changes in response to baseline observations. It was also noted that the frequency of MDD, but not BD, differed between the groups. The number of SCZ cases in UKB2 is too low to provide meaningful statistics.
Impact of MDD/BD on clusters. As phenotypes and genetic loci for SCZ overlap with those for MDD and BD, it is perhaps unsurprising to see that the clusters include different proportions of individuals with MDD. To investigate whether these individuals were driving the clustering, the process was repeated in those without BD/MDD separately from those with these diagnoses (using SNPs with MAF > 10%). In those without mental illness, similar to the overall UKB2, there were three main groups, with intermediate clusters accounting for 7.4% of the sample (Fig. 5a). In those with mental illness the three clusters were also observed, with better between-group separation and only 1.3% of the sample being ungrouped (Fig. 5b). Small but significant differences between groups were observed for blood pressure measures and rates of hypertension, in both those with and without mental illness (Supplementary Table 2). These results suggest that this method is applicable to the general population, as well as those with increased genetic burden for mental illness.

All genetic loci associated with SCZ do not identify clusters in UKB.
To determine whether it is common biology (i.e. overlap in loci for SCZ and CMD) per se, rather than SCZ in general, that drives the clustering, the same procedure was followed using all SNPs in loci associated with SCZ in UKB2, with the same MAF and LD filtering being applied prior to MDS analysis. As shown in Supplementary Fig. 4, SNPs in loci associated with SCZ do not separate individuals into groups. A further "negative control" experiment was conducted in UKB2. When repeating the analysis using the genetic loci (Supplementary Table 4) associated with eye colour 6,7, there was no evidence of subgroups (Supplementary Fig. 5). These results confirm that it is the overlap of SCZ and CMD loci (rather than a methodological artefact), and therefore probably common biological mechanisms, that is driving the clustering.

Discussion
This study provides proof of principle that, using the genetic overlap between SCZ and cardiometabolic disorders, subsets of European ancestry individuals with different metabolic profiles can be identified. These findings support the existence of mechanisms common to SCZ and blood pressure regulation. The discovery cohort IMPROVE was deliberately recruited to identify genes and biomarkers associated with the risk of cardiovascular diseases, at a time when psychiatric disorders were typically excluded from non-psychiatric studies; therefore only a portion of the spectrum of psychiatric genetic burden is represented. In contrast, UKB1 and UKB2 are general population cohorts and therefore have a wider spectrum of both psychiatric and cardiometabolic disorder genetic burden, although it is recognised that the recruitment skews this distribution towards the healthier segment of the population 8. It is therefore both striking that the grouping was present in IMPROVE, and unsurprising that the blood pressure and hypertension differences between groups were more modest in UKB2 than those in IMPROVE.
It is worth noting that similar groups were observed in the IMPROVE cohort using three different methods and (where applicable) exploring a variety of parameter settings. This suggests that the grouping is robust. The metabolic profiles of the groups did not completely agree between the three cohorts; however, the repeated observation of between-group differences in T2D and blood pressure/hypertension deserves further attention. If the method could be refined to better identify whether an individual is at increased risk of either hypertension or T2D, it would be of immense value. Even if the method is only robust in high CMD-risk populations (such as those with family history, multiple risk factors or psychiatric diagnoses), it could be of clinical importance.
It is interesting that the analyses using BD and MDD genetic loci did not enable clustering of individuals in the same way as was observed for SCZ, particularly given that BD and SCZ demonstrate an overlap in genetic loci. There are several possible explanations for this, most notably the ability to identify genetic loci for each mental illness: SCZ is clinically a more severe phenotype with diagnostic criteria that are relatively specific (for example psychotic episodes). In comparison, MDD spans a wide spectrum of severity, with phenotypic heterogeneity potentially diluting or obscuring some true genetic effects. Whilst BD can be considered intermediate (some symptoms are more severe than in MDD, most are less severe than in SCZ), diagnostic criteria for MDD and BD overlap to a large degree as both involve episodes of depression, meaning that there is potential for misdiagnosis and therefore dilution of genetic effects for either trait. Another explanation is that the mechanisms leading to CMD in SCZ differ from those in MDD or BD, with processes that are represented on the CardioMetabo and Immuno chips failing to capture some pathological mechanisms.
With this in mind, the finding of different frequencies of MDD in the groups was not anticipated, as the MDD genetics did not achieve any form of grouping, and the overlap of MDD and SCZ genetics is modest. However, MDD is highly heterogeneous, therefore it would be of interest to further explore whether there are any differences between the MDD cases in each group, specifically whether any of the groups corresponds to the recently proposed atypical depression subtype 9,10 .
Genetic correlation analyses have begun to explore the common biology and causal relationships between psychiatric and cardiometabolic diseases 1,3,10; however, these methods assume that the entire genome influences both sets of traits. The small to moderate correlations could suggest that it is only a portion of the genome that has common effects. In contrast, the current study focuses on only the parts of the genome that have been implicated in both psychiatric and CMD. Whilst this study does not bring us any closer to understanding the common pathological mechanisms, it does suggest that exploration of the SCZ-CMD loci could have clinical utility, irrespective of mechanistic understanding. One limitation is that these analyses were conducted in individuals of European ancestry and, as SNPs were filtered by MAF and linkage disequilibrium, it is not possible to generalise them to other populations. Indeed, applying current information from European ancestry individuals to additional ancestry groups has the potential to be misleading and is certainly incomplete. Whilst there is a recognised need 11 and growing efforts around the world to explore genetics of disease in non-European ancestry individuals, it will take time to gain full insight into the genetic architecture of diseases in these ancestry groups.
Table 2. Demographic characteristics of the IMPROVE participants, by cluster (MAF > 10%). Significant (p < 0.05) differences between groups are highlighted in bold. *Adjusted to provide estimates of treatment-naïve levels as per Ehret et al. Statistical analyses compared levels or frequencies across groups 1, 2 and 3; ungrouped individuals (.) were omitted from the analyses. **P for Pearson's chi-squared test for categorical variables and Kruskal-Wallis test for continuous variables. na, not available.
Another limitation is that the CardioMetabo and Immuno chips do not include all loci implicated in cardiometabolic disorders. Since these chips were described (2012 and 2011 respectively), many more loci involved in many more processes have been identified. However, as more and more samples become available for GWAS analyses, loci are being identified with smaller and smaller effect sizes. Therefore, whilst not all possible information is captured by using the CardioMetabo and Immuno chips, the loci with the largest effects are represented.
In conclusion, this study provides proof of concept that common biology underlying mental and physical illness is probable and can distinguish subsets of individuals with differing metabolic profiles, even if full understanding of mechanisms is lacking. Given that large-scale genotyping is not available to healthcare providers and the differences between groups are subtle, there is currently limited potential for translation of this into clinical practice. Further investigation with longitudinal datasets, particularly in high CVD risk populations, would define whether or not there is potential for clinical value in this method.
Table 3. Demographic characteristics of the UKB1 and UKB2 participants, by cluster. Significant (p < 0.05) differences between groups are highlighted in bold. *Adjusted to provide estimates of treatment-naïve levels. Statistical analyses compared levels or frequencies across groups 1, 2 and 3; ungrouped individuals (.) were omitted from the analyses. **P for Pearson's chi-squared test for categorical variables and Kruskal-Wallis test for continuous variables. na, not available.

Methods
Cohorts: phenotyping and genotyping. The IMPROVE study has been described previously 12,13. In short, 3700 individuals aged 54-79 years with high CVD risk profiles (the presence of at least 3 classical CVD risk factors, including family history of CVD, type 2 diabetes, hypertension, hyperlipidaemia and smoking) were recruited from seven centres in Finland, Sweden, the Netherlands, France and Italy. At baseline, individuals completed lifestyle and medical questionnaires and anthropometric measures were taken. Blood was sampled for DNA extraction and clinical biochemistry and stored for further biochemical analyses. Detailed ultrasound examination of the carotid intima-media thickness (cIMT) was conducted at baseline, 15 months and 30 months. Linear regression using all data points was used to calculate progression of cIMT. Mental illness was not assessed; however, it is believed that any mental illness in this cohort is likely to be subclinical. All participants provided written informed consent and the study was conducted in accordance with the Helsinki Declaration. Ethical approval was granted by the Regional Ethics Review Boards at Karolinska Institutet, Stockholm, Sweden, the Groupe Hôpitalier Pitie-Salpetriere, Paris, France, the Comitato Etico delle Aziende Sanitarie della regione Umbria, Perugia, Italy, the Ospedale Niguarda Ca´Granda, Milano, Italy, the University Hospital Groningen, Groningen, the Netherlands, the Hospital District of Northern Savo, Kuopio, Finland and the University of Eastern Finland, Kuopio, Finland.
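The cIMT progression estimate described above (a linear regression through the baseline, 15-month and 30-month measurements) can be illustrated with a minimal sketch. The function name, the example values and the choice to express the slope in mm per year are assumptions made for illustration only; the study does not report how the regression was implemented.

```python
def cimt_progression(months, cimt_mm):
    """Ordinary least-squares slope of cIMT (mm) against time, in mm per year.

    months  : visit times in months (e.g. 0, 15, 30)
    cimt_mm : cIMT measurements in mm at those visits
    Expressing the slope per year is an assumption of this sketch.
    """
    if len(months) != len(cimt_mm) or len(months) < 2:
        raise ValueError("need at least two paired measurements")
    years = [m / 12.0 for m in months]
    mean_t = sum(years) / len(years)
    mean_y = sum(cimt_mm) / len(cimt_mm)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(years, cimt_mm))
    sxx = sum((t - mean_t) ** 2 for t in years)
    if sxx == 0.0:
        raise ValueError("all measurements share the same time point")
    return sxy / sxx


# Hypothetical participant measured at the three IMPROVE visits.
slope = cimt_progression([0, 15, 30], [0.80, 0.85, 0.90])
print(f"cIMT progression: {slope:.3f} mm/year")
```

Using all available time points, rather than the difference between first and last visit, makes the estimate less sensitive to measurement error at any single visit.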
The IMPROVE study was genotyped on the Illumina Cardio-Metabo 14 and Immuno chips 15, therefore cardiometabolic disorders (including immune and inflammatory components) were well represented. Standard quality control procedures were conducted, namely exclusion of SNPs for low call rate (< 95%) or deviation from Hardy-Weinberg Equilibrium (p < 1 × 10⁻⁶), and exclusion of samples for low call rate (< 95%), sex mismatch or cryptic relatedness. Quality control was conducted on each chip separately, followed by a further round of quality control on the combined chip.
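The SNP-level exclusions described above can be sketched as follows. This is an illustrative sketch only: genotypes are assumed to be coded as 0, 1 or 2 copies of one allele with None for a missing call, the Hardy-Weinberg test uses a 1-degree-of-freedom chi-squared approximation (the test actually used in the study is not stated), and the sample-level exclusions (sex mismatch, cryptic relatedness) are not shown. All function names are hypothetical.

```python
import math


def call_rate(genotypes):
    """Fraction of non-missing genotype calls for one SNP."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def hwe_p_value(genotypes):
    """Chi-squared (1 df) p-value for deviation from Hardy-Weinberg equilibrium."""
    called = [g for g in genotypes if g is not None]
    n = len(called)
    if n == 0:
        return 1.0
    observed = (called.count(0), called.count(1), called.count(2))
    q = (observed[1] + 2 * observed[2]) / (2.0 * n)  # frequency of the counted allele
    p = 1.0 - q
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic SNP: no deviation can be tested
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of the chi-squared distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(stat / 2.0))


def snp_passes_qc(genotypes, min_call_rate=0.95, hwe_threshold=1e-6):
    """Apply the two SNP-level filters: call rate and HWE deviation."""
    if call_rate(genotypes) < min_call_rate:
        return False
    return hwe_p_value(genotypes) >= hwe_threshold


# Hypothetical SNPs: one in equilibrium, one with only heterozygotes.
in_equilibrium = [0] * 25 + [1] * 50 + [2] * 25
all_heterozygous = [1] * 100
print(snp_passes_qc(in_equilibrium), snp_passes_qc(all_heterozygous))
```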
The UK Biobank (UKB) has been described previously 16,17. Approximately 500,000 volunteers aged 39-73 years were recruited from 22 centres across the UK. At baseline, detailed questionnaires on sociodemographic factors, lifestyle factors and medical history were completed by all individuals. Measurements of anthropometric variables were recorded and blood samples were taken for DNA extraction. Subsequently (4-8 years after baseline), subsets of participants were invited for follow-up measurements and extensive imaging. All participants provided written informed consent and ethical approval was granted by the NHS National Research Ethics Service. This work was conducted under projects #6533 (Smith) and #1755 (Pell).
Ultrasound measurement of cIMT was conducted in a pilot phase of ~ 2500 individuals (henceforth denoted as UKB1) followed by a subsequent phase including ~ 22,000 individuals (denoted UKB2) using the same recruitment and measurement protocol. cIMT measurements were generally consistent with the measurements available in IMPROVE. A mental health/thoughts and feelings questionnaire was also completed by a subset of participants, which enabled estimation of life history of MDD and BD. For both UKB1 and UKB2, 73% of participants completed the mental health questionnaire.
Genome-wide genotyping was conducted and standard quality control procedures were applied by the UK Biobank team 18. Imputation was conducted using the Haplotype Reference Consortium and 1000 Genomes reference panels, with standard pre- and post-imputation quality controls being applied by the UK Biobank team (further information is provided in 18).

Multi-dimensional scaling (MDS) to identify clusters.
Genome-wide genetic loci reported to be associated with SCZ 19, MDD 20 and BD 21 were identified. SNPs within these (SCZ, MDD or BD) loci which were present on the CardioMetabo and Immuno chips were selected 14,15 (denoted SCZ-CM SNPs, MDD-CM SNPs or BD-CM SNPs, respectively). SNPs with MAF > 1% were included (Supplementary Table 3). A schematic diagram of the analysis steps is provided in Fig. 1.
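The SNP selection step described above (chip SNPs that fall within a psychiatric-disorder locus and exceed a MAF threshold) can be sketched as below. The data layout, function names, SNP identifiers and coordinates are all hypothetical and chosen only for illustration; loci are assumed to be given as chromosome, start and end positions, and genotypes as 0/1/2 allele counts with None for missing calls. LD pruning is not shown.

```python
def in_any_locus(chrom, pos, loci):
    """True if a position lies inside any (chrom, start, end) locus."""
    return any(c == chrom and start <= pos <= end for c, start, end in loci)


def minor_allele_frequency(genotypes):
    """MAF from genotypes coded as 0/1/2 allele counts (None = missing)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    freq = sum(called) / (2.0 * len(called))
    return min(freq, 1.0 - freq)


def select_overlap_snps(chip_snps, loci, genotypes_by_snp, min_maf=0.01):
    """Chip SNPs located in a disorder-associated locus with MAF above min_maf."""
    selected = []
    for snp_id, chrom, pos in chip_snps:
        if not in_any_locus(chrom, pos, loci):
            continue
        if minor_allele_frequency(genotypes_by_snp[snp_id]) > min_maf:
            selected.append(snp_id)
    return selected


# Hypothetical locus, chip content and genotypes for four individuals.
scz_loci = [("1", 1000, 2000)]
chip = [("rsA", "1", 1500), ("rsB", "1", 5000), ("rsC", "1", 1800), ("rsD", "2", 1500)]
genotypes = {
    "rsA": [0, 1, 2, 1],  # inside the locus, common
    "rsB": [0, 1, 2, 1],  # outside the locus
    "rsC": [0, 0, 0, 0],  # inside the locus, monomorphic
    "rsD": [1, 1, 1, 1],  # wrong chromosome
}
print(select_overlap_snps(chip, scz_loci, genotypes))
```

Raising min_maf to 0.05 or 0.10 gives the stricter SNP sets used in the sensitivity analyses.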
Clustering was performed using multi-dimensional scaling, implemented in PLINK, using default settings. Multidimensional scaling essentially measures similarity between individuals, in this case using the patterns of genetic variation as the assessment criteria 23,24 . Individuals with similar genetic sequences are deemed more similar to each other than those with less similar genetic sequences. Clustering was also conducted using tSNE and PCA (Supplementary Methods).
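To make the idea concrete, the sketch below runs classical (Torgerson) multidimensional scaling on a pairwise allele-sharing distance and returns the first two components for each individual. It is not a reproduction of the PLINK computation used in the study: the distance definition, the use of power iteration in place of a full eigendecomposition (chosen to keep the sketch dependency-free), and all function names and genotypes are assumptions made for illustration.

```python
import math
import random


def allele_sharing_distance(g1, g2):
    """Distance in [0, 1] between two genotype vectors coded 0/1/2."""
    return sum(abs(a - b) for a, b in zip(g1, g2)) / (2.0 * len(g1))


def top_eigenpair(matrix, n_iter, rng):
    """Power iteration: dominant eigenvalue and unit eigenvector of a symmetric matrix."""
    n = len(matrix)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(n_iter):
        w = [sum(row[j] * v[j] for j in range(n)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    length = math.sqrt(sum(x * x for x in v))
    v = [x / length for x in v]
    # The Rayleigh quotient recovers the eigenvalue together with its sign.
    bv = [sum(row[j] * v[j] for j in range(n)) for row in matrix]
    return sum(a * b for a, b in zip(v, bv)), v


def mds_components(genotypes, n_components=2, n_iter=500, seed=0):
    """Classical MDS coordinates (one tuple per individual)."""
    n = len(genotypes)
    sq = [[allele_sharing_distance(genotypes[i], genotypes[j]) ** 2
           for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # Double-centre the squared distances to obtain an inner-product matrix.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    columns = []
    for _ in range(n_components):
        eigval, v = top_eigenpair(b, n_iter, rng)
        # Negative eigenvalues (possible for non-Euclidean distances) are clamped to zero.
        scale = math.sqrt(max(eigval, 0.0))
        columns.append([scale * x for x in v])
        # Deflate so the next pass finds the next component.
        for i in range(n):
            for j in range(n):
                b[i][j] -= eigval * v[i] * v[j]
    return [tuple(col[i] for col in columns) for i in range(n)]


# Hypothetical genotypes: two groups of three individuals, six SNPs each.
example = [
    [0, 0, 0, 0, 2, 2], [0, 0, 1, 0, 2, 2], [0, 1, 0, 0, 2, 2],
    [2, 2, 2, 2, 0, 0], [2, 2, 1, 2, 0, 0], [2, 1, 2, 2, 0, 0],
]
for c1, c2 in mds_components(example):
    print(f"C1 = {c1:+.3f}, C2 = {c2:+.3f}")
```

In this toy example the two groups fall on opposite sides of zero on C1, which is the kind of separation that was assessed visually in the study plots.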
Subsequently in UKB1, SCZ-cardiometabolic SNPs only were used and individuals with > 1% missing genetic data were excluded prior to clustering. MDS analysis was conducted using either exactly the same SNPs as were used in IMPROVE (i.e. SCZ-CM SNPs after filtering for MAF and LD in IMPROVE) or SCZ-CM SNPs with filtering for MAF and LD being done in UKB1.
Finally, in UKB2, individuals with > 1% missing genetic data were excluded prior to clustering. MDS analysis was conducted on SCZ-CM SNPs with filtering for MAF and LD in UKB2, or on all SCZ SNPs after MAF filtering and pruning in UKB2.
The first two MDS components (C1 and C2) were plotted for visual assessment.
Scientific Reports | (2021) 11:632 | https://doi.org/10.1038/s41598-020-79964-x
Choosing a negative control experiment is not straightforward, as current evidence suggests that most genetic variants are highly pleiotropic and that complex traits overlap with each other to a large degree. Despite some overlap with CMD or SCZ-related traits, SNPs in genetic loci associated with eye colour were used as a negative control experiment. The analysis was conducted in UKB2 with MAF > 10% filtering and pruning as described above.
Statistical analyses. In IMPROVE, Spearman's rank correlation coefficients were used to assess the relationship between the MDS components and latitude. For IMPROVE, UKB1 and UKB2, differences between groups were assessed by Pearson's chi-squared test for categorical variables and the Kruskal-Wallis test for continuous variables. All statistical analyses were conducted in Stata (version 11.0). The threshold for significance was set at p < 0.05. No adjustment for multiple testing was applied, firstly because these analyses are exploratory rather than definitive and secondly because most of the cardiometabolic phenotypes tested are interrelated and thus are not independent tests.
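The two between-group tests named above can be illustrated with a short sketch for the three-group case. The study analyses were run in Stata; the Python below is only an illustration, with hypothetical function names and made-up data. The closed-form p-value is exact only for 2 degrees of freedom, which is what arises when three groups are compared on a binary trait (2 × 3 table) or on a continuous trait (Kruskal-Wallis with three groups).

```python
import math
from collections import Counter


def pearson_chi_squared(table):
    """Pearson chi-squared statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, (len(table) - 1) * (len(col_totals) - 1)


def kruskal_wallis(groups):
    """Tie-corrected Kruskal-Wallis H statistic and degrees of freedom."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    counts = Counter(pooled)
    # Assign each distinct value the average of the ranks it occupies.
    rank_of, next_rank = {}, 1
    for value in sorted(counts):
        c = counts[value]
        rank_of[value] = next_rank + (c - 1) / 2.0
        next_rank += c
    rank_term = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups)
    h = 12.0 / (n * (n + 1)) * rank_term - 3.0 * (n + 1)
    tie_correction = 1.0 - sum(c ** 3 - c for c in counts.values()) / float(n ** 3 - n)
    if tie_correction == 0.0:
        raise ValueError("all observations are identical")
    return h / tie_correction, len(groups) - 1


def p_value_df2(stat):
    """Upper-tail chi-squared probability; this closed form is valid only for df = 2."""
    return math.exp(-stat / 2.0)


# Hypothetical counts of hypertension (yes / no) in groups 1, 2 and 3.
chi2, df = pearson_chi_squared([[10, 20, 30], [30, 20, 10]])
print(f"chi-squared = {chi2:.2f}, df = {df}, p = {p_value_df2(chi2):.2g}")

# Hypothetical continuous measurements in the three groups.
h, df = kruskal_wallis([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f"H = {h:.2f}, df = {df}, p = {p_value_df2(h):.2g}")
```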

Data availability
The datasets generated and/or analysed during the current study are available from the corresponding author on request.