Applying polygenic risk scoring for psychiatric disorders to a large family with bipolar disorder and major depressive disorder

Psychiatric disorders are thought to have a complex genetic pathology consisting of interplay of common and rare variation. Traditionally, pedigrees are used to shed light on the latter only, while here we discuss the application of polygenic risk scores to also highlight patterns of common genetic risk. We analyze polygenic risk scores for psychiatric disorders in a large pedigree (n ~ 260) in which 30% of family members suffer from major depressive disorder or bipolar disorder. Studying patterns of assortative mating and anticipation, it appears increased polygenic risk is contributed by affected individuals who married into the family, resulting in an increasing genetic risk over generations. This may explain the observation of anticipation in mood disorders, whereby onset is earlier and the severity increases over the generations of a family. Joint analyses of rare and common variation may be a powerful way to understand the familial genetics of psychiatric disorders.

T he development of polygenic risk scoring (PRS) has greatly advanced the field of psychiatric genetics. This approach allows for even sub-genome-wide significant threshold results from large genome-wide meta analyses to be leveraged to explore genetic risk in smaller studies 1 . The effect sizes at many individual single-nucleotide polymorphisms (SNPs), estimated by large genome-wide association studies (GWAS) on the disorder of interest, are used to calculate an individual level genome-wide PRS in individuals from an independent genetic dataset. The PRS based on the summary statistics of the schizophrenia (SCZ) GWAS by the Psychiatric Genomics Consortium (PGC) 2,3 has proven to be most powerful in predicting not only SCZ 1,4 but also other psychiatric disorders [5][6][7] . In addition, updated, more powerful, summary statistics from the Psychiatric Genomics Consortium from the latest GWAS for bipolar disorder (BPD) and major depressive disorder (MDD) are available via the PGC Data Access Portal (https://www.med.unc.edu/pgc/shared-methods).
Aside from increasing power in traditional case-control designs, PRS algorithms also open up new avenues for studying common variation. In this study, we consider the application of PRS within a family context. While pedigree studies have been traditionally used to explore rare genetic variation through linkage analyses, studying patterns of PRS throughout a pedigree would allow for assessment of phenomena like assortative mating and anticipation. Assortative (non-random) mating is a common phenomenon where mated pairs are more phenotypically similar for a given characteristic than would be expected by chance 8 . Results from a recent study by Nordsletten et al. 9 show extensive assortative mating within and across psychiatric, but not physical disorders. This could explain some of the features of the genetic architecture of this category of disorders [9][10][11] . This includes anticipation, a phenomenon where later generations exhibit more severe symptoms at an earlier age, robustly reported (although not explained) in BPD 12 , and recently highlighted in genetic studies of MDD 13,14 .
In the current study, we aim to discuss the application of polygenic risk scoring for SCZ, MDD, and BPD to explore patterns of common risk variation within a family context. We illustrate our discussion by investigating the relationship between PRS and apparent assortative mating, and anticipation within a complex multigenerational pedigree affected with mood disorders.

Results
Study overview. We identified a large pedigree in Brazil, the Brazilian Bipolar Family (BBF), after examination of a 45-yearold female who presented with severe Bipolar Type 1 (BPI) disorder. She stated there were dozens of cases of mood disorders in the family, most of whom lived in a small village in a rural area of a large state north of São Paulo (see Methods for details). We conducted 308 interviews using the Portuguese version of the Structured Clinical Interview for DSM-IV Axis I Disorders (SCID-I)16 for family members over the age of 16 and the Portuguese version of Kiddie-SADS-Present and Lifetime Version (K-SADS-PL)17 for family members aged 6-16. Following diagnostic interviews, we conducted genotype analysis of all interviewees using the Illumina Infinium PsychArray-24. Polygenic risk scores (PRS) were assigned to each family member using PRS thresholds most predictive in discriminating affected from unaffected family members (see Methods).
Assortative mating. Married-in individuals were defined as individuals married to a BBF member, but having no parents in the family themselves. Of the 70 married-in individuals ascertained (irrespective of having genotype data) 19 (27%) were affected with a psychiatric disorder. This is significantly higher than the 17% population prevalence of the most common of the three disorders: MDD (Fisher's exact p = 0.02) 15 . The unaffected married-in group does not differ from the general healthy population as evidenced by no significant differences in PRS as compared to the population control group (BRA; see Methods). The above led us to investigate whether we can observe assortative mating on a genetic level, using PRS. In spouse pairs, we were unable to predict the PRS of the husband, using that of his wife, even when selecting concordant (both affected or both unaffected) pairs only. We considered the possibility that the marriedin individuals might confer a different genetic predisposition to mood disorders to their offspring than the original family members. The number of children contributed per spouse pair to each offspring category is shown in Supplementary Table 1. Demographics of the offspring in the different offspring categories (no affected parents (n = 54); one affected family member parent (n = 69); one affected married-in parent (n = 15) and two affected parents (n = 38)) are given in Supplementary Tables 2  and 3 Anticipation. The BBF shows patterns of anticipation, with individuals having an earlier age at onset (AAO) in later generations. For 104 individuals (irrespective of having genotype data), the average age at onset significantly decreases over generations with G2 (n = 1, AAO = 8), G3 (n = 23, AAO = 30.2 yrs ± 21.1), G4 (n = 53, AAO = 31.2 yrs ± 12.3), G5 (n = 23, AAO = 19.7 yrs ± 9.5), and G6 (n = 4, AAO = 13 yrs ± 3.6) (Supplementary Figure 2) with older participants recalling their AAO directly and younger participants confirmed using clinical records or parental recall (Beta = −4.549, SE = 1.793, Z-ratio = −2.537, p = 0.013, R 2 = 0.059). We hypothesized that this decrease in AAO would be reflected in a negative correlation with PRS, subsequently resulting in a pattern of increased PRS over generations. Because of a limited sample size of affected individuals per generation, a direct correlation of AAO and PRS does not reach significance, although the youngest generation (G5) does show trends towards negative correlations for SCZ:PRS and MDD:PRS (Supplementary Figure 3). The SCZ:PRS does show a significant increase over generations (Fig. 2) where n = 197 family members were included (46 married-in individuals were excluded from the analysis to capture inheritance patterns of SCZ:PRS) in a linear regression with generation as independent variable (Beta = 0.131, SE = 0.049, Z-ratio = 2.668, p = 0.008, R 2 = 0.025). The presence of such an effect when comparing generations suggests ascertainment effects such as relying on the recall of older family member with very long duration of illness in previous generations may be masking an overall effect across the entire family.
Balance of common and rare genetic risk. Transmission disequilibrium test analysis within the chr2p23 linkage region resulted in identification of rs1862975, a SNP originally typed on the Affymetrix linkage array (combined test p = 0.003). The homozygous T genotype was detected in 68% affected family members, 57% affected married-ins, 36% unaffected family members and 24% unaffected married-ins. Since this SNP was present only on the Affymetrix array, we identified rs12996218 as a proxy in CEU/TSI populations (D′ = 1.0, R 2 = 0.92) via the LDproxy option in LDlink (Machiela et al. 16 , https://analysistools. nci.nih.gov/LDlink/). Of the 57 BRA controls, 9 individuals (15%) carried the GG genotype equivalent to the rs1862975 TT risk genotype. The distribution of the rs1862975 genotypes in affected and unaffected individuals over generations is given in Supplementary Figure 4. The number of individuals carrying the TT does not significantly change over generations in either group. None of the PRS showed a significant difference when comparing PRS for rs1862975 genotypes in affected and unaffected individuals (Supplementary Figure 5).

Discussion
The current study is one of the first the first to probe patterns of common genetic variation within a traditional pedigree design. While increased polygenic scores in patients as compared to unaffected family members have been demonstrated recently 17 , we aimed to illustrate the possibilities of this approach by investigating apparent assortative mating and anticipation in a large multigenerational pedigree affected with mood disorders through polygenic risk scores for SCZ 2 , MDD 18 , and BPD 19 , and thereby improve mechanistic understanding of common genetic risk for psychiatric disorders.
Highlighting the possibilities of PRS applications within a family context, we set out to utilize patterns of common variation to illuminate phenomena within the family that are out of reach from traditional case/control studies. Assortative mating is one of the features in this family, where many married-in individuals are more affected with a mood disorder than the general population. As opposed to the family members, the married-in individuals were more often affected with (r)MDD instead of BP. As diagnoses were determined after the couples were married, we cannot rule out that this could be a result from a causal effect of a spouse's mental health on that of their partner. However, non-random mating patterns have been reported in the population regarding body type, socio-economic factors and psychiatric traits 9,10 . The BBF provides a unique opportunity to look at the genetic correlation between spouse pairs and the contribution of married-in individuals to overall psychiatric morbidity. A recent study has found genetic evidence for assortative mating when studying BMI and height in spouse pairs 11 . In the BBF; the affected married-in individuals have a higher, though non-significant, polygenic score than affected or unaffected family members but it appears that we observe significant consequences of this in that the offspring of an affected married-in parent collectively show significantly increased SCZ:PRS and BPD:PRS. However, it is puzzling we do not see an effect on offspring of two affected parents (which would include a married-in parent), which could indicate this finding to be of limited statistical robustness.
A contribution of the married-in parents to a genetic driven anticipation in age of onset is supported by the increase in SCZ: PRS over generations, although our cross sectional study dataset was less well powered to find an association with age at onset within affected family members. We did observe a trend for association between age at onset and PRS in the youngest generation in this study but not when combining sample across generations. Age at onset can be considered a proxy for severity 20,21 and has been previously associated with genetic risk in MDD 13,14 . However, this variable needs to be interpreted with caution, especially when analyzing patterns over time since it is dependent on context and memory 22 . Ascertainment bias can be a confounding factor in studies of psychiatric traits, with older generations having less access to psychiatric care and possibly misremembering the onset or nature of their first episode. In addition, although currently classified as "unaffected" or "unknown", members of the youngest generations can still develop a psychiatric disorder in the future.
Finally, we explored the balance of common and rare risk variation through combining our current PRS results with previously performed linkage analyses. We did not find a decrease in potential rare risk allele genotypes over generations contrasting the increase in SCZ:PRS, and PRS profiles for individuals carrying rare risk genotypes are not significantly different. This indicates that these factors separately confer independent disease risk. We recognize the limitations in sample size of our pedigree and therefore the power to draw statistically robust conclusions, especially in the offspring and combined linkage and PRS analyses. Even though the BBF might not be sufficiently powered, our point is to use this dataset to illustrate our approach and emphasize the unique nature of the family enabling the study of patterns of PRS and the balance of common and rare genetic risk for psychiatric disorders conferred within families. We encourage replication in similar pedigrees including affected married-in individuals when available to fully utilize the potential of PRS in this setting.
In conclusion, our study is an exploration of PRS as a tool for investigating patterns of common genetic risk in a traditional pedigree context. The SCZ and BPD scores appear best suited in our data for teasing apart patterns of assortative mating and anticipation, whereby increased polygenic risk for psychiatric disorders is contributed by affected individuals who married into the family, adding to the already present rare risk variation passed on by the early generations 23 .

Methods
Subject description. The Brazilian bipolar family (BBF) was ascertained via a 45year-old female proband who presented with severe Bipolar Type 1 (BPI) disorder and stated there were dozens of cases of mood disorders in the family, most of whom lived in a small village in a rural area of a large state north of São Paulo. Cooperation from the family and a 2003 self-published book about their history was invaluable for our ascertainment. Historically, the entire BBF consists of 960 members. Living family members > 16 years of age underwent semi-structured interviews, using the Portuguese version of the Structured Clinical Interview for DSM-IV Axis I Disorders (SCID-I) 24 . Members aged 6-16 were assessed using the Portuguese version of Kiddie-SADS-Present and Lifetime Version (K-SADS-PL) 25 . In total 308 interviews were completed, and 5 eligible members declined an interview. In the rare event of discrepancies, two independent psychiatrists reviewed them and a final consensus diagnosis was assigned. All affected and unaffected adult family members that have been included in the genetic study have given informed consent. Minors have given assent, followed by consulted consent by their parents in accordance with accepted practice in both the U.K. and Brazil. The project was approved by the Brazilian National Ethics Committee (CONEP). Table 1 contains the demographics of the subjects used in the current analysis (n = 243 passed genotype quality control procedures described below). The population control dataset (BRA controls) was collected in Sao Paulo, Brazil, as a control dataset in a genetic study of first-episode psychosis 26 . They were volunteers who had no abnormal psychiatric diagnoses (SCID) or family history of psychotic illness. The Research Ethics Committee of Federal University of Sao Paulo (UNI-FESP) approved the research protocol, and all participants gave informed consent (CEP No. 0603/10). Demographics for n = 57 BRA controls can be found in Table 1.
Genotype data. Following diagnostic interview, interviewers obtained whole blood in EDTA containing monovettes for adults and lesser amounts or saliva given personal preference or age (DNA Genotek Inc., Ontario, Canada). Genomic DNA was isolated from whole blood and saliva at UNIFESP using standard procedures. Whole-genome genotype data was generated using the Illumina Infinium PsychArray-24 (http://www.illumina.com/products/psycharray.html) for both the BBF and the BRA control dataset at the in-house BRC BioResource Illumina core lab according to manufacturers protocol. Samples were excluded when average call rate was <98%, missingness >1% with additional check for excess heterozygosity, sex, family relationships and concordance rates with previous genotyping assays. SNPs were excluded when missingness > 1%, MAF < 0.01 or HWE < 0.00001 and if showing Mendelian errors for the BBF dataset in Plink v1.07 27 and v1.9 28 or Merlin v1.1.2 29 . The BBF and BRA control datasets were QC'd separately and then merged, applying the same SNP QC thresholds to the merged dataset as well. This quality control procedure resulted in a dataset of 225,235 SNPs for 243 BBF individuals (197 family members and 46 married-in individuals) and 57 BRA controls. Eigensoft v4.2 30 was used to check for population differences between the BBF family members, married-in individuals and BRA control sets. The BBF members self-reported mixed Southern European ancestry, confirmed by genome-wide principal components analysis showing that family members clustered closely with the Northern and Western European and Tuscan Italian populations in Hapmap3, with a relative lack of African or Native American ancestry (Supplementary Figure 6). The principal components appear to represent within-family structure, with most PCs seemingly separating subfamilies ( Supplementary Figures 7 and 8). PRS analyses as described below were also performed to include subfamily as a fixed effect, controlling for household effects (Supplementary Table 3 Polygenic risk scores. Polygenic risk scores for each family member (n = 243) and population control (n = 57) were generated in the same run using the PRSice v1.25 software 31 with the publically available PGC schizophrenia GWAS 2 as a base dataset (36,989 SCZ cases, 113,075 controls), in addition to MDD (51,865 MDD cases, 112,200 controls, not including 23andme individuals) and BPD (20,352 BPD cases, 31,358 controls) summary statistics from the latest PGC meta analyses (unpublished data 18,19 ). We performed p-value-informed clumping on the genotype data with a cut-off of r 2 = 0.25 within a 200-kb window, excluding the MHC region on chromosome 6 because of its complex linkage disequilibrium structure. Acknowledging the possibility of over-fitting, we selected the PRS thresholds most predictive in discriminating affected from unaffected family members through linear regression in PRSice for SCZ:PRS (p < 0.00055, 1218 SNPs), MDD:PRS (p <  Linkage analysis. The main linkage analyses identifying rare genetic risk variation were performed as part of a previous paper on the BBF 23 using the Affymetrix 10k linkage genotyping array. In order to explore the balance between common and rare risk variation, we selected the strongest signal for affected versus unaffected family members on chr2p23 (chr2:30000001-36600000, LOD = 3.83). Following the strategy described by Rioux et al. 32 , we performed a transmission disequilibrium test on the 25 markers in this linkage region in an attempt identify "linkage positive" individuals in n = 300 family members with one or both types of genotype array data. N = 155 individuals overlap with the current study and based on exploration of patterns of PRS in the current study we attempted to answer two questions: (1) with an increase of common risk variation, does rare risk variation become less important over generations, (2) do linkage positive individuals carrying the presumed risk allele show differences in PRS.
Statistical testing. All PRS were standardized mean = 0 and SD = 1. Linear mixed model analyses were selected to be able to model covariates and relatedness within this complicated dataset. The analyses were performed using the Wald conditional F-test 33 in ASReml-R software 34 with one of the categories of mood disorders or family status as dependent variable and PRS as the independent variable (Supplementary Methods). Age (except for the generation analysis) and sex were fitted as fixed effects in the models. For 7 individuals in the BBF age at collection was missing and imputed to be the mean age of the relevant generation. To account for relatedness in within-family comparisons, an additive genetic relationship matrix was fitted as a random effect. The relationship matrix was constructed using LDAK software 35 with weighted predictors and LD correction parameters suited for pedigree data, resulting in pairwise relatedness estimates and inbreeding coefficients on the diagonal. The variance explained by each PRS was calculated using: (var(x × β))/var(y), where x was the standardized PRS, β was the corresponding regression coefficient, and y was the phenotype 36 . For the analysis of offspring, we defined four spouse pair categories ("both unaffected", "married-in parent affected", "family parent affected", "both affected"). While most spouse pairs contribute 1 or 2 children to the same offspring category (Supplementary Table 1); two "both affected" spouse pairs contribute 7 and 8 children, respectively. To prevent bias in our analysis in the event of more than one child per couple, we calculated the mean PRS for all offspring per spouse pair and entered this in the model as being one representative child for that couple. All p-values reported are uncorrected for multiple testing, since all tests concern overlapping individuals and thus have a complex dependence structure. However, we have performed 42 tests as listed in Supplementary Table 4, and so a conservative Bonferroni threshold for p < 0.05 is 0.001.

Data availability
In order to ensure privacy of the family members and to comply with Brazilian regulations, restrictions apply on availability of the data as determined by the Brazilian National Ethics Committee (CONEP

Additional information
Competing Interests: G.B. has been a consultant in preclinical genomics and has received grant funding from Eli Lilly ltd within the last 3 years. A.G. has participated in advisory boards for Janssen-Cilag and Daiichi-Sankyo. The remaining authors declare no competing interests.
Reprints and permission information is available online at http://npg.nature.com/ reprintsandpermissions/ Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/ licenses/by/4.0/.