Introduction

Bipolar disorder (BD) is a serious mental illness affecting more than 48 million adults worldwide with a lifetime prevalence of 1–3% [1, 2]. BD is characterized by extreme episodes of mood elevation and depression, along with disturbances in thinking and behavior that often result in significant disability [3]. The high lifetime morbidity of BD and frequent suboptimal outcomes with available treatments present an urgent need to better understand the pathogenesis of BD. Prior studies have suggested a role for dysregulated neuronal signaling and synaptic plasticity [4,5,6,7], as well as pathways that regulate neurotransmitter function, neuronal development, and oxidative stress [8, 9]. However, a comprehensive characterization of biologic mechanisms in BD remains elusive. A better understanding is critical for development of effective therapies and personalized care.

BD is highly heritable (59–93%) within families, and monozygotic twins show higher concordance rates (40–80%) than dizygotic twins (5–30%) [10,11,12,13,14]. Genome-wide association studies (GWAS) have identified over 30 loci that contribute to BD susceptibility, implicating genes that encode ion channels, neurotransmitter transporters, and synaptic components [15,16,17,18]. Genes implicated by GWAS might also harbor rare variants [19,20,21], however this has not been studied in BD. Whole-exome sequencing (WES) in BD families has examined the role of rare exonic variation influencing BD risk. [22,23,24,25,26]. However, these studies have been small (discovery cohort <200 BD cases) [22,23,24,25], are single-cohort studies [22, 25], and only provide modest evidence implicating rare coding variants [23, 24, 26]. Overall, it remains unclear whether rare genetic variation significantly influences risk of BD.

BD shares clinical features and genetic susceptibility with schizophrenia (SCZ) [12, 18, 27,28,29], particularly through common genetic variation. In addition to the observation that BD and SCZ share a high genetic correlation due to common variation [30], the SCZ polygenic risk score (PRS) is associated with psychosis in BD, and the BD PRS is associated with manic behavior in schizophrenia [28, 31]. Studies of rare variation show that individuals with SCZ carry a higher burden of ultra-rare protein-altering variants across the exome, particularly in genes that are evolutionarily conserved, in genes that affect neuronal synaptic function, and that have been implicated through studies of de novo variation in SCZ and autism spectrum disorder [32,33,34,35,36]. Rare variants in these gene classes have not been systematically examined in BD. Despite the shared susceptibility between BD and SCZ through common genetic variation, it remains unknown whether rare variants in genes implicated in SCZ contribute to BD.

This study aims to test whether rare, functional protein-altering variation impacts susceptibility to BD. To this end, we examined the protein-coding sequences (exomes) of 3,987 individuals with BD and 5,322 healthy controls of predominantly European ancestry recruited across four studies to address the following questions: do individuals with BD carry a greater burden of rare functional coding variants (1) in their exomes overall; (2) in genes implicated by common variant GWAS in BD; and (3) in genes implicated through common and rare variant studies in SCZ, including neuronal synaptic and loss-of-function intolerant genes. We subsequently attempted replication of positive findings in an additional 9,929 individuals with BD and 14,018 matched controls.

Methods

Cohorts

The study protocol for the combined analysis was approved by the Johns Hopkins University Institutional Review Board. All human studies were approved by each respective institutional ethics review committee, and all participants provided written informed consent. We examined the exomes of 3,987 individuals with BD (3,615 with bipolar 1 disorder (B1D), 252 with bipolar 2 disorder (B2D), 91 with schizoaffective disorder bipolar type (SAB)) and 5,322 unaffected individuals from four case-control cohorts of predominantly European ancestry that comprise the Bipolar Sequencing Consortium (BSC) (Table 1, Figs. S1, S2, and Supplemental Methods). These included 1,712 cases and 1,844 controls from the Bipolar Research in Deep Genome and Epigenome Sequencing (BRIDGES) study [37], 961 cases and 1,039 controls from the Rare Bipolar Loci Identification Through Synaptome Sequencing (RareBLISS) study (NIMH Genetics Initiative repository) [26], 831 cases and 1,956 controls from Sweden (Swedish National Hospital Discharge Register and population-based registries), and 483 cases and 483 matched controls from the Kaiser Permanente Northern California (KPNC) cohort (Multi-ethnic Genome-wide Study of Bipolar1 Disorder and Resource for Genetic Epidemiology Research in Adult Health and Aging (GERA) cohort [38]). All studies were comprised of individuals of European ancestry except for KPNC, which included individuals of European (EUR, 192 cases, 192 controls), African American (AFR, 96 cases, 95 controls), Latino (LAT, 98 cases, 100 controls), and East Asian (EAS, 97 cases, 96 controls) ancestry.

Table 1 Bipolar disorder sequencing cohorts.

Sequencing and quality control

Whole-genome sequencing (WGS) was performed in the BRIDGES cohort using the Illumina HiSeq 2500 system. Library preparation was performed using Nimblegen SeqCap EZ Exome (RareBLISS and KPNC) or Agilent SureSelect Human All Exon v2 kit (Sweden), and whole-exome sequencing (WES) was performed using Illumina HiSeq 2000 or 2500 systems in RareBLISS, Sweden, and KPNC. Paired sequence reads were aligned to the human reference build hg19 using BWA [39]. Variant calling was performed using the GotCloud sequence analysis pipeline [40] for WGS data in BRIDGES, and joint variant calling was performed using the Genome Analysis ToolKit (GATK) [41] for WES data within each of RareBLISS, Sweden, and KPNC. Genotypes with low sequence coverage or poor call quality were removed. Samples that were identified as population outliers, duplicates or relatives, or that failed study-specific sequencing metrics were removed. Variants with high missingness across samples, poor average genotyping quality, or found to significantly deviate from Hardy-Weinberg equilibrium were removed. A full description of sequencing and data quality control is provided in Supplemental Materials.

Variant selection and annotation

Analysis was limited to single nucleotide variants (SNVs) that are bi-allelic and whose minor allele matched the alternative (non-reference) allele. Annotation of all variants identified from exome and genome sequencing was done using ANNOVAR, and pathogenicity was assigned according to 2015 American College of Medical Genetics (ACMG) criteria using InterVar, a computational implementation of expert panel recommendations for clinical interpretation of genetic variants (ACMG 2015 criteria) [42,43,44]. ACMG classification categories include pathogenic (P), likely pathogenic (LP), variant of uncertain significance, likely benign, and benign. Variants that were rare (maximum population-specific minor allele frequency [MAF] < 1% in the Genome Aggregation Database [gnomAD] [45, 46]), protein-altering (missense, splice site, stopgain, startloss, stoploss (none observed), and classified as pathogenic or likely pathogenic by InterVar were retained for analysis.

Gene-level burden testing

To assess whether individuals with BD carry a greater burden of rare pathogenic or likely pathogenic (P-LP) variants in individual genes, we performed Firth logistic regression on the number of P-LP alleles within the gene per individual (adjusted for sex and at least five ancestry principal components) for each of seven groups (BRIDGES, RAREBLISS, Sweden, KPNC-EUR, KPNC-AFR, KPNC-LAT, KPNC-EAS), followed by a meta-analysis across groups using the weighted Z-score method. We calculated a one-sided p value for enrichment of P-PL variants in BD cases from the meta-analysis Z-score, and used a Holm-adjusted p value < 0.05 to report significant associations.

Gene set selection

We assessed whether exomes of BD patients are enriched for rare functional protein-altering variants in genes implicated by the largest BD GWAS meta-analysis [18]. We defined the following three BD-related gene sets: 165 genes identified by physical proximity to variants in linkage disequilibrium (r2 > 0.1) with BD GWAS-associated variants (BD GWAS), 153 genes identified using the software package MAGMA (Multi-marker Analysis of GenoMic Annotation) [47] on BD GWAS data (BD MAGMA), and 81 genes identified through the intersection of the above two gene sets (BD MAGMA-GWAS). Given that BD shares genetic susceptibility with schizophrenia (SCZ) [12, 18, 27,28,29], particularly through common genetic variation, we examined genes implicated in a large schizophrenia GWAS (CLOZUK+PGC2) [48] by defining the following three gene sets: 623 genes identified by variants in linkage disequilibrium (r2 > 0.1) with schizophrenia GWAS-associated variants (SCZ GWAS), 550 genes identified using MAGMA on schizophrenia GWAS data (SCZ MAGMA), and 342 genes identified through the intersection of these two gene sets (SCZ MAGMA-GWAS). Given the strong impact of rare variant burden on SCZ susceptibility and the partial clinical overlap between BD and SCZ [34, 48,49,50], we also examined three gene sets previously implicated in SCZ rare variant risk: 3,055 synaptic genes whose mRNA are bound by the splicing factor RBFOX2 [32, 51] (Synaptic-RBFOX2), 1,033 synaptic genes whose mRNA are bound by FMRP [32, 52] (Synaptic-FMRP), and 3,230 genes classified as loss-of-function (LOF) intolerant from the large ExAC dataset [45] (LOF-intolerant).

Gene set burden testing

We used two statistical methods to evaluate whether BD cases carry a higher burden of rare functional protein-altering variants (described above) than controls for each candidate gene set. First, we performed meta-analysis of Firth logistic regressions from each of seven groups described above. Second, we performed Cochran–Mantel–Haenszel (CMH) chi-square tests on the P-LP count in cases and controls across studies, and determined the empirical (one-sided) p value for enrichment of P-PL variants in BD cases by comparing this CMH chi-square statistic to a null distribution generated from a random selection of an equivalent number of genes in 10,000 simulations. We performed these random simulations to control for potential confounding due to variability in sample selection and sequencing methods between cases and controls within each study. We conducted all statistical analyses in R [53]. See Supplemental Methods for full burden testing methods.

Replication

We examined exome sequence data in an independent cohort of 9,929 individuals with BD and 14,018 matched controls from the Bipolar Exomes (BipEx) collection (see Supplemental Materials). Of 3,639 BD cases with known subtypes, 2,684 were classified as B1D, and 955 were classified as B2D. After WES variant calling and quality control, we annotated exonic variants using InterVar, and extracted all rare (gnomAD maximum population MAF < 1%), protein-altering (missense, splice site, stopgain, startloss) SNVs that were classified as P or LP, as was done for the BSC studies. Statistical analysis was the same as used in the BSC analysis. Specifically, we examined the burden of rare P-LP variants in BD cases versus controls, and in B1D cases versus controls, within the three BD gene sets (BD GWAS, BD MAGMA, BD MAGMA-GWAS). We performed Firth logistic regression on the number of rare P-LP alleles per individual (adjusted for sex, 10 principal components, and the total burden of non-reference alleles) within each of six cohorts that comprised the BipEx replication sample (US, UK, Sweden-Umea, Sweden-Karolinska, Netherlands, and Germany), followed by a meta-analysis across cohorts using the weighted Z-score method.

PRS analysis of common (GWAS) variants

Except for BRIDGES, which used genotypes from WGS, studies used previously published GWAS array data from European-descent individuals imputed into the 1000 Genomes or 1000 Genomes + Haplotype Reference Consortium reference panels to select variants for inclusion in polygenic risk score analysis [18, 38, 54,55,56,57,58]. We identified the set of variants present and well imputed (imputation R2 or info score >0.5) and with MAF > 0.01 in all studies and limited to the European ancestry sample in KPNC. We further filtered variants where any individual cohort frequency deviated from the mean frequency by >0.2, resulting in a final set of 2,855,373 SNPs for building polygenic scores.

To generate SNP sets and per SNP weights for building polygenic scores independent from the data in the current study, we started with association results from studies that participated in recent PGC2 analyses for bipolar disorder [18] and schizophrenia [59], then removed all studies with participants overlapping with BSC study samples and re-ran case-control association meta-analysis. These modified meta-analyses included roughly 11,000 cases and 17,000 controls for bipolar disorder and 19,000 cases and 28,000 controls for schizophrenia. In both the bipolar and schizophrenia association results, we extracted the subset of SNPs overlapping the high quality BSC variants described above and identified sets of variants to include in PRS analysis by clumping all SNPs based on LD (1MB windows and r2 > 0.1) using 1000 Genomes European individuals as the reference sample. For each trait, we selected variants at three different p value thresholds (P < 0.01, <0.05, and <0.1) to determine the effect of threshold on the strength of the PRS association with bipolar disorder. Each study then generated a PRS score for each individual based on genotype dosages and using log(OR) from the restricted PGC2 analyses as weights. Each group then tested for association using logistic regression of the PRS on bipolar case-control status in their BSC GWAS/WGS data adjusting for sex, genotype-based principal components, and where necessary, genotype batch. We combined results from all studies using inverse variance-weighed meta-analysis.

Code used in the analysis of the BSC samples is available at: http://metamoodics.org/bsc/consortium/bsc-case-control-workgroup

Results

Exome analysis in 3,987 individuals with BD (91% B1D, 6% B2D, and 2% SAB) and 5,322 unaffected individuals (Table 1, Table S1, Fig. S1) identified 1,328,324 exonic SNVs that passed quality control, of which 792,375 were rare (gnomAD maximum population MAF < 1%) and protein-altering (primarily missense, splice-site, and stop-gain); 9,883 (0.7%) of exonic SNVs were classified as pathogenic (P) or likely pathogenic (LP) using InterVar [42] (Table S2). Variants classified as pathogenic included 2,629 stop-gain and 1,963 splice-site altering SNVs affecting 2,076 genes. Variants classified as likely pathogenic included 5,152 missense, 463 stop-gain, and 1 stop-loss affecting 1,173 genes (Table S2). On average, each individual carried 0.72 pathogenic and 1.2 likely pathogenic SNVs across the exome. Most (81%) of P-LP variants were singleton mutations, and a significant proportion (38%) of such variants were not previously reported in the large gnomAD database (Fig. 1).

Fig. 1: P-LP variant distribution.
figure 1

Minor allele counts for variants classified as P or LP from 9,309 individuals in the BSC cohort show that 81% of P-LP variants are singleton mutations. 38% of P-LP variants are not present in gnomAD. 44% of such variants are classified as P (predominantly splice site and nonsense variants), and 56% are classified as LP (predominantly missense variants). P pathogenic, LP likely pathogenic, S Splice site, N nonsense, M missense, ACMG American College of Medical Genetics, GnomAD Genome Aggregation Database.

The exome-wide burden of rare synonymous variants did not differ between BD cases and controls (odds ratio [OR] = 1.00, 95% confidence interval [CI] = 0.9985–1.0004, p = 0.23), suggesting an absence of exome-wide bias in detecting rare variants across a heterogeneous group of sequencing studies. We subsequently restricted our analysis to rare protein-altering variants classified as pathogenic or likely pathogenic (P-LP) (Table S2). Examination of gene-level burden of rare P-LP variants did not reveal any genes that passed exome-wide significance (Table S3). Likewise, the exome-wide P-LP burden also did not differ between BD cases and controls (OR = 1.00, 95% CI = 0.97–1.03, p = 0.39), differing from sequencing studies of SCZ and developmental disorders of similar sample size showing an increased burden of rare coding variants across the exome [32, 42, 49].

However, BD cases carried a higher burden of P-LP variants compared to controls within 165 bipolar GWAS genes (meta-analysis of Firth logistic regressions OR = 1.89, 95% CI = 1.29–2.77, one-sided p = 6.0 × 10−4) and within 153 bipolar genes identified using MAGMA (meta-analysis OR = 1.56, 95% CI = 1.11–2.19, one-sided p = 0.0052) (Figs. 2, 3, Tables 2, S4 and S5). The relative burden of rare P-LP variants was higher within 81 genes representing the intersection of these two gene sets (meta-analysis OR = 2.66, 95% CI = 1.35–5.21, one-sided p = 0.0023) (Figs. 2, 3, Tables 2, S4 and S5). The CMH analysis results were consistent with those from meta-analysis of Firth logistic regressions described above and there was no evidence of heterogeneity in OR among the studies (Tables S4 and S5 for individual BSC study results). Meta-analysis results were also broadly consistent when subset to subjects of European ancestry (Tables S4 and S5).

Fig. 2: P-LP variant burden in candidate BD- and SCZ-related gene sets.
figure 2

Meta-analysis of Firth logistic regression of P-LP variants shows that BD cases in BSC cohorts appear to carry a higher burden of P-LP alleles in three BD GWAS-derived gene sets. Horizontal bars represent 95% confidence intervals. No enrichment of P-LP variants was observed in three schizophrenia GWAS-derived gene sets, in two neuron synaptic-related genesets (RBFOX2- and FMRP-related genes), in genes classified as LOF-intolerant, or in all genes across the exome. P values are one-sided for enrichment of P-PL variants in BD cases and derived from the meta-analysis Z-score. Numbers in parentheses represent the number of genes with P-LP variants and total number of genes within each gene set, respectively. BD bipolar disorder, OR odds ratio, P-LP pathogenic or likely pathogenic, GWAS genome-wide association study, LOF loss-of-function.

Fig. 3: P-LP variant burden in BD GWAS gene sets within BSC discovery and BipEx replication cohorts.
figure 3

Forest plots of log odds ratios (Firth logistic regressions) for association between P-LP variant burden and BD across seven cohorts/ethnicities in the BSC study (top) and across six strata in the BipEx replication cohort (bottom). p values are one-sided for enrichment of P-PL variants in BD cases and derived from the meta-analysis Z-score. Meta-analysis shows that individuals with BD do not carry a replicable enrichment of P-LP variants in 165 BD GWAS-derived genes, in 153 BD genes identified using MAGMA, or in 81 genes that overlap between the BD GWAS and BD MAGMA gene sets. Horizontal black lines indicate 95% confidence intervals around the effect size. Unshaded boxes indicate absence of P-LP variants within a specific cohort. Gray boxes indicate meta-analysis within the BSC (discovery) and BipEx (replication) cohorts, respectively, and black box indicates meta-analysis across all BSC and BipEx cohorts. Numbers in parentheses indicate the number of BD cases and controls. BSC Bipolar Sequencing Consortium, BipEx Bipolar Exomes collection, REP replication cohorts, META meta-analysis.

Table 2 Rare variant burden in candidate gene sets.

Attempted replication in WES data from 9,929 BD cases and 14,018 controls (Table S6) from the BipEx collection did not reproduce the observed enrichment of rare P-LP variants within the BD GWAS (meta-analysis OR = 0.94, one-sided p = 0.70), BD MAGMA (OR = 0.87, p = 0.87), or BD MAGMA-GWAS (OR = 0.64, one-sided p = 0.95) gene sets (Fig. 3, Table S7). Similarly, examination of 2,684 B1D-only cases and 14,018 controls did not reveal an enrichment of rare P-LP variants in the BD GWAS (OR = 0.89, one-sided p = 0.74), BD MAGMA (OR = 0.98, one-sided p = 0.54), or BD MAGMA-GWAS (OR = 0.85, one-sided p = 0.65) gene sets (Table S7). Thus, the initial observed enrichment of rare functional protein-altering variants within BD GWAS genes was not replicated.

Given the shared clinical features and reports of shared common genetic susceptibility between BD and SCZ, we also examined the burden of rare P-LP variants for three schizophrenia-related gene sets. Meta-analysis across cohorts showed no significant enrichment of rare P-LP variants in the 623 SCZ GWAS genes (OR = 1.19, 95% CI = 1.02–1.39, one-sided p = 0.01, adjusted one-sided p = 0.08), in 550 SCZ MAGMA genes (OR = 1.07, 95% CI = 0.89–1.28, one-sided p = 0.24), and in 342 SCZ MAGMA-GWAS genes (OR = 1.11, 95% CI = 0.94–1.32, one-sided p = 0.12) (Fig. 2, Tables 2S4 and S5). CMH analysis again showed results similar to meta-analysis with Firth logistic regression (Tables 2 and S4). This suggests that despite the shared common genetic susceptibility between BD and SCZ, we found no evidence for BD risk being associated with rare functional protein-altering variation in schizophrenia GWAS genes.

We examined three gene sets previously reported to harbor rare variants that influence neuropsychiatric traits: 3,055 genes whose mRNAs are bound by the synaptic protein RBFOX2, 1,233 FMRP-related genes that co-localize with neuronal synapses, and 3,230 genes reported to be intolerant to loss-of-function mutations. Each of these gene sets has been implicated in rare variant studies of schizophrenia [26, 32, 34, 50, 52]. We observed no significant enrichment of rare P-LP variants in the RBFOX2-related genes (meta-analysis OR = 0.99, 95% CI = 0.98–1.01, one-sided p = 0.86), FMRP-related genes (OR = 0.93, 95% CI = 0.82–1.04, one-sided p = 0.91), or LOF-intolerant genes (OR = 0.94, 95% CI = 0.86–1.02, one-sided p = 0.94) in BD cases versus controls across studies (Fig. 3). CMH analysis confirmed a lack of rare variant enrichment within these gene sets (Table 2), thus we had no evidence of BD risk from rare functional protein-altering variants in a broad set of genes encoding proteins that localize to neuronal synapses, that were previously associated with schizophrenia.

Lastly, we used published PGC2 case-control GWAS meta-analysis results for BD and schizophrenia to test the common polygenic burden on risk of BD in our BSC subjects. Consistent with previous publications [60], common variant PRS for both BD and schizophrenia were highly significant in the combined meta-analysis (pBD = 1.52 × 10−41 and pSZ = 1.51 × 10−32, Table S8), but explained only a small proportions of variance in disease risk (~1–4%) in individual BSC studies, again demonstrating the shared genetic etiology of these psychiatric conditions, at least from common variants and in contrast to rare variant analyses.

Discussion

To our knowledge, this primary analysis of 3,987 BD cases and 5,322 controls, along with independent replication/second stage samples of 9,929 cases and 14,018 controls represents the largest study of coding sequence variation in BD. We examined the exomes of these individuals, and identified rare coding variants classified as P-LP according to ACMG criteria [42, 44]. We did not observe a significant enrichment of rare P-LP variants in BSC BD cases versus controls overall or for any individual gene. We also did not observe an increased burden of rare P-LP variants in genes implicated by BD GWAS regions in the combined BSC and BipEx cases. This observation diverges from studies in some other complex disorders that found an enrichment of rare coding variation in genes associated with GWAS loci identified from common variants and vice versa [19,20,21, 61].

We did not observe a difference in rare P-LP variant burden within gene sets that influence synaptic function (RBFOX2 and FMRP), genes that are intolerant to loss-of-function mutations, or all genes across the exome. This diverges from prior studies of exonic variation in familial BD cases that suggest a role for synaptic-related genes [25], ion channels [23], and modulation of neurotransmitter activity [24]. However, those studies were small (far smaller than ours) and their findings unreplicated, highlighting the need for large studies with replication cohorts. While this study was not powered to identify novel single gene associations, it provides important insight into the genetic architecture of rare coding variants in BD, specifically the absence of replicable enrichment of such variants in several biologically plausible gene sets.

A shared genetic susceptibility to BD and SCZ is suggested through investigations of genome-wide common variation and polygenic risk scores (PRS) [12, 27,28,29]. Our results confirmed the correlation of common SCZ variants with BD risk. Moreover, examination of sub-phenotypes suggests that the SCZ PRS is associated with psychosis in BD, and that the BD PRS is associated with mania in SCZ [23, 24]. However, examination of rare coding variation does not support a clear role for BD-related genes in SCZ risk [32]. Similarly, we observe no significant enrichment of rare P-LP variants within SCZ GWAS genes in individuals with BD, or in other gene sets (RBFOX2, FMRP, LOF-intolerant, exome-wide) that have been reported to harbor rare variants in SCZ [32, 35]. While larger studies might reveal a smaller effect within these broader gene sets, this study does not show a clear role for rare variants within SCZ-related genes in BD.

The absence of deleterious rare variant enrichment in BD cases contrasts to studies of SCZ of similar size (4–5 K cases, 7–9 K controls) which show that individuals with SCZ carry an increased burden of rare protein-truncating variants across the exome, particularly in genes intolerant to loss-of-function mutations [32, 35]. This burden was particularly elevated in SCZ cases with intellectual disability [35]. Another study showed that rare CNVs are enriched in individuals with schizoaffective disorder bipolar type (SAB), but not in all individuals with BD [62]. These observations suggest that, while there exists a shared susceptibility for BD and SCZ through common genetic variation, rare coding variation in gene sets implicated in SCZ might preferentially confer a risk for SCZ-specific features independent of the affective symptoms that are definitional of BD. Thus, individuals with rare variants in genes implicated in SCZ might present with psychotic or cognitive predominant symptoms, rather than manic or hypomanic episodes, which are required for a diagnosis of BD.

In this study, variants classified as pathogenic are loss-of-function (LOF) mutations (splice-site and stop-gain) in a gene where LOF is a known mechanism of disease, while those classified as likely pathogenic are primarily missense variants in a gene with a low rate of benign missense variation. Both types of variants are absent or at very low frequency in large control cohorts (gnomAD), and have multiple lines of evidence for a deleterious effect. While there are likely a number of P-LP variants not included as P-LP in our study (e.g., frameshift LOF variants), their non-inclusion would only have limited the power of our analysis, and not created a bias. Nonetheless, examination of the P-LP variants we included did enable systematic interrogation of candidate gene sets that might be biologically relevant for a complex phenotype such as BD.

Although we found three significantly associated genesets in our BSC study, these genesets did not replicate in the BipEx study and they were not significant in a meta-analysis of the BSC and BipEx studies. There are differences in the case composition of the BSC and BipEx samples. More BD cases were classified as B1D, a form of BD with more extreme episodes of mood elevation [63], in the BSC discovery cohort (91% of cases) than in the BipEx replication cohort (74% of cases with known subtypes), However, when restricting to the smaller set of BipEx B1D cases we still did not detect association in the genesets identified in the BSC studies. It may be that the BSC geneset-based results are false positives or that larger sample sizes with deleterious variants defined by other metrics may be necessary to consistently detect a signal.

We acknowledge a number of limitations. First, there are a number of inter-study differences in methods used for case-control selection, DNA sample acquisition, exome or genome sequencing, and data quality control. While these differences might contribute to biases in variant detection in cases and controls, we showed that our approach produced no exome-wide deviation of rare synonymous variants in cases versus controls. Second, our analysis was limited to coding single nucleotide variants, which have the highest quality annotation in large public databases (gnomAD) and annotation tools (InterVar). Future WGS studies that include analysis of copy number, insertion-deletion, and regulatory variants may shed further light on the role of rare variants in BD. Third, while genome-wide common variant studies implicate loci that influence susceptibility for BD, many genes in these regions that were included in our analysis are likely unrelated to BD, highlighting the need for fine-mapping and functional studies. Lastly, our ability to further investigate the observed differences between the discovery and replication cohorts was limited due to the lack of deep sub-phenotype data (such as severity of mood episodes, psychosis, hospitalizations) in the majority of BD cases.

Despite the limitations, this study provides important biologic insight into disease pathogenesis in BD. While common and rare variant associations implicate several genes critical for neuronal development and synaptic function [15,16,17,18, 22,23,24,25,26], this study suggests that BD risk may not be broadly influenced by rare coding variation in genes within these categories. Moreover, while common genetic association studies suggest a role for shared genetic susceptibility between BD and SCZ, there is limited evidence for a shared susceptibility from rare genetic variation for these related mental disorders. Additional studies are needed to dissect the relationship between rare coding variants and subtypes of BD, including SAB. Neuroimaging and neurophysiological biomarkers [64, 65] may be used to identify genetic factors that influence endophenotypes within these complex disorders. Lastly, loci identified through BD common variant studies likely require fine-mapping through functional studies, and not only through scalable genomics methods, for translational impact.