Introduction

Bipolar disorders (BD) are chronic, heterogeneous, and complex mental disorders with a worldwide prevalence of ~1%1. Their etiopathology is poorly understood but genetic contribution arises has an important factor, with an estimated heritability ranging between 60% and 80%2,3. Recent genome-wide association studies (GWAS) have demonstrated the contribution of common variants in the increased vulnerability of developing BD4. Associated variants had a low effect size, but when combined in an additive model, they were able to significantly distinguish individuals with BD from controls5. However, these overall results explain only 25% of the genetic variance6, revealing a “missing heritability” and increasing the potential interest for rare penetrant variants.

The rapid development of next-generation sequencing led to the possibility to unravel the potential contribution of rare variants in BD. In other psychiatric disorders, e.g. autism spectrum disorder (ASD) and schizophrenia (SZ), whole-exome sequencing (WES) has already shown a higher frequency of de novo mutations in affected individuals compared to control populations7,8. However, few studies have been conducted yet in subjects with BD with a limited consensus on rare penetrant mutation burden and their relationship with low penetrant common variants9,10,11,12,13,14,15.

Here, we combined high-throughput genotyping data and WES in cohorts of individuals with BD as well as in multiplex families with a high density of affected individuals in order to determine the relative contribution of common and rare variants in clinical manifestations of BD. We first calculated polygenic risk scores (PRS) for BD and SZ in a cohort of 445 patients with BD and 1636 individuals from the general population, and determine their impact on the family history of BD. We then used these scores in multiplex families and compared affected and unaffected individuals. Using WES, we identified rare damaging variants in constrained genes that were shared by affected subjects and highlighted impacted biological pathways. Association with BD was confirmed by comparing mutation rates of associated genes in an independent cohort of 241 subjects with BD and 21,071 non-Finnish European individuals from the non-psychiatric population of the Exome Aggregation Consortium (ExAC)16. Finally, we combined PRS and rare variants identified in families and showed different genetic architectures between affected and unaffected individuals in multiplex families.

Materials and methods

Multiplex families

A written informed consent was obtained from all subjects prior to participation in the study and research ethic board of the Pitié-Salpêtrière Hospital (CPP Ile de France VI) approved protocols and procedures.

We recruited eight multiplex families (Supplementary Fig. S1) of French ancestry with at least two individuals meeting the BD criteria of the fourth edition of the diagnostic and statistical manual for psychiatric disorders (DSM-IV)17. In total, we collected DNA for 38 individuals: 21 with BD, 1 with schizo-affective disorder (SAD), 1 with ASD, 1 with unipolar depression, 1 with major depressive episode, and 13 healthy individuals. Taking into account the high shared heritability between BD and SZ and to avoid too much heterogeneity, we considered, as “affected” subjects in multiplex families, those with a diagnosis of either a BD or a SAD. All other phenotypes and unaffected subjects were pooled under the name “unaffected”.

Cohorts

Two-hundred-and-eighty patients with BD were recruited in nine FondaMental Academic Centers of Expertise for Bipolar Disorder (FACE-BD) located in France18. This network is coordinated by the French scientific cooperation foundation, Fondation FondaMental (www.fondation-fondamental.org). All patients met the DSM-IV diagnostic criteria for BD. In addition, a previously published cohort19,20 of 445 subjects with BD and 1636 control individuals of French origin has been used for genotyping analyses (Supplementary Fig. S2).

DNA extraction and genotyping

Genomic DNA was isolated from venous blood sample or from saliva. DNA extraction was performed as previously described21 for blood samples or using prepIT® L2P kit (DNA genotek, Kanata, ON, Canada), following the manufacturer’s instructions. In multiplex families, 27 DNAs were genotyped for ~960,000 single-nucleotide polymorphisms (SNPs) using the OmniExpressExome-8v1-4_A1 beadchips (Illumina Inc., San Diego, CA, USA) and 11 DNAs were genotyped for ~660,000 SNPs, using the Infinium Global Screening Arrays-24 v2.0 (Illumina Inc.), according to the manufacturer’s protocol (Supplementary Fig. S2). Quality control of genotypic data, family relationships, and ancestry were performed using the PLINK (v.1.9)22, the KING (v.2.1.3)23, and the Eigensoft (v.7.2.1)24 softwares, respectively. Briefly, samples with sex discordance, a heterozygosity rate different of the meanhet ± 0.2, a genotyping rate lower than 0.98 or without a European ancestry were excluded of analyses. SNPs with a genotyping rate lower than 0.95 were removed. Remaining SNPs were subsequently used for imputation in 37 samples using the Sanger imputation server. Prephasing was conducted using SHAPIT2 (v2.r790)25 and imputation using PBWT26 with 1000 Genomes phase 3 as reference panels27. After imputation, only biallelic SNPs with a minor allele frequency (MAF) higher than 0.01 in the reference panel and an imputation score higher than 0.9 were used for further studies.

The previously published cohort has been genotyped using HumanHap550 bead arrays (Illumina Inc.) for individuals with BD and HumanHap300 bead arrays (Illumina Inc.) for controls and imputed by the Psychiatric Genomics Consortium (PGC) based on HapMap 2 CEU sample19.

Polygenic risk score analysis

PRS were calculated for each individual using the PRSice software (v.2.0.15)28 and were based on the PGC summary statistics for BD4 or SZ29 with only European individuals. The previously published cohort was included in the PGC BD summary statistics. Therefore, summary statistics excluding the French cohort have been generously provided by Dr. Eli Stahl, who conducted the last PGC BD analyses. Only genotyped or imputed independent SNPs (r2 < 0.2) both in multiplex families and in the previously published cohort with a MAF higher than 0.01 and an imputation score higher than 0.9 were used for PRS calculation. We estimated the best PRS threshold to discriminate patients from controls in our cohort by testing all p values threshold between 0.00001 and 0.5 with an incremental factor of 0.00001. We then selected the one showing the lowest p value and the highest R2. We obtained a p value threshold of 0.3319 for the BD-associated SNPs-based PRS (BD-PRS) and of 0.2025 for SZ-associated SNPs-based PRS (SZ-PRS) (Supplementary Figs. S3 and S4, respectively). In addition, at the threshold previously determined, we isolated SNPs that only contributed to BD-PRS (BDonly-PRS), i.e. only present in BD summary statistics with INFO > 0.9 and p value < 0.3319, from those that only contributed to SZ-PRS (SZonly-PRS), i.e. only present in SZ summary statistic with INFO > 0.9 and p value < 0.2025, and from those that contributed both to BD and SZ, i.e. present in BD summary statistics with INFO > 0.9 and p value < 0.3319 and SZ summary statistics with INFO > 0.9 and p value < 0.2025. For the latter ones, we calculated a new PRS (BD–SZ-PRS) based on the average effect, i.e. the mean of the odds ratio (OR), observed in the SZ and the BD summary statistics. The correlation between PRS and the number of rare damaging variants in multiplex families was calculated using the sum of the scores (PRSsum) of BDonly-PRS, SZonly-PRS, and the BD–SZ-PRS for each individual.

Exome sequencing

Exons were captured using the SureSelectXT Human All Exon V5, the SureSelectQXT Human All Exon V6 or the SureSelectQXT Human All Exon V6+ UTR libraries (Agilent Technologies, Santa Clara, CA, U.S.A.). WES was conducted on NextSeq500 with paired-end 75 bp reads, HiSeq2000 with paired-end 75 bp reads, or HiSeq2500 with paired-end 125 bp reads. Data processing consisted in using an home-made workflow based on Trimmomatic (v.0.36)30, BWA-MEM (v.0.7.12)31, Picard (v.2.8.1), and GATK (v.3.7)32 softwares. A bed intersect file between the three different captures generated with BedOps (v.2.4.28)33 and the human reference genome (hg19:GRCh37) were used. Variant calling was performed following the GATK best practices recommendations and annotation was made with SnpEff (v.4.3)34 and bcftools (v1.6). Only variants passing quality filter of VariantRecalibration tool of GATK, with a depth of read higher than 10, a genotype quality score higher than 20, and no missingness in family were kept. Same quality control of individuals as for beadchips was applied using the PLINK, KING, and Eigensoft softwares except a genotyping rate lower than 0.94. After quality control, 34 samples were used in subsequent analysis. In order to identify relevant variants, we selected variants absent from the ExAC16 or the 1000 Genomes27 databases or with a MAF lower than 0.01 in these two databases. We then selected damaging variants, i.e. missense variants, indels, frameshift, loss or gain of stop or start codon, and those altering splice sites, with a phred CADD score higher than 15 (v1.4)35, corresponding to the 5% most deleterious variants in the genome. We also selected variants in constrained genes, i.e. with a z-score higher than 3 for missense variants and inframe indels, or with a pLI > 0.9 for loss-of-function variants16. All the mutations identified have been checked by Sanger sequencing.

For the cohort of 280 subjects with BD, the same quality control was performed. In addition, related individuals and variants with a missingness higher than 0.03 were excluded. After quality control, 241 were used in subsequent analyses (Supplementary Fig. S2).

For general population, we downloaded vcf files (ftp://ftp.broadinstitute.org/pub/ExAC_release/release1/subsets/ExAC_nonpsych.r1.sites.vep.vcf.gz) corresponding to non-psychiatric individuals of the ExAC population16 and selected only non-Finnish European individuals. We then extracted damaging variants (missense or loss-of-function) with an MAF < 1% and which passed filter of VariantRecalibration tool of GATK in mutated genes identified in our multiplex families. Only variants covered at 95% of the maximum count of allele were considered and only genes with more than 70% of their variants meeting this criterion were considered.

Network and pathway analyses

Network and pathway analyses were performed using the R version of the STRING (v.10) database36. Network was defined using a minimum interaction score of 0.4 and with active interaction sources as text-mining, co-expression, experiments, and database. Pathway analyses were based on the Gene Ontology (GO) resource37,38 and a custom background gene list of 18,016 genes. Only GO term including between 5 and 1800 genes and pathways with at least 10% of mutated genes were considered for enrichment analyses.

Gene length bias was controlled by randomly selecting 10,000 times the number of genes identified in multiplex families (N = 34) with a cumulative exon length equal to the mean size of mutated genes (±20%). GO terms were removed when they were found more than 500 times after 10,000 permutations.

Statistical analyses

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Statistical analyses were conducted with R (v.3.4.1). All variables were expressed as median and interquartile range and between-group comparisons were analyzed by Wilcoxon rank-sum tests. Group comparison proportions were analyzed using the Fisher’s exact test or Chi-squared test according to sample size. Benjamini–Hochberg correction was applied for multiple testing when comparing mutation rates. Differences were considered as significant when the false discovery rate (FDR) was lower than 0.05. Bonferroni correction was applied for PRS comparisons, multivariate PRS analyses, clinical data analyses, and linear regression between PRSsum and the number of rare mutations. In all cases, p values were adjusted (padj) according to the number of tests, i.e. 12, 3, 5, and 2, respectively, and difference were considered as significant when the padj was lower than 0.05.

Results

SZ-specifically associated SNPs contribute only to familial forms of BD

Common polymorphisms have been demonstrated to explain up to 25% of the genetic variance in BD, more than half of which could be shared with SZ6. In order to determine whether the polygenic contribution was similar in familial and non-familial forms of BD, we used imputed genotyping data of 1,709,567 SNPs for 445 patients with BD, 1636 controls, and 37 individuals in multiplex families to calculate two PRS based on PGC BD4 and SZ29 summary statistics. Both BD-PRS and SZ-PRS contributed to BD vulnerability for subjects with (N = 144) and without (N = 218) a family history of BD when compared with controls (Fig. 1). Although there was no difference in BD-PRS between individuals with and without a family history of BD in our cohort (Wilcoxon rank-sum test, W = 16,212; p = 0.60; padj = 1.00; Fig. 1a), a higher SZ-PRS was observed for familial forms when compared to non-familial ones (Wilcoxon rank-sum test, W = 13,225; p = 0.01; padj = 0.12; Fig. 1b). However, this difference did not resist to correction for multiple testing. We then split PRS with the aim of distinguishing SNPs that were specifically associated with BD (BDonly-PRS) or SZ (SZonly-PRS), and those that contributed to both disorders (BD–SZ-PRS) and performed multivariate analyses (Table 1). While the SZonly-PRS and the BD–SZ-PRS both contributed to the risk of developing BD in familial forms of the disease, only the BD-associated polymorphisms (BDonly-PRS and BD-SZ-PRS) contributed to the disease vulnerability in non-familial forms of BD. Consistent with this observation, we showed that SZ specific associated polymorphisms (SZonly-PRS) were able to distinguish familial and non-familial forms of BD (Table 1).

In multiplex families, we observed that affected subjects had a higher BD-PRS than controls (Wilcoxon rank-sum test, W = 10,675; p = 0.003, padj = 0.04; Fig. 1a). Although, BD-PRS was slightly higher in unaffected subjects in multiplex families, this difference did not resist to correction for multiple testing (Wilcoxon rank-sum test, W = 9199; p = 0.04, padj = 0.48) and no difference was observed between affected and unaffected subjects (Wilcoxon rank-sum test, W = 182; p = 0.68, padj = 1.00). In contrast, no higher SZ-PRS was observed neither for affected (Wilcoxon rank-sum test, W = 15,589, p = 0.47, padj = 1.00) nor for unaffected (Wilcoxon rank-sum test, W = 12,124, p = 0.61, padj = 1.00; Fig. 1b) subjects in multiplex families. These results suggested that common polymorphisms were unable to explain alone the clinical status differences observed in multiplex families of BD.

Fig. 1: Polygenic risk scores for bipolar disorder and schizophrenia in multiplex families and the general population.
figure 1

Violin plot showing polygenic risk scores for bipolar disorder (BD-PRS) (a) and schizophrenia (SZ-PRS) (b) calculated for affected (gray) and unaffected (blank) individuals. Scores were normalized on the control median value and compared between controls (N = 1636), non-familial forms of bipolar disorder (BDnonfam, N = 218), familial forms of bipolar disorder (BDfam, N = 144), affected subjects in multiplex families (FAMaff, N = 21), and unaffected subjects in multiplex families (FAMunaff, N = 16) with Wilcoxon test. A p value threshold of 0.3319 including 27,100 independent SNPs was chosen for BD-PRS as the best threshold to discriminate between individuals with bipolar disorder (BD) and controls. A p value threshold of 0.2025 including 21,668 independent SNPs was chosen for SZ-PRS as the best threshold to discriminate BD and controls. Nominal p values are indicated for each comparison. Significant results after Bonferroni correction for multiple testing are indicated with stars: *padj < 0.05, **padj < 0.01, ***padj < 0.001, ****padj < 0.0001.

Table 1 Multivariate analyses in family and non-familial forms of bipolar disorder using polygenic risk scores based on bipolar disorder-specific-, schizophrenia-specific-, and bipolar disorder and schizophrenia shared-associated variants.

Rare damaging variants shared by affected subjects in multiplex families affect gene regulation or neuron development and morphogenesis

In order to determine whether rare variants may also contribute to the risk of developing BD in multiplex families, we carried out a WES on 34 subjects from the 8 multiplex families. The mean coverage was 91.5X with 94.7% of bases covered by 20 reads on average. We identified on average 38,347 variants per individual. Focusing on constrained genes has been shown in the past to be a relevant strategy to identify genes associated with multiple psychiatric disorders10,39. Thus, after variants filtration, we had on average four rare deleterious variants in constrained genes shared by affected individuals per family (Supplementary Table S1) for a total of 34 mutations located in 34 different genes. We then conducted a network and pathway analysis with STRING software in order to determine whether the proteins encoded by these genes worked in interaction or belonged to the same biological pathways. We showed that genes with mutations have a protein–protein interaction enrichment (p = 0.01) (Fig. S5), highlighting their biological connections. The GO terms analysis showed an overrepresentation of 63 biological pathways (FDR < 0.05), and all but three resisted to 10,000 random permutations when we checked for putative gene size bias (Supplementary Table S2). Combining GO data (Supplementary Tables S2S4) with a review of the literature, we were able to gather functionally convergent biological pathways and showed that the 34 genes were involved either in the development and morphogenesis (N = 31), or the regulation of gene expression (N = 25), or both (N = 22). In addition, we showed that 29 of them were expressed in neurons (Supplementary Fig. S5 and Supplementary Table S3).

Genes with shared damaging mutation in multiplex families are more frequently mutated in patients with BD than in the general population

In order to determine whether the genes with shared damaging mutations in multiplex families were associated with BD, we sequenced 241 additional independent individuals with BD and compared the mutation rate in patients with the 21,071 non-Finnish European individuals of the ExAC non-psychiatric population. We selected rare (MAF < 1%) missense or loss-of-function variants according to the type of mutations identified in multiplex families in 21 genes. Four genes (SMARCC2, WDR37, UPF2, and HELLS) showed a higher mutation rate in individuals with BD than in the ExAC population after correction for multiple testing (Table 2).

Table 2 Mutation rate in BD cohort and ExAC database for genes mutated in multiplex families.

Patients with mutations in BD-associated genes display a specific clinical profile

For the four genes with a higher mutation rate in patients with BD than in general population, we analyzed the clinical profile of mutated subjects. As the severity of illness may depend on its duration, we first checked that there was no difference in the duration of illness between subjects with and without mutations in associated genes (Wilcoxon rank-sum test, W = 1635, p = 0.60, padj = 1.00). In order to determine whether carrying a rare damaging mutation increased the severity of the disease, we calculated a severity score that combined the age at onset, the age at first hospitalization, as well as the number of total episodes and the number of hospitalizations per year of disease. Mutated individuals showed no difference in severity of the disease when compared with non-mutated ones (Wilcoxon rank-sum test, W = 1497, p = 0.59, padj = 1.00; Supplementary Fig. S6). Nevertheless, we observed particular clinical manifestations for individuals with mutation in some of the BD-associated genes (Fig. 2). Thus, individuals with SMARCC2 mutations had a predominant manic polarity, displaying more manic than depressive episodes in total (χ2 = 15.3, df = 1, p = 9.1 × 10−5, padj = 4.6 × 104; Fig. 2a). Furthermore, individuals with missense variants in HELLS displayed more mixed episodes per year of disease than non-mutated subjects (Wilcoxon rank-sum test, W = 884, p = 0.006, padj = 0.03; Fig. 2b). Although individuals with missense variants in UPF2 seemed to have more depressive episodes than non-mutated patients, this difference was not significant (Wilcoxon rank-sum test, W = 865, p = 0.49, padj = 1.00; Fig. 2c). As only two patients had mutation in WDR37 and only one had available clinical data, we were unable to perform statistical analysis for this gene.

Fig. 2: Clinical profiles of mutated individuals in genes with a higher mutation rate in bipolar disorder.
figure 2

Radar plots represent the median number of manic, depressive, or mixed episodes, as well as those with psychotic symptoms for individuals with (red) or without (blue) mutation in SMARCC2 (a), HELLS (b), and UPF2 (c). For graphical representation, we performed a feature scaling by normalizing data and zoomed on the median values for each feature.

Affected and unaffected subjects in multiplex families display different genetic pattern of vulnerability

We showed that BD- and SZ-associated polymorphisms both contributed to familial forms of BD. In addition, we identified rare variants in constrained genes that were shared by affected subjects in multiplex families and which may also contribute to the BD vulnerability. In order to explore the genetic architecture explaining the clinical difference between affected and unaffected individuals in multiplex families, we computed a new PRS (PRSsum) combining the information from BD- and SZ-PRS and analyzed this score in regard to the number of rare mutations identified in each individual. Whereas no correlation was observed for affected individuals (R2 = −0.014, F = 0.76, p = 0.40, padj = 0.80; Fig. 3a), a negative linear regression was observed between the PRSsum and the number of rare variants in unaffected individuals (R2 = 0.32, F = 7.19, p = 0.02, padj = 0.04; Fig. 3b). The observation that only unaffected individuals in multiplex families had either a high PRSsum and low rate of rare damaging mutations or the opposite, suggested the need to reach a threshold of BD-associated variants to declare the disease and showed the important role of both common and rare variants in BD vulnerability.

Fig. 3: Correlation between the polygenic risk score and the number of rare damaging mutations in multiplex families with bipolar disorder.
figure 3

Each dot represents a single individual and lines symbolize results of the linear regression. The polygenic risk score (PRSsum) results from the sum of the scores of BDonly-PRS, SZonly-PRS, and the BD–SZ-PRS for each individual. a No correlation was observed between the polygenic risk score and the number of rare damaging mutations in affected individuals (R2 = −0.014, F = 0.76, p = 0.40, padj = 0.80). b Unaffected individuals showed a significant negative linear regression (R2 = 0.32, F = 7.19, p = 0.02, padj = 0.04) between the polygenic risk score and the number of rare damaging mutations.

Discussion

In this study, we explored the genetic architecture of familial forms of BD combining both common and rare variant analyses. PRS has been demonstrated to be a powerful tool to show the role of common polymorphisms in many psychiatric disorders as well as their shared heritability6. However, few studies looked at PRS in families of individuals with BD. Here, we report a higher BD-PRS in families of individuals with BD than in the general population, but no difference was observed between affected and unaffected subjects in multiplex families. Although the higher PRS in BD families is well supported by previous studies15,40,41, those calculating a PRS for unaffected subjects reported inconsistent results depending on the number of subjects under study13,15,41. Here, the small sample size in multiplex families prevent to exclude any differences between affected and unaffected individuals. Indeed, we estimated that we had a statistical power of 80% to show a large difference (Cohen’s d = 0.7) between affected and unaffected individuals, but only 12.5% chance to detect a small effect size (d = 0.2). These observations pinpoint the importance of having cohorts as large as possible to conduct such studies and replication on larger samples is required to exclude a difference in PRS between affected and unaffected individuals of multiplex families. Nevertheless, in the whole population of individuals with BD (N = 362), we have been able to show that the PRS based on SNPs that have been specifically associated with SZ were higher only in familial forms of BD. The difference in sample size between SZ and BD cohorts to calculate the summary statistics might make the SZ-PRS more robust than the BD-PRS. However, according to the sample size of this cohort, we estimated that we had a statistical power higher than 80% to show a difference in BD-PRS between familial and non-familial forms of BD if this difference was similar to the one observed using SZ-PRS. SZ-PRS has already been shown to explain a shared genetic variance between SZ and BD5,6. But difference between familial and non-familial forms of BD had never been reported yet. Thus, our results suggest that the shared genetic vulnerability between SZ and BD might concern mainly familial forms of BD. This is consistent with the observation of a high frequency of individuals with SAD in relatives of individuals with BD42. In the multiplex families that we studied, no significant difference between affected and unaffected subjects was detected. Although the small number of individuals in multiplex families limits our possibility to draw robust conclusion, this suggests this score alone was unable to explain the difference in clinical status. We thus investigated the impact of the rare genetic variation in constrained genes that were shared by affected individuals in multiplex families. Interestingly, the negative correlation that we observed between PRS and rare damaging variants specifically in unaffected individuals suggests that the balance between rare and common variants might explain the difference in clinical manifestation that we observed in multiplex families of BD. The small size of our sample forces us to consider these results with caution and replication on independent larger samples will be needed to confirm such hypothesis.

Genes with rare damaging mutations which were shared by affected subjects in multiplex families revealed a network of proteins involved in neuronal development or in the regulation of neuronal gene expression. These pathways have already been reported in numerous WES in ASD or SZ7,8,39, indicating a clear overlap in the etiology of BD with these disorders as already suggested9,10,11,14. These findings are also consistent with results from GWAS and transcriptome analyses of SZ and, to a lesser extent of ASD6,43,44, suggest that the regulation of gene expression during brain development can be a common pathogenic pathway impaired in BD, SZ, and ASD.

Here, we identified a set of four novel genes, for which the mutation rate in individuals with BD was significantly higher with respect to the general population. Among these genes, SMARCC2 (MIM: 601734) encodes a scaffolding core subunit of the chromatin remodeling complex mSWI/SNF that controls the accessibility of DNA sequences to transcription factors. This complex plays an essential role during neuron development and the absence of Smarcc2 in mice resulted in impaired embryonic and adult neurogenesis inducing cognitive dysfunction45. Mutations in this gene have already been found in patients with ASD46 or neurodevelopmental disorder47, and mutations in another subunit of this complex, SMARCA2, have been reported in individuals with SZ48,49.

HELLS (MIM: 603946), the second gene identified to be frequently mutated in patients with BD, belongs to the SNF2 subfamily of chromatin remodeling ATPase as SMARCA2. This protein is required for genome-wide methylation50 and it is responsible of immunodeficiency ventromeric region instability and facial anomalies (ICF) syndrome (MIM: 616910). The ICF syndrome is characterized by centromeric instability, as the cytogenetic hallmark, facial dysmorphism, and severe immunodeficiency, as well as developmental delay and intellectual deficit. In mice, HELLS was demonstrated as regulator of neural stem cell fate, affecting self-renewal and proliferation of neural progenitor cells51, suggesting a direct role in nervous system development.

The third gene frequently mutated in our cohort of individuals with BD is UPF2 (MIM: 605529), a core component of the nonsense-mediated mRNA decay (NMD) pathway, a surveillance pathway that eliminates mRNA with premature translation termination codon. In drosophila, Upf2 plays a role for proper development of synapse architecture and synaptic vesicle efficacy52. Moreover, copy number variants in this gene were found in intellectual disabilities syndrome with DiGeorge syndrome associated53 and de novo point mutations have been identified in a patient with SZ54. Altogether, these observations suggest a role for UPF2 in nervous system development.

Few things are known about the last gene, WDR37. Its protein belongs to the WD40 repeat domain family that is characterized by diverse cellular function, as chromatin assembly, RNA processing, immunity, or development55. This interacting domain scaffolds protein–protein or protein–DNA interactions. Although WDR37 function is unknown, mice knockout for this gene show a larger brain than wild-type littermates56. Interestingly, 57 copy number variants including this gene have been reported in the DECIPHER v9.30 released (https://decipher.sanger.ac.uk/). Among associated phenotypes, we reported developmental delay, seizures, intellectual disabilities, and ASD.

In summary, we demonstrated that both common and rare variants contribute to phenotypic outcome in multiplex families of BD, and our data suggest that the difference between the affected and unaffected status of individuals within multiplex families might come from the balance between the two types of variants. In addition, we confirmed the shared genetic vulnerability between BD, SZ, and ASD and suggest the familial forms of BD might share a higher heritability with SZ than non-familial ones. Finally, we highlighted a neurodevelopmental impairment in familial forms of BD. Altogether, our results shed new lights on the genetic architecture and biology of BD and highlight the importance of using large pedigrees with a high density of affected individuals to investigate the etiology of the disease.