Contribution of common and rare variants to bipolar disorder susceptibility in extended pedigrees from population isolates

Current evidence from case/control studies indicates that genetic risk for psychiatric disorders derives primarily from numerous common variants, each with a small phenotypic impact. The literature describing apparent segregation of bipolar disorder (BP) in numerous multigenerational pedigrees suggests that, in such families, large-effect inherited variants might play a greater role. To identify roles of rare and common variants on BP, we conducted genetic analyses in 26 Colombia and Costa Rica pedigrees ascertained for bipolar disorder 1 (BP1), the most severe and heritable form of BP. In these pedigrees, we performed microarray SNP genotyping of 838 individuals and high-coverage whole-genome sequencing of 449 individuals. We compared polygenic risk scores (PRS), estimated using the latest BP1 genome-wide association study (GWAS) summary statistics, between BP1 individuals and related controls. We also evaluated whether BP1 individuals had a higher burden of rare deleterious single-nucleotide variants (SNVs) and rare copy number variants (CNVs) in a set of genes related to BP1. We found that compared with unaffected relatives, BP1 individuals had higher PRS estimated from BP1 GWAS statistics (P = 0.001 ~ 0.007) and displayed modest increase in burdens of rare deleterious SNVs (P = 0.047) and rare CNVs (P = 0.002 ~ 0.033) in genes related to BP1. We did not observe rare variants segregating in the pedigrees. These results suggest that small-to-moderate effect rare and common variants are more likely to contribute to BP1 risk in these extended pedigrees than a few large-effect rare variants.

. Comparison of genotype missing rate of individuals from imputed data and the genotype concordance rate between genotypes imputed by GIGI and microarray genotypes. Each point represents each imputed individual. Genotypes were missing if genotype probability from GIGI imputation was lower than the genotype probability threshold, and we included only individuals whose genotype missing rates were < 10% for subsequent analyses. Supplementary Figure 7. An example of a pedigree for the segregation analysis. In this family, among 10 individuals, only the latest offspring (individuals 9 and 10) are affected while all others have a missing phenotype, which generates the maximum Srare statistic of 2. The p-value of Srare statistic of 2 is much more significant when the top founder (founder 1) is F rv (founders with a rare variant) compared to when a parent of individuals 9 and 10 (founder 8) is F rv . The fact that the rare variant is inherited in every generation and shared among the two affected individuals in the last generation is a more rare event than when the rare variant is directly inherited from a parent and hence yields a more significant p-value.
Supplementary Figure 8. First two dimensions of a PCA analysis of founders from the BP1 WGS data set, using 1000 Genomes phase 3 samples as reference.

African Ancestry
Affection Status Proportion Supplementary Figure 10. The number of rare SNVs passing the three different filters. The first filter (F1) is the number of rare variants with genotype missing rate < 5% and in which at least two affected individuals in a family share the variant. The second filter (F2) is the number of rare variants for which all founders have high-quality genotypes (the highest genotype probability among the three probabilities from GIGI imputation > 0.8). The third filter (F3) is the number of rare variants for which all founders except a pair of top founders have high-quality genotypes. The "Passing F1 & (F2 or F3)" filter is the number of rare variants that we analyzed in the segregation analysis.
Supplementary Figure 11. The number of rare CNVs passing the three different filters. The first filter (F1) is the number of rare variants with genotype missing rate < 5% and in which at least two affected individuals in a family share the variant. The second filter (F2) is the number of rare variants for which all founders have high-quality genotypes (the highest genotype probability among the three probabilities from GIGI imputation > 0.8). The third filter (F3) is the number of rare variants for which all founders except a pair of top founders have high-quality genotypes. The "Passing F1 & (F2 or F3)" filter is the number of rare variants that we analyzed in the segregation analysis.

Supplementary Tables
Supplementary Table 1

Supplementary Text Ascertainment and Phenotypic Assessment of Study Sample
The study sample consisted of members of 26 pedigrees, 15 from CR and 11 from CO, each ascertained based on multiple members who had previously been clinically diagnosed with BP1. Our group has studied three of the CR families 1, 2 and several of the CO families 3 in past linkage studies of BP1, however the composition of the families changed somewhat as we recruited new individuals. We increased the number of families for study by recruiting family members of BP1 individuals individually recruited for population-level mapping studies 4, 5 . We expanded each family by systematically evaluating first degree relatives of the BP1 proband, and included other branches as we encountered individuals with suspected BP1. The ascertainment and phenotyping strategy employed in expansion of these pedigrees was previously reported in Fears et al. 6 . Due to the way the pedigree was expanded, with an emphasis on collecting equivalent numbers of BP1 individuals and their unaffected relatives, and with systematic preferred expansion of branches with BP1 individuals, the BP1 phenotype itself is not heritable in this pedigree collection. Individual-level QC of WGS Data: Before checking sequencing quality of each individual, we first removed variants that failed Variant Quality Score Recalibration (VQSR) in the GATK pipeline and we set each genotype whose genotype quality (GQ) score was ≤ 20 to missing. Additionally, all multi-allelic SNVs (0.44% of all SNVs after the VQSR filter) were excluded. In the remaining variants, for each sequenced individual we assessed genotype missing rate, the number of singletons and WGS genotype concordance with array genotypes, verified agreement between reported sex and sex determined from X chromosome markers, and compared empirical estimates of kinship with theoretical estimates. We also performed principal component analysis (PCA) using 1000 Genomes (1KG) Phase 3 as a reference panel 11 . We used EIGENSTRAT 12 for PCA, and included only founders from WGS data, and used independent common SNVs present in both BP1 WGS and 1KG Phase 3. We identified three individuals with poor sequencing quality who had high genotype missing rate (> 5%) or an excessive number of singletons and subsequently excluded the WGS data of these individuals. In comparison with array genotype data, we discovered two possible sample mix-ups, and WGS data from these samples were excluded. In the remaining subjects, all empirically derived kinship coefficients were consistent with theoretical kinship and genetic sex agreed with reported sex. In PCA plots, all founders from BP WGS were mapped close to Admixed American (AMR), especially in close proximity to Colombians from Medellin, Colombia (CLM). 11

Variant-level QC of WGS Data:
In doing variant-site QC, rather than using fixed thresholds for missingness, HWE, etc. to filter out poor quality variant sites, we used logistic regression to predict the probability of variant sites being of good or poor quality, with prediction based on mean and standard deviation of variant sequencing depth and genotype quality and a fraction of individuals meeting certain thresholds for depth and quality. We trained the regression model on a set of variants deemed to be of obvious good and poor quality using several QC measures such as genotype missing rate, HWE p-value, and the number of Mendelian errors.
We assessed the accuracy of the model using cross validation (see the next section for details).

Imputation of Sequenced Sites into Genotyped Individuals:
The 454 individuals in 22 of the 26 CO/CR families who underwent WGS were selected with the goal of imputing the rest of 334 family members who were genotyped on the Omni 2.5 chip, but not directly sequenced (Supplementary Figure 1). We used the imputation software GIGI 14 , an approach designed to impute genotypes on large extended pedigrees. First, we obtained a genetic map of variant sites using the Rutgers genetic map interpolator 15 . We then used MORGAN 16 to obtain pedigree IVs based on independent common SNP array genotype data of individuals in 22 families with WGS data. GIGI used these IVs to impute WGS variants identified in the directly sequenced individuals into the pedigree members with only SNP genotype data. GIGI also imputed sequence data for pedigree members who were neither sequenced nor genotyped in the Omni 2.5 chip. GIGI imputed each family separately, and we further divided each chromosome into 10 Mb intervals to perform efficient imputation by utilizing parallelization in the high performance cluster. After imputation, GIGI generated the probability of each imputed genotype and we used the threshold-based calling with the default threshold to call genotypes for the rare variant burden analysis while we used the most likely genotype calls for the rare variant segregation analysis. For the threshold-based calling approach, only genotypes with probability in excess of 80% were called, genotypes that did not meet this threshold were set to missing. After the GIGI imputation, there were 782 individuals who were either sequenced or imputed with the high quality in 22 pedigrees. Among them, 190 are BP1 and 130 are controls.
Measuring accuracy of imputation: One measure to assess the imputation quality is the number of MEs after imputation. Considering all SNVs, we observed no MEs with the threshold-based calling and 239 MEs with the most likely genotype calls. Another measure is the missing rate of imputed genotypes in individuals without WGS data, as the higher missing rate indicates the lower imputation quality. One subject among the 334 imputed individuals had missing rate > 10% and was excluded (Supplementary Figure 1). For SNVs, 37.23% of common variants (MAF > 3%) and 0.37% of rare variants (MAF < 3%) had missing rate > 10% (computed from 334 imputed subjects). Another measure for imputation quality is to check genotype concordance between SNV genotype data and imputed data for SNVs. A majority of SNVs in the Omni 2.5 chip were present in WGS data and imputed during the GIGI imputation process. Excluding the individual who had the high imputation missing rate, all individuals with the SNV genotype data had genotype concordance rate > 97.93% between the Omni 2.5 chip and GIGI imputation data for non-missing calls in both datasets. Among those individuals, 80.48% had genotype missing rate < 10% for imputed data on SNVs present in Omni 2.5 chip. The genotype concordance rate between SNV and imputed data and the genotype missing rate of imputation had high negative correlation as expected (r=-0.877) (Supplementary Figure 2).

Building a logistic regression model to predict the quality of variants
In genetic studies, a few quality control (QC) measures are assessed to filter out variants with poor quality. For example, we may discard a variant if its genotype missing rate among genotyped individuals is > 10% or its Hardy Weinberg equilibrium (HWE) p-value is < 1E-7. These variants may represent sequencing or genotyping errors and may cause spurious findings. It is not, however, straightforward to choose the right threshold values for the QC measures, and they are often chosen arbitrarily. This approach may include variants with poor quality in analysis or may exclude variants with good quality.
In this paper, we aimed to build a classifier based on a logistic regression model to predict whether a variant has good or poor quality using its sequencing quality measures. We used the following six sequencing quality measures of those variants with good and poor quality to train the classifier: 1) average sequencing depth (DP), 2) average genotype quality (GQ), 3) standard deviation of DP, 4) standard deviation of GQ, 5) a fraction of individuals whose DP is < 34 for SNVs, and 6) a fraction of individuals whose GQ is < 99 for SNVs. The rationale for using these quality measures is that variants with better quality are likely to have higher average DP and GP, lower SD of DP and GQ, and the lower fraction of individuals with DP and GQ < the thresholds. The thresholds (34 and 32 for DP and 99 and 81 for GQ) were chosen because they corresponded to the first quartile of the distribution of DP and GQ of genotypes that do not cause Mendelian errors for SNVs and are consistent with genotypes in Omni 2.5 chip for SNVs on chromosome 1. These sequencing quality measures were computed over the 449 subjects after removing five individuals with poor quality sequencing and sample mix-up.
To determine how accurately these six sequencing quality measures predict the quality of variants, we performed 10-fold cross validation in which we divided a set of variants with good and poor quality into 10 subsets. We trained a classifier using 9 subsets and tested it on the remaining set. We measured the accuracy of the classifier, which is the fraction of correct classifications on the 1/10 of variants. We repeated this procedure for all 10 subsets and measured accuracy and regression coefficients of each subset. Because the number of variants with good quality is much greater than that with poor quality (20M vs. 4,957 variants for SNVs), we randomly chose the same number of variants with good quality as the number of variants with poor quality. We found that the average accuracy of our classifier using the 10-fold cross validation is 99.17% for SNVs.
We built the classifier by using the average regression coefficients estimated from the 10-fold cross validation and applied it to the set of variants with uncertain quality. We found, however, that some of those variants had extremely poor quality measures on one or two criteria, and we decided to mark them as outliers that would be considered as variants with poor quality. A variant was an outlier if it failed one of the following QC; # of discordant genotypes to Omni 2.5 chip > 20 for SNVs typed in Omni 2.5 chip, # of MEs > 10 and > 8 for common and rare variants, respectively, genotype missing rate > 10% and 8% for common and rare variants, respectively, and HWE p-value < 1E-8 and < 0.001 for common and rare variants, respectively. The number of outlier variants was 406,840 (32% among variants with uncertain quality) for SNVs. We applied the classifier to the rest of variants with uncertain quality to obtain the probability of them having good quality, and we considered them as variants with good quality if this probability was > 80%. We found that 68.82% of SNVs with uncertain quality were predicted to have good quality. In the end, there were 20,907,280 SNVs with good quality (96.85% of total SNVs) and 680,477 SNVs with poor quality (3.15%). Summary QC statistics for these 20.91M SNVs can be found in Supplementary Table 1.

CNV calling using genotyping arrays
SNP genotyping was performed in three separate batches. In order to obtain the most accurate intensity measures and reduce inter-batch variability, we first produced an initial set of intensity measures and genotyping calls using the canonical cluster file (*.egt) in Beeline 2.0 (Illumina). We excluded all assays with an LRRSD > 0.30 and a genotype call rate < 98%, and generated custom cluster files generated for each genotyping batch individually in GenomeStudio 2.0 (Illumina). These cluster files were then used to generate new LRR and BAF values. We then restricted our analysis to the 2,364,179 SNP assays common across all versions of the Omni2.5 arrays and ran two separate CNV calling algorithms, PennCNV 17 and QuantiSNP. 18 PennCNV was run as recommended by the developer with waviness correction 19 and a custom *.pfb file generated from all samples. QuantiSNP 2.0 was run using default settings including GC-based correction of LRR. Calls from both algorithms were compared on the sample level, and retained only if they were called by both algorithms (based on any overlap) and of the same relative type (gain or loss). For these overlapping calls, we retained the CNV boundaries as defined by PennCNV.
We removed intensity outliers (mean +/-3 SD or by visual inspection) based on three different metrics extracted from QC logs produced by PennCNV (Supplementary Figure 3): LRR standard deviation (LRR_SD), BAF standard deviation (BAF_SD), and waviness factor (WF). Samples were excluded if they had an LRR_SD > 0.21, a BAF_SD > 0.054 or |WF| > 0.02. We then performed post-segmentation cleaning of fragmented CNV calls using PennCNV's "clean_cnv.pl" script. Adjacent calls of the same type were merged if the number of intervening markers was less than 20% of the combined segment. We removed calls in regions known to produce spurious CNV calls. Calls were removed if they spanned or were contained within centromeric regions (+500K), or if they overlapped with telomeric regions (+100K), VDJ recombination regions, or segmental duplications by more than 50% of their length. After filtering for rare events (see Methods) spanned by a minimum of 10 SNPs and > 5kb in length, we also removed samples with excessive CNV load, both in terms of the number of CNV segments (n > 50) and total autosomal CNV length (>5Mb).

CNV calling using WGS
We used GenomeSTRiP v2.00.1747 20 to detect and genotype CNVs across all samples with WGS. Alignment files were first preprocessed using a reference metadata bundle adapted from the Broad's HG19 version (ftp://ftp.broadinstitute.org/pub/svtoolkit/reference_metadata_bundles) using a 100bp alignability mask. Our aim was to impute CNVs detected in the subset of samples with WGS. Therefore, we focused on deletions only, since deletions are most reliably imputed 20 and also because our imputation method relies on the availability of discrete genotype calls. To discover deletions, we ran the SVDiscovery pipeline script to in two different batches to discover variants of different size ranges: 100bp-100kbp and 100kbp-10Mb. Sites from both discovery runs were filtered using recommended parameters established by the Genome of the Netherlands Consortium. 21 Discovery sites with an alpha-satellite fraction of >0.9 were also excluded. We excluded outlying samples with an excessive number of autosomal discovery CNVs (Supplementary Figure 4), and genotyped all remaining samples using read-depth on a total of 10,609 putative deletions events.
Inspection of the resultant genotypes in available trios (n=68) demonstrated consistently low levels of Mendelian inconsistencies across these candidate loci (0.0026±0.00067, Supplementary Figure 4). All genotyped sites were further refined based genotype calls using the following criteria: (i) duplicate sites with a 50% overlap and discordance LOD score greater than zero, (ii) non-variant sites without confident (GSNONVARSCORE ≥ 13) non-reference genotype calls, (iii) sites with cluster means deviating from expected values (GSM1 < 0.5 or > 2.0), and (iv) sites with poor cluster separation (GSCLUSTERSEP ≤ 2). Using an Intensity Rank Sum test on available SNP-array intensity data, we estimated a genome-wide FDR for the entire callset ranging from 0.04 (all deletions) decreasing to 0.018 (deletions ≥ 5kb) and lower, depending on the size-cutoff used.
Prior to performing imputation, we removed all monomorphic CNVs and CNVs with a genotype missing rate > 5% after setting Mendelian inconsistent genotypes to missing. We also removed CNVs failing within VDJ recombination regions. We then imputed CNVs across all samples using the same imputation pipeline for SNVs. We restricted our final callset to the set of samples with SNP genotyping data and kept only samples that were also well-imputed for SNVs.
We retained well-imputed CNVs using the threshold-based calling approach where a genotype with the highest probability needs to be > 80% and used a custom Perl scripts to convert our imputed callset to PLINK's native CNV format and annotate imputed CNV calls for their overlap with the Omni2.5 array. To make our callset comparable to 1000 Genomes data, we also removed all CNVs that had <200bp of uniquely-alignable sequence as defined by the 35-mer alignability mask used by Phase 3 of the 1000 Genomes Project. To completely remove redundancy at the sample level (which could bias subsequent burden analysis), we also combined all strictly overlapping CNV calls within the same individual by taking the union of segment boundaries. We then used frequency information defined by 1000 Genomes Phase 3 Structural variant callset 22 to define a final set rare, imputed CNVs as described in the Main Methods.

Linkage analysis
Overview: Several of the 26 pedigrees are large and complex, with multiple inbreeding or marriage loops. Such pedigrees pose a significant challenge to analysis, as standard multipoint approaches that employ the Lander-Green algorithm fail in this setting. Possible solutions would include trimming and cutting pedigrees into smaller units, however we found that applying this approach in these pedigrees led to a reduced capability to resolve phase, and therefore resulted in substantially reduced power. We also considered MCMC approaches 23 , however given the density of markers in a genome-wide setting, such approaches are exceedingly slow, and we found it difficult to evaluate if we had reached convergence. We evaluated linkage in our pedigrees using a two-point parametric linkage on both SNPs and multi-allelic STRs identified from the WGS data using LobSTR 24 , as well as a non-parametric method proposed specifically for use in large, complex pedigrees 25 .
Quality control of microarray data: A subset of samples was repeated in each batch to enable concordance checks. A total of 2,026,257 SNPs were polymorphic and passed all QC procedures, including the evaluation of call rate, testing for Hardy Weinberg equilibrium, and Mendelian error. For linkage studies we used 99,446 SNPs with MAF>0. 35, that had been LDpruned to have r 2 <0.5. During QC procedures, allele frequency calculations, calculations of HWE, and estimates of LD were performed using only unrelated (founder) individuals. Further pedigree-wide Mendel checks were performed on the set of 99,446 SNPs used in linkage analysis; 0.7% of markers had 1 or more errors in Mendelian inheritance in these additional checks and all data for the marker in the family that generated the error was set to missing. Individual-level QC checks included verifying the pedigree structure by comparing theoretical kinship with empirical estimates, assessing missing rate, and verifying that the genetic sex agreed with the reported sex.

Identification and QC of STRs:
We detected STRs with the lobSTR software 24 (version 4.0.0) that uses sequencing data to call STRs. The BAM files aligned with BWA-MEM that were generated during the variant calling process were used as input to lobSTR, which then generated VCF files for STR loci. To measure the accuracy of STR calls from the lobSTR software, we compared lobSTR STR calls to STR data previously collected for 19 individuals in one family; STRs were detected with electrophoresis. The previous STR data had 367 genome-wide STRs. Based on the filtering script that the lobSTR software provides, we developed different filtering strategies.
For each filter, we measured the number of STRs passing the filter, the genotype concordance between the lobSTR STR calls and the previous STR data, and also the number of Mendelian inconsistencies using PedCheck software 26 . We then identified the filter that achieves greater than 95% genotype concordance between the two STR calls and fewer than 0. The null hypothesis of no linkage was evaluated using the LOD (logarithm of odds) score, which is the log 10 of the ratio of the likelihood of linkage, given the estimated recombination fraction, to the likelihood given the null value for the recombination fraction (0.5). Traditionally, a LOD score of 3 (p=0.0001) was considered significant linkage, however Lander & Kruglyak 28 suggested LOD=3.3 (p=4.9E-05) was a more appropriate threshold. The HLOD (LOD with heterogeneity), tests 2 parameters: the recombination fraction and proportion of linked pedigrees. The HLOD score has traditionally been evaluated using a combination of chi-square distributions with 1 and 2 degrees of freedom 29 . By this metric, an HLOD of 3.6-3.7 corresponds to p~0.0001; using the Lander and Kruglyak p-value threshold of 4.9E-05 corresponds to an HLOD of 4.1.
Non-parametric Linkage Analysis: We used Rapid 25 for non-parametric analysis of allele-sharing of SNPs among BP1 individuals in our pedigrees. Rapid is designed for use in large complex pedigrees, where estimation of identity by descent (IBD) can be computationally demanding. Rapid evaluates significance using an approximation to the empiric p-value, which Abney et al. 25 states is conservative. After the first genome-wide scan to identify promising results using the approximation, we re-analyzed markers with -log10(P) > 3.5 with 250,000 simulations to obtain a true empiric p-value. Markers with p<4.9e-05 were then evaluated with 1,000,000 simulations to obtain a more accurate p-value. In the NPL analysis, we approximated the empiric p-values using Rapid, and 13 SNPs and 156 STRs resulted in -log10(P)>3.5. These SNPs and STRs were re-analyzed with 250,000 simulations to obtain a true empiric p-value, and among those, eight SNPs and one STR had p<4.9e-05, which corresponds to the HLOD threshold > 4.1. We re-analyzed those eight SNPs and one STR with 1,000,000 simulations, and empiric p-values from this round of simulations ranged from 2e-06 to 3.9e-05, including two SNPs with p=2e-06, one on 16q23 (at 77.7 Mb) and one SNP on 21q21 (at 23.9 Mb) (Supplementary Table 5).

Identification of genes for burden and segregation analyses for rare variants
We highlighted genes for burden and segregation analysis of rare variants from three sources: (1) genes in regions that demonstrated evidence of enrichment of BP1 heritability, as evaluated in the PGC BP1 GWAS data set (summary GWAS statistics); (2) genes near PGC BP1 GWAS peaks; (3) genes within 1Mb of our linkage peaks. We only consider genes where there is at least one deleterious SNV in our BP1 dataset.
We identified regions with enrichment of BP1 heritability using PGC GWAS summary statistics and 220 epigenetic profiles that originated from 100 individual cell types or tissues. The epigenetic annotation came from cell-type specific histone modification (ChIP-seq) data generated by the NIH Roadmap Epigenome Project 30 . H3K4me1, H3K4me3 and H3K9ac data were post-processed by Trynka et al. 31 The definition of annotations used in our analysis have been previously described. 32,33 Briefly, peaks were called using MACS v.1.4. For each cell-type and specific histone mark, start and end position of identified regions (i.e. called peaks) were determined and together define an annotation. H3K27ac data were post-processed separately by Hnisz et al. 34 , also using MACS v.1.4. The 220 cell type-specific annotations were then divided into 10 groups by taking a union of the cell type-specific annotation within each group, following Finucane et al. 33 Within all regions marked for a specific tissue group, we evaluated the possible enrichment of BP1 heritability using PGC BP1 GWAS summary statistics and Stratified LD Score Regression, a statistical method that partitions SNP-based h2 from GWAS summary statistics. 33 For tissue groups that demonstrated significant BP1 heritability in the PGC, we then highlighted for further analysis in our BP1 families all genes within regions marked by the annotation process; that is, all genes overlapping intervals defined by start and end positions of identified regions (peaks).
For each cell-type group annotation, we first estimated partitioned LD scores using the ldsc.py --l2 function with MAF > 5%, a 1 centimorgan (cm) window, and the 1000 Genomes European reference panel (CEU) to estimate LD matched to the population of the PGC GWAS. We next ran sLDSR (ldsc.py --h2) for each cell-type group while accounting for the overall heritability that is distributed across 53 baseline annotations (baseline model), as recommended by the developers. If a cell-type group annotation is associated with increased h2, LD to SNPs in that annotation will increase the χ2 statistic of a SNP more than LD to SNPs outside of that cell-type annotation. To determine if this effect is significant, it estimates the contribution of that annotation to the per-SNP h2 while accounting for the baseline model. We evaluate the Zscores of the contribution of each annotation, which is calculated using standard errors that are obtained via a block jackknife procedure.
We also specifically targeted genes near genome-wide significant association signals in the PGC BP1 GWAS. The PGC BP1 GWAS summary statistics were clumped in PLINK, using our BP1 genotype data as the LD reference (founder genotypes only). Clumps were formed in windows of 250 Kb and using an r 2 threshold of 0.1. Among the resultant clumps, if the lead SNV was genome-wide significantly associated to BP1 in the PGC data (p<5e-08), we determined the physical extent of the SNVs that were in the same clump as the lead SNP, and considered for further analysis all genes within such regions.
Lastly we targeted genes within 1 Mb of linkage peaks (both parametric and non-parametric) identified in our BP1 pedigrees. Linkage peaks were defined as parametric HLOD>4.1, or nonparametric p-value <4.9e-05.

Identifying founders who carried a rare variant in the segregation method
To detect F rv , founders who introduced the rare variant into a family, we used the results of GIGI imputation. GIGI generated the probability of imputed genotypes for everyone in a family, even those who were not genotyped with the microarray (including all founders). Assuming biallelic variants, GIGI generated three probabilities for three genotypes (1/1, 1/2, and 2/2 where 1 and 2 are two alleles). We had two strategies to identify F rv . In the first strategy, if the highest genotype probability among the three probabilities of a founder was > 0.8, we considered this founder to have a high-quality genotype. If all founders in a family had high-quality genotypes, we included this rare variant for analysis and identified F rv who had a rare variant. We used the second strategy when all founders except a pair of top founders had high-quality genotypes. A pair of top founders was a couple who did not both have parents in the pedigree structure, and they were not usually sequenced or imputed well. In some cases, GIGI assigned about 0.5 probability of having a rare variant to both top founders. This indicated that a rare variant was inherited from one of the couples, but GIGI was not able to accurately identify which founder introduced the variant. Assuming that both top founders were neither sequenced nor imputed well, and they did not contribute to the Srare statistic, the segregation p-value would be the same regardless of which of the founders carried the rare variant. Hence, we randomly assigned the rare variant to one of the top founders in the second strategy. We ignored rare variants in which there was more than one pair of top founder in a family who did not have the high-quality genotypes as well as rare variants in which founders carried two alleles of a rare variant. We analyzed rare variants with low genotype missing rate (< 5%) that at least two affected individuals shared in a family as we were not interested in rare variants present in one or zero affected individuals.

Performance of the segregation method for rare SNVs and CNVs
To measure the performance of the segregation method for rare SNVs and CNVs, we generated simulation data following the simulation framework of Sul et al. 35 . We used a modified version of the wide pedigree structure in Sul et al.; the original wide pedigree structure had 30 individuals where there were two founders in the first generation, two founders and two nonfounders in the second generation, and 24 nonfounders in the third generation (see Figure  S1 in Sul et al.). We doubled the number of nonfounders in this simulation to simulate a larger pedigree structure. We used COSI software to generate haplotypes of unrelated individuals assuming European ancestry and assigned them to founders such that only one founder had a rare variant. We then generated haplotypes of nonfounders using a gene dropping procedure from founders (refer to Sul et al. for detailed descriptions of simulation framework). To measure the false positive rate (FPR) of the segregation method, we generated the null simulation dataset by randomly assigning affected and unaffected status to individuals in pedigrees. To measure the power, affection status of each individual was determined with the following logistic regression model.
!(# = 1) = '()(* + + -. *) 1 + '()(* + + -. *) P(A=1) was the probability that individual A was affected, and * + = log (3/(1 − 3)) where W was the baseline prevalence, which is 20% in our simulation. X was the genotype vector where 1 means carrying a rare variant. * = log (67) and we used odds ratio (OR) of 4, 6, 8, 10 and 12. We used large odds ratio to simulate the perfect segregation or nearly perfect segregation in a family. We generated 10,000 replicates for the false-positive simulation and 2,000 replicates for power simulation for each OR. We generated 20 families in each replicate, and the minimum number of affected individuals was 19 and the maximum number was 37 in each family. We performed 100,000 random IV sampling to estimate a p-value of each segregation, and we only tested family-level p-values in this simulation. We included only variants that were shared by at least two affected individuals in a family, which is consistent with how we tested segregation of rare variants for the CO/CR families. FPR was measured as a fraction of p-values < a = 0.1 and 0.05 among all p-values calculated (the number of families with a rare variant in all replicates), and the power was similarly calculated at a = 0.05. Results show that FRP was 0.088 at a = 0.1 and 0.039 at a = 0.05, which indicates our method had slightly lower FPR than expected. The power of our method was 0.54, 0.75, 0.84, 0.89, and 0.92 for OR of 4, 6, 8, 10 and 12, respectively. These results indicate that our method has the appropriate power to detect a rare variant with strong effect in a large family.