Because of the high costs of ascertaining families, most linkage studies of bipolar I disorder (BPI) have used relatively small samples. Moreover, the genetic information content reported in most studies has been less than 0.6. Although microsatellite markers spaced every 10 cM typically extract most of the genetic information content for larger multiplex families, they can be less informative for smaller pedigrees, especially affected sib-pair kindreds. For these reasons, we collaborated to pool family resources and carried out higher-density genotyping. Approximately 1100 pedigrees of European ancestry were initially selected for study and were genotyped by the Center for Inherited Disease Research using the Illumina Linkage Panel 12 set of 6090 single-nucleotide polymorphisms. Of the ∼1100 families, 972 were informative for further analyses, and mean information content was 0.86 after pruning for linkage disequilibrium. The 972 kindreds include 2284 cases of BPI, 498 individuals with bipolar II disorder (BPII) and 702 subjects with recurrent major depression. Three affection status models (ASMs) were considered: ASM1 (BPI and schizoaffective disorder, bipolar type (SABP) cases only), ASM2 (ASM1 cases plus BPII) and ASM3 (ASM2 cases plus recurrent major depression). Both parametric and non-parametric linkage methods were used. The strongest findings occurred at 6q21 (non-parametric pairs logarithm of odds (LOD) 3.4 for rs1046943 at 119 cM) and 9q21 (non-parametric pairs LOD 3.4 for rs722642 at 78 cM) using only BPI and SABP cases. Both results met the criterion for genome-wide significance, although neither was significant after correction for multiple analyses. We also inspected parametric scores for the larger multiplex families to identify possible rare susceptibility loci. In this analysis, we observed 59 parametric LODs of 2 or greater, many of which are likely to be close to the maximum possible scores.
Although some linkage findings may be false positives, the results could help prioritize the search for rare variants using whole-exome or whole-genome sequencing.
Bipolar I disorder (BPI), also termed manic-depressive illness, is a complex neuropsychiatric disorder afflicting approximately 1% of the worldwide population (APA 2000). Onset of illness typically occurs in late adolescence to early adulthood, when individuals present with acute mania, severe depression or mixed mood states. For most persons the disorder is lifelong, with recurring episodes of mania and/or depression. BPI causes substantial morbidity, with high costs to affected individuals and to society, and an estimated 10% of those afflicted die by suicide. As mania and depression are seemingly antipodal neurobiological states, BPI is one of the most enigmatic of medical illnesses.
Although the fundamental neurobiological alterations underlying the illness are not yet known, family, twin and adoption studies indicate that BPI is genetically based, with heritability estimates of 80%.1 These studies also show that BPI displays variable expressivity, reduced penetrance and an unknown mode of inheritance. Results of some segregation analyses implicate major loci in a subset of pedigrees.2, 3 As the disorder is relatively common, BPI is likely to have substantial genetic heterogeneity, both allelic and non-allelic (locus). As in other complex genetic disorders such as diabetes, autoimmune disorders, cardiovascular disease and Alzheimer disease, the genetic architecture is likely to comprise both common and rare variants.
For most of the past few decades, linkage analysis has been the only genome-wide method available for mapping genetic illnesses. It is robust to allelic heterogeneity (although not to locus heterogeneity) and is optimal for mapping rare variants with relatively large effect sizes. Over 25 genome-wide linkage studies of BPI have been completed using mostly small/nuclear pedigrees derived from outbred populations, and the results are compatible with the presence of rare alleles of large effect with substantial locus heterogeneity.4 Meta-analyses have produced some significant findings, although these results were based on molecular methods with lower information content than those currently available.5 Genome-wide linkage analyses of uniquely large pedigrees, mostly from isolated populations, have also implicated a number of chromosomal regions,6, 7, 8, 9, 10, 11 many generating LODs of 3 or more in individual pedigrees. Firm replication has not been achieved for any of these implicated regions, but this is not surprising, as exceedingly large samples are required for replication.12
Association analysis is more robust than linkage to locus heterogeneity (though not to allelic heterogeneity) and is a powerful method for mapping common alleles with small effect sizes. By the mid-2000s, the completion of the second-generation haplotype map of the human genome (HapMap2), coupled with the advent of low-cost genotyping platforms for interrogating several hundred thousand single-nucleotide polymorphisms (SNPs), made genome-wide association studies (GWAS) feasible. GWAS has been tremendously successful for mapping common variants for over 40 complex disorders.13, 14, 15 Although GWAS findings represent a significant biomedical advance, the implicated loci explain no more than 5–10% of the genetic liability for any disorder, and most associated SNPs give no insight into disease mechanisms. Initial GWAS for BPI using modest sample sizes (a few thousand cases and controls) produced some suggestive results. Subsequent meta-analyses (comprising several thousand cases and controls) yielded significant evidence for association to a few loci, including CACNA1C and ANK3.16, 17, 18, 19, 20, 21, 22 These results represent a significant advance in unraveling the genetics of bipolar disorder; however, the loci identified to date explain only 1–2% of disease liability.
In contrast to GWAS, linkage studies have used comparatively small samples, due in part to the high costs and effort necessary to ascertain multiplex pedigrees (versus individual cases and controls). As a result, almost all linkage studies have been substantially underpowered. In addition, the genetic information content for the vast majority of linkage studies has been under 60%. For these reasons, we established a collaboration to pool family resources and carry out higher-density genotyping. We report genome-wide linkage analyses for 972 pedigrees genotyped with the Illumina (San Diego, CA, USA) 6K SNP array. The families reported in this study include 2284 individuals with BPI (140 of whom have schizoaffective disorder, bipolar type (SABP)), 498 with BPII and 702 with recurrent major depression.
Materials and methods
The sample is described in Supplementary Table 1. For this study, only European–American samples were genotyped. Three affection status models (ASMs) were used: ASM1, bipolar I and SABP; ASM2, ASM1 plus bipolar II; and ASM3, ASM2 plus recurrent major depression. For each model, only individuals without significant mood, anxiety or psychotic disorders were coded as unaffected; the remaining individuals who were not affected under that model were coded as unknown. Supplementary Table 2 lists the number of relative pairs that are concordantly affected, concordantly unaffected or discordant under each of the three ASMs. Supplementary Table 3 shows the distribution of the number of affected individuals per pedigree.
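The coding rules for the three ASMs can be sketched as a small function. This is a minimal illustration, assuming hypothetical diagnosis labels ("BPI", "SABP", "BPII", "RMD" for recurrent major depression) and the LINKAGE-format convention of 0 = unknown, 1 = unaffected, 2 = affected; it is not the study's actual pipeline.

```python
from enum import Enum


class Status(Enum):
    """LINKAGE-style affection codes."""
    UNKNOWN = 0
    UNAFFECTED = 1
    AFFECTED = 2


# Hypothetical diagnosis labels; each site used its own instruments and codes.
ASM_AFFECTED = {
    1: {"BPI", "SABP"},
    2: {"BPI", "SABP", "BPII"},
    3: {"BPI", "SABP", "BPII", "RMD"},
}


def affection_status(diagnoses, asm):
    """Code one individual under ASM1, ASM2 or ASM3.

    diagnoses: set of significant lifetime mood, anxiety or psychotic
    diagnoses (an empty set means none were present).
    """
    if diagnoses & ASM_AFFECTED[asm]:
        return Status.AFFECTED
    if not diagnoses:
        return Status.UNAFFECTED
    # Any other significant diagnosis is treated as neither case nor control.
    return Status.UNKNOWN
```

For example, an individual with BPII alone is unknown under ASM1 but affected under ASM2 and ASM3.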
All subjects underwent semi-structured interviews, and hospital records were obtained where available. All information for each subject was evaluated by two experienced clinicians who made consensus diagnoses. When the two clinicians disagreed, a third clinician derived the final diagnosis. Individuals from the National Institute of Mental Health (NIMH) Genetics Initiative were interviewed using the Diagnostic Interview for Genetic Studies.23 Diagnostic and Statistical Manual (DSM)-IIIR criteria were used for BPI diagnoses, and research diagnostic criteria were used for BPII and recurrent major depression diagnoses.24 Subjects ascertained at Cardiff University and Trinity College Dublin were interviewed using the Schedule for Clinical Assessment in Neuropsychiatry,25 and DSM-IV criteria were applied for all diagnoses. Individuals identified by Johns Hopkins University were assessed using either the Lifetime Version of the Schedule for Affective Disorders and Schizophrenia (SADS-L) or the Diagnostic Interview for Genetic Studies.26 Research diagnostic criteria were used for all Johns Hopkins University diagnoses. Individuals evaluated by the University of California San Diego were interviewed using the Structured Clinical Interview for DSM-IIIR, except those from the first family ascertained, for whom the SADS-L was used.27 The University of California San Diego diagnoses were made using DSM-IIIR criteria, modified to require a 2-day minimum duration for hypomania. Individuals at Edinburgh were assessed using the SADS-L, and DSM-IIIR criteria were used.28 NIMH-Intramural/University of Chicago families were assessed using the SADS-L.29 Families from the University of California San Francisco (UCSF)/Utah site were interviewed using the SADS-L, and research diagnostic criteria were used for all diagnoses.30
Genotyping was performed by the Center for Inherited Disease Research using the Illumina Linkage Panel 12 set of 6090 SNPs. In all, 5670 autosomal and X-linked SNPs passed the Center for Inherited Disease Research's quality control. The investigators assessed the SNPs for missingness, allele frequency and Hardy–Weinberg equilibrium using PLINK (http://pngu.mgh.harvard.edu/purcell/plink/).31 Mendelian errors were assessed with both PLINK and Pedcheck (http://watson.hgen.pitt.edu/register/docs/pedcheck.html),32 and if there were errors in a pedigree, the SNP was set to missing for all members of that pedigree. Unlikely recombinants were identified with Merlin (http://www.sph.umich.edu/csg/abecasis/merlin/index.html),33 and problematic genotypes were set to missing. A total of 5642 autosomal and X-linked SNPs were included in the linkage analysis. For these SNPs, missingness was less than 3% per SNP and less than 2% per genotyped individual, minor allele frequency was at least 1%, and the Hardy–Weinberg equilibrium P-value was >0.001.
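The per-SNP thresholds above can be expressed as a simple filter. The sketch below is a hypothetical re-implementation, assuming genotypes coded as 0/1/2 allele counts (None for a missing call) and using a 1-df chi-square Hardy–Weinberg test; PLINK's own filter uses an exact test, and in family data Hardy–Weinberg statistics are usually computed in founders only, so results may differ slightly.

```python
import math


def hwe_chisq_p(n_hom1, n_het, n_hom2):
    """P-value of the 1-df chi-square test for Hardy-Weinberg equilibrium."""
    n = n_hom1 + n_het + n_hom2
    p = (2 * n_hom1 + n_het) / (2 * n)
    q = 1 - p
    observed = (n_hom1, n_het, n_hom2)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def passes_qc(genotypes, max_missing=0.03, min_maf=0.01, min_hwe_p=0.001):
    """Apply the missingness, minor allele frequency and HWE thresholds.

    genotypes: allele counts 0/1/2 per individual, None for a missing call.
    """
    called = [g for g in genotypes if g is not None]
    if not called or 1 - len(called) / len(genotypes) >= max_missing:
        return False
    counts = [called.count(k) for k in (0, 1, 2)]
    freq = (counts[1] + 2 * counts[2]) / (2 * len(called))
    if min(freq, 1 - freq) < min_maf:
        return False
    return hwe_chisq_p(*counts) > min_hwe_p
```

A SNP with 5% missing calls, a monomorphic SNP, or a SNP with no heterozygotes at intermediate frequency would each be rejected.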
Pedigree relationship checks
PLINK analysis was used to identify unexpected relatedness between pedigrees, using identity-by-descent (IBD) scores across all SNPs, and to detect sex misspecification from the homozygosity of X-linked SNPs. PREST (http://fisher.utstat.toronto.edu/sun/Software/Prest/) analysis34 was used to test whether the observed IBD scores were consistent with the reported relationships within pedigrees. For the PREST analysis, a subset of 4969 autosomal SNPs with low linkage disequilibrium (pairwise r² < 0.5) was used. Eight duplicate pairs were found, and any duplicate that was inconsistent with the rest of its pedigree was excluded. In two instances, partial pedigree overlaps were identified between two sites (University of California San Diego and Johns Hopkins University; NIMH and UCSF/Utah). These pedigrees were joined, using the observed IBD scores to identify the relationships. If phenotype information for an overlapping sample did not agree between sites, the individual's phenotype was set to unknown (one of five overlapping samples). In another pair of pedigrees from two sites, a parent in an NIMH pedigree was consistent with being a child in an NIMH-Intramural/Chicago pedigree; these two pedigrees were merged. Two UCSF/Utah pedigrees had founders who were siblings, and these pedigrees were also merged. For 22 samples, genetic sex was not consistent with reported sex. If the IBD scores were otherwise consistent with the reported relationships, the recorded sex was corrected; otherwise, the individual was discarded. Pedigree structures were corrected when relationships determined by IBD were unambiguous; otherwise, samples were discarded. PLINK IBD and PREST analyses were repeated after these modifications to confirm the accuracy of the pedigree relationships.
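The sex check from X-linked homozygosity can be sketched with a method-of-moments inbreeding coefficient F, which is near 1 for males (hemizygous, so all X calls homozygous) and near 0 for females. This is an illustrative sketch, assuming population allele frequencies are known and using the 0.2/0.8 cut-offs commonly associated with PLINK's sex check; the function names are hypothetical and the study's exact procedure may differ.

```python
def x_inbreeding_f(x_genotypes, x_freqs):
    """Method-of-moments F from X-linked SNPs for one individual.

    x_genotypes: 0/1/2 allele counts (None = missing).
    x_freqs: population frequency of the counted allele at each SNP.
    """
    obs_hom = exp_hom = n = 0
    for g, p in zip(x_genotypes, x_freqs):
        if g is None:
            continue
        n += 1
        obs_hom += g != 1                 # 1 if homozygous, 0 if heterozygous
        exp_hom += 1 - 2 * p * (1 - p)    # expected homozygosity under HWE
    return (obs_hom - exp_hom) / (n - exp_hom)


def inferred_sex(f, male_min=0.8, female_max=0.2):
    """Classify genetic sex from F; values in between are left ambiguous."""
    if f > male_min:
        return "male"
    if f < female_max:
        return "female"
    return "ambiguous"
```

A sample whose inferred sex disagrees with the recorded sex would then be handled by the rule described above: corrected if its IBD sharing fits the reported relationships, otherwise discarded.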
Principal components analysis (PCA), as implemented in EIGENSOFT,35 was used to identify subjects of non-European–American ancestry. Genotype data for the founders of each pedigree were combined with genotypes from the HapMap founders for the CEPH, Yoruba, Japanese and Chinese samples. HapMap SNPs that did not overlap with the linkage panel were discarded. The first two principal components were plotted (Supplementary Figure 1). None of the samples clustered with the Yoruba or Asian samples, and almost all clustered with the CEPH sample. Individuals from 10 families (four NIMH, two Johns Hopkins University, three Cardiff and one University of California San Diego) were identified as outliers, and these 10 families were discarded from the analysis.
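Once principal component coordinates are available (for example from EIGENSOFT), flagging outlier families can be sketched as below. The text does not state how outliers were defined, so the distance rule and the 6-SD cut-off here are hypothetical assumptions; as in the study, a single outlying founder causes the whole family to be flagged.

```python
import math
from statistics import mean, stdev


def ancestry_outlier_families(founder_pcs, ceu_pcs, max_sd=6.0):
    """Flag families with any founder far from the CEPH (CEU) cluster.

    founder_pcs: {family_id: [(pc1, pc2), ...]} for pedigree founders.
    ceu_pcs: [(pc1, pc2), ...] for HapMap CEU founders.
    max_sd: hypothetical cut-off, in CEU standard deviations per axis.
    """
    centre = [mean(axis) for axis in zip(*ceu_pcs)]
    spread = [stdev(axis) for axis in zip(*ceu_pcs)]
    flagged = set()
    for fam, points in founder_pcs.items():
        for point in points:
            # Standardize each PC against the CEU reference cluster.
            z = [(x - c) / s for x, c, s in zip(point, centre, spread)]
            if math.hypot(*z) > max_sd:
                flagged.add(fam)
                break  # one outlying founder is enough to drop the family
    return flagged
```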
Parametric and non-parametric analyses were performed for the three ASMs using two different programs. All SNPs were analyzed with Merlin (version 1.1.2),33 which corrects for linkage disequilibrium by grouping SNPs into clusters with pairwise r² of at least 0.1. Large pedigrees were trimmed using PedShrink software (http://mayoresearch.mayo.edu/schaid_lab/software.cfm), which trims pedigrees to a specified bit size, preferentially removing uninformative individuals. All informative individuals were analyzed with MORGAN (version 3.0)36 (http://www.stat.washington.edu/thompson/Genepi/pangaea.shtml), which uses Markov chain Monte Carlo methods for linkage mapping in large pedigrees. A subset of 4679 SNPs with pairwise r² < 0.1 was used in the MORGAN linkage analysis.
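Selecting a low-LD SNP subset such as the one used for MORGAN can be sketched as greedy pruning along the map. This is a simplified stand-in, not the procedure the study used: it assumes complete genotype data coded as allele counts, uses genotypic r² (squared Pearson correlation of allele counts) as a proxy for haplotype r², and compares each SNP only with a window of previously kept SNPs. The function names and the window size are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation of two allele-count vectors (no missing data)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def ld_prune(snps, max_r2=0.1, window=50):
    """Greedy LD pruning along one chromosome.

    snps: list of (snp_id, allele_counts) in map order.
    A SNP is kept only if its r² with each of the last `window`
    kept SNPs is below max_r2.
    """
    kept = []
    for snp_id, counts in snps:
        if all(r_squared(counts, prev) < max_r2 for _, prev in kept[-window:]):
            kept.append((snp_id, counts))
    return [snp_id for snp_id, _ in kept]
```

Monomorphic SNPs would make the correlation undefined, which is one reason frequency filtering precedes pruning.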
Parametric heterogeneity LOD (HLOD) scores under a dominant model (risk allele frequency 0.01; penetrances 0.01, 0.70 and 0.70 for carriers of zero, one and two risk alleles, respectively) and a recessive model (risk allele frequency 0.1; penetrances 0.01, 0.01 and 0.70) were calculated in Merlin and MORGAN/lm_markers. These models are powerful for detecting rare, relatively highly penetrant susceptibility alleles. Non-parametric linkage (NPL) analyses were performed using (1) the SAll statistic under the exponential model,37 implemented in Merlin, which can be powerful in datasets that show very strong linkage signals or that include larger pedigrees; (2) the SPairs statistic under the linear model,37 implemented in Merlin, which may be more powerful for relatively common alleles; (3) the SPairs statistic implemented in MORGAN/lm_ibdtests; and (4) the Slambda statistic implemented in MORGAN/lm_ibdtests, which incorporates both affected and unaffected individuals. For the MORGAN non-parametric tests, normality-based tests were used rather than assessing significance through permutation of affection status, because many families did not have unaffected relatives.
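The heterogeneity LOD combines per-family LOD scores at a position under the assumption that only a proportion α of families is linked, maximizing the sum of log10(α·10^LOD_i + 1 − α) over α. The grid search below is a minimal sketch of that formula, assuming the per-family LODs have already been computed under one of the genetic models above; it is not Merlin's or MORGAN's implementation.

```python
import math


def hlod(family_lods, steps=1000):
    """Heterogeneity LOD at one map position.

    family_lods: per-family LOD scores at the position.
    Maximizes sum_i log10(alpha * 10**LOD_i + 1 - alpha) over a grid of
    alpha (the proportion of linked families) in (0, 1].
    Returns (HLOD, alpha at the maximum).
    """
    best_score, best_alpha = 0.0, 0.0  # alpha = 0 gives HLOD = 0
    for k in range(1, steps + 1):
        alpha = k / steps
        score = sum(math.log10(alpha * 10 ** lod + 1 - alpha) for lod in family_lods)
        if score > best_score:
            best_score, best_alpha = score, alpha
    return best_score, best_alpha
```

With one strongly linked family (LOD 2) and two unlinked families (LOD −2 each), the ordinary summed LOD is −2, but the HLOD is positive at an intermediate α, which is why HLODs suit samples with locus heterogeneity.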
Sex-specific genetic maps were obtained from Rutgers Map Interpolator software (http://compgen.rutgers.edu/old/map-interpolator) that estimates genetic distances from the physical locations of the SNPs, using the Rutgers combined linkage physical map of the human genome.38
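Estimating a genetic position from a physical one can be sketched as linear interpolation between anchor markers of a reference map, run once with female and once with male positions to obtain sex-specific maps. This is an illustrative sketch with a hypothetical function name; the Rutgers Map Interpolator's exact method may differ (for example, in how it smooths between anchors or extrapolates at chromosome ends).

```python
from bisect import bisect_left


def interpolate_cm(bp, ref_bp, ref_cm):
    """Linearly interpolate a genetic position (cM) from a physical one (bp).

    ref_bp / ref_cm: anchor markers of a reference map, sorted by bp,
    with non-decreasing cM. Positions outside the anchors are
    extrapolated from the nearest interval.
    """
    i = bisect_left(ref_bp, bp)
    i = min(max(i, 1), len(ref_bp) - 1)  # clamp to a valid interval
    x0, x1 = ref_bp[i - 1], ref_bp[i]
    y0, y1 = ref_cm[i - 1], ref_cm[i]
    return y0 + (y1 - y0) * (bp - x0) / (x1 - x0)
```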
Thresholds for suggestive and significant linkage for each analysis were estimated from the data using autoregressive models to estimate the correlation between standard normal statistics at adjacent map points, and then using this correlation to estimate study-specific critical values (http://www.wpic.pitt.edu/wpiccompgen/robust_estimation_genome_scan.htm).39
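The idea behind study-specific thresholds (adjacent statistics along the map are correlated, so the effective number of tests is smaller than the number of SNPs) can be illustrated by simulation. The sketch below is a swapped-in Monte Carlo approach, not the analytic autoregressive method cited above: it assumes the null statistics behave as a first-order autoregressive (AR(1)) Gaussian series with the estimated lag-1 correlation, converts one-sided Z to LOD = Z²/(2 ln 10), and reads off a suggestive threshold (about one peak per scan by chance) and a significant threshold (the 95th percentile of the genome-wide maximum).

```python
import math
import random


def lag1_correlation(z):
    """Estimate the correlation between statistics at adjacent map points."""
    x, y = z[:-1], z[1:]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def z_to_lod(z):
    """One-sided conversion: LOD = Z**2 / (2 ln 10) for Z > 0, else 0."""
    return z * z / (2 * math.log(10)) if z > 0 else 0.0


def simulate_thresholds(rho, n_points, n_scans=500, seed=1):
    """Monte Carlo (suggestive, significant) LOD thresholds under the null.

    Each simulated genome scan is an AR(1) series of n_points standard
    normal statistics with lag-1 correlation rho.
    suggestive: height exceeded by about one peak per scan on average.
    significant: 95th percentile of the genome-wide maximum LOD.
    """
    rng = random.Random(seed)
    innovation_sd = math.sqrt(1 - rho * rho)
    maxima, peaks = [], []
    for _ in range(n_scans):
        z = rng.gauss(0.0, 1.0)
        scan_max, peak = 0.0, 0.0
        for _ in range(n_points):
            lod = z_to_lod(z)
            scan_max = max(scan_max, lod)
            if lod > 0:
                peak = max(peak, lod)  # still inside an excursion above zero
            elif peak > 0:
                peaks.append(peak)  # excursion ended; record its height
                peak = 0.0
            z = rho * z + innovation_sd * rng.gauss(0.0, 1.0)
        if peak > 0:
            peaks.append(peak)
        maxima.append(scan_max)
    maxima.sort()
    peaks.sort(reverse=True)
    significant = maxima[int(0.95 * n_scans)]
    suggestive = peaks[n_scans - 1] if len(peaks) >= n_scans else 0.0
    return suggestive, significant
```

In practice, lag1_correlation would be applied to the observed standardized statistics along the map, and the result passed to simulate_thresholds with the number of map points in the scan.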
Information on age of onset, presence of psychosis, presence of mood-incongruent psychosis and suicide attempts was available for most of the families studied. The effects of these covariates on linkage were analyzed using ordered subset analysis40 as implemented in FLOSS (flexible ordered subset analysis).41 Covariates were assigned at the pedigree level. For age of onset, pedigrees were classified by the youngest age of onset within the pedigree. For the other covariates, pedigrees were classified by the presence of at least one individual who was positive for the covariate. The number of affected individuals within a pedigree was also analyzed as a covariate, because pedigrees with many affected individuals may be segregating different loci (possibly with rarer, more highly penetrant alleles) than pedigrees with few affected individuals. NPL scores for each pedigree were calculated by Merlin using the SPairs statistic. Families were ordered by covariate score, and families with the same covariate score received the same rank; thus, for binary covariates, there were only two subsets of families. Multipoint linkage was performed on the subset of families with the k smallest or k largest scores. The permutation test compared the maximum ordered-subset linkage score obtained by ordering families on the covariate with the maximum ordered-subset linkage scores obtained for random orderings of the families. Permutation controls for multiple testing of all regions of a single chromosome, but not for multiple chromosomes or multiple covariates.
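The ordering, tie handling and permutation steps can be sketched at a single map position as follows. This is a simplified illustration, not FLOSS: it assumes per-family LOD contributions that can be summed, restricts subsets to end where the covariate value changes (so tied families stay together), and estimates significance by randomly reassigning families to ranks while keeping the same allowed subset sizes. FLOSS additionally maximizes over positions along the chromosome. The function names are hypothetical.

```python
import random


def best_prefix(scores, cut_points):
    """Largest running total of scores over the allowed subset sizes."""
    best, best_k, total = float("-inf"), 0, 0.0
    for k, score in enumerate(scores, start=1):
        total += score
        if k in cut_points and total > best:
            best, best_k = total, k
    return best, best_k


def ordered_subset_analysis(family_lods, covariate, descending=False,
                            n_perm=999, seed=1):
    """Ordered subset analysis at one map position (simplified sketch).

    family_lods: per-family LOD contributions, assumed additive.
    covariate: one value per family; tied families stay together, so a
    subset may only end where the covariate value changes.
    Returns (max subset LOD, subset size, permutation P-value).
    """
    order = sorted(range(len(family_lods)), key=lambda i: covariate[i],
                   reverse=descending)
    values = [covariate[i] for i in order]
    cuts = {k for k in range(1, len(values)) if values[k] != values[k - 1]}
    cuts.add(len(values))  # the full sample is always an allowed subset
    observed, size = best_prefix([family_lods[i] for i in order], cuts)

    # Null: shuffle which family sits at each rank, keeping the tie structure.
    rng = random.Random(seed)
    shuffled = list(family_lods)
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if best_prefix(shuffled, cuts)[0] >= observed:
            at_least_as_large += 1
    return observed, size, (at_least_as_large + 1) / (n_perm + 1)
```

With a binary covariate there are only two allowed subsets (the covariate-positive or covariate-negative families first, depending on direction, and the full sample), matching the description above.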
Age of onset, presence of psychosis, presence of mood-incongruent psychosis and suicide attempts were also analyzed as phenotypes. For binary covariates, individuals were considered affected if they were positive for the covariate and unknown otherwise. These covariates were analyzed with Merlin using the SAll statistic under the exponential model and the SPairs statistic under the linear model. Age of onset was analyzed with variance components analysis as implemented in Merlin. This analysis partitions the variance of the trait into three components: a major locus, a polygenic component and non-genetic variation.
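For reference, the standard variance-components linkage model can be written as below. This assumes an additive major locus with no dominance term, which we take to be the form fitted here; y is the vector of age-of-onset values in a pedigree, \hat{\pi}_{ij} is the estimated proportion of alleles shared IBD at the test position by relatives i and j, \phi_{ij} is their kinship coefficient, and \sigma_q^2, \sigma_g^2 and \sigma_e^2 are the major-locus, polygenic and non-genetic variance components.

```latex
\mathbf{y} \sim N(\boldsymbol{\mu}, \boldsymbol{\Omega}), \qquad
\Omega_{ij} =
\begin{cases}
\sigma_q^2 + \sigma_g^2 + \sigma_e^2, & i = j,\\[2pt]
\hat{\pi}_{ij}\,\sigma_q^2 + 2\phi_{ij}\,\sigma_g^2, & i \neq j,
\end{cases}
\qquad
\mathrm{LOD} = \log_{10}
\frac{\max L(\sigma_q^2, \sigma_g^2, \sigma_e^2)}
     {\max_{\sigma_q^2 = 0} L(\sigma_q^2, \sigma_g^2, \sigma_e^2)}
```

Evidence for linkage therefore comes from relatives who share more alleles IBD at the test position also having more similar ages of onset than their overall kinship alone predicts.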
Imprinting was analyzed using GeneFinder42 (http://people.virginia.edu/~wc9c/genefinder/index.html) with linkage extensions. This analysis uses generalized estimating equations to test whether sharing of maternal alleles among affected sib pairs is significantly different from sharing of paternal alleles. For this analysis, extended IBD scores were calculated using Merlin.
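The comparison of maternal and paternal sharing can be illustrated with a much simpler test than the GEE approach used by GeneFinder. The sketch below is a paired z-test on per-pair differences, assuming each affected sib pair's expected maternal and paternal IBD sharing (each between 0 and 1) has already been estimated, for example from Merlin's IBD output; unlike GEE, it treats all pairs as independent, even pairs drawn from the same sibship. The function name is hypothetical.

```python
import math
from statistics import mean, stdev


def parent_of_origin_test(maternal_ibd, paternal_ibd):
    """Paired comparison of maternal vs paternal IBD sharing in affected sib pairs.

    maternal_ibd / paternal_ibd: expected number (0-1) of maternal /
    paternal alleles shared IBD by each pair at one position.
    Returns (z, two-sided P) for H0: mean maternal sharing = mean paternal sharing.
    """
    diffs = [m - p for m, p in zip(maternal_ibd, paternal_ibd)]
    z = mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))
    # Two-sided normal tail probability: P(|Z| > z) = erfc(|z| / sqrt(2)).
    return z, math.erfc(abs(z) / math.sqrt(2))
```

Positive z indicates excess sharing of maternally transmitted alleles; negative z, excess paternal sharing.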
Non-parametric and parametric linkage analyses
The critical LOD scores for suggestive linkage (expected to occur once by chance in a single genome scan), significant linkage (probability of 0.05 of occurring by chance in a single genome scan) and significance after Bonferroni correction for 24 different analyses are presented in Supplementary Table 4. Bonferroni correction is likely to be overly conservative, as the analyses are not independent of each other. Suggestive criteria range from 1.9 to 2.3, significant criteria from 3.2 to 3.6, and criteria after correction for multiple testing from 4.5 to 5.0. Information content was estimated using Merlin's entropy measure, on which 1 denotes complete information. Mean information content was 0.87 (s.d. 0.026) for all SNPs and 0.86 (s.d. 0.028) for SNPs after pruning for linkage disequilibrium.
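The entropy-based information content compares uncertainty about inheritance before and after seeing the marker data: 1 − H(posterior)/H(prior). The sketch below applies that idea to the simplest case, the IBD state of a full sib pair (prior 1/4, 1/2, 1/4 for sharing 0, 1 or 2 alleles IBD); it is a simplification for illustration, since Merlin evaluates the measure over full pedigree inheritance vectors.

```python
import math


def entropy(probs):
    """Shannon entropy (natural log) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def information_content(posterior, prior=(0.25, 0.5, 0.25)):
    """Entropy-based information content at one position.

    posterior: IBD distribution given the marker data.
    prior: IBD distribution before seeing marker data (default: a full
    sib pair sharing 0, 1 or 2 alleles IBD).
    Returns 1 when the IBD state is known with certainty and 0 when the
    markers add nothing.
    """
    return 1 - entropy(posterior) / entropy(prior)
```

For example, a posterior of (0.1, 0.8, 0.1) gives an information content of about 0.39, well short of the near-complete extraction reported here.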
Supplementary Figures 2–4 show the Merlin results for ASM1, ASM2 and ASM3, respectively. Table 1 presents all results that met the suggestive criterion. Results meeting the criterion for significance were found at 6q21 for ASM1 (pairs LOD 3.4 for rs1046943 at 119 cM) and 9q21 for ASM1 (pairs LOD 3.4 for rs722642 at 78 cM). Suggestive results were found in 10 additional regions. No results were significant after correcting for multiple analyses.
The MORGAN linkage results are presented in Supplementary Figures 5–7 for ASM1, ASM2 and ASM3, respectively. Table 2 presents all the results that met the suggestive criterion. No results met the criterion for significance. With the exception of 15q22 for ASM2 (Slambda LOD 3.0 for rs782944 at 60 cM), all of the regions presented in Table 2 also met at least the suggestive criterion in the Merlin analyses.
Table 3 presents the results from the MORGAN parametric analyses for which the LOD score for a single pedigree exceeds 2. Although most of these results are likely to be false positives, inspection of results for larger families may suggest regions in which a rare, highly penetrant locus exists, particularly when three or more families show elevated LOD scores in the same region. Such regions include chromosome 4 (152–214 cM) and chromosome 19 (23–58 cM).
The exploratory covariate analyses performed using FLOSS identified genome-wide significant regions of enhanced linkage for families with more affected individuals, younger ages of onset and no psychosis. However, no results were significant when corrected for the number of phenotypes tested. All genome-wide significant results are presented in Table 4. Families with younger ages of onset demonstrated enhanced evidence of linkage to chromosome 2 at 118 cM (ASM2: base NPL 3.24, subset NPL 5.18, P=0.0007, 429/785 families). Linkage to chromosome 8 at 82 cM was enhanced in families without psychosis (ASM3: base NPL 2.02, subset NPL 3.76, P=0.0015, 173/943 families). Families with a larger number of affected individuals showed increased evidence for linkage on chromosome 12 at 97 cM (ASM2: base NPL 2.40, subset NPL 4.36, P=0.0015, 27/872 families).
The results of the exploratory analyses using the covariates as phenotypes were as follows: (1) psychosis showed an exponential-model LOD score of 2.23 on chromosome 12 at 122 cM, an exponential-model LOD score of 2.27 on chromosome 15 at 23 cM, a pairs LOD of 2.27 on chromosome 18 at 64 cM and a pairs LOD of 2.1 on chromosome 18 at 83 cM; and (2) mood-incongruent psychosis showed a pairs LOD of 2.03 on chromosome 18 at 33 cM. Neither analysis met genome-wide significance. Finally, there was no evidence of imprinting from the GeneFinder analyses (results not shown).
The strongest linkage findings occurred at 6q21 (pairs LOD 3.4 for rs1046943 at 119 cM) and 9q21 (pairs LOD 3.4 for rs722642 at 78 cM) using only BPI and SABP cases, diagnostic classifications that are presumably the least heterogeneous of the phenotypes. Both results met the criterion for genome-wide significance but were not significant when corrected for multiple analyses. These regions did not show enhanced evidence of linkage for any of the subgroups defined by age of onset, psychosis, suicide attempts or number of affected individuals within a pedigree. The parametric LOD scores in these regions are substantially lower than the non-parametric LOD scores, suggesting low penetrance, because data from unaffected individuals appear to be detracting from the linkage results. The 6q21 linkage region was also identified in a meta-analysis of 11 linkage studies performed by McQueen et al.5 Because there is a large overlap between the samples included in this study and in the meta-analysis, these findings are not independent. Given the overlap between the two studies, it is noteworthy that this study found no evidence of linkage to 8q, a region identified in the meta-analysis. However, the meta-analysis did find suggestive evidence of linkage to 9q at 46–48 cM, which is 30 cM proximal to our finding. Under ASM1, the next two highest results occurred at 2q (pairs LOD 3.1 at 118 cM) and 17q (pairs LOD 2.7 at 109 cM). Under ASM2 or ASM3, suggestive results were found in 10 additional regions (Table 1). No results were significant after correcting for multiple analyses.
Association analysis of the 6q21 region in bipolar disorder has been performed by Fan et al.43 In all, 3047 SNPs were analyzed in a case-control sample of 1064 individuals and a family-based sample of 256 nuclear families. A replication study of 151 SNPs in 622 cases and 1181 controls, covering SLC22A16, DDO, PREP, NT5DC1, GPR6 and the region around rs794854, was also performed. Evidence of association with SLC22A16 was found in all three samples, although not consistently with the same SNPs. SLC22A16 encodes an organic cation/carnitine transporter. Organic ion transporters carry various medically and physiologically important compounds, including pharmaceuticals, toxins, hormones, neurotransmitters and cellular metabolites. For 9q21, a whole-genome association study of bipolar disorder in African Americans found evidence of association to NTRK2, which encodes a tyrosine kinase receptor for brain-derived neurotrophic factor.21 However, no association was found in this region in the European–American sample.
Copy number variants have been analyzed by Zhang et al.44 in a sample of individuals drawn from the NIMH families in this study. No evidence of association with a specific copy number variant was found. However, singleton deletions that were present in cases but not controls were found in 6q21 (one case), 6q26 (two cases, no overlap) and 9q21 (one case). Grozeva et al.45 analyzed copy number variants in the Wellcome Trust Case Control Consortium sample, which includes about 50 cases that are in our linkage sample.25 Significant evidence of association between bipolar disorder and copy number variants was not found in that study. Copy number variants present in cases but not controls were found on 6q21 (two cases), 6q22 (two cases, no overlap) and 9q21 (two cases, no overlap).
Exploratory covariate analyses found enhanced evidence of linkage at 2q12 in the subset of families with younger age of onset, at 8q13 in families without psychosis and at 12q21 in families with larger numbers of affected individuals. Mathieu et al.46 found evidence of linkage to 2q14, 10–30 cM away from our finding, in the portion of their sample with age of onset under 22 years but not in the subsample with older age of onset. The finding on 8q13 is approximately 70 cM proximal to the 8q linkage region reported by McQueen et al.,5 and thus is probably not the same region. Analyses using covariates as phenotypes found no genome-wide significant findings.
We also inspected scores for the larger multiplex families and observed 59 parametric LODs of 2 or greater, many of which are close to the maximum possible scores (Table 3). Although some of these LOD scores may represent false-positive linkage findings, there were multiple regions in which at least one pedigree had a LOD score of at least 2. In particular, chromosome 4q (152–214 cM) and chromosome 19 (23–58 cM) are regions in which three or more pedigrees showed elevated LOD scores; these may be regions in which a rare, relatively highly penetrant locus is segregating in a small proportion of families. The 12q21 region may be another such candidate, as there was increased evidence for linkage in families with a larger number of affected individuals. These data could be an invaluable asset for mapping rare variants via genome-wide sequencing. Large families producing relatively high LOD scores are more likely to carry rare alleles with relatively large effect sizes. Focusing on regions showing elevated LOD scores will help reduce the number of rare variants to be examined. In addition, co-segregation of candidate rare variants with illness can be tested within large families, further reducing the number of variants for follow-up.
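Why a single pedigree's LOD is capped can be seen from the simplest idealized case: phase-known, fully informative meioses with full penetrance, where k recombinants in n meioses give LOD(θ) = log10[θ^k (1 − θ)^(n−k) / 0.5^n]. The sketch below is illustrative only, under those stated assumptions; real pedigrees with reduced penetrance, unknown phase and untyped members have lower attainable maxima, and the study's maximum scores were not necessarily computed this way.

```python
import math


def lod(theta, recombinants, meioses):
    """Two-point LOD for phase-known, fully informative meioses."""
    k, n = recombinants, meioses
    likelihood = (theta ** k) * ((1 - theta) ** (n - k))
    return math.log10(likelihood / 0.5 ** n)


def max_lod(recombinants, meioses):
    """Maximize over theta in [0, 0.5]; the MLE is k / n, capped at 0.5."""
    theta_hat = min(recombinants / meioses, 0.5)
    return lod(theta_hat, recombinants, meioses)
```

With no recombinants the maximum is n·log10 2, about 0.3 per meiosis, so a LOD of 2 needs at least seven such meioses (2 / log10 2 ≈ 6.6); this is why only the larger multiplex families can reach scores of that size.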
Despite a large sample of 972 pedigrees of European ancestry and nearly optimal extraction of genetic information (0.87), no linkage findings met genome-wide significance after correction for multiple testing. However, the genetic models are interdependent, and statistical correction may be overly conservative under these circumstances; some of the linkage findings may therefore represent true positives. A number of factors could explain the inability to obtain robust and replicable findings in this linkage study as well as in the field. First, the assumption that rare alleles of large effect underlie some of the genetic variation in bipolar disorder may be incorrect. In contrast to linkage, GWAS has found strong evidence for a few common loci. However, these loci explain only a small percentage of the genetic variation. Given that both rare and common loci have been found for most complex genetic diseases, it is likely that both rare and common variation predispose to BPI. Second, assuming rare variants exist, there appears to be extensive locus heterogeneity, so even large samples may not have sufficient power. Linkage results reported over the last two decades are compatible with this. Under a model of extensive locus heterogeneity, uniquely large pedigrees will be invaluable for mapping loci, because ascertaining thousands of smaller nuclear or sib-pair families is cost prohibitive. For this study, seven separate sites contributed families, and it is possible, if not probable, that the fraction of families linked to a particular locus varies among the samples. Ascertainment and diagnostic criteria also differed among some of the sites, which may have increased heterogeneity.
Another factor that hinders replication of linkage results is the very large sample size required.12 Several thousand pedigrees may be needed, and it is unlikely that samples of this size will ever be assembled. Although the era of large-scale linkage studies is probably over, linkage analysis of uniquely large pedigrees may still be worthwhile.
This work was supported by National Institutes of Health (NIH/NIMH) research Grant R01-MH077314 (JAB, WB). Wellcome Trust 045267/Z/WRE/MB/JAT (MG, RS).
Supplementary Information accompanies the paper on the Molecular Psychiatry website (http://www.nature.com/mp)