Introduction

Suicide is the 10th leading cause of death in the United States; over 44,000 individuals die by suicide in the United States every year [1]. Although environmental variables have undeniable impact, evidence suggests that genetic factors play a role in completed suicide, with heritability of close to 50% [2, 3]. Recent growth in the number of suicide genetic studies has resulted in promising findings from candidate gene and genome-wide association studies [4], though many remain to be replicated. Replication is hampered by sample differences across studies, including differences in demographics and primary diagnoses of study samples, as many studies of suicide risk have been conducted within cases ascertained for specific psychiatric disorders [4]. In addition, most studies of suicide have focused on suicidal ideation and behaviors; these phenotypes are much more common than completed suicide, allowing for ascertainment of sufficiently powered samples, but suicidal behaviors can be difficult to quantify, and represent individuals with a range of risk for later suicide. In addition, evidence suggests important differences in the etiology of suicidal behaviors versus the less ambiguous but much rarer outcome of completed suicide [5].

We implemented a unique study design to investigate genetic risk for suicide through the collection of DNA samples on > 4500 consecutive individuals who died by suicide in the state of Utah, providing an unparalleled population-based genetic resource. This sample results from a long-term collaboration with the Utah State Office of the Medical Examiner. The records from these cases have been linked to the Utah Population Database (UPDB, https://healthcare.utah.edu/huntsmancancerinstitute/research/updb), a comprehensive database including multi-generational genealogies, as well as death certificates, demographic data, and current medical information on over eight million individuals. Through this linking, we have identified very large families (7–9 generations) with significantly elevated suicide risk. Familial aggregation across distant relatives in these families minimizes the impact of shared environment on risk. High-risk families also provide more genetically homogeneous risk groups, increasing statistical power to detect familial variants associated with disease risk. The Utah extended family study design has already shown success in the study of other complex genetic diseases of extended families of similar size (e.g., colon cancer [6], breast cancer [7], and cardiac arrhythmia [8]).

This study reflects an analysis of 43 very large Utah families at significantly elevated risk for completed suicide.

The focus on completed suicide, statistically concentrated in these high-risk families, optimizes power to reveal regions of the genome likely to contain risk variants. Our design investigates genetic risk using the suicide cases in the extended families regardless of co-occurring psychopathology, and continues with follow-up studies from our population-wide ascertainment of all suicide deaths in Utah, again without regard to co-occurring psychopathology. We recognize that psychiatric diagnoses are critically important in suicide risk [9, 10]; it is likely that findings from our study are related to these associated risks. However, because of the familial aspect of our design, it is possible that our results may reveal risk variants that cut across specific psychiatric diagnoses [11,12,13,14].

Genetic studies of psychiatric disease have revealed associations with multiple rare and common risk variants with reduced penetrance, which may interact in complex ways with each other, with background genetics, and with environmental risks [15]. Based on results to date, we expect that suicide will follow this complex genetic architecture. In this study, the familial analyses use a new statistical method (Shared Genomic Segments, SGS [16]) that is well-powered to identify rare genetic variants in large families, evidence that can then be used to prioritize searches for additional variants contributing to risk in other case samples. This design is complementary to the Genome-Wide Association Study (GWAS) approach in large case-control samples, which can also produce statistical evidence for risk genes to be followed up in independent case samples.

Using genome-wide single-nucleotide polymorphism (SNP) variants matched to the same variants in publicly available population control data, we identified regions of the genome that segregate in suicide cases within high-risk families. These statistically significant regions provide compelling genes as targets for follow-up. Although such follow-up studies would ideally use comprehensive sequence data, the SNP array platform used in this study contains putatively functional variants of high interest to psychiatric and medical disorders, both of which may share overlapping suicide risk. Functional content of the PsychArray was investigated first within familial cases responsible for the significant regions, and then within ~ 1300 additional Utah suicide cases unrelated to the original extended families analyzed.

This study adds to the growing knowledge of genetic risk for suicide. First, we have identified genes in regions of significant familial segregation in large high-risk families, providing replication for previously reported genes of high interest, and identifying target genes for additional follow-up. Second, we have identified novel risk variants using a large follow-up association analysis of PsychArray variants with predicted functional impact using a population-matched resource of suicide cases.

Materials and methods

Sample

This project is possible because of a collaboration with the Utah State Office of the Medical Examiner (OME), which has spanned two decades. With Institutional Review Board (IRB) permissions from the University of Utah, the Utah Department of Health and Utah Intermountain Healthcare, we have collected de-identified DNA samples from consecutive suicides since 1997. The collection numbers 4585 (3632 males and 953 females). DNA was extracted from blood using the Qiagen Autopure LS automated DNA extractor (www.qiagen.com). Identifying information from cases with DNA was linked to data within the UPDB’s secure computer servers. All identifying data were then stripped before providing data to the research team; suicide cases and family structure data are referenced by anonymous IDs. DNA for this research project is shared with the NIMH Repository and Genomics Resource, project number 315 (2880 samples are now at the repository; additional samples are being sent on an ongoing basis).

Determination of familial risk, selection of families/cases

Genealogical data in the UPDB were used to construct family trees and identify those families at high risk for suicide. Beyond the suicide cases with DNA, the UPDB contains records of all known suicides from Utah death certificates dating from 1904 (N = 14,288). All 14,288 cases were used to estimate familial risk of suicide. To determine the extended families at highest risk, we used the Familial Standardized Incidence Ratio (FSIR) statistic [17], calculated by comparing the incidence of suicide in each extended family to its expected incidence determined by the statewide distribution for suicide stratified by sex and age. We identified 241 high-risk families containing a significant excess of suicides (p < 0.05) and at least three suicides with DNA. We selected 43 of these families for analysis (Table 1) based on significance of the FSIR risk statistic, number of cases with DNA, and overall count of meioses between these cases. The 43 families included 2.04–4.41 times the expected number of suicide cases as reflected in the FSIR statistic (range p = 0.003–1E-12, average p = 0.0007). The average number of cases per family with genotyping was 6.2 (range 3–13), and the average number of meioses between analyzed cases was 29.6 (range 15–70; see Fig. 1 for an example of how meioses are counted in a family of moderate size from our resource). Family-specific significance thresholds for genomic sharing (see analysis section below) depend upon family size, structure, dispersion of cases, and number of cases analyzed. Permission for use of family structure data was granted by the Resource for Genetic and Epidemiologic Research (RGE, https://rge.utah.edu), the oversight committee for use of UPDB data.
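The observed-versus-expected comparison behind the FSIR can be illustrated with a minimal sketch. This is a simplification and not the published statistic [17]: it treats every family member equally (any weighting of relatives is omitted), takes a hypothetical table of statewide stratum-specific suicide probabilities (`stratum_rates`), and assumes a Poisson upper tail for the p value. All names and rates here are illustrative assumptions.

```python
import math

def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected)."""
    term = math.exp(-expected)      # P(X = 0)
    below = 0.0
    for k in range(observed):       # accumulate P(X = 0) .. P(X = observed - 1)
        below += term
        term *= expected / (k + 1)
    return max(0.0, 1.0 - below)

def simple_fsir(members, stratum_rates, observed):
    """Ratio of observed to expected suicides in one family.

    members       -- list of (sex, age_band) tuples, one per family member
    stratum_rates -- hypothetical statewide probability of suicide per stratum
    observed      -- number of suicides observed in the family
    """
    expected = sum(stratum_rates[stratum] for stratum in members)
    return observed / expected, poisson_upper_tail(observed, expected)

# Hypothetical family of 200 members and made-up statewide rates.
rates = {("M", "20-39"): 0.015, ("F", "20-39"): 0.005}
family = [("M", "20-39")] * 100 + [("F", "20-39")] * 100
ratio, p_value = simple_fsir(family, rates, observed=6)
```

In this toy family the expected count is 2.0, so six observed suicides give a ratio of about 3, inside the 2.04–4.41 range reported for the selected families.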

Table 1 Characteristics of 43 extended families at high risk for suicide
Fig. 1 Extended structure of Family 66494 that links seven suicides (shaded in black) used for Shared Genomic Segment (SGS) analyses. Suicide cases are not as evident in upper generations because suicide status from death certificates is only available back to 1904. Note that gender is disguised and sibship order is randomized in order to protect the privacy of family members. Family size: there are 34 total meioses between the seven cases in this family; this counting is shown in purple on the drawing. SGS requires a total of at least 15 meioses between cases for adequate statistical power. Shared segments: three genomic segments provided significant evidence of sharing between cases in this family. The pattern of segregation of each segment is shown. Cases 2, 4, 5, 6, and 7 share region 1 (red). Cases 1, 2, 3, 4, 5, and 6 share region 2 (gold). Cases 1, 2, 3, 4, and 7 share region 3 (blue). Essential segregation is shown; however, when cases do not share, the region can actually be lost at any meiosis above the case in the family tree. The exact point of this loss is unknown.

The 198 families not selected for analysis in this study exhibited less-significant risk, and/or had too few cases with DNA, and/or had insufficient distance between cases. The average number of suicide cases per family with DNA in these 198 families was 3.23 (SD = 0.88, range 3–5). The average p value associated with these FSIRs was 0.0028 (SD = 0.0074, range 0.0458–1E-4).

Diagnostic data

In addition to basic demographic and cause of death information, we had access to diagnostic data for psychiatric conditions associated with suicide using electronic medical records data through the UPDB. Codes were linked to numeric case identifiers within the Utah Population Database; de-identified results were provided for analysis. Conditions were defined by groups of diagnostic codes aggregated according to the International Classification of Diseases (ICD) system (www.icd9data.com; see Supplemental Table S1 for the list of codes used to define diagnostic categories in this study). Importantly, because our cases are not derived from a clinical population, cases can present with no co-occurring diagnoses. Missing diagnostic data can occur for many reasons, including: (1) existence of diagnostic codes other than the 359 codes in Table S1; (2) a case who did not seek medical attention for the psychiatric disorders in question owing to stigma, lack of insurance, cultural barriers, or other lack of access to services, age-related lack of recognition of pathology, or symptoms not perceived to require medical attention; (3) diagnostic data not contained in the UPDB, including diagnoses prior to the storage of electronic diagnoses, or diagnoses given out of state or outside the ~ 85% coverage of electronic medical records data available in the UPDB. We treated missing data as unknown rather than assuming the absence of pathology.

Molecular data

The SGS analyses used variants from the Illumina Infinium PsychArray platform, version 1.0 (https://www.illumina.com/products/by-type/microarray-kits/infinium-psycharray.html) genotyped on 216 suicide cases in the 43 selected families. This PsychArray includes 265,000 common informative tag SNPs, 245,000 variants selected from exome sequencing studies of medical and psychiatric conditions, and 50,000 rare variants associated specifically with psychiatric conditions. Supplemental Figure S1 shows the use of genotype data for the study. Genotyped array content was oriented to 1000 Genomes Project data. For initial analyses of familial sharing, we included all variants contained in 1000 Genomes Project control data, omitting variants where orientation was ambiguous, and variants which were not polymorphic. Using PLINK [16], we also removed 17,058 variants with > 5% missing calls and 176 variants that failed Hardy–Weinberg equilibrium (p < 0.001). In addition, one case from family 553615 was removed owing to a low call rate (> 5% missing). Our initial familial analyses used 237,415 variants from 215 completed suicide cases to reveal familial variation that defines the boundaries of the segments shared among related cases. Rare putatively functional array variants meeting QC criteria, including psychiatric and medical disease-specific variants, were used to follow-up additional variants in the shared regions.
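The two per-variant filters described above (missing calls > 5%, Hardy–Weinberg p < 0.001) can be sketched as follows. This is an illustration only: the study applied these filters with PLINK, whereas the sketch uses a simple 1-df chi-square approximation for Hardy–Weinberg equilibrium, and the function names and genotype coding (0/1/2 copies of one allele, `None` for a missing call) are assumptions.

```python
import math

def hwe_chi2_p(n_aa, n_ab, n_bb):
    """Chi-square (1 df) p value for departure from Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)          # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:                  # monomorphic: nothing to test
        return 1.0
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return math.erfc(math.sqrt(chi2 / 2.0))  # survival function of chi-square, 1 df

def passes_qc(genotypes, max_missing=0.05, hwe_alpha=0.001):
    """Apply the missingness and HWE thresholds to one variant."""
    missing = sum(g is None for g in genotypes) / len(genotypes)
    if missing > max_missing:
        return False
    called = [g for g in genotypes if g is not None]
    p_value = hwe_chi2_p(called.count(0), called.count(1), called.count(2))
    return p_value >= hwe_alpha

in_equilibrium = [0] * 25 + [1] * 50 + [2] * 25   # exact HWE proportions
no_heterozygotes = [0] * 50 + [2] * 50            # strong departure from HWE
too_many_missing = [0] * 25 + [1] * 50 + [2] * 19 + [None] * 6
```

Only the first of the three toy variants would be retained; the second fails the Hardy–Weinberg threshold and the third has 6% missing calls.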

Analysis

(See Supplemental Figure S1 for a flow diagram of the study). We began by using a new analytical method, Shared Genomic Segments (SGS) [18] developed for analyzing large high-risk families to identify subsets of cases that share regions beyond sharing expected by chance. SGS identifies excessive lengths of consecutive SNPs with allelic sharing between relatives to infer genomic segments that are inherited. Theoretically, chance inherited genomic sharing in distant relatives is extremely improbable; thus, the method has power in large families such as those in our study [19]. The significance of each shared segment is assessed empirically using gene-drop simulations (independent of case status) to create a null distribution of expected sharing within each family. The method assigns haplotypes to family founders according to a publicly available linkage disequilibrium map from 1000 Genomes European data, followed by simulated segregation through each specific family structure, repeated a minimum of 500,000 times. The observed sharing is compared with simulated sharing to determine significance. See Supplemental Figure S2 for a hypothetical simplified example of SGS sharing. Genome-wide significance thresholds are calculated specific to each family, as statistical power varies with family structure, number of cases, and distance between cases. Significance thresholds account for multiple testing and linkage disequilibrium, and also adjust for within-family heterogeneity by including adjustment for all possible subsets of within-family sharing among cases. Model fitting to determine theoretical genome-wide thresholds used these distributions of gene-drop results. The overwhelming majority of the genome will be null (does not contain a suicide risk variant); we acknowledge a slight conservative bias as these distributions also contain a small number of true positives [16]. 
The genome-wide significant threshold corresponds to a false-positive rate of 0.05 per genome per family, whereas the suggestive threshold corresponds to one false-positive result per genome per family. In this study, we report regions with family-specific genome-wide significant evidence, and regions overlapping in more than one family where family-specific evidence was at least genome-wide suggestive. P values for these overlapping multiple-family regions were approximated using Fisher’s combined probability test [20]. The SGS analysis software is freely available (https://uofuhealth.utah.edu/huntsman/labs/camp/analysis-tool/shared-genomic-segment.php).
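The core idea of SGS, runs of consecutive SNPs at which a set of cases could share an allele, followed by comparison against simulated sharing, can be sketched as below. This is not the released SGS software. It assumes genotypes coded as 0/1/2 copies of one allele, treats a SNP as compatible with sharing when the cases do not include both opposite homozygotes, and takes the gene-drop run lengths as a precomputed list; the add-one empirical p value is also an assumption of the sketch.

```python
def shared_runs(genotypes):
    """Return (start, end) index pairs of consecutive SNPs compatible with sharing.

    genotypes -- one list per case, each holding 0/1/2 allele counts per SNP.
    A SNP is incompatible when both homozygotes (0 and 2) occur among the
    cases, because those cases cannot share any allele there.
    """
    n_snps = len(genotypes[0])
    runs, start = [], None
    for i in range(n_snps):
        calls = {case[i] for case in genotypes}
        compatible = not (0 in calls and 2 in calls)
        if compatible and start is None:
            start = i                        # open a new run
        elif not compatible and start is not None:
            runs.append((start, i - 1))      # close the current run
            start = None
    if start is not None:
        runs.append((start, n_snps - 1))
    return runs

def empirical_p(observed_length, simulated_lengths):
    """Add-one empirical p value from simulated (gene-drop) run lengths."""
    exceed = sum(1 for s in simulated_lengths if s >= observed_length)
    return (exceed + 1) / (len(simulated_lengths) + 1)

cases = [
    [0, 1, 2, 2, 0, 1],
    [0, 1, 1, 0, 0, 2],
    [1, 2, 2, 2, 1, 0],
]
runs = shared_runs(cases)
longest = max(end - start + 1 for start, end in runs)
p_value = empirical_p(longest, [1, 2, 3, 1])   # toy simulated lengths
```

In practice the simulated lengths would come from the family-specific gene-drop simulations, and the comparison would be repeated for every subset of cases within a family.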

Power of SGS was previously investigated for a range of genetic models involving rare variants in extended family data [19], showing appropriate power for large families with at least 15 meioses between cases. For all scenarios considered in this study, genome-wide association studies would have had negligible power. Given these results, we selected only extended families with at least 15 meioses between cases (see Table 1 and Fig. 1).

Follow-up analyses

All follow-up work focused on the targeted set of genes identified by the significant SGS regions. Genes were considered within significant segments for follow-up if coding or regulatory sequence (defined using Genomic Regions Enrichment of Annotations Tool [21]) fell within the shared segment. Genes determined in previous research to be highly likely to represent false-positive results [22] were excluded (in our data, a cluster of 56 olfactory receptor genes in one region on chromosome 11, FAT1, CTC-432M15.3, and TRIM51). Three phases of follow-up work were pursued (see Figure S1).

Corroborating evidence from the literature

As a first investigation of the genes indicated by significant SGS regions, we conducted a comprehensive search of the literature for all suicide-related risk genes against which to compare SGS location-specific evidence. Suicide risk genes were identified by searching the Web of Science database for the terms: suicid* and gene* (captures variants including suicide, suicidal, suicidality, gene, genetic, etc.). We included reviews on the genetics of suicide as well as linkage, GWAS, candidate gene, expression and epigenetic studies. As secondary information about the suicide-related genes, we also queried DisGeNET [23, 24] for gene associations or involvement with neuropsychiatric disorders and inflammation owing to their known association with suicide risk [9, 10, 25].

PsychArray variants in case subsets in high-risk families with SGS sharing

The next phase of follow-up comprised a search of the specific familial cases that generated each SGS region using available array variants within each region to determine whether any particular array variant could be responsible for each result. Although the variants available to us on the PsychArray are far from complete, a search of relatively rare coding-region variants provides an efficient, immediate, potentially interpretable screen of our results in lieu of large-scale sequencing data [26]. We therefore checked for sharing of the minor allele of non-synonymous variants within specific cases responsible for SGS results, and strictly within region boundaries. Because the SGS method is most powerful for the detection of rare familial variants, we selected a minor allele frequency < 10% in the publicly available Exome Aggregation Consortium (ExAC, www.exac.broadinstitute.org) European, non-Finnish data (matching in ancestry to our sample).

Gene-based evidence in additional Utah suicide cases

The final follow-up phase focused on genes from SGS regions as targets for further study in additional sample resources. Although familial variants may be private to the extended family/families producing the SGS evidence, it is also possible that the evidence implicates genes with additional risk variants in independent case samples (allelic heterogeneity). We screened an independent cohort of Utah suicides for potential functional variants in SGS-targeted genes; this case sample most closely matches the discovery families, as it was derived from the Utah population and consists of completed suicides. Owing to the same population ascertainment source, it is predicted to match the familial discovery sample regarding demographics and diagnostics. To maximize statistical power in our relatively small follow-up cohort of ~ 1300 completed suicide cases, we focused on the potential to discover moderately penetrant, potentially interpretable functional causal variation. To this end, we used the following criteria to select variants: (1) in coding sequence of genes identified by the significant SGS regions, (2) non-synonymous and predicted to be damaging from either PolyPhen [27] or SIFT [28], (3) minor allele frequency < 20% in ExAC European, non-Finnish data. We tested for significant allelic association compared with ExAC European, non-Finnish data using Fisher’s exact test, or with chi-square tests for variants with > 10 observed chromosomes with the minor allele in cases and controls. Tests for additional variants within SGS regions excluded suicide cases responsible for original sharing evidence in that region. Significance was adjusted for multiple tests.
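The allelic association tests named above operate on a 2 × 2 table of minor and major allele counts in cases versus reference chromosomes. A minimal sketch of both tests follows; the one-sided direction of the exact test (excess of the minor allele in cases), the absence of a continuity correction, and the toy counts are assumptions, not details stated in the text.

```python
import math

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test: P(case minor-allele count >= a).

    a, b -- minor and major allele counts in cases
    c, d -- minor and major allele counts in the reference sample
    """
    n_case, n_minor, n_total = a + b, a + c, a + b + c + d
    denom = math.comb(n_total, n_case)
    upper = min(n_case, n_minor)
    tail = sum(
        math.comb(n_minor, k) * math.comb(n_total - n_minor, n_case - k)
        for k in range(a, upper + 1)
    )
    return tail / denom

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) and its p value."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))

# Toy allele counts (hypothetical, not study data).
p_exact = fisher_exact_greater(3, 1, 1, 3)
chi2, p_chi2 = chi_square_2x2(20, 80, 10, 90)
```

With real reference data the second row would hold the ExAC European, non-Finnish allele counts for the variant being tested.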

Results

High-risk families

SGS analyses were performed on all 43 families. Most of the families (35/43 = 81.4%) showed at least one genome-wide suggestive region, and 10 families (23.3%) showed at least one genome-wide significant region. High-risk genealogies have additional complexity. Although the total number of cases across all families listed in Table 1 is 267, 52 of these cases occurred in multiple families (see Figure S3 for an example of this complexity). Because analyses to identify genomic shared regions are done within family, we included cases each time they occurred under each founder, as we do not know a priori where true sharing may occur. It is possible that cases share risk variant(s) from one set of founders with other cases in that family, but then also share other risk variant(s) with cases in a second family through connections with the other founding couple. The complexities in relationships may allow for future studies of gene × gene interactions once risk variants have been established.

Descriptive characteristics of the 215 independent discovery cases from the 43 families were compared with the other 4370 unselected Utah suicide cases with DNA. Within the 215 high-risk familial cases, 172 were male (80.0%), similar to the 79.2% rate in the unselected sample. Average age at death in the family sample was 34.28 years (standard deviation = 16.28), significantly lower than the average age of 40.01 years (standard deviation = 17.39) in the unselected sample (t = 4.74; p < 0.0001). Method of suicide in the family sample was predominantly gun-related (110/215 = 51.2%), followed by other violent methods (78/215 = 36.3%), then overdose (27/215 = 12.6%). These rates are similar to those in the unselected sample of 52.6%, 32.0%, and 15.3%, respectively. Death certificate data identified 212/215 cases as European-Non-Hispanic, and three as African-Non-Hispanic (one case each in families 41469, 233769, and 587072), similar to the unselected sample, where death certificate data identified 96.89% as European-Non-Hispanic. An ancestry principal component analysis of genotype data from the 215 cases and 1000 Genomes population data confirmed the three African ancestry cases (two showed Hispanic admixture), and identified 11 other cases with evidence of Asian and/or Hispanic ancestry, resulting in overall rates of 3.3% non-European and 5.6% Hispanic cases.

We found similar percentages of cases with presence of diagnoses from electronic records in the 215 familial cases as compared with the unselected 4370 cases, with the exception of an increase in cases with personality disorders and an increase in prior attempts/suicidal ideation in the familial sample. Percentages of cases with at least one code in each of the diagnostic categories vs. the unselected sample were as follows: depression, 41.4% vs. 36.2%; bipolar, 13.0% vs. 10.6%; anxiety, 23.3% vs. 22.8%; psychosis, 2.3% vs. 2.9%; substance use/abuse, 9.8% vs. 12.4%; personality disorders, 14.4% vs. 9.7% (chi-square = 5.06, p = 0.02); ADHD, 4.7% vs. 3.7%; previous attempts/ideation, 37.2% vs. 28.9% (chi-square = 6.82, p = 0.009).

SGS results

SGS analyses revealed 16 single-family regions with genome-wide significance (Table 2). Several families generated more than one region; these were the larger families, where there was more opportunity for multiple different case subsets to show sharing evidence (see Fig. 1 for a specific example of sharing in family 66494; see Supplemental Figure S4a for drawings of all families with genome-wide significance). Table 2 also presents 15 regions where sharing evidence overlapped across more than one family (Figure S4b); in each of these regions, the single-family evidence was at least at the genome-wide suggestive level. For the region on chromosome 5q23.3–q31.1, person 112304 is a descendant of both 553615 and 603471, and person 95765 is a descendant in both 553615 and 176860. To satisfy the independence requirement for computing the Fisher’s combined p value [20], we computed the p value for this region omitting these cases. There are 207 genes with coding or regulatory sequence in the 31 SGS regions (Table S2).
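Fisher’s combined probability test sums −2 ln p over k independent p values and refers the total to a chi-square distribution with 2k degrees of freedom, which is why cases belonging to more than one family had to be omitted. A minimal sketch is given below; the function name is an assumption, and the closed-form tail used here is valid because the degrees of freedom are always even.

```python
import math

def fisher_combined_p(p_values):
    """Combine independent p values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution with
    2k degrees of freedom under the null. For even degrees of freedom the
    survival function is exp(-X/2) * sum_{i<k} (X/2)^i / i!.
    """
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)   # X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total

# Two hypothetical family-specific p values for one overlapping region.
combined = fisher_combined_p([0.05, 0.05])
```

A single p value is returned unchanged, and two p values of 0.05 combine to roughly 0.017.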

Table 2 SGS regions with (1) genome-wide significant evidence or (2) overlapping evidence in more than one family meeting at least suggestive significance

Follow-up studies (see Figure S1 for overview)

  1.

    Supporting literature evidence: we did not find any overlap between significant SGS regions and genomic regions identified by previous family-based linkage studies of suicidal behaviors (Table S3). At the gene level, we reviewed the 207 SGS-targeted genes, first investigating specific supporting evidence of suicide risk. From our comprehensive literature search, a total of 755 genes have been associated with suicide with varying levels of statistical support (Table S4). Eighteen SGS-targeted genes were among these 755 suicide-risk genes (see Table 3; also highlighted in Table S2; a detailed description of these 18 genes follows Table S2). Given an estimated number of ~ 19,000 genes in the human genome [29], we estimate that 755/19000 = 4% of genes in the genome have current evidence associated with suicide risk. If the SGS regions were a random sample of the genome and unassociated with the suicide phenotype, we would expect that only ~ 4% of the genes in SGS regions (8/207 genes) would have corroborating evidence from the literature. However, we found that 18/207 = 8.7% of genes had supporting literature evidence, a significantly greater number than expected by chance (Z = 2.41, p = 0.008). This result suggests that the SGS regions are indeed segments of the genome that are enriched for prior evidence of suicide risk.

    Table 3 Genes in significant SGS regions with supporting evidence of association with suicide
  2.

    Studies of variants in specific cases giving SGS evidence: we selected the 431 non-synonymous PsychArray variants with either benign or damaging functional predictions falling strictly within the significant SGS regions with ExAC European, non-Finnish minor allele frequency of < 10%, reflecting the greater power of SGS to detect more rare risk variants. Considering each group of familial cases supporting each SGS signal, we screened the SNPs strictly within each region for sharing of the selected rare, non-synonymous array variants. There was no instance where this limited array content explained the identified SGS sharing.

  3.

    Studies of variants in ~1300 independent population-based cases: additional population-ascertained Utah suicides with PsychArray genotype data were available for follow-up. This sample was well matched to the family discovery sample; a comparison of these two genotyped samples resulted in no significant demographic or diagnostic differences. These comparisons included the three variables that showed significant differences in our comparison between the family discovery sample and the larger Utah cohort of 4370 suicides with DNA described above. Specifically, in the follow-up sample, age at death was 34.96 (standard deviation = 16.76), percentage with personality disorders was 15.3%, and percentage with suicidal ideation/previous attempt was 36.9% (compared with 34.28 years, 14.4%, and 37.2%, respectively, in the discovery family sample).

    When we analyzed the selected 352 potentially damaging, relatively rare array variants within genes targeted by significant SGS regions, we found four variants with significantly increased presence of the minor allele compared with ExAC European non-Finnish frequencies after correction for multiple testing (Table 4: rs181058279, p = 5.45E-06; rs76215382, p = 8.48E-05; rs121908538, p = 3.14E-12; rs745918508, p = 5.40E-29). Specific characteristics of cases with each of these rare variants are described in Table S5. This evidence suggests rates of psychopathology similar to rates seen in the overall follow-up genotyped sample. Demographics were also similar, though cases with the AGBL2 variant were significantly more likely to be female (chi-square = 7.82, p = 0.003).

    Table 4 Putatively functional SNPs in target SGS genes with significantly elevated minor allele frequency in Utah suicide cases

Discussion

We have ascertained and studied a unique resource of 43 extended families at high risk for suicide. The design uses the distantly related, high-risk cases to magnify genetic effects, enrich for genetic homogeneity, and minimize shared environmental effects. Families were identified from cases sampled from population-wide ascertainment, resulting in a study design independent of specific psychiatric diagnosis. Cases in the high-risk families were significantly younger at death, by 5.73 years on average, perhaps reflecting enhanced familial genetic risk over and above accumulated environmental risks that may play a greater role in suicide at later ages. The follow-up sample of ~ 1300 genotyped cases more closely matched this familial discovery sample. The mean age at death was similarly young, 5.05 years younger than the unselected sample. The matching of the replication cohort may be due to the fact that we have thus far targeted our overall genotyping efforts to cases with increased evidence of at least one other extended relative who is at suicide risk. Diagnostic data, when present, suggested similar rates of psychopathology across our entire research resource, with somewhat elevated rates of personality disorders and of suicidal ideation and previous attempt in both the family discovery sample and the follow-up genotyped sample.

Cases in families were analyzed with a statistically powerful method, SGS [18], resulting in the identification of genome-wide significant regions likely to harbor risk variants. This family evidence implicated 207 genes for targeted follow-up. We found significant overlap between these genes and genes implicated in suicide in a comprehensive literature survey (18 of the 207 genes), lending further support for these genes. Of note, 15 of these 18 genes also show previous associations with inflammatory conditions (Table 3), supporting accumulating evidence for a cross-association between inflammation and suicide risk [30]. Because our method discovers familial genomic regions, we also reviewed prior family linkage studies of suicide risk, but did not find overlaps. This result is perhaps not surprising owing to differences in ascertainment and outcome measures in these previous studies.

The additional rare disease-associated content of the array did not immediately reveal functional rare variants shared across cases responsible for the familial sharing. This result is likely due to the limited number of potentially risk-causing variants captured on the array; sequencing will be required to discover the causal variants shared across the high-risk discovery cases. Alternatively, one or more regions may be false positives.

SGS also provides target genes for other follow-up studies. Genes truly associated with suicide risk may harbor multiple risk-associated variants (allelic heterogeneity). Restricting the search for additional risk alleles to the much smaller set of high-interest variants in genes identified by SGS increases statistical power. Analysis of an independent population-based cohort of ~ 1300 Utah completed suicide cases, well matched for ascertainment, yielded four variants associated with suicide (in SP110, AGBL2, SUCLA2, and APH1B). SP110 is part of a leukocyte-specific nuclear body protein complex, and likely plays a role in gene transcription [25]. It has been implicated in pathogen resistance and immunodeficiency [31, 32], and may relate to suicide risk through a growing body of evidence implicating immune risk and inflammation [30]. AGBL2 is an ATP/GTP binding protein implicated in brain structure and function [33]. SUCLA2 is a mitochondrial tricarboxylic acid cycle protein recently implicated in energy supply to the synapse [34], and is possibly associated with recent findings linking suicide risk and hypoxia [35]. APH1B is a transmembrane protein associated with risk of Alzheimer’s [36] and Parkinson’s [37] diseases.

Characteristics of cases with each of these four variants did not reveal any striking patterns of association with specific psychopathology, though cases with the AGBL2 variant were significantly more likely to be female. Follow-up in additional research cohorts will be required to clarify diagnostic associations, and to replicate the association with gender found with the AGBL2 variant.

Limitations

Suicide cases were predominantly of Northern European ancestry, as verified with genotype data, so results may be limited to this race/ethnic group. The genome-wide background of the PsychArray contains ~ 265,000 common variants, which is relatively sparse for a genome-wide array. A denser array could have provided additional precision to region boundaries, or may have revealed that some regions were false positives. Diagnostic data were limited to available diagnoses in the electronic medical record. Cases with no diagnostic data are not assumed to have an absence of psychopathology. Rather, missing data more likely reflect either diagnoses outside the scope of our data resources, or lack of connection to services owing to insurance, stigma, cultural factors, or a perception that symptoms did not warrant treatment.

Conclusions

Our study has found significant associations using only the relatively rare, potentially functional variants captured on the PsychArray; these results have been discovered through a rigorous statistical prioritization and variant selection based only on functional annotation and frequency. As new data on our resource become available, it is likely that additional potential risk variation will be found. However, the current work has produced several important lines of evidence. First, the genome-wide significant SGS regions identify 207 target genes for suicide risk. Second, follow-up analyses of these regions in an independent population-based cohort of suicides highlighted four genes with potential functional risk variants, pending replication. Finally, the SGS regions contained 18 genes with corroborating evidence for suicide risk, suggesting these as strong candidates for future work.