The conserved ASTN2/BRINP1 locus at 9q33.1–33.2 is associated with major psychiatric disorders in a large pedigree from Southern Spain

We investigated the genetic causes of major mental disorders (MMDs) including schizophrenia, bipolar disorder I, major depressive disorder and attention deficit hyperactive disorder, in a large family pedigree from Alpujarras, South of Spain, a region with high prevalence of psychotic disorders. We applied a systematic genomic approach based on karyotyping (n = 4), genotyping by genome-wide SNP array (n = 34) and whole-genome sequencing (n = 12). We performed genome-wide linkage analysis, family-based association analysis and polygenic risk score estimates. Significant linkage was obtained at chromosome 9 (9q33.1–33.2, LOD score = 4.11), a suggestive region that contains five candidate genes ASTN2, BRINP1, C5, TLR4 and TRIM32, previously associated with MMDs. Comprehensive analysis associated the MMD phenotype with genes of the immune system with dual brain functions. Moreover, the psychotic phenotype was enriched for genes involved in synapsis. These results should be considered once studying the genetics of psychiatric disorders in other families, especially the ones from the same region, since founder effects may be related to the high prevalence.

www.nature.com/scientificreports/ variants (SNVs) have evidenced higher impacts (odds ratios, 2-57) 13,14 . The challenge now resides in applying these technologies to establish personalized diagnoses. The Psychiatric Genomic Consortium (PGC) in his latest published agenda aims to study large pedigrees to search for genetic variants of large effect 15 . Pedigrees from genetic isolates with high degrees of consanguinity are of special interest. Several large pedigrees have been recently investigated, either looking for CNVs [16][17][18] , rare SNVs [19][20][21][22][23][24] or common variant contributions 25 . But very few have followed the PGC suggestions 15 , aimed to analyze those pedigrees using comprehensive genomic assays 26,27 .
In this study we have applied a systemic genomic approach to uncover the genomic architecture of a large lineage, with 41 individuals affected of MMD in the last three generations, 27 of which have been diagnosed with psychotic disorders. This family is from a region of southern Spain, the Alpujarras, known to be a hotspot for psychiatric diseases, with a prevalence of 7.8% 28 , almost double of that from the rest of the country, suggestive of being due to founding genetic events.

Results
Pedigree description. A large multigenerational family of Southern Spanish origin with high prevalence of mental disorder was recruited between the Psychiatry ward of the University Hospital Son Espases (HUSE) of the Balearic Islands and the Health Center of El Ejido. The full pedigree is shown in Fig. S1. Figure 1 shows the three subfamilies analyzed. Subjects 1-202 (subfamily 1), 4-211 (subfamily 2) and 3-208 (subfamily 3) are siblings. Out of the 41 individuals affected of MMD, 27 have been diagnosed with psychosis and 14 with a mental disease without psychosis. A clinical description of the family subjects is summarized in Table S1, showing the Global Assessment of Functioning (GAF) scale for all psychotic subjects studied and the Positive and Negative Syndrome Scale (PANSS) scores for all the schizophrenic patients analyzed (Table S1).
Phenotype definition. In order to perform the genomic analysis, two phenotypes were defined: the narrow phenotype was attributed only to patients with psychosis (n = 27), including: SCZ, (n = 17), schizoaffective disorder (SCA, n = 1), BD-I (n = 8) and acute psychotic episode F23 (n = 1). The wide phenotype of illness also included patients affected of mental disease, but who have not manifested any psychotic episode, as MDD (n = 14) and attention deficit hyperactive disorder (ADHD, n = 1). Within the narrow phenotype there were 10 females (24.3%) and 17 males (41.5%). By contrary, patients with a mental disease without psychosis included 12 females (29.3%) and only 3 males (7.3%). The mean age (± standard deviation) at participation was (30.5 ± 8) years for cases and (52 ± 6) years for controls.
Linkage analysis identified a locus at 9q33.1-33.2 associated with psychiatric disorders (wide phenotype). The genome-wide results for nonparametric LOD (NPL) scores for the wide and narrow phenotypes are plotted in Fig. 2A and Table 1. A genomic region on chromosome 9 (113,117,183-124,200,417; 11 Mb) highlighted with significant LOD scores (LOD wide = 4.11; LOD narrow = 3.07) (Fig. 2). Moreover, eight other genomic regions identified in both phenotype analyses reached LOD scores above 1.5 for the wide phenotype suggestive of linkage and were considered for further analyses ( Fig. 2A and Table 1). Within those regions it is worth to mention the one at chromosome 3 (169,411,792-183,303,037; 13.89 Mb) with a LOD narrow = 2.36, and LOD wide = 1.89 (Table 1). Once linkage analysis was performed only considering the narrow phenotype, there were no linkage regions that reached significance, although ten regions had suggestive LOD scores ≥ 1, highlighting two regions of chromosome 17 with suggestive LOD scores of 1.5 (Chr17: 51,166-6,296,217, 6.24 Mb and Chr17: 33,006,378-35,752,691, 2.74 Mb) (Table S2A). Regarding the linkage analysis considering only the wide phenotype, no significant regions were identified, although thirteen regions had LOD scores ≥ 1 (Table S2B). Two regions at chromosome 19 had suggestive LOD scores > 1.5 (Chr19: 301,639-3,030,118, 2.72 Mb and Chr19: 5,892,954-7,900,562, 7.3 Mb) (Table S2B).
To narrow down the linkage regions of chromosomes 9 and 3, family-based association analyses were performed. 114 SNPs were found to be nominally significant on chromosome 9. Four SNPs (rs117920810, rs10760030, rs1888737 and rs16908402), associated with the wide phenotype analyses, remained significant after adjusting for multiple testing (p-value = 0.042617) (Fig. 2C, Table 1). The associated SNPs map at the highly conserved ASTN2/BRINP1 locus at chr9q33.1-33.2, which contains five genes (ASTN2, BRINP1, TRIM32, TLR4 and C5) that have been previously associated with neurodevelopmental disorders (reviewed in 29 ). To closely analyze the linkage and the association region, haplotype estimation was conducted using SHAPEIT4 30 , phasing the entire chromosome 9 and carefully analyzing the linkage region (Figs. S2 and S3). Two different approaches were followed: we first used the SNP array genotyping to include all SNPs with MAF < 30% (Fig. S2). And second, haplotype phasing was also performed on those patients from whom we had whole genome sequencing data using SNPs with MAF < 0.5% (Fig. S3), searching for rare haplotypes that would segregate with the disease phenotypes. Only branch-specific haplotypes were identified. The subfamily 1 has three major haplotype blocks (H) shared by all affected subjects, but subject 1 Identification of different CNVs in psychiatric and neurodevelopmental-associated loci. We next search for SVs in the linkage regions, first performing clinical karyotyping of four individuals (subjects 1-18, 1-25, 3-30 and 3-31) to discard structural variants, as balanced translocations. All four patients had nor-  for the NPL score analysis. In blue represented the LOD scores for the wide phenotype; in red the LOD scores for the narrow phenotype. (B) NPL score results for Chromosome 9. The − log10 (P value) of the family-based association test in regions with significant NPL scores are shown as dark green or light green dots for the wide and narrow phenotype, respectively. (C) Regional association plot for the 9q33.1-33.2 linked region. The dashed grey line represents the significance threshold for the associated SNPs. In red, genes previously associated with MMD.  31,32 . This duplication shares 74.5% overlap with the 3q29 duplication syndrome, which is characterized by delayed development (particularly speech delay) and intellectual disability or learning difficulties, although its manifestation varies widely (DECIPHER and 31 ). Moreover, the same subjects (3-11 and 3-31) plus the brother (3-12) of 3-11 also harbored a 127 Kb duplication at 4q35.2, a genomic region also associated with behavioral disorders as autism and ADHD (DECIPHER). Another interesting CNV identified was a 198 Kb duplication at 22q11.23, right next to the major risk locus for SCZ 33 . Phenotypes associated with duplications of similar size comprise cognitive impairment, emotional/affect behavior, hyperactivity and intellectual disability (DECIPHER and 34,35 ). The mother 1-1, affected of MDD, transmitted the DUP22q11.23 to her two psychosis-affected children (1-18 and 1-19). Three other MDD subjects (3-9, 3-10 and 4-15) and a healthy control  also harbor the DUP22q11.23. It is also worth mentioning the deletion DEL12q14.1, only identified in affected subjects, that encodes the leucine rich repeats and immunoglobulin likes domain 3 (LRIG3) gene. Siblings 1-24, 1-27 (MDD) and 1-25 (SCZ) inherited this deletion from their mother 1-6 (MDD). Phenotypes associated with similar deletions at 12q14.1 include intellectual disability and delayed speech and language development (DECIPHER and 36 ). . We first searched for rare SNVs with protein impact affecting conserved residues within 9q33.1-33.2. We did not identify any coding variant shared by all affected subjects within these coordinates. Six variants were identified in some affected subjects and were not present in any healthy control: two in ZNF618 gene (Zinc Finger Protein 618, rs762985449 and rs770522574), one in TNC (Tenascin C, rs61729478), one in CDK5RAP2 (CDK5 Regulatory Subunit Associated Protein 2, rs41296081), and two in C5 (Complement C5, rs139479771 and rs34552775) ( Table 2). Many other rare intronic or intergenic variants were only identified in affected subjects in the linkage chromosome 9 region ( Table 2). Some of these rare SNVs were branch-specific and defined the four rare-haplotype blocks identified in the linkage region of Subfamily 3 (Table S5). In the association region, four rare intergenic variants (rs191347609, rs181505483, rs191352043, and rs4837653) are located within the H4 haplotype block of Subfamily 3 at chr9: 121,694,202-122,542,663 ( Fig. S2 and Table S5), and they were shared by all the wide-affected individuals of subfamily 3. We next searched for small SVs that could not be detected by SNP-array, using different algorithms, Haplo-typeCaller of GATK, CNVnator, Manta, BreakDancerMax and CREST. CNVnator identified 7 non-reported small INDELs in non-coding regions, only in MMD subjects, affecting the genes LPAR1, HSDL2, DELEC1, PAPPA, ASTN2 and ZNF618 ( Table 2). Two of these INDELs, located at the H2 haplotype of subfamily 3 ( Fig. S2 and Table S5), were shared by all the MMD subjects of the subfamily 3 and were not present in the healthy controls: a three base pair (AGG) deletion in an intronic region of DELEC1 gene (chr9: 118,113,219), predicted to affect a histone H3 lysine 4 trimethylation (H3K4me3) site in the brain frontal cortex 37 , and a two base pairs (TG) deletion at the long intergenic non-coding RNA 474 (LINC00474) (chr9: 118,667,077) ( Table 2 and Table S5)).  Moreover, CNVnator also identified a non-reported larger deletion of 3099 bp that overlapped with the expression of DELEC1 gene ( Table 2). This deletion is only present in two psychotic siblings, 1-18 (SCZ) and 1-19 (BD-I). Manta identified four other non-reported deletions in intergenic regions at 9q33.1-33.2 ( Table 2). All these INDELs were checked by PCR and Sanger sequencing and were not identified in any healthy family control. The search for rare SNVs was extended to the rest of the genome. In Supplementary Table S4 are summarized the coding rare SNVs identified in susceptible linkage regions. Only the rs145032100 in the ARHGAP19 gene was shared by all affected subjects but was also carried by some healthy controls (Table S4A). This SNV is located at chr10q24.1, a suggestive region associated with the narrow phenotype (LOD score = 1.02).

SNVs and INDELs identified only in MMD subjects
Regions associated with the wide phenotype are enriched for genes involved in voltage-gated ion channels, microtubule organization and immune system. Functional enrichments were performed using GREAT 38 , searching for gene ontology (GO) terms associated with the significant SNPs of the linked region 9q33.1-33.2 plus the ones identified by both phenotype analyses (LOD > 1.5) ( Table 1). The background used was composed of all the filtered SNPs, previously used to perform the association analysis. Regarding GO cellular component highlighted ontologies associated with voltage-gated ion channels and tubulin cytoskeleton (Fig. 3A). Interestingly, GO Biological Process terms were enriched for genes related to neuronal migration and differentiation and genes associated with the immune response (Fig. 3B). Mouse Genome Informatics (MGI) identified enriched expression in cerebral cortex (Fig. 3C), and within GO Disease Ontology, recurrent major depression appeared as the eight most significant enriched term (Fig. 3D).
Regions associated with the narrow phenotype are enriched for genes involved in synaptic vesicle function. We also investigated whether the significant SNPs associated with the narrow phenotype from the suggestive narrow linkage regions (LOD > 1) ( Table 1 and Table S2A) showed functional enrichments related to the disease etiology. Interestingly those regions appeared enriched for synaptic vesicle function, composition and transport (Fig. 3E-H).
Psychotic subjects have increased risk associated with common variants. We finally measured polygenic risk scores (PRS) to evaluate the contribution of common variants to the psychotic phenotype. PRS were calculated using GWAS data from 9 . We observed a clear gradient in the PRS results. All psychotic members of the pedigree scored positive PRS, either using SCZ as a base dataset to calculate the PRS or the combination of SCZ and BD (Table S6) (Table S6).

Discussion
The genetics paradigm of mental illness has changed substantially in recent years. Families with high prevalence, such as the one studied, are expected to encode variants of large effect. But since MMDs are polygenic, we obviously cannot search for a single cause of the disease and whole genome approaches need to be made.
In this pedigree we identified a susceptibility locus with a predominant involvement, the 9q33.1-33.2. Previous linkage analysis in families with mental disorders have reported the same region or very close coordinates, some of which could be considered partial linkage replications [39][40][41] . Badenhop et al. found suggestive evidence for linkage for 9q31-q33 when analyzing 13 families with high prevalence of BD-I 39 . Kaufmann et al. found suggestive evidence for linkage for 9q32-9q34 when analyzing 30 nuclear SCZ African-American families comprising 98 subjects (NPL Z max = 2.17, p = 0.017) 40 . Interestingly, the highest evidence for linkage was found when considering individuals diagnosed with either BD I and II, SZA manic type, or depression as affected (NPL = 2.5 between D9S1690 and D9S1677 at 9q31-q33) 40 . Venken et al. found evidence of linkage at 9q31.1-q33 for affective disorder susceptibility analyzing nine multigenerational families from Northern Sweden 41 . Interestingly, some of these linkage reports have included depression and BD in their analyses; this is congruent with our report of linkage which is higher when including depressive subjects as affected (wide phenotype). Others have reported linkage peaks very close to the one identified in here [42][43][44] 44 . Interestingly, the same genomic region 9q33.1-33.2 has also been associated with psychotic disorders through GWAS 45 . The linked 9q33.1-33.2 region contain five candidate genes from the immune system that participate in synaptic processes and have been previously associated with neurodevelopmental disorders, ASTN2, BRINP1, C5, TLR4 and TRIM32. ASTN2 (Astrotactine 2) and BRINP1 (Bone Morphogenetic Protein/Retinoic Acid-inducible Neural specific Protein) encode proteins from the Membrane Attack Complex Perforin (MACPF) family, highly expressed in the developing brain (reviewed in 29 ). Both genes have been associated with SCZ 45,46 , BD-I 47 , and other neurodevelopmental disorders 48 and with structural abnormalities of the hippocampal volume 49 . ASTN2 facilitates glial-guided migration during brain development 50 , and regulates synaptic trafficking by modulating the composition of surface synaptic vesicle proteins 51 . BRINP1 function in neuronal development is less studied, although it has been implicated in neurogenesis 52 and cell cycle regulation 53 . In fact, Brinp1 knock-out (KO) mice evidence altered hippocampal neurogenesis 52 and exhibit altered behaviors that could model MMDs 52,54 .
C5, which encodes Complement 5 protein, is another interesting candidate gene in the linked region. Recent evidences has implicated the complement system as a promising immune mediator of SCZ (reviewed in 55 ). GWAS studies have identified association of complement components as C4 and CSMD1 with SCZ 8,56 . In addition to the genetic findings, different studies have reported increased complement expression and overall activity in the plasma or serum of SCZ patients (reviewed in 55 ). Recently, increased C5 levels have also been observed in cerebrospinal fluid of SCZ patients 57 .
TLR4 encodes the Toll-like Receptor 4, which plays a fundamental role in pathogen recognition and activation of innate immunity. TLRs express in the developing and adult CNS, in where have been involved in neurogenesis, axonal growth and structural plasticity (reviewed in 58 ). Altered TLR4 counts have been observed in SCZ patients (reviewed in 59 ), and interestingly antipsychotic treatment could normalize those counts 60 . Increased TLR4 expression has also reported in postmortem frontal cortex from SCZ patients and depressed suicide victims (reviewed in 59 ). Other evidence supporting the role of TLR4 on psychiatric diseases come from animal models. TLR4 KO mice show improved spatial memory 61 , due to increased neuronal progenitor cell proliferation and neuronal differentiation in the hippocampus, suggesting that TLR4 may act to reduce hippocampal neurogenesis 62 .
The last candidate gene in the linked region is TRIM32, a small gene nested within an intron of ASTN2 and transcribed from the opposite strand. It encodes the Tripartite motif-containing protein 32 (TRIM32). TRIM32 is a cell fate-determinant for a balanced embryonic development of the neocortex 63 and the adult neurogenesis 64 . www.nature.com/scientificreports/ Recent reports have associated TRIM32 with psychiatric disorders, such as MDD, ASD, ADHD, anxiety and obsessive-compulsive disorder (reviewed in 48 ). Interestingly, TRIM32 loss protects against the development of anxiety and depression induced by chronic stress 65 .
Although we did not find any coding variant in the linked region that segregates with all affected MMD subjects, we found several rare SNVs in those genes harbored only by MMD patients. Moreover, we cannot discard a regulatory role of the genomic region containing the two small INDELs identified at DELEC1 (deleted in esophageal cancer 1) gene and at LINC00474 in all affected subjects of subfamily 3. Further studies will have to shed light on the potential pathogenic roles of the INDELs and the rare SNVs identified in the linked region.
Furthermore, it is also interesting to highlight the only coding rare SNV identified in the family which segregates with all affected subjects, the g.99006061 G>A transition at ARHGAP19, associated with the psychotic phenotype. ARHGAP19 is another hematopoietic cell regulator, a specific Rho GTPase-activating protein (GAP) that plays an essential role in the division of T lymphocytes 66 .
Overall, our results reinforce the growing evidence linking immune system modulators with specific brain functions and MMDs. The susceptibility locus 9q33.1-33.2 should be taken into consideration in further genetic analysis, especially in those families that come from the same region.

Methods
Clinical assessments. Psychiatric assessments included semi-structured interviews, using the Spanish ver- Nonparametric linkage (NPL) analysis was carried using the NPL scoring function 71 , implemented in Merlin v1.1.2 72 . Evidence for Linkage was assessed with the Kong and Cox exponential model 73 . Allele frequencies were calculated using the maximum likelihood method. Due to the complexity of the pedigree, it was split up in three different ~ 24 bit-sized sub-pedigrees (See Fig. 1). Before running linkage, data was exhaustively quality controlled. Graphical Representation of Relationship Errors (GRR) 74 was used to identify errors in the structure of the pedigree. Whole Genome Association Analysis Toolset (PLINK 1.7) 75 was used for the SNPs quality control. SNPs were excluded when Minor Allele Frequency (MAF) < 0.05, and if showing Mendelian inconsistencies. A total of 1198 Mendelian inconsistencies were found (0.17%). Unlikely double recombinants were analyzed using the "error detection" option from Merlin v1.1.2 and subsequently excluded using the "pedwipe" option. Linkage was carried using the most heterozygous SNPs per chromosome after being modeled for LD. Model for LD was performed calculating r 2 using PLINK and removing one SNP of a pair each time r 2 > 0.5. Out of the initial 700,008 SNPs genotyped, 8,078 SNPs were selected for the analysis.

Association analyses of suggestive linkage regions and haplotyping.
Family-based association analyses were conducted using the Linkage and Association Modelling in Pedigrees Software (LAMP) 76 . We included all the SNPs from significant and suggestive linkage regions, using both definitions of the phenotype, wide and narrow. The p-values were corrected for multiple testing using the Benjamini-Hochberg correction. LAMP allows to accommodate different family structures.
Haplotype phase was estimated using SHAPEIT 4 (version 4.2) 30 and haplotypes were visualized using inPHAP 77 mapping SNPs shared by at least four affected subjects. To perform phasing two approaches were followed: (1) Genotyped SNPs from SNP array with low minor allele frequency (MAF ≤ 30%) were included.
(2) Phasing was also performed using SNPs with MAF < 1% from VCF files of WGS. Allele frequencies were extracted from gnomAD. For both approaches, SNPs with high individual missingness rate (> 80%), and high genotyping missingness rate (> 80%) were excluded. SNPs that were not called in all the genotyped subjects were also excluded. Chromosomes 9 was entirely phased.
CNV and SVs detection. CNVs were analyzed from WGS and from SNP array data.
CNVs and SVs detection from WGS data were performed taking advantage of the paired-end sequencing configuration of the samples, and using the following algorithms: (1) CNVnator 79 , a read-depth based algorithm which is useful for detecting large INDELs, insertions and deletions; (2) BreakDancerMax 80 , a paired-read based algorithm, that allows the detection of large SVs such as deletions, insertions, inversions, and intrachromosomal and interchromosomal translocations; (3) CREST 81 , an split-read based algorithm that also allows the detection of the same SVs as BreakDancerMax; (4) Manta 82 , which combines both split-read and read-pair methods and it is useful for detecting large SVs, medium-sized INDELs and large insertions; and (5) HaplotypeCaller of GATK (v3.3.0), which was used for small INDELs detection (< 50 bp). CNVnator was run in all WGS samples using standard settings and a bin size of 100 bp (optimized for 20-30× coverage). Manta was run as a joint diploid sample analysis. BreakDancerMax and CREST were used with default settings.
CNVs were also detected from SNP array data using GenomeStudio 2.0 (Illumina Inc. San Diego, California, USA), taking as a reference GRCh37/hg19. This algorithm is based on two parameters: the B allele frequency (BAF) and the Log R Ratio (LRR) which can be used to test the genotyping quality of the samples and to check the presence of CNVs across the genome. The BAF is a measure of allelic imbalance. In a normal well-genotyped sample, three genotypes are expected, homozygous AA, heterozygous AB, and homozygous BB. Once referred to the B allele, BAF is expected to have three discrete values, 0, 0.5, and 1 (representing AA, AB, and BB genotypes, respectively). R is defined as the sum of the probe intensities used to genotype the different markers. When it is normalized becomes the LRR which is a measure of relative intensity, the logarithm (base 2) of the observed value of R (observed probe intensity) divided by the expected value (expected probe intensity) 83 .
All variants identified using the different algorithms were checked on the bam files using the software IGV, designed to visualize genomic data. This allowed the detection of artifacts or variants called in low coverage regions. Non-previously reported INDELs located on the linkage chromosome 9q33.1-33.2 were validated in all the studied subjects of the family by PCR followed by electrophoresis and Sanger sequencing.
Functional enrichment of biological pathways was investigated using the online tool GREAT (Genomic Regions Enrichment of Annotations Tool; http:// great. stanf ord. edu/ public/ html/) 38 . The enrichment analyses were based on the comparison between significant SNPs associated with the phenotype and the rest of the SNPs of the SNP array.
Polygenic risk scores (PRS) were calculated for each family member (n = 34) using the PRSice-2 v2.1.11, with the publicly available PGC schizophrenia GWAS as a base dataset (33,426 SCZ cases, 54,065 controls), in addition to BD (20,129 BD cases, 21,524 controls). Before computing PRS, data was quality controlled for missingness per SNP and per subject (excluding sample with a rate of missingness higher than 10%), assigned sex inconsistencies, MAF < 0.05 in the dataset, deviances from Hardy-Weinberg equilibrium and Mendelian inconsistencies. After quality control, data from the target dataset was transformed to match the base dataset. This step is vital since any inconsistencies in the effective allele (A1) might have a profound impact on the results. PRS were calculated with default clumping settings and normalizing PRS scores.
Ethics approval. The present study was approved by the ethics committee of the Balearic Islands (CEI-IB),

Spain.
Consent to participate. Written informed consent was obtained from all studied family members.