Introduction

There is ample evidence of partially overlapping brain morphology abnormalities in schizophrenia (SZ), schizoaffective disorder (SAD), and bipolar disorder (BD)1,2,3. These disorders also have a shared set of genetic loci associated with them which can generate diverse disease phenotypes (pleiotropism), and a partially shared polygenic diathesis4,5, as well as shared behavioral features, including psychosis in some BD patients. Although large samples of patients with psychotic mental disorders and controls have undergone GWAS, the phenotypes in these studies have been mostly restricted to diagnostic categories4,6, and most studies of intermediate phenotypes associated with psychosis reported are based on smaller samples7,8,9,10. Gottesman proposed the term endophenotype for sub-phenotypes observable in patients that might prove more tractable to genetic analysis11,12, and independently Gershon et al.13,14 proposed a similar idea. Following these concepts, the Bipolar and Schizophrenia Network for Intermediate Phenotypes study (B-SNIP) collected multiple phenotypes on each studied individual, aimed to identify biomarkers of psychosis and their genetic correlations, which can help elucidate mechanisms of the disorders, and lead to improved classifications and personalized treatment15. Previous neuroimaging studies found enlargement of the ventricular system in bipolar disorder and schizophrenia16,17, and the temporal (inferior) horns of the lateral ventricles (THLV) were also reported enlarged in patients with these disorders18. Here we present genome-wide association analyses of brain imaging in 777 B-SNIP patients with psychosis and healthy controls. Although this sample size is smaller than current large case-control studies, the power to detect associations using quantitative phenotypes is higher than with dichotomous traits, and pooling ill subjects with normal controls to analyze quantitative traits has the advantage of a wider range in the observed phenotypes, which translates into higher power to detect genetic associations19. As noted by Dahl et al.20, deep phenotyping with simultaneous genome-wide analyses can serve as a discovery tool for previously unsuspected relationships of phenotypic traits with each other, and with shared molecular events. In this study we found a significant association between the THLV volume and a common intronic SNP of the gene Neurexin 1 (NRXN1). NRXN1 is known to be involved in brain development and function. These facts support the potential role of this common variant of NRXN1 in psychosis.

Methods

The Bipolar and Schizophrenia Network for Intermediate Phenotypes (B-SNIP) studies individuals with schizophrenia (SZ), schizoaffective disorder (SAD), bipolar disorder with psychosis (BD), and healthy controls. Data were collected between 2008 and 2012 at six study sites in the USA, with IRB approval at each participating institution. Details of phenotype collection are described in Tamminga el al.21,22, including standardization of phenotyping methods across all the collaborating sites. Here we present genome-wide association analyses of structural brain MRI data from 59 brain volumes plus the whole brain gray matter density, studied in 483 cases (169 BD, 127 SAD, and 187 SZ) and 294 healthy controls. Sample demographics are shown in Table 1. A total of 60 quantitative traits (listed in Supplementary Material) from these 777 cases and controls pooled together are analyzed here as phenotypes in genome-wide association studies (GWAS). Diagnostic categories were not used as phenotypes or as covariates in GWAS, except for post-hoc analyses to check on phenotypic associations with case-control status in this dataset, and whether our top SNP was associated with disease. The rationale to pool cases and controls is that in the study of quantitative traits when the phenotype is associated with disease there will be greater phenotypic variation and thus more power to detect association. These associations could help to explain part of the shared pathophysiology of these disorders. Including each diagnostic category as a covariate would ignore the genetic intercorrelation of the diagnoses, and detract from estimates of genetic associations with the tested phenotypes.

Table 1 Sample Demographics

Phenotypes: Image processing

Structural T1-weighted 3D magnetization-prepared rapid gradient echo (MPRAGE) scans were acquired at six sites: Boston (3.0 T, GE Signa), Detroit (3.0 T, Siemens Allegra), Baltimore (3.0 T, Siemens Trio trim), Hartford (3.0 T, Siemens Allegra), Dallas (3.0 T, Philips) and Chicago (3.0 T, GE Signa). The scans (TR = 6.7 msec, TE = 3.1 msec, 8° flip angle, 256 × 240 matrix size, total scan duration = 10:52.6 min, 170 sagittal slices, 1 mm slice thickness, 1 × 1 × 1.2 mm3 voxel resolution) were acquired following the Alzheimer’s Disease Neuroimaging Initiative (ADNI) protocol (http://adni.loni.usc.edu/methods/documents/mri-protocols/). Scans were visually assessed for artifacts and processed using Freesurfer 6.0 (https://surfer.nmr.mgh.harvard.edu/)23 using Scientific Linux 7.5. The scans went through first-level auto-reconstruction to undergo registration in standard space, motion correction and skull striping. Trained raters edited the images to remove dura, sinuses and blood vessels that could interfere with segmentation. Scans were run through second and third level auto-reconstruction to segment gray-white matter, reviewed by two independent raters and removed if motion interfered with segmentation. A total of 777 scans were retained for analysis and registered to the Desikan/Killiany atlas24. For this study we used only FreeSurfer volumetric variables, joining left and right hemisphere volumes into single bilateral variables. The whole brain gray matter density was calculated using Voxelbased Morphometry Toolbox for SPM8 as in Ivleva et al.25.

Genotypes

Genotypes were assessed from blood DNA at the Broad Institute using the Illumina Infinium Psycharray (PsychChip) (Illumina Inc., San Diego, CA, USA), which contains a total of 588,454 SNP markers, including 50,000 specific genetic markers for neuropsychiatric disorders. PsychChip genotype calls were processed through a custom pipeline at the Broad Institute designed to merge calls from 3 different algorithms (GenCall, Birdseed and zCall), in order to maximize reliability and usability of rare markers (https://sites.google.com/a/broadinstitute.org/psych-chip-resources/genotype-calling). We used programs PREST-plus26 and KING27 in order to remove individuals with 3rd degree or closer kinship. Pre-Imputation QC applied to PsychChip genotypes included filtering by Call Rate > 98% by SNP and > 98% by sample, HWE P-value > 1E–06 in controls, Inbreeding Coefficient (−0.2 > F_Het > 0.2), exclusion of monomorphic markers, Sex Check using X-chromosome heterozygosity and Y-chromosome call rate (all samples with sex check mismatch were dropped), and Minor Allele Frequency (MAF) > 0.01.

Genotype Imputation

PsychChip genotypes were imputed to the 1000 Genomes phase 1 multiethnic reference panel28 using IMPUTE2 and HAPI-UR for pre-phasing29,30,31. Chromosomes were phased separately, and then divided into 5 megabase pair chunks for imputation. Poorly imputed SNPs (those with information score < 0.5) were filtered out post-imputation. Imputed genotype probabilities with an uncertainty <0.1 were transformed to hard calls using PLINK 1.9. The final imputed genotype set contained 30 million autosomal markers, reduced to 4,322,238 variants after filtering for missingness by marker < 0.05, missingness by individual sample < 0.02, and MAF > 0.05, in order to work with common variants only, given the limited power to detect rare events with our sample size (see power analysis in Suppl. Material).

GWAS analyses

GWAS analyses were performed using PLINK 1.9 linear regression model32,33 on the Bionimbus protected cloud of the Open Science Data Cloud servers at the University of Chicago (http://www.opensciencedatacloud.org)34. Genetic locations refer to the human genome GRCh37/hg19 build, gene mapping was done using the UCSC genome browser35, and graphics generated with Manhattan Plotter, FUMA36 and LocusZoom37. Q–Q plots and calculated lambdas were used to examine possible inflation in GWAS results. We corrected for population stratification and admixture using the method of Price et al.38; we used the first two eigenvectors from principal component analysis on the genotypes as covariates, which captured the majority of the ethnic-related variance (see Supplementary Material). Sex and age were used as covariates in all our analyses, and the total intracranial volume was also used as a covariate except for the association with itself as phenotype. Covariates used for gray matter density GWAS were age, sex, handedness and two PCA eigenvectors from genotypes.

Post-GWAS eQTL and functional genomic analyses were performed on FUMA version 1.3.4 (K. Watanabe, http://fuma.ctglab.nl/)36, using data from The Genotype-Tissue Expression (GTEx) project version 7 (https://gtexportal.org)39 and the BrainSpan Atlas of the Developing Human Brain (http://www.brainspan.org/)40. The UK Brain Expression Consortium database (http:// http://www.braineac.org/)41 was also consulted for eQTLs of our top SNP markers.

Significance Thresholds

We calculated the GWAS statistical significance threshold for a single phenotype as P < 5.5E–08 using the Li et al. method based on independent LD blocks from our genotypes42. In the context of multiple phenotype analyses reported, this is our threshold for suggestive significance. Many of the studied phenotypes are correlated, and as Bonferroni correction assumes independence of the tests, using it would increase the type-II error. We followed the Li and Ji method based on spectral decomposition of the phenotype correlation matrix to calculate the effective number of independent phenotype tests43. Thus, our significance threshold after multiple test correction was 5.5E–08/27 independent effective tests = P < 2.04E–09.

Additional phenotypic associations

Phenotypic associations were tested using analysis of variance (ANOVA) and partial correlations, calculated with IBM SPSS Statistics version 24.

Results

We identified a significant genome-wide association after multiple test correction between the combined volume of the temporal (inferior) horns of the lateral ventricles (THLV) and the T allele of rs12467877 (P = 6.76E–10), an intronic SNP of the gene Neurexin1 (NRXN1), see Table 2 and Figs. 1 and 2. The association between THLV and the NRXN1 allele was present in both of the two major ethnic groups of this sample (P = 8.195E–07 in Caucasians and P = 9.97E–05 in African-Americans) with no significant allelic frequency differences between them for rs12467877 (MAF 0.13 and 0.09 respectively, ANOVA P = 0.12, see Suppl. Material), which is consistent with reference data44. The T allele was associated with enlargement of THLV and its effect size on this phenotype was 0.045 (R2G), that is, 4.5% of the phenotype variance was attributable to this SNP. However, the identified NRXN1 SNP was not associated with case-control status (P = 0.21, using PLINK logistic regression model, controlling for sex and ethnicity).

Table 2 Genome-Wide Significant and Suggestive Association results
Fig. 1: Genome-wide association of the volume of the temporal horn of the lateral ventricle in the B-SNIP sample.
figure 1

Green dotted line: significance threshold after multiple test correction (P < 2.04E–09). Red dotted line: suggestive significance threshold (P < 5.56E–8)

Fig. 2: Regional association plot of lead variant: Regional association plot of 2p16.3 with the temporal horn of the lateral ventricle.
figure 2

Most significant associated SNP in violet

Other suggestive GWAS findings

Eight more genomic regions had P values lower than 5.5E−08 with 6 phenotypes; the top 3 were intronic SNPs of genes SPATA16, MAP9 and GSG1L associated with THLV (Table 2).

Precentral, Pallidum and Lateral Ventricle volumes had suggestive associations with intergenic SNPs. A variant in an uncharacterized protein, KIAA1257, was found to be suggestively associated with total cortical volume, and the gene WBP2NL, a sperm domain-binding protein previously associated with intelligence, was associated in our study with the volume of Isthmus Cingulate.

GWAS summary statistics for SNP associations with P < 1E–5 from the 60 brain morphology phenotypes studied are available in Supplementary Material.

Phenotypic associations

In our sample we observed larger volumes of the temporal (inferior) horns of the lateral ventricles in cases vs. controls (ANOVA P = 0.015). Hippocampal volumes were smaller in cases than in controls (ANOVA P = 4E–6), and amygdala volumes were also significantly smaller in cases (P = 0.01), as reported in previous publications of our group25,45,46. Enlargement of THLV was correlated in our sample with volume reduction of the hippocampus and with enlargement of the choroid plexus and caudate (Table 3).

Table 3 Partial correlations of volumes of the combined temporal horns of lateral ventricles and surrounding structures

Discussion

Enlarged ventricular volumes have been among the most consistent anatomical alterations found in SZ and BD, first reported by Johnstone in schizophrenia more than forty years ago47. This has been confirmed in SZ and BD patients compared with healthy controls in large studies by the ENIGMA consortium, and the heritability of the volume of lateral ventricles was calculated to be 0.54 by Kremen et al.48. The temporal horn of the lateral ventricle traverses the temporal lobe in a lateral-anterior direction, bordering the caudate, putamen, amygdala, choroid plexus, and hippocampus. Increased ventricular volumes observed in schizophrenia have been attributed to volume reduction of surrounding gray matter structures49,50, which is confirmed in this study. Enlargement of choroid plexus in this sample was also associated with cognitive and structural connectivity problems in cases, published by Lizano et al.51.

The UK Biobank has published structural MRI GWAS results that contain SNP associations with the temporal horn52, which did not replicate our significant finding (UK biobank result for rs12467877 P = 0.095). Non-replication of our finding in the UK Biobank sample may be due to a negligible proportion of psychotic patients in their dataset. GWAS analysis of THLV in our controls (only) also gave a non-significant association for rs12467877 (P = 0.07). The Psychiatric Genomic Consortium (PGC) did not find association of case status with rs12467877 (P = 0.076) in their large case-control schizophrenia GWASs4,53, which indicates that the NRXN1 association is pertinent to the intermediate phenotype (THLV volume) and not directly with diagnosis.

NRXN1 encodes a large presynaptic transmembrane protein that binds neuroligins to calcium-dependent synaptic complexes in the central nervous system, and is thus involved in the formation of synaptic contacts54. This gene is highly intolerant to loss of function mutations, as described by the Exome Aggregation Consortium (ExAC), with a probability of Loss of function Intolerance, pLI = 1 (https://decipher.sanger.ac.uk/genes). Structural copy number variants (CNVs) disrupting NRXN1 are associated with a wide spectrum of brain disorders including schizophrenia, autism, and developmental disorders55,56,57. The UK Brain Expression Consortium (http://www.braineac.org/)41 has reported expression data for NRXN1 related to rs12467877, with no significant evidence for affecting expression (see Suppl. Material, page 10). The Genotype-Tissue Expression (GTEx) project (http://gtexportal.org) also does not report functional associations for rs12467877. Functional analyses using FUMA identified this and other suggestively associated SNPs to be related to the expression of other genes, such as PPP1R21 and GSG1L (Suppl. Material), which are highly expressed in brain during development and adulthood.

Three other genes suggestively associated with THLV are SPATA16, MAP9 and GSG1L. SPATA16 is involved in spermatogenesis and structural variants of this gene have been associated with globozoospermia58. MAP9 (also known as ASAP) is a gene encoding a microtubule associated protein highly expressed in brain and thyroid (https://gtexportal.org)59. The germ cell-specific gene 1-like protein (GSG1L) is a component of the AMPA receptor complex, a glutamate transmembrane receptor in the CNS, highly expressed in nucleus accumbens and basal ganglia, that has been associated with synaptic plasticity, mathematical ability, and creative activities in music60,61,62. Functional analysis of the GSG1L suggestively associated SNPs identified additional eQTLs (See Supplementary Material).

This is the first report of a common SNP variant of NRXN1 associated with enlargement of THLV volume in psychosis, and this volume is correlated with reduction of the hippocampus and enlargement of the caudate and choroid plexus. The association p-value 6.76E–10 is significant after multiple test correction, but further research with a larger sample of cases and controls will be needed to confirm these results. Although the role of this common NRXN1 SNP in psychosis is indirect, this study adds evidence to the role of NRXN1 in psychosis, which was originally implied by rare CNVs of this gene.