Introduction

The volume of lateral ventricles increases in normal aging1,2,3,4. Enlargement of the lateral ventricles has also been suggested in various complex neurological disorders, such as Alzheimer’s disease, vascular dementia, and Parkinson’s disease5,6,7,8, as well as in psychiatric disorders such as schizophrenia and bipolar disorder9,10,11. Furthermore, ventricular enlargement has been associated with poor cognitive functioning and cerebral small vessel disease pathology12,13,14. Although it might seem intuitive to interpret ventricular expansion primarily as an indicator of brain shrinkage after disease onset, recent studies have provided evidence against this notion15,16. The size of the lateral ventricles is influenced by genetic factors, with heritability estimated at 54% on average16; heritability changes with age, from 32–35% in childhood to about 75% in late middle and older age16. Although the sizes of the surrounding gray matter structures are also heritable17,18,19, ventricular volume is reported to be genetically independent of the brain regions surrounding the ventricles20. Similarly, ventricular enlargement in schizophrenia does not appear to be linked to volume reduction in the surrounding structures15.

Elucidating the genetic contribution to inter-individual variation in lateral ventricular volume can thus provide important insights into, and a better understanding of, the complex genetic architecture of brain structures and related neurological and psychiatric disorders. Candidate gene studies have identified single-nucleotide polymorphisms (SNPs) mapping to the Catechol-O-methyltransferase (COMT) and Neuregulin 1 (NRG1) genes as associated with larger lateral ventricular volume in patients with first-episode non-affective psychosis21,22. However, a comprehensive investigation of the genetic determinants of lateral ventricular volume is lacking.

Here, we perform a genome-wide association (GWA) meta-analysis of 23,533 middle-aged to elderly individuals from population-based cohorts participating in the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) consortium in order to identify common genetic variants that influence lateral ventricular volume. We apply a commonly used two-stage GWA design, followed by a joint analysis that combines information across the stages and provides greater power23. We identify 7 genetic loci associated with lateral ventricular volume and report genome-wide overlap with thalamus volume.

Results

Genome-wide association results

An overview of the study design is illustrated in Supplementary Fig. 1. The GWA results from 12 studies were combined in stage 1 and subsequently evaluated in an independent sample from 14 studies in stage 2. Finally, the results of the stage 1 and stage 2 analyses were combined in stage 3. Detailed information on study participants, image acquisition and genotyping is provided in Supplementary Note 1 and Supplementary Data 1–3.

The results of the stage 1 meta-analysis (N = 11,396) are illustrated in Supplementary Fig. 2. The quantile–quantile plot suggests that potential population stratification and/or cryptic relatedness are well controlled after genomic control correction (λ = 1.04) (Supplementary Fig. 2, Supplementary Table 1). The stage 1 meta-analysis identified 146 significant variant associations mapping to three chromosomal regions, at 3q28, 7p22.3, and 16q24.2 (Table 1). All but one of the 146 stage 1 significant associations replicated in the stage 2 meta-analysis (N = 12,137) with the same direction of effect at the Bonferroni-adjusted significance threshold (p-value < 5 × 10−3, Supplementary Data 4); the exception was one SNP with p-value = 7.6 × 10−3. Subsequently, the results from all individual studies were combined in the stage 3 GWA meta-analysis (N = 23,533). The quantile–quantile plot again showed adequate control of population stratification and relatedness (Supplementary Fig. 3). The combined stage 3 GWA meta-analysis identified 314 additional significant associations mapping to four additional chromosomal regions, at 10p12.31, 11q23.1, 12q23.3, and 22q13.1 (Figs. 1, 2, Table 1). The effect size of the lead variant at the 10p12.31 locus was correlated with the mean age of the cohort (r = 0.50, p-value = 0.03) (Supplementary Fig. 4). No correlation was found for the other lead variants (Supplementary Figs. 5–10).
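The genomic inflation factor λ quoted above is the median observed 1-df χ² association statistic divided by its expected median under the null (about 0.455), and genomic control divides each χ² statistic by λ. A minimal Python sketch, assuming per-variant Z-scores as input; the function names are illustrative and do not correspond to METAL or any other package:

```python
from statistics import NormalDist, median

# Expected null median of a 1-df chi-square: the squared 75th percentile
# of the standard normal distribution (about 0.4549).
NULL_MEDIAN_CHI2 = NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation(z_scores):
    """Return lambda = median(observed chi-square) / null median."""
    return median(z * z for z in z_scores) / NULL_MEDIAN_CHI2


def apply_genomic_control(z_scores, lam):
    """Deflate Z-scores by sqrt(lambda), i.e. chi-square by lambda.

    Statistics are left unchanged when lambda <= 1.
    """
    factor = max(lam, 1.0) ** 0.5
    return [z / factor for z in z_scores]
```

For example, null Z-scores inflated by 10% give λ ≈ 1.21, and rescaling them by √λ brings λ back to 1.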

Table 1 Genome-wide significant results from the meta-analyses of lateral ventricular volume
Fig. 1

Manhattan plot for stage 3 genome-wide association meta-analysis. Each dot represents a variant. The plot shows –log10 p-values for all variants. Red line represents the genome-wide significance threshold (p-value < 5 × 10−8), whereas blue line denotes suggestive threshold (p-value < 1 × 10−5)

Fig. 2

Regional association and recombination plots in the combined stage 3 GWA meta-analysis. The left axis represents –log10 p-values for association with total lateral ventricular volume. The right axis represents the recombination rate, and the x-axis represents chromosomal position (hg19 genomic position). The most significant SNP of each region is denoted with a purple diamond. Surrounding SNPs are colored according to their pairwise correlation (r2) with the top-associated SNP of the region. Gene annotations are shown below the figure

Even though cohorts of European (EA) and African-American (AA) ancestry were included, all significant associations were mainly driven by EA samples (Supplementary Figs. 11–12). The direction of effect for the seven lead variants was generally concordant across the EA cohorts, with no evidence of any single cohort driving the associations (Supplementary Fig. 11). Although phenotyping methods differed across cohorts, cohorts using each method showed evidence of effect, suggesting limited heterogeneity in effects (Supplementary Fig. 12).

To investigate whether the seven lead variants have an effect in early life, we carried out the analyses in a cohort of 1141 children from the Generation R Study. The direction of effect was consistent with stage 3 for 85.7% of the lead variants (6 out of 7, binomial p-value = 0.05) (Supplementary Data 4), and the lead variant at the 12q23.3 region showed nominal association with lateral ventricular volume in the children’s cohort (Z-score = −2.56, p-value = 0.01). Additionally, three of the seven lead variants (or their proxies; r2 > 0.7) showed pleiotropic associations (p-value < 5 × 10−8) with other traits according to the PhenoScanner database (Supplementary Data 5)24.

To capture sex differences, we performed a sex-stratified GWA analysis (Nmen = 10,358; Nwomen = 12,872). None of the 15,660,719 variants tested for heterogeneity between men and women reached the genome-wide significance threshold (Supplementary Fig. 13). However, an indel at 4q35.2 showed suggestive evidence of association in men (4:187559262:C_CAA, p-value = 5.43 × 10−8) but not in women (p-value = 0.88).

Independent signals within loci

Conditional and joint (COJO) analysis using Genome-wide Complex Trait Analysis (GCTA) identified no additional independent variants after conditioning on the lead variant at each locus (3q28, 7p22.3, 10p12.31, 11q23.1, 12q23.3, 16q24.2, and 22q13.1).

Functional annotation

A large proportion of the genome-wide significant variants were intergenic (335/460) (Supplementary Fig. 14). The variants with the highest probability of having a regulatory function based on RegulomeDB score (Category 1 RegulomeDB score) were located at 7p22.3 and 22q13.1 (Supplementary Data 6). Of the seven lead variants, four were intergenic, four were in an active chromatin state, and three showed expression quantitative trait locus (eQTL) effects (Supplementary Data 6). The lead SNP at 22q13.1 (rs4820299) was associated with differential expression of the largest number of genes (n = 6). In brain tissue, the alternate allele of this SNP, which is also associated with smaller lateral ventricular volume, was associated with higher expression of TRIOBP, suggesting that higher TRIOBP expression is associated with smaller lateral ventricles (Supplementary Fig. 15).

Partitioned heritability

SNP-based heritability in participants of European ancestry was estimated at 0.20 (SE = 0.02) using LD score regression; it was higher in women (0.19, SE = 0.04) than in men (0.15, SE = 0.05). The seven lead variants explained 1.5% of the total variance in lateral ventricular volume. Partitioning heritability by functional annotation using LD score regression revealed significant enrichment of SNPs within 500 bp of highly active enhancers, where 17% of SNPs accounted for 54% of the heritability (p-value = 7.9 × 10−6, Supplementary Table 2). Significant enrichment was also found for histone marks including H3K27ac (which indicates enhancer and promoter regions), H3K9ac (which highlights promoters), H3K4me3 (which indicates promoters/transcription start sites), and H3K4me1 (which highlights enhancers) (Supplementary Table 2)25,26.
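The enrichment reported by partitioned LD score regression is the proportion of SNP heritability attributed to an annotation C divided by the proportion of SNPs falling in that annotation. Plugging in the rounded percentages reported for the highly active enhancer annotation gives roughly a threefold enrichment (a worked illustration of the definition, not a re-analysis):

```latex
\text{Enrichment}_C
  = \frac{h^2_{g,C} / h^2_{g}}{M_C / M}
  = \frac{0.54}{0.17}
  \approx 3.2
```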

Functional enrichment analysis

Functional enrichment analysis using regulatory regions from the ENCODE and Roadmap projects using the GWAS Analysis of Regulatory or Functional Information Enrichment with LD correction (GARFIELD) method revealed that SNPs associated with lateral ventricular volume at p-value threshold <10−5 were more often located in genomic regions harboring histone marks (H3K9ac (associated with promoters) and H3K36me3 (associated with transcribed regions))25 and DNaseI hypersensitivity sites (DHS) than a permuted background (Fig. 3, Supplementary Data 7).

Fig. 3

Functional enrichment analysis of lateral ventricular volume loci within DNaseI hypersensitivity sites. The radial lines show fold enrichment (FE) at eight GWA p-value thresholds. Results are shown for each of 424 cell types, sorted by tissue and represented along the outer circle of the plot. The font size is proportional to the number of cell types from each tissue. FE values are plotted in different colors for the different GWA thresholds. Significant enrichment for a given cell type is denoted along the outer circle of the plot, from GWA p-value threshold <10−5 (outermost) to GWA p-value threshold <10−8 (innermost). The results show ubiquitous enrichment

Integration of gene expression data

Integration of functional data from the Genotype-Tissue Expression (GTEx) project using the MetaXcan method revealed two significant associations between genetically predicted expression in brain tissue and lateral ventricular volume (Supplementary Fig. 16): expression levels of TRIOBP at the 22q13.1 locus (p-value = 3.2 × 10−6) and of MRPS16 at the 10q22.2 locus (p-value = 1.8 × 10−6) were associated with lateral ventricular volume.

Gene annotation and pathway analysis

The results of the gene-based and pathway analyses are presented in Supplementary Table 3 and Supplementary Data 8. The pathway analysis identified the “regulation of cytoskeleton organization” (GO:0051493) gene set as significantly enriched (p-value = 6 × 10−6). Genes in this pathway have previously been implicated in various neurological or cardiovascular diseases (Supplementary Data 9). Furthermore, pathways related to sphingosine-1-phosphate (S1P) signaling showed suggestive enrichment (Supplementary Data 8).

Genetic correlation

Additionally, we examined the genetic overlap between lateral ventricular volume and other traits (Table 2). The genetically determined components of thalamus and lateral ventricular volumes appear to be negatively correlated (ρgenetic = −0.59, p-value = 3.14 × 10−6), and this finding was also confirmed at the phenotype level (Supplementary Table 4). Weaker genetic overlap was observed with infant head circumference (ρgenetic = 0.28, p-value = 8.7 × 10−3), intracranial volume (ρgenetic = 0.35, p-value = 9 × 10−3), height (ρgenetic = −0.14, p-value = 5.7 × 10−3), and mean pallidum volume (ρgenetic = −0.29, p-value = 2.5 × 10−2), whereas no significant genetic overlap was found with neurological diseases, psychiatric diseases, or personality traits.

Table 2 The results of genetic correlation between the lateral ventricular volume and anthropometric traits, brain volumes, neurological and psychiatric diseases and personality traits

Genetic risk score

We next examined the association between lateral ventricular volume and genetic risk scores (GRS) for Alzheimer’s disease, Parkinson’s disease, schizophrenia, bipolar disorder, cerebral small vessel disease, amyotrophic lateral sclerosis (ALS), and tau-related pathology, including tau and phosphorylated tau levels in cerebrospinal fluid and progressive supranuclear palsy (PSP). Each GRS was constructed from the lead SNPs of the largest published GWA study of the respective trait (Supplementary Data 10). We found a suggestive association between the GRS for tau levels in cerebrospinal fluid and lateral ventricular volume (p-value = 9.59 × 10−3) (Supplementary Table 5), driven by one SNP (Supplementary Table 6). No association was observed for the other examined phenotypes (Supplementary Table 5).

Discussion

We have performed the first genome-wide association study of lateral ventricular volume, including up to 23,533 individuals. We identified statistically significant associations between lateral ventricular volume and variants at 7 loci. Additionally, we found that the genetically determined components of thalamus and lateral ventricular volume are negatively correlated.

The strongest association was observed at the intergenic 3q28 locus, between the non-coding RNA SNAR-I and OSTN. This region has previously been associated with cerebrospinal fluid tau/ptau levels, Alzheimer’s disease risk, tangle pathology, and cognitive decline27. Similarly, the genome-wide significant locus at 12q23.3 encompasses NUAK1, which has also been associated with tau pathology: NUAK1 modulates tau levels in human cells and animal models and is associated with tau accumulation in different tauopathies28. NUAK1 is most prominently expressed in the brain, where it has a role in mediating axon growth and branching in cortical neurons29. The lead SNP of the 12q23.3 locus maps to an intron of NUAK1. Based on its Combined Annotation Dependent Depletion (CADD) score of 21.5, this SNP is among the top 1% most deleterious variants in the human genome, and it is located in an enhancer region (Supplementary Data 6). Interestingly, this variant was also nominally associated with lateral ventricular volume in the children’s cohort, suggesting an effect in early life.
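The "top 1%" statement follows from how CADD scales its scores: the PHRED-like score is −10 log10(rank/total) over all scored variants, so a score of 20 marks the top 1% and a score of 21.5 corresponds to the top 10^(−2.15) ≈ 0.71%. A small sketch of that conversion; the helper names are illustrative, not part of the CADD tools:

```python
import math


def cadd_top_fraction(phred_score):
    """Fraction of scored variants ranked at least as deleterious.

    Inverts the CADD scaling phred = -10 * log10(rank / total).
    """
    return 10 ** (-phred_score / 10)


def cadd_phred(rank, total):
    """Scaled score for a variant at a given deleteriousness rank."""
    return -10 * math.log10(rank / total)
```

For the 12q23.3 lead SNP, `cadd_top_fraction(21.5)` is about 0.0071, comfortably inside the top 1%.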

In our data, the significant variants in the 7p22.3 region had the highest probability of being regulatory based on RegulomeDB score (1b). The lead variant at 7p22.3 was in an active chromatin state and was associated with differential expression of GNA12 (Supplementary Data 6). GNA12 is involved in various transmembrane signaling systems30,31,32,33. Interestingly, this gene is part of the S1P signaling pathways that showed suggestive enrichment among genes associated with lateral ventricular volume. S1P, a bioactive sphingolipid metabolite, regulates aspects of nervous system development34, such as neuronal survival, neurite outgrowth, and axon guidance35,36, and plays a role in neurotransmitter release37. It also regulates the development of germinal matrix (GM) vasculature38, and disruption of S1P regulation results in defective angiogenesis in the GM, hemorrhage, and enlarged ventricles38.

Another identified locus, 16q24.2, has previously been associated with small vessel disease and white matter lesion formation39. Further, the alternate allele of the lead SNP at 22q13.1 in TRIOBP is associated with higher expression of the same gene in basal ganglia and brain cortex, and the same allele is associated with smaller lateral ventricular volume. Interestingly, genetically predicted expression of this gene in cerebral cortex was significantly associated with lateral ventricular volume, suggesting a possible causal functional role of the gene. The same analysis revealed a significant association between the expression of MRPS16 in frontal cortex and lateral ventricular volume. This gene has previously been related to agenesis/hypoplasia of the corpus callosum and enlarged ventricles40.

Finally, the lead intergenic SNP at 11q23.1 maps between C11orf53 and ARHGAP20, whereas the 10p12.31 region encompasses MLLT10, which has been linked to various leukemias, ovarian cancer, and meningioma41,42. The effect size of the 10p12.31 lead variant on lateral ventricular volume was correlated with mean cohort age, with the effect near zero at younger ages and larger at older ages.

The gene-set enrichment analysis highlighted the “regulation of cytoskeleton organization” (GO:0051493) pathway. Genes in this pathway have previously been implicated in various neurological diseases, such as Parkinson’s disease (PARK2), frontotemporal dementia (MAPT), neurofibromatosis 2 (NF2), and tuberous sclerosis (TSC1) (Supplementary Data 9). The cytoskeleton is involved in essentially all cellular processes and is therefore crucial for brain processes such as cell proliferation, differentiation, migration, and signaling. Cytoskeletal dysfunction has been associated with neurodevelopmental, psychiatric, and neurodegenerative diseases43,44,45.

Previous studies showed significant sex differences in lateral ventricular volume46,47. In our study, we did not observe sex-specific genetic effects: for the seven lead variants, both men and women contributed to the association signal. We observed only one suggestive association, at 4q35.2, that was present in men only. The lead variant (an indel) maps to FAT1, which encodes an atypical cadherin. In a mouse model, mutation of this gene causes a defect in cranial neural tube closure and an increase in radial precursor proliferation in the cortex48. The SNP-based heritability estimate was slightly higher in women than in men; this difference may reflect the different sample sizes of the sex-specific analyses and the resulting lower precision of the estimates.

We estimated that 20% of the variance in lateral ventricular volume could be explained by common genetic variants, suggesting that common variants represent a substantial fraction of the overall genetic component of variance. Moreover, the strongest heritability enrichment occurred in regions of highly active enhancers and histone marks, suggesting involvement in the regulation of gene expression. Using LD score regression, we found a significant negative genetic correlation between lateral ventricular volume and thalamus volume. These may not be independent findings, but inverse reflections of the same biology. Although not strictly significant, we also observed trends towards genetic correlations with other brain volumetric measures. Furthermore, no genome-wide overlap was found between lateral ventricular volume and various neurological or psychiatric diseases. Given that enlargement of the lateral ventricles has been suggested in Alzheimer’s disease, we examined APOE alleles and found no association of APOE ɛ4 (p-value = 0.86) or APOE ɛ2 (p-value = 0.81) with lateral ventricular volume in our study population.

By identifying loci, genes, and common pathways underlying lateral ventricular volume at the genome-wide level, our results provide insights into the genetic contribution to variability in lateral ventricular volume and a better understanding of the complex genetic architecture of brain structures. Because participants with stroke, traumatic brain injury, or dementia at the time of magnetic resonance imaging (MRI) were excluded, the study population was relatively free of disease, and the genes with variants associated with lateral ventricular volume are therefore relevant to neurological aging. This is in line with the previously published work of Pfefferbaum et al., who showed that the stability of the lateral ventricles is genetically determined, whereas other factors, such as normal aging, trauma, and disease, play a role in their change1,16.

However, while studying the genetic overlap of lateral ventricular volume with various neurological and psychiatric disorders at multiple levels (LD score regression/polygenic, GRS/oligogenic, GWA hits/monogenic), we found evidence that some single genetic variants have pleiotropic effects on lateral ventricular volume and on biochemical markers of a neurological disease (AD) or on meningioma (Supplementary Data 5), while no evidence was found for genetic overlap with other neurological or psychiatric disorders (Table 2, Supplementary Table 5). The pattern of association between lateral ventricular volume and schizophrenia across these multiple scales is similar to the findings of Franke et al., who evaluated the association of various subcortical brain volumes with schizophrenia and reported no evidence of genetic overlap49. Although our study does not provide a definite statement on the relationship between lateral ventricular volume and neurological or psychiatric disorders, it lays the foundation for future studies that should disentangle whether lateral ventricular volume is genetically related or unrelated to these disorders (e.g., whether ventricular enlargement in these disorders results from reverse causation). Novel insights may be gained by increasing the power of such studies and by studying homogeneous samples with harmonized phenotype assessment methods, along with evaluating both common and rare variants.

The strengths of our study are the large sample, the population-based design, and the use of quantitative MRI. Our study also has several limitations. First, despite efforts to harmonize phenotype assessment, the methods used to quantify lateral ventricular volume differed across cohorts. Because of this phenotypic heterogeneity, association results of the participating cohorts were combined using a sample-size weighted meta-analysis, which limits interpretation of effect sizes. Second, phenotypic heterogeneity may have reduced statistical power. However, despite heterogeneity in phenotype assessment, the association signals came from several studies irrespective of the method of phenotype assessment, which suggests that our findings are robust. Furthermore, although we made an effort to include cohorts of EA and AA ancestry, the study consisted predominantly of individuals of European origin (22,045 individuals of EA and 1488 of AA ancestry). Given the disparity in sample size, it is difficult to distinguish whether any inconsistency in results between the two groups stems from true genetic differences or from differential power to detect genetic effects. This is exemplified by the Z-score plots (Supplementary Fig. 11), which show that the direction of effect in AA cohorts is often inconsistent with that in EA cohorts. However, the same inconsistency can be observed in European cohorts of similarly small sample size. This inconsistency may therefore be due to small sample size rather than ethnic background, but we cannot rule out that racial-ethnic specific effects exist. This limitation underscores the need to expand research in non-European populations. Finally, as some loci only reached genome-wide significance in the combined meta-analysis, they should be considered highly probable findings that still require independent replication.

To conclude, we identified genetic associations of lateral ventricular volume with variants mapping to 7 loci and implicated several pathways, including pathways related to tau pathology, cytoskeleton organization, and S1P signaling. These data provide new insights into brain morphology.

Methods

Study design

An overview of the study design is illustrated in Supplementary Fig. 1. We performed a GWA meta-analysis of 11,396 participants of mainly European ancestry from 12 studies (stage 1) that contributed summary statistics before a prespecified deadline. The deadline was set prior to data inspection and was not influenced by the results of the GWA meta-analysis. Variants that surpassed the genome-wide significance threshold (p-value < 5 × 10−8) were subsequently evaluated in an independent sample of 12,137 participants of mainly European ancestry from 14 studies (stage 2). Finally, we performed a meta-analysis of all stage 1 and stage 2 studies (stage 3).

Study population

All participating studies are part of the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) consortium50. A detailed description of the participating studies can be found in Supplementary Note 1. General characteristics of study participants are provided in Supplementary Data 1. Written informed consent was obtained from all participants. Each study was approved by local ethics committees or institutional review boards (see Supplementary Note 1 for details).

Imaging

Each study performed MRI and estimated the volume of the lateral ventricles and intracranial volume (ICV). Scanner field strength ranged from 0.35 to 3 T. Information on scanner manufacturers and measurement methods is provided in Supplementary Data 2. While most studies quantified lateral ventricular volume using validated automated segmentation methods, some studies used validated visual grading scales. The visual and volumetric scales have previously been compared and showed high agreement for lateral ventricular volume2. Assessment of the consistency of volumetric lateral ventricular measurements across time and across FreeSurfer versions (v4.5, v5.1, and v6.0) revealed a high intraclass correlation (ICC > 0.98) in a subset of participants from the Rotterdam Study. Participants with dementia at the time of MRI, traumatic brain injury, prior or current stroke, or intracranial tumors were excluded.
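The text does not state which ICC form was used for the consistency check. As an illustration only, the sketch below computes one common form, the Shrout and Fleiss ICC(2,1) (two-way random effects, absolute agreement, single measurement), from a subjects × repeated-measurements matrix; treating time points or FreeSurfer versions as the columns is our assumption:

```python
import numpy as np


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    Rows are subjects; columns are repeated measurements of the same
    quantity (e.g. ventricular volume from different software versions).
    """
    y = np.asarray(ratings, dtype=float)
    n, k = y.shape
    grand = y.mean()
    row_means = y.mean(axis=1, keepdims=True)
    col_means = y.mean(axis=0, keepdims=True)
    ms_rows = k * np.sum((row_means - grand) ** 2) / (n - 1)
    ms_cols = n * np.sum((col_means - grand) ** 2) / (k - 1)
    resid = y - row_means - col_means + grand
    ms_err = np.sum(resid ** 2) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )
```

Because this form measures absolute agreement, a constant offset between software versions lowers the ICC even when subjects are ranked identically; a consistency-type ICC would ignore such an offset.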

Genotyping and imputation

Information on genotyping platforms, quality control procedures, and imputation methods for each participating study is provided in Supplementary Data 3. All studies used commercially available genotyping arrays, including Illumina and Affymetrix arrays. Similar quality control procedures were applied in each study (Supplementary Data 3). Using validated software (Minimac51, IMPUTE52, BEAGLE53), each study imputed genotypes, mostly using the 1000 Genomes Phase 1 v3 reference panel.

Genome-wide association (GWA) analysis

Each participating study performed GWA analysis of total lateral ventricular volume under an additive model, using variant allele dosages as predictors and the natural logarithm of total lateral ventricular volume as the dependent variable. The log transformation was applied to obtain an approximately normal distribution (Supplementary Fig. 17). The association analyses were adjusted for age, sex, total intracranial volume, age2 (if significant), population stratification, and familial relationships (family-based studies) or study site (multi-site studies). Population stratification was controlled for by including principal components derived from genome-wide genotype data. Study-specific details on covariates and software are provided in Supplementary Data 3. Quality control (QC) was conducted for all participating studies using a standardized protocol provided by Winkler et al.54. Variants with low imputation quality (r2 < 0.3) or minor allele count (MAC) ≤ 6 were filtered out. Because lateral ventricular volume was measured with different methods, the association results of participating studies were combined using a fixed-effect sample-size weighted Z-score meta-analysis in METAL55. Genomic control was applied to account for small amounts of population stratification or unaccounted relatedness. After the meta-analysis, variants with information in less than half of the total sample size were excluded. Meta-analyses were performed separately for each stage. In the stage 1 meta-analysis, a p-value < 5 × 10−8 was considered significant. Variants that surpassed this threshold were evaluated in the stage 2 meta-analysis. To model linkage disequilibrium (LD) between those variants, we first calculated the number of independent tests from the eigenvalues of their correlation matrix using the Matrix Spectral Decomposition (matSpDlite) software56.
Subsequently, a Bonferroni correction was applied for the effective number of independent tests (0.05/10 independent SNPs = 5 × 10−3). Additionally, all analyses were stratified by sex. Following the same QC steps as for overall analyses, the sex-stratified association results of participating studies were combined using a fixed-effect sample-size weighted Z-score meta-analysis in METAL while applying genomic control55. The variants were assessed only if test statistics (Z-score) were heterogeneous between males and females (p-value < 0.1) and if the association in a sex-combined analysis did not reach genome-wide significance threshold57.
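The sample-size weighted Z-score scheme in METAL weights each study's signed Z-score by the square root of its sample size, Z = Σ√Nᵢ Zᵢ / √(ΣNᵢ). A minimal sketch of the per-variant combination step, assuming Z-scores are already aligned to a common effect allele and that the study-level QC thresholds above have been applied; the function names are illustrative and are not METAL's interface:

```python
import math


def passes_study_qc(imputation_r2, minor_allele_count):
    """Study-level filter using the thresholds stated in the text."""
    return imputation_r2 >= 0.3 and minor_allele_count > 6


def sample_size_weighted_z(z_scores, sample_sizes):
    """Combine per-study Z-scores with weights sqrt(N_i).

    Returns the meta-analysis Z-score and its two-sided p-value.
    """
    weights = [math.sqrt(n) for n in sample_sizes]
    z_meta = sum(w * z for w, z in zip(weights, z_scores)) / math.sqrt(
        sum(w * w for w in weights)
    )
    # Two-sided normal tail via erfc, which stays accurate for tiny p-values.
    p_value = math.erfc(abs(z_meta) / math.sqrt(2))
    return z_meta, p_value
```

Because only Z-scores and sample sizes enter the combination, studies that measured ventricular volume on different scales can be pooled, at the cost of not yielding a pooled effect size.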
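The effective number of independent tests is derived from the eigenvalues of the correlation (LD) matrix among the tested variants. The sketch below implements two published eigenvalue-based estimators, Nyholt's and Li and Ji's; which estimator produced the 10 independent SNPs used above is not stated here, so the choice between them is an assumption, and the code is not the matSpDlite implementation:

```python
import numpy as np


def meff_nyholt(corr):
    """Nyholt: 1 + (M - 1) * (1 - Var(eigenvalues) / M)."""
    eig = np.linalg.eigvalsh(np.asarray(corr, dtype=float))
    m = eig.size
    return 1 + (m - 1) * (1 - np.var(eig, ddof=1) / m)


def meff_li_ji(corr):
    """Li and Ji: sum over eigenvalues of I(|l| >= 1) + (|l| - floor(|l|))."""
    eig = np.abs(np.linalg.eigvalsh(np.asarray(corr, dtype=float)))
    eig = np.round(eig, 8)  # suppress floating-point noise such as 2.9999999
    return float(np.sum((eig >= 1) + (eig - np.floor(eig))))


def bonferroni_threshold(alpha, m_eff):
    """Per-test significance level after correcting for m_eff tests."""
    return alpha / m_eff
```

Both estimators return M for uncorrelated variants and 1 for perfectly correlated ones; with 10 effective tests, `bonferroni_threshold(0.05, 10)` reproduces the 5 × 10−3 threshold.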

Conditional analysis

To identify variants independently associated with lateral ventricular volume, we performed conditional and joint (COJO) GWA analysis using Genome-wide Complex Trait Analysis (GCTA), version 1.26.058. LD patterns were estimated from 1000 Genomes Phase 1 v3 imputed data of 6291 individuals from the Rotterdam Study I.

Functional annotation

To annotate genome-wide significant variants with regulatory information, we used HaploReg v4.159, RegulomeDB v1.160, and Combined Annotation Dependent Depletion (CADD) tools61. To determine whether they affect gene expression, we used GTEx data62. For the lead variants, we explored 5 chromatin marks assayed in 127 epigenomes (H3K4me3, H3K4me1, H3K36me3, H3K27me3, H3K9me3) from Roadmap data63. To search for pleiotropic associations of our lead variants and their proxies (r2 > 0.7) with other traits, we used the PhenoScanner database, which is designed to facilitate cross-referencing of genetic variants with many phenotypes24. Associations reaching genome-wide significance (p-value < 5 × 10−8) were extracted.

Variance explained

The proportion of variance in lateral ventricular volume explained by each lead variant was calculated using Pearson’s phi coefficient squared as explained in Draisma et al.64. The total proportion of variance in lateral ventricular volume was calculated by adding up the proportions of variance in lateral ventricular volume explained by each lead association signal.
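For a 1-df test, Pearson's phi coefficient satisfies φ² = χ²/N, and for a lead variant with meta-analysis Z-score Z, χ² = Z². Assuming this is how the phi-squared approach is applied here (our reading, not a verified reproduction of Draisma et al.), the per-signal variance explained and its total can be sketched as below. Summing is reasonable because conditional analysis found no secondary signals, so the lead variants are treated as independent. The inputs in the usage line are hypothetical:

```python
def variance_explained(z_score, n):
    """Phi-squared approximation: chi-square / N, with chi-square = Z**2."""
    return z_score ** 2 / n


def total_variance_explained(lead_signals):
    """Sum per-signal contributions; lead_signals holds (Z, N) pairs."""
    return sum(variance_explained(z, n) for z, n in lead_signals)
```

For example, `total_variance_explained([(10.0, 10000), (5.0, 2500)])` gives 0.02, i.e. 2% of the variance.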

Partitioned heritability

SNP-based heritability and partitioned heritability analyses were performed using LD score regression following the previously described method65. Partitioned heritability analysis estimates the enrichment of heritability in SNPs partitioned into 24 functional classes, as reported in Finucane et al.65. To avoid bias, an additional 500-bp window around each functional class was included. Only HapMap3 variants were included, as these tend to be well imputed across cohorts.
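LD score regression rests on the relation E[χ²ⱼ] = N h²_g ℓⱼ / M + N a + 1, where ℓⱼ is the LD score of SNP j, M the number of SNPs, and a a confounding term. The slope of χ² on ℓⱼ therefore estimates N h²_g / M, and the intercept captures confounding. The sketch below recovers h²_g from simulated statistics with unweighted least squares. It is a simplification: the ldsc software additionally uses regression weights and a block jackknife for standard errors, and partitioned heritability replaces the single LD score with one LD score per functional class:

```python
import numpy as np


def ldsc_h2(chi2, ld_scores, n, m):
    """Unweighted LD score regression; returns (h2_snp, intercept)."""
    slope, intercept = np.polyfit(ld_scores, chi2, 1)
    return slope * m / n, intercept


# Simulated summary statistics: true h2 = 0.2 and no confounding (intercept 1).
rng = np.random.default_rng(0)
m, n, true_h2 = 100_000, 20_000, 0.2
ld = rng.uniform(1, 200, size=m)
expected = 1 + n * true_h2 * ld / m
# Scale 1-df chi-square noise (mean 1) so each statistic has mean `expected`.
chi2 = expected * rng.chisquare(1, size=m)
h2_hat, intercept_hat = ldsc_h2(chi2, ld, n, m)
```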

Functional enrichment analysis

We performed functional enrichment analysis with regulatory regions from the ENCODE and Roadmap projects using the GWAS Analysis of Regulatory or Functional Information Enrichment with LD correction (GARFIELD) method66. The method provides fold enrichment (FE) statistics at various GWA p-value thresholds after accounting for LD, minor allele frequency, and local gene density66. FE statistics were calculated at eight GWA p-value thresholds (0.1 to 1 × 10−8). Associations were tested for various regulatory elements, including DNase-I hypersensitivity sites, histone modifications, chromatin states, and transcription factor binding sites, in over 1000 cell- and tissue-specific annotations66. The significance threshold, based on the number of annotations used, was set at 4.97 × 10−5.
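In its simplest form, the fold enrichment at a p-value threshold is the fraction of threshold-passing variants that fall in an annotation divided by the fraction of all tested variants in that annotation. GARFIELD additionally accounts for LD, minor allele frequency, and local gene density, as described above. The sketch shows only the unadjusted ratio at the eight thresholds, with hypothetical inputs:

```python
def fold_enrichment(p_values, in_annotation, threshold):
    """Unadjusted fold enrichment of annotated variants among GWA hits.

    p_values and in_annotation are parallel sequences, one entry per variant.
    Returns NaN when no variant passes the threshold.
    """
    hits = [a for p, a in zip(p_values, in_annotation) if p < threshold]
    if not hits:
        return float("nan")
    frac_hits = sum(hits) / len(hits)
    frac_all = sum(in_annotation) / len(in_annotation)
    return frac_hits / frac_all


# The eight GWA p-value thresholds mentioned in the text (0.1 to 1e-8).
THRESHOLDS = [1e-1, 1e-2, 1e-3, 1e-4, 1e-5, 1e-6, 1e-7, 1e-8]
```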

Integration of gene expression

To integrate functional data with our meta-analysis results, we used the MetaXcan method, which evaluates the association between lateral ventricular volume and brain-specific gene expression levels predicted by genetic variants, using data from the GTEx project62,67. This method is an extension of the PrediXcan method that uses summary statistics from meta-analyses67. Based on the total number of genes tested, the Bonferroni-corrected significance threshold was set to 0.05/12,379 = 4 × 10−6.
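The summary-statistic test behind MetaXcan (S-PrediXcan) approximates the gene-level Z-score as Z_g ≈ Σₗ wₗ (σₗ/σ_g) Zₗ. Here wₗ are the expression-prediction weights for SNP l, σₗ is the SNP's genotype standard deviation, Zₗ is its GWA Z-score, and σ_g² = wᵀΓw is the variance of predicted expression given the SNP covariance matrix Γ from a reference panel. A numpy sketch of that formula with illustrative inputs (it is not the MetaXcan code):

```python
import numpy as np


def s_predixcan_z(weights, snp_cov, z_scores):
    """Gene-level Z from model weights, reference SNP covariance and GWA Z."""
    w = np.asarray(weights, dtype=float)
    cov = np.asarray(snp_cov, dtype=float)
    z = np.asarray(z_scores, dtype=float)
    sigma_snp = np.sqrt(np.diag(cov))
    sigma_gene = np.sqrt(w @ cov @ w)  # SD of genetically predicted expression
    return float(np.sum(w * sigma_snp / sigma_gene * z))


# Bonferroni threshold for the 12,379 genes tested (about 4.0e-6).
BONFERRONI = 0.05 / 12_379
```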

Gene annotation and pathway-based analysis

Gene-based test statistics were computed using VEGAS2 software, which tests for enrichment of multiple single-variant associations within genes while accounting for LD structure68. LD structure was computed based on the 1000 Genomes Phase 3 population. Variants within 10 kb of the 5′ and 3′ untranslated regions were included in this analysis in order to capture regulatory variants68. Subsequently, the gene-based scores were used to perform gene-set enrichment analysis using VEGAS2Pathway69. The VEGAS2Pathway approach accounts for LD between variants within a gene and between neighboring genes, as well as for gene size and pathway size69. It uses computationally predicted Gene Ontology pathways and curated gene sets from the MSigDB, PANTHER, and Pathway Commons databases69. The pathway-based significance threshold was set to p-value < 1 × 10−5, taking into account the multiple testing of correlated pathways (0.05/5000 independent tests)69.
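The idea behind a VEGAS-style gene-based test is to sum the 1-df χ² statistics of the variants assigned to a gene and obtain an empirical p-value by simulating correlated null Z-scores from those variants' LD matrix. A minimal Monte Carlo sketch under that assumption. The VEGAS2 software has further options not shown, and the fixed number of simulations here is illustrative:

```python
import numpy as np


def gene_based_p(z_scores, ld_matrix, n_sim=10_000, seed=0):
    """Empirical p-value for the sum of squared Z-scores within a gene."""
    z = np.asarray(z_scores, dtype=float)
    observed = np.sum(z ** 2)
    rng = np.random.default_rng(seed)
    # Null Z-scores share the LD (correlation) structure of the gene's variants.
    null_z = rng.multivariate_normal(np.zeros(z.size), ld_matrix, size=n_sim)
    null_stats = np.sum(null_z ** 2, axis=1)
    # Add-one correction keeps the estimate strictly positive.
    return (np.sum(null_stats >= observed) + 1) / (n_sim + 1)
```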

Genetic correlation

We used the LD score regression method to estimate genetic correlations between lateral ventricular volume and various traits including anthropometric traits, brain volumes, neurological and psychiatric diseases and personality traits. The analyses were performed using a centralized database of summary-level GWA study results and a web interface for LD score regression, the LD-hub70. Summary-level GWA study results for white matter hyperintensities were obtained from the CHARGE consortium71 and the analyses were performed using the ldsc tool (https://github.com/bulik/ldsc).

Genetic risk scores

We generated genetic risk scores (GRS) for Alzheimer’s disease, amyotrophic lateral sclerosis (ALS), Parkinson’s disease, bipolar disorder, schizophrenia, white matter lesions, and tau-related phenotypes. The tau-related phenotypes, including tau and phosphorylated tau levels in cerebrospinal fluid and progressive supranuclear palsy (PSP), were studied in relatively small samples and are therefore not appropriate for LD score regression. We extracted the lead genome-wide significant SNPs and their effect estimates from the largest published GWA studies (Supplementary Data 10). For white matter lesion burden, effect estimates and standard errors were derived from Z-statistics using a previously published formula72. The allele associated with increased risk of the corresponding trait was considered the effect allele. The weighted GRS was constructed as the sum of the products of effect sizes (as weights) and the respective allele dosages from 1000 Genomes imputed data of the Rotterdam Study, using R software version 3.2.5 (https://www.R-project.org). Variants with low imputation quality (r2 < 0.3) were excluded. The GRS was then tested for association with lateral ventricular volume in three cohorts of the Rotterdam Study, adjusting for age, sex, total intracranial volume, age2, and population stratification. The significance threshold for GRS association was set to p-value < 5 × 10−3 (0.05/10), based on the number of genetic risk scores tested.
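The weighted GRS described above can be sketched as follows. Alleles are oriented so that every weight refers to the risk-increasing allele. The conversion from a Z-statistic to an effect estimate uses a widely used approximation for a standardized trait, β ≈ Z/√(2p(1−p)(n+Z²)) and SE ≈ 1/√(2p(1−p)(n+Z²)), with p the allele frequency. Whether this matches the formula cited above is an assumption, and all names and inputs are illustrative. The analysis itself was run in R; Python is used here only for consistency with the other sketches:

```python
import math


def beta_se_from_z(z, allele_freq, n):
    """Approximate per-allele effect and SE on a standardized trait from Z."""
    denom = math.sqrt(2 * allele_freq * (1 - allele_freq) * (n + z * z))
    return z / denom, 1 / denom


def weighted_grs(dosages, betas):
    """Sum of weight * dosage with every weight oriented to the risk allele.

    dosages are expected counts (0-2) of the allele each beta refers to;
    a negative beta is flipped to the other allele (dosage 2 - d).
    """
    score = 0.0
    for d, b in zip(dosages, betas):
        if b < 0:
            b, d = -b, 2 - d
        score += b * d
    return score
```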