Genome-wide association analysis of hippocampal volume identifies enrichment of neurogenesis-related pathways

Adult neurogenesis occurs in the dentate gyrus of the hippocampus and contributes to sustaining the hippocampal formation. To investigate whether neurogenesis-related pathways are associated with hippocampal volume, we performed gene-set enrichment analysis using summary statistics from a large-scale genome-wide association study (N = 13,163) of hippocampal volume from the Enhancing Neuro Imaging Genetics through Meta-Analysis (ENIGMA) Consortium and two-year hippocampal volume changes from baseline in cognitively normal individuals from the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort. Gene-set enrichment analysis of hippocampal volume identified 44 significantly enriched biological pathways (FDR-corrected p-value < 0.05), of which 38 were related to neurogenesis processes including neurogenesis, generation of new neurons, neuronal development, and neuronal migration and differentiation. For genes highly represented in the significantly enriched neurogenesis-related pathways, gene-based association analysis identified TESC, ACVR1, MSRB3, and DPP4 as significantly associated with hippocampal volume. Furthermore, co-expression network-based functional analysis of gene expression data in the hippocampal subfields CA1 and CA3 from 32 normal controls showed that distinct co-expression modules were mostly enriched in neurogenesis-related pathways. Our results suggest that neurogenesis-related pathways may be enriched for hippocampal volume and that hippocampal volume may serve as a potential phenotype for the investigation of human adult neurogenesis.

rate is 1.41% for cognitively normal older adults and in adults, new neurons are added in each hippocampus daily via adult neurogenesis with an annual turnover of 1.75% and a modest decline during aging 4,5 . The combination of structural MRI with immunohistological markers for newborn neurons and neural stem/progenitor cells in neurogenesis-related brain regions revealed that neurogenesis is associated with increased hippocampal gray matter volume in mice 6,7 . Adult rats exposed to oxygen deprivation during birth show hippocampal atrophy and reduced hippocampal neurogenesis 8 . Recently, cognitively normal individuals were found to have preserved neurogenesis despite less angiogenesis and neuroplasticity 9 . Environmental factors enhance transcriptional and epigenetic changes between the ventral and dorsal parts of the dentate gyrus that may have an effect on hippocampal volume 10 . Molecular pathways and genes affect the induction of the neurogenic niche and neural/progenitor cell turnover to newborn neurons for the formation of the hippocampal structure during hippocampal neurogenesis.
To our knowledge, no study has assessed the association of adult neurogenesis-related pathways with hippocampal volume measured from MRI scans in living people. In this study, to investigate whether genetic variants associated with variation in hippocampal volume are enriched for neurogenesis-related pathways, we performed a gene-set enrichment analysis using summary statistics from a large-scale human neuroimaging genetics meta-analysis from the Enhancing Neuro Imaging Genetics through Meta-Analysis (ENIGMA) Consortium (N~13,000). Neurogenesis is an important contributor to the formation of the hippocampus in mice, but less is known about the relationship between human adult neurogenesis and hippocampal volume/atrophy.

Materials and Methods
Enhancing neuro imaging genetics through meta-analysis (ENIGMA). The Enhancing Neuro Imaging Genetics through Meta-Analysis (ENIGMA) Consortium was initiated in December 2009. Research groups involved in neuroimaging and genetics worked together on a range of large-scale studies that integrated data from 70 institutions worldwide. The goal of ENIGMA was to merge neuroimaging data with genomic data to identify common genetic variants that might affect brain structure. The first project of ENIGMA focused on identifying common genetic variants associated with hippocampal volume or intracranial volume (ICV) 11 . The aim of ENIGMA2, the follow-on study to ENIGMA1, was to perform a genome-wide association study (GWAS) using subcortical volumes as phenotypes 12 . In ENIGMA2, GWAS was conducted using mean hippocampal volume as a phenotype, controlling for age, age², sex, ancestry (the first four multidimensional scaling components), ICV, diagnostic status, and MRI scanner (when multiple scanners were used at the same site). Data, including genetic imputation, were processed and examined following standardized protocols freely available online (http://enigma.ini.usc.edu/protocols/imaging-protocols/). In this study, we used GWAS summary statistics in the discovery sample of 13,163 subjects of European ancestry from the ENIGMA Consortium 12 . 3,824 of the 13,163 participants (21%) have anxiety, Alzheimer's disease, attention-deficit/hyperactivity disorder, bipolar disorder, epilepsy, major depressive disorder or schizophrenia, and the remaining 9,339 (79%) are cognitively normal subjects.

Alzheimer's disease neuroimaging initiative (ADNI). The Alzheimer's Disease Neuroimaging Initiative (ADNI) was launched in 2003 by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, the Food and Drug Administration (FDA), private pharmaceutical companies, and nonprofit organizations as a public-private partnership led by Principal Investigator Michael W. Weiner, MD. Participants were recruited from 59 sites across the U.S. and Canada. ADNI includes over 1700 subjects aged 55-90, consisting of cognitively normal older individuals (CN) and individuals with significant memory concern (SMC), mild cognitive impairment (MCI) and Alzheimer's disease (AD) (http://www.adni-info.org/). The primary goal of ADNI has been to test whether serial magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of MCI and early AD. Participants for this study included 367 CN, 94 SMC, 280 early MCI, 512 late MCI and 310 AD. Demographic information, APOE, clinical information, neuroimaging and GWAS genotyping data were downloaded from the ADNI data repository (http://adni.loni.usc.edu). The CN group does not have any significant memory concern or impairment of daily activities. The SMC group has self-reported significant memory concerns quantified using the Cognitive Change Index 13 and a Clinical Dementia Rating (CDR) of zero. Individuals with MCI and AD must have memory complaints as well as objective memory loss measured by education-adjusted scores on the Wechsler Memory Scale-Revised (WMS-R) Logical Memory II 14 . The range of Mini-Mental State Examination (MMSE) scores was 24-30 for CN and MCI, and 20-26 for AD. As diagnostic criteria, the CDR score was 0 for CN, 0.5 for MCI with a memory box score of 0.5 or greater, and 0.5-1 for AD 15 .
A composite memory score was calculated using Logical Memory and the Rey Auditory Verbal Learning Test (RAVLT), as well as memory items from the AD Assessment Scale-Cognitive (ADAS-Cog) and the Mini-Mental State Examination (MMSE) 16 . Hippocampal volume was determined from MRI scans; FreeSurfer version 5.1 was used to extract hippocampal and total intracranial volumes (ICV) [17][18][19][20] . Table 1 shows selected demographic and clinical characteristics of these participants at baseline.
Genotyping data and quality control. The genotyping data of ADNI participants were collected using the Illumina Human 610-Quad, HumanOmni Express, and HumanOmni 2.5 M BeadChips. Standard quality control procedures of GWAS data for genetic markers and subjects were performed using PLINK v1.07 (pngu.mgh.harvard.edu/~purcell/plink). Quality control excluded SNPs with call rate < 95%, Hardy-Weinberg equilibrium test p < 1 × 10⁻⁶, or minor allele frequency (MAF) < 5%, and excluded participants with call rate < 95% or who failed the sex check or the identity check for related individuals [21][22][23][24][25] . Non-Hispanic Caucasian participants were selected using HapMap 3 genotype data and multidimensional scaling (MDS) analysis (Supplementary Fig. 1) after performing standard quality control procedures for genetic markers and subjects.
Gene-set enrichment analysis. Gene-set enrichment analysis using GWAS summary statistics was performed to identify pathways and functional gene sets with significant associations with hippocampal volume. All SNPs (n = 6,571,356) and subjects with European ancestry were included in this study. Pathway annotations were downloaded from the Molecular Signatures Database version 5.0 (http://www.broadinstitute.org/gsea/msigdb/index.jsp/). This annotation data comprised the Gene Ontology (GO) collection, which is publicly available and includes 1,454 pathways: 825 gene sets assigned to GO biological processes, 233 to GO cellular components, and 396 to GO molecular functions. GSA-SNP software 28 uses the p-value of each SNP from GWAS summary statistics to test whether a pathway-phenotype association is significantly different from all other pathway-phenotype associations. In GSA-SNP, all SNPs within each gene are considered in turn, the negative log of each p-value is recorded, and these values are ranked. To avoid spurious predictions, we used the SNP with the second highest negative log p-value to summarize the strength of association for each gene. Each pathway (gene set) was assessed by z-statistics to identify enriched pathways 29 . Gene-set enrichment analysis was restricted to pathways containing between 10 and 200 genes. The false discovery rate (FDR) with the Benjamini-Hochberg procedure was used for multiple comparison correction 30 . Pathways with an FDR-corrected p-value < 0.05 were considered significantly enriched for hippocampal volume.
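The gene-scoring, z-statistic, and FDR steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the GSA-SNP implementation: the function names are hypothetical, skipping genes with fewer than two SNPs is an assumption, and the z-statistic is assumed to take the usual form (gene-set mean score minus the mean over all genes, divided by the overall standard deviation over the square root of the set size).

```python
import math
from statistics import mean, pstdev


def gene_scores(snp_pvalues_by_gene, k=2):
    """Score each gene by -log10 of its k-th smallest SNP p-value (k=2: second best).

    Assumption: genes with fewer than k SNPs are skipped.
    """
    scores = {}
    for gene, pvals in snp_pvalues_by_gene.items():
        ranked = sorted(pvals)
        if len(ranked) >= k:
            scores[gene] = -math.log10(ranked[k - 1])
    return scores


def pathway_z(scores, gene_set):
    """Z-statistic comparing the mean score of a gene set with all scored genes."""
    members = [scores[g] for g in gene_set if g in scores]
    if not members:
        raise ValueError("gene set has no scored genes")
    mu = mean(scores.values())
    sigma = pstdev(scores.values())
    return (mean(members) - mu) / (sigma / math.sqrt(len(members)))


def upper_tail_p(z):
    """One-sided p-value of a z-statistic under the standard normal."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In such a sketch, a pathway would be called enriched when its adjusted p-value falls below 0.05, after first discarding gene sets outside the 10 to 200 gene range.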

Genetic association analysis.
Genome-wide gene-based association analysis using GWAS p-values was performed using KGG (Knowledge-based mining system for Genome-wide Genetic studies) software. KGG uses HYST (hybrid set-based test) to determine the overall association significance in a set of SNPs at the gene level. HYST is the combination of the gene-based association test using extended Simes procedure (GATES) and the scaled chi-square test 31,32 . First, SNPs in each gene were divided into different LD blocks depending on pairwise LD coefficients (r²) for all SNPs. Second, for each block, a block-based p-value for association was calculated, and the key SNP was derived and marked. Next, the block-based p-values were combined accounting for LD between the key SNPs using the scaled chi-square 33 .
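To give a feel for the block-level step, the sketch below implements the plain Simes combination of SNP p-values. It is a simplified stand-in, not the KGG or GATES code: GATES extends this formula by replacing the SNP counts with effective numbers of independent tests estimated from LD, which this sketch ignores by assuming independent SNPs. The function name is hypothetical.

```python
def simes_pvalue(pvalues):
    """Combine SNP p-values with the plain Simes procedure.

    Returns min over j of m * p_(j) / j, where p_(1) <= ... <= p_(m).
    Assumption: SNPs are treated as independent (no LD correction).
    """
    if not pvalues:
        raise ValueError("at least one p-value is required")
    m = len(pvalues)
    ranked = sorted(pvalues)
    return min(m * p / j for j, p in enumerate(ranked, start=1))
```

With a single SNP the combined value equals that SNP's p-value; with several SNPs the smallest p-value is penalized by the number of tests unless other SNPs in the block also show signal.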
Targeted gene-based association analysis was performed using a set-based test in PLINK v1.07 (http://pngu.mgh.harvard.edu/purcell/plink/) 22 . SNPs with p < 0.05 for each gene were chosen. A mean test statistic for each SNP within a gene was computed to determine with which other SNPs it is in linkage disequilibrium (LD), i.e., whether the correlation coefficient between them was r² > 0.5. A quantitative trait analysis (QT) was then performed with each SNP. For each gene, the top independent SNPs (i.e., not in LD; maximum of 5) were selected if their p-values were less than 0.05. The SNP with the smallest p-value was selected first; subsequent independent SNPs were selected in order of decreasing statistical significance. From these subsets of SNPs, the statistic for each gene was calculated as the mean of these single-SNP statistics 34 . The analysis used an additive model; that is, the additive effect of the minor allele on the phenotypic mean was estimated 22,35 . For composite memory scores, covariates included age, sex, years of education, and diagnosis. An empirical p-value (20,000 permutations) was reported for each gene for multiple comparison adjustment 22 .
Gene expression correlation analysis. We analyzed gene expression data in the hippocampal subfields, CA1 and CA3, in brain samples from 32 normal controls in the Gene Expression Omnibus (GEO) repository at the National Center for Biotechnology Information (NCBI) archives. The Illumina HumanHT-12 v3 Expression BeadChip (48,803 probes) was used to measure expression of over 25,000 annotated genes. We processed gene expression data and removed the outliers as previously described 36 . We excluded probes if they were present in three or fewer samples or if they did not correspond to any gene symbol annotation. Lastly, we removed duplicate probes for a gene and kept only the probe with the highest expression level. After data cleaning, 15,037 genes remained.
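The probe-filtering rules above can be illustrated with a short sketch. The data layout, the function name, and the probe identifiers in the usage example are hypothetical, and "highest expression level" is assumed here to mean the highest mean expression across the samples in which the probe was detected.

```python
from statistics import mean


def collapse_probes(probes, min_samples=4):
    """Keep one probe per gene after filtering.

    probes: iterable of (probe_id, gene_symbol_or_None, values), where
    values holds one entry per sample and None marks "not detected".
    A probe is dropped if it lacks a gene symbol or is detected in three
    or fewer samples (fewer than min_samples). Among the remaining probes
    of a gene, the one with the highest mean detected expression is kept
    (assumption about what "highest expression level" means).
    """
    best = {}
    for probe_id, gene, values in probes:
        detected = [v for v in values if v is not None]
        if gene is None or len(detected) < min_samples:
            continue
        level = mean(detected)
        if gene not in best or level > best[gene][1]:
            best[gene] = (probe_id, level)
    return {gene: probe_id for gene, (probe_id, _) in best.items()}


# Hypothetical probes: two for TESC, one unannotated, one rarely detected.
example = [
    ("probe_1", "TESC", [7.1, 7.3, 7.0, 7.2]),
    ("probe_2", "TESC", [8.0, 8.2, 8.1, 7.9]),
    ("probe_3", None, [9.0, 9.1, 9.2, 9.3]),
    ("probe_4", "ACVR1", [6.0, None, None, 6.1]),
]
kept = collapse_probes(example)
```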
We performed a weighted gene correlation network analysis (WGCNA) using processed expression data to identify clusters of highly correlated genes expressed in specific brain regions (CA1 and CA3) as modules. Pearson correlations between gene pairs were calculated. This matrix was transformed into a signed adjacency matrix by using a power function. Then, topological overlap (TO) was calculated by using the components of this matrix. Genes were clustered hierarchically by the distance measure, 1-TO, and the dynamic tree algorithm determined initial module assignments 37 . Gene module membership between each gene and each module eigengene was calculated. We tested these modules for enrichment of neurogenesis-related pathways.
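The correlation, signed adjacency, and topological overlap steps can be sketched as follows. This is a toy pure-Python illustration rather than the WGCNA package itself; the soft-threshold power `beta` is an assumed value because the text does not report the one used, and the function names are hypothetical. The signed adjacency is taken as ((1 + r) / 2)^beta and the topological overlap as (shared neighbours + a_ij) / (min(k_i, k_j) + 1 - a_ij).

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def signed_adjacency(expr, beta=12):
    """Signed adjacency a_ij = ((1 + r_ij) / 2) ** beta, with a zero diagonal.

    expr: list of per-gene expression vectors. beta=12 is an assumption.
    """
    n = len(expr)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            a = ((1 + pearson(expr[i], expr[j])) / 2) ** beta
            adj[i][j] = adj[j][i] = a
    return adj


def topological_overlap(adj):
    """Topological overlap matrix computed from an adjacency matrix."""
    n = len(adj)
    k = [sum(row) for row in adj]  # connectivity; the diagonal is zero
    to = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                shared = sum(adj[i][u] * adj[u][j]
                             for u in range(n) if u != i and u != j)
                to[i][j] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1 - adj[i][j])
    return to


# Toy data: genes 0 and 1 rise together, gene 2 runs opposite to gene 0.
expr = [[1, 2, 3, 4], [2, 4, 6, 8.5], [4, 3, 2, 1]]
adj = signed_adjacency(expr)
tom = topological_overlap(adj)
dissimilarity = [[1 - v for v in row] for row in tom]  # distance for clustering
```

In a signed network, strongly anti-correlated genes receive an adjacency near zero, so they do not end up in the same module; the 1 - TO matrix is what a hierarchical clustering step would consume.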

Results
Gene-set enrichment analysis using large-scale GWAS summary statistics for hippocampal volume (N = 13,163) identified 44 significantly enriched biological pathways (FDR-corrected p-value < 0.05) (Table 2), including 38 pathways related to neurogenesis (Supplementary Table S1). We classified the 38 neurogenesis-related pathways as primary (N = 19) and secondary (helper) (N = 19) based on existing knowledge and literature mining (Fig. 1). The primary neurogenesis-related pathways were related to cellular processes such as neuronal proliferation, differentiation and survival, cellular morphogenesis, axonogenesis, neuronal development, signal transduction, and cell-cell adhesion. The secondary neurogenesis-related pathways consisted of enzyme activities related to neurogenesis, metabotropic receptor activity, lipoprotein binding and extracellular matrix. The remaining six pathways, such as oxidoreductase activity, phagocytosis, perinuclear region of cytoplasm and cornified envelope, were not related to any neurogenesis-related process.
Since the inhibition of neurogenesis could be relevant to hippocampal atrophy 38 , we also examined whether neurogenesis-related pathways were enriched with hippocampal atrophy over two years from baseline in cognitively normal individuals without amyloid-β pathology based on [18F]florbetapir PET or CSF amyloid-β measurement (N = 112) in ADNI. Seven pathways related to neurogenesis processes were significantly enriched with hippocampal atrophy (FDR-corrected p-value < 0.05) in cognitively normal adults (Supplementary Table S2). These pathways were related to cellular differentiation, cellular morphogenesis during development, neurite development, axonogenesis, cell-cell adhesion and neuron development (Table 3).
Furthermore, we performed targeted gene-based association analysis of candidate genes from the hippocampal neurogenesis-related pathways using ENIGMA GWAS summary statistics 31 . The gene-based analysis revealed that four genes (MSRB3, TESC, DPP4, and ACVR1) were significantly associated with hippocampal volume (corrected p-value < 0.05; Table 4). Since hippocampal volume is correlated with memory performance, we performed an association analysis of these four genes (with 682 SNPs) with composite memory scores in ADNI. The gene-based association analysis showed that TESC is significantly associated with composite memory scores after adjusting for multiple testing (p-value = 5.7 × 10⁻³; Table 5). One novel SNP (rs117692586) upstream of TESC was significantly associated with composite memory scores (p-value = 4.3 × 10⁻⁴; Table 6). rs117692586-T is associated with poorer memory performance (Fig. 2).
Finally, we analyzed gene expression data in the Gene Expression Omnibus (GEO) repository to investigate if neurogenesis-related pathways were enriched in the CA1 and CA3 regions of the hippocampus in normal controls. A weighted gene correlation network analysis yielded 20 modules of co-expressed genes. These 20 modules were tested for enrichment of neurogenesis-related pathways. Six modules were found to be significantly enriched with neurogenesis-related pathways after adjusting for multiple testing. The six significantly enriched modules are all related to neurogenesis-related pathways such as neuronal proliferation and differentiation as well as cellular process (Table 7).

Discussion
Using large-scale GWAS summary statistics for hippocampal volume in 13,163 subjects of European ancestry from the ENIGMA Consortium, we performed gene-set enrichment analysis to identify 44 pathways with enrichment for hippocampal volume. These enriched pathways showed that genes associated with variation in hippocampal volume are related to neurogenesis and cellular processes including neuronal cell proliferation, differentiation and maturation as well as cell adhesion. In addition, co-expression network-based functional analysis of gene expression data in the hippocampal subfields, CA1 and CA3, from 32 normal controls showed that co-expression modules were mostly enriched in neurogenesis-related pathways.
The enriched pathways showed significant relationships between neurogenesis and hippocampal volume/ atrophy. Since several studies showed neurogenesis occurs in the dentate gyrus of the hippocampus 4,39 , it is not surprising that hippocampal volume is significantly related to neurogenesis-related pathways. In particular, we observed significant enrichment of pathways related to cell proliferation, neuron differentiation, neuron generation, neurite development, neuronal development, cell recognition, neurogenesis and axonogenesis. The neural progenitor cells in the subgranular zone of the hippocampus differentiate and incorporate into neural network circuitry as mature neurons in the adult human brain 4 . In addition, these newly developed neurons enhance the formation of the hippocampus during neurogenesis and many genes are involved in these processes 40,41 . Moreover, our pathway enrichment analysis found that hippocampal volume is significantly related to signal transduction processes such as glutamate signaling, protein kinase signaling, and the Jun N-Terminal Kinase (JNK) cascade. Previously we identified five neurogenesis related pathways and the signal transduction pathway was one of the important pathways in adult neurogenesis processes 3 . During adult neurogenesis, functional granule cells in the dentate gyrus of the adult hippocampus release glutamate, project to target cells in the CA3 region, and receive glutamatergic and γ-aminobutyric acid (GABA)-ergic inputs to control their spiking activity in neuronal networks that support the formation of memory and learning 42,43 . Phosphoinositide 3-kinase (PI3K)/protein kinase pathways enhance neuronal differentiation and inhibit apoptosis of progenitor cells 44,45 . In addition, studies showed that JNK1 in the JNK cascade plays a role in neuronal differentiation and neuronal and axonal maturation [46][47][48] . 
Also, it has been shown that absence of JNK1 enhances hippocampal neurogenesis and reduces anxiety-related phenotypes in mouse models 46 .
Pathways related to enzyme activities such as protein tyrosine kinases, protein tyrosine phosphatases and 3'5' cyclic nucleotide phosphodiesterases were enriched for hippocampal volume. Studies showed that three subfamilies, Tyro3, Axl and Mertk (TAM), of receptor protein tyrosine kinases play a crucial role in adult neurogenesis. TAM receptors impact proliferation and differentiation of neural stem cells to immature neurons by controlling overproduction of pro-inflammatory cytokines 49 . Protein tyrosine phosphatases control neural stem cell differentiation during neurogenesis 50 .
Our results revealed the influence of neurogenesis pathway-related genetic variation on hippocampal volume. In particular, two genes, tescalcin (TESC) and activin receptor 1 (ACVR1), were significantly associated with hippocampal volume. In addition, TESC was significantly associated with memory performance. Previous structural neuroimaging studies showed that TESC-regulating polymorphisms are significantly associated with hippocampal volume and hippocampal gray matter structure 11,51 . TESC cooperates with the plasma membrane Na(+)/H(+) exchanger NHE1, which catalyzes electroneutral influx of extracellular Na(+) and efflux of intracellular H(+) and establishes intracellular pH level as well as cellular homeostasis 52,53 . TESC is expressed in tissues such as heart and brain and plays an important role during embryonic development 53 . TESC plays a crucial role in controlling cell proliferation and differentiation for the formation of the hippocampal structure during brain development 51 . In addition, ACVR1, a member of a protein family called bone morphogenetic protein (BMP) type I receptors, regulates the hippocampal dentate gyrus stem cells during neurogenesis 54 . Our gene co-expression analysis also showed that TESC and ACVR1 were co-expressed in the neurogenesis pathway-related module.
A limitation of the present report is that we used Gene Ontology pathways from MSigDB. There is no gold standard for the design of a pathway enrichment analysis. Several tools and strategies exist, and alternate databases and algorithms can affect the analytic results 55,56 . Another limitation is the lack of replication in the gene-set enrichment analysis, even though we used a large-scale GWAS result (N = 13,163). Replication in independent samples will be important. It is noteworthy that Sorrells et al. recently reported that human hippocampal neurogenesis drops sharply in childhood to undetectable levels in adults, although some aspects remain controversial 57,58 , whereas Boldrini et al. reported that healthy older adults display preserved neurogenesis 9 .
In summary, our results suggest that neurogenesis-related pathways may be enriched for hippocampal volume and that hippocampal volume may serve as a potential phenotype for the investigation of human adult neurogenesis. Genetic variation in neurogenesis pathway-related genes may have compensatory advantages or confer vulnerability to biological processes during adult neurogenesis, but studies are needed to identify the mechanisms by which genetic variants affect neural stem cell differentiation, proliferation, and maturation to new neurons in the human brain.

Data Availability
The data analyzed in the study are available from the ADNI website (http://adni.loni.usc.edu/) and the ENIGMA website (http://enigma.ini.usc.edu/).

Table 6. SNP-based association analysis results in TESC for composite scores for memory in ADNI.

Figure 2. rs117692586 in TESC is significantly associated with composite scores for memory. Subjects with at least one copy of the minor allele (T) of rs117692586 showed poorer memory performance compared to those without the minor allele (p-value ≤ 0.001).