Genome-wide meta-analyses reveal novel loci for verbal short-term memory and learning

Understanding the genomic basis of memory processes may help in combating neurodegenerative disorders. Hence, we examined the associations of common genetic variants with verbal short-term memory and verbal learning in adults without dementia or stroke (N = 53,637). We identified novel loci in the intronic region of CDH18, and at 13q21 and 3p21.1, as well as an expected signal in the APOE/APOC1/TOMM40 region. These results replicated in an independent sample. Functional and bioinformatic analyses supported many of these loci and further implicated POC1A. We showed that a polygenic score for verbal learning was associated with brain activation in the right parieto-occipital region during a working memory task. Finally, we showed genetic correlations of these memory traits with several neurocognitive and health outcomes. Our findings suggest a role of several genomic loci in verbal memory processes.


INTRODUCTION
The abilities to focus attention and to encode, store, and recall information are not only imperative for survival; these memory-related cognitive processes also reflect healthy brain aging [1,2]. Cognitive decline, especially episodic memory impairment, is a clinical hallmark and genetic endophenotype of several types of dementia, especially Alzheimer's disease (AD) [3]. Understanding the genetic and molecular basis of inter-individual variation in normal memory function could improve precision in screening for dementias and identify novel drug targets to support cognitive reserve and to prevent and treat dementia.
Both episodic memory in cognitively normal individuals [3,4] and AD [5] show moderate to high heritability in twin studies. Large-scale genome-wide association meta-analyses (GWAMAs) across several cohorts have identified over 30 genomic loci for AD [6], but GWAMAs for episodic memory among dementia-free adults have shown less consistent findings [7][8][9][10][11][12][13][14][15][16][17]. In the largest GWAMA of episodic memory, Davies et al. [17] did not find any significant genomic variants for visuo-spatial memory in the UK Biobank sample of 112,067 persons. As visuo-spatial encoding of information involves partially different brain networks compared to verbal encoding [18], genomic architecture of visuo-spatial memory and verbal memory may differ. Indeed, an earlier GWAMA from the CHARGE consortium showed that rs4420638 at 19q13.3 near the APOE-APOC1-TOMM40 locus, that shows the largest known effects on AD [6], was associated with verbal long-term memory (delayed recall) in a sample of 29,076 persons [7]. There is ample evidence for differences in brain networks and thus, genetic networks, that are involved in long-term and short-term episodic memory processes [19]. A relatively small (N = 7486) genome-wide association study of immediate recall scores in tests of verbal episodic memory (verbal short-term memory; VSTM), however, detected the same APOE-APOC1-TOMM40 locus [16]. GWAMAs with considerably larger sample sizes are needed to find novel loci beyond this locus.
Therefore, we examined if common genetic variants were associated with verbal episodic memory in adults of European ancestry without dementia or stroke in the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) consortium. We operationalized VSTM as immediate recall scores in tests of verbal episodic memory and conducted a GWAMA in a sample of 53,637 persons (32 cohorts). As verbal learning (VL) tasks may constitute a more sensitive marker of cognitive deficits than tests of VSTM without a learning component [20] and, to our knowledge, only one small (N = 700) GWAMA for VL exists [15], we also examined genetic underpinnings of VL in 32,762 persons (19 cohorts). To assess the functional role of the identified variants, we analyzed fMRI activations during working memory performance and computed genomic associations.

RESULTS
The characteristics of the study cohorts, details of memory tests administered, genotyping quality control and genomic inflation factors are shown in Supplementary information S1-S4 and Supplement 1.
Due to differences in verbal memory tests used in the different cohorts, we performed sample-size based meta-analyses using METAL [21]. All models were adjusted for age, sex, and population substructure. Table 1 shows results for the lead SNPs and Figs. 1-3 show regional plots of genome-wide significant associations. Supplementary Figs. 1-7 show Manhattan plots of all genomic associations and Supplementary Table S1 shows all genome-wide significant (p < 5 × 10 −8 ) and suggestive (5 × 10 −8 ≤ p < 5 × 10 −6 ) associations in the discovery sample.
For VL, we observed significant associations at the same 19q13.3 locus and at 3p21 in the discovery sample (N = 28,909). At the 19q13.3 locus the strongest associations were observed with rs4420638 (p = 1.8 × 10 −12 ) and rs6857 (p = 2.0 × 10 −9 ) that are in linkage disequilibrium (LD; r 2 : 0.45) with each other. The 3p21 locus harbors a large LD block in/near NT5DC2, STAB1, ITIH1, ITIH4, and PBRM1. Out of 14 SNPs showing a significant association at this locus, rs4687625, within an intron of NT5DC2, and a synonymous ITIH4 variant rs2276816 were independently significant SNPs (r 2 : 0.12, distance: 297 kb). Three of the significant 3p21 SNPs (rs4687625, rs2015971, and rs11711421; all intronic to or near NT5DC2) showed nominally significant association with VL scores in an independent replication sample (p values < 0.01; N = 3853).
Despite some heterogeneity between the cohorts in 19q13.3 SNPs (rs4420638 and rs6857), no single cohort drove the results. We further examined with meta-regression if cohort-level characteristics influenced estimates of the association between these SNPs and memory test scores.
Larger effect estimates in both 19q13.3 SNPs associated with smaller proportion of women in the cohort and rs4420638 effect estimates for VL associated with younger mean age of the cohort (Supplementary Table S16).
There were no other significant signals in the analyses combining discovery and replication cohorts (Supplementary Table S8).
Analyses stratified by the type of the memory test
As in Debette et al. [7], we further meta-analyzed cohorts based on the specific type of memory test applied. In the analyses of VSTM, cohorts were classified into those with paragraph recall test data (13 cohorts, N = 19,420) and those with word list recall test data (14 cohorts, N = 25,454). In the analyses of VL, cohorts were classified into those with orally presented words (11 cohorts, N = 12,593) and those with visually presented words (11 cohorts, N = 16,191).
In the analyses restricted to cohorts with the VSTM paragraph recall tests, we observed a novel locus in an intergenic region at 13q21 (lead SNP rs9528369, p = 2.0 × 10 −9 ) and a second locus at 19q13.3 (lead SNP rs4420638, p = 4.2 × 10 −12 ). Additionally, rs4420638 showed a significant association with VL in those cohorts with visually presented words (p = 3.1 × 10 −9 ). Of these results, we were able to replicate the association of rs4420638 with paragraph recall (p = 1.4 × 10 −4 ) in an independent replication sample (N = 4293). There were no significant associations in the other stratified meta-analyses.
Analyses adjusting for educational attainment
Following Debette et al. [7], we ran secondary analyses to test if associations were independent of education. All associations in the significant lead SNPs remained significant after further adjusting the models for educational attainment except that the associations of rs4687625 (p = 8.8 × 10 −7 ) and rs2276816 (p = 5.3 × 10 −6 ) at 3p21 with VL became only suggestively significant.
Gene-property analysis tests if tissue-specific expression is predictive of the association of the gene with the phenotype. These analyses indicate that genes with the highest expression levels in the pituitary and all available brain regions, except for the rostral intracranial portion of the spinal cord, were the same genes showing significant associations with VSTM and with paragraph recall, but not with VL (Supplementary Table S4).

Functional analyses and colocalization
We identified potential functionality of SNPs showing significant associations with FUMA [22] (Supplementary Tables S5 and S6). Fourteen SNPs at the 3p21 locus that associated with VL are significant eQTLs for POC1A, GNL3, GLYCTK, DUSP7, ITIH4, PPM1M, and GLT8D1 in putamen, cerebellum, frontal cortex, and/or hippocampus in the Genotype-Tissue Expression (GTEx) database and in putamen, white matter, and/or hippocampus in the Brain eQTL Almanac (Braineac) database. Of these, rs2276816 is also a synonymous exonic SNP with a Combined Annotation Dependent Depletion (CADD) score indicating a potential functionally deleterious effect (CADD > 12.37) [23]. Additionally, rs1961958, that associated with VL, and rs11148561, that associated with paragraph recall, have high CADD scores. Moreover, 3p21 locus SNPs rs4687625, rs1961959, rs6798246, and rs3774355, that associated with VL, also may influence gene regulation as indicated by both eQTL data and transcription factor binding data (regulomeDB category 1f [24]   Epigenomics Project brain tissue samples. In these same brain samples, the intergenic 13q21 region implicated in the paragraph recall analyses interacts with the promoter region of TDRD3. This same region also interacted with the PCDH20 gene region in nonbrain tissue samples. Using S-PrediXcan [25], after Bonferroni correction for multiple testing we identified a single gene (POC1A) whose expression in the putamen was negatively associated with VL (Z = −5.02; p = 5.04 × 10 −7 ) whereas no significant associations were observed for VSTM (Supplementary Table S7 and Supplementary Fig. 20).
Table note: All models were adjusted for sex, age, and population substructure and results reflect analyses in participants of European ancestry without dementia or stroke.

Finally, we tested with polygenic scores (PGSs) the overall association of VSTM (PGS VSTM ) and VL (PGS VL ) with brain activation assessed via fMRI during a working memory task in 435 healthy participants in the Clinical Brain Disorders Branch Sibling Study. The intermediate PGS VL (SNP inclusion p value < 10 −4 ) correlated negatively with activity in a right parieto-occipital cluster with a peak in BA19 (peak Z = 4.73; p FWE = 0.016; 55 voxels; MNI coordinates x = 45; y = −64; z = 10; Fig. 4). At a lower p < 0.001 (uncorrected) threshold, a symmetric cluster was significant on the left with a peak in BA39 (peak Z = 3.55; 24 voxels; MNI coordinates x = −45; y = −58; z = 13; Fig. 4). No results survived correction for multiple comparisons using the PGS VSTM.

Protein-protein interactions
We investigated protein-protein interactions with DAPPLE [26] and results are presented in Supplementary Table 12. Fourteen, 30, and 11 proteins were included in the network construction for VSTM, VL, and paragraph recall, respectively, but six, 16, and two proteins were present in direct or indirect networks, respectively. None of the network parameters were significant. In the analyses of single proteins, SYT9 and NRXN1 were significant for VSTM (p = 0.006), ZFAND5, GRIK2, and ZC3H18 were nominally significant for VL (p = 0.018-0.05), and PRLHR was nominally significant for paragraph recall (p = 0.044).

Consistency of findings with earlier studies
As our results might reflect genetic effects on more general cognitive abilities, we also show the GWAS results for visuo-spatial memory test scores in the UK Biobank sample (N = 336,881; http://www.nealelab.is/uk-biobank) and the Davies et al. (2018) [28] GWAMA results for GCA in Supplemental Table 1. Only SNPs in 3p21 showed significant association with GCA, implying that the associations of CDH18, 13q21, and 19q13.3 SNPs with VSTM and VL are not secondary to the effect of these loci on GCA or general memory processes, but may show specificity to verbal episodic memory. However, as the UK Biobank memory test has shown low test-retest reliability, these results need to be interpreted with caution [29]. Further, we examined if the top SNPs of this study were also linked with brain structure [30][31][32] and function [33] in previous GWA studies (Supplementary Table S14). We noticed that all our 3p21 top SNPs were associated with smaller intracranial volume and larger alpha oscillation during rest, and both 19q13.3 (APOE-TOMM40-APOC1) SNPs were linked with smaller volumes of the hippocampus, amygdala, and nucleus accumbens.

DISCUSSION
We studied if common genetic variants were associated with VSTM and VL in 53,637 adults without a history of stroke or dementia within the CHARGE consortium. We identified four novel loci for VSTM/VL. The top SNPs showed a wide range of functional properties in brain tissues: some were eQTLs, meQTLs, or associated with tau or amyloid accumulation in the brain, and an aggregate polygenic score for VL was associated with working memory activity in the right parieto-occipital cortex.
The first novel peak for VSTM locates at 5p14.3 and encompasses rs425724, an intronic SNP within CDH18 (aka CDH14 and CDH24) as the lead SNP. Functional effects of rs425724 remain poorly known, but Hi-C chromosomal interaction tests suggest that it may influence regulation of CDH18 expression. CDH18 is specifically expressed in the brain [34] and it belongs to the Type II classic cadherin family, which is involved in neuronal cell-adhesion [35]. Cadherins are critically important in the development of cells and synapses early in life, and in maintaining neuronal and synaptic structure in mature synapses [36]. Cadherins are also suggested to play a central role in synaptic plasticity in general, and in long-term potentiation (LTP), the molecular basis of learning and memory, in particular [37,38]. Cadherin-related alterations in LTP have been demonstrated in pharmacological, gene knockout, and RNAi experiments [39,40], but little is known about the role of genomic variation in cadherin genes in memory processes in humans. We report that rs425724 may affect specifically processing of verbal information. Interestingly, a variant in CDH13 associated with verbal but not spatial working memory in patients with ADHD [41], pointing again towards modality specificity. Some studies exist linking cadherin genes with neurodevelopmental outcomes (Supplement 1).
We also discovered a new locus for VL in 3p21 containing 14 SNPs in high LD in a ~300 kb region that showed significant associations with VL. Of these variants, we replicated rs4687625 and rs2015971, both intronic to NT5DC2, and rs11711421, which is intronic to STAB1. This locus harbors several genes and gene-based analyses implicated 11 genes (NT5DC2, STAB1, ITIH1, ITIH4, PBRM1, SMIM4, NEK4, GLT8D1, ITIH3, MUSTN1, and GNL3). We identified several potentially functional variants at this locus. All significant 3p21 SNPs are either intronic or exonic, are significant eQTLs and mQTLs in brain tissues, and link with brain intracranial volume [30] and alpha oscillation [33] in previous studies. Some are also considered deleterious or regulatory. Moreover, the locus is in a transcriptionally active region and, finally, SNP associations of 3p21 variants with VL colocalized with imputed expression of POC1A in the putamen. The putamen is part of a cortico-striatal loop; it receives input from different parts of the cortex and projects back to the cortex via the globus pallidus and thalamus. Traditionally it has been linked with motor control functions, but recently both neuroimaging studies [42,43] and studies on effects of focal lesions [44] have suggested an additional role in memory functions. Prior studies have associated SNPs at the 3p21 locus with various neurodevelopmental outcomes, such as GCA [28] and schizophrenia [45], but the causal variant(s) are not known and, in the studies with functional analyses, no specific gene has been conclusively shown to account for the many association findings at this locus (Supplement 1). Interestingly, a recent study reported an association between the GLT8D1 variant rs6795646 and working memory in healthy Chinese persons [46].
We observed a third novel locus in the intergenic region in 13q21 in meta-analyses of discovery sample cohorts with paragraph recall tests to measure VSTM. The lead SNP was rs9528369 and the locus harbors 36 other significant SNPs. Again, the causal SNP or gene underlying this association is not known, but earlier studies point towards influences of this locus on language processing [47] and educational attainment [48] (Supplement 1). In line with this, rs9528369 showed no association with visuo-spatial memory test performance in the UK Biobank sample (http://www.nealelab.is/uk-biobank). Functional influences of this locus remain poorly understood, but rs9528369 was a mQTL in the dorsolateral prefrontal cortex and Hi-C analyses of this study showed chromatin-chromatin interactions with the promoter region of TDRD3 in brain tissue and PCDH20 in other tissues. TDRD3 is part of the TOP3beta-TDRD3-FMRP complex, and TOP3beta deletion was recently linked with schizophrenia, cognitive impairment, and learning difficulties [49], while lack of FMRP causes the Fragile X syndrome characterized by severe learning deficits and mental retardation.
In line with Debette et al. [7] in the GWAS for long-term verbal memory, we showed that rs4420638 in the APOE-TOMM40-APOC1 locus at 19q13.3 is associated consistently with VSTM, especially paragraph recall, and with overall VL and visually presented VL test scores. Also, rs6857 associated with VL. It is near PVRL2, is located 30 kb downstream from rs4420638, and is in LD with rs4420638. Both significant SNPs are located near a transcriptionally active region and associate with tau accumulation and with the size of memory-relevant regions (e.g., hippocampus) [31,32]. Prior studies have linked many SNPs in this locus with a variety of cognitive outcomes and dementias although not previously with VSTM or VL in cognitively normal adults (Supplement 1) [6,7,16]. These various signals may merely reflect an impact of genetic variation at the APOE locus or suggest that additional genes in this region are involved in episodic memory, but this distinction requires functional studies; the strong LD in this region precludes further conclusions based solely on genetic association studies. We also showed gene-level associations and significant enrichment with genes expressed widely in the brain, especially in the cerebellum and the frontal cortex for VSTM, and the cerebellum and striatal nuclei for paragraph recall, a pattern that parallels one shown recently for GCA [28]. Gene-based analyses implicated AGXT2 and CALN1 for VL, while analyses of protein-protein interactions implicated synaptic proteins previously associated with Alzheimer disease biology, SYT9 and NRXN1 for VSTM; ZFAND5, GRIK2, and ZC3H18 for VL; and PRLHR for paragraph recall. There is some evidence that AGXT2, CALN1, NRXN1, and GRIK2 may influence neurodevelopmental outcomes (Supplement 1).
Previous fMRI studies on short-term word list recall associated performance with a network of brain regions including the medial temporal lobe, superior temporal gyrus, medial and inferior parietal cortex, and dorsolateral prefrontal cortex [50,51]. Within this network, joint analysis of episodic and working memory tasks observed the involvement of the prefrontal cortex, supplementary motor area, and bilateral ventral posterior parietal cortex spanning into the extrastriate cortex [52]. Consistently, here we show that a polygenic score for VL associated with activity in the posterior parietal and extrastriate cortex during the N-back fMRI task. This association was not due to years of education. This visual association area is active during recognition memory [53,54]. The association had a negative direction, consistent with N-back performance data which correlate negatively with frontoparietal network activity in healthy individuals [55,56].
The heritability estimates of ~6% for VSTM and 18% for VL are in line with a recent phenome-wide study that showed SNP-based estimates between 6% and 11% for visuo-spatial memory in the UK Biobank [57]. Moreover, our estimates are in line with a twin study showing lower estimates for VSTM than for VL [4]. In our study, VSTM and VL showed strong positive genetic correlations with each other and with GCA in adulthood, completion of college, and years of schooling, consistent with recent findings from the UK Biobank [58]; and VSTM with childhood GCA and VL with anorexia nervosa and father's age at death. VSTM also showed negative genetic correlation with coronary artery disease, in agreement with a previous study showing a negative association between a polygenic risk score for cardiovascular disease and verbal short-term memory [59]. To our knowledge, no previous studies have suggested a shared genetic background between verbal episodic memory and anorexia nervosa. However, anorexia nervosa shows positive genetic correlation with years of education and attending college [60] and children born to mothers with anorexia nervosa have shown increased working memory capacity [61].
There are limitations to our study. Heterogeneity in the testing methods and phenotypes across cohorts may have hindered our ability to find associations. Since the majority of the samples (91.2% for VSTM and 93.3% for VL) were imputed against the HapMap2 reference panel, resulting in ~2.5 million SNPs in the meta-analyses, re-analyses with higher resolution genotyping are warranted. Moreover, despite reporting GWAMA results of the largest sample with VSTM and VL, our study is still underpowered to detect all genomic variation related to verbal episodic memory and larger studies are needed. Finally, as VSTM and VL showed strong genetic correlation with GCA, it is possible that our results reflect genomic influences on GCA. However, there are several lines of evidence against this: of several cognitive abilities, memory has shown the largest unique genetic variance [62], adjusting for educational attainment only marginally altered our results, and finally, of our lead SNPs only those in a highly pleiotropic region at 3p21 were implicated in the recent GWAS for GCA [28].
To sum up, we report the results of the largest GWAMA of verbal episodic memory. We show novel genome-wide significant associations between common SNPs in four loci, CDH18, 3p21, 13q21, and 19q13.3, and VSTM and VL, and link combined polygenic variation for VL with brain activity in the parieto-occipital cortex during a working memory task. Whereas many SNPs in these loci, especially in 3p21 and in 19q13.3, have been linked to other neurocognitive outcomes and show functional significance and associations with brain structure and function, their exact biological role needs to be studied further. We also show moderate SNP-based heritability and high genetic correlation of these memory traits with GCA, as well as with coronary artery disease and anorexia nervosa, suggesting some shared biology. These results improve our understanding of the biology underlying learning and memory and could lead to improved risk stratification scores and new drug targets for preserving memory, and preventing or treating dementias.

ONLINE METHODS

Participants
This study comprised 37 cohorts and 53,637 adult participants (age > 18 years) of European descent brought together by the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) consortium. Exclusion criteria included clinical stroke and any form of prevalent dementia.
The discovery sample comprised 44,874 participants from 27 cohorts for VSTM and 28,909 participants from 22 cohorts for VL. Replication samples comprised 8763 participants (five cohorts) and 3853 participants (two cohorts) for VSTM and VL, respectively. All studies were approved by their institutional ethics review committees and all participants provided written informed consent. Characteristics of the study cohorts are shown in Supplementary information Table 1 and Supplement 1.

Phenotypes
All verbal memory tests are standardized and validated and have shown psychometrically adequate properties. Cognitive tests were administered by trained personnel following standardized protocols and blind to genetic information. To assess VSTM, cohorts administered either word list tests, e.g., the California Verbal Learning Test (CVLT), or paragraph tests, e.g., the Paragraph/Story recall test in the Wechsler Memory Scale (WMS) test battery, with immediate recall (Supplement 1 and Supplementary information Table 3). In all tests, participants were asked to recall as many words or story elements as possible immediately after their presentation.
In addition, some of the word list tests, e.g., CVLT, RAVLT, and CERAD, included assessment of VL. In these tests, the recalled material was presented, either orally or visually, and recalled more than once, hence the tests are tapping into the ability to learn across trials. In these tests, the first round of recall was also used in the VSTM analyses. Thus, these cohorts contributed both to the VL meta-analyses and to the VSTM meta-analyses.
We decided a priori to run meta-analyses combining all cohorts with verbal episodic memory tests with immediate recall (VSTM) and another set of meta-analyses across cohorts that administered tests of verbal learning with immediate recall (VL). Following Debette et al. [7] we also ran additional meta-analyses combining only the cohorts that administered similar tests. In these meta-analyses, we combined cohorts with word list tests with immediate recall (VSTM word list), paragraph tests with immediate recall (VSTM paragraph recall), verbal learning tests with orally presented material (VL orally presented words), and finally, verbal learning tests with visually presented words (VL visually presented words).

Genotyping, QC, and imputation
Genome-wide genotyping was conducted in each cohort on several platforms following manufacturer protocols. Quality control was performed independently for each study. In addition, each group performed genotype imputation with appropriate software using the HapMap Phase II release 22 reference panel (70% of the cohorts) or 1000 Genomes, Phase 1, Release v3 panel.
To harmonize the datasets, we updated the SNP IDs in those cohorts with HapMap Phase II imputation to match the 1000 Genomes, Phase 1, Release v3 panel (hg19) using the LiftOver tool. Imputation quality scores for each SNP were obtained from IMPUTE ("proper_info") or MACH ("rsq_hat"). Details on the genotyping are presented in Supplementary Information Table 2.
Cohort-level genome-wide association analyses
Each cohort applied multiple linear regressions with additive genetic effect models to test for phenotype-genotype association using ~2.5 million genotyped and/or imputed autosomal SNPs (cohorts with HapMap II imputation) and 10-12 million SNPs in cohorts with 1000 Genomes, Phase 1 imputation. In our primary model, we adjusted for sex, age, population substructure, and study-specific covariates if deemed appropriate, such as clinical center for multi-center cohorts. Furthermore, in family-based studies we fitted familial relationships, if necessary. In the secondary model, we adjusted for primary model covariates and educational attainment.

Meta-analyses and detection of genomic risk loci
We performed quality control of the cohort-level summary statistics before the meta-analyses with the QCGWAS R package, version 1.0-8 [63], in the cohorts with HapMap II imputed data and EasyQC version 9.0 [64] in the cohorts with 1000 Genomes imputed data. We conducted the meta-analyses using METAL software [21]. We used the sample-size weighting and fixed effect model approach. We ran meta-analyses first separately in the discovery and replication samples and then in the combined sample including both discovery and replication cohorts. At the meta-analysis stage, we filtered out SNPs with low minor allele frequency (MAF <1%), poor imputation quality (proper_info <0.4 for IMPUTE and rsq_hat <0.3 for MACH), or small sample size in the meta-analyses (N < 4000). We applied genomic control correction. A threshold of p < 5 × 10 −8 was pre-specified as genome-wide significant, while a threshold of p < 1 × 10 −6 was considered suggestive of genome-wide significance. We used lambda values and quantile-quantile (Q-Q) plots of observed versus expected -log10(P value) to examine the genome-wide distribution of P values for signs of excessive false positive results. Genomic inflation factors are shown in Supplementary Information Table 4.
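The sample-size weighting scheme can be illustrated with a minimal sketch. It assumes the commonly described form of this approach: each cohort's two-sided p value is converted to a signed Z-score, weighted by the square root of the cohort sample size, and combined. The function names and the input layout are hypothetical and chosen for illustration only; this is not the METAL source code, and it omits allele alignment, filtering, and genomic control correction of the input statistics.

```python
from math import sqrt
from statistics import NormalDist, median

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, SD 1)


def sample_size_weighted_meta(cohorts):
    """Combine per-cohort results for one SNP.

    cohorts: iterable of (p_value, direction, n), where direction is +1 or -1
    for the sign of the effect relative to a common reference allele.
    Returns (z_meta, p_meta).
    """
    weighted_sum = 0.0
    weight_sq_sum = 0.0
    for p_value, direction, n in cohorts:
        # Convert the two-sided p value to a signed Z-score.
        z = -STD_NORMAL.inv_cdf(p_value / 2.0) * (1 if direction > 0 else -1)
        w = sqrt(n)  # weight: square root of the cohort sample size
        weighted_sum += w * z
        weight_sq_sum += w * w
    z_meta = weighted_sum / sqrt(weight_sq_sum)
    p_meta = 2.0 * STD_NORMAL.cdf(-abs(z_meta))
    return z_meta, p_meta


def genomic_inflation(z_scores):
    """Lambda: median observed chi-square divided by the expected median (1 df)."""
    expected_median = STD_NORMAL.inv_cdf(0.75) ** 2  # approximately 0.455
    return median(z * z for z in z_scores) / expected_median
```

Because only p values, effect directions, and sample sizes enter the calculation, this scheme tolerates cohorts whose memory tests are scored on different scales, which is the motivation given above for choosing it.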
We applied FUnctional Mapping and Annotation of genetic associations (FUMA) [22] with default values to detect individual significant SNPs (p < 5 × 10 −8 and independent of other genome wide significant SNPs at r 2 < 0.6) and corresponding genomic risk loci (independent significant SNPs with r 2 ≥ 0.1 and distance <250 kb are assigned to the same genomic risk locus) based on the meta-analysis results.
We also report associations with visuo-spatial memory test scores (variable #399, "Number of incorrect matches in round") in the UK Biobank sample (N = 336,881; http://www.nealelab.is/uk-biobank) and with GCA in Davies et al. [28] for those SNPs showing at least suggestively significant results (p < 5 × 10 −6 ) in our discovery cohort.
Additionally, we tested if the top SNPs reaching genome-wide significance associated with i) methylation levels in the dorsolateral prefrontal cortex (DLPFC) in the participants of the ROSMAP cohort (N = 322) and ii) brain amyloid and tau burden in a sample of 183 persons from the Framingham Heart Study (FHS) Third Generation cohort (mean age 46 ± 8 years, 44% women) who underwent positron emission tomography (PET) imaging (see Supplement 1 for methods).
Gene-based, gene-set, and gene property analyses
We performed gene-based association analysis with MAGMA (v1.6) [72] with default settings as implemented in FUMA [22]. SNPs were assigned to protein coding genes obtained from Ensembl build 85. We applied Bonferroni correction and genome-wide significance was set at 2.777 × 10 −6 (0.05/18,007).
We also performed MAGMA (v1.6) [72] competitive gene-set analysis, using the results of the gene-based analyses, to examine whether genes in a gene-set are more strongly associated with VSTM and VL than other genes. A total of 10,655 gene sets (curated gene sets: N = 4738, GO terms: N = 5917) from MsigDB v6.1 [73] were used. We applied Bonferroni correction and genome-wide significance was set at 4.69 × 10 −6 (0.05/10,655).
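The two Bonferroni thresholds quoted above follow directly from dividing the family-wise alpha of 0.05 by the number of tests. A short sketch reproduces the arithmetic; the helper name `bonferroni_threshold` is hypothetical and used only for illustration.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


# Gene-based analysis: 18,007 protein-coding genes were tested.
gene_based = bonferroni_threshold(0.05, 18007)

# Gene-set analysis: 10,655 gene sets were tested.
gene_set = bonferroni_threshold(0.05, 10655)

print(f"gene-based threshold: {gene_based:.3e}")
print(f"gene-set threshold:   {gene_set:.3e}")
```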
In addition, we performed MAGMA tissue expression analysis as implemented in FUMA with default settings and GTEx v7 gene expression data. This test examines the (positive) relationship between highly expressed genes in a specific tissue and genetic associations with those phenotypes showing significant genes (VSTM, VL, and VSTM tests with paragraph recall).

S-PrediXcan analyses
We used S-PrediXcan [25] to integrate eQTL information with GWAS summary statistics to identify genes for which genetically predicted expression levels are associated with VSTM and VL. We used expression weights derived from 13 brain tissues in the GTEx v7 database and LD information from the 1000 Genomes Project Phase 3 [74]. These data were processed with beta values and standard errors from the VSTM and VL GWAS to estimate the expression-GWAS association statistic. We used a transcriptome-wide significance threshold of p < 1.10 × 10 −6 , which is the Bonferroni-corrected threshold when adjusting for all brain tissues and genes, and visualized the colocalization (if any) with a LocusCompare plot (http://locuscompare.com/; accessed 17.5.2019).
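For orientation, the expression-GWAS association statistic is commonly written in the S-PrediXcan literature in the following approximate form. The notation here is an assumption introduced for illustration and does not come from this study: $w_{lg}$ is the prediction weight of SNP $l$ for gene $g$, $\hat\sigma_l$ is the estimated standard deviation of SNP $l$ in the LD reference panel, $\hat\sigma_g$ is the estimated standard deviation of the predicted expression of gene $g$, and $\hat\beta_l$ with $\mathrm{se}(\hat\beta_l)$ are the GWAS effect estimate and its standard error.

```latex
Z_g \;\approx\; \sum_{l \in \mathrm{Model}_g} w_{lg}\,
      \frac{\hat\sigma_l}{\hat\sigma_g}\,
      \frac{\hat\beta_l}{\mathrm{se}(\hat\beta_l)}
```

Under this form, a negative $Z_g$, such as the one reported for POC1A in the putamen, indicates that higher genetically predicted expression goes with lower trait values.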
PGS VSTM, PGS VL, and brain activity during the 2-Back working memory task
To compute the short-term memory (PGS VSTM ) and verbal learning (PGS VL ) polygenic scores, we obtained betas associating allele dose with performance for 115,414 and 57,689, respectively, linkage disequilibrium-independent (R 2 < 0.1) index SNPs. We then computed a weighted sum of the cumulative SNP effects by summing the imputation probability for the reference allele of the index SNP, weighted by the effect size of association with performance, at each independent locus across the genome, as described elsewhere [75]. We analyzed fMRI data of 435 healthy adult (≥18 years) volunteers of Caucasian ancestry who participated in the Clinical Brain Disorders Branch Sibling Study of schizophrenia (Supplement 1). Participants were genotyped according to standard procedures. In the PGS, we included SNPs at whole-genome (p = 5 × 10 −8 ), intermediate (p = 10 −4 ), and nominal significance levels (p = 0.05). Participants performed the N-back working memory (WM) task during fMRI (block design version: 2-Back vs. 0-Back, lasting 240 s). This task is widely used in imaging genetics studies [76][77][78]. fMRI data collection, preprocessing, and analysis followed standard procedures (Supplement 1) [79]. We used SPM12 to perform multiple regression analyses using PGSs as predictors. We report results surviving a p FWE < 0.05 threshold at whole brain level masked by task activity with a minimum cluster extent of 10 voxels (Supplement 1). Results are illustrated at p < 0.001 (uncorrected) in Fig. 4.
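The weighted-sum construction of the polygenic score can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the function name, the dictionary-based inputs, and the SNP identifiers are hypothetical, the index SNPs are assumed to be LD-independent already, and the dosage is taken to be the expected count (0 to 2) of the allele to which the beta refers.

```python
def polygenic_score(dosages, betas, p_values, p_threshold):
    """Weighted sum of allele dosages for index SNPs passing a p-value threshold.

    dosages:  dict mapping SNP id -> expected allele dosage (0.0 to 2.0)
    betas:    dict mapping SNP id -> GWAS effect size for the same allele
    p_values: dict mapping SNP id -> GWAS p value
    """
    score = 0.0
    for snp, beta in betas.items():
        # Skip SNPs above the inclusion threshold or missing in this person.
        if p_values[snp] < p_threshold and snp in dosages:
            score += beta * dosages[snp]
    return score


# Toy example with three made-up index SNPs for one participant.
betas = {"snpA": 0.1, "snpB": -0.2, "snpC": 0.3}
p_values = {"snpA": 1e-9, "snpB": 5e-5, "snpC": 0.03}
dosages = {"snpA": 2.0, "snpB": 1.0, "snpC": 0.5}

# The three inclusion levels described in the text.
for threshold in (5e-8, 1e-4, 0.05):
    print(threshold, polygenic_score(dosages, betas, p_values, threshold))
```

Lowering the stringency of the inclusion threshold adds more SNPs with smaller individual effects, which is why scores at several thresholds are compared rather than a single one.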

Protein-protein interactions with DAPPLE
We investigated a possible causal role for genes at the loci associated with VSTM and VL by searching for physical connections between proteins encoded by genes within these loci. The hypothesis is that causal genetic variants are likely to affect common mechanisms and that these mechanisms may be revealed by protein-protein interaction (PPI) networks. We performed the analyses using the Disease Association Protein-Protein Link Evaluator (DAPPLE) [26] in GenePattern. DAPPLE searches for PPIs in the InWeb database and assigns each a probabilistic score. The InWeb database collects PPI data reported in the literature from numerous sources, including IntAct, Reactome, the Molecular Interaction Database (MINT), the Biomolecular Interaction Network Database (BIND), and the Kyoto Encyclopedia of Genes and Genomes (KEGG). DAPPLE constructs PPI networks in which proteins are nodes and interactions in the InWeb database are edges connecting the nodes. Input SNPs are those associated with memory phenotypes at p value < 0.10 and minor allele frequency >0.05. Genes harboring any of the input SNPs, in LD (r² > 0.5) with the input SNPs, or located within the closest recombination hotspots plus 50 kb are identified. Proteins coded by these genes are used to construct an interaction network. Four parameters are estimated for the observed network: (1) the number of edges in the direct network; (2) the average number of proteins with which each seed protein directly interacts; (3) the average number of proteins with which each seed protein indirectly interacts; (4) the average number of seed proteins bound by common interactor (CI) proteins. The distributions of these estimates are then enumerated via 20,000 permutations by randomly reassigning proteins of the same binding degree (i.e., the total number of interactions a protein has in the InWeb database) as the proteins in the observed network to each node.
Individual seed proteins are then scored based on their presence in direct and indirect networks. The significance of these scores is evaluated in the same permutation procedure and Bonferroni-corrected for the number of possible candidate proteins from each locus to prioritize genes (pcorr < 0.05).
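The logic of a degree-matched permutation test can be sketched as follows for the first network parameter only (the number of direct edges among seed proteins). This is a simplified illustration and not DAPPLE's implementation: it replaces each seed with a randomly drawn protein of the same binding degree instead of reshuffling labels across the whole network, and all function names and the toy network are hypothetical.

```python
import random
from collections import defaultdict


def count_direct_edges(edges, proteins):
    """Number of edges whose two endpoints are both in the given protein set."""
    chosen = set(proteins)
    return sum(1 for a, b in edges if a in chosen and b in chosen)


def degree_matched_permutation_p(edges, seeds, n_permutations=1000, rng=None):
    """Empirical p value for the observed number of direct edges among seeds.

    edges: iterable of unique undirected (protein_a, protein_b) pairs
    seeds: proteins encoded by genes at the associated loci
    """
    rng = rng or random.Random(0)
    edges = list(edges)

    # Binding degree = total number of interactions per protein.
    degree = defaultdict(int)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    for s in seeds:
        degree[s] += 0  # make sure seeds absent from the network get degree 0

    by_degree = defaultdict(list)
    for protein, k in degree.items():
        by_degree[k].append(protein)

    observed = count_direct_edges(edges, seeds)
    hits = 0
    for _ in range(n_permutations):
        # Swap each seed for a random protein with the same binding degree.
        permuted = [rng.choice(by_degree[degree[s]]) for s in seeds]
        if count_direct_edges(edges, permuted) >= observed:
            hits += 1
    # The +1 correction keeps the empirical p value strictly above zero.
    return observed, (hits + 1) / (n_permutations + 1)


# Toy network: a triangle of seed proteins plus a four-protein ring.
toy_edges = [("A", "B"), ("B", "C"), ("C", "A"),
             ("D", "E"), ("E", "F"), ("F", "G"), ("G", "D")]
observed, p_value = degree_matched_permutation_p(toy_edges, ["A", "B", "C"])
print(observed, p_value)
```

Matching on binding degree is meant to prevent highly connected proteins from producing apparent enrichment simply because they have many known interactions.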

Genetic correlation analyses
We used LD score regression (LDSC) as implemented in LD Hub [27] to estimate the degree of overlap between the polygenic architecture of the traits. We estimated genetic correlations between verbal episodic memory traits and traits that may be phenotypically linked with memory (categories: Neurological, Psychiatric, Brain volume, Aging, Cognitive, Education, Cardiometabolic, and Glycemic). In these analyses, we excluded the American cohorts, as their consent precluded the use of their data to examine an association with education. Therefore, the sample size was 26,977 in the analyses of genetic correlation with VSTM and 25,180 in the analyses of genetic correlation with VL. We used FDR correction to account for multiple comparisons. Heritability z-scores were 4.9 and 7.4 for VSTM and VL, respectively, suggesting that the datasets for both traits are suitable for LDSC analyses.
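The FDR correction across the set of genetic correlations can be illustrated with a minimal sketch. The text does not name the specific FDR procedure, so the Benjamini-Hochberg step-up method is assumed here, and the p values are invented for illustration.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p values for four genetic correlations (not results from the study).
example_p = [0.01, 0.04, 0.03, 0.005]
for p, q in zip(example_p, benjamini_hochberg(example_p)):
    print(f"p = {p:.3f}, FDR-adjusted = {q:.3f}")
```

A genetic correlation would then be considered significant when its adjusted value falls below the chosen FDR level, for example 0.05.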

CODE AVAILABILITY
Code for the primary statistical analyses can be obtained from the corresponding author.