Introduction

The main pathogenic feature of Alzheimer’s disease (AD) is the accumulation of amyloid beta (Aβ) peptide in brain intracellular compartments.1 Elements involved in the transport and recycling of the amyloid precursor protein (APP) have been considered targets for genetic studies to find AD risk factors. By performing association studies between genetic variants in genes of the endocytic pathway and AD, Rogaeva et al.2 first identified 19 common SNPs and two haplotypes of the neuronal sortilin-related receptor SORL1 to be associated with late-onset Alzheimer’s disease (LOAD). SORL1 is a member of the retromer complex that directly binds to APP and differentially regulates its sorting into endocytic or recycling pathways. Despite some initial mixed results,3, 4 recent studies provide supporting evidence for the association of SORL1 with AD risk.5, 6, 7, 8 Nonetheless, clear pathogenic variants that cause SORL1 malfunctioning have not yet been identified. Recently, Vardarajan et al.9 suggested that coding variants in SORL1 may be involved in LOAD pathology. They described up to 17 exonic variants significantly associated with the disease (Padjusted=0.008) in a family-based Caribbean-Hispanic data set. Nicolas et al.10 listed 24 rare variants in SORL1 with a cumulative effect of an odds ratio (OR)=5.03 (P=7.49 × 10−5) in a French cohort of early-onset Alzheimer's disease (EOAD) patients. However, these findings have not been replicated in independent data sets. Considering that genetic variants in risk factors, and particularly rare variants, tend to be cohort specific, it is imperative to analyze this gene across different populations.
The goal of this study was to evaluate and replicate the presence of non-synonymous variants in the SORL1 gene that may increase the risk for AD in three different samples of the European American population: sporadic early-onset Alzheimer’s disease (sEOAD), sporadic late-onset Alzheimer’s disease (sLOAD) and familial LOAD (fLOAD; Table 1).

Table 1 Clinical data of the three demographic groups studied

Materials and methods

All samples included in this analysis were recruited by the Charles F. and Joanne Knight Alzheimer's Disease Research Center (Knight-ADRC) and the National Institute on Aging Genetics Initiative for Late-Onset Alzheimer’s Disease (NIA-LOAD). sEOAD samples came from the Memory and Aging Project (MAP), part of the Knight-ADRC at Washington University School of Medicine (WUSM). The Institutional Review Board at the WUSM in Saint Louis approved the study. Research was carried out in accordance with the approved protocol. Written informed consent was obtained from participants and their family members by the Clinical and Genetics Core of the Knight-ADRC. The approval number for the Knight-ADRC Genetics Core family studies is 201104178.

Genetic data

sEOAD samples were genotyped for 51 variants: the 24 variants reported by Nicolas et al.10 and 27 non-synonymous variants with MAF<5% based on the EVS database (http://evs.gs.washington.edu/EVS/) (Supplementary Table S1). Genotyping was performed using MassARRAY (Agena Biosciences) or KASP assays (LGC Genomics, Teddington, UK).

sLOAD samples were genotyped using the Human Exome BeadChip v1.0 (Illumina, San Diego, CA, USA). Stringent quality controls for exome array calling were performed. Genotype calling was carried out using Illumina's GenTrain version 1.0 clustering algorithm in GenomeStudio version 2011.1. Cluster boundaries were determined from the study samples, using only calls with an intensity signal >0.3. A minimum call rate of 98% was required for both SNPs and individuals; those below this threshold were excluded. SNPs that were not in Hardy-Weinberg equilibrium (P<10−6) were dropped. Pairwise genome-wide estimates of the proportion of identity-by-descent were used to test for unanticipated duplicates and cryptic relatedness.
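The SNP-level filters described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the function names are hypothetical, genotypes are assumed to be coded as minor-allele counts (0, 1, 2) with `None` for a missing call, the Hardy-Weinberg test is assumed to be run on all samples pooled, and a 1-d.f. chi-square approximation stands in for whichever test the genotyping software applied.

```python
import math

def call_rate(genotypes):
    """Fraction of non-missing genotype calls (None marks a missing call)."""
    called = sum(g is not None for g in genotypes)
    return called / len(genotypes)

def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """P-value of a 1-d.f. chi-square test for Hardy-Weinberg equilibrium.

    n_aa, n_ab, n_bb are the observed counts of the three genotypes.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of the first allele
    q = 1.0 - p
    if p == 0 or q == 0:
        return 1.0  # monomorphic site: nothing to test
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square distribution with 1 d.f.
    return math.erfc(math.sqrt(chi2 / 2))

def passes_qc(genotypes, min_call_rate=0.98, hwe_threshold=1e-6):
    """Apply the call-rate (>=98%) and HWE (P>=1e-6) thresholds to one SNP."""
    if call_rate(genotypes) < min_call_rate:
        return False
    called = [g for g in genotypes if g is not None]
    counts = [called.count(k) for k in (0, 1, 2)]
    return hwe_chi2_pvalue(*counts) >= hwe_threshold
```

For rare variants an exact test is generally preferable to the chi-square approximation, because expected homozygote counts are very small.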

fLOAD samples were sequenced using either whole-exome sequencing (WES, n=1177) or whole-genome sequencing (WGS, n=59). Exome libraries were prepared using Agilent’s SureSelect Human All Exon kits V3 and V5 (Agilent, Santa Clara, CA, USA). Both WES and WGS samples were sequenced on a HiSeq2000 (Illumina) with paired-end reads, with a mean depth of coverage of 50–150 × for WES and 30 × for WGS. Variant calling followed the GATK 3.4 Best Practices (https://www.broadinstitute.org/gatk/). Reads were aligned against the UCSC hg19 reference genome. WES and WGS samples were aligned and called separately, following GATK’s recommendations; back calling was performed to ensure that the same variants were called in both WES and WGS samples. Variant calling was restricted to the regions targeted by Agilent’s exome capture kit, padded by 100 bp. Only variants and indels that fell within the 99.9 tranche, with quality ≥30, read depth ≥10 and missingness ≤5%, and only genotypes with a genotype quality ≥20 and a depth (DP) ≥6, were kept for analysis. Variants with differential missingness between cases and controls or between the WES and WGS data sets, and variants out of Hardy–Weinberg equilibrium (P<1 × 10−6), were removed from the analysis. In addition, individuals whose sex was discordant with that reported in the clinical database were removed from the data set. Finally, individual and familial relatedness was corroborated using PLINK1.9 (https://www.cog-genomics.org/plink2/ibd) and an existing GWAS data set for these individuals. Variants were annotated for functional effect and population frequency with SnpEff.11 All SORL1 variant annotations refer to the sequence with Accession Number NM_003105.5. The data and phenotypes used in this study have been submitted to NIAGADS – The NIA Genetics of Alzheimer’s Disease Storage Site 'https://www.niagads.org/' under accession number NG00051.
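The hard filters listed above (variant quality, read depth, missingness, and per-genotype GQ and DP) can be sketched in Python as follows. This is only an illustration under stated assumptions: the `GenotypeCall` class and function names are hypothetical, "read depth ≥10" is interpreted here as a site-level depth, missingness is assumed to be computed after low-quality genotypes are set to missing, and the tranche filter is omitted because it depends on GATK's recalibration output.

```python
from dataclasses import dataclass
from typing import List, Optional

# Thresholds taken from the text; their order of application is an assumption.
MIN_QUAL, MIN_SITE_DP, MAX_MISSING = 30, 10, 0.05
MIN_GQ, MIN_GT_DP = 20, 6

@dataclass
class GenotypeCall:
    allele_count: int  # copies of the alternate allele (0, 1 or 2)
    gq: int            # genotype quality
    dp: int            # read depth supporting this genotype

def mask_low_quality(calls: List[GenotypeCall]) -> List[Optional[GenotypeCall]]:
    """Set genotypes failing the GQ or DP threshold to missing (None)."""
    return [c if (c.gq >= MIN_GQ and c.dp >= MIN_GT_DP) else None for c in calls]

def keep_variant(qual: float, site_dp: int, calls: List[GenotypeCall]) -> bool:
    """Return True if the variant passes quality, depth and missingness filters."""
    masked = mask_low_quality(calls)
    missingness = sum(c is None for c in masked) / len(masked)
    return (qual >= MIN_QUAL
            and site_dp >= MIN_SITE_DP
            and missingness <= MAX_MISSING)
```

Masking individual genotypes before computing missingness means that a site with many low-confidence calls is removed even when its overall quality score is high.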

Statistical analysis

Single-variant association analyses with risk for AD were performed for all data sets using PLINK1.9, including significant covariates (gender and principal components for population stratification); for the family-based data set we used DFAM. For gene-wise analysis we considered only non-synonymous variants (missense, splice site or stop modifier) with MAF<5% within a data set, using the SNP-set Kernel Association Test (SKAT).12 The fLOAD samples were additionally analyzed with the GEE Kernel Machine score test (GSKAT).13
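To make the gene-based test more concrete, the following Python sketch computes a SKAT-style variance-component statistic, Q = Σ_j w_j² S_j², where S_j is the score of variant j and w_j is the Beta(1, 25) density evaluated at its MAF. It is a simplified illustration, not the SKAT software: it ignores covariates, assumes unrelated samples, and replaces the analytic null distribution with a permutation P-value. All function names are hypothetical.

```python
import random

def beta_1_25_weight(maf):
    """Beta(1, 25) density at the MAF; gives rarer variants larger weights."""
    return 25.0 * (1.0 - maf) ** 24

def skat_q(genotypes, phenotypes):
    """SKAT-style statistic without covariates.

    genotypes: one list per sample holding minor-allele counts (0/1/2),
               one entry per variant in the gene.
    phenotypes: 1 for cases, 0 for controls, in the same sample order.
    """
    n = len(phenotypes)
    mean_y = sum(phenotypes) / n
    q = 0.0
    for j in range(len(genotypes[0])):
        column = [row[j] for row in genotypes]
        maf = sum(column) / (2 * n)
        # Score for variant j: genotype times phenotype residual, summed over samples.
        score = sum(g * (y - mean_y) for g, y in zip(column, phenotypes))
        q += (beta_1_25_weight(maf) * score) ** 2
    return q

def permutation_pvalue(genotypes, phenotypes, n_perm=1000, seed=0):
    """Empirical P-value obtained by shuffling case-control labels."""
    rng = random.Random(seed)
    observed = skat_q(genotypes, phenotypes)
    shuffled = list(phenotypes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if skat_q(genotypes, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)
```

Because each score is squared before summing, the statistic does not assume that all variants act in the same direction, which distinguishes it from simple burden (collapsing) tests.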

Results

Sporadic EOAD

Of the 48 successfully genotyped variants in the sEOAD cohort (Supplementary Table S1), only one (rs117260922:G>A, hg19 chr11:g.121367627G>A) was nominally associated with AD status (OR=3.462, Pnominal=0.043); it was found in 12 cases (n=217) and three controls (n=169) (Supplementary Table S2). This variant was previously reported to be associated with LOAD risk in Hispanic families9 (P=7.68 × 10−7), but we did not find a significant association in our fLOAD data set (Supplementary Table S3). Two other variants were more frequent in EOAD cases than in controls (rs140327834:T>A, rs142884576:C>T), although the difference was not significant. Nonetheless, gene-based analysis indicated that non-synonymous variants were more frequent in EOAD cases than in controls (collapsed MAF: CA=4.13%; CO=1.55%; OR=2.66; 95%CI=0.6380–8.9850), almost reaching statistical significance (SKAT P=0.055; Table 2).
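As a rough check on the single-variant result, a crude odds ratio can be computed from the counts given above. The Python sketch below assumes that "12 cases" and "three controls" denote carriers among 217 cases and 169 controls, and it uses a simple Wald confidence interval. Because it is unadjusted for the covariates included in the reported model, it is not expected to reproduce the reported OR of 3.462 exactly. The function name is hypothetical.

```python
import math

def odds_ratio_wald(case_carriers, n_cases, control_carriers, n_controls, z=1.96):
    """Unadjusted odds ratio with a Wald 95% confidence interval."""
    a = case_carriers
    b = n_cases - case_carriers        # non-carrier cases
    c = control_carriers
    d = n_controls - control_carriers  # non-carrier controls
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the 2x2 table.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Counts for rs117260922 as stated in the text (carrier interpretation assumed).
crude_or, ci_low, ci_high = odds_ratio_wald(12, 217, 3, 169)
print(f"crude OR = {crude_or:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f})")
```

With only three carrier controls, the interval is wide, which is consistent with the association being reported as nominal only.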

Table 2 Gene-based analysis of non-synonymous and damaging SORL1 variants in each of the demographic groups studied

Sporadic LOAD

Of the 46 SORL1 variants genotyped in the sLOAD case-control sample (134 cases and 266 controls), 16 were polymorphic (Supplementary Table S3). Four variants were present more often in cases than in controls (rs139794846:A>G, rs62617129:A>G, rs143286467:A>G, rs143615238:G>A), although none of these variants was significantly associated with AD risk in this data set (P=0.2–0.6; Supplementary Table S3). The gene-based analysis of the missense variants was non-significant (OR=0.992; P=0.979), and the same result was obtained when the analysis was restricted to variants considered probably or possibly damaging by Polyphen2 (OR=0.861, P=0.999; Table 2).

Familial LOAD

Within the fLOAD data set (875 cases and 328 controls), we identified 78 polymorphic variants in SORL1 coding region, 43 of which were considered non-synonymous and among those 17 were classified as probably or possibly damaging by Polyphen2 (Supplementary Table S4). No single-variant test provided a significant association for AD risk. The combined gene-based G-SKAT analysis of the 45 coding non-synonymous variants did not find any significant association with LOAD (P=0.337), nor did the analysis of the 21 variants considered probably damaging (P=0.596; Supplementary Table S4).

Discussion

Gene-based analyses provide more power to detect association than single-variant analyses, especially when the variants have a low frequency (MAF<1%). This is supported by previous studies in which multiple independent variants in APP, PSEN1, PSEN2, APOE, TREM2, PLD3 and ABCA7 have been reported as causative14 or as increasing AD risk.15, 16, 17, 18

In this study, we observed an effect of rare variants in SORL1 similar to that in previous studies, both at the single-variant level (rs117260922:G>A) and at the gene-based level in the sEOAD cohort, adding support to the role of rare missense variants in SORL1 as risk factors for AD. Although our sEOAD sample size (217 cases and 169 controls) was smaller than that of Nicolas et al.10 (484 cases and 498 controls), we still had enough statistical power (83.4%) to replicate the original finding (gene-based OR=5.03).

Our results suggest that the effect size of SORL1 may be lower than originally reported. However, it is important to note that in this data set we performed genotyping and not sequencing; therefore, we may have missed additional variants that could affect our OR estimation and P-value.

The lack of significant findings in the sLOAD data set may be due to a combination of limited power and the fact that we examined only exome-chip variants, not sequencing data. In contrast, the lack of significant association in our fLOAD data set raises some concerns. We analyzed a very large data set containing sequencing data for familial LOAD (345 families, 1190 individuals), but we were unable to find a significant association at the gene-based level, even though we found some variants that seem to segregate in some small families. On the other hand, the different degree of association of SORL1 across populations (French,10 Caribbean-Hispanic9 and European American) reinforces the idea that the effect sizes of these low-frequency and rare variants are population specific. Therefore, single-variant analyses are not optimal for replicating these studies. Instead, resequencing the entire genes in well-matched populations is the ideal approach to determine whether these genes are truly associated with disease status and to estimate the true effect size of the association.