Beyond autosomal-dominant early-onset AD (<1% of all AD cases, onset at ≤65 years), the common complex form of AD has an estimated heritability of approximately 70%1. Using genome-wide association studies (GWAS), 75 mostly common genetic risk factors/loci have been associated with AD risk in populations with European ancestry; however, individually these common variants have low effect sizes2. Using DNA sequencing strategies, rare (allele frequency <1%) damaging missense or loss-of-function (LOF) variants in the TREM2, SORL1 and ABCA7 genes were identified to also contribute to the heritability of AD, with substantially higher effect sizes than individual GWAS hits3,4,5,6,7,8. To detect additional genes for which rare variants are associated with AD risk, it is necessary to compare genetic sequencing data from thousands of AD cases and controls. In a large collaborative effort, we harmonized sequencing data of studies from Europe and the USA and applied a multistage gene burden analysis (Fig. 1a) (for sample descriptions, see Supplementary Table 1 and Extended Data Figs. 1 and 2). We observed site-specific technical biases, since data were generated at multiple centers, using heterogeneous methods (Supplementary Table 2). To account for these batch effects, we designed and applied comprehensive quality control (QC) procedures (Methods and Supplementary Tables 3–5).

Fig. 1: Study setup and power.

a, Schematic of the study setup. The AD association of genes identified in stage 1 was confirmed in stage 2 and significance was determined by meta-analysis. Variant characteristics were investigated in a merged mega-sample rather than the meta-sample, allowing more accurate variant effect size estimates for variant categories/age-at-onset bins. The mega-sample (without exome extracts) was also used for the GWAS gene burden analysis. MTC, multiple testing correction. b, Top, number of genes (y axis) with at least a certain cumulative carrier frequency of prioritized variants (x axis), prioritized according to different deleteriousness thresholds. White box, genes with a cMAC ≥ 10 (cumulative minor allele count of ≥10 prioritized alleles identified across the 12,652 cases and 8,693 controls in the stage 1 sample) were considered to have sufficient carrier frequency to allow burden analysis. The SORL1, TREM2 and ABCA7 genes are indicated, revealing that carriers of rare damaging variants in these genes are relatively common, allowing identification in smaller sample sizes3,4,5,6,7. Bottom, power analysis for stage 1, to attain a P < 1 × 10⁻⁶, at the same scale as the top figure. For comparison, we indicate 80% power thresholds for sample sizes of 1,000 and 5,000 individuals (subsampled from stage 1). Cumulative carrier frequency and estimated effect size ranges are indicated for common variants identified to associate with AD by GWAS (green), rare-variant burdens in SORL1, TREM2 and ABCA7 identified using sequencing studies3,4,5,6,7 (grey/blue), and for rare variants observed in autosomal dominant AD (magenta). Common variants with high effect sizes (red) are not expected to exist. Genes with cMAC < 10 were not analyzed (pink). Power calculations show that aggregating more cases and controls might allow for the identification of rare variants that have a large effect on AD but for which only few carriers are observed, or of variants that have a modest/average effect on AD, for which many carriers are observed (power calculations shown in Supplementary Table 6). c, Quantile–quantile plot of P values determined in the stage 1 discovery analysis based on an ordinal logistic burden test. For each of 13,222 genes, we tested the burden of variants adhering to four variant deleteriousness thresholds, conditional on having a cMAC ≥ 10 (n = 31,204 tests). Threshold for multiple testing correction: FDR < 0.1; P value inflation: 1.046. Gene names in black indicate the deleteriousness threshold of the most significant burden test in that gene.

After sample QC, we first compared gene-based rare-variant burdens between 12,652 AD cases, consisting of 4,060 early-onset AD cases (EOAD, age at onset ≤65 years) and 8,592 late-onset AD cases (LOAD, age at onset >65 years) and 8,693 controls (stage 1 analysis; Supplementary Table 3). We detected 7,543,193 variants after sample and variant QC and annotated LOF variants with LOFTEE and missense variants with the Rare Exome Variant Ensemble Learner (REVEL) score and selected variants with a minor allele frequency (MAF) < 1% (Supplementary Table 4). We defined 4 deleteriousness thresholds by incrementally including variants with lower levels of predicted deleteriousness: LOF (n = 57,543), LOF + REVEL ≥ 75 (n = 111,755), LOF + REVEL ≥ 50 (n = 211,665) and LOF + REVEL ≥ 25 (n = 409,733), respectively. Of the 19,822 autosomal protein-coding genes, we analyzed the 13,222 genes that had a cumulative minor allele count (cMAC) ≥ 10 for the lowest deleteriousness threshold LOF + REVEL ≥ 25 (Methods); 9,168 genes for the LOF + REVEL ≥ 50 threshold, 5,694 for the LOF + REVEL ≥ 75 threshold and 3,120 genes for the LOF-only threshold (Fig. 1b). For these different deleteriousness thresholds, this analysis has an estimated power of 41, 22, 11 and 4%, respectively, to attain a signal with P < 1 × 10⁻⁶ in stage 1, assuming that for a gene, the differential variant burden between cases and controls is associated with an odds ratio (OR) of 10.0 in EOAD and 3.33 in LOAD (Supplementary Table 6). Therefore, this analysis has only the power to discover genes for which the differential variant burden is associated with a large effect size and/or genes for which large numbers of damaging variant carriers are observed (Fig. 1b). Using ordinal logistic regression, 31,204 burden tests were performed across 13,222 genes in stage 1 (single genes were tested with up to 4 thresholds). Statistical inflation of test results was negligible (𝝀 = 1.046; Fig. 1c). Of all the burden tests performed, 13 tests, covering 6 genes, indicated a differential rare-variant burden between AD cases and controls (false discovery rate (FDR) < 0.1): SORL1, TREM2, ABCA7, ATP8B4, ADAM10 and ABCA1 (Table 1).

Table 1 Stages 1 and 2 and meta-analysis AD association statistics

To confirm these signals, we applied an analysis model consistent with stage 1 to an independent stage 2 dataset, which, after QC, consisted of 3,384 cases and 7,829 controls (Supplementary Tables 3–5) and likewise showed negligible P value inflation (𝝀 = 1.016; Extended Data Fig. 3). The effect was tested in the direction observed in stage 1 (one-sided test). All genes selected in stage 1 reached P < 0.05 (Table 1, stage 2). The stage 2 effect sizes of these genes correlated with those observed in stage 1 (Pearson’s r on log odds = 0.91). We then meta-analyzed stage 1 + stage 2 across the 13 tests using a fixed-effect inverse variance method and corrected for the 31,204 tests performed in stage 1 (Holm–Bonferroni) (Table 1). This confirmed the AD association of rare damaging variants in the SORL1, TREM2, ABCA7, ATP8B4 and ABCA1 genes. The association signal of the ADAM10 gene was not significant exome-wide, presumably because prioritized variants in this gene are extremely few and rare, such that the signal can be confirmed only in larger datasets.
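
The fixed-effect inverse variance combination mentioned above can be illustrated with a minimal sketch. It assumes each stage provides a log OR and its standard error and uses a normal approximation for the pooled P value; the function name and the input values are hypothetical and are not taken from the study, whose exact implementation is described in the Supplementary Note.

```python
import math

def fixed_effect_meta(log_ors, std_errs):
    """Combine per-stage log odds ratios with inverse-variance weights."""
    weights = [1.0 / se ** 2 for se in std_errs]
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled / pooled_se
    # Two-sided P value under a standard normal approximation (assumption).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled, pooled_se, p_value

# Hypothetical stage 1 and stage 2 estimates for one gene (not study values).
beta, se, p = fixed_effect_meta([math.log(2.0), math.log(1.8)], [0.20, 0.30])
print(f"pooled OR = {math.exp(beta):.2f}, SE = {se:.3f}, P = {p:.2e}")
```

The stage with the smaller standard error receives the larger weight, so the pooled estimate lies closer to the more precisely estimated stage.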

Strikingly, most of these genes also map to GWAS loci (SORL1, TREM2, ABCA7, ABCA1 and ADAM10). This led us to perform a focused analysis on GWAS loci, aiming to identify potential driver genes. To maximize statistical power, we merged the full exomes from the stage 1 and stage 2 samples into one mega-sample, again with negligible P value inflation (𝝀 = 1.025; Extended Data Fig. 4). We interrogated genes that were previously prioritized to drive the AD association in the 75 loci identified in the most recent GWAS2 (Supplementary Table 7 and Methods). In 67 genes, we observed sufficient prioritized variants (cMAC ≥ 10) to test the burden signal in at least 1 deleteriousness category (a total of 187 tests). In addition to the genes mentioned above, our analysis indicated a suggestive signal of increased AD risk in RIN3, CLU, ZCWPW1 and ACE (FDR < 0.05) (Table 2 and Supplementary Table 8); these signals will have to be confirmed in a larger dataset. Nevertheless, the AD associations in these genes persisted when focusing on the burden of only the very rare variants (MAF < 0.1%), suggesting that the rare-variant burden is not in linkage with, and thus independent from, the GWAS sentinel variant.

Table 2 GWAS-targeted analysis in a mega-dataset without exome extracts

Together, the newly associated genes provide additional evidence for a central role for APP processing, lipid metabolism, amyloid-β (Aβ) aggregation and neuroinflammatory processes in AD pathophysiology. Like ABCA7, ATP8B4 encodes a phospholipid transporter. Rare variants in this gene have been associated with the risk of developing systemic sclerosis, an autoimmune disease9. In the brain, ATP8B4 is predominantly expressed in microglia. Interestingly, GWAS indicated a potential association of ATP8B4 with AD2, mainly through the rare missense variant that was most recurrent in our study (G395S). Of note, the OR point estimate for ATP8B4 LOF variants was close to 1, allowing for the possibility that the missense variants that drive the ATP8B4 association do not depend on a LOF effect. ABCA1 also encodes a phospholipid transporter; it lipidates apolipoprotein E (APOE)10 and poor ABCA1-dependent lipidation of APOE-containing lipoprotein particles increases Aβ deposition and fibrillogenesis11. In line with this, the rare N1800H LOF variant in ABCA1 was previously associated with low plasma levels of APOE and evidence suggested an association with increased risk of AD and cerebrovascular disease12. The α-secretase ADAM10 plays a major role in non-amyloidogenic APP metabolism13. Evidence for the AD association of rare variants in ADAM10 has remained suggestive until now: two rare missense variants in ADAM10 were reported before to incompletely segregate with LOAD in a few families14 (these variants did not associate with AD in our study; Supplementary Data) and a nonsense variant in the ADAM10 gene segregated with AD but in a small pedigree15. RIN3 has been associated with endosomal dysfunction and APP trafficking/metabolism16,17. CLU (also known as APOJ) affects Aβ aggregation and clearance18 and ACE is suggested to have a role in Aβ degradation19. Thus far, the role of the histone methylation reader ZCWPW1 is unclear.

To better comprehend how these genes associate with AD, we analyzed the characteristics of rare damaging variants that contributed to the burden using the mega-sample (Fig. 2 and Table 3). For damaging variants in most genes, we observed increased carrier frequencies in younger cases and larger effect sizes were associated with an earlier age at onset (P = 0.0001) (Supplementary Table 9 and Extended Data Fig. 5). Yet the variants also contributed to an increased risk of LOAD (Fig. 2a,b and Table 3). The largest effect sizes were measured for LOF variants in SORL1, ADAM10, CLU and ZCWPW1; carriers of such variants had the lowest median age at onset, implying a key role for these genes in AD etiology (Table 3 and Extended Data Fig. 6). Moderate variant effect sizes were observed for LOF variants in TREM2, ABCA1 and RIN3, while the smallest variant effects were observed in ABCA7, ATP8B4 and ACE (Fig. 3 and Table 3).

Fig. 2: Characterization of gene-specific variant features based on the mega-sample.

For all variant features, we considered the deleteriousness threshold that provides the most evidence for AD association in the meta-analysis. Variant features were investigated in a merged mega-sample (n = 31,905) instead of the meta-sample because this allows for increased accuracy for estimations of variant effect sizes for each variant category/age-at-onset bin (Table 3, refined burden). a, Carrier frequency according to age at onset. A carrier carries at least one damaging variant in the considered gene. b, ORs according to age at onset. The effect size significantly decreased with age at onset for SORL1, TREM2, ABCA7, ABCA1 and ADAM10 (after multiple testing correction; Supplementary Table 9). c, ORs according to variant frequency. The rareness of variants in SORL1 was significantly associated with the effect size (Supplementary Table 11). d, cMAC by variant frequency: the stacked total number of cases (dark) and controls (light) that carry gene variants with allele frequencies as observed in the mega-sample. The numbers above the bars indicate the number of contributing variants. Whiskers: 95% CI. Genes in black: genes identified to significantly associate with AD in the meta-analysis; gray: genes not significantly associated with AD in the meta-analysis; blue: genes identified by the targeted GWAS analysis, these were not significantly associated with AD in the meta-analysis.

Table 3 Mega-analysis: carrier frequency, effect sizes, median age at onset and attributable fraction

Fig. 3: ORs according to age at onset and variant pathogenicity.

ORs for LOF (red) and missense (yellow) variants as observed in the mega-sample (n = 31,905). Case/control OR (square, 95% CI), EOAD OR (triangle pointing upward), LOAD OR (triangle pointing downward). Missense variants in the considered gene appertained to the variant deleteriousness threshold that provides the most evidence for its AD association (Table 3, refined). The LOF burden effect size was significantly larger than the missense burden effect size in SORL1, and we observed similar trends in ABCA7 and ABCA1 (Supplementary Table 11). Of note, for ZCWPW1 only the burden of the LOF variants was significantly associated with AD; missense variants are shown for reference purposes (REVEL > 25). Grey: gene was not significantly associated with AD in the meta-analysis.

Extremely rare variants contributed more to large effect sizes than less rare variants (P = 0.03; Supplementary Table 10). Indeed, for SORL1, the variants with the lowest variant frequencies had the largest effect sizes (Fig. 2c and Supplementary Table 11) and damaging variants in ADAM10, CLU and ZCWPW1 were all extremely rare (Fig. 2d). Conversely, we observed that rare but recurrent variants contributed to the AD association of TREM2, ABCA7, ATP8B4 and RIN3 (Fig. 2d). The effect sizes of rare coding variant burdens were large compared to the effect sizes of the GWAS sentinel SNPs (Supplementary Tables 7 and 8). Up to 18% EOAD and 14% LOAD cases carried at least 1 predicted damaging variant in 1 of the 10 genes, compared to 9% of the controls (Supplementary Table 12). The fractions of EOAD cases in our sample that could be attributed to a rare variant in a specific gene ranged between 0.1 and 2.4% (approximately 2%: SORL1, TREM2, ABCA7; approximately 1%: ATP8B4, ABCA1, RIN3; and <0.5% for the remaining genes); for LOAD cases, this ranged between 0 and 1.3% (Table 3 and Extended Data Fig. 7).

We performed an age-matched sensitivity analysis to investigate possible effects from other age-related conditions, which supported a role in AD for all ten identified genes (Extended Data Fig. 8). Since APOE status was used as the selection criterion in several contributing datasets, burden tests were not adjusted for APOE-ε4 dosage; in a separate analysis we observed no interaction effects between the rare-variant AD association and APOE-ε4 dosage (Supplementary Table 13 and Methods). Also, the rare-variant burden association was not confounded by somatic mutations due to age-related clonal hematopoiesis (Supplementary Table 14).

Together, we report ATP8B4 and ABCA1 as new AD risk factors with exome-wide significance and we report suggestive evidence for the association of rare variants in the ADAM10 gene with AD risk. Furthermore, we identified RIN3, CLU, ZCWPW1 and ACE as potential drivers in GWAS loci, illustrating how analyses of rare protein-modifying variants can help identify driver genes, which GWAS alone typically cannot pinpoint20. Larger datasets will be required to further confirm these signals. Given the association of LOF variants with increased AD risk, we suggest that the GWAS risk alleles in the respective loci might also be associated with reduced activity of the gene, which will have to be evaluated in further experiments. We observed an increased burden of rare damaging genetic variants in individuals with an earlier age at onset. Nevertheless, damaging variants (including APOE-ε4/ε4) were observed in only 30% of the EOAD cases (Supplementary Table 12), suggesting that additional damaging variants are yet to be discovered (Fig. 1b). Further, the effect of structural variants such as copy number variants and repetitive sequences will need to be investigated in future analyses. The associated genes strengthen our current understanding of AD pathophysiology. When treatment options become available in the future, identification of damaging variants in these genes will be of interest to clinical practice.


All methods are described in depth in the Methods section of the Supplementary Note.

Sample processing, genotype calling and QC

We collected the exome, whole genome sequencing (WGS) or exome extract sequencing data of a total of 52,361 individuals, brought together by the Alzheimer Disease European Sequencing (ADES) consortium, the Alzheimer’s Disease Sequencing Project (ADSP)21 and several independent study cohorts (Supplementary Table 1). Exome extract samples only contained the raw reads that cover the ten genes identified in stage 1. Across all cohorts, AD cases were defined according to National Institute on Aging-Alzheimer’s Association criteria22 for possible or probable AD or according to National Institute of Neurological and Communicative Disorders and Stroke-Alzheimer’s Disease and Related Disorders Association criteria23 depending on the date of diagnosis. When possible, supportive evidence for an AD pathophysiological process was sought (including cerebrospinal fluid biomarkers) or the diagnosis was confirmed by neuropathological examination (Supplementary Table 1). AD cases were annotated with the age at onset or age at diagnosis (2,014 samples); otherwise, samples were classified as late-onset AD (366 samples). Controls were not diagnosed with AD. All contributing datasets were sequenced using a paired-end Illumina platform; different exome capture kits were used and a subset of the sample was sequenced using WGS (Supplementary Table 2).

A uniform pipeline was used to process both the stage 1 and stage 2 datasets. Raw sequencing data from all studies were processed relative to the GRCh37 reference genome, the read alignments of possible chimeric origin were filtered and a GATK-based pipeline was used to call variants, while correcting for estimated sample contamination percentages. Samples were included in the datasets after they passed a stringent QC pipeline: samples were removed when they had high missingness, high contamination, a discordant genetic sex annotation, non-European ancestry, high numbers of new variants (with reference to dbSNP v.150), deviating heterozygous/homozygous or transition/transversion ratios. Further, we removed family members up to the third degree and individuals who carried a pathogenic variant in PSEN1, PSEN2, APP or in other genes causative for Mendelian dementia diseases (stage 1-only) or when there was clinical information suggestive of non-AD dementia. Variants considered in the analysis also passed a stringent QC pipeline: multiallelic variants were split into biallelic variants; variants that were in complete linkage and near each other were merged. Further, we removed variants that had indications of an oxo-G artifact, were located in short tandem repeat and/or low copy repeat regions, had a discordant balance between reads covering the reference and alternate allele, had a low depth for alternate alleles, deviated significantly from Hardy–Weinberg equilibrium, were considered false positives based on GATK variant quality score recalibration or were estimated to have a batch effect. Variants with >20% genotype missingness (read depth < 6) and differential missingness between the EOAD, LOAD and control groups were removed. 
To account for uncertainties resulting from variable read coverage between samples, we analyzed variants according to genotype posterior likelihoods, that is, the likelihood of being homozygous for the reference allele and heterozygous or homozygous for the alternate allele. To account for genotype uncertainty, the burden test was performed multiple times with independently sampled genotypes and the average P value across these tests is reported.
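
The repeated-sampling idea described above can be sketched as follows. This is only an illustration under stated assumptions: each genotype is represented by a triple of posterior probabilities, `test_fn` is a hypothetical placeholder for the burden test and the number of draws (`n_draws`) is an arbitrary choice, not the value used in the study.

```python
import random

def sample_genotype(posterior, rng):
    """posterior: (P(hom ref), P(het), P(hom alt)); returns the alt-allele count."""
    return rng.choices([0, 1, 2], weights=posterior, k=1)[0]

def averaged_p_value(posteriors, test_fn, n_draws=10, seed=0):
    """Average the P value of test_fn over independently sampled genotype sets."""
    rng = random.Random(seed)
    p_values = []
    for _ in range(n_draws):
        genotypes = [sample_genotype(p, rng) for p in posteriors]
        p_values.append(test_fn(genotypes))  # test_fn stands in for the burden test
    return sum(p_values) / n_draws
```

Genotypes with a confident call (for example, a posterior of (0, 1, 0)) are sampled identically in every draw, so only uncertain calls contribute variability to the averaged P value.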

Variant prioritization and thresholds

We selected variants in autosomal protein-coding genes that were part of the Ensembl basic set of protein-coding transcripts (Gencode v.19/v.29 (ref. 24); Supplementary Note) and that were annotated by the Variant Effect Predictor v.94.542 (ref. 25). Only protein-coding missense and LOF variants were considered (LOF: nonsense, splice acceptor/donor or frameshifts). Missense and LOF variants were required to have a ‘moderate’ and ‘high’ variant effect predictor impact classification, respectively. Then, missense variants were prioritized using REVEL26 (annotations obtained from dbNSFP4.1a27) and LOF variants were prioritized using LOFTEE v.1.0.2 (ref. 28). For the analysis, we considered only missense variants with a REVEL score ≥ 25 (score range 0–100) and LOF variants annotated as ‘high confidence’ by LOFTEE. Variants were required to have at least 1 carrier (that is, at least 1 sample with a posterior dosage >0.5) and an MAF < 1%, both in the considered dataset and the Genome Aggregation Database v.2.1 populations (nonneurological set).
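
The selection rules above can be summarized in a short sketch that assigns a variant to the incremental deleteriousness thresholds. The function, argument names and threshold labels are hypothetical; the cutoffs (REVEL ≥ 25/50/75, LOFTEE high confidence, MAF < 1% in both the dataset and the reference population) are taken from the text.

```python
THRESHOLDS = ["LOF", "LOF+REVEL>=75", "LOF+REVEL>=50", "LOF+REVEL>=25"]

def thresholds_for_variant(consequence, revel=None, loftee_hc=False,
                           maf=0.0, gnomad_maf=0.0):
    """Return the burden thresholds to which a variant would contribute."""
    # Variants must be rare in the dataset and in the reference population.
    if maf >= 0.01 or gnomad_maf >= 0.01:
        return []
    if consequence == "LOF":
        # High-confidence LOF variants count toward every threshold.
        return list(THRESHOLDS) if loftee_hc else []
    if consequence == "missense" and revel is not None:
        cutoffs = (75, 50, 25)
        return [name for name, cut in zip(THRESHOLDS[1:], cutoffs) if revel >= cut]
    return []

print(thresholds_for_variant("missense", revel=60))
```

Because the thresholds are nested, a missense variant with a REVEL score of 60 counts toward the two most lenient thresholds but not toward LOF + REVEL ≥ 75 or LOF only.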

Gene burden testing

The burden analysis was based on four deleteriousness thresholds by incrementally including variants from categories with lower levels of predicted variant deleteriousness: LOF; LOF + REVEL ≥ 75; LOF + REVEL ≥ 50; and LOF + REVEL ≥ 25, respectively. This allowed us to identify the variant threshold providing maximum evidence for a differential burden signal. To infer any dependable signal for a specific deleteriousness threshold, a minimum of 10 damaging alleles appertaining to this deleteriousness threshold was required, that is, a cMAC ≥ 10. Multiple testing correction was performed across all performed tests (up to four per gene). Burden testing was implemented using ordinal logistic regression. This enabled burden testing to particularly weight EOAD cases since previous findings indicated that high-impact variants are enriched in early-onset (EOAD) cases relative to late-onset (LOAD) cases8. This implies that the burden of high-impact deleterious genetic variants is ordered according to \(\mathrm{burden}_{\mathrm{EOAD}} > \mathrm{burden}_{\mathrm{LOAD}} > \mathrm{burden}_{\mathrm{control}}\). Ordinal logistic regression enabled optimal identification of such signals, while also allowing the detection of EOAD-specific burdens (\(\mathrm{burden}_{\mathrm{EOAD}} > \mathrm{burden}_{\mathrm{LOAD}} \sim \mathrm{burden}_{\mathrm{control}}\)) and regular case-control signals (\(\mathrm{burden}_{\mathrm{EOAD}} \sim \mathrm{burden}_{\mathrm{LOAD}} > \mathrm{burden}_{\mathrm{control}}\)). For protective burden signals, the order of the signals is reversed, that is, \(\mathrm{burden}_{\mathrm{EOAD}} < \mathrm{burden}_{\mathrm{LOAD}} < \mathrm{burden}_{\mathrm{control}}\). We considered an additive model while correcting for six population covariates, estimated after removal of population outliers. P values were estimated using a likelihood-ratio test. Genes were selected for confirmation in stage 2 if the FDR for AD association was <0.1 in stage 1 (Benjamini–Hochberg procedure29). For the GWAS-targeted analysis, a more stringent threshold was used (FDR < 0.05) due to the absence of a separate confirmation stage. 
For the meta-analysis, genes were considered significantly associated with AD when the corrected P was <0.05 after family-wise correction using the Holm–Bonferroni procedure30. Effect sizes (ORs) of the ordinal logistic regression can be interpreted as weighted averages of the OR of being an AD case versus a control and the OR of being an early-onset AD case or not. To aid interpretation, we additionally estimated ‘standard’ case/control ORs across all samples per age category (EOAD versus controls and LOAD versus controls) and for age-at-onset categories ≤65 (EOAD), 65–70, 70–80 and >80 using multinomial logistic regression, while correcting for 6 PCA covariates.
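
The two multiple testing procedures named above can be sketched in a few lines. This is a generic illustration of the Benjamini–Hochberg and Holm–Bonferroni adjustments, not the study's code. The optional `n_tests` argument is an assumption of this sketch: it lets a small set of P values be corrected for a larger number of performed tests, treating all tests not supplied as having larger P values.

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def holm_bonferroni(pvals, n_tests=None):
    """Return Holm-adjusted P values; n_tests may exceed len(pvals)."""
    m = n_tests if n_tests is not None else len(pvals)
    order = sorted(range(len(pvals)), key=lambda i: pvals[i])
    adjusted = [0.0] * len(pvals)
    running_max = 0.0
    # Step down from the smallest P value with a shrinking multiplier.
    for rank, i in enumerate(order):
        running_max = max(running_max, min(1.0, pvals[i] * (m - rank)))
        adjusted[i] = running_max
    return adjusted

# Hypothetical P values, for illustration only.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
print(holm_bonferroni([0.0001, 0.001], n_tests=100))
```

In this sketch, a test would pass the stage 1 filter if its Benjamini–Hochberg adjusted value is below 0.1, and a meta-analysis result would be called significant if its Holm-adjusted value is below 0.05.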

GWAS driver gene identification

For the 75 loci identified in the most recent GWAS2, genes were selected for burden testing based on earlier published gene prioritizations. First, gene prioritizations were obtained from Schwartzentruber et al.31 for 33 known loci. For 28 remaining loci, we obtained the tier 1 prioritization from Bellenguez et al.2; for loci without prioritization candidates (14 loci), we selected the nearest gene. In total, 81 protein-coding genes were selected (Supplementary Table 7), of which 67 genes had sufficient damaging allele carriers to be tested for at least 1 variant selection threshold. Gene burden testing was performed as described above and multiple testing correction to identify potential driver genes was performed using the Benjamini–Hochberg procedure, with a cutoff of 5%.

Validation of variant selection

We validated the REVEL variant impact prediction for missense and the LOFTEE impact prediction for LOF variants for all variants with an MAF < 1%, for which there were at least 15 damaging allele carriers. For protein-modifying variants that were not in the most significant burden selection of a gene due to a low predicted impact, we investigated whether they, nevertheless, showed a significant AD association (based on a case/control analysis using logistic regression). Vice versa, for variants that were in the burden selection, we investigated whether their effect size was significantly reduced or oppositely directed from other missense or LOF variants in the burden selection (Fisher’s exact test). Individual variant effects were analyzed in the stage 1 dataset, followed by a confirmation analysis in the stage 2 dataset. Multiple testing correction was performed per gene, with an FDR < 0.1 used as the threshold for stage 1 and Holm–Bonferroni (P < 0.05) for stage 2.

Descriptive measures

A variant carrier was defined as an individual for whom the summed dosage of all the variants in the considered variant deleteriousness category is ≥0.5 (see Methods section in the Supplementary Note). Carrier frequencies (CFs) were determined as the number of carriers/number of total samples. Attributable fraction for cases in an age group was estimated as the probability of a case with an age at onset in the age window i being exposed to a specific gene burden \(\left( \mathrm{CF}_{\mathrm{case,gene},i} \right)\), multiplied by an estimate of the attributable fraction among the exposed for these cases: \(\left( \frac{\mathrm{OR}_{\mathrm{gene},i} - 1}{\mathrm{OR}_{\mathrm{gene},i}} \right)\) (with the OR being an approximation of the relative risk)32,33. For large effect sizes, this estimate approaches the difference in carrier frequency between cases and controls: \(\mathrm{CF}_{\mathrm{case,gene},i} - \mathrm{CF}_{\mathrm{control,gene}}\).
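
The descriptive measures above translate directly into a few small functions. The sketch below follows the definitions in the text; the function names and the example values are hypothetical.

```python
def is_carrier(dosages, cutoff=0.5):
    """A carrier has a summed variant dosage of at least the cutoff."""
    return sum(dosages) >= cutoff

def carrier_frequency(n_carriers, n_samples):
    """Number of carriers divided by the total number of samples."""
    return n_carriers / n_samples

def attributable_fraction(cf_case, odds_ratio):
    """Carrier frequency among cases times (OR - 1) / OR."""
    if odds_ratio <= 0:
        raise ValueError("odds_ratio must be positive")
    return cf_case * (odds_ratio - 1.0) / odds_ratio

# Hypothetical example: 3% of cases are carriers and the OR is 5.
print(round(attributable_fraction(0.03, 5.0), 4))
```

With these illustrative inputs, (OR − 1)/OR = 0.8, so about 2.4% of the cases in that age window would be attributed to the gene burden.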

Sensitivity analyses

We determined if the observed effects could be explained by age differences between cases and controls. We constructed an age-matched sample, dividing samples into strata based on age/age at onset, with each stratum covering 2.5 years. Case/control ratios in all strata were kept between 0.1 and 10 by downsampling controls or cases, respectively. Subsequently, samples were weighted using the ‘propensity weighting within strata method’ (Supplementary Note). Finally, a case-control logistic regression was performed both on the unweighted and weighted case-control labels and estimated ORs and confidence intervals (CIs) were compared (Extended Data Fig. 8). Also, we determined if somatic mutations due to age-related clonal hematopoiesis could have confounded the results. We calculated for all heterozygous calls in the burden selection the balance between reference and alternate reads and compared these to reference values (Supplementary Table 14). While APOE was not included as a confounder, we performed a separate APOE interaction analysis (Supplementary Table 13) through a likelihood-ratio test between a model \(\mathrm{label} \sim \mathrm{gene\_burden\_score} + \mathrm{APOE\_e4\_dosage}\) and an interaction model \(\mathrm{label} \sim \mathrm{gene\_burden\_score} + \mathrm{APOE\_e4\_dosage} + \mathrm{APOE\_e4\_dosage} \times \mathrm{gene\_burden\_score}\). This test was performed on a reduced dataset, from which datasets in which APOE status was used as the selection criterion were removed.
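
The stratification and downsampling step of the age-matched analysis can be sketched as follows. This is an illustration only: the propensity weighting within strata is not shown, the function name is hypothetical, and dropping strata that contain only cases or only controls is an assumption of this sketch rather than a documented rule.

```python
import random
from collections import defaultdict

def age_matched_sample(samples, width=2.5, max_ratio=10, seed=0):
    """samples: list of (age, is_case) tuples. Returns the downsampled list."""
    rng = random.Random(seed)
    strata = defaultdict(lambda: {True: [], False: []})
    for age, is_case in samples:
        # Strata cover `width` years each, indexed by integer bin.
        strata[int(age // width)][bool(is_case)].append((age, is_case))
    kept = []
    for key in sorted(strata):
        cases, controls = strata[key][True], strata[key][False]
        # Cap each group at max_ratio times the size of the other group,
        # which keeps the case/control ratio between 1/max_ratio and max_ratio.
        n_cases = min(len(cases), max_ratio * len(controls))
        n_controls = min(len(controls), max_ratio * len(cases))
        kept += rng.sample(cases, n_cases) + rng.sample(controls, n_controls)
    return kept
```

For example, a stratum with 50 cases and 2 controls would be reduced to 20 cases and 2 controls, giving a case/control ratio of exactly 10.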

Power analysis

Power calculations were performed for ordinal and Firth logistic regression (case-control and EOAD versus rest; Fig. 1b and Supplementary Table 6). Given the ORs for the EOAD and LOAD cases, and the cMAC per gene, we sampled the number of alleles in the EOAD cases, LOAD cases and controls according to a multinomial distribution. We randomized these allele carriers across the dataset and performed the burden test as described above. The power for genes with a cMAC < 10 was set to 0 since these genes were not analyzed.
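
The allele sampling step of the power simulation can be sketched as a multinomial draw. The sketch assumes that, for rare alleles, the probability that an allele falls in a group is proportional to the group size times its OR (with the control OR set to 1); this weighting and the function name are assumptions made for illustration, and the subsequent burden test is not shown.

```python
import random
from collections import Counter

def sample_allele_counts(cmac, n_eoad, n_load, n_control,
                         or_eoad, or_load, seed=None):
    """Distribute cmac alleles over EOAD cases, LOAD cases and controls."""
    rng = random.Random(seed)
    groups = ["EOAD", "LOAD", "control"]
    # Assumed weights: group size times OR (OR of 1 for controls).
    weights = [n_eoad * or_eoad, n_load * or_load, n_control * 1.0]
    draws = rng.choices(groups, weights=weights, k=cmac)
    counts = Counter(draws)
    return {g: counts.get(g, 0) for g in groups}

# Stage 1 group sizes and the ORs assumed in the power analysis (from the text).
print(sample_allele_counts(10, 4060, 8592, 8693, 10.0, 3.33, seed=1))
```

Repeating such draws many times, assigning the sampled alleles to random individuals within each group and applying the burden test to every replicate would give the fraction of replicates that reach the significance threshold, which is the estimated power.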

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.