Introduction

African Americans develop kidney disease at a rate five times higher than European Americans1. Two African ancestry-associated variants (G1 and G2) in the apolipoprotein L1 (APOL1) gene constitute major contributors to this disparity. APOL1 is a component of the innate immune system targeting African trypanosomes, and the G1 and G2 variants likely rose to high population frequency by conferring resistance to Trypanosoma brucei rhodesiense (particularly G2) and Trypanosoma brucei gambiense (exclusively G1)2,3. However, the putative evolutionary benefits come at a cost of increased lifetime risk for kidney disease in individuals with two copies of these variants (i.e., G1/G1, G2/G2, or G1/G2, identified as APOL1 high-risk genotypes). This is thought to be mediated by the ability of G1 and G2 variants to form cation-selective channels in podocytes resulting in subsequent activation of cytotoxic pathways4,5,6. This predisposes to progressive kidney disease, with odds ratios for hypertension-associated end stage kidney disease (ESKD), focal segmental glomerulosclerosis (FSGS), and HIV-associated nephropathy exceeding 7, 17, and 30, respectively, when comparing APOL1 high-risk (APOL1-HR, i.e., individuals carrying either the G1/G1, G1/G2, or G2/G2 genotypes) to low-risk (APOL1-LR) genotypes3,7.

The number of at-risk individuals for APOL1-associated FSGS and kidney disease is considerable. In the United States, it is estimated that 13% of African Americans carry two high-risk alleles8, and in certain West African populations, the rate of high-risk genotypes may be as high as 20–25%8,9. Approximately 15% of individuals with an APOL1-HR genotype will develop ESKD, and a smaller fraction, estimated at 5%–8%, will develop FSGS8. Due to the high frequency of these genotypes, we estimate that at least 200,000 individuals in the US have APOL1-associated FSGS. The incomplete penetrance of APOL1-HR genotypes is thought to reflect the requirement for disease modifiers that potentiate APOL1 cytotoxicity. A number of “second hits” have been proposed, with the most commonly recognized being high-interferon states (which result in increased APOL1 expression), either due to direct interferon administration, or caused by viral infections (e.g., HIV, SARS-CoV-2)10,11. Genetic modifiers have been suggested but, to date, the identification of modifier genetic variants for APOL1-mediated kidney disease and, particularly, FSGS, remains elusive. The few reported in the literature still require validation12,13.

In 2019, we studied the cytotoxic effect of multiple naturally and non-naturally occurring APOL1 haplotypes in experimental cell-based systems. We found that the toxicity of G1 and G2 alleles was substantially reduced when expressed on the haplotype defined by the APOL1 missense variant p.N264K (chr22:36265628 C > A; rs73885316)14, also associated with a partial loss of trypanolytic function15. These data suggested, at a functional level, a protective effect for this variant against the deleterious cellular effects of the G1 and G2 APOL1 risk variants. The p.N264K variant defines one of the common G0 (non-risk) APOL1 haplotypes, which is more frequent in individuals of European ancestry, but it is also present on a small fraction of G2 haplotypes in the absence of G0, indicating two independent mutational events during evolution, occurring only on these two haplotypes. The p.N264K variant is therefore expected to be mutually exclusive with the APOL1 G1 allele.

In this work, we show a strong protective role of the APOL1 p.N264K variant against APOL1-related FSGS and CKD, in the context of high-risk G2-containing genotypes of African origin. Therefore, this variant, based on prior functional and current genetic data, counters the toxic effect of the G2 allele, allowing the reclassification of APOL1 high-risk individuals as non-high-risk if they carry the p.N264K missense variant.

Results

To test the hypothesis that the G2-p.N264K haplotype differs in its genetic impact from the more common G2 risk allele without the p.N264K variant, we sought to compare its frequency in APOL1-HR subjects with FSGS to that in APOL1-HR controls without kidney disease (Fig. 1A). First, to eliminate potential confounding by the p.N264K haplotype defined by the more common APOL1 non-risk G0 allele, we excluded all individuals with non-risk, G0-containing genotypes, i.e., G0/G0, G0/G1, and G0/G2. We studied two case-control FSGS discovery cohorts: the first consisted of 434 APOL1-HR FSGS cases and 2398 genetically matched APOL1-HR population controls subjected to Illumina DNA microarray genotyping and imputation; the second included 94 APOL1-HR FSGS cases and 208 genetically matched APOL1-HR controls with whole genome sequencing data (Supplementary Fig. 1), for a total of 528 FSGS cases and 2606 population controls with no known kidney disease. Next, in order to investigate the impact of the p.N264K variant, we conducted a comprehensive analysis only on APOL1 high-risk individuals, employing categorical approaches (based on allelic frequency) and, as a sensitivity analysis, regression-based (based on genotypes) statistical tests. The primary analysis was conducted on categorical variables using a Cochran–Mantel–Haenszel (CMH) test and considering potential confounding factors such as sex and array-based vs sequence-based genotyping. We then conducted a set of sensitivity analyses: first, we used Firth’s regression test and also incorporated principal components (PCs) as covariates in order to account for potential residual population stratification; second, we conducted haplotype-of-origin analysis using Tractor16.

Fig. 1: Protective effect of the APOL1 p.N264K missense variant against G2-associated FSGS.

A Graphical representation of the study design, cohorts and main results of the study. Stratified association analysis of the combined cohort of 528 APOL1 high-risk FSGS cases and 2606 genetically-matched APOL1 high-risk controls: B Stacked bar plot for the p.N264K MAF across APOL1-HR genotypes in cases and controls; allele C is the reference allele encoding p.N264; allele A is the minor allele resulting in the p.K264 variant amino acid; C Forest plot for the p.N264K association analysis showing significantly protective odds ratios across APOL1 high-risk (G1G1, G2G2 and G1G2) genotypes. The plot shows the odds ratio and confidence interval for HR (OR = 0.067), G1G2 (OR = 0.136), G2G2 (OR = 0) and G1G2 + G2G2 (OR = 0.081). The P values were obtained separately for each of the aforementioned groups using a two-sided Cochran-Mantel-Haenszel chi-squared test without correction for multiple testing across groups (see Methods). No OR or CI for the G1G1 genotype (263 cases, 991 controls) is shown in the forest plot because p.N264K was absent in both groups (the APOL1 G1 and p.N264K alleles are mutually exclusive), resulting in an undefined OR, infinite CI, and a P value of 1. FSGS focal segmental glomerulosclerosis, CKD chronic kidney disease, ESKD end-stage kidney disease, AF allele frequency, Ctrls controls, OR odds ratio, CI 95% confidence interval, MAF minor allele frequency. The cartoon in (A) was created using BioRender at www.biorender.com.

In our APOL1-HR FSGS cohorts, we observed a strong protective effect for the p.N264K minor allele ‘A’ (MAF cases = 0.19% and MAF controls = 2.7%, OR = 0.07, 95%CI = 0.01–0.25, CMH test P = 3.4 × 10−9) as compared to APOL1-HR controls. Stratifying the cohort by the three APOL1 high-risk genotypes showed that this variant was only observed within APOL1-HR individuals carrying the G2 allele (i.e., G1/G2 and G2/G2) and, as expected, never in G1/G1 subjects (Fig. 1, Supplementary Fig. 2). These findings support a protective effect of the p.N264K variant only in the context of G2-containing APOL1-HR genotypes. In fact, the p.N264K variant seemed to confer complete protection against FSGS as it was never observed in cases in the presence of the G2/G2 genotype: OR = 0, 95%CI 0–0.41; CMH test P = 4.4 × 10−4. A strong and significant protective effect was also observed for the G1/G2 genotype with a p.N264K MAF of 3.57% in controls as compared to 0.49% in cases (OR = 0.14, 95%CI:0.16–0.52; CMH test P = 4.0 × 10−4). Consistent with these findings, analyzing individuals with G1/G2 or G2/G2 genotypes combined increased the level of statistical significance for the p.N264K protective effect (OR = 0.08, 95%CI 0.01–0.3, CMH test P = 2 × 10−7).
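
The allele-frequency comparison above can be illustrated with a minimal Python sketch of a crude (unstratified) allelic odds ratio. The cohort sizes are taken from the text, but the minor-allele counts (2 and 141) are assumptions back-calculated from the reported MAFs, not the study's data, and the function names are hypothetical. The crude estimate ignores the sex and platform strata used by the CMH test, so it is only expected to approximate the reported OR.

```python
def minor_allele_frequency(minor_alleles: int, n_individuals: int) -> float:
    """MAF = minor-allele count / total chromosomes (2 per diploid individual)."""
    return minor_alleles / (2 * n_individuals)


def allelic_odds_ratio(case_minor: int, n_cases: int,
                       ctrl_minor: int, n_ctrls: int) -> float:
    """Crude odds ratio comparing minor vs reference alleles in cases vs controls."""
    case_ref = 2 * n_cases - case_minor
    ctrl_ref = 2 * n_ctrls - ctrl_minor
    return (case_minor * ctrl_ref) / (case_ref * ctrl_minor)


# Cohort sizes come from the text. The allele counts below are ASSUMED values
# back-calculated from the reported MAFs (0.19% and 2.7%); they are illustrative.
N_CASES, N_CTRLS = 528, 2606
CASE_A, CTRL_A = 2, 141

maf_cases = minor_allele_frequency(CASE_A, N_CASES)
maf_ctrls = minor_allele_frequency(CTRL_A, N_CTRLS)
crude_or = allelic_odds_ratio(CASE_A, N_CASES, CTRL_A, N_CTRLS)
print(f"MAF cases = {maf_cases:.2%}, MAF controls = {maf_ctrls:.2%}, "
      f"crude OR = {crude_or:.3f}")
```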

The FSGS case-control samples were well-matched on principal component analysis (PCA) (Supplementary Fig. 1). In fact, our sensitivity analyses that additionally adjust for population structure confirmed the results obtained by CMH, as Firth’s regression tests supported the strong protective effect of the p.N264K variant against FSGS with comparable effect sizes (Supplementary Fig. 2).

As expected from the population distribution of haplotypes, in the context of APOL1-HR genotypes, the p.N264K variant is limited to G2-containing genotypes (i.e., G1/G2 or G2/G2). Nevertheless, a recombination event between the p.N264K and the G1 or G2 alleles (although very unlikely given the proximity of these APOL1 alleles) could result in contamination from the European G0-p.N264K haplotype due to local ancestry admixture. To evaluate this scenario, in our final sensitivity analysis we conducted haplotype-of-origin analysis in the discovery cohort using Tractor16, a statistical framework that deconvolutes the local haplotypes into ancestral (in this case European and African) haplotypes. This confirmatory analysis showed a significant protective effect of the p.N264K variant exclusively originating from the African haplotype (OR = 0.10, 95%CI = 0.02–0.29, P = 1.3 × 10−7), while the European haplotype was non-significant (OR(ADJ) = 0.74, 95%CI = 0.00–11.37, P = 0.85) despite a larger sample size (Supplementary Fig. 3). Again, stratifying for G1/G2 or G2/G2 further validated the G2-specific protective effect of the p.N264K variant for the African haplotype (OR = 0.12, 95%CI = 0.02–0.35, P = 3.53 × 10−6) but not for the European haplotype (OR(ADJ) = 0.76, 95%CI = 0.00–12.75, P = 0.86).

Overall, these results support a strong protective effect of the APOL1 p.N264K missense variant against APOL1-associated FSGS, but this effect occurs exclusively on G2-containing APOL1 high-risk genotypes of African origin. In practical terms, based on these analyses, APOL1-HR individuals are at least 8.3 times less likely to develop FSGS if they carry one copy of the p.N264K missense variant.

Finally, to test the generalizability of these findings to milder forms of APOL1-associated kidney disease, we investigated the protective effect of the APOL1 p.N264K in individuals from the REasons for Geographic and Racial Differences in Stroke (REGARDS)17 and Electronic Medical Records and Genomics Phase III (eMERGE-III)18 studies. In aggregate, these cohorts included 1573 APOL1-HR individuals with available kidney function data. Of these, 276 had CKD stage 3 (REGARDS, N = 150; eMERGE-III, N = 126) or worse (considered as cases), and 1297 genetically-matched APOL1-HR controls (REGARDS, N = 893; eMERGE-III, N = 404) with estimated glomerular filtration rate (eGFR) > 60 ml/min/1.73 m2 (Supplementary Fig. 4A, B). Despite the smaller sample size, milder form of APOL1-associated kidney disease, and incomplete clinical data to classify and exclude unrelated causes for CKD in these cohorts, the findings revealed a direction-consistent protective effect for the p.N264K variant among individuals with the G2-APOL1-HR genotypes, by which p.N264K carriers were 3.3 times less likely to have CKD3 or worse (OR = 0.30, 95%CI: 0.11–0.83, CMH P = 0.023, Supplementary Table 2 and Supplementary Fig. 4C), with this likely representing an underestimation due to confounders as mentioned above.
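
The "times less likely" figures quoted in the Results appear to correspond to the reciprocal of the reported odds ratios. The short Python sketch below reproduces that arithmetic under this assumption; the helper name is hypothetical, and the reciprocal is strictly a fold reduction in odds rather than in absolute risk.

```python
def fold_reduction(odds_ratio: float) -> float:
    """Return 1/OR, the fold reduction in odds implied by a protective OR."""
    if not 0 < odds_ratio < 1:
        raise ValueError("expects a protective odds ratio strictly between 0 and 1")
    return 1.0 / odds_ratio


# Odds ratios as reported in the text.
reported = [
    ("FSGS, African haplotype, G1/G2 or G2/G2", 0.12),
    ("CKD stage 3 or worse, G2-containing genotypes", 0.30),
]
for label, or_value in reported:
    print(f"{label}: OR = {or_value:.2f} -> "
          f"{fold_reduction(or_value):.1f}-fold lower odds")
```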

Discussion

Here we report on the strong protective effect of the APOL1 p.N264K missense variant against G2-mediated FSGS and kidney disease. These findings are also supported by a recent report from the Million Veteran Program, reporting reduced risk for CKD and ESKD in APOL1-HR individuals with this variant19. These results have immediate and broad implications for translational research and clinical practice. First, from the genetic standpoint, it is important to note that we observed a very large effect of p.N264K on mitigating the consequences of the G2 risk allele but saw no evidence of this variant on the more common G1 risk allele. As a consequence, because p.N264K and G1 alleles are mutually exclusive, this finding raises the possibility of additional genetic modifiers specific to G1 and, in general, identifiable by considering genotype-specific APOL1 studies. In addition to studies of the APOL1 high-risk genotype as a single genetic driver, analyses conducted by partitioning cohorts into the three specific APOL1 high-risk genotypes, although they might require larger sample sizes, are likely to provide significant additional insight into the genetics and underlying biology of APOL1-associated FSGS and kidney disease. Second, our genetic observations are in agreement with our previous functional studies showing that the p.N264K variant is able to reverse the cytotoxic effect of both G1 and G2 risk variants in cell-based assays14. Therefore, conceptually, it may be best to regard the p.N264-G2 and p.K264-G2 simply as different alleles that encode different proteins. As such, they likely adopt different conformations and/or have different activities at the protein level. This will become clearer as we learn more about the APOL1 protein structure(s) in the future.

Taken together, these data support the hypothesis that the p.N264K missense variant negates the toxic effect of the G2 allele, and will allow the reclassification of a fraction of APOL1 G1/G2 or G2/G2 high-risk individuals as having a non-high-risk genotype if p.N264K is also present. This discovery has substantial, immediate, and clinically-relevant implications. First, individuals affected by CKD or ESKD with APOL1 G1/G2 or G2/G2 high-risk genotypes but with the p.N264K missense variant are unlikely to have APOL1-associated FSGS, and therefore an additional cause (immune, toxic, structural, or others) should be investigated because this will likely result in a different therapeutic approach. Second, importantly, in kidney transplant settings, these results can significantly affect donor selection and the outcomes of both the donor kidney and the recipient graft. In fact, APOL1 G2-HR donors who are p.N264K positive will likely have kidney outcomes similar to those of G1G0, G2G0, and G0G0 low-risk donors, thus expanding the donor pool; kidney transplant recipients of an APOL1-HR-p.N264K kidney will likely have a low risk of developing de novo FSGS on the graft or graft failure from APOL1-associated kidney disease. Third, incorporation of this knowledge will allow more accurate study design for new intervention trials, in which individuals with APOL1-HR-p.N264K genotypes should not be included in the intervention arm as cases since this genotype is genetically and functionally a low-risk genotype. Finally, the knowledge presented here will affect family risk stratification and planning, and, in general, CKD risk ascertainment at the population level.

Methods

Written informed consent was collected from all participating patients seen at Columbia (and collaborating Institutions) and/or their guardians in accordance with the Columbia University Institutional Review Board (Protocol AAAC7385) and the policy on bioethics and human biologic samples of AstraZeneca. All internationally recruited patients and/or their guardians were consented according to the Declaration of Helsinki and in compliance with the local ethics committees, as part of the parent IRB protocol approved at Columbia University.

FSGS cohorts, controls, genotyping and imputation, and association tests

FSGS case-control cohort 1: The cohort consisted of 434 FSGS APOL1-HR cases (Supplementary Table 1) and 2398 APOL1-HR controls. The genotyping of the cases was performed using multiple versions of the Illumina Multi-Ethnic Global Array (MEGA) chips (n = 196) that included MEGA 1.0, MEGA 1.1 and MEGAEX, and the Illumina HumanOmniExpress-12 (n = 238). The controls were genotyped on MEGA1.0 and were selected based on genetic ancestry and APOL1 genotype status from over 50,000 individuals from the PAGE consortium20. We extracted G1 (rs73885319) and G2 (rs71785313) from the MEGA arrays to define the APOL1 high-risk cohort. The p.N264K (chr22:36265628 C > A; rs73885316) variant was also included on the MEGA arrays and hence directly genotyped, while it was imputed with R2 > 0.8 in the Illumina HumanOmniExpress-1221. The differences between the chips were corrected first by mapping all the SNPs to a common cluster file in Genome Studio software for individual platforms and further using Snpflip (https://github.com/biocore-ntnu/snpflip) software. In total, we used 767,100 SNPs as input for imputation after quality control, which retained SNPs with MAF > 1%, missing SNPs <95%, and HWE (controls) P > 0.00001, and used the McCarthy Group Tools (https://www.well.ox.ac.uk/~wrayner/tools/) to check for strand bias and to remove SNPs that deviated from the expected allele frequency in the 1000 Genomes Project. The same quality control was applied separately to cases from HumanOmniExpress-12.
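
As an illustration of how the G1 and G2 calls define the high-risk cohort, the following Python sketch labels a genotype from the number of G1 and G2 risk alleles carried. It is a simplified sketch with hypothetical function names, and it assumes that G1 and G2 never occur on the same haplotype, so that an individual carries at most two risk alleles in total.

```python
def apol1_genotype(g1_copies: int, g2_copies: int) -> str:
    """Label an APOL1 genotype from the number of G1 and G2 risk alleles.

    Assumes G1 and G2 sit on different haplotypes, so a diploid individual
    carries at most two risk alleles; remaining haplotypes are labeled G0.
    """
    if min(g1_copies, g2_copies) < 0 or g1_copies + g2_copies > 2:
        raise ValueError("expected between 0 and 2 risk alleles in total")
    alleles = ["G1"] * g1_copies + ["G2"] * g2_copies
    alleles = ["G0"] * (2 - len(alleles)) + alleles
    return "/".join(alleles)


def is_high_risk(g1_copies: int, g2_copies: int) -> bool:
    """APOL1-HR means two risk alleles: G1/G1, G1/G2, or G2/G2."""
    return g1_copies + g2_copies == 2


# Example: keep only high-risk individuals, dropping G0-containing genotypes.
calls = [(0, 0), (1, 0), (1, 1), (0, 2), (2, 0)]
high_risk = [apol1_genotype(g1, g2) for g1, g2 in calls if is_high_risk(g1, g2)]
print(high_risk)
```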

We performed imputation on APOL1-HR cases (MEGA and HumanOmniExpress) and controls (MEGA) together using the TOPMed reference imputation panel22. SNPs with R2 > 0.8, MAF > 1%, missing SNPs <95%, and HWE (controls) P > 0.00001 were retained. All analyses were done on unrelated samples after removing relatedness up to the second degree using KING v2.3.023. PCs were calculated using PLINK 2 based on the LD-pruned SNPs24.

FSGS case-control cohort 2: The cohort for this study consisted of 94 APOL1-HR cases (refer to Supplementary Table 1) and 208 APOL1-HR controls, all subjected to 30X whole-genome sequencing. To obtain the genetic data, the raw FASTQ files for the cases were aligned to the hg19 assembly25. The alignment data underwent processing using the DRAGEN pipeline, resulting in recalibrated GVCFs (Genomic VCFs)26. The GVCFs were jointly called using GATK 4.3 separately for samples from the DUKE and CureGN cohorts27. For the control group, we included APOL1-HR samples obtained from the 1000 Genomes Project, MESA cohort, and internal controls from the Columbia University Institute for Genomic Medicine28,29. Genotypes were extracted after performing internal harmonization. Initially, all the case samples were lifted from the hg19 to the hg38 assembly using the rtracklayer R package30. Then, the genotypes from the cases and controls were merged based on common SNPs.

The APOL1 G1, G2, and p.N264K variants were directly sequenced to obtain specific genetic information. To ensure the analysis was performed on unrelated individuals, we removed relatedness up to two degrees using the KING v2.3.0 software. The same quality control measures were applied to the FSGS cohort 2, as for the initial FSGS cohort, before calculating PCs.

Statistical analyses

Stratified analysis on alleles was conducted on the two FSGS cohorts using the CMH test statistic. The CMH test was performed using the mantelhaen.test function in R with the exact = TRUE option31. The data were stratified by cohort and sex for each of the genotypes (G1/G1, G1/G2, G2/G2), both individually and combined.
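
To make the stratified test concrete, here is a minimal Python sketch of the Mantel-Haenszel common odds ratio and the continuity-corrected CMH chi-squared statistic for a set of 2 × 2 allele-count tables. Note that it implements the asymptotic test, not the exact conditional test selected by exact = TRUE in R, so P values will differ from those reported, especially with sparse counts. The function name and the strata counts are hypothetical.

```python
import math


def cmh_test(strata):
    """Asymptotic CMH test for 2x2xK tables.

    strata: iterable of (a, b, c, d) =
            (case minor, case reference, control minor, control reference).
    Returns (Mantel-Haenszel common OR, chi-squared statistic, two-sided P).
    """
    or_num = or_den = 0.0
    diff = var = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        or_num += a * d / n
        or_den += b * c / n
        diff += a - (a + b) * (a + c) / n  # observed minus expected
        var += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
    common_or = or_num / or_den
    correction = 0.5 if abs(diff) >= 0.5 else 0.0  # continuity correction
    chi2 = (abs(diff) - correction) ** 2 / var
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-squared survival, 1 df
    return common_or, chi2, p_value


# HYPOTHETICAL allele counts for two strata (e.g. cohort-by-sex groups).
strata = [
    (2, 398, 60, 1940),
    (1, 299, 45, 1455),
]
common_or, chi2, p_value = cmh_test(strata)
print(f"OR_MH = {common_or:.3f}, chi2 = {chi2:.2f}, P = {p_value:.2e}")
```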

Furthermore, Firth regression was performed separately for each genotype within each cohort using PLINK232. The covariates used in the regression analysis were sex and the first two PCs. Finally, a meta-analysis was conducted to combine the results from the two cohorts. A fixed-effect model was used, based on the effect sizes and standard errors obtained from the PLINK2 analysis33.
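
A fixed-effect meta-analysis of two cohorts is commonly computed by inverse-variance weighting of the per-cohort log odds ratios. The Python sketch below shows that calculation under this assumption; the text does not state which software performed the pooling, and the function name and input values are hypothetical.

```python
import math


def fixed_effect_meta(estimates):
    """Inverse-variance fixed-effect pooling.

    estimates: list of (log_odds_ratio, standard_error), one pair per cohort.
    Returns (pooled log OR, pooled standard error, two-sided P value).
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    pooled = sum(w * beta for w, (beta, _) in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    z = pooled / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P value
    return pooled, pooled_se, p_value


# HYPOTHETICAL per-cohort Firth estimates (log OR, SE), for illustration only.
cohort_estimates = [(-2.5, 0.8), (-2.0, 1.2)]
beta, se, p = fixed_effect_meta(cohort_estimates)
print(f"pooled OR = {math.exp(beta):.3f}, SE(log OR) = {se:.3f}, P = {p:.2e}")
```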

Ancestry resolution analysis at the APOL1 locus on FSGS cohort 1

Phased genotypes from the first FSGS cohort were utilized after imputation with the TOPMed reference panel. These genotypes were used for local ancestry inference (LAI) with RFMix v234. The LAI prediction was performed against samples harboring common variants obtained from the 1000 Genomes Project, specifically YRI (representing the African population) and CEU (representing the European population). The output from RFMix was subsequently integrated into the Tractor pipeline to deconvolute ancestry-specific dosages, variant call files, and haplotype counts for each sample and SNP16. Within Tractor, ancestry-specific haplotype counts and dosages for the p.N264K variant were extracted. Firth regression analysis was conducted using the logistf R package, incorporating sex, admixture fraction (derived from RFMix), and haplotype counts as covariates in a fixed effect model32. This analytical approach facilitated the assessment of the p.N264K variant’s association with FSGS by incorporating LAI, deconvolution of dosages and haplotype counts, and regression analysis with appropriate covariates.
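
The idea behind ancestry-specific dosages can be sketched conceptually in Python: for one individual at one SNP, each phased haplotype contributes its allele to the ancestry assigned to it by local ancestry inference. This is an illustrative simplification, not Tractor's actual code or file format; the function name and the "AFR"/"EUR" labels are assumptions.

```python
def deconvolve(alleles, ancestries, labels=("AFR", "EUR")):
    """Split one individual's phased genotype at one SNP by local ancestry.

    alleles:    (hap1, hap2), with 1 = minor allele and 0 = reference allele
    ancestries: (hap1, hap2), local ancestry label of each haplotype
    Returns, per ancestry, the number of haplotypes of that ancestry and the
    minor-allele dosage carried on those haplotypes.
    """
    out = {label: {"hap_count": 0, "dosage": 0} for label in labels}
    for allele, ancestry in zip(alleles, ancestries):
        out[ancestry]["hap_count"] += 1
        out[ancestry]["dosage"] += allele
    return out


# Example: a heterozygous carrier whose minor allele lies on the haplotype
# assigned African ancestry, with the other haplotype assigned European.
print(deconvolve(alleles=(1, 0), ancestries=("AFR", "EUR")))
```

In a regression of this kind, the per-ancestry dosages become separate predictors, which is what allows the effect of the variant on African-origin haplotypes to be estimated separately from its effect on European-origin haplotypes.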

CKD cohorts, genotyping, and analyses

The REGARDS study: The REGARDS study investigates the incidence of stroke in a population of 30,239 Black and White adults (≥45 years of age)17. Within this study, we identified 8198 Black participants with genotyped APOL1 risk alleles (G1 & G2) using TaqMan SNP Genotyping Assay35 and genome-wide genotyping using the MEGA. To increase sample size, we imputed APOL1 genotypes for an additional 534 subjects using the TOPMed Imputation Server21. Using kinship analysis, we identified and removed related samples (up to 2nd degree) between the REGARDS and PAGE consortia. Finally, we removed all individuals with an APOL1 low-risk genotype, i.e., G0/G0, G0/G1, G0/G2. Our final cohort was composed of 1043 APOL1 high-risk individuals with genotypes G1/G1 (n = 417), G1/G2 (n = 455) and G2/G2 (n = 171). The p.N264K variant was included and directly genotyped in the MEGA array. To compare the allele frequency of the p.N264K variant, we stratified samples into the following two groups: cases, with estimated glomerular filtration rate (eGFR) < 60 ml/min/1.73 m2, or ESKD, or self-reported kidney failure (N = 150); and controls, with eGFR > 60 ml/min/1.73 m2 (N = 893). eGFR was estimated using the CKD-EPI equation.
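
The case and control definitions above can be written as a small classification rule. The Python sketch below is one possible reading of those definitions; the function and argument names are hypothetical, and the handling of a missing eGFR or an eGFR of exactly 60 (neither is addressed in the text) is an assumption.

```python
from typing import Optional


def ckd_status(egfr: Optional[float], eskd: bool = False,
               self_reported_failure: bool = False) -> Optional[str]:
    """Return "case", "control", or None when the stated rule does not apply.

    egfr is in ml/min/1.73 m2. ESKD or self-reported kidney failure makes an
    individual a case regardless of the eGFR value.
    """
    if eskd or self_reported_failure:
        return "case"
    if egfr is None:
        return None  # ASSUMPTION: no kidney function data, so unclassified
    if egfr < 60:
        return "case"
    if egfr > 60:
        return "control"
    return None  # ASSUMPTION: exactly 60 is covered by neither definition


print(ckd_status(45.0), ckd_status(92.0), ckd_status(None, eskd=True))
```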

Electronic Medical Records and Genomics (eMERGE) study: The eMERGE network has made available electronic health record information connected to GWAS data for a total of 102,138 individuals18. These individuals were recruited in three phases (eMERGE-III) from 12 participating medical centers spanning the years 2007–2019. The study cohort consisted of 54% females, with an average age of 69 years. Self-reported demographics indicated that 76% identified as European, 15% as African American, 6% as Latinx, and 1% as East or Southeast Asian.

Every individual underwent genome-wide genotyping, and the specific procedures for genotyping, quality control analyses, and imputation have been previously documented36. PCs were derived using FlashPCA37 on a collection of 48,509 common variants (minor allele frequency ≥0.01) that were independent (pruned in PLINK using the --indep-pairwise 500 50 0.05 command). Imputation of the APOL1 variants G1 (R2 = 0.998), G2 (R2 = 0.995), and p.N264K (R2 = 0.8445) was performed using the TOPMed imputation server, as outlined in the referenced publication22. From this cohort, we selected 530 APOL1-HR individuals with available kidney function data in order to classify 126 cases based on eGFR <60 (CKD3-5 or ESKD) and 404 controls (eGFR >60), in the same way as for the REGARDS study above.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.