Introduction

More than 37 million adults (~ 15%) in the US are estimated to have chronic kidney disease (CKD), and ~ 800,000 have kidney failure1. There are significant racial disparities within CKD and kidney failure, particularly among African American (AA) adults, who experience a disproportionately higher burden of disease. AA adults are more likely to have CKD and develop organ failure at a rate nearly four-fold higher than white adults2. Additionally, while CKD often progresses asymptomatically in early disease, it is an independent risk factor for cardiovascular disease (CVD)2. Structural inequities contribute to these disparities (e.g., access to healthcare, socioeconomic status), as well as individual stress due to discrimination and racism. Yet the observed disparities are not fully explained by these factors, highlighting the need to understand the contributing biological influences on CKD3,4,5.

Heritability estimates of kidney traits are high, e.g., 44% for estimated glomerular filtration rate (eGFR) in a cohort of adults in the Netherlands6,7,8. Genetic markers associated with kidney function have been identified, and risk variants in some of these loci have been shown to have a higher prevalence in populations of African ancestry (e.g., APOL1). Still, known genetic variants account for < 10% of CKD phenotypic variation7,9,10. Furthermore, polygenic risk scores (PRS) that can incorporate up to millions of genetic variants associated with kidney disease—both known and unknown—are in development; yet, these risk models similarly explain a small fraction of the variation of kidney disease11,12,13,14,15. Therefore, it is important to consider the role of non-genetic heritable factors, such as epigenetics, in the unexplained, or “missing” heritability of CKD.

Epigenetic modifications are chemical alterations to DNA that can impact gene expression without changing the DNA sequence. These modifications are dynamic and may serve as biomarkers for the mechanisms by which one can understand health disparities due to harmful exposures. DNA methylation, a type of epigenetic modification, has been linked to CKD and other kidney traits in epigenome-wide association studies (EWAS)16,17,18,19,20. Moreover, EWAS have identified altered DNA methylation patterns at cytosine-phosphate-guanine (CpG) sites in association with stressful life events, air pollution, and neighborhood factors21,22,23,24. Notably, a study in 1.5 million individuals showed that DNA methylation explains a higher proportion of the heritability of kidney disease than gene expression25. And a Mendelian randomization analysis in ~ 35,000 adults identified multiple CpG sites that causally affected kidney function20. Thus, as a consequence of structural racism and social stress, AAs may be uniquely susceptible to epigenome-modifying exposures that increase CKD risk. Yet AAs are understudied with respect to CKD epigenomics.

Furthermore, while individual CpG sites may explain a small fraction of variance of their respective traits, studies have shown that incorporation of multiple methylation markers into risk algorithms, similar to PRS, may provide clinical utility26,27. Methylation risk scores (MRS) are an emerging tool distinct from PRS in that methylation levels can be altered by environmental exposures. By computing risk based on the methylation level of relevant CpG sites, MRS can capture the potential impact of the exposome on gene expression. Because DNA methylation has been linked to CKD, MRS may be an important method of capturing both heritable and environmentally driven risk for disease. To this end, we sought to develop and validate MRS for CKD. Because previous literature suggests that some social determinants of health (SDOH) may be linked to modifications in DNA methylation, we also assessed relationships between the CpG sites that comprised the optimized MRS and SDOH28,29,30.

Results

Figure 1 provides an overview of the study. Baseline characteristics of the study cohorts are presented in Table 1. In HyperGEN, the prevalence of CKD was 6%; in validation cohorts, CKD prevalence ranged from ~ 1 to 8%. The proportion of Stage G2 individuals ranged from 3 to 31%. HyperGEN EWAS summary statistics are presented in Supplemental Table 1. Selected CpG sites for each MRS are presented in Supplemental Table 2, and performance metrics of all MRS are summarized in Supplemental Table 3. The following CpG sites were used to construct the optimal MRS: cg02090160, cg11098259 (AQP9), cg12116137 (PRPF8), cg17944885 (ZNF788; ZNF20), cg00994936 (DAZAP1), cg02304370 (PHRF1), cg04460609 (LDB2), cg00501876 (CSRNP1), and cg04864179 (IRF5). MRS weights, selected from published eGFR EWAS, are summarized in Table 2.

Figure 1

Study overview.

Table 1 Baseline characteristics of MRS development and validation cohorts.
Table 2 Overview of eGFR CpG sites selected for optimized MRS.
Table 3 Selected associations of SDOH with MRS CpG sites in JHS.

In HyperGEN, CKD cases had higher MRS than controls (p = 3.28E − 05, Fig. 2). In the continuous model, a 1 standard deviation (sd) increase in MRS was associated with 5.07-fold greater odds of CKD (95% CI 2.48–10.38, p < 0.0001). Additionally, the prevalence of CKD increased with higher thresholds of MRS (Supplemental Fig. 1). Odds ratios showed even further stratification upon exclusion of G2s (Supplemental Table 4), and associations were robust after accounting for CKD risk factors (Supplemental Table 5). Results were consistent for eGFR: a 1 sd increase in MRS was associated with a 6.98 ± 1.75 mL/min/1.73 m2 lower baseline eGFR (Supplemental Table 6). In the full model, the MRS explained 1.6% of the variance in CKD and 5.7% when G2s were excluded. Similarly, inclusion of the MRS improved the covariate model AUC from 0.88 to 0.92, and the MRS-only model had an AUC of 0.71.

Figure 2

Distribution of optimized MRS in HyperGEN.

Among the validation cohorts, the continuous MRS was a modest predictor of CKD in JHS (OR[95% CI] 1.52[1.02,2.26], p = 0.038) and of eGFR in WHI-AS311 (β(SE): − 8.84(4.06), p = 0.030) and WHI-BAA23 (β(SE): − 2.93(1.30), p = 0.024), excluding G2s. In MESA, GOLDN, and WHI-EMPC, MRS was not significantly associated with CKD or eGFR. In AA-only WHI models, MRS was not significantly associated with CKD or eGFR. Complete metrics of MRS performance among individual validation cohorts are summarized in Supplemental Table 7. In the meta-analysis of the validation cohorts, the MRS was significantly associated with CKD (OR[95% CI] 1.66[1.20,2.30], p = 0.003) and eGFR (β(SE): − 2.28(0.72), p = 0.002) for a 1 sd increase and at the top 10% and 20% thresholds, but not at the top 5% threshold (Fig. 3). Directions of effect were consistent when we restricted analyses to AAs: OR[95% CI] 1.55[1.06,2.26], p = 0.029 for CKD and β(SE): − 2.19(1.17), p = 0.08 for eGFR for a 1 sd increase (Supplemental Table 7). We were unable to obtain threshold-specific estimates for the AA-only meta-analysis, as models for multiple cohorts did not converge in smaller strata.

Figure 3

Forest plot of meta-analyzed MRS associations in validation cohorts. (A) Meta-analysis results for CKD associations in validation cohorts. (B) Meta-analysis results for eGFR associations in validation cohorts.

In secondary analyses of MRS associations with SDOH, three MRS CpG sites in JHS were marginally associated with individual and neighborhood-level factors: cg12116137 (PRPF8), cg17944885 (ZNF788; ZNF20), and cg00501876 (CSRNP1). Interestingly, methylation of cg17944885 (ZNF788; ZNF20) was inversely associated with “protective” SDOH (e.g., health insurance and higher SES) and positively associated with “detrimental” SDOH (e.g., violence and poverty). There was a similar pattern among associations with cg00501876 (CSRNP1). Results are summarized in Table 3.

Discussion

In this study, we utilized DNA methylation array data from 6,858 individuals, aged 18–85 years and drawn from across the continental United States, to develop MRS for CKD. Study-specific replication was not obtained in most of the validation cohorts, as the prevalence of CKD in these cohorts (1–8%) is notably lower than national population estimates (~ 15%). Still, the meta-analyses suggest that the MRS is a significant predictor of prevalent CKD and eGFR for both AA and non-AA individuals. To our knowledge, this study constitutes the first MRS developed for kidney function.

Unlike polygenic risk scores, which are commonly derived from genome-wide association study (GWAS) summary statistics using Bayesian methods and may include > 1 million variants, methods for developing MRS are less standardized and tend to apply a candidate-based approach for risk marker selection. For example, previous MRS in cancer studies have quantified absolute methylation at oncogenes via pyrosequencing of tumor tissue26,33,34. Another study derived CpG weights from model parameter estimates26. Other studies have utilized the least absolute shrinkage and selection operator (LASSO) penalized regression method and cross-validation to obtain MRS weights35,36,37. Furthermore, the type of methylation data used for MRS construction also varies from gene-based to CpG to mRNA methylation38. Still, the availability of array data (e.g., 450K and EPIC) allows CpG methylation to be one of the primary sources of MRS construction to date26,27,35,36,37,39,40,41.

Although MRS development methods are becoming more sophisticated—e.g., machine-learning and pruning and thresholding (P + T)—a consensus on methodological frameworks does not yet exist, in contrast with PRS41,42,43,44. Therefore, in the absence of these recommendations, we attempted to construct MRS using markers with substantial statistical and functional support. First, the selected CpG sites were among the top findings in their discovery EWAS and had to have a consistent direction of association in the development cohort. The only exceptions were CpG sites that had been causally linked to kidney function via Mendelian randomization; and even among those four markers, only one (cg00501876, CSRNP1) did not meet the first two criteria. While filtering CpG sites on the basis of similar effect direction in HyperGEN can contribute to overfitting, we also considered additional support and/or biological plausibility for top candidates for MRS construction: replication in at least 2 EWAS in the literature, causal effect on kidney function in Mendelian randomization, location near genes linked to kidney function, and/or significant association with traits or diseases that contribute to kidney function (e.g., aging, diabetes) in prior EWAS. Second, the published effect estimates served as CpG weights because of the large sample sizes of the discovery EWAS (N ~ 5000 to 35,000) compared to the MRS development cohort (N ~ 600). Third, we derived multiple MRS and applied additional sensitivity analyses to obtain the optimal combination of CpG sites. The optimized risk model remained a significant predictor of CKD and eGFR in a large validation sample (N > 6,000).

There is a dynamic relationship between the environment, epigenome, and biological function. And previous studies have identified associations between psychosocial stressors and aberrant DNA methylation near genes relevant to cardiovascular disease and other metabolic processes29,30,46. Similarly, studies have demonstrated a link between neighborhood-level factors—e.g., violence and social disadvantage—and DNA methylation variations21,23. In JHS, we detected modest associations between health behaviors and SDOH and CpG sites included in our MRS, although these associations were not significant after accounting for multiple comparisons. Notably, cg00501876 (CSRNP1) has been causally linked to eGFR in prior studies, and its methylation was associated with most of the factors we assessed. The CpG site is located ~ 2 Kb upstream from a CpG island in an enhancer region. Further, cg12116137 (PRPF8) and cg17944885 (ZNF788; ZNF20) are in enhancer and alternative splicing regions, respectively; and both sites are mapped to genes involved in DNA transcription and mRNA processing. Thus, their methylation may contribute to altered gene expression in kidney disease. Additional studies, such as mediation analyses, are needed to ascertain causal and functional implications of these regions.

To our knowledge, this study is among the first to develop a methylation algorithm for CKD. Strengths of this study include the large sample size of the validation population, as well as the robustness of our findings in a multi-racial sample with significant representation of AAs, who are disproportionately burdened with CKD. Additionally, although prior EWAS weights were based on associations with the 2009 eGFRcr equation, which contains a race coefficient, we optimized the MRS using the updated race-free equation. However, for multiple cohorts included in this analysis, severe kidney disease was an exclusion criterion of the parent studies, resulting in smaller sample sizes that limited the power of the CKD analyses. We attempted to redress this by meta-analyzing validation cohort results, as well as by assessing MRS associations with eGFR as a continuous trait, and we observed consistent directions of effect in both overall and AA-specific meta-analyses. Still, previously established methylation associations with kidney traits should be reevaluated in the context of the new eGFR equations, as well as in studies with a kidney disease prevalence that is comparable to national estimates.

Future studies should evaluate the capacity of MRS to predict incident outcomes (e.g., end-stage renal disease). These algorithms should also be compared to existing clinical algorithms, such as the Kidney Failure Risk Equation47. Inclusion of other ‘-omics’ data, such as PRS, with MRS may also improve risk prediction. This multi-pronged approach may help improve prevention of kidney failure and reduce disparities.

Methods

Study populations

The cohorts used for this study are summarized in Table 1. Complete descriptions of study design, enrollment, DNA methylation arrays, and quality control (QC) procedures are detailed in the Supplemental Methods. Briefly, a subset of AA participants in the Hypertension Genetic Epidemiology Network (HyperGEN), who had been selected from the extremes of left ventricular mass for epigenomic profiling as part of an ancillary study, was used to develop the MRS and optimize risk models. Participants from the Genetics of Lipid-Lowering Drugs and Diet Network (GOLDN), Jackson Heart Study (JHS), Multi-Ethnic Study of Atherosclerosis (MESA), and Women’s Health Initiative (WHI) with methylation data were used for external validation of the MRS.

The studies included in this analysis were conducted according to the guidelines of the Declaration of Helsinki and approved by the Institutional Review Board (IRB) of their respective institutions. Written informed consent to participate in epigenetic studies was obtained from all participants involved in all studies. Additionally, the specific use of these data was reviewed by the IRB of the University of Alabama at Birmingham (15 April 2021) and determined to be not human subjects research.

CKD phenotyping

Prevalent CKD status was defined as eGFR < 60 mL/min/1.73 m2, which is concordant with CKD Stage 3 or higher, as defined by the National Kidney Foundation’s (NKF) Kidney Disease: Improving Global Outcomes (KDIGO) guidelines48. We used the 2021 CKD Epidemiology Collaboration (CKD-EPI) equations to calculate eGFR as a function of participant age, sex, and serum creatinine at baseline visit49.
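As an illustration of the phenotype definition above, the following minimal Python sketch computes eGFR with the publicly documented 2021 CKD-EPI creatinine equation and applies the eGFR < 60 threshold. The function names and example inputs are hypothetical, and serum creatinine is assumed to be in mg/dL; this is not the study's own code.

```python
def egfr_ckd_epi_2021(scr_mg_dl: float, age_years: float, female: bool) -> float:
    """Race-free 2021 CKD-EPI creatinine equation (mL/min/1.73 m^2).

    Assumes serum creatinine in mg/dL; names here are illustrative only.
    """
    kappa = 0.7 if female else 0.9          # sex-specific creatinine scaling
    alpha = -0.241 if female else -0.302    # sex-specific exponent below kappa
    ratio = scr_mg_dl / kappa
    egfr = (
        142.0
        * min(ratio, 1.0) ** alpha
        * max(ratio, 1.0) ** -1.200
        * 0.9938 ** age_years
    )
    return egfr * 1.012 if female else egfr


def has_prevalent_ckd(egfr: float) -> bool:
    """Prevalent CKD as defined in the text: eGFR < 60 mL/min/1.73 m^2."""
    return egfr < 60.0


# Hypothetical participant: 60-year-old male with creatinine of 2.0 mg/dL
example_egfr = egfr_ckd_epi_2021(2.0, 60, female=False)
example_is_case = has_prevalent_ckd(example_egfr)
```

In this sketch the hypothetical participant falls below the threshold and would be classified as a prevalent case.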

MRS development

We compiled published EWAS, conducted between 2014 and 2021, of CKD, eGFR, albuminuria, and urine albumin-to-creatinine ratio (UACR)16,17,18,20,31,32,50,51. We then conducted EWAS for these traits in HyperGEN (Supplemental Table 1). Published CpG sites were eligible for inclusion in the MRS if they (1) were available on both the Illumina 450K and EPIC methylation arrays; (2) demonstrated the same direction of effect in the HyperGEN EWAS at p < 0.1; and/or (3) had been causally linked to kidney function in prior Mendelian randomization analyses (n = 19 CpGs). We further prioritized CpG sites that had additional statistical support and/or biological plausibility: replication in multiple EWAS, location near genes linked to kidney function, and/or significant association with non-kidney traits or diseases that may contribute to kidney function (e.g., aging, diabetes) (n = 11 CpGs). We then derived multiple MRS as a weighted sum, \(MRS = \sum_{k} \beta_{k} \, CpG_{k}\), where \(\beta_{k}\) is the published parameter estimate for the association of CpG site \(k\) and \(CpG_{k}\) is the participant's beta score at that site (Supplemental Table 2). For parameter estimates that were generated from models in which CpG methylation levels had been transformed (e.g., M-value), the estimates were back-transformed to reflect a beta score weight. We obtained MRS using effect estimates from EWAS for CKD or eGFR (Supplemental Table 3). For MRS based on eGFR EWAS weights, we multiplied the MRS by − 1 so that a higher MRS reflects lower eGFR. We then log-transformed the MRS to approximate a normal distribution.
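The weighted-sum construction can be sketched in Python as follows. This is a minimal illustration only: the weights and beta scores below are made-up placeholders (not the published estimates in Table 2), the function names are hypothetical, and the back-transformation and log-transformation steps are not shown because their exact form is not given in the text.

```python
from statistics import mean, stdev
from typing import List, Mapping


def methylation_risk_score(
    betas: Mapping[str, float],
    weights: Mapping[str, float],
    flip_sign: bool = True,
) -> float:
    """Weighted sum of beta scores over the CpG sites present in both inputs.

    flip_sign=True multiplies the score by -1, as described for eGFR-based
    weights, so that a higher score corresponds to lower eGFR.
    """
    shared = [cpg for cpg in weights if cpg in betas]  # skip unavailable CpGs
    score = sum(weights[cpg] * betas[cpg] for cpg in shared)
    return -score if flip_sign else score


def standardize(scores: List[float]) -> List[float]:
    """Scale scores to mean 0 and sd 1 so effects can be read per 1 sd."""
    m, s = mean(scores), stdev(scores)
    return [(x - m) / s for x in scores]


# Placeholder weights and one participant's beta scores (illustrative values)
toy_weights = {"cg17944885": -2.0, "cg00501876": 1.0}
toy_betas = {"cg17944885": 0.5, "cg00501876": 0.2}
toy_score = methylation_risk_score(toy_betas, toy_weights)
```

Skipping CpG sites that are absent from a dataset mirrors the handling of unavailable sites described for the validation cohorts.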

MRS optimization

We fit logistic and linear mixed effect regression models in SAS (version 9.4, ‘proc glimmix’) to evaluate associations of MRS with prevalent CKD and baseline eGFR, respectively, in HyperGEN. In HyperGEN, eGFR was calculated from the creatinine-only (2021 eGFRcr) equation. Models were adjusted for age, sex, recruitment center, left ventricular mass index, the first four principal components (PCs) of ancestry, and Houseman-estimated cell counts (CD4 T lymphocytes, CD8 T lymphocytes, natural killer (NK) cells, B cells, and monocytes, with granulocytes as the reference) as fixed effects, as well as family relatedness as a random effect52. In sensitivity analyses, we further adjusted models for CKD risk factors: obesity, as measured by body mass index (BMI); smoking status; hypertension; and diabetes. We also excluded participants in KDIGO CKD Stage G2 (60 ≤ eGFR < 90 mL/min/1.73 m2), who have mild kidney impairment and may already be on the pathway to severe disease, from the control group and refit the models with the same covariates as described.
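The case and control definitions, including the G2 sensitivity analysis, can be illustrated with a short Python sketch. It assumes the standard KDIGO GFR category cut-points; the function names are hypothetical and the sketch is not taken from the study's SAS code.

```python
from typing import Optional


def kdigo_gfr_category(egfr: float) -> str:
    """Map eGFR (mL/min/1.73 m^2) to a KDIGO GFR category."""
    if egfr >= 90:
        return "G1"
    if egfr >= 60:
        return "G2"
    if egfr >= 45:
        return "G3a"
    if egfr >= 30:
        return "G3b"
    if egfr >= 15:
        return "G4"
    return "G5"


def case_control_label(egfr: float, exclude_g2: bool = False) -> Optional[int]:
    """Return 1 for a CKD case (eGFR < 60), 0 for a control, None if excluded.

    With exclude_g2=True, G2 participants are dropped from the control group,
    as in the sensitivity analysis described above.
    """
    category = kdigo_gfr_category(egfr)
    if category in {"G3a", "G3b", "G4", "G5"}:
        return 1
    if exclude_g2 and category == "G2":
        return None
    return 0
```

Returning `None` for excluded participants keeps them distinguishable from controls, so the same labels can feed both the primary and the sensitivity models.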

For each MRS we developed, we compared the distribution of the score between cases and controls. We also compared the liability R2, i.e., the variance of the outcome explained by the predictor after accounting for discrepancies between cohort and population-level disease prevalence. We fit covariate-only models and calculated the area under the curve (AUC) to evaluate the extent to which inclusion of the MRS with known risk factors improved risk prediction. Finally, we calculated the sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). PPV and NPV were additionally adjusted for the population prevalence of CKD (aPPV, aNPV), which was obtained from the United States Renal Data System (USRDS). USRDS prevalence was based on estimates from the National Health and Nutrition Examination Survey (2017 to March 2020)2.
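One common way to adjust predictive values for population prevalence is to apply Bayes' theorem to the observed sensitivity and specificity. The text does not state the exact formula used, so the Python sketch below is an assumption, with hypothetical function names and illustrative inputs.

```python
from typing import Tuple


def adjusted_ppv_npv(
    sensitivity: float, specificity: float, prevalence: float
) -> Tuple[float, float]:
    """Prevalence-adjusted PPV and NPV via Bayes' theorem.

    All inputs are proportions in [0, 1]; prevalence is the assumed
    population prevalence rather than the cohort prevalence.
    """
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Illustrative values only: sensitivity 0.80, specificity 0.90, prevalence 0.15
appv, anpv = adjusted_ppv_npv(0.80, 0.90, 0.15)
```

Because PPV rises and NPV falls as prevalence increases, values computed in a low-prevalence cohort can differ noticeably from those expected at the national prevalence of about 15%.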

MRS validation

The optimal MRS was selected based on the liability R2 and area under the curve (AUC) in HyperGEN. In validation cohorts, the MRS was computed using the weighted sum equation as previously described. CpG sites that were not available in validation datasets were excluded from the MRS calculation. eGFR was calculated using the 2021 eGFRcr equation. GOLDN and JHS models were adjusted for the same covariates as the primary HyperGEN model. However, in MESA and WHI, we performed fixed effect logistic and linear regression (‘proc logistic’ and ‘proc genmod’ in SAS) because there were no related individuals in the methylation cohorts and thus, no random effect. Further, in MESA and WHI models, we conducted exact logistic regression due to small sample size and/or very low CKD prevalence. In all WHI analyses, we fit models without sex as a covariate (as all participants were female). We included study-specific covariates, e.g., study arm (clinical trial vs. observational study), as needed. Further, we subset WHI cohorts to obtain AA-specific estimates of MRS association with kidney function.

Once we obtained study-specific estimates for CKD and eGFR in the validation cohorts, we meta-analyzed the associations using the METASOFT program, totaling 6,250 individuals. We also evaluated AA-specific associations in a second meta-analysis, totaling 3,023 individuals from JHS, MESA, and WHI. We applied random-effects models to account for between-study heterogeneity, and we evaluated the MRS associations at 1 sd and thresholds of 5%, 10%, and 20%, as in the primary model. Studies whose models did not converge for a given outcome and/or threshold were excluded, and if more than two of the eligible cohorts were excluded, we did not conduct meta-analysis for that threshold. Due to these criteria, we only report 1 sd associations for the AA-specific meta-analysis.
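For intuition about the pooling step, the sketch below implements a conventional DerSimonian-Laird random-effects meta-analysis in Python. It is a stand-in under stated assumptions rather than a reproduction of METASOFT, and the helper names are hypothetical. Odds ratios are pooled on the log scale, with standard errors recovered from the reported 95% confidence intervals.

```python
import math
from typing import Sequence, Tuple


def log_or_and_se(odds_ratio: float, ci_low: float, ci_high: float) -> Tuple[float, float]:
    """Convert an OR with a 95% CI to a log-odds estimate and its SE."""
    z = 1.959964  # two-sided 95% normal quantile
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z)
    return math.log(odds_ratio), se


def dersimonian_laird(
    estimates: Sequence[float], std_errors: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (pooled estimate, pooled SE, tau^2) under a random-effects model."""
    w = [1.0 / se ** 2 for se in std_errors]            # inverse-variance weights
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    k = len(estimates)
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0  # between-study variance
    w_re = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    return pooled, math.sqrt(1.0 / sum(w_re)), tau2


# Example conversion using the JHS odds ratio reported in the Results
jhs_beta, jhs_se = log_or_and_se(1.52, 1.02, 2.26)
```

When the between-study variance is estimated as zero, the result reduces to the fixed-effect inverse-variance estimate.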

Secondary analyses

In secondary analyses, we evaluated associations with individual-level factors (alcohol use, physical activity, depression, insurance status, perceived daily discrimination, and perceived stress) and neighborhood-level factors (socioeconomic status (SES); social cohesion; favorable food stores; problems, e.g., excessive noise; violence; poverty rate; and density of AA residents) in JHS. We fit linear mixed models adjusted for age, sex, PC1-PC4, Houseman-estimated cell counts, and family relatedness (random). Neighborhoods were defined according to the 2000 US Census tract at baseline visit. SES was calculated according to the Diez-Roux et al. method, which is a composite of housing values, education, and income levels55. Social cohesion was calculated based on self-reported perceptions of close knitness, trust, values, and safety among neighbors. Favorable food stores represented the number of supermarkets and fruit and vegetable markets within 3 miles. The problems domain included self-reported noise, traffic, lack of access to shopping and parks, litter, and sidewalk condition. The violence domain was similarly based on self-reported frequency of violent activity. Poverty rate was calculated as the percent of persons living below the federal poverty level within a census tract, and density of AA residents was defined as the percentage of Black residents within a census tract.