Frailty is defined as a state of elevated vulnerability to poor resolution of homeostasis1. It is frequently observed in older adult populations2,3 and increases the risk of developing negative health outcomes including falls, physical limitations, hospitalization, and mortality1,4,5. A growing body of research has reported that this syndrome results from a multidimensional interplay of genetic, biological, psychosocial, and environmental factors6,7,8.

In the absence of a gold standard measurement for frailty, various models have been proposed to measure and define frailty in recent years4,9,10,11, based on questionnaires, performance measures, routine data, or a combination of these. Two principal instruments have been widely accepted in clinical practice12: the performance- and questionnaire-based approach suggested by Fried et al.4 and the “frailty index (FI)” approach suggested by Rockwood et al.10, which can use any kind of data as long as the included items are frailty-related and fulfill specific validated criteria. The FI has been shown to be a valid predictor of morbidity and mortality in many different population-based cohorts13,14.

Epigenetic modifications have been recognized to play a major role in aging15,16 and aging-related conditions, such as frailty17,18. Recently, various DNA methylation (DNAm)-based aging algorithms have been developed19,20,21,22,23 and have also been shown to be associated with frailty24,25. However, none of these algorithms was specifically derived for predicting frailty.

Here, we followed a previously proposed three-phase procedure21 to derive and validate a DNAm signature-based frailty risk score in a large population-based cohort study of older adults. Replication was performed in another independent population-based cohort. Frailty was defined by a frailty index (FI) based on the concept of deficit accumulation10,26.


Characteristics of the study populations

Table 1 shows the baseline characteristics of the participants. In ESTHER, the distributions of age, sex, body mass index, smoking status, and alcohol consumption were similar across the three subsets (all P-values > 0.05). The mean age was approximately 62 years, and a slight majority of participants were women. Most participants (approximately 7 out of 10) were overweight or obese. Approximately half had never smoked, and approximately one in six was still smoking at the time of enrollment. In KORA-age, the mean age was about 76 years and half of the participants were women. Education level (27% with ≥12 years of school education), body mass index (80% overweight or obese), and alcohol consumption were higher than in ESTHER.

Table 1 Baseline characteristics of the study population

Table 2 presents the distribution of FI at baseline and at the various follow-ups. In both ESTHER and KORA-age, FI and the proportion of frail participants increased over follow-up. FI levels were higher in KORA-age, reflecting the older age of its participants.

Table 2 Frailty characteristics by follow-ups and subsets

Identification of frailty-related CpGs

In the discovery phase, conducted in subset I, 2220 CpGs passed the genome-wide significance threshold (FDR < 0.05). Of these, 65 CpGs located at 47 genes across 21 chromosomes were successfully replicated in the validation phase in subset II and were deemed frailty-related methylation loci. With more comprehensive adjustment for further potential confounders (model 2), 53 CpGs remained associated with FI, 43 of them inversely, with the decrease in FI per one standard deviation (SD) increase in methylation ranging from 0.92% units (95% CI 0.06–1.77) to 3.07% units (95% CI 1.05–5.08) (Supplementary Data 1).

Screening the literature for CpGs that were previously reported to be associated with frailty identified 15 CpGs from three studies25,27,28. Eight of them showed statistically significant associations with frailty in subset III (Supplementary Table 1).

Construction of eFRS

Using LASSO regression, the number of CpGs was further reduced from 65 to 20, because many of the CpGs were correlated with each other. The eFRS was constructed from these 20 selected CpGs using the following equation:

\[
\begin{aligned}
\mathrm{eFRS} = {} & 0.204 - 0.209\times\mathrm{cg00921350} - 0.100\times\mathrm{cg01234420} - 0.016\times\mathrm{cg02867102} - 0.293\times\mathrm{cg03725309} \\
& - 0.146\times\mathrm{cg04955914} - 0.084\times\mathrm{cg07312601} + 0.158\times\mathrm{cg07349348} + 0.137\times\mathrm{cg08463758} \\
& + 0.248\times\mathrm{cg10408430} - 0.101\times\mathrm{cg11700584} - 0.049\times\mathrm{cg12510708} + 0.064\times\mathrm{cg13570972} \\
& - 0.057\times\mathrm{cg15058210} - 0.180\times\mathrm{cg15380836} - 0.144\times\mathrm{cg17860366} + 0.315\times\mathrm{cg17971578} \\
& - 0.075\times\mathrm{cg18791730} - 0.176\times\mathrm{cg19267254} - 0.025\times\mathrm{cg21656937} - 0.077\times\mathrm{cg23458887}
\end{aligned}
\]
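To make the scoring rule concrete, the following Python sketch applies the eFRS equation to one participant's methylation values. The coefficients are transcribed from the equation above; the dictionary layout and the name `compute_efrs` are illustrative only. The sketch assumes the inputs are on the same scale used when the LASSO model was fitted (presumably methylation β-values, which is an assumption, since the scale is not restated alongside the equation).

```python
# Intercept and CpG weights transcribed from the eFRS equation.
EFRS_INTERCEPT = 0.204
EFRS_WEIGHTS = {
    "cg00921350": -0.209, "cg01234420": -0.100, "cg02867102": -0.016,
    "cg03725309": -0.293, "cg04955914": -0.146, "cg07312601": -0.084,
    "cg07349348": 0.158, "cg08463758": 0.137, "cg10408430": 0.248,
    "cg11700584": -0.101, "cg12510708": -0.049, "cg13570972": 0.064,
    "cg15058210": -0.057, "cg15380836": -0.180, "cg17860366": -0.144,
    "cg17971578": 0.315, "cg18791730": -0.075, "cg19267254": -0.176,
    "cg21656937": -0.025, "cg23458887": -0.077,
}


def compute_efrs(methylation: dict) -> float:
    """Return intercept + sum(weight * methylation value) for one participant.

    `methylation` maps CpG IDs to methylation values (scale is an assumption).
    Raises KeyError if any of the 20 CpGs is missing instead of imputing it.
    """
    missing = [cpg for cpg in EFRS_WEIGHTS if cpg not in methylation]
    if missing:
        raise KeyError(f"missing CpGs: {missing}")
    return EFRS_INTERCEPT + sum(
        weight * methylation[cpg] for cpg, weight in EFRS_WEIGHTS.items()
    )
```

For example, `compute_efrs({cpg: 0.0 for cpg in EFRS_WEIGHTS})` returns the intercept, 0.204. Raising an error on missing CpGs, rather than silently imputing them, is a design choice of this sketch.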

Functional annotation of sets of CpGs

We performed a literature search in PubMed to obtain information on the genes containing the 65 frailty-related CpGs (Supplementary Data 2). The 20 CpGs included in eFRS are annotated to 17 genes. These genes are involved in various frailty-related outcomes, including different types of cancer (i.e., HDAC4, CASP9, NFE2L3, RILP, STK40, HAO2, SNX20, MRTO4, EMILIN3, P4HA3), cardiovascular disease (i.e., HDAC4, CASP9, SARS), diabetes mellitus (i.e., RPL36AL, SARS), and Alzheimer’s disease (i.e., RPL36AL).

Supplementary Fig. 1 presents the pathway enrichment and PPI network analyses of the target genes of frailty-related CpGs. The enrichment heatmap (Supplementary Fig. 1A) shows that the enriched pathways of these genes include cellular macromolecule biosynthetic process, non-small cell lung cancer, viral carcinogenesis, export from the cell, and viral process. Supplementary Fig. 1B shows the relationships between these enriched terms: each node represents an enriched term, and terms with a similarity > 0.3 are connected by edges. Applying the MCODE algorithm29 identified three modules in the PPI network (regulation of kinase activity, positive regulation of kinase activity, and positive regulation of transferase activity; Supplementary Fig. 1C).

Supplementary Data 3 shows the results of the mQTL analysis of the CpGs included in eFRS. Altogether, we identified 3, 3, and 55 mQTLs at which genetic variation was significantly associated (P < 1 × 10⁻⁷) with methylation at cg02867102, cg07312601, and cg11700584, respectively.

Association of eFRS with FI at baseline and each follow-up

Figure 1 shows the correlation matrix of age, eFRS, and FI at baseline and at the various follow-up times in the two validation subsets. In ESTHER, eFRS was more strongly correlated with chronological age than FI at baseline was (Spearman correlation coefficients, rSp, 0.443 and 0.267, respectively), but the correlation of FI with age increased with increasing length of follow-up (up to rSp = 0.397 at the 11-year follow-up). Correlation coefficients between baseline eFRS and FI at baseline and at the various follow-up times were all in the range of 0.2–0.3, whereas correlation coefficients between FI measurements at different points in time were all in the range of 0.6–0.9. Very similar correlations between eFRS and FI at baseline and the various follow-ups were obtained with Pearson’s correlation coefficients, which likewise ranged from 0.2 to 0.3. The eFRS explained approximately 6%, 9%, 9%, 9%, and 5% of the variation in frailty at baseline and at the 2-, 5-, 8-, and 11-year follow-ups, respectively. Furthermore, we observed a significant correlation of eFRS with AccAgeGrim (rSp = 0.566, P < 0.001). Replication analyses in KORA-age showed consistent patterns and similar correlations: eFRS showed a slightly stronger correlation with age (rSp = 0.427) than FI at baseline did (rSp = 0.411), and correlation coefficients between eFRS and FI at baseline and the two follow-ups likewise ranged between 0.2 and 0.3. The eFRS explained about 6%, 7%, and 7% of the variation in frailty at baseline and at the 4- and 8-year follow-ups, respectively. All correlations in ESTHER and KORA-age were highly statistically significant (P < 0.001).
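The correlation analyses can be illustrated with a small, dependency-free Python sketch in which Spearman's coefficient is computed as the Pearson correlation of tie-averaged ranks. The text does not state how the percentage of variation explained was obtained; assuming it corresponds to the squared correlation coefficient, correlations of 0.25 and 0.30 translate to roughly 6% and 9%, in line with the reported figures. All function names here are hypothetical.

```python
def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (ZeroDivisionError) for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rho = Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def variance_explained(r):
    """Share of variance explained, assuming it is the squared correlation."""
    return r * r
```

With this helper, `variance_explained(0.25)` gives 0.0625 (about 6%) and `variance_explained(0.30)` gives about 0.09.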

Fig. 1: Spearman correlation coefficients of age, epigenetic frailty risk score and frailty index at baseline and various follow-up times.

All P-values < 0.001 (two-sided without adjustments). a subset III (ESTHER study); b subset IV (KORA-Age study). eFRS epigenetic frailty risk score, FI-BL baseline frailty index, FI-2Y 2-year follow-up frailty index, FI-4Y 4-year follow-up frailty index, FI-5Y 5-year follow-up frailty index, FI-8Y 8-year follow-up frailty index, FI-11Y 11-year follow-up frailty index.

Supplementary Table 2 provides covariate-adjusted associations of baseline eFRS with FI at baseline and at subsequent follow-up rounds in the two validation subsets. In ESTHER, a one-SD increase in eFRS was associated with an increase in FI of approximately 2 percent units (range 1.55–2.16 percent units). The associations were fairly stable across follow-up rounds and adjustment levels, although they no longer reached statistical significance for FI measured at the 11-year follow-up, given the smaller number of participants still included in this follow-up round. In KORA-age, very similar, albeit slightly weaker, associations were observed between eFRS and FI at baseline and at the two subsequent follow-ups (a one-SD increase in eFRS was associated with increments in FI of 1.27–1.97 percent units).

Supplementary Fig. 2 illustrates multivariable-adjusted ORs (95% CIs) for the association of eFRS with being pre-frail or frail at baseline and at each follow-up in the two validation subsets. In ESTHER, the associations of pre-frailty/frailty with the highest (vs. lowest) quartile of eFRS were statistically significant at the 2-year and 5-year follow-ups, with ORs of 3.53 (95% CI = 1.66–7.52) and 2.86 (95% CI = 1.18–6.92), respectively. Similarly, when ORs were assessed per SD of eFRS, eFRS was strongly associated with being pre-frail or frail at baseline (OR = 1.38, 95% CI = 1.05–1.82), at the 2-year follow-up (OR = 1.67, 95% CI = 1.26–2.23), and at the 5-year follow-up (OR = 1.38, 95% CI = 1.01–1.87). Patterns were less consistent, and associations were not statistically significant, for pre-frailty/frailty status at the 8- and 11-year follow-ups. In KORA-age, eFRS was strongly associated with being pre-frail or frail at baseline, with multivariable-adjusted ORs of 1.99 (95% CI = 1.23–3.21) and 2.77 (95% CI = 1.61–4.75) for eFRS quartiles 3 and 4, respectively, compared with quartile 1. A one-SD increase in eFRS was significantly associated with 40% higher odds of being pre-frail or frail at baseline. When the analysis was restricted to participants who were non-frail at baseline, similar associations were seen between baseline eFRS and the risk of being pre-frail or frail at multiple follow-ups in both ESTHER and KORA-age (Supplementary Fig. 3).

Associations of eFRS with being frail at baseline and at the various follow-up times in the two validation subsets are shown in Fig. 2. In ESTHER, compared with participants in eFRS quartile 1, participants in quartile 4 were at strongly increased risk of being frail at baseline (OR = 7.98, 95% CI = 2.27–28.07), at the 2-year follow-up (OR = 2.93, 95% CI = 1.13–7.57), and at the 5-year follow-up (OR = 2.92, 95% CI = 1.12–7.67). Multivariable-adjusted ORs (95% CIs) per one-SD increase in eFRS for being frail at baseline, at the 2-year follow-up, and at the 5-year follow-up were 1.94 (1.31–2.89), 1.64 (1.15–2.35), and 1.48 (1.07–2.04), respectively. In KORA-age, the associations with being frail were generally weaker, but a statistically significant association between eFRS and frailty was still observed at the 4-year follow-up. Similar associations of eFRS with being frail at follow-ups were seen in both ESTHER and KORA-age when the analysis was restricted to participants who were non-frail or pre-frail at baseline (Fig. 3).

Fig. 2: Association of epigenetic frailty risk score with being frail at follow-ups.

Vertical ticks within the blue boxes and horizontal lines show the OR and 95% CI. Models were adjusted for age, sex, leukocyte composition, batch, baseline smoking status (never smoker, former smoker, current smoker), and alcohol consumption (g per day). eFRS epigenetic frailty risk score, OR odds ratio, CI confidence interval, SD standard deviation.

Fig. 3: Association of epigenetic frailty risk score with being frail among participants who were non-frail or pre-frail at baseline.

Vertical ticks within the blue boxes and horizontal lines show the OR and 95% CI. Models were adjusted for age, sex, leukocyte composition, batch, baseline smoking status (never smoker, former smoker, current smoker), and alcohol consumption (g per day). eFRS epigenetic frailty risk score, OR odds ratio, CI confidence interval, SD standard deviation.

In ESTHER, we further conducted sensitivity analyses in which smoking status was adjusted for using the DNAm-based Maas 13-CpG model instead of self-reported smoking status. The positive associations were highly consistent across both types of models (Supplementary Table 3).

Predictive performance of eFRS for frailty at baseline and follow-ups

Table 3 displays the individual and joint predictive performance of age, sex, and eFRS for being frail in the two validation subsets. In ESTHER, eFRS alone showed predictive performance at baseline and follow-ups comparable to that of age and sex combined. Adding eFRS to models including age and sex significantly improved the prediction of frailty at baseline and at the 5-year follow-up (AUC from 0.629 to 0.711 at baseline and from 0.650 to 0.680 at the 5-year follow-up). In the substantially older KORA-age cohort, the predictive performance of age and sex was generally much higher, and adding eFRS to models including age and sex increased predictive performance only slightly.
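The AUCs in Table 3 summarize how well each predictor separates frail from non-frail participants. A minimal Python sketch, assuming the standard rank-based (Mann–Whitney) definition of the AUC, is shown below. For a combined model, the scores would be predicted probabilities from a logistic regression of frailty on age, sex, and eFRS, which this sketch does not fit; the function name `auc` is hypothetical.

```python
def auc(scores, labels):
    """AUC as the probability that a randomly chosen case (label 1) scores
    higher than a randomly chosen non-case (label 0); ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:  # O(n_pos * n_neg): fine for an illustration
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, which is the scale on which the reported increase from 0.629 to 0.711 should be read.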

Table 3 AUC (95% CI) of chronological age, sex, and epigenetic frailty risk score in prediction of being frail at baseline and each follow-up

Association of AccAgeGrim with FI at baseline and each follow-up

In ESTHER, AccAgeGrim showed associations with FI at baseline and subsequent follow-up times similar to those of eFRS (a one-SD increase in AccAgeGrim was associated with increments in FI of 1.53–2.17% units; Supplementary Table 4). Similar associations of AccAgeGrim with being pre-frail or frail were also seen at baseline and at the 2-year and 5-year follow-ups (Supplementary Fig. 4). However, a significant association between AccAgeGrim and being frail was seen only at the 5-year follow-up (Supplementary Fig. 5).


In this large-scale EWAS conducted in a population-based cohort of older adults, we identified 65 frailty-related CpGs located at 47 genes across 21 chromosomes based on DNA from whole blood, 20 of which were selected to construct the eFRS. To our knowledge, this is the first EWAS-derived eFRS, and it was found to be significantly associated with frailty at multiple points of time during long-term follow-up. These findings were validated in samples that did not overlap with samples from which the eFRS was derived and were also confirmed in an independent cohort, which demonstrated the ability of the eFRS to predict both prevalence and longer-term incidence of frailty.

To our knowledge, few studies have specifically assessed the relationship between DNAm patterns and frailty17,18. Only one previous EWAS, on frailty defined according to the Fried criteria among 70-year-old people, has been conducted; it identified a single CpG (cg18314882 on chromosome 8 in the MAF1 gene)25, which is not among the CpGs identified in our study. The differences in identified CpGs between the two studies might be due to the different frailty measures used. Bellizzi et al.17 defined frailty status using cluster analysis and reported that global DNA methylation was lower in frail than in non-frail participants. This is in line with our study, in which 43 of 53 CpGs were inversely associated with FI in an adjusted model. Another study, by Collerton et al.18, which used the Fried frailty definition in a cohort of 85-year-old participants, found that genome-wide methylation was not associated with frailty status. Recently, several DNAm-based algorithms, e.g., Hannum’s blood-specific clock19 and GrimAge23, have been derived to estimate ‘epigenetic age’ and have been suggested to correlate closely with frailty24,25,30.

The identified frailty-related CpGs highlight several genes or genomic regions and might prompt further investigation of the biological mechanisms of frailty. The gene encoding the glycolytic enzyme glyceraldehyde-3-phosphate dehydrogenase (GAPDH) contains two of the frailty-related CpGs (cg00252813 and cg02519286); GAPDH has been implicated in the pathogenesis of amyotrophic lateral sclerosis and Huntington’s disease31,32, various forms of cancer33, and other neurodegenerative disorders33. Three CpGs (cg01406381, cg21766592, and cg25607249) are located in Solute Carrier Family 1 Member 5 (SLC1A5), which encodes a high-affinity L-glutamine transporter that is highly expressed in several cancer types34,35,36. The remaining CpGs and their corresponding genes, such as SARS, SCRN1, PIK3CD, RUNX1, NCAPH, HDAC4, and VAC14, have also been associated with multiple diseases. The roles of the identified CpGs in the development and/or progression of diseases may explain the association of eFRS with frailty.

We derived a DNAm-based algorithm for predicting frailty, as measured by the FI, and this eFRS showed robust predictive performance for frailty in a comprehensive validation chain. Of the 20 CpGs included in the eFRS, 8 map to intergenic regions of unknown function, and the other 12 are annotated to genes involved in common chronic diseases, including coronary artery disease, stroke, type 2 diabetes mellitus, and multiple types of cancer37,38,39,40. Frailty has been shown to be strongly associated with a broad range of adverse health outcomes, such as worsening mobility4, falls38, fracture41, and mortality42. The shared link with morbidity may therefore explain the association between eFRS and frailty. Moreover, the associations of these 12 CpGs with common chronic diseases suggest that eFRS may also be able to predict adverse health outcomes, including fatal outcomes. Given that frailty may be preventable up to a possible point of no return, the predictive performance of eFRS might help in designing, implementing, and evaluating interventions aimed at preventing or slowing the development of frailty.

Several accurate composite DNAm-based algorithms related to aging and lifespan have been built, including the Mortality Risk Score (MRscore) by Zhang et al.21, PhenoAge by Levine et al.22, and GrimAge by Lu et al.23. These algorithms have been shown to be robust predictors of mortality, lifespan, and healthspan21,22,23. In our study, a strong correlation between eFRS and GrimAge age acceleration was observed. Such a correlation is not unexpected, given that GrimAge includes DNAm-based surrogate biomarkers of health-related plasma proteins and smoking pack-years23, and that the majority of the CpGs used to construct eFRS are annotated to health-related conditions. A comparison of the CpGs included in eFRS with those of these aging algorithms showed that three (cg02867102, cg07312601, and cg11700584) overlap with GrimAge. Moreover, cg02867102 and cg11700584 were significant signals in previous EWASs on smoking and aging43,44,45, which points to the major role of smoking in adverse health outcomes in old age, including frailty.

The current study has several strengths, including large random samples from the general population, long-term follow-up with repeated measurements of frailty at multiple points in time, and replication in a completely independent cohort. However, the limitations of the present study must also be considered when interpreting the results. First, the deficits used to measure FI were self-reported, which may have introduced reporting bias. However, in the ESTHER cohort, self-reported diseases, which account for a large share of the self-reported deficits, were found to be in high agreement with medical records in careful validation steps carried out at each follow-up. Second, participants were recruited during a voluntary health check-up, so the ESTHER cohort might not be a fully representative sample of the general population. Nevertheless, the prevalence of risk factors and chronic diseases has been found to be comparable to that observed in the corresponding age range in a representative health survey conducted in Germany at the same time as the ESTHER baseline recruitment46. Third, only CpGs covered by both the 450K and EPIC arrays were included in the discovery-phase EWAS, so potential frailty-related CpGs covered exclusively by the EPIC array might have been missed.

In conclusion, in this EWAS on frailty conducted in a large population-based cohort, we identified 65 frailty-related CpGs and derived an epigenetic algorithm of frailty based on 20 CpGs. The DNAm-based eFRS was demonstrated to be strongly associated with long-term frailty and validated both internally in an independent subset and in an external population-based cohort. Further studies should investigate the associations of eFRS with additional health outcomes and its potential use for earlier detection of frailty risk and designing, monitoring, and evaluating prevention measures.


Study population and study design

The epigenome-wide association study (EWAS), including derivation and internal validation, is based on the ESTHER study, an ongoing prospective, population-based cohort study of older adults conducted in Saarland, a small federal state in southwestern Germany. Details of the study design and population have been reported previously21,47. Briefly, men and women aged 50–75 years undergoing a general health check-up in Saarland from 2000 to 2002 were eligible for participation. At the time of recruitment, these general health exams were routinely offered every 2 years by general practitioners (GPs) to people aged 35 years and older. Overall, 9940 adults aged 50–75 years were recruited by their GPs and were followed up with participant and GP questionnaires after 2, 5, 8, and 11 years. At baseline enrollment and at each follow-up, standardized questionnaires for participants and their GPs were used to collect extensive data on sociodemographic characteristics, risk factors, lifestyle factors, and medical history. Moreover, whole blood samples, from which DNA was extracted, were collected at baseline. The ESTHER cohort was found to be representative of the older German population with respect to major sociodemographic, lifestyle, and medical characteristics46.

Three subsets of the ESTHER cohort were randomly selected for epigenome-wide DNA methylation measurements in three separate rounds of methylation analysis. Subsets I and II included 998 and 741 randomly selected participants, respectively, whose DNAm was measured in August 2018 and July 2019, respectively, for various projects47,48. Subset III stems from a nested case-control design for mortality-related methylation signatures, in which 548 participants were randomly selected as the subcohort irrespective of death status21. Subset I served as the discovery panel in the epigenome-wide screening for CpG sites related to frailty at baseline. Subset II served as the first internal validation panel, used to further select CpG sites for constructing a methylation-based epigenetic frailty risk score (eFRS). The associations of eFRS with FI at baseline and during 11 years of follow-up were further validated in subset III.

Replication in an independent cohort was performed in the KORA-Age study, a population-based cohort study conducted in the region of Augsburg, southern Germany, whose study population and design have been described in detail previously49. In 2008 and 2009, 4123 participants aged ≥65 years from four population-representative surveys conducted between 1984 and 2001 were enrolled and completed the baseline assessments. An age- and sex-stratified random sub-sample (n = 1079) additionally completed medical examinations and was followed up in 2012 and 2016. Methylation data were available for 1010 participants from this random sub-sample (subset IV in the current analysis). The baseline, 4-year, and 8-year follow-up data of these 1010 participants were used to validate the associations of eFRS with FI in an independent cohort.

The ESTHER study was approved by the ethics committees of the medical faculty of Heidelberg University and of the medical board of the state of Saarland. The KORA-Age study was approved by the Ethics Committee of the Bavarian Medical Association (EK No. 08064). All ESTHER and KORA-Age participants provided written informed consent.

DNA methylation assessment

DNAm profiles of subsets I and II from ESTHER and of subset IV from KORA-Age were assessed with the Infinium MethylationEPIC BeadChip kit (EPIC, Illumina, Inc., San Diego, CA, USA), and DNAm profiles of subset III from ESTHER were determined with the earlier-generation Infinium HumanMethylation450 BeadChip assay (450K, Illumina, Inc., San Diego, CA, USA). Details of the methylation analysis in the ESTHER study have been reported previously48. Genome-wide DNAm profiling was conducted by the Genomics and Proteomics Core Facility of the German Cancer Research Center according to the manufacturer’s protocol. During pre-processing, probe signals with a detection P-value > 0.01, probes with >10% missing values, and probes targeting the X and Y chromosomes were excluded. Only CpGs covered by both the 450K and EPIC arrays were included in the EWAS. In the KORA-Age study, data quality control and pre-processing followed the CPACOR pipeline50: probes with detection P-value > 0.01 and missing values >5% were removed, and quantile normalization was then performed following the approach described by Lehne et al.50. In addition, leukocyte composition was estimated in ESTHER and KORA-Age using the algorithm of Houseman et al.51 for adjustment.
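The probe-level filters described for ESTHER can be sketched in Python as follows. The data layout (dictionaries keyed by probe ID) and the name `qc_filter` are hypothetical, and the sketch assumes that individual signals with a detection P-value > 0.01 are first set to missing before the per-probe missingness threshold is applied; this reading of the filtering order is an assumption. Passing `max_missing=0.05` mirrors the KORA-Age threshold, although the full CPACOR pipeline involves further steps (including normalization) that are not reproduced here.

```python
import math


def qc_filter(beta, det_p, chrom, p_max=0.01, max_missing=0.10):
    """Filter a probe-by-sample methylation table (hypothetical layout).

    beta, det_p: dict probe_id -> list of per-sample values (beta may hold NaN)
    chrom: dict probe_id -> chromosome label, e.g. "1", "X", "Y"
    Returns a new dict with only the probes that pass all filters.
    """
    kept = {}
    for probe, values in beta.items():
        if chrom[probe] in ("X", "Y"):
            continue  # drop probes targeting the sex chromosomes
        # Set individual measurements with detection P-value > p_max to missing.
        masked = [
            math.nan if p > p_max else v
            for v, p in zip(values, det_p[probe])
        ]
        n_missing = sum(math.isnan(v) for v in masked)
        if n_missing / len(masked) > max_missing:
            continue  # drop probes with too many missing values
        kept[probe] = masked
    return kept
```

Probes exactly at the threshold (e.g. 1 missing value out of 10) are kept, because the text specifies exclusion only for missingness above 10%.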

Frailty assessment

Following a standard procedure26,52, frailty in ESTHER and KORA-Age was assessed using a frailty index (FI), defined as the proportion of deficits present out of a predefined list of deficits. As previously described, 31 and 33 deficits were selected for the assessment of FI in ESTHER48 and KORA-Age53, respectively. The lists of deficits used to define the FI in ESTHER and KORA-Age are presented in Supplementary Tables 5 and 6, respectively. Distributions of FI and the proportions of each deficit status included in the FI calculation at baseline and each follow-up in ESTHER are shown in Supplementary Fig. 6 and Supplementary Data 4, respectively. With reference to previous studies13,54, participants were classified as frail if their FI was ≥0.250, pre-frail if their FI was >0.100 and <0.250, and non-frail if their FI was ≤0.100.
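The FI definition and the cut-offs translate directly into code. In this Python sketch, each deficit is coded 1 (present) or 0 (absent); excluding missing items from both numerator and denominator is an assumption, since the text does not describe how missing deficits were handled. The function names are hypothetical.

```python
def frailty_index(deficits):
    """FI = deficits present / deficits assessed.

    `deficits` holds one entry per item on the predefined list, coded
    1 (present), 0 (absent) or None (missing). Skipping None entries is an
    assumption of this sketch, not a rule stated in the text.
    """
    observed = [d for d in deficits if d is not None]
    if not observed:
        raise ValueError("no deficits observed")
    return sum(observed) / len(observed)


def frailty_status(fi):
    """Cut-offs from the text: >=0.250 frail, >0.100 pre-frail, else non-frail."""
    if fi >= 0.250:
        return "frail"
    if fi > 0.100:
        return "pre-frail"
    return "non-frail"
```

With the 31-item ESTHER list, for example, 8 deficits present gives FI = 8/31 ≈ 0.258 (frail), 4 gives ≈ 0.129 (pre-frail), and 3 gives ≈ 0.097 (non-frail).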

Three-phase procedure to construct eFRS

The DNAm-based eFRS was developed using a three-phase process. A flowchart of the three-phase procedure is presented in Fig. 4. In the discovery phase, an epigenome-wide screening for frailty-related CpGs was carried out in subset I using linear mixed regression models with baseline FI as the dependent variable, methylation β-values as explanatory variables, leukocyte composition as a fixed-effect covariate, and batch as a random effect. After correction for multiple testing using the Benjamini–Hochberg method55, CpGs that reached genome-wide significance [false discovery rate (FDR) < 0.05] were validated in subset II. As in the discovery phase, linear mixed regression models were fitted with baseline FI as the dependent variable, this time additionally adjusted for age and sex. Again, CpGs with an FDR < 0.05 were selected and deemed frailty-related loci. Finally, we applied LASSO regression, with the regularization parameter chosen by ten-fold cross-validation following the ‘one standard error’ rule, to select candidates among the identified CpGs and construct the eFRS.
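Two selection rules in this procedure, the Benjamini–Hochberg adjustment and the ‘one standard error’ rule for choosing the LASSO penalty, are sketched below in plain Python. This is an illustrative re-implementation, not the code used in the study. The LASSO fitting itself (done with glmnet, whose `cv.glmnet` reports the corresponding penalty as `lambda.1se`) is not reproduced, and the function names are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini–Hochberg adjusted P-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def one_se_lambda(lambdas, cv_mean, cv_se):
    """Largest penalty whose mean CV error is within one SE of the minimum.

    lambdas, cv_mean, cv_se: parallel lists from a k-fold cross-validation.
    A larger penalty gives a sparser model, so this favours fewer CpGs.
    """
    best = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    threshold = cv_mean[best] + cv_se[best]
    return max(lam for lam, err in zip(lambdas, cv_mean) if err <= threshold)
```

CpGs whose adjusted value from `bh_adjust` is below 0.05 correspond to the FDR < 0.05 criterion used in both the discovery and validation phases.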

Fig. 4: Study design and analysis flowchart.

FDR false discovery rate, EWAS epigenome-wide association study.

Functional annotation of sets of CpGs

Frailty-related CpGs were annotated to genes using the information provided in the Illumina manifest file, which is based on University of California Santa Cruz (UCSC) RefGene annotation. To explore the underlying roles of these genes, we used the Metascape online tool to perform Gene Ontology (GO) enrichment analysis, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis, and protein-protein interaction (PPI) network analysis. Kappa scores were used as the similarity metric for hierarchical clustering of the enriched terms, and sub-trees with a similarity > 0.3 were considered a cluster. To link genetic variants to variation in the CpGs included in eFRS, methylation quantitative trait loci (mQTL) analysis was applied as a preliminary association analysis of single nucleotide polymorphisms (SNPs) with CpG sites. SNP–CpG pairs separated by at most 1 Mb were tested. The mQTL analysis was performed using the online tool mQTLdb.
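The kappa-based similarity between enriched terms can be illustrated by treating each term as a binary in/out membership vector over a common gene universe and computing Cohen's kappa between two such vectors. This sketch assumes that Metascape's kappa score follows this standard definition; details of its implementation, such as the gene universe used, are not specified here. The function name is hypothetical.

```python
def kappa_similarity(genes_a, genes_b, universe):
    """Cohen's kappa between two terms' gene-membership vectors.

    Every gene in `universe` (a set) is classified as in/out of term A and
    in/out of term B; kappa compares observed agreement with chance agreement.
    """
    n = len(universe)
    a = set(genes_a) & universe
    b = set(genes_b) & universe
    both = len(a & b)
    neither = n - len(a | b)
    observed = (both + neither) / n
    pa, pb = len(a) / n, len(b) / n
    expected = pa * pb + (1 - pa) * (1 - pb)
    if expected == 1:
        return 1.0  # degenerate: both terms contain all genes or none
    return (observed - expected) / (1 - expected)
```

In a universe of 10 genes, two 4-gene terms sharing 3 genes give kappa = 7/12 ≈ 0.58, which exceeds the 0.3 threshold, so the two terms would fall into the same cluster.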

Statistical analysis

Linear mixed regression and LASSO regression were performed in subsets I and II to identify frailty-related CpGs and construct the eFRS, as described above. The correlations of chronological age, eFRS, and FI at baseline and various follow-ups were assessed using Spearman correlation coefficients in subsets III and IV. In subset III, we also evaluated the correlation of eFRS with a DNAm-based aging algorithm, AccAgeGrim (GrimAge age acceleration)23. The associations of eFRS with FI at baseline and various follow-ups were assessed in the two validation subsets using two linear mixed regression models. Model 1 included age, sex, and leukocyte proportions as fixed effects and batch as a random effect. Model 2 additionally included smoking status (never smoker, former smoker, current smoker) and alcohol consumption (grams per day) as fixed effects. After dichotomizing frailty status at baseline and at each follow-up in two ways (non-frail versus pre-frail or frail; and non-frail or pre-frail versus frail), the associations were also estimated using logistic regression models adjusted for all variables in model 2. In addition to logistic regression models including eFRS as a continuous variable, we categorized eFRS into quartiles and ran logistic regression models including eFRS as a categorical variable. The FDR procedure was also applied to account for multiple comparisons across baseline and the various follow-ups, and an FDR-adjusted P < 0.05 was considered statistically significant.
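The exposure and outcome coding for the logistic regression analyses can be sketched as follows. eFRS is rescaled to SD units, so that the exponentiated logistic regression coefficient is an OR per one-SD increase (an OR of 1.40 corresponds to 40% higher odds), and eFRS is also grouped into sample quartiles. Using `statistics.quantiles` with its default 'exclusive' method is an assumption, since the quartile definition used by the study software may differ slightly at the boundaries. The model fitting itself is not shown, and the function names are hypothetical.

```python
from statistics import mean, quantiles, stdev


def per_sd(values):
    """Rescale to z-scores so that one unit equals one (sample) SD."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def quartile_group(values):
    """Assign 1..4 by sample quartiles (1 = lowest); ties go to the lower group."""
    q1, q2, q3 = quantiles(values, n=4)
    return [1 if v <= q1 else 2 if v <= q2 else 3 if v <= q3 else 4 for v in values]


def binary_outcomes(status):
    """The two dichotomizations of frailty status described in the text."""
    prefrail_or_frail = [0 if s == "non-frail" else 1 for s in status]
    frail = [1 if s == "frail" else 0 for s in status]
    return prefrail_or_frail, frail
```

In the quartile models, quartile 1 would serve as the reference category, matching the comparisons of quartiles 3 and 4 against quartile 1 reported in the Results.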

To assess potential additional variance from passive smoking, we conducted sensitivity analyses with additional logistic regression models that controlled for smoking status using a DNAm-based proxy (the Maas 13-CpGs model)58 rather than self-reported smoking status. Next, we assessed the association of AccAgeGrim with FI at baseline and various follow-ups.

Furthermore, we conducted subgroup analyses for the associations of the eFRS with frailty in which only non-frail participants at baseline were included for the outcome being pre-frail or frail, and only non-frail or pre-frail participants at baseline were included for the outcome being frail using model 2 for adjustment as described above.

We also systematically screened PubMed for previously reported frailty-associated CpGs and assessed their associations with frailty using linear mixed regression models adjusted as in model 2, as described above.

The LASSO regression analyses were conducted in R (R Foundation for Statistical Computing, Vienna, Austria, version 4.0.1) using the package ‘glmnet’ (version 4.1-4)59. All other statistical analyses in the ESTHER study were carried out in SAS 9.4 (SAS Institute, Cary, NC), and the analyses in the KORA-age study were conducted in R (version 4.0.1).

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.