Clonal myelopoiesis promotes adverse outcomes in chronic kidney disease

We sought to determine the relationship between age-related clonal hematopoiesis (CH) and chronic kidney disease (CKD). CH, defined as mosaic chromosome abnormalities (mCA) and/or driver mutations was identified in 5449 (2.9%) eligible UK Biobank participants (n = 190,487 median age = 58 years). CH was negatively associated with glomerular filtration rate estimated from cystatin-C (eGFR.cys; β = −0.75, P = 2.37 × 10–4), but not with eGFR estimated from creatinine, and was specifically associated with CKD defined by eGFR.cys < 60 (OR = 1.02, P = 8.44 × 10–8). In participants without prevalent myeloid neoplasms, eGFR.cys was associated with myeloid mCA (n = 148, β = −3.36, P = 0.01) and somatic driver mutations (n = 3241, β = −1.08, P = 6.25 × 10–5) associated with myeloid neoplasia (myeloid CH), specifically mutations in CBL, TET2, JAK2, PPM1D and GNB1 but not DNMT3A or ASXL1. In participants with no history of cardiovascular disease or myeloid neoplasms, myeloid CH increased the risk of adverse outcomes in CKD (HR = 1.6, P = 0.002) compared to those without myeloid CH. Mendelian randomisation analysis provided suggestive evidence for a causal relationship between CH and CKD (P = 0.03). We conclude that CH, and specifically myeloid CH, is associated with CKD defined by eGFR.cys. Myeloid CH promotes adverse outcomes in CKD, highlighting the importance of the interaction between intrinsic and extrinsic factors to define the health risk associated with CH.


INTRODUCTION
Clonal hematopoiesis (CH) is an age-related phenomenon characterised by a gradual replacement of polyclonal leucocytes by one or more clones marked by somatic mutations [1,2] or mosaic chromosomal alterations (mCA) [3,4]. CH is associated with an elevated relative risk of developing haematological malignancies compared to age and sex-matched controls without CH [5], and also an elevated risk of developing non-malignant, immune and inflammatory disorders [6,7] such as atherosclerotic cardiovascular disease (CVD) [2,8], chronic obstructive pulmonary disease [9] and premature menopause [10].
Chronic kidney disease (CKD) is a common worldwide health problem defined by low estimated glomerular filtration rate (eGFR) and/or elevated urine albumin to creatinine ratio (uACR) [11]. Patients with CKD experience a gradual and progressive loss of kidney function, but only small minority progress to end-stage kidney disease (ESKD) and require kidney replacement therapy. The majority of cases are at an early stage of the disease process [12], which remains incompletely defined due to variation in eGFR and albuminuria measurements [13][14][15][16][17].
Like CH, CKD is associated with an elevated risk of CVD and mortality [18]. Atherosclerotic risk factors for CVD, such as diabetes, smoking, hypertension and dyslipidemia, are prevalent in individuals with CKD, but there is an excess risk of CVD associated with CKD that is over and above that captured by atherosclerotic risk factors alone. In addition to sharing some risk factors, CH, CKD and CVD are characterised by persistent lowgrade inflammation [19][20][21][22]. however, a specific relationship between CH and CKD has not been defined. In this study, we sought to assess the relationship between CH and CKD in UK Biobank (UKB), an ongoing, prospective UK cohort study of approximately 500,000 community-dwelling participants aged 40-69 years when recruited between 2006 and 2010.

Study cohort
UKB participants provided comprehensive demographic, psychosocial and medical information during an initial assessment along with baseline blood and urine samples for genomic, biochemical, and other laboratory tests. Long-term follow-up was provided via linked medical records [23]. We focused on participants with both genome wide single nucleotide polymorphism (SNP) array and whole exome sequence (WES) data (n = 200,361; median age = 58 years, median follow up = 11 years) [24]. All participants provided informed consent according to the Declaration of Helsinki; UKB received ethical approval from the North West multi-centre Research Ethics Committee (REC reference 11/NW/0382).

Identification of CH
We previously described the identification of myeloid, lymphoid or other mCA in UKB from SNP array data [25]. The process for identifying likely somatic driver variants is described in the Supplementary Methods. Mutated genes were defined as myeloid-neoplasia related ('myeloid') according to previously published criteria [8], other genes were defined as 'lymphoid'. The complete list of unique putative somatic driver variants (n = 1611) is shown in Supplementary Table 1. CH was defined as participants with any mCA and/or any somatic driver mutation; myeloid CH was defined as the presence of myeloid mCA and/or a myeloid somatic driver mutation(s); lymphoid was defined by lymphoid mCA and/or lymphoid mutations, without myeloid mutations or myeloid mCA [8].

Kidney function
The eGFR in units of mL/min/1.73 m 2 was calculated in R using the Nephro package [26] and three different formulae as defined by the Chronic Kidney Disease Epidemiology Collaboration: creatinine (UKB field: 30700, eGFR.creat), cystatin-C (UKB field: 30720, eGFR.cys) or creatinine and cystatin-C (eGFR.creat.cys) [27]. The creatinine-based scores included ethnicity as recorded in UKB field: 21000. With respect to CKD, patients were considered as healthy (≥90), mild (≥60 and <90), moderate (≥15 and <60) or end stage (<15) for each eGFR score [27]. In addition, uACR in mg/ mmol was calculated as a further measure of kidney disease using albumin in urine (UKB field: 30,500) and creatinine in urine (UKB field: 30,510). Shrunken pore syndrome (SPS) is typically defined by an eGFR.cys/eGFR. creat ratio of ≤0.6 in the absence of factors that interfere with cystatin C or creatinine measurement, such as high muscle mass [28]. We defined potential SPS as an eGFR.cys/eGFR.creat ratio of ≤0.6.

Discovery and validation cohorts
To investigate the relationship between CKD and either CH or myeloid neoplasia, the data were split randomly into equally sized discovery and validation cohorts. Results from the discovery and validation cohorts were combined using a fixed effect inverse variance weighted meta-analysis using STATA version 16 (StataCorp LLC, College Station, TX) and Cochran's Q test to measure heterogeneity.

The relationship between CH and CKD
To study the association between CH and CKD we excluded 10,144 participants due to (i) missing creatinine or cystatin-C data (n = 9913), (ii) any form of ESKD (n = 231) that was either diagnosed before study entry according to relevant ICD10 codes or interventions and procedures as detailed in the Supplementary Methods, or if any of the three eGFR scores was <15. Participants with ESKD were excluded due to the possibility of dialysis and/or erythropoietin treatment that would influence their eGFR scores and blood counts, and because the relationship between ESKD and CVD is well characterised. Individuals with diabetes or hypertension were not excluded. The relationship between CH and CKD was tested using multivariable logistic regression in R where CKD was used as the dependant and CH as a binary predictor. CKD was coded into cases (1) and controls (0) using the eGFR thresholds of <60 or ≥60, respectively [29] and the analysis was repeated for each eGFR score (eGFR.creat, eGFR.cys, and eGFR.creat.cys). Logistic regressions were adjusted for potential confounding variables: age, sex, smoking status, systolic blood pressure, diastolic blood pressure, cholesterol, high-density lipoprotein (HDL), lowdensity lipoprotein (LDL), body mass index (BMI), glycated haemoglobin (HbA1c) as an indicator of diabetes, high-sensitivity C-reactive protein (hs-CRP) as a marker of inflammation and the first 10 genetic principal components. Effect sizes were reported as odds ratios (OR) with 95% confidence intervals (CI). The relationship between eGFR scores and CH was tested using multivariable linear regression in R where eGFR status was treated as the dependant and CH as a binary predictor and correcting for same confounding variables. UKB did not include follow-up biochemical assessments for the great majority of participants and so incident ESKD was inferred from recorded hospital episodes as indicated above. Prevalent and incident myeloid neoplasia are defined in the Supplementary Methods.

Mendelian randomisation (MR)
MR was used to assess the possibility of a causal relationship between CH and CKD by using germline SNPs associated with the development of CH as instrumental variables. Following the STROBE guidelines [30], we investigated the use of two significance thresholds for selecting instrumental variables based on their association with CH defined by driver somatic mutations in a subset of the TOPMed cohort (n = 65,405 total participants; n = 3831 CHIP cases) [31]. The first used a modest threshold (P < 0.001) to select 380 SNPs with MAF ≥ 0.01 and SNPs clumped (r 2 > 0.001, within 10 Mb) for a liberal analysis which aimed to investigate the evidence for a true null relationship. In the second, conservative, analysis we used a stricter threshold (P < 1 × 10 -5 ) to select a subset of 28 SNPs that were strongly associated with CH and would provide more robust evidence of causality. The effect sizes on CKD were obtained from a meta-analysis of 60 GWAS from the CKDgene consortium (n = 625,219, including 64,164 CKD cases) [32]. We estimated that~2.4% of individuals from the TOPMed cohort are also included in the CKDgene consortium which could inflate false-positive findings [33]. To mitigate against this, we performed a sensitivity analysis using the estimated effect sizes in a subset of patients from the CKDgene cohort with European ancestry (n = 480,698, including 41,395 cases). Detailed information for the SNPs used in both analyses is shown in Supplementary Table 2. MR was performed using the TwoSamplesMR package in R [34] to apply the Robust Adjusted Profile Score (MR-RAPS) methodology which enables the use of weak instrumental variables, is robust to pleiotropy and considers measurement error in the exposure estimate [35]. Additional sensitivity analyses were performed using methods that test the different assumptions of MR, specifically the inverse-variance weighted (IVW) method which performs a meta-analysis for the estimates of the instrumental variants [36], the MR-Egger method which uses the average pleiotropic effect as the intercept that allow the use of instrumental variables with pleiotropic effects [37], and the weighted median method which allows for a subset of instrumental variables to be invalid [38].

Prediction of adverse outcomes
A Cox proportional hazard model (survival package in R) [39] was used to determine if the risk of adverse outcome was associated with CH, CKD defined by each eGFR score or the urine albumin-to-creatinine ratio (uACR). Adverse outcomes were defined by a composite endpoint of either death (UKB data release April 2020), myocardial infarction (MI, field 40002, February 2018) or stroke (field 40006, February 2018). Participants who suffered MI or stroke before entering UKB were excluded. Follow-up times were calculated using the lubridate package [40] to determine the duration between study entry and the earliest of date of death (UKB field 40000), date of MI (UKB field 40002) or date of stroke (UKB field 40006). Patients without an adverse outcome were censored at the date of last follow-up for MI and stroke or the date of they were lost to follow-up (UKB field 191). Univariate survival analyses were performed for all traditional risk factors (age, sex, smoking status, LDL, HDL, cholesterol, HbA1c, BMI, hs-CRP, systolic and diastolic blood pressure). Variables with P < 0.2 were entered into a multivariate survival analysis in a backward stepwise manner and retained if they reached nominal significance (P < 0.05).
To assess the potential for a non-linear relationship between eGFR scores and adverse outcomes, we used a restricted cubic spline function [41] to transform and segment the eGFR scores. Separate curves were fitted to each segment to generate a smooth fitted curve. The method was used to transform each eGFR score using the rms package in R [42] and default values for the number of knots (n = 5) and degrees of freedom (n = 4). The regression included the covariates described above. The adjusted spline values were plotted with 95% CI.
Receiver operating characteristic curves (ROC) and area under the curve (AUC) metrics [43] were used to evaluate the prediction accuracy of the multivariable survival models. AUCs were reported for three pairs of prediction models with and without CH: (i) traditional risk factors, (ii) traditional risk factors and eGFR.cys and (iii) traditional risk factors and uACR. Where relevant, P values for all tests were corrected for multiple testing using the false discovery rate (FDR).

Definition and breakdown on CH in UKB
In a previous analysis of SNP array data from the entire UKB cohort, we identified 8203 mCA larger than 2 Mb in 5040 participants [44]. In the subset of participants with available WES data (n = 200,631), we identified 3085 mCA in 2016 participants, of which 197 (185 participants) were associated with myeloid neoplasms and 278 (237 participants) were associated with lymphoid neoplasms. Analysis of the WES data identified 4137 putative somatic driver mutations (1611 unique variants) in 3863 participants (Supplementary Table 3). In total, 5718 (2.9%) participants had CH defined by one or more mCA and/or driver mutations and 194,913 participants were considered as CH-free controls. For further analysis, these data were split randomly into discovery and validation cohorts (Table 1 and Supplementary  Table 4).

Assessment of the relationship between CH and CKD
We compared eGFR.cys, eGFR.creat and eGFR.creat.cys in participants with or without CH after excluding 10,144 ineligible participants with pre-existing ESKD or missing biochemistry measures. After excluding ineligible cases, the discovery cohort consisted of 2735 participants with CH and 92,457 CH-free controls, and the validation cohort compromised of 2714 participants with CH and 92,581 CH-free controls. As expected, the cystatin-C-derived eGFR score was lower than the scores that included creatine [45] and consequently fewer participants were determined to have moderate CKD, defined by an eGFR score between 15 and 60, according to eGFR.creat (n = 4194) and eGFR. creat.cys (n = 4433) compared with eGFR.cys (n = 8304). The median for all three eGFR scores was lower in participants with CH compared to those without CH (Fig. 1) and the median uACR was higher (1.2 with CH versus 1.05 without CH; P < 0.001) indicating impairment of kidney function in association with CH. Participants with lower eGFR scores tended to be older, male, smokers, with low HDL, high LDL, high BMI, high systolic and diastolic blood pressure, and high albuminuria (Supplementary Table 4).
We assessed the relationship between myeloid CH and the risk of developing ESKD in participants without prevalent myeloid neoplasms or prior ESKD. Myeloid CH (n = 3330) was weakly but significantly associated with ESKD incidence (n = 307, β = 0.002, P = 0.006). Specifically, 0.33% (11 out of 3330) of participants with myeloid CH developed ESKD after study entry compared with 0.16% of controls (296 of 184,811).

MR analysis to test causal effect of CH on kidney function
The possibility of a causal relationship between CH and kidney function was assessed using MR. In a liberal analysis, 380 independent SNPs associated with CH at (P < 0.001) [31] were used to estimate the effect of CH on CKD (Supplementary Table 2).  To test the different assumptions and scenarios, several MR methods were used as recommended and the results corrected for multiple testing [33]. Only the MR-RAPS method, which is adapted to test weak instrumental variables as applicable to our study, identified a positive causal relationship [OR = 1.01; P = 0.029]. However, this relationship failed to reach significance (P = 0.81) in a more conservative analysis that applied stricter threshold (P < 1 × 10 -5 ) to select 28 SNPs associated with CH (Fig. 3). Due to the potential limited overlap between cohorts used to select instrumental variables, we performed a sensitivity analysis using a subset of samples with European American ancestry which yielded similar results for the causal association between CH and CKD [OR = 1.02; P = 0.029]. Detailed results are presented in Supplementary Table 9.
Prediction of adverse outcomes by myeloid CH in CKD As expected, established risk factors (myeloid CH, age, sex, ethnicity, smoking status, cholesterol, HbA1C, HDL, LDL, blood pressure, BMI, uACR, hs-CRP and eGFR scores) were associated on univariate analysis with an adverse outcome as defined by a composite endpoint of death, MI, or stroke (Supplementary  Table 10).
To understand the influence of myeloid CH and CKD on adverse outcomes, we focused on participants without prevalent myeloid neoplasms (n = 320) or any prior history of CVD (n = 8459). Initially, Cox proportional-hazard analysis was used to identify risk factors unrelated to CH and CKD (Supplementary Table 11), and then these factors were added into the model. To determine which of the three eGFR scores was most appropriate to use in the model, we tested the linearity of each score in relation to outcome using a restricted cubic spline test, as described previously [45]. Although all three scores were associated with adverse outcomes, eGFR.cys was more linear and negative compared to the scores that used creatinine in both the discovery and validation cohorts ( Supplementary Fig. 2). Focusing on eGFR.cys, the risk of adverse outcomes was higher in subjects who had CKD (HR = 1.9, n = 1180/6970) compared to CKD free participants (n = 8295/ 172,857; P = 8.4 × 10 -65 ) (Supplementary Table 12). The risk of adverse outcomes was estimated to be 1.56-fold higher (P = 1.4 × 10 -11 ) in cases with myeloid CH (n = 338/3078) compared to myeloid CH-free participants (n = 9137/176,749). Testing each component of adverse outcomes confirmed the previously reported features of UKB cohort [46] that CH was associated with all-cause mortality (HR = 1.91, P = 2.5 × 10 -10 ) but did not reach significance for MI (HR = 1.13, P = 0.38) or stroke (HR = 1.28, P = 0.15) considered independently, in accordance with previous findings [44,47] (Supplementary Table 12).
ROC analysis was used to assess the predictiveness of multivariable models that incorporated myeloid CH, eGFR.cys and uACR. The baseline model consisting of age, sex, smoking status, HDL, HbA1c, systolic blood pressure, hs-CRP, BMI (Supplementary Table 12) and corrected for 10 genetic principal components had an AUC of 73.3% (72.8-73.9%). The addition of myeloid CH as a binary factor or eGFR.cys as a continuous trait improved the predictiveness of the model to an AUC of 73.4% and 74%, respectively, and including both further improved the AUC to 74.1% (73.5-74.6%), with very similar results achieved in both the discovery and validation cohorts (Fig. 4, Supplementary  Table 13).
To further investigate the relationship between CH and adverse outcome in participants with CKD, we stratified the cohort (excluding prior CVD and prevalent myeloid malignancies), into participants with moderate renal impairment (eGFR.cys ≥15 to <60), mild impairment (eGFR.cys ≥60 to <90) and normal kidney function (eGFR.cys ≥90). We then tested the effect of CH in each subset using Kaplan-Meier survival analysis. CH increased the risk of adverse outcome in all groups but was particularly marked (HR = 1.6, 95% CI 1.2-2.14, P = 0.002) for participants with moderate CKD (n = 59/226 with myeloid CH compared to n = 1121/6744 without myeloid CH) (Figs. 5 and 6; Supplementary Table 14). Much of the risk of adverse outcomes was related to incident myeloid neoplasms which were diagnosed in 19 participants at a median of 3.6 years after study entry. Of these, 11 (58%) had adverse outcomes in comparison to 48/207 (23%) who did not develop a myeloid neoplasm during the study period. Excluding the incident cases reduced but did not eliminate the risk of adverse outcomes (HR = 1.4, P = 0.05).

DISCUSSION
In this study we identified that CH, and specifically myeloid CH, is associated with CKD. The association was not seen with all markers of CH and, strikingly, not with mutations in DNMT3A or ASXL1, two of the most common drivers of clonality, although there was an overall association with clone size. These findings confirm previous observations that that not all CH is equal [9,25,31], as well as the importance of having sufficiently large studies to understand the granularity of CH with respect to clinical outcomes.
We found that myeloid CH is specifically associated with eGFR. cys but not eGFR.creat and only marginally with eGFR.cys.creat. Similarly, recent studies have reported the superior utility of eGFR. cys in predicting the incidence of CVD and mortality in patients with CKD [45,48,49]. In the UK, the cost to measure cystatin C is 10-fold higher than that to measure serum creatinine, and consequently eGFR.creat is widely used for initial assessment of possible CKD. Although eGFR.cys is recommended to confirm CKD, this is not believed to be common practice, at least in the UK [45].
Our findings provide further weight to the argument that eGFR.cys is more informative than eGFR.creat to define CKD. The finding that myeloid CH is associated with eGFR.cys also provides further evidence for the importance of chronic inflammation in CH-related disorders. Levels of cystatin C correlate generally with oxidative stress and inflammation [45,50], a well-recognized feature of CKD [19] that is also associated with an elevated risk of development of CVD [51,52]. Other biomarkers of chronic inflammation have been associated with CH, e.g. C-reactive protein and IL-6. [20,31] CH predisposes to haematological malignancies, particularly myeloid neoplasms [5], and both CKD and chronic inflammation have been described as features of myeloproliferative neoplasms [53,54]. Our data  show that myeloid CH increases the risk of adverse outcomes in the context of CKD and that this increase is only partly explained by incident myeloid neoplasms or SPS, a recently described phenomenon that may be observed in both children or adults with normal or reduced eGFR and is associated with increased mortality and morbidity in a variety of settings [28]. Although our analysis was corrected for hs-CRP, it is possible that part of the increase in adverse outcomes is due to chronic inflammation induced by CH. MR uses genetic variation as a natural experiment to estimate causality in observational data [33] and has, for example, been used to detect a causal effect of cystatin C on risk of stroke [55]. Our initial analysis of 380 SNPs that predispose to CH provided suggestive evidence for a causal relationship between CH and CKD (P = 0.03), but this link was not supported by a more conservative analysis of 28 SNPs that are more strongly associated with CH. Given that two of the most common CH genes (DNMT3A and ASXL1) were not associated with CKD, and that the 380 SNPs only explain 3.6% of the heritability of CH [31], the use of MR in this context is clearly challenging, and may be compounded the possibility of other factors such as horizontal pleiotropy but these concerns are partly mitigated by the large sample size of the GWAS used for CH and CKD.
In summary, the role of CH in the pathogenesis of benign diseases varies widely and depends on intrinsic factors that define the clone as well as extrinsic factors that impact the inflammatory environment [25,56]. In this study, we have shown that CH is associated with CKD and confers an adverse prognosis over and above conventional risk factors for this common disorder. Our findings suggest that screening for CH in CKD may be of clinical value to help predict outcomes.  Fig. 5 Myeloid CH predicts adverse outcomes in CKD. The forest plots show data stratified according to eGFR.cys as healthy (≥90), mild CKD (≥60 to <90) and moderate CKD (≥15 to <60). The risk of adverse outcomes was predicted by myeloid CH in all groups but was particularly marked (HR = 1.6, P = 0.002) for participants with moderate CKD. Kaplan-Meier survival estimates for the three CKD groups according to absence or presence of myeloid CH. Log-rank test P values are reported for each group, and numbers at risk at 0, 2.5, 5, 7.5, and 10 years after study entry.