Introduction

Almost 700 million individuals globally have chronic kidney disease (CKD), an important but often unrecognized cause of morbidity and early mortality1. The initial presentation of CKD is usually asymptomatic and without overt clinical manifestations, especially in the early stages of the disease. Recently, the Global Burden of Diseases, Injuries and Risk Factors Study (GBD) estimated that CKD accounts for 4.6% of total mortality worldwide, with a 41.5% increase between 1990 and 20171. Delayed diagnosis and limited patient recognition of the condition contribute significantly to the burden of morbidity2,3. Early detection can potentially change the disease trajectory. The most common causes of CKD, such as hypertension and diabetes, can be reversible or treatable, and early diagnosis is crucial for avoiding renal replacement therapy4,5. There are few methods to screen for CKD cheaply or non-invasively, and conventional risk calculators lack specificity and require both serum and urine laboratory testing6.

Electrocardiograms (ECGs) are inexpensive, non-invasive, widely available, and rapid diagnostic tests frequently obtained during routine visits, prior to exercise, during preoperative evaluation, and for patients at increased risk of cardiovascular disease. Deep learning algorithms (DLA) have recently been applied to medical imaging and clinical data to achieve high precision and to identify additional information beyond the interpretation of human experts7,8. Deep learning analysis of ECG waveforms has shown promising performance in prognosticating outcomes9, identifying subclinical disease10,11, and identifying systemic phenotypes not traditionally associated with ECGs12,13. Given prior success in identifying occult arrhythmias14,15, ventricular dysfunction10, anemia13, and age12, DLA applied to screening ECGs could potentially identify patients who would benefit from further evaluation for kidney disease.

The high prevalence of concomitant cardiovascular disease and the well-established ECG changes that accompany electrolyte abnormalities suggest that the ECG is also altered in the setting of CKD and that discrete electrocardiographic signatures could be identifiable with deep learning techniques. Patients with CKD have a disproportionate accumulation of cardiovascular risk factors, such as diabetes and hypertension, as well as subclinical cardiovascular changes such as left ventricular hypertrophy, myocardial fibrosis, and diastolic dysfunction16. It is not fully clear at which stage patients with CKD begin to develop manifest cardiovascular changes. However, recent studies have reported that, in addition to coronary artery disease and left ventricular hypertrophy, patients with early-stage CKD may already show increased diffuse myocardial fibrosis on cardiac MRI17. It is therefore plausible that even early-stage CKD is associated with non-specific ECG changes. In addition to myocardial remodeling, CKD is associated with a variety of electrolyte abnormalities that also cause widespread ECG abnormalities (e.g., decreased T-wave amplitude in hypokalemia; large-amplitude T-waves and prolonged QRS duration in hyperkalemia; and QTc prolongation in hypocalcemia)18. Prior work has shown that such patterns are detectable on ECG waveforms, enabling AI-ECG detection of hyperkalemia, which might augment a model’s ability to detect CKD19,20. However, given the relative infrequency of overt electrolyte abnormalities, these changes are likely not the primary feature used in detecting CKD. These observations suggest that asymptomatic CKD may present with subtle ECG alterations that are not visible to the human eye.

To overcome current limitations in screening for occult CKD, we designed, trained, and validated a deep learning model to predict CKD, including end-stage renal disease (ESRD), by analysis of waveform signals from a single 12-lead or 1-lead ECG. Incorporating structured information from both medical diagnoses and laboratory data, we assessed the ability of our model to evaluate the entire spectrum of kidney disease. To further evaluate our model, we validated its performance using corresponding data from a separate healthcare system.

Methods

Data sources and study population

We retrospectively identified 64,308 ECGs among 7816 patients between 2005 and 2019 that were linked to a diagnosis of CKD within a 1-year window at Cedars-Sinai Medical Center. We also identified 183,290 ECGs among 103,554 patients between 2008 and 2019 with no CKD diagnosis at any point, which were used as matched negative controls. Study cohorts included both ambulatory and hospitalized patients. If a patient had multiple ECGs taken within a year of a CKD diagnosis, each ECG-CKD pair was considered an independent case during model training, but only one was used in the test dataset. The study population from Cedars-Sinai Medical Center was randomly split 8:1:1 into training, validation, and test cohorts at the patient level, so that all ECGs from a given patient were confined to a single cohort. In addition, we identified 896,620 ECGs among 312,145 patients at Stanford Healthcare from 8/2005 to 6/2018, which were used for external validation (Fig. 1).
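The patient-level 8:1:1 split described above can be sketched as follows. This is a minimal illustration, not the study's code: it assumes each ECG is a dict with a hypothetical `patient_id` key, and the shuffling seed and the rule for choosing the single test ECG per patient (here, the first one encountered) are assumptions not reported in the study.

```python
import random


def split_by_patient(ecg_records, ratios=(0.8, 0.1, 0.1), seed=42):
    """Split ECG records into train/val/test at the patient level.

    All ECGs from one patient land in the same split. Training and validation keep
    every ECG (each ECG-label pair is its own example); the test split keeps one ECG
    per patient. `patient_id` is a hypothetical field name.
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    # Sort before shuffling so the split is reproducible regardless of set ordering.
    patients = sorted({rec["patient_id"] for rec in ecg_records}, key=str)
    random.Random(seed).shuffle(patients)

    n_train = int(round(ratios[0] * len(patients)))
    n_val = int(round(ratios[1] * len(patients)))
    split_of = {}
    for i, pid in enumerate(patients):
        if i < n_train:
            split_of[pid] = "train"
        elif i < n_train + n_val:
            split_of[pid] = "val"
        else:
            split_of[pid] = "test"

    splits = {"train": [], "val": [], "test": []}
    seen_in_test = set()
    for rec in ecg_records:
        name = split_of[rec["patient_id"]]
        if name == "test":
            # Assumed selection rule: keep the first ECG seen for each test patient.
            if rec["patient_id"] in seen_in_test:
                continue
            seen_in_test.add(rec["patient_id"])
        splits[name].append(rec)
    return splits
```

Splitting on patients rather than on ECGs prevents a model from scoring well simply by recognizing a patient whose other ECGs it saw during training.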

Fig. 1: Study subject selection.

Our primary cohort consists of 111,370 patients and 247,655 ECGs obtained between 2005 and 2019 at Cedars-Sinai Medical Center. The primary cohort was randomly split 8:1:1 into training, validation, and test cohorts. We also used 896,620 ECGs from 312,145 patients at Stanford Healthcare from 8/2005 to 6/2018 as the external validation cohort. CKD, chronic kidney disease.

ECGs from Cedars-Sinai Medical Center were obtained from the MUSE Cardiology Information System (GE Healthcare), and the original stored ECG waveforms were used to train the CKD prediction model. For external validation at Stanford University, ECGs were stored in the Philips TraceMaster system and were used independently as input examples. The ECG waveform data were acquired at a sampling rate of 500 Hz and extracted as 10-second, 12 × 5000 matrices of amplitude values. ECGs with missing leads were excluded from the study cohort. Associated clinical data for each patient were obtained from the electronic health record. Medical diagnoses were extracted from the electronic health record using International Classification of Diseases, Ninth and Tenth Revision (ICD-9/10) codes, which are listed in Supplementary Table 1. Demographic and clinical characteristics (e.g., age, gender, BMI, cardiovascular disease) were also extracted from the electronic health record. The institutional review boards of Cedars-Sinai Medical Center and Stanford Healthcare approved the study protocol (Cedars Protocol 1506 and Stanford Protocol 43721). Informed consent was waived for analysis of de-identified retrospective data.
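The input format described above (500 Hz for 10 seconds gives 5000 samples per lead, so 12 leads form a 12 × 5000 matrix) and the exclusion of ECGs with missing leads can be sketched as a simple filter. The criteria used here to call a lead "missing" (absent, truncated, containing None/NaN, or flat zero) are assumptions for illustration; the study does not define them.

```python
import math

SAMPLING_HZ = 500
DURATION_S = 10
N_LEADS = 12
N_SAMPLES = SAMPLING_HZ * DURATION_S  # 500 Hz x 10 s = 5000 samples per lead


def lead_is_missing(lead):
    """Assumed criteria: a lead is missing if absent, truncated, contains None/NaN, or is flat zero."""
    if lead is None or len(lead) != N_SAMPLES:
        return True
    if any(v is None or math.isnan(v) for v in lead):
        return True
    return all(v == 0 for v in lead)


def keep_complete_ecgs(ecgs):
    """Keep only ECGs that form a full 12 x 5000 amplitude matrix with no missing lead."""
    return [
        ecg for ecg in ecgs
        if len(ecg) == N_LEADS and not any(lead_is_missing(lead) for lead in ecg)
    ]
```

Each ECG is represented here as a list of 12 per-lead sample lists; in practice an array library would hold the same 12 × 5000 matrix.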

AI model design and training

We designed a convolutional neural network for ECG interpretation, with potential for clinical data integration, to predict the primary outcomes of chronic kidney disease and end-stage renal disease (Fig. 2). The model was trained to predict outcomes from a single 12-lead ECG obtained within 1 year of diagnosis. Additional details on model training are provided in the Supplementary Information. If the same patient had multiple ECGs, each was considered an independent case. Models were trained using the PyTorch deep learning framework. The model was initialized with random weights and trained using a binary cross-entropy loss function for up to 100 epochs with the Adam optimizer and an initial learning rate of 1e-4. Early stopping was performed based on the area under the receiver operating characteristic curve in the validation dataset. Local Interpretable Model-agnostic Explanations (LIME)15,21 was used with 1000 samples per study to identify relevant features in the ECG waveform by iteratively and randomly perturbing 0.5% of the waveform and identifying which perturbations most affected the model's predictions.
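Early stopping on validation AUC, as described above, amounts to keeping the epoch with the best validation AUC and stopping once it stops improving. The sketch below isolates that control flow; `train_one_epoch` and `validation_auc` are hypothetical callbacks standing in for the PyTorch training and evaluation steps, and the `patience` value is an assumption, since the study reports only the 100-epoch maximum.

```python
def train_with_early_stopping(train_one_epoch, validation_auc, max_epochs=100, patience=10):
    """Train for up to `max_epochs`, keeping the epoch with the highest validation AUC.

    `train_one_epoch(epoch)` and `validation_auc(epoch)` are hypothetical callbacks. In a
    PyTorch setup they might wrap, for example, torch.nn.BCEWithLogitsLoss and
    torch.optim.Adam(model.parameters(), lr=1e-4). `patience` is an assumed value.
    Returns (best_epoch, best_auc, epochs_run).
    """
    best_auc, best_epoch, since_best = float("-inf"), None, 0
    epochs_run = 0
    for epoch in range(max_epochs):
        train_one_epoch(epoch)
        epochs_run += 1
        auc = validation_auc(epoch)
        if auc > best_auc:
            best_auc, best_epoch, since_best = auc, epoch, 0
            # A real run would checkpoint the model weights here.
        else:
            since_best += 1
            if since_best >= patience:
                break  # No validation improvement for `patience` epochs.
    return best_epoch, best_auc, epochs_run
```

The weights restored for testing are those from `best_epoch`, not from the last epoch run.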

Fig. 2: Schematic illustration of deep learning model training, testing, and validation.

We designed a convolutional neural network for ECG interpretation with potential for clinical data integration. The model was trained to predict CKD from one 12-lead ECG obtained within 1 year of CKD diagnosis. CKD, chronic kidney disease.

Statistical analysis

All analyses were performed on the held-out test dataset, which was never seen during model training. Model performance in predicting the primary outcomes was assessed using the area under the receiver operating characteristic (ROC) curve (AUC). After model derivation and training, primary and secondary analyses were performed on the trained models using the held-out test cohort. Secondary sensitivity analyses were restricted to ECGs obtained in patients with diabetes, patients with hypertension, male patients, and patients older or younger than 60 years. We computed two-sided 95% confidence intervals using 1000 bootstrapped samples for each calculation. Statistical analysis was performed in R and Python.
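The AUC and its bootstrapped 95% confidence interval can be sketched in a few lines. This assumes the percentile bootstrap (the study does not name the interval method) and 0/1 labels; the AUC is computed as the probability that a randomly chosen positive ECG receives a higher score than a randomly chosen negative one, which equals the area under the ROC curve. Because the test set holds one ECG per patient, resampling ECGs here is equivalent to resampling patients.

```python
import random


def roc_auc(labels, scores):
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2.

    O(n_pos * n_neg); a rank-based formula is preferable for large test sets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Point AUC plus a two-sided percentile bootstrap CI (percentile method is an assumption)."""
    point = roc_auc(labels, scores)  # raises early if only one class is present
    rng = random.Random(seed)
    n = len(labels)
    boot = []
    while len(boot) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if 0 < sum(y) < n:  # skip resamples that lack one of the two classes
            boot.append(roc_auc(y, [scores[i] for i in idx]))
    boot.sort()
    lower = boot[int(round(alpha / 2 * n_boot))]
    upper = boot[int(round((1 - alpha / 2) * n_boot)) - 1]
    return point, lower, upper
```

For example, `roc_auc([0, 1, 0, 1], [0.1, 0.9, 0.8, 0.3])` returns 0.75, because three of the four positive-negative pairs are ranked correctly.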

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Results

Primary cohort characteristics

In total, we identified 17,860 patients with a CKD diagnosis at Cedars-Sinai Medical Center (7.8% of the total patient sample), of whom 7816 had an ECG taken within 1 year of CKD diagnosis. Our primary cohort consisted of 247,655 ECGs, of which 221,974 were randomized to the training set (for both training and validation) and 25,681 to the test set. Of the primary cohort ECGs, 74.3% had no serum creatinine measurement or eGFR estimate within 30 days, and 50.7% had none at any point in the electronic health record (EHR); however, these figures do not capture laboratory testing from outside hospitals or paper clinic records that might have been used in the diagnosis of CKD. The mean age of the primary cohort was 61.3 ± 19.7 years and 48% were female. Demographic and clinical characteristics are presented in Table 1. Demographics and clinical characteristics according to age group are presented in Supplementary Table 2.

Table 1 Demographic and clinical characteristics in the internal and external datasets.

Model performance in the primary cohort

Our 12-lead ECG-based model discriminated any-stage CKD with an AUC of 0.767 (95% CI 0.760–0.773). Model performance was consistent across CKD stages, with AUCs of 0.753 (0.735–0.770) for mild CKD, 0.759 (0.750–0.767) for moderate-severe CKD, and 0.783 (0.773–0.793) for ESRD. In all cases, negative examples were defined as ECGs without CKD diagnoses.
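Because negatives are always ECGs without any CKD diagnosis, a stage-specific AUC compares one stage against those negatives and leaves ECGs from the other stages out entirely rather than counting them as negatives. A minimal sketch, assuming hypothetical record fields `stage` (None for no CKD) and `score` (model output):

```python
def auc(pos_scores, neg_scores):
    """Mann-Whitney AUC: fraction of (positive, negative) pairs ranked correctly, ties = 1/2."""
    pairs = [(p, n) for p in pos_scores for n in neg_scores]
    if not pairs:
        raise ValueError("need at least one positive and one negative score")
    return sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in pairs) / len(pairs)


def stage_auc(records, stage):
    """AUC for one CKD stage (or stage='any').

    Positives: ECGs with the requested stage. Negatives: ECGs with no CKD diagnosis
    (stage None). ECGs from other CKD stages are excluded, not counted as negatives.
    `stage` and `score` are hypothetical field names.
    """
    if stage == "any":
        pos = [r["score"] for r in records if r["stage"] is not None]
    else:
        pos = [r["score"] for r in records if r["stage"] == stage]
    neg = [r["score"] for r in records if r["stage"] is None]
    return auc(pos, neg)
```

Under this design, a high-scoring ESRD ECG cannot penalize the mild-CKD AUC, since it never enters the mild-CKD comparison.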

Given the increasing prevalence of wearable technologies, particularly devices that record single-lead ECGs, we trained an additional deep learning model using only single-lead ECG input to simulate the DLA’s performance on single-lead wearable data. With 1-lead ECG waveform data, the DLA achieved an AUC of 0.744 (0.737–0.751) in detecting any-stage CKD, with sensitivity and specificity of 0.723 (0.723–0.723) and 0.643 (0.643–0.643), respectively.
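Sensitivity and specificity, unlike AUC, depend on a decision threshold applied to the model's continuous output. The study does not state how its threshold was chosen; the sketch below assumes one common rule, maximizing Youden's J (sensitivity + specificity - 1). In practice the threshold would be fixed on the validation set and then applied unchanged to the test set.

```python
def sens_spec(labels, scores, threshold):
    """Sensitivity and specificity when scores >= threshold are called CKD-positive.

    Assumes both classes are present in `labels` (0 = no CKD, 1 = CKD).
    """
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def youden_operating_point(labels, scores):
    """Threshold maximizing Youden's J = sensitivity + specificity - 1 (assumed selection rule).

    Ties keep the lowest threshold. Returns (threshold, sensitivity, specificity).
    """
    best = None
    for t in sorted(set(scores)):
        sens, spec = sens_spec(labels, scores, t)
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, t, sens, spec)
    return best[1:]
```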

Since early detection of CKD is crucial to prevent disease progression and complications later in life, we tested the performance of our model in younger patients (<60 years of age). Among patients under 60 years of age, the 12-lead and 1-lead ECG-based DLAs detected any-stage CKD with AUCs of 0.843 (0.836–0.852) and 0.824 (0.815–0.832), respectively.

We also tested the performance of our model separately among diabetic, hypertensive, and older patients, who are generally considered high-risk subgroups. The 12-lead model detected CKD with an AUC of 0.747 (0.707–0.783) among diabetic patients, 0.711 (0.696–0.725) among patients with hypertension, and 0.706 (0.697–0.716) among patients older than 60 years. When trained with the 12-lead ECG waveform together with age, sex, diabetes, and hypertension, the model achieved similar discrimination of any-stage CKD in the held-out test set, with an AUC of 0.790 (0.781–0.798). Detailed results for 1-lead and 12-lead ECG-based DLA performance in the held-out test set are presented in Tables 2 and 3, and ROC curves are shown in Supplementary Fig. 1.

Table 2 Performance of the 12-lead ECG-based deep learning algorithm in the internal dataset.
Table 3 Performance of the 1-lead ECG-based deep learning algorithm in the internal dataset.

The model performed similarly in detecting CKD in subsets of patients with albuminuria, patients with corresponding laboratory testing and documented eGFR, and both ambulatory and hospitalized patients (Supplementary Table 3). In patients with both a CKD diagnosis and an eGFR below 60 mL/min/1.73 m², the AUC was 0.754 (0.737–0.771), and performance was similar in patients with hyperkalemia (AUC 0.741 [0.698–0.787]) and without hyperkalemia (AUC 0.758 [0.747–0.768]). The model also performed well in patients with known albuminuria, with an AUC of 0.734 (0.723–0.745), and performed similarly regardless of the positive-to-negative ratio in the training set (Supplementary Table 4).
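Varying the positive-to-negative ratio in the training set, as in Supplementary Table 4, can be done by keeping every positive training ECG and subsampling negatives. The study does not describe its exact procedure; this is a minimal sketch assuming a hypothetical `label` field, random subsampling of negatives, and application to the training split only, so the test set and its prevalence stay fixed.

```python
import random


def subsample_negatives(records, neg_per_pos, seed=0):
    """Keep every positive ECG and a random subset of negatives at roughly `neg_per_pos`:1.

    `label` is a hypothetical field (1 = CKD, 0 = no CKD). If too few negatives exist,
    all of them are kept. Apply to the training split only; the test split is untouched.
    """
    pos = [r for r in records if r["label"] == 1]
    neg = [r for r in records if r["label"] == 0]
    n_neg = min(len(neg), int(round(neg_per_pos * len(pos))))
    rng = random.Random(seed)
    sample = pos + rng.sample(neg, n_neg)
    rng.shuffle(sample)
    return sample
```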

Electrocardiographic features in CKD

To understand which features were most relevant to our deep learning model's detection of CKD, we performed two sets of experiments. First, we compared the available ECG parameters across CKD stages and found statistically significant differences in all of them (heart rate, PR interval, P-wave duration, QRS duration, QTc interval, P-wave axis, R-wave axis, T-wave axis; Supplementary Table 5).
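The QTc interval listed above is a heart-rate-corrected QT interval. The section does not state which correction the MUSE or TraceMaster systems applied, so this sketch assumes Bazett's formula, QTc = QT / √RR with RR in seconds, one common convention; other corrections (e.g., Fridericia's cube-root formula) give different values at high or low heart rates.

```python
import math


def qtc_bazett(qt_ms, heart_rate_bpm):
    """Bazett-corrected QT in ms: QT divided by the square root of the RR interval in seconds."""
    if heart_rate_bpm <= 0:
        raise ValueError("heart rate must be positive")
    rr_s = 60.0 / heart_rate_bpm  # RR interval in seconds
    return qt_ms / math.sqrt(rr_s)
```

At 60 bpm (RR = 1 s) the corrected value equals the measured QT; at 75 bpm a 400 ms QT corrects to about 447 ms.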

Second, we used LIME to identify which ECG segments contributed most to the identification of CKD. Supplementary Fig. 2 shows examples of LIME-highlighted segments in 12-lead and 1-lead ECG waveforms from correctly classified CKD and healthy control patients in the held-out test set. In both examples, the LIME-highlighted features focused mostly on QRS complexes and PR intervals. In particular, QRS complexes and PR intervals in the limb leads were most frequently highlighted, potentially reflecting CKD-associated electrophysiologic alterations.
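The LIME procedure described in the Methods can be illustrated with a small, self-contained sketch. It is not the `lime` package and not the study's exact configuration. It assumes one lead divided into equal-length segments, perturbation by zeroing segments (the study perturbed 0.5% of the waveform per sample; coarse segments keep this toy small), an exponential kernel on cosine distance to weight samples, and a weighted ridge regression as the local surrogate. The segment count, kernel width, and ridge penalty are hypothetical; the 1000 samples match the Methods.

```python
import math
import random


def _solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def lime_segment_weights(signal, predict, n_segments=10, n_samples=1000,
                         kernel_width=0.25, ridge=1e-3, seed=0):
    """Return one LIME-style importance coefficient per waveform segment."""
    if len(signal) < n_segments:
        raise ValueError("signal must have at least n_segments samples")
    rng = random.Random(seed)
    seg_len = len(signal) // n_segments
    rows, preds, weights = [], [], []
    for _ in range(n_samples):
        mask = [rng.randint(0, 1) for _ in range(n_segments)]  # 1 = keep, 0 = zero out
        perturbed = [v if mask[min(i // seg_len, n_segments - 1)] else 0.0
                     for i, v in enumerate(signal)]
        kept = sum(mask)
        # Cosine distance between this mask and the all-ones (unperturbed) mask.
        distance = 1.0 - (kept / math.sqrt(kept * n_segments) if kept else 0.0)
        weights.append(math.exp(-(distance ** 2) / kernel_width ** 2))
        rows.append([1.0] + [float(z) for z in mask])  # leading 1.0 is the intercept
        preds.append(predict(perturbed))
    # Weighted ridge regression: (X^T W X + ridge * I) beta = X^T W y, intercept unpenalized.
    d = n_segments + 1
    xtwx = [[0.0] * d for _ in range(d)]
    xtwy = [0.0] * d
    for x, y, w in zip(rows, preds, weights):
        for i in range(d):
            xtwy[i] += w * x[i] * y
            for j in range(d):
                xtwx[i][j] += w * x[i] * x[j]
    for i in range(1, d):
        xtwx[i][i] += ridge
    return _solve(xtwx, xtwy)[1:]
```

A large positive coefficient means that keeping the segment raises the model's output; segments with the largest coefficients are the ones a LIME plot would highlight, analogous to the QRS complexes and PR intervals above.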

External validation cohort characteristics

The external validation cohort consisted of 896,620 ECGs from 312,145 patients. The prevalence of mild CKD was 1.2%, that of moderate-severe CKD was 3.6%, and that of ESRD was 0.9%. The mean age of the external validation cohort was 56.7 ± 18.7 years and 50.4% were female. Overall, 47.5% of patients were Caucasian, 3.6% were Black, 12.3% were Asian, and 36.6% were of other or unknown race. Demographic and clinical characteristics are presented in Table 1.

Model performance in the external validation dataset

In the external validation dataset, the performance of our 12-lead and 1-lead models was comparable to that in the primary cohort. The 12-lead ECG-based model achieved an AUC of 0.709 (0.708–0.710) in discriminating any-stage CKD, and the 1-lead ECG-based model detected any-stage CKD with an AUC of 0.701 (0.700–0.702).

Consistent with the primary cohort, in which our model achieved higher CKD detection accuracy among younger patients, the 12-lead and 1-lead ECG-based models achieved AUCs of 0.784 (0.782–0.786) and 0.777 (0.775–0.779), respectively, in detecting any-stage CKD among subjects under 60 years of age. Detailed results for 1-lead and 12-lead ECG-based DLA performance in the external validation cohort are presented in Supplementary Tables 6 and 7.

Discussion

In the present study, we investigated the performance of a deep learning model to detect CKD using ECG waveforms. Our 12-lead ECG-based model had good accuracy in identifying any-stage CKD and higher accuracy in patients under 60 years of age. Accuracy also increased with CKD severity. These results were validated in a separate healthcare system, which also showed good discrimination for any-stage CKD in the whole study population and higher discrimination among patients under 60 years of age. While 12-lead ECGs are widely available in healthcare settings, the rapid adoption of wearable technology has also introduced opportunities for large-scale data collection outside formal healthcare settings. Our 1-lead ECG-based DLA showed good discrimination for CKD in young patients, suggesting that AI-based ECG analysis may have substantial potential for wide-scale screening in this population. One-lead ECGs could also increase screening rates in high-risk patients (Supplementary Figs. 3 and 4). However, integrating artificial intelligence into electronic devices requires a more detailed evaluation of accuracy in real-world settings.

Low awareness of CKD and the limitations of current screening measures highlight the urgency of novel screening strategies to increase detection rates of early-stage CKD. Because they are non-invasive and routinely obtained in the clinic, ECGs are often the first line of clinical evaluation. In our healthcare system, 74% of ECGs obtained did not have laboratory testing of kidney function within 30 days. Previous studies have demonstrated that the cost-effectiveness of CKD screening is highly dependent on patient risk factor profile and CKD probability, and there has been debate on whether CKD screening should be targeted only to high-risk patients or also extended to patients without risk factors for CKD22,23,24,25. Although screening of high-risk patients is guideline-recommended, testing rates remain low: only about 20% of high-risk patients in the U.S. receive guideline-recommended assessment26. Consequently, most high-risk patients are likely to be unaware of underlying CKD2,3. Moreover, a substantial proportion of all CKD patients are not considered high risk and are therefore not recommended for regular screening, which further highlights the need for novel screening methods.

Our model performed better at detecting CKD in younger patients, whereas detection accuracy was lower in older and high-risk patients. The reasons for this observation are not fully clear, but younger patients generally have fewer comorbidities, so any detected ECG abnormalities may be especially meaningful and specific. Although older age is a well-known risk marker for CKD, the prevalence of CKD in younger patients is also notably high in the U.S. (8–10% in those <65 years)3. Remarkably, however, awareness of underlying CKD is also very low in younger patients, as only about 8% are aware of the disease3. Given the availability of effective low-risk CKD treatments and the reversibility of CKD, there are substantial potential benefits to detecting and treating CKD, especially in the young. A recent paper by Kwon et al.27 also used ECG waveforms, in addition to age and sex, to develop a DLA to detect changes in eGFR, which can reflect both acute kidney injury (e.g., dehydration, pharmacotherapy, urinary tract obstruction) and chronic kidney disease. Their model achieved higher performance, with AUCs of 0.86–0.91, and reaffirms the overall conclusion that renal abnormalities can be detected by ECG within large cohorts across multiple international sites.

The strengths of our study include the large cohort of patients undergoing ECG recording across a decade and the use of state-of-the-art deep learning architectures. We also used two separate approaches to understand the key features of relevance for our deep learning model. While previous studies have reported that patients with CKD have high rates of P-wave abnormalities, prolonged PR interval, QTc prolongation, QT dispersion, and left ventricular hypertrophy28,29,30,31, in the present study CKD was associated with skewed P-, R-, and T-wave axes in addition to prolonged QRS, PR, and QTc intervals. However, several limitations warrant consideration. Our study is retrospective, and the study populations, identified using ICD-9 codes, were derived from two large academic medical centers in dense urban metropolitan areas. By prioritizing diagnosis codes specific to chronic kidney disease, we sought to avoid capturing acute rather than chronic kidney injury; however, we cannot exclude the possibility that some study subjects without a CKD diagnosis in the electronic health record have undiagnosed disease, as mild CKD in particular often goes undiagnosed, especially with ICD-9 code-based adjudication. In the subset with both ICD-9 code adjudication of CKD and laboratory testing, the ICD-9 codes were consistent with the calculated eGFR; however, only a minority of patients could be linked to data on microalbuminuria.

Validation in prospective general-population cohorts in outpatient settings is required to confirm the ability of an ECG-based DLA to recognize patients with CKD. Although the prevalence of CKD among ECGs in our training cohort was low, and this prevalence is not directly comparable to that of epidemiological CKD cohorts (as ECGs are more commonly obtained in patients without CKD), we show that our disease definitions are consistent with laboratory testing and documented eGFR (Supplementary Table 8) and that the accuracy of our deep learning approach is relatively insensitive to disease prevalence in the training set (Supplementary Table 4). The prevalence of hypertension may be underestimated in the internal cohort; however, our model performed similarly well in the internal and external test cohorts, which had different prevalences of hypertension.

One of the UN Sustainable Development Goals is to reduce premature mortality from non-communicable diseases by one third by 2030. Given the high prevalence of asymptomatic CKD, the serious consequences of untreated disease, the availability of effective low-risk treatment, and a preclinical state detectable with inexpensive and simple diagnostic tests, CKD is a good target for large-scale population screening and has the potential to reduce premature mortality related to non-communicable diseases. In addition to the high mortality and morbidity due to CKD, treatment costs for CKD are high and have increased over recent decades32. In particular, the increasing number of patients requiring renal replacement therapy creates challenges for healthcare systems worldwide, and insufficient access to renal replacement therapy may cause at least 2 million premature deaths annually33. Therefore, widely available, inexpensive, and effective CKD prevention and management strategies are needed to provide equal opportunities to reduce CKD-related disability-adjusted life years.

Conclusions

Our ECG-based deep learning model was able to detect CKD with good discrimination accuracy in multiple study populations and with particularly high accuracy in patients under 60 years of age. These results suggest that deep learning-based ECG analysis may provide additional value in detecting various CKD stages, especially in younger patients. The clinical significance of this study lies in the potential enhancement of screening methods for the early detection of CKD, which is crucial to enable early treatment and prevent disease progression.