Introduction

Progressive kidney dysfunction leads to end-stage kidney disease (ESKD), requiring dialysis or transplantation. The number of patients requiring dialysis in Japan continues to increase, exceeding 330,0001. Patients with chronic kidney disease (CKD) have an increased relative risk of coronary heart disease, heart failure, and stroke compared to those without CKD2,3,4,5. ESKD and cardiovascular diseases secondary to the renal impairment have become important medical problems that lead to a decline in the quality of life and increased national health care costs.

A changing lifestyle and an aging population have dramatically altered each pathogenetic mechanism contributing to renal impairment. According to data from the Japanese population, diabetic nephropathy was the most common primary disease among patients requiring dialysis (39.0%), followed by chronic glomerulonephritis (27.8%), and nephrosclerosis (10.3%)1. The incidence of diabetic nephropathy and nephrosclerosis is increasing annually, while the incidence of chronic glomerulonephritis is declining1. Lifestyle and the coexistence of other lifestyle-related diseases play a major role in renal dysfunction. It is very important to focus on risk factors to identify patients who may develop renal dysfunction in the future, because improving renal function is difficult after deterioration. Prophylactic intervention in a high-risk population with renal dysfunction may reduce progression and ESKD and cardiovascular mortality6. In developed countries, CKD is generally associated with old age, diabetes, hypertension, obesity, and cardiovascular disease7. Some of these can be improved by promoting lifestyle and behavioral changes. It has been reported that smoking cessation inhibited the progression of CKD. In diabetic patients with normal renal function and albuminuria, smoking cessation has been reported to decrease albuminuria8. Metabolic syndrome control were independently associated with a lesser progression of diabetic nephropathy9. In Japan, since 2008, people aged ≥ 40 years are recommended to undergo regular health checkups to prevent lifestyle-related diseases. It would be clinically useful to find a way to predict CKD using variables measured during these checkups.

Several studies have developed predictive scores for CKD, but they differ in the target population’s ethnicity and score complexity, determined by the number and classification of factors10,11. Since the prevalence of renal dysfunction and mechanisms associated with its development vary by race, a unique Japanese scoring system is needed to predict CKD in the Japanese population. Although Nelson’s model included four cohorts of Japanese, they represented only 1.6% of the total participants11. Furthermore, although a complex scoring system may provide accurate prediction, it is inconvenient in daily clinical practice. Therefore, we created a simple risk score and an equation for incident CKD after 5 years using Japanese large-scale health checkup data, and evaluated its internal validity.

Methods

Data and study population

This study was conducted in the same population as the participants for whom we previously created a risk model for developing hypertension12. Data were collected from participants aged 30–69 years after annual health checkups at Kagoshima Kouseiren Hospital between April 2008 and March 2016. We selected participants whose data were available at baseline and after 5 years (range, 3–7 years). Participants were excluded if they had CKD, defined as an estimated glomerular filtration rate (eGFR) < 60 mL/min/1.73 m2; were undergoing dialysis therapy; were post-renal transplantation at baseline; and those with missing data.

Two-thirds of the participants were randomly assigned to the derivation cohort to generate a score and an equation to predict CKD. The remaining one-third was used as a validation cohort to assess the validity of the score and equation obtained in the derivation cohort.

This study was conducted in accordance with the Declaration of Helsinki and was approved by the Institutional Ethics Committees of the Graduate School of Medical and Dental Sciences, Kagoshima University. The Ethics Committee has approved that since only existing anonymized data were used in this study, it is not necessary to obtain the informed consent of each individual.

Risk factors

Age was categorized into four groups: 30–39, 40–49, 50–59, and 60–69 years. Height and weight were measured using standard anthropometric methods. Body mass index (BMI) was calculated as weight (kg) divided by height squared (m2) and categorized into ≤ 24.9 kg/m2 and ≥ 25.0 kg/m2 groups.

The following categories were determined by a self-administered questionnaire: smoker (currently smoker) or non-smoker (have never smoked or smoked in the past), non-frequent drinker (drinking less than 10 days per month) or frequent drinker (drinking more than 10 days per month), habit exerciser (30 min or more a day). Data on drugs for hypertension, diabetes, dyslipidemia, and hyperuricemia were collected using a self-administered questionnaire. Blood samples were obtained after an overnight fast. Serum lipids, glucose, uric acid, and creatinine levels were measured using standard laboratory procedures. Underlying diseases were defined as follows: hypertension (under treatment with antihypertensive agents or blood pressure ≥ 140/90 mmHg), diabetes (under treatment with oral hypoglycemic agents or insulin, or fasting blood glucose ≥ 126 mg/dL), dyslipidemia (under treatment with lipid-lowering agents, serum triglycerides ≥ 150 mg/dL, serum low-density lipoprotein-cholesterol ≥ 140 mg/dL, or serum high-density lipoprotein-cholesterol < 40 mg/dL), and hyperuricemia (under treatment with uric acid lowering agents or serum uric acid level > 7.0 mg/dL). The eGFR was determined according to the new Japanese coefficient for the modified isotope dilution mass spectrometry-traceable Modification of Diet in Renal Disease study equation:

$$ {\text{eGFR}} = {194} \times {\text{SCr}}^{{ - {1}.0{94}}} \times {\text{Age}}^{{ - 0.{287}}} . $$

For women, eGFR was multiplied by a correction factor of 0.73913. The baseline renal function was categorized into four groups by baseline eGFR: 60.0–69.9, 70.0–79.9, 80.0–89.9, and ≥ 90.0 mL/min/1.73 m2.

The outcome was CKD at the 5-year follow-up, defined as eGFR < 60.0 mL/min/1.73 m2.

Statistical analysis

As in our previous risk model for developing hypertension, we used a similar analytical method12. Continuous variables (age, BMI, blood pressure, and hematological parameters) are expressed as mean ± standard deviation, except for skewed distributed indices including triglyceride and blood glucose levels, which are expressed as median (1st quartile, 3rd quartile). Categorical variables, including underlying diseases and lifestyle variables, are expressed as proportions (percentages). Differences between the derivation and validation cohorts for normally distributed continuous variables, skewed-distribution continuous variables, and categorical variables were analyzed using the Student’s unpaired t-test, Wilcoxon test, and χ2 test, respectively.

Univariable and multivariable logistic regression analyses were performed for each variable to estimate the odds ratio and 95% confidence interval for CKD incidence. To create a risk score that predicts 5-year incidence of CKD, the following scores corresponding to standardized beta coefficients were assigned to each risk factor category for items that were significant in the multivariable logistic regression analysis, based on the methodology used in the Japan Epidemiology Collaboration on Occupational Health Study Group’s study: 1, β = 0.01–0.20; 2, β = 0.21–0.80; 3, β = 0.81–1.20; 4, β = 1.21–2.20; and 5, β > 2.2014,15,16. The reference category for each variable was given a point of 0, and the risk score for developing CKD was calculated as the sum of the individual points.

The discriminative performance of the score was assessed using the area under the curve (AUC) from the receiver operating characteristic (ROC) analysis. The performance of each score was assessed using sensitivity, specificity, positive predictive value, negative predictive value, and the Youden index16. The consistency of association between the scores and CKD incidence was evaluated using the Cochran–Armitage trend test. Then, the score was applied to the validation cohort and ROC analysis was performed.

To compensate for the weakness of the study being derived from data from a single institution, a sensitivity analysis was performed using the bootstrap resample method. The 95% bootstrap confidence interval of the odds ratios in multivariable logistic regression analysis were calculated based on 2500 bootstrap resamples. We then used the β coefficients for the risk factors that were significant in the logistic regression analysis to create an equation that directly calculates the proportion of CKD after 5 years, and assessed its predictive ability and reproducibility in the same way as for the score. In developing the equation, continuous variables such as age and eGFR were used as continuous values without categorization. We evaluated the calibration using calibration plots.

All statistical analyses were performed using JMP Pro version 14 (SAS Institute, Cary, NC, USA) for Windows. Statistical significance was set at P < 0.05.

Results

Baseline characteristics

Overall, 167,706 participants aged 30–69 years underwent medical examinations at least once during the study period. Among these, 71,002 individuals had available 5-year follow-up data. A total of 12,579 individuals were excluded (baseline CKD, 7,871; missing variables, 4,708). Finally, the data of 58,423 participants (age, 53.8 ± 10.2 years; male, 50.0%) was analyzed.

Table 1 shows the baseline characteristics of the derivation and validation cohorts. The age and proportion of men were 53.8 ± 10.2 years and 49.8% in the derivation cohort and 53.8 ± 10.2 years and 50.1% in the validation cohort, respectively. After the follow-up period (median, 5.0 years; 1st quartile, 4.7 years; 3rd quartile, 5.1 years), we identified 2,679 (6.9%) and 1,319 (6.8%) cases of new-onset CKD in the derivation and validation cohorts, respectively.

Table 1 Baseline characteristics of study population in the derivation and validation cohorts.

Association between risk factors and incident CKD

The association between incident CKD and risk factor candidates is shown in Table 2. In the univariable model, older age, male sex, higher BMI, non-current smoking, non-frequent alcohol drinking, hypertension, diabetes, dyslipidemia, hyperuricemia, and lower eGFR were associated with an increased risk of CKD. In the multivariable model, older age, male sex, non-frequent drinking, hypertension, diabetes, dyslipidemia, hyperuricemia, and lower eGFR were significantly associated with an increased CKD risk. Our risk models included age, gender, hypertension, diabetes, dyslipidemia, hyperuricemia, and eGFR. Considering the lack of sufficient information on alcohol consumption, it was not included in the risk model components.

Table 2 Odds ratios and 95% confidence interval of 5-year incidence of CKD for each risk factor.

Risk prediction score for 5-year incidence of CKD

The risk factors and points derived for each category are shown in Table 3. The risk score, sum of all points, could range from 0 to 16. From the ROC curve predicting CKD incidence, the AUC was 0.78 (Fig. 1a). Table 4 shows the predictive performance for a range of cutoff points. A score of ≥ 8 showed the highest Youden index in the derivation cohort, with a sensitivity of 0.90 and specificity of 0.52. The observed incidence of CKD for each score is shown in Fig. 2. At a score of 0 to 6, the observed risk of 5-year CKD was less than 1%. As the score increased, the risk gradually increased (P < 0.001). At a score of 14 to 16, more than 25% of the participants had developed CKD after 5 years.

Table 3 Points assigned to predict 5-year incidence of CKD.
Figure 1
figure 1

ROC curves in the derivation and validation cohorts for the risk score for predicting the 5-year incidence of CKD. Receiver operating characteristic curves of the derivation cohort (a) and validation cohort (b) for the risk score to predict the 5-year incidence of CKD. AUC area under the curve, CKD chronic kidney disease, ROC receiver operating characteristic.

Table 4 Predictive performance of the developed 5-year CKD risk score.
Figure 2
figure 2

Proportions of developing CKD after 5 years at each score in the derivation and validation cohort. Proportion of CKD development after 5 years for each score in the derivation and validation cohort. The lined bars represent the proportion of 5-year incidence of CKD in the derivation cohort, and the black-shaded bars represent the proportion of 5-year incidence of CKD in the validation cohort. CKD chronic kidney disease.

When the risk score was applied to the validation cohort, the AUC was 0.79, similar to the derivation cohort (Fig. 1b). At the cutoff point of 8 ≥ points, the score had a sensitivity of 0.89 and specificity of 0.53, which was similar to the derivation cohort (Table 4). As the score increased, the risk gradually increased (P < 0.001, Fig. 2).

The results of the sensitivity analysis using the bootstrap method are shown in Table 5. The odds ratios and 95% confidence intervals from the bootstrap method were similar to those from the derivation cohort in the main analysis.

Table 5 Mean odds ratios and 95% confidence intervals of multivariable logistic regressin analysis from bootstrap resample method.

Risk prediction equation for 5-year incidence of CKD

We developed an equation to predict CKD probability after 5 years, using age (years old), sex (female, 0; male, 1), hypertension (no, 0; yes, 1), dyslipidemia (no, 0; yes, 1), diabetes (no, 0; yes, 1), hyperuricemia (no, 0; yes, 1), and eGFR (mL/min/1.73 m2).

Probability of 5-year incidence of CKD

Probability = 1/ (1 + exp[−{9.4876 + 0.0311 × age + 0.2400 × sex + 0.3470 × hypertension + 0.0893 × dyslipidemia + 0.3444 × diabetes + 0.0832 × hyperuricemia + -0.1980 × eGFR}]).

The median probability obtained from the derivation cohort was 0.018 (interquartile range 0.002–0.084), and the AUC value of the ROC curve for the development of CKD after 5 years was 0.88, with a sensitivity of 0.84 and a specificity of 0.78 at a cutoff value of 0.077 calculated from the Youden index (Fig. 3a). When the risk equation was applied to the validation cohort, the AUC was 0.89, with a sensitivity of 0.87 and a specificity of 0.77 with a cutoff value of 0.075 (Fig. 3b). The results of the calibration are shown in Fig. 4. Good visual calibration is achieved for both the derivation and validation cohorts.

Figure 3
figure 3

ROC curves in the derivation and validation cohorts for the risk equation for predicting the 5-year incidence of CKD. Receiver operating characteristic curves of the derivation cohort (a) and validation cohort (b) for the risk equation to predict the 5-year incidence of CKD. AUC area under the curve, CKD chronic kidney disease, ROC receiver operating characteristic.

Figure 4
figure 4

Calibration plots for the equation model in derivation and validation cohorts. The visual agreement between the CKD predictions (Predicted probability) and observations (Actual probability) for the equation model in the derivation and validation cohorts. (a) derivation cohort (b) validation cohort.

Discussion

We developed a score and an equation and to predict the 5-year risk of incident CKD defined as eGFR < 60 mL/min/1.73 m2 based on seven indicators (age, sex, hypertension, diabetes, dyslipidemia, hyperuricemia, and eGFR) from large-scale health checkup data. CKD incidence increased with increasing risk scores, and the same predictive ability was validated when applied to population not included in the score development. We have achieved our goal of creating risk models based on simple indexes that is applicable to the Japanese population under 70 years of age and can be easily used in clinical practice.

Several studies have reported risk scores for predicting future renal dysfunction. O'Seaghdha et al. studied the Framingham Offspring Study cohort of Caucasians and developed a score (0–15 points) to predict CKD incidence (eGFR < 60 mL/min/1.73 m2) at 10 years based on age, hypertension, diabetes, baseline eGFR category, and the presence of proteinuria using a test paper method. The c-statistic for the CKD risk score was 0.74 when validated using an external cohort of whites and blacks, and the predictive ability was good even when for only blacks10. Predictive equations for CKD using data from over 5 million people in 34 cohort studies from 28 countries have also been reported. In addition to age and sex, race/ethnicity, eGFR, history of cardiovascular disease and hypertension, smoking history, BMI, and urinary albumin are used for calculation. For diabetic patients, glycated hemoglobin and the use of diabetic medications were added. The C statistic was 0.845 in the absence of diabetes and 0.801 in the presence of diabetes11. We should apply these prior risk models to our study population and compare the predictive abilities. However, these two risk models could not be applied to our study population because they require urinary protein and albumin information. The AUC and c-statistic of the ROC curve are commonly used as indicators of the discriminative ability of risk scores. The model developed in this study had an AUC of 0.79 for the score and 0.89 for the equation. Although direct comparison with previous reports was not possible, the discriminative ability of our model is somewhat superior.

As described above, several risk prediction models have been reported to predict the development of CKD in various ethnic groups in different countries. Since CKD prevalence and its risk factors varies by race, it is preferable to use the prediction model determined from Japanese data to predict CKD risk in Japanese people. Although Nelson et al. included four cohorts of Japanese, they represented only 1.6% of the total participants and included data as old as the 1970s, whereas the present study included only Japanese and used relatively recent data from 2008 to 2016 to develop the score and equation. These factors may have contributed to the improvement in discrimination. In Japan, all men and women aged ≥ 40 years are recommended to undergo annual specific health checkups and receive guidance focusing on visceral obesity if needed. We aimed to create a prediction model based on anthropometric measurements, blood tests, questionnaires, and other data obtained from these checkups. We achieved a good prediction accuracy of AUC 0.78 for the risk score and 0.88 for the risk equation. The risk score obtained in this study can be easily calculated by healthcare workers in daily clinical practice, enabling patients to understand the extent of their risk and contribute to decision-making for treatment and lifestyle improvement. The risk prediction equation has the advantage of more accurately predicting CKD risk, but they are not easy to calculate and should be considered for practical application in the form of online calculators or others. In addition, development of an application that can calculate this risk score would be more convenient and clinically useful for assessing risk.

Age-associated loss of kidney function has been recognized for decades. With aging, many participants exhibit a progressive decrease in the glomerular filtration rate and renal blood flow. The decrease in glomerular filtration rate is due to a reduction in the glomerular capillary plasma flow rate and glomerular capillary ultrafiltration coefficient17. The proportion of participants over 70 years who had a medical checkup after 5 years was low. In addition, 40% of the participants had already developed CKD, making the number of cases available for analysis small. Therefore, participants over 70 years were excluded from this study. Previous epidemiologic studies indicate that the incidence of ESKD is higher in men than in women18. A recent study found that female mice were more tolerant to ischemic-reperfusion injury than male mice and that female mice receiving supplemental estrogen before ischemia were further protected19. The possible mechanisms underlying the reno-protective role in females seem to be related to estrogen. This is supported by clinical studies demonstrating that premenopausal women are better protected from renal and cardiovascular disease compared to age-matched men; this protective effect seems to be lost with aging and menopause20. However, the mechanisms by which estrogen confers protective renal effects are not well understood.

Hypertension has been reported to be a risk factor for both CKD and ESKD21,22. Systemic hypertension causes an increase in intraglomerular capillary pressure, leading to glomerulosclerosis and loss of renal function22. CKD is associated with specific qualitative and quantitative lipid abnormalities, resulting in specific dyslipidemias23,24,25. Specific abnormalities in lipoprotein metabolism, caused by the inappropriate activity of some key enzymes and metabolic pathways, develop in the early stages of renal failure and result in dyslipidemia, which is a risk factor for the development of atherosclerosis. As CKD progresses, dyslipidemia worsens and may adversely affect renal function through the promotion of atherosclerosis23,26,27. Iseki et al. found a significant association between elevated serum uric acid and serum creatinine levels in a Japanese cohort study28. Further, high uric acid level (≥ 6.0 mg/dL) was a risk factor for ESKD development in women29. It is suggested that hyperuricemia may damage vascular endothelial cells and contribute to renal dysfunction through the production of reactive oxygen species by xanthin oxide reductase, activation of the renin-angiotensin system, and induction of inflammation by inflammasome activation30,31. Although alcohol consumption was a significant factor in the multivariate analysis, we did not include it as a risk factor in this study because we do not have sufficient information on it. In the questionnaires, the frequency per month and daily intake regarding alcohol consumption were asked as separate items. Therefore, we were unable to find an appropriate way to integrate frequency and quantity of intake into a single variable, "alcohol intake“. The association between alcohol intake and CKD is not simple. Moderate alcohol intake (20–40 g ethanol/day) is not a risk factor for CKD32, but rather inhibits its progression33. On the other hand, heavy alcohol intake (> 60 g/day of ethanol) has been reported to be a risk factor for CKD34. For these reasons, we decided that it would be difficult to use the data with alcohol intake not clearly defined.

In addition to the above, this study had several limitations. First, the data were not collected prospectively, and the results should be further validated in a prospective observational study. Second, the participants were limited to those who underwent health examinations at a single institution in Japan. Furthermore, participants tend to be highly interested in their own health, so selection bias cannot be avoided. Third, since we do not have information on the etiology of nephropathy, we do not know if these models work better in predicting nephrosclerosis or diabetic nephropathy compared to glomerulonephritis. Fourth, although proteinuria is itself an important risk factor for the development of CKD, urinary protein was not addressed in this study because of the low rate of obtaining urinalysis results in our participants' data. Finally, due to the large population that was randomly divided into derivation and the validation cohort, both cohorts had nearly identical characteristics. The internal validation therefore has limited informative value and external validation is necessary to assess the discriminative and predictive abilities of scores in real life.

In conclusion, we have developed a clinical risk score and equation to predict the occurrence of CKD after 5 years in a Japanese population under 70 years of age, using age, sex, hypertension, diabetes, dyslipidemia, hyperuricemia, and eGFR levels. Predictive ability was comparable with previous reports, and reproducibility was confirmed through internal validation. These prediction methods could be simple and useful tools for detecting and recognizing high-risk CKD patients in Japan and for providing appropriate counseling.