Comparing predictions among competing risks models with rare events: application to KNOW-CKD study—a multicentre cohort study of chronic kidney disease

A prognostic model to determine an association between survival outcomes and clinical risk factors, such as the Cox model, has been developed over the past decades in the medical field. Although the data size containing subjects’ information gradually increases, the number of events is often relatively low as medical technology develops. Accordingly, poor discrimination and low predicted ability may occur between low- and high-risk groups. The main goal of this study was to evaluate the predicted probabilities with three existing competing risks models in variation with censoring rates. Three methods were illustrated and compared in a longitudinal study of a nationwide prospective cohort of patients with chronic kidney disease in Korea. The prediction accuracy and discrimination ability of the three methods were compared in terms of the Concordance index (C-index), Integrated Brier Score (IBS), and Calibration slope. In addition, we find that these methods have different performances when the effects are linear or nonlinear under various censoring rates.

Methods for analyzing survival data with competing risks.Cause-specific hazard model.The CS model is an adaptation of the Cox model, with cause-specific hazard functions from different types of events.It treats failure from the cause of interest as events and failures from other causes as censored 13 .Let T and C denote failure and censoring times, respectively.The cause-specific hazard function at time t for cause k(k = 1, • • • , K) is Similar to the Cox model, a separate proportional hazards model with p-dimensional covariates for cause k can be defined as where h CS k0 (t) is the baseline of the cause-specific hazard function, and the exponential term illustrates the covari- ate effects on cause k.The regression coefficients β k for cause k can be calculated using the maximum partial likelihood estimation method as follows: where R CS i represents the i-th risk set which contains subjects who have not experienced any event and are not censored yet.
Fine and Gray model.The FG method 7 is another extension of the Cox model that estimates the incidence of outcomes in a follow-up period with competing risks.Unlike the CS model, the FG model can be defined as the instantaneous rate of occurrence of each event type in subjects who are still under observation and those who have already experienced competing risks.The FG model describes the hazard function, Analogous to the CS model, the FG model with covariates can be defined as follows: where h FG k0 (t) is the baseline of the subdistribution hazard of cause k.The regression coefficient β k for cause k can also be calculated using the maximum partial likelihood estimation: where R FG i represents the i-th risk set, which contains subjects who have experienced competing risks ahead and are still event-free.w ij are subject-specific weights that reflect the incidence of competing risks.
Random survival forests.The RSF, first introduced by Ishwaran et al. 14 , is an extension of the machine-learning tree-based Random forests 15 .Subsequently, Ishwaran et al. 8 proposed a fully nonparametric method for competing risks in the presence of right-censored data.As mentioned in Ishwaran et al. 8 , RSF has many useful properties.It calculates the cumulative incidence function directly in each node and provides an accurate prediction performance by aggregating individual trees in the ensemble.This allows the model to include not only linear effects but also nonlinear and interaction effects.Permutation importance was used as a measure of variable importance to identify the event-specific risk factors.A more detailed description of RSF is presented in Ishwaran et al. 8 .
RSF constructs each tree using a bootstrap sample of the original training data.We extracted .632samples without replacement as training data in the bootstrapped sample and excluded the rest of them for out-of-bag data.Random feature selection can be applied to evaluate the split tree nodes at each node.For each tree, a cumulative hazard function was estimated using the Nelson-Aalen estimator.The survival forest is calculated by averaging the terminal node statistics in the ensemble.
Since identifying the most influential risk factors is the main interest in the medical field, it can be interesting to use Variable Importance as a means of finding a ranking of important risk factors.The RSF approach ranks covariates based on their predicted values and determines the important factors through the predicted accuracy.That is, the prediction accuracy of the test data in the current model was first calculated for each tree.Then, the prediction accuracy was also calculated on shuffled data using the same method.The differences between the original prediction accuracy and randomly permuted prediction accuracy were averaged over all trees, and they were normalized by the standard error.If a current model without the original values of a variable can provide a worse prediction, then the variables with large values are ranked as more important.
In addition to identifying important risk factors, there is one way to explain how each covariate affects the output of the model.The SHAP value is used to address each variable's contribution to the model.The SHAP value is closely related to "Shapley values" which were first developed for the game theory method 16 .It has been widely adopted since the study by Lundberg and Lee 17 was first published.The SHAP value translates to assigning an importance value to features, depending on their contribution to the prediction.In other words, it is calculated as the average marginal contribution of a variable value across all possible combinations 18 .Thus, the SHAP value can be the relative risk of the outcomes, meaning that high values contribute more to the predicted probability.
Performance measures.Concordance index.The C-index is commonly used to identify predicted probability in a prognostic model.This is a generalization of the area under the ROC curve that considers censored data 19,20 .In randomly selected patients, those with shorter time-to-event would have higher risk scores and lower predicted time-to-event outcomes.As presented in Brentnall and Cuzick 21 , the C-index is calculated from the Wilcoxon rank-sum statistic by where T 1 and T 2 are survival time of the two different subjects, T1 and T2 are their predictions from a fitting method, and I(•) denotes the indicator function.Then, the C-index can be estimated by where Ŝ(t|X i ) is the predicted survival function with covariate X i and m = i,j I(T i ≤ T j ).
The C-index value is a proportion between 0 and 1.Values near 1 indicate high model discrimination performance, whereas values near 0.5 show similarity to random prediction 20 .Note that there are various methods for calculating C-index for right-censored data.Harrell's C-index depends on censoring distribution because they provided C-index censored information largely excluded 22 .Otherwise, Uno 23 and Efron 21 's approaches are censoring-independent.
Integrated brier score.Another criterion used to compute risk prediction is the time-dependent Brier Score (BS) 24,25 .The BS can be defined by the mean squared error between the predicted survival function Ŝ(t|X i ) and the predictor variable X i and the true status Y i (t) at a specified time-point t 26 .In the presence of right censored data, the BS can be estimated by where Ỹi (t) represents the observed status at time t.DM is the test dataset with sample size M and Ŵi (t) is the inverse probability of censoring weights 24

defined by
Here, δ i is the event indicator, Ĝ(t|x) ≈ P(C i > t|X i = x) is an estimate of the conditional survival function of censoring time C and Ĝ(T i − |X i ) denotes the estimated survival function prior to T i for the censoring time C.The estimated IBS is calculated by integration: Because IBS is based on the mean squared error, lower values are considered a better predictor.
Calibration slope.Calibration refers to the agreement between the predicted probabilities and observed number of events 27 .As described by Ambler et al. 28 , the calibration was assessed using a linear regression model to show the association between the observed and predicted values at time t.Thus, the regression model is defined as: where S i (t|X i ) and Ŝi (t|X i ) are the true and estimated survival functions for the subject i at time t, respectively.When the model is perfectly calibrated, the estimated Calibration slope is 1.If the slope is higher than 1, the model is under-fitted.In contrast, overfitted models show a Calibration slope lower than 1, meaning that the model underestimates the probability of an event in the low-risk group and overestimates it in the high-risk group.
Simulation study.To compare the performances of the three methods, we consider the following simulation study.We first generated time-to-event, T i , based on the Cox-exponential hazards model with one event of interest and one competing risk (i = 1,2), considering the following form: where U i is generated from a uniform distribution and the baseline hazards are 1 = 1 and 2 = 2 , respectively.For the ith state of events, five independent predictors X ij (i = 1, 2; j = 1, ..., 5) are generated from the standard normal distribution.For the parametric function g(X i , β)) , we consider linear and nonlinear terms (Table 1).The true regression coefficients are set as follows: To investigate the performance of each method when the parametric function is correct or misspecified, we fitted two CS models when g(X i ; β) is linear (CS_l) and nonlinear (CS_nl).Similarly, the FG model when g(X i ; β) is linear (FG_l) and nonlinear (FG_nl), respectively.Note that RSF does not require the selection of the parametric function g(X i ; β) .We implemented 500 and 1000 sample sizes and split the training and test sets with a 7:3 ratio in each simulation, which was repeated with 1000 independent observations.In addition, to examine the effect of censoring rates, we consider different censoring rates (30%, 60%, 80%) for each simulation.

Results
Baseline characteristics.Baseline characteristics of a total of 2238 patients are shown in Table 2. Here, we considered a CV event as the event of interest.Since a CV event that occurred after the development of ESKD was not followed for practical reasons, an ESKD event can be regarded as a competing risk.We focused on the 4-year incidence rate with clinical considerations, and the incidence rates of CV and ESKD events were 117 (5.23%) and 360 (16.09%) as shown in Fig. 1.The cumulative incidence function can be defined as the expected

Parametric function for generating survival time
Scenario 1 Linear : Vol.:(0123456789) www.nature.com/scientificreports/proportion of subjects with a specific event over time 29 alongside with Kaplan-Meier estimator without any competing risks.The 4-year cumulative incidence of CV and ESKD events in all subjects is described in Fig. 2 showing that the cumulative incidence of ESKD events exceeded that of the CV events.
Analysis of CKD data.The results of the two competing risk models are similar (Table 3).According to the CS model, age, gender, DM, and CVD were significantly associated with CV events; HRs were 1.04 (95% CI 1.02,1.07),0.44 (95% CI 0.20,0.97),2.15 (95% CI 1.28, 3.60), and 3.15 (95% CI 1.73, 5.72) respectively.Likewise, age, DM, and CVD were significantly associated with CV events in the FG model; HRs were 1.05 (95% CI 1.02, 1.07), 2.11 (95% CI 1.21, 3.66), and 3.03 (95% CI 1.68, 5.48), respectively.In addition, we evaluated the model performance by calculating the predicted probabilities and compared the results.All three results are similar, except for the Calibration slope, which is 1.167, 0.928, and 0.971, respectively (Table 4).Lastly, we calculated SHapley Additive exPlanations (SHAP) values and Variable Importance to identify the contribution of the risk factors.As shown in Fig. 3, the left panel displays SHAP values meaning that the first variable on the top is the most important and the last variable on the bottom is the least important.High values of CKD_stage, MAP, and LDL positively contributed to predicting a CV event.DM and CAD were also positively associated with the prediction of CV events.On the other hand, HG and BMI have a negative relationship with predicting a CV event, and males are associated with a higher incidence of a CV event than females.The variable Importance from RSF result is also represented in the right side of Fig. 3 showing that CVD is the most influential predictive factor for the incidence of a CV event.Then, age, DM, CAD, and HG were assessed.CVD, age, and DM were important clinical factors for predicting CV events when ESKD was considered a competing risk in three results.In the case of SHapley values, the CKD stage is the most important factor in predicting CV events.[32]   Simulation results.The prediction performances were estimated using box plots through the C-index, IBS, and Calibration slope.Detailed descriptions of the three measures are provided in Methods section.Figures 4, 5,  6, 7 show the results of the prediction performance by varying the sample size ( n = 500 and 1000) and scenarios (1 and 2).In Figs. 4 and 6 (when the true g(X i ; β) is linear), as the censoring rates increase, the three prediction measures have similar performance on average but become highly volatile.In particular, the Calibration slope revealed increased overfitting and underfitting.Because the true g(X i ; β) is linear, the correct models CS_l and FG_l perform similarly to the models CS_nl and FG_nl models.Interestingly, RSF was slightly worse than the correct models in terms of the C-index and IBS, and its Calibration slope is lower than 1 (overfitted) in all cases.Figures 5 and 7 (when the true g(X i ; β) is nonlinear), the misspecified models CS_l and FG_l are poorer than their correct models.Among the methods, the performance of the CS method was poorer than that of the other methods.Their C-index was much lower, the IBS was much larger, and the range of the estimated Calibration slope was too wide, indicating overfitting and underfitting.This showed that the CS model with linear and nonlinear effects in time-to-event was more susceptible as the censoring rates increased.In summary, the performance of each method deteriorated as the censoring rate increased.However, if the conditions are the same, the results with larger sample sizes show a better and more stable performance.The CS approach is more sensitive to censoring rates than the FG and RSF methods.

Discussion
In this study, we analyzed a multicentre cohort study of chronic kidney disease (KNOW-CKD) data based on the competing risks framework.We investigated the association between clinical risk factors and renal dysfunction and compared their estimates, predicted abilities, and feature importance.As the previous studies have shown that age, male sex, diabetes, and known vascular disease are risk factors for the 10-year development of CV diseases in the general population 33 , our results from the CKD population were also similar.In the CS result, age, gender, DM, and CVD variables were significant.In the FG result, age, DM, and CVD were significant but gender was marginally significant.Although RSF cannot directly provide the p-values for significance, the variable importance was calculated.Analysis from Chronic Renal Insufficiency Cohort (CRIC) with advanced CKD showed increased age, diabetes, elevated blood pressure, and any CVD history are significant risk factors for CV events 34 .Other studies with KNOW-CKD subjects also reported that male sex, diabetes, and increased CKD stage are risk factors for CV events 31,35 .In real data analysis, two conventional methods (CS, FG) and the RSF method showed similar results in terms of the significance of risk factors and prediction.Although all methods select similar variables as the relevant features and have similar prediction performances, they all have good predictive power.Therefore, we believe that the current model with linear terms is reasonable to fit our data.Thus, we have conducted simulation studies by considering linear and non-linear terms.The performance of each method decreases as the censoring rate increase.These results show that more sophisticated methods must be considered when developing prediction models with few events.Based on the simulation results when the non-linear model is true, the RSF method was more robust.Therefore, the RSF method is recommended when the prediction performances are similar.We only used the baseline covariates as the feature variables to analyze the competing risks data.Our current analysis provided ease of computation and straightforward prediction of survival time.Because the KNOW-CKD study is longitudinal, some relevant covariates related to CKD patients can be time-dependent.Extension to a model where the time-dependent covariates or time-varying coefficients are considered would be interesting future work.Figure 5. Boxplots of predicted probabilities by each method with the existence of nonlinear effects in timeto-event with sample size n = 500 .The top, middle and bottom results represent 30%, 60%, and 80% censoring rates, respectively.The horizontal dashed line represents the optimal value in each plot.Figure 6.Boxplots of predicted probabilities by each method with the existence of linear effects in time-toevent with sample size n = 1000 .The top, middle and bottom results represent 30%, 60%, and 80% censoring rates, respectively.The horizontal dashed line represents the optimal value in each plot.

Figure 2 .
Figure 2. Cumulative incidence function curves for each event.

Figure 4 .
Figure 4. Boxplots of predicted probabilities by each method with the existence of linear effects in time-toevent with sample size n = 500 .The top, middle and bottom results represent 30%, 60%, and 80% censoring rates, respectively.The horizontal dashed line represents the optimal value in each plot.

Figure 7 .
Figure 7. Boxplots of predicted probabilities by each method with the existence of nonlinear effects in timeto-event with sample size n = 1000 .The top, middle and bottom results represent 30%, 60%, and 80% censoring rates, respectively.The horizontal dashed line represents the optimal value in each plot.
Figure 1.The 4-year incidence rates for CV and ESKD events in CKD subjects.

Table 3 .
Hazard ratios (HR) with their confidence intervals, and p-values in the CS and FG results.* The bold numbers represent statistically significant with 5% significance.The age is included as a square term.CAD: baseline coronary artery disease; DM: baseline diabetes mellitus; CVD: baseline cardiovascular disease; BMI: body mass index; HG: baseline hemoglobin; CRP: baseline high sensitivity C-reactive protein; MAP: baseline mean arterial pressure; LDL: baseline low-density lipoprotein cholesterol.