Introduction

Kidney transplantation (KT) is a highly effective treatment for patients with end-stage renal disease (ESRD), offering better quality of life compared to long-term dialysis1,2. While short-term graft survival rates have significantly improved over the past few decades, there is still room for improvement in terms of long-term graft survival3,4.

Protocol biopsy is a reliable method for evaluating allograft status within the first year following KT5. Acute rejection within the first year has been shown to have a negative impact on long-term graft survival6,7. Therefore, there is ongoing research focused on improving graft survival through early detection and treatment of subclinical rejection (SCR) using protocol biopsies8,9,10.

Previously, our research team reported the safety and feasibility of performing protocol biopsies at 2 weeks after KT11. However, protocol biopsy still has several limitations, including the risk of complications such as bleeding, cost concerns, and challenges in implementation across different centers12,13. Considering these factors, it is crucial to establish clear indications and to perform protocol biopsies selectively in groups at high risk of early SCR.

The purpose of this study is to analyze the incidence and risk factors of early SCR based on the 2-week protocol biopsy data accumulated in our center11, and to suggest indications for protocol biopsy. In addition, we develop risk assessment models for SCR using both machine learning and logistic regression and compare their performance.

Method

Patients who underwent KT at Samsung Medical Center from January 2005 to December 2020 were investigated. The exclusion criteria were as follows: (1) pediatric patients, (2) recipients of simultaneous solid organ transplantation, and (3) recipients of dual or en-bloc KT.

Recipient and donor data on sex, body mass index (BMI), underlying disease, pre-dialysis information, blood type, serum creatinine, donor type (living, standard criteria deceased donor [SCD], extended criteria deceased donor [ECD]), previous transplantation history, delayed graft function, and induction agent were collected from medical records. Data on cold ischemic time (CIT), warm ischemic time (WIT), and graft weight were collected from operation records.

SCR was determined based on pathology reports. Biopsies were read by a dedicated specialized urology pathologist at Samsung Medical Center. All biopsy cores were obtained by 2-week protocol biopsy and assessed using the Banff 2007 classification. The procedural details of the protocol biopsy were described in a previous report11. If borderline rejection was observed in the protocol biopsy, repeat biopsy was not performed, and steroid pulse therapy was administered.

Classification of immunologic risk and HLA mismatch

Immunologic risk was classified into three groups (high, intermediate, and low). The high-risk group comprised patients who met any of the following conditions: ABO incompatibility (ABO-i), donor-specific antibody (DSA) with a median fluorescence intensity (MFI) higher than 2500, crossmatch positivity, or flow cytometry positivity. The intermediate-risk group comprised patients who had DSA with an MFI lower than 2500 or who underwent re-transplantation. The low-risk group comprised patients without DSA or other immunologic risk factors.
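The three-level rule above can be written as a short decision function. The following is a minimal Python sketch, not the code used in the study; the `Recipient` fields and the function name are hypothetical, and the handling of an MFI of exactly 2500 (which the text does not cover) is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

MFI_CUTOFF = 2500  # threshold stated in the text


@dataclass
class Recipient:
    """Hypothetical container for the immunologic risk inputs."""
    abo_incompatible: bool
    dsa_mfi: Optional[float]  # None when no DSA was detected
    crossmatch_positive: bool
    flow_cytometry_positive: bool
    retransplant: bool


def immunologic_risk(r: Recipient) -> str:
    """Return 'high', 'intermediate', or 'low' following the stated criteria."""
    has_dsa = r.dsa_mfi is not None
    # High risk: any one of the four listed conditions is sufficient.
    if (r.abo_incompatible
            or (has_dsa and r.dsa_mfi > MFI_CUTOFF)
            or r.crossmatch_positive
            or r.flow_cytometry_positive):
        return "high"
    # Assumption: DSA with an MFI of exactly 2500 is treated as intermediate,
    # because the text only defines "higher than" and "lower than" 2500.
    if has_dsa or r.retransplant:
        return "intermediate"
    return "low"


print(immunologic_risk(Recipient(False, 1800.0, False, False, False)))
```

Checking the high-risk conditions first means a re-transplanted patient with a positive crossmatch is classified as high rather than intermediate, which matches the "any of the following" wording.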

Human leukocyte antigen (HLA) mismatches were evaluated for classes I and II. HLA-I mismatches were evaluated for the A and B isotypes, and HLA-II mismatches were evaluated for the DR isotype.

Immunosuppressive protocol

Depending on the immunologic risk, desensitization was performed before transplantation. In the high-risk group, a monoclonal antibody against CD20 (Rituximab; Genentech, Inc., South San Francisco, CA, USA) was administered at 375 mg/m² or 200 mg one month before transplantation. Plasmapheresis (PP) was started on the following day and performed 5 times, and intravenous immunoglobulin (IVIG) at 400 mg/kg was administered after every PP session. Rabbit antithymocyte globulin (rATG) was used as the induction agent. For ABO-incompatible patients, PP frequency depended on the baseline anti-ABO titer and the target titer (1:32) before transplantation. In the intermediate-risk group, the monoclonal antibody against CD20 was administered one month before transplantation and rATG was used as the induction agent. In the low-risk group, no desensitization was performed and basiliximab was usually used as the induction agent.

For maintenance, all patients were treated with a triple immunosuppressive regimen (tacrolimus, mycophenolate mofetil, and methylprednisolone). The details of the maintenance protocol were described in a previous report11.

Prediction model and machine learning

In developing the prediction model for SCR, the dependent variable was coded as binary (0, 1). Patients diagnosed with no rejection on protocol biopsy were coded as 0, and patients diagnosed with rejection on protocol biopsy, including borderline rejection, were coded as 1. Data resampling was performed with hold-out validation, with a training-to-test ratio of 7:3. Three commonly used machine learning methods (random forest, elastic net, and extreme gradient boosting [XGB]) and logistic regression were used to train the models14.

For logistic regression, the important variables were those selected by backward stepwise selection in multiple logistic regression together with those showing a high area under the receiver operating characteristic curve (AUROC) in simple logistic regression. For elastic net and XGB, variables were selected through repeated cross-validation. For the random forest, variable importance was computed by permuting out-of-bag (OOB) data: for each tree, the prediction error on the OOB portion of the data is recorded, and the same is done after permuting each predictor variable. The differences between the two are then averaged over all trees and normalized by the standard deviation of the differences. If the standard deviation of the differences is equal to 0 for a variable, the division is not done.
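The normalization step of the random forest importance measure can be illustrated in a few lines. This is a minimal Python sketch under stated assumptions, not the R implementation used in the study: the function name is hypothetical, the per-tree errors are supplied directly rather than computed from a fitted forest, and the sample standard deviation is assumed for the normalization.

```python
from statistics import mean, stdev


def scaled_permutation_importance(oob_error, permuted_oob_error):
    """Importance of one predictor from per-tree OOB errors.

    oob_error[i]          : error of tree i on its OOB data
    permuted_oob_error[i] : error of tree i on the same data after
                            permuting the predictor of interest
    """
    # Per-tree increase in error caused by permuting the predictor.
    diffs = [p - b for b, p in zip(oob_error, permuted_oob_error)]
    avg = mean(diffs)
    # Assumption: sample standard deviation across trees.
    sd = stdev(diffs) if len(diffs) > 1 else 0.0
    # As described in the text, skip the division when the SD is 0.
    return avg if sd == 0 else avg / sd


# Toy example with misclassification counts for three trees.
print(scaled_permutation_importance([10, 12, 9], [12, 16, 12]))
```

A predictor whose permutation raises the error by a consistent amount across trees receives a large scaled value, whereas one whose effect varies widely between trees is scaled down.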

For performance evaluation, hold-out validation was repeated 100 times with random splits to build models and measure the area under the curve (AUC). Model performance was evaluated using the average AUC. We computed the precision-recall area under the curve (PR AUC) for the best-performing model. Subsequently, a detailed PR analysis was conducted to derive mean and standard deviation (mean ± SD) values for precision, recall, and F1 score. Machine learning analyses were performed using R version 4.2.1 with the caret (Classification and Regression Training) package, version 6.0-93. A flow diagram of the development of the prediction model is shown in Fig. 1.
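The repeated hold-out procedure can be sketched as follows. This is an illustrative Python outline, not the caret workflow used in the study: `fit` is a hypothetical callable that trains on the training split and returns a scoring function, the split is a plain (non-stratified) random 7:3 split by assumption, and AUROC is computed from pairwise comparisons of positive and negative scores.

```python
import random
from statistics import mean


def auroc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def repeated_holdout_auc(rows, labels, fit, repeats=100, train_frac=0.7, seed=0):
    """Average test-set AUROC over repeated random train/test splits."""
    rng = random.Random(seed)
    idx = list(range(len(rows)))
    aucs = []
    for _ in range(repeats):
        rng.shuffle(idx)
        cut = int(train_frac * len(idx))
        train, test = idx[:cut], idx[cut:]
        y_test = [labels[i] for i in test]
        if len(set(y_test)) < 2:
            continue  # AUROC is undefined when the test split has one class only
        predict = fit([rows[i] for i in train], [labels[i] for i in train])
        aucs.append(auroc(y_test, [predict(rows[i]) for i in test]))
    return mean(aucs)


# Toy usage: a hypothetical "model" that scores each row by its single feature.
rows = [[i] for i in range(20)]
labels = [0] * 10 + [1] * 10
toy_fit = lambda train_rows, train_labels: (lambda row: row[0])
print(repeated_holdout_auc(rows, labels, toy_fit))
```

Averaging over many random splits reduces the dependence of the reported AUC on any single partition, which matters when the outcome is relatively uncommon.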

Figure 1. A flow diagram of developing the prediction model.

Statistical analysis

Continuous variables with a normal distribution are summarized as mean ± standard deviation, and non-normal continuous variables are expressed as median (range). Fisher’s exact test or Pearson’s chi-square test was applied to compare proportions between groups as appropriate. For the comparison of continuous variables, Student’s t-test or the Mann–Whitney U test was used. Logistic regression was used to evaluate risk factors for SCR, and estimated odds ratios (OR) with 95% confidence intervals (95% CI) are presented. A p value < 0.05 was considered statistically significant. All analyses were performed using R 4.2.1 software (The R Core Team, Vienna, Austria).

Ethical approval

The study protocol conformed to the ethical guidelines of the Declaration of Helsinki and was approved by the Institutional Review Board of Samsung Medical Center (IRB No. SMC 2023-05-157).

Informed consent

The need for informed consent was waived by the institutional review board of Samsung Medical Center due to the retrospective nature of the study.

Results

Among 1325 patients who underwent KT, 1204 met the inclusion criteria after excluding pediatric patients (n = 21) and recipients of simultaneous solid organ transplantation (n = 53), dual KT (n = 23), en-bloc KT (n = 9), or combined kidney-bone marrow transplantation (CKBMT, n = 15). Two hundred seventeen patients who did not undergo a 2-week protocol biopsy because of bleeding risk or patient refusal were excluded. Sixty-one patients were excluded due to insufficient medical records. Finally, a cohort of 987 patients was reviewed and analyzed. A flow diagram of patient selection is shown in Fig. 2.
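The cohort counts can be traced with simple arithmetic. The short Python sketch below uses only the numbers reported above; the dictionary keys are illustrative labels. The 61 patients with insufficient records are not subtracted separately, because the reported totals already reconcile without them and the text does not state how that group relates to the 217 patients without a biopsy.

```python
screened = 1325

# Exclusions applied before eligibility, as reported in the text.
exclusions = {
    "pediatric": 21,
    "other_solid_organ_tx": 53,
    "dual_kt": 23,
    "en_bloc_kt": 9,
    "ckbmt": 15,
}

eligible = screened - sum(exclusions.values())
no_protocol_biopsy = 217
analyzed = eligible - no_protocol_biopsy

print(eligible, analyzed)
```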

Figure 2. Flow diagram showing the selection criteria.

Incidence and types of SCR

Of the 987 patients, 144 had SCR, an incidence of 14.6%. The most common type of rejection was borderline cellular rejection (BCR, 61.8%), followed by acute cellular rejection (ACR, 23.6%) and acute antibody-mediated rejection (AMR, 10.4%). Mixed cellular and antibody-mediated rejection was also observed in 6 patients. The details of rejection type are summarized in Table 1.
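As a quick check on these proportions, the Python sketch below recomputes the incidence from the reported counts. The rate excluding borderline rejection is derived here from the reported BCR share rather than from patient-level counts, so it should be read as an approximation.

```python
total_patients = 987
scr_patients = 144
bcr_share = 0.618      # reported share of borderline cellular rejection
mixed_patients = 6

incidence = scr_patients / total_patients
# Approximate rate of rejection other than borderline, derived from the BCR share.
non_borderline_rate = scr_patients * (1 - bcr_share) / total_patients
mixed_share = mixed_patients / scr_patients

print(f"SCR incidence: {100 * incidence:.1f}%")
print(f"Excluding BCR: {100 * non_borderline_rate:.1f}%")
print(f"Mixed rejection share: {100 * mixed_share:.1f}%")
```

Together with the mixed cases, the reported shares (61.8%, 23.6%, and 10.4%) account for all 144 patients with SCR.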

Table 1 Characteristics of rejection type.

Comparison of characteristics between no rejection and rejection

Among recipient characteristics, patients in the no rejection group were older than those in the rejection group (no rejection vs. rejection: 49.7 vs. 47.2 years, p = 0.016). There were no differences between the two groups in underlying disease, dialysis period, or underlying kidney disease. Among donor characteristics, the rejection group had a higher proportion of living donors (61.8% vs. 77.8%, p = 0.001). There were no differences in other baseline characteristics.

In the comparison of transplantation-related factors, the rejection group had a higher proportion of ABO-compatible transplants (83.4% vs. 94.4%, p = 0.001) and of basiliximab induction (33.2% vs. 56.9%, p < 0.001). On the other hand, the proportions of HLA I and II zero-mismatch and of immunologic high risk were lower in the rejection group. The comparison of characteristics between the no rejection group and the rejection group is summarized in Table 2.

Table 2 Comparison of characteristics between no rejection and subclinical rejection patients in 2 week protocol biopsy.

Risk factor analyses of SCR

In the univariate risk factor analysis of SCR, age of recipient (OR 0.98, p = 0.014), pre-dialysis period of recipient (OR 0.99, p = 0.013), donor type (SCD [OR 0.45, p = 0.002], ECD [OR 0.48, p = 0.019]), BMI of donor (OR 1.06, p = 0.032), ABO-i (OR 0.30, p = 0.001), HLA I mismatch (four [OR 2.83, p = 0.032]), HLA II mismatch (two [OR 5.70, p < 0.001]), immunologic risk (intermediate [OR 2.39, p = 0.014], low [OR 2.12, p = 0.008]), and ATG induction (OR 0.38, p < 0.001) were statistically significant. In the multivariate analysis, age of recipient (OR 0.98, p = 0.03), BMI of donor (OR 1.07, p = 0.02), ABO-i (OR 0.15, p < 0.001), HLA II mismatch (two [OR 6.44, p < 0.001]), and ATG induction (OR 0.41, p < 0.001) remained associated with SCR (Table 3).
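Odds ratios for continuous variables such as age and BMI apply per unit of the variable, so values close to 1 can still correspond to a meaningful effect over a wider range. The Python sketch below illustrates this scaling; it assumes that the ORs are expressed per year of recipient age and per 1 kg/m² of donor BMI, which the text does not state explicitly, and that the effect is log-linear as in a standard logistic model.

```python
def scaled_odds_ratio(or_per_unit: float, units: float) -> float:
    """Odds ratio implied for a change of `units` under a log-linear effect."""
    return or_per_unit ** units


# Multivariate estimates from the text, scaled to larger differences.
print(f"Recipient age, +10 years: OR {scaled_odds_ratio(0.98, 10):.2f}")
print(f"Donor BMI, +5 kg/m2:      OR {scaled_odds_ratio(1.07, 5):.2f}")
```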

Table 3 Univariate and multivariate analyses of influencing factors associated with subclinical rejection in 2 week protocol biopsy.

Prediction model

The logistic regression model (average AUC = 0.717) and the elastic net model (average AUC = 0.712) showed good performance, with an average AUC exceeding 0.7. The other two models (XGB and random forest) did not exceed an average AUC of 0.7 (Fig. 3). An additional PR AUC analysis was conducted for the logistic regression model, which had the best performance. The PR AUC was 0.302 for the test set and 0.358 for the training set (Fig. 4). In the PR analysis, the logistic regression model exhibited a precision of 0.143 ± 0.011, recall of 0.939 ± 0.024, and F1 score of 0.248 ± 0.016. For the random forest model, precision was 0.166 ± 0.018, recall was 0.815 ± 0.060, and F1 score was 0.275 ± 0.023. The XGB model demonstrated a precision of 0.177 ± 0.024, recall of 0.689 ± 0.138, and F1 score of 0.278 ± 0.034.

The variables selected in the logistic regression model were HLA II mismatch, donor BMI, induction type (basiliximab vs. ATG), donor type (living vs. SCD vs. ECD), and immunologic risk (high vs. intermediate vs. low). In the elastic net, induction type, HLA II mismatch, donor type, immunologic risk, age, and recipient blood type were selected as important variables. When the variables selected by random forest and XGB were also considered, HLA II mismatch and induction type were common important variables in all models. The SHAP values and important variables for XGB and random forest are shown in Fig. 5. Additional OR analysis of the logistic prediction model revealed that HLA II mismatch (OR 6.77) was a risk factor for SCR, whereas ATG induction (OR 0.37) was a favorable factor.
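The relation between the reported precision, recall, and F1 values can be checked with the standard F1 formula. The Python sketch below applies it to the reported mean precision and recall; because the reported F1 scores are presumably averaged over the repeated splits, they need not equal the F1 computed from the mean precision and recall, which may explain small differences in the third decimal place.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Mean precision and recall reported in the text for each model.
reported = {
    "logistic regression": (0.143, 0.939),
    "random forest": (0.166, 0.815),
    "XGB": (0.177, 0.689),
}

for model, (precision, recall) in reported.items():
    print(f"{model}: F1 from means = {f1_score(precision, recall):.3f}")
```

For context, a classifier with no discriminative ability has an expected precision equal to the prevalence of the outcome, which is about 0.146 in this cohort, so the PR AUC values are best interpreted against that baseline rather than against 0.5.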

Figure 3. Performance evaluations of the prediction models (average AUC over 100 repeats). (A) Logistic regression. (B) Elastic net. (C) Extreme gradient boosting. (D) Random forest.

Figure 4. Precision-recall curve AUC of logistic regression model. (A) Test set. (B) Training set.

Figure 5. The SHAP values and important variables. (A) Random forest. (B) XGB.

Discussion

KT is a therapeutic approach that significantly enhances the quality of life for patients with ESRD. However, the scarcity of organ donors poses a major challenge as the number of patients in need of transplantation far exceeds the available donors14,15. The occurrence of early allograft failure among KT recipients further exacerbates this mismatch, underscoring the critical importance of effective management strategies to ensure long-term graft survival.

Protocol biopsy is performed at many centers for the early detection and treatment of rejection, with the aim of improving long-term allograft survival. Our center also performs a protocol biopsy at 2 weeks, the safety of which has been reported previously11. However, there remains a small risk of complications requiring intervention, and cost and time constraints make it difficult to perform the procedure in all patients. Therefore, the purpose of this study was to analyze the risk factors for SCR at 2 weeks and to propose indications for protocol biopsy.

The incidence of SCR in our study was determined to be 14.6%. When excluding BCR, the rejection rate was found to be 5.6%, which is comparable to the rejection rates (7.5–10.7%) reported in other studies involving protocol biopsies conducted within 1–6 months post-transplantation16,17,18. However, our observed rejection rate was lower than the previously reported rate of 17% for the 1–2 week period19.

Previous studies have indicated that HLA-A, HLA-B, and HLA-DR are potential risk factors for early SCR19. Oh et al.20 also reported that SCR was associated with HLA II mismatch and Simulect induction. Consistent with these findings, our study identified HLA II mismatch as a risk factor for early SCR. Furthermore, our results demonstrated a gradual increase in risk with an increasing number of HLA mismatches.

In contrast to findings from other studies21,22, ABO-compatible KT was identified as an unfavorable factor for early SCR in our analysis. There are several potential reasons for this result. Firstly, it could be attributed to variations in the choice of induction agents. All ABO-i recipients in our study were induced with ATG, which might have influenced the outcomes. Secondly, it is plausible that the additional use of plasmapheresis in ABO-i recipients had an effect. In our center, plasmapheresis is performed if the postoperative isoagglutinin titer exceeds twice the preoperative level or if the titer does not reach 1:32 before surgery. Although plasmapheresis has been reported to reduce the occurrence of rejection after KT by eliminating preexisting antibodies, it remains unclear whether it is more effective in patients with high-sensitivity DSA or ABO-i23,24. Further research is warranted to investigate the specific impact of additional plasmapheresis in the context of high-sensitivity DSA and ABO-i, in order to obtain a more accurate analysis.

Machine learning represents a novel statistical approach enabling rapid analysis of complex factors and prediction of specific events. In our study, we employed three commonly used machine learning techniques, together with logistic regression, to develop a prediction model. When comparing the performance of these models, logistic regression and elastic net demonstrated superior predictive capabilities compared to random forest and XGB. Logistic regression and elastic net are linear model-based methods, whereas random forest and XGB are tree-based models that enhance predictive accuracy by combining numerous trees in an ensemble. Given the observed performance differences, it is suggested that the factors influencing early SCR exhibit a linear pattern rather than a complex one.

This study was initiated to find and apply appropriate indications for protocol biopsy. Our initial hypothesis postulated a stronger association between immunologic factors, such as HLA mismatch or recipient characteristics, and the occurrence of rejection. However, the results revealed that not only HLA mismatch but also the choice of induction agent (ATG) played a significant role in the predictive model. Djamali et al. reported that the peak intensity of ATG occurs between days 6 and 8, with sustained T-cell depletion lasting beyond 20 days25. Taking this into consideration, it is plausible that the effect of ATG may persist during the two-week period of the protocol biopsy. Therefore, while the two-week protocol biopsy is deemed a safe procedure, it may be too early to evaluate graft function accurately due to the lingering impact of the induction agent.

This study has the limitations inherent to a retrospective, single-center design. Nevertheless, it is meaningful in that it reports the rate of rejection occurring very early (within 2 weeks) after KT and its risk factors. In addition, by applying machine learning, one of the artificial intelligence (AI) technologies, it illustrates how AI can be used in the KT field. The results suggest that the factors influencing the KT outcome show a linear pattern. However, a further limitation is that the candidate factors were selected by the researchers. Therefore, to identify the complex factors affecting KT through AI, future research should consider methods such as deep learning that can reduce human intervention.

In summary, early SCR was associated with HLA II mismatch and the induction agent and can be predicted by a prediction model using machine learning, although further study is needed to determine the clinical significance of detecting SCR early after KT.