A Donor Quality Index for liver transplantation: development, internal and external validation

Winter, Audrey; Féray, Cyrille; Audureau, Etienne; Azoulay, Daniel; Antoine, Corinne; Daurès, Jean-Pierre; Landais, Paul

doi:10.1038/s41598-018-27960-7

Download PDF

Article
Open access
Published: 29 June 2018

A Donor Quality Index for liver transplantation: development, internal and external validation

Audrey Winter ORCID: orcid.org/0000-0002-6161-8037^1,2,3,
Cyrille Féray⁴,
Etienne Audureau⁵,
Daniel Azoulay⁶,
Corinne Antoine⁷,
Jean-Pierre Daurès^1,2 &
…
Paul Landais¹

Scientific Reports volume 8, Article number: 9871 (2018) Cite this article

2343 Accesses
10 Citations
1 Altmetric
Metrics details

Subjects

An Author Correction to this article was published on 05 October 2018

This article has been updated

Abstract

Organ shortage leads to using non-optimal liver grafts. Thus, to determine the graft quality, the Donor Risk Index and the Eurotransplant Donor Risk Index have been proposed. In a previous study we showed that neither could be validated on the French database. Our aim was then dedicated to propose an adaptive Donor Quality Index (DQI) using data from 3961 liver transplantation (LT) performed in France between 2009 and 2013, with an external validation based on 1048 French LT performed in 2014. Using Cox models and three different methods of selection, we developed a new score and defined groups at risk. Model performance was assessed by means of three measures of discrimination corrected by the optimism using a bootstrap procedure. An external validation was also performed in order to evaluate its calibration and discrimination. Five donor covariates were retained: age, cause of death, intensive care unit stay, lowest MDRD creatinine clearance, and liver type. Three groups at risk could be discriminated. The performances of the model were satisfactory after internal validation. Calibration and discrimination were preserved in the external validation dataset. The DQI exhibited good properties and is potentially adaptive as an aid for better guiding decision making for LT.

Validation of the Kidney Donor Profile Index (KDPI) to assess a deceased donor’s kidneys’ outcome in a European cohort

Article Open access 02 August 2019

Matching Graft Quality to Recipient’s Disease Severity Based on the Survival Benefit in Liver Transplantation

Article Open access 05 March 2020

The promise of machine learning applications in solid organ transplantation

Article Open access 11 July 2022

Introduction

Liver transplantation (LT) is the major therapy to cure end stage liver disease and hepatocellular carcinoma. In France, the “Agence de la Biomédecine” (ABM) is responsible for managing the waiting list and distributing the grafts. A major impediment to LT is the persistent shortage of donors. Although donor age is one of the strongest risk factor of graft failure^1,2,3, the use of grafts from elderly donors has become common. In France, the number of donors older than 65 rose by a factor of 20 between 1998 and 2014. This is partly due to the increase of vascular causes of death and the decrease of traumatic death since 2005.

To assess the quality of a graft, scores such as the Donor Risk Index (DRI) or the Eurotransplant Donor Risk Index (ET-DRI)^1,2 have been proposed. These two scores appeared to have an impact on the post transplantation survival^1,2. However, they could not be validated on the French LT database^4,5,6. Major donor and recipient differences observed between those databases may explain this result^4,5. If re-calibrating a model can be considered^7,8, however, the discrimination cannot be altered⁶. Thus, the current DRIs could not be used on our dataset.

A general policy is to try to give the best organs to the more severe patients with liver failure³. Looking for the best match between a graft and its recipient requires preliminary information about the graft quality prior to LT. A donor quality index (DQI), which scores the characteristics of the graft prior to LT, is one of the key factors for a better matching to a recipient. The covariates used to compose the DQI are those available at the time of the harvest in order to define the intrinsic characteristics of the graft. More, in order to better orient the decision of using a graft, a liver donor quality index is of importance. In 2014, in the United-States, 9.6%⁹ of donor livers were not transplanted. This rate was 6.7% in France for our study period (2009–2013) (Supplementary Tables S1 and S2 provide reasons why some grafts were not collected and why some collected organs were not transplanted during the study period, respectively). Moreover, the DQI may be of interest for choosing which graft might benefit from novel preservation techniques.

Since we showed that the existing DRIs did not fit to the French database^4,5, the primary objective of the present study was to generate a DQI and to perform an internal and external validation.

Material

Derivation Dataset

Information relating to LT performed in France between January 4, 2009 and December 31, 2013 was obtained from the ABM (https://www.agence-biomedecine.fr/Organes). The study was conducted with the approval of the Independent Ethics Committee (L 1121-1 to L 1126-11 articles of the Public Health Code). Authorization was also obtained from the “Commission Nationale de l’Informatique et des Libertés” (agreement No. 915206). The data provided were de-identified beforehand.

In accordance with previous works^1,2, recipients under 18 years, and recipients of multiple organ transplants were not included. Of note, no donation after cardiac death was performed during the retained period (https://www.agence-biomedecine.fr/annexes/bilan2016/donnees/organes/01-prelevement/synthese.htm#figP1). The recipients’ follow-up began at LT and ended with the occurrence of one of the following events: lost to follow-up, death, graft-loss (re-transplantation) or at the end of the study, as of December 31, 2014. The outcome was death or graft-loss. Patients with incomplete covariates were not retained as specified in the flow diagram presented in Fig. 1. Finally, 3961 LT were analyzed. Donor and recipient characteristics are shown in Table 1. This Table contains all the covariates included in the DRI and ET-DRI, and available in the French database^1,2,4.

Table 1 Donor and recipient characteristics with Log-rank test p-value for qualitative variables and quantitative variables across risk groups when significant threshold was met.

Full size table

Validation Dataset

For the external validation, we used 1048 LT performed in France between January 1, 2014 and December 31, 2014, followed-up up to December 31, 2015. Patients with incomplete covariates were not retained (Fig. 1).

Results

The DQI

The studied variables and log-rank tests are shown in Table 1. As stated in the methods section, a Cox model with adjustment on recipient characteristics was retained to create the DQI. Several variables (Table 2) were common to the three models. The retained donor’s covariates were: age; cause of death (COD); and intensive care unit (ICU) stay. Several recipient covariates were retained for adjustment: re-transplantation; being on dialysis; status at time of LT; presence of hepatitis C virus antibody; diabetes; and decompensated cirrhosis.

Table 2 Retained covariates in the different selection way.

Full size table

The retained full model included all the variables present in at least two of the three models and the set of covariates retained in addition (see methods section) (Table 3): the lowest MDRD creatinine clearance; the liver type; the MELD exception; donor and recipient’s blood groups.

Table 3 Retained Cox model also adjusted for: previous transplantation; MELD exception, on dialysis, medical condition before liver transplantation, hepatitis C virus antibody, diabetes, decompensated cirrhosis, recipient’s and donor’s ABO group.

Full size table

The liver type covariate was retained since it was present in both DRI and ET-DRI. The MELD exception and donor and recipient’s blood groups covariates were also retained, but for adjustment only.

Hence the proposed DQI score was as follows:

$$\begin{array}{rcl}{\rm{DQI}} & = & \exp \,(0.28\,(1\,{\rm{if}}\,{\rm{donor}}\,{\rm{age}} > 69\,{\rm{years}},\,0\,{\rm{otherwise}})\\ & & +\,0.06\,(1\,{\rm{if}}\,{\rm{COD}}\,{\rm{is}}\,\mbox{''}{\rm{other}}\mbox{''},\,0\,{\rm{otherwise}})\\ & & +\,0.30\,(1\,{\rm{if}}\,{\rm{COD}}\,{\rm{is}}\,\mbox{''}{\rm{cerebrovascular}}\,{\rm{accident}}\,({\rm{CVA}})\mbox{''},\,0\,{\rm{otherwise}})\\ & & +\,0.11\,(1\,{\rm{if}}\,{\rm{COD}}\,{\rm{is}}\,\mbox{''}{\rm{trauma}}\mbox{''},\,0\,{\rm{otherwise}})\\ & & +\,0.24\,(1\,{\rm{if}}\,{\rm{ICU}}\,{\rm{stay}}\,{\rm{is}}\le \,4\,{\rm{days}},\,0\,{\rm{otherwise}})\\ & & +\,0.22\,(1\,{\rm{if}}\,{\rm{the}}\,{\rm{lowest}}\,{\rm{MDRD}}\,{\rm{creatinine}}\,{\rm{clearance}}\\ & & < \,60\,{\rm{ml}}/{\rm{\min }}/1.73\,{{\rm{m}}}^{2},\,0\,{\rm{otherwise}})\\ & & +\,0.05\,(1\,{\rm{if}}\,{\rm{the}}\,{\rm{lowest}}\,{\rm{MDRD}}\,{\rm{creatinine}}\,{\rm{clearance}}\\ & & \ge \,60\,{\rm{ml}}/{\rm{\min }}/1.73\,{{\rm{m}}}^{2}\,{\rm{and}}\, < \,90\,{\rm{ml}}/{\rm{\min }}/1.73\,{{\rm{m}}}^{2},\,0\,{\rm{otherwise}})\\ & & +\,0.39\,(1\,{\rm{if}}\,{\rm{split}}\,{\rm{or}}\,{\rm{partial}}\,{\rm{liver}},\,0\,{\rm{otherwise}})).\end{array}$$

In Table 4, the progression of the score through different combinations is presented. In our dataset, the average score was 1.83 (SD: 0.44) with a median of 1.76 and a range between 1 and 3.12.

Table 4 Examples of some combinations of risk factors and their effect on the score.

Full size table

Risk groups

Three groups were obtained according to the following values (Fig. 2): 1.00 < DQI ≤ 1.58; 1.58 < DQI ≤ 2.35 and DQI > 2.35, comprising 34.1%, 56.8% and 9.1% patients, respectively. Three-month, 1-year and 3-year survivals were calculated for each group (Fig. 2). HRs were expressed according to the groups at risk (Fig. 2).

Calibration plot

The calibration of the score was assessed. As shown in Fig. 1S (see Supplementary Figures) the score seems to slightly underestimate the probability of death/graft loss at all months tested. This underestimation was not significant for all the predicted probabilities since more than 50% of the 95 band of the calibration plot contained the first bisector.

Of note, the 95 band was wide at the higher probability of death/graft loss due to few remaining patients at risk in this category. However, despite a slight underestimation, the longer the follow-up, the lower was the underestimation of the risk of death/graft loss.

Model performance and internal validation

As the predictive performances of the different models were close (Table 2), the covariates selection was implemented with the simplest method, namely the backward selection using Akaiké criterion. The corrected performances of the model by the bootstrap method were 0.609 (0.591–0.627) for the Harrell C-index, 0.604 (0.590–0.617) for the Gönen and Heller K statistic and 0.464 (0.408–0.520) for the Royston and Sauerbrei ${R}_{D}^{2}$. The confidence intervals were calculated using 200 bootstrap estimates. The performances of the model remained satisfactory even after correction by the optimism.

External validation

An external validation^4,6 was performed in the validation dataset, which was constituted of grafted patients in 2014. This dataset was independent of the derivation dataset. To calculate the DQI in the validation dataset, we followed the same procedures as proposed previously in the construction steps. Calibration and discrimination were evaluated through seven steps.

Comparing the databases

The comparison of the datasets appears in Table 5. The distribution of the DQI from the derivation and validation datasets is presented in Fig. 2S (see Supplementary Figures). It appears that the DQI is higher in the validation dataset than in the construction dataset. This might be due to older donors and more frequent strokes.

Table 5 Differences in donor and recipient characteristics between the derivation and validation datasets (P-values, χ² tests and ANOVA when appropriate).

Full size table

Regression on prognostic index (PI)

The slope on the PI, i.e. ln(DQI), was 0.88 (SD = 0.29). The slope was not different from 1 (P-value = 0.68, likelihood-ratio test); thus, the discrimination was preserved in the validation dataset.

Check model misspecification/fit

Since the estimation of the β* coefficients were not different from 0 (p = 0.64), no adjustment on the DQI covariates was needed.

We tested if there was a lack of fit for the adjustment covariates, i.e. recipients’ covariates. The ${\beta }_{adjustment\,}^{\ast }$ were different from 0 (p < 0.01), an adjustment on the adjustment covariates was needed.

The assumption of proportional hazards was verified for all covariates.

Measures of discrimination

The Harrell C-index was 0.587 (0.545–0.628), the Gönen and Heller K statistic was 0.612 (0.594–0.630) and the Royston and Sauerbrei ${R}_{D}^{2}$ was 0.416 (0.309–0.523). Discrimination measures on the validation and derivation datasets were close.

Kaplan-Meier curves for groups at risk

Failure free survival per DQI categories for derivation and validation datasets is given in Fig. 3. The follow-up in the derivation data was truncated to 800 days.

First, Kaplan Meier curves were well separated. Then, the risk groups were discriminative. Second, Kaplan Meier curves from derivation and validation data were close. For greater accuracy, we then plotted three survival curves for each risk group with confidence intervals in derivation and validation datasets (Fig. 3). Validation curves are within the confidence interval of the derivation dataset curves (except for group 1: from 300 to 500 days and the tail of group 2). The apparent calibration thus seems to be preserved.

Hazard ratios between groups at risk

HRs between groups at risk are presented in Fig. 3. They were not significantly different.

Calibration

The Kaplan Meier curves (green, Fig. 3, see “calibration”) seem to underestimate survival. The predicted survival curves in the two datasets (red and black) are superimposed for the first risk group, very close for the second risk group and more separated for the third group. Then, there is some similarity in the PI distribution in each risk group in the derivation and validation datasets. The empirical cumulative distribution functions of the PI by dataset and risk group, on Fig. 3S (see Supplementary Figures), further support this.

Discussion

A DQI, which qualifies the liver graft, is one of the key factors for a better matching between donor and recipient. As the existing DRIs were not discriminant (i.e. slope on the PI: 0.57 (SE 0.15) and 0.64 (SE 0.16) for the DRI and ET-DRI respectively, P-value < 0.001) and miscalibrated (e.g. survival, according to DRIs categories, was not consistent between the construction and validation datasets) according to our dataset^4,5, because of population differences. We thus decided to create a DQI, which characterizes the graft in view of transplantation. It is based on Cox model adjusted on the recipients’ characteristics. Three different ways to elaborate the model were tested. We explored new covariates that were not taken into account in the DRI and ET-DRI, defining the donor such as MDRD clearance or adjustment covariates such as MELD exception, decompensated cirrhosis or HCC.

Five donor covariates were retained in the final model: age, cause of death, intensive care unit stay, lowest MDRD creatinine clearance, and liver type (split or total).

We obtained three groups of different sizes (Fig. 2). Of note, there is no consensus in the literature regarding selecting the number of risk groups or in positioning the cut-off points to delineate these groups¹⁰. Too many groups could be unstable and consequently discrimination becomes insufficient. The recommendation is to create three to five groups, not necessarily of the same size, in order to highlight extreme groups¹⁰. After construction of the risk groups, HRs (Fig. 2), between groups, showed significant differences in mortality/graft loss. Indeed, the curves were all well separated, and the survival decreased with the increase of the score (Fig. 2). The third group will correspond to donors at higher risk that we may define as “extended criteria donors”. As shown by Collett et al.¹¹, the average MELD scores for recipient in the three groups at risk (20.6, 20.8 and 19.3, respectively) were not different (p = 0.06). The performance of the model remained satisfactory even after correction by the optimism. Finally, an external validation^4,6 was performed; discrimination and apparent calibration were preserved in the validation dataset.

Our study has some limitations. First, it is very difficult to evaluate the intrinsic quality of a graft since the allocation procedure associates to each donor a recipient. The quality of the graft is a function of the graft/recipient survival. We then assume that the allocation was appropriate. The graft/recipient survival provides an indirect measure of the donor quality and thus can be considered as a surrogate variable. We assumed that this variable was highly correlated to the target variable, i.e. the graft quality, and was thus able to be extrapolated as an outcome.

In prognostic models, discrimination measures are generally not high¹² which the case in our study. This occurred even when models were built in large datasets^2,13.

A selection bias has been explored as 858 recipient/donor pairs were removed from the model due to missing data. We compared the distributions of data of this subset to the data used in the analysis either for covariates belonging to the recipient (age, sex, cancer, decompensated cirrhosis, waiting time, medical condition before LT) or donor (age, sex, COD, liver type). After Bonferroni correction, only the presence of a cancer and medical condition before LT were significant (χ² tests). Of note, only the covariate “presence of a liver cancer” was not part of the score. However, the frequency of liver cancer was higher in the analysis data set, which limits the occurrence of a potential bias.

As the size was part of the DRI and the weight of donors is often difficult to establish due to fluctuations in hydration during the ICU stay, we gave more importance to the size and we did not include the weight in our study. Nevertheless, we tested the weight in the final model. The weight did not provide a pertinent information to be added in the model (p = 0.82). This result is consistent with the DRI and ET-DRI models^1,2.

Macro/micro-steatosis was not included in the model since this information was not available in the French database.

Since the cold ischemia time (CIT) is only known at the time of LT, this covariate was not retained as an intrinsic characteristic of the graft in the model, as outlined in^11,14, even though this covariate has an impact on post LT survival, such as post- and peri-operative period covariates. These covariates are not known at the time of graft procurement and can’t contribute to the creation of a score, which qualifies the donor. The aim of this work was to characterize the donor independently of the potential recipient. Of note, the distance between the organ procurement center and the transplant center was not correlated to the cold ischemia time (Pearson correlation coefficient r = 0.08), indicating that cold ischemia time is highly dependent on the logistic of transplantation. Moreover, the introduction of perfusion machines may completely reverse the paradigm of organ preservation, and thus will modify the impact of the cold ischemia time. In effect, increasing the duration of organ conservation in standardized conditions may consistently improve the chances for a patient to benefit from the most compatible graft. Whatever the case may be, cold ischemia time did not go through the three different ways of selecting the covariates. About the exploration of a potential bias, as CIT is not known at the time of graft procurement, it does not lead to a selection or misclassification bias. However, there may be confounding bias because CIT exerts an effect on the recipient/graft survival. We went back and found that this was not the case. In effect, CIT had no specific weight in the Cox model (HR: 0.9998 [0.9996–1.0001], p = 0.26), therefore we did not include it in the final model even as an adjustment covariate. Adding CIT in the final model as adjustment covariate would have decreased both the quality and discrimination performance of the model (i.e. increase in Akaiké criterion (AIC) and decrease in c-index, respectively). For example, in case-control studies, when a model is forced to include non-retrained covariates, the model often becomes overfitted. An overfitted model has a weak validity and the model would have insufficient discriminative power in new patients.

Donor age was different between the Organ Procurement and Transplantation Network (OPTN) database¹ and the French database^4,5. Since age below 70 did not exert a significant effect on recipient survival in our database, the first three quartiles were grouped together. Age over 70 represented 25% in the French database compared to 4.3% in the OPTN database.

For COD, 60% of patients presented a CVA, 12% an anoxia and 25% a trauma. This distribution was quite different from that observed in the OPTN database¹, which was 44%, 9% and 45%, respectively (p < 0.001). The risk of death/graft loss was increased in patients who experienced a CVA compared to anoxia (HR: 1.35 [1.08–1.69], p < 0.01). This result was consistent with Feng et al.¹ and Singhal et al.¹⁵.

In three out of four stays in ICU the donor’s length of stay was less than 4 days. Shorts stays in ICU were associated with an increased risk of death/graft loss. As outlined above, CVA was associated with an increased risk of death/graft loss compared either to anoxia or trauma COD. Given that the length of ICU stays of donors with a CVA was ≤4 days in more than 80% of cases (which represents 49% of the donors), it supports the fact that short stay in ICU in our series was related to a shorter recipient survival. In the literature the donor ICU stay did not appear to influence graft survival^16,17. However, Cuende et al.¹⁸ included 3429 LT and showed that an ICU stay of more than six days represented a moderate risk (RR: 1.21 [1.1–1.4]).

Finally, concerning the external validation, the survival among the different DQI groups was not significantly different (HRs with large confidence interval). This may be due to a lack of power given that the follow up was shorter than in the derivation dataset. We were thus truncated the derivation data follow-up. A longer follow-up period would have been beneficial to obtain a better accuracy. Nevertheless, Kaplan Meier curves were well separated (Fig. 3). Then, the risk groups were discriminative. For the calibration step, an under-estimation was observed. This may be due to the specific structure of the DQI. Indeed, this score includes only a part of the PI of the original model (the β_adjustment for recipient covariates were not included). But, baseline survival was calculated on the original model, which includes all the covariates; both donor and recipient. The weights of the recipients’ covariates then may play an important role in the estimation of ${S}_{0}^{deriv}(t)$, i.e. the baseline survival in the derivation dataset. We checked whether this hypothesis was consistent by computing the baseline survival on the derivation dataset, without the recipients’ covariates (Fig. 4S, Supplementary Figures). Compared to those in Fig. 3, the predicted survival was under-estimated. It confirms that baseline survival is affected by recipients’ covariates. This result therefore impacts the seventh step of the external validation. Nevertheless, according to Kaplan Meier curves (Fig. 3), the apparent calibration and the discrimination were preserved in the validation dataset.

In a recent review, Flores and Asrani¹⁴ suggested that the donor score might benefit from being updated. We aimed to create a flexible DQI model and adaptable. Indeed, if calibration is lacking with a validation dataset comprising a longer follow-up, and if the discrimination is preserved, a re-calibration of the DQI will be possible. This type of procedure is dedicated to be repeated yearly using LT data from subsequent years. This adaptive procedure will enable to gradually take into account the modifications of the graft allocation system. This approach may be of interest for other countries. Indeed, French allocation system is quite similar to most European systems. Moreover, in France, the MELD and MELD exceptions are also used such as in the United States of America. For an international-reader, our model is a tool for decision support in countries with a high activity of harvest of brain dead organ donors, with a significant increase of elderly donors, such as in Italy, Spain or Portugal thanks to a political proactive organ census and collection of old or very old donors. We encourage these countries to make an external validation of the DQI on their own datasets. Furthermore, recently, another national index was introduced, the DLI (Donor Liver Index, which has been calculated in the French dataset (Table 1))¹¹. This score is an example that a national index gives complementary information to bring an aid to improving the evaluation of the quality of grafts.

This DQI allows considering a next step to explore the optimal matching between donor and recipient and the “extended criteria donor” graft attribution, to improve the success of LTs, through sequential stratification method³. The DQI will be used to qualify the graft. We also aim to create a score based on the survival benefit, which combines two models: a pre-transplantation model and a post-transplantation model. The post-transplantation model takes into account the DQI and integrate the results of the sequential stratification study. A validation will be performed using a discrete-event simulation model.

More, use of machine perfusion protocols are starting this year in France. Until now, the only source of grafts are brain dead donors. These perfusion machines will likely allow rehabilitation of critical grafts and also will offer viability tests, which may eventually be compared to the DQI to test its relevance. Machine perfusion are quite expensive, where such a prognostic score may be of interest to target grafts and who would benefit the most from the infusion.

Methods

In order to simplify the interpretation of the model, and moreover to protect against outliers, we transformed the quantitative variables into qualitative variables. When there was no clinical threshold generally accepted (contrary to biological covariates such as alkaline phosphatases or Modification of the Diet in Renal Disease (MDRD) creatinine clearance for which thresholds are recommended) we defined four groups according to the quartiles of the covariate distributions. Graft survival curves were plotted according to the product limit method. They were then compared using hazard ratios (HR). In the absence of significant difference between groups, they were re-grouped together. If no groups at risk were identifiable for a given covariate, then this covariate was kept in the model for adjustment as a quantitative variable (or its natural logarithmic form, when appropriate), according to the three models presented below.

Definitions of covariates

In France, all donors are cared for in intensive care units (ICU). The ICU length of stay is based on the number of nights spent, rather than the number of days. Therefore, a stay of 0 day is observable.

For recipient, decompensated cirrhosis was identified in the database using the MELD score and the Child-Pugh score. All recipients presenting with cirrhosis of the liver, a MELD score ≥16 and a Child-Pugh B or C were considered as decompensated cirrhosis.

Patients first-transplanted, without cancer and cirrhosis, were considered as having a non-cirrhotic liver disease.

The estimated distance between donor and recipient centers was calculated on the basis of a geographic model taking into account road distances in minutes.

MELD exceptions were identified and resulted in extra points while on the waiting list¹⁹.

The variable named “Hors tour” means that after at least five consecutive refusals by the transplant teams, the graft is then supplied to a transplant team who have identified an appropriate candidate.

Score creation

In the derivation dataset, as in Feng’s¹ and Braat’s² studies we used a Cox model with adjustment on recipient characteristics to create the score (Table 1). We tested three different ways to elaborate this model:

(1)
A complete model with a covariate selection according to AIC;
(2)
An analysis using a two-step procedure. First, performing log rank tests with a threshold of 20% for the type I error for each covariate, according to the outcome. Then, followed by a multivariate model that included the selected covariates on the basis of the previous step with a selection threshold set at 20% for the type I error.
(3)
An analysis using the same first step of the procedure above was performed but followed by two multivariate models for donor and recipient, respectively. These models included the selected covariates on the basis of the first step with a selection threshold set at 20% for the type I error. Finally, a multivariate model was set up with all the covariates selected in the previous models with a selection threshold set at 20% for the type I error.

We then retained a model including all the covariates present in at least two of the three models. Moreover, in this model we also tested each covariate, which appeared in only one of the three models. Covariates were retained if their P-values were lower than 5%. Then, the final model included the set of covariates present in two of the three models and the set of covariates retained in addition. This approach prevented omission any relevant covariates.