Validation of systems biology derived molecular markers of renal donor organ status associated with long term allograft function

Donor organ quality affects long term outcome after renal transplantation. A variety of prognostic molecular markers is available, yet their validity often remains undetermined. A network-based molecular model reflecting donor kidney status based on transcriptomics data and molecular features reported in scientific literature to be associated with chronic allograft nephropathy was created. Significantly enriched biological processes were identified and representative markers were selected. An independent kidney pre-implantation transcriptomics dataset of 76 organs was used to predict estimated glomerular filtration rate (eGFR) values twelve months after transplantation using available clinical data and marker expression values. The best-performing regression model solely based on the clinical parameters donor age, donor gender, and recipient gender explained 17% of variance in post-transplant eGFR values. The five molecular markers EGF, CD2BP2, RALBP1, SF3B1, and DDX19B representing key molecular processes of the constructed renal donor organ status molecular model in addition to the clinical parameters significantly improved model performance (p-value = 0.0007) explaining around 33% of the variability of eGFR values twelve months after transplantation. Collectively, molecular markers reflecting donor organ status significantly add to prediction of post-transplant renal function when added to the clinical parameters donor age and gender.

studied the proteomic signature of the preservation fluid to derive biomarkers to predict immediate postoperative transplant function 6 . Another intuitively superior way is to use information obtained from pre-implantation biopsies. Next to unresolved technical issues (wedge vs. needle biopsy, frozen section vs. paraffin embedded tissue) it has recently been shown that conventional histological workup does provide reliable information in some 7 but not all situations 8 . Researchers have started to look into molecular signatures in the biopsy tissue that predict immediate, but also mid-to long term post-transplant renal function [9][10][11][12][13][14][15][16] . Some of these studies are hypothesis driven. Donor age for example is an often used (albeit less than perfect) surrogate for organ quality. Koppelstätter and colleagues extended this concept by showing that markers of biological age in the biopsy (telomere length and expression of cell cycle inhibitors) better than chronological donor age predict post-transplant renal function as assessed by serum creatinine one year after transplantation 17 . Günther and colleagues very recently could show that the activating cytotoxicity receptor NKG2D was associated with chronological age and was indicative for post-transplant outcome 18 . However age is only one factor in a likely complex molecular interaction phenotype that determines organ quality in a way that it affects transplant outcome. Hypothesis driven approaches looking into single markers or pathways therefore are most likely inappropriate, a problem that could be solved by studying marker combinations or panels. However, as long as the molecular pathophysiology of chronic allograft failure is not completely understood unbiased approaches like whole-genome expression data sets obtained from pre-implantation biopsies are an important data source. Strategies on the identification of biomarkers making use of computational modelling and data integration approaches seem reasonable in this context and were elegantly summarized by Wang and Sarwal 19 . Recently O'Connel and colleagues published a study on the use of transcriptomic profiling using tissue obtained from protocol biopsies taken three months after transplantation to predict long term outcome 20 .
In this study we used a data integration approach to build a molecular model reflecting pre-implantation donor organ status. This model was based on features associated with mid-to long term allograft function from published hypothesis driven research articles and from transcriptomics data sets. Bioinformatics network analysis reduced the number of genes by focusing on those, which are more likely important features in interaction networks. A major limitation of published data on biomarkers in this context is that they are usually not validated in external cohorts. We therefore finally tested our in-silico derived markers in an independent group of renal allograft recipients, in whom gene profiling data were available from pre-implantation biopsies. Figure 1 outlines the data analysis workflow along with key results on identified molecular markers and biological processes.

Results
The donor organ molecular model. A set of 548 molecular features associated with renal function 12 months after transplantation could be derived based on molecular feature sets from LIT-CAN, TX-PERCO, TX-KAINZ, and TX-SCIAN. A brief description of datasets used in the analysis is listed in Table 1.
LIT-CAN consisted of eight genes, the transcriptomics datasets TX-PERCO, TX-KAINZ, and TX-SCIAN held 260, 46, and 259 molecular features respectively. 391 of the molecular features shared at least one protein interaction to another member of the combined set thus forming the induced subgraph with omicsNET as the underlying biological network. Eleven clusters of highly interconnected proteins were identified by the MCODE algorithm with 89 proteins assigned to these eleven clusters. A set of 34 GO biological processes could be identified as being enriched based on the set of 89 proteins being part of the donor organ status molecular model. The top-ranked GO biological processes were the neurotrophin TRK receptor signaling pathway (FDR < 0.001), the Fc-epsilon receptor signaling pathway (FDR < 0.001), the epidermal growth factor receptor signaling pathway (FDR < 0.001), as well as the fibroblast growth factor receptor signaling pathway (FDR = 0.0027).
Predicting post-transplant renal function. Average donor age in our cohort was 54 years ranging from 18 to 86. Donor age was significantly correlated to recipient age (R = 0.47, p-value < 0.0001) (see Fig. 2). The ratio of women to men in our dataset was around 1:1.5 for donors and around 1:1.7 for recipients. Average eGFR value 12 months after transplantation was 47.23 ml/min/1.73 m 2 ranging from 6.38 ml/min/1.73 m 2 to 100.74 ml/ min/1.73 m 2 .
The best performing linear model solely holding clinical parameters consisted of donor age, donor gender, and recipient gender and explained about 17% of variance of recipient eGFR values twelve months after transplantation (p-value = 0.0010).
The best performing linear model holding both clinical parameters and molecular markers consisted of donor age and gender along with the expression levels of the markers EGF (epidermal growth factor), CD2BP2 (CD2 cytoplasmic tail binding protein 2), RALBP1 (ralA binding protein 1), SF3B1 (splicing factor 3b subunit 1), and DDX19B (DEAD-box helicase 19B). This set of parameters explained around 33% of post-transplant eGFR values (p-value < 0.0001) thus significantly outperforming the statistical model only holding clinical parameters

TX-PERCO
Transcriptomics study on renal zero-hour biopsies reporting differentially regulated genes associated with histopathological characteristics of the donor organ.
Molecular features linked to mediumterm post-transplant outcome after reanalysis were used as input for generating the donor organ status molecular model. 9 TX-KAINZ Transcriptomics dataset reporting on differentially expressed genes between a group with high (>=45 ml/min/1.73 m 2 ) and low (<45 ml/min/1.73 m 2 ) eGFR group 12 months after renal transplantation.
Molecular features were used as input for generating the donor organ status molecular model. 15
Molecular features were used as input for generating the donor organ status molecular model. 16 omicsNET Hybrid protein interaction network holding protein-protein interactions together with computationally inferred relations.
This biological network was used in order to generate the donor organ status molecular model. 37 DAVID v6.8 GO biological process set Set of gene to GO biological process relations as stored in DAVID v6.8.
GO biological process set was used for enrichment analysis in order to identify affected biological processes based on the set of molecular features in the donor organ status molecular model. 40

TX-IBK
Set of 76 gene expression profiles of renal pre-implantation biopsies with clinical data of donor as well as transplant recipient at baseline and during follow up.
Expression data and clinical data were used for building multiple linear regression models in order to predict post-transplant kidney function. Table 1. Listing of datasets used. Overview and brief description of used datasets within the present study. The specific use of the dataset is given along with the links to original publications.

GSE95026
following analysis of variance (ANOVA) (p-value = 0.0007). Detailed model characteristics of the statistical models are given in Table 3. Significant correlations between predictor variables were identified for the molecular markers CD2BP2 and DDX19B (R = 0.48, p-value < 0.0001), SF3B1 and RALBP1 (R = 0.37, p-value = 0.0011), as well as RALBP1 and DDX19B (R = −0.5, p-value < 0.0001). As the maximum absolute Pearson correlation was not above 0.5, inclusion of selected molecular markers into one regression model was justifiable. The full correlation matrix of continuous variables is given in Fig. 2.
Whereas the add-on information on rejection episodes did not improve model performance, knowledge on the occurrence of delayed graft function significantly improved both models. The adjusted R 2 value of the model only holding clinical parameters increased from 0.17 to 0.28 (ANOVA p-value < 0.0001) and the adjusted R 2 value of the best performing model based on molecular markers further increased from 0.33 to 0.42 (ANOVA p-value < 0.0001). No significant association of the five markers with either delayed graft function nor rejection episodes could be found in our dataset.

Discussion
In this study we used a systematic data integration approach to develop a molecular model of highly interconnected proteins being associated with renal function as assessed by eGFR 12 months after transplantation. Our in-silico approach reduced the number of genes of interest and in addition allowed to focus on features with a high interaction potential thus more likely reflecting core pathophysiological processes. As the genes reported in the literature and used in our modelling approach had not been validated in independent datasets we performed an additional study with this focus. Our final prediction model, holding the five selected molecular markers EGF, CD2BP2, RALBP1, SF3B1, and DDX19B on top of two clinical parameters significantly outperformed a model only based on clinical parameters. No significant correlations between molecular markers and donor age could be detected indicating that molecular markers hold complementary information to donor age. Information on delayed graft function significantly added to predictive power whereas information on rejection episodes did not further add in explaining variability of eGFR values one year after transplantation. Rejection episodes have been reported to be associated with renal outcome in the past. This is however mostly confined to rejection episodes not responsive to steroid bolus therapy, which were not observed in our cohort. The fact that rejections were rare (n = 9) and usually mild may very well be due to the fact, that no highly immunized subjects were included and subjects suffering irreversible rejection leading to graft loss within the follow up period were excluded.
Higher EGF levels were associated with higher post-transplant eGFR levels and thus with better renal function in our dataset. This is in-line with recent findings that EGF was downregulated in progressive DN 21 . The authors could further demonstrate that intrarenal EGF values are highly correlated to urinary EGF concentrations. A positive regulatory effect of EGF in progressive kidney disease has also been previously discussed by Rudnicki and colleagues 22 . This finding could be confirmed by a study from Betz and colleagues investigating EGF levels in a diabetic nephropathy animal model as well as in a cohort of normoalbuminuric type 2 diabetes patients 23 . EGF values were also found to be downregulated in donor kidney biopsies showing reduced eGFR values after transplantation in the TX-SCIAN dataset that entered our molecular model generation workflow 16 . Dosanjh and colleagues showed that members of both the hepatocyte growth factor (HGF) as well as the EGF signaling pathway are differentially expressed in renal allografts with chronic allograft nephropathy and interstitial fibrosis as well as tubular atrophy 24 .  O'Connel et al. collected biopsies from allograft recipients with stable renal function three months after transplantation and identified a set of 13 genes that were predictive of the development of fibrosis at one year. Although this final set did not contain EGF the latter was found to be negatively associated with the chronic allograft damage index three months after transplantation, which is in line with our findings 20 .
Surprisingly and contradictory to all other studies Rintala and colleagues demonstrated that inhibition of the EGF signalling cascade with the EGFR inhibitor erlotinib prevented chronic rejection and led to improved allograft function as their reasoning had been that EGF in part mediated chronic allograft nephropathy 25 .
As with EGF, also higher levels of CD2BP2 were associated with higher post-transplant eGFR levels indicating that reduced expression of CD2BP2 is detrimental to renal function. CD2BP2 was initially described as factor involved in T cell activation and enhancer of interleukin 2 expression 26 . CD2BP2 was however also very recently shown to be an important factor for podocyte function. Depletion of CD2BP2 in podocytes led to proteinuria and ultimately kidney function decline in a mouse model as reported by Albert and colleagues 27 .
Higher expression of RALBP1, also known as RIP1 or receptor-interacting protein 1, in our dataset as indicated by the negative parameter estimate in the statistical model were associated with lower eGFR values post-transplant and thus reduced kidney function. RALBP1 was reported as inducer of necroptosis, a novel form of cell death, which was also linked to myocardial, renal, and cerebral ischemia-reperfusion injury 28 . Enhanced expression of RALBP1 was also found to enhance cisplatin-induced nephrotoxic acute kidney injury in an in-vitro model by inducing necroptosis of cultured tubular cells 29 . Higher RALBP1 expression levels were also detected in an animal model of cisplatin-induced nephrotoxicity with inhibitors of necroptosis providing protection from kidney injury in this mouse model 30 .
SF3B1 encodes for a subunit of the splicing factor 3b,being involved in RNA splicing and gene expression, which was previously reported to be detectable in exosomes of transformed Madin-Darby canine kidney cells with the potential of inducing epithelial to mesenchymal transition 31 . SF3B1 was also reported as a direct target of the hypoxia inducible factor 1 alpha (HIF1A) containing a hypoxia response element in the promoter region when investigated in an animal model of cardiac hypertrophy 32 . No information on involvement in pathophysiological renal processes could be identified in scientific literature for DDX19B being involved in mRNA transport from the nucleus. This molecule might serve as sensor of deregulated transcriptional activity in deceased donor organs after brain death. Of note that DDX19B was the least significant parameter in the constructed regression model with a p-value of 0.0449. Excluding DDX19B from the model led to a drop of the adjusted R 2 from 0.33 to   Most transcriptomics studies dealing with expression profiles and outcome in the renal donor setting focused on short term outcome such as delayed graft function [12][13][14] , while others dealt with expression patterns in the recipient after transplantation 20,33,34 . We identified three studies reporting on transcriptional changes in the donor organ being associated with long term transplant outcome 9,15,16 . Based on these transcripts complemented by genes reported in scientific literature we constructed a network-based molecular model or organ damage and validated a set of five markers in an independent dataset in the present study. Especially the mechanistic roles of EGF, CD2BP2, and RALBP1 in kidney tissue might hold the potential to also serve as targets for therapeutic intervention.
As we were primarily interested in prediction models holding parameters known at the time of transplantation, we deliberately did not include post-operative parameters in the first place. We however evaluated the impact of the two post-operative parameters rejection episodes as well as delayed graft function on performance of our generated prediction models. Delayed graft function is a known predictor of long term graft function which we also observed in our cohort 35 . Delayed graft function indeed improved model performance significantly indicating that delayed graft function holds information being complementary to the information of our selected markers. Rejection episodes interestingly did not add to prediction in our dataset with nine patients having biopsy-proven rejections in our cohort.
A limitation of our study is that no functional experiments were conducted thus further exploring the mechanistic role of the validated markers in renal tissue damage which however was not scope of this work.
In summary renal transplant donor organ expression of the five molecular markers EGF, CD2BP2, RALBP1, SF3B1, and DDX19B significantly adds in predicting post-transplant renal function when added to the clinical parameters donor age and gender.

Donor organ status molecular model construction. Transcriptomics studies were identified in
Pubmed using the following query: kidney transplantation [

mh] AND (microarray analysis [mh] OR gene expression profiling [mh] OR High-Throughput RNA Sequencing [mh]) NOT review [publication type].
Titles and abstracts of 242 publications were screened for gene expression studies reporting on human deceased donor kidney biopsy workup with a focus on investigating correlations of gene expression to renal function 12 months after transplantation (either based on serum creatinine or eGFR). Three studies were identified as relevant and molecular features were extracted from these publications. These three studies were the only ones using an unbiased gene expression profiling approach in donor kidney biopsies and were thus considered for further analysis in the current study. From the studies by Kainz et al. 15 [TX-KAINZ] as well as Scian et al. 16 [TX-SCIAN] the reported sets of transcripts were used as input for molecular model generation. The expression dataset published by Perco and colleagues [TX-PERCO] was re-analysed with respect to correlation of transcripts to twelve months post-transplant eGFR values 9 . P-values of the correlation analysis were corrected with the R package fdrtool, setting the false discovery rate to <5% 36 . All transcripts were mapped to their respective genes using the Ensembl Gene ID as common denominator. All datasets used in the integrative analysis of this work are listed in Table 1. The set of transcripts from the omics studies was complemented by a set of molecular features reported in scientific literature described in the donor organ and having impact on medium-term graft function [LIT-CAN]. The Pubmed query "chronic allograft nephropathy" OR "chronic allograft dysfunction" AND humans NOT review [publication type] resulted in 1075 publications. Publications mentioning at least one gene (based on gene2pubmed, GeneRIFs, or official gene name) in title or abstract were manually reviewed in order to extract deregulated genes in the donor organ having impact on post-transplant renal function. Next, all molecular features of donor organ status affecting outcome identified above were mapped on a hybrid interaction network including protein-protein interaction data from IntAct, BioGrid, and Reactome together with computationally inferred relations 37 . The organ status-transplant outcome specific molecular interaction subgraph was extracted from the network discarding molecular features not being connected to at least one other member from the signature. This subgraph was forwarded to the Molecular Complex Detection algorithm for identifying clusters of nodes, in the following denoted as molecular processes 38,39 . These processes are characterised by an accumulation of genes with a high level of interaction and the Database for Annotation, Visualization and Integrated Discovery (DAVID) v6.8 was used to identify significantly enriched Gene Ontology (GO) biological process terms in these processes reflecting donor organ status 40 . This in silico workflow, which was developed within the European Framework 7 project SYSKID [41][42][43] , reduced the number of genes derived from the literature from 548 to 89. Additionally the genes remaining for validation are part of molecular processes characterized by a high level of interaction and thus most likely hold information on relevant pathophysiology. Independent gene expression dataset. Wedge biopsies from 78 deceased donor kidneys were obtained prior to reperfusion. Biopsy workup, RNA extraction and hybridization to Agilent Human Gene Expression 4x44 oligonucleotide microarrays was performed as described in a previous publication 44 . The study was approved by the Ethics Board of the Medical University Innsbruck. Details on the population are provided in Table 2.
The Agi4x44PreProcess R module was used for gene expression analysis. For 27 of the 78 samples, the Agilent chip version 2 was used. Raw gene expression values obtained from the two chip versions were combined into a joined dataset using shared probe identifiers. Quantile-quantile normalization was used to normalize raw mean intensity values. Summarization of probes was successively performed on the level of the human Gene Symbol. Two samples had to be excluded from further analysis based on hierarchical cluster analysis resulting in an analysis dataset of 76 samples. Gene expression data were available for 74 of the 89 molecular features in the donor organ status molecular model described above thus qualifying for inclusion in successive multiple regression analysis.
Multiple linear regression analysis. Linear regression models were derived using all clinical parameters listed in Table 2 in order to predict post-transplant renal function given as eGFR values twelve months after transplantation. The MDRD (Modification of Diet in Renal Disease) equation was used in order to estimate GFR values. The best performing model was identified considering adjusted R 2 values, leave-one-out cross validated (LOOCV) R 2 values and LOOCV root-mean-square-error values.
The clinical parameters together with all molecular features of the donor organ status molecular model were used as input for delineating improved multiple regression models for predicting eGFR values twelve months after transplantation. Simulated annealing was used to identify models holding between six and eight parameters minimizing LOOCV root-mean-square error. Individual non-significant parameters were removed in a post-processing step in order to achieve statistical significance for all single parameters in the final model. The best performing regression model was compared against the regression model only holding clinical parameters using ANOVA.
Pairwise Pearson correlation coefficients were determined between molecular markers of the best performing regression model and all continuous clinical parameters listed in Table 2, namely donor and recipient age, last donor creatinine, and cold ischemia time. Bonferroni correction for multiple testing was used in order to identify significant correlations between parameters.
The primary aim of our study was to derive prediction models based solely on data available at the time of transplantation. In addition we were interested to decipher if postoperative complications like delayed graft function or acute rejection episodes have an impact on identified associations. We therefore analysed the clinical course of the study population with regard to the need for postoperative dialysis as an indication for delayed graft function (categorized as mild with 1-3 dialysis sessions or severe with >3 dialysis sessions performed) or the incidence of biopsy-proven acute rejections (regardless of Banff categorization). We first tested whether the molecular marker panel was capable of predicting these postoperative events and successively also determined the add-on value of postoperative parameters on predicting eGFR values one year after transplantation.

Ethics.
The study protocol and the use of surplus zero hour biopsy tissue after routine workup was approved by the local IRB. As no living donors were included no informed consent from the organ donors could be obtained.

Data availability. Raw gene expression data used in this study are available at NCBI Gene Expression
Omnibus with the accession number GSE95026.