Observer variability in the assessment of renal 18F-FDG uptake in kidney transplant recipients

18F-FDG PET/CT imaging may help non-invasively rule out acute kidney allograft rejection (AR) in kidney transplant recipients (KTR). The present study evaluates the repeatability and reproducibility of the quantification of renal 18F-FDG uptake in KTR. We prospectively performed 18F-FDG PET/CT in 95 adult KTR who underwent surveillance transplant biopsy between 3 and 6 months post transplantation. Images were obtained 180 minutes after injecting 3 MBq 18F-FDG per kg body weight. Mean standardized uptake value (SUVmean) of the kidney cortex was independently measured by 2 experienced observers in 4 volumes of interest (VOI) distributed in the upper (n = 2) and lower (n = 2) poles. The first observer repeated the SUV assessment in the uppermost VOI, blinded to the initial results. Intra-class correlation coefficients (ICC) were calculated and Bland-Altman plots were drawn. The ICC for the intra-observer measurements was 0.96 (95% CI [0.94; 0.97]). The ICC for inter-observer reproducibility for each VOI was 0.87 [0.81–0.91], 0.87 [0.81–0.91], 0.85 [0.78–0.89] and 0.83 [0.76–0.88] from the upper to the lower renal pole, respectively. The repeatability and reproducibility of the quantification of kidney allograft 18F-FDG uptake are both consistent, which makes it transferable to clinical routine.

Kidney transplantation represents the treatment of choice for patients with end-stage renal disease 1. Despite the steady progress of immunosuppressive treatments, acute rejection (AR) remains a recurrent complication which impacts both graft and patient survival 2,3. Furthermore, systematic studies focusing on the clinical value of protocol biopsies (by definition performed in stable kidney transplant recipients (KTR)) have demonstrated a non-negligible prevalence of subclinical AR [4][5][6][7][8]. Subclinical AR corresponds to "the histological documentation of unexpected evidence of AR in a stable patient" 9. Early management of AR decreases the risk of chronic cellular/humoral rejection and late AR episodes, and improves long-term graft survival in KTR 10. Therefore, early detection of (subclinical) AR is essential.
In current clinical practice, transplant needle biopsy (TNB) using the Banff classification is the gold standard for AR diagnosis 11. Still, it is associated with a substantial risk of complications, such as hemorrhage or infection 12. Thus, non-invasive approaches have been developed over the past decades in order to help clinicians avoid potential side effects of TNB [13][14][15]. In particular, promising preclinical and clinical observations have been reported on the role of 18F-fluorodeoxyglucose (18F-FDG) positron-emission tomography coupled with computed tomography (PET/CT) in kidney allograft AR, in both diagnosis and therapeutic monitoring [16][17][18]. One may speculate that the AR-induced recruitment of activated leukocytes, which have high metabolic activity, increases the uptake of 18F-FDG in the renal graft cortex 19.
In addition to the visual assessment, PET/CT allows a semi-quantitative analysis of the images, which is expressed as the standardized uptake value (SUV). SUV represents the decay-corrected concentration of intravenously injected 18F-FDG in a volume of interest (VOI). Using this approach, we previously demonstrated a significant link between cortical renal graft SUVmean and Banff score, with sensitivity and specificity in diagnosing AR of 100% and 50%, respectively, using a threshold of 1.6 16. However, many well-known factors may affect the accuracy of SUV measurement, including patient weight, blood glucose level, time between the injection of 18F-FDG and image acquisition, partial-volume effect, and recovery coefficient. Additionally, VOI delineation, which is the sole parameter dependent on the physician, may bias the quantification of renal 18F-FDG uptake 20. The segmentation of the kidney transplant is especially important in order to avoid VOI contamination by the physiological activity linked to the urinary excretion of the radiotracer. The purpose of this study is therefore to evaluate the intra- and inter-observer variability in the assessment of renal 18F-FDG uptake in KTR.
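As an illustration of how a sensitivity/specificity pair such as the one quoted above is obtained from an SUVmean cut-off, the following minimal sketch may help. It assumes that a graft is called positive when its SUVmean is at or above the threshold (the direction of the cut-off is an assumption here), and the function name and all values in the usage example are hypothetical, not study data.

```python
def sensitivity_specificity(suv_means, has_rejection, threshold=1.6):
    """Sensitivity and specificity of an SUVmean cut-off against biopsy.

    Assumption: a graft is test-positive when SUVmean >= threshold.
    `has_rejection` holds the biopsy result (True = acute rejection).
    """
    tp = fn = tn = fp = 0
    for suv, rejected in zip(suv_means, has_rejection):
        positive = suv >= threshold
        if rejected and positive:
            tp += 1
        elif rejected:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one rejection and one non-rejection case")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical values for illustration only (not data from the study).
suv = [1.4, 1.7, 2.0, 1.9, 1.5, 1.8]
biopsy = [False, False, True, True, False, False]
sens, spec = sensitivity_specificity(suv, biopsy)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

A low cut-off of this kind favours sensitivity over specificity, which is consistent with using the test to rule out rather than confirm AR.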

Material and Methods
Patients. From November 2015 to January 2018, we prospectively performed 18F-FDG PET/CT imaging in KTR undergoing a surveillance transplant biopsy between 3 and 6 months post transplantation. Patients who had a delayed protocol biopsy (>6 months), were under 18 years of age, had undergone transplantectomy, or were pregnant or breastfeeding were excluded. Estimated glomerular filtration rate (eGFR) was calculated using the Modification of Diet in Renal Disease (MDRD) equation. The study was approved by the institutional review board of the University of Liege.

18F-FDG PET/CT. PET/CT was performed using cross-calibrated Philips GEMINI TF Big Bore or TF 16 PET/CT systems (Philips Medical Systems, Cleveland, OH). Low-dose helical CT (5-mm slice thickness, 120-kV tube voltage, and 40-mAs tube current-time product) was followed by PET emission scanning with 2 bed positions, each lasting 240 seconds. Image reconstruction involved iterative list mode time-of-flight algorithms. Corrections for attenuation, dead time, random, and scatter events were applied.
Mean standardized uptake value (SUVmean) of the kidney cortex was measured by 2 observers (board-certified physicians in nuclear medicine with 9 and 5 years of experience in 18F-FDG PET/CT imaging) in 4 VOI of 1 mL distributed in the upper (n = 2) and lower (n = 2) poles, at a distance from the pelvicalyceal zone. There was no a priori minimal distance from the urinary pelvis at which the VOI had to be drawn. One VOI of 20 mL was drawn in the psoas muscle. Observer 1 repeated the SUV assessment in the uppermost VOI, blinded to the initial results. SUVmean of each VOI was calculated with the following formula:
On average, it takes ~5 minutes to measure the SUVmean of the renal cortex and the psoas muscle per patient.
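For reference, the conventional body-weight-normalized definition of SUV can be written as below. This is the standard textbook form and is assumed, not confirmed, to match the formula used for the measurements described above; the symbols C_VOI, A_inj, W and N are introduced here for illustration only.

```latex
% Assumed standard body-weight-normalized SUV (symbols are illustrative):
%   C_VOI : decay-corrected activity concentration in the VOI (kBq/mL)
%   A_inj : injected activity (kBq)
%   W     : body weight (g)
\[
  \mathrm{SUV} = \frac{C_{\mathrm{VOI}}}{A_{\mathrm{inj}} / W},
  \qquad
  \mathrm{SUV}_{\mathrm{mean}} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{SUV}_i
\]
% N is the number of voxels in the VOI and SUV_i the SUV of voxel i.
```

With tissue density taken as 1 g/mL the ratio is dimensionless, and a value of 1 corresponds to a uniform distribution of the tracer throughout the body.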

Statistics.
To measure the agreement between the results (intra- and inter-observer variability), the following statistical methods were used: repeated measures ANOVA, Bland-Altman plots, and intra-class correlation coefficient (ICC). ICC is a measure of the agreement between two methods when the studied variable is continuous. The closer the ICC is to 1, the better the agreement between the two measurements. Results were considered significant at the 5% level (p < 0.05).

Ethical approval and consent to participate. All procedures were performed in accordance with the principles of the 1964 Declaration of Helsinki and its later amendments or comparable ethical standards. The study design and exemption from informed consent were approved by the Institutional Review Board of Liege University Hospital.
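The ICC named in the Statistics paragraph can be sketched as follows. The form of ICC used in the study is not stated, so this example assumes the two-way random-effects, absolute-agreement, single-measurement coefficient, ICC(2,1) in the Shrout and Fleiss notation; the function name and the SUVmean values in the usage example are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is a list of rows, one row per subject (or VOI) and one
    column per observer (or repeated reading). Assumption: this ICC form
    is chosen for illustration; the source does not name the one it used.
    """
    n = len(ratings)          # number of subjects
    k = len(ratings[0])       # number of observers
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete table with >= 2 rows and >= 2 columns")

    grand = sum(v for row in ratings for v in row) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares (one observation per cell).
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                # between-subject mean square
    ms_cols = ss_cols / (k - 1)                # between-observer mean square
    ms_error = ss_error / ((n - 1) * (k - 1))  # residual mean square

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical SUVmean readings (observer 1, observer 2), not study data.
readings = [[1.52, 1.48], [1.90, 1.95], [1.33, 1.40], [2.10, 2.02], [1.75, 1.70]]
print(f"ICC(2,1) = {icc_2_1(readings):.2f}")
```

Because this form measures absolute agreement, a systematic offset between the two observers lowers the coefficient, which a consistency-type ICC would not penalize.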

Results
Ninety-five adult KTR underwent one PET/CT between November 2015 and January 2018, within 3 to 6 months following transplantation. The mean age of the cohort was 54 ± 13 years (range: 19-73 years), with a male to female ratio of 2.4. Characteristics of the cohort are summarized in Table 1 (Fig. 2). Concerning repeatability, the agreement was calculated for the intra-observer measurements (ICC: 0.96) (Fig. 3), with a mean difference of −0.04 [−0.07; −0.01] in the Bland-Altman plot (Fig. 4). The same statistics were computed for SUVmax and showed lower values for both the ICC and Bland-Altman analyses (Annex 1).
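The Bland-Altman quantities reported above (a mean difference between paired readings) can be computed along the following lines. This is a minimal sketch: the 95% limits of agreement are taken as the bias ± 1.96 standard deviations of the differences, which is the usual convention and an assumption about the analysis here; the function name and the example values are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(first, second):
    """Bias and 95% limits of agreement for paired measurements.

    Assumption: limits of agreement = bias +/- 1.96 * SD of the paired
    differences (sample SD), the usual Bland-Altman convention.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long series with >= 2 pairs")
    diffs = [a - b for a, b in zip(first, second)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return {
        "bias": bias,
        "sd": sd,
        "lower_loa": bias - 1.96 * sd,
        "upper_loa": bias + 1.96 * sd,
    }


# Hypothetical first and repeated SUVmean readings, not study data.
reading_1 = [1.52, 1.90, 1.33, 2.10, 1.75]
reading_2 = [1.55, 1.93, 1.40, 2.08, 1.80]
result = bland_altman(reading_1, reading_2)
print(f"bias = {result['bias']:.3f}, "
      f"LoA = [{result['lower_loa']:.3f}; {result['upper_loa']:.3f}]")
```

The plot itself shows each difference against the mean of its pair, with horizontal lines at the bias and at the two limits of agreement.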
Finally, no significant relationship was found between MDRD-based stages of chronic kidney disease (CKD) and the ratio between the mean kidney SUVmean and the SUVmean of the psoas muscle (p = 0.24) (Fig. 5).

Discussion
18F-FDG PET/CT functional imaging is a promising tool in the assessment of AR-associated inflammation 16,21,22. This rapid imaging technique does not cause any side effects in patients across the spectrum of renal function, from normal or mildly reduced GFR to end-stage renal disease. Furthermore, we have recently shown that 18F-FDG PET/CT may help rule out subclinical rejection in stable KTR, with a negative predictive value of 98% 23. Our results support the conclusions of previous studies which showed that the uptake of 18F-FDG by kidneys in non-transplant patients and in KTR is not influenced by renal function 24,25, and that the alteration of GFR does not significantly compromise the clearance of background activity 26.
However, the physiological urinary excretion of 18F-FDG may hamper the measurement of 18F-FDG uptake in the renal parenchyma 27. To overcome this problem, we performed late acquisitions and drew multiple VOI. Although SUVmax has shown lower inter-observer variability in tumors 28,29, we elected to use SUVmean in order to limit the impact of potential urinary contamination in the VOI. With this approach, we observed consistent agreement between the two observers for all VOI. The agreement was best in the upper pole (ICC: 0.87). No previous data are available in the literature, since 18F-FDG PET/CT imaging has only very recently been tested in AR diagnosis in KTR 16. However, Huang et al. 29 and Büyükdereli et al. 30 have demonstrated a high inter-observer correlation of SUVmean in the evaluation of 43 pulmonary nodules (ICC: 0.97) and 97 lung lesions (ICC: 0.98), respectively. Benz et al. 28 and Goh et al. 31 also demonstrated strong reproducibility of SUVmean assessment in treatment monitoring by 18F-FDG [...]
Despite the absence of a significant difference between the ICC of the mean of SUVmean and the ICC of the SUVmean of each pole, the superior pole SUVmean of observer 2 was significantly lower than the 3 other values. Furthermore, even if the delineation of a single VOI is easier in clinical routine, we must keep in mind that histopathological changes in the kidney allograft may not be homogeneous. Sorof et al. 32 highlighted discordances in histological grading in 30% and therapeutic defects in 7.5% of cases when only one sample was analyzed. Automatic kidney component segmentation methods are currently under development and validation. Such an approach may help improve PET/CT-based diagnosis of kidney allograft rejection 33. As a whole, we currently recommend drawing multiple VOI within the renal parenchyma rather than only one.
Although simplicity and ease of use are among the strengths of SUV, the measurement is nevertheless vulnerable to many sources of unwanted variability 20. Despite careful attention to protocol during the acquisition, there is still a within-subject coefficient of variation in SUVmean, reaching 10% for tumours (4.8 to 17.7% depending on the study) 34 and 10-15% for normal tissues 35. This variability could be problematic for "borderline" cases with SUV close to the cutoff values.

Conclusions
This study shows that assessment of renal 18F-FDG uptake in KTR is highly repeatable and reproducible if 18F-FDG PET/CT images are evaluated by experienced observers with careful attention to the technique.

Data availability
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.