Deep learning body-composition analysis of clinically acquired CT-scans estimates creatinine excretion with high accuracy in patients and healthy individuals

Assessment of daily creatinine production and excretion plays a crucial role in the estimation of renal function. Creatinine excretion is estimated by creatinine excretion equations and implicitly in eGFR equations like MDRD and CKD-EPI. These equations are however unreliable in patients with aberrant body composition. In this study we developed and validated equations estimating creatinine production using deep learning body-composition analysis of clinically acquired CT-scans. We retrospectively included patients in our center that received any CT-scan including the abdomen and had a 24-h urine collection within 2 weeks of the scan (n = 636). To validate the equations in healthy individuals, we included a kidney donor dataset (n = 287). We used a deep learning algorithm to segment muscle and fat at the 3rd lumbar vertebra, calculate surface areas and extract radiomics parameters. Two equations for CT-based estimate of RenAl FuncTion (CRAFT 1 including CT parameters, age, weight, and stature and CRAFT 2 excluding weight and stature) were developed and compared to the Cockcroft-Gault and the Ix equations. CRAFT1 and CRAFT 2 were both unbiased (MPE = 0.18 and 0.16 mmol/day, respectively) and accurate (RMSE = 2.68 and 2.78 mmol/day, respectively) in the patient dataset and were more accurate than the Ix (RMSE = 3.46 mmol/day) and Cockcroft-Gault equation (RMSE = 3.52 mmol/day). In healthy kidney donors, CRAFT 1 and CRAFT 2 remained unbiased (MPE = − 0.71 and − 0.73 mmol/day respectively) and accurate (RMSE = 1.86 and 1.97 mmol/day, respectively). Deep learning-based extraction of body-composition parameters from abdominal CT-scans can be used to reliably estimate creatinine production in both patients as well as healthy individuals. The presented algorithm can improve the estimation of renal function in patients who have recently had a CT scan. The proposed methods provide an improved estimation of renal function that is fully automatic and can be readily implemented in routine clinical practice.

www.nature.com/scientificreports/ CKD-EPI, and Ix equation, the latter developed as a creatinine clearance equation based on the original CKD-EPI cohorts [6][7][8] . These equations were derived in datasets where surrogates of body composition (i.e. age, sex, weight, and race) were used to estimate either the measured glomerular filtration rate, or the measured creatinine clearance. However, these equations may be unreliable in patients with aberrant body composition 9 . Creatinine is a breakdown product predominantly formed in the muscles, which distributes in total body water and is excreted by the kidneys [10][11][12] . Conventional equations that estimate creatinine excretion are constructed using surrogate markers of muscle mass (e.g. age, gender, weight, and race). However, these equations have mostly been developed and validated in patients visiting the outpatient clinic and outliers such as patients with aberrant muscle mass were excluded 6,8 . In patients with aberrant muscle mass, the ability of surrogate markers to correctly estimate creatinine excretion is severely hampered and a direct measurement of body composition may be more accurate.
Clinically acquired CT-scans can be used to acquire body composition estimates such as muscle and fat volume 13 . The cross-sectional muscle area at the L3 level has a strong association with total body muscle volume 14 . Deep learning algorithms have been developed that automatically select L3 and segment the muscles, which provides the opportunity to incorporate these measurements in routine clinical care and large scientific studies 13 . These algorithms have shown to provide reliable automated segmentations of body composition at the L3 level compared to manual segmentation by experts 15,16 . Furthermore, these algorithms can extract radiomics parameters which provide additional parameters of body composition based on the radiation attenuation (i.e. distribution of Hounsfield units) that cannot easily be quantified in a reproducible way by the human eye 17 . Since abdominal CT-scans are increasingly performed in routine care, this provides the opportunity to automatically quantify body composition in large groups of patients without extra costs or burden to the patient.
The goal of this study was to create an equation that estimates creatinine production from automated image analysis of CT-scans that were made for other purposes, combined with traditional surrogate markers of body composition that are used in existing equations that estimate creatinine production such as weight and stature..

Methods
Patient population and data extraction. For the patient dataset, we included all patients that received a CT-scan of the abdomen and had a 24-h urine collection 2 weeks prior or after the CT-scan in the University Medical Center Utrecht (UMC Utrecht) between 2011 and 2019. Patients that were scanned more than once and had separate 24-h collections were considered as separate data entries if the scans were more than one week apart. In addition, we included a second dataset of kidney donors who received an abdominal CT-scan and multiple subsequent collections of 24-h urine as part of their routine screening between 2008 and 2020 in the UMC Utrecht. Kidney donors were also included if time between CT and 24-h urine creatinine excretion was more than two weeks, since they are likely to have stable muscle mass.
Pseudonymized CT-scans were extracted from the Research Image Archive of the UMC Utrecht. Patient and laboratory data were extracted from the Utrecht Patient Oriented Database (UPOD), an infrastructure of relational databases at out center 18 . Plasma and urine creatinine were measured by enzymatic colorimetric assay (Beckman Coulter, Brea, CA, USA). Clinical variables that were extracted from UPOD were age (years), weight (kg), stature (cm), and sex (male/female). Due to the retrospective nature of the study with collection of anonymized data, it was not possible to further specify reason for admittance.

Selection of 24-h urine creatinine measurements.
When plasma creatinine is not in steady-state, excretion of creatinine does not equal creatinine production due to build-up in the volume of distribution 19,20 . To correct for this, we calculated approximate increase or decrease of creatinine in the volume of distribution and excluded measurements exceeding 1.5 mmol/day, calculated by: where ΔTBCr is the daily change in total body creatinine, ΔPCr is the daily change in plasma creatinine, and VD is the volume of distribution.
As a measure of volume of distribution we used 60% of bodyweight, an estimate of total body water [10][11][12] . In addition, we excluded measurements with biologically implausible values (< 3 mmol and > 30 mmol/day). If a patient had multiple consecutive urine measurements, the mean of these values was used.
Feature extraction from CT-scans and calculation of radiomics parameters. We included any CT-scan covering the L3 segment, including any variation of intravenous contrast. CT examinations were obtained with multidetector scanners: Mx8000 IDT 16, Brilliance 64, iQon Spectral or Brilliance iCT; all from Philips Medical Systems, Cleveland, OH, USA. Exposure settings (range:  mAs and 80-120 kVp) were adjusted to patient size.
The deep learning algorithm (Quantib Body Composition version 0.2.1.) has been developed previously in separate datasets and is described previously and is available 13,21 . Briefly, the L3 slices were automatically identified using a convolutional neural network with a regression output. The network was trained to estimate, for every slice in the image, its proximity to the L3 vertebra. During automated processing, the slice estimated by the network as closest to the craniocaudal middle of L3 was labelled as the L3 slice. Second, at the estimated L3 level, automated segmentation into areas (visceral/subcutaneous fat, psoas/long spine/abdominal wall muscles) was performed using a dilated residual network trained using a cross-entropy loss function. Both networks were implemented using PyTorch1.6. To increase robustness of body composition measurements, image segmentation www.nature.com/scientificreports/ was performed for five slices of 5 mm around the L3 slice, i.e., two slices below and two slices above. Therefore, the resulting segmentation areas were averaged over a range 2.5 cm. The algorithm segmented muscle (psoas, long spine, and abdominal wall muscles) and fat compartments (subcutaneous and visceral), and further segmentation was performed using threshold values for Hounsfield Units (HU) for muscle (> − 15 HU) 22 . Furthermore, radiomics parameters were calculated from the HU distribution of all muscle and fat compartments. Radiomics parameters that were calculated and used for variable selection were: percentiles (10th, 25th, 75th, 90th), kurtosis, skewness, mean, median, standard deviation, minimum, and maximum. After segmentation by the algorithm, slices were manually checked for sufficient quality of the scan (e.g. no artefacts and/or cut-off muscle).
Handling of missing data, construction and validation of the equations. The patient dataset was split in a development and validation set (random split 60%/40%). Missing data was imputed separately in the development (weight 22%, stature 9%), validation (weight 19%, stature 7%), and kidney donor (weight < 1%, stature < 1%) set (m = 10) 23 . All clinical and radiomics variables were used as predictors and both weight and stature were imputed using predictive mean matching. We used the LASSO algorithm to perform the first variable selection in the development dataset 24 . The LASSO algorithm is specifically designed to handle datasets with large amount of predictors and has previously been used for variable selection in datasets containing radiomics variables 25,26 . Variables that were selected in more than half of the imputed sets were further selected using least squares regression with backwards selection and tenfold cross-validation 27 . The equation with the lowest cross-validated mean RMSE was selected. Model coefficients were calculated using least squares regression and averaged over the 10 imputed datasets. We selected two equations, one where all parameters were considered (CRAFT 1) and one where weight and stature were left out (CRAFT 2).
Performance was compared to two previously published formulas, the Cockcroft-Gault formula and the Ix 7,8 . The Ix formula was calculated without the variable race, since this is not routinely recorded for patients in the Netherlands. We did not perform sample size calculation prior to inclusion, since there is no validated method to perform this when using LASSO for feature selection. We evaluated the risk of introducing bias by overfitting by comparing the results of our equations in the development dataset versus the validation dataset, which were similar.

Statistics.
We assessed bias with the Mean Prediction Error (MPE), precision with the R 2 -value of least squares regression, and accuracy with the RMSE and the number of predictions that fell within 15% (p15) or 30% (p30) of the outcome. We computed confidence intervals using the combined variance of bootstrap (1000x) and multiple imputation. For p15 and p30, we used the normal approximation. Relative accuracy of equations was evaluated based on the 95% CI of the ΔRMSE with CRAFT 1 as reference 28 .
All calculations and statistical analyses were executed using R statistics 3.5.1.
Ethics. This study was performed according to the declaration of Helsinki and the ethical guidelines of our institution. The research protocol was approved by the institutional review board of the UMC Utrecht and they waived the need for informed consent (protocol number 20-247/C). The clinical and research activities being reported are consistent with the Principles of the Declaration of Istanbul as outlined in the 'Declaration of Istanbul on Organ Trafficking and Transplant Tourism' . Pseudonymized data were used for this study. Data collection and handling was conducted in accordance with European privacy legislation (GDPR).

Characteristics of patients and healthy kidney donors.
For the patient dataset, we included 702 abdominal CT-scans. We excluded 19 scans because of insufficient scan quality and 16 scans with segmentation errors (Supplemental Table 1, Supplemental Fig. 1). Finally, we excluded 31 scans that were performed within a week of each other in the same patient, therefore including 636 scans for development (n = 370) and validation (n = 266). Furthermore, we included 287 kidney donors who received a CT-scan and 2-3 consecutive 24-h urine collections in our center. In the kidney donor dataset, all scans were of sufficient quality, thus none were excluded (Table 1). Among the combined datasets (n = 923), 49% was female, mean age was 55 (SD ± 14) years, mean BMI was 26 (SD ± 4.5) and mean creatinine excretion was 12 (SD ± 4.1) mmol/day. Apart from sex, clinical characteristics did not differ significantly between the datasets. The intraindividual variability of urine collections in kidney donors, defined as the mean standard deviation of 2-3 consecutive collections calculated for individual donors, was 1.47 mmol/day. Figure 1 shows examples of the result of the segmentation in varying body compositions. Surface areas of the muscles had a strong association with creatinine excretion in the combined cohorts (Fig. 2) (R 2 = 0.48, 0.47 and 0.39 respectively). Traditional clinical variables (e.g. weight, age, sex) also associated with creatinine excretion, although the correlation was less strong (Fig. 2) (R 2 = 0.11, 0.25 and 0.24 respectively).

Development of equations using clinical and radiomics parameters.
We developed two equations, one where both clinical and radiomics variables were considered (CRAFT 1) and one where weight and stature were not used (CRAFT 2). After multiple imputation (Supplemental Fig. 2   Further selection with tenfold cross-validation and backwards selection on these variables resulted in 10 variables for CRAFT 1 and 11 variables for CRAFT 2 (Supplemental Fig. 3, Supplemental Table 2).

High accuracy of CT-based equations in patient validation dataset.
Both equations showed excellent performance in the patient validation dataset (  Table 3).

Low level of bias of CT-based based equations in healthy kidney donors.
To ensure that the new equations were also unbiased in healthy individuals, we validated the equations in a group of kidney donors ( Table 3, Fig. 4). Both CRAFT 1 and CRAFT 2 displayed slight negative bias compared to the patient validation cohort (− 0.71 mmol/day and − 0.73 mmol/day respectively), although precision was higher (R 2 = 0.73 and R 2 = 0.69, respectively). Furthermore, accuracy was better in both CRAFT 1 (RMSE = 1.86 mmol/day, p15 = 70%, p30 = 97%) and CRAFT 2 (RMSE = 1.97 mmol/day, p15 = 67%, p30 = 95% Accuracy of equations at different levels of creatinine excretion. Next, we compared the accuracy at different tertiles of creatinine excretion (Supplemental Tables 4, 5). CRAFT 1 was the most accurate across all tertiles in the patient dataset. The difference in accuracy between CRAFT 1 and the Ix equation was the most pronounced in the lowest tertile, whilst the difference in accuracy between CRAFT 1 and the Cockcroft-Gault  www.nature.com/scientificreports/ equation was highest in the highest tertile. In the kidney donor dataset, the difference in accuracy was more or less comparable across tertiles.

Discussion
We demonstrated that deep learning-based analysis of abdominal CT scans at L3 can improve the estimation of creatinine excretion. We developed and validated two equations to estimate creatinine excretion based on clinical characteristics and measures of body composition. The equations showed excellent accuracy in an unselected group of patients with varying body compositions and in an additional group of healthy kidney donors. When compared with previously published equations, our equations improved the accuracy of the estimation of creatinine excretion especially in patients with relatively low or high creatinine excretion. The strong correlation between body composition at L3 and creatinine excretion is in line with previous studies that examined body compartments at L3. Radiologic assessment of body composition at L3 has been shown to be an objective measure of obesity and muscle wasting in patients with end-stage liver disease awaiting transplantation 29 . The L3 skeletal muscle index has been shown to be a marker for sarcopenia and frailty in lung cancer patients 30 . Assessment of sarcopenia and changes in body composition have been shown to associate with outcomes after neoadjuvant chemotherapy in oesophageal cancer 31 . Furthermore, analysis of body composition may help in dosing chemotherapy and dosing contrast injection in contrast-enhanced CT 32,33 . We add to these observations that body composition analysis at L3 can be used to accurately estimate creatinine excretion.
The finding that our CT-based equations had the most benefit over clinical equations in the patient cohort is most likely explained by the fact that this cohort included a wide range of sick and/or hospitalized patients with altered body composition. This is further emphasized by the fact that our equations had the most benefit over the clinical equations in the lowest tertile of creatinine excretion in patients, where there are likely patients included with muscle wasting. Indeed, it has previously been shown that eGFR equations perform worse in hospitalized patients 9,34 . In addition, the Ix formula was developed and validated in cohorts were patients with liver disease, cancer, and muscle mass far from population norms were excluded, whilst we did not exclude patients based on these characteristics 8 . Our results indicate that in these patients where creatinine excretion is hard to estimate based on clinical variables, our equations may provide the most benefit.
Although the new equations use direct measures of muscle mass to predict creatinine excretion, there was still residual inaccuracy. This may be explained by several factors. Although we excluded patients with fluctuating serum creatinine, build-up of serum creatinine in the volume of distribution may still have affected creatinine excretion 19 . Furthermore, muscle volume and radiation attenuation at L3 may not be fully representative of total body muscle volume, especially in areas not represented such as the extremities. Finally, the residual inaccuracy may be the result of errors in the timed urine collections. This is illustrated by the mean standard deviation of Table 3. Equation performance in kidney donor dataset (n = 287). Confidence intervals were calculated with the combined variance of multiple imputation (10×) and bootstrap (1000×). MPE mean prediction error (mmol/day), MAPE mean absolute percentage error, RMSE root mean squared error (mmol/day), R2 the R2-value calculated with linear regression, p15/p30 the percentage of points that fall within 15%/30% of the outcome.   www.nature.com/scientificreports/ creatinine excretion in kidney donors (1.47 mmol/day), which was close to the standard deviation of the prediction error of CRAFT 1 in the kidney donor set (RMSE = 1.86 mmol/day). This suggests that a large part of the residual inaccuracy may have been caused by collection errors. Strengths of this study includes it large sample size and the use of direct measures of muscle volume and body composition to estimate creatinine excretion. Furthermore, this was a retrospective study in real-life clinical data where we did not exclude patients with aberrant muscle mass. We validated the formula in both patients and healthy individuals, further supporting its validity in diseased and healthy individuals. Since the analysis of the CT-scan is fully automated, it may be easily implemented in routine clinical care. Finally, with the use of direct measures of body composition variables that are not automatically available or less discrete such as race may no longer be needed in renal function formulas 35,36 .
The retrospective nature of this study may have led to selection bias, since we could only evaluate the in patients that received a CT-scan and a 24-h urine collection. However, the equations also performed well in kidney donors, indicating that the equations may be accurate in a broader group of patients. Since this is a single-center study, the equations should still be validated in other patient groups and other centers. We could not calculate the Ix equation with the race variable and used the Caucasian race as default, which may have decreased the accuracy of this equation in patients of other ethnic backgrounds. The lack of data on race also prohibited us from analysing whether a difference in muscle volume/density explains the difference in creatinine excretion between people of different ethnical backgrounds that has previously been observed. The use of a race variable in renal function formulas is however currently under debate, not only for ethical reasons but also because an increasing amount of people identify as being of mixed racial background 36 .
In the future, automated body composition analysis of abdominal CT-scans could be fully integrated in electronic health records. Automated reporting of our equations may be used to validate the result of the CKD-EPI and replace the laborious collection of 24-h urine with spot urine collections. This combination of automatically generated data from the radiology and laboratory department represents a trend in diagnostic medicine called integrated diagnostics, where multidisciplinary data is combined to provide the treating physician with lean and accurate information 37,38 .
In conclusion, we developed and validated equations that estimate creatinine production by adding automated deep learning body-composition analysis of clinically acquired CT-scans to traditional clinical characteristics. These equations were superior to previously published formulas, especially in patients with aberrant body composition. Addition of creatinine production to the results of CT scans may improve and individualize the estimation of renal function derived from plasma creatinine concentrations as compared to traditional renal function formulas.