Association of serum bilirubin levels with risk of cancer development and total death

Serum bilirubin, a strong antioxidant, may influence cancer risk. We aimed to assess the association between serum bilirubin levels and cancer risk. Data were retrieved from 10 years of electronic medical records at Kyushu University Hospital (Japan) for patients aged 20 to 69 years. The associations of baseline bilirubin levels with the risk of cancer (lung, colon, breast, prostate, and cervical) were evaluated using a gradient boosting decision tree (GBDT) model, a machine learning algorithm, and a Cox proportional hazards regression model adjusted for age, smoking, body mass index, and diabetes. The study included 29,080 subjects, and the median follow-up time was 4.7 years. GBDT models showed that baseline bilirubin levels were negatively and non-linearly associated with the risk of lung cancer (men), colon cancer, and cervical cancer. In contrast, a U-shaped association was observed for breast and prostate cancer. Cox hazard regression analyses confirmed that, within the range < 1.2 mg/dL, baseline bilirubin levels were negatively associated with lung cancer risk in men (HR = 0.474, 95% CI 0.271–0.828, P = 0.009) and cervical cancer risk (HR = 0.365, 95% CI 0.136–0.977, P = 0.045). Additionally, low bilirubin levels (< 0.6 mg/dL) were associated with a higher risk of total death (HR = 1.744, 95% CI 1.369–2.222, P < 0.001). Serum bilirubin may have a protective effect against some types of cancer.

SHAP dependence plots of cancer risk in GBDT models. To focus on the association between serum bilirubin levels and the risk of each cancer, SHAP dependence plot (SDP) analyses were performed 21. Notably, the associations fell into two distinct patterns. First, bilirubin levels were negatively and non-linearly associated with the risk of lung cancer in men, colon cancer in men and women, and cervical cancer (Fig. 3A). These negative associations were approximately linear for bilirubin levels < 1.2 mg/dL, and the lowest risk was observed in patients with high bilirubin levels (≥ 1.2 mg/dL). In contrast, U-shaped associations were observed for prostate cancer and breast cancer risk (Fig. 3B), and no association was observed between bilirubin levels and lung cancer risk in women (Fig. 3B).
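An SDP plots, for each subject, the value of one feature (here, serum bilirubin) against that subject's SHAP value for that feature. The Python sketch below illustrates the idea under stated assumptions and is not the authors' pipeline: it computes exact Shapley values by enumerating feature coalitions, filling features outside a coalition from a single reference subject. This baseline-imputation scheme is a simplification and is not the TreeSHAP algorithm used for tree ensembles; `exact_shapley`, the two-feature `toy_risk` function, and all numbers are hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence


def exact_shapley(f: Callable[[Sequence[float]], float],
                  x: Sequence[float],
                  baseline: Sequence[float]) -> list[float]:
    """Exact Shapley values of f at x; features outside a coalition take baseline values."""
    n = len(x)

    def value(coalition: set[int]) -> float:
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk(z: Sequence[float]) -> float:
    """Hypothetical risk score: falls with bilirubin up to 1.2 mg/dL, then flat."""
    bilirubin, age = z
    return -1.5 * min(bilirubin, 1.2) + 0.05 * age


baseline = [0.7, 50.0]  # hypothetical reference subject (bilirubin mg/dL, age)
subjects = [[0.3, 45.0], [0.8, 60.0], [1.5, 30.0]]
# SDP points: (bilirubin value, SHAP value of bilirubin) for each subject
sdp = [(s[0], exact_shapley(toy_risk, s, baseline)[0]) for s in subjects]
print(sdp)  # approximately [(0.3, 0.6), (0.8, -0.15), (1.5, -0.75)]
```

Because `toy_risk` is additive, each subject's bilirubin SHAP value reduces to that feature's own change from the reference; with a GBDT that learns interactions, an SDP typically shows vertical scatter at a given bilirubin value.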
Table 1. Baseline characteristics of the study subjects. Data are expressed as median (IQR) or mean (SD); numbers (No.) are expressed as absolute values and percentages.
www.nature.com/scientificreports/
Cox hazard regression analysis. The SDP analyses in the GBDT models showed non-linear relationships between serum bilirubin levels and cancer risks. First, according to the SDP analyses, the association patterns for lung cancer in men, colon cancer, and cervical cancer were divided into two parts by a cut-off value of 1.2 mg/dL, and Cox hazard regression analyses were conducted (Fig. 1). Subjects with high bilirubin levels (≥ 1.2 mg/dL) had a lower risk of lung cancer in men [hazard ratio (HR) = 0.434, 95% CI 0.214-0.880, P < 0.001] and of colon cancer (HR = 0.429, 95% CI 0.212-0.868, P = 0.019), and showed a non-significant trend toward a lower risk of cervical cancer (HR = 0.285, 95% CI 0.040-2.049, P = 0.213) (Table 2). For bilirubin levels < 1.2 mg/dL, serum bilirubin was analyzed as a continuous variable because the associations were approximately linear in this range. Serum bilirubin levels were negatively associated with lung cancer risk in men (HR = 0.474, 95% CI 0.271-0.828, P = 0.009) and cervical cancer risk (HR = 0.365, 95% CI 0.136-0.977, P = 0.045), and showed a non-significant negative trend for colon cancer risk (HR = 0.647, 95% CI 0.391-1.070, P = 0.090) (Table 2). In contrast, because U-shaped associations were observed for prostate cancer and breast cancer, serum bilirubin levels were divided into three groups using cancer-specific cut-off values before analysis. Subjects with high bilirubin levels (≥ 0.7 mg/dL) had a higher prostate cancer risk (HR = 1.348, 95% CI 1.021-1.780, P = 0.035), and subjects with low bilirubin levels (< 0.4 mg/dL) showed a non-significant trend toward a higher prostate cancer risk (HR = 1.407, 95% CI 0.795-2.489, P = 0.241). Subjects with high bilirubin levels (≥ 0.9 mg/dL) also showed a non-significant trend toward a higher breast cancer risk (HR = 1.306, 95% CI 0.913-1.867, P = 0.144). No significant association was found between serum bilirubin levels and lung cancer risk in women (Table 3).
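Reported HRs, 95% CIs, and P values can be cross-checked by back-calculating the Wald statistic. The Python sketch below assumes each CI is a Wald interval symmetric on the log-HR scale, which is standard for Cox models but is not stated in the text; `wald_from_hr` and `normal_cdf` are hypothetical helper names.

```python
from math import erf, log, sqrt


def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def wald_from_hr(hr: float, lower: float, upper: float,
                 z_crit: float = 1.959964) -> tuple[float, float, float]:
    """Back-calculate SE of log(HR), Wald z, and two-sided P from a 95% CI.

    Assumes the CI is log(HR) +/- z_crit * SE on the log scale.
    """
    se = (log(upper) - log(lower)) / (2.0 * z_crit)
    z = log(hr) / se
    p = 2.0 * (1.0 - normal_cdf(abs(z)))
    return se, z, p


# Lung cancer in men, bilirubin < 1.2 mg/dL as a continuous variable (values from the text)
se, z, p = wald_from_hr(0.474, 0.271, 0.828)
print(f"SE={se:.3f}, z={z:.2f}, P={p:.3f}")  # P ≈ 0.009, matching the reported value
```

The same check applied to the cervical cancer estimate (HR = 0.365, 95% CI 0.136-0.977) gives P ≈ 0.045, again consistent with the reported value.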
Association between serum bilirubin levels and total death. The SHAP summary plots showed that, in descending order of importance, total death appeared to be associated with low BMI, low bilirubin levels, age, and smoking in men, and with age, low BMI, low bilirubin levels, diabetes, and smoking in women (Fig. 2F). The SDP analysis showed a reverse J-shaped association for total death in men and a U-shaped association in women (Fig. 3C). Bilirubin levels were therefore divided into three groups using two cut-off values (0.6 and 1.1 mg/dL). In men, Cox hazard regression analyses showed that subjects with low bilirubin levels (< 0.6 mg/dL) had a higher risk of total death (HR = 1.998, 95% CI 1.481-2.694, P < 0.001), whereas subjects with high bilirubin levels (≥ 1.1 mg/dL) showed a non-significant trend toward a lower risk (HR = 0.623, 95% CI 0.332-1.170, P = 0.141). In women, subjects with high bilirubin levels (≥ 1.1 mg/dL) had a higher risk of total death (HR = 2.448, 95% CI 1.286-4.661, P = 0.006), and subjects with low bilirubin levels (< 0.6 mg/dL) showed a non-significant trend toward a higher risk (HR = 1.371, 95% CI 0.909-2.067, P = 0.132). In the combined analysis of men and women, subjects with low bilirubin levels (< 0.6 mg/dL) had a higher risk of total death (HR = 1.744, 95% CI 1.369-2.222, P < 0.001) (Table 4).
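The grouping used for the non-linear Cox analyses can be sketched as follows. The cut-offs are those reported in the text; treating the middle group as the reference for three-group analyses mirrors how age and BMI were handled but is an assumption here, and the breast cancer grouping is omitted because only its upper cut-off (0.9 mg/dL) is given. `bilirubin_group` and `dummy_codes` are hypothetical names.

```python
from bisect import bisect_right

# Cut-offs taken from the text (mg/dL); group k means "at or above the k-th cut-off".
LUNG_COLON_CERVIX = (1.2,)      # two groups: < 1.2 (reference), >= 1.2
PROSTATE = (0.4, 0.7)           # three groups: < 0.4, 0.4 to < 0.7, >= 0.7
TOTAL_DEATH = (0.6, 1.1)        # three groups: < 0.6, 0.6 to < 1.1, >= 1.1


def bilirubin_group(value_mg_dl: float, cutoffs: tuple[float, ...]) -> int:
    """Index of the bilirubin group; a value equal to a cut-off falls in the upper group."""
    return bisect_right(cutoffs, value_mg_dl)


def dummy_codes(group: int, n_groups: int, reference: int) -> list[int]:
    """0/1 indicators for every non-reference group, as used in a Cox design matrix."""
    return [int(group == g) for g in range(n_groups) if g != reference]


# Total-death analysis with the middle group as reference (an assumption)
for bil in (0.45, 0.6, 0.95, 1.1):
    g = bilirubin_group(bil, TOTAL_DEATH)
    print(bil, g, dummy_codes(g, n_groups=3, reference=1))
```

Using `bisect_right` makes boundary values fall into the upper group, which matches the "< cut-off" and "≥ cut-off" definitions used throughout the Results.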

Discussion
In this study, we showed non-linear relationships between serum bilirubin levels and cancer risks using SHAP summary plots and SDPs calculated from the GBDT model as estimation methods; these methods were recently developed to improve the interpretability of outputs from machine learning approaches and their consistency with human intuition [19][20][21]. These analyses enabled us to apply regression models appropriately. Accordingly, Cox proportional hazards regression models showed that, in the range of bilirubin levels < 1.2 mg/dL, bilirubin levels were negatively associated with the risk of lung cancer in men and of cervical cancer. A similar association was observed for colon cancer risk, although it was not statistically significant (HR = 0.647, P = 0.090). In addition, high bilirubin levels (≥ 1.2 mg/dL) were significantly associated with a decreased risk of lung cancer in men and of colon cancer. Subjects with bilirubin levels ≥ 1.2 mg/dL were presumed to have Gilbert's syndrome, a congenital mild hyperbilirubinemia 22,23, because subjects with abnormally high bilirubin levels due to diseases including liver cirrhosis, hemolytic anemia, or other hepatobiliary diseases were excluded from this study. Subjects with Gilbert's syndrome may therefore have a lower risk of lung cancer in men and of colon cancer. Cancer-associated infections, smoking, obesity, diabetes, ionizing and ultraviolet radiation, and air pollution are established risk factors for cancer development 24. All of these factors are likely to be associated with increased production of reactive oxygen species (ROS) in humans. Increased ROS production has been hypothesized to damage DNA, proteins, and lipids, and thus to initiate or promote cancer development 7,8. Since bilirubin is a strong endogenous antioxidant, lower serum bilirubin levels may reduce systemic antioxidant capacity, impairing defense against oxidative stress-induced damage.
Therefore, it is plausible that the associations between lower serum bilirubin levels and increased cancer risk are mediated by decreased antioxidant activity.
Serum bilirubin levels are influenced by many environmental factors, including physiological and pathological conditions, as well as by genetic factors. Smoking has been reported to be negatively associated with bilirubin levels 25,26. In addition, low bilirubin levels have been reported in patients with various chronic diseases and conditions, such as diabetes, obesity, and aging-related disability [3][4][5][6]27. Therefore, the association between low bilirubin levels and cancer risk may be mediated, at least in part, by these factors. However, in the present study, the association remained significant even after adjustment for these variables. Taken together, low bilirubin levels may reflect an overall susceptibility to some types of cancer that is determined by both genetic and environmental factors, and thus might be a clinically useful biomarker of cancer risk.
In contrast, a U-shaped association was observed for breast cancer and prostate cancer in the GBDT model. The increased risk of these cancers in patients with high bilirubin levels is inconsistent with a protective effect of bilirubin. Notably, breast cancer and prostate cancer are both estrogen-dependent cancers. Estrogens are well-known risk factors for breast cancer 28, and previous epidemiologic and experimental findings have indicated key roles of estrogens in prostate cancer development and progression 29,30. Both serum bilirubin levels and estrogen activity are regulated by uridine diphosphate-glucuronosyltransferase 1A1 (UGT1A1). Serum bilirubin levels are strongly influenced by genetics, and many genome-wide association studies have shown a substantial contribution of various UGT1A1 polymorphisms to serum bilirubin levels in humans 31. Estrogen activity is also regulated by UGT1A1 through glucuronide conjugation and subsequent inactivation of estrogens 32. Therefore, it is plausible that high bilirubin levels due to UGT1A1 polymorphisms are accompanied by high estrogen activity and, consequently, an increased risk of estrogen-dependent cancers. Indeed, several studies have suggested that increased estrogen activity due to UGT1A1 polymorphisms may be associated with an increased risk of breast cancer, although this remains controversial 33,34. A recent study investigated the relationship between genetically raised bilirubin levels and the risk of 10 cancers; it showed that genetically raised bilirubin levels were negatively associated with squamous cell lung cancer risk and positively associated with breast cancer risk, but were not associated with prostate cancer risk 35. To our knowledge, our study is the first to show an association between high serum bilirubin levels and prostate cancer risk.
As for lung cancer in women, non-smoking lung adenocarcinoma has been reported to be strongly associated with female sex, and estrogens have been suggested to be involved in the development of this type of lung cancer in women 36,37. This might offset the association between low bilirubin levels and lung cancer risk observed in men. However, these hypotheses should be confirmed in further studies.
In this study, we also showed that serum bilirubin levels < 0.6 mg/dL were associated with an increased risk of all-cause death in men and in all subjects combined, and that a U-shaped association was observed in women. Since cancer is a major cause of death in Japan (~27% of deaths; https://www.mhlw.go.jp/toukei/saikin/hw/jinkou/geppo/nengai18), these associations might be primarily explained by cancer mortality. However, since low bilirubin levels are associated with cardiovascular diseases and other chronic diseases, the specific causes of death related to bilirubin levels should be evaluated in future studies.
The present study had several strengths. First, statistical evaluations were performed using a combination of machine learning and classical statistical approaches. The associations of low serum bilirubin levels with some types of cancer risk found in this study differ from the findings of some previous studies, which showed that bilirubin levels were not associated with the risk of any cancer, including colon, lung, breast, or prostate cancer 13,14,35. This difference may be because those studies did not consider non-linear associations. The similarities and differences between this study and previous studies are summarized in Supplementary Table 1S. Second, the study was set in routine clinical practice. Regular clinical visits and hospitalizations could lead to earlier and more accurate diagnosis of cancer and thus could reduce the misdiagnosis of indolent cancers. Third, the longitudinal study design and the exclusion of cancer cases diagnosed within 1 year of enrollment minimized potential reverse causality. This study also has several limitations. First, we used EMR data from a single hospital. Studies using EMR data are prone to confounding bias due to the lack of randomization and to selection bias, and data on many parameters were missing. Second, the study was retrospective rather than prospective. Third, the sample size was not large enough to evaluate non-linear and complex associations between bilirubin levels and some cancer risks using a classical statistical approach. Fourth, the study subjects had various chronic diseases. Whether the results of this study can be generalized to healthy subjects without chronic diseases remains to be elucidated.
In conclusion, a combined approach using both machine learning methods and conventional statistical analysis showed that baseline serum bilirubin levels were negatively and non-linearly associated with the risk of some types of cancer, including lung cancer in men, cervical cancer, and probably colon cancer. The causal relationship between serum bilirubin levels and cancer risk, and its clinical utility, should be evaluated in future prospective studies.

Methods
Study subjects. We obtained data from the electronic medical record (EMR) system at Kyushu University Hospital (Japan) for 311,391 patients between January 1st, 2008 and December 31st, 2017. This clinical practice information included age, sex, height, weight, smoking status, diagnoses [International Classification of Diseases, 10th revision (ICD-10) codes], laboratory test results, and details of prescription medications. Eligible patients were 20 to 69 years old (n = 203,104) and had recorded serum bilirubin levels (n = 108,014). In addition, to increase the accuracy of the information, only patients who had a history of hospital admission and were followed up for over 1 year were included in the analysis (n = 41,415) (Fig. 1S in the Supplement). Patients were excluded if they had a previous history of cancer, had ICD-10 codes corresponding to liver cirrhosis or hemolytic anemia, or had other hepatobiliary diseases with abnormal liver enzyme levels (alanine aminotransferase or alkaline phosphatase more than twice the upper limit of the normal range). Cancer cases diagnosed within 1 year of enrollment were excluded to minimize potential reverse causality. In addition, patients with serum bilirubin levels over 2.0 mg/dL were excluded because they may have had unidentified pathological conditions affecting serum bilirubin levels, although some of them had hereditary hyperbilirubinemia such as Gilbert's syndrome (Fig. 1S in the Supplement). All procedures were performed in accordance with the relevant guidelines and regulations. The outcomes of this study were new development of cancer (lung, colon, breast, prostate, and cervical) and all-cause death. To identify incident cancer cases and deaths, we obtained the corresponding ICD-10 codes from the EMR. For colon cancer, patients with familial adenomatous polyposis were excluded.
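The inclusion and exclusion steps above can be expressed as a single predicate. This Python sketch assumes a hypothetical per-patient summary record (`Patient`, `is_eligible`, and all field names are illustrative, not the hospital's EMR schema); upper limits of normal are site-specific inputs, reading "within 1 year" as at most 1 year is an assumption, and the colon-specific familial adenomatous polyposis exclusion is omitted.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Patient:
    """Hypothetical per-patient summary; field names are illustrative, not the EMR schema."""
    age: int
    bilirubin_mg_dl: Optional[float]      # baseline value; None if never measured
    ever_admitted: bool
    followup_years: float
    prior_cancer: bool
    cirrhosis_or_hemolytic_anemia: bool   # from ICD-10 codes
    other_hepatobiliary_disease: bool
    alt: float
    alt_uln: float                        # upper limit of normal (site-specific)
    alp: float
    alp_uln: float
    years_to_cancer: Optional[float] = None  # None if no incident cancer


def is_eligible(p: Patient) -> bool:
    """Apply the inclusion and exclusion criteria described in the Methods."""
    if not 20 <= p.age <= 69 or p.bilirubin_mg_dl is None:
        return False
    if not p.ever_admitted or p.followup_years <= 1.0:
        return False
    if p.prior_cancer or p.cirrhosis_or_hemolytic_anemia:
        return False
    enzymes_over_2x = p.alt > 2 * p.alt_uln or p.alp > 2 * p.alp_uln
    if p.other_hepatobiliary_disease and enzymes_over_2x:
        return False
    if p.years_to_cancer is not None and p.years_to_cancer <= 1.0:
        return False  # cancer within 1 year of enrollment: possible reverse causality
    return p.bilirubin_mg_dl <= 2.0  # > 2.0 mg/dL excluded


base = Patient(age=52, bilirubin_mg_dl=0.8, ever_admitted=True, followup_years=4.0,
               prior_cancer=False, cirrhosis_or_hemolytic_anemia=False,
               other_hepatobiliary_disease=False, alt=25, alt_uln=40, alp=200, alp_uln=350)
print(is_eligible(base), is_eligible(replace(base, bilirubin_mg_dl=2.3)))  # True False
```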
Follow-up time for each patient was calculated from study enrollment to the date of either outcome onset or last contact. Each patient's baseline serum bilirubin level was defined as the mean of the values measured during the 6 months before enrollment.
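A minimal Python sketch of these two derived variables, assuming dated bilirubin results are available per patient. Approximating "6 months" as 183 days, including the enrollment date in the window, and expressing follow-up in years of 365.25 days are all assumptions; `baseline_bilirubin` and `follow_up` are hypothetical names.

```python
from datetime import date, timedelta
from statistics import mean
from typing import Optional

SIX_MONTHS = timedelta(days=183)  # assumption: "6 months" approximated as 183 days


def baseline_bilirubin(measurements: list[tuple[date, float]],
                       enrollment: date) -> Optional[float]:
    """Mean of bilirubin values (mg/dL) measured in the 6 months up to enrollment."""
    window = [v for d, v in measurements if enrollment - SIX_MONTHS <= d <= enrollment]
    return mean(window) if window else None


def follow_up(enrollment: date, outcome_date: Optional[date],
              last_contact: date) -> tuple[float, int]:
    """(follow-up in years, event indicator): ends at outcome onset, else censored at last contact."""
    if outcome_date is not None:
        return (outcome_date - enrollment).days / 365.25, 1
    return (last_contact - enrollment).days / 365.25, 0


enroll = date(2010, 1, 1)
labs = [(date(2009, 6, 1), 0.5), (date(2009, 9, 1), 0.7),
        (date(2009, 12, 1), 0.9), (date(2010, 3, 1), 1.5)]
print(baseline_bilirubin(labs, enroll))           # mean of 0.7 and 0.9 only
print(follow_up(enroll, None, date(2014, 1, 1)))  # (4.0, 0): censored at last contact
```

The event indicator returned by `follow_up` is the censoring flag a Cox model needs alongside the follow-up time.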
Table 2. The associations between serum bilirubin levels and each cancer risk (lung in men, colon, and cervical). For lung cancer in men (M), colon cancer, and cervical cancer, the association between serum bilirubin levels and cancer risk was evaluated by Cox hazard regression models. (Left table) First, serum bilirubin levels were divided into two groups using the cut-off value (1.2 mg/dL); the reference group was < 1.2 mg/dL. BIL (≥ 1.2), subjects with bilirubin levels ≥ 1.2 mg/dL. (Right table) Next, for serum bilirubin levels < 1.2 mg/dL, bilirubin levels were used as a continuous variable (BIL), and the association between bilirubin levels and each cancer risk was evaluated. Age (20-39), subjects aged 20-39 years old; Age (60-69), subjects aged 60-69 years old; BMI (< 21), subjects with body mass index < 21 kg/m²; BMI (≥ 27), subjects with body mass index ≥ 27 kg/m². The reference groups for age and BMI were the middle-range groups.
Statistical analysis. Since a non-linear relationship was expected between serum bilirubin levels and cancer risk or total death, we used a GBDT model, a tree-based machine learning algorithm for building prediction models 18. First, we generated a prediction model for the risk of each cancer and total death using extreme gradient boosting (XGBoost) ver. 1.0.0.2 (https://github.com/dmlc/xgboost) 38, with serum bilirubin levels, age, BMI, smoking status (current or past), and the presence of diabetes as input variables. Among the hyperparameters, the main fine-tuned parameters were the learning rate (learning_rate = 0.1) and maximum tree depth (max_depth = 3); all other parameters remained at their XGBoost default values. To interpret the models, we implemented SHapley Additive exPlanations (SHAP), a recent method for interpreting the output of machine learning models. The SHAP value represents the contribution of each variable to the model's prediction 19,20, and is generally considered comparable to a standardized partial regression coefficient in linear regression models. Then, to visually illustrate the relationship between bilirubin levels and each cancer risk or total death, we used SHAP dependence plots (SDPs) 21. To confirm the results obtained from the GBDT models, conventional statistical analyses were performed using Cox hazard regression models adjusted for age, BMI, smoking status, and the presence of diabetes. Since these associations were not linear, cancer risk was compared between two groups defined by one cut-off value of bilirubin levels, or among three groups defined by two cut-off values, according to the SDP patterns. The roadmap for the statistical analyses is shown in Fig. 1. In all analyses, BMI was divided into three groups (low BMI < 21 kg/m², middle BMI 21 to < 27 kg/m², and high BMI ≥ 27 kg/m²) based on a previous report on the relationship between BMI and cancer risk in a Japanese population 39. Age was divided into three groups: 20 to 39, 40 to 59, and 60 to 69 years old. All statistical analyses were performed using R software ver. 3.6.3 (R Project for Statistical Computing, https://cran.ism.ac.jp/). Two-sided P values < 0.05 were considered statistically significant.
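The Cox models were fitted in R, and the text does not name the package used; R's survival package (`coxph()`) is the usual choice. To make the estimation step concrete, here is a minimal Python sketch of Cox partial-likelihood estimation by Newton-Raphson for a single covariate, such as an indicator for bilirubin ≥ 1.2 mg/dL. Unlike the paper's models, it is not adjusted for age, BMI, smoking, or diabetes; `cox_fit_1d` is a hypothetical name and the toy data are invented for illustration.

```python
from math import exp, sqrt
from typing import Sequence


def cox_fit_1d(times: Sequence[float], events: Sequence[int], x: Sequence[float],
               max_iter: int = 50, tol: float = 1e-10) -> tuple[float, float]:
    """Newton-Raphson maximum partial-likelihood estimate for one covariate.

    Model: h(t | x) = h0(t) * exp(beta * x); tied event times use Breslow's method.
    Returns (beta, standard error); exp(beta) is the hazard ratio.
    """
    beta, info = 0.0, 0.0
    event_times = sorted({t for t, e in zip(times, events) if e})
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for t in event_times:
            at_risk = [xi for ti, xi in zip(times, x) if ti >= t]  # still under observation
            w = [exp(beta * xi) for xi in at_risk]
            s0 = sum(w)
            s1 = sum(wi * xi for wi, xi in zip(w, at_risk))
            s2 = sum(wi * xi * xi for wi, xi in zip(w, at_risk))
            dead = [xi for ti, ei, xi in zip(times, events, x) if ti == t and ei]
            xbar = s1 / s0
            score += sum(dead) - len(dead) * xbar        # first derivative of log PL
            info += len(dead) * (s2 / s0 - xbar * xbar)  # minus second derivative of log PL
        step = score / info  # fails if info == 0 (no covariate variation in risk sets)
        beta += step
        if abs(step) < tol:
            break
    return beta, 1.0 / sqrt(info)


# Toy data (hypothetical): x = 1 if bilirubin >= 1.2 mg/dL, else 0
times = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
events = [1, 1, 0, 1, 1, 0, 1, 0]
x = [0, 0, 1, 0, 1, 1, 0, 1]
beta, se = cox_fit_1d(times, events, x)
print(f"HR = {exp(beta):.2f}, 95% CI {exp(beta - 1.96 * se):.2f}-{exp(beta + 1.96 * se):.2f}")
```

Adding the adjustment covariates turns the scalar score and information into a vector and a matrix, but the Newton-Raphson update keeps the same form.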
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.