Assessment of compensated advanced chronic liver disease based on serum bile acids in chronic hepatitis B patients

When patients with chronic liver disease progress to compensated advanced chronic liver disease (cACLD), the risk of liver-related decompensation increases significantly. This study aimed to develop a prediction model based on individual bile acid (BA) profiles to identify cACLD. We prospectively recruited 159 patients with hepatitis B virus (HBV) infection and 60 healthy volunteers undergoing liver stiffness measurement (LSM). Based on the LSM value, patients were categorized into three groups: F1 [LSM ≤ 7.0 kilopascals (kPa)], F2 (7.1 < LSM ≤ 8.0 kPa), and cACLD (LSM ≥ 8.1 kPa). Random forest (RF) and support vector machine (SVM) were applied to develop two classification models to distinguish patients with different degrees of fibrosis. The serum content of individual BA increased significantly with the degree of fibrosis, especially for glycine-conjugated and taurine-conjugated BA. The Macro-Precision, Macro-Recall, and Macro-F1 score of the optimized RF model were all 0.82. For the optimized SVM model, the corresponding scores were 0.86, 0.84, and 0.85, respectively. The RF and SVM models identified individual BA features that successfully distinguish patients with cACLD caused by HBV. This study provides a new tool for identifying cACLD that can enable clinicians to better manage patients with chronic liver disease.


Methods
Patient selection.
This study was approved by The First Hospital of Lanzhou University Ethics Committee (approval number: LDYYLL2022-111). Data were collected from January to October 2022. Figure 1 shows the flow chart of the study population. We prospectively selected patients who met the following criteria: age between 18 and 75 years; positive serum hepatitis B surface antigen (HBsAg) for at least 6 months; and LSM examination by two-dimensional shear wave elastography (2D-SWE). Informed consent was obtained from all study participants. Exclusion criteria were as follows: use of drugs against BA accumulation, such as ursodeoxycholic acid; hepatitis A, C, D, or E viral infection; autoimmune liver disease; non-alcoholic fatty liver disease; alcoholic liver disease; drug-induced hepatitis; hepatocellular carcinoma; and a history of endocrine disease or cholestasis. Cholestasis was defined as a serum alkaline phosphatase (ALP) level > 1.5 times the upper limit of normal (ULN) together with a gamma-glutamyl transpeptidase (GGT) level > 3 times the ULN. The ULNs of ALP and GGT were 125 U/L and 69 U/L, respectively.
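The cholestasis exclusion rule can be written as a simple check. The following is a minimal sketch, assuming ALP and GGT are given in U/L; the function name `is_cholestatic` and the constant names are illustrative and are not taken from the study's software.

```python
# Upper limits of normal (ULN) as reported in the text, in U/L.
ALP_ULN = 125.0
GGT_ULN = 69.0


def is_cholestatic(alp: float, ggt: float) -> bool:
    """Hypothetical helper: True when ALP > 1.5 x ULN and GGT > 3 x ULN."""
    return alp > 1.5 * ALP_ULN and ggt > 3.0 * GGT_ULN


# The definition implies cut-offs of 187.5 U/L (ALP) and 207 U/L (GGT).
print(is_cholestatic(alp=200.0, ggt=250.0))  # both criteria met
print(is_cholestatic(alp=200.0, ggt=100.0))  # GGT criterion not met
```

Both criteria must hold at once, so an isolated rise in ALP or GGT would not trigger the exclusion under this reading.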
In addition, 60 healthy volunteers older than 18 years were recruited as controls. None had a substantial past medical history of chronic liver disease, and none had substantial alcohol intake.

Sample preparation and high performance liquid chromatography (HPLC) analysis.
We measured BA in the serum using a previously described liquid chromatography-mass spectrometry method 17. Serum samples were prepared by simple protein precipitation with methanol. Briefly, 200 μL methanol was added to 100 μL of serum spiked with 100 μL of internal standards (ISs; d4-chenodeoxycholic acid and Nor-desoxycholic acid). Subsequently, all the mixtures were vortexed for 1 min and centrifuged at 13,000×g for 5 min. The supernatant was aspirated for further analysis. An Agilent 1260 Infinity HPLC coupled with an Agilent 6460 triple-quadrupole mass spectrometer equipped with an electrospray ionization interface was used for the analysis of serum. Chromatographic separation was performed on an Agilent HC-C18 column (4.6 mm × 250 mm, 5-μm particles), guarded by an Agilent Eclipse XDB-C18 4.6 mm × 12.5 mm analytical guard column (Agilent Technologies, USA). The mobile phase consisted of 7.5 mM ammonium acetate containing 0.1% ammonium hydroxide in deionized water, pH 7.5 (solvent A), and methanol (solvent B), at a total flow rate of 1 mL/min. Post-column splitting (1:4) was applied to give an optimal interface flow rate (0.2 mL/min) for MS detection.

Principal component analysis (PCA).
PCA was performed using the prcomp function in R (version 4.0.2) to visualize the distribution of individual BA in the different classes.
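To make the PCA step concrete, the sketch below computes the first principal component of a small data matrix in plain Python. It is an illustration under stated assumptions, not the study's analysis: the columns are centered but not scaled, the leading eigenvector is found by power iteration rather than the decomposition a statistics package would use, and the toy values are invented.

```python
import math


def center_columns(rows):
    """Subtract each column mean from its column."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance_matrix(centered):
    """Sample covariance matrix (divisor n - 1) of centered data."""
    n = len(centered)
    p = len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]


def first_component(cov, iterations=200):
    """Leading eigenvector of cov by power iteration.

    Assumes the all-ones start vector is not orthogonal to that eigenvector.
    """
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def pc1_scores(rows):
    """Project each centered sample onto the first principal component."""
    centered = center_columns(rows)
    loading = first_component(covariance_matrix(centered))
    return [sum(x * w for x, w in zip(row, loading)) for row in centered]


# Toy data (not study data): two hypothetical BA concentrations per sample.
toy = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
scores = pc1_scores(toy)
```

Plotting such scores for the first two or three components, colored by group, gives the kind of scatter plot used to inspect class separation.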

Random forest (RF).
RF was introduced as a classifier owing to its attractive characteristics, including the need for few tunable parameters, automatic handling of missing data, and insensitivity to overfitting 18. By using all the descriptors in the training set to build an RF classification model based on the cross-validation method, the importance of each descriptor with respect to prediction ability was determined. Subsequently, the order of importance for all descriptors was obtained. The resulting model was implemented in the statistical language R based on STATISTICA 10.0 with the default settings. The BA profiles of the control, F1, F2, and cACLD groups were randomly divided into training and test sets at a 4:1 ratio, yielding 175 and 44 subjects, respectively. Two important parameters must be optimized: ntree (the number of trees) and mtry (the number of features considered at each split). In this study, to obtain the optimal model, the value of ntree was tuned from 1 to 219 with a step of 100. Meanwhile, the value of mtry was tuned from 1 to 50 with a step of 1 in each tuning step of ntree.
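The 4:1 split and the tuning grid described above can be sketched as follows. This is an illustration only: the text does not say whether the split was stratified by group, so the sketch uses a plain shuffled split, and the function name `split_indices` and the seed are assumptions.

```python
import random


def split_indices(n_samples, train_fraction=0.8, seed=0):
    """Shuffle sample indices and split them into training and test sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = int(n_samples * train_fraction)  # rounds down
    return indices[:n_train], indices[n_train:]


# 219 subjects in total (159 patients plus 60 controls).
train_idx, test_idx = split_indices(219)
print(len(train_idx), len(test_idx))

# Tuning grid as described in the text.
ntree_values = list(range(1, 219 + 1, 100))  # 1 to 219 in steps of 100
mtry_values = list(range(1, 50 + 1))         # 1 to 50 in steps of 1
grid = [(ntree, mtry) for ntree in ntree_values for mtry in mtry_values]
```

With these settings the grid holds three ntree values, each paired with 50 mtry values, so 150 candidate models would be compared.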

Support vector machine (SVM).
The SVM method is a small-sample learning method that can be used to deal with highly nonlinear regression and classification problems 19. In brief, it is a supervised learning method that predicts the category of a new sample by learning from samples of known category and the relationship between sample features and category. As with RF, all subjects were randomly divided into training and test sets at a ratio of 4:1. The SVM was developed on the training set of 175 subjects and evaluated on the test set of 44 subjects.

Model validation.
The performance of the classification models was evaluated using the following metrics: Macro-Precision, Macro-Recall, Macro-F1 score, total accuracy, and Kappa coefficient.
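For readers unfamiliar with these metrics, the sketch below derives them from a multi-class confusion matrix. It assumes the common convention in which the macro-F1 score is the unweighted mean of the per-class F1 scores; the text does not state which convention was used, and the function name and toy matrix are illustrative.

```python
def classification_metrics(confusion):
    """Macro-averaged precision/recall/F1, accuracy, and Cohen's kappa.

    confusion[i][j] is the number of samples of true class i
    that were predicted as class j.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    true_counts = [sum(row) for row in confusion]
    pred_counts = [sum(row[j] for row in confusion) for j in range(k)]

    precisions, recalls, f1_scores = [], [], []
    for c in range(k):
        tp = confusion[c][c]
        precision = tp / pred_counts[c] if pred_counts[c] else 0.0
        recall = tp / true_counts[c] if true_counts[c] else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
        precisions.append(precision)
        recalls.append(recall)

    accuracy = sum(confusion[c][c] for c in range(k)) / total
    # Agreement expected by chance, computed from the marginal totals.
    expected = sum(t * p for t, p in zip(true_counts, pred_counts)) / total**2
    kappa = (accuracy - expected) / (1 - expected) if expected != 1 else 0.0

    return {
        "macro_precision": sum(precisions) / k,
        "macro_recall": sum(recalls) / k,
        "macro_f1": sum(f1_scores) / k,
        "accuracy": accuracy,
        "kappa": kappa,
    }


# Toy two-class confusion matrix (illustrative, not study data).
toy = [[8, 2],
       [1, 9]]
metrics = classification_metrics(toy)
```

Macro-averaging gives every class equal weight regardless of its size, which matters here because the four groups contain different numbers of subjects.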

Statistical analysis.
Continuous variables were reported as median with interquartile range or mean with standard deviation. Categorical data were presented as numbers and frequencies (%). Differences among groups were analyzed by one-way analysis of variance with Dunnett's multiple comparison test or the Mann-Whitney test using SPSS 25.0.02 (IBM, New York, U.S.). Differences were considered statistically significant when p < 0.05. RF and SVM data acquisition and quantification were performed using STATISTICA 10.0.

Results
PCA.
PCA was performed to visualize the distributions of individual BA profiles in different degrees of liver fibrosis. As shown in Fig. 2, the BA of cACLD patients (pink) yielded higher PC1, PC2, and PC3 values, and only a few outliers were evident among the control, F1, and F2 groups. Therefore, PCA based on individual BA yielded a clear separation of cACLD patients from other patients with chronic liver disease caused by HBV.

The alteration of serum bile acids in different liver fibrosis stages.
Compared to the control, subjects with F1, F2, and cACLD exhibited an increase in total serum primary BA, while the proportion of total secondary BA decreased (Fig. 3a). In addition, patients with F1, F2, and cACLD exhibited significant increases in glycine-conjugated BA and taurine-conjugated BA compared with the control (Fig. 3b). The heat map displays the spectrum of the bile acid profiles across different fibrosis stages (Fig. 3c). Compared to the control, there was a significant increase in the percentage of glycine-conjugated BA in F1 (2%), F2 (9%), and cACLD patients (14%). Taurine-conjugated BA exhibited a higher percentage in F1 (1.1%) and F2 patients (2.2%) and a significantly higher percentage in cACLD patients (12%). Moreover, the percentage of unconjugated BA was 36% for the control, 33% for F1, 24% for F2, and 10% for the cACLD group.

Changes in individual serum bile acids across liver fibrosis stages.
Compared to the control, the sum of unconjugated, glycine-conjugated, and taurine-conjugated BA contents in the serum was significantly increased in F1, F2, and cACLD patients (Fig. 4a). Compared to the control, F1 patients showed a significant increase in the content of CDCA (p < 0.001), GCA (p < 0.05), and GCDCA (p < 0.01) (Fig. 4b, c), but there was no significant difference in the other individual BA. In patients with F2, unconjugated BA such as CA, CDCA, DCA, LCA, and UDCA were significantly increased in the serum (p < 0.001), as were glycine-conjugated BA such as GCA, GCDCA, GDCA, GLCA, and GUDCA (p < 0.001) (Fig. 4c). Furthermore, TCDCA, THDCA, TLCA, and TUDCA were significantly increased (p < 0.001) in patients with F2 (Fig. 4d). All individual BA except TDCA were significantly increased in the serum of cACLD patients.

Classification performance of RF and SVM.
The number of decision trees was set to 20, and the maximum tree size was set to 15 based on the results of the parameter tuning tests. A summary of the RF classification response is shown in Fig. 5a. For the RF method, a regression algorithm based on importance ranking was used to extract the features of the impact factors, select the optimal feature variable set, and achieve dimension reduction. The Macro-Precision, Macro-Recall, Macro-F1 score, kappa coefficient, and accuracy of the optimized RF model were 0.82, 0.82, 0.82, 0.74, and 0.81, respectively. The importance of 16 individual BA features was calculated using the RF method. The importance ranking is shown in Fig. 5b, which indicates that all the assigned individual BA features had the capacity to discriminate between different liver fibrosis stages. The CA, CDCA, DCA, GCA, and GCDCA features gained the highest importance. The SVM model was optimized using tenfold cross-validation, and the selected samples were trained and predicted using the SVM model. The Macro-Precision, Macro-Recall, Macro-F1 score, kappa coefficient, and accuracy of the optimized SVM model were 0.86, 0.84, 0.85, 0.76, and 0.82, respectively. The performances of the built RF and SVM models for identifying different liver fibrosis stages are shown in Table 2.

Discussion
The degree of liver fibrosis in patients with chronic liver disease predicts the likelihood of developing liver-related morbidity and death 20. When these patients progress to cACLD, the risk of liver-related decompensation events and death increases significantly. Thus, assessment of cACLD is an essential part of the evaluation of patients with chronic liver disease, informing prognosis and the stratification of therapeutic and surveillance strategies 8. Elevated serum BA concentrations have been shown to be a more sensitive test for the detection of liver cirrhosis than conventional liver function tests 14. Therefore, this study developed two different models (RF and SVM) based on individual BA profiles of serum samples to recognize cACLD in patients with HBV infection.
In the present study, we analysed the relationship between serum concentrations of individual BA and the degree of liver fibrosis in patients with chronic liver disease. We found that PCA based on individual BA can distinguish cACLD from other patients with chronic liver disease. Furthermore, total serum primary BA increased while the proportion of total secondary BA decreased across the control, F1, F2, and cACLD groups. Meanwhile, glycine-conjugated BA and taurine-conjugated BA increased, while unconjugated BA decreased. More importantly, conjugated BA, including GCDCA and TCDCA, increased significantly in patients with cACLD. This finding appears consistent with the work of Žížalová et al., who found that GCDCA and TCDCA are significantly related to portal pressure in patients with cirrhosis 14. Oehler showed that the synthesis of the primary BA, CA and CDCA, was significantly increased by the strong induction of hCYP7A1 (the rate-limiting enzyme converting cholesterol to BA) in human liver chimeric mice infected with HBV 21. Although the relative excess of CDCA over CA derivatives seems to be a common feature of liver cirrhosis as well as nonalcoholic fatty liver disease, the mechanism behind this remains somewhat enigmatic 22. The study by Žížalová et al. aimed to identify clinically significant portal hypertension in patients with cirrhosis through BA, whereas our study intended to identify cACLD in patients with chronic liver disease, which represents an important point for timely intervention to prevent further progression.
Furthermore, the RF and SVM models derived from our BA analysis showed good separation between different fibrosis stages, highlighting the diagnostic potential of this noninvasive analytical approach. Five serum BA (CA, CDCA, DCA, GCA, and GCDCA) gained the highest importance, suggesting that unconjugated and glycine-conjugated BA may be indicators of liver dysfunction in chronic hepatitis.
A limitation of the current study is that we were unable to compare the noninvasive biomarkers with the gold standard of liver biopsy. In addition, all subjects included in the present study were patients with HBV. The results need to be verified in patients with other etiologies, such as hepatitis C virus infection, alcoholic liver disease, non-alcoholic fatty liver disease, and autoimmune liver disease.
In conclusion, RF and SVM models were applied to identify individual BA features that successfully distinguish patients with cACLD caused by HBV. This study provides a new tool for identifying cACLD patients that can enable clinicians to better manage patients with chronic liver disease.

Figure 1. The flow chart of the study population. HBV, hepatitis B virus; HIV, human immunodeficiency virus; cACLD, compensated advanced chronic liver disease.

Figure 3. The alteration of serum bile acids in the control, F1, F2, and cACLD groups. (a) Stacked bar plot representing the proportions of unconjugated (Unc-BAs), glycine-conjugated (Glyc-BAs), and taurine-conjugated (Taur-BAs) bile acids; (b) stacked bar plot representing the proportions of primary and secondary bile acids; (c) heat map displaying the spectrum of bile acid profiles across different fibrosis stages.

Figure 5. The performance of the built random forest (RF) model. (a) Classification matrix of all samples (number of trees: 20); (b) importance plot of individual bile acids.
A total of 159 patients with HBV infection and 60 healthy volunteers from the First Hospital of Lanzhou University between January 2022 and October 2022 were included in the final analysis. The characteristics of the study population are summarized in Table 1. The mean age was 44.7 ± 11.6 years, and 62% were male. The F2 group comprised 74 subjects, accounting for the largest proportion (33.8%, 74/219). This was followed by the control group (27.4%, 60/219) and the F1 group (20.5%, 45/219). cACLD was found in 18.3% (40/219) of subjects. There was no significant difference in biochemical indexes between the F1 and control groups (p > 0.05). Patients in the F2 and cACLD groups had significantly elevated levels of AST, TBIL, DBIL, IBIL, and TBA compared with the control group (p < 0.05). In addition, subjects with cACLD exhibited notable increases in ALT, ALP, and GGT compared with the control (p < 0.05).
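The group proportions reported above follow directly from the group sizes. The short sketch below reproduces the arithmetic; the dictionary keys are labels chosen for illustration.

```python
# Group sizes as reported in the text.
group_sizes = {"control": 60, "F1": 45, "F2": 74, "cACLD": 40}

total = sum(group_sizes.values())
patients = total - group_sizes["control"]

# Share of each group among all subjects, rounded to one decimal place.
percentages = {
    group: round(100 * n / total, 1) for group, n in group_sizes.items()
}

print(total, patients)
print(percentages)
```

The four groups sum to 219 subjects, of whom 159 are patients with HBV infection, which matches the cohort size stated in the text.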