Introduction

Non-alcoholic fatty liver disease (NAFLD) is a global health issue with growing incidence1. Some patients with simple steatosis develop liver fibrosis, which can lead to cirrhosis, hepatocellular carcinoma, liver transplantation and death2,3. The presence of liver fibrosis is the major determinant of future liver-related complications4,5. Liver biopsy, an invasive procedure with an inherent risk of complications, remains the gold standard for disease grading and staging.

There is an urgent need for reliable non-invasive tools to detect NAFLD patients at risk for disease progression in order to prevent further complications. Several non-invasive tests such as the NAFLD fibrosis score (NFS)6 and FIB-4 index7 have been developed and included into a diagnostic algorithm proposed by the European Association for the Study of the Liver8.

Recent studies have further linked alterations in the gut microbiota to disease severity in NAFLD9,10,11,12,13,14,15,16,17,18,19. Loomba et al. were the first to identify NAFLD patients with advanced fibrosis (F3-F4) precisely and non-invasively, using a machine learning algorithm that combined the abundance of specific gut microbial taxa with clinical features15. However, this Random Forest algorithm has not yet been validated.

The aim of this study was to validate the diagnostic accuracy of the gut microbiota-based prediction model published by Loomba et al.15, to compare it with our own Random Forest model including clinical features and 16S rRNA gene sequencing data, and to compare both with the NFS, the FIB-4 index and transient elastography, an ultrasound-based method, for the prediction of advanced fibrosis (fibrosis stage F3-F4) in 96 NAFLD patients.

Methods

Patient cohort

This cross-sectional, prospective study was primarily designed to detect associations between the gut microbiota, diet, as well as genetic factors with the activity and severity of NAFLD. Therefore, a total of 180 NAFLD patients were prospectively enrolled between March 2015 and December 2018 in the outpatient liver department of the Clinic for Gastroenterology and Hepatology, University Hospital of Cologne, Germany. The protocol was approved by the Ethics Commission of Cologne University’s Faculty of Medicine and written informed consent was obtained from each patient. The study was performed in accordance with the Declaration of Helsinki.

In this secondary analysis, 83 biopsy-proven NAFLD patients and 13 NAFLD patients diagnosed with cirrhosis based on characteristic clinical findings (see criteria below) were included.

Patients were referred to our tertiary referral center with elevated liver function tests and/or liver abnormalities on ultrasound for further diagnostic tests, or with already diagnosed NAFLD in order to investigate disease activity and severity. If NAFLD diagnosis was made or confirmed, patients were consecutively enrolled in this observational study, after written informed consent was obtained.

Within the study, a detailed history, physical exam, anthropometric measurements, blood pressure measurements, ultrasound and/or magnetic resonance imaging (MRI), transient elastography and, if clinically indicated, liver biopsy were performed as per standard of care. NAFLD was diagnosed if the following conditions were met: hepatic steatosis on liver imaging (ultrasound and/or MRI) and/or the presence of >5% fat in histological analysis of liver biopsy; daily alcohol consumption of less than 10 g in women and less than 20 g in men; absence of steatogenic drugs such as glucocorticoids, methotrexate, amiodarone and tamoxifen; absence of other diseases causing secondary steatosis such as human immunodeficiency virus infection, celiac disease or inflammatory bowel disease; and absence of other chronic liver diseases, e.g. viral hepatitis, autoimmune hepatitis, toxic liver injury, alcoholic steatohepatitis, cholestatic liver disease, Wilson’s disease and hereditary hemochromatosis. Alcohol consumption was self-reported. The daily alcohol intake was calculated from the regular amount of specific alcoholic beverages according to patients’ disclosures. Exclusion criteria for all study subjects were oral or intravenous antibiotic treatment within the last 6 months prior to the study, known malignancy, pregnancy, and age <18 years. Recommendations and treatment suggestions for study participants did not differ from usual patient care. Thus, NAFLD patients received the same overall lifestyle recommendations as indicated in the current European guideline20. Further exclusion criteria for NAFLD patients were ongoing successful lifestyle modifications, defined as more than 5% loss of body weight within the last three months prior to enrollment, and current or prior participation in an interventional non-alcoholic steatohepatitis (NASH) study21.

Abdominal ultrasound was performed for all patients. All blood samples for laboratory analyses were collected under fasting conditions. Anthropometric measurements were carried out by trained physicians or research assistant nurses.

Type 2 diabetes was defined as glycated hemoglobin (HbA1c) ≥6.5% and/or fasting glucose ≥126 mg/dL and/or use of antidiabetic medications. Overweight was defined as body mass index (BMI) ≥25 kg/m². Metabolic syndrome was defined following the International Diabetes Federation (IDF) criteria. Arterial hypertension was defined as office blood pressure ≥140/90 mmHg on ≥2 measurements during ≥2 occasions or antihypertensive drug treatment.
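The type 2 diabetes and overweight definitions above can be written as simple predicates. The following is a minimal sketch, not the study's actual code; the function and parameter names are hypothetical, and missing laboratory values are assumed to be passed as None.

```python
from typing import Optional


def has_type2_diabetes(
    hba1c_percent: Optional[float],
    fasting_glucose_mg_dl: Optional[float],
    on_antidiabetic_medication: bool,
) -> bool:
    """Return True if any one criterion is met ("and/or" in the definition)."""
    return bool(
        (hba1c_percent is not None and hba1c_percent >= 6.5)
        or (fasting_glucose_mg_dl is not None and fasting_glucose_mg_dl >= 126)
        or on_antidiabetic_medication
    )


def is_overweight(weight_kg: float, height_m: float) -> bool:
    """Overweight is BMI >= 25 kg/m^2, with BMI = weight / height^2."""
    bmi = weight_kg / height_m ** 2
    return bmi >= 25
```

Because the criteria are joined by "and/or", a single positive criterion is sufficient, which is why the checks are combined with a logical OR.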

Liver biopsies

Liver biopsy was performed in patients with persistently elevated liver function test values, to rule out liver diseases other than NAFLD, and if an underlying advanced liver injury was suspected. Indicators of advanced liver injury included altered laboratory parameters, co-morbidities, findings from ultrasound and/or MRI and results from transient elastography measurements.

If liver biopsy was performed, samples were evaluated by an experienced liver pathologist who was blinded to all clinical and laboratory patient data. The NASH clinical research network histological scoring system22 was used to evaluate disease activity and severity. Accordingly, steatosis was graded 0–3, ballooning was graded 0–2 and lobular inflammation was graded 0–3. Fibrosis was staged from 0–4. Stages 1a, 1b and 1c were summarized as stage 1. Fibrosis stages: 0 none, 1 perisinusoidal or periportal, 2 perisinusoidal and portal/periportal, 3 bridging fibrosis, 4 cirrhosis. The NAFLD activity score was obtained for each biopsy. This score is defined as the unweighted sum of the scores for steatosis (0–3), lobular inflammation (0–3) and ballooning (0–2), thus ranging from 0 to 8 (refs. 21,22).
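The scoring rules described above are simple enough to sketch in code. The example below is illustrative only: the function names are hypothetical, and the input validation is an added assumption rather than part of the published scoring system.

```python
def nafld_activity_score(steatosis: int, lobular_inflammation: int, ballooning: int) -> int:
    """Unweighted sum of steatosis (0-3), lobular inflammation (0-3) and ballooning (0-2)."""
    limits = {
        "steatosis": (steatosis, 3),
        "lobular_inflammation": (lobular_inflammation, 3),
        "ballooning": (ballooning, 2),
    }
    for name, (value, upper) in limits.items():
        if not 0 <= value <= upper:
            raise ValueError(f"{name} must be between 0 and {upper}, got {value}")
    return steatosis + lobular_inflammation + ballooning


def collapse_fibrosis_stage(stage: str) -> int:
    """Map sub-stages 1a, 1b and 1c to stage 1; other stages map to their integer."""
    stage = stage.strip().lower()
    if stage in {"1a", "1b", "1c"}:
        return 1
    if stage in {"0", "1", "2", "3", "4"}:
        return int(stage)
    raise ValueError(f"unknown fibrosis stage: {stage!r}")


def is_advanced_fibrosis(stage: int) -> bool:
    """Advanced fibrosis corresponds to stages F3-F4."""
    return stage >= 3
```

For instance, a biopsy graded steatosis 2, lobular inflammation 1 and ballooning 1 yields an activity score of 4, and a fibrosis stage reported as "1b" is analysed as stage 1.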

Non-invasive diagnosis of liver cirrhosis

If the following criteria were present, patients were staged as histological F4 fibrosis without determination of histological grading: liver imaging consistent with liver cirrhosis (e.g. nodular hepatic contour, changes in volume distribution indicating portal hypertension in the absence of portal vein thrombosis, secondary phenomena of portal hypertension such as splenomegaly, enlarged caudate lobe and left lobe lateral segment, regenerative nodules) together with clinical and laboratory signs of portal hypertension/cirrhosis (e.g. low platelets, albumin and prothrombin time, esophageal varices)21,23.

Liver stiffness measurement

Vibration-controlled transient elastography (FibroScan, Echosens, Paris, France) was performed in all patients under fasting conditions by experienced operators who were blinded to all clinical patient data. At least 10 valid measurements were performed, and the median value of these measurements was reported in kPa. In accordance with the manufacturer’s protocol, patients were first scanned using the M probe and, if indicated by the equipment, re-scanned using the XL probe. Sensitivity, specificity, and positive and negative predictive values were calculated using published cut-off values24.

Non-invasive fibrosis tests

The fibrosis-4 index (FIB-4)7 and NAFLD fibrosis score6 were calculated for each patient. Sensitivity, specificity, and positive and negative predictive values were calculated using published cut-off values8.
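For orientation, both scores can be computed from routine parameters with the formulas from their original publications. The sketch below uses hypothetical function and parameter names. The cut-offs in the usage note (1.30 and 2.67 for FIB-4) are commonly cited values and are an assumption here; the values actually applied in this study are those of the cited guideline.

```python
import math


def fib4_index(age_years: float, ast_u_l: float, alt_u_l: float, platelets_10e9_l: float) -> float:
    """FIB-4 = (age x AST) / (platelet count x sqrt(ALT))."""
    return (age_years * ast_u_l) / (platelets_10e9_l * math.sqrt(alt_u_l))


def nafld_fibrosis_score(
    age_years: float,
    bmi_kg_m2: float,
    ifg_or_diabetes: bool,
    ast_u_l: float,
    alt_u_l: float,
    platelets_10e9_l: float,
    albumin_g_dl: float,
) -> float:
    """NFS as a linear combination of age, BMI, glycaemic status, AST/ALT, platelets, albumin."""
    return (
        -1.675
        + 0.037 * age_years
        + 0.094 * bmi_kg_m2
        + 1.13 * (1 if ifg_or_diabetes else 0)
        + 0.99 * (ast_u_l / alt_u_l)
        - 0.013 * platelets_10e9_l
        - 0.66 * albumin_g_dl
    )


def classify(score: float, lower_cutoff: float, upper_cutoff: float) -> str:
    """Two-threshold rule: below the lower cut-off is 'low', above the upper is 'high'."""
    if score < lower_cutoff:
        return "low"
    if score > upper_cutoff:
        return "high"
    return "intermediate"
```

With two thresholds, every score falls into a low, intermediate or high category. For example, `classify(fib4_index(50, 40, 36, 200), 1.30, 2.67)` places a FIB-4 of about 1.67 in the intermediate range, where the score alone neither rules out nor rules in advanced fibrosis.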

Gut bacterial sequencing

The DNA was isolated using the RNeasy Power Microbiome Kit (Qiagen, Hilden, Germany). Seven of the 9 variable bacterial 16S rRNA gene regions (pool 1: V2, V4 and V8; pool 2: V3, V6/7 and V9) were amplified with the Ion 16S Metagenomics Kit (Thermo Fisher Scientific, Waltham, USA) utilizing two primer pools (An integrated research solution for bacterial identification using 16S rRNA sequencing on the Ion PGM System with Ion Reporter Software, https://www.thermofisher.com/content/dam/LifeTech/Documents/PDFs/Ion-16S-Metagenomics-Kit-Software-Application-ote.pdf). Amplicons were pooled and cleaned using the NucleoMag NGS Clean-up (Macherey-Nagel, Düren, Germany). The Qubit system was used to determine amplicon concentration, and the library was prepared with the Ion Plus Fragment Library Kit (Thermo Fisher Scientific, Waltham, USA). For template preparation, the amplicon concentration was diluted to 30 ng/mL. The Ion Chef Kit and the Ion Chef system (both Thermo Fisher Scientific, Waltham, USA) were used to enrich and prepare the template-positive Ion Sphere Particles (ISP). The amplicon library was sequenced using the Ion Torrent S5 system (pH-dependent, Thermo Fisher Scientific, Waltham, USA). The amplicon sequences were clustered into operational taxonomic units (OTUs) before taxonomic alignment with the MicroSEQ 16S-rDNA Reference Library v2013.1 (Thermo Fisher Scientific, Waltham, USA) and Greengenes v13.5 databases. A similarity threshold of 97% was used for genus-level assignment and 99% for species-level assignment. Data files were assigned by the Ion Reporter metagenomics 16S w1.1 workflow (Thermo Fisher Scientific, Waltham, USA). The raw data were processed using the programming language R, version 3.5.1.

Accession numbers sequence data

Sequence data were registered at NCBI under BioProject PRJNA540738. Sequence reads are available at NCBI under the following BioSample IDs: SAMN11554417-SAMN11554433, SAMN11554446, SAMN11554451-SAMN11554484 and SAMN13895357-SAMN13895400.

Statistical analysis

Results are expressed as median with interquartile range in parentheses for continuous variables and as number and percentage for categorical variables. A two-sided P value ≤0.05 was considered statistically significant. Clinical characteristics were compared between groups using the Kruskal-Wallis test with Dunn’s post-hoc test for continuous variables and Fisher’s exact test for categorical variables, each followed by false discovery rate (FDR) procedures to correct for multiple comparisons.
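The text does not state which FDR procedure was applied. As an illustration only, the sketch below assumes the widely used Benjamini-Hochberg step-up procedure; the function name is hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P values in the original input order.

    Assumption: the study's "FDR procedure" is taken to be Benjamini-Hochberg.
    """
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted
```

Each raw P value is scaled by the number of tests divided by its rank, and the running minimum keeps the adjusted values monotone, so an adjusted value never exceeds that of a larger raw P value.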

Following Loomba et al.15, we used a Random Forest machine learning algorithm to identify taxa and clinical variables that predict advanced fibrosis. To reduce noise, only taxa present in at least 66% of all samples and with a mean relative abundance >10⁻⁴ were considered as input features. The features yielding the highest accuracy were selected by the recursive feature elimination algorithm. The dataset was randomly split into training and test datasets 300 times. We further trained a Random Forest model including only overlapping taxa and the same clinical variables as implemented by Loomba et al.15. Only 16 of the 37 species identified by Loomba et al. were also detected in our cohort. For the remaining unresolved species, we included all other species within the respective genus. For example, for “Blautia sp. KLE 1732” and “Blautia sp. CAG:37”, we included Blautia coccoides, Blautia faecis, Blautia glucerasea, Blautia hansenii, Blautia hydrogenotrophica, Blautia luti, Blautia obeum, Blautia producta, Blautia stercoris, Blautia wexlerae, Ruminococcus gnavus, Ruminococcus torques and Blautia unknown species. This resulted in 136 taxa. To increase the diagnostic accuracy and to reduce noise in the model, we used Random Forest feature elimination to determine the top 37 taxa out of these 136 features together with age, Shannon diversity index, gender and BMI.
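The noise filter and the Shannon diversity index used as model inputs can be sketched as follows. This is an illustration under stated assumptions, not the study's R code: "present" is taken to mean a relative abundance above zero, the natural logarithm is used for the Shannon index, and all names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def filter_taxa(
    abundances: Dict[str, Sequence[float]],
    min_prevalence: float = 0.66,
    min_mean_abundance: float = 1e-4,
) -> List[str]:
    """Keep taxa present in >= 66% of samples with mean relative abundance > 1e-4.

    `abundances` maps each taxon to its relative abundance in every sample.
    """
    kept = []
    for taxon, values in abundances.items():
        prevalence = sum(1 for v in values if v > 0) / len(values)
        mean_abundance = sum(values) / len(values)
        if prevalence >= min_prevalence and mean_abundance > min_mean_abundance:
            kept.append(taxon)
    return kept


def shannon_index(counts_or_proportions: Sequence[float]) -> float:
    """Shannon diversity H = -sum(p_i * ln(p_i)) over taxa with p_i > 0."""
    total = sum(counts_or_proportions)
    proportions = [v / total for v in counts_or_proportions if v > 0]
    return -sum(p * math.log(p) for p in proportions)
```

A taxon found in only one of three samples is dropped by the prevalence criterion, and a taxon found in every sample at a relative abundance of 10⁻⁵ is dropped by the abundance criterion. A sample split evenly between two taxa has a Shannon index of ln 2.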

Receiver operating characteristic (ROC) analysis with calculation of the area under the curve (AUC) was performed to compare all non-invasive approaches. For the clinical scores and transient elastography, we calculated sensitivity, specificity, and positive and negative predictive values using published cut-offs8,24. Statistical analysis was performed using R statistical software, version 3.5.1 (2018, R Foundation for Statistical Computing). This report follows the Standards for Reporting Diagnostic accuracy studies (STARD) checklist25.
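The diagnostic accuracy measures named above can be illustrated with a short sketch. The AUC is computed here through its rank interpretation (the probability that a randomly chosen patient with advanced fibrosis scores higher than one without), which is one of several equivalent ways to obtain it and not necessarily the routine used in the study. Function names are hypothetical, and labels are assumed to be 1 for F3-F4 and 0 for F0-F2.

```python
from typing import Dict, Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def diagnostic_metrics(predicted: Sequence[int], actual: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity, PPV and NPV from binary predictions and true labels."""
    tp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    fp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 0)
    tn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 0)
    fn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 1)

    def ratio(numerator: int, denominator: int) -> float:
        # Undefined ratios (empty denominator) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }
```

Unlike sensitivity and specificity, the predictive values depend on the proportion of patients with advanced fibrosis in the cohort, so they do not transfer directly to populations with a different prevalence.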

Results

A total of 65 patients were staged as having no to significant fibrosis (F0-F2) and compared with 31 patients with advanced fibrosis (F3-F4), of whom 13 were staged as F4 based on characteristic clinical findings (see Methods section). Patients with advanced fibrosis were older, had a higher BMI and waist circumference, suffered more frequently from type 2 diabetes, arterial hypertension and metabolic syndrome, differed significantly in a variety of laboratory parameters and used proton pump inhibitors more frequently on a daily basis (Table 1). Patients with cirrhosis diagnosed based on characteristic clinical findings had significantly higher bilirubin (P = 0.006) and international normalized ratio (INR) (P = 0.04) levels, lower platelet counts (P = 0.003), and a higher NAFLD fibrosis score (P = 0.023) and FIB-4 index (P = 0.028) than patients with advanced fibrosis detected at liver biopsy (Supplementary Table 1).

Table 1 Characteristics of the study cohort.

Values are presented as median and interquartile range (IQR) in brackets. 65 NAFLD patients were staged as F0-F2 fibrosis based on their liver biopsy result (“biopsy-proven F0-F2”), 18 NAFLD patients were staged as F3-F4 fibrosis based on their liver biopsy result (“biopsy-proven F3-F4”) and 13 patients were staged as NAFLD-cirrhosis based on characteristic findings on ultrasound and/or magnetic resonance imaging together with clinical and laboratory findings (“non-invasive F4”) (see Methods section for details). Groups were compared using the Kruskal-Wallis test with Dunn’s post-hoc test for continuous variables and Fisher’s exact test for categorical variables, each followed by false discovery rate (FDR) procedures to correct for multiple comparisons. Bold font indicates significance (P value < 0.05). Post-hoc P values for significant variables are reported in Supplementary Table 1. The number of missing values within the overall cohort is indicated in the third column (“N/A”). ALT, alanine aminotransferase; AST, aspartate aminotransferase; BMI, body mass index; GGT, gamma-glutamyl-transferase; HbA1c, glycated hemoglobin; INR, international normalized ratio; HDL, high-density lipoprotein; kPa, kilopascal; LDL, low-density lipoprotein.

All liver histology features are shown in Table 2.

Table 2 Liver histology features of the cohort.

Random Forest models to predict advanced fibrosis

Our Random Forest model, including all laboratory parameters from Table 1 together with age, gender, BMI, type 2 diabetes, arterial hypertension, metabolic syndrome, waist circumference and all gut bacterial taxa that met the input feature criteria, identified Ruminococcaceae (family), Streptococcaceae (family) and Sutterellaceae (family) as gut bacterial predictors of advanced fibrosis in NAFLD. The strongest predictors, however, were clinical features such as fasting glucose, platelet count and age. This Random Forest model, based on clinical variables and the mentioned gut microbial taxa, achieved an AUC of 0.87 (95% CI 0.865–0.874, Fig. 1a).

Figure 1

Comparison of Random Forest models with simple non-invasive tools to predict advanced fibrosis in NAFLD. (a) Area under the curve (AUC) for our Random Forest model based on 14 features (right panel) that were identified by Random Forest feature elimination. Light grey lines represent the 300 training runs; the black line and AUC represent the median over these. The right panel shows the feature importance based on mean decrease in Gini index. All bacterial taxa shown are at the family level. (b) AUC and mean decrease in Gini index for the validation approximation of the Random Forest model by Loomba et al. Only 16 of the 37 species identified by Loomba et al. were also detected in our cohort. For the remaining unresolved species, we included all other species within the respective genus (see Methods section), which resulted in 136 taxa. To increase the diagnostic accuracy, we used Random Forest feature elimination to determine the top 37 taxa out of these 136 features together with age, Shannon diversity, gender and BMI. (c) Diagnostic performance of the FIB-4 index, NAFLD fibrosis score and transient elastography. In a-c, 83 biopsy-proven NAFLD patients and 13 NAFLD patients diagnosed with liver cirrhosis based on clinical characteristics and characteristic findings on liver imaging (see criteria in Methods section) were included. 65 patients were staged as F0-F2 and 31 as F3-F4. AST, aspartate aminotransferase; GGT, gamma-glutamyl-transferase; INR, international normalized ratio; LDL, low-density lipoprotein; FIB-4, fibrosis-4 index.

We further compared these results with a model limited to clinical variables. In this model, fasting glucose, platelet counts, waist circumference, age, gamma-glutamyl-transferase (GGT), prothrombin time and aspartate aminotransferase (AST), followed by INR, albumin and type 2 diabetes, were identified by Random Forest feature elimination as predictors of advanced fibrosis. The resulting AUC was 0.85 (95% CI 0.849–0.858, Supplementary Fig. 1a), indicating that the gut microbiota component of the previous model added little to the prediction of advanced fibrosis compared with the model that included only clinical variables.

Sixteen gut bacterial taxa overlapped with the Random Forest model published by Loomba et al.15. To cover the remaining 21 species, we included, for each undetected species, all other species within the corresponding genus (see Methods section). In this dataset, we used Random Forest feature elimination to select only the top 37 taxa together with Shannon diversity index, age, gender and BMI (according to Loomba et al.15). This model achieved an AUC of 0.71 (95% CI 0.701–0.714) for the prediction of advanced fibrosis in NAFLD (Fig. 1b).

Comparison of random forest models to conventional non-invasive fibrosis tests

We further compared gut microbiota-based approaches with well-known simple non-invasive fibrosis scores based on clinical and laboratory parameters, as well as with transient elastography, in our cohort. The NAFLD fibrosis score and the FIB-4 index achieved comparable results, with an AUC of 0.86 (95% CI 0.75–0.94) and 0.85 (95% CI 0.78–0.94), respectively (Fig. 1c).

Transient elastography had the highest diagnostic performance with an AUC of 0.93 (95% CI 0.87–0.99, Fig. 1c). We used published cut-off values to further determine the diagnostic sensitivity, specificity, and positive and negative predictive values of the NFS and FIB-4 (Table 3). When classifications were compared with the fibrosis stage confirmed by liver biopsy, both non-invasive fibrosis scores placed a considerable number of patients in the intermediate range. In contrast, this was observed for only two patients when using transient elastography. Transient elastography had the highest negative predictive value (94.6%), whereas the highest positive predictive value was observed for the FIB-4 index (85.7%, Table 3).

Table 3 Diagnostic performance of non-invasive fibrosis tests.

83 biopsy-proven NAFLD patients and 13 NAFLD patients diagnosed with liver cirrhosis based on characteristic clinical findings (see criteria in Methods section) were included. Owing to missing values, 91 patients were included in the analysis for transient elastography, 95 for the FIB-4 index and 95 for the NAFLD fibrosis score.

Overall, transient elastography, an imaging-based method, predicted advanced fibrosis better than Random Forest classifier models based on the gut bacterial microbiota together with clinical data or on clinical features alone.

Discussion

In this study, we directly compared the performance of widely used non-invasive tools with microbiota-based machine learning approaches for the detection of advanced fibrosis in a well-described cohort of biopsy-proven NAFLD patients. Transient elastography, which is a fast and convenient method, performed best, with an AUC of 0.93 for the prediction of advanced fibrosis.

Alterations in the gut microbiota have been linked to NAFLD, but no consistent disease-specific gut microbiota signature has been established across several studies including patients with different geographical, ethnic and dietary backgrounds9,10,11,12,13,14,15,16,17,18,19. Further, the use of different sequencing methods limits the comparability of individual studies. While most studies, including our own, used 16S rRNA gene sequencing methods, the study by Loomba et al. sequenced the complete metagenome of gut microbial communities. The model by Loomba et al. includes bacterial taxa at species level, and several of the included species could not be resolved in our study cohort. The analysis of 16S rRNA gene sequencing data involves clustering of the obtained sequences into OTUs. This approach is well established, widely used and efficient. However, several challenges remain in terms of accurate and precise taxonomic quantification at species level26. On the other hand, assembling genomes from whole genome sequencing can be more informative and precise at species level but may fail to identify the taxonomic origin of a gene of interest or to produce accurate and unbiased estimates of gene family abundances26.

Besides differences in the technical approaches, studied patient populations differed in several aspects. NAFLD patients in the study by Loomba et al. were enrolled in the Southwestern United States and included 34% NAFLD patients with Hispanic ethnicity while our study, performed in Germany, includes almost exclusively patients with a white ethnic background. Region-specific variations in lifestyle, ethnicity, nutrition, medication, genetics and environmental conditions play a role in shaping the gut microbiome. While some bacterial taxa are found to be common in all populations from different countries, abundances of multiple taxa have been found to vary substantially across populations27.

Taken together, these aspects might explain why we did not identify the same gut bacterial taxa at species level in our patient cohort and why several taxa identified by Loomba et al. were not similarly associated with advanced fibrosis in our study cohort. For a conclusive and more accurate validation, combining both datasets within one Random Forest algorithm using the same next-generation sequencing (NGS) method seems essential to distinguish differences in gut microbiome composition caused by different NGS methods from those caused by variations in patient populations.

Using our own Random Forest algorithm, we achieved good diagnostic accuracy. This algorithm was, however, still inferior to transient elastography. Compared with a Random Forest model consisting only of clinical features, adding bacterial taxa did not add major value to the model performance. This indicates that clinical features might still be more consistently associated with progression of liver disease than the relatively inconsistent findings from studies investigating gut microbiota alterations in NAFLD patients.

In conclusion, among the modalities tested for the non-invasive prediction of advanced fibrosis in NAFLD patients, transient elastography, an easily applicable ultrasound-based method, performed best, with excellent diagnostic performance compared with other simple non-invasive scores and gut microbiota-based approaches in our cohort. Once NGS becomes easier to apply and standards for NGS-based methods are better established, assessment of the gut microbiome might help to identify NAFLD patients with ongoing disease progression, with the aim of preventing further liver-related complications.