Nonalcoholic fatty liver disease is associated with dysbiosis independent of body mass index and insulin resistance

This study aimed to determine if there is an association between dysbiosis and nonalcoholic fatty liver disease (NAFLD) independent of obesity and insulin resistance (IR). This is a prospective cross-sectional study assessing the intestinal microbiome (IM) of 39 adults with biopsy-proven NAFLD (15 simple steatosis [SS]; 24 nonalcoholic steatohepatitis [NASH]) and 28 healthy controls (HC). IM composition (llumina MiSeq Platform) in NAFLD patients compared to HC were identified by two statistical methods (Metastats, Wilcoxon). Selected taxa was validated using quantitative PCR (qPCR). Metabolites in feces and serum were also analyzed. In NAFLD, 8 operational taxonomic units, 6 genera, 6 families and 2 phyla (Bacteroidetes, Firmicutes) were less abundant and; 1 genus (Lactobacillus) and 1 family (Lactobacillaceae) were more abundant compared to HC. Lower abundance in both NASH and SS patients compared to HC were confirmed by qPCR for Ruminococcus, Faecalibacterium prausnitzii and Coprococcus. No difference was found between NASH and SS. This lower abundance in NAFLD (NASH+SS) was independent of BMI and IR. NAFLD patients had higher concentrations of fecal propionate and isobutyric acid and serum 2-hydroxybutyrate and L-lactic acid. These findings suggest a potential role for a specific IM community and functional profile in the pathogenesis of NAFLD.


Measurements.
Patients provided one stool sample and one fasting blood sample, underwent anthropometric measurements (height, weight, BMI) and completed a 7-day food record, a 7-day activity log and an environmental questionnaire before the liver biopsy. Healthy adults undergoing assessment for living liver donation were approached during their first screening appointment. Upon consent, subjects completed the same tests. Histology was obtained during a pre-donation biopsy (performed to verify the healthy state of the liver) or during hepatectomy. The study participants' demographics, smoking, alcohol consumption history, medications and supplement use were reviewed and collected.
The food record included all food and beverages consumed over seven days using the 2D Food Portion Visual Chart (Nutrition Consulting Enterprises, Framingham, MA) to estimate portion sizes. Macro-and micronutrient intakes were calculated using Food Processor Diet and Nutrition Analysis Software (Version 7, ESHA Research, Salem, OR). Physical activity logs were recorded for seven days concurrent with the food records. Participants were asked to list any activity, duration, and intensity level (mild, moderate, strenuous, and very strenuous). Daily physical activity units were calculated: 1 unit = 30 minutes mild, 20 minutes moderate, 10 minutes strenuous, or 5 minutes very strenuous activity 20 .
Liver biopsies were taken percutaneously (needle biopsy) for NAFLD patients and intra-operatively (wedge biopsy) for HC. In some HC a pre-donation needle biopsy was used. A portion of each biopsy was fixed in 10% formalin, embedded in paraffin wax, sectioned, and stained with hematoxylin and eosin for morphologic evaluation and Prussian Blue stain to rule out iron loading. Steatosis, inflammation, and fibrosis were assessed using the Brunt system 21 . Disease severity was additionally evaluated using the NAFLD Activity Score (NAS) 22 . The pathologist was blinded to the study.
Blood was drawn in a fasting state. Measures of glucose metabolism included plasma glucose, insulin, hemoglobin A1c (HbA1c), and homeostasis model assessment estimated IR (HOMA-IR) 23 . Liver enzymes and lipid profile (total cholesterol, low density lipoprotein (LDL), high density lipoprotein (HDL) and triglycerides (TG)) were measured at the UHN Laboratory Medicine Program using the Architect c8000 system (Abbott Laboratories, Abbott Park, Illinois). LDL was calculated from total cholesterol -HDL. Fasting plasma glucose and plasma insulin were measured by the enzymatic hexokinase method and radioimmunoassay, respectively 24 .
Each patient and HC collected one stool sample, which was frozen immediately in their home freezer (−20 °C). Within 24 hours, they brought the frozen sample in an insulated bag with cooling elements to UHN, where it was stored at −80 degree centigrade. Subsequently, stool samples were transferred into a sterile masticator bag, thawed and homogenized using a masticator blender (IUL, S.A., Barcelona, Spain). The samples were aliquoted, immediately placed on dry ice, and then transferred to −80 °C for storage until DNA extraction and metabolite analysis.
For IM analysis, bacterial DNA was extracted using the E.Z.N.A. TM Stool DNA Isolation Kit (Omega Bio-Tek, Doraville, GA, USA) as previously described 9 . Fecal DNA samples were stored at −80 °C until being amplified by PCR using the Earth Microbiome V4 primer set 25 , with the addition of combinatorial in-line barcodes so that all the samples could be sequenced in the same sequencing run 26 . The amplified DNA was then sequenced on the Illumina MiSeq platform with paired end 220 nucleotide reads, producing 25 million reads in total. Reads were overlapped with PANDAseq assembler 27 , clustered into Operational Taxonomic Units (OTUs) using UCLUST 28 , and annotated with the SILVA database 29 using mothur 30 , producing a table of counts per OTU per sample. 7.5 million of the reads were successfully overlapped and annotated into 532 OTUs. A generalized workflow for processing 16S rRNA gene sequencing reads is available at https://github.com/ggloor/miseq_bin. Two sequencing runs were conducted with the results combined into one data matrix which was used for the analysis. The resulting data matrix was filtered at 1% OTU relative abundance (OTUs had to be at least 1% abundant in at least 1 sample) and the remaining OTUs (161) were included into the analysis. The lowest read count per sample after OTU filtering was 15,790, and the highest read count per sample was 210,867. The sequencing and pre-processing data for analysis was performed at the Microbiota Profiling Service, Department of Biochemistry, University of Western Ontario, London, Ontario, Canada.
To further evaluate the sequencing data, the abundance of selected taxa was estimated in fecal samples using qPCR. Primers and probe sequences used in this study are listed in Supplementary Table 2. These included: total bacteria, Bacteroidetes, Alistipes, Coprococcus, Ruminococcus, Lactobacillus and F. prausnitzii. The runs were performed in triplicates using 50 ng of fecal DNA in the 7900HT thermocycler (Applied Biosystems, Foster City, CA) as previously described 9 . The number of cells for each bacterial group in fecal samples was calculated from standard curves and expressed as colony forming unit per gram (CFU/g) of wet feces, normalized to total counts (relative abundance).
Serum metabolites, including ethanol, were measured at the Metabolomic Innovation Centre (University of Alberta, Edmonton, AB, Canada) using nuclear magnetic resonance (NMR) spectrometry on a 500 MHz Inova (Varian Inc., Palo Alto, CA) spectrometer (Varian Inc., Palo Alto, CA). For stools, NMR spectroscopy was also used from the same center to identify eight metabolites of interest (acetic acid, butyric acid, formic acid, isobutyric acid, isovalerate, L-lactic acid, propionate, succinate). For the measurement, fecal samples were prepared by mixing 20 mg of frozen fecal material with 1 mL of saline phosphate buffer in deuterium oxide, followed by centrifugation (18,000 × g, 1 min). Fecal supernatants were filtered through 0.2 μm membrane filters 31 .

Statistical Analysis.
Descriptive summaries of all measurements were calculated: means and standard deviations (SD) for normal variables, medians and first and third quartiles for skewed variables, and proportions for categorical variables. Plots were used as appropriate to examine the data. Bivariate relationships were assessed using Pearson's or Spearman's correlation coefficients, Chi-square and Fisher exact tests where appropriate. Comparisons across diagnosis groups were performed using one-way ANOVA followed by the t-test (normally distributed data), data were log-transformed to normalize distribution if necessary, or using the Kruskal-Wallis test followed by Wilcoxon rank-sums test (not normally distributed data). All tests were two-sided and performed at the 5% significance (alpha) level. Bonferroni's correction method was used to account for multiple comparisons between diagnostic groups.
For next generation sequencing, differences in overall IM characteristics of diagnostic groups were assessed in QIIME 32 . Principal coordinate analysis based on weighted UniFrac distance matrix 33 was performed to visualize and examine differences in overall microbial community compositions among the three groups. For differential abundance analysis read counts were quantile normalized 34 . In order to ensure robustness of our conclusions to statistical assumptions we applied 3 different analysis methods and made inferences if the results of all analyses were consistent. First, zero-inflated Gaussian models to perform 3-group comparisons 35 . Then pairwise differences between groups were assessed by two non-parametric methods: Metastats 35,36 and Wilcoxon rank sum test (details in Supplemental Materials). Correction for multiple comparisons was done using Benjamini-Hochberg false discovery rate 37 , and q-value < 0.05 was considered statistically significant. In some cases, following a recent publication 15 we were less stringent in controlling for multiple comparisons, and considered higher q-value thresholds (but less than 0.1).
For qPCR data, due to non-normality and heteroscedasticity of distribution for most groups of bacteria, we used non-parametric methods: Kruskal-Wallis and Wilcoxon rank sum tests for between group comparisons, and Spearman correlation coefficients and partial Spearman correlation coefficients to examine relationship between relative abundances and other variables (diagnosis (treated as ordinal variable), BMI, IR, metabolites).
Analysis was performed using tools SAS 9.4, R 3.2.5 and QIIME software.
Data Availability. The datasets generated during and analysed during the current study are not publicly available due to privacy restrictions but some blinded data may be made available from the corresponding author on reasonable request.

Results
Patient characteristics. The study included 15 patients with biopsy-proven simple steatosis (SS), 24 with nonalcoholic steatohepatitis (NASH) and 28 healthy liver donors from the living donor transplant program as HC. Patient demography and clinical characteristics are presented in Table 1. The HC group was younger compared to both SS and NASH with significantly lower median BMI compared to NASH. There were no significant differences in gender distribution. As expected, liver transaminases were progressively increased from HC to SS to NASH. Four NASH and 1 SS patients who were taking anti-diabetic drugs (none were on insulin therapy) as well as 3 patients with HOMA-IR >15 (possibly a consequence of non-fasting before the blood test), were excluded from analyses related to diabetes parameters (HbA1c, HOMA-IR). SS and NASH patients had significantly higher HbA1C compared to HC, and NASH had higher HOMA-IR compared to both SS and HC. For the liver histology, NAFLD Activity Score for SS patients ranged from 1 to 4 (median 1), and for NASH patients from 3 to 8 (median 4.5). Details on liver histology (Supplementary Table 3 Tables 6 and 7) are reported in the supplementary section. Overall, there were no significant differences in level of activity and diet composition. There was a higher proportion of HC born in Canada but ethnicity was similar between groups for those who filled out the questionnaire.
Intestinal Microbiome. Next generation sequencing data. 161 OTUs passed a 1% filter and were retained for analysis. The filtered OTUs were aggregated (after normalization) into higher taxonomic levels: 8 phyla, 25 families and 44 genera.
The analysis of overall community structure and composition, including Simpson diversity metric and principal coordinate analysis did not reveal differences between the groups. Simpson diversity was not significantly different between the 3 groups (p-value 0.5) (Supplementary Figure 1). There was no clear separation between groups in weighted UniFrac principal coordinate analysis plot (Supplementary Figure 2).
To investigate differences between diagnostic groups in differential abundance of individual taxa, we first fit ANOVA ZIG model comparing 3 groups on OTU and genus levels, followed by pairwise comparisons between each diagnostic category (Supplementary Data). Size and direction of differences for HC vs SS and HC vs NASH comparisons were similar in both OTU and genus levels, and differentially abundant groups were similar in both of these comparisons. At the same time, the differences between SS and NASH groups were never significant and size of differences was small, suggesting similar microbiome characteristics for NASH and SS in our study. Based on this and taking into account low sample size in SS group, we combined NASH and SS into one NAFLD group for further analysis.
qPCR Data. Based on 16S data analysis results and similar taxa previously reported 11,12,14,15 , the following taxa were selected to be quantified by qPCR: Bacteroidetes which included Parabacteroides and Bacteroides genera found on sequencing as well as F. prausnitzii, Alistipes, Coprococcus, Ruminococcus and Lactobacillus. The relative abundance of the targeted bacterial groups is presented in Table 3.
The following groups were less abundant in NASH and SS patients compared to HC: Ruminococcus, F. prausnitzii and Coprococcus (Table 3). Even though the median abundance for all taxa except Coprococcus were lower in NASH than in SS, the differences between the two patient groups did not reach statistical significance. There were no statistically significant differences between diagnostic groups for Bacteroidetes, Alistipes and Lactobacillus.
To investigate whether the association between taxa and NAFLD was due to confounding factors like BMI and IR, we calculated two sets of partial Spearman correlations: (1) between these factors and bacterial abundances, controlling for diagnosis and; (2) between bacterial abundances and diagnosis controlling for BMI and HOMA-IR (Table 4). For the taxa that differed between HC and NAFLD, partial correlations between diagnosis status and abundance were highly significant after controlling for BMI or IR. After controlling for diagnosis, there was no evidence of relationship between BMI and bacterial abundances, and for HOMA-IR only weak evidence of a positive relationship with F. prausnitzii (Fig. 3). Therefore, our analysis suggests that NALFD is associated with reduced abundance of Ruminococcus, Coprococcus and F. prausnitzii independent of BMI and IR.
We then tested whether there was a relationship between liver histology (% steatosis, lobular inflammation, ballooning, fibrosis, NAFLD activity score, as seen in Supplementary Table 3) and relative abundances of quantified taxa within the NAFLD group (NASH+SS, n = 38). The results were negative except for a weak negative relationship between Ruminococcus and lobular inflammation (ρ = −0.33, p-value 0.04) (Supplementary Table 8).
Considering that these p-values were not controlled for multiple comparisons, this result can be due to Type 1 error.
Bacterial Products. NAFLD patients had significantly higher concentrations of fecal propionate and isobutyric acid as well as serum 2-hydroxybutyrate and L-lactic acid ( Table 5). The ratio of the short-chain fatty acids (SCFA) acetate:propionate:butyrate in our HC and NAFLD patients was approximately 62:23:15 and 61.5:22.5:16, respectively, which is similar to commonly quoted 'normal' ratios of 60:20:20 and 60:25:15 38 . Serum ethanol was not different between groups. The abundance of Ruminococcus, F. prausnitzii and Coprococcus was not correlated with the concentration of the measured bacterial products (Supplementary Table 9).

Discussion
We demonstrated, in a well-characterized adult population, that NAFLD was associated with reduced abundance of several bacterial taxa (Ruminococcus, Coprococcus and F. prausnitzii) independent of BMI and IR. Additionally, selected bacterial products related to fermentation (SCFA) were higher in NAFLD patients compared to HC. Our results are consistent with previous studies, comparing NAFLD and HC [11][12][13][14] and investigating the association between IM and NAFLD severity 15 . In all of these studies, comparisons were cross-sectional, except for Wong et al. who also compared IM of NASH patients before and after probiotic intervention. All reported evidence of dysbiosis in NAFLD patients when compared to HC. Only one study 14 used RT-PCR to validate the results of the next-generation sequencing analyses but only with one genus (Lactobacillus). A pediatric study 11 is also the only one that controlled for confounding factors when assessing the association between IM and NAFLD.
The patient groups in this study differed as expected based on the disease state. SS and NASH patients were older and NASH had a significantly higher BMI than HC. Liver enzymes and insulin resistance measures were also higher in the NAFLD groups. All of these findings were expected given the typical NAFLD population 39 . In regard to age, previous studies have identified changes in IM over the lifespan, however, most changes occur in early childhood and elderly adults >70 years old [40][41][42] . Research into IM changes in the elderly has often been confounded by dietary changes, medication use, and declining overall health. Therefore, a cut-off age for IM change is undeterminable 43 . Our total population, with a maximum age of 68 years is therefore unlikely to have experienced a confounding effect of age on IM. In regard to BMI, the median BMI of the HC was in the overweight range 26.6 (19.5, 35.3) [median (min, max)]. However, the groups were characterized by liver histology, and HC were confirmed to have a normal liver. This is a strength in our study as, to our knowledge, no other studies published on NAFLD had a HC group with confirmed normal liver biopsies. Another strength is the HC group with a median BMI in the overweight range. This BMI range in the HC group is reflective of the current Canadian population 44 . The smaller difference in BMI between NAFLD and HC reduces potential confounding effects on IM when comparing the groups.
The lower abundance of Coprococcus or Faecalibacterium in NAFLD was consistent with other studies 11,13,14 . Ruminococcus was previously reported only by Zhu et al., but the family Ruminococcaceae was consistently reported as less abundant in NAFLD 12,14 . In our study, liver biopsy was used to diagnose NAFLD while some of these previous studies 12,14 relied mostly on imaging. Also, their healthy controls had no liver biopsy, were much leaner compared to NAFLD [12][13][14] and differences in BMI were not considered in the analysis. This could have influenced the results.
Our analyses did not show any association between IM and NAFLD severity or correlation between specific taxa and histological parameters. This is contrary to the findings by Boursier et al. 15 , who reported higher abundance of Bacteroides in NASH compared to SS patients and a positive association between Ruminococcus and fibrosis severity, independent of metabolic factors (assessed only by the presence or absence of diabetes  Table 2. Taxa identified as differentially abundant in patients with NAFLD (SS or NASH) compared to HC by Metastats and Wilcoxon rank sum test. 1 Negative parameter denotes that taxon is more abundant in HC, positive -that it is more abundant in NAFLD. 2 Estimated in 3 rounds of permutations, 10,000 instances each. HC: healthy controls, SS: simple steatosis, NAFLD: nonalcoholic fatty liver disease, NASH: nonalcoholic steatohepatitis.   Table 3. Relative abundances of bacterial taxa quantified by quantitative PCR. Abundances were calculated as ratio of colony forming units (CFU) to total bacteria CFU in the sample. Same subscript denotes significant difference (after Bonferroni correction). HC: Healthy controls, SS: simple steatosis, NASH: nonalcoholic steatohepatitis. *For 1 NASH patient there was not enough DNA for Coprococcus, Alistipes and F. prausnitzii, so the sample size was reduced to n = 23. **Kruskal Wallis test. Same subscript letters denote significant differences.  comparisons were applied compared to Boursier's study, which did not correct for increased chances of type I error when testing the hypothesis of differential abundance for multiple taxa.
Although the sequencing data showed that the Bacteroidetes phylum and one OTU within the phylum were lower in NAFLD versus HC, this was not confirmed by qPCR. Using qPCR, we previously showed 9 a lower percentage of Bacteroidetes in NASH versus HC and SS. However, it is possible that since the qPCR primers targeted all Bacteroidetes, we may have missed sub-phylum differences if the differences were less pronounced in this study compared with results from Mouzaki et al. 9 . Others also reported conflicting results for this taxon. Boursier et al. found higher proportion of Bacteroidetes in NASH compared to SS 15 , Zhu et al. also found a higher proportion in pediatric NASH patients and those with obesity compared to lean controls 11 , whereas others 12,14 did not mention it among differentially abundant taxa in NAFLD.
We found lower abundance of F. prausnitzii, Coprococcus and Ruminococcus in NAFLD patients. F. prausnitzii is an anti-inflammatory commensal bacterium and its abundance is reduced in inflammatory bowel disease 45 and metabolic syndrome 46 . Coprococcus was also lower in NAFLD patients, similar to the report by Zhu et al. 11 . A lower abundance of Coprococcus has also been observed in other inflammatory conditions 47 . Therefore, it is conceivable that lower abundance can promote chronic inflammation that may contribute to NAFLD pathogenesis. Ruminococcus belong to the family Ruminococcaceae which are also fermenting anaerobes that lead to the production of SCFA 48 . Both Zhu et al. 11 and Raman et al. 12 found a lower abundance of Ruminococcaceae in NAFLD patients, however, not in this particular genus.
We expected to see differences in SCFA between groups. However, in feces, only propionate and isobutyric acid were higher in NAFLD compared to HC, whereas concentrations of butyrate, acetate, formate, and total SCFA were not different. There were no correlations between SCFA and bacterial abundances. The ratios of acetate:propionate:butyrate were similar to the normal values in both groups 38 . However, the amount and type of products can vary depending on species. For example, Ruminococcus genus contains species and strains which can be metabolically versatile [49][50][51] which means the amount and type of fermentation products vary according to species. Some can use different substrates like mucus or cellulose, resulting in production of acetate, ethanol, succinate, lactate and formate, but very little butyrate as end products of glucose metabolism 51 . We also found higher serum 2-hydroxybutyrate and L-lactic acid. However, it is difficult to compare our results to previous studies as human data on SCFA in NAFLD are scarce. Wong et al. found that NASH patients had a lower prevalence of butyrate producing Faecalibacterium (similar to our study), but they did not measure butyrate levels 13 . Raman et al. 12 , comparing obese NAFLD diagnosed on imaging versus lean HC subjects, measured fecal volatile compounds and also showed higher levels of SCFA with elevated fecal propionic, butyric and acetic acids in NAFLD. Similarly increased volatile compounds were observed in NAFLD patients 16 and higher total fecal SCFA, particularly propionate, was also found in overweight and obese adults compared to lean controls 52 . Higher SCFA from bacterial fermentation can lead to increased energy absorption of up to 150 kcal per day 53 . It is conceivable that the higher concentration of fecal propionate and isobutyric acid found in our NAFLD group reflect this phenomenon. Mechanisms other than IM could have also contributed to the higher level of SCFA in NAFLD versus HC, such as slow transit time 38 reported in obesity 54 which may have increased fermentation, or differences in absorption rate 55 . 2-hydroxybutyrate (alpha-hydroxybutyrate) is a metabolite formed during amino acid catabolism and glutathione anabolism 56 . Although this compound has been mentioned as an intermediary in the bacterial  metabolism of amino acids, it is not clear how much is actually produced and absorbed in the human colon and absorbed into the blood 57 . Oxidative stress in the liver that can be triggered by IR may have contributed to higher 2-hydroxybutyrate level by increasing the production of hepatic glutathione 56,58 . The strengths of our study were the precise characterization of HC and NAFLD patients through liver biopsy, and the documentation of dietary intake, physical activity, and environmental factors. We applied robust statistical methodology appropriate for analysis of this type of data, including normalization methods and statistical testing. Particularly, we used 2 types of statistical tests for sequencing data to ensure robustness of conclusions. We also applied multiple comparisons corrections whenever appropriate to avoid identifying false positive taxa and confirmed the results with qPCR. The limitation of our study is the relatively low sample size that does not allow for more multivariate analyses and may have prevented us from showing differences between IM of SS and NASH. However, the sample size was similar to other studies 11,13 that used liver biopsy to characterize NAFLD. The expanding study of the IM discovered numerous environmental factors that may influence the IM. Vitamin D deficiency, for instance, has recently been linked to dysbiosis and NAFLD 59 . In this study, vitamin D status was not examined, but the patient population was from a relatively small geographic region, and vitamin D intake did not differ between groups. Also, we cannot confirm causality owing to the observational design.
In summary, NAFLD had lower abundance of Ruminococcus, F. prausnitzii and Coprococcus independent of BMI and IR, and higher concentrations of select fecal and serum metabolites, which may suggest a specific IM community and functional profile in these patients. Future metagenomic research would allow for better characterization of this functional profile.