Metagenomic sequencing of bile from gallstone patients to identify different microbial community patterns and novel biliary bacteria

Despite the high worldwide prevalence of gallstone disease, the role of the biliary microbiota in gallstone pathogenesis remains obscure. Next-generation sequencing offers advantages for systematically understanding the human microbiota; however, there have been few such investigations of the biliary microbiome. Here, we performed whole-metagenome shotgun (WMS) sequencing and 16S rRNA sequencing on bile samples from 15 Chinese patients with gallstone disease. Microbial communities of most individuals were clustered into two types, according to the relative enrichment of different intestinal bacterial species. In the bile samples, oral cavity/respiratory tract inhabitants were more prevalent than intestinal inhabitants and existed in both community types. Unexpectedly, the two types were not associated with fever status or surgical history, and many bacteria were patient-specific. We identified 13 novel biliary bacteria based on WMS sequencing, as well as genes encoding putative proteins related to gallstone formation and bile resistance (e.g., β-glucuronidase and multidrug efflux pumps). Bile samples from gallstone patients had reduced microbial diversity compared to healthy faecal samples. Patient samples were enriched in pathways related to oxidative stress and flagellar assembly, whereas carbohydrate metabolic pathways showed varying behaviours. As the first biliary WMS survey, our study reveals the complexity and specificity of biliary microecology.


Results
Microbial community heterogeneity among patients. We performed unbiased WMS sequencing of total DNA extracted from 15 bile samples, obtaining an average of 7.29 Gbp per sample (range: 6.81-7.81 Gbp). WMS reads were filtered multiple times to remove host components, and then aligned to datasets containing 9,671 bacterial genomes from 3,608 species. Among these individuals, bacterial read percentages ranged from < 0.001% to 1.66% (Fig. 1a). We did not find apparent between-group bias of bacterial ratios; that is, whether patients were febrile and/or had a history of endoscopic retrograde cholangiopancreatography (ERCP) was not correlated with the bacterial fraction in bile samples.
Bile microbial community analysis revealed heterogeneity among individuals. A total of 173 bacterial species were identified, and hierarchical clustering at species level divided these individuals into two major clusters (I and II) and one minor cluster ( Fig. 1b; Supplementary Table S1). Clustering was not consistent with patient grouping. Interestingly, we found that oral cavity and respiratory tract inhabitants were more prevalent than intestinal inhabitants in the bile samples. Through literature mining, we defined 25 oral cavity/respiratory tract inhabitants, 11 intestinal inhabitants and 14 inhabitants of both sites among the 54 microbes observed in at least three patients with an abundance ≥ 0.1% (Supplementary Table S2). The 11 intestinal inhabitants were basically distributed in Cluster I (containing individuals A1, B1, B5, C1 and C3), which was highly enriched in E. coli (abundance range: 52.3-85.7%). Cluster I contained most of the intestinal microbes, whereas Cluster II (containing individuals A2-A5, B2, B6, C4 and C5) only contained two intestinal microbes. In particular, intestinal microbes such as Shigella spp. and Salmonella enterica were only detected in Cluster I and not found in Cluster II. Both major clusters contained 17 of the 25 oral cavity/respiratory tract inhabitants and 11 of the 14 microbes reported in both body sites. Samples in Cluster I consistently had higher bacterial read ratios (0.62-1.66%) than samples in Cluster II (0.0008-0.056%). Overall, the heterogeneity of the microbial communities can be attributed to the distribution of intestinal inhabitants, despite the presence of oral cavity/respiratory tract inhabitants.
Microbial species were frequently identified in only one (n = 77) or two individuals (n = 43) (Supplementary Table S1). High within-sample abundance was observed for some species, such as Raoultella ornithinolytica (15%) in A1, Pyramidobacter piscolens (3.39%) in C3 and Proteus mirabilis (0.6%) in C1. K. pneumoniae was the only species identified in almost all individuals (14/15 Abundances of the genera exhibited a high correlation (Pearson correlation coefficient = 0.91) (Fig. 2a,b). For samples with insufficient bacterially aligned WMS reads (i.e., A5 and B4), 16S sequencing had greater sensitivity and could identify more genera. Although hierarchical clustering was not performed for 16S sequencing, the distributions of overlapping genera were somewhat similar (Fig. 2a,b), especially for highly abundant genera. For example, Cluster I was still enriched with intestinal genera (e.g., Escherichia and Enterococcus). However, 16S sequencing exhibited greater intra-sample differences in abundance (a) Ratios of WMS sequencing reads after removing host background (hollow black-edged bars) and ratios of reads aligned to bacterial database (inner bars). All ratios were calculated by dividing by the total clean read number after quality control. Sample IDs are labelled on the x-axis, and ranked by their clustering relationships. Bacteria-aligned read ratios of samples from groups A, B and C are indicated by light blue, dark blue and green bars, respectively. (b) Hierarchical clustering of samples based on microbial community, distribution of bacterial origin and heatmap of species abundances. Individuals are denoted by coloured blocks, as in (a). Bacterial origins were classified as oral (referring to the oral cavity/respiratory tract), intestinal, both oral and intestinal, and other (environmental or unknown). For bacteria with ≥ 0.1% abundance in at least three individuals, the distribution of their origin in each individual is reflected by the histogram above the heatmap. The heatmap colour scale quantifies the log 10 relative abundance of species, from grey (none or low abundance) to dark red (high abundance).
(i.e., A5 and C4) than WMS sequencing, which could be attributed to the experimental biases of 16S sequencing.
Next, we compared the alpha and beta diversities of bile samples with those of healthy faecal samples (n = 319) from Human Microbiome Project (HMP) 17,18 (Fig. 2c,d). As measured by inverse Simpson index, alpha diversities of bile samples based on WMS and 16S sequencing were significantly lower than those of faecal samples (Wilcoxon rank-sum test, P < 10 −8 ). The difference in alpha diversity between bile WMS and 16S sequencing data (P = 0.029) might be due to differences in the sequencing strategy (Fig. 2c). On the contrary, there was no significant difference between the Bray-Curtis beta diversities  Fig. 1. Common genera were those with a relative abundance ≥ 1% in at least three samples. The genera names for (a,b) are shown on the y-axis of (a). (c-d) Distributions of log 2 alpha (c) and beta diversities (d). Orange and red boxplots denote distributions of WMS and 16S sequencing of bile samples respectively, accompanied with the distributions of faecal samples from HMP (purple boxplots). Log 2 alpha and beta diversities are presented as scatter plots in insets (orange, bile WMS; red, bile 16S), with scales identical to the corresponding boxplot y-axis. (e-f) Species and OTU counts by WMS (e) and 16S sequencing (f) with increasing sample number. At a given sample number, a maximum 100-time random sampling of these bile samples was performed to calculate species or OTU counts. Boxes represent the interquartile range (IQR) between first and third quartiles (25th and 75th percentiles, respectively). Lines inside denote the median, and whiskers denote the most extreme values within 1.5 times IQR from the first and third quartiles, respectively. Outlier values are represented as circles.
of bile by 16S sequencing and faecal samples (P = 0.1753). Beta diversities of bile by WMS sequencing were borderline-significantly higher than those of bile by 16S sequencing (P = 0.044) and of faecal samples (P = 0.021). This result may be due, at least in part, to the varied sensitivity of WMS sequencing for bacterial identification. We also employed other alpha (Simpson index and Shannon index) and beta (Jaccard index) diversity indices to compare these samples, and obtained qualitatively consistent results (Supplementary Figs S1 and S2).
Accumulation curves of species and OTU counts are shown in Fig. 2e,f. Although bile samples exhibited less microbial diversity than faecal samples, the curves were not fully saturated. More samples and a more sensitive experimental design could be useful for enabling a more comprehensive analysis of the biliary microbiota.
Newly identified microbes in bile. Through WMS sequencing, we identified microbes that have not been reported previously in the human biliary tract. To ensure the reliability of taxon identification, we selected strain-specific WMS sequencing reads having ≥ 99% identity and full-length (100-bp) alignment with their reference genomes. We obtained 35 reference genomes with ≥ 1% coverage by these reads. Among these genomes were 13 microbe species that, to the best of our knowledge, have not previously been reported in bile samples (Table 1; Fig. 3). Nine of these species were individual-specific. In particular, P. piscolens and Cellulosimicrobium cellulans, identified in C3 and A1, respectively, exhibited ≥ 10% reference genome coverage. Five of the 13 novel species were also identified by 16S sequencing with ≥ 97% identity to species-level OTUs in Greengenes.
According to the Human Oral Microbiome Database 20 and our literature investigation, eight of the 13 novel species were human oral microbial taxa; the rest were possible environmental taxa. We focused on the opportunistic pathogens. P. piscolens has been found in odontogenic abscesses, inflamed gingival crevices and infected canals 21,22 . Olsenella uli and Porphyromonas endodontalis are associated with human endodontic infections 23,24 . C. cellulans (strain LMG 16121), identified in A1 with a coverage of 10.32%, is an inhabitant of soil, water, decaying plant material and brewery sewage 25 . It rarely causes human infections, but may be involved in human septic arthritis 26 and bacteraemia 27 . R. ornithinolytica, Gordonia sputi and Gordonia bronchialis are environmental bacteria that can cause bacteraemia [28][29][30] .
Gallstone formation and bile resistance. We further investigated genes that might be related to gallstone formation or bile resistance. Previous studies have demonstrated potential associations of gallstone formation with bacterial products β -glucuronidase, phospholipase and urease 6,7,[36][37][38][39] . Among the 173 bacterial species identified in the 15 bile samples, we extracted 34 species harbouring genes encoding these enzymes by searching the KEGG (Kyoto Encyclopedia of Genes and Genomes) GENOME   Database. Using WMS sequencing, we also validated gene existence through metagenomic assemblies and metabolic reconstruction. β -glucuronidase and phospholipase hydrolyse biliary bilirubin and phosphatidylcholine, respectively, leading to precipitation of calcium bilirubinate and calcium palmitate and thereby facilitating pigment solid formation 6 . Seven and ten individuals had samples containing ≥ 3 species harbouring genes uidA (encoding β -glucuronidase) and pldA (encoding phospholipase A1), respectively ( Fig. 4a; Supplementary  Table S5). Samples from some individuals (i.e., A3, A5, B6 and C5) had one or no unvalidated related species. As the species numbers were correlated with bacterial WMS sequencing read ratios (Fig. 1a), failure to identify genes encoding β -glucuronidase or phospholipase might be due to issues related to DNA extraction or sequencing. For most individuals, the gene predictions did not identify the gene plcC encoding phospholipase C, which promotes cholesterol nucleation in human gallbladder bile 37 . Helicobacter-produced urease was reported to promote calcium precipitation, which might initiate gallstone formation 38 . Although the genes encoding urease were prevalent in the bile samples, Helicobacter spp. were not found among the urease-containing bacteria.
We also investigated genes involved in bile resistance (Fig. 4b). Bile salt deconjugation and multidrug efflux pump proteins are two mechanisms for bacterial bile resistance 8,40 . Within bile microbial communities in this study, we found 13 species with the gene encoding bile salt hydrolase (bsh) and 30 species with genes encoding efflux pump proteins (acrA, n = 29; acrB, n = 29; emrA, n = 23; emrB, n = 23; tolC, n = 22). Species harbouring bsh were less prevalent among individuals than species with efflux pump proteins, and the gene predictions were consistent with these observations. Moreover, bacterial slime (i.e., glycocalyx) and biofilm play important roles in gallstone formation and bile resistance 6,41-43 . Strong slime producers, including Prevotella intermedia, Clostridium perfringens,  Table S6). Pathways involved in basic bacterial life dominated the bile microbiota metabolic network. Using predicted gene sets, we identified core eggNOG (evolutionary genealogy of genes: Non-supervised Orthologous Groups) functional categories, including carbohydrate transport and metabolism, amino acid transport and metabolism, transcription as well as energy production and conversion (Supplementary Table S7; Supplementary  Fig. S3). When we projected eggNOG orthologous groups onto the KEGG pathways with iPath v2.0 46 ( Supplementary Fig. S4), we found that the overall distribution of pathways was similar to that of pathways identified by HUMAnN 47 as shown in Fig. 5.
Next, we compared the metabolic networks of bile microbiota with those of faecal microbiota from HMP faecal samples (n = 136), focusing on differentially enriched pathways ( Fig. 5; Supplementary Table  S8). Bile samples were significantly enriched in pathways related to glycerophospholipid and glutathione metabolism (Wilcoxon rank-sum test, both adjusted P < 1.5 × 10 −4 ), which are involved in inflammation and oxidative stress responses, respectively 48,49 . Upon investigating KEGG orthologues associated with resistance to oxidative stress (Supplementary Table S9), we observed an enrichment of glutathione reductase (NADPH, K00383, adjusted P < 2.1 × 10 −4 ) and putative iron-dependent peroxidase (K07223, adjusted P < 0.042), consistent with the pathway analysis results. There was also an enrichment in the flagellar assembly pathway (adjusted P = 0.00022), which might be involved in biofilm development 45 and bacterial sensory systems 50 . Glycolysis/gluconeogenesis, propanoate metabolism, ascorbate/aldarate metabolism, ABC transporter and phosphotransferase system pathways were also enriched (all adjusted P < 4 × 10 −5 ).
There were many depleted pathways in bile samples compared to faecal samples ( Fig. 5; Supplementary  Table S8). For instance, the bile microbiota had decreased abundances of glycosaminoglycan and other glycan degradation pathways (both adjusted P < 1.3 × 10 −6 ); this finding implies a decreased ability to employ glycan as a bacterial nutrition source 51 . Other depleted pathways included carbon fixation pathways in prokaryotes (adjusted P < 5.3 × 10 −7 ) and amino acid metabolic pathways involving lysine, histidine, taurine and hypotaurine (all adjusted P < 2.9 × 10 −6 ). Pentose phosphate pathway and starch/ sucrose metabolism, which were related to carbohydrate metabolism, were also among the depleted pathways (both adjusted P < 2.4 × 10 −3 ).

Discussion
Asian countries, including China, have a high prevalence of BPS disease, which is the gallstone disease that is most closely associated with microbial infection. Thus, BPS disease could serve as a good model for gallstone-related bacterial studies 2,3 . Unexpectedly, we did not observe an apparent relationship between patient groups and microbial community features, as individuals from the same group had different patterns of biliary microbiota. Moreover, the bacterial read ratios of bile samples by WMS sequencing, which might reflect the bacterial load in the biliary environment, were not correlated with patient grouping. These results imply that the biliary microbiota might be shaped by more complicated factors, such as diet, lifestyle, host immune responses and/or specific anatomy structures, rather than by microbiota alone.
On the other hand, the lack of a correlation between patient grouping and microbial community may provide some insight into the clinical debate about ERCP procedures. Whether to perform endoscopic sphincterotomy (EST) or endoscopic papillary balloon dilation during ERCP remains controversial 52,53 , given concerns that EST could increase the risk of biliary tract infection 54 . We found that the microbial distribution of samples from choledocholithiasis patients who had no history of EST (groups A and B) did not differ greatly from the distribution of samples from afebrile patients who had a history of EST (group C). This observation implies that EST might not be a determining factor for the biliary microbial composition of BPS patients.
Our results were basically consistent between the WMS and 16S sequencing experiments, but had some discrepancies with the results of the investigation of Wu et al. 16 . Using 16S sequencing to study bile samples, these authors did not observe different biliary microbial community patterns among their subjects, and found that the bile samples had higher alpha diversities than faecal samples from healthy individuals and patients. The disagreement between their study and ours might be attributed to differences in gallstone type (cholesterol gallstones vs. BPSs), bile collection site (gallbladder vs. common bile duct), host and/or sequencing details (i.e., different 16S rRNA variable regions, V1-V2 vs. V3-V4) between the studies.
Compared to bacteria-specific 16S sequencing, biliary WMS reads were dominated by host components (> 98%). Thus, the proportion of effective bacterial reads was relatively small, similar to those of the human vaginal introitus and mid-vagina 18 . Consequently, WMS sequencing exhibited less sensitivity to uncover less-abundant bacterial species, especially for samples with low bacterial ratios. Nonetheless, our recruited patients had distinct dominant species when either WMS or 16S sequencing was used. Moreover, even with limited data, WMS sequencing provided higher resolution for bacterial species identification and gene-level analysis. Finally, we reliably identified novel biliary bacteria by using a WMS sequencing analysis workflow with strict alignment thresholds.
WMS sequencing can be employed to obtain insights into microbial functionality. For example, information on gene abundance can be obtained by metagenomic assembly or direct alignment with protein databases. As this study is the first to apply WMS sequencing to bile samples, we paid special attention to genes encoding bacterial proteins involved in gallstone formation (e.g., β -glucuronidase, phospholipase and urease), which were identified by metagenomic assembly, metabolic reconstruction and/or KEGG GENOME annotation. The results agreed with previous reports and could be indicative of the roles of these genes in gallstone pathogenesis. Bile resistance-related genes, which could be crucial for bacterial survival, were also identified by the same methods. For example, although beneficial for microbial life in the biliary microenvironment, the role of bsh is not yet fully understood 8 . Genes encoding multidrug efflux pump proteins were more prevalent than bsh in our bile samples, suggesting that the former proteins might be more favourable for bacterial colonization and overgrowth in these patients.
To explore the biliary microbial functionality further, we studied the microbiota-host interplay at the pathway level. Bile acid and inflammation in the bile duct could both induce bacterial oxidative stress responses. Bacteria employ multiple pathways (e.g., cysteine/methionine, glutathione and riboflavin metabolism) to address oxidative stress 49 . Compared to HMP stool samples, patient bile samples were enriched in pathways related to glutathione metabolism and genes related to oxidative stress resistance (i.e., glutathione reductase). Another enriched gene, the putative iron-dependent peroxidase, is also involved in the oxidative stress response. These findings might reflect a mechanism by which the biliary microbiome maintains bacterial homeostasis and redox metabolism. Other oxidative stress-related pathways were not significantly enriched, implying that glutathione metabolism might be more crucial for microbial life in the biliary microenvironment. The observed increase in flagellar assembly suggests the possible enhanced motility of biliary inhabitants. Bile samples showed enrichment of pathways related to glycolysis/gluconeogenesis, propanoate metabolism and ascorbate/aldarate metabolism, but depletion of pentose phosphate and starch/sucrose metabolism pathways. These carbohydrate metabolic pathway results suggest a specific energy source and nutrition intake mode for the biliary microbiota.
Our study elucidates aspects of the biliary microbiota for patients with BPS disease. Nevertheless, the biliary microenvironment, microbial activity and their interplay with the host require further investigation. Specifically, to obtain a comprehensive picture of the biliary microbial composition and function, more 'meta' omic' (e.g., metagenomic, metatranscriptomic, metametabolomic and metaproteomic) datasets are required. Meta' omic studies could offer enormous potential for choledocholithiasis research and clinical practice.

Conclusions
We applied WMS and parallel 16S sequencing methods to samples collected from the common bile ducts of Chinese choledocholithiasis patients. Microbial communities of these individuals were clustered into two categories: Cluster I, featuring species from the gut and oral cavity/respiratory tract, and Cluster II, featuring species from the oral cavity/respiratory tract. Given the direct connection between the bile duct and small intestine, biliary tract infections are generally believed to originate from retrograde infection of gut bacteria 55,56 . Surprisingly, we observed that oral cavity and respiratory tract inhabitants were more prevalent in bile samples than intestinal inhabitants. Thus, in addition to gut species, bacteria from the oral cavity/respiratory tract might be relevant to human biliary infection.
Using WMS sequencing, we identified 13 novel biliary bacterial species, which were of oral cavity (primarily) or environmental origin. Many patient-specific bacteria were found, implying the strong individuality of the biliary microbial community. We also identified putative genes related to gallstone formation and bile resistance. On the other hand, compared to HMP faecal samples from healthy individuals, the within-sample diversity of the bile microbial community was reduced, whereas the abundance levels of pathways involved in inflammation, oxidative stress response, flagellar assembly and membrane transport were enriched. Taken together, our results provide new insights into the biliary microbiota with respect to microbial composition and function, which could be valuable for clinical applications, such as the diagnosis and treatment of bile duct diseases.

Materials and Methods
Patients and sample collection. Bile samples were collected from the common bile ducts of 15 patients (mean age: 70.7 ± 16.8 years; 6 men, 9 women) who had been diagnosed with choledocholithiasis by computed tomography, magnetic resonance imaging, and B-mode ultrasonography at Hangzhou First People's Hospital. All patients had BPSs in the common bile duct. There were no occurrences of gallbladder gallstones or hepatolithiasis. Patients were divided into three groups (n = 5 each), according to their fever status and/or history of ERCP: group A, febrile with no ERCP history (A1, A2, A3, A4 and A5); group B, afebrile with no ERCP history (B1, B2, B4, B5 and B6); and group C, afebrile with a history of ERCP with EST of Oddi's sphincter (C1, C2, C3, C4 and C5).
A bile sample (2-5 mL) was collected from each patient during ERCP at Hangzhou First People's Hospital, and the procedure of sample collection was sterilized (for details, see Supplementary Methods). Samples were immediately placed in germ-free sputum cups and stored at − 80 °C until further processing. All patients provided written informed consent upon enrolment. The study conformed to the ethical guidelines of the 1975 Declaration of Helsinki and was approved by the Institutional Review Board of Hangzhou First People's Hospital.
DNA extraction from bile samples. For each patient, the total DNA was extracted from 400 μ L of noncentrifuged bile sample by using the Invitrogen Purelink Genomic DNA Mini Kit (Life Technologies, Carlsbad, CA, USA), following manufacturer's blood DNA extraction protocol. The DNA concentration was quantified with a Qubit 2.0 Fluorometer (Life Technologies), and quality was examined with the E-Gel electrophoresis system (Life Technologies). Taxonomic analysis of host-removed WMS sequencing data. We downloaded 2,773 complete and 6,898 draft bacterial genomes from NCBI (Jul-Aug 2014). Host-removed WMS reads were aligned to these genomes by BLASTN (parameters: perc identity = 85, word size = 16, max target seqs = 30, outfmt = 6, evalue = 0.0000001, dust = no). Alignments were further filtered by using the following criteria: (1) sequence identity ≥ 90%, (2) alignment length ≥ 80 bp, (3) paired-end reads align to the same reference genomes, and (4) one reference genome should have reads that uniquely mapped to it. Abundances of reference genomes were calculated as described 58 and normalized by genome size. For each individual, the abundances of the reference genomes were scaled to sum to one (i.e., to calculate the corresponding relative abundance). Relative abundances of species, genus and phylum were calculated by adding the relative abundances of all their taxonomic members deposited in the NCBI Taxonomy Database. Species with ≥ 0.1% relative abundance in at least one individual was considered for further analysis.

WMS sequencing.
Taxonomic analysis of 16S sequencing data. For details on quality control, chimera removal, OTU clustering and taxonomy assignment of 16S sequencing data, see Supplementary Methods. Hierarchical clustering of microbial communities. Microbial communities of samples were hierarchically clustered and visualized by the R language (version 3.1.2) gplots package (heatmap.2 function). Z-scores for microbes were calculated based on the microbial abundance distributions. Then, hierarchical clustering was performed with the complete linkage method based on the Euclidian distance between Z-scores for all possible sample pairs. HMP data retrieval. The OTU table generated by 16S sequencing and the corresponding metadata were downloaded from http://hmpdacc.org/HMQCP. Metabolic profiles by WMS sequencing, including metadata, were downloaded from http://hmpdacc.org/HMMRC. The procedure to analyse these 16S and WMS sequencing datasets was described previously 17,18 . Datasets of stool samples from healthy individuals were extracted for downstream analysis.
Microbial diversity index. Inverse Simpson and Bray-Curtis dissimilarity indices were used to calculate the within-sample alpha diversity and the between-sample beta diversity, respectively. Both indices were measured by 16S OTU counts or species-aligned WMS reads. The R language vegan package was employed for these calculations.

Statistical analysis.
Correlations between the overlapping genera of the WMS and 16S sequencing were identified by using Pearson's correlation in the R language. Wilcoxon rank-sum test (R v3.1.2) was employed to detect differences between the bile and HMP faecal samples with respect to microbial diversities and KEGG profiles. Under the condition of multiple comparisons, P-values were corrected to control the false-discovery rate by using the method described previously 59 . Differences with P < 0.05 or adjusted P < 0.05 were considered to be statistically significant.