An Aberrant Microbiota is not Strongly Associated with Incidental Colonic Diverticulosis

Colonic diverticula are protrusions of the mucosa through weak areas of the colonic musculature. The etiology of diverticulosis is poorly understood, but could be related to gut bacteria. Using mucosal biopsies from the sigmoid colon of 226 subjects with and 309 subjects without diverticula during first-time screening colonoscopy, we assessed whether individuals with incidental colonic diverticulosis have alterations in the adherent bacterial communities in the sigmoid colon. We found little evidence of substantial associations between the microbial community and diverticulosis among cases and controls. Comparisons of bacterial abundances across all taxonomic levels showed differences for phylum Proteobacteria (p = 0.038) and family Comamonadaceae (p = 0.035). The r-squared values measuring the strength of these associations were very weak, however, with values ~2%. There was a similarly small association between the abundance of each taxon and total diverticula counts. Cases with proximal-only diverticula and distal-only diverticula likewise showed little difference in overall microbiota profiles. This large study suggests little association between diverticula and the mucosal microbiota overall, or by diverticula number and location. We conclude that the mucosal adherent microbiota community composition is unlikely to play a substantial role in development of diverticulosis.


Results
We evaluated the role of the microbiota in colonic diverticulosis among 226 patients with diverticulosis and 309 diverticulosis-free controls. As previously reported 11 , participants with diverticula were more likely to be older, male, and have a higher body mass index than those without diverticula (Table 1).
In general, we found very limited to no associations between the microbiota profiles and the presence of diverticulosis. Across taxonomic levels, Shannon diversity was significantly associated with diverticulosis case-control status only at the class level (p = 0.012; FDR-corrected Wilcoxon), and with an associated effect size of <1% (r-squared from Pearson correlation). Similarly, the only association between richness and diverticulosis case/control status was at the class level (p = 0.011; FDR-corrected Wilcoxon), again with an r-squared <1% (Supplementary Table 1, Fig. 1). Likewise, multidimensional scaling ordination (MDS) (Fig. 2) revealed no statistically significant differences between diverticulosis cases and diverticula-free controls. We also performed analysis at each phylogenetic level to test the null hypothesis of no association of each taxon with the presence of diverticulosis (Supplementary Tables 2-6). Across all taxonomic levels, phylum Proteobacteria and family Comamonadaceae were the only two taxa that had significant associations at a 5% FDR threshold (Table 2). Even for these taxa, the r-squared values measuring the strength of the association were very weak with values ~2% (Table 2, Fig. 3). We conclude that although our large sample size (over 500 patients in total) allowed us to detect some associations, the strength of these associations is very modest.
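The effect sizes quoted above are r-squared values from a Pearson correlation. As a minimal sketch of that calculation for a binary case/control indicator, the following Python function squares the Pearson coefficient between status (0 = control, 1 = case) and a per-sample measure such as Shannon diversity. The study's own analysis was carried out in R; the function names and the toy numbers here are illustrative assumptions, not the authors' code or data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def r_squared(status, values):
    """Fraction of variance in `values` explained by a 0/1 `status` indicator."""
    return pearson_r(status, values) ** 2


# Hypothetical toy data: 3 controls (0) and 3 cases (1).
status = [0, 0, 0, 1, 1, 1]
diversity = [1.0, 2.0, 3.0, 2.0, 3.0, 4.0]
print(round(r_squared(status, diversity), 3))
```

With several hundred samples, an r-squared of 1-2% can reach statistical significance while explaining almost none of the variance, which is why the effect size is reported alongside each p-value.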
We were concerned that the lack of association might be because of the coarse assignment of case-control status to patients who might have a range of disease severity. We therefore compared the abundance of each taxon to the total count of diverticula in each patient. At an FDR-adjusted threshold of p < 0.05, three taxa (Table 3) were significantly associated with the diverticula count, but again the effect sizes were very modest with r-squared values ~1%. We conclude that using the diverticula count rather than a binary case-control assignment did not substantially improve our power.
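For the association between a taxon's abundance and the diverticula count, the Methods name the Kendall test as the non-parametric option. The sketch below computes only the Kendall tau-b statistic (the tie-corrected variant, an assumption on our part since counts of diverticula contain many ties), not its p-value. It is written in Python for illustration; the authors' analysis was done in R, and the variable names and data are hypothetical.

```python
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall tau-b rank correlation, with the standard correction for ties."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = tied_x_only = tied_y_only = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: excluded from both terms
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    denominator = sqrt((untied + tied_x_only) * (untied + tied_y_only))
    if denominator == 0:
        raise ValueError("tau-b is undefined when one variable is constant")
    return (concordant - discordant) / denominator


# Hypothetical data: diverticula counts and log-normalized abundance of one taxon.
diverticula_count = [0, 0, 1, 2, 5]
abundance = [1.0, 2.0, 3.0, 4.0, 5.0]
print(kendall_tau_b(diverticula_count, abundance))
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a cohort of a few hundred patients.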
We next asked whether the location of the diverticula made a difference. We separately examined the subset of patients who had diverticula in only the distal or only the proximal colon. At a 5% FDR cutoff, there were only two taxa across all taxonomic levels (genera Hallella and Delftia) that showed significant differences between patients with only distal and only proximal diverticula (Table 5; n = 135 distal, n = 14 proximal). For both of these taxa, the r-squared value of the association with location was <4%. We conclude that diverticula location did not have a strong effect on the microbial community, although we may have limited power to address this question due to the small number of patients with only proximal diverticula.
In addition to diverticulosis, we examined associations with a number of patient metadata (Supplemental Tables 2-5). Associations with sex and race were slightly stronger than the associations with diverticulosis. There were 25 significant taxa associated with sex (Supplemental Table 8) and 40 taxa associated with ethnicity (Supplemental Table 9) at a 5% FDR. While these associations are stronger than those we saw with diverticulosis, they were quite modest, with r-squared values of 2-3% and no taxa showing an r-squared of >6%. Correlations with waist circumference were much more modest, with only two significant taxa (phylum Verrucomicrobia and genus Asaccharobacter), both of which had r-squared values of 5%. Only one taxon (class "Deltaproteobacteria") was significantly associated with age (p < 0.05). We conclude that, as has been observed in other large cohorts 12,13 , associations of patient metadata with the composition of the microbiota are modest.

Discussion
Colonic diverticulosis is common and the complications are costly. Because complications such as diverticulitis can only occur in patients with diverticulosis, if we could uncover the etiologic risk factors for diverticula, we could potentially prevent complications. In this large study, we found little to no difference in microbial composition between individuals with and without diverticula. Based on the large size of this study and the small effect sizes we observed, it is not likely that changes in bacterial relative abundance are responsible for the development of colonic diverticula. In addition, the presence of diverticulosis does not alter the microbial composition to a significant degree.
Although bacteria have been associated with a number of gastrointestinal disorders, prior information on a bacterial etiology for colonic diverticula is limited. A pilot study of 38 subjects from Italy examined bacteria profiles in feces and mucosal biopsies 10 . Compared to controls, the patients with diverticulosis had a lower relative abundance of Clostridium cluster IV bacteria, although the difference was not statistically significant. The general microbiota composition in colonic biopsies showed no significant differences between controls and diverticulosis patients. There was a lower abundance of Enterobacteriaceae in the diverticulosis cases compared to controls and a non-significant higher abundance of Bacteroides/Prevotella.
It should be stressed that this was a study assessing the microbiome of patients with incidental colonic diverticula. This is not a study of the microbiome in patients with complications of colonic diverticulosis. While a proportion of our population reported symptoms of irritable bowel syndrome and chronic abdominal pain, there is no evidence that these symptoms are associated with colonic diverticulosis, so-called symptomatic uncomplicated diverticular disease (SUDD). Our group recently published a colonoscopy-based study that found no association between colonic diverticulosis and chronic gastrointestinal symptoms or mucosal inflammation 14 . As such, we did not assess the microbiome in patients with colonic diverticulosis and chronic symptoms. While we found no differences in the gut microbiota between individuals with asymptomatic diverticulosis (AD) and healthy controls, diverticulosis represents a continuum in the progression to diverticular disease. Therefore, we cannot exclude a role for the gut microbiota in disease progression. Several small studies have reported alterations in the gut microbiota in SUDD patients [15][16][17] . Tursi et al. 18 evaluated the fecal microbiota in SUDD patients, diverticulosis patients and healthy controls. They found no overall differences in bacterial abundances between the three groups, but the levels of fecal Akkermansia muciniphila were significantly higher in diverticulosis and SUDD patients. Another study found higher bacterial diversity and increased abundance of Proteobacteria in diverticulitis patients compared to controls 15 . One study assessed bacteria and fungi in diverticulitis tissue from the sigmoid colon and adjacent unaffected tissue. They observed an enrichment of Microbacteriaceae and Ascomycota in diverticulitis tissue 17 , suggesting that the diverticulum microbiota may be different from adjacent mucosa.
These studies implicate the gut microbiota in diverticulitis, but larger studies are needed to confirm their findings. In our study, we assessed the gut microbiota (bacteria) but we did not evaluate the fungal mycobiome because it is an emerging field that was not well characterized until recently.
Our large sample size revealed some borderline significant associations, but there was little evidence of a strong association with diverticulosis. As with any negative results, we might have seen stronger association with different methods (RNA-seq, metabolomics, whole-genome metagenome shotgun sequencing). If we had corrected for multiple hypothesis testing including all hypotheses in one correction, nothing in our paper would have been significant. This again emphasizes the modest nature of the associations that we observed.
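The point about the scope of the correction can be illustrated with a short sketch of the Benjamini-Hochberg procedure. The Python function below returns FDR-adjusted p-values, and the example compares correcting one family of tests on its own with pooling it together with a second family. The p-values are invented for illustration and are not taken from this study, which performed its corrections in R.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg FDR-adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical p-values for two families of tests (e.g. two taxonomic levels).
family_a = [0.001, 0.02, 0.5]
family_b = [0.6, 0.7, 0.8, 0.9, 0.95]

separate = benjamini_hochberg(family_a)
pooled = benjamini_hochberg(family_a + family_b)

# The second test passes a 5% FDR threshold only under the separate correction.
print(separate[1], pooled[1])
```

Pooling more hypotheses into one correction raises the adjusted p-value of every borderline test, so results that survive a per-level correction can disappear under a single global one.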
We chose to examine mucosal adherent bacteria from biopsies rather than feces. It was logistically simple and safe to obtain biopsies from patients during their colonoscopy. More importantly, although there are known differences in the bacterial composition of feces and mucosal biopsies 19 , we reasoned that the adherent bacteria would be more likely to influence the colonic mucosa. All patients in the study underwent a colonoscopy prep that could change the bacterial composition. Adherent bacteria are less influenced by a purge and all patients in the study were prepped 20 .
This paper has notable strengths. All subjects underwent their first colonoscopy for screening purposes rather than colonoscopy for symptoms that might be associated with diverticulosis. We systematically recorded diverticula from all colon segments. Mucosal associated bacteria were evaluated from biopsies from the sigmoid colon. The biopsies were handled in a uniform manner by technicians who were blinded to diverticulosis status. Importantly, the sample size was very large.
Because the patients were drawn from a single academic medical center in the US, the results may not be widely generalizable. The pilot study by Barbara et al. reported differences in the microbial composition in symptomatic uncomplicated diverticular disease patients compared to normal controls 10 . Our study was cross-sectional. If we had found substantial differences in the bacterial composition of the diverticulosis subjects compared to controls, one might question whether the differences were a consequence of the diverticula and not a cause. In the absence of pronounced differences in composition, however, this is not a concern. The sensitivity of colonoscopy for diverticulosis is not known. Endoscopists in this study were aware of the study and were accompanied by a research assistant who prompted them to report diverticula in each colon segment. Consequently, the sensitivity is likely to be better than during a clinical exam, but some diverticula are likely to have been overlooked. However, in analyses where we included the number of diverticula, we still found no differences.
In summary, in a large study of individuals undergoing screening colonoscopy, we found little evidence of an association between adherent microbial communities and diverticulosis. Alterations in colon bacterial community composition are unlikely to be responsible for the development of colonic diverticulosis. Furthermore, the presence of diverticulosis does not appear to alter the microbial composition of the colon.

Methods
Participants. This cross-sectional study was designed to assess factors associated with colonic diverticulosis (NIH R01DK094738). Details of the study methods have been described previously 7,11 . Briefly, 226 case subjects with one or more diverticula and 309 controls without diverticula were drawn from outpatients undergoing first time screening colonoscopy at the Meadowmont Ambulatory Endoscopy Center, University of North Carolina Hospitals, Chapel Hill, North Carolina. The study included consented subjects 30 years and older who had satisfactory colonoscopy preparation and complete examination to the cecum. The study excluded those with a history of previous colon resection, or a prior diagnosis of polyposis, colitis, colon cancer, diverticulosis or diverticular disease.
Endoscopists carefully examined the colon for diverticula in all segments and the results were recorded on special data collection forms. The number of diverticula in each segment of the colon (cecum, ascending, transverse, descending, sigmoid) was recorded and the number summed to indicate the total number of diverticula observed. Biopsies were taken adjacent to sigmoid diverticula when present or from the mid sigmoid in subjects with no diverticula. The biopsies (approximately 3-4 mm in diameter) 21 were obtained using standard (8 mm wing) disposable, fenestrated colonoscopy forceps. Two biopsies obtained for microbiota profiling were rinsed in sterile PBS prior to freezing in liquid nitrogen to avoid contamination with fecal bacteria 22 . Laboratory personnel were blinded to clinical information and diverticulosis status of subjects. The study was approved by the University of North Carolina Office of Human Research Ethics. All participants gave informed consent. Enrollment of participants and laboratory experiments were performed in accordance with the relevant guidelines and institutional regulations.
DNA Extraction, PCR and sequencing. We extracted bacterial genomic DNA from mucosal biopsy specimens as previously described 23,24 . Briefly, normal biopsies from each patient were placed in lysozyme for 30 minutes followed by bead beating and DNA extraction (Qiagen DNeasy Blood and Tissue, kit cat # 69504). The DNA fractions were eluted in 30 μl of elution buffer and stored in aliquots at −20 °C.
Illumina library creation was performed using two separate PCR reactions according to a previously published protocol 25 . The first-step PCR (PCR1) contained primers designed to amplify the V2 region of the 16S bacterial rRNA gene and Phusion High-Fidelity Master Mix (Life Technologies, Carlsbad, CA). PCR1 product was diluted 20-fold and used as a template for second-step PCR (PCR2). PCR2 primers contained an Illumina index barcode sequence, Illumina adapter sequence and a tag sequence. There were two sets of PCR2 primers, and each PCR2 reaction received one of each, resulting in a dual-indexed product. One reaction was performed for each sample using Phusion High-Fidelity Master Mix.
PCR product was visualized by E-Gel 96 to check samples for amplification. All samples with positive amplification were normalized to 25 ng/µl using the SequalPrep Normalization Kit (Life Technologies, Carlsbad, CA), and an equal volume of each sample library was pooled followed by cleaning using AxyPrep Mag Beads 25 . The pool was stored at −20 °C, then shipped to the University of Maryland Institute for Genome Sciences for sequencing using the Illumina MiSeq protocol 25 . Appropriate positive and negative controls were included in all sample preparation steps. A pooled sample of known bacteria served as positive control.
Sequence processing and statistical analysis. Although producing adequate DNA can be challenging from biopsy samples, >90% of these samples had at least 1,000 reads assigned by different taxonomy algorithms (Table 4, Suppl. Figure 1) and these samples were used for downstream analysis at each taxonomic level. Forward reads were de-multiplexed and run through version 2.10.1 of the RDP classification algorithm 26 at a 50% confidence score (Table 1) or through the pick_closed_reference_otus.py script in QIIME 1.91. Read counts were log normalized as previously described 20 .
The alpha-diversity and richness measurements were performed using the functions "diversity" and "rarefy" from the vegan package in R, with the subsample size of "rarefy" set to the minimum number of sequences detected in any sample. MDS ordination was performed with Bray-Curtis dissimilarity using the vegan package in R. Log-normalized abundance values for each taxon at the phylum, class, order, family and genus levels (RDP algorithm) or OTU were evaluated with a series of linear models and non-parametric tests. P-values were corrected for multiple hypothesis testing using Benjamini-Hochberg (B & H) FDR correction 27 , with correction occurring separately for each test at each taxonomic level. To preserve power, statistical tests were only constructed for taxa that were present in at least 25% of all samples. All linear models and statistical tests were conducted in R. The R code used is available here: https://github.com/afodor/metagenomicsTools/blob/master/src/scripts/topeOneAtATime/metadataTests.txt
Each linear model took the form Y = metadata + ε, where "Y" is the alpha-diversity, richness, MDS axis or log-normalized abundance and the metadata is the case/control status (for a one-way ANOVA with two levels), sex (for a one-way ANOVA with two levels), race (white, black or other, for a one-way ANOVA with three levels), diverticula count (for a linear regression) or waist circumference (for a linear regression). As indicated in the text, non-parametric equivalents to linear models were used to generate p-values, including the Wilcoxon test for two-level metadata, the Kruskal-Wallis test for multi-level metadata, and the Kendall test for association of two quantitative variables.
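As a rough illustration of three of the steps described above, the following Python sketch computes the Shannon index (using the natural logarithm, which is the default of vegan's "diversity" function), the Bray-Curtis dissimilarity between two samples, and the 25% prevalence filter. It is not the authors' R code, which is available at the URL above; the function names, the dictionary layout of the count table, and the toy counts are assumptions made for this example.

```python
from math import log


def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample has no reads")
    return -sum((c / total) * log(c / total) for c in counts if c > 0)


def bray_curtis(sample_a, sample_b):
    """Bray-Curtis dissimilarity between two samples over the same taxa."""
    if len(sample_a) != len(sample_b):
        raise ValueError("samples must cover the same taxa")
    denominator = sum(sample_a) + sum(sample_b)
    if denominator == 0:
        raise ValueError("both samples are empty")
    return sum(abs(a - b) for a, b in zip(sample_a, sample_b)) / denominator


def prevalent_taxa(table, min_fraction=0.25):
    """Keep taxa detected (count > 0) in at least `min_fraction` of samples.

    `table` maps a taxon name to its list of per-sample counts.
    """
    return [
        taxon
        for taxon, counts in table.items()
        if sum(1 for c in counts if c > 0) >= min_fraction * len(counts)
    ]


# Hypothetical count table: three taxa across four samples.
table = {"TaxonA": [1, 0, 0, 0], "TaxonB": [0, 0, 0, 0], "TaxonC": [2, 3, 0, 1]}
print(prevalent_taxa(table))
print(shannon_index([10, 10]))
print(bray_curtis([4, 0], [0, 4]))
```

Filtering on prevalence before testing reduces the number of hypotheses entering the FDR correction, which is the sense in which it preserves power.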
In order to ensure that our results were not a consequence of our use of the RDP algorithm, we performed t-tests comparing case and control status for each taxa at the genus level with both the RDP algorithm and with the OTUs from the QIIME pipeline. The inference produced from these two classification schemes was highly concordant (Supplementary Fig. 1) demonstrating that our results are robust to our choice of classification scheme.
Data Availability. The datasets generated from this study are available from the corresponding author on request. Raw sequences are available in the NCBI SRA data repository via submission SUB3467354 under Bioproject PRJNA429136.
Table 4. Number of sequences identified by the RDP classification algorithm*. *The number of sequences identified by the RDP classification algorithm at a threshold of 50% (for phylum through genus) or were assigned to an OTU in QIIME 1.91. Almost all 226 case and 309 control samples had at least 1,000 sequences per sample (last column) and these samples were used for analysis at each phylogenetic level.