Gut microbiome of the largest living rodent harbors unprecedented enzymatic systems to degrade plant polysaccharides

The largest living rodent, capybara, can efficiently depolymerize and utilize lignocellulosic biomass through microbial symbiotic mechanisms yet elusive. Herein, we elucidate the microbial community composition, enzymatic systems and metabolic pathways involved in the conversion of dietary fibers into short-chain fatty acids, a main energy source for the host. In this microbiota, the unconventional enzymatic machinery from Fibrobacteres seems to drive cellulose degradation, whereas a diverse set of carbohydrate-active enzymes from Bacteroidetes, organized in polysaccharide utilization loci, are accounted to tackle complex hemicelluloses typically found in gramineous and aquatic plants. Exploring the genetic potential of this community, we discover a glycoside hydrolase family of β-galactosidases (named as GH173), and a carbohydrate-binding module family (named as CBM89) involved in xylan binding that establishes an unprecedented three-dimensional fold among associated modules to carbohydrate-active enzymes. Together, these results demonstrate how the capybara gut microbiota orchestrates the depolymerization and utilization of plant fibers, representing an untapped reservoir of enzymatic mechanisms to overcome the lignocellulose recalcitrance, a central challenge toward a sustainable and bio-based economy.

4. On page 5, line 116: specify the extended form for GTDB.
5. The authors suggest that the cellulose degradation in the capybara gut might be mainly accomplished by endo-beta-1,4-glucanases (EC 3.2.1.4) from families GH5, GH8, GH9 and GH45. However, the abundance of these GH families is quite low (Figure 2). Is this a common feature in the microbiota of other grass-eating animals, or is it only found in capybara?
6. On page 7, line 179: remove "as reported in".
7. Please, mention the software used to perform the prediction of the GHXXX family structural topology (Suppl. Fig. S7 and page 9, line 246).
8. Can the authors explain why the catalytic efficiency of the single GH10 domain is better than the full-length protein against rye arabinoxylan?
9. Figure 6 or the text should be reorganized. In the text, the binding experiments of CapCBMXX are described before the crystal structure and in the Figure the other way around.
Reviewer #3 (Remarks to the Author): To the authors: The original research manuscript entitled "Gut microbiome of the largest living rodent harbors unprecedented enzymatic systems to break down complex plant polysaccharides" shows the multiomics analysis (16S rRNA, metagenomics, metatranscriptomics and metabolomics) of the capybara gut microbiome in cecum and rectum. The authors describe the microbial composition and the enzymatic systems (CAZymes and PULs) from Bacteroidetes, along with the metabolic pathways involved in the conversion of dietary cellulose to SCFA.
General comments: The manuscript is well written, and the authors demonstrate especially the importance of CAZyme families from Bacteroidetes in the gut microbiome of the capybara rodent. These novel findings, along with the complete description of the microbial composition, are of interest for the scientific community. However, I have major concerns about the research.
Major comments:
1. I researched the state of the art on the capybara gut microbiome, and a few works have already described the bacterial diversity of this species (DOI: 10.1007/s00248-011-9963-z, 10.1128). However, none of these works has been referenced or discussed. What are the novel findings in relation to these studies, given that the capybara gut microbial composition seems to have been studied already?
2. In relation to my previous comment, there is a version of the current manuscript in a pre-print repository. Does it mean that the authors have sent the same work to two journals?
3. I am concerned about the sample size and that the manuscript seems very descriptive, always comparing to what has been found but not with experiments. How accurate are both issues compared to the reality? It would be good to add a line in the text to justify the number of samples and their comparison with published data.
9. Line 159: GH5_25 and _37 do not appear in Fig 2.
10. Line 195: Please define CCs.
11. Lines 204-206: I do not agree that this is the metabolomic fingerprint of the capybara gut microbiome. It should be stated that the metabolites were obtained from the polar part after a Folch extraction. So, the metabolic analysis is only done in one part of the microbiome metabolism.
12. I suggest that all the figures in the main text and in the supplementary material include the number of samples, the replicates, and a description of all abbreviations, so that a reader who has not seen the text can understand the picture. For example, add in the text what KM and kcat mean.
13. As before, the same should be done for the tables, explaining all the abbreviations in a footnote. For example, what does the e-value mean and how was it calculated?
14. Line 355: Use the abbreviations MG and MT.
15. Line 516: If all the supernatant was transferred and evaporated, this means that the volume was not controlled between samples, so the analytical response for each sample was different and cannot be compared.
16. Lines 630-638: include the data from the metabolomics analysis in a repository.
17. Figure 4: It has not been described how the metabolomic reconstruction was performed; please give all the details.
The aim of the study is to understand the enzymatic and metabolic pathways employed by the gut microbiota facilitating the breakdown and utilization of recalcitrant dietary polysaccharides. The study animals are monogastric herbivorous capybara, famous for their ability to depolymerize and utilize lignocellulosic biomass through microbial symbiotic mechanisms. The study is based on samples obtained from three females that were euthanized as a measure of management of Rocky Mountain Spotted Fever (RMSF) hosts. The microbiome (16S rRNA sequencing, V4 region), the metagenome and the metatranscriptome of samples from cecum and rectum were analyzed, and combined with carbohydrate enzymology and X-ray crystallography to elucidate how capybara are able to convert the hard-to-digest dietary fibers into short-chain fatty acids to cover their energetic demands. Overall, this is a highly interesting paper that would fit very well into the journal.
I have one major concern, which rests on the fact that the three study individuals might have been infected with Rickettsia. You might argue that most wildlife is infected with something, and I agree. However, usually sample sizes are much higher.
The gut microbiome takes up many of these tasks, and constant direct and indirect molecular crosstalk between the genomes of interacting hosts and symbionts maintains a stable gut microbial community, optimises its functionality and buffers against disturbances. Radical changes in the commensal microbial community are, however,
I am not an expert in CAZymes, X-ray crystallography, structural biology or metabolomics, and will focus on the animal microbiome perspective, especially given that the study appears to use animals that may have been infected with Rocky Mountain Spotted Fever (RMSF), a bacterial disease spread by ticks and caused by Rickettsia rickettsii (Proteobacteria). Although I agree that this is not the focus of this paper, one might cross-check how a potential infection could affect the presented results, especially given the small sample size (for a microbiome study).
Line 95, I don't think that the 16S marker used to sequence bacteria would also be used for Archaea and Fungi.
Line 100ff, Fig. 1: Only three correlations out of six potential correlations are reported.
Why are all correlations with only 16S missing? Additionally, there seems to be a shift in the abundance and at least the ranking should be different. Did you test for it? Could you check for the presence of Rickettsia among the Proteobacteria? Using zOTUs is not common. Why don't you report ASVs, as is usual in most recent papers? Would QIIME give you different results?

REVIEWERS' COMMENTS:
Reviewer #1 (Remarks to the Author): In this manuscript, Cabral et al. provide a tour de force analysis of a mammalian gut microbiome, going from metagenomic analysis to X-ray crystallography. Along the way, they discover new subfamilies of carbohydrate binding modules (CBMs) and glycoside hydrolases (GHs). The manuscript is generally well written and easy to read, if not a bit wordy and in need of a little brushing up on the English. I have no major concerns with this paper, only minor comments, including:
R: Thank you for the positive comment and for taking the time to review our work.
1. The use of the word "deconstruction" (e.g., in line 34 but recurs throughout the text) is not ideal; probably better to use "degradation" or "depolymerization."
R: Thank you for the suggestion. It was modified in the revised version of the manuscript.
2. "Exploring the genomic dark matter..." is a wonderful turn of phrase but perhaps a little much for a scientific manuscript.

R: Thank you for pointing this out. It was modified in the revised version of the manuscript.
3. Some abbreviations are used without definition (e.g., MG and MT in line 96).
4. In lines 130-137, are these differences statistically significant?
R: This part of the text was removed when the section was revised, following the comments from other reviewers, to make it less descriptive.

In lines 333-335, is this shown or just inferred?
R: We have biochemically tested each enzyme comprising the cluster CC102. Given that the CapGH97 is a calcium-activated α-galactosidase, CapGH43_12 is a highly active α-L-arabinofuranosidase, and the CapCBMXX-GH10 is an endo-β-1,4-xylanase coupled with the novel CBM targeting xylan, we have inferred that they would act complementarily on heteroxylans with substitutions of α-galactosyl and α-L-arabinofuranosyl moieties, as shown in the current Figure 7e.

R: It was modified in the revised version of the manuscript. Now it reads:
"The understanding of the enzymatic and metabolic mechanisms employed by these microbial communities to obtain energy from plant fibers may unveil alternative biological systems for the conversion of these lignocellulosic agro-industrial residues into value-added products and create new opportunities for carbohydrate-based biotechnological applications."
7. In lines 346-348, some comment on the relative timescales of microbiome shaping versus evolution would be useful here.
R: Thanks for the suggestion. We have modified the manuscript to include the following information in the discussion section.
"These semi-aquatic animals are hindgut fermenters throughout found in Pantanal wetlands and Amazon basin and, in particular, such animals dwelling the Piracicaba basin region in Brazil have incorporated sugarcane in their diet for decades, a relevant timescale for microbial adaptation and specialization41, reasoning that this microbiota has been shaped and optimized for energy extraction21 from this industrially relevant lignocellulosic biomass. Sugarcane is an important feedstock for Brazilian economy and other countries such as India and Thailand. Two thirds of this crop are made of lignocellulose, which currently is left in the field (straw) and burnt for energy purposes (electricity and vapor). The understanding of the enzymatic and metabolic mechanisms employed by these microbial communities to obtain energy from plant fibers may unveil alternative biological systems for the conversion of these lignocellulosic agro-industrial residues into value-added products and create new opportunities for carbohydrate-based biotechnological applications."
Reviewer #2 (Remarks to the Author): Cabral and colleagues elucidated the microbial community composition of the gut of capybara, the largest living rodent, which processes lignocellulosic biomass. The authors also unveiled the enzymatic systems and metabolic pathways involved in the conversion of recalcitrant dietary fibers into short-chain fatty acids using a multidisciplinary approach, including multi-meta-omics and enzymology. The work also provides the identification and structural characterization of two novel CAZy families: a glycosyl hydrolase (GH) and a carbohydrate binding module (CBM). Because of its novelty and importance, I recommend publishing this manuscript in Nature Communications.
R: We do appreciate the positive comment and thank you for taking the time to review our manuscript.
Please, find below some comments and suggestions:
Major:
1. Please, explain in more detail what are the main differences between the new GH family and the distantly related GH5 and GH30 families using sequence alignments, secondary structure comparisons and enzymatic activities.

R: Thank you for the suggestion. These analyses were included in the revised version of the manuscript (please see below) and the Supplementary Fig. 9-11 and Supplementary Table 9 were included to support these analyses.
"Protein modelling and threading performed using RoseTTAFold 37 and PDBsum 38, respectively, revealed that CapGHXXX consists of a (α/β)8-barrel structure (Supplementary Fig. 8), which is an archetypal scaffold of the clan GH-A. According to structural predictions, CapGHXXX exhibits a two-domain architecture including an appended β-sandwich domain (Supplementary Fig. 9), which is a similar structural organization found in the GH30 family. With the exception of the residues defining the clan GH-A, sequence alignment with GH5 and GH30 members revealed a very low sequence conservation below the criterium for significant similarity detection (using an e-value < 0.05), demonstrating that although the domains in the tertiary structure can be similar, the sequences between these families are remarkably diverse (Supplementary Fig. 9-11 and Supplementary Table 9). To further explore the GHXXX family, the enzyme BXY_26070 (SEQ_ID CBK67650.1) from B. xylanisolvens, which shares 46% sequence identity with CapGHXXX, was also heterologously expressed, purified and biochemically characterized (Table 1). The two members characterized from the GHXXX family present β-galactosidase activity, which is not described in either GH30 or GH5 families, strengthening at biochemical level the establishment of this new GH family."

Current Supplementary Fig. 9: Structural comparison between CapGHXXX model and (a) GH5 and (b) GH30 members. GH5 members are the endo-β-1,4-mannanases from Streptomyces thermolilacinus NBRC14274 (StMan, subfamily GH5_8, PDB code 3WSU, rmsd 3.27 Å, in cyan 91) and from Streptomyces sp. SirexAA-E (SACTE_2347, subfamily GH5_8, PDB code 4FK9, rmsd 4.41 Å, in green 92). GH30 members are the endo-β-1,4-xylanase from Ruminiclostridium papyrosolvens C71 (CpXyn30A, subfamily GH30_8, PDB code 4FMV, rmsd 2.07 Å, in blue 93) and the glucuronoarabinoxylan-specific endo-β-1,4-xylanase from Erwinia chrysanthemi (XynA, subfamily GH30_8, PDB code 1NOF, rmsd 2.12 Å, in salmon 94). c-d, Identification of conserved clan GH-A residues in the CapGHXXX model. In panel (c), CapGHXXX model is in orange and the GH5 endo-β-1,4-mannanases StMan (PDB code 3WSU) and SACTE_2347 (PDB code 4FK9) are in cyan and green, respectively. In panel (d), CapGHXXX model is also in orange and the GH30 CpXyn30A and XynA are in blue and salmon, respectively. The residues E305 and E207 correspond to the nucleophile and acid/base, respectively (inferred by structural superposition). Structural alignment was performed using the "super" command in Pymol (The PyMOL Molecular Graphics System, Schrödinger, LLC, New York).
Current Supplementary Fig. 10: Multiple sequence alignment between GHXXX members (CapGHXXX and CBK67650 (BXY_26070)) and the GH5 members, described in Supplementary Table 9. Red triangles indicate the conserved residues from Clan GH-A and the inferred catalytic residues. ClustalΩ from MPI Bioinformatics 95 was used to generate the multiple sequence alignments, with manual adjustment based on conserved residues according to structural comparisons. ESPript 3.0 96 was used to generate the alignment image.
Current Supplementary Fig. 11: Multiple sequence alignment between GHXXX members (CapGHXXX and CBK67650 (BXY_26070)) and the GH30 members, described in Supplementary Table 9. Red triangles indicate the conserved residues from Clan GH-A and the inferred catalytic residues. The orange line represents the β-sandwich domain found in GH30 members. ClustalΩ from MPI Bioinformatics 95 was used to generate the multiple sequence alignments, with manual adjustment based on conserved residues according to structural comparisons. ESPript 3.0 96 was used to generate the alignment image.
2. Please, expand the overall discussion further, e.g. relevance of the findings to the industrial application of sugarcane processing? This application is mentioned in the introduction and just briefly (one sentence) in the discussion.

R: We have modified the manuscript to expand this aspect in the results and discussion sections (please see below).
In the results section:
"As presented in the taxonomic analysis, Bacteroidaceae bacterium MAG57 represents a novel genome with a remarkable number of CAZyme-encoding genes including a gene cluster targeting arabinoxylan (CC102), an abundant hemicellulose in secondary cell walls of sugarcane and other grasses. This cluster encodes two exo-enzymes from families GH43 and GH97, and an unconventional GH10 member with an unknown 45 kDa N-terminal domain (Fig. 7a)."
In the discussion section:
"These semi-aquatic animals are hindgut fermenters throughout found in Pantanal wetlands and Amazon basin and, in particular, such animals dwelling the Piracicaba basin region in Brazil have incorporated sugarcane in their diet for decades, a relevant timescale for microbial adaptation and specialization41, reasoning that this microbiota has been shaped and optimized for energy extraction21 from this industrially relevant lignocellulosic biomass. Sugarcane is an important feedstock for Brazilian economy and other countries such as India and Thailand. Two thirds of this crop are made of lignocellulose, which currently is left in the field (straw) and burnt for energy purposes (electricity and vapor). The understanding of the enzymatic and metabolic mechanisms employed by these microbial communities to obtain energy from plant fibers may unveil alternative biological systems for the conversion of these lignocellulosic agro-industrial residues into value-added products and create new opportunities for carbohydrate-based biotechnological applications."
Minor:
1. In the abstract the sentence "combination of unique enzymatic mechanism from Fibrobacteres to degrade cellulose with a broad arsenal of CAZymes organized in PULs from Bacteroidetes" is confusing. Please, re-write.
R: Thank you for the suggestion. The abstract was modified accordingly (please see below).
"In this microbiota, the unconventional enzymatic machinery from Fibrobacteres seems to drive cellulose degradation, whereas a diverse set of Carbohydrate-Active enZymes (CAZymes) from Bacteroidetes organized in polysaccharide utilization loci (PULs) are accounted to tackle complex hemicelluloses typically found in gramineous and aquatic plants."

R: This information was included in the revised version of the manuscript.
5. The authors suggest that the cellulose degradation in the capybara gut might be mainly accomplished by endo-beta-1,4-glucanases (EC 3.2.1.4) from families GH5, GH8, GH9 and GH45. However, the abundance of these GH families is quite low (Figure 2). Is this a common feature in the microbiota of other grass-eating animals, or is it only found in capybara?
(Terry et al., 10.1139/cjas-2019...). In our data, these enzymes represent ~2.8% of the total identified CAZymes.

R: It was removed.
7. Please, mention the software used to perform the prediction of the GHXXX family structural topology (Suppl. Fig. S7 and page 9, line 246).
R: During the paper evaluation period, we updated the structural model of the GHXXX using RoseTTAFold, an improved deep learning-based modeling method (Baek et al., 2021). The predictor generated a model with a confidence of 0.74, and the protein topology was obtained using PDBsum (Laskowski et al., 2018). This information and an updated version of the Supplementary Fig 7 (current Supplementary Fig 8)
(Kari et al., 10.1038/s41467-021-24075-y). Another factor that may negatively affect the turnover is the reduced diffusion rate of the full-length protein in relation to the isolated catalytic domain, which has a smaller gyration radius and a more compact globule-like structure. The kinetic characterization of several truncated versions of glucanases, xylanases and mannanases also verified that the accessory domain negatively affects the catalytic rates on soluble substrates (Wen et al., 10.1021/bi0500630, Marrone et al., 10.1093/protein/13.8.593, Santos et al., 10.1016/j.jsb.2011.1042/BJ20110869).
9. Figure 6 or the text should be reorganized. In the text, the binding experiments of CapCBMXX are described before the crystal structure and in the Figure the other way around.
10. On page 12, line 328. GH28 active site or GH10 active site?
R: It is indeed the GH28 active site. We have modified the manuscript, as shown below, to clarify this point.
"It is worth to mention that two aromatic residues considered critical for the activity of a GH28 member are present in the corresponding region of the CapGH10 β-helix domain, Y193 and Y279; however, their alanine mutation did not alter the carbohydrate binding, in agreement with the lack of catalytic function of the CapGH10 β-helix domain (Supplementary Figs. 16 and 18)."
Reviewer #3 (Remarks to the Author): To the authors: The original research manuscript entitled "Gut microbiome of the largest living rodent harbors unprecedented enzymatic systems to break down complex plant polysaccharides" shows the multiomics analysis (16S rRNA, metagenomics, metatranscriptomics and metabolomics) of the capybara gut microbiome in cecum and rectum. The authors describe the microbial composition and the enzymatic systems (CAZymes and PULs) from Bacteroidetes, along with the metabolic pathways involved in the conversion of dietary cellulose to SCFA.
General comments: The manuscript is well written, and the authors demonstrate especially the importance of CAZyme families from Bacteroidetes in the gut microbiome of the capybara rodent. These novel findings, along with the complete description of the microbial composition, are of interest for the scientific community. However, I have major concerns about the research.