Metagenomic analysis of the cow, sheep, reindeer and red deer rumen

The rumen microbiota comprises a community of microorganisms which specialise in the degradation of complex carbohydrates from plant-based feed. These microbes play a highly important role in ruminant nutrition and could also act as sources of industrially useful enzymes. In this study, we performed a metagenomic analysis of samples taken from the ruminal contents of cow (Bos Taurus), sheep (Ovis aries), reindeer (Rangifer tarandus) and red deer (Cervus elaphus). We constructed 391 metagenome-assembled genomes originating from 16 microbial phyla. We compared our genomes to other publically available microbial genomes and found that they contained 279 novel species. We also found significant differences between the microbiota of different ruminant species in terms of the abundance of microbial taxonomies, carbohydrate-active enzyme genes and KEGG orthologs. We present a dataset of rumen-derived genomes which in combination with other publicly-available rumen genomes can be used as a reference dataset in future metagenomic studies.

We also compared the abundance of genes encoding for specific CAZymes between species. These enzymes are responsible for the synthesis, binding and metabolism of carbohydrates. The carbohydrate esterases (CEs), glycoside hydrolases (GHs), glycosyltransferases (GTs) and polysaccharide lyases (PLs) act to degrade cellulose, hemicellulose and other carbohydrates which could otherwise not be digested by the host. Non-catalytic carbohydrate-binding modules (CBMs) bind to specific carbohydrates, increasing the efficiency of enzymatic degradation 29 . The auxiliary activities (AAs) redox enzymes are reclassified CBMs which are lytic polysaccharide monooxygenases 30 . In our samples we found the following numbers of these CAZyme families: 6 AAs redox enzymes, 39 CBMs, 14 CEs, 191 GHs, 61 GTs and 27 PLs. The ten most abundant GHs in the different ruminant species were: for cows GH2, GH3, GH31, GH97, GH28, GH51, GH43_10, GH105, GH10 and GH95; for sheep GH2, GH3, GH28, GH31, GH97, GH32, GH51, GH77, GH78 and GH95; for red deer GH2, GH3, GH31, GH97, GH77, GH32, GH51, GH109, GH28 and GH78; and for reindeer GH2, GH3, GH92, GH109, GH97, GH13, GH31, GH78, GH28 and GH77. Different ruminant species were found to have significantly differently abundant CAZyme genes (PERMANOVA: P = 1e−05, Fig. 5). However, it should be noted that the vast majority of CAZyme families were found in all sample types (Fig. 6), indicating that there exists a set of CAZymes which are present across ruminant species consuming different diets and living in vastly different conditions.  www.nature.com/scientificreports/ DeSeq2 was used to identify specific CAZymes which were significantly more abundant in one ruminant species versus another (Supplementary data S3). Those CAZymes which were consistently more abundant in specific species when compared to other species are listed in Supplementary tables S1-S4.
CAZymes are often found organised into Polysaccharide Utilization Loci (PUL) which comprise a set of genes that enable the binding and degradation of specific carbohydrates or multiple carbohydrates. We used the software PULpy to predict PULs which were present in our Bacteroidales RUGs. Of the 136 RUGs which belong to the taxonomy Bacteroidales, 112 contain putative PULs. Within these RUGs we identified 970 PULs, with numbers of PULs per RUG ranging from 1 to 35. The largest quantity of PULs originating from one RUG was 35 from uncultured Bacteroidales sp. RUG30227; these encoded a wide range of CAZymes. This RUG was more abundant in reindeer samples than samples from other ruminants. Of the 970 PULs, 332 of these were a single susC/D pair. A summary of identified PULs can be found in Supplementary data S4 and Supplementary fig S1. We also examined the abundance of genes which belonged to specific KEGG orthologs. KEGG orthologs represent a wide range of molecular functions and are defined by a network-based classification. We found that, as for CAZymes, ruminant species clustered significantly by the abundance of genes with specific KEGG orthologs (PERMANOVA: P = 1e−05, Fig. 7) and that the vast majority of orthologs were found in all ruminant species (Fig. 8). However, the large amount of orthologs (n = 729) which were only found in the two domesticated species (cows and sheep) is also worthy of note. It should also be noted that the two sheep samples did not cluster visually to the same extent as the samples originating from the other ruminant species (Fig. 7). DeSeq2 was used to identify many KEGG orthologs which were significantly more abundant in one ruminant species vs another (Supplementary data S5). Those orthologs which were consistently more abundant in specific ruminant species (Adjusted p value < 0.05) are listed in Supplementary data S6.  www.nature.com/scientificreports/

Discussion
The rumen microbiota plays a crucial role in the ability of ruminants to efficiently digest feed while the rumen microbiota and their products also have a potential use in diverse industrial applications. The ruminal microbiota of red deer and reindeer have previously been studied using 16S rRNA gene sequencing [31][32][33][34] . However, metagenomic studies of these species are limited, with only one study in reindeer 35 and no studies in red deer.
In this study we constructed 391 rumen microbial genomes from metagenomic data from cows, sheep, red deer and reindeer. We assigned taxonomies to our RUGs using GTDB-Tk rather than NCBI based taxonomies as this improves the classification of uncultured bacteria due to the use of a genome-based taxonomy 36 . We have also previously found less need to manually correct taxonomic assignments when using GTDB-Tk 21 . Our microbes predominantly belonged to the Bacteroidota and Firmicutes_A, with lesser numbers of 14 other phyla. We dereplicated our genomes alongside a superset of rumen bacterial genomes 20 and used the results output by GTDB-Tk to identify RUGs which represent novel microbial strains and species. Amongst our genomes we identified 372 novel strains and 279 novel species. These microbes were taxonomically diverse, belonging to 15 phyla. Only 31 RUGs were assigned an identity at species level.
The vast majority of our total RUGs were only present in one ruminant species. However, we found that at higher taxonomic levels taxonomies were shared between sample types. When comparing the abundance of taxonomies between samples we found that ruminant species clustered separately by both higher (kingdom and phylum) and lower (family and genus) taxonomic levels. We are aware that the sample sizes for our study are small and unequal numbers of samples were included per group, therefore any conclusions about differences between the microbiota of ruminant species should be drawn cautiously. However, our data are supportive of the hypothesis that there are host species-specific rumen microbiota at the strain and species level, potentially due to the co-evolution of the microbiome and host 37 , but that these differences do not necessarily translate into large differences in the types of CAZymes expressed. While we found that there were significant differences between the abundances of CAZymes and KEGG orthologs between ruminant species, most CAZymes and KEGG orthologs were present in all ruminant species. These findings may indicate that while the microbial strains and species present in the rumen differ between ruminant species, these microbes perform similar metabolic roles. That function is more highly conserved than taxa across samples has also been documented in humans 38 .
We also identified 970 PULs in our Bacteriodales RUGs, with numbers of PULs per genome ranging from 1 to 35. The RUG containing 35 PULs was found most abundantly in reindeer samples, emphasising the potential for the discovery of novel carbohydrate-active enzymes in lesser studied ruminant species, as also highlighted by a previous study which identified multiple PULs in metagenomic samples from reindeer 35 . Unfortunately due to the nature of our samples, with red deer and reindeer samples originating from animals eating a non-regimented diet, we are not able to provide metadata as to the exact nutritional composition of our animals' diets, therefore a more in depth analysis of dietary carbohydrates vs CAZyme/PUL abundance is not possible.
While several thousand RUGs have previously been published that originate from the rumen microbiota, the vast majority of these originate from cows. By investing more effort in exploring the metagenome of less well studied ruminants we will be able to identify a greater diversity of microorganisms and enzymes of industrial interest. In conclusion, we present a dataset of RUGs from four ruminant species which can be used as a reference dataset in future metagenomic studies and to aid in selection of microbes in culture based studies.

Methods
Ethical approval. Cow projects were carried out under Home Office PPL 30/2579. Sheep experimentation was carried out under the conditions set out by UK Home Office licence no. 604028, procedure reference number 8. Animal experiments were assessed and approved by animal ethics committees of the University of Reading (cows) and James Hutton Institute (sheep), respectively. All methods were carried out in accordance www.nature.com/scientificreports/ with the Animals (Scientific Procedures) Act of 1986. The study was carried out in compliance with the ARRIVE guidelines.
Experimental design. Reindeer (Rangifer tarandus: Grazing mixed vegetation, n = 2) and red deer (Cervus elaphus: Grazing mixed vegetation, n = 4) were shot in the wild, and ruminal digesta samples were collected immediately. Samples were taken from Holstein cows (Bos Taurus: Fed total mixed ration (once a day), n = 4) and Finn-Dorset cross sheep (Ovis aries: Grazing mixed pasture, n = 2) via a rumen cannula. Samples were taken from sheep after morning grazing. Sheep sampling was performed as described in McKain et al. 39 . Cow samples were taken 3 h post feeding. Samples were collected from the bovine rumen in the following locations: top near cannula, middle at the front of the rumen, middle towards the back of the rumen and bottom (approximately 45 cm down from the entrance to the rumen). Digesta samples were mixed with buffer containing glycerol as a cryoprotectant 39 . The mixtures were kept on ice for 1-2 h then frozen at − 20 °C. DNA extraction was performed using repeated bead beating plus column filtration, as described by Yu et al. 40 54 and sourmash (v.2.0.0) 55 searching against all public bacterial genomes. Taxonomies were assigned to MAGs using GTDB-Tk 36 . Trees produced by MAGpy were rerooted at the branch between archaea and bacteria using Figtree 56 (v.1.4.4) and visualised using GraPhlAn 24 (v.0.9.7). For submission to public repositories, our RUGs were named as the lowest taxonomic level at which NCBI and GTDB-Tk matched. The taxonomies assigned to RUGs were manually checked against the taxonomic tree and improved accordingly.
Carbohydrate-active enzymes (CAZymes) were identified using dbCAN2 (version 7, 24th August 2018) by comparing RUG proteins to the CAZy database 57 . RUG proteins were compared to the KEGG database (downloaded on Sept 15th 2018) 58-60 using DIAMOND (v0.9.21). KEGG hits for which the alignment length was ≥ 90% of the query length were retained. The likely KEGG ortholog group for each RUG protein was inferred from the DIAMOND search results and the KEGG database. CAZyme and KEGG ortholog abundances were calculated as the sum of the reads mapping to RUG proteins within each group after using DIAMOND to align reads to the RUG proteins. PULpy was used to identify polysaccharide utilisation loci 61 . Statistics and reproducibility. Statistical analyses were carried out within R (version 3.5.1). The ggplot2 62 package was used to construct scatter plots and NMDS graphs. The vegan package 63 was used to create NMDS axes using the Bray-Curtis dissimilarity. The Adonis function from the vegan package was used to perform PERMANOVA analyses and DeSeq2 64 was used to calculate differences in coverage for individual CAZymes, KEGG orthologs and RUGs. UpSet graphs were constructed using the UpSetR package 65 . Taxonomies were assigned to paired sequence reads with Kraken 66 using a custom kraken database consisting of RefSeq complete genomes with our RUGs and the rumen superset 20 added. Prior to statistical analyses (excluding DeSeq2) and graph construction, data was subsampled. For RUGs, subsampling to the lowest sample coverage was performed. CAZymes and KEGG orthologs were subsampled to the lowest sample abundance. Significant P values were defined as P < 0.05.

Data availability
The paired-read fastq files supporting the conclusions of this article are available in the European Nucleotide Archive repository (https ://www.ebi.ac.uk/ena/brows er/view/PRJEB 34458 ). The RUG fasta files supporting the conclusions of this article are available in the Edinburgh DataShare repository (https ://doi.org/10.7488/ds/2640).