Cultivation and sequencing of rumen microbiome members from the Hungate1000 Collection

Rumen microbiome biology gets a boost with the release of 410 high-quality reference genomes from the Hungate1000 project. Supplementary information The online version of this article (doi:10.1038/nbt.4110) contains supplementary material, which is available to authorized users.

r e s o u r c e and archaeal species that have been cultivated from the rumens of a diverse group of animals 10 . We surveyed Members of the Rumen Microbial Genomics Network and requested they provide cultures of interest. We supplemented these with additional cultures purchased from culture collections to generate the most comprehensive collection possible. These cultures are available to researchers, and we envisage that additional organisms will have their genome sequences included as more rumen microbes are able to be cultivated.
Large-scale reference genome catalogs, including the Human Microbiome Project (HMP) 11 and the Genomic Encyclopedia of Bacteria and Archaea (GEBA) 12 have helped to improve our understanding of microbiome functions, diversity and interactions with the host. The success of these efforts has resulted in calls for continued development of high-quality reference genome catalogs 13,14 , and led to a resurgence in efforts to cultivate microorganisms [15][16][17] . This high-quality reference genome catalog for rumen bacteria and archaea increases our understanding of rumen functions by revealing degradative and physiological capabilities, and identifying potential rumen-specific adaptations.

Reference rumen genomes
Members of nine phyla, 48 families and 82 genera (Supplementary Table 1 and Supplementary Note 1) are present in the Hungate Collection. The organisms were chosen to make the coverage of cultivated rumen microbes as comprehensive as possible 10 . While multiple isolates were sequenced from some polysaccharide-degrading genera (Butyrivibrio, Prevotella and Ruminococcus), many species are represented by only one or a few isolates. 410 reference genomes were sequenced in this study, and were analyzed in combination with 91 publicly available genomes 18 . All Hungate1000 genomes were sequenced using Illumina or PacBio technology, and were assembled and annotated as summarized in the Online Methods. All genomes were assessed as high quality using CheckM 19 with >99% completeness on average, and in accordance with proposed standards 20 . The genome statistics can be found in Supplementary Table 2.
The 501 sequenced organisms analyzed in this study are listed in Supplementary Table 1. We refer to these 501 genomes (480 bacteria and 21 archaea) as the Hungate genome catalog. Supplementary Table 3 provides a comprehensive chronological list of all publicly available completed rumen microbial genome sequencing projects, including anaerobic fungi and genomes that have been recovered from metagenomes but that were not included in our analyses.
Members of the Firmicutes and Bacteroidetes phyla predominate in the rumen 21,22 and contribute most of the Hungate genome sequences (68% and 12.8%, respectively; Supplementary Fig. 1a), with the Lachnospiraceae family making up the largest single group (32.3%). Archaea are mainly from the Methanobrevibacter genus or are in the Methanomassiliicoccales order. The average genome size is ~3.3 Mb (Supplementary Fig. 1b), and the average G+C content is 44%. Most organisms were isolated directly from the rumen (86.6%), with the remainder isolated from feces or saliva. Most cultured organisms were from bovine (70.9%) or ovine (17.6%) hosts, but other ruminant or camelid species are also represented ( Table 1).
The Global Rumen Census project 22 profiled the microbial communities of 742 rumen samples present in diverse ruminant species, and found that rumen communities largely comprised similar bacteria and archaea in the 684 samples that met the criteria for inclusion in the analysis. A core microbiome of seven abundant genus-level groups was defined for 67% of the Global Rumen Census sequences 22 . We overlaid 16S rRNA gene sequences from the 501 Hungate genomes onto the 16S rRNA gene amplicon data set from the Global Rumen Census project (Fig. 1). This revealed that our Hungate genomes represent ~75% of the genus-level taxa reported from the rumen.
Previous studies of the rumen microbiome have highlighted unclassified bacteria as being among the most abundant rumen microorganisms 10,21 , and we also report 73 genome sequences from strains that have yet to be taxonomically assigned to genera or phenotypically characterized (Supplementary Table 1). Most abundant among these uncharacterized strains are members of the order Bacteroidales (RC-9 gut group) and Clostridiales (R-7 group), and this abundance points to a key role for these strains in rumen fermentation 22 . The RC-9 gut group bacteria have small genomes (~2.3 Mb), and the closest named relatives (84% identity of the 16S rRNA gene) are members of the genus Alistipes, family Rikenellaceae. The R-7 group are most closely related to Christensenella minuta (86% identity of the 16S rRNA gene), family Christensenellaceae.  . 2a) were found in isolates with large genomes including Bacteroides ovatus (over 320 glycoside hydrolases (GH) and polysaccharide lyases (PL) from ~60 distinct families), Lachnospiraceae bacterium NLAE-zl-G231 (296 GHs and PLs), Ruminoclostridium cellobioparum ATCC 15832 (184 GHs and PLs) and Cellvibrio sp. BR (158 GHs and PLs). The most prevalent CAZyme families are shown in Supplementary Figure 3. Bacteria that initiate the breakdown of plant fiber are predicted to be important in rumen microbial fermentation (Fig. 2b), including representatives of bacterial groups capable of degrading cellulose, hemicellulose (xylan/xyloglucan) and pectin (Fig. 2c).
Examination of the CAZyme profiles (Supplementary Fig. 3) highlights the degradation strategies used by different taxa present in our collection. Members of the phylum Bacteroidetes have evolved polysaccharide utilization loci (PULs), genomic regions that encode all required components for the binding, transport and depolymerization of specific glycan structures. Predictions of PUL organization in all 64 Bacteroidetes genomes from the Hungate catalog have been integrated into the dedicated PULDB database 27 . The pectin component rhamnogalacturonan II (RG-II) is the most structurally complex plant polysaccharide, and all the CAZymes required for its degradation occur in a single large PUL recently identified in Bacteroides thetaiotaomicron 28 . Similar PULs encoding all necessary enzymes were also found in rumen isolates belonging to three different families within the phylum Bacteroidetes ( Supplementary  Fig. 2 and Supplementary Fig. 4). Another feature of the Bacteroidetes genomes and PULs is the prevalence of GH families dedicated to the breakdown of animal glycans (Supplementary Figure 2). Host glycans are not thought to be used as a carbohydrate source for rumen bacteria, and most of the genomes with extensive repertoires of these enzymes (Bacteroides spp.) were from species that were isolated from feces. However, ruminants secrete copious saliva and the presence of animal glycan-degrading enzymes in rumen Prevotella spp. may enable them to utilize salivary N-linked glycoproteins 29 , and help explain their abundance in the rumen microbiome 22 .
The multisubunit cellulosome is an alternative strategy for complex glycan breakdown in which a small module (dockerin) appended to glycan-cleaving enzymes anchors various catalytic units onto cognate cohesin repeats found on a large scaffolding protein 30 . Cellulosomes have been reported in only a small number of species, mainly in the family Ruminococcaceae in the order Clostridiales. Supplementary Table 4 reports the number of dockerin and cohesin modules found in the reference genomes and the main cellulosomal bacteria are highlighted in Supplementary Figure 2. We find that Clostridiales bacteria can be divided into four broad categories: (i) those that have neither dockerins nor cohesins (non-cellulosomal species), (ii) those that have just a few dockerins and no cohesins (most likely non-cellulosomal), (iii) those that have a large number of dockerins and many cohesins (true cellulosomal bacteria like Ruminococcus flavefaciens) and (iv) those that have a large number of dockerins but just a few cohesins like R. albus and R. bromii. In R. albus, it is likely that a single cohesin serves to anchor isolated dockerin-bearing enzymes onto the cell surface rather than to build a bona fide cellulosome. The starchdegrading enzymes of R. bromii bear dockerin domains that enable them to assemble into cohesin-based amylosomes 31 , analogous to cellulosomes, which are active against particulate resistant starches. R. bromii strains from the human gut microbiota and the rumen encode similar enzyme complements 31 . Most of what is known about microbial fermentation pathways in the rumen has been derived from measurements of end product fluxes or inferred from pure or mixed cultures of microorganisms in vitro, and based on reference metabolic pathways present in non-rumen microbes. The relative participation of particular species in each pathway, or their contribution to end product formation in vivo, is poorly characterized. To determine the functional potential of the sequenced species, we used genome information in combination with the published literature to assign bacteria to different metabolic strategies, on the basis of their substrate utilization and production of specific fermentation end products (Supplementary Table 5). The main metabolic pathways and strategies are present in at least one of, or combinations of, the most abundant bacterial and archaeal groups found in the rumen (Fig. 2b); as a result, we now have a better understanding of which pathways are encoded by these groups. The analysis also provides the first information on the contribution made by the abundant but uncharacterized members of the orders Bacteroidales and Clostridiales to the rumen fermentation. This metabolic scheme provides a framework for the investigation of gene function in these organisms, and the design of strategies that may enable manipulation of rumen fermentation.
Gene loss. One curious feature of several rumen bacteria is the absence of an identifiable enolase, the penultimate enzymatic step in glycolysis, which is conserved in all domains of life. Examination of >30,000 isolates from the Integrated Microbial Genomes with Microbiomes (IMG/M) database 32 revealed that enolase-negative strains were rare (<0.5% of total), and that a high proportion of such strains were rumen isolates belonging to the genera Butyrivibrio and Prevotella and uncharacterized members of the family Lachnospiraceae (Supplementary Table 5). In the genus Butyrivibrio approximately half the sequenced strains lack enolase, while some show a truncated form. The distribution of this enzyme in relation to the phylogeny of this genus is shown in Figure 3. This analysis suggests that enolase is in the process of being lost by some rumen Butyrivibrio isolates and that we may be observing an example of environment-specific evolution by gene loss 33 . Although the adaptive advantage conferred by loss of enolase is not clear, there is a possible link with pyruvate metabolism and lactate production. Several enolase-negative Butyrivibrio    22 using information from metabolic studies and analysis of the reference genomes. The abundance and prevalence data shown in the table are taken from the Global Rumen Census project 22 . Abundance represents the mean relative abundance (%) for that genus-level group in samples that contain that group, while prevalence represents the prevalence of that genus-level group in all samples (n = 684).* The conversion of choline to trimethylamine, and propanediol to propionate generate toxic intermediates that are contained within bacterial microcompartments (BMC). Cultures from the reference genome set that encode the genes required to produce the structural proteins required for BMC formation are shown in Supplementary Table 5 r e s o u r c e strains do not produce lactate and 12 also lack the gene for l-lactate dehydrogenase. Conversely the enolase and l-lactate dehydrogenase genes are co-located in seven strains. An attempt to identify additional functions exhibiting a similar pattern of gene loss (or a complementing gain of function) by comparing enolase-positive versus enolase-negative Butyrivibrio spp. strains yielded no substantial additional insights (Supplementary Table 6).
Another example of gene loss is seen in bacteria that have lost their complete glycogen synthesis and utilization pathway, as shown by the concomitant loss of families GH13, GH77, GT3 or GT5, and GT35 (Supplementary Fig. 2). These bacteria include nutritionally fastidious members of the Firmicutes (Allisonella histaminiformans, Denitrobacterium detoxificans, Oxobacter pfennigii) and Proteobacteria (Wolinella succinogenes), and have also lost most of their degradative CAZymes, suggesting that they have evolved toward a downstream position as secondary fermenters where they feed on fermentation products (acetate, pyruvate, amino acids) from primary degraders.
Biosynthetic gene clusters. We searched the Hungate genomes for biosynthetic gene clusters (Supplementary Fig. 5 and Supplementary  Table 7) to identify evidence of secondary metabolites that might be used as rumen modifiers to reduce methane production through their antimicrobial activity 34 . A total of 6,906 biosynthetic clusters were predicted from the Hungate genomes (Supplementary Note 2).
CRISPRs. Identification of CRISPR-Cas systems and their homologous protospacers from viral, plasmid and microbial genomes could shed light on past encounters with foreign mobile genetic elements 35 and somewhat indirectly, habitat distribution and ecological interactions 36 . A total of 6,344 CRISPR spacer sequences were predicted from 241 Hungate genomes and searched against various databases (Supplementary Table 8

Metagenomic sequence recruitment
We evaluated whether the Hungate catalog can contribute to metagenomic analyses by using a total of 1,468,357 coding sequences (CDSs) from the 501 reference genomes to search against ~1.9 billion CDS predicted from more than 8,200 metagenomic data sets from diverse habitats. A total of 892,995 Hungate CDSs (~60%) were hits to 13,364,644 metagenome proteins at ≥ 90% amino acid identity. 466 out of 501 Hungate isolates recruited sequences from 2,219 metagenomic data sets derived from host-associated, environmental or engineered sources ( Fig. 4 and Supplementary Table 9). The large number of human samples recruited (1,699) can be attributed to the greater availability of human samples compared to metagenomes from other mammals, including ruminants. Considering the number of isolate CDSs with hits to metagenome sequences (% coverage), most Hungate genomes (413/501) are represented in rumen metagenome samples, as well as in human or other vertebrate samples (Fig. 4). The average % coverage for 466 recruited genomes was 26.5% of total CDS, with Sharpea azabuensis DSM 18934 showing the highest capture (95.6%) in a sheep rumen metagenome (Supplementary Fig. 6).
Examining recruitment against available rumen metagenomes, a majority of 336 isolates were captured in 24 rumen samples (27% average coverage) (Supplementary Fig. 7 and Supplementary  Table 9). A further 52 rumen isolates may be included if the hit count recruitment parameter is relaxed from 200 to 50. These isolates are predicted to occur in relatively low abundance in these rumen metagenomes, and raise the proportion of recruiters to almost 80% of the total Hungate catalog. Top recruitment (in terms of % coverage of total isolate CDS) was by organisms previously identified as dominant genera in the rumen 10,22,37,38 , such as Prevotella spp., Ruminococcus spp., Butyrivibrio spp. and members of the unnamed RC-9, R-7 and R-25 groups. Some Hungate catalog genomes were exclusively detected in one or a few samples originating from the same ruminant host (e.g., sheep-associated Sharpea, Kandleria and Megasphaera strains), whereas others were detected across all ruminants (e.g., Prevotella spp.). It is, however, important to acknowledge the limitations of existing rumen metagenome samples (not merely in terms of their paucity), as they were sourced from animals on special diets (e.g., switchgrass 5 or lucerne (alfalfa) pellets 39 ), which may alter the microbiome 22 .
165 Hungate cultures were not detected in deposited rumen metagenome data sets under the thresholds applied. Many of these (~50) were of fecal origin, and reflect how the microbiota of the rumen is distinct from that found in other regions of the ruminant GI tract 40 .  r e s o u r c e A total of 68 isolates were recruited by both rumen and human intestinal samples and represent shared species between the rumen and human microbiomes (Fig. 4), possibly fulfilling similar roles. A further 66 Hungate isolates were recruited by human samples but were not detected in rumen samples, giving a total of 134 Hungate catalog genomes that recruited various human samples, making them valuable reference sequences for the analysis of human microbiome samples. This observation is also indirectly recapitulated by the CRISPR-CAS systems-based analysis, which showed links to spacers from human intestinal samples, particularly for Hungate isolates of fecal origin (Supplementary Table 8). Additional metagenome recruitment analysis details are provided in Supplementary Note 4.
Comparison with human gut microbiota Many Hungate strains (134/501) were shared between rumen and human intestinal microbiome samples. This is unsurprising, as both habitats are high-density, complex anaerobic microbial communities, producing similar fermentation products, and with extensive interspecies cross-feeding and interaction 41 . We performed a comparative analysis against available human intestinal isolates (largely from the HMP), to identify differences that can be attributed to distinct lifestyles and adaptive capacity of rumen microorganisms. The Hungate and human intestine isolate collections were curated to remove redundancy, low-quality genomes and known human pathogens. This resulted in a set of 458 rumen and 387 human intestinal genomes (Supplementary Table 10), which was used to identify protein families in the Pfam database that were differentially abundant in isolates from each environment. Out of 7,718 Pfam domains found in 458 non-redundant Hungate isolate genomes, we determined 367 were over-represented in the ruminal genomes and 423 were under-represented on the basis of the falsediscovery rate (FDR), q-value < 0.001 (Supplementary Table 11). Over-represented Pfams (Fig. 5) included enzymes involved in plant cell wall degradation (GH11, GH16, GH26, GH43, GH53, GH67, GH115), carbohydrate-binding modules (CBM2, CBM3, and cohesin and dockerin modules associated with cellulosome assembly) and GT41 family glycosyl transferases, which occur predominantly in the genera Anaerovibrio and Selenomonas. Notably, Pfams for the biosynthesis of cobalamin (vitamin B 12 ), an essential micronutrient for the host, were over-represented. Vitamin B 12 biosynthesis is one of the most complex pathways in nature, involving more than 30 enzymatic steps, and given its high metabolic cost, is only encoded by a small set of bacteria and archaea. We examined this biosynthetic pathway in more detail using other functional annotation types (KO and Tigrfam) across the 501 Hungate isolates, and discovered that 12 or more enzymatic steps were overrepresented in the Hungate genomes, and at least 47 isolates might be capable of de novo B 12 synthesis (Supplementary Table 12). Many of these were members of the Class Negativicutes within the Firmicutes (Anaerovibrio, Mitsuokella, and Selenomonas). A further 140 (including 21 archaeal) genomes encode enzymes for the salvage of B 12 from an intermediate, and may even work cooperatively (based on potential complementarity of lesions in the pathway in different members) to share and synthesize corrinoids for community and/or host benefit. These observations reflect the high burden of a requirement for vitamin B 12 , which is needed as a cofactor for enzymes involved in gluconeogenesis from propionate in the liver. This process is essential for lactose biosynthesis and milk production in dairy animals 42 , and dairy and meat products of ruminant origin are important dietary sources of B 12 (ref. 43). By contrast, it has been speculated that human gut microbes were unlikely to contribute significant amounts of B 12 for their host and were likely competitors for dietary B 12 (ref. 44).
Of the Pfams (Fig. 5) under-represented in Hungate genomes, the occurrence of all steps for the oxidative branch of the pentose phosphate pathway (OPPP) was striking. The role of the OPPP is primarily the irreversible production of reducing equivalents (NADPH), although other enzymes may serve as alternate sources of reducing equivalents. As discussed above, the Pfam for enolase appeared in the list of under-represented families. The list also contained several Pfams associated with bacteriophage functions and sporulation. The differential abundance of sporulation genes is interesting as the observation that sporulation genes are abundant in human gut bacteria has been made recently 16,31 and is potentially linked with resistance to oxygen exposure. This observation is particularly striking given the preponderance of Firmicutes, an archetypically spore-forming phylum 45 , in the rumen set. Large and small subunits of an oxygen-dependent Class I type ribonucleotide reductase were also under-represented together with several other Pfams implicated in oxygen tolerance, suggesting that human intestinal isolates may encounter higher oxygen tension compared to the strictly anaerobic ruminal ecosystem. These observations indirectly suggest that  Table 9 for data and other specifics. r e s o u r c e host genetics and physiology influence rumen microbiome composition and that rumen microbes are likely to be vertically inherited as indicated in recent studies 46,47 . Conversely, human intestinal (more specifically, fecal) isolates are transmitted from other sources in the environment 31,48 . We were able to recapitulate these findings in a metagenome-based comparison of these two environments (sheep rumen samples against normal human fecal samples; Supplementary Table 13), suggesting that these differences cannot be explained by cultivation or abundance biases in the isolate data sets.

DISCUSSION
The Hungate genome catalog that we report here includes genomic analysis of 501 bacterial and archaeal cultures that represent almost all of the cultured rumen species that have been taxonomically characterized, as well as representatives of several novel species and genera. This high-quality reference collection will guide interpretation of metagenomics data sets, including genomes recovered from metagenomes (MAGs). The Hungate genome catalog also allows robust comparative genomic analyses that are not feasible using incomplete sequence data from metagenomes. Researchers have access to Hungate Collection strains, which will enable a better understanding of carbon flow in the rumen, including the breakdown of lignocellulose, through the metabolism of substrates to SCFAs and fermentation end products, to the final step of CH 4 formation.
The Hungate genome collection is by no means complete. Some important taxa are missing, especially members of the order Bacteroidales 10,22 . At the start of this project genome sequences were available for strains belonging to 11 (12.5%) of the 88 genera described for the rumen. Currently, genome sequences are available for 73 (83%) of those 88 genera, as well as for 73 strains that are only identified to the family or order taxonomic level. Of the rumen 'most wanted list' which comprises 70 rumen bacteria 10 , the Hungate Collection has now contributed 30 members. In addition to missing bacteria and archaea, the sequencing of rumen eukaryotes presents considerable technical challenges and although some progress has been made in sequencing of anaerobic fungi 49 , there are no genome data for rumen ciliate protozoa, and only preliminary data on the rumen virome 50 .
Microbiome research is moving from descriptive to mechanistic, and to translation of those mechanisms into interventions 51 . Using rumen microbiome data to engineer rumens to reduce CH 4 emissions 52 and improve productivity and sustainability outcomes is now in sight 53 . The Hungate Collection provides a starting point for this, shedding light on what has been described as 'the world's largest commercial fermentation process' 54 . Future studies can use the Hungate resources to improve the resolution of rumen meta-omics analyses, to identify antimicrobials, to source carbohydrate-degrading enzymes from the rumen for use as animal feed additives and in lignocellulose-based biofuel generation, and as the basis for synthetic microbial consortia.

METHODS
Methods, including statements of data availability and any associated accession codes and references, are available in the online version of the paper. We are grateful to J. White of Resphera Biosciences for assistance with using Metastats. We thank L. Olthof and G. Peck for isolating cultures that were included in this work. Mention of trade names or commercial products in this publication is solely for the purpose of providing specific information and does not imply recommendation or endorsement by the US Department of Agriculture. The project title refers to the pioneering work in culturing strictly anaerobic rumen bacteria carried out by Robert (Bob) E. Hungate, who trained many of New Zealand's first rumen microbiologists.

CoMPETING INTERESTS
The authors declare no competing interests.
reprints and permissions information is available online at http://www.nature.com/ reprints/index.html. Publisher's note: springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This article is licensed under a creative commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the creative commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's creative commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's creative commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. r e s o u r c e features were delineated using a q-value cutoff of <0.001, and less populous or sparsely recruited Pfams were also eliminated (where the sum of gene counts in each genome set was <100) (Supplementary Table 11, worksheet designated "Q-val<0.001_edited"). A second worksheet labeled "Q-val<0.005" shows a larger subset of differentially abundant Pfams applying the less stringent threshold of Q-value < 0.005, and including results for Pfams with sparse counts. Pfam was chosen for this primary analysis because it is the largest and most widely used source of manually curated protein families, with nearly 80% coverage (on average) of total CDS in these microbial genomes. KO terms or TIGRFAMS were also assessed to validate and complement Pfam-based findings or to examine specific pathways more closely. For comparisons of enolase-positive versus enolase-negative Butyrivibrio spp. strains, Metastats 79 was employed in conjunction with contrasting upper and lower quartile or percentile gene counts, in order to identify additional functions with a similar pattern of preservation/loss as the glycolytic enolase gene. For metagenomes-based comparisons, previously published sheep rumen (IMG IDs: 3300021254, 300021255, 3300021256, 3300021387, 3300021399, 3300021400, 3300021426, 3300021431) and human intestinal (IMG IDs: 3300008260, 3300008496, 3300007299, 3300007296, 3300008272, 3300007361, 3300008551, 3300007305, 3300007717) metagenomes were reassembled using metaSPAdes 80 , annotated and loaded into IMG. Estimated gene copy numbers (calculated by multiplying gene count with read depth for the scaffold the gene resides on) were compared using Metastats (as described above).

Statistical analysis. Refer to the Life Sciences Reporting Summary.
Life Sciences Reporting Summary. Further information on experimental design is available in the Life Sciences Reporting Summary. Data availability. All available genomic data and annotations are available through the IMG portal (https://img.jgi.doe.gov/). Additionally, a dedicated portal to download all 410 genomes sequenced in this study is provided: https://genome.jgi.doe.gov/portal/pages/dynamicOrganismDownload.jsf?or ganism=HungateCollection.