Novel clostridial lineages recovered from metagenomes of a hot oil reservoir.

Oil reservoirs have been shown to house numerous microbial lineages that differ based on the in-situ pH, salinity and temperature of the subsurface environment. Lineages of Firmicutes, including Clostridiales, have been frequently detected in oil reservoirs, but are typically not considered impactful or relevant due to their spore-forming nature. Here we show, using metagenomics, that a high-temperature oil reservoir of marine salinity contains a microbial population that is predominantly from within the Order Clostridiales. These organisms form an oil-reservoir-specific clade based on the phylogenies of both 16S rRNA genes and ribosomal proteins, which we propose to name U Petromonas tenebris, meaning single-celled organisms from dark rocks. Metagenome-assembled genomes (MAGs) of these Petromonas sp. were obtained and used to determine that these populations, while capable of spore formation, were also likely replicating in situ in the reservoir. We compared these MAGs to closely related genomes and show that these subsurface Clostridiales differ from the surface-derived genomes: the surface-derived genomes show signatures of the ability to degrade plant-related compounds, whereas the subsurface genomes only show the ability to process simple sugars. The estimation of in-situ replication from genomic data suggests that U Petromonas tenebris lineages are functional in situ and may be specifically adapted to inhabit oil reservoirs.

lineages containing genes initially classified as hydrocarbon degradation genes, but the well-characterized anaerobic hydrocarbon-degrading genes bssA and assA were not detected. Instead, robust signatures of polysaccharide, peptide, and fatty acid degradation were seen, as well as robust pathways of sugar fermentation 4 . To our knowledge, few studies have assessed the proportion of sporulated and vegetative cells of Clostridiales in oil reservoirs.
In this study, we sampled production wells from Galveston 209, a mature oil field ~32 km south of Galveston Island, Texas. Well temperatures in this oil field range from 80 to 160 °C. Discovered in 1983, the field was extensively developed off two platforms (A & B), with a cumulative production of over 23 MBO (million barrels of oil) and 300 GCF (gas cubic feet) out of stacked pay of Lower to Middle Miocene shallow marine sandy reservoirs (~2.1 to 4.6 km depth) with outer shelf mudstone seals, all deposited in a wave-dominated deltaic setting. Production data indicate that there is a strong water drive through well-connected, continuous reservoirs. Traps are low-relief, fault-dependent closures on basinward-dipping listric normal faults. In this study, microbial communities in the produced fluids (a mixture of oil and water) of these hot wells were examined via metagenomics. Functional annotation and the generation of metagenome-assembled genomes (MAGs) were used to examine potential community metabolisms. A novel clostridial lineage, which we propose to name U Petromonas tenebris, was the dominant microbial signature in this subsurface environment, and in-situ replication of this organism can be detected via metagenomics.

Methods
Collection of produced fluids. Produced fluids were collected using sterile technique at the drilling rig offshore Texas. Fluids came from four distinct wells with temperatures ranging from 88 to 102 °C (Table 1). Produced fluids were filtered through a Sterivex filter until the filter clogged, which occurred after 300-400 mL of fluid. The Sterivex filter was frozen immediately and stored at −80 °C until use.
DNA extraction and sequencing. DNA was extracted using a modified version of the Qiagen PowerWater Sterivex filter extraction kit 18 . The DNA was checked against a blank control extraction for bacterial PCR products, and once it was determined that the DNA was amplifiable and contained a microbial signal, it was sent for metagenomic library preparation and sequencing via Illumina HiSeq at the University of Delaware Genomic Sequencing Facility. Raw sequences and MAGs for this project are deposited at NCBI under Bioproject PRJNA578106.
Quality trim and assembly. Raw Illumina reads were quality trimmed in CLCBio Workbench version 7.5.1 (Qiagen), with the following parameters: removal of low quality sequence (limit = 0.0016, but rounded to 0.002 by CLCBio, which represents a Phred score of 36 or better); removal of ambiguous nucleotides: no ambiguous nucleotides allowed; removal of terminal nucleotides: 2-12 nucleotides from either end to minimize sequencing errors and enriched 5-mers; removal of sequences on length: minimum length 60 nucleotides. Whenever one read of a read pair was excluded by the quality trim, the entire pair was excluded. Trimmed, paired reads were assembled using IDBA-UD version 1.1.1 with the following settings: -mink 40 -maxk 120 -step 20 -min_contig 300 19 . The resulting scaffolds were then used for further genome binning of each reservoir metagenome.
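The pair-level exclusion rule described above can be illustrated with a short sketch. This is not the CLCBio implementation: it only models the ambiguous-nucleotide and minimum-length criteria stated in the text, assumes reads are already available as plain sequence strings, and the function names are hypothetical.

```python
from typing import Iterable, Iterator, Tuple

MIN_LENGTH = 60  # minimum read length stated in the text


def read_passes(seq: str, min_length: int = MIN_LENGTH) -> bool:
    """Return True if a trimmed read is long enough and has no ambiguous bases."""
    return len(seq) >= min_length and set(seq.upper()) <= set("ACGT")


def keep_intact_pairs(
    pairs: Iterable[Tuple[str, str]]
) -> Iterator[Tuple[str, str]]:
    """Yield only read pairs in which BOTH mates pass the filters.

    If either mate fails, the whole pair is dropped, mirroring the
    rule that excluding one read excludes the entire pair.
    """
    for forward, reverse in pairs:
        if read_passes(forward) and read_passes(reverse):
            yield forward, reverse
```

Dropping whole pairs keeps the forward and reverse files synchronized, which paired-end assemblers generally expect.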
Phylogeny. The phylogeny of metagenome community members was determined using EMIRGE (Expectation-Maximization Iterative Reconstruction of Genes from the Environment), based on 16S rRNA gene sequences reconstructed from unassembled data 20 . A maximum likelihood phylogenetic tree of the 16S rRNA gene was inferred from these sequences in MEGA version 5, using default parameters for alignment and tree construction with 500 bootstrap replicates 21 .
Metagenome-assembled genomes (MAGs). Metagenome assemblies of individual samples were subjected to binning using MaxBin version 1.4.2 with a maximum of 200 iterations 22 . The taxonomic uniqueness of each resulting MAG was initially determined using Phylosift version 1.0.1 23 with the default parameters. The level of potential contamination and strain heterogeneity in each MAG was evaluated using CheckM 1.0.6 with the "lineage_wf" option 24 . The VizBin program 25 was then used to visually refine the MAGs to minimize outlier scaffolds. Genomes of close relatives of the clostridial MAGs were downloaded from NCBI (C. sporogenes DSM14501 NZ_FRAJ00000000.1 and P. caminithermalis DSM15212 NZ_FRAG00000000.1). Average nucleotide identity (ANI) between the MAGs and the two reference genomes was calculated using PyANI 26 as implemented in Anvio v5.5 27 , following the procedure described at http://merenlab.org/2016/11/08/pangenomics-v2/. Pair-wise average amino acid identity (AAI) was calculated as one-way AAI and two-way AAI using the online AAI calculator (http://enve-omics.ce.gatech.edu/aai/).
A collection of 16 ribosomal proteins 28 was extracted from the PROKKA annotation of each MAG using the Geneious software (Biomatters, Auckland, New Zealand). Also included for phylogenetic comparison were genomes from closely related microbial groups downloaded from the National Center for Biotechnology Information (NCBI): E. coli, H. congolense, T. gondwanense, D. alkaliphilum, C. cellulovorans, C. hydrogeniformans, C. aceticum, C. formicaceticum, C. sporogenes, P. caminithermalis, and M. halophilus. Ribosomal proteins were concatenated and aligned with MAFFT v7.392 29 for each of the recovered genomes. Only genomes that had over 50% of the ribosomal proteins were used in the analysis. The alignment output was used to generate maximum likelihood phylogenetic trees with 100 iterations using FastTree v2.1.11 30 .
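As a rough illustration of the marker-coverage filter described above, the sketch below keeps only genomes carrying more than half of a marker set and concatenates their proteins in a fixed order. It is a simplified assumption of how such a filter could be written, not the workflow actually run in Geneious and MAFFT; the function name, data layout, and the short marker list in the usage example are hypothetical.

```python
from typing import Dict, List


def select_and_concatenate(
    genomes: Dict[str, Dict[str, str]],
    markers: List[str],
    min_fraction: float = 0.5,
) -> Dict[str, str]:
    """Keep genomes with MORE than `min_fraction` of the marker proteins.

    `genomes` maps genome name -> {marker name -> protein sequence}.
    Returns genome name -> proteins concatenated in `markers` order
    (missing markers are simply skipped in this sketch).
    """
    kept: Dict[str, str] = {}
    for name, proteins in genomes.items():
        present = [m for m in markers if m in proteins]
        if len(present) / len(markers) > min_fraction:
            kept[name] = "".join(proteins[m] for m in present)
    return kept
```

For example, with four illustrative markers, a genome carrying three of them is retained while a genome carrying exactly two (50%, not over 50%) is dropped:

```python
markers = ["L2", "L3", "L4", "S3"]  # illustrative subset, not the 16 used in the study
genomes = {
    "mag_a": {"L2": "MK", "L3": "AV", "S3": "GT"},
    "mag_b": {"L2": "MK", "L4": "RR"},
}
selected = select_and_concatenate(genomes, markers)
```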

Results
Metagenomes were generated using DNA extracted from four samples taken from produced fluids from an oil field located in the Gulf of Mexico, offshore Texas. The reservoirs in this formation are hot and salty, with temperatures ranging between 88-102 °C, salinity values exceeding 28% and sulfate values below seawater levels. Four separate wells access likely connected reservoir material in this system, and each well has somewhat similar but distinct geochemical conditions (Table 1).
We generated metagenomic sequencing data from each well; the metagenome assemblies comprised 9,600-34,300 contigs, with total lengths of 18-43 million base pairs (Table 2).
Based on 16S rRNA gene sequences reconstructed by EMIRGE, the most abundant organism in each metagenome was the unique lineage, U Petromonas tenebris, which comprised from 69.8% to 96.7% of the microbial community (Fig. 1). Petrotoga, Geotoga, Euryarchaea and other members of Clostridiales were less abundant members of the community (Fig. 1). The closest relatives were uncultivated lineages found in other oil reservoirs (Fig. 2).
Phylogenetic analysis suggested that the nearest cultured relatives of the dominant 16S rRNA gene from the oil reservoirs were Caminicella sporogenes and Paramaledivibacter caminithermalis (formerly Clostridium caminithermale), both of which are moderately thermophilic and halophilic and were initially isolated from deep-sea hydrothermal vent systems (Fig. 2).
Metagenome-assembled genomes (MAGs) of U Petromonas tenebris lineages were recovered. These high-quality MAGs were 99.2% to 100.0% complete with <5% contamination, as assessed based on single-copy genes by CheckM (Table 3). The estimated genome sizes range from 2.7 to 3.0 Mb, which falls between those of the relatives Ca. sporogenes (2.5 Mb) and P. caminithermalis (4.1 Mb). The number of annotated genes in the MAGs ranged from 2,815 to 2,959 (Table 3).
Phylogenetic analysis of concatenated ribosomal protein sequences shows that the clostridial MAGs found in these reservoir samples are most closely related to Caminicella sporogenes and Paramaledivibacter caminithermalis, along with Maledivibacter halophilus (Fig. 3). These closely related organisms form a group of thermophilic and halophilic Clostridia within the Clostridiales 38,39 . Both the 16S rRNA gene and the concatenated ribosomal protein sequences were similar across the reservoir MAGs, suggesting that similar organisms were present in all four reservoir samples (Figs. 2 and 3).
As such, we provisionally name the organism U Petromonas tenebris, as the MAGs fit the suggested metrics for establishment of a new genus and species 40 . The name comes from "petra" (rock), "monas" (single-celled organism) and "tenebris" (dark), since the phylogenetic trees show that relatives of these organisms are all found in oil reservoirs; the U indicates its uncultivated status.
than those found in the isolate genomes include a number of oxidoreductases, including Fe-S oxidoreductase, citrate lyase, and CO dehydrogenase-CoA synthase subunits. There were also between 8 and 13 COGs unique to each MAG that did not appear in any other MAG or cultured genome (Fig. 4). These unique genes were primarily housekeeping genes, except that U P. tenebris B7 had a unique COG1719, which is a predicted hydrocarbon binding protein.
The potential phylogenetic history of functional genes in the reservoir MAGs was examined to determine if they had any unusual evolutionary histories. Each gene was individually examined via BLAST and the taxonomy of the best hit was recorded. About 48% of the coding genes in the reservoir MAGs appear to have originated within the Caminicella, Maledivibacter, and Paramaledivibacter genera, with the remainder coming from other Clostridiales or Firmicutes groups (Fig. 5).
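The tallying step described above (record the taxonomy of each gene's best hit, then summarize by genus) can be sketched as follows. This is a minimal illustration that assumes the best-hit genus for each gene has already been extracted into a list; it does not reproduce the BLAST search itself, and the function name is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def genus_percentages(best_hit_genera: Iterable[str]) -> Dict[str, float]:
    """Return the percentage of genes whose best hit falls in each genus."""
    counts = Counter(best_hit_genera)
    total = sum(counts.values())
    if total == 0:
        return {}  # no genes supplied, nothing to summarize
    return {genus: 100.0 * n / total for genus, n in counts.items()}
```

Applied per MAG, the resulting percentages can then be averaged across bins to produce a summary of the kind shown in Fig. 5.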
Many metabolic pathways are shared between the reservoir MAGs and their cultured relatives. Shared core metabolisms include butyrate fermentation, sporulation, and common two-component regulatory systems involving temperature, salt stress, chemotaxis, and flagella regulation. However, many differences between the MAGs and the closely related cultured organisms exist. For example, there are differences in Carbohydrate Active Enzymes (CAZy) between each of the cultured genomes and between the cultured genomes and the reservoir MAGs (Table 5). The reservoir MAGs have unique genes and transporters involved in the metabolism of sucrose which are not present in the cultured genomes. Conversely, the cultured isolate genomes are capable of processing extracellular cellobiose and xylan/xylose, while the reservoir MAGs lack these genes.
Other key metabolic differences include a complete Wood-Ljungdahl CO2 fixation pathway found in three of the four reservoir MAGs, but not in the other genomes (Table 6). COGs of the key enzyme, CO dehydrogenase/acetyl-CoA synthase, noted above, were not found in the cultured genomes. Certain sulfur metabolism genes are present in the reservoir MAGs but not in the cultured genomes. These include anaerobic sulfite reductase (ASR), adenosine 5′-phosphosulfate reductase (APR), and sulfite reductase (ferredoxin). ASR is typically part of the assimilatory sulfate reduction pathway, and APR is typically present in dissimilatory pathways, but neither of these pathways is complete in the MAGs. In addition, neither the qmoABC nor the dsrMKJOP electron transport complex genes typically found in sulfate reducers are present in the MAGs. As a result, it does not appear that any of the reservoir MAGs or cultured genomes can perform dissimilatory sulfate reduction for energy conservation; instead, they appear to use sulfur compounds as electron sinks for fermentation.
Both Caminicella sporogenes and Paramaledivibacter caminithermalis contain a glycyl radical enzyme with the same active site as the gene annotated as a pyruvate formate lyase (pflD, locus tag AF1449) found in Archaeoglobus species, which may be used to anaerobically metabolize some hydrocarbons 41 . However, the annotated pyruvate formate lyase genes found in the U Petromonas tenebris MAGs contain a different active site, one found in more typical carbohydrate fermenters, leading to the conclusion that these are indeed typical pyruvate formate lyase genes of the kind used by the majority of anaerobic bacteria 41 . We examined the metabolic profile of the MAGs via standard annotation by PROKKA and also via individual BLAST analysis for known anaerobic hydrocarbon degradation genes 42,43 . We found no evidence for other anaerobic hydrocarbon metabolism pathways, except that the gamma subunit of acetophenone carboxylase, an enzyme in the ethylbenzene degradation pathway, was present in each of the MAGs. However, the gene encoding ethylbenzene dehydrogenase, the initial enzyme in this pathway 44 , could not be found. As such, based on current genomes and annotations, U Petromonas tenebris does not appear to be capable of utilizing hydrocarbons. We caution that, as seen in the Archaeoglobus case 41 , genes for hydrocarbon degradation may exist that are either unannotated or misannotated, and definitive proof cannot be given until a culture is tested.
To determine whether the U Petromonas tenebris MAGs came from spores or from active, vegetative cells, we estimated the index of replication (iRep) of the MAGs in the four oil reservoirs. The resulting iRep values ranged from 1.35 to 1.40 for the reservoir MAGs. For reference, these values are comparable to the median values seen in other environments, including soil (1.34) and human gut systems (1.37-1.42) 37 , and should be interpreted as indicative of growth. These values indicate that 35-40% of the cells were replicating in situ at the time of sampling.
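The conversion from an iRep value to the percentage of replicating cells quoted above can be written out as a one-line calculation. The sketch below follows the reading used in the text, and it assumes that each replicating cell is making a single additional copy of its genome, so that the excess over 1.0 is the replicating fraction. The function name is hypothetical and this is not part of the iRep software.

```python
def percent_replicating(irep: float) -> float:
    """Approximate percentage of cells replicating, given an iRep value.

    Assumes one genome copy in progress per replicating cell, so an
    iRep of 1.0 means no replication and 2.0 means every cell is
    copying its genome once.
    """
    if irep < 1.0:
        raise ValueError("iRep values below 1.0 are not meaningful here")
    return (irep - 1.0) * 100.0


# The range reported for the reservoir MAGs:
for value in (1.35, 1.40):
    print(f"iRep {value:.2f} -> ~{percent_replicating(value):.0f}% replicating")
```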

Discussion
The hot and salty oil reservoirs described here represent a challenging environment for microbial growth. Using metagenomic analysis, we found that the dominant species in these reservoirs is related to the Caminicella/Paramaledivibacter clades of thermophilic and halophilic Clostridiales, and forms a distinct clade with other uncultured organisms found in high-temperature oil reservoirs (Fig. 2). Due to these unique features,
Figure 5. Potential source phylogenies for genes present in the U Petromonas tenebris bins as determined by DarkHorse, which retrieves the phylogeny of the best BLAST hit for each sequence. Error bars represent standard deviations among the four bins. Nearly 25% of the genes represent the Caminicella genus; the remainder appear more closely related to other groups within the Firmicutes.
45 and formation water from the high temperature Ekofisk oil field in the North Sea 46 . While these relatives do not grow at temperatures as high as observed in this environment, hyperthermophilic clostridial species have been documented from oil wells previously 47 . However, no genomic information is available for comparison to these MAGs.
It is unlikely that these MAGs represent infrastructure contaminants or sporulated, inactive cells. First, 35-40% of these cells were in the process of replication. Additionally, the DNA was readily extractable and this species dominated the community, which also suggests that it was not heavily sporulated. Considering the apparent lack of hydrocarbon consumption by these cells, it is plausible that clostridial spores present in the subsurface germinated as the oil seeped upward from hotter source rocks to a slightly cooler reservoir formation, where water was present. We interpret the low diversity of the community as reflecting the challenging in-situ environment. We cannot refute the hypothesis that spores may be germinating inside pipelines en route to oil processing; however, we note that communities showing infrastructure influence are typically much more complex, reflecting the increase in electron acceptors and metals available within pipelines, as well as industrially introduced materials 4 . As such, the entirety of the data presented suggests that these organisms are active in situ.
Other organisms present in this system at lower abundance (Fig. 1) include Methanothermococcus methanogens, which use hydrogen and formate as electron donors 48 , and Desulfallas species such as D. gibsoniae and D. geothermicum, which utilize simple organic compounds, including some carbohydrates and/or fatty acids, and alcohols such as ethanol, propanol, and butanol, as electron donors and sulfur compounds as electron acceptors, producing carbon dioxide or acetate as end products 49 . Also present is Petrotoga mobilis, a fermenter of a variety of carbohydrates including xylan 50 . This system may be a syntrophic methanogenic system in which U Petromonas and Petrotoga ferment complex organic compounds, with Desulfallas and Methanothermococcus scavenging the fermentation products. Based on the analysis of genomes, no alkane-metabolizing partner has been detected, in contrast to other syntrophic methanogenesis systems that were explored via enrichment