Introduction

Oil reservoirs are unique subsurface environments that can be hostile to life due to high temperatures, pressures, and salinity (reviewed in1). Temperatures greater than 80 °C in source rocks or reservoirs are thought to deter hydrocarbon degradation and sterilize microbial life2,3. As sterile crude oil and gases migrate from hot source rocks and cool, they accumulate in reservoirs that may be colonized by existing subsurface communities in the rock or in flowing subterranean waters. Determining the structure of indigenous microbial communities in these reservoirs can be difficult for two reasons. Pipelines and other infrastructure introduce contaminants. In addition, the injection of seawater or gases and of chemical additives used for secondary recovery can alter both the membership and the activity of the indigenous communities4.

Microbes can sour reservoirs or consume significant quantities of lower-molecular-weight hydrocarbons, degrading the quality of petroleum products. Given this potential for resource degradation, there is interest in understanding the structure and source of these communities and the metabolisms they employ2. Clostridiales are frequently found in oil reservoir surveys4,5,6,7, including in systems with temperatures greater than 50 °C1,8,9. Many observed Clostridiales are spore formers, which may help them persist in these hot, high-pressure, saline systems. However, this ability also means that their influence on the in-situ microbial community is often overlooked, because it is unknown whether these organisms are active in the reservoir or present in a dormant, sporulated form. Some clostridial species have been implicated as reservoir souring culprits9, yet many reservoir descriptions offer no explanation of clostridial activity in the subsurface8,10,11,12,13,14. As such, their role in the reservoir microbial community remains poorly understood. Most Clostridiales employ fermentation as a metabolic strategy; however, many Desulfotomaculum species respire using sulfate reduction, and some Desulfotomaculum have been shown to directly utilize aromatic hydrocarbons (e.g., toluene, m-xylene, o-xylene) as carbon and energy sources15. Hydrocarbon enrichment cultures from oil reservoirs have stimulated clostridial lineages16, and clostridial hydrocarbon degradation genes have been observed in surficial oil-contaminated environments17. Previous metagenomic studies of oil reservoirs found clostridial lineages containing genes initially classified as hydrocarbon degradation genes, but the well-characterized anaerobic hydrocarbon degradation genes bssA and assA were not detected. Instead, these lineages showed robust signatures of polysaccharide, peptide, and fatty acid degradation, as well as robust sugar fermentation pathways4.
To our knowledge, few studies have assessed the proportion of sporulated versus vegetative Clostridiales cells in oil reservoirs.

In this study, we sampled production wells from Galveston 209, a mature oil field ~32 km south of Galveston Island, Texas. Well temperatures in this field range from 80 to 160 °C. Discovered in 1983, the field was extensively developed from two platforms (A and B), with cumulative production of over 23 MBO (million barrels of oil) and 300 GCF (billion cubic feet) of gas. Production came from stacked pay in Lower to Middle Miocene shallow marine sandy reservoirs (~2.1 to 4.6 km depth) with outer shelf mudstone seals, all deposited in a wave-dominated deltaic setting. Production data indicate a strong water drive through well-connected, continuous reservoirs. Traps are low-relief, fault-dependent closures on basinward-dipping listric normal faults. Here, we used metagenomics to examine the microbial communities in the produced fluids (a mixture of oil and water) of these hot wells. Functional annotation and the generation of metagenome-assembled genomes (MAGs) were used to examine potential community metabolisms. A novel clostridial lineage, which we propose to name UPetromonas tenebris, was the dominant microbial signature in this subsurface environment, and in-situ replication of this organism could be detected via metagenomics.

Methods

Collection of produced fluids

Produced fluids were collected using sterile technique at the drilling rig offshore Texas. Fluids came from 4 distinct wells with temperatures ranging from 88 to 102 °C (Table 1). Produced fluids were filtered through a Sterivex filter until the filter clogged, which occurred after 300–400 mL of fluid. Each Sterivex filter was frozen immediately and stored at −80 °C until use.

Table 1 Selected reservoir produced fluid (water) chemical concentrations and reservoir temperatures.

DNA extraction and sequencing

DNA was extracted using a modified version of the Qiagen PowerWater Sterivex filter extraction kit18. Extracts were tested by PCR for amplifiable bacterial DNA alongside a blank extraction control. Once the DNA was confirmed to be amplifiable and to contain a microbial signal, it was sent for metagenomic library preparation and Illumina HiSeq sequencing at the University of Delaware Genomic Sequencing Facility. Raw sequences and MAGs for this project are deposited at NCBI under BioProject PRJNA578106.

Quality trim and assembly

Raw Illumina reads were quality trimmed in CLCBio Workbench version 7.5.1 (Qiagen) with the following parameters:
- removal of low-quality sequence (limit = 0.0016, rounded by CLCBio to 0.002, which represents a Phred score of 36 or better);
- removal of ambiguous nucleotides (none allowed);
- removal of terminal nucleotides (2–12 nucleotides from either end, to minimize sequencing errors and enriched 5-mers);
- removal of sequences by length (minimum length 60 nucleotides).

Whenever one read of a read pair was discarded during trimming, the entire pair was excluded. Trimmed, paired reads were assembled using IDBA-UD19 version 1.1.1 with the following settings: --mink 40 --maxk 120 --step 20 --min_contig 300. The resulting scaffolds were then used for genome binning of each reservoir metagenome.
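
To make the quality-trimming rule concrete, the following is a minimal Python sketch of a modified Mott trim combined with the paired-read and minimum-length filters described above. It assumes that CLCBio's "limit" is applied as a per-base error-probability threshold in a running sum, which is how we understand the modified Mott algorithm. The function names phred_to_error, mott_trim, and keep_pair are hypothetical, and the sketch does not model ambiguous-base or terminal-nucleotide removal.

```python
def phred_to_error(q: int) -> float:
    """Convert a Phred quality score to a base-call error probability."""
    return 10 ** (-q / 10)


def mott_trim(quals: list[int], limit: float = 0.0016) -> tuple[int, int]:
    """Return half-open (start, end) indices of the region kept by a
    modified Mott trim (hypothetical helper, not CLCBio's own code).

    Each base adds (limit - error probability) to a running sum that is
    reset to zero whenever it falls to zero or below. The kept region ends
    where the running sum peaks and starts just after the last reset.
    """
    running = 0.0
    best = 0.0
    start = 0
    best_start, best_end = 0, 0
    for i, q in enumerate(quals):
        running += limit - phred_to_error(q)
        if running <= 0:
            running = 0.0
            start = i + 1  # the next candidate region begins after this base
        elif running > best:
            best = running
            best_start, best_end = start, i + 1
    return best_start, best_end


def keep_pair(r1_quals: list[int], r2_quals: list[int],
              limit: float = 0.0016, min_len: int = 60) -> bool:
    """Keep a read pair only if BOTH mates are >= min_len after trimming."""
    for quals in (r1_quals, r2_quals):
        start, end = mott_trim(quals, limit)
        if end - start < min_len:
            return False  # one mate fails, so the whole pair is discarded
    return True


# Usage: a read with low-quality (Q2) bases at both ends.
read = [2] * 5 + [40] * 70 + [2] * 5
print(mott_trim(read))                   # (5, 75)
print(keep_pair([40] * 100, read))       # True
print(keep_pair([40] * 100, [40] * 50))  # False: mate 2 is shorter than 60 nt
```

Discarding the pair when either mate fails keeps the paired-end structure intact for the assembler, which expects matched mates.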

Phylogeny

The phylogeny of metagenome community members was determined using EMIRGE (Expectation-Maximization Iterative Reconstruction of Genes from the Environment), which reconstructs 16S rRNA gene sequences from unassembled reads20. A maximum likelihood phylogenetic tree of the reconstructed 16S rRNA gene sequences was inferred using MEGA version 5 with default parameters for alignment and tree construction and 500 bootstrap replicates21.

Metagenome-assembled genomes (MAGs)

The metagenome assembly of each sample was binned using MaxBin version 1.4.2 with a maximum of 200 iterations22. The taxonomic uniqueness of each resulting MAG was initially determined using PhyloSift23 (version 1.0.1) with default parameters. Potential contamination and strain heterogeneity in each MAG were evaluated using CheckM 1.0.6 with the "lineage_wf" option24. The VizBin program25 was then used to visually refine the MAGs and remove outlier scaffolds. Genomes of close relatives of the clostridial MAGs were downloaded from NCBI (Caminicella sporogenes DSM 14501, NZ_FRAJ00000000.1, and Paramaledivibacter caminithermalis DSM 15212, NZ_FRAG00000000.1). Average nucleotide identity (ANI) between the MAGs and the two reference genomes was calculated using PyANI26 as implemented in Anvi'o27 (v5.5), following the procedure described at http://merenlab.org/2016/11/08/pangenomics-v2/. Pairwise average amino acid identity (AAI) was calculated as both one-way and two-way AAI using the online AAI calculator (http://enve-omics.ce.gatech.edu/aai/).
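
A small Python sketch can illustrate the difference between one-way and two-way AAI. It assumes that best-hit tables have already been produced by a protein search of each genome against the other. It omits any identity or alignment-length filtering that a full tool would apply. The function names and the input layout are hypothetical.

```python
from statistics import mean

# query protein -> (best-hit subject protein, percent identity)
BestHits = dict[str, tuple[str, float]]


def one_way_aai(hits_ab: BestHits) -> float:
    """Mean identity of the best hits of genome A's proteins in genome B."""
    if not hits_ab:
        raise ValueError("no hits")
    return mean(identity for _, identity in hits_ab.values())


def two_way_aai(hits_ab: BestHits, hits_ba: BestHits) -> float:
    """Mean identity over reciprocal best hits (A->B and B->A agree).

    The identity of each reciprocal pair is averaged over both directions.
    """
    pair_identities = [
        (ident_ab + hits_ba[subject][1]) / 2
        for query, (subject, ident_ab) in hits_ab.items()
        if subject in hits_ba and hits_ba[subject][0] == query
    ]
    if not pair_identities:
        raise ValueError("no reciprocal best hits")
    return mean(pair_identities)


# Hypothetical toy data: a3's best hit (b9) points back to a different protein.
hits_ab = {"a1": ("b1", 60.0), "a2": ("b2", 50.0), "a3": ("b9", 40.0)}
hits_ba = {"b1": ("a1", 62.0), "b2": ("a2", 50.0), "b9": ("a7", 45.0)}
print(one_way_aai(hits_ab))           # 50.0
print(two_way_aai(hits_ab, hits_ba))  # 55.5
```

Restricting the comparison to reciprocal best hits is a common way to focus it on likely orthologs, which is why one-way and two-way values can differ for the same genome pair.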

A set of 16 ribosomal proteins28 was extracted from the PROKKA annotation of each MAG using Geneious (Biomatters, Auckland, New Zealand). For phylogenetic comparison, we also included genomes from closely related microbial groups downloaded from the National Center for Biotechnology Information (NCBI): E. coli, H. congolense, T. gondwanense, D. alkaliphilum, C. cellulovorans, C. hydrogeniformans, C. aceticum, C. formicaceticum, Ca. sporogenes, P. caminithermalis, and M. halophilus. For each genome, ribosomal proteins were concatenated and aligned with MAFFT29 (v7.392). Only genomes containing over 50% of the ribosomal proteins were used in the analysis. The alignment was used to infer maximum likelihood phylogenetic trees with 100 iterations using FastTree30 (v2.1.11).
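
The Python sketch below illustrates the concatenation step and the over-50% completeness filter described above. It assumes that each ribosomal protein has already been aligned separately (for example with MAFFT) and that proteins missing from a retained genome are filled with gap characters. This is one common convention and not necessarily the exact procedure used here. The function concatenate and its input layout are hypothetical.

```python
# The 16 ribosomal proteins listed in the Fig. 3 caption, in a fixed order.
RP16 = ["RpL2", "RpL3", "RpL4", "RpL5", "RpL6", "RpL14", "RpL15", "RpL16",
        "RpL18", "RpL22", "RpL24", "RpS3", "RpS8", "RpS10", "RpS17", "RpS19"]


def concatenate(alignments: dict[str, dict[str, str]],
                min_fraction: float = 0.5) -> dict[str, str]:
    """Build a concatenated alignment from per-protein alignments.

    alignments maps protein name -> {genome name -> aligned sequence}.
    Genomes with no more than min_fraction of the proteins are dropped;
    missing proteins in retained genomes are padded with gaps.
    """
    genomes = {g for aln in alignments.values() for g in aln}
    result: dict[str, str] = {}
    for genome in sorted(genomes):
        present = sum(genome in alignments.get(p, {}) for p in RP16)
        if present / len(RP16) <= min_fraction:
            continue  # requires *over* 50% of the ribosomal proteins
        parts = []
        for protein in RP16:
            aln = alignments.get(protein, {})
            # All sequences in one alignment share a width; use it for gaps.
            width = len(next(iter(aln.values()))) if aln else 0
            parts.append(aln.get(genome, "-" * width))
        result[genome] = "".join(parts)
    return result


# Usage with hypothetical three-residue alignments.
alns = {p: {"A": "MKV"} for p in RP16}
for p in RP16[:8]:
    alns[p]["B"] = "MKI"   # 8/16 = 50%: excluded
for p in RP16[:9]:
    alns[p]["C"] = "MRV"   # 9/16 > 50%: kept, padded with gaps
out = concatenate(alns)
print(sorted(out), len(out["C"]))  # ['A', 'C'] 48
```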

Functional annotation

PROKKA (prokaryotic annotation) version 1.11 was used to annotate the metagenomes and MAGs31. COG (Clusters of Orthologous Groups) assignments were obtained with a local installation of RAMMCAP (Rapid Analysis of Multiple Metagenomes with a Clustering and Annotation Pipeline) using the updated 2014 COG database from NCBI32.

Metabolic pathways

The presence or absence of functional genes in the metabolic pathways of the MAGs and the cultured genomes was predicted between July and October 2017 using the BlastKOALA web service provided by KEGG (Kyoto Encyclopedia of Genes and Genomes; http://www.kegg.jp/blastkoala/)33.

Carbohydrate enzymes

Carbohydrate-Active Enzyme (CAZy) annotation was performed via HMMER searches against the dbCAN release 4.0 HMM (hidden Markov model) database (downloaded from http://cys.bios.niu.edu/dbCAN/) and was based on the March 17, 2015 release of CAZyDB34,35,36.
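
The Methods do not state the HMMER cutoffs used for CAZy assignment. The Python sketch below therefore assumes the thresholds commonly recommended in the dbCAN documentation, as we understand them. For alignments longer than 80 amino acids the E-value must be below 1e-5, and otherwise below 1e-3. In both cases more than 30% of the HMM must be covered. The sketch applies these cutoffs to already-parsed domain hits and counts families per genome. The DomainHit record and the function names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class DomainHit:
    """One parsed hmmscan domain hit (hypothetical record layout)."""
    protein: str
    family: str      # CAZy family HMM name, e.g. "GH13"
    evalue: float    # domain E-value
    hmm_from: int    # 1-based HMM coordinates of the alignment
    hmm_to: int
    hmm_len: int
    ali_from: int    # 1-based query coordinates of the alignment
    ali_to: int


def passes_filter(hit: DomainHit) -> bool:
    """Assumed dbCAN-style cutoffs; the study's own thresholds are not reported."""
    ali_len = hit.ali_to - hit.ali_from + 1
    hmm_coverage = (hit.hmm_to - hit.hmm_from + 1) / hit.hmm_len
    max_evalue = 1e-5 if ali_len > 80 else 1e-3
    return hit.evalue < max_evalue and hmm_coverage > 0.3


def family_counts(hits: list[DomainHit]) -> Counter:
    """Count distinct proteins per CAZy family among hits passing the filter."""
    pairs = {(h.protein, h.family) for h in hits if passes_filter(h)}
    return Counter(family for _, family in pairs)


# Hypothetical hits for illustration only.
hits = [
    DomainHit("p1", "GH13", 1e-20, 1, 200, 300, 10, 220),  # long, strong: kept
    DomainHit("p2", "GH13", 1e-4, 1, 200, 300, 10, 220),   # long, weak: dropped
    DomainHit("p3", "GT4", 1e-4, 1, 60, 150, 1, 60),       # short, below 1e-3: kept
    DomainHit("p4", "CBM50", 1e-9, 1, 20, 100, 1, 120),    # 20% HMM coverage: dropped
]
print(sorted(family_counts(hits).items()))  # [('GH13', 1), ('GT4', 1)]
```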

Estimation of growth

Growth signatures were estimated with the iRep program, which infers the proportion of actively replicating cells by comparing sequencing coverage near the origin of replication with coverage near the terminus37. The program was run with default settings.
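
The core idea of iRep can be sketched in Python as follows:
1. Measure coverage in windows across a genome.
2. Log2-transform the window coverages and sort them.
3. Trim the most extreme windows.
4. Fit a straight line to the remaining values.

The ratio between the fitted high and low ends approximates the origin-to-terminus coverage ratio. This is a simplified illustration under our own assumptions: 5% of windows are trimmed at each end, and the fitted line is evaluated at the ends of the trimmed range. It omits the actual program's windowing details and its coverage and quality filters. irep_sketch is a hypothetical name.

```python
import math


def irep_sketch(window_coverage: list[float], trim: float = 0.05) -> float:
    """Simplified iRep-like index from per-window coverage values."""
    if any(c <= 0 for c in window_coverage):
        raise ValueError("coverage values must be positive to take log2")
    values = sorted(math.log2(c) for c in window_coverage)
    k = int(len(values) * trim)            # windows dropped at each end
    kept = values[k:len(values) - k]
    n = len(kept)
    if n < 2:
        raise ValueError("too few windows after trimming")
    # Ordinary least-squares fit of log2 coverage against rank (0..n-1).
    x_mean = (n - 1) / 2
    y_mean = sum(kept) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(range(n), kept))
    slope = sxy / sxx
    # Fitted log2 difference between highest and lowest trimmed window.
    return 2 ** (slope * (n - 1))


# Usage: coverage rising smoothly by a factor of 1.4 across 200 windows.
n = 200
coverage = [10 * 1.4 ** (i / (n - 1)) for i in range(n)]
print(round(irep_sketch(coverage), 3))  # 1.353 (trimming narrows the range)
```

Trimming the extreme windows makes the estimate robust to local coverage spikes, at the cost of slightly understating the full origin-to-terminus ratio in this idealized example.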

Results

Metagenomes were generated from DNA extracted from four produced-fluid samples taken from an oil field in the Gulf of Mexico, offshore Texas. The reservoirs in this formation are hot and salty, with temperatures of 88–102 °C, salinity exceeding 28%, and sulfate concentrations below seawater levels. The four wells likely access connected reservoir material, and each well has similar but distinct geochemical conditions (Table 1).

We generated metagenomic sequencing data from each well. The assemblies comprised 9,600–34,300 contigs, with total lengths of 18–43 million base pairs (Table 2).

Table 2 Metagenome Statistics.

Based on 16S rRNA gene sequences reconstructed by EMIRGE, the most abundant organism in each metagenome was a novel lineage, UPetromonas tenebris, which comprised 69.8% to 96.7% of the microbial community (Fig. 1). Petrotoga, Geotoga, Euryarchaea, and other Clostridiales were less abundant community members (Fig. 1). The closest relatives of UPetromonas tenebris were uncultivated lineages found in other oil reservoirs (Fig. 2).

Figure 1

Taxonomic classification of reservoir metagenomes based on 16S rRNA gene sequences reconstructed from EMIRGE analysis of metagenomic data. Other Clostridia include Desulfallas species. Euryarchaea includes Methanothermococcus. Petrotoga mobilis was the only Petrotoga species found.

Figure 2

Maximum likelihood tree of UPetromonas tenebris 16S rRNA gene sequences from the reservoirs. Sequences were reconstructed with EMIRGE from unassembled Illumina reads (red) or obtained from the SILVA 16S rRNA database or NCBI (black). Relatives from similar environments, such as high-temperature oil reservoirs (blue) and hydrothermal vent systems (green), are indicated. Bootstrap values were generated from 500 replicates. The scale bar represents substitutions per position.

Phylogenetic analysis suggested that the nearest cultured relatives of the dominant 16S rRNA gene from the oil reservoirs were Caminicella sporogenes and Paramaledivibacter caminithermalis (formerly Clostridium caminithermale), both of which are moderately thermophilic and halophilic and were originally isolated from deep-sea hydrothermal vent systems (Fig. 2).

Metagenome-assembled genomes (MAGs) of UPetromonas tenebris lineages were recovered. These high-quality MAGs were 99.2% to 100.0% complete with <5% contamination, as assessed from single-copy genes by CheckM (Table 3). The estimated genome sizes (2.7–3.0 Mb) fall between those of the relatives Ca. sporogenes (2.5 Mb) and P. caminithermalis (4.1 Mb). The number of annotated genes in the MAGs ranged from 2,815 to 2,959 (Table 3).

Table 3 UPetromonas tenebris MAG statistics and comparisons to cultured relative genomes.

Phylogenetic analysis of concatenated ribosomal protein sequences shows that the clostridial MAGs from these reservoir samples are most closely related to Caminicella sporogenes, Paramaledivibacter caminithermalis, and Maledivibacter halophilus (Fig. 3). These closely related organisms form a group of thermophilic and halophilic Clostridia within the Clostridiales38,39. Both the 16S rRNA gene and the concatenated ribosomal protein sequences were similar across the reservoir MAGs, suggesting that the same organism was present in all four reservoir samples (Figs. 2 and 3).

Figure 3

Phylogeny of UPetromonas tenebris and related genomes based on 16 concatenated ribosomal proteins28: RpL2, 3, 4, 5, 6, 14, 15, 16, 18, 22, and 24, and RpS3, 8, 10, 17, and 19. The tree was reconstructed using the maximum likelihood algorithm with 100 iterations.

Average nucleotide identity (ANI) among the four MAGs is 99–100%, but the MAGs share only 74–75% ANI and 51–61% average amino acid identity (AAI) with the cultured genomes (Table 4).

Table 4 ANI and AAI values of UPetromonas tenebris MAGs versus isolate genomes.

We therefore provisionally name the organism UPetromonas tenebris, as the MAGs meet the suggested metrics for establishing a new genus and species40. The name derives from "petra" (rock), "monas" (single-celled organism), and "tenebris" (dark), since the phylogenetic trees show that relatives of these organisms are all found in oil reservoirs. The prefix U indicates its uncultivated status.

Analysis of Clusters of Orthologous Groups (COGs) shared with either Ca. sporogenes or P. caminithermalis shows that individual UPetromonas tenebris MAGs overlap with the cultured relatives to different degrees. On average, the MAGs carry 323 and 197 genes not found in these two relatives, respectively (Table 3). Genes present in the MAGs but absent from the isolate genomes include several oxidoreductases (e.g., an Fe-S oxidoreductase and CO dehydrogenase/acetyl-CoA synthase subunits) as well as citrate lyase. Each MAG also contained 8–13 COGs that did not appear in any other MAG or cultured genome (Fig. 4). These unique genes were primarily housekeeping genes, except that UP. tenebris B7 had a unique COG1719, a predicted hydrocarbon-binding protein.

Figure 4

Clusters of Orthologous Groups shared among the UPetromonas tenebris MAGs B9, A2, B6, and B7 and the Ca. sporogenes genome. Each MAG contains unique genes: 9 in B9, 8 in A2, 8 in B6, and 13 in B7. There are 147 genes unique to Ca. sporogenes.
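
The unique-gene counts in Fig. 4 amount to a set difference between each genome's COG set and the union of all other genomes' sets. The Python sketch below shows this computation. The COG assignments in it are hypothetical placeholders, and only COG1719 is taken from the text.

```python
def unique_cogs(cog_sets: dict[str, set[str]]) -> dict[str, set[str]]:
    """Return, for each genome, the COGs found in no other genome."""
    unique = {}
    for name, cogs in cog_sets.items():
        others = set().union(*(s for other, s in cog_sets.items() if other != name))
        unique[name] = cogs - others
    return unique


# Hypothetical COG assignments for illustration only.
cog_sets = {
    "B7": {"COG0001", "COG1719"},
    "A2": {"COG0001", "COG0002"},
    "Ca. sporogenes": {"COG0001", "COG0003"},
}
for genome, cogs in unique_cogs(cog_sets).items():
    print(genome, sorted(cogs))
# B7 ['COG1719']
# A2 ['COG0002']
# Ca. sporogenes ['COG0003']
```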

To determine whether functional genes in the reservoir MAGs had unusual evolutionary histories, we examined their potential phylogenetic origins. Each gene was examined individually via BLAST, and the taxonomy of its best hit was recorded. About 48% of the coding genes in the reservoir MAGs appear to have originated within the Caminicella, Maledivibacter, and Paramaledivibacter genera. The remainder appear to derive from other Clostridiales or Firmicutes groups (Fig. 5).

Figure 5

Potential source phylogenies for genes in the UPetromonas tenebris bins, as determined by DarkHorse, which retrieves the taxonomy of the best BLAST hit for each sequence. Error bars represent standard deviations among the four bins. Nearly 25% of the genes have best hits in the genus Caminicella; the remainder appear more closely related to other groups within the Firmicutes.
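
The per-bin summary in Fig. 5 reports, for a given genus, the fraction of genes whose best hit falls in that genus, together with a standard deviation across bins. The Python sketch below computes such a summary. Its input, a list of best-hit genera per bin, is a hypothetical stand-in for DarkHorse or BLAST output. The use of the sample standard deviation is our assumption about how the error bars were computed.

```python
from collections import Counter
from statistics import mean, stdev


def genus_fractions(best_hit_genera: list[str]) -> dict[str, float]:
    """Fraction of genes in one bin whose best hit falls in each genus."""
    counts = Counter(best_hit_genera)
    total = len(best_hit_genera)
    return {genus: n / total for genus, n in counts.items()}


def summarize(bins: dict[str, list[str]], genus: str) -> tuple[float, float]:
    """Mean and sample standard deviation of one genus's fraction across bins."""
    fractions = [genus_fractions(hits).get(genus, 0.0) for hits in bins.values()]
    return mean(fractions), stdev(fractions)  # stdev needs at least two bins


# Hypothetical toy bins, not the study's data.
bins = {
    "A2": ["Caminicella"] + ["Clostridium"] * 3,
    "B6": ["Caminicella"] * 2 + ["Clostridium"] * 2,
}
avg, sd = summarize(bins, "Caminicella")
print(round(avg, 3), round(sd, 3))  # 0.375 0.177
```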

Many metabolic pathways are shared between the reservoir MAGs and their cultured relatives. Shared core features include butyrate fermentation, sporulation, and common two-component regulatory systems involved in temperature and salt stress responses, chemotaxis, and flagellar regulation. However, the MAGs also differ in many ways from their closest cultured relatives. For example, the complement of Carbohydrate-Active Enzymes (CAZy) differs between the two cultured genomes, and between the cultured genomes and the reservoir MAGs (Table 5). The reservoir MAGs have unique genes and transporters for sucrose metabolism that are absent from the cultured genomes. Conversely, the cultured isolate genomes can process extracellular cellobiose and xylan/xylose, whereas the reservoir MAGs lack these genes.

Table 5 Differences in Carbohydrate-Active Enzymes (CAZy) between the UPetromonas tenebris MAGs and the Caminicella sporogenes and Paramaledivibacter caminithermalis genomes.

Other key metabolic differences include a complete Wood–Ljungdahl CO2 fixation pathway, found in three of the four reservoir MAGs but not in the other genomes (Table 6). COGs for the key enzyme of this pathway, CO dehydrogenase/acetyl-CoA synthase (noted above), were not found in the cultured genomes. Certain sulfur metabolism genes are present in the reservoir MAGs but absent from the cultured genomes. These include anaerobic sulfite reductase (ASR), adenosine 5′-phosphosulfate reductase (APR), and ferredoxin-dependent sulfite reductase. ASR is typically part of the assimilatory sulfate reduction pathway, and APR is typically part of dissimilatory pathways, but neither pathway is complete in the MAGs. In addition, the MAGs lack the qmoABC and dsrMKJOP electron transport complex genes typically found in sulfate reducers. Therefore, neither the reservoir MAGs nor the cultured genomes appear capable of dissimilatory sulfate reduction for energy conservation. Instead, the MAGs may use sulfur compounds as electron sinks during fermentation.
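
The completeness reasoning above treats a pathway as complete only if every step has at least one annotated gene. This rule can be expressed as a small Python check. The step names and enzyme labels below are simplified, illustrative stand-ins, not KEGG module definitions, and the sulfate-reduction steps are likewise simplified. The example MAG annotation set is hypothetical.

```python
# Simplified, illustrative step definitions (not KEGG modules).
WOOD_LJUNGDAHL = {
    "CO2 -> formate": {"formate dehydrogenase"},
    "formate -> formyl-THF": {"formate--tetrahydrofolate ligase"},
    "formyl-THF -> methenyl-THF": {"methenyl-THF cyclohydrolase", "bifunctional FolD"},
    "methenyl-THF -> methylene-THF": {"methylene-THF dehydrogenase", "bifunctional FolD"},
    "methylene-THF -> methyl-THF": {"methylene-THF reductase"},
    "methyl transfer to corrinoid protein": {"methyltransferase AcsE"},
    "acetyl-CoA synthesis": {"CO dehydrogenase/acetyl-CoA synthase"},
}

DISSIMILATORY_SULFATE_REDUCTION = {
    "sulfate activation": {"sulfate adenylyltransferase (sat)"},
    "APS reduction": {"APS reductase (aprAB)"},
    "membrane electron transfer (Qmo)": {"qmoABC"},
    "sulfite reduction": {"dissimilatory sulfite reductase (dsrAB)"},
    "membrane electron transfer (Dsr)": {"dsrMKJOP"},
}


def missing_steps(annotations: set[str], pathway: dict[str, set[str]]) -> list[str]:
    """List pathway steps with no matching annotated gene (empty = complete)."""
    return [step for step, genes in pathway.items() if not genes & annotations]


# Hypothetical MAG: full Wood-Ljungdahl gene set plus APR only.
mag = {"formate dehydrogenase", "formate--tetrahydrofolate ligase",
       "bifunctional FolD", "methylene-THF reductase", "methyltransferase AcsE",
       "CO dehydrogenase/acetyl-CoA synthase", "APS reductase (aprAB)"}
print(missing_steps(mag, WOOD_LJUNGDAHL))  # []
print(missing_steps(mag, DISSIMILATORY_SULFATE_REDUCTION))
# ['sulfate activation', 'membrane electron transfer (Qmo)',
#  'sulfite reduction', 'membrane electron transfer (Dsr)']
```

Listing the missing steps, rather than returning a yes/no answer, makes it clear which genes (here, the Qmo and Dsr complexes) drive the conclusion that a pathway is incomplete.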

Table 6 Differences in metabolism between the MAGs and cultured isolate genomes.

Both Caminicella sporogenes and Paramaledivibacter caminithermalis contain a glycyl radical enzyme with the same active site as the gene annotated as pyruvate formate lyase (pflD, locus tag AF1449) in Archaeoglobus species, which may be used to anaerobically metabolize some hydrocarbons41. In contrast, the annotated pyruvate formate lyase genes in the UPetromonas tenebris MAGs contain a different active site, matching that of more typical carbohydrate fermenters. This indicates that they are typical pyruvate formate lyases of the kind used by most anaerobic bacteria41. We examined the metabolic profile of the MAGs via standard PROKKA annotation and via individual BLAST searches for known anaerobic hydrocarbon degradation genes42,43. We found no evidence for other anaerobic hydrocarbon metabolism pathways, with one exception: the gamma subunit of acetophenone carboxylase, an enzyme in the ethylbenzene degradation pathway, was present in each MAG. However, the gene encoding ethylbenzene dehydrogenase, the initial enzyme in this pathway44, could not be found. Based on current genomes and annotations, UPetromonas tenebris therefore does not appear capable of utilizing hydrocarbons. We caution that, as in the Archaeoglobus case41, hydrocarbon degradation genes may exist that are unannotated or misannotated, and definitive proof will require testing a culture.

To determine whether the UPetromonas tenebris MAGs derived from spores or from active, vegetative cells, we estimated the index of replication (iRep) of the MAGs in the four oil reservoirs. iRep values for the reservoir MAGs ranged from 1.35 to 1.40. These values are comparable to the median values reported for other environments, including soil (1.34) and the human gut (1.37–1.42)37, and are indicative of growth. They imply that approximately 35–40% of these cells were replicating in situ at the time of sampling.
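
The conversion from iRep to a fraction of replicating cells follows from a simplified model, which we state here as an assumption. Each replicating cell is assumed to carry two copies of the origin-proximal region and one copy of the terminus, each non-replicating cell one copy of each, and replication rounds do not overlap. If a fraction f of cells is replicating, the population-level ratio of origin to terminus coverage is then:

```latex
\mathrm{iRep} \approx \frac{\bar{c}_{\mathrm{ori}}}{\bar{c}_{\mathrm{ter}}}
  = \frac{(1 - f)\cdot 1 + f \cdot 2}{1} = 1 + f
  \quad\Longrightarrow\quad f \approx \mathrm{iRep} - 1
```

Under this model, iRep values of 1.35–1.40 correspond to f ≈ 0.35–0.40, that is, roughly 35–40% of cells replicating.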

Discussion

The hot and salty oil reservoirs described here represent a challenging environment for microbial growth. Using metagenomic analysis, we found that the dominant species in these reservoirs is related to the Caminicella/Paramaledivibacter clades of thermophilic and halophilic Clostridiales. It also forms a distinct clade with other uncultured organisms from high-temperature oil reservoirs (Fig. 2). Based on these distinct features, we propose to name this lineage UPetromonas tenebris (Figs. 2 and 3). Related species have been detected in other oil well systems, including Ca. sporogenes in Oman at temperatures near 60 °C45 and in formation water from the high-temperature Ekofisk oil field in the North Sea46. While these relatives do not grow at temperatures as high as those observed in this environment, hyperthermophilic clostridial species have previously been documented from oil wells47. However, no genomic information is available for these species for comparison with these MAGs.

It is unlikely that these MAGs represent infrastructure contaminants or sporulated, inactive cells. First, 35–40% of these cells were in the process of replication. Second, the DNA was readily extractable and this species dominated the community, which also suggests that it was not heavily sporulated. Given the apparent lack of hydrocarbon consumption by these cells, spores may instead have germinated as oil migrated upward from hotter source rocks into a slightly cooler reservoir formation where water was present. We interpret the low diversity of the community as a reflection of the challenging in-situ environment. We cannot refute the hypothesis that spores germinate en route to oil processing inside pipelines. However, communities that show infrastructure influence are typically much more complex, reflecting the greater availability of electron acceptors and metals within pipelines, as well as industrially introduced materials4. Taken together, the data suggest that these cells are active in situ.

Other, less abundant organisms in this system (Fig. 1) include Methanothermococcus methanogens, which use hydrogen and formate as electron donors48. They also include Desulfallas species such as D. gibsonae and D. geothermicum49. These species use simple organic compounds (including some carbohydrates and/or fatty acids) and alcohols such as ethanol, propanol, and butanol as electron donors, use sulfur compounds as electron acceptors, and produce carbon dioxide or acetate as end products49. Petrotoga mobilis, a fermenter of a variety of carbohydrates including xylan50, is also present. This system may be a syntrophic methanogenic system in which UPetromonas and Petrotoga ferment complex organic compounds while Desulfallas and Methanothermococcus scavenge the fermentation products. Our genome analysis detected no alkane-metabolizing partner, in contrast to other syntrophic methanogenic systems explored via enrichment cultures5,51. UPetromonas tenebris therefore appears to be a keystone member of the microbial community inhabiting this harsh environment.

Despite being retrieved from a similar geologic formation, the UPetromonas tenebris MAGs show some distinctions between wells. In particular, each contained 8–13 unique genes (Fig. 4). They were clearly differentiated from their nearest cultivated neighbors by both ribosomal sequences (Figs. 2 and 3) and genome size and content (Table 3, Fig. 4). These genomic variations all appear to be of clostridial origin (Fig. 5). Overall, the MAGs in this reservoir system represent a single species, as supported by phylogenies (Figs. 2 and 3) and ANI values (Table 4), despite the unique genes noted above.

The UPetromonas tenebris MAGs represent the most abundant organisms in this reservoir system and show signatures of replication. Potential end products of their fermentative metabolism have also accumulated in this system. Yet they lack the potentially hydrocarbon-processing glycyl radical enzyme genes (annotated as pyruvate formate lyase) seen in their cultured relatives, and none of the less abundant organisms show signatures of directly processing hydrocarbons either. Nevertheless, UPetromonas tenebris has become a dominant species in this reservoir, likely through fermentation. It is growing in situ and may be responsible for the increase in organic acids in the reservoirs. Given their potential influence, Firmicutes and clostridial relatives should be regarded as key players in reservoir microbiomes rather than merely as sporulated, inactive cells. Furthermore, the presence of these organisms in produced fluids, drilling fluids, or naturally seeped fluids could indicate the temperature of a connected subsurface reservoir, with applications to oil and gas exploration and development52.