Introduction

Deep-sea hydrothermal vents are unique ecosystems, widely distributed at mid-ocean ridges and other types of spreading centers throughout the global oceans [1]. Electron donors enriched in deep-sea hydrothermal plumes include some organic compounds, such as methanethiol (MT), methanol, formate, acetate, methylamine, alkanes, and aromatics, and these organic compounds can support the growth of microbes in hydrothermal plumes [2,3,4,5]. These organic molecules, particularly MT, may have been abundant on the early Hadean-Archaean Earth, and could have played a role in the emergence of early hyperthermophilic microbial life [2]. Deep-sea hydrothermal plumes have been considered natural laboratories for understanding the ecology, physiology, and function of microbial groups that mediate much of the transfer of elements and energy from the subsurface to deep-sea and from the geosphere to the biosphere [6].

Microorganisms have expansive diversity, metabolic versatility, and functional redundancy in the hydrothermal sediments, and they are responsible for the important organic carbon turnover and nitrogen and sulfur cycling processes [7,8,9,10]. Furthermore, metagenomic and metatranscriptomic analyses have provided insights into the roles of dominant microbial populations involved in the oxidation of ammonia [11], hydrogen [12, 13], and sulfur [12, 14] in deep-sea hydrothermal vent plumes. Compared to this relatively well-documented biotic cycling of inorganic substrates and the role of chemoautotrophic microorganisms in mediating plume biogeochemistry, the transformation of organic carbon, nitrogen, and sulfur, especially methylated compounds and petroleum hydrocarbons, has received less attention in deep-sea hydrothermal plumes. The corresponding microbial metabolisms have been documented in a limited number of studies, including organic matter-utilizing bacteria and archaea [15, 16] and the presence of various hydrocarbon monooxygenase genes in deep-sea hydrocarbon plumes [17].

While largely unknown in the hydrothermal plume setting, microbes that use methyl-, sulfur-, and petroleum organic compounds have recently been identified in other marine environments. Previous research on Gammaproteobacteria within hydrothermal plumes has revealed their important roles in hydrogen and sulfur oxidation (SUP05 group) [13] and methane oxidation (Methylothermaceae) [18]. Nevertheless, several other gammaproteobacterial groups with functions beyond these lithotrophic and methanotrophic activities are not as well studied. Members of the genus Methylophaga are a unique group of aerobic, halophilic, non-methane-utilizing methylotrophs that have been isolated from various marine environments and brackish waters [19]. They were found to be dominant in waters influenced by the Deepwater Horizon oil spill [20] and identified as major organisms involved in the methanol breakdown process [21]. Other gammaproteobacterial methano-/methylotrophs and petroleum hydrocarbon degraders, such as Methylococcales and Cycloclasticus, have also been found to be prevalent in marine methane/oil seeps [22, 23]. Cycloclasticus has been found among sponge symbionts and in oil spill plumes as an obligate oil-degrading microbial genus [20, 23]; however, its metabolic potential and functional activities remain elusive in hydrothermal environments. We hypothesized that these microorganisms can also live and thrive in hydrothermal plume ecosystems, owing to the similar hydrocarbon-rich settings of oil spills and hydrothermal plumes [6, 17].

Unlike other well-studied organisms with genomic characterization, such as Alcanivorax [24, 25] on alkane degradation, Marinobacter [26, 27] and Acinetobacter [28, 29] on hydrocarbon degradation, and SAR324 [30] on alkane and sulfur degradation, these three groups of Gammaproteobacteria have seldom been studied, except for limited reports of hydrocarbon-degrading Cycloclasticus genomes [20, 23]. The distribution pattern of these microorganisms in the marine water column and how they are triggered by the presence of organic compounds are of great ecological significance but remain unknown.

Here, we reconstructed metagenome-assembled genomes (MAGs) from the hydrothermal plume and background metagenomes of three hydrothermal fields, and mapped corresponding metatranscriptomic reads to enable genome-specific measurements and quantitative analyses of metabolic activities. Based on these results, we evaluated metabolism with regard to transformations of methyl-, sulfur-, and petroleum organic compounds by three groups of Gammaproteobacteria in deep-sea hydrothermal systems, and discussed their potential influence on plume biochemistry.

Materials and methods

Sample information and metagenome/metatranscriptome sequencing

Hydrothermal plume and surrounding background samples were acquired on the following cruises: R/V New Horizon in Guaymas Basin, Gulf of California (sampling dates: July 2004); R/V Atlantis in Mid-Cayman Rise, Caribbean Sea (sampling dates: January 2012); and two consecutive cruises of the R/V Thomas G Thompson in the Eastern Lau Spreading Center (ELSC), Lau Basin, western Pacific Ocean (sampling dates: May–July 2009). Guaymas Basin hydrothermal plume and background samples were collected by CTD, filtered shipboard onto 142 mm diameter 0.2 μm woven polyethersulfone (SUPOR®) filters, and preserved with RNAlater immediately after collection [12]. Cayman hydrothermal plume and surrounding background samples were collected by the Suspended Particulate Rosette (SUPR) filtration device mounted on the remotely operated vehicle (ROV) Jason II [31]. The samples were filtered in situ onto 142 mm diameter 0.2 μm woven polyethersulfone (SUPOR®) filters and then preserved in situ [31]. The ELSC samples were collected by the SUPR filtration device of ROV Jason II, filtered in situ onto 47 mm diameter 0.8 μm polycarbonate filters, and then preserved shipboard with RNAlater (~6 h post filtration) [32, 33]. Details regarding geochemical analysis, metagenomic/metatranscriptomic sequencing, and QC processing are described in previous publications for the Guaymas Basin samples [13, 34], Mid-Cayman Rise samples [16, 31, 32], and Lau Basin samples [35]. Detailed cruise and sampling information is provided in Supplementary Dataset S1.

Assembling and metagenomic binning

QC-processed reads were assembled de novo by MEGAHIT v1.1.2 with the settings “--k-min 45 --k-max 95 --k-step 10” [36]. Hydrothermal plume and background (P&B) metagenomes from the same hydrothermal site were assembled together. The QC-processed reads were re-mapped to the assemblies by Bowtie 2 v2.2.8 with default settings [37]. For each hydrothermal site, hydrothermal plume and background reads were mapped to the corresponding assemblies separately; the bam files of plume and background reads for individual assemblies were used for downstream binning. This separate mapping of plume and background metagenome reads to the P&B de novo co-assembly from each site resulted in two sets of scaffold coverages that were subsequently used in MAG binning. The assemblies were then binned with MetaBAT v0.32.4 using 12 combinations of parameters (including the sensitivity mode, the proportion of shared membership in bootstrapping, and the minimal contig length) [38]. DAS-Tool v1.0 was used to screen the MetaBAT MAGs resulting from the 12 parameter combinations (12 batches of MAGs) for the best set of MAGs in terms of quality and completeness [39]. Outlier scaffolds with abnormal coverage, tetranucleotide signals, and GC pattern within potentially highly contaminated MAGs (as checked by CheckM v1.0.7), together with erroneous SSU sequences within MAGs, were screened out by RefineM v0.0.20 with default settings [40, 41]. Certain MAGs were then further refined and decontaminated by manual inspection in VizBin [42]. MAGs were selected using a threshold of <10% contamination and >50% completeness. Given the incompleteness of MAGs, the absence of a metabolic pathway within a MAG does not necessarily indicate its absence from the actual genome.
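The final selection step can be illustrated with a minimal Python sketch. It only applies the two thresholds stated above (>50% completeness, <10% contamination) to CheckM-style quality estimates; the class name, function name, and example values are hypothetical and are not taken from the study's pipeline.

```python
from dataclasses import dataclass


@dataclass
class MagQuality:
    """CheckM-style quality estimates for one MAG (values in percent)."""
    name: str
    completeness: float
    contamination: float


def passes_quality(mag, min_completeness=50.0, max_contamination=10.0):
    """Return True if the MAG meets the >50% / <10% thresholds from the text.

    Both comparisons are strict, mirroring the ">" and "<" in the Methods.
    """
    return (mag.completeness > min_completeness
            and mag.contamination < max_contamination)


# Hypothetical example values for illustration only.
mags = [
    MagQuality("bin_A", 98.9, 1.09),   # high quality: kept
    MagQuality("bin_B", 47.2, 0.50),   # too incomplete: dropped
    MagQuality("bin_C", 88.0, 12.3),   # too contaminated: dropped
]
kept = [m.name for m in mags if passes_quality(m)]
print(kept)
```

Keeping the thresholds as keyword arguments makes it easy to test how sensitive the final MAG set is to the chosen cutoffs.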

MAGs genomic property and annotation

The genome taxonomy was determined by RefineM v0.0.20 [40] and the GTDB database (http://gtdb.ecogenomic.org). The genomic properties, including genome coverage, genome and 16S rRNA taxonomy, tRNAs, genome completeness, and scaffold parameters, were all obtained using an in-house bioinformatic pipeline deposited in GitHub (https://github.com/ChaoLab/GenomePropertyParsingPipeline). Relative genome coverage was calculated by normalizing each metagenome to 100M reads. For ORF annotation, GhostKOALA v2.0, KAAS v2.1, and eggNOG-mapper v4.5.1 were applied with default settings to annotate ORFs to KOs [43,44,45]. GhostKOALA and KAAS provide annotations as KO identifiers. eggNOG-mapper annotations were translated to KO identifiers as follows: (a) the first KO hit was used as the final result; (b) if an ORF had only a COG hit, the COG identifier was translated to a KO identifier using ‘ko2cog.xl’ provided by the KEGG database, and this was taken as the final result. When combining the results of these tools, the KO identifier from the first tool was used as the final annotation; if the first tool gave no annotation, the next tool was consulted in turn. Annotation against the NCBI nr database (updated Mar 6, 2017) was done by DIAMOND BLASTP v0.9.28.129 [46] with default settings. Only the top ten hits were retained for each BLASTP run. Among the top ten BLASTP hits for each ORF, the first-ranked hit was used as the final annotation. When the first-ranked hit was “hypothetical protein”, we moved down the list until a hit that was not “hypothetical protein” was found. The pipeline for the above annotation modification was deposited in GitHub (https://github.com/ChaoLab/AnnotationModify).
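The annotation rules described above can be sketched in Python as follows. This is an illustrative reading of the text, not the code in the cited GitHub repositories: the function names and data layout are hypothetical, the tool priority is assumed to follow the order in which the tools are listed (GhostKOALA, KAAS, eggNOG-mapper), a COG is assumed to map to a single KO, and falling back to the first hit when every hit is “hypothetical protein” is an assumption the text does not specify.

```python
def eggnog_to_ko(ko_hits, cog_hits, cog_to_ko):
    """Translate one eggNOG-mapper result to a KO identifier.

    (a) use the first KO hit if present; (b) otherwise translate the
    first COG hit that has an entry in the COG-to-KO mapping.
    """
    if ko_hits:
        return ko_hits[0]
    for cog in cog_hits:
        if cog in cog_to_ko:
            return cog_to_ko[cog]
    return None


def consensus_ko(orf_id, annotations,
                 tool_order=("GhostKOALA", "KAAS", "eggNOG-mapper")):
    """Return the KO from the first tool (in assumed priority order) that has one."""
    for tool in tool_order:
        ko = annotations.get(tool, {}).get(orf_id)
        if ko:
            return ko
    return None


def pick_blast_annotation(hit_descriptions, max_hits=10):
    """Pick the first of the top hits that is not a 'hypothetical protein'.

    Assumption: if all retained hits are hypothetical, keep the top hit.
    """
    top = hit_descriptions[:max_hits]
    for desc in top:
        if "hypothetical protein" not in desc.lower():
            return desc
    return top[0] if top else None


# Hypothetical example data for illustration only.
annotations = {
    "GhostKOALA": {"orf1": "K00001"},
    "KAAS": {"orf1": "K00002", "orf2": "K00003"},
    "eggNOG-mapper": {"orf3": eggnog_to_ko([], ["COG0001"], {"COG0001": "K00004"})},
}
for orf in ("orf1", "orf2", "orf3", "orf4"):
    print(orf, consensus_ko(orf, annotations))
print(pick_blast_annotation(["hypothetical protein", "methanol dehydrogenase"]))
```

A fixed priority order keeps the merged annotation deterministic, at the cost of ignoring disagreements between tools; those could be logged separately if needed.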

Phylogenetic construction and genome alignment

To define the genome phylogeny, 43 aligned and masked phylogenetically informative marker proteins of MAGs from this study and their closely related reference genomes were retrieved by CheckM. The concatenated markers were used to construct the phylogenomic tree (genomes that contained <25% informative sites in the concatenated marker alignment were not included in tree construction). The phylogenomic tree was constructed by IQ-TREE v1.6.3 [47] with its built-in ModelFinder procedure (best-fit model automatically chosen case by case) and the following settings: “-m MFP -bb 1000 -mset WAG,LG,JTT,Dayhoff -mrate E,I,G,I+G -mfreq FU -wbtl”. To identify syntenic genomic regions among genomes, Mauve v2.4.0 was applied to align genomes with the progressiveMauve algorithm [48], and a 70% cutoff of nucleotide sequence similarity within the shared region was used. Gene operons were identified among genomes according to published reports on the reference genomes and their gene arrangement patterns. We built phylogenetic trees for functionally important proteins (e.g., MxaF, XoxF, and MtoX) that had complex phylogenetic subclades. In each phylogenetic tree, proteins with biochemical activity evidence from the literature were labeled accordingly. The functions and phylogenetic affiliations of the remaining proteins were deduced from the phylogenetic tree.
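The exclusion of genomes with <25% informative sites can be illustrated with a short Python sketch. It assumes that an "informative site" for a genome is any position of its concatenated alignment row that is neither a gap nor a masked/unknown residue; that interpretation, the gap characters, and the function names are assumptions made here for illustration only.

```python
def informative_fraction(aligned_seq, uninformative_chars="-.?Xx"):
    """Fraction of alignment positions that carry a residue.

    Gaps and masked/unknown residues (assumed to be '-', '.', '?', 'X')
    are counted as uninformative.
    """
    if not aligned_seq:
        return 0.0
    informative = sum(1 for c in aligned_seq if c not in uninformative_chars)
    return informative / len(aligned_seq)


def filter_genomes(alignment, min_fraction=0.25):
    """Keep genomes whose row has at least `min_fraction` informative sites.

    Genomes with <25% informative sites are excluded, as in the text.
    """
    return {name: seq for name, seq in alignment.items()
            if informative_fraction(seq) >= min_fraction}


# Hypothetical toy alignment for illustration only.
alignment = {
    "genome_1": "MKV-LLA-",   # 6/8 informative: kept
    "genome_2": "M-------",   # 1/8 informative: excluded
    "genome_3": "MK------",   # 2/8 = 25%: kept (not below the cutoff)
}
print(sorted(filter_genomes(alignment)))
```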

Transcriptomic level analysis and metagenome mapping

The non-rRNA metatranscriptomic reads were mapped to the gene sequences of each MAG using Bowtie 2 v2.2.8 with the “--very-sensitive” option (the same as running with options “-D 20 -R 3 -N 0 -L 20 -i S,1,0.50”) to achieve sensitive and accurate mapping performance. The RPKM (number of mapped reads per kilobase of gene length per million total reads) was calculated by counting both forward and reverse reads (for single-end cDNA libraries, the single-end reads were used) mapped to each gene sequence. To compare plume and background gene expression levels, we also normalized the RPKM values by the coverage of each genome in the corresponding metagenome, to exclude the influence of genome abundance on the comparison of gene transcript levels. All normalized RPKM values were set to 10× to make the values easy to read and compare. If a given genome was reconstructed from only the plume or only the background metagenome, we used it as the mapping target for both the plume and background metagenomes. Unless otherwise stated, nRPKM (normalized RPKM) in the following text stands for genome coverage-normalized RPKM.
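The two quantities defined above can be written out as a minimal Python sketch. The RPKM formula follows the stated definition directly. The nRPKM formula is one possible reading of "set to 10×", namely that each RPKM value is rescaled as if the genome had a coverage of 10×; this interpretation, the function names, and the example numbers are assumptions for illustration.

```python
def rpkm(mapped_reads, gene_length_bp, total_reads):
    """Reads per kilobase of gene length per million total reads."""
    if gene_length_bp <= 0 or total_reads <= 0:
        raise ValueError("gene length and total read count must be positive")
    return mapped_reads * 1e9 / (gene_length_bp * total_reads)


def nrpkm(rpkm_value, genome_coverage, reference_coverage=10.0):
    """Genome coverage-normalized RPKM.

    Assumption: the RPKM is divided by the genome's coverage in the
    matching metagenome and rescaled to a reference coverage of 10x.
    """
    if genome_coverage <= 0:
        raise ValueError("genome coverage must be positive")
    return rpkm_value * reference_coverage / genome_coverage


# Hypothetical numbers: 1,000 reads on a 1,000 bp gene in a library of
# 1,000,000 reads, for a genome with 20x coverage in the metagenome.
gene_rpkm = rpkm(mapped_reads=1000, gene_length_bp=1000, total_reads=1_000_000)
gene_nrpkm = nrpkm(gene_rpkm, genome_coverage=20.0)
print(gene_rpkm, gene_nrpkm)
```

Dividing by genome coverage means that a gene in a rare genome and the same gene in an abundant genome are compared on a per-genome-copy basis rather than by raw transcript counts.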

A collection of 371 marine water column metagenomes from around the world was used as the database to evaluate the relative abundance of MAGs and reference genomes in the open ocean (Supplementary Dataset S2). The metagenomic mapping result of each MAG was represented by RPKG (reads per kb per genome equivalent), which serves as a good standard to estimate genome abundance within metagenomes.
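As a rough illustration of the RPKG metric, the Python sketch below computes reads recruited per kilobase of genome, divided by the number of genome equivalents in the metagenome. How genome equivalents were estimated is not stated here, so that value is simply passed in as an input; the function name and the example numbers are hypothetical.

```python
def rpkg(mapped_reads, genome_length_bp, genome_equivalents):
    """Reads per kb of genome per genome equivalent of the metagenome.

    `genome_equivalents` is taken as given; its estimation method is
    outside the scope of this sketch.
    """
    if genome_length_bp <= 0 or genome_equivalents <= 0:
        raise ValueError("genome length and genome equivalents must be positive")
    reads_per_kb = mapped_reads / (genome_length_bp / 1000.0)
    return reads_per_kb / genome_equivalents


# Hypothetical example: 12,000 reads recruited by a 3 Mb genome from a
# metagenome containing 2 genome equivalents.
value = rpkg(mapped_reads=12_000, genome_length_bp=3_000_000, genome_equivalents=2.0)
print(value)
```

Normalizing by genome equivalents rather than by total reads makes values comparable across metagenomes whose communities differ in average genome size.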

Results

Sampling site

Deep-sea hydrothermal plume samples were collected from Guaymas Basin, Mid-Cayman Rise, and the ELSC. Guaymas Basin in the Gulf of California is located at 2000 m depth and characterized by active seafloor spreading and rapid deposition of organic-rich, diatomaceous sediments. Active hydrothermal venting is predominantly found in the southern portion of the basin where hydrothermal vents, mounds, and chimneys appear on the seafloor [49]. The Mid-Cayman Rise is the Earth’s deepest and slowest spreading mid-ocean ridge. In this study, two hydrothermal vent sites associated with the Mid-Cayman Rise were investigated: Piccard (Cayman Deep), the deepest known hydrothermal vent site, which is basalt-hosted and located at a depth of 4960 m; and Von Damm (Cayman Shallow), an ultramafic-influenced hydrothermal vent adjacent to Piccard (~20 km) at a depth of 2350 m, which is near the summit of an oceanic core complex [16, 50]. The ELSC is located in Lau Basin, a back-arc basin in the Western Pacific Ocean [16]. In this study, six hydrothermal vent sites along the ELSC were investigated. These vents exhibit a range of geochemical properties, with vents at the southern end exhibiting higher concentrations of H2S, Fe, and Mn, and lower pH [51].

Genome reconstruction and phylogeny

Among metagenomes from these three hydrothermal sites, we reconstructed 206 draft MAGs from the plume and background samples with >50% genome completeness and <10% genome redundancy (Supplementary Dataset S3). This included 22 archaeal MAGs affiliated to Thaumarchaeota and Marine Group II & III, and 184 MAGs affiliated to various bacterial groups, mainly the Alpha-, Gamma-, Delta- and Epsilonproteobacteria (84 MAGs) (Fig. 1). The three major gammaproteobacterial lineages, including Cycloclasticus, Methylophaga, and Methylococcales, were reconstructed with high genome completeness and low genome redundancy (Supplementary Dataset S3). The phylogenomic tree built from concatenated 43-marker protein alignments represents the overall placement of MAGs (Fig. 1). In the Cycloclasticus subtree (Fig. 1 and Supplementary Fig. S1), the closest genomes are from uncultured Deepwater Horizon oil spill (DWH) impacted pelagic samples or mussel/sponge endosymbionts, which are separated from the “isolate” strains (cultured isolates but not unculturable strains or symbiotic strains). In the Methylococcales subtree (Fig. 1 and Supplementary Fig. S2), both the phylogenomic tree and ANI cluster dendrogram support the close genomic relationship between MAGs from this study and deep-sea water column/sediment strains. For Methylophaga (Fig. 1 and Supplementary Fig. S3), both methods show that Methylophaga aminisulfidivorans SZUA-1124 clusters with the seawater isolate from the same species and shares a further relationship with TARA metagenome-binned epi-/mesopelagic MAGs. The normalized MAG coverage indicates that these gammaproteobacterial MAGs (Hydro-γ-MAGs hereafter) are present with considerable abundance in hydrothermal plumes in distinct geological settings within three different ocean basins (Fig. 1).

Fig. 1: Phylogenetic tree of the hydrothermal plume and background MAGs based on concatenated marker proteins.

The pie charts in the tree stand for genome completeness. The MAG coverage shown in the table was normalized to 100 million reads in each corresponding metagenome. MAG origin stands for the metagenomes that were used to assemble and bin the MAGs; ‘P’ stands for only plume metagenome, and ‘BandP’ stands for both plume and background metagenomes together. Only the metagenomes with the largest MAG coverage values are listed in the table within the figure. MAGs from this study are labeled in bold; the others are from publicly available assemblies. Abbreviations for metagenomes are given in Supplementary Dataset S1.

Gene operons of methyl-metabolism

We identified gene operons for methane oxidation, methylamine utilization (details in Supplementary Information), and MT oxidation. Methanethiol (MT) is an intermediate in the breakdown of abundant organosulfur compounds [52] and can itself be degraded by MT oxidase (MTO), encoded in the operon containing mtoX, SCO1/senC (encoding a copper chaperone), and mauG (sometimes present as a SCO1/mauG fusion, encoding a diheme cytochrome c peroxidase for generating the MTO co-factor, TTQ) [52, 53]. Cycloclasticus sp. SZUA-1075 and four of the Methylococcales bacteria contain transcriptionally active mto operons with conserved key residues for the formation of TTQ and copper ligands (Fig. 2a, b). The phylogenetic tree places the mtoX genes from these five Hydro-γ-MAGs with those of other Alpha- and Betaproteobacteria, for which the active functions of MTO have been tested with genomic and enzymatic evidence (Fig. 2c) [52]. Furthermore, Guaymas Basin metatranscriptome results suggest that mto genes from three Methylococcales bacteria (SZUA-1212, 1314, and 1258) have very high transcriptional expression, and their transcription levels are considerably higher in the plume than in the background (Fig. 2d).

Fig. 2: Genomic, phylogenetic, and transcriptomic analyses of mto genes.

a Schematic diagram of aligned mto gene operon of Cycloclasticus sp. and Methylococcales from this study and reference strains. The star-labeled mto gene is used as the standard for calculating sequence similarity. b Conserved amino acids for key functions of the MtoX protein. Trp211 and Trp374 are conserved key residues for the formation of the tryptophan tryptophylquinone (TTQ) prosthetic group by a MauG-like enzyme, and His89/90, His140, and His412 are conserved key residues of copper ligands. The strains that were experimentally tested for methanethiol-oxidizing function (Exp. tested) and those with successful PCR amplification of the mtoX gene are labeled with stars. c Phylogenetic tree of MtoX, rooted with a distant SBP56 protein and constructed by IQ-TREE. The strains that were experimentally tested (Exp. tested) for methanethiol-oxidizing function are labeled with a yellow square. d The transcriptional expression of mto genes in the identified genomes. The X-axis stands for the genome coverage-normalized RPKM. Metatranscriptome abbreviations: GyBn Guaymas Basin, C cDNA, NBP neutrally buoyant plume, APB above plume background.

Methyl-compound and C1 metabolism

The Piccard and Von Damm (Cayman Deep/Shallow) metatranscriptome mapping results suggest high expression of the methanol dehydrogenase mxaF gene and the Clade 1, 2, and 3 pyrroloquinoline quinone-dependent methanol dehydrogenase xoxF genes of M. aminisulfidivorans SZUA-1124 (Supplementary Fig. S4a). Interestingly, only one specific Clade 1 xoxF copy (SZUA-1124_00001) has a high expression level, while the other copy shows little transcriptional activity. Meanwhile, the metatranscriptomic data suggest that the xoxF gene had a higher abundance of transcripts than mxaF based on gene expression normalized by genome abundance. M. aminisulfidivorans SZUA-1124 was originally reconstructed from the Cayman Shallow metagenome (Supplementary Dataset S1); the corresponding metatranscriptome mapping results suggest the active expression of genes in methylamine utilization pathways. These results suggest that methylamine utilization occurs mainly in the hydrothermal plume, while methanol oxidation occurs in both plume and background (Supplementary Fig. S4a). M. aminisulfidivorans SZUA-1124 and three Guaymas Basin Methylococcales bacteria all show high transcriptional activity in formate dehydrogenation and formaldehyde assimilation by the ribulose monophosphate (RuMP) cycle and the Wood–Ljungdahl (WL) pathway (Supplementary Fig. S4b). Nevertheless, for other C1 metabolisms, e.g., formaldehyde dehydrogenation and formamide degradation, no transcripts could be detected, although some Hydro-γ-MAGs possess the encoding genes.

PAH degradation, methane and alkane incorporation

The Cycloclasticus sp. SZUA-1075 MAG (completeness > 90%) has the most complete set of genes for degradation of polycyclic-aromatic hydrocarbons (PAHs), including degradation of various PAHs by dioxygenation and monooxygenation of benzene, toluene and cyclohexane and polycyclic-aromatic ring removal (Supplementary Fig. S5a). Nearly all of them have transcriptional activities in Guaymas Basin plume samples; monooxygenation of benzene to catechol was also active in background seawater (Supplementary Fig. S5a). Other minor dioxygenase encoding genes are also actively expressed by Guaymas Basin Methylococcales bacteria in the plume (Supplementary Fig. S5a).

The phmoA (gene for particulate hydrocarbon monooxygenase subunit A) phylogenetic tree places one identified phmoA gene (SZUA-1075_01584) from Cycloclasticus sp. SZUA-1075 adjacent to those of mussel/sponge endosymbiotic and DWH Cycloclasticus within Group Z, which mediate short-chain alkane oxidation based on genomic and proteomic evidence [23], and the other phmoA gene (SZUA-1075_00092) adjacent to only those of DWH Cycloclasticus but not mussel/sponge endosymbionts within Group X (Supplementary Fig. S6) [17]. The latter gene could be involved in the oxidation of alkanes of various length [23]. The phylogenetic trees of pqq-adh and aor (genes encoding enzymes for alkane oxidation to carboxylic acids) indicate gene copies of SZUA-1075 always form highly supported monophyletic clades with those of mussel/sponge endosymbionts and DWH strains (Supplementary Figs. S7 and S8). These ‘DWH and endosymbiont’ clade gene copies also have higher transcript abundance than those genes from ‘isolate’ clade (Supplementary Fig. S8 and S5b). It is suggested that ‘DWH and endosymbiont’ clade pqq-adh and aor genes are involved with the exact functions of alkane oxidation while the ‘isolate’ clade genes acquire divergent functions [23]. Together with the existence of methylcitrate and beta-oxidation pathways, these results suggest that the alkane oxidation pathway is active in SZUA-1075 [23]. Guaymas Basin metatranscriptome indicates that SZUA-1075 has transcriptional activities on incorporating both short (<C3) and long (>C6) alkanes; however, the former has relatively higher expression (Supplementary Fig. S5b). Meanwhile, the Guaymas Basin metatranscriptome also supports highly expressed Methylococcales pmoA in both plume and background as observed previously [13, 17] (Supplementary Fig. S5b).

Active, concurrent thiosulfate and methane oxidation

Genomic and transcriptomic evidence for thiosulfate oxidation was found in Cycloclasticus and Methylococcales genomes, but not in the M. aminisulfidivorans SZUA-1124 genome (Supplementary Fig. S5c). The concurrent expression of pmoA/phmoA and soxABX within Cycloclasticus and Methylococcales genomes also indicates they could depend on abundant energy sources from surrounding environments. As predicted by Cycloclasticus sp. SZUA-1075 and Methylococcales genomic contents, elemental sulfur could be formed by both flavocytochrome c/sulfide dehydrogenase (fccAB encoded) and sulfide quinone oxidoreductase (sqr encoded), while the metatranscriptome results suggest the former gene has higher transcriptional levels (Supplementary Fig. S5c). Cycloclasticus spp. from this study also contain genomic capacities and transcriptional activities on transforming important organosulfur compounds [54], including MT, dimethyl sulfoxide (DMSO), and methanesulfonate (MSA) (Supplementary Fig. S5c).

Metabolism of Methylophaga aminisulfidivorans SZUA-1124

The Methylophaga aminisulfidivorans SZUA-1124 MAG from this study is currently the only reported high-quality (with 98.9% genome completeness, 1.09% genome redundancy, and 30 contigs) Methylophaga genome reconstructed from hydrothermal environments (Fig. 3). Beyond the C1 and methyl-compound metabolisms described above, it could actively transport and utilize urea and reduce nitrate (Fig. 3). Assimilatory sulfate reduction and sulfide oxidation are highly active (Fig. 3); meanwhile, it could actively reduce arsenate and export it out of the cell. It could also actively synthesize a siderophore to incorporate iron (III) into the cell, a capacity that is absent in other closely related Methylophaga genomes (Fig. 3, Dataset S4 and S5) [55].

Fig. 3: The metabolism of Methylophaga aminisulfidivorans SZUA-1124.

a The reconstructed metabolic pathway of Methylophaga aminisulfidivorans SZUA-1124. The expression levels of individual protein-encoding gene/genes were calculated and labeled accordingly in RPKM. The left square represents the average RPKM value from all Cayman Shallow plume metatranscriptomes, and the right square represents that from all Cayman Shallow background metatranscriptomes. The square and arrow colors correspond to different RPKM value ranges. The protein details of labeled IDs are provided in Supplementary Dataset S8. b The presence/absence matrix of functions in closely related Methylophaga genomes. The function IDs are the same as those in Panel (a). The detailed annotations of these closely related Methylophaga genomes are provided in Supplementary Dataset S8.

No peptide/protein and sugar hydrocarbon transporter encoding genes could be identified, and no complete and effective alkane/PAH-degrading pathways could be reconstructed, suggesting that SZUA-1124 could only depend on small carbons (C1 and methyl-compounds) as energy sources. The active expression of glycogenesis and RuMP cycle genes suggests that SZUA-1124 has a well-established anabolic system for effectively synthesizing sugar hydrocarbons and amino acids for cell life functions (Fig. 3). Similar to that in Oceanospirillales single-cell amplified genome and hydrocarbon plume metagenome and metatranscriptome [56], the active expression of methyl-accepting chemotaxis and flagellar motor switch protein-encoding genes in SZUA-1124 suggests that it could rapidly respond when its targeted substrates (methyl-compounds) become available [57].

Distribution in the ocean

To evaluate the prevalence of Hydro-γ-MAGs in the global ocean and their distribution patterns, we used their genomes as queries to recruit reads from 371 NCBI SRA (Sequence Read Archive) metagenomes, which provide wide coverage of global ocean waters at different depths. The RPKG (reads per kb per genome equivalent) serves as a good standard to estimate genome abundance within marine metagenomes, and RPKG > 1 is a clear sign that a genome is well represented [58]. There are two general patterns (Supplementary Dataset S5): Cycloclasticus sp. SZUA-1075, Methylococcales bacterium SZUA-1212, and Methylophaga aminisulfidivorans SZUA-1124 only have significant distribution in hydrothermal environments, while the other Hydro-γ-MAGs have detectable distribution in seawater at various depths globally (Supplementary Fig. S9). In the latter group, Methylococcales bacterium SZUA-1258 has an even distribution throughout all depths [except for high-latitude Baltic Sea [59] and North Sea (PRJNA182070) metagenomes], while the other MAGs are predominantly represented in mesopelagic and bathypelagic waters (Supplementary Fig. S9).

Discussion

We reconstructed detailed pathways for the metabolism of five groups of substrates (methylated and C1 compounds, sulfur-containing organics, PAH, methane and alkane, and sulfur cycling-related compounds) that had been understudied heretofore in deep-sea hydrothermal plumes (Fig. 4). Placement of these pathways into their genomic context allowed us to make predictions about which taxa are able to use these different substrates. For the metabolism of methylated and C1 compounds, two groups of methylotrophic microorganisms, Methylococcales and Methylophaga, could utilize methanol and formate and assimilate formaldehyde [5], while methylamine pathways are found exclusively in the Methylophaga genome. Capacities for PAH and alkane degradation were found primarily in the Cycloclasticus genome, while methane oxidation could only be found in the Methylococcales genomes.

Fig. 4: Schematic figure indicating ecology and function of Hydro-γ-MAGs.

a Schematic diagram showing how indigenous Hydro-γ-MAGs are distributed in the marine water column and may utilize the chemicals in petroleum oil spills and hydrothermal plumes. b Summary diagram representing the metabolic functions that are mediated by Hydro-γ-MAGs in the hydrothermal environments. Remarks: (i) The flavin-containing monooxygenase [annotated as trimethylamine (TMA) monooxygenase here] was also reported to have broad substrate specificity and could oxidize DMS and DMSO [71]; (ii) The original annotations here are toluene monooxygenase (TmoCF), which have high sequence similarity to methanesulfonic acid monooxygenase (MsmCD) [72]. The function assigned here (oxidizing methanesulfonate) needs further validation.

Meanwhile, metatranscriptome comparison indicates functional variation between plume and background, and among different gene orthologs. For instance, in the Methylophaga aminisulfidivorans SZUA-1124 genome (Supplementary Fig. S4 and details in Supplementary Information), the methanol dehydrogenase gene (xoxF) Clade 1 copy (SZUA-1124_00001) has the highest expression level among all xoxF genes, while the other copy (SZUA-1124_02641) has nearly no expression. A possible evolutionary explanation is that SZUA-1124_00001 is the copy with the original methanol-oxidizing function and the other paralogs have site substitutions that make them either unexpressed/functionless or subfunctionalized [60]. Meanwhile, the higher expression level of xoxF (SZUA-1124_00001) relative to mxaF (Supplementary Fig. S4) indicates that xoxF is the more important functional gene for methanol oxidation. XoxF-MDHs bind rare-earth elements [such as cerium (III) and lanthanum (III)] as a co-factor that assists PQQ (pyrroloquinoline quinone, the catalytic prosthetic group) in catalysis. It is speculated that the presence of rare-earth elements at the active sites could confer a catalytic efficiency superior to that of their Ca2+-binding MxaF counterparts [61].

Thermogenic interactions between upwelling vent fluids and dissolved organic matter (DOM) (likely subsurface microbial biomass) could be responsible for the production of MT and other organosulfur compounds [2]. The low temperature mixed fluids in Guaymas Basin and Cayman are enriched in MT (up to 10³–10⁴ nM) (Supplementary Dataset S4) [2], consistent with the high expression level of the MT oxidation operon in the hydrothermal plume. The potential oxidation products of MT [52], e.g., formaldehyde and hydrogen sulfide, could be further utilized by plume microbes. Meanwhile, Hydro-γ-MAGs could actively catalyze the methylation of MT to dimethyl sulfide (DMS), further contributing to the microbial removal of MT [52, 62]. This study, to our knowledge, provides the first functional evidence that hydrothermal plume microorganisms could utilize the simplest thermochemically-derived sulfur organics for energy yields.

In further support of the above statement that these Gammaproteobacteria are capable of transforming important organosulfur compounds, spectroscopic data indicate that the particulate sulfur pool in the Von Damm buoyant plume is dynamic and diverse along the plume rising path (Supplementary Dataset S6). The particulate sulfur pool contained a wide variety of organic and inorganic sulfur-bearing functional groups, including diverse reduced, intermediate, and oxidized sulfur species (Supplementary Dataset S6). The organic sulfur compounds could be generated by the degradation of pre-existing sedimentary organics in the subsurface and then entrained along the venting path into the plume, where they become part of the sulfur pool [63]. Organic sulfur compounds may also form in the plume through the reaction of hydrogen sulfide and polysulfides with organic matter, a process referred to as sulfurization [63]. As these sulfurization products [thiol and organic monosulfide (-R-S-H) and thiophene (aromatic-R-S-R′, R could also be “H”)] are persistent in the buoyant plume, it is more likely that the latter process contributes significantly to the presence of organic sulfur compounds [63]. Components of the sulfur pool that contain the MT bond are likely biogeochemically cycled by MT-utilizing microorganisms (either via methyl-transferring or MT bond-oxidizing pathways), thereby contributing to the element and energy transformation activities of the plume microbiome. This is also consistent with the functional capacity of Methylophaga from the Cayman plume to utilize MT (Fig. 4).

Hydro-γ-MAGs could perform Sox-dependent thiosulfate oxidation with sulfate as the product (Supplementary Fig. S5c), while other bacteria could actively oxidize methane for energy. In addition, they could actively oxidize sulfide to elemental sulfur as an energy source, consistent with the chemical datasets showing high H2S concentrations (10^0–10^1 mM) in Guaymas Basin and Cayman plume samples [2, 64]. This suggests that Hydro-γ-MAGs are well adapted to utilize the abundant electron donors in hydrothermal environments.

The genome of Methylophaga aminisulfidivorans encodes the complete degradation pathway for methylamines and also suggests the utilization of various hydrothermal vent C1-compounds, including formate, formaldehyde, urea, and methanol (Supplementary Dataset S7). Moreover, the arsenic detoxification pathway, which reduces As (V) to As (III) via arsenate reductase (ArsC) and extrudes As (III) via the arsenite transporter (ArsB), is present (Supplementary Dataset S7). Hydrothermal vent fluids are a significant source of arsenic to the deep sea [65]. Collectively, this indicates that Methylophaga aminisulfidivorans is well adapted to the chemical background in close proximity to hydrothermal plumes and is functionally capable of utilizing these organic compounds.

These Hydro-γ-MAGs are major functional players in the use of C1-compounds, petroleum hydrocarbons, and PAHs, and in sulfur cycling (Supplementary Fig. S10), in specific hydrothermal plume environments. The significantly enriched distribution and elevated expression levels revealed by this study suggest that they are specifically triggered by these hydrothermally sourced substrates in the plume environments (Supplementary Fig. S10). Hydro-γ-MAGs are prevalent in global marine water columns and have methyl-accepting chemotaxis mechanisms for seeking favorable ecological niches with abundant substrates (for Cycloclasticus and Methylophaga) (Supplementary Fig. S10 and Supplementary Dataset S5). Although this study shows them to be abundant and active in hydrothermal plume environments, these properties enable them to adapt widely and to be significantly triggered in environments with similar substrates throughout the marine water column. For instance, when stimulated by similar petroleum hydrocarbons, e.g., alkanes, aromatics, methane, and other C1-compounds, these Hydro-γ-MAG phylotypes (especially Cycloclasticus) also play a significant functional role in petroleum oil spill environments.

As indicated by previous research, the heterotrophic degradation of marine DOM stimulates the activity of methylotrophs (e.g., Methylophaga), potentially through degradation of DOM methyl sugars to methylotroph substrates such as methanol or formaldehyde [66]. This successive and synergistic incorporation of DOM could in part explain the ubiquitous distribution of marine methylotrophs in the open ocean [66]. The linkage between heterotrophic DOM degradation and methyl-compound incorporation indicates a metabolic handoff between two groups of microorganisms [67]. The wide accessibility of DOM in the ocean suggests that these methylotrophs can be widely distributed in the marine water column and have a broader ecological impact [68]. The results from our study indicate that such marine methylotrophs could play a metabolically opportunistic role in hydrothermal plumes through their chemotactic properties, thus further improving our understanding of their significance in the ocean carbon cycle. It will also be interesting to study the contribution of DOM-driven syntrophic activities to plume biogeochemical cycles.

Conclusion

This study reveals the metabolic potential and functional activity of Hydro-γ-MAGs in using methyl-, sulfur- and petroleum organic compounds in deep ocean hydrothermal plumes, highlighting the importance of these compounds in the ecology and biogeochemistry of plumes and the connections between the habitats of hydrothermal plumes, the open ocean, and oil spills. Based on these insights, we propose a conceptual model in which Hydro-γ-MAGs are indigenous microorganisms widely distributed in the global marine water column, but their abundance and activity are maintained at a relatively low level due to the limited availability of carbon sources and electron donors for energy metabolism in the pelagic background [69]. In response to a stimulus (hydrothermal plume or petroleum oil spill), they find new habitats with a substantial supply of hydrocarbons, vigorously incorporate the substrates, and replicate to become the dominant taxa [20, 56, 70], thus playing significant roles in the biogeochemistry of deep-sea hydrothermal plumes.