Metagenomic views of microbial dynamics influenced by hydrocarbon seepage in sediments of the Gulf of Mexico

Microbial cells in the seabed are thought to persist by slow population turnover rates and extremely low energy requirements. External stimulations such as seafloor hydrocarbon seeps have been demonstrated to significantly boost microbial growth; however, the microbial community response has not been fully understood. Here we report a comparative metagenomic study of microbial response to natural hydrocarbon seeps in the Gulf of Mexico. Subsurface sediments (10–15 cm below seafloor) were collected from five natural seep sites and two reference sites. The resulting metagenome sequencing datasets were analyzed with both gene-based and genome-based approaches. 16S rRNA gene-based analyses suggest that the seep samples are distinct from the references by both 16S rRNA fractional content and phylogeny, with the former dominated by ANME-1 archaea (~50% of total) and Desulfobacterales, and the latter dominated by the Deltaproteobacteria, Planctomycetes, and Chloroflexi phyla. Sulfate-reducing bacteria (SRB) are present in both types of samples, with higher relative abundances in seep samples than the references. Genes for nitrogen fixation were predominantly found in the seep sites, whereas the reference sites showed a dominant signal for anaerobic ammonium oxidation (anammox). We recovered 49 metagenome-assembled genomes and assessed the microbial functional potentials in both types of samples. By this genome-based analysis, the seep samples were dominated by ANME-1 archaea and SRB, with the capacity for methane oxidation coupled to sulfate reduction, which is consistent with the 16S rRNA-gene based characterization. Although ANME-1 archaea and SRB are present in low relative abundances, genome bins from the reference sites are dominated by uncultured members of NC10 and anammox Scalindua, suggesting a prevalence of nitrogen transformations for energy in non-seep pelagic sediments. This study suggests that hydrocarbon seeps can greatly change the microbial community structure by stimulating nitrogen fixation, inherently shifting the nitrogen metabolism compared to those of the reference sediments.

microbial community structure and function 11,12 . Metagenome sequencing has been previously performed to better assess the microbial response to petroleum seeps in GOM 13 . However, most of the analyses performed focused on functional genes 14,15 rather than genomes. Microbial genomes in hydrocarbon-impacted sediments were reported in the Guaymas Basin 16 and compared to the nearby reference sites 17 , to elucidate the changes in metabolic functionality and dependencies caused by hydrocarbon seepages. In Guaymas Basin sediments, despite contrasting community compositions, similar overall community functionality was observed between hydrothermal and non-hydrothermal sediments 17 . Functional redundancy of the microbial communities was invoked to explain this observation 17 .
In this study, we collected sediment cores from multiple sites with and without natural hydrocarbon seepages in the GOM to investigate the impacts of this environmental disturbance to microbial communities. We focus on the subseafloor sediments to test the hypothesis that the seep-related communities persist at low metabolic rates in the deep sediments and are insensitive to hydrocarbon seepages, as had been previously suggested for GOM sediments 11 . Here we employ both gene-and genome-based approaches to characterize the microbial community composition and structure, overall metabolic functionality, and inter-dependencies by comparing the metagenomes from seep and reference environments. With our genomic and functional insights into the microbial communities inhabiting the two contrasting geochemical conditions, we see that in addition to microbial cycling of sulfate and methane, nitrogen cycling dynamics must be severely shifted between seep and reference sediments.

Materials and methods
Sampling and site characterization. Samples used in this study were retrieved by multiple push coring via a remotely operated vehicle aided by video feed, from two contrasting areas: Seep-1 and Seep-2 in the seabed of the Gulf of Mexico (GOM) ( Table 1; Fig. 1a). Seeps were identified using a hull mounted multibeam echosounder which identified the natural gas plumes as indicators of active seepage as well as the nature of the seafloor (Fig. 1b,c). Sediment cores were retrieved by an ROV using a push-corer. In Seep-1, one core (D27) was collected from the center, two cores (D21 and D30) were collected from the edge, and the other two samples (D24 and D33) were collected halfway between the center and edge of this seep location (Fig. 1d). In Seep-2, two cores were retrieved from the central area: D72 was a site with white bacterial mat, while D75 was a site where seep bubbles were observed. Sediments of 10-15 cm below seafloor (cmbsf) of each core were sterilely subsampled, stored in an Falcon tube, and preserved in liquid nitrogen for transport to an onshore laboratory where they were stored at −80 °C until further DNA-based analyses were performed.
Each push core collected was subsampled for a variety of analyses. After sub-sampling all remaining sediments were placed in 500 ml isojars and poisoned with zephiran chloride. Geochemistry measurements were performed at Isotech Laboratories (Champagne, IL, USA). Liquid hydrocarbon presence/absence was determined for all push core samples (seep and background) based on maximum florescence intensity (MFI) and unresolved complex mixture (UCM) as screening tools, and detailed GC chromatograms for confirmation. Dissolved hydrocarbon concentrations were determined in seawater above the active seep bubble plumes at site D27. Sample was taken 1 meter above origin of the visible bubble plumes and analyzed by Isotech Laboratories (Champagne, IL, USA) for hydrocarbon composition and composition of methane, ethane, and propane. DNA extraction and metagenomic sequencing. For each sample, DNA was extracted from 0.5 gram (duplicate extractions with 0.25 gram used for each) of sediment using the MoBio PowerSoil DNA isolation kit following the manufacturer's instructions. Resulting DNA from the two duplicate extractions were pooled and eluted into 100 uL of PCR grade water. DNA concentrations were measured using a Qubit 2.0 fluorometer (Thermo Fisher Scientific). Metagenomic libraries were prepared using NEBNext UltraN II DNA library preparation kit and sequenced (2 × 250 bp) on an Illumina HiSeq platform (Illumina Inc.) at the Sequencing & Genotyping Center, University of Delaware.
Quantitative PCR. The bacterial 16S rRNA gene was quantified using the primer pair B1114/B1275R and thermal cycling conditions as described in Yoshimura, et al. 18 . Archaeal 16S rRNA gene was quantified using primers Uni515F/ Arc908r and thermal cycling conditions as described in Zhao, et al. 19 . The mcrA gene, encoding the alpha subunit of Methyl-CoM reductase (MCR) complex, was quantified using the primers qmcrA-F/ qmcrfA-R 20 , combining with the thermal cycling conditions previously described 18 . All qPCR reactions were run in triplicate and each reaction mixture contained 1 × QuantiTech SybrGreen PCR master mixture (QIAgen, Metagenome data analysis. Raw metagenome sequencing reads were first quality controlled using FastQC v0.11.5 21 , and then trimmed using Trimmomatic v0.36 22 to trim read-through adapters (ILLUMINACLIP:TruSeq. 2-PE.fasta:2:30:10), trim low quality base calls at the starts and ends of reads (LEADING:3, TRAILLING:3), remove reads that had average phred score lower than 20 in a sliding window of 10 bp (SLIDINWINDOW:10:20), and finally remove reads shorter than 100 bp (MINLEN:100). The overall quality of processed reads was evaluated again with FastQC v0.11.5, to ensure only high-quality reads were used in the downstream analysis.
Gene-based analysis. Short reads of the 16S rRNA gene in the trimmed sequences were identified and taxonomically classified using two different programs: phyloFlash v3.2 beta1 23 and GraftM v0.12.2 24 . The former program uses BBMap 25 to identify and VSEARCH v2.5.0 26 to classify putative 16S rRNA gene reads with SIVLA 132 release 27 as the reference database, while the latter program uses curated packages prepared from RDP 28 to identify 16S rRNA gene reads and pplacer 15 to assign the taxonomy. In addition, putative functional genes encoding key enzymes in nitrogen, sulfur, and methane cycling pathways were also identified and classified using GraftM 24 with the default parameters and the respective curated reference packages (available at https://data.ace.uq.edu.au/public/graftm/7/). The relative abundances of each gene in each sample were normalized by dividing by the total read pairs in the trimmed dataset of that sample.
Metagenome assembly and genome binning. Metagenome sequencing data from the samples of the two different groups (Seep sites: D27, D33, D72 and D75; Reference sites: D21 and D30) were processed separately; the transition site, D24, was not analyzed further. To maximize assembly, reads from different samples of the same group were co-assembled using megahit v2 29 with the following parameters:-k-min 21,-k-max 121,-k-step 10,-min-contig-len 500. Contigs longer than 1 kb were subjected to automatic genome binning with MaxBin v2.2.5 30 using the maximum expectation algorithm based on the tetranucleotide frequency, %GC, and Seep-2 (c) in GOM. Natural seepage detected by sonar is indicated by the grey gas bubbles. Images were created using QPS Fledermaus software (https://www.qps.nl/fledermaus/) (d) Settings of the 7 sampling sites: seep sites are shown in red circles and reference sites without seepage influence are shown in blue circles. D24 (marked in orange) was retrieved from the edge of Seep-1, but no geochemical signal of seepage was detected. differential coverage. The quality of the resulting genome bins were evaluated with CheckM 31 using the default parameters. Genome bins of contamination level >10% were manually refined using gbtools 32 , taking the %GC, marker gene taxonomy, and differential coverages in different samples into account. The qualities of the final bins were quality checked using CheckM again, only those of >50% of completeness and <10% of contamination were retained as metagenome-assembled genomes (MAGs) and reported in this study, to include most of the final MAGs to discuss the overall metabolic shift of sedimentary microbial communities caused by hydrocarbon emission. To determine the relative abundance of each MAG in each sample, coverage of each genome was determined by mapping the trimmed reads onto the genome assembly using BBmap v37.61 25 , with the similarity threshold of 98%. Coverages were processed in R 33 and visualized by heatmaps prepared using the R package ggplot2 34 .
Genome annotation and functional characterization. Genome bins were annotated using Prokka v1.13.3 35 , with default settings for bacteria and the "-kingdom archaea" flag for archaeal genomes. Putative coding DNA sequences (CDS) were predicted by Prodigal 36 in the Prokka annotation process. The functional assignments of genes of interest were also confirmed using BLASTp 37 and annotations of KEGG 38 and eggNOG 39 . The overall metabolic capacities of MAGs were visualized through a heatmap prepared using KEGG-decoder V1.0.6-1.0.8 40 based on the annotation results from KEGG annotation. phylogenetic analysis. Phylogenomic analysis was performed based on the concatenation of 13 ribosomal proteins (rpL2, 3, 4, 5, 6, 14, 15, 18, 22, 24 and rpS3, 8, 10, 17, 19). These ribosomal proteins have been demonstrated to undergo limited lateral gene transfer 41 , and have been included in large scale phylogenetic analysis of bacteria, archaea, and eukaryotes 42 . Close relative genomes of the MAGs recovered in this study were identified by BLASTp search 43 in the NCBI webserver, using the ribosomal protein L2 as the query. These ribosomal proteins in each genome (the 49 MAGs recovered from GOM and their close relatives), among the conservative single-copy ribosomal proteins included in Campbell, et al. 44 , were identified in Anvi' o v5.4 45 by Hidden Markov Model (HMM) search following the procedure described here (http://merenlab.org/2017/06/07/phylogenomics/). Sequences were aligned individually using MUSCLE 46 and gaps were removed using trimAl 47 with the "automated1" mode. Individual alignments were concatenated and the maximum likelihood phylogenetic tree was reconstructed using RAxML 48 using PROTGAMMALG as the substitution model. Interactive tree of life (iTOL) 49 was used for tree visualization and modification.
For the phylogenies of nitrogenases (NifH and NifK), the corresponding protein sequences from the GOM genomes were manually extracted from the Prokka annotation outputs. These protein sequences were also used as the queries in BLASTp 43 searches in the NCBI webserver, to identify their close relatives (up to 5 close relatives were retained). For each of the two proteins, all sequences were aligned using MAFFT LINSi 50 and gaps were removed using trimAl 47 with the "automated1" mode. Maximum-likelihood phylogenetic trees were reconstructed using IQ-TREE v.1.6.10 51 with substitution model determined by ModelFinder 52 , and 1000 ultrafast bootstrap replicates.

Results
We collected 7 sediment cores from two seeps and their adjacent areas in GOM (Fig. 1, and Table 1). Thermogenic gases were detected in most of the subsurface sediments retrieved from the two seep sites (except D24), but not the reference sites. In addition, liquid hydrocarbons were also noted in the two sediments (D72 and D75) from Seep-2 (Table 2). At the site with visible bubble plume, D27 at the center of Seep-1 area, thermogenic gases were also detected in the overlying seawater (139 ppm methane, 0.16 ppm ethane, and 0.049 ppm of propane). Based on these geochemical characterizations, we divided these samples into two groups: D27, D33, D24, D72 and D75 are within seep areas influenced by natural liquid and gaseous hydrocarbon seepages, while D21 and D30 serve as references without obvious seepage influences.

DNA concentrations and gene abundances.
We extracted the total DNA and used qPCR to quantify the abundance of both the archaeal and bacterial 16S rRNA genes. With the same input materials in DNA extraction, we obtained higher DNA yields in the seep sites than the reference sites (Student t-test, t = −7.92, P = 0.004) ( Table 1). Consistent with this, bacterial 16S rRNA gene abundances were detected to be 1.3-4.7 × 10 8 copies g −1 at the seep sites, while only 7.8-8.3 × 10 7 copies g −1 were detected at the reference sites (Student t-test, t = −2.81, P = 0.067) (Fig. 2a). Likewise, archaeal 16S rRNA gene abundance was quantified to be 0.9-3.6 × 10 7 copies g −1 www.nature.com/scientificreports www.nature.com/scientificreports/ at the seep sites, while only 7.6-7.8 × 10 5 copies g −1 at the reference site (Student t-test, t = −2.94, P = 0.055) (Fig. 2b). Higher DNA yields and 16S rRNA gene abundances detected at the seep sites indicate hydrocarbon seepage increases the overall microbial abundance.
Different community structures were observed based on 16S rRNA genes at the two types of sediment sites, although the community compositions were similar (Fig. 3b). Microbial communities at the seep sites were dominated by archaea, especially the ANME-1 archaea within the phylum of Euryarchaeota (45.8-59.3% of total community), which is in agreement with the fact that these are active seep sites, supported by the visible methane gas plumes present at both sites (Fig. 1b,c) and detection of gas (Table 2). Other dominant taxa (Euryarchaeota, Chloroflexi, Deltaproteobacteria, and Atribacteria) present in the seep samples were also detected in the reference sites (Fig. 3b). In contrast, the reference sites were dominated by bacteria, especially members of Chloroflexi, Planctomycetes (mainly the genus of Scalindua), and Deltaproteobacteria, with archaea only accounting for small fractions (8.8-11.0% of the total community) (Fig. 3b). We note that Deltaproteobacteria and, to a lesser extent (<2%), Latescibacteria, Patescibacteria (i.e. Candidate Phyla Radiation, CPR) and Zixibacteria were present with similar proportions in the two types of sediments.
Functional gene fractions. We used GraftM to detect various functional genes involved in biogeochemical cycles. We detected higher fractions (0.03-0.05%) of mcrA gene encoding the methyl coenzyme M reductase involved in methane/alkane metabolism at the seep sites, while mcrA is virtually absent in the reference sites  www.nature.com/scientificreports www.nature.com/scientificreports/ (Fig. 4c). This observation was also confirmed by mcrA gene-specific qPCR, in which ~20-fold higher mcrA gene abundances were quantified in the seep sites than the reference sites (Fig. 1c). In addition, the nifH gene which encodes nitrogenase, the key enzyme of nitrogen fixation process, was also observed to be higher in the seep sites than the reference sites (Student t-test, t = −8.30, P = 0.0033) (Fig. 4c). The majority of the gene fragments of these two genes (92-95% of mcrA and 32-39% of nifH) were contributed by contributed to by genes that are similar to those from known euryarchaeotal organisms.
We observed higher fractions of functional genes for denitrification (narG, nirK, and nosZ) at the reference sites than the seep sites (Fig. 4a). Similarly, mxaF_xoxF gene encoding the alpha subunit of methanol dehydrogenase in aerobic methane oxidizing bacteria 53 , is also observed to show higher percentages in the reference sites than the seep sites (Fig. 4a). These results suggest that the reference sediments harbor more microbes capable of utilizing nitrate and oxygen as electron acceptors in redox reactions. In addition, the functional gene for anaerobic ammonium oxidation (hzo encodes the hydrazine dehydrogenase of the last step of anammox) was mostly detected in the two reference sites (Fig. 4b), suggesting that anammox may only be significant in the reference sites. In contrast, genes of sulfate reduction (aprA encoding adenosine-5-phosphosulfate (APR) reductase subunit alpha and sat encoding sulfate adenylyltransferase), carbon fixation (RuBisCo), and hydrogen utilization (NiFe-and FeFe-hydrogenase) were also detected in both groups of samples, without significant differences in the relative abundances between the two groups of samples (Fig. 4d).

MAG distribution and function in GOM sediments.
Following metagenome assembly and binning, a total of 49 MAGs (29 from the seep samples and 20 from the reference samples) were recovered from the metagenome datasets, excluding sample D24. Each MAG was classified by constructing a phylogenetic tree using concatenated 13 single-copy ribosomal proteins (Fig. 5). These MAGs represent 13 bacterial and 2 archaeal phyla commonly found in marine sediments. The bacterial MAGs were dominated by the phyla of Chloroflexi (Anaerolineae and Dehalococcoidia), Deltaproteobacteria (Desulfobacteraceae and Syntrophobacter), Aminicenantes, NC10, Planctomycetes, and Zixibacteria (Fig. 6a). Archaeal MAGs was dominated by the phyla of Euryarchaeota (ANME-1 and ANME-2) and Bathyarchaeota (Fig. 6a). However, their distributions (using the average coverage of each MAG in each sample as the proxy) are markedly different between the sites with and without seepage influences (Fig. 6b). MAGs recovered from one environment are generally absent in the other environment (Fig. 6), except for the MAGs of Bathyarchaeota and Deltaproteobacteria, which seemed to be cosmopolitan in both environments as suggested by the 16S rRNA gene analysis (Fig. 2b).
Consistent with the 16S rRNA gene analysis, MAGs recovered from the seep sites were dominated by Euryarchaeota (mainly ANME-1 archaea), Bathyarchaeota, Anaerolineae, Aminicenantes, and Zixibacteria (Fig. 6b). Some ANME archaeal MAGs from the seep sites, including GOM_Seep_010, GOM_Seep_023, and GOM_Seep_027, encode molybdenum-dependent nitrogenase (Nif) that could enable them to fix dinitrogen gas  www.nature.com/scientificreports www.nature.com/scientificreports/ to ammonia (Fig. S1). In contrast, MAGs from the reference sites without seepage influences were dominated by bacteria from the taxa of NC10, Scalindua, Woeseia, Deltaproteobacteria, and Gemmatimonas (Fig. 6b). We also found a novel bacterial MAG affiliated to the recently proposed phylum Muproteobacteria, which is absent in the seep sites (Fig. 5).

Discussion
We performed a metagenomic study focusing on subseafloor sediments from two contrasting habitats in the Gulf of Mexico to examine the microbial responses to seafloor hydrocarbon seepage. We used five samples from sites bearing natural hydrocarbon seepage and two from nearby reference sites without detectable seepage. Based on gene-and genome-centric analyses, this study reveals that hydrocarbon seepage can significantly alter the community structure. Although metagenome-assembled genomes are helpful to provide information of microbes in natural environments beyond their identities 16,17,54,55 , complete microbial genomes are rarely reconstructed from natural environments 56 . Importantly, reconstructed genomes often lack 16S rRNA 57 , which is a robust phylogenetic marker and is critical to constrain the taxonomy of novel genomes on the basis of existing extensive 16S rRNA gene sequences 23 . Also, the assembly-binning approaches typically only recover members of a community that have sufficient coverage, minimal population heterogeneity, few sequence repeats and distinct genome nucleotide composition 58 . As a result, current genome-centric approaches do not capture the full phylogenetic and functional diversity of a microbial community, particularly in complex environments. These deficiencies have led to the emergence of gene-centric analysis pipelines such as phyloFlash 23 and graftM 24 , which bypass the bottleneck of assembly of complex communities and can provide a comprehensive view of diversity of the 16S rRNA gene and other functional genes based on well-curated reference databases. Combining our gene and genomic analysis results, we offer detailed insights into the subsurface community.
Microbial community at site D24 showed mixed features of both seep and reference sites, although this sample was collected within Seep-1. Most of the analyzed microbial features such as the DNA yield, 16S rRNA gene and mcrA abundances assessed by qPCR (Fig. 2), ratio of 16S rRNA gene in the metagenome (Fig. 3a), abundance of ANME archaea (Fig. 3b), and functional gene ratios of hydrogenase and nitrate reductase (Fig. 4a), was lower than the seep sites but higher than the two reference sites. Considering that hydrocarbon seepages in GOM are ephemeral and can migrate over time 59 , it is very likely that the duration time of seepage at this site was shorter than the rest seep sites. Therefore, we excluded this site for the MAG comparison analysis.
High fractions of ribosomal RNA in the seep sites suggesting active growth. We detected significantly higher fractions of the 16S rRNA gene in the metagenome at the seep sites than the reference sites. Microbes with different growth rates tend to have different copies of ribosomal RNA operons (rrn), and microbes exhibiting higher reproductive rates were revealed to have higher rrn copy numbers 60 . Therefore, the copy number of genes encoding the ribosome has been suggested as a predictor of both growth rate and carbon use efficiency because of a proteome allocation trade-off 60 . The different overall microbial community structures, especially the markedly different proportions of archaea observed between the two types of sites, could explain the different fractions of 16S rRNA gene in the total communities. However, archaea are known to have fewer copies of rrn per genome than bacteria 61 and therefore are not likely to cause the higher fractions of 16S rRNA genes detected in the metagenomes of the seep sites. The faster proliferation rates of microbes stimulated by energy-rich substances emitting from the subsurface seepages is the most plausible explanation of the higher fractions of rrn in the metagenomes of the seep sites. Faster growth rates at the seep sites could also explain the higher microbial abundances detected by qPCR measurements (Fig. 1). Microorganisms in marine sediments beyond the bioturbation zone are always assumed to persistent in a maintenance state with turnover time of hundreds of years; 62,63 our data, however, reveal that subseafloor microbes can respond to and experience significant growth due to the impact of hydrocarbon seepages. Whether the subseafloor microbes are still undergoing growth or not is unclear. Experimental approaches such as H 2 18 O-labeling 64,65 are necessary to apply to marine sediments to confirm the proposed microbial in situ growth here and elsewhere 19 .
The overall community structure and functionality differences between the two types of sites. The overall microbial community structure was significantly changed by the presence of active hydrocarbon seepage (Table 2; Fig. 3). The notable dominance of ANME archaea detected at the seep sites (~50% of total) can be explained by the growth stimulated by methane in hydrocarbon seepage. It is worth noting that such archaeal dominances were not detected in the seep sites by qPCR of 16S rRNA genes, which is likely attributed to the known primer biases against archaea 66,67 . In the reference sites Planctomycetes (mainly anammox Scalindua), NC10, Acidobacteria, Alphaproteobacteria and Gammaproteobacteria were more frequently detected, suggesting that these groups are well adapted to the typical pelagic sediment environment found in the reference sites 68 . The relative abundances of Deltaproteobacteria, Latescibacteria, and Zixibacteria are similar between the two types of sediments, indicating these are part of the core microbiome in the sediments and are not affected by the hydrocarbon seepage.
We also assessed the potential functional differences between the two types of environments based on the functional gene contents in the metagenomic datasets. We detected notably high fractions of mcrA and nifH genes in the seep sediment metagenomes (Fig. 4c), both of which are largely derived from ANME archaea 69,70 . ANME archaea were >45% of the taxa detected in the seep sites while virtually absent in the reference sites (Fig. 3). ANME archaea at the seep sites could have exhibited significant growth under the stimulation of hydrocarbon seepage, supported by the 20-fold higher mcrA gene abundances relative to the reference sediments (Fig. 2c). The nitrogen fixation capacity of ANME archaea could provide ammonia as the nitrogen source for biomass synthesis during the microbial proliferation in the seep sites. The bloom of ANME archaea have resulted in higher methane oxidation rates previously noted in GOM cold seep sediments compared to reference sediments 71  www.nature.com/scientificreports www.nature.com/scientificreports/ in the relative abundance of archaea was also observed in the hydrothermal sediments in Guaymas Basin 17 . We also detected significantly higher fractions of mtrA gene in the metagenome datasets at the seep sites, the function of which was suggested as iron oxidation 72 , suggesting that iron oxidation could be more strongly affected at the seep sites than the reference ones.
We detected notably higher fractions of the functional genes for denitrification (narG, nirK and nosZ), anammox (hzo gene), and aerobic methane oxidation (mxaF/xoxF gene) in the reference sites, suggesting that these processes are more prevalent in typical pelagic sediments. This is consistent with the fact that the genomes of Woeseia and NC10, both capable of nitrate/nitrite reduction, and anammox Scalindua were mainly or exclusively recovered in the reference sites by metagenome assembly and binning (Fig. 6). Interestingly, in the reference sites we also detected higher abundances of the mxaF/xoxF genes, which are involved in aerobic methane oxidation 53 . This suggests that methane consumption in the reference sites is largely coupled to oxygen respiration, and that oxygen may be more steadily available than in the seep sites. This increase in oxygen availability in the sediments is likely due to lower overall microbial activities than those in the seep sediments under the stimulation of reduced substance fluxes.
Functional genes of carbon fixation (RuBisCo), sulfate reduction (sat gene), and hydrogen oxidation (NiFeand FeFe-hydrogenase) are not statistically different between the two types of sediments. This observation suggests that microorganisms responsible for these processes are core members of the microbial communities in these deep-sea sediments and are not affected by hydrocarbon seepages. This is consistent with the similar amounts of Deltaproteobacteria (mainly Desulfobacterales) detected in both environments by 16S rRNA gene-based analysis (Fig. 3b) since Desulfobacterales have been suggested to play an important role in H 2 consumption and sulfate reduction in marine sediments 73 . Metabolic dependency between different groups in the two contrasting environments. To better understand the biogeochemical impacts of hydrocarbon seepage on sedimentary microbial communities in the Gulf of Mexico, we examined the major ecological roles of the dominant genomes in the two contrasting types of sediments (Fig. 7). In the seep sites, Chloroflexi (mainly Anaerolineae) and Zixibacteria are the major organic matter degraders that are capable of degrading organic matter to low-molecular weight (LMW) carbon compounds such as acetate and ethanol 16,17,74 . These LMW compounds can be used as the electron donor by sulfate-reducing Desulfubacterales. In addition to using sulfate as the electron acceptor, Desulfubacterales may also form consortia with ANME archaea and accept electrons from ANME archaea when performing sulfate-dependent methane oxidation under the stimulation of the presumed upward flux of methane from the subsurface. In addition, the higher relative abundances of Bathyarchaeota in the seep sites (Fig. 6) may be due to their capability of aromatic compound degradation 75 and could contribute to the hydrocarbon degradation in GOM seep sediments. The co-enrichment of ANME and Bathyarchaeota has also been reported in the hydrothermal-influenced biofilm on the chimney wall of the Soria Moria vent field 76 , supporting that these two archaeal groups may have similar responses to environmental impacts.
The initial hydrolysis and fermentation of organic matter in the reference sediments is likely performed by the abundant taxa Chloroflexi and Zixibacteria (Fig. S2). The further degradation of organic matter could be carried out by putative denitrifying bacteria capable of using nitrate/nitrite as the electron acceptors, such as Woeseia 77 and NC10 bacteria 78 . Nitrite produced from nitrate reduction and ammonium produced from the degradation of organic matter could be consumed by microbes of the genus of Scalindua (Fig. 7), a specific taxon capable of anaerobic ammonium oxidation 79 . The dominance of these taxa in the recovered MAGs is consistent with higher fractions of denitrification and anammox functional genes detected in the reference sites, suggesting that nitrogen transformation for energy generation is prevalent in the typical pelagic sediments in GOM.
Our results run counter to conclusions from the previously studied seeps in the Gulf of Mexico, where deeper communities were not seen to be greatly impacted compared to the surface-expression of the seeps 11 . Perhaps the removal of the surface comparison allows a more even comparison between deep sediments. However, our study also runs counter to the data from Guaymas Basin sedimentary seeps, where communities in-seep and outside-of-seep were found to be functionally redundant 17 . Guaymas Basin sediments may be under different nutrient limitations compared to Gulf of Mexico sediments, so nutrient influences on microbial communities may not be as readily seen there. Our work shows that seepage influences on the microbial community structure should be complexed with additional measurements to fully understand what limits or supports community development and geochemical cycling.
Overall, the finding that nitrogen fixation is largely present at the seeps versus nitrogen loss (via denitrification and annamox) present at the reference sites (Fig. 4b), suggests that the carbon use efficiency varies drastically between the two settings. While carbon and nitrogen dynamics are not often focused in the seafloor, the study of soils has shown that C and N balances greatly impact microbial processing 80 and nitrogen fixation has been documented in carbon enriched sites 12 . This contrasting nitrogen metabolism pattern in GOM sediments -the prevalence of nitrogen fixation at seep sites and nitrogen loss at non-seep sites -was also supported by 15 N isotope compositions of previous studies 71 . Microbial selection toward nitrogen-fixing bacteria in marine sediments has also been reported in laboratory microcosm containing hydrocarbon (crude-oil) 81 . Considering GOM seep sediments are considered organic-rich (3-6% weight percent 71 ) and nitrogen-poor (C/N ratio in the range of 20-40 in sediments <10 cm 71 ), our results indicate that the newly fixed nitrogen by diazotrophs is a prerequisite of the rampant microbial growth in seep sediment since N limitation may limit microbial growth. Meanwhile, our results show that outside seep environments, the deep pelagic sediments (>1,400 m water depth) may use nitrogenous compounds as an energy source. This investigation of seep and reference sediments from the Gulf of Mexico deep biosphere shows that sedimentary microbes may be limited by different factors in adjacent settings due to the bottom-up environmental perturbation of natural hydrocarbon seepage.

Conclusion
Our study revealed the responses of subseafloor microbial communities to hydrocarbon seepage in deepwater GOM, by employing both gene-and genome-based analysis of metagenomes. Particularly, based on 16S rRNA gene analysis, community structure was revealed to be significantly altered, with higher fractions of 16S rRNA gene and the notable dominance of ANME archaea in the seep sites. Further, functional gene analysis suggested that anaerobic methane oxidation and nitrogen fixation, both presumed to be performed by ANME archaea, were intensified by hydrocarbon seepage, while nitrogen transformation such denitrification and anammox are dominant in the reference sediments. These metabolic features predicted in gene-based analysis were confirmed by genomic analysis, with high relative abundances of ANME archaea in the seep sites and the dominance of denitrifying Woeseia and anammox Scalindua in the reference sites. Collectively, this study revealed that natural seafloor hydrocarbon seepages in the deep ocean can alter the microbial community structure and change the dominant redox regime from nitrogen loss for energy production to sulfate-based methane oxidation and nitrogen fixation.

Data availability
Metagenomic sequencing data used in this study are available in NCBI under the BioProject number PRJNA553005.

Figure 7.
Major geochemical processes and the associated microbial genomes in the seep (upper) and reference (lower) sites of GOM. Only metabolic pathways of the dominant microbial genomes in the two contrasting sediment sites are shown. Arrows represent metabolic capabilities that were inferred in the MAGs reconstructed from GOM deep-sea sediments by genome annotation. Taxa enriched in the seep and reference sites are shown in orange and blue, respectively, while the common taxa in both sites are shown in black. OM: organic matter.