Deep sea sediments associated with cold seeps are a subsurface reservoir of viral diversity

In marine ecosystems, viruses exert control on the composition and metabolism of microbial communities, influencing overall biogeochemical cycling. Deep sea sediments associated with cold seeps are known to host taxonomically diverse microbial communities, but little is known about viruses infecting these microorganisms. Here, we probed metagenomes from seven geographically diverse cold seeps across global oceans to assess viral diversity, virus–host interaction, and virus-encoded auxiliary metabolic genes (AMGs). Gene-sharing network comparisons with viruses inhabiting other ecosystems reveal that cold seep sediments harbour considerable unexplored viral diversity. Most cold seep viruses display high degrees of endemism with seep fluid flux being one of the main drivers of viral community composition. In silico predictions linked 14.2% of the viruses to microbial host populations with many belonging to poorly understood candidate bacterial and archaeal phyla. Lysis was predicted to be a predominant viral lifestyle based on lineage-specific virus/host abundance ratios. Metabolic predictions of prokaryotic host genomes and viral AMGs suggest that viruses influence microbial hydrocarbon biodegradation at cold seeps, as well as other carbon, sulfur and nitrogen cycling via virus-induced mortality and/or metabolic augmentation. Overall, these findings reveal the global diversity and biogeography of cold seep viruses and indicate how viruses may manipulate seep microbial ecology and biogeochemistry.


Introduction
Marine cold seeps are typically found at the edges of continental shelves and feature mainly gaseous and liquid hydrocarbons from deep geologic sources [1,2]. Seep fluids can be produced through biological processes (methanogenesis in shallow sediments) or derive from thermogenic oil and gas systems that have been present for long periods of time in more deeply buried strata [3,4]. In the context of global climate change, methane and other short-chain alkanes escaping from deep sea cold seep sediments can reach the atmosphere, exacerbating the greenhouse effect [5]. Understanding biogeochemical cycling in marine sediments associated with cold seeps is thus important for meeting critical energy and climate challenges.
Cold seeps are chemosynthetic ecosystems and contain an extensive diversity of archaea and bacteria which play important roles in hydrocarbon metabolism [6,7]. These microbial populations are not only highly active in influencing seep biogeochemistry at the sediment-water interface [8], but also contribute to a variety of biological processes such as sulfate reduction, sulfur oxidation, denitrification, These authors contributed equally: Zexin Li, Xiyang Dong metal reduction and methanogenesis within the seabed [2,8]. Viruses have also been observed in cold seep sediments. Epifluorescence microscopy of sediments from the Gulf of Mexico revealed that viral-like particle counts and virus-to-prokaryote ratios at cold seeps were significantly higher than in surrounding sediments, suggesting that these habitats may be hot spots for viruses [9]. This agrees with the elevated microbial activity at cold seeps driven by the availability of energy-rich substrates supplied from below. In addition, novel viruses have also been discovered in methane seep sediments, including a novel sister clade to the Microvirus genus of Enterobacteria phage and a putative archaeal virus linked to an anaerobic methane-oxidizing (ANME) clade [10,11]. These findings suggest that cold seeps harbour abundant and undiscovered viruses potentially influencing their microbial hosts and consequently, biogeochemical cycling at cold seeps.
Knowledge of the ecological roles of viruses in deep sea sediments has been limited by difficulties in sampling and extracting viral particles (virions) from sediments [12]. In recent years, developments in sequencing and bioinformatics have enabled the analysis of viruses recovered from metagenomes sequenced without prior virion separation. These methods have greatly advanced viral ecology from the identification of novel viruses to the global distribution of viruses. Studies from a variety of environments such as thawing permafrost [13], mangroves [14], arctic lakes [15], freshwater lakes [16], deep sea sediments [17], the terrestrial subsurface [18,19], and especially seawater [20][21][22] have suggested that prokaryotic viruses act as key agents in natural ecosystems via a range of interactions with their microbial hosts. Viruses can influence organic carbon and nutrient turnover by top-down control of microbial abundance via lysis of cells and subsequent release of cellular contents during lytic infection [23]. They can also reprogramme host metabolism through horizontal gene transfer or via auxiliary metabolic genes (AMGs) in their genomes that are expressed during infection. In peatland soils along a permafrost thaw gradient in Sweden, virus-encoded glycoside hydrolases were found to play a role in complex carbon degradation [13]. In freshwater lakes fed with sedimentderived methane, some viruses were found to encode subunits of particulate methane monooxygenase, suggesting that they may augment bacterial aerobic methane oxidation during infection [24]. Many studies have also revealed the presence and abundance of viruses in deep sea sediments [12,17,[25][26][27][28][29][30], and thus deep sea sediments associated with cold seeps present a unique opportunity to study viruses and their interactions with hosts in a chemosynthetic ecosystem often dominated by anaerobic methane oxidation. Several metagenomic sequencing efforts have been undertaken on cold seep sediments [2], yet most studies have focused exclusively on genomes of bacteria and archaea, neglecting the viruses which are also present.
In this study, we sought to expand the understanding of viral diversity and the ecological role of viruses in deep sea sediments associated with cold seeps. To this end, 28 publicly available marine sediment metagenomes from seven cold seeps around the world were analyzed to recover genomes of viruses in cold seep communities. Characterizing these viral communities enabled host predictions and identification of AMGs. Our findings reveal the global diversity and biogeography of seep viruses and their role in benthic microbial ecology and biogeochemistry.

Methods
Collection of metagenomic data sets for deep sea cold seeps Metagenomic data sets were compiled from 28 sediment samples collected from seven cold seep sites across the global oceans (Fig. 1

Taxonomic profiling of microbial communities
To explore the prokaryotic composition of each sample, 16S rRNA gene fragments (i.e., miTags) were extracted from metagenomic raw reads using the phyloFlash pipeline [33]. Extracted 16S miTags were mapped to the SILVA SSU rRNA reference database (v132) [34] and assigned with an approximate taxonomic affiliation (nearest taxonomic unit, NTU).

Metagenomic assembly
Raw reads were quality-controlled by trimming primers and adaptors and filtering out artifacts and low-quality reads using the Read_QC module within the metaWRAP pipeline [35]. Quality-controlled reads from each metagenome were individually assembled using MEGAHIT v1.1.3 [36] (default parameters). Short contigs (<1000 bp) were removed.

Comparisons to viral sequences from other environments and data sets
To place the 2,885 vOTUs in broader context, they were compared to viral contigs in public databases: (i) GOV 2.0 seawater [21] (n = 195,728); (ii) wetland sediment [48] (n = 1,212); (iii) Stordalen thawing permafrost [13] (n = 1,896). For each viral contig, open reading frames (ORFs) were called using Prodigal v2.6.3 [49], and the predicted protein sequences were used as input for vConTACT2 [50]. We followed the protocol published in protocols.io (https://www.protocols.io/view/applying-vcontact-to-viralsequences-and-visualizi-x5xfq7n) for the application of vConTACT2 and visualization of the gene-sharing network in Cytoscape v3.7.2 [51] (edge-weighted spring-embedded model). Viral RefSeq (v85) was selected as the reference database, and Diamond was used for the protein-protein similarity method. Other parameters were set as default.
In order to determine overlaps between cold seep vOTU representatives and viral populations in the IMG/VR v3 dataset [52], we used rapid genome clustering to identify our vOTUs that share 95% identity with IMG/VR v3 populations [53] based on the scripts provided in CheckV [47]. The supporting code can be found at https://bitbucket. org/berkeleylab/checkv/src/master/.
Abundance profiles RPKM (Reads per kilobase per million mapped reads) values were used to represent relative abundances of viruses and microorganisms. To calculate the RPKM values of each viral contig or MAG, quality-controlled reads from each sample were mapped to a viral contig database or to contigs compiled from the 592 MAGs with BamM v1.7.3 'make' (https://github.com/Ecogenomics/BamM).

Virus-host prediction
Four different in silico methods [13,48,56] were used to predict virus-host interactions. (1) Nucleotide sequence homology. Sequences of vOTUs and prokaryotic MAGs were compared using BLASTn. Match criteria were ≥75% coverage over the length of the viral contig, ≥70% minimum nucleotide identity, ≥50 bit score, and ≤0.001 e-value. (2) Oligonucleotide frequency (ONF). VirHostMatcher v1.0 [57] was run with default parameters, with d 2 * values ≤0.2 being considered as a match. (3) Transfer RNA (tRNA) match. Identification of tRNAs from prokaryotic MAGs and vOTUs was performed with ARAGORN v1.265 using the '-t' option [58]. Match requirements were ≥90% length identity in ≥90% of the sequences by BLASTn [22]. (4) CRISPR spacer match. CRISPR arrays were assembled from quality-controlled reads using crass v1.0.1 with default parameters [59]. CRISPR spacers were then matched against viral contigs with ≤1 mismatch over the complete length of the spacer using BLASTn. For each matching CRISPR spacer, the repeat from the same assembled CRISPR array was compared against the prokaryotic MAGs using BLASTn with the same parameters, creating a virus-host link. Among potential linkages, cas genes of putative microbial hosts were inspected further using MetaErg v1.2.2 [60]. Only hits with adjacent cas genes were regarded as highly confident signals. Whenever multiple hosts for a vOTU were predicted, the virus-host linkage supported by multiple approaches was chosen. Otherwise, to identify a single predicted host for each viral population, hosts were predicted using previously-reported [13] ranked criteria: (1) CRISPR spacer match with adjacent cas gene; (2) CRISPR spacer match without adjacent cas gene; (3) tRNA match or nucleotide sequence homology; (4) ONF comparison. The host predicted by the highest ranking criterion was chosen.

Functional annotations of MAGs
Each MAG was first annotated using MetaErg v1.2.2 [60]. Predicted amino acid sequences were then used as queries for identification of key metabolic markers via META-BOLIC v2.0 [61]. For phylogenetic analysis of McrA and DsrA, amino acid sequences were aligned using the MUSCLE algorithm [62] included in MEGA X [63]. All positions with less than 95% site coverage were eliminated. The maximum-likelihood phylogenetic tree was constructed in MEGA X using the JTT matrix-based model. The tree was bootstrapped with 50 replicates and midpoint-rooted.

Identification of auxiliary metabolic genes
Prior to annotation, CheckV [47] was used to detect provirus boundaries and remove contamination from hostderived sequences. Annotations were performed mainly based on DRAM-v, the viral mode of DRAM [64]. DRAMv requires output produced by VirSorter (e.g. VirSorter2) [65]. We first ran all the identified 2,885 vOTUs through VirSorter2 (--prep-for-dramv) to produce the affi-contigs. tab file and then annotated with the default databases. This process excluded 389 vOTUs owning to low viral scores based on VirSorter2, with 2,496 vOTUs being retained for AMG analysis. Putative AMGs with both auxiliary scores <4 and gene descriptions were selected according to the distillation output of DRAM-v annotations using default parameters [64,66]. Following the method reported by Horst et al. [67], and to be conservative, we performed manual curation of the annotation output to improve the confidence in AMG identification, mainly by removing categories of genes related to nucleotide metabolism, organic nitrogen, glycosyl transferases and ribosomal proteins. Conserved domains of selected AMGs were then identified using NCBI CD-search tool [68]. Protein structural homology searches were performed using the Phyre2 web portal [69]. Genome maps for each contig encoding AMGs were visualized based on DRAM-v and VirSorter2 annotations. For comparison, VIBRANT [46] annotations were also performed on viral contigs based on KEGG, Pfam and VOG using default parameters. KEGG annotations categorized into "metabolic pathways" and "sulfur relay system" were reported as potential AMGs [46]. No curations were performed on the VIBRANT output.

Statistical analyses
All statistical analyses were performed in R version 3.6.3. Alpha and beta diversity of viral communities were calculated using vegan package v2.5-6 [70]. Shapiro-Wilk and Bartlett's tests were employed to test data normality and homoscedasticity prior to other statistical analysis. Nonmetric multidimensional scaling (NMDS) was used to reduce dimensionality using the function capscale, based on Bray-Curtis dissimilarities generated from OTU tables with viral abundances (RPKM) using the vegdist function. The groupings of cold seep sites into different types [2] (mineral-prone vs mud-prone) and different sample depths (shallow vs deep) were individually verified using Analysis of Similarity (ANOSIM). For comparison of different cold seep sites, Shannon index was tested using Analysis of Variance (ANOVA) while Simpson and Chao1 indices were tested using a Kruskal-Wallis nonparametric test. For comparison of different cold seep systems and sample depths, Shannon index was tested using Student's T test while Simpson and Chao1 indices were tested using Wilcoxon signed-rank test. Pearson correlations were calculated using the cor function.

Results and discussion
To investigate the diversity and ecological function of viruses inhabiting cold seep sediments, a 0.38 Tbp compilation of metagenomic data was recruited from public databases and analyzed (Supplementary Table 1). Metagenomes were sequenced from 28 sediment samples obtained at seven seabed cold seeps across the global oceans, encompassing gas hydrates, oil and gas seeps, mud volcanoes and asphalt volcanoes (Fig. 1).

Viruses from cold seep sediments are diverse and novel
From the 28 bulk shotgun metagenomes, 39,154 putative viral sequences were obtained, manually filtered and clustered at 95% ANI to represent approximately species-level taxonomy [21,53]. This gave rise to 2,885 non-redundant cold seep vOTUs, each represented by contigs ≥10 kb in size, including four that were ≥200 kb (Supplementary Table 4) possibly corresponding to huge viruses [73]. Completeness of metagenome-assembled viral genomes or genome fragments was estimated using CheckV [47], giving rise to four different quality tiers: complete genomes (10% of the vOTUs), high-quality (4%), medium-quality (11%), and low-quality (58%), with the remainder being undetermined (Supplementary Table 4 and Supplementary  Fig. 5).
Though some cold seep vOTUs were very abundant in multiple sediment samples, a large majority (84%) of vOTUs were only present within a single cold seep site (Supplementary Table 5). Further analysis of viral distribution across the seven cold seep sites (ANOSIM, R = 0.80, p = 0.001; Fig. 2a) also shows that cold seep viruses display a high degree of endemism, similar to what was found previously in methane seep prokaryotic communities [74]. Viral Shannon diversity, Simpson diversity, and Chao1 richness were all observed to be significantly different (p < 0.05) between the seven sites (Supplementary Table 6). To arrange samples into environmentally meaningful groups, the seven cold seeps were designated as mineral-prone or mud-prone systems according to their fluid flow regime [2]. Mineral-prone systems have longer geologic history with slower emission of fluids, e.g., gas hydrates (ENP, SMM and SB) and oil and gas seeps (EGM), whereas younger mud-prone systems are high-flux, such as mud volcanoes (HM and MS), asphalt volcanoes (WGM), brine pools and brine basins. NMDS analysis revealed clear dissimilarity in viral communities between mineral-prone and mud-prone systems (ANOSIM, R = 0.56, p = 0.001; Fig. 2a). For the most part, viral communities from mineral-prone systems clustered together, except that SB_0 (i.e. surface sediment from 0.0 mbsf) deviated from other Scotian Basin viral communities as well as those from other mineral-prone seeps. Other factors also contribute to the structuring of the viral community, possibly including sediment depth (ANOSIM, R = 0.20, p = 0.01; Supplementary Fig. 6a). Shannon diversity, Simpson diversity, and Chao1 richness of viral communities were significantly higher in mineral-prone seep systems than in mud-prone (Fig. 2b), whereas no clear differences were observed between shallow (0-0.2 mbsf) and deep (>0.2 mbsf) groups (Supplementary Table 6 and Supplementary  Fig. 6b). Overall, these results suggest that fluid flux explains a significant portion of the variance in viral diversity in cold seep sediments.
To investigate the relationship between cold seep vOTUs and publicly available virus sequences from a broader diversity of ecosystems, a gene-sharing network was constructed using vConTACT2 [50]. Such a weighted network can assign sequences into viral clusters (VCs) at approximately the genus level. Cold seep sediments, seawater, wetland, and permafrost vOTUs were grouped into 3,082 VCs ( Fig. 3a and Supplementary Table 7). Only 17 VCs were shared amongst all ecosystems (Fig. 3b). The limited extent of clustering between viral genomes sampled from the various ecosystems may reflect a high degree of habitat Fig. 2 Comparison of viral community diversity between mineral-prone and mud-prone cold seeps. a NMDS analysis of a Bray-Curtis dissimilarity matrix calculated from RPKM values of vOTUs. ANOSIM was applied to test the difference in viral communities between different sites and different systems (mineral-prone vs mudprone). b Shannon, Simpson and Chao1 indices of the viral community diversity from mineral-prone and mud-prone cold seeps. Asterisks denote significance, with * indicating p < 0.05, and ** indicating p < 0.01. specificity for viruses. Among cold seep sediment viruses, 1,742 out of 2,885 vOTUs were clustered into 804 VCs, with the majority (78.7%) being not encountered in any other ecosystem. This suggests that most cold seep viruses may be endemic to cold seeps ( Fig. 3b and Table 8). Very few cold seep viral vOTUs (~0.7%) clustered with taxonomically known genomes from Viral RefSeq (Fig. 3a and Supplementary Table 4), and the proportion is much lower in comparison with recent estimates in soil viruses using a similar approach [75]. Similarly, attempted taxonomic assignment of cold seep vOTUs using whole genome comparisons against 2,616 known bacterial and archaeal viruses from NCBI RefSeq (version 94) left >96% unclassified. The remainder were assigned to the Caudovirales order, specifically Podoviradae (n = 35), Myoviradae (n = 34) and Siphoviradae (n = 27) (Fig. 3c and Supplementary Table 4). These analyses show that cold seep sediments harbour considerable unexplored viral diversity. In addition, using rapid genome clustering (95% identity), only 62 vOTUs from our study were found to cluster with IMG/VR v3 populations (Supplementary Table 9), further suggesting that most cold seep vOTUs are endemic and unique compared to viruses in other ecosystems. The clustered vOTUs in the IMG/VR v3 database include 839 viral genomes, which are primarily derived from marine (54%) and human-associated (11%) samples.

Virus-host linkages and host-linked viral abundance and viral lifestyles
Comparing sequence similarity, oligonucleotide frequencies, tRNA sequences and CRISPR spacers [76], putative hosts were predicted for 14.2% of the 2,885 cold seep vOTUs (Supplementary Table 10). Consistent with previous observations [76,77], most of these vOTUs were predicted to have narrow host ranges, with only 54 vOTUs potentially exhibiting a broader host range across several phyla. Twenty-six vOTUs were linked to both bacterial and archaeal hosts, suggesting existence of viral infection across domains (Supplementary Table 10). A single predicted host was retained for each viral population (see "Methods"). Predicted prokaryotic hosts spanned 9 archaeal and 23 bacterial phyla, with Thorarchaeota (19% of virus-host pairs) and Chloroflexi (14%) being the most frequently predicted (Fig. 4). A considerable proportion (40%) of cold seep vOTUs were linked to archaea, including members of Bathyarchaeota, the Asgard group, Methanomicrobia, Thaumarchaeota and Thermoplasmata. Such broad ranges for archaeal viruses have not been reported previously in natural systems [13,76]. As compared to the IMG/VR v3 database [52], several novel viralbacterial linkages were also identified, including putative viruses for Bipolaricaulota, Coatesbacteria and Sumerlaeota. Based on the presence of functional marker genes within MAGs, predicted hosts include two aerobic methanotrophic Methylococcales (two vOTUs, Supplementary Tables 11), 13 anaerobic methane-oxidizing archaea (e.g. ANME-1 and ANME-2, Supplementary  Fig. 7a), one non-methane multi-carbon alkane oxidizer within Methanosarcinales (Supplementary Fig. 7a), 16 sulfate reducers mostly belonging to Deltaproteobacteria (52 vOTUs, Supplementary Fig. 7b), and numerous respiring and fermentative heterotrophs (Supplementary Table 11). In total, we found 22 vOTUs that might infect archaea (Class Methanomicrobia) capable of anaerobic gaseous alkane oxidation based on methyl/alkylcoenzyme M reductase (Supplementary Table 10). Along with the similar discovery of viruses that infect ANME clades in other methane seep sediments, e.g., the Nyegga methane seep and Coal Oil Point hydrocarbon seep [11], potential viruses infecting anaerobic gaseous alkane oxidizers are possibly widespread in cold seeps, in agreement with the dominance of these archaea [2]. The genome of the sulfate reducer Desulfobacterales 8_GM_sbin_oily_21 also harboured genes possibly encoding akyl-/arylalkylsuccinate synthases related to anaerobic degradation of longer alkanes and aromatic hydrocarbons. These results suggest that populations mediating hydrocarbon and sulfur cycling may be vulnerable to lysis by viral populations in cold seeps, where sulfate reduction is coupled to the anaerobic oxidation of methane and other seeping hydrocarbons. Predicted hosts were also identified within the candidate phyla radiation (six vOTUs were predicted to infect Patescibacteria) and DPANN archaea (13 vOTUs were predicted to infect Pacearchaeota or Aenigmarchaeota). Due to the limited metabolic capabilities and small cell sizes, many CPR and DPANN organisms are likely to be obligate symbionts of other bacteria and archaea [78]. The impact of viral infection on obligate symbionts and any consequences for the larger organisms hosting those symbionts are not yet known, although it has been suggested that they may protect those hosts from viral predation [78].
Based on abundances determined by read mapping, targeted hosts were predicted for >20% of the cold seep viral community (Fig. 5a). When grouped at the phylum level (class level for Proteobacteria and Euryarchaeota), the composition of predicted microbial hosts agreed well with that of their viruses (Fig. 5b). This is supported by regression modelling of the abundances of hosts and lineagespecific viruses (Fig. 5c). By applying metagenomic read recruitment, most viruses had higher genome coverage compared to their hosts, suggesting that most taxa may be undergoing active viral replication and possibly lysis at the time of sample collection [79]. Lineage-specific virus/host abundance ratios (i.e. VHR) for most taxa were greater than one with Thorarchaeota being the highest at 10 2.5 (Fig. 5d), indicating a high level of active viral genome replication. This is in accordance with the presence of higher abundances of viral particles detected by epifluorescence microscopy in cold seep sediments compared to non-cold seep sediments in the Gulf of Mexico [9]. Thus, viral lysis may be a major top-down factor [80], contributing to significant microbial mortality in cold seep sediments. In addition, based on the presence of integrase genes and/or being located within their host genomes, at least 372 cold seep vOTUs were predicted to be lysogenic (i.e., temperate viruses, Supplementary Fig. 8 and Supplementary Table 4).

Viral AMGs involved in carbon, sulfur and nitrogen transformations
To further understand how viruses might affect the biogeochemistry of cold seep sediments, viral contigs encoding AMGs that supplement host metabolism during infection were examined. Overall, cold seep viruses tend to encode AMGs for cofactor/vitamin and carbohydrate metabolism based on VIBRANT [46] annotations (Supplementary  Table 12). A significant portion also encoded AMGs for amino acid and glycan metabolism. Based on DRAM-v annotations [64] and manual curation [67], we identified 32 AMG genes from 29 vOTUs encoding carbohydrate-active enzymes involved in the initial breakdown of complex carbohydrates (Supplementary Table 13). Six of these genes were affiliated with glycoside hydrolases predicted to catalyze hydrolysis of complex sugars based on modelling of protein three-dimensional structures ( Fig. 6 and  Table 13). These six bacteria-or archaealike viral glycoside hydrolases spanned four families, including GH10, GH136, GH33 and GH74. They were found in five vOTUs from four different VCs with hosts being unidentified (Supplementary Tables 7, 10 and 13). In addition, AMGs related to ABC transporters for carbohydrate, central carbon (e.g., UDP-glucose 4-epimerase) and C1 metabolism (e.g., transketolase) were also identified ( Fig. 6 and Supplementary Table 13). These genes have been increasingly recovered as putative AMGs in viruses, which may contribute to nucleotide and energy production during infection [75,81,82]. Deep sea sediment microorganisms associated with cold seeps were reported to be involved in a variety of carbohydrate degradation reactions for processing detrital organic matter supplied from the overlying water column [31,32]. Infection by viruses containing genes related to carbohydrate degradation in general is consistent with their supportive role in host metabolism during the infection cycle [13].
Based on VIRBANT annotations (Supplementary  Table 12), the most common AMG related to sulfur metabolism within the viral contigs was phosphoadenosine phosphosulfate reductase (PAPS reductase or CysH), which is predicted to participate in assimilatory sulfate reduction. Viral cysH has also been found in viral sequences obtained from oxygen-deficient water columns [83], rumen [84], a deep freshwater lake [16] and sulfidic mine tailings [85]. Here, we manually confirmed that three vOTUs from two distinct viral genera encoded PAPS reductase genes ( Fig. 6a and Supplementary Table 13). The three representative PAPS reductase genes contain the conserved domain and structural configuration of PAPS reductase (Fig. 6b) that assimilates sulfates for biosynthesis of methionine and cysteine [83]. One of these three vOTUs was identified to potentially infect the ANME-1 clade. Cold seep sediments are in general considered to feature an abundance of organic matter but are nitrogen-limited for supporting biological biomass production [2,86]. We suspect that viral-encoded PAPS reductase genes can enhance biosynthesis of amino acids during the proliferation of typical cold seep microorganisms like ANME archaea in seep sediments, supplementing the nitrogen fixation capacity of cold seep microbial populations [2,87]. In addition, ABC transporters related to amino acids were also identified (Supplementary  Table 13), highlighting another potential route for organic nitrogen metabolism in these communities.

Conclusions
Due to the challenges of deep sea sediment sampling and laboratory cultivation of microbial communities along with their viruses, the roles that viruses play in influencing microbial mortality, ecology, and evolution remain largely unexplored in marine sediments associated with cold seeps [17,88]. In this study, in-depth exploration of untargeted de novo metagenomic data successfully revealed novel, abundant, and diverse bacterial and archaeal viruses. Many of the putative microbial hosts for seep viruses belong to taxonomic groups with no cultured representatives. These results therefore expand the diversity of archaeal viruses, especially those infecting important archaeal lineages in hydrocarbon seep microbiomes, e.g., members of the Euryarchaeota, Bathyarchaeota, and the Asgard group. While a significant portion of the viruses appear to be lysogenic, the high read coverages for many viral genomes suggest that viral lysis is a major source of microbial mortality and biomass turnover in cold seep sediments. Virus-encoded AMGs, including genes related to carbon, sulfur, and nitrogen metabolism, may augment the metabolism of prokaryotic hosts during infection, potentially altering biogeochemical processes mediated by cold seep microorganisms. As subsurface reservoirs of prokaryotic diversity and hot spots of microbial activity, cold seeps additionally represent oases of viruses and viral activity. Much remains to be revealed about the contribution of viruses to the functioning of cold seeps and other marine environments, especially with respect to their potential role in horizontal gene transfer which was not addressed in this study. With only a fraction of vOTUs identified here able to be classified and many of them predicted to infect poorly characterized taxa, there remain large gaps in the understanding of the microbiology of these environments.

Data availability
Sequences of 2,885 viral contigs and 592 de-replicated metagenome-assembled genomes can be found at figshare (https://doi.org/10.6084/m9.figshare.12922229). All other data are available from the corresponding author upon request. for help with host assignment. The study would not have been possible without publicly available genomic data from already published studies, and the authors wish to acknowledge all the participants, organizations, and funding agencies that contributed to the collection and analysis of all the data utilized in our study, including those  Table 1.
Author contributions XD designed this study. XD, ZL, CZ, and WP analyzed metagenomic data. XD, ZL, DP, and GW interpreted data. ZL, YP, and LZ performed viral diversity analyses. CRJH contributed part of the data. XD, ZL, DP, and CRJH wrote the paper, with input from other authors.

Compliance with ethical standards
Conflict of interest The authors declare no competing interests.
Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons. org/licenses/by/4.0/.