Viruses in deep-sea cold seep sediments harbor diverse survival mechanisms and remain genetically conserved within species

Deep sea cold seep sediments have been discovered to harbor novel, abundant, and diverse bacterial and archaeal viruses. However, little is known about viral genetic features and evolutionary patterns in these environments. Here, we examined the evolutionary ecology of viruses across active and extinct seep stages in the area of Haima cold seeps in the South China Sea. A total of 338 viral operational taxonomic units are identified and linked to 36 bacterial and archaeal phyla. The dynamics of host-virus interactions are informed by diverse antiviral defense systems across 43 families found in 487 microbial genomes. Cold seep viruses are predicted to harbor diverse adaptive strategies to persist in this environment, including counter-defense systems, auxiliary metabolic genes, reverse transcriptases, and alternative genetic code assignments. Extremely low nucleotide diversity is observed in cold seep viral populations, being influenced by factors including microbial host, sediment depth, and cold seep stage. Most cold seep viral genes are under strong purifying selection with trajectories that differ depending on whether cold seeps are active or extinct. This work sheds light on the understanding of environmental adaptation mechanisms and evolutionary patterns of viruses in the sub-seafloor biosphere.


INTRODUCTION
Cold seeps are deep-sea environments where hydrocarbon fluids and gas seepage occur at the continental margins worldwide. The continuous seepage of gaseous and liquid hydrocarbons boosts local biodiversity and microbial activity, featuring prevalent archaeal anaerobic methanotrophs (ANME) and sulfate-reducing bacteria (SRB) [1,2]. Compared to the rich knowledge of cold seep bacterial and archaeal communities, viruses remain largely underexplored in spite of their significant roles in impacting microbes and corresponding biogeochemical cycles [3,4]. Virus studies using enumeration or cultivation have shown that cold seep sediments are hotspots of viral production with high virus-prokaryote ratios [5,6]. A recent survey of metagenomes from seven cold seeps demonstrates that these sediments harbor diverse and novel viruses, hinting at their potential impact on hydrocarbon biodegradation and other local metabolisms catalyzed by cold seep microbiomes [7]. However, cold seep viral diversity and distribution patterns, virus-microbe interactions, adaptive mechanisms to environmental factors, and viral genetic diversity are still relatively unexplored.
Viruses have a genetic toolbox of diverse mechanisms to adapt to the environment and co-evolve with hosts. As foreign mobile genetic elements, viruses face a wide repertoire of antiviral defense systems, including restriction-modification (RM) and CRISPR-Cas [8]. In line with antagonistic co-evolution of viruses and their hosts [9,10], viruses have developed efficient and robust counter-defense systems, such as anti-restriction, anti-CRISPR, and other counter-defense proteins [11,12]. Diversity-generating retroelements (DGRs) containing reverse transcriptase (RT) are another important diversification mechanism for driving sustained amino acid-level diversification of their target domains [13,14]. Viruses also encode DGRs to produce many mutations in specific regions of host target genes through error-prone reverse transcription [15-17]. To replicate more efficiently, viruses can alter their hosts' metabolic potential through the expression of auxiliary metabolic genes (AMGs) to modulate host cell metabolism during infection [18]. In addition to these gene inventories, viruses can use alternative genetic codes different from those of their host, potentially increasing viral adaptability (e.g., in regulating translation of lytic genes) [19,20]. Whether or not cold seep viruses incorporate these strategies into their repertoire of mechanisms for mediating host-virus interactions and environmental adaptation in these harsh deep-sea subseafloor environments requires further investigation.
Intra-population genetic variations (microdiversity) can also improve virus adaptation to their environment by driving phenotypic variation [21,22]. For example, depth-dependent evolutionary strategies of viruses were observed in the Mediterranean Sea [9] and grassland soil in northern California [10]. Large viral microdiversity was observed for perhaps the most abundant ocean virus in temperate and tropical waters infecting Pelagibacter [23], whereas viruses were under significantly low evolutionary pressures in stable subzero Arctic brines [24]. The principles governing viral evolution likely differ depending on environmental conditions, such as host dynamics, physicochemical properties, and population sizes [25-27]. Examining 39 abundant microbial species identified in sediment layers below the sea floor and across six cold seep sites, we previously reported that their evolutionary trajectories were depth-dependent and differed across phylogenetic clades [1]. However, it remains unclear whether cold seep viruses are undergoing similar evolutionary patterns and selection pressures.
To understand adaptive survival mechanisms and genetic microdiversity of cold seep viruses, we extracted viral genomes from 16 sediment core samples in the area of Haima cold seeps in the South China Sea (Supplementary Figure 1 and Supplementary Table 1). Cores were collected from two active seeps with dense and living bivalves, as well as from one extinct seep covered with many dead clams [28]. We explored viral diversity patterns at both the community level (macrodiversity) and population level (microdiversity), and the viral functional gene repertoire related to the arms race between viruses and their prokaryotic hosts. This study expands the knowledge of ecological and evolutionary patterns of viruses inhabiting cold seep subsurface ecosystems.

RESULTS AND DISCUSSION
Diverse antiviral strategies in cold seep microbial genomes
In total, 16 metagenomic data sets were derived from depth-discrete sediment core samples obtained from two active (n = 5 for Active1; n = 6 for Active2) and one extinct (n = 5) cold seeps (Supplementary Figure 1 and Supplementary Table 1), at depths ranging from 0 to 20 cm below the sea floor (cmbsf) [28]. Bacterial and archaeal community structures varied between different depth layers at the three sites (Supplementary Fig. 2 and Supplementary Table 2). Active seep sediments were dominated by taxa affiliated with Halobacteriota and Desulfobacterota, whereas the members of Desulfobacterota and Chloroflexota were the major microbial lineages in extinct seep sediments. After assembly, 487 species-level metagenome-assembled genomes (MAGs) were reconstructed at an average nucleotide identity (ANI) threshold of 95% (Supplementary Figure 3 and Supplementary Table 3), spanning 53 bacterial and 10 archaeal phyla, with the majority affiliated with Proteobacteria (n = 59), Desulfobacterota (n = 56), Chloroflexota (n = 49), Bacteroidota (n = 38), and Thermoplasmatota (n = 24).
Bacteria and archaea possess diverse antiviral strategies to defend against infection by their viruses [29-31]. A total of 2,145 antiviral genes were detected in 63% of cold seep microbial genomes, and could be assigned to 43 families of antiviral systems [8,32]; these include restriction-modification (RM) systems that target specific sequences on the invading DNA elements, and CRISPR-Cas systems that use RNA-guided nucleases to cleave foreign sequences [33] (Fig. 1a and Supplementary Table 4). On average, the cold seep microbial genomes encode two antiviral systems per genome and the number of antiviral systems is positively correlated with the genome size for each MAG (linear regression; R² = 0.27, p = 4.73 × 10⁻⁵; Fig. 1b), similar to previous observations on the importance of genome size for encoding accessory systems in prokaryotes or ocean microbiomes [8,34]. The number of antiviral systems per genome varies from zero (179 genomes) to 32 in a genome belonging to the phylum Fermentibacterota (classified as JAFGKV01 at the family level; Supplementary Table 4), followed by 30 in a Gammaproteobacteria genome and 27 in a Bacteroidia genome. On average, the bacterial genomes encode more antiviral systems per genome than the archaeal genomes (3.9 vs 2.4). The most abundant species in the metagenomic dataset (18% of the microbial community) is the putative anaerobic methanotroph ANME-1 SY_S15_40, which encodes two RM type II and one RM type IIG systems (Supplementary Tables 3 and 4). Based on surveying large datasets of sequenced genomes, RM and CRISPR-Cas systems were reported to be present in ~75% and ~40% of microbial genomes, respectively [29,35]. Relatively fewer cold seep microbial genomes appear to encode RM (50.8%) and CRISPR-Cas systems (22.7%), yet they feature higher frequencies of AbiEii (44%; one antiviral system of abortive infection [36]) and SoFIC (38%), which can modulate various target protein activities [32] (Fig. 1c and Supplementary Table 4). Diverse antiviral systems were also found in microbial communities from Mediterranean sponge species [37], epipelagic and mesopelagic layers in the Pacific Ocean [38], and a deep-sea hydrothermal microbial mat in the Guaymas Basin [39]. In general, the distribution patterns of antiviral systems in those communities differ from those in cold seep sediments. Overall, these data reveal diverse antiviral strategies throughout the Haima cold seep microbiome with specific enrichment in some antiviral systems that govern the dynamics of host-virus interactions.
Novel viral genomes linked to 36 microbial phyla
Cold seep samples contained highly abundant viruses with densities up to 7.6 × 10¹¹ per gram of sediment, with viral abundances being associated with sediment depth (Supplementary Table 5). From the 16 metagenomic data sets, 488 single-contig viral genomes with ≥50% estimated completeness (based on CheckV [40]) were recovered using multiple virus identification tools (Fig. 2a and Supplementary Figure 4). Viral genomes were clustered into 338 species-level viral operational taxonomic units (vOTUs) [41], belonging to 83 viral clusters (VCs; roughly equivalent to an ICTV genus) utilizing whole-genome gene-sharing profiles [42] (Supplementary Fig. 5 and Supplementary Table 6). Similar to observations in prokaryotic communities [1,2,43], alpha and beta diversity analyses of 338 vOTUs suggest that sampling site, sediment depth in relation to redox conditions [28], and the geological state of cold seeps (active or extinct) shape the structure of viral communities (Supplementary Fig. 6 and Supplementary Table 5).
Among the 338 vOTUs, 291 could be taxonomically assigned, revealing that 288 are affiliated with the class Caudoviricetes (Fig. 2a and Supplementary Table 6), which encompasses tailed phages that are the most prevalent viral taxon across ecosystems [44]. Only ten vOTUs could be annotated at the order level, confirming a large knowledge gap in the taxonomy of deep-sea cold seep viruses [7]. With respect to viral lifestyles, 48 and 22 vOTUs were predicted to be lytic and lysogenic, respectively, with others being unclassified (Fig. 2a). Host predictions of these vOTUs revealed that virus-infected hosts were detected in 36 bacterial and archaeal phyla (Fig. 2b, c and Supplementary Table 7). From the 475 host-virus linkages, the most common phylum among predicted hosts was Chloroflexota (n = 80), followed by Halobacteriota (n = 31), Asgardarchaeota (n = 30), and Desulfobacterota (n = 29). This is consistent with our previous observation that a significant portion of viruses targeted archaea in cold seep sediments, and such a host-virus pattern has not been reported in other deep-sea ecosystems [7,45,46]. Ten viruses were predicted to infect ANME-1 and ANME-2 groups that perform anaerobic methane oxidation. Viruses infecting Methanosarcinales and Gammaproteobacteria were highly abundant in the extinct and active cold seep samples, respectively.

Cold seep viruses harbor diverse strategies for environmental adaptation
To protect against antiviral systems of their microbial hosts, cold seep viruses encode an extensive repertoire of counter-defense systems, including anti-CRISPR (Acr) proteins, methyltransferases, and antitoxins (Fig. 3a-c and Supplementary Table 8). Diverse DNA modification enzymes were detected in 55 viral genomes (16% of all viruses), including adenine- and cytosine-specific methyltransferases and adenine methylase [34]. The acr-aca operon (anti-CRISPR gene acr and acr-associated gene aca) [47] was identified in ten viral genomes (3%), which may inhibit the CRISPR-Cas immunity of the host to allow viruses to propagate [48]. Accordingly, one Poribacteria genome, SY_Active_Co137, infected by a virus with the acr-aca operon has nine cas genes (Supplementary Tables 4 and 8). Interference modules of the antitoxin genes (e.g., vapBC, relBE, hicBA) were found in 63 viruses (19%) and belonged to the type II toxin-antitoxin (TA) system [49]. Additionally, a total of 17 viruses were found to encode two or more types of counter-defense systems.
Fig. 1 Diversity of antiviral systems found in cold seep bacterial and archaeal genomes. a Proportion of antiviral genes from each type of antiviral system among all the identified antiviral genes. b Relationship between antiviral system numbers per prokaryotic genome and their genome sizes. The correlation analysis was conducted with the completeness-filtered dataset (>90% genome completeness) to reduce the potential bias caused by genome incompleteness. c Frequency of antiviral systems detected in microbial genomes. Detailed statistics for antiviral systems of microbial genomes are provided in Supplementary Table 4.
As an important mechanism in adaptation to the environment, viruses can acquire new functional genes via transduction, namely auxiliary metabolic genes (AMGs) that contribute to host and/or viral fitness [4,45]. Ten AMGs were identified in seven viral genomes (Fig. 3d, Supplementary Fig. 7 and Supplementary Table 9), related to four different types of functions. Two AMGs encoded GTP cyclohydrolase I (FolE), and six belonging to the Que superfamily (QueC and QueD) may contribute to converting GTP to 7-cyano-7-deazaguanine (preQ0) for genome modifications and translational efficiency [50]. PreQ0 is the key intermediate in the Q and G+ pathways, which can be further modified for protecting viral DNA from host restriction enzymes [51]. AMGs encoding S-adenosylmethionine (SAM) decarboxylase (SpeD) and dehydrogenase E1 component were also identified, and are involved in biosynthesis of amines or polyamines and the tricarboxylic acid cycle, respectively. SAM is the methyl donor for methyltransferases that modify DNA, RNA, histones, and other proteins; decarboxylation of SAM to S-adenosylmethioninamine might reduce the SAM required for methylation by host enzymes [52]. These AMGs have also been reported to be encoded by viruses in other deep-sea settings, including seawater and sediments of oceanic trenches, and free-living and particle-attached fractions from the bathypelagic ocean [45,53-55], suggesting their important roles in increasing viral adaptability in deep oceans.
Different classes of reverse transcriptases (RTs) were also found in 22 viruses, including diversity-generating retroelements (DGRs), retrons, UG26, and UG28 (Fig. 3e and Supplementary Fig. 8). Among them, RTs associated with DGRs were detected in five viruses; this mechanism can introduce variations in the target gene and facilitate the evolution of their hosts [17]. Retrons were found in three viruses, also possibly involved in defense systems for foreign DNA elements [49,56]. Other RT systems were identified, with their roles and mechanisms remaining unknown.
Diverse lineages of viruses from different habitats have been observed to employ alternative genetic codes to their own benefit, reassigning one or more codons [20,57-59]. In the dataset from the Haima cold seeps, 16 viral genomes are predicted to use genetic codes characterized by reassignments of the ochre (TAA; n = 620 recoding events of genes), amber (TAG; n = 182) or opal (TGA; n = 3) stop codons (Fig. 3f, Supplementary Fig. 9a and Supplementary Table 10). These viruses are associated with hosts in multiple phyla (e.g., Desulfobacterota and Acidobacteriota). Genome sizes of these viruses range from 5.2 kb to 179.7 kb, with larger genomes having more recoding events of genes (linear regression; R² = 0.58, p = 0.0004). Recoded genes were mostly associated with replication, recombination and repair functions, followed by unknown functions (Supplementary Fig. 9b), suggesting adaptive recoding in controlling viral replication and regulation.
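To make the idea of a stop-codon recoding event concrete, the following minimal Python sketch counts in-frame TAA, TAG and TGA codons that occur before the final codon of a coding sequence. Under the standard genetic code these would be premature stops; under a recoded genetic code they would be read through. This is only an illustration under stated assumptions: the function name `internal_stop_counts` and the toy sequence are hypothetical, and the sketch is not the prediction method used in this study.

```python
from collections import Counter

# Standard stop codons and their conventional names.
STOP_CODONS = {"TAA": "ochre", "TAG": "amber", "TGA": "opal"}


def internal_stop_counts(cds: str) -> Counter:
    """Count in-frame standard stop codons located before the final codon.

    Hypothetical helper for illustration: each hit is a candidate
    recoding event if the gene is assumed to be translated in full.
    """
    cds = cds.upper()
    # Split into complete codons, ignoring any trailing partial codon.
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    # The final codon is the genuine terminator, so it is excluded.
    return Counter(STOP_CODONS[c] for c in codons[:-1] if c in STOP_CODONS)


# Hypothetical toy coding sequence: ATG GCT TAG GAA TAG AAA TAA
toy_cds = "ATGGCTTAGGAATAGAAATAA"
counts = internal_stop_counts(toy_cds)
print(dict(counts))
```

In this toy sequence the two internal TAG codons are counted as amber events, while the terminal TAA is treated as the true stop.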
Cold seep viruses are genetically conserved and under strong purifying selection
Nucleotide diversity (π), single nucleotide polymorphisms (SNPs) and fixation indices (F_ST) were calculated to track viral microdiversity (Supplementary Tables 11 and 12). Nucleotide diversity of cold seep viral populations ranged from zero to 3.06 × 10⁻³ and averaged 1.29 × 10⁻⁴ (median 3.38 × 10⁻⁵) for viruses detected in both active and extinct cold seep sediments (Fig. 4a). This viral nucleotide diversity is significantly lower than that observed for viral populations in seawater sampled from throughout the world's oceans (on average 3.78 × 10⁻⁴) [22] and in soils having various land uses (on average 6.54 × 10⁻³) [60]. Low SNP frequencies were also observed in Haima cold seep viral populations (0.33 SNP per 1000 bp on average, median 0.076; Fig. 4b), e.g., compared to those detected in the SARS-CoV-2 coronavirus, in 25 uncultivated virophage populations in North American freshwater lakes, and in 44 dsDNA viral populations dominating the oceans, based on various approaches for the extraction of viral genomes [61-63]. F_ST values between viral populations in relation to different sediment samples ranged from zero to 0.89 and averaged 0.048, with 80% of pairwise fixation indices being zero (Fig. 4c). These data indicate that cold seep viral populations are genetically conserved and homogeneous, contrary to observations of their microbial hosts [1], suggesting that viruses and microbes might undergo different types of environmental selection.
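As a rough illustration of how a genome-wide nucleotide diversity value of this magnitude can arise, the sketch below uses a common per-site estimator, π_site = n/(n − 1) × (1 − Σ pᵢ²), where n is the read depth at a site and pᵢ the allele frequencies, and averages it over the genome length. The exact estimator implemented in the pipeline used here may differ; the function names and the toy allele counts are assumptions made for illustration only.

```python
def site_pi(allele_counts):
    """Per-site nucleotide diversity from read counts of each allele.

    Uses the sample-size-corrected heterozygosity n/(n-1) * (1 - sum p_i^2).
    """
    n = sum(allele_counts)
    if n < 2:
        return 0.0
    sum_sq_freq = sum((c / n) ** 2 for c in allele_counts)
    return n / (n - 1) * (1.0 - sum_sq_freq)


def genome_pi(variable_sites, genome_length):
    """Average per-site diversity over the genome.

    Sites without SNPs contribute zero, so only variable sites are summed.
    """
    return sum(site_pi(counts) for counts in variable_sites) / genome_length


# Hypothetical 10 kb viral genome at 10x depth with two SNP sites:
# one rare variant (9 vs 1 reads) and one balanced variant (5 vs 5 reads).
toy_sites = [[9, 1], [5, 5]]
pi = genome_pi(toy_sites, genome_length=10_000)
print(f"{pi:.2e}")
```

With only two variable sites in 10 kb, the genome-wide value falls in the 10⁻⁵ range, the same order as the median reported above.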
Nucleotide diversity of viral populations is significantly different among viruses infecting different microbial hosts (p = 0.0003; Fig. 4d and Supplementary Table 11). Archaeal viruses associated with Halobacteriota have the highest nucleotide diversity. Like evolutionary trajectories of microbial populations in cold seeps [1] (e.g., Asgardarchaeota, Halobacteriota, and Bacteroidota), the nucleotide diversity of associated viruses is also depth-dependent in active cold seeps (linear regression; R² = 0.21, p = 1.65 × 10⁻⁵; Fig. 4e). On the other hand, no obvious depth-dependent trends were observed for viruses in the extinct cold seep (linear regression; R² = −0.0048, p = 0.40). This is in agreement with the difference in nucleotide diversity between the two cold seep stages (Fig. 4a; p = 0.051).
Fig. 3 Diverse strategies for environmental adaptation in cold seep viruses. a Viruses encode methylases that can modify their DNA to prevent its recognition by host restriction-modification systems and cleavage by certain restriction endonucleases. b Anti-CRISPR genes in viruses can inhibit CRISPR-Cas activities when the virus is targeted by the CRISPR-Cas system of the host. c Viruses encode antitoxins that can neutralize host toxin-antitoxin systems. d Potential functions of auxiliary metabolic genes. SAM: S-adenosylmethionine. preQ0: 7-cyano-7-deazaguanine. e Reverse transcriptases (RTs) in cold seep viruses including diversity-generating retroelements (DGRs), retrons, UG26, and UG28. For DGR, RT mediates exchange between two repeats: one serves as a donor template (TR) and the other as a recipient of variable sequence information (VR). f Alternative genetic codes found in some cold seep viral genomes. Related genes identified in cold seep viruses are marked in red (gene name) or with red border (gene arrow). Detailed statistics for diverse strategies for environmental adaptation in viruses are provided in Supplementary Tables 8-10.
At the gene level, 90.6% of pN/pS ratios were less than 0.4, much lower (p < 0.0001) than those observed for viral assemblages present in underground saline waters from hypersaline springs [64] (Fig. 5a, Supplementary Fig. 10 and Supplementary Table 13), indicating that most cold seep viral genes were under strong purifying selection. However, genes under positive selection were also detected in relation to viral DNA replication, recombination, repair, and maturation (Fig. 5b), including genes encoding TerL, transposase, and leucyl-tRNA synthetase with abnormally high pN/pS values (Supplementary Table 13). pN/pS ratios differed significantly between the two cold seep stages (Fig. 5a; p < 0.0001). When genes were grouped according to the functional categories of VOGDB (http://vogdb.org/), nucleotide diversity values were found to be significantly different while no significant differences were observed for pN/pS ratios (Supplementary Fig. 11). Tajima's D values ranged from −9.7 to zero and significantly varied (p = 1.66 × 10⁻⁸) between the two cold seep stages (Fig. 5c). A total of 90.5% of viral gene Tajima's D values were found to be zero with no detected SNP. For others, genes under natural selection (Tajima's D < −2.5; 6.1%) outnumbered those under neutral processes (Tajima's D = 0; 3.4%). The observation of a large number of negative values supports the presence of excess rare alleles and recent expansion of cold seep viral populations [65]. Together, these analyses of evolutionary dynamics of viruses will help guide future studies targeting viral evolution and virus-host systems in extreme environments. However, it should be noted that our results are representative only of double-stranded DNA viruses, as other viral particles are not incorporated in the extraction process and analysis [9]. Further studies with more samples from more locations and covering larger spatial gradients, combining metagenomes and viromes as well as single-virus genomics [23,61], will be necessary to determine if the trends presented here are universal for deep-sea subseafloor viral communities.
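The link between negative Tajima's D and an excess of rare alleles can be illustrated with the classical formula, D = (π − S/a₁) / √(e₁S + e₂S(S − 1)), where S is the number of segregating sites, π the mean number of pairwise differences, and a₁, e₁, e₂ constants that depend only on the sample size n. The sketch below is a textbook implementation on hypothetical toy data, written for haplotype samples; it is not the metagenomic implementation used in this study, and the choice to return zero when no SNP is present is an assumption that mirrors the convention described above.

```python
import math


def tajimas_d(n, minor_allele_counts):
    """Classical Tajima's D for n sampled sequences.

    minor_allele_counts holds, for each segregating (biallelic) site,
    the number of sequences carrying the minor allele.
    """
    if n < 4:
        raise ValueError("the variance term is zero for n < 4")
    s = len(minor_allele_counts)
    if s == 0:
        return 0.0  # assumption: genes without SNPs are assigned D = 0

    # Mean number of pairwise differences, summed over segregating sites.
    pi = sum(2.0 * k * (n - k) / (n * (n - 1)) for k in minor_allele_counts)

    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical samples of 10 sequences with 5 segregating sites each.
d_rare = tajimas_d(10, [1, 1, 1, 1, 1])      # every variant is a singleton
d_balanced = tajimas_d(10, [5, 5, 5, 5, 5])  # every variant is at 50%
print(round(d_rare, 2), round(d_balanced, 2))
```

Singleton-only variation yields a negative D, whereas intermediate-frequency variation yields a positive D, which is why strongly negative values are read as an excess of rare alleles.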

Sample description, metagenomic sequencing and analysis
Metagenomic sequencing was performed on 16 sediment samples collected from the Haima cold seeps in the northern part of the South China Sea (Supplementary Fig. 1). Samples were taken from two active seep sites and one extinct seep site via the R/V Tansuo Yihao using the piloted submersible ShenHai YongShi [28]. Sediment cores penetrated 18 to 20 cm into the seabed. Details for DNA sequencing can be found elsewhere [28] and involved genomic DNA extraction with the MO BIO PowerSoil DNA Isolation Kit followed by sequencing on the MGI sequencing platforms DNBSEQ-T1 or BGISEQ500 (MGI Tech Co., Ltd., China) at BGI-Shenzhen (China).

Enumeration of viruses via fluorescence microscopy
Viral particles in sediments were counted by fluorescence microscopy according to a previous protocol [73]. In brief, approximately 0.8 g of sediment from each sample was transferred into a sterile 50 mL centrifuge tube and promptly fixed in 0.5% glutaraldehyde. Viruses were separated from sediments by vortexing in the dark, incubated in sodium pyrophosphate, and sonicated on ice. Samples were then filtered onto 0.02 μm pore-size membrane filters (Anodisc 25, Whatman), stained with SYBR Green I and observed using a HORIBA Aqualog fluorescence microscope (Tokyo, Japan) with a Leica imaging system. The Find Maxima tool of ImageJ (https://imagej.net) was used to automatically select the fluorescent points [74], followed by manual curation.

Host assignments for bacteriophages and archaeoviruses
A total of 2678 bacterial and archaeal MAGs recovered from 68 previously sequenced cold seep sediments were used as the host reference database [1]. Multiple host prediction strategies were used to link viral genomes to their microbial hosts following our previous method [7], complemented with iPHoP, an automated command-line pipeline for host predictions [81] (Supplementary Fig. 4). (i) For CRISPR spacer matches, the CRISPR arrays of cold seep microbial genomes were predicted using CRISPRidentify v1.1.0 with default parameters [82]. Spacers shorter than 25 bp and CRISPR arrays with fewer than three spacers were discarded. CRISPR spacers were aligned with viral genomes with ≤1 mismatch using BLASTn, and a threshold of 95% identity was applied. Additionally, 1,398,130 spacers from 40,036 distinct genomes in the iPHoP_db_Sept21 database were also used for CRISPR-based predictions by version 1.1.0 of iPHoP [81]. (ii) For the detection of shared tRNA between viruses and hosts, tRNA genes were annotated using tRNAscan-SE v2.0.9 (parameters: -B -A) [83]. Putative host-virus linkages were required to satisfy a threshold of ≥90% length identity over 95% of the sequences by BLASTn. (iii) For alignment-based matches, viral genomes were aligned with microbial genomes using BLASTn based on their nucleotide sequence homology (e-value ≤ 0.001, nucleotide identity ≥70%, match coverage over the length of viral genomes ≥75% and bitscore ≥50). (iv) For host predictions based on independent signals (k-mer usage profiles and protein content), VirHostMatcher (VHM) [84], WIsH [85], Prokaryotic virus Host Predictor (PHP) [86], and RaFAH [87] were performed individually using iPHoP v1.1.0. Match criteria were d2* values ≤ 0.2 for VHM, p-values ≤ 0.05 for WIsH, the predicted 'maxScoreHost' for PHP, and RaFAH scores > 0.14 for RaFAH. A genome was considered to be the host if it belonged to the same family as the top hits for a given viral genome based on multiple methods.
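The alignment-based criteria in step (iii) amount to a simple filter over BLASTn hits. The sketch below shows one way such a filter could look, assuming the e-value cutoff acts as an upper bound (≤ 0.001), as is conventional, and that coverage is computed from a single alignment; a real pipeline would likely merge multiple alignments per virus-host pair. The `BlastHit` class, field names and example records are hypothetical and are not taken from the study's scripts.

```python
from dataclasses import dataclass


@dataclass
class BlastHit:
    """One virus-to-host alignment (hypothetical, simplified record)."""
    virus: str
    host: str
    evalue: float
    identity: float       # percent nucleotide identity
    aligned_length: int   # aligned bases on the viral genome
    virus_length: int     # total length of the viral genome
    bitscore: float


def is_host_link(hit, max_evalue=1e-3, min_identity=70.0,
                 min_coverage=75.0, min_bitscore=50.0):
    """Apply the four alignment thresholds listed in step (iii)."""
    coverage = 100.0 * hit.aligned_length / hit.virus_length
    return (hit.evalue <= max_evalue
            and hit.identity >= min_identity
            and coverage >= min_coverage
            and hit.bitscore >= min_bitscore)


# Hypothetical hits: the first covers ~86% of the virus, the second ~13%.
hits = [
    BlastHit("vOTU_a", "MAG_1", 1e-50, 92.0, 30_000, 35_000, 5000.0),
    BlastHit("vOTU_b", "MAG_2", 1e-20, 88.0, 5_000, 40_000, 900.0),
]
links = [(h.virus, h.host) for h in hits if is_host_link(h)]
print(links)
```

Only the first hit passes, because the second aligns over too small a fraction of the viral genome despite its high identity.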

Macro- and microdiversity analyses of viral populations
Filtered reads from each sample were mapped to 338 single-contig viral genomes that represent each vOTU using Bowtie2 v2.3.5 [99]. Resulting BAM files, viral genomes, and read counts for each metagenome were used as input for the MetaPop pipeline [100] for pre-processing, macrodiversity and microdiversity analyses. MetaPop was run using the default parameters (--snp_scale both), and genes from viral genomes were predicted using Prodigal v2.6.3 [101]. Macrodiversity estimates include population abundances, alpha-diversity (within community) and beta-diversity (between community) indices. To accurately call SNPs and assess contig-level microdiversity, 207 viral populations with >10× average read depth coverage and >70% length of genome covered were retained for microdiversity analyses [100]. SNP frequencies subsampled down to 10× coverage were used to assess nucleotide diversity (θ and π) at the individual gene and whole-genome levels, as well as fixation indices (F_ST; between population microdiversity) and selective pressures on specific genes (pN/pS and Tajima's D).
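The retention rule for microdiversity analyses (>10× average depth and >70% of the genome covered) can be expressed as a check on a per-base depth profile. The following sketch assumes depth is available as one integer per genome position and that a position counts as covered when its depth is at least one read; the function name and the toy profiles are hypothetical, and the pipeline's internal definition of breadth may differ.

```python
def passes_microdiversity_filter(depths, min_mean_depth=10.0, min_breadth=0.70):
    """Return True if a population meets both coverage thresholds.

    depths: per-base read depth along one viral genome.
    Both comparisons are strict, following the '>' signs in the text.
    """
    if not depths:
        return False
    mean_depth = sum(depths) / len(depths)
    breadth = sum(1 for d in depths if d > 0) / len(depths)
    return mean_depth > min_mean_depth and breadth > min_breadth


# Hypothetical depth profiles for three 100 bp toy genomes.
well_covered = [20] * 80 + [0] * 20   # mean 16x, breadth 0.80
patchy = [100] * 50 + [0] * 50        # mean 50x, breadth 0.50
shallow = [5] * 100                   # mean 5x, breadth 1.00

for name, profile in [("well_covered", well_covered),
                      ("patchy", patchy),
                      ("shallow", shallow)]:
    print(name, passes_microdiversity_filter(profile))
```

Requiring both thresholds excludes genomes whose high average depth comes from a few heavily covered regions, as in the patchy example.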

Statistical analyses
Statistical analyses were performed using R v4.0.0. The normality and variance homogeneity of the data were assessed using Shapiro-Wilk and Bartlett's tests. Wilcoxon tests were used to compare differences in viral microdiversity parameters (π, Tajima's D, pN/pS) across cold seep stages. The Kruskal-Wallis rank-sum test with Chi-square correction was used to compare differences in evolutionary metrics of genomes and genes among different groups and samples. Correlations between microdiversity and sediment depth, defense system numbers, genome sizes, and other parameters were obtained using linear regression, with the fitness and confidence of the regression curves characterized as R² and p values, respectively.
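For readers interpreting the regression statistics, the sketch below fits an ordinary least-squares line and reports both R² and adjusted R². Plain R² from a fit with an intercept cannot be negative, so the slightly negative value reported for the extinct seep would be consistent with an adjusted R²; this is an inference, since the type of R² is not stated here. The analyses in this study were run in R; the Python function and the toy data below are hypothetical and serve only to show the arithmetic.

```python
def linear_fit(x, y):
    """Least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared, adjusted_r_squared),
    with the adjustment for a single predictor: 1 - (1 - R2)(n - 1)/(n - 2).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - 2)
    return slope, intercept, r2, adj_r2


# Hypothetical toy data: depth (cm) against an arbitrary response.
depth = [1, 2, 3, 4, 5]
trend = [2, 4, 6, 8, 10]      # perfectly linear
no_trend = [1, 0, 1, 0, 1]    # no linear relationship with depth

print(linear_fit(depth, trend)[2:])
print(linear_fit(depth, no_trend)[2:])
```

When the predictor explains nothing, R² is zero and the adjustment for sample size pushes the adjusted value below zero.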

Fig. 2 Ecological features of cold seep viruses. a Workflow for identification, taxonomic assignment, and lifestyle prediction of viruses. Phylogenomic trees of predicted (b) archaeal and (c) bacterial hosts based on concatenated alignments of single-copy marker genes predicted by GTDB-Tk. Scale bars indicate the average number of substitutions per site. The orange triangle shows the number of viruses predicted to infect hosts in each clade, and the blue circle shows the number of microbial genomes in each clade with predicted viruses. Detailed statistics for taxonomy, lifestyles, and host-virus linkages are provided in Supplementary Tables 6 and 7.

Fig. 5 Gene-wide evolutionary metrics of cold seep viral populations. a pN/pS ratio of viral genes from cold seeps (this study) and viral genes from an ancient saltern [64]. b Viral genes under positive selection in active and extinct cold seeps. Viral genes are divided into two groups based on pN/pS values, consisting of genes under positive selection (pN/pS ≥ 1) and those under purifying selection or relaxed selection (pN/pS < 1). c Tajima's D of viral genes across 16 sediment samples from extinct (blue) and active (red) cold seeps. Detailed statistics for evolutionary metrics of cold seep viral genes are provided in Supplementary Table 13.