Double-stranded RNA sequencing reveals distinct riboviruses associated with thermoacidophilic bacteria from hot springs in Japan

Metatranscriptome sequencing expanded the known diversity of the bacterial RNA virome, suggesting that additional riboviruses infecting bacterial hosts remain to be discovered. Here we employed double-stranded RNA sequencing to recover complete genome sequences of two ribovirus groups from acidic hot springs in Japan. One group, denoted hot spring riboviruses (HsRV), consists of viruses with distinct RNA-directed RNA polymerases (RdRPs) that seem to be intermediates between typical ribovirus RdRPs and viral reverse transcriptases. This group forms a distinct phylum, Artimaviricota, or even kingdom within the realm Riboviria. We identified viruses encoding HsRV-like RdRPs in marine water, river sediments and salt marshes, indicating that this group is widespread beyond extreme ecosystems. The second group, denoted hot spring partiti-like viruses (HsPV), forms a distinct branch within the family Partitiviridae. The genome architectures of HsRV and HsPV and their identification in bacteria-dominated habitats suggest that these viruses infect thermoacidophilic bacteria.


Sampling sites
The sites H4 and H5 in Hayashida hot spring were located at a natural venting site on the slope of the valley line at the southwestern foot of a caldera lake called Onami Pond in the Kirishima Volcanic complex that is one of the most active volcanic sites in Japan.The sites T1-4 in Tearai area and Y66, Y80, and Y86 in Yunoike area were located at fumerole zone at the western foot of the same caldera pond.The site Oi was located at fumerole zone western foot of the Unsen volcano, which is a volcano that erupted in a devastating eruption that lasted from November 1990 to February 1995 1 .The site Ob was in Obama hot spring area, which is located ~10 km apart coastal area from the site Oi (Supplementary Figure 1).

Chemistry of hot spring water
The chemical composition of hot spring water was measured and shown in Supplementary Table 1.The major cation (Ca 2+ , Mg 2+ , Na + , K + ) and anion (Cl -, SO4 2-) were analyzed using ion-chromatography (Gulliver; Jasco International Co., Ltd., Tokyo, Japan) after 21 times dilution of the samples with distilled water.SiO2 was measured by the molybdenum yellow method with a spectrophotometer.δD and δ 18 O were measured from filtered (0.2-μm-pore-size) and non-diluted water samples by using a liquid water isotope analyzer (Los Gatos Research, Inc.).Measurement errors are within 5% for Ca 2+ , Mg 2+ , SO4 2−, and SiO2, 10% for the other ions, ±1.1‰ for δD, and±0.1‰for δ 18 O.Table S6 shows the measurement results of the water samples.1: Chemical and isotopic composition of the fluid samples at the 11 samples.1).

RNA2 and RNA2* of HsRV
Notably, the 5'-terminal three genes in RNA2 and RNA2* encode homologous proteins containing predicted transmembrane domains (BLASTP, E-value < 1e-07), so that RNA2 and RNA2* most likely represent two distinct variants of the same segment (Fig. 2a; Extended Data Table 3), with the viruses from the Oi sample having bisegmented genomes.However, we cannot rule out the possibility that RNA2* represents a satellite genome that depends on the bisegmented virus.Three predicted transmembrane proteins that were shared between RNA2 and RNA2* (Fig. 2a) could be involved in virion morphogenesis and/or virus-host interactions 2 .
Furthermore, no sequence related to RNA1_ORF4 could be identified using PZLAST 7 among several tera-bytes of public metagenomic sequences either.Similarly, direct comparison of RNA1_ORF4 against the RdRPs from the recently described putative divergent phyla of riboviruses [8][9][10] revealed no significant similarity (BLASTP, E-value ≤ 1e-05).Attempts to generate a structural model of RNA1_ORF4 using AlphaFold2 with a default multiple sequence alignment (MSA) generation method did not result in a reliable model, likely due to the lack of identifiable homologs in public databases.

Proteins of HsRV-like viruses
Similar to HsRV, the RdRPs of these related viruses from moderate aquatic environments did not show similarity to sequences in current databases either using BLASTP or HHpred searches.However, the functions of several other proteins could be predicted from HHpred search results (Extended Data Fig. 1b).In particular, ORF1 and ORF3 of Ga0393213 encode putative carbohydrate-binding protein with the jelly-roll fold and a phospholipase A2 (PLA2), respectively.Notably, PLA2 has been previously identified in viruses with small RNA and ssDNA genomes 11,12 and, in parvoviruses, shown to be important for infectivity 12 .Ga0169446 encodes a predicted kinase related to nucleoside monophosphate kinases.Finally, Ga0169446 and Ga0456180 encode nonorthologous zinc finger proteins, which could be involved in nucleic acid binding or protein-protein interactions.
Apart from the RdRP, the HsRV-like viruses lack predicted proteins shared with HsRVs from hot springs.However, the three viruses from the moderate environments share a ~500 aa-long protein encoded upstream of the RdRP, a likely candidate for the capsid protein (Fig. 3a).In addition, Ga0456180 and Ga0393213 share a protein of unknown function.

RNA2 of HsPV and its related viruses
ORF1 and ORF2 of RNA2 did not show significant sequence similarity to proteins in sequence or profile databases, except that ORF2 was predicted to encode a membrane protein with two transmembrane domains.Given the relatively high similarity between the RdRP of HsPV and Driatsky virus (Accession: MT025082), we explored the possibility that the second genomic segment of Driatsky virus was missed, not an uncommon situation with segmented RNA viruses, especially for segments not encoding the RdRP.Driatsky virus was identified in a metatranscriptome of an environmental animal sample, but labeled to be "unlikely vertebrate associated".To identify the potential RNA2 of Driatsky virus, we searched contig sequences from the corresponding metatranscriptome (SRR7239362) using HsPV RNA2 as a query.BLASTX analysis against ORF1 of RNA2 identified only one significantly similar sequence, contig_9778 (see below).This contig encompassed two ORFs, as in the case of HsPV RNA2, and the amino acid sequences of contig_9778 ORF1 showed significant similarity (E-value = 2e-06) with HsPV-H4 RNA2_c_ORF1 protein.
Unlike in other classified members of the family, RNA2 of HsPV is bicistronic and encodes a membrane protein of unknown function, which is likely to participate in virion morphogenesis and/or virus-host interaction.A similar genome organization has been observed in the unclassified genPartiti.0019partiti-like viruses 9 ; however, the latter clade is nested among bona fide partitiviruses, forming a sister group to deltapartitiviruses (Fig. 5c).

Metagenomic sequencing and data processing
Five metagenomic libraries were constructed and analyzed as described before 13 (Supplementary Table 2).In brief, DNA was extracted from cells collected on a portion of the 0.2-μm-pore-size filters corresponding to approximately 0.25-2.5 L of H4, H5 and Oi samples, that potential complete RNA virus genomes were identified, using DNeasy PowerSoil Kit (QIAGEN).Covaris M220 (Woburn, MA, USA) was used for physical DNA fragmentation using the conditions described below to obtain a peak fragment size:  3.
Supplementary Table 3: Sequences used in phylogenetic analysis of HsPV and related viruses.

Figure 1 :
Locations of the sampling points.a, Samples were collected from Kyushu, southwestern Japan, where an area with many active volcanoes.b, Three sites (H: Hayahida hot spring, T; Tearai area, and Y; Yunoike area), were located on the west side of the Kirishima Volcanic complex.c, The sites Oi was located at a fumerole zone western foot of the Unsen volcano.The site Ob differs from the other sites in that it was located along the coast, relatively far from an active volcano.The maps were based on the digital elevation topographic map published by Geospatial Information Authority of Japan (http://maps.gsi.go.jp/).
400 bp; Peak Power: 75.0, Duty Factor: 15.0, Cycles/Burst: 200, and Time: 60 s.Then, shotgun metagenomic libraries were constructed using KAPA Hyper Prep Kit.The metagenomic sequence libraries were analyzed using the Illumina MiSeq platform with a 2×300-bp read length.Sequence reads were quality filtered using Trimmomatic v0.35 with the option "LEADING:20 TRAILING:20 MINLEN:60".The quality-controlled reads were then assembled in a sample-by-sample manner using MEGAHIT v1.1.4with the default setting.Only the long contigs (≥1kb) were retained for further analyses.CRISPR regions were identified using MinCED v0.4.2 with the default setting.The identified CRISPR spacers (n=919) were assembled as a database for the identification of potential virushost interactions.The 25 virus genome segments were used as a query for sequence similarity search against the CRISPR spacer database.The similarity search was performed using BLASTn with the options "-word_size 7 -evalue 1e-3 -dbsize 100000000" and no sequence met the threshold.Supplementary Table 2: Hot spring metagenomes and CRISPRsGenBank accession numbers of theRdRP proteins from representative members of the family Partitiviridae and related sequences GenBank accession numbers of the RdRP proteins from representative members of the family Partitiviridae and related sequences, are shown in Supplementary Table