Microbial ecologists have devoted considerable effort to understanding the nature of the viruses in seawater, because viruses have key roles in the evolution, ecology and mortality of marine plankton (Rohwer and Vega Thurber, 2007; Suttle, 2007). For at least the past two decades, researchers have assumed that the pool of viruses in the ocean is dominated by bacteriophages with DNA genomes (Steward et al., 1992; Breitbart et al., 2002; Weinbauer, 2004; Comeau et al., 2010; Sullivan et al., 2010). Perhaps as a consequence, studies of the molecular diversity of marine viruses have most commonly (exclusively, before 2003) focused on DNA viruses (Edwards and Rohwer, 2005; Kristensen et al., 2010). However, evidence that RNA viruses are important contributors to marine plankton ecology has been steadily accumulating (Lang et al., 2009).

The isolation of a positive-sense, single-stranded RNA (ssRNA) virus that infects the raphidophyte Heterosigma akashiwo (HaRNAV) was the first recorded instance of an RNA virus infecting a marine protist (Tai et al., 2003). This was followed by reports of similar picorna-like ssRNA viruses infecting diatoms (Nagasaki et al., 2004; Shirai et al., 2008; Tomaru et al., 2009) and a thraustochytrid (Takao et al., 2005). All of these viruses are now classified as members of the order Picornavirales. Molecular surveys using degenerate primers to target the RNA-dependent RNA polymerase gene of picorna-like viruses have shown that, in addition to the handful of isolates, a very diverse pool of uncultivated picornavirads exists in seawater (Culley et al., 2003; Culley and Steward, 2007).

Other novel RNA viruses infecting marine protists have also been isolated. A double-stranded RNA virus infecting the abundant marine prasinophyte Micromonas pusilla represents a new genus in the family Reoviridae (Brussaard et al., 2004). A positive-sense ssRNA virus that infects the dinoflagellate Heterocapsa circularisquama (Tomaru et al., 2004) is only distantly related to existing viral families (Nagasaki et al., 2005) and may represent a new family. Metagenomic surveys of the RNA viral fraction of seawater are consistent with the view provided by the limited number of isolates available so far: they suggest that positive-sense, single-stranded picornavirads dominate the marine RNA virus pool, but that other diverse RNA viruses, including some with double-stranded RNA genomes, are present as well (Culley et al., 2006).

Despite the emerging evidence that marine RNA viruses are diverse and infect ecologically important members of the marine planktonic food web, there have been no reports that satisfactorily address whether these viruses constitute a substantial fraction of the total virioplankton. This question is of considerable ecological interest because, unlike the pool of DNA viruses in seawater, which is composed predominantly of bacteriophage-like sequences (Edwards and Rohwer, 2005), the marine RNA virus pool consists almost exclusively of viruses that infect eukaryotes (Lang et al., 2009).

Directly quantifying the abundance of RNA viruses in a mixed viral assemblage has proven difficult, because of technical limitations. Differences in counts using a DNA-specific stain vs a nonspecific nucleic acid stain have been reported (Weinbauer and Suttle, 1997; Guixa-Boixereu et al., 1999; Bettarel et al., 2000), but the results are of uncertain significance, because the stains differ in their sensitivity (Weinbauer, 2004). In particular, the small genomes of single-stranded DNA and RNA viruses make the individual virions difficult to detect even with appropriate stains (Brussaard et al., 2000; Tomaru and Nagasaki, 2007; Holmfeldt et al., 2012).

Given the difficulties of obtaining reliable direct counts of RNA viruses, we took a different approach. In the work reported here, we measured the relative masses of RNA and DNA in natural assemblages of viruses purified from tropical coastal seawater and coupled this with estimates of the mass of nucleic acid per RNA or DNA virion to obtain the first estimates of the relative abundance of RNA viruses in seawater.

Materials and methods

Study site

The samples for this study were collected from a pier in the southern portion of Kāne‘ohe Bay on the windward side of O‘ahu, Hawai‘i (21° 25′ 46.80″ N, 157° 47′ 31.51″ W). This tropical embayment is characterized by year-round warm temperatures (22 °C to 28 °C) and salinities ranging from 32 to 35, except during periods of heavy rain, when freshwater plumes from stream runoff can transiently depress salinity to <30 (Drupp et al., 2011). Concentrations of chlorophyll a in surface waters are typically low, with average values recorded over a 2.5-year period ranging from 0.5 to 1.1 μg l⁻¹ in summer and from 1.3 to 3.4 μg l⁻¹ in winter, depending on location (Drupp et al., 2011). Following heavy rainfall, blooms have been reported with transient increases in chlorophyll a to ca 6 μg l⁻¹ (Hoover et al., 2006; De Carlo et al., 2007).

Sample collection and processing

Seawater was collected in acid-washed polycarbonate carboys on 1 August 2009 (35 liters) and on 3 June 2010 (80 liters) during non-storm conditions. Seawater was transported immediately to the laboratory (<1 h) and filtered through 0.22 μm pore-size filters (Sterivex, Millipore, Billerica, MA, USA). The sample from June was split and processed as two parallel subsamples (40 liters each). Viruses in the filtrates were concentrated by iron flocculation (John et al., 2011), then concentrated further in a 30 kDa centrifugal ultrafiltration device (Amicon 15, Millipore). Viruses in each concentrate were purified by a two-step process in CsCl buoyant density gradients (Lawrence and Steward, 2010). Fractions (ca 0.5 ml each) were collected from the final gradient with an Auto Densi-Flow (Labconco, Kansas City, MO, USA) for analysis.

Fraction analysis

The density of each fraction was measured using a micropipet and an analytical balance (Lawrence and Steward, 2010). CsCl was exchanged with buffer (10 mM Tris, 1 mM EDTA, pH 8) by centrifugal ultrafiltration (Steward, 2001), and total nucleic acids in a subsample of each fraction were extracted using spin columns (QIAamp MinElute Viral Spin Kit, Qiagen, Valencia, CA, USA) according to the manufacturer’s instructions. Samples for RNA analysis were treated twice with DNase (TURBO DNase, Life Technologies, Carlsbad, CA, USA) to avoid nonspecific signal from DNA. RNA and DNA contents of each fraction were measured separately in parallel subsamples by fluorometry (Quant-iT DNA and RNA assays, Life Technologies) in a cuvette fluorometer (TD-700, Turner Designs, Sunnyvale, CA, USA). The RNA content of the individual fractions from the June sample was low, so we also measured the RNA content of the pooled putative viral fractions to obtain a more accurate estimate. The nucleic acid masses (±95% confidence interval (CI)) were calculated by inverse prediction from the standard curves (Zar, 1996).
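Inverse prediction reads a sample's nucleic acid mass off a standard curve fitted to the fluorescence of known standards. The sketch below is a minimal Python illustration, assuming an ordinary least-squares (model I) fit of fluorescence on mass and the commonly used approximate standard error of an inverse prediction. The exact confidence limits given by Zar (1996) are asymmetric and may differ slightly, and the function names (`fit_line`, `inverse_predict`) and the caller-supplied critical t value are our own illustrative choices.

```python
import math
from statistics import fmean


def fit_line(xs, ys):
    """Ordinary least-squares (model I) fit y = a + b*x; returns (a, b, s_yx)."""
    n = len(xs)
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    # Residual standard error with n - 2 degrees of freedom.
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s_yx = math.sqrt(ss_res / (n - 2))
    return a, b, s_yx


def inverse_predict(xs, ys, y_obs, t_crit, m=1):
    """Estimate x (e.g., ng nucleic acid) from the mean reading y_obs of m replicates.

    Returns (x_hat, half_width), where half_width = t_crit * approximate SE
    of the inverse prediction; t_crit is the two-tailed critical t for n - 2 df.
    """
    n = len(xs)
    a, b, s_yx = fit_line(xs, ys)
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    x_hat = (y_obs - a) / b
    se = (s_yx / abs(b)) * math.sqrt(
        1 / m + 1 / n + (y_obs - y_bar) ** 2 / (b ** 2 * sxx)
    )
    return x_hat, t_crit * se
```

The mass would then be reported as `x_hat ± half_width`; with five standards (3 df) a two-tailed t table gives t_crit = 3.182 at α = 0.05.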

The fluorometric assay was necessary to achieve the required sensitivity. To ensure that we did not overestimate viral RNA with the fluorometric assay, we compared the measurements of the mass of RNA obtained from a purified RNA virus by fluorometry and by spectrophotometry (Model DU 800, Beckman Coulter, Inc., Brea, CA, USA). The virus was a positive-sense, single-stranded picornavirad that infects a marine diatom and has a genome of 8800 nucleotides (nt) (Schvarcz et al., unpublished results). We also tested for cross-reaction of DNA in the RNA fluorescence assay with and without the DNase digestion procedure described above (Figure 1). The DNA used was double-stranded genomic DNA from Enterobacteria phage lambda. The fluorometer was calibrated with the Quant-iT kit standards (Escherichia coli RNA). The RNA standards were then read as samples along with undigested and digested DNA standards and the purified viral RNA. Apparent RNA concentration for each sample as determined by fluorometry was plotted as a function of the given RNA or DNA concentrations (Quant-iT kit standards) or, in the case of the viral RNA, the concentration determined by absorption at 260 nm (Sambrook and Russell, 2001). Model I regression lines were calculated, and a two-tailed t-test was used to determine whether the slopes were significantly different from zero (Zar, 1996).
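Testing whether a model I regression slope differs from zero uses the statistic t = b/s_b with n − 2 degrees of freedom, where b is the fitted slope and s_b its standard error. The following Python sketch computes both; the function name and the test data are hypothetical, and the critical value must be taken from a t table (for example, 3.182 for a two-tailed test at α = 0.05 with 3 df).

```python
import math
from statistics import fmean


def slope_t_statistic(xs, ys):
    """Return (slope, t, df) for H0: slope = 0 in an OLS (model I) regression."""
    n = len(xs)
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    # Standard error of the slope: sqrt(residual variance / sum of squares of x).
    s_b = math.sqrt(ss_res / (n - 2) / sxx)
    t = math.inf if s_b == 0 else b / s_b  # a perfect fit gives an unbounded t
    return b, t, n - 2
```

If |t| exceeds the tabulated critical value for the returned df, the slope is judged significantly different from zero.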

Figure 1

Test of the fluorescence-based RNA assay. The x axis represents concentrations of DNA or RNA as determined by absorbance of purified nucleic acid solutions at 260 nm. The y axis represents the apparent RNA concentration based on the fluorometric signal after calibration with the RNA standards in the Quant-iT kit (Life Technologies). Closed circles are the E. coli RNA standards provided with the kit, which were used to calibrate the fluorometer. The open triangle is purified genomic RNA from a ssRNA virus that infects a marine diatom (mean±95% CI). Squares are dilutions of Enterobacteria phage lambda genomic DNA (double-stranded DNA (dsDNA)): closed squares are undigested lambda DNA, and open squares are lambda DNA digested with TURBO DNase. Slopes of the model I linear regressions (solid lines) are noted above each line. Dashed lines represent the 95% confidence bands for the RNA standards.

Metagenomic analysis

To determine the composition of the RNA in the selected RNA peaks, we created and analyzed metagenomes from this material for the August sample and from one of the duplicate subsamples (replicate 1) from June. Purified, DNase-treated RNA from the indicated fractions (Figure 2) of each sample was pooled and amplified using random priming-mediated sequence-independent single-primer amplification as described previously (Djikeng et al., 2008; Culley et al., 2010). Tests of this method on RNA viral genomes indicated that it results in coverage and redundancy similar to the ideal values predicted by the Lander–Waterman model (Djikeng et al., 2008), which suggests that amplification biases are limited. Sequence libraries from the resulting amplified complementary DNA samples were produced by pyrosequencing (GS FLX Titanium, 454 Life Sciences, Branford, CT, USA). Sequences that were >2 s.d. from the mean length or had an average phred score of <15 were discarded using the MG-RAST QC pipeline, and sequence-independent single-primer amplification primers were trimmed from the remaining reads. Artificial (or technical) replicates (Gomez-Alvarez et al., 2009) were removed using the online tool MG-RAST (Meyer et al., 2008).
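The length and quality criteria described above can be sketched as a simple read filter. This is a minimal Python illustration, not the MG-RAST implementation: it assumes each read is a (sequence, phred scores) pair and uses the population standard deviation of read lengths, neither of which is specified here, and the function name `qc_filter` is hypothetical.

```python
from statistics import fmean, pstdev


def qc_filter(reads, max_sd=2.0, min_mean_phred=15.0):
    """Keep reads within max_sd standard deviations of the mean length
    and with a mean phred score of at least min_mean_phred.

    reads: non-empty list of (sequence, phred_scores) tuples.
    """
    lengths = [len(seq) for seq, _ in reads]
    mu, sd = fmean(lengths), pstdev(lengths)
    kept = []
    for seq, quals in reads:
        if abs(len(seq) - mu) > max_sd * sd:
            continue  # length outlier
        if fmean(quals) < min_mean_phred:
            continue  # low average quality
        kept.append((seq, quals))
    return kept
```

Removal of artificial replicates, which MG-RAST performs as a separate step, is not modeled here.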

Figure 2

Distribution of nucleic acids after separation of viral concentrates in CsCl buoyant density gradients. DNA and RNA per fraction are shown for samples collected on 1 August 2009 (top panel) and for the duplicate samples from 3 June 2010 (middle and bottom panels). DNA in the fractions between the dashed lines and RNA in the fractions between the solid lines was considered viral.

Processed sequences were assembled with CLC Genomics Workbench version 5.0 (CLCbio, Cambridge, MA, USA) using global alignment with automatic word and bubble sizes, a minimum contig length of 200, mismatch, insertion and deletion costs set to 3, length fraction set to 0.5, and the similarity threshold set to 0.8. The community composition of each sample was analyzed with MEGAN (Huson et al., 2011) using the output from blastx comparisons (Altschul et al., 1990) of the assembled metagenomes (contigs plus singletons) with the non-redundant NCBI sequence database. The threshold E-value for considering a hit to be significant was 10⁻⁵. The taxonomic assignment for a given contig was applied to all reads comprising that contig. MEGAN assignments were manually checked and some reassignments made in the case of annotation errors. As an independent check on possible contamination from cellular RNA, sequences were compared with the SILVA database (Pruesse et al., 2007) using blastn to search for ribosomal RNA contamination, which is expected to be the major source of cellular RNA contamination (Karpinets et al., 2006). The mass of RNA in the pooled viral fraction for each library was adjusted downward by the percentage of reads identified as being cell derived.
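Because each contig's taxonomic assignment was applied to all of its member reads, and the RNA mass was then reduced by the percentage of cell-derived reads, the bookkeeping can be sketched as below. The dictionaries, label strings ("virus", "cellular", "unassigned") and function names are hypothetical placeholders for MEGAN/BLAST output, not a description of those tools' formats.

```python
def propagate_labels(contig_members, contig_labels, singleton_labels):
    """Give every read in a contig that contig's label; singletons keep their own.

    contig_members: contig id -> list of read ids
    contig_labels: contig id -> label
    singleton_labels: read id -> label for unassembled reads
    """
    read_labels = dict(singleton_labels)
    for contig_id, read_ids in contig_members.items():
        for read_id in read_ids:
            read_labels[read_id] = contig_labels[contig_id]
    return read_labels


def discount_cellular(total_rna_mass, read_labels):
    """Reduce the pooled RNA mass by the fraction of reads labelled cell derived."""
    n_cellular = sum(1 for label in read_labels.values() if label == "cellular")
    frac_cellular = n_cellular / len(read_labels)
    return total_rna_mass * (1.0 - frac_cellular), frac_cellular
```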

Calculation of DNA and RNA genome copies

To estimate the number of RNA virus genomes, we first summed the RNA mass within a narrow buoyant density range (between the solid vertical lines, Figure 2). Although RNA viruses can have densities outside of that range, we used this conservative window because we analyzed the RNA in only those fractions by metagenomic analysis. Using the relative representation of different taxa in the metagenomic library and the average genome sizes for those taxa derived from data in the Ninth Report of the International Committee on Taxonomy of Viruses (King et al., 2012), we calculated a weighted average RNA mass per virion. For our prima facie estimates of the RNA virus contribution, we assumed that the sequences having no significant hit to any sequences in GenBank (either directly or by association with other sequences in a contig) had the same taxonomic distribution as the collection of sequences that did have significant BLAST hits. The portion of the total RNA determined to be viral was then divided by the weighted average mass of RNA per virion to obtain the number of RNA-containing viruses. Errors were calculated (and propagated) as the 95% CI of the inverse predictions of RNA mass from the standard curves.
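The conversion from RNA mass to virion numbers can be sketched as follows. Because sequencing reads scale roughly with RNA mass rather than with virion number, this Python sketch combines taxa as a mass-weighted harmonic mean of per-virion RNA mass; the averaging actually used is not stated, so that choice is an assumption, as is the 340 g mol⁻¹ average mass per ribonucleotide (a common approximation). Renormalizing the read fractions over assigned viral reads implements the stated assumption that unmatched reads share the same taxonomic distribution. All names are hypothetical.

```python
AVOGADRO = 6.02214076e23  # mol^-1
RNA_NT_MASS = 340.0  # g mol^-1 per ribonucleotide (approximation, assumed)


def rna_mass_per_virion(genome_nt):
    """RNA mass (g) in one virion with a genome of genome_nt nucleotides."""
    return genome_nt * RNA_NT_MASS / AVOGADRO


def mean_rna_mass_per_virion(read_fractions, genome_nt):
    """Mass-weighted harmonic mean RNA mass per virion (g).

    read_fractions: taxon -> share of reads assigned to that viral taxon.
    genome_nt: taxon -> representative genome length (nt).
    Fractions are renormalized, so unassigned reads are implicitly
    distributed like the assigned ones.
    """
    total = sum(read_fractions.values())
    virions_per_gram = sum(
        (frac / total) / rna_mass_per_virion(genome_nt[taxon])
        for taxon, frac in read_fractions.items()
    )
    return 1.0 / virions_per_gram


def rna_virion_count(viral_rna_g, read_fractions, genome_nt):
    """Number of RNA virions represented by viral_rna_g grams of viral RNA."""
    return viral_rna_g / mean_rna_mass_per_virion(read_fractions, genome_nt)
```

A simple read-weighted arithmetic mean of genome sizes would give a slightly larger mass per virion, and hence fewer RNA virions, whenever genome sizes differ among taxa.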

To estimate abundance of DNA viruses, we summed the DNA content in the fractions encompassing the main DNA peak in the viral buoyant density range (between the dashed vertical lines, Figure 2) and converted the total DNA mass to numbers of DNA viruses assuming an average DNA content per virion of 5.5 × 10⁻¹⁷ g (equivalent to 50 kb of double-stranded DNA), an average that was found to be similar in a wide range of environments (Steward et al., 2000). This value is conservative compared with some other estimates (Brum, 2005; Angly et al., 2009) and therefore potentially overestimates the DNA virus abundance.
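The DNA mass per virion quoted above follows from the genome length. The Python sketch below assumes an average of 660 g mol⁻¹ per base pair of double-stranded DNA (a standard approximation, not stated in the text); it gives about 5.48 × 10⁻¹⁷ g for 50 kb, which rounds to the 5.5 × 10⁻¹⁷ g used here. Function names are hypothetical.

```python
AVOGADRO = 6.02214076e23  # mol^-1
DSDNA_BP_MASS = 660.0  # g mol^-1 per base pair (approximation, assumed)


def dsdna_mass_per_virion(genome_kb):
    """Mass (g) of one double-stranded DNA genome of genome_kb kilobase pairs."""
    return genome_kb * 1e3 * DSDNA_BP_MASS / AVOGADRO


def dna_virion_count(dna_g, mass_per_virion_g=5.5e-17):
    """DNA virions represented by dna_g grams, using the per-virion mass from the text."""
    return dna_g / mass_per_virion_g
```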

We also calculated the contribution of RNA viruses using more extreme assumptions to get a sense of how much higher and lower the percentage might be. For the high estimate, we assumed that the DNA per virion was twofold greater (100 kb, double-stranded DNA) as calculated for marine viral metagenomic data (Angly et al., 2009). For the low estimate, we assumed an average DNA content per virion twofold lower than expected (25 kb, double-stranded DNA), and conservatively assumed that any RNA sequences with no significant match in GenBank were not viral (which reduced the total mass of viral RNA by about half).
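The three scenarios can be expressed as a small calculation. In this hypothetical Python sketch, RNA virion numbers come from viral RNA mass divided by the mean RNA mass per virion. The high estimate doubles the DNA mass per virion; the low estimate halves it and also removes the share of viral RNA mass attributed to unmatched reads (the illustrative parameter `unmatched_fraction`, about 0.5 according to the text). Names and inputs are illustrative only.

```python
BASE_DNA_MASS_G = 5.5e-17  # g DNA per virion (50 kb dsDNA), from the text


def percent_rna(n_rna, n_dna):
    """Percentage of all virions that contain RNA."""
    return 100.0 * n_rna / (n_rna + n_dna)


def rna_virus_scenarios(viral_rna_g, rna_g_per_virion, dna_g, unmatched_fraction):
    """Return (low, base, high) percentages of RNA viruses.

    unmatched_fraction: share of the viral RNA mass attributed to reads
    with no significant GenBank match.
    """
    n_rna = viral_rna_g / rna_g_per_virion
    base = percent_rna(n_rna, dna_g / BASE_DNA_MASS_G)
    # High: twofold more DNA per virion (100 kb), so half as many DNA virions.
    high = percent_rna(n_rna, dna_g / (2.0 * BASE_DNA_MASS_G))
    # Low: twofold less DNA per virion (25 kb) and unmatched RNA treated as non-viral.
    low = percent_rna(
        n_rna * (1.0 - unmatched_fraction), dna_g / (BASE_DNA_MASS_G / 2.0)
    )
    return low, base, high
```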

Results and Discussion

To achieve the sensitivity necessary to assay the nucleic acids in our CsCl-purified viral fractions, we had to employ fluorometry. We performed simple tests of the RNA assay kit to ensure that it would provide reasonable estimates of viral RNA concentration, and that our measurements would not be overestimated by cross-reaction from DNA. With the fluorometer calibrated to the kit standards, the standards themselves result in a linear curve with a significant slope of 1.000 as expected (P<0.001). The assay of a purified ssRNA virus falls very close to the calibration line (low by 14%) suggesting that the kit standards (dilutions of E. coli RNA) are reasonably accurate for quantifying a ssRNA viral genome (Figure 1). DNA was found to have a limited cross-reaction with the RNA stain with a significant (P<0.001) slope of 0.084, which suggests a cross-reaction of 8.4% on average. However, this cross-reaction was effectively removed (slope statistically indistinguishable from zero; P>0.5) by the DNase treatment that we applied to all of our samples before RNA assay (Figure 1).
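The practical significance of an 8.4% cross-reaction depends on how much DNA accompanies the RNA. The minimal Python sketch below assumes, as the linear regression implies, that the cross-reaction scales linearly with DNA mass; the function names and the DNA-to-RNA ratio in the example are hypothetical.

```python
def apparent_rna(true_rna, dna, cross_reaction=0.084):
    """Apparent RNA signal when DNA adds cross_reaction units per unit DNA mass."""
    return true_rna + cross_reaction * dna


def relative_overestimate(true_rna, dna, cross_reaction=0.084):
    """Fractional inflation of the RNA estimate caused by undigested DNA."""
    return (apparent_rna(true_rna, dna, cross_reaction) - true_rna) / true_rna
```

For a hypothetical fraction holding ten times more DNA than RNA, `relative_overestimate(1.0, 10.0)` is about 0.84, an 84% inflation; with the cross-reaction set to zero, as after DNase treatment, it is 0.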

The majority of DNA (83–90%, depending on the sample) fell within a range of buoyant densities from 1.33 to 1.53 g ml⁻¹, with a peak around 1.45 g ml⁻¹ (Figure 2). This buoyant density range and the peak location are similar to previous observations for DNA-containing viruses from other marine environments (Steward et al., 2000) and are well within the range for all known viruses (1.16–1.6 g ml⁻¹; King et al., 2012). RNA concentrations displayed a local peak at the same density, with 32–69% of the total RNA within the same viral range. DNA and RNA were also found in higher density fractions nearer to, or at, the bottom of the gradient. As these fractions are outside the known range of viral buoyant densities, data from them were not included in subsequent analyses. This resulted in the exclusion of a higher percentage of the RNA than of the DNA.
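The percentages of DNA and RNA within the viral density range are mass fractions over a density window. A short Python sketch with hypothetical inputs; the default bounds are the 1.33–1.53 g ml⁻¹ DNA window given above, and the narrower RNA window would be passed explicitly.

```python
def fraction_in_window(densities, masses, lo=1.33, hi=1.53):
    """Fraction of total nucleic acid mass in gradient fractions with lo <= density <= hi (g/ml)."""
    total = sum(masses)
    inside = sum(m for d, m in zip(densities, masses) if lo <= d <= hi)
    return inside / total
```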

For the metagenomic libraries prepared from RNA within the viral density range (Figure 2), the number of quality-controlled, de-replicated reads was 139 801 and 110 140 for the 2009 and 2010 samples, respectively (Table 1). The majority of reads (69–78%) in both libraries formed contigs, with maximum lengths of 9378 bp for August and 9480 bp for June. After assembly and classification of the sequences in the RNA viral metagenomes, roughly half (50–57%) of the reads derived from the designated viral fractions were most similar to known eukaryote-infecting RNA viruses, with the majority matching positive-sense ssRNA viruses in the order Picornavirales (Table 2). The percentage of sequences matching known double-stranded RNA viruses was very small (0.02–1.2%). The calculated weighted average mass of RNA per virion was 5.38 × 10⁻¹⁸ g in August and 5.25 × 10⁻¹⁸ g in June (Table 2), which translate into weighted average genome sizes of 9528 nt and 9301 nt, respectively. The average of these (9414 nt) is very similar to the maximum contig lengths observed in each library and somewhat larger than the three complete RNA virus genomes (ranging from 4449 to 9212 nt) assembled from a sample from coastal British Columbia (Culley et al., 2007).
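The conversion from mass per virion to genome size can be checked with a short calculation, assuming an average mass of about 340 g mol⁻¹ per ribonucleotide (an assumed approximation, not stated in the text) and the Avogadro constant N_A. For the August library:

```latex
g = \frac{m\,N_A}{M_{\mathrm{nt}}}
  = \frac{(5.38\times10^{-18}\,\mathrm{g})\,(6.022\times10^{23}\,\mathrm{mol^{-1}})}{340\,\mathrm{g\,mol^{-1}}}
  \approx 9.5\times10^{3}\ \mathrm{nt}
```

This agrees with the 9528 nt reported for August to within rounding.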

Table 1 Number and length (in nucleotides, nt) of initial quality-controlled reads and of contigs after assembly for the August 2009 and the June 2010 libraries, % G+C content of each library, and the percentage of reads in each library that were matched with at least one other sequence to form a contig
Table 2 The percentage of reads in each of two libraries (August 2009 and June 2010) that matched known viruses, that had no significant match in the sequence databases, or were most similar to sequences derived from cells

Less than half of the reads (41–43%) had no significant matches in GenBank. We presume that most of these unidentified sequences are viral as well, because many otherwise unidentifiable sequences were found to assemble with virus-like sequences and the fraction of reads identified as cellular was small (Table 2). Estimates of cellular contamination in each library based on blastn comparisons with the SILVA database (6.1% and 0.7%) were nearly identical to those based on blastx comparisons with the GenBank nr database (6.7% and 0.8% of the sequences). The reads identified as cell derived by the former method were a subset of those identified by the latter. The identification of all of the putative ribosomal RNA reads by blastx was somewhat surprising, but appears to result from mis-annotation of ribosomal RNA genes as hypothetical proteins in GenBank. As the blastx annotations used for MEGAN were more comprehensive in the assignment of cell-derived reads, the larger percentages from that method were accepted as correct. After adjusting the RNA mass estimates to discount the contribution from cellular RNA, our estimates put the contribution of RNA viruses to the total number of viruses at 38–63% (Table 3). By applying more conservative and more liberal assumptions, we calculated extreme low and high estimates that ranged from 15% to 77% (Table 3).

Table 3 Relative abundance of RNA viruses in coastal waters

The absolute numbers for both types of viruses in our final samples are minimum estimates, because of losses during processing. Assuming, on the basis of epifluorescence microscopy, that typical viral concentrations in these waters are on the order of 0.5–1 × 10¹⁰ per liter, the overall final yields of DNA viruses were on the order of 17–33% (June) and 5–9% (August), although these estimates have considerable uncertainty. In the absence of data to the contrary, we assume these losses to be similar for RNA and DNA viruses. If DNA viruses were preferentially lost, this would have led to overestimates of the contribution of RNA viruses. Some of the largest viruses (especially those >0.2 μm in diameter) will be lost by the initial 0.2 μm filtration, and all of the largest viruses known so far contain double-stranded DNA (King et al., 2012), suggesting that the purification procedure will have some bias against DNA viruses. However, the numerical contribution of viruses >0.2 μm to the total virioplankton appears to be low based on quantitative surveys using electron microscopy (Bratbak et al., 1992; Cochlan et al., 1993; Wommack and Colwell, 2000; Auguet et al., 2006). The viral concentration method we used has not been tested specifically on RNA viruses, but is reported to be exceptionally efficient for DNA viruses (John et al., 2011). We were also more conservative in our assignment of RNA as viral compared with DNA. From the above considerations, we feel it is unlikely that we have grossly overestimated the contribution of RNA viruses, but we cannot conclusively rule out the possibility that our procedure is significantly biased against DNA viruses.
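The yield estimates amount to dividing the number of virions recovered by the number expected in the volume processed. A hypothetical Python sketch, with the ambient concentration bracketed by the 0.5–1 × 10¹⁰ per liter range assumed in the text; the recovered count and volume are caller inputs, and the function name is illustrative.

```python
def yield_range(recovered_virions, volume_l, ambient_low=0.5e10, ambient_high=1e10):
    """Return (min, max) fractional recovery for ambient concentrations in [low, high] per liter."""
    return (
        recovered_virions / (volume_l * ambient_high),  # higher ambient -> lower yield
        recovered_virions / (volume_l * ambient_low),
    )
```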

If RNA viruses are as abundant as our data suggest, this would have important consequences for our understanding of marine viral ecology. The data imply, for example, that eukaryotic viruses can be just as abundant as bacteriophages in coastal ocean waters, even though eukaryotic plankton concentrations are orders of magnitude lower than those of bacteria. It seems that the much larger burst sizes of eukaryotic RNA viruses (thousands to tens of thousands; Lang et al., 2009) relative to those of bacteriophages (tens to hundreds; Wommack and Colwell, 2000) compensate for the lower host abundances. This is consistent with earlier theoretical work based on mass transport calculations (Murray and Jackson, 1992). At the time of that report, only DNA viruses with relatively small burst sizes were known to infect marine protists. As a consequence, the authors tentatively concluded that the viruses in seawater primarily infect bacteria. They pointed out, however, that small viruses having a large burst size (that is, viruses like the RNA viruses that we now know exist) could make a large contribution to protistan mortality.
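The compensation argument can be made explicit with a simplified balance. Assume (our simplification, not a model from the text) that virus production scales with the number of hosts lysed, H, times the burst size, β, that comparable fractions of each host community are lysed, and, for standing stocks, that decay rates are similar:

```latex
\frac{P_{\mathrm{euk}}}{P_{\mathrm{bact}}} \approx
  \frac{H_{\mathrm{euk}}}{H_{\mathrm{bact}}}\cdot
  \frac{\beta_{\mathrm{euk}}}{\beta_{\mathrm{bact}}},
\qquad
\frac{\beta_{\mathrm{euk}}}{\beta_{\mathrm{bact}}} \sim
  \frac{10^{3}\text{--}10^{4}}{10^{1}\text{--}10^{2}} = 10^{1}\text{--}10^{3}
```

With burst-size ratios spanning roughly 10 to 10³, a host deficit of one to three orders of magnitude could in principle be offset.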

As RNA viruses and single-stranded DNA viruses are not reliably detected with the current routine methods for viral direct counts, our data also suggest that many rate estimates that depend on fluorescence-based viral direct counts, such as viral turnover times, virus–host contact rates and viral production rates, may be in need of revision. Development of new methods to directly count even the smallest viruses would be helpful in better constraining the rates of these important processes. In the meantime, the approach we described here provides a means to estimate the relative contribution of RNA viruses in natural aquatic habitats to determine whether our observations are more broadly representative of marine and freshwater habitats.