Abstract
Picorna-like viruses are a loosely defined group of positive-sense single-stranded RNA viruses that are major pathogens of animals, plants and insects. They include viruses that are of enormous economic and public-health concern and are responsible for animal diseases (such as poliomyelitis1), plant diseases (such as sharka2) and insect diseases (such as sacbrood3). Viruses from the six divergent families (the Picornaviridae, Caliciviridae, Comoviridae, Sequiviridae, Dicistroviridae and Potyviridae) that comprise the picorna-like virus superfamily4 have the following features in common: a genome with a protein attached to the 5' end and no overlapping open reading frames, all the RNAs are translated into a polyprotein before processing, and a conserved RNA-dependent RNA polymerase (RdRp) protein. Analyses of RdRp sequences from these viruses produce phylogenies that are congruent with established picorna-like virus family assignments5, 6, 7; hence, this gene is an excellent molecular marker for examining the diversity of picorna-like viruses in nature. Here we report, on the basis of analysis of RdRp sequences amplified from marine virus communities, that a diverse array of picorna-like viruses exists in the ocean. All of the sequences amplified were divergent from known picorna-like viruses, and fell within four monophyletic groups that probably belong to at least two new families. Moreover, we show that an isolate belonging to one of these groups is a lytic pathogen of Heterosigma akashiwo, a toxic-bloom-forming alga responsible for severe economic losses to the finfish aquaculture industry, suggesting that picorna-like viruses are important pathogens of marine phytoplankton.
Viruses are extremely abundant and geochemically significant agents of microbial mortality in the ocean8, 9. They comprise a morphologically and genetically diverse array of pathogens, some of which infect heterotrophic bacteria and cyanobacteria, as well as photosynthetic and non-photosynthetic protists10, 11, 12. On the basis of a variety of evidence, marine viral communities have been assumed to consist almost entirely of double-stranded DNA viruses; consequently, little effort has been made to examine natural communities of RNA viruses. There are data, however, indicating that marine RNA viruses are also important. For example, rhabdoviruses and paramyxoviruses are negative-sense ssRNA viruses that are major pathogens of fish13 and marine mammals14, respectively. In addition, picorna-like viruses have been isolated that are pathogens of penaeid shrimp7, seals15 and whales15.
We used available sequences in GenBank to design degenerate primers that target the highly conserved RdRp sequence in picorna-like viruses. These primers in conjunction with RT–PCR were used to assay for the presence of picorna-like viruses in the coastal waters of British Columbia, Canada. With the exception of retroviruses, all RNA viruses encode an RdRp, which is essential for replication. Within the RdRp protein sequence, several motifs have been identified which are homologous among diverse species of RNA viruses5, 16. Groups based on alignments of these conserved regions are congruent with presently defined RNA virus families6, 17. Families of picorna-like viruses classified on the basis of RdRp sequence data are congruent with families of picorna-like viruses classified according to virus structure, host and epidemiology. Consequently, sequence analysis can be used to infer the relationship of sequences from natural viral communities to known families of picorna-like viruses.
Viruses were concentrated from sea water using ultrafiltration18 in the spring and summer of 1996, 1997 and 2000. Viral RdRp sequences were amplified from 13 of the 21 viral communities examined. Amplification occurred in samples collected from oceanographically diverse environments, including anthropogenically influenced sites, estuarine environments, stations with a well-mixed water column and pristine, highly stratified fjords. Preliminary analysis of the environmental sequences by BLAST19 searches of the GenBank database gave high similarities to several picorna-like virus family RdRp sequences as well as an unidentified viral sequence from a Chinese clam homogenate20. These results suggested that picorna-like viruses are prevalent in a variety of marine waters.
To confirm that the amplified products originated from picorna-like viruses, a selection of the amplicons was sequenced and, along with representative sequences from known picorna-like virus families, used to construct phylogenetic trees. All sequenced environmental PCR products translated into continuous amino acid sequences that contained the signature positive-stranded ssRNA virus RdRp motif GDD21. Phylogenetic analysis of the RdRp fragment amplified in this study resolved all established picorna-like families (Fig. 1). Strikingly, none of the environmental sequences fell within established families of picorna-like viruses, but rather into four previously unknown and distantly related groups that we refer to as A, B, C and D. Our results suggest that at least two and possibly all four of these groups represent new families of RNA viruses. Phylogenetic trees constructed with both maximum-likelihood and neighbour-joining methods group the environmental sequences outside established picorna-like virus families (Fig. 1). However, low bootstrap support prevents us from drawing any further conclusions regarding the evolutionary relationship between groups B, C and D. Interestingly, a single sample (JP800) collected from English Bay, which is adjacent to the Strait of Georgia and the city of Vancouver, contained representatives from three of the four novel clades, indicating that a high diversity of picorna-like viruses exists even within a single water sample.
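The screening criterion described above (a continuous translation that contains the GDD motif) can be sketched in a few lines of Python. This is an illustration only, not the procedure used in this study: the function names `translate` and `looks_like_rdrp_fragment` are hypothetical, the standard genetic code is assumed, the reading frame is assumed to be known, and the sequences in the usage note are made-up toy strings rather than environmental amplicons.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumed here).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nucleotides, frame=0):
    """Translate a nucleotide string in one frame; '*' marks a stop codon."""
    seq = nucleotides.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    # Codons with ambiguous bases are reported as 'X'.
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def looks_like_rdrp_fragment(nucleotides, frame=0, motif="GDD"):
    """True if the translation has no stop codon and contains the motif."""
    protein = translate(nucleotides, frame)
    return "*" not in protein and motif in protein
```

For example, `translate("TATGGTGATGACAAT")` gives `"YGDDN"`, so the toy sequence passes the check, whereas a sequence with an in-frame stop codon or without GDD does not.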
Figure 1: Maximum-likelihood tree of RdRp sequences from environmental amplicons and representative viruses from picorna-like virus families.

(See Methods for complete virus names). Viruses from the Potyviridae, which contain RdRp sequences from a different lineage5, were used as an outgroup. Family names and group letters are shown. Environmental amplicons from coastal British Columbia are labelled by a two- or three-letter station designation, month, year, group sequence number and GenBank database accession number (SSSMYY-AA, see also Supplementary Information). TREE-PUZZLE support values are shown for relevant nodes in boldface followed by bootstrap values based on neighbour-joining analysis. N indicates there was no corresponding node in the neighbour-joining tree. The maximum-likelihood distance scale bar indicates a distance of 0.1.
Sequence identity among the clades of environmental picorna-like virus sequences was low, ranging from 38.9% to 54.6% nucleotide identity and 21.8% to 52.9% identity when translated to amino acids. In contrast, sequences within each clade were highly conserved and ranged from 97.7% to 100.0% and 95.9% to 100.0% identical on nucleotide and amino-acid levels, respectively. This is more variability than would be expected from potential errors in RT–PCR and sequencing; therefore, we are confident that despite the high identity within these groups, the sequences represent a number of different viruses. Moreover, within group A, although no sequences were identical on a nucleotide level, six were identical when translated into amino acids; this suggests that the observed nucleotide differences are real and not due to methodological error.
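Percentage identity of the kind reported above can be computed from a pairwise alignment as follows. This is a minimal sketch under stated assumptions: the function name `percent_identity` is hypothetical, the two sequences are assumed to be already aligned to equal length, and columns containing a gap are skipped, which may differ from the convention used to obtain the values in this study.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percentage of identical positions between two aligned sequences.

    Columns in which either sequence has a gap are ignored (an assumption).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared
```

The same function works for nucleotide and amino-acid alignments; for instance, two aligned 10-residue strings that differ at one position are 90.0% identical.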
Of the four groups of picorna-like viruses discovered in this study, one (group A) contains a virus (HaRNAV22) that causes the lysis of Heterosigma akashiwo, a toxic-bloom-forming alga that is responsible for major fish deaths in temperate waters23. Nucleotide sequences from HaRNAV were 98.8% to 99.7% identical to sequences from group A, implying that there are a number of viruses closely related to HaRNAV that belong within this group. Interestingly, we were able to amplify sequences belonging to group A from samples collected in different locations, seasons and years, suggesting that HaRNAV-like viruses are recurrent and widely distributed in the Strait of Georgia.
Our results indicate that there is a diverse but previously unknown community of picorna-like viruses that is persistent and widespread in the ocean. The fact that sequences from four stations yielded at least two novel families of picorna-like viruses suggests that the diversity of these viruses in the ocean is high. Branch lengths between groups B, C and D are similar to those among families and genera of known picorna-like viruses (Fig. 1), suggesting that these groups represent at least three novel genera within a new family or may, in fact, represent three previously unknown families.
Viruses are obligate pathogens that generally remain infectious for a relatively short time in natural marine waters24. The repeated amplification of picorna-like virus sequences from the same geographic location in water samples collected over a four-year period implies persistent viral production and therefore infection. The significant and high degree of identity between a sequence amplified from clams harvested from Asian waters and marine group B sequences from this study suggests that picorna-like viruses are likely to be present in a wide range of marine environments. Furthermore, the few isolates of marine picorna-like viruses infect ecologically and economically important organisms. These include HaRNAV22, which infects the red-tide-forming, fish-killing raphidophyte Heterosigma akashiwo; Taura syndrome virus (TSV), which infects penaeid shrimp7, an intensely farmed, important member of the marine food web; and San Miguel sea lion viruses (SMSVs), which infect pinnipeds such as the California sea lion (Zalophus californianus)25. Ultimately, these newly identified viruses should be isolated and sequenced in full, and their hosts identified. In the terrestrial environment, picorna-like viruses infect a wide variety of organisms and are responsible for several important diseases; it seems likely that they will prove to be important players in marine ecosystems as well.
Methods
Sample collection and preparation
Viruses from 40 to 200 litres of sea water were concentrated from stations in and adjacent to the Strait of Georgia, British Columbia, between May and August during 1996, 1997 and 2000 aboard the CCGS Vector as described18. Two millilitres from each concentrated viral community was pelleted by ultracentrifugation, and RNA was extracted from the resuspended pellets using Trizol-LS (Invitrogen) as per the manufacturer's protocol.
RT–PCR
The degenerate primers RdRp 1 (negative-strand, 5'-GGA/GGAC/TTACAG/CCIA/GA/TTTTGAT-3') and RdRp 2 (5'-A/CACCCAACG/TA/CCG/TCTTG/CAA/GA/GAA-3') were designed from an alignment of the putative RdRp sequences from several picorna-like viruses in the GenBank database. The 454-base pair (bp) target fragment includes a highly conserved amino acid motif (GDD) characteristic of the RdRp of positive-strand RNA viruses21, which allows the identity of environmental PCR products to be confirmed.
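The slash notation used for the primers above, in which "A/G" denotes a position that is either A or G, can be converted to single-letter IUPAC ambiguity codes, and the number of distinct oligonucleotides in each primer pool can be counted, with a short sketch such as the following. The function names are hypothetical, the input is assumed to be well formed (no leading or trailing slash), and inosine (I) is treated as a single universal base that contributes a factor of 1 to the degeneracy, which is an assumption of this example.

```python
# IUPAC ambiguity codes keyed by the set of bases they represent.
# 'I' (inosine) is kept as its own symbol (an assumption of this sketch).
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G",
    frozenset("T"): "T", frozenset("I"): "I",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def parse_primer(slash_notation):
    """Split e.g. 'GGA/GG' into per-position base sets: G, G, {A,G}, G."""
    positions = []
    i = 0
    while i < len(slash_notation):
        options = {slash_notation[i]}
        i += 1
        # Each '/' attaches the following base to the current position.
        while i + 1 < len(slash_notation) and slash_notation[i] == "/":
            options.add(slash_notation[i + 1])
            i += 2
        positions.append(frozenset(options))
    return positions


def to_iupac(slash_notation):
    """Rewrite a slash-notation primer with IUPAC ambiguity codes."""
    return "".join(IUPAC[p] for p in parse_primer(slash_notation))


def degeneracy(slash_notation):
    """Number of distinct sequences in the degenerate primer pool."""
    total = 1
    for options in parse_primer(slash_notation):
        total *= len(options)
    return total


RDRP_1 = "GGA/GGAC/TTACAG/CCIA/GA/TTTTGAT"
RDRP_2 = "A/CACCCAACG/TA/CCG/TCTTG/CAA/GA/GAA"
```

Under these assumptions, RdRp 1 reads `GGRGAYTACASCIRWTTTGAT` (21 positions, 32-fold degenerate) and RdRp 2 reads `MACCCAACKMCKCTTSARRAA` (21 positions, 128-fold degenerate).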
Complementary DNA was synthesized with SuperScript II RNaseH- Reverse Transcriptase (Invitrogen) with the reagents provided using 5 µl of extracted RNA and the positive-strand primer RdRp 2. Subsequently, PCR was performed with RdRp 1 and RdRp 2 primers. PCR products were separated on a 1.5% agarose gel and bands of the appropriate size (approximately 500 bp) were excised. Washed agarose plugs were used as the template in a second round of PCR with the RdRp primer set. Positive and negative controls were done in parallel for the entire procedure. These RdRp fragments were either directly cloned or separated using denaturing gradient gel electrophoresis (DGGE).
DGGE was conducted using 25% to 40% linear denaturant gradient, 7% to 8% linear polyacrylamide gradient gels, as described26. DGGE bands were excised, re-suspended and amplified in a third round of PCR with RdRp primers.
Cloning and sequencing
Second-round PCR fragments or re-amplified excised DGGE bands were cloned in the pGEM-T (Promega) vector using the manufacturer's protocol. Recombinants containing the cloned insert were identified using PCR with universal -21M13 (5'-GTTTTCCCAGTCACGACGTTGTA-3') and M13R (5'-CAGGAAACAGCTATGACC-3') primers. Cloned, second-round PCR products were screened by restriction endonuclease digestion. PCR products from plasmids with DGGE bands and second-round PCR inserts with unique digestion patterns were sequenced. PCR fragments were sequenced at the University of British Columbia Nucleic Acid and Protein Service Facility (Vancouver, Canada). Conserved regions of translated sequences were aligned with CLUSTAL X v1.81 (ref. 27) and then transformed into maximum-likelihood distances using the WAG matrix28 in TREE-PUZZLE v5.0 (ref. 29) and 25,000 puzzling steps. Neighbour-joining bootstrap values were calculated based on 1,000 replicates using FITCH v.3.6 (ref. 30). The name, acronym and accession number of viruses used in Fig. 1 are: Aichi virus (AiV), NC_001918; broad bean wilt virus 2 (BBWV2), AB013615; cowpea mosaic virus (CPMV), NC_003549; cricket paralysis virus (CrPV), NC_003924; Drosophila C virus (DCV), NC_001834; encephalomyocarditis virus (EMCV), NC_001479; equine rhinitis B virus (ERBV), NC_003983; foot-and-mouth disease virus (FMDV), NC_004004; human rhinovirus A (HRV), NC_001490; maize chlorotic dwarf virus (MCDV), NC_003626; parsnip yellow fleck virus (PYFV), NC_003628; poliovirus (PV), NC_002058; porcine teschovirus (PTV), NC_003985; potato virus Y (PVY), NC_001616; rabbit haemorrhagic disease virus (RHDV), NC_001543; rice tungro spherical virus (RTSV), NC_001632; ryegrass mosaic virus (RGMV), NC_001814; Sapporo virus (SV), U65427; sweet potato mild mottle virus (SPMMV), NC_003797; swine vesicular exanthema virus (VESV), NC_002551; Taura syndrome virus (TSV), NC_003005; tomato ringspot virus (ToRSV), NC_003840; wheat streak mosaic virus (WSMV), NC_001886.
We determined an error rate of 0.3% to 0.9% based on five independent amplifications using RT–PCR of a picorna-like virus isolate, implying that sequences less than 99.0% identical are probably different.
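To put these percentages in terms of base positions, the following sketch converts the stated error rates and the 99.0% cut-off into counts over a fragment of a given length. The function names are hypothetical, and applying the calculation to the full 454-bp target fragment is an assumption made for illustration; the length actually compared in this study (for example, after excluding primer sequences) may differ.

```python
def expected_errors(length_bp, error_rate_pct):
    """Expected number of erroneous positions at a given percentage error rate."""
    return length_bp * error_rate_pct / 100.0


def min_mismatches_below(length_bp, identity_threshold_pct=99.0):
    """Smallest number of mismatches that drops identity below the threshold."""
    for mismatches in range(length_bp + 1):
        if 100.0 * (length_bp - mismatches) < identity_threshold_pct * length_bp:
            return mismatches
    return None  # threshold can never be undercut (e.g. threshold of 0)


FRAGMENT_BP = 454  # assumed comparison length, taken from the target fragment size
low = expected_errors(FRAGMENT_BP, 0.3)       # about 1.4 positions
high = expected_errors(FRAGMENT_BP, 0.9)      # about 4.1 positions
cutoff = min_mismatches_below(FRAGMENT_BP)    # 5 mismatches
```

Under this assumption, methodological error would account for roughly one to four differing positions, whereas two sequences fall below 99.0% identity once they differ at five or more positions.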
