Identification and genome analysis of a novel picornavirus from captive belugas (Delphinapterus leucas) in China

The discovery of new viruses is important for predicting their potential threats to the health of humans and other animals. A novel picornavirus was identified from oral, throat, and anal swab samples collected from belugas (Delphinapterus leucas), from Dalian Sun Asia Tourism Holding Co., China, between January and December 2018, using a metagenomics approach. The genome of this novel PicoV-HMU-1 strain was 8197 nucleotides (nt) in length, with a open reading frame (from 1091 to 8074 nt) that encoded a polyprotein precursor of 2328 amino acids. Moreover, the genomic length and GC content of PicoV-HMU-1 were within the ranges found in other picornaviruses, and the genome organization was also similar. Nevertheless, PicoV-HMU-1 had a lower amino acid identity and distinct host species compared with other members of the Picornaviridae family. Phylogenetic trees were constructed based on the P1 and 3D amino acid sequences of PicoV-HMU-1 along with representative members of the Picornaviridae family, which showed that PicoV-HMU-1 was related to unclassified bat picornaviruses groups. These findings suggest that the PicoV-HMU-1 strain represents a potentially novel genus of picornavirus. These data can enhance our understanding of the picornavirus genetic diversity and evolution.

. The 5′-untranslated region (UTR) of picornaviruses is a structured functional region that contains an internal ribosomal entry site (IRES), which mediates the end-independent initiation of translation 3 . The single ORF encodes a polyprotein precursor that is divided into three regions (P1, P2, and P3), of which P1 encodes the capsid proteins, whereas P2 and P3 encode proteins involved in proteolytic processing and viral replication. The P1 region is derived from VP4 to VP1, the P2 region is derived from 2A to 2C, and the P3 region is derived from 3A to 3D 1 . Advances in metagenomics have allowed the identification of new viral genomic sequences with clinical and epidemiological value. Metagenomic technology has facilitated virus discovery due to the enhanced capacity to detect viruses with an unknown genetic background. Recently, several novel picornaviruses were discovered within different hosts, including a novel picornavirus from Macaca mulatta in China identified by viral metagenomic analysis 4 . Furthermore, a novel bovine picornavirus was discovered in Japan 5 , and several picornaviruses, including three bat, one feline, and one canine picornaviruses were discovered in Hong Kong [6][7][8] . The discovery of novel picornavirus provides the foundation for virus traceability and evolutionary research. Overall, approximately 75% of emerging diseases are estimated to arise from zoonotic sources 9 . Zoological parks and aquariums provide a unique opportunity for emerging virus surveillance. The discovery of novel viruses may help predict their potential threat to the health of humans and other animals. In this report, we analyzed by high-throughput next-generation sequencing (NGS) a novel picornavirus identified in belugas (Delphinapterus leucas) from an aquarium.

Results
Metagenomic analysis. In total, 28 oral, throat, and anal swab samples from 12 captive belugas (Delphinapterus leucas) were acquired and were combined into two pools. We found new viruses for construction and NGS in one of the pools. A total of 1.91 Gb of nucleotide data (19,732,441 valid reads, 100 bp in length) were obtained. Reads that were classified as being from cellular organisms (such as bacteria, archaea, and eukaryotes) and those with no significant similarities to any amino acid (aa) sequences in the non-redundant (NR) nucleotide database were removed. The remaining 7517 reads were best-matched with viral proteins from the NR database, which included the families Herpesviridae, Podoviridae, Microviridae, Picornaviridae, Papillomaviridae, environmental samples viruses, ssRNA viruses, unclassified + ssRNA viruses, unclassified RNA viruses, and unclassified bacterial viruses (Fig. 1). There were 60 Picornaviridae-associated reads in the lane. The reads mapped to the picornaviral genome are shown in Supplementary Fig. S1 online.
Genome analysis. Based on the NGS-generated fragment sequences, primers were designed to obtain the intervening portions of the genome. The terminal sequences were then acquired using a 5' and 3' rapid amplification of cDNA ends (RACE) method. We obtained a picornaviral genome sequence, named the PicoV-HMU-1 strain, which was 8,197 bases in length and with a G + C content of 43.4% after exclusion of the polyadenylated tract. Both the 5'-(1090 bases) and 3'-ends (123 bases) of the genome contained UTRs. The genome contained a ORF of 6,984 bases that encoded a polyprotein precursor of 2328 aa. The sequenced PicoV-HMU-1 genome was predicted to encode the P1 (843 aa), P2 (694 aa), and P3 (791 aa) regions, which potentially encode the capsid proteins VP4, VP2, VP3, and VP1; the non-structural proteins 2A, 2B, and 2C, and 3A, 3B, 3C, and 3D, www.nature.com/scientificreports/ respectively (Fig. 2). The 5′-UTR of PicoV-HMU-1 was 1090 nt in length. According to the blast analysis, PicoV-HMU-1 shared high sequence identity with Bat picornavirus(HQ595344). We selected 362-1090 bp to analyse the secondary structure by comparison with the complete sequence of HQ595344, and hence we performed the secondary structure prediction shown in Fig. 3.
Phylogenetic analyses. The phylogenetic trees were constructed based on the P1 and 3D aa sequences of PicoV-HMU-1 along with representative members of the Picornaviridae family and other unclassified viruses. In the P1 trees, PicoV-HMU-1 was close to unclassified bat picornavirus 3 and la_io_picornavirus_1 and formed a monophyletic branch (Fig. 4). In the 3D trees, PicoV-HMU-1 was also related to the unclassified bat picornaviruses 3 and formed an independent cluster (Fig. 5).

Discussion
Previous viral outbreaks in marine mammals (such as morbillivirus infections in European seals and Eastern U.S. dolphins) have temporarily reduced their populations 10 . Morbillivirus infections have also been detected in marine mammals in the Canadian arctic, as evidenced by the presence of neutralizing antibodies in walruses 11 . In 2007, researchers in California identified the first sequence-confirmed picornavirus isolated from ringed seals, which was proposed to represented a member of a new picornavirus genus in the Picornaviridae family 12 .
In the last year, Australian researchers demonstrated that the novel marine flaviviruses identified in crustaceans were more closely related to the terrestrial vector-borne flaviviruses than classical insect-specific flaviviruses 13 . Although their clinical relevance remains unclear, identification of novel viruses and their respective animal hosts is important for a better understanding of their genetic diversity, evolution, biology, and potential for cross-species transmission and emergence.
In this study, we discovered a new picornavirus within oral, throat, and anal swab samples from white whales using metagenomic technology. Genomic characterization revealed that the structural features of PicoV-HMU-1 are similar to those of other Picornaviridae family members, with its genomic length and GC content being  www.nature.com/scientificreports/ within the ranges of those found in picornaviruses. Moreover, the genome organization was similar to that of other picornaviruses, with the characteristic gene order: (with 3C pro and 3D pol being 3C protease and 3D polymerase, respectively). Picornaviruses encode extensive RNA stem-and-loop structures in the 5′-UTR regions that are critical for viral replication. The 5′-UTR of PicoV-HMU-1 was 1090 nt in length. According to the Bat picornavirus(HQ595344), We performed the secondary structure prediction with the secondary structure region (362-1090 bp). The secondary structure of the PicoV-HMU-1 sequence is different from that of all currently known IRESs (Fig. 3). At the nucleic acid level, the sequence was barely comparable with that of other known picornaviruses in the National Center for Biotechnology Information nucleotide database. Therefore, we suggest that this secondary structure is of a novel type of IRES. Pair-wise alignment of the aa sequences of PicoV-HMU-1 with those of other picornaviruses   Table 2). Phylogenetic analysis confirmed that PicoV-HMU-1 was related to unclassified bat picornaviruses groups. Therefore, it is likely that these viruses may have shared a common ancestor, and started to evolve independently after terrestrial mammals entered the ocean. Moreover, these results suggest that the host of the novel picornavirus was exclusively the white whale. According to our findings, PicoV-HMU-1 has a lower aa identity and a distinct host species compared with other members of the Picornaviridae family. Thus, we propose that the novel PicoV-HMU-1 is a new picornavirus  www.nature.com/scientificreports/ Viral nucleic acid library construction and next-generation sequencing. All 28 samples were combined into two pools based on sampling time and were passed through 0.45 µm filters. The collected filtrates were digested with DNase (Applied Biosystems, Waltham, MA, USA) to remove unprotected nucleic acids. Total RNA was extracted using a QIAamp Viral RNA Mini Kit (Qiagen, Hilden, Germany) according to the manufacturer's instructions. cDNA was generated using Superscript III Reverse Transcriptase (Invitrogen, Waltham, MA, USA). Amplified viral nucleic acid libraries were analyzed on an HiSeq2500 sequencer (Illumina, San Diego, CA, USA) for a single read of 100 bp in length. The raw sequence reads were filtered using previously described criteria to obtain valid sequences 14 .
Taxonomic assignment. Sequence similarity-based taxonomic assignments were conducted as previously described 14 . Briefly, each read was evaluated for viral origin by conducting alignments with the non-redundant (NR) nucleotide and protein databases of the National Center for Biotechnology Information, using BLASTn and BLASTx, respectively (Expected value < 10 −5 , F: Filter query sequence, default = T). Taxonomies of alignedreads with the best BLAST scores (E-value < 10 −5 ) from all lanes were parsed and exported using the Metagenome Analyzer (MEGAN) 6 15 .
Genome sequencing. Molecular clues provided by metagenomic analyses were used to classify sequence reads into a virus family or genus using MEGAN 6. Viral RNA was isolated using QIAamp Viral RNA Mini Kit (Qiagen), and cDNA was generated using random primer and Superscript III Reverse Transcriptase (Invitrogen) according to the manufacturer's instructions. Gene-specific primers were designed based on representative identified reads of the novel picornavirus while ensuring that they covered the partial genomes by nested-PCR amplification and Sanger sequencing. The remaining genomic sequences were obtained by amplification using a genome-walking kit (Takara, Kusatsu, Japan), and the full 5′-and 3′-end sequences were obtained by repeated amplification using a 5′-RACE system version 2.0 combo (Invitrogen) and 3′-Full RACE Core Set with Prime-Script RTase (Takara) according to the manufacturers' instructions. All primer sequences were based on the newly obtained reads and newly amplified sequences. All primers used are shown in Supplementary Table S1 online.

Detection of picornavirus.
Using the genomic sequences of the viruses obtained by amplification of the ends, specific primers for nested-PCR were designed to screen for picornavirus in the 28 swabs samples collected from 12 captive belugas. PCRs were performed using KOD One PCR Master Mix (Takara), and the first PCR round was primed with outer primers (F: 5′-AGC AGT TAC CTT GCC CAC G-3′ and R: 5′-TCC CTG CTC GCA CCTTG-3′), and the second PCR round was primed with inner primers (F: 5′-CGC CTG AGA CTG GTGT-3′ and R: 5′-TTG CCA TTG GGT GTAA-3′). PCR product (1 µL) from the first round was used as template for the second round reaction. The thermal cycling conditions for the first round PCR were 98 ℃ for 5 min, followed by 35 cycles at 98 ℃ for 10 s, 50 ℃ for 5 s, 68 ℃ for 5 s, and a final elongation step at 68 ℃ for 1 min; and for the second round PCR were 98 ℃ for 5 min, followed by 35 cycles of 98 ℃ for 10 s, 58 ℃ for 5 s, 68 ℃ for 5 s, and a final elongation step at 68 ℃ for 1 min. Finally, the PCR products were analyzed by 1% agarose gel electrophoresis ultraviolet imaging. Positive samples were determined by the presence of 642 bp amplified products. All PCR products were further verified by Sanger sequencing.

Genome annotation and phylogenetic analysis. Picornaviruses containing IRES-like sequences in
the 5′-UTRs were identified by BLAST searches (http:// www. ncbi. nlm. nih. gov/ BLAST/) of viral sequences in the GenBank database. Secondary/tertiary structural elements in picornavirus 5′-UTRs were modeled using the aligned RNAalifold server of the ViennaRNA Web Services (http:// rna. tbi. univie. ac. at/). MEGA 6.0 (http:// www. megas oftwa re. net) 15 was used to align the nt sequences and deduce the corresponding aa sequences using the MUSCLE package with default parameters. Phylogenetic trees showing the relationships among picornaviruses based on the amino acid sequences of P1 and 3D were generated using maximum likelihood mtREV with Freqs (+ F) model, gamma distributed with invariant sites (G + I), and 1000 bootstrap replicates. Routine sequence alignments were performed using Clustal Omega, EMBOSS Needle (both available at: http:// www. ebi. ac. uk/ Tools/), MegAlign and Lasergene (DNAstar, Madison, WI, USA), and T-coffee (http:// www. tcoff ee. org/) with manual curation. The nt sequences of the genomes and aa sequences of the ORF were deduced by comparison with those of other picornaviruses. The conserved protein families and domains were predicted using Pfam and InterProScan 5 (available at: http:// www. ebi. ac. uk/ servi ces/ prote ins). Amino acid identities and genetic distances were calculated using the ML method and were performed using pairwise evolutionary distance calculation as the distance metric.
Animal rights statement. Animals