A novel hepatovirus identified in wild woodchuck Marmota himalayana

Hepatitis A virus (HAV) is a hepatotropic picornavirus that causes acute liver disease worldwide. Here, we report the identification of a novel hepatovirus, tentatively named Marmota Himalayana hepatovirus (MHHAV), in wild woodchucks (Marmota himalayana) in China. Genomic and molecular characterization indicated that MHHAV is genetically most closely related to HAV. MHHAV has a wide tissue distribution but shows tropism for the liver. The virus is morphologically and structurally similar to HAV, and its pattern of codon usage bias is also consistent with that of HAV. Phylogenetic analysis indicated that MHHAV groups with known HAVs but forms an independent branch, representing a new species in the genus Hepatovirus within the family Picornaviridae. Antigenic site analysis suggested that MHHAV has antigenic properties distinct from those of other HAVs. Further evolutionary analysis of MHHAV and primate HAVs placed their most recent common ancestor around 1,000 years ago, while the common ancestor of all HAV-related viruses, including phopivirus, can be traced back to 1,800 years ago. The discovery of MHHAV may provide new insights into the origin and evolution of HAV and a model system with which to explore the pathogenesis of HAV infection.

Antigenic site analysis. HAV has a conformation-dependent immunodominant neutralization site. Residues S102, V171, A176 and K221 of VP1, as well as Q70, S71, E74 and 102-121 of VP3, have been implicated in neutralizing epitopes; residues 71 and 198 of VP2 and residues 89-96 of VP3 may harbor other epitopes 14 . Sequence alignment showed that all the aa in these antigenic sites differed between MHHAV and the HAV prototype (Fig. S1a), with the exception of S102 of VP1 and T71 of VP2. Only two aa (A70 of VP3 and T71 of VP2) were identical between MHHAV and the simian prototype, and three aa (T74 of VP3, and T71 and P198 of VP2) were shared between MHHAV and the recently reported phopivirus. The detailed alignment of the MHHAV capsid proteins with their counterparts is shown in […].
Secondary RNA structure of the 5′ UTR. The 5′ UTR of MHHAV is 20 nt shorter than that of human HAV (734 nt), and 45 nt and 73 nt longer than those of simian HAV (669 nt) and phopivirus (648 nt), respectively. It shares 56.72%, 54.48% and 41.73% nucleotide identity with human HAV, simian HAV and phopivirus, respectively. The predicted secondary structure (Fig. 2a) shows that the MHHAV 5′ UTR contains five major structural domains (comprising six stem-loops), labeled I to V from the 5′ terminus, and lacks the first domain found in human HAV. Domain I of MHHAV (stem-loops Ia and Ib) corresponds to domain II of human HAV. The predicted secondary structure of the simian HAV 5′ UTR (Fig. S3) is similar to that of human HAV but lacks the first domain and stem-loop Ia. Stem-loops Ia and Ib of MHHAV domain I, like their counterparts in human HAV domain II, may form a pseudoknot; however, the pseudoknot in domain V of human and simian HAVs was not found in MHHAV.
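Identity figures such as the 56.72%, 54.48% and 41.73% above are pairwise values computed over an alignment. The Methods do not state which tool or gap convention was used, so the following is only a minimal sketch that assumes a pre-computed pairwise alignment and excludes gapped columns from the denominator (one of several common conventions). The two sequences are hypothetical toy fragments, not MHHAV or HAV data.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    identical = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # assumption: gapped columns are skipped, not counted as mismatches
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Hypothetical aligned fragments (illustrative only)
a = "UUGACAUGG-AAC"
b = "UUGACGUGGCAAU"
print(round(percent_identity(a, b), 2))  # 83.33 (10 identical of 12 ungapped columns)
```

Counting gaps as mismatches instead would lower the value, which is one reason identity figures from different tools are not always directly comparable.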
The segment from nt 79 to 120 of the MHHAV 5′ UTR is a polypyrimidine tract, a region that has been suggested not to be involved in the formation of conserved helical structures 25 . In the predicted 5′ UTR secondary structure of MHHAV, stem-loop III is a long multi-loop cloverleaf structure that corresponds in position and shape to the previously designated stem-loop IV of primate HAVs and phopivirus, and is an internal ribosome entry site (IRES). The IRESes of MHHAV, primate HAVs and phopivirus share highly conserved base-paired regions that govern internal translation initiation and are classified as type III IRESes.
Two important motifs near the 3′ border of the picornavirus IRES were also found in the MHHAV 5′ UTR. The first is a UUUCC sequence (box A) within the second pyrimidine-rich tract; the second is an AUG triplet (box B) that functions as the initiation codon in HAV and phopivirus 26 . […] two 1-nt bulges (Fig. 2b). However, despite 68% nt identity between the cre elements of MHHAV and phopivirus, the corresponding region in phopivirus includes three loops and a stem segment (Fig. S4).
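Locating box A, box B and pyrimidine-rich tracts is a simple sequence scan. The sketch below is illustrative only: it assumes a tract is an uninterrupted run of C/U of at least `min_len` nt (a simplification, since real polypyrimidine tracts can tolerate occasional purines), and the input sequence is a hypothetical toy fragment rather than the MHHAV 5′ UTR.

```python
import re

PYRIMIDINES = set("CU")


def find_motif(seq: str, motif: str) -> list[int]:
    """Return 1-based start positions of all (possibly overlapping) motif matches."""
    # A zero-width lookahead lets overlapping occurrences be reported.
    return [m.start() + 1 for m in re.finditer(f"(?={re.escape(motif)})", seq)]


def pyrimidine_tracts(seq: str, min_len: int = 8) -> list[tuple[int, int]]:
    """Return 1-based inclusive (start, end) spans of uninterrupted C/U runs."""
    tracts = []
    start = None
    for i, base in enumerate(seq + "X"):  # sentinel character closes a trailing run
        if base in PYRIMIDINES:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_len:
                tracts.append((start + 1, i))
            start = None
    return tracts


# Hypothetical RNA fragment (illustrative only)
seq = "GGACUUCCUUUCCAGAUGCC"
print(pyrimidine_tracts(seq))      # [(4, 13)]
print(find_motif(seq, "UUUCC"))    # [9], i.e. box A lies inside the tract
print(find_motif(seq, "AUG"))      # [16], candidate box B
```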
Detection of MHHAV in wild woodchucks. Sixteen of 99 enteric lysates (16.16%; ID1-16) from wild woodchucks were positive for MHHAV RNA by RT-PCR; seven of these animals (ID1-7) were selected for collection of blood, liver, spleen, lung and trachea samples. All sample types were MHHAV RNA-positive, with the highest viral load in the liver and the lowest in the trachea. Student's t tests showed that the differences in viral load between the liver and the other tissues were statistically significant (blood, p < 0.001; spleen, p = 0.022; lung, p < 0.001; trachea, p < 0.001). Two other woodchucks (ID17 and ID18) had MHHAV RNA in their tissues and blood but not in their feces (Fig. 3a). The complete VP1 region was amplified and sequenced from these positive samples. The 34 VP1 sequences shared 99% nt identity, and sequences from the same woodchuck were 100% identical. All sequences were deposited in GenBank under accession numbers KT229577-KT229610. Furthermore, negative-strand MHHAV RNA was detected only in the seven MHHAV-positive livers and not in the spleen, lung, trachea, blood or feces samples (Fig. 3b).
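The tissue comparison above uses a two-sample Student's t test. As a minimal, standard-library-only sketch, the following computes the pooled-variance t statistic and its degrees of freedom. The viral-load values are hypothetical, and whether the original analysis log-transformed loads is not stated; the log10 scale here is an assumption. The p-value would then come from the t distribution (for example `scipy.stats.t.sf`), which is omitted to keep the block dependency-free.

```python
import math
import statistics


def student_t(x: list[float], y: list[float]) -> tuple[float, int]:
    """Two-sample Student's t statistic (pooled variance) and degrees of freedom."""
    nx, ny = len(x), len(y)
    vx, vy = statistics.variance(x), statistics.variance(y)  # sample variances (n - 1)
    df = nx + ny - 2
    pooled = ((nx - 1) * vx + (ny - 1) * vy) / df
    se = math.sqrt(pooled * (1 / nx + 1 / ny))
    t = (statistics.mean(x) - statistics.mean(y)) / se
    return t, df


# Hypothetical log10 viral loads for seven animals per tissue (illustrative only)
liver = [7.1, 6.8, 7.4, 6.9, 7.2, 7.0, 7.3]
trachea = [3.2, 3.9, 3.5, 4.1, 3.0, 3.6, 3.8]
t, df = student_t(liver, trachea)
print(f"t = {t:.2f}, df = {df}")
```

Because liver and other tissues come from the same animals, a paired test is another defensible design; the sketch follows the independent-samples test named in the Methods.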
Phylogenetic and evolutionary analysis. Recombination analysis showed no evidence of inter-genotype recombination among primate HAVs, phopivirus and MHHAV (Fig. S5). Phylogenetic analysis by the neighbor-joining method with 1,000 bootstrap replicates showed that MHHAV forms a lineage distinct from previously reported HAVs, whereas phopivirus is considerably more distant from previously reported HAVs (Fig. 4). Under the best-fit model, the mean substitution rate was 8.62 × 10⁻⁴ substitutions per site per year (ssy), with a 95% HPD of 6.96 × 10⁻³ to 7.03 × 10⁻³. The time to the most recent common ancestor of MHHAV and the primate HAV isolates was estimated at around 1,000 years ago, whereas the common ancestor of MHHAV and phopivirus was estimated at 1,800 years ago (Fig. 5).
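To make the link between substitution rate and divergence time concrete, the back-of-envelope strict-clock relation is $d = 2rt$: two lineages separated for $t$ years each accumulate $rt$ substitutions per site. This is not the Bayesian relaxed analysis performed in BEAST; it is only an illustrative cross-check, and the distance below is back-calculated from the reported rate and a 1,000-year split rather than measured. Real distances of this size would also need model correction (e.g. under GTR + Γ), because observed differences saturate.

```python
def strict_clock_tmrca(distance: float, rate: float) -> float:
    """Years since two lineages split under a strict molecular clock.

    distance: patristic distance in substitutions per site, summed over both branches
    rate: substitutions per site per year along each branch
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    return distance / (2.0 * rate)


RATE = 8.62e-4  # mean rate reported above (ssy)
d = 2 * RATE * 1000  # distance implied by a ~1,000-year split (illustrative, not observed)
print(round(d, 3), round(strict_clock_tmrca(d, RATE)))  # 1.724 1000
```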
Electron microscopy. Negative-staining electron microscopy revealed spherical, non-enveloped MHHAV particles approximately 27 nm in diameter, morphologically similar to HAV. Chloroform-purified virions from feces were only rarely detected directly, but were readily observed after incubation with an MHHAV polyclonal antibody (Fig. 6).

Discussion
HAV continues to cause morbidity and mortality despite the availability of an effective vaccine 15,27 . A new HAV-related hepatovirus, phopivirus, was recently reported in seals 23 . Here, we report the identification of a novel hepatovirus, tentatively named MHHAV, in the wild woodchuck Marmota himalayana. Virus particles purified from a fecal sample were morphologically similar to HAV virions by negative-staining electron microscopy. The complete genome and the VP1 capsid protein of MHHAV were most similar to those of simian HAV, with 67% nt and 75% aa identity, respectively. Under the enterovirus classification scheme, in which a VP1 sequence identity of 70 to 85% defines a heterologous serotype, MHHAV may represent a new serotype of HAV. Phylogenetic analyses of the polyprotein and VP1 indicate that MHHAV forms a branch separate from previously reported HAVs and may represent a new species in the genus Hepatovirus within the family Picornaviridae.
Although MHHAV has a genome organization identical to that of previously reported HAVs, the length of its polyprotein differs from those of the prototypic human HAV, simian HAV and phopivirus. Furthermore, the predicted cleavage-site aa sequences of MHHAV differ substantially from those of these viruses, with only three cleavage sites (VP2/VP3, 3B/3C and 3C/3D) conserved. Motifs common to the non-structural proteins of picornaviruses, such as the NTPase and helicase motifs, were also found in MHHAV. In most picornaviruses the RGD motif is located near the C terminus of VP1 28 ; in MHHAV, primate HAVs and phopivirus, however, it lies in the middle of the VP3 region. The RGD motif is conserved in picornaviruses and functions in recognition of and attachment to host cells, or in cell-to-cell and cell-to-matrix interactions 29,30 . The difference in RGD position suggests that HAV interactions with host cell-surface integrins differ from those of other picornaviruses 14 . Hepatoviruses also differ from other picornaviruses in that they rarely use the codons most frequently preferred by their hosts 23,31 , and MHHAV appears to follow HAV and phopivirus in this codon usage pattern. We speculate that this strategy may minimize direct competition between hepatoviruses and the host cell translation machinery and enable persistence 31 .
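The codon usage comparison above rests on codon frequency tables for the virus and its host. As a hedged sketch of the per-1,000-codon frequency column of a cusp-style table (the real cusp output has additional columns, such as the fraction within each amino acid), the function below counts in-frame codons in a coding sequence. The input is a hypothetical toy ORF, and RNA input is converted to DNA letters as a convenience assumption.

```python
from collections import Counter


def codon_usage_per_thousand(cds: str) -> dict[str, float]:
    """Codon frequencies per 1,000 codons for an in-frame coding sequence."""
    seq = cds.upper().replace("U", "T")  # accept RNA or DNA letters
    if len(seq) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    counts = Counter(codons)
    total = len(codons)
    return {codon: 1000.0 * n / total for codon, n in sorted(counts.items())}


# Hypothetical toy ORF (illustrative only)
print(codon_usage_per_thousand("AUGAAAAAGAAAUAA"))
# {'AAA': 400.0, 'AAG': 200.0, 'ATG': 200.0, 'TAA': 200.0}
```

Comparing such a table for MHHAV with a woodchuck table (e.g. from the Kazusa database) would show whether the virus favours or avoids the host's preferred synonymous codons.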
The antigenic sites of HAVs have been characterized by several research groups 13,14,32 . Sequence alignment showed that these antigenic sites differ among MHHAV, primate HAVs and phopivirus. Whole-genome alignment of MHHAV, primate HAVs and phopivirus further revealed 18-nt and 15-nt insertions at the C terminus of VP1, encoding six and five aa ([S]SSRRT), respectively, and containing three or two potential O-glycosylation sites (glycosylation site prediction website: http://www.cbs.dtu.dk/services/NetNGlyc/). Because glycosylation is important for antigen processing and presentation and VP1 is a major antigenic protein, we speculate that these sequence differences may indicate antigenic differences between MHHAV and other HAVs. However, the actual antigenic characteristics should be determined by neutralization assays once an in vitro cell culture system for these viruses has been developed.
The predicted secondary structures of the 5′ UTRs of MHHAV, human HAV, simian HAV and phopivirus are very similar. The most important structure in the 5′ UTR is the IRES, which directs internal initiation of translation 25 . The IRESes of primate HAVs, MHHAV and phopivirus share an evolutionarily conserved secondary structure that includes a long multi-loop cloverleaf 23 and are grouped as type III. They therefore differ from the IRESes of poliovirus and human rhinovirus (type I), encephalomyocarditis virus and foot-and-mouth disease virus (type II) and HCV-like IRESes (type IV) 33,34 . A conserved RNA structure, the 110-nt HAV cre element located near the 5′ end of the 3Dpol region, is present in both human and simian HAVs as well as in the distantly related avian encephalomyelitis virus 35 , but a similar cre element was not identified in the corresponding region of phopivirus. This element was also found in MHHAV, together with the AAACA/G motif that serves as the template for uridylylation of VPg by a slide-back mechanism 35 . These findings suggest that MHHAV might have replication mechanisms and tissue tropism similar to those of other known HAVs, whereas the shape and position of the cre element in phopivirus remain unknown. Further studies are needed to address the subtle differences in 5′ UTR secondary structure among MHHAV, HAV and phopivirus.
HAV is highly transmissible, and HAV infection is acquired primarily by the fecal-oral route; poor sanitary conditions can lead to local HAV outbreaks. In the present study, 16 of 99 (16.16%) wild woodchucks were found to carry MHHAV. Furthermore, the 34 MHHAV VP1 sequences from different animals shared 99% nt identity, and sequences from the same woodchuck were 100% identical. Together, these findings suggest that an MHHAV outbreak may have been occurring among the wild woodchucks at the time of sampling. As expected for a virus that, like HAV, replicates in the liver 36 , negative-sense RNA complementary to the positive-sense MHHAV genome was detected only in the liver. However, although viral RNA loads were highest in the liver, MHHAV was widely distributed across organs, consistent with previous studies showing that HAV-related viruses (including phopivirus) can also be detected in extrahepatic organs of their hosts 22,23,37,38 . This may be partly explained by the wide expression of HAV cellular receptor 1 in organs such as the liver, spleen, kidney and testis 39 , which could allow these organs to capture HAV virions even though, for reasons that remain unclear, they may not support viral replication. These findings suggest that the liver is the target organ of MHHAV infection and replication in wild woodchucks. Although its hepatotropism and ability to cause disease remain to be determined, the presence of MHHAV in the liver of the wild woodchuck Marmota himalayana is reminiscent of human HAV infection.
Phopivirus, reported in seals in the USA 23 , has a genomic organization, codon usage bias and hepatic tropism similar to those of HAV and MHHAV. Phylogenetic analyses in this study indicated that phopivirus is more distant from the previously reported HAVs than MHHAV is, and evolutionary analysis suggested that phopivirus shares a common ancestor with HAVs and MHHAV. The estimated substitution rate of these viruses was 8.62 × 10⁻⁴ ssy, similar to that reported in a French study based on VP1 sequences from primate genotype IA HAVs (9.76 × 10⁻⁴ ssy) 40 . This rate is lower than those found in other picornaviruses 41,42 . The common ancestor of MHHAV, primate HAVs and phopivirus was older than that of MHHAV and primate HAVs, indicating that the diversity and evolutionary history of HAV are far more complex than previously thought.
Human hepatotropic viruses, or related viruses, that infect wild woodchucks include woodchuck hepatitis virus (WHV) and hepatitis delta virus (HDV) 43,44 . Natural infection with WHV results in liver disease similar to that induced by hepatitis B virus (HBV) in humans 45 , and the woodchuck Marmota monax is commonly used as an animal model of HBV infection 46,47 . There are currently no non-primate models of HAV infection; guinea pigs can be infected with HAV but neither develop signs of disease nor seroconvert 38 . The discovery of MHHAV in woodchucks may facilitate the development of a new, tractable animal model of human HAV infection and thus provide further insights into the evolution and pathogenesis of HAVs.

Materials and Methods
Specimens and high-throughput sequencing. Ninety-nine wild woodchucks were captured in Haixi, Qinghai Province, in 2013. Enteric lysates from all animals, and liver, spleen, lung and trachea specimens from some animals, were collected after exsanguination. Samples were transported on dry ice and stored at −80 °C at the China Center for Disease Control (CDC). After dilution (1:5, wt/vol) and filtration (0.45 μm and 0.22 μm membranes), total nucleic acid was extracted from the 99 enteric lysates, followed by cDNA synthesis. Random PCR amplification was performed on each sample using primers with different barcodes, and the PCR products were pooled for sequencing on the Illumina MiSeq platform (Illumina, San Diego, CA). Metagenomic profiling of the shotgun datasets was carried out using the customized informatics pipeline VirusSeeker to computationally identify viral sequences 48 . The study protocol was approved by the Ethics Committee of the China CDC and was performed according to Chinese ethics laws and regulations; the methods were carried out in accordance with the approved guidelines.
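Pooling barcoded PCR products means reads must later be assigned back to their samples. The Methods say only that primers carried different barcodes, so everything below is a hypothetical sketch: it assumes the barcode sits at the 5′ end of each read, tolerates at most one mismatch, and discards reads that match no barcode or more than one. The barcode strings and reads are invented, and real demultiplexing tools may handle these cases differently.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, barcodes, max_mismatches=1):
    """Assign reads to samples by a 5' barcode prefix (hypothetical layout).

    Reads matching exactly one barcode within max_mismatches are trimmed and
    binned; unmatched or ambiguous reads go to 'unassigned'.
    """
    bins = {name: [] for name in barcodes}
    bins["unassigned"] = []
    for read in reads:
        hits = [
            name for name, bc in barcodes.items()
            if len(read) >= len(bc) and hamming(read[:len(bc)], bc) <= max_mismatches
        ]
        if len(hits) == 1:
            name = hits[0]
            bins[name].append(read[len(barcodes[name]):])  # trim the barcode
        else:
            bins["unassigned"].append(read)
    return bins


# Hypothetical barcodes and reads (illustrative only)
barcodes = {"S1": "ACGTAC", "S2": "TGCATG"}
reads = ["ACGTACGGGG", "ACGTTCAAAA", "TGCATGCCCC", "NNNNNNTTTT"]
print(demultiplex(reads, barcodes))
```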

Full-length genomic amplification.
To determine the full-length genomic sequence of MHHAV, primers were initially designed based on contigs obtained by MiSeq high-throughput sequencing, and further primers were designed based on newly amplified MHHAV sequences. Long fragments (1,500-3,000 bp) were amplified for final confirmation. All PCR amplifications were performed using Ex Taq DNA polymerase. The extreme 5′ and 3′ ends of the genome were determined using a SMART RACE cDNA Amplification Kit (Clontech, US) and a Genome Walking Kit (Takara, Japan). Sequences were assembled and manually edited to produce the final viral genome sequence. Codon usage was assessed for both MHHAV and the woodchuck. For the woodchuck, codon usage tables were obtained from a database based on genomes in NCBI GenBank (http://www.kazusa.or.jp); for MHHAV, codon usage frequencies were determined with the cusp program (http://emboss.sourceforge.net/apps/cvs/emboss/apps/cusp.html).
Recombination and phylogenetic analysis. Complete genomes of the known HAVs were downloaded. To detect potential recombination, aligned sequences were analyzed by the Bootscanning method using the neighbor-joining algorithm with 100 pseudoreplicates, as implemented in SimPlot. Phylogenetic trees were constructed from nucleotide sequences of VP1 and the polyprotein by the neighbor-joining method and subjected to bootstrap analysis with 1,000 replicates. Tree figures were produced using MEGA software (version 5).
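Bootstrap analysis resamples alignment columns with replacement to build pseudoreplicate alignments. In the actual analysis each pseudoreplicate yields a neighbor-joining tree (built in MEGA) and a clade's support is the fraction of trees containing it. The sketch below keeps the resampling step but, as a deliberate simplification rather than a tree method, only asks how often taxon A is strictly closer (by p-distance) to B than to C. The three-sequence alignment and the `grouping_support` helper are hypothetical.

```python
import random


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def bootstrap_replicate(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """One pseudoreplicate: resample alignment columns with replacement."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


def grouping_support(alignment, a, b, c, replicates=1000, seed=1):
    """Fraction of pseudoreplicates in which a is strictly closer to b than to c."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(replicates):
        rep = bootstrap_replicate(alignment, rng)
        if p_distance(rep[a], rep[b]) < p_distance(rep[a], rep[c]):
            hits += 1
    return hits / replicates


# Hypothetical toy alignment (illustrative only)
aln = {"A": "ACGTACGTAC", "B": "ACGTACGTTC", "C": "TCGAACCTAG"}
print(grouping_support(aln, "A", "B", "C"))
```

Replacing the distance comparison with a full neighbor-joining tree per replicate, then counting clades, would turn this into the standard bootstrap procedure.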

Evolutionary analysis. To estimate the substitution rates of MHHAV, phopivirus and HAV precisely, a Bayesian Markov chain Monte Carlo (MCMC) approach was implemented in the BEAST package (v1.8.2, available from http://beast.bio.ed.ac.uk/downloads). jModelTest 2.1.7 was used to identify the optimal evolutionary model; both the Akaike Information Criterion and the hierarchical likelihood ratio test indicated that the GTR (general time reversible) + Γ (gamma-distributed rate variation) model best fitted the sequences in this study. Different population dynamic models were used (constant size, exponential growth, logistic growth, expansion growth and Bayesian skyline). The MCMC analysis was run for 50 million generations, sampling every 1,000 generations, with 10% burn-in. The results were analyzed using Tracer 1.6. The effective sample size (ESS) values for the estimated parameters in the MCMC analysis were greater than 200. Statistical uncertainty was expressed as 95% highest posterior density (HPD) intervals.
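Two quantities above are easy to check by hand: how many posterior samples remain after burn-in, and what an ESS threshold of 200 measures. The sketch below computes the retained sample count (50,000,000 / 1,000 = 50,000 samples, minus 10% burn-in) and an autocorrelation-based ESS. The truncation rule (stop at the first non-positive autocorrelation) is a simple convention chosen for this example and is not necessarily the estimator Tracer uses.

```python
def retained_samples(generations: int, sample_every: int, burnin_fraction: float) -> int:
    """Number of posterior samples kept after discarding the burn-in."""
    total = generations // sample_every
    return total - int(total * burnin_fraction)


def effective_sample_size(trace: list[float]) -> float:
    """ESS = N / (1 + 2 * sum of positive-lag autocorrelations).

    The sum is truncated at the first non-positive autocorrelation
    (a simple convention for this sketch).
    """
    n = len(trace)
    mean = sum(trace) / n
    centered = [x - mean for x in trace]
    var = sum(c * c for c in centered) / n
    if var == 0:
        raise ValueError("trace has zero variance")
    rho_sum = 0.0
    for lag in range(1, n):
        cov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        rho = cov / var
        if rho <= 0:
            break
        rho_sum += rho
    return n / (1 + 2 * rho_sum)


print(retained_samples(50_000_000, 1_000, 0.10))  # 45000
print(effective_sample_size([1.0, -1.0] * 50))    # 100.0 (no positive autocorrelation)
```

A strongly autocorrelated trace (for example, one that drifts steadily) yields an ESS far below its length, which is why an ESS above 200 is a more meaningful convergence check than the raw number of samples.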
RNA structure prediction of the 5′ UTR. The secondary structure of the MHHAV 5′ UTR was predicted from consecutive fragments of its complete nucleotide sequence using a thermodynamic folding energy-minimization algorithm in RNAstructure (version 5.3); structure diagrams were drawn using RnaViz (version 2.0.3).
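RNAstructure minimizes folding free energy using nearest-neighbor thermodynamic parameters, which is beyond a short example. To illustrate only the underlying dynamic-programming idea, the sketch below implements the much simpler Nussinov algorithm, which maximizes the number of nested base pairs (Watson-Crick and G·U) with a minimum hairpin loop of three nucleotides. It is a swapped-in teaching algorithm, not the method used here, and its structures will generally differ from energy-minimized ones.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq: str, min_loop: int = 3) -> tuple[int, str]:
    """Maximize nested base pairs; return (pair count, dot-bracket string)."""
    n = len(seq)
    if n == 0:
        return 0, ""
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: j is unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with k (loop >= min_loop)
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + dp[k + 1][j - 1] + 1)
            dp[i][j] = best
    # Traceback to recover one optimal structure
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS:
                left = dp[i][k - 1] if k > i else 0
                if dp[i][j] == left + dp[k + 1][j - 1] + 1:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return dp[0][n - 1], "".join(structure)


print(nussinov("GGGAAACCC"))  # (3, '(((...)))'), a simple hairpin
```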
Scientific Reports | 6:22361 | DOI: 10.1038/srep22361
Virus purification. Stool samples were diluted to 20% suspensions in phosphate-buffered saline (PBS). Beads and chloroform were added to the suspensions, followed by centrifugation at 1,500 × g for 20 min. The supernatant was collected and subjected to a single ultracentrifugation step through a discontinuous sucrose/glycerol density gradient for HAV purification, as described previously 28 .
Electron microscopy. Fifty-microliter aliquots of chloroform-purified MHHAV (1 × 10⁷ copies/ml) in PBS were examined directly by negative staining with 1% phosphotungstic acid (pH 6.8). In addition, chloroform-purified MHHAV (450 μl) was incubated with 50 μl of a 1:10 dilution of MHHAV polyclonal antibody (raised in BALB/c mice immunized subcutaneously with purified MHHAV) at 37 °C for 1 h. After centrifugation at 23,000 rpm for 1 h, the pellet was resuspended in 50 μl PBS and subjected to negative staining. Grids were examined using a transmission electron microscope (TECNAI 12, FEI, Blackwood, NJ).
Statistical analysis. The statistical significance of differences in mean viral load between the liver and other tissues was assessed using Student's t test, and statistical analyses were performed using SPSS 16.0.
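Two numbers in these protocols depend on unstated details. Centrifugation speeds in rpm translate to force only for a given rotor radius, via the standard relation RCF = 1.118 × 10⁻⁵ × r (cm) × rpm². The antibody step also implies an overall dilution in the mixture. The sketch below computes both, assuming a hypothetical 10 cm rotor radius (the rotor is not named) and reading "a 1:10 dilution of 50 μl" as 50 μl of a 1:10 antibody dilution added to 450 μl of virus.

```python
def rpm_to_rcf(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g): RCF = 1.118e-5 * r[cm] * rpm^2."""
    return 1.118e-5 * radius_cm * rpm ** 2


def rcf_to_rpm(rcf: float, radius_cm: float) -> float:
    """Inverse of rpm_to_rcf for the same rotor radius."""
    return (rcf / (1.118e-5 * radius_cm)) ** 0.5


def final_dilution(stock_dilution: float, added_ul: float, total_ul: float) -> float:
    """Overall dilution factor after adding a pre-diluted reagent to a larger volume."""
    return stock_dilution * total_ul / added_ul


RADIUS_CM = 10.0  # hypothetical rotor radius; the rotor used is not specified
print(round(rpm_to_rcf(23_000, RADIUS_CM)))  # 59142 (x g at r = 10 cm)
print(final_dilution(10, 50, 450 + 50))      # 100.0, i.e. about 1:100 antibody in the mix
```

Reporting ultracentrifugation conditions as × g rather than rpm avoids this dependence on the rotor.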