Introduction

Norovirus (NoV), of the genus Norovirus and the family Caliciviridae, causes acute gastroenteritis in humans1. NoV is highly infectious and causes large epidemics of acute gastroenteritis in various countries, including Japan2,3,4. Accumulating evidence suggests that approximately 50% of acute gastroenteritis cases in the winter season in Japan may be due to NoV infection5,6. In addition, large outbreaks of food poisoning due to the virus have been reported7,8,9. Thus, NoV, like other major viral agents such as rotaviruses, is a major causative agent of acute viral gastroenteritis in industrialised countries7,8,9.

NoV is classified into 5 genogroups (genogroups I–V)1. Among them, genogroups I and II are detected mainly in humans1. The NoV genome encodes 3 open reading frames (ORF) and ORF2 encodes the NoV capsid protein10. On the basis of detailed genetic analysis, Kroneman et al. showed that NoV GI and GII strains can be classified into 9 and 22 genotypes, respectively11.

In general, the capsid protein may be an essential determinant of the antigenicity of a non-enveloped virus12. For example, it not only plays pivotal roles in viral adsorption and entry but also elicits neutralising antibodies13,14,15,16. Thus, to control NoV infection, it is important to understand its antigenic variation13,14,15,16. NoV evolution has been investigated considerably, but most studies have focused on NoV GII17,18,19.

Recent advances in genetic analysis algorithms enable us to obtain evolutionary information on various viruses. For example, we can assess the evolutionary time scale of viral genes using the Bayesian Markov chain Monte Carlo (MCMC) method20. In addition, maximum likelihood approaches may enable us to analyse the determinants of adaptation in viral proteins such as the NoV capsid protein17,18. In the present study, we used these methods to analyse comprehensively the molecular evolution of the NoV GI capsid gene.

Results

Phylogenetic analysis and evolutionary rates of the NoV capsid gene by the Bayesian MCMC method

We constructed a phylogenetic tree with an evolutionary time scale by the Bayesian MCMC method. The 95% highest posterior densities (HPDs) for each node of the phylogenetic tree are indicated by grey bars in Fig. 1. In the present phylogenetic tree, the NoV GI strains divided into 2 lineages about 750 years ago. These lineages are subdivided into 9 genotypes (genotypes 1–9). Lineage 1 contains genotypes 1, 2, 4, 5 and 6, while lineage 2 contains genotypes 3, 7, 8 and 9. Furthermore, genotypes 2, 4, 5 and 6 subdivided from a common ancestor, as did genotypes 7–9, while genotypes 1 and 3 each evolved independently. The mean evolutionary rate of the present strains was estimated as 1.26 × 10⁻³ substitutions/site/year (95% HPD, 7.22 × 10⁻⁴ to 1.79 × 10⁻³). In addition, we obtained the evolutionary rates of 5 genotypes (GI.2–GI.6), while the rate could not be obtained for the other 4 genotypes due to the small number of strains analysed (Supplemental Table S1). The evolutionary rates of these 5 genotypes differed significantly (p < 0.05, Kruskal-Wallis test). These results suggested that an ancestral NoV GI strain diverged from the ancestor of the NoV GII, GIII and GIV strains approximately 2803 years ago (mean; 95% HPD, 1570–4390 years ago) (Fig. 1). Furthermore, the present NoV GI strains diverged about 750 years ago and the virus formed 9 genotypes with wide genetic divergence and rapid evolution.

Figure 1

Phylogenetic tree of ORF2 constructed by the Bayesian Markov Chain Monte Carlo method.

The phylogenetic tree was based on the whole nucleotide sequence of ORF2 (1593 nt corresponding to GI.1/Norwalk/1968/US). We analysed 65 strains of GI, 6 strains of GII, 1 strain of GIII and 3 strains of GIV. Each node represents mean root height. The scale bar represents the unit of time (years). The grey bars indicate the 95% HPDs for the estimated year. The reference strains of each genotype are indicated by solid circles.

Selection pressure analysis

To estimate comprehensively the positive selection sites in the capsid protein of NoV, we used 4 methods: conservative single likelihood ancestor counting (SLAC), fixed effects likelihood (FEL), internal fixed effects likelihood (IFEL) and mixed effects model of evolution (MEME) (Table 1). Only 2 positive selection sites were estimated by the FEL and IFEL methods, while 19 sites were estimated using MEME. Notably, the substitutions at amino acid (aa) position 10 involved a variety of amino acids, whereas those at aa557 involved a single amino acid. However, these substitution sites were not located in the protruding 2 (P2) domain, which is associated with the cellular binding site of the capsid protein for NoV infection. Furthermore, over 400 negative selection sites were found in the capsid gene (Table 2). These results suggested that the positive selection sites in the NoV GI capsid protein are located mainly near the N- and C-terminal regions.

Table 1 Positive selection sites in ORF2 of NoV GI.
Table 2 Negative selection sites in ORF2 of NoV GI.

Predicted epitopes in reference strains

Using 6 methods (LEPS21, BCPRED22, FBCPRED22, BepiPred23, Antigenic24 and LBtope25), we predicted the B-cell linear epitopes in the deduced amino acid sequences of the NoV capsid protein in the reference strains. In the present study, we accepted as epitopes those regions identified by 4 or more methods and spanning >10 consecutive amino acids26. The detailed data are shown in Table 3. Many epitopes were estimated in the capsid protein of each NoV GI genotype. Of them, an epitope of GI.1 (aa377–388) may be associated with the histo-blood group antigen (HBGA) binding sites (Ser377, Pro378 and Ser380)27 (Table 3). In addition, 1–3 epitopes in each genotype were found in the P2 domain. In the present GI strains, a consensus epitope motif, PAPxGFP, was predicted in the P2 domain in 7 of the 9 genotypes (GI.1, 3–7 and 9). These results suggested that a few of the viral sites that bind host cells overlap with predicted epitope sites in the capsid protein.
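The consensus motif PAPxGFP, where x stands for any amino acid, can be located in a protein sequence with a simple pattern search. The following is a minimal sketch only; the function name and the example sequence are hypothetical and are not taken from any NoV strain.

```python
import re

# "x" in PAPxGFP means any single residue, which maps to "." in a regular expression.
MOTIF = re.compile(r"PAP.GFP")


def find_motif(protein: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched residues) for each motif occurrence."""
    return [(m.start() + 1, m.group()) for m in MOTIF.finditer(protein.upper())]


if __name__ == "__main__":
    # Made-up peptide used purely for illustration.
    toy_sequence = "MMASPAPTGFPDL"
    print(find_motif(toy_sequence))
```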

Table 3 Predicted epitopes of the reference strains for each genotype.

Phylodynamics of NoV GI strains

We assessed the phylodynamics of the capsid gene of the NoV GI strains using Bayesian skyline plot (BSP) analysis. The present strains showed effective population size values above 1000 over a period of 500 years (Fig. 2). In addition, a relatively constant value was seen from 1500 to 1900 CE, but thereafter the values tended to be low (Fig. 2). These results suggested that NoV GI strains might have adapted to humans over 500 years ago.

Figure 2

Bayesian skyline plot of ORF2 in NoV GI.

The Bayesian skyline plot was estimated under the GTR-Γ4 model. The MCMC chains were run for 65,000,000 steps. The Y-axis represents the effective population size and the X-axis represents generation time (year). The solid black line represents the mean value over time. The 95% HPD intervals are shown in dotted lines.

Intergenogroup and intergenotype pairwise distance values

To assess the genetic distance among the present strains, we calculated their pairwise distance (p-distance) (Fig. 3). The intergenogroup p-distance value was 0.29 ± 0.07 (mean ± standard deviation [SD]). The intergenotype p-distance values ranged from 0.036 ± 0.010 to 0.192 ± 0.082 (mean ± SD). These results suggested that the NoV GI capsid gene has undergone considerable genetic divergence (intergenogroup p-distance > 0.25).

Figure 3

Distributions of the pairwise distance values of ORF2.

The distributions of the p-distance values based on the nucleotide sequences of NoV GI. A total of 65 strains were analysed.

Discussion

We studied the molecular evolution of the capsid gene in NoV genogroup I. First, we found that the human NoV GI strains diverged approximately 2,800 years ago from the ancestor of the GII, GIII and GIV strains, although the estimated time of divergence had a wide interval (95% HPD, 1570–4390 years ago). NoV GI evolved rapidly (approximately 10⁻³ substitutions/site/year) and showed wide genetic divergence (p-distance > 0.25). In addition, the NoV GI strains diverged and formed 9 genotypes over a period of about 750 years. Some genotypes (genotypes 2, 4, 5 and 6) evolved from the same ancestor. Second, 2–19 positive selection sites and over 400 negative selection sites were estimated in the deduced capsid protein of NoV GI. Third, many epitopes were estimated in the deduced capsid proteins. However, there were few epitopes at the cellular binding site of the capsid protein. Furthermore, BSP analysis suggested that NoV GI strains adapted to humans a long time ago.

With regard to norovirus, many evolutionary and/or molecular epidemiological studies have been reported17,18,19,28. For example, Rackoff et al. estimated the molecular evolution rates of NoV GI.1 and GI.3 as 1.37 × 10⁻³ and 1.25 × 10⁻³ substitutions/site/year, respectively28. In the present study, we obtained a mean evolutionary rate of 1.26 × 10⁻³ substitutions/site/year for all NoV GI genotypes. The evolutionary rates of genotypes GI.2–GI.6 were significantly different (Supplemental Table S1). These results suggested that the evolutionary rate of the NoV GI capsid gene was variable among NoV GI genotypes, although these data, including our own, are limited because they were estimated using a small number of strains. However, the present study may be the first report to estimate the evolutionary rate for all genotypes of the NoV GI capsid gene.

The capsid protein of a non-enveloped virus plays pivotal roles in functions such as adsorption to and entry into target cells12. Divergence of the capsid protein may be linked to the antigenicity of various viruses12. Thus, divergence of the capsid protein may reflect differences in the antigenicity of NoV. Furthermore, host defence mechanisms, including the immune system, may act as a selective pressure on NoV. In general, a viral protein with strong antigenicity may undergo strong selection pressure, resulting in the presence of many positive selection sites in the antigenic protein29. Indeed, many positive selection sites were found in the capsid proteins of an enterovirus showing strong antigenicity30. To date, some representative studies regarding the relationship between positive selection and antigenicity in NoV have been reported17,28. For example, Cotton et al. showed some positive selection sites in NoV GII strains, at Glu106Arg and Asn298Asp31. Moreover, Siebenga et al. confirmed some sites in NoV GII/4: Asn6Ser, Asn9Ser/Thr, Ala15Thr, Ile47Val and Ala534Thr/Val17. The capsid proteins of NoV GI and GII may have undergone selective pressure mainly near the N- and C-terminal regions in the host. In the present study, the number of positive selection sites varied (2–19 sites) among the 4 methods (SLAC, FEL, IFEL and MEME). Similar differences among methods in the number of positive selection sites have been found in other virus genomes32,33. This may be due to differences in the principles used in each method to estimate the sites34,35.

Two distinct types of epitope, T-cell-recognised and B-cell-recognised epitopes, have been confirmed36. B-cell-recognised epitopes may be an important index for the prediction of antibody binding sites against NoV GI. In addition, previous reports suggested that the HBGA binding sites of the viral P2 domains are associated with infection of host cells27. In the present study, although we only estimated B-cell linear epitopes, we found the following: 1) many predicted epitopes were found in the capsid protein of NoV GI; and 2) a consensus epitope motif (PAPxGFP) was estimated in 7 of the 9 GI genotypes (Table 3). Regarding the HBGA binding sites, an epitope of GI.1 (aa377–388) was estimated at an HBGA binding site of the P2 domain (Table 3)27. Previous reports showed that the host cellular binding sites of NoV may be located in the P2 domain of the capsid protein (corresponding to aa279–405 in ORF2 of GI.1/Norwalk/1968/US)37,38,39. If epitopes are located in the P2 domain, the immune system may react with them, leading to the generation of a neutralising antibody. Furthermore, previous reports estimated some epitopes in NoV GII strains40. These epitopes are located in the P2 domain on the surface of the capsid protein of GII.4 strains40. In addition, some positive selection sites were identified in an area of the P2 domain associated with blockade epitope A by using a monoclonal antibody31. An effective neutralising antibody may inhibit NoV infection of the host; however, the majority of epitopes in the other NoV GI genotypes were not detected at the HBGA binding sites of the P2 domain. It should also be noted that we did not examine conformational epitopes in the present NoV GI strains. In various RNA viruses, such as dengue viruses, the conformational epitopes may be associated with the production of neutralising antibodies41,42,43. Taken together, further studies regarding the relationships among B-cell epitopes, including the conformational epitopes, HBGA binding sites and the consensus epitope motif (PAPxGFP), are needed to assess whether the human immune system can produce effective neutralising antibodies against most types of NoV GI.

Next, to evaluate the effective population size of NoV GI, we performed BSP analysis. This method enables us to estimate the effective population size over a period of several hundred years, even if there are no sequences from strains more than 50 years old20. The effective population size showed a constant value from 1500 to 1900 CE. The values decreased from 1900 to 1950 CE; however, after that, the values were restored. With regard to NoV GII.4, no relationship was found between epidemics of the virus and its effective population size based on calculations using the capsid gene17. Conversely, a relationship was found between epidemics of the virus and its effective population size based on partial sequencing of the polymerase gene17. Both previous and present results suggest that NoV GI strains evolved and maintained constant genetic divergence. The virus may be subject to selection pressure in the host, resulting in the lack of a significant change in its effective population size.

In conclusion, NoV GI strains evolved rapidly and their common ancestor dated back to approximately 750 years ago. The virus may be under local positive selection to escape the immune system of the host, resulting in its adaptation to humans. In addition, to better understand the molecular evolution of NoV GI, further studies regarding the evolution of other genes, including the RNA-dependent RNA polymerase (RdRp) gene, may be needed.

Materials and Methods

Strains and alignments

We collected complete capsid gene (ORF2) sequences of all NoV GI strains, excluding ORF1/2 recombinant strains, from GenBank. We confirmed the genotype of each strain using the norovirus typing tool NoroNet44. All sequences were aligned using Clustal W45. Sequences with more than 99% identity were removed from the dataset. A total of 65 strains were collected. The nucleotide sequences correspond to positions 1–1593 in ORF2 of GI.1/Norwalk/1968/US (GenBank accession No. M87661). The detailed data are shown in Supplemental Table S2.
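The removal of near-identical sequences can be illustrated as follows. This is a sketch under stated assumptions, not the procedure actually used: the greedy keep-first strategy, the handling of gap columns and the function names are all hypothetical, since the text above specifies only the 99% threshold.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # assumption: gap columns are ignored
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def drop_near_identical(aligned: dict[str, str], threshold: float = 99.0) -> dict[str, str]:
    """Greedy filter: keep a sequence only if its identity to every
    previously kept sequence does not exceed `threshold` percent."""
    kept: dict[str, str] = {}
    for name, seq in aligned.items():
        if all(percent_identity(seq, other) <= threshold for other in kept.values()):
            kept[name] = seq
    return kept
```

With a greedy filter of this kind, the retained set depends on the input order, so the sequences to be preferred (for example, reference strains) would need to come first.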

Phylogenetic analysis by the Bayesian Markov Chain Monte Carlo method

We estimated the time-scaled phylogeny and evolutionary rate of ORF2 using the Bayesian MCMC method in the BEAST program v1.7.520. To estimate the time of divergence from the other genogroups, some sequences of GII, GIII and GIV were added to the sequences from 65 GI strains (Supplemental Table S2). The KAKUSAN4 program was used to select the best nucleotide substitution model46. Four clock models (strict, lognormal, exponential and random) and 4 demographic models (constant size, exponential growth, expansion growth and lognormal growth) were compared using Akaike’s information criterion through MCMC (AICM)47,48. The datasets were analysed using the GTR-Γ4 model of substitution under a lognormal relaxed clock model with an exponential growth model. The MCMC chains were run for 50,000,000 steps, with sampling every 1000 steps, to achieve convergence. Convergence was assessed from the effective sample size (ESS) after a 10% burn-in using Tracer v1.649. Only parameters with an ESS above 200 were accepted. Uncertainty in the estimates was indicated by the 95% HPD intervals. The maximum clade credibility tree was generated by TreeAnnotator v1.7.5 after a 10% burn-in. The phylogenetic tree was viewed in FigTree v1.3.1. The evolutionary rates of each genotype were also calculated. In addition, to estimate changes in the effective population size through time of NoV GI, a BSP was constructed using the BEAST program as described above.
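To make the burn-in and 95% HPD steps concrete, the following sketch shows how such an interval can be obtained from a list of sampled parameter values. It is an illustration only and does not reproduce the internals of BEAST or Tracer; the function names are hypothetical, and the HPD is approximated as the shortest interval containing 95% of the retained samples.

```python
import math


def discard_burn_in(samples: list[float], fraction: float = 0.10) -> list[float]:
    """Drop the first `fraction` of the chain (10% here, as in the text above)."""
    start = int(len(samples) * fraction)
    return samples[start:]


def hpd_interval(samples: list[float], mass: float = 0.95) -> tuple[float, float]:
    """Shortest interval that contains at least `mass` of the samples."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    n = len(ordered)
    window = max(1, math.ceil(mass * n))  # number of samples inside the interval
    best_lo, best_hi = ordered[0], ordered[window - 1]
    # Slide a window of fixed sample count and keep the narrowest one.
    for i in range(1, n - window + 1):
        lo, hi = ordered[i], ordered[i + window - 1]
        if hi - lo < best_hi - best_lo:
            best_lo, best_hi = lo, hi
    return best_lo, best_hi
```

For a unimodal posterior this shortest-interval estimate coincides with the usual HPD; the ESS check is a separate diagnostic and is not covered by this sketch.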

Statistical analyses

Statistical analyses were performed using the Kruskal-Wallis test or Mann-Whitney U test with Bonferroni correction using EZR50. Values of p < 0.05 were considered to be significant.

Estimation of positive and negative selection sites

To evaluate the selection pressure on the ORF2 region, synonymous (dS) and nonsynonymous (dN) substitution rates at every codon were calculated by Datamonkey using the following methods: SLAC, FEL, IFEL and MEME51. SLAC is intensive for large alignments compared to the other methods35; however, this method tends to underestimate the substitution rate35. In contrast, the FEL and IFEL methods consider both synonymous and nonsynonymous rate variations and may be efficiently parallelised35. MEME can consider episodic selective pressure34. We employed these 4 different methods to improve the accuracy of the estimates. The cut-off p-value was set at 0.05.
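The way per-codon estimates translate into positive or negative selection calls can be sketched as below. This is a simplified illustration, assuming each method reports a dN value, a dS value and a p-value per codon; the `SiteResult` record and `classify_site` function are hypothetical and do not correspond to the Datamonkey output format, and MEME in particular tests for episodic selection in a way this sketch does not model.

```python
from dataclasses import dataclass


@dataclass
class SiteResult:
    codon: int      # 1-based codon position in ORF2
    dn: float       # nonsynonymous substitution rate
    ds: float       # synonymous substitution rate
    p_value: float  # significance of the dN versus dS difference


def classify_site(site: SiteResult, alpha: float = 0.05) -> str:
    """Label a codon using the 0.05 cut-off described in the text above."""
    if site.p_value >= alpha:
        return "neutral"
    if site.dn > site.ds:
        return "positive"
    if site.dn < site.ds:
        return "negative"
    return "neutral"


def sites_by_class(results: list[SiteResult], alpha: float = 0.05) -> dict[str, list[int]]:
    """Group codon positions by their selection label."""
    grouped: dict[str, list[int]] = {"positive": [], "negative": [], "neutral": []}
    for site in results:
        grouped[classify_site(site, alpha)].append(site.codon)
    return grouped
```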

Epitope prediction

The B-cell linear epitopes of the standard reference strains were predicted as described previously26. We used the following six tools: LEPS21, BCPRED22, FBCPRED22, BepiPred23, Antigenic24 and LBtope25. All tools were used with their default settings. We accepted the common epitopes estimated by 4 or more tools and spanning >10 consecutive amino acids26.
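The acceptance rule (4 or more tools, >10 consecutive amino acids) can be sketched as follows. The sketch assumes that agreement is counted per residue and that each tool's output has already been converted to 1-based inclusive residue ranges; both the data layout and the function name are hypothetical, and the prediction tools themselves are not called.

```python
def consensus_epitopes(
    predictions: dict[str, list[tuple[int, int]]],
    length: int,
    min_tools: int = 4,
    min_run: int = 11,  # ">10 consecutive amino acids" means at least 11
) -> list[tuple[int, int]]:
    """Return (start, end) ranges supported by at least `min_tools` tools
    over at least `min_run` consecutive residues."""
    support = [0] * (length + 1)  # index 0 unused; positions are 1-based
    for ranges in predictions.values():
        covered: set[int] = set()
        for start, end in ranges:
            covered.update(range(max(1, start), min(length, end) + 1))
        for pos in covered:  # each tool counts at most once per residue
            support[pos] += 1

    epitopes: list[tuple[int, int]] = []
    run_start = None
    for pos in range(1, length + 2):  # one extra step closes a run at the C-terminus
        supported = pos <= length and support[pos] >= min_tools
        if supported and run_start is None:
            run_start = pos
        elif not supported and run_start is not None:
            if pos - run_start >= min_run:
                epitopes.append((run_start, pos - 1))
            run_start = None
    return epitopes
```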

Calculation of p-distance values

To assess the frequency distribution of genetic distances among the NoV GI strains, the intergenogroup and intergenotype p-distance values were calculated. We analysed the present strains using MEGA 6.052.
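For reference, the p-distance between two aligned sequences is the proportion of compared sites at which they differ. The sketch below illustrates the calculation and the mean ± SD summary; it is not MEGA's implementation, and the pairwise skipping of gap and ambiguous (N) columns is an assumption, as are the function names.

```python
import itertools
import statistics

SKIP = {"-", "N"}  # assumption: gaps and ambiguous bases are ignored pairwise


def p_distance(a: str, b: str) -> float:
    """Proportion of compared aligned sites at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in SKIP or y in SKIP:
            continue
        compared += 1
        if x != y:
            diffs += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def pairwise_p_distances(aligned: dict[str, str]) -> list[float]:
    """p-distance for every unordered pair of sequences."""
    return [p_distance(a, b) for a, b in itertools.combinations(aligned.values(), 2)]


def mean_sd(values: list[float]) -> tuple[float, float]:
    """Mean and sample standard deviation of a list of distances."""
    return statistics.mean(values), statistics.stdev(values)
```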

Additional Information

How to cite this article: Kobayashi, M. et al. Molecular Evolution of the Capsid Gene in Norovirus Genogroup I. Sci. Rep. 5, 13806; doi: 10.1038/srep13806 (2015).