A decade of sustained selection pressure on two surface sites of the VP1 protein of Enterovirus A71 suggests that immune evasion may be an indirect driver for virulence

Enterovirus A71 (EV-A71) is an emerging pathogen in the Enterovirus A species group. EV-A71 causes hand, foot and mouth disease (HFMD), with virulent variants exhibiting polio-like acute flaccid paralysis and other central nervous system manifestations. We analysed all enterovirus A71 complete genomes with collection dates from 2008 to mid-2018. All sub-genotypes exhibit a strong molecular clock with omega (dN/dS) suggesting strong purifying selection. In sub-genotypes B5 and C4, positive selection can be detected at two surface sites on the VP1 protein, also detected in positive selection studies performed prior to 2008. Toggling of a limited repertoire of amino acids at these positively selected residues over the last decade suggests that EV-A71 may be undergoing a sustained frequency-dependent selection process for immune evasion, raising issues for vaccine development. These same sites have also been previously implicated in virus-host binding and strain-associated severity of HFMD, suggesting that immune evasion may be an indirect driver for virulence (154 words).

A further research priority in recent years has been elucidation of the molecular basis of neurovirulence in EV-A71. Such knowledge would enable prediction of the likely severity of new outbreaks, or prognosis in individual cases where the viral genome sequence could be determined [13][14][15][16][17][18][19] . Searches for positive selection have also  Table 1. Pairwise genetic distances between viruses of species Enterovirus A and subtypes of EV-A71. Genetic distances, in substitutions per site, calculated using the TN93 substitution matrix with sitewise rate variation modelled using a gamma distribution (shape parameter = 0.6868), between EV-A71 sub-genotypes, coxsackie virus CV-A16 (their nearest non-EV-A71 relative in Enterovirus A) and baboon A13, the outlier within Enterovirus A (see Fig. 1). The upper inset table shows the mean genetic distance within genotypes B and C, and the lower inset table shows the mean genetic distances between genotypes.  www.nature.com/scientificreports www.nature.com/scientificreports/ been carried out [20][21][22][23] and there is some overlap between the detected residues in the two types of study, suggesting that the virulence of EV-A71 may be undergoing natural selection.
Evolution of EV-A71 at the molecular level has implications for vaccine design. EV-A71 sub-genotypes are serologically distinctive, implying that acquired immunity to one sub-genotype may not be completely protective in future outbreaks of different sub-genotypes 24 . Supportive of this notion, epidemics of EV-A71 often correlate with the appearance of a previously unencountered sub-genotype in a population -for instance the successive epidemics of sub-genotypes B3, B4 and B5 in Singapore 25 and the cyclical nature of outbreaks of EV-A71 26 . Tee et al. 20 show that EV-A71 phylogenetic trees display a ladder-like morphology, suggesting that immune evasion may be a selective pressure producing antigenic drift within EV-A71 cyclical sub-genotypes.
The preponderance of large epidemics of virulent EV-A71 in east Asia, despite the presence of the virus on all continents, is unexplained. Host factors such as HLA-A33 and HLA-DR17 responses have been linked to severe HFMD 27 , suggesting that human populations may differ in susceptibility. This may help to explain why separation of virulent and non-virulent strains purely on the basis of viral genome sequence, has remained elusive.
In this report, we update the analyses of Tee et al. 20 , which covered all EV-A71 sequences up to mid-2008, by analysing EV-A71 genome sequences from 2008 to mid-2018. We examine the behaviour of the molecular clock within sub-genotypes, and use Bayesian skyline analysis to demonstrate population dynamics of the two most important sub-genotypes of the last decade. We then search for evidence of amino acid residues under positive selection, and display their patterns of variation on phylogenetic trees. The structural consequences of positive selection on EV-A71 proteins are examined and implications for immune evasion, antigenic drift and virulence are discussed.

Results
Genetic distance between sub-genotypes. Genetic distance, measured by the Tamura-Nei substitution model 28 , averages at 0.30 to 0.33 substitutions per site, between EV-A71 genotypes. Within genotypes, the corresponding figure is 0.17 to 0.19 substitutions per site between sub-genotypes. The genetic distance from EV-A71 to CV-A16, EV-A71's nearest relative within species Enterovirus A, ranges from 0.35 to 0.43 substitutions per site, and to baboon enterovirus A13, the outlier in Enterovirus A, from 0.85 to 0.91 substitutions per site (Table 1). Figure 1 places Table 1 in the context of the wider Enterovirus A species, showing phylogenetic relationships within species Enterovirus A and indicating viruses known to cause HFMD. HFMD-associated viruses tend to be more closely related to EV-A71 but phylogenetic clustering is not complete, and members of species Enterovirus B (not included in Fig. 1) can also cause HFMD 2 . Table 2 summarizes qualifying EV-A71 sequences for each sub-genotype (see Methods section for qualification protocol). Since 2007, genotypes B5 and especially C4 have been predominant in GenBank. Figure 2 plots the correlation co-efficients for the root-to-tip distance on the input tree and sampling date. Sub-genotypes B1, B5, C4 and C5 have very high R values (>0.9, see Table 2) suggesting highly clock-like behaviour. The R value of C2 (0.76) is somewhat less than the others, but even this is strong enough to indicate clock-like behaviour. Bayesian analysis of clock models using BEAST shows that relaxed molecular clock models are the likeliest for all genotypes (see Supplementary Data) and these were used for the construction of Bayesian skylines on sub-genotypes B5 and C4.

Molecular clocks in EV-A71 sub-genotypes.
Bayesian skyline plots of EV-A71. Figure 3 shows the Bayesian skyline plots for sub-genotypes B5 and C4. These sub-genotypes emerged in 2003 and 1998, respectively. BEAST analysis by Tee et al. 20   www.nature.com/scientificreports www.nature.com/scientificreports/ Natural selection on EV-A71. Table 2 and Fig. 4 show the omega (dN/dS) values calculated for sub-genotypes B5, C2 and C4 collected since 2008. Figure 4 also shows the relative substitution rates in the 3 codon positions. Together, these indicate that EV-A71 has been evolving for the past decade mostly under selective constraint.
Within this general pattern of selective constraint, positive selection is detected on certain sites. Figure 5 shows the sitewise omega Slr output for sub-genotypes B5, C2 and C4. Eight sites in sub-genotype C4, fifteen sites in C2 and fourteen sites in B5 have omega values >1, suggesting positive selection. However, only two positions in C4 are statistically significant, and one each in C2 and B5.

Figure 2.
TempEst analysis of molecular clocks in genotypes B1, B5, C2, C4 and C5. x-axis: sampling dates; y-axis: root-to-tip distance on a maximum likelihood tree using the GTR + G substitution model. Sub-genotype B1 in inset. See Table 2 for correlation coefficient R values for each sub-genotype.  Figure 5D shows the location of the selected site in the VP1 protein of PDB structure 4YVS (sub-genotype C4). VP1, VP2 and VP3 make up the icosahedral surface structure of the EV-A71 virus particle. Five VP1 proteins form a pentamer which has a raised "mountain" feature 29,30 at its vertex. Between the "mountain" and another raised feature is a "canyon". An antibody binding site is found on both slopes of the canyon 31 .
Temporal pattern of positive selection. Figure 6 shows the toggling of amino acids at statistically significant positively selected sites 663 in sub-genotype C4 and 710 in B5 since 2008.  www.nature.com/scientificreports www.nature.com/scientificreports/

Discussion
Our sampling protocol for EV-A71 sequences used in this analysis, is directed at all complete genomes over the last decade, where collection dates are available. This reduces the 12122 total sequences from EV-A71 in GenBank on our sequence retrieval date to a subset of 697, from which a further 46 suspected recombinant genomes were removed. The stringency of this protocol is designed to ensure data quality, but carries the risk of sampling bias. We note that Table 1 presents overwhelmingly B5 and C4 sub-genotypes. However, we suggest that this is not a result of any lack of interest by sequencing laboratories in other sub-genotypes but is rather more likely to be an indication of genuine predominance of these sub-genotypes over the study decade. Nevertheless, we acknowledge, in the absence of any globally co-ordinated monitoring programme for EV-A71, that the potential for inadequate sampling remains a shortcoming of this study.
Although EV-A71 and influenza A are very different viruses, being positive-strand unsegmented and negative-strand segmented, respectively, several factors may be identified that suggest that their epidemiological dynamics may be similar. For instance, we show in this paper that EV-A71 sub-genotypes exhibit remarkably regular clock-like substitution behaviour (Fig. 2), comparable to that shown by seasonal influenza A subtypes 32 , as previously suggested by Tee et al. 20 . We calculate that overall rates of omega (dN/dS) in EV-A71 sub-genotypes have been low since 2008 ( Table 2, Fig. 4), but we also find evidence of a small number of sites under positive selection, with omega >1 (Fig. 5). Low overall omega implies strong purifying selection and a virus that is well adapted to its human hosts. This has previously been found in a study restricted to the evolution of the VP1 gene 23 . Figure 5 is notable for many sites presenting signals of positive selection that do not achieve statistical significance. This may be due to more fleeting selection pressures that are not sustained over the entire decade examined in the analysis.
Again as in influenza A, we find that those sites with statistically significant positive selection tend to be in prominent functional surface positions 31,33 . Together with the ladder-like phylogenetic trees of EV-A71 20 , these factors suggest an evolutionary scenario of a proficiently host-adapted virus engaged in intense immune evasion mediated at a small number of virus particle surface sites that interact directly with the host immune system. The result of this evolutionary struggle between virus and host is antigenic drift, with annual seasonal epidemics of influenza A, and cycling non-seasonal EV-A71 outbreaks 20,26 . When novel genotypes are formed -either by reassortment in influenza or recombination and cladogenesis in EV-A71 -there may be a partial absence of cross-protectivity derived from previous genotypes, allowing surges in new sub-genotype cases. For EV-A71, this is illustrated using Bayesian skyline analysis for pre-2008 genotypes by Tee, et al. 20 and for post-2008 sub-genotypes by Fig. 3. Our skyline analysis shows a decline of C4 in the early part of the present decade after its peak around 2004, and that this may be coincident with the second expansion of B5, suggesting competition among sub-genotypes for hosts.
The pattern of natural selection in EV-A71 has the added complication that the positively selected sites may also have been involved in virulence as well as antibody binding. Polyprotein positions 663, 710 and 729 have previously been implicated in changes in cell tropism 34 , and also in virulence 15 . The first two of these sites are among the identified positively selected sites in the present study and are also among the positively selected sites previously detected 20,21 . Specifically, amino acids G, Q, E and R at position 710 have been associated with virulence 13,16,17,19 with Q and R correlated to acute flaccid paralysis 18 . When Q is present at position 710, experimental www.nature.com/scientificreports www.nature.com/scientificreports/ evidence suggests it may confer better virus-cell binding and also immune evasion 18 , where E is present it may confer higher viraemia and neurovirulence 19 . At position 663, K is associated with the development of acute flaccid paralysis 18 .
We find that in sub-genotype C4 genomes since 2008, K is the second most common residue, after E, at positively selected site 663 (Fig. 6A, red branches for K), implying the frequent circulation of flaccid paralysis-inducing strains. We also find that in B5 genomes since 2008, residue Q is occasionally present at positively selected site 710 (Fig. 6B, green branches for Q), implying that this sub-genotype may also pose an ongoing flaccid paralysis risk. Moreover, all the residues we see at position 710 in genotype B5 since 2008 have been implicated in some aspect of increased virulence. Figure 6 also shows that variation at positions 663 and 710 since 2008 exhibits a toggling pattern, with many tip substitutions on the phylogenetic trees. This indicates that the signal for positive selection is not due to a past selective sweep of EV-A71, but rather suggests a sustained frequency-dependent selection process, where antigenic novelty is favoured, but only as long as it takes for the host population to develop herd immunity against the new variant. The continuity of position of positively selected sites in both post-2008 genomes and in previous studies 20,21 shows that the nature of the molecular interaction between host and virus producing the selective pressure has been unchanged over a timescale of decades. Both position 663 and 710 have the same predominant variant, E. Indeed, the basal position of E in both trees in Fig. 6 may indicate that E is the most functionally efficient residue at these positions from the point of view of the action of VP1 as coat protein, and that it may only be its vulnerability to antibody binding that selects for periodic temporary substitution to a small handful of variants. The involvement of these same residues in virulence may be due to a dependency of virulence on the efficiency of virus binding to host cells 34 . This implies that virus immune evasion may be driving changes in virulence, by an incidental effect on basic virus-host interaction.
One final comparison with seasonal influenza is that any future approved EV-A71 vaccines may need to be renewed on a regular basis, as positive selection will produce vaccine resistant forms. However, unlike seasonal influenza, where the repertoire of antigenic variation is large, the relatively small set of immune evasion options at residues 663 and 710 displayed by EV-A71 over the last decade and in previous studies, suggests that a cocktail vaccine composed of strains with the seven variant residues may generate a herd immunity that will last several years.

Methods
Collection and processing of Enterovirus A71 genome sequences. On 3rd October 2018, GenBank contained 12122 nucleotide sequences classified as from Enterovirus A71. Of these, 877 were greater than 5 kb in length. These were downloaded and sequences annotated as passaged in cell culture were manually removed, along with those annotated as unverified, or with internal runs of more than 6 consecutive Ns, reducing the number of qualifying sequences to 784.
Sequences with a collection date were then extracted using tempus et locus (TeL 35 ). 697 qualifying sequences were found to have collection dates. Rapid input of sequences sets to TempEst for molecular clock analysis is assisted by the following expedient. Where only year of collection is given, TeL generated a collection date of 30 th June for that year (155 cases; no sequences had genuine collection dates on the 30 th June of any year). Where only a month and year of collection is given, TeL generated the 15 th of that month (111 cases; 12 sequences with genuine collection dates on the 15 th of any month were noted).
TeL-dated output sequences were grouped into sub-genotypes using E-type (available in the Supplementary Data), which compares each sequence with a subset of the reference genomes given by van der Sanden, et al. 36 plus MG672478 37 for putative genotype E, using needle from EMBOSS 38 to perform a global pairwise alignment. The needle output is parsed for each reference genome and the closest match was used to assign a sub-genotype to the query sequence. As output, E-type produced a separate FASTA-formatted file for each sub-genotype. Where the difference between the closest sub-genotype match and the second closest match for any sequence was < = 5% (152 sequences), further analysis was carried out using Simplot 39 . If Simplot indicated a likely recombination event, the sequence was removed from the analysis (46 sequences).
Duplicate sequences are also removed from each sub-genotype file if they share the same collection date, using duptrimmer (available in the Supplementary Data). This is a rare occurrence, found once in genotype B5 and twice in C4.
Phylodynamic analysis of sub-genotype sets. Sub-genotypes output files from E-type with ten or more qualifying sequences after the removal of duplicates and recombinants, were aligned using MAFFT 40 , and a maximum likelihood tree produced in MEGA 41 using the GTR + G substitution model 42 . TempEst 32 was used to derive the best fitting root and assess clock-like behaviour for each sub-genotype tree. For those sub-genotypes with more than 25 qualifying sequences (only C4 and B5), Bayesian skyline plots were produced using BEAST 43 and Tracer 44 .
Detection of positive selection using Slr. Genome sequences with a collection date before 2008 were removed and each sub-genotype alignment was trimmed to the boundaries of the coding sequence. Where necessary, alignment was refined using Muscle in "align codons" mode within MEGA. Positive selection, neutrality and constraint were analysed on a site-wise basis using Slr 45 (version 1.5.0). An initial estimate of the transition/ transversion bias (kappa) was obtained in MEGA and was used along with an initial value of omega (dN/dS) of 1 (corresponding to neutral evolution) in the Slr input parameters. After the first run, the values of kappa and omega derived by Slr were used as input parameters to confirm convergence. Visualization of positively detected sites was performed in Molecular Operating Environment (MOE), 2013.08 (Chemical Computing Group ULC, 1010 Sherbooke St. West, Suite #910, Montreal, QC, Canada, H3A 2R7).