A novel subgenotype C6 Enterovirus A71 originating from the recombination between subgenotypes C4 and C2 strains in mainland China

Recombination plays important roles in the genetic diversity and evolution of Enterovirus A71 (EV-A71). The phylogenetics of EV-A71 in mainland China found that one strain DL71 formed a new subgenotype C6 with unknown origin. This study investigated the detailed genetic characteristics of the new variant. DL71 formed a distinct cluster within genotype C based on the genome and individual genes (5′UTR, VP4, VP1, 2A, 2B, 2C, 3D, and 3′UTR). The average genetic distances of the genome and individual genes (VP3, 2A, 2B, 2C, 3A, 3C, and 3D) between DL71 and reference strains were greater than 0.1. Nine recombination events involving smaller fragments along DL71 genome were detected. The strains Fuyang-0805a (C4) and Tainan/5746/98 (C2) were identified as the parental strains of DL71. In the non-recombination regions, DL71 had higher identities with Fuyang-0805a than Tainan/5746/98, and located in the cluster with C4 strains. However, in the recombination regions, DL71 had higher identities with Tainan/5746/98 than Fuyang-0805a, and located in the cluster with C2 strains. Thus, DL71 was a novel multiple inter-subgenotype recombinant derived from the dominant subgenotype C4 and the sporadic subgenotype C2 strains. Monitoring the emergence of new variants by the whole-genome sequencing remains essential for preventing disease outbreaks and developing new vaccines.


Results
Capsid VP1 gene characterization of the new subgenotype C6. Phylogenetic analysis was performed using the VP1 sequences of 3036 Chinese EV-A71 strains and 34 reference strains. The cladogram showed that except the predominant subgenotype C4 and sporadic subgenotypes (A, C0, C2, C3, and B5), an independent branch contained the orphan strain DL71 was located within genotype C with 97% bootstrap value but distinct from the known subgenotypes C0-C5, C1-like, and C2-like (Fig. 1). The novel branch was defined as subgenotype C6.
Genome sequence comparison of DL71. The complete genome and individual gene sequences of DL71 were compared to 221 EV-A71 reference strains belonging to different genotypes and subgenotypes. The numbers of base and amino acid substitutions per site from averaging over all sequence pairs between groups are shown in Table 1. The average genetic distance of nucleotide and amino acid sequences in different gene fragment between DL71 and these reference strains was calculated. DL71 has the smallest genetic distance with subgenotype C4 EV-A71 strains in nucleotide sequences of the genome (0.1280), 5′UTR (0.0677), VP2 (0.0728), VP3 (0.1277), 2C (0.1144), 3A (0.1131), 3B (0.0776) and 3C (0.1586). DL71 has the smallest genetic distance with subgenotype C2 EV-A71 strains in VP4 (0.0762), VP1 (0.0714), 2A (0.1219), 3D (0.1441) and 3′UTR (0.0412). And DL71 shares the smallest genetic distance with subgenotype C2-like EV-A71 strains in 2B (0.1790). A previous study has reported that the inter-genotype, inter-subgenotype and intra-subgenotype mean divergences of EV-A71 whole-genome nucleotide sequences were 0.17-0.22, 0.1-0.14 and 0.01-0.1, respectively 32 . However, the smallest genetic distances of the genome, VP3, 2C, 3A and 3C between DL71 and C4 strains, the smallest genetic distances of 2A and 3D between DL71 and C2 strains, and the smallest genetic distances of 2B between DL71 and C2-like strains were all more than 0.1. Moreover, the average genetic distances of the genomes and individual genes between DL71 and these reference strains (Table 1) were all higher than those within genotypes/subgenotypes (Supplementary Table S3). Thus, the results further demonstrated that DL71 belonged to a new subgenotype C6.
Phylogenetic analysis of the complete genome and individual genes. To assess the genetic relationships between DL71 and EV-A71 reference strains, phylogenetic trees based on the genome and individual gene sequences were constructed by the maximum likelihood method with the GTR model. As shown in Fig. 2, except that the VP2, VP3, 3A, 3B, and 3C of DL71 are located in the same evolutionary branch with subgenotype C4 strains, the genome and individual genes 5′UTR, VP4, VP1, 2A, 2B, 2C, 3D and 3′UTR of DL71 formed a new independent branch in phylogenies. Combining the results of the sequence comparison and phylogenetic analysis, it is reasonable to assume that the subgenotype C6 is a novel genetic lineage of EV-A71 in mainland China different from the previously known genetic lineages.  (Fig. 3a), the schematic sequence display (Fig. 3b) and the pairwise identity plot (Fig. 3c) of RDP4 showed that DL71 was a recombinant strain formed by 9 putative recombination events between a major parental strain Fuyang-0805a and a minor parental strain Tainan/5746/98, which belonged to subgenotypes C4 and C2, respectively. The average p-values of 7 algorithms were utilized to confirm the potential recombination events ( Table 2). The recombination scores of 9 recombination events were all more than 0.5 (Table 2) www.nature.com/scientificreports/ To display the recombination results of RDP4, the genome sequence of DL71was used as the query sequence in Simplot analysis. The recombination signals in DL71 genome exhibited by SimPlot were similar to those detected by RDP4. The similarity plot analysis showed that the sequence of DL71 between every two recombination regions shared higher similarity with Fuyang-0805a than Tainan/5746/98. However, the sequence of DL71 within every recombination region shared higher similarity with Tainan/5746/98 than Fuyang-0805a (Fig. 3d). The recombination diagram shown in the bootscan analysis further confirmed these recombination events (Fig. 3e).
In addition, the authenticity of the recombination events within DL71 genome was proven by a set of statistically incongruent phylogenetic trees, an approach considered the gold-standard bioinformatics method for demonstrating the presence of recombination 33 . According to the locations of the recombination breakpoints in the alignment dataset in RDP4 sequence display, the locations of recombination regions in genome sequences without gaps were listed (Supplementary Table S5), which was convenient for building the corresponding sequence datasets of the recombination and non-recombination regions. The non-recombination regions of DL71 were clustered into subgenotype C4 strains (Fig. 4). However, the recombination regions of DL71 were clustered into subgenotype C2 strains (Fig. 5). The topological incongruence of phylogenetic trees reconfirmed that DL71 Table 1. Average genetic distances of nucleotide and amino acid sequences in different gene fragment between DL71 and other genotypes or subgenotypes of EV-A71. The average genetic distance of nucleotide and amino acid sequences in different gene fragment between DL71 and the reference strains was estimated by the DISTANCE program (Compute between Group Mean Distance) in MEGAX. The numbers of base and amino acid substitutions per site from averaging over all sequence pairs between groups are shown. The nucleotide sequences comparison was conducted using the Kimura 2-parameter model with both transitions and transversions substitutions. The amino acid sequences comparison was conducted using the Poisson correction model with all substitution. The bootstrap method with 1000 replications was selected as the variance estimation method. The smallest genetic distances of nucleotide sequences in different regions are marked in bold. The information of 222 EV-A71 strains used in the analysis was listed in the supplementary table S2. nt, nucleotide. aa, amino acid.  www.nature.com/scientificreports/ appeared with potential recombination events, and subgenotypes C4 and C2 EV-A71 strains participated in this process.

Discussion
Our previous study found that a new subgenotype C6 of EV-A71 emerged in mainland China, which might derive from subgenotypes C4 and C2 EV-A71 strains 25 . However, the detailed genetic characteristics of the subgenotype C6 EV-A71 strain DL71 were unclear. Here we provided strong evidence to demonstrate that DL71 exactly formed a new branch distinct from the known subgenotypes by calculating the genetic distance and constructing the phylogenetic trees based on the genome and individual gene sequences. 9 recombination events, 2 parental strains and the breakpoints locations of DL71 were detected by RDP4 and SimPlot. Moreover, phylogenetic analysis and nucleotide sequence identity calculation based on the recombination and non-recombination regions further indicated that DL71 was a new variant originated from 9 inter-subgenotype recombination events between subgenotypes C4 and C2 EV-A71 strains. In this study, DL71 exhibited a multiple inter-subgenotype recombination manner. The recombination events detected in DL71 genome was more than that previously reported in EV-A71 strains belonging to genotype A and subgenotypes C4, C2 and B5 25,34,35 . In total, 9 recombination events were detected in DL71 genome with multiple recombination breakpoints throughout the full genome. Numerous studies suggested that the recombination breakpoints in EV-A71 genome are frequently located in the non-structural regions 5′UTR, P2, and P3 but rarely in the structural region P1 25 . Consistent with previous findings, the breakpoints of recombination events event-1, event-5, event-6, event-7, event-8, and event-9 of DL71 were located in the 5′UTR, P2 and P3 regions. However, 6 breakpoints of recombination events event-2, event-3, and event-4 of DL71 were located in the P1 region. This study showed that three recombination events occurred in the structural regions of DL71. In addition, unlike the recombination of subgenotype C4 EV-A71 involved several viruses, such as CVA4, CVA5, CVA14, CVA16 and genotype B EV-A71 25 , only subgenotypes C4 and C2 EV-A71 respectively as the major and minor parents participate in the recombination of DL71. DL71 originated from 9 recombination events between subgenotypes C4 and C2 EV-A71 and presented a notable helix-like recombination manner.
The emergence of new variants has been reported continuously. For example, the subgenotype C2-like strains were detected in different geographic regions in the Philippines in 2000,2002,2005,2010 and in Taiwan in 2008 26,36 . The C2-like strains originated from the recombination between subgenotypes C2 and B3 EV-A71 strains 36 . Moreover, the C1-like EV-A71 strains associated with neurologic symptoms have emerged in several outbreaks in Europe since 2007 [27][28][29][30][31]37 . The C1-like strains shared the highest nucleotide similarity with subgenotype C1 EV-A71 strains in P1 region, while it shared higher nucleotide similarity with CVA6 and CVA8 in 5′UTR and higher nucleotide similarity with CVA2, CVA4, CVA5, and CVA6 in 3D region than subgenotype C1 EV-A71 31 . The C1-like EV-A71 was a multiple recombinant derived from the complex recombination among several viruses 31 . Except for the C2-like and C1-like strains, DL71 was identified as another new inter-subgenotype recombinant originated from the predominant subgenotype C4 and the sporadic circulating C2 strains. The comparison of the amino acid sequences in the complete genome between DL71 and EV-A71 strains belonging   Table S6). This was the first report describing the detailed genetic characteristics of the novel subgenotype C6 of EV-A71 in mainland China. However, given that the C1-like and C2-like were also named subgenotype C6 respectively in one report 27 and in two reports 38,39 , the subgenotype C6 described in the present study could be named C4-like to avoid confusion. The co-circulation of various genotypes/subgenotypes of EV-A71 in one country during the same period of time provides an environment for the emergence of new recombinant variants. Recombination may endow newly emerging variants with different antigenicity or pathogenicity 40 . An antigenic map constructed using serological data suggested that the antigenicity varied among different genotypes and subgenotypes of EV-A71. The antigenic map showed that subgenotypes B1 and B4 strains were clustered closely together, subgenotypes C2 and C4 formed a separate cluster different from genotype B, and subgenotype B5 formed its own cluster 41 . Although the cross-neutralization by anti-EV-A71 antibodies was commonly observed among various genotypes 42 , there was a notable difference in neutralization titers. For example, children infected with genotype B strains showed higher neutralization titers against genotype B strains than genotype  Table 3. Nucleotide sequence identities of the non-recombination and recombination regions between DL71 and its putative parental strains. The highest nucleotide identities of different regions are marked in bold. www.nature.com/scientificreports/ C strains 43 . The subgenotype B4 vaccine could not elicit an effective neutralization titer against subgenotype C2 strains 44 . The C2-like variant in Taiwan in 2008 had a 128-fold lower neutralization titer than the previous C2 strains 36 . The antigenicity difference between subgenotype C6 EV-A71 and its parental strains need to be further studied. The altered antigenicity of new variants may influence the circulation of EV-A71 29 . The subgenotype C4 EV-A71 has a large susceptible population in mainland China, whereas the C2 EV-A71 does not. As a recombinant derived from subgenotypes C4 and C2 strains, C6 EV-A71 emerged occasionally may be due to its antigenicity being different from C4 EV-A71 and less susceptible population. Moreover, the altered antigenicity According to the potential virulence determinants previously reported in EV-A71 genome, we have speculated that the C6 EV-A71 strain DL71 was probably a strain with low virulence 25 . In fact, there is no deterministic relationships between a specific genotype/subgenotype and viral virulence or certain clinical manifestations 45 . Both virus and host factors contribute to the progression of EV-A71-associated disease.

Recombination regions in alignment with gaps (nt) Parent major/minor Identity
In addition, this study on DL71 was limited to the field of bioinformatics due to the lack of live isolate and clinical symptom information. There are 3 limitations in this study. First, only the orphan strain DL71 belonged to subgenotype C6 so far, making it hard to judge whether subgenotype C6 contained only one strain or other www.nature.com/scientificreports/ strains were omitted due to its low prevalence in actual circulation environment. Based on the current epidemiological data, the actual route of virus transmission cannot be traced. Second, as one important mechanism of EV-A71 evolution, recombination can lead to the altered antigenicity of new variants 36 . Whether the new subgenotype C6 strain DL71 shares similar phenotypic and antigenic characteristics with its parental strains is unknown. Both the genome sequence analysis and neutralization antibody titers determination are needed. Third, the only information about DL71 was its isolated place and time, whereas the clinical symptoms and disease severity of DL71 was unknown. Although the authors have analyzed the previously reported potential virulence determinants in DL71 genome 25 , it is still difficult to define the pathogenicity of DL71, which is associated with many factors, including virus virulence and host immunity. Although the value is limited for investigating the actual viral population by using the published genomes, this study revealed the recombination manner of the new subgenotype C6. The effect of mutation during EV-A71 evolution is undeniable. In fact, both recombination and mutation play important roles in the emergence of new variants. Genetic recombination may occur more frequently than we expected. Since virus typing with a single short fragment will miss some new subgenotypes, the complete genome sequences should be used to classify EV-A71 when monitoring the molecular epidemiology. Moreover, considering that the available EV-A71 vaccines in mainland China target subgenotype C4 strains, it is necessary to developing a polyvalent vaccine against various strains. Thus, comprehensive surveillance of the genotype shifts, recombination, mutation and the altered antigenicity of EV-A71 is very important for preventing HFMD outbreaks and developing new treatment strategies.

Materials and methods
Sequence data and alignment. This study involved no human participants or human experimentation.
The complete VP1 sequences of 3036 EV-A71 strains in mainland China during 1987-2017 were downloaded from GenBank database. The VP1 sequences of 34 international EV-A71 strains belonging to genotypes/subgenotypes A, B0-B5, C1-C3, C5, D, E, F, G, C1-like, and C2-like were used as the reference sequences. Coxsackievirus A group 16 (CVA 16) prototype strain G-10 was used as an out-group in the phylogenetic analysis. Thus, 3070 VP1 sequences (Supplementary Table S1) were used in the phylogenetic analysis to determine whether new genetic lineages of EV-A71 emerged in mainland China.
In these 3036 Chinese EV-A71 strains, the complete genome sequences of 174 EV-A71 strains were downloaded from GenBank database. The genome sequences of 38 international EV-A71 strains belonging to A, B0-B5, C1-C3, C5, E, F, C1-like, and C2-like were downloaded. The genome and individual gene sequences of 222 EV-A71 strains (Supplementary Table S2) were used to construct phylogenetic trees and calculate genetic distance to identify the novel subgenotype.
The genome and individual gene sequences were edited by the EditSeq program of DNAStar5.0 software. All sequence files were converted into merged files in Clustal format by SeqVerter software. Multiple-sequence alignments of the genomes and individual genes were performed by the ClustalW program embedded in MEGAX software. Information including the virus name, gene accession number, genotype, isolation time and place is summarized in Supplementary Tables S1 and S2.

Phylogenetic analysis.
Phylogenetic trees based on the genome and individual gene sequences were constructed by the maximum likelihood method with the GTR model using MEGAX. The bootstrap support values were determined with 1000 replicates to assess the robustness of individual nodes of the phylogenies. Genetic distance. The genome and individual gene sequences of EV-A71 were first grouped by the genotypes/subgenotypes using the Data program (Edit Taxa/Groups) in MEGAX. The mean genetic distances of the genome and individual gene sequences between DL71 and the reference strains were calculated (Table 1) by the DISTANCE program (Compute Within or Between Group Mean Distance) in MEGAX. The mean genetic distances of the genomes and individual genes of EV-A71 strains within genotypes/subgenotypes were also calculated (Supplementary Table S3). The Kimura 2-parameter model with both transition and transversion substitutions was selected for parameter setting. The bootstrap method with 1000 replications was selected as the variance estimation method.
Recombination analysis. The genome sequence of DL71 was examined for potential recombination events. The recombination events, breakpoints locations and parental strains were detected by the recombination detection program software version 4.101 (RDP4). 7 detection methods in their default mode, RDP, GENECONV, BOOTSCAN, MAXCHI, CHIMAERA, SISCAN and 3SEQ, were utilized to evaluate the potential recombination events between the inputted aligned sequences. Only the recombination events identified by at least 4 methods were considered significant 46 . The recombination events were further validated by the topological discordance of phylogenetic trees generated for each recombinant regions and non-recombinant regions of virus genomes. The potential recombination events were visualized by Similarity Plot (SimPlot) version 3.5.1. Similarity and Bootscan analyses were carried out using the Neighbor-Joining tree model with the Kimura 2-parameter method. The window size was 200 nucleotides moving along the alignment in 20 bp increments. Information on the alignment, including the genome sequences of DL71 and 42 reference strains used in recombination analyses by RDP4 and SimPlot3.5.1 software, is listed in Supplementary Table S4.
Nucleotide sequence identity. According  www.nature.com/scientificreports/ corresponding sequences by the ClustalW method, the nucleotide identities in different regions of DL71 and its parental strains were calculated using the MegAlign program in DNAStar5.0.

Data availability
All data used in our study are public data. The genome and individual gene sequences of EV-A71 and other viruses are downloaded from the GenBank database (http:// www. ncbi. nlm. nih. gov/ nucle otide/). The datasets generated during the current study are available from the corresponding authors on reasonable request.