Phylogenetic molecular evolution and recombination analysis of complete genome of human parechovirus in Thailand

Human parechovirus (HPeV), which is a member of the Picornavirus group of viruses, is a pathogen that is reported to be associated with manifestations that include respiratory tract involvement, gastroenteritis, sepsis-like symptom, and central nervous system complication. Until now, nineteen genotypes have been identified. The lack of proofreading property of viral RNA-dependent RNA polymerase (RdRp) together with recombination among the intra- and inter-genotypes of the virus results in high diversity. However, data specific to the molecular evolutionary perspective of the complete genome of HPeV remains limited. This study aimed to analyze the phylogenetic, molecular evolution, and recombination characteristics of the complete genome of HPeV strains isolated in Thailand during 2009–2012. Fifty-eight samples that were previously confirmed to be HPeV positive and then evaluated for genotyping were subjected to complete genome amplification to generate ten overlapping PCR fragments using a set of in-house designed primers. The same position of the viral genome was read in triplicate using direct Sanger sequencing. All samples were classified into the same previously defined genotypes in both whole-genome and VP1 phylogenic tree. However, sample B1091/HPeV14/2011 exhibited discordant grouping between whole-genome and VP1 on the phylogenetic tree. Bootscan analysis revealed that B1091/HPeV14/2011 inherited from two genotypic viruses, including VP1 from HPeV14, and the rest of the genome from HPeV1B. The results of this study provide important insights into the molecular evolution of and recombination in the viral genome of HPeV that will improve and accelerate our ability to develop treatment and prophylactic strategies in the future.

www.nature.com/scientificreports/ proteins, including 2A-C and 3A-D. RNA-dependent RNA polymerase (RdRp), which is encoded from the 3D gene, is responsive for viral genome replication 21,22 . The lack of proofreading property of viral RdRp together with the recombination among intra-and inter-genotypes of the virus leads to high viral genome diversity. The recombination breakpoint hotspots were reported to be around 5′UTR and P1, and P2 and P3 junction [23][24][25][26][27] . However, the molecular evolutionary perspective of the complete genome of HPeV is limited. Therefore, this study aimed to analyze the phylogenetic, molecular evolution, and recombination of the complete genome of HPeVs isolated in Thailand. Together with the reference sequences deposited in the GenBank database, the previously identified viruses from our center during January 2009 to January 2012 28 were subjected to complete genome amplification to retrieve complete coding sequences and to evaluate their phylogenetic, evolutionary relation recombination event. This study has provided an insightful understanding of molecular evolution and recombination in the viral genome that will pave the way for controlling this virus and for investigating a curative strategy in the future.

Materials and methods
Samples and viral genome extraction. All fifty-eight HPeV positive samples were selected during 2009-2012, based on the previous study 28 . Two hundred microliters of 46 fecal samples and 12 nasopharyngeal swab samples were subjected to viral RNA extraction. Extraction was performed using the conventional GTCphenol-chloroform method 29 . The extracted RNA was finally dissolved in 20 μl of diethylpyrocarbonate (DEPC) water and stored at − 70 °C until further use.
Complete genome amplification and direct Sanger sequencing. Semi-nested one-step reverse transcription polymerase chain reaction (RT-PCR) was used for amplification. Degenerated primers were designed for ten overlapping PCR fragments based on the GenBank database indicated in Supplementary

Molecular evolution analysis.
All study sequences were submitted to the GenBank database. The accession numbers included in this study are shown in Supplementary Table 1. Phylogenetic tree, the nucleotide substitution rate, and the most recent common ancestor (tMRCA) were achieved using Bayesian Evolutionary Analysis by Sampling Trees (BEAST) software version 1.10 (University of California, Los Angeles, CA, USA) 31 . The relaxed clock-uncorrelated exponential with 10 million chains was run in the Genetic Testing Registry (GTR) with a gamma distribution substitution model. The data from BEAST were analyzed using the TACER program (http:// beast. bio. ed. ac. uk/ Tacer). The phylogenetic tree was annotated and analyzed using Figtree version 1.4.4 (http:// tree. bio. ed. ac. uk/).

Recombination analysis.
The recombination events were initially analyzed using phylogenetics and genetic distances, after which they were analyzed using the RDP4 software package (http:// web. cbio. uct. ac. za/ ~darren/ rdp. html) 32 . The recombination region count matrix, modularity matrix, and recombination breakpoint matrix were generated using RDP4 software in which the default algorithm setting was used. Potential recombination events were also identified by BootScan within RDP4 using a sliding window of 200 nucleotides.
Ethical consideration. All

Results
Phylogenetic analysis of VP1 and the complete genome of HPeV. The viral genome of fifty-eight samples previously identified as HPeV positive 28 was subjected to complete genome amplification. A set of consensus primers were in-house designed for use in the overlapping semi-nested PCR (Supplementary Table 1). Direct Sanger sequencing revealed the nucleotide sequence of the viral genome after amplification. Each position of a nucleotide was read in at least triplicate with a different direction of sequencing. After assembly and annotation, the complete genomes of HPeVs from this study were evaluated for their genetic correlation with other HPeV strains publicly available in the GenBank database by BEAST program. The sequences from this study were deposited in the GenBank database as accession numbers MW476080 to MW476137. The genotype of viruses classified from this study in the whole-genome phylogenetic tree ( Fig. 1) corresponded to the VP1 region (Supplementary Figure S1). However, sample B1091/HPeV14/2011 was classified as genotype 14 from the VP1 sequence (Supplementary Figure S1). In contrast, this sample was grouped with genotype 1B in the wholegenome phylogenetic tree ( Fig. 1). This discordant genotype grouping may indicate a recombination event in the virus genome. The year of sample collection was defined as the tip of each taxon. The estimated time to the most recent common ancestor (tMRCA) of the complete genome of the virus indicated as AD 1767 (95% highest posterior density [HPD] interval: 1729-1882) ( Table 1). Whereas the tMRCA of VP1 was AD 1505 (95% HPD interval: 1400-1601) ( Table 1). The clock rate of VP1 was 1.56 × 10 -3 substitutions/site/year, whereas the complete genome was 1.68 × 10 -3 substitution/site/year (Table 1). These clock rates corresponded with the high evolutionary rate in the VP1 region compared with the complete genome.
Phylogenetic analysis of P2 and P3 region of HPeV. P2 region of HPeV consists of non-structural proteins 2A, 2B, 2C, and 2D, which are suspected of playing a crucial role in pre-replication and post-translational modification 22 . In comparison, P3 consists of 3A, 3B, 3C, and 3D, which function in the viral replication process 22 . The phylogenetic tree of these two regions of the virus did not group as a genotype as P1 or VP1 region. All genotypes of the virus were unintentionally distributed along the tree of P2 (Fig. 2B) and P3 (Fig. 2C). As previously described by Calvert J., et al. (2010), the minimum of 0.155 nucleotide sequence divergence was the suitable threshold corresponding to a naturally occurring minimum value in the distribution of pairwise distances among 3D sequences 33 . Therefore in this study, the substitution rate over 0.155 was used as the cutoff for classifying the phylogenetic tree of P2 and P3 into different clads. Regarding P2 from this study, six clades of the phylogenetic tree could be defined, including clade I, II, III, IV, V, and VI (Fig. 2B). Interestingly, all samples in this study were members of clade V (the darkened circle in Fig. 2B). tMRCA of P2 was AD 1774 (95% HPD interval: 1732-1812), and the clock rate was 1.56 × 10 -3 substitutions/site/year (Table 1). In the meantime, the phylogenetic tree of P3 could also be defined into six clades (Fig. 2C). Moreover, all samples from this study were distributed all along with clade I of the tree. It should be noted that only B1187/HPeV4/2011 appeared in clade II (Fig. 2C). tMRCA of P3 was AD 1818 (95% HPD interval: 1782-1847), and the clock rate was 1.96 × 10 -3 substitution/site/year (Table 1). These results indicate that P2 and P3 occupied less evolutionarily driven rate than the P1 region. All samples isolated from this study were distributed into the same clade, which suggests the regional distribution of P2 and P3 of the virus.

Recombination investigation in HPeV genome.
The recombination analysis of the viral genome was performed using the RDP4 program, after which a consensus event was visualized as a matrix. A recombination event was usually detected in the genome of the viruses included in this study, which indicated in the hot spectrum of recombination region count matrix (upper matrix of Fig. 3A). The result showed that 5′UTR and P1 were inherited from lineage different from that of P2, P3, and 3′UTR. Moreover, the genetic diversity of P1 was also higher than the least of the viral genome, which shows as hot spectrum in the modularity matrix (lower Table 1. The molecular evolution of the HPeV genome. HPeV human parechovirus, tMRCA the most recent common ancestor, HPD highest posterior density, AD Anno Domini. www.nature.com/scientificreports/ matrix of Fig. 3A). The consensus sequences indicated two hotspots for recombination in the viral genome of our samples. Those breakpoint hotspots appeared in the junction between VP3 and VP1, and between 2C and 3A (upper matrix of Fig. 3B). It may not correspond with the breakpoint in the reference sequences published in the database, which showed only one hotspot between the VP1 and 2A junction (lower matrix of Fig. 3B).

Genome of HPeV tMRCA (AD) 95% HPD interval of tMRCA (AD) Clock rate (substitution/site/year)
In this study, one of the samples (B1091/HPeV14/2011) is suspected of recombination events from the phylogenetic tree analysis (Figs. 2, 3). This genome of the virus seems to be mixed between genotypes 1B and 14. The VP1 region was clustering with genotype 14, whereas the rest of the genome was related to genotype 1B. Therefore, BootScan analysis was applied for suspected recombination examination. The result indicated two breakpoints (BPs) in the genome of B1091/HPeV14/2011 (Fig. 4). BP1 and BP2 were the junctions between VP3 and VP1, and between VP1 and 2A, respectively. The VP1 of the virus was closely related to genotype 14, represented by MG58109/HPeV14/3C/V.E./2015. Meanwhile, the rest of the viral genome was inherited from genotype 1B, represented by GQ183023/HPeV1B/K150-93/NL/1993 (Fig. 4).

Discussion
HPeV is a member of the Piconavirudae family, which was recently reported to be a pathogen related to the respiratory tract, gastrointestinal 11,12 , and central nervous system (CNS) involvement 14 . The lack of proofreading activity of RdRp results in high genetic diversity, which was found in the entire viral genome. Nineteen genotypes have been identified (http:// www. picor nastu dygro up. com/ types/ parec hovir us/ hpev. html). The different genotypes of viruses relate to human pathogenesis 22 . By way of example, genotype 3 was closely related to sepsis-like complications [17][18][19][20] . Moreover, genetic recombination plays a crucial role in Picornavirus's evolutionary dynamic and diversity. However, this intent analysis in the HPeV genome is very limited. Despite the phylogenetic, molecular evolution, and recombination analysis, it mainly focuses on this study.
Fifty-eight samples previously identified as HPeV positive and that were evaluated for genotyping 28 were subjected to complete genome amplification. Ten overlapping PCR fragments were performed using a set of in-house designed primers. The exact position of the viral genome was read in triplicate with direct Sanger sequencing. After assembly and annotation, the evolutionary dynamic was revealed with Bayesian's algorithm in BEAST software. Then, the relationship among the different HPeV genotypes from this study was visualized as a phylogenetic tree. All samples were classified into the identical previously defined genotypes in both wholegenome and VP1 phylogenic tree. However, sample B1091/HPeV14/2011 exhibited discordant grouping between whole-genome and VP1 on the phylogenetic tree. Bootscan analysis revealed that B1091/HPeV14/2011 inherited from two genotypic viruses (Fig. 4), including VP1 from HPeV14 (Supplementary Figure S1), and the rest of the genome from HPeV1B (Fig. 1). Inter-genotypic recombination was also reported by Zhao X., et al. (2017), who demonstrated that the virus from fecal isolation (CH-ZXY1) recombined genotypes 1 and 5. Most studies reported that the junction between P1 and P2 was frequency responsiveness for the breakpoint of the HPeV genome 34 (lower panel of Fig. 3A), and this corresponds with our finding (upper panel of Fig. 3A).
Interestingly, the junction of VP3 and VP1 was another breakpoint indicated in our samples (upper panel of Fig. 3A). Direct Sanger sequencing with triplicate reads was performed to confirm those recombination breakpoints (Fig. 4B). This finding suggested that VP3 and VP1 were other potential points for recombination in the HPeV genome.
HPeV shows dynamic diversity along the genome. High diversity was indicated in the P1 region, which translated into viral structural protein VP1-VP3 ( Fig. 2A). In contrast, P2 and P3 of the viral genome exhibited less variation showing in the cold spectrum in Fig. 3. Because the viral structural protein is generally exposed to the environment and is responsive to trigger host immunity, it may be an evolutionarily driven force to increase their variation in the P1 region more than the other viral genes. When we incorporated our samples with the genome in the database, the molecular clock rate of VP1 and P1 was defined as 1.56 × 10 -3 and 1.68 × 10 -3 substitutions/site/year, respectively. These rates were slightly lower than those from a previous report by Faria NR., et al. (2009), who reported the substitution rate in VP1 and P1 to be 2.30 × 10 -3 and 2.03 × 10 -3 substitutions/site/year, respectively. However, the clock rate of the P3 region, which was 1.96 × 10 -3 substitutions/site/year (Table 1), was comparable to other reports 35 . Our first report of the evolutionary rate of P2, which was 2.21 × 10 -3 substitutions/ site/year, suggests different evolutionarily driven forces among the HPeV genome. Figure 1. Phylogenetic tree of the complete genome of HPeV. The tree was constructed using the BEAST program under the relaxed clock-uncorrelated exponential with 10 million chains, and was run in the Genetic Testing Registry (GTR) with a gamma distribution substitution model. Each branch is labeled as GenBank accession number/genotype/strain name/origin country/year of collection. The genotype was classified based on the cluster on the VP3/VP1 gene. The samples from this study are indicated with a darkened circle. The most recent common ancestors (tMRCAs) are defined at the tree node, and the highest posterior density (HPD) is indicated at each tree branch.  , and P3 (C) of HPeV. The tree was constructed using the BEAST program under the relaxed clock-uncorrelated exponential with 10 million chains, and was run in the Genetic Testing Registry (GTR) with a gamma distribution substitution model. Each branch is labeled as GenBank accession number/genotype/strain name/origin country/year of collection. The samples from this study are indicated with a darkened circle. The most recent common ancestors (tMRCAs) are defined at the tree node, and the highest posterior density (HPD) is indicated at each tree branch.

◂
We also analyzed the complete genome from our isolation with the genome retrieved from the GenBank database, which indicated the root of complete genome sequences as AD 1767 (95% HPD: 1729-1882). This tMRCA was in concordance with the P2 and P3 regions of the virus (Table 1). However, P1 shows as being a bit older than those two regions, indicated as AD 1614 with 95% HPD: 1566-1663, and VP1 was AD 1505 with 95% HPD: 1400-1601. The tMRCAs of P1 and VP1 were comparable to the Faria NR., et al. study, which rooted as AD 1581 with 95% HPD: 1334-1733, and AD 1553 with 95% HPD: 1412-1673, respectively 35 . The overlapping amplification by ten separated PCR to reveal the virus's complete sequence may be the limitation of this study. Only a majority of viral genotypes will be amplified by this method. Next-generation sequencing would be more suitable for complete genome analysis. Unfortunately, in this study, the volume of samples was limited; therefore, we could not subject our samples to the next-generation sequencing process.

Conclusion
This study successfully retrieved the complete genome of fifty-eight HPeV samples previously isolated from the Thai population 28 . It was clear that one sample (B1091/HPeV14/2011) exhibited a recombination event in which the VP1 gene was inherited from genotype 14, and the rest of the genomes were closely related to HPeV1B. The phylogenetic tree analysis and molecular evolutionary study indicated high diversity at the whole genome level, especially in the P1 region. However, the mechanism of the observed recombination remains unclear and is worthy of further intensive investigation.