A comprehensive global perspective on phylogenomics and evolutionary dynamics of Small ruminant morbillivirus

A number of complete genome sequences of Small ruminant morbillivirus (SRMV) have been reported from different parts of the globe, including Asia, Africa and the Middle East. Despite analyses based on individual genome sequences, there is a paucity of comparative genomic and evolutionary analyses that provide overarching evolutionary insights. Therefore, we first enriched the existing database of complete SRMV genome sequences with Pakistan-originated strains and then explored overall nucleotide diversity and genomic and residue characteristics, and deduced the evolutionary relationships among strains representing diverse geographical regions worldwide. The average number of pairwise nucleotide differences among the whole genomes was 788.690, with a nucleotide diversity of 0.04889 (± S.D. 0.00468) and a haplotype variance of 0.00001. The RNA-dependent RNA polymerase (L) gene revealed phylogenetic relationships among SRMVs in a pattern similar to those of the complete genome and the nucleoprotein (N) gene. Therefore, we propose the L gene as another useful molecular marker that may be employed in future epidemiological investigations. Based on evolutionary analysis, the mean evolution rates for the complete genome and the N, P, M, F, H and L genes of SRMV were estimated to be 9.953 × 10⁻⁴, 1.1 × 10⁻³, 1.23 × 10⁻³, 2.56 × 10⁻³, 2.01 × 10⁻³, 1.47 × 10⁻³ and 9.75 × 10⁻⁴ substitutions per site per year, respectively. A recombination event was observed in a Pakistan-originated strain (KY967608), with Indian strains identified as the major (98.1%, KR140086) and minor (99.8%, KT860064) parents. Taken together, the outcomes of the study augment current understanding of the ongoing phylogenomic and evolutionary dynamics of SRMVs and may support effective disease control interventions.

Peste des petits ruminants (PPR), caused by Small ruminant morbillivirus (SRMV), is a contagious transboundary disease of domestic and wild ruminants 1,2. Despite exhaustive vaccination, the disease is endemic across many regions and countries in Africa, the Middle East and Asia, where frequent disease outbreaks are not uncommon 3-7. Currently, PPR threatens approximately 80% of the global population of sheep and goats, with an estimated loss of USD 2.1 billion per year 8.
SRMV belongs to the genus Morbillivirus within the family Paramyxoviridae. It is a pleomorphic, enveloped virus that carries a negative-sense RNA genome 9 of variable length, from 15,927 to 16,058 nucleotides (NCBI database). The genome encodes six structural and two non-structural proteins in the order 3′-N-P/C/V-M-F-H-L-5′. The non-structural proteins (V and C) are encoded either by alternate open reading frames or by mRNA editing in the phosphoprotein (P) gene. Based upon either the N gene (255 bp) or the F gene (322 bp), four distinct lineages of SRMVs (I-IV) have been reported so far. Lineage I and II viruses are mostly reported from West African countries. Lineage III viruses seem restricted to the Middle East and East African countries. Lineage IV viruses have been reported from Asian and African countries 1,10. Lineage IV is displacing the other lineages (I-III) in their territories, and its occurrence is overwhelming even in Africa. These features indicate that lineage IV possesses stronger positive selection and host-adaptation potential across a wide spectrum of hosts and geographical areas 11,12.
Given that genetic variations within a population of viruses can alter their pathogenicity and host spectrum, viral genetic diversity is considered a key to understanding viral evolution 13. Using complete or partial
Percentage identity of nucleotide and comparative residue analysis. We found varying nucleotide divergence among strains representing different lineages and geographical settings. For instance, the maximum nucleotide divergence (12.7%) was observed among Mongolian, Georgian (lineage IV) and Asian strains (lineage III). This was followed by 11.9% divergence between Pakistani (lineage IV) and other Asian strains (lineage III), and 11.8% divergence between Chinese (lineage IV) and the rest of the Asian strains (lineage III). A nucleotide divergence as high as 11.5% was observed between Asian (lineage II) and African (lineage III) strains of SRMV. Similarly, 11% nucleotide divergence was observed between African strains of lineages II and III. However, a variable divergence (8.5-10.3%) was noticed between SRMVs of lineages I and IV, whereas a divergence of 1.0-4.9% was revealed among strains within lineage IV (Table 4).
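The divergence percentages above can be illustrated with a minimal sketch of an uncorrected pairwise comparison between two aligned sequences. This is not the PASC/MEGA procedure used in the study; the function name, the handling of gaps and the toy sequences are assumptions made for illustration only.

```python
def percent_divergence(seq_a: str, seq_b: str) -> float:
    """Return the percentage of differing sites between two aligned sequences.

    Columns where either sequence has a gap or an ambiguous base are skipped
    (an assumption; other tools may treat such columns differently).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    unambiguous = set("ACGT")
    compared = 0
    differing = 0
    for base_a, base_b in zip(seq_a.upper(), seq_b.upper()):
        if base_a in unambiguous and base_b in unambiguous:
            compared += 1
            if base_a != base_b:
                differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return 100.0 * differing / compared


# Toy aligned fragments (hypothetical, not SRMV data).
toy_a = "ACGTACGTAC"
toy_b = "ACGTTCGTAA"
print(f"{percent_divergence(toy_a, toy_b):.1f}% divergence")
```

Applied to consensus genomes of two lineages, a value such as 12.7% would mean that roughly one in eight comparable sites differs.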
Comparative residue analysis of different proteins across the entire genome length revealed conserved functional and/or structural motifs; however, a few substitutions were noticed in some of the studied strains. A hypervariable region of varying length was observed in each of the SRMV proteins, i.e., 423-456 aa in N, 74-111 aa in P, 73-197 aa in M, 6-16 aa in F, 174-179 aa in H and 617-627 aa in the L protein. The nuclear export signal, nuclear localization signal and RNA binding motifs appeared conserved in the N protein of all strains. In the P protein, a Soyuz 1 motif was also conserved in all strains except for the consensus sequence of Africa/1994-2012 (lineage III), where a total of six substitutions (L5Q, V10N, E11K, A14E, L16I and F20K) were observed. A serine residue (¹⁵¹S) in the P protein and a cell membrane anchor in the M protein were conserved in all of the SRMV sequences (Table 5). The signal peptide in the F protein has previously been reported to be hypervariable (Table 6); however, on comparing SRMVs of different lineages, we propose a relatively conserved long stretch of residues (¹MTRVAILTFLFLFPNVVAC¹⁹) (Fig. 1). The cleavage motif (¹⁰³GRRTRR¹⁰⁸
Phylogenetic topology based on geographical pattern. Using each of the coding genes, the phylogenetic analysis of SRMV sequences revealed a distinct pattern of clustering according to geographical location. However, the clustering pattern based on the complete N gene was the most conclusive, followed by those of the L, H, F, M and P genes (Fig. 2a,b). Within lineage II viruses, variations in the clustering pattern were related to the reporting period from different regions of the African continent. In contrast, lineage III viruses from Africa showed variations in their clustering pattern for each of the five genes used for analysis. For lineage IV viruses, there were significant variations in the clustering pattern for each of the coding genes.
The clustering patterns derived from the N and L genes were similar to those derived from the M, P, F and H genes. Since the N and L gene-based topologies of the phylogenetic relationships among geographically distinct strains were found to be more precise and conclusive, the L gene is suggested for use in future epidemiological investigations.
Based on the analysis of the complete N gene dataset, we propose a geography- and timeline-based classification of SRMV strains within lineage IV. The analysis revealed nucleotide divergences of 6% and 2% as suitable cut-off criteria for the classification of SRMV lineages and sub-lineages, respectively. Further analysis identified a total of six sub-clades (a-f), where sub-clade "a" represented strains from India, Turkey and Israel during 1994-2017, sub-clade "b" contained Chinese strains reported in 2007-08, sub-clade "c" represented strains from Africa and Georgia during 2008-2016, sub-clade "d" had Chinese strains reported
Nucleotide diversity and selective pressure analysis. The average nucleotide diversity (Pi-value) was 0.04889 for the complete genomes of all SRMV strains. Haplotype diversity was Hd = 1.000 (variance 0.00001, standard deviation 0.002), and the average number of nucleotide differences among all haplotypes was k = 788.690. A total of 5891 mutations were observed in the DnaSP analysis; 10831 sites were monomorphic and 5117 were polymorphic. The polymorphic sites consisted of 1311 singleton variable sites and 3806 parsimony-informative sites. In the assessment of neutrality, Tajima's D value was negative for all genes (p > 0.10). The HKA test gave a chi-square value of 6.078 at a divergence time of T = 6.732, which was significant (p = 0.0131) (Table 8). An analysis of the genetic diversity within the coding genes across the whole length of the genome revealed a diversity hotspot (sliding window of 300 nt with a step of 10 nt) between the 5ʹ UTR of the M gene and the 3ʹ UTR of the F gene (Fig. 4).
The nucleotide diversity across the coding genes was highest in the H gene (0.05171), followed by the P (0.04527), F (0.04409), N (0.04068), L (0.03982) and M (0.03931) genes. On the other hand, the haplotype diversity (Hd) was highest in the L gene (0.996), followed by the F (0.926), P (0.921), H (0.920), N (0.909) and M (0.900) genes (Table 8).
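As an illustration of the quantities reported above, the following minimal sketch computes the average number of pairwise differences (k), the per-site nucleotide diversity (Pi) and the counts of monomorphic, singleton and parsimony-informative sites for a gap-free alignment. It is not DnaSP's implementation; the function name and the toy alignment are hypothetical. The reported site counts are internally consistent: 1311 + 3806 = 5117 polymorphic sites, and 10831 + 5117 = 15948 sites in total.

```python
from collections import Counter
from itertools import combinations


def diversity_summary(alignment):
    """Summarise polymorphism in a gap-free alignment of equal-length strings."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same length")

    # k: mean number of differing sites over all pairs of sequences.
    pairs = list(combinations(alignment, 2))
    total_diff = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    k = total_diff / len(pairs)

    monomorphic = singleton = informative = 0
    for column in zip(*alignment):
        counts = Counter(column)
        if len(counts) == 1:
            monomorphic += 1
        elif sum(1 for c in counts.values() if c >= 2) >= 2:
            # At least two states, each seen in at least two sequences.
            informative += 1
        else:
            singleton += 1

    return {
        "k": k,
        "pi": k / length,  # nucleotide diversity per site
        "monomorphic": monomorphic,
        "singleton": singleton,
        "parsimony_informative": informative,
    }


# Toy alignment (hypothetical, not SRMV data).
toy_alignment = ["AAGTC", "AAGTT", "ATGCC", "ATGCC"]
summary = diversity_summary(toy_alignment)
print(summary)
```

DnaSP may exclude gapped or missing sites before computing these statistics, so values from real alignments need not match this simplified calculation exactly.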
Datamonkey output for selective pressure analysis across CDS regions is summarized in Table 9. Although none of the genes carried a mean dN-dS greater than 1 at p < 0.05, it was highest for the P gene (0.
(Fig. 6). This observation was consistent across all seven recombination detection algorithms at p < 0.001. Detailed information on the inferred breakpoints and the p-values of the algorithms is given in Table 10.
Table 8. A brief description of genome polymorphism for selection sites in the complete genome and each of the coding regions in SRMVs. Note: HKA test, direct mode: divergence time T = 6.732, χ-square value = 6.078, P value = 0.0131*; * = 0.01 < p < 0.05.

Discussion
We presented a comparative genetic, phylogenomic and evolutionary analysis of the SRMV strains reported so far in public databases. Whole genome sequences and open reading frames (ORFs) of individual genes of representative strains were used in subsequent higher-resolution bioinformatic analysis. This is because a specific gene might not evolve at the same rate as the whole genome 15 and, therefore, can provide precise information on viral evolutionary dynamics for future epidemiological investigations 16. With the "rule of six" for the whole genome atlas in mind, comparative complete genome analysis revealed varying genome lengths, suggesting the potential of the virus to evolve over time. A few sequences showed unusual lengths (e.g., MF678816, 15927 bp; KM089831, 15957 bp; and KM816619, 16058 bp); in each of these sequences, a nucleotide insertion/deletion was observed in the noncoding region between the M and F genes 17,18. Interestingly, each of these sequences was derived using a next-generation sequencing approach and, therefore, such unusual lengths may correspond to sequencing errors. Because all paramyxoviruses, including SRMV, follow a polyhexameric genome length for effective replication in host cells 19, SRMV sequences not following the "rule of six" in the genome atlas were excluded from the specific analysis.
Comparative residue analysis of viral proteins showed several conserved motifs 20,21. Among these, the N protein had three conserved motifs: a nuclear export signal, a nuclear localization signal and an RNA binding motif. The first two are considered responsible for transport of the N protein to the nucleus of the host cell, while the third is believed to be involved in the interaction of N-N monomers during genomic RNA binding and N-N self-interaction 20. By forming the polymerase complex with the N and L proteins, the P protein plays a significant role in virus replication and RNA biosynthesis 22. The protein contains a variable N-terminus, whereas the C-terminus is believed to be the most conserved and is required for the interaction with the L protein in the formation of the polymerase complex 23. The Soyuz 1 motif and the ¹⁵¹S residue, which is responsible for viral transcription via alteration of its phosphorylation status 24, were found in all study-included strains 21. The M protein is a core organizer of viral morphogenesis and has the ability to interact with other proteins for maturation of viral progeny 25. For all of the investigated strains, this protein carried a previously known residue pattern 21 for the late domain or cell membrane anchor, which has a known role in localization to the cell membrane and in budding activity 26. An unusually long and GC-rich non-coding region was observed between the 3ʹ UTR of the M gene and the 5ʹ UTR of the F gene in the studied SRMV sequences. While no biological or functional significance has been established, a previous study has related up- and/or down-regulation of these proteins to differences in the lengths of these regions, which may therefore alter the cyto-pathogenicity and survival fitness of the virus in nature 27.
Three motifs were also noticed in the F protein: a signal peptide, a cleavage site (responsible for virulence and adaptation in the environment) and a leucine zipper domain. These are known to be involved in maintenance of protein tertiary structure 20,22. Since the signal peptide motif was located in a variable region 28, we performed a comparative analysis to investigate the conservation of specific residues at specific positions among all reported strains from different geographies and proposed a stretch of consensus residues at the global level. The H protein is considered responsible for attachment of the virus to the host cell membrane via cleavage of sialic acid residues in cellular glycoproteins 29. As observed in the current study, the protein has a hydrophobic domain at the N-terminus that acts as a signal peptide to anchor the protein into the membrane 20. The finding of SLAM receptor binding sites during the analysis highlights the epitheliotropic and lymphotropic nature of SRMVs 30. Herein, a high number of glycosylation sites were found in the N protein, which plays a major role in protein translocation 31. The large protein (L) contributes to viral replication, transcription and polyadenylation using different domains that were found to be conserved in this study. Domains I, II and III are considered responsible for polymerase and kinase activity, where the GDDD and QGDNQ residues are of prime significance 32 and, as observed in a previous experimental study 33, any substitution in these residues can abolish the polymerase activity of the L protein. Two highly conserved hinge regions were also observed in a pattern typically corresponding to the established hinge regions of other closely related morbilliviruses 34.
Taken together, information on the potential influence of these substitutions on the functionality of the corresponding proteins is scarce; future investigations are therefore required to determine the impact of these variations in conserved domains.
The phylogenetic analysis, whether based upon the complete genome or each of the complete coding genes, showed a clustering pattern according to distinct geographical setting and time period; e.g., strains clustered within a distinct clade represented the same country of origin within a specific time period. Therefore, while presenting a global perspective, a clustering and subsequent sub-clade grouping is proposed in the current study as an improved and updated version of a previous proposal 35. This is because the previous classification proposal was limited to sub-grouping of Indian strains along with a few of those reported from the Middle East and Africa. Not only did the said proposal exclude strains reported from China and Georgia, but it also did not present a well-defined evolutionary cut-off for the lowest taxonomic node (sub-lineage or sub-grouping). In addition, Kumar et al. 35 classified the strains into clades and subclades, which contradicts previously proposed standard classification criteria for the lowest taxonomic node or sub-grouping of viruses within a lineage or genotype 36,37. Though such a classification may provide a preliminary assessment exclusively for Indian-origin strains, a classification based on a limited geographic pattern may raise controversies for SRMV classification at the global scale. Therefore, such classifications are considered unreliable for presenting the molecular epidemiology of SRMVs worldwide. Indeed, with a substantial increase in the number of SRMV sequences in future, following a uniform classification criterion such as that presented in the current study (IVa, IVb, IVc, IVd, IVe and IVf) is necessary for more precise clustering at the lowest taxonomic node. On comparing different coding genes (P, M, F, H and L) of SRMV strains (Fig. 2a,b), minor differences were observed in the clustering pattern, indicating an influence of nucleotide variation on the genetic diversity of SRMVs.
Nevertheless, the N gene-based topology was closer to those of the L gene (RNA-dependent RNA polymerase) and the complete genome sequences. Thus, the L gene could be employed as an alternative for inferring precise evolutionary relationships among SRMV strains originating from different geographical regions. This is important because, considering that SRMV is a member of the family Paramyxoviridae, the L protein is now considered a standard criterion for classification of some of the closely related members of the sub-family Avulavirinae 38. The observed topology of the N gene revealed evolutionary dynamics of circulating SRMV strains consistent with observations made previously 10,20. Therefore, it is suggested that phylogenetic analysis based on the complete N and L genes can provide an accurate evolutionary relationship of the circulating strains in particular geographical settings 10,38, especially for regions where full genomes have not yet been reported or where resources are limited.
Nucleotide diversity analysis was used to reveal the genomic variation (polymorphism) within a given dataset 39, where the substitution rate is considered a prime parameter for elucidating virus evolution over time. The average number of pairwise nucleotide differences among the whole genomes of all SRMV sequences was 788.690, with a nucleotide diversity of 0.04889 (± S.D. 0.00468) and a haplotype variance of 0.00001. These observations correspond to distinct features of RNA viruses, in which the viral RNA-dependent RNA polymerase lacks proofreading activity 40. In contrast to previous observations 14, the lower nucleotide diversity, haplotype variance and nucleotide difference found in the current study may largely be ascribed to the smaller number of complete nucleotide sequences included in that earlier analysis than in the current study (n = 37 vs n = 68). In addition, as evidenced by significant nucleotide diversity over time (p < 0.05), the HKA test outcome indicated ongoing evolution or adaptation of the virus in the environment.
Table 10. Evidence of recombination events in the whole genome of the Pakistan-originated SRMV strain along with breakpoint positions and significant p-values.
The DnaSP-based nucleotide diversity analysis revealed higher diversity in the H gene than in the other SRMV genes. Owing to its significant roles in attachment and subsequent genome replication, the gene has been proposed for assessing the evolutionary relationship of SRMV strains 41. Though this requires further research, substitutions in the H gene may influence host adaptability and pathogenicity in susceptible hosts, as observed previously for SRMV 14 and influenza virus 42. A nucleotide diversity hotspot was observed between the 5ʹ UTR of the M gene and the 3ʹ UTR of the F gene in the whole genome. This aligns with observations made previously, where a hotspot was identified at a similar position between the M and F genes 14, highlighting potential variations in the genome size and corresponding substitutions 43 in each of the genes. The influence of these spontaneous mutations in the genome was assessed using Tajima's D statistic, which showed a non-significant negative value for all coding genes in the DnaSP analysis, suggesting a lack of influence of spontaneous mutations on the fitness of individual viruses. Such observations suggest positive selection among the coding regions of the sequences, with a lower level of sequence diversity and an excess of low-frequency variants reflecting the role of natural selection in SRMV genomes. Contrary to the current study findings, where the analysis showed a negative value for each of the coding genes, positive values in the F and H genes have previously been suggested 14.
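For readers unfamiliar with the statistic, the following is a minimal sketch of Tajima's D computed from the number of sequences, the number of segregating sites and the mean number of pairwise differences, using the standard constants from Tajima (1989). It is not the DnaSP code, the function and argument names are hypothetical, and the example values are arbitrary rather than taken from the study. A negative D arises when the mean pairwise difference is smaller than expected from the number of segregating sites, i.e., when low-frequency variants are in excess.

```python
from math import sqrt


def tajimas_d(n_sequences: int, segregating_sites: int, mean_pairwise_diff: float) -> float:
    """Tajima's D from sample size n, segregating sites S and mean pairwise differences k."""
    n, s, k = n_sequences, segregating_sites, mean_pairwise_diff
    if n < 4 or s < 1:
        raise ValueError("need at least 4 sequences and 1 segregating site")

    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)

    # Difference between the two estimators of theta, scaled by its standard deviation.
    return (k - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Arbitrary illustrative values (not from the study).
print(tajimas_d(n_sequences=10, segregating_sites=10, mean_pairwise_diff=2.0))
```

The significance of D is usually judged against critical values or simulations, which this sketch does not attempt.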
The non-synonymous/synonymous rate ratio (ω = dN/dS) is an important indicator of selective pressure at the protein level, where ω = 1 indicates neutral mutations, ω < 1 corresponds to purifying selection and ω > 1 indicates diversifying positive selection 44. Herein, as reported in a previous study 14, the dN-dS plot for each protein showed values not more than 1, indicating slow genetic evolution of SRMV. Indeed, such a comparison of the rates of synonymous and non-synonymous mutations provides an understanding of the mechanisms of molecular sequence evolution. Positive selection sites were found in all coding genes (N, P, M, F, H and L) using different statistical approaches. Though these sites were found to be non-significant with a ratio less than 1 by Tajima's D statistics, this seldom happens in structural domains of the genome. However, such positive selection sites with a lower level of sequence diversity may lead to the emergence of variants 44. According to the neutral theory of molecular evolution, molecular variations of this type, which arise via spontaneous mutations, have no influence on an individual's fitness 45. However, the biological significance of these sites remains unknown and needs to be explored in future.
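The interpretation of ω described above can be written as a small decision rule. The sketch below assumes ω is the conventional ratio dN/dS; the function name, the tolerance and the example rates are hypothetical and are not values from Table 9.

```python
def classify_omega(dn: float, ds: float, tolerance: float = 1e-9) -> str:
    """Classify selective pressure from non-synonymous (dN) and synonymous (dS) rates."""
    if ds <= 0:
        raise ValueError("dS must be positive to form the ratio dN/dS")
    omega = dn / ds
    if abs(omega - 1.0) <= tolerance:
        return "neutral"
    return "purifying" if omega < 1.0 else "diversifying"


# Hypothetical rates for illustration only.
for dn, ds in [(0.2, 1.0), (1.0, 1.0), (1.5, 1.0)]:
    print(dn, ds, classify_omega(dn, ds))
```

In practice a site-level or gene-level test, such as those offered by Datamonkey, is needed to decide whether a departure from ω = 1 is statistically supported.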
The occurrence of recombination events is considered a significant source of genetic diversity for RNA viruses 46. Despite the rare occurrence of recombination in negative-sense RNA viruses, particularly SRMV, an analysis for the detection of recombination events is recommended as a standard component of every phylogenetic analysis, serving an important quality-control function to weed out laboratory and analytical errors 47. We found recombination events among Pakistani- and Indian-origin strains, which further highlights the co-existence of similar SRMV strains along with the transboundary nature of transmission 48. Indicating a high resolution of prediction, the observed putative recombination event was statistically significant and was identified by more than five recombination detection algorithms. The involvement of Indian strains as major and minor parents of the Pakistan-originated recombinant strain highlights the potential of the virus to cross international borders 48. A similar finding has previously been reported for another RNA virus (Yellow leaf virus) from Pakistan and India 49. The reason for such sharing of genetic material is speculative and may be attributed to an increased disease incidence rate and frequent disease outbreaks near the border between these countries 50,51. Though the occurrence of homologous recombination in some negative-sense RNA viruses is low 52, the finding is not surprising because sporadic recombination has been evidenced in various negative-sense RNA viruses such as Hantavirus 53,54, ambisense arenaviruses 55,56, Newcastle disease viruses 57,58 and morbilliviruses (e.g. canine distemper virus 59 and measles virus 60). Hence, the emergence of viral variants that differ antigenically and serologically could be anticipated, which may have consequences in terms of diagnostic failure and reduced vaccine efficacy.

Materials and Methods
Complete genome sequencing of SRMVs from Pakistan and dataset information. The complete genome sequencing of two SRMV isolates [KY967609 (SRMV/Faisalabad/UVAS/Pak/2015) and KY967610 (SRMV/Layyah/UVAS/Pak/2015)] was performed using primers and protocols described previously 5. Later, including these two strains, a total of 75 whole genome sequences of SRMVs were accessed (https://www.ncbi.nlm.nih.gov/, October 01, 2019) and processed for subsequent bioinformatic analysis. Among these 75 SRMV sequences, four were attenuated vaccine strains (KJ867542, KF727981, HQ197753 and X74443) and were excluded from the dataset used in the current study. Furthermore, given the "rule of six" (polyhexameric genome length), three sequences, MF678816 (15927 bp), KM089831 (15957 bp) and KM816619 (16058 bp), were also excluded from the comparative whole genome-specific analysis. However, because the lengths of their coding regions were comparable to those of each SRMV protein, only the coding regions of these sequences were included and processed further in the comparative genomic and residue analysis. All essential information related to the whole genome sequences of study-included strains is presented in Table 1.
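The "rule of six" exclusion above amounts to checking whether a genome length is an exact multiple of six. A minimal sketch follows, using the three accession lengths given in the text; the function name is hypothetical.

```python
def follows_rule_of_six(genome_length: int) -> bool:
    """Return True when the genome length is polyhexameric (a multiple of six)."""
    return genome_length % 6 == 0


# Lengths (bp) of the three sequences excluded from whole genome-specific analysis.
excluded_lengths = {"MF678816": 15927, "KM089831": 15957, "KM816619": 16058}

for accession, length in excluded_lengths.items():
    remainder = length % 6
    print(accession, length, "multiple of six" if remainder == 0 else f"remainder {remainder}")
```

None of the three lengths is divisible by six, whereas the 15954 bp alignment length used for the comparative analysis is.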
Comparative genomic analysis. The complete genome (15954 bp) dataset was aligned to equal length using the ClustalW method in BioEdit version 5.0.6 61 and, based upon nucleotide number and position across the whole length of the genome, different genomic features were compared among all SRMV sequences. Consensus sequences were made for those SRMV sequences that had the highest nucleotide similarity and originated from similar geographical regions. Nucleotide identity and divergence among all consensus whole genome sequences of lineages I-IV were assessed by Pairwise Sequence Comparisons (PASC) analysis in MEGA version 6.06 62. The conserved domains, functional and structural motifs, trans-membrane regions and unique substitutions in open reading frames were predicted using ORF Finder (http://www.ncbi.nlm.nih.gov/gorf/gorf.html), the Conserved Domain Prediction tool (http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi) and the HMMTOP program (http://www.enzim.hu/hmmtop/index.php). The potential N-glycosylation sites (N-X-T/S, where X denotes any residue except proline) were predicted by the NetNGlyc 1.0 server (http://www.cbs.dtu.dk/services/NetNGlyc) and accepted if the G-score met the 0.5 threshold. Similarly, the diversity and/or conservation of residues at important but hypervariable motifs was analysed with WebLogo version 3.1 (accessible at http://weblogo.threeplusone.com/create.cgi).
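The N-X-T/S sequon described above can be located with a simple pattern search. The sketch below only finds candidate motifs; it does not reproduce the NetNGlyc neural-network prediction or its G-score. The function name and the toy peptide are hypothetical.

```python
import re

# N, then any residue except proline, then S or T.
# The lookahead allows overlapping candidates to be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein: str) -> list[int]:
    """Return 1-based positions of the asparagine in each N-X-T/S candidate."""
    return [match.start() + 1 for match in SEQUON.finditer(protein.upper())]


# Toy peptide (hypothetical, not an SRMV protein).
toy_peptide = "MNATNPSANGS"
print(find_sequons(toy_peptide))
```

In the toy peptide, the asparagine followed by proline is skipped, as the definition of the sequon requires.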

(2020) 10:17 | https://doi.org/10.1038/s41598-019-54714-w | www.nature.com/scientificreports
Estimation of evolutionary and divergence dates. Using a Bayesian Markov Chain Monte Carlo (MCMC) approach implemented in the Bayesian evolutionary analysis sampling trees (BEAST) software package version 1.8.0 63, the molecular evolutionary and divergence rates were co-estimated for the complete genome and individual genes. For each dataset, three independent MCMC runs were conducted under a strict molecular clock model, using the Hasegawa-Kishino-Yano model of sequence evolution with a proportion of invariant sites and gamma-distributed rate heterogeneity (HKY + I + Γ), with partitions into codon positions and the remaining parameters in the priors panel left at their defaults. For each gene, the MCMC run was 36107 steps long and the posterior probability distribution of the chains was sampled every 1000 steps. Convergence was assessed on the basis of the effective sample size after a 10% burn-in using Tracer software, version 1.5 (http://tree.bio.ed.ac.uk/software/tracer/). The estimates were the mean values obtained for the three runs. The mean time of the most recent common ancestor (TMRCA) and the 95% CI were calculated, and the best-fitting models were selected by a Bayes factor using marginal likelihoods implemented in Tracer 64.
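The post-processing described above (discarding a 10% burn-in from each run and averaging the three runs) can be sketched as follows. This is an illustration only and not the BEAST or Tracer code; the function names and the sampled values are hypothetical.

```python
from statistics import mean


def posterior_mean_after_burnin(samples, burnin_fraction=0.10):
    """Mean of sampled parameter values after discarding the initial burn-in."""
    if not 0.0 <= burnin_fraction < 1.0:
        raise ValueError("burn-in fraction must be in [0, 1)")
    start = int(len(samples) * burnin_fraction)
    kept = samples[start:]
    if not kept:
        raise ValueError("no samples left after burn-in")
    return mean(kept)


def combined_estimate(runs, burnin_fraction=0.10):
    """Average the per-run posterior means across independent MCMC runs."""
    return mean(posterior_mean_after_burnin(run, burnin_fraction) for run in runs)


# Hypothetical sampled substitution rates from three runs (first value is pre-convergence).
toy_runs = [
    [5.0e-3] + [1.0e-3] * 9,
    [4.0e-3] + [1.1e-3] * 9,
    [6.0e-3] + [0.9e-3] * 9,
]
print(combined_estimate(toy_runs))
```

Convergence itself is judged from the effective sample size, which accounts for autocorrelation between samples and is not computed in this sketch.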
Phylogeography-based reconstruction of evolutionary tree. The reliability of each gene for molecular epidemiology was assessed by comparing all coding genes (N, P, M, F, H and L), which were extracted from the whole genome sequences of SRMV and aligned separately by the ClustalW method incorporated in BioEdit version 5.0.6 61. The phylogenetic trees were constructed by the neighbour-joining method with the best-fit substitution model for each set of sequences using MEGA version 6.06 62. Bootstrap analysis with 1000 replicates was used to assess the reliability of the clustering of isolates and of any change in their clustering pattern.
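The bootstrap step mentioned above rests on resampling alignment columns with replacement; a tree is then built from each pseudo-replicate and the frequency of each cluster is recorded. The sketch below shows only the resampling stage, not MEGA's implementation or the tree building; the function names, seed and toy alignment are hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement to build one pseudo-replicate."""
    length = len(alignment[0])
    columns = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[i] for i in columns) for seq in alignment]


def bootstrap_replicates(alignment, n_replicates=1000, seed=0):
    """Generate pseudo-replicate alignments; each would then be used to build a tree."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


# Toy alignment (hypothetical, not SRMV data).
toy_alignment = ["ACGTAC", "ACGTTC", "ACCTAC"]
replicates = bootstrap_replicates(toy_alignment, n_replicates=1000)
print(len(replicates), replicates[0])
```

The bootstrap value of a cluster is the percentage of replicate trees in which that cluster appears.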
Nucleotide diversity and natural selective pressure analysis. Based upon variable sites for mutations and average numbers of pairwise nucleotide differences, the nucleotide diversity among coding sequences (CDS) of complete genome sequences was assessed for genomic polymorphism by DnaSP version 5.10.01 (accessible at http://www.ub.es/dnasp). The departure from neutrality in all isolate sequences was tested by Tajima's D statistical method 65. Divergence time in nucleotide diversity was estimated by a direct statistical model (HKA test). The Datamonkey adaptive evolution server (http://www.datamonkey.org/) was used to evaluate the synonymous (dS) and non-synonymous (dN) substitution rates per codon among the CDS of all sequences 66.
Detection of putative recombination event. The sequences were analyzed for the identification of reliable putative breakpoints by different tools, including SimPlot version 3.5.1 68, GARD (http://www.datamonkey.org/GARD), DAMBE version 5.2.30 69 and RDP4 version 4.95 70. However, owing to the enhanced accuracy, clarity and reliability of its analysis, the outcomes obtained by RDP4 were considered conclusive for further interpretation. RDP4 was preferred because it employs a combination of seven different algorithms (RDP, GENECONV, BootScan, MaxChi, Chimaera, SiScan and 3Seq) to better identify putative recombinant and parent isolates at p < 0.001. A putative recombination event was assumed to have occurred only when it was consistently identified by at least four of the above-mentioned algorithms at a probability threshold of 0.05.
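The acceptance criterion stated above (support from at least four of the seven RDP4 algorithms at a probability threshold of 0.05) can be expressed as a short check. The sketch below is not part of RDP4; the function name and the p-values are hypothetical and are not those of Table 10.

```python
ALGORITHMS = ("RDP", "GENECONV", "BootScan", "MaxChi", "Chimaera", "SiScan", "3Seq")


def accept_recombination_event(p_values, alpha=0.05, min_methods=4):
    """Accept an event when enough algorithms report a p-value below alpha.

    p_values maps algorithm name to its p-value; algorithms that did not
    detect the event may be omitted or set to None.
    """
    supporting = [
        name for name in ALGORITHMS
        if p_values.get(name) is not None and p_values[name] < alpha
    ]
    return len(supporting) >= min_methods, supporting


# Hypothetical p-values for one candidate event.
toy_p_values = {
    "RDP": 1.2e-10,
    "GENECONV": 3.4e-8,
    "BootScan": 0.20,
    "MaxChi": 2.0e-5,
    "Chimaera": 4.1e-4,
    "SiScan": None,
    "3Seq": 0.08,
}
accepted, supporting = accept_recombination_event(toy_p_values)
print(accepted, supporting)
```

A stricter threshold, such as the p < 0.001 quoted for RDP4, can be applied by changing `alpha`.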
Ethical approval and informed consent. This research did not involve human participants or animals.