Phylogenetic evidence of the intercontinental circulation of a Canine distemper virus lineage in the Americas

Canine distemper virus (CDV) is the cause of a multisystem disease in domestic dogs and wild animals, infecting more than 20 carnivore and non-carnivore families and even infecting human cell lines under in vitro conditions. Phylogenetic classification based on the hemagglutinin gene shows 17 lineages with a phylogeographic distribution pattern. In Medellín (Colombia), the lineage South America-3 is considered endemic. Phylogenetic studies conducted in Ecuador using the fragment coding for the fusion protein signal peptide (Fsp) characterized a new strain belonging to a different lineage. To understand the distribution of the South America-3 lineage in the north of the South American continent, we characterized CDV from three Colombian cities (Medellín, Bucaramanga, and Bogotá). Using phylogenetic analysis of the hemagglutinin gene and the Fsp region, we confirmed the circulation of CDV South America-3 in different areas of Colombia. We also described, for the first time to our knowledge, the circulation of a new lineage in Medellín that forms a monophyletic group with strains previously characterized in dogs in Ecuador and in wildlife and domestic dogs in the United States, for which we propose the name “South America/North America-4” due to its intercontinental distribution. In conclusion, our results indicated that there are at least four different CDV lineages circulating in domestic dogs in South America: the Europe/South America-1 lineage circulating in Brazil, Uruguay, and Argentina; the South America-2 lineage restricted to Argentina; the South America-3 lineage, which has only been reported in Colombia; and lastly an intercontinental lineage present in Colombia, Ecuador, and the United States, referred to here as the “South America/North America-4” lineage.

www.nature.com/scientificreports
PCR and sequencing. Next, cDNAs from clinical specimens were screened by PCR of the phosphoprotein (P) gene using the Maxima Hot Start PCR Master Mix (2X) (Thermo Scientific®) reagent kit in accordance with the manufacturer's instructions. Viral cDNA was detected using morbillivirus universal primers 31 for amplifying a 429 bp fragment of the phosphoprotein gene. For PCR, 4 µl cDNA was added to a reaction mix, which consisted of 25 µl Maxima Hot Start PCR Master Mix (2X), 15 µl nuclease-free water, and 3 µl (10 µM) of each of the forward and reverse primers. PCR was performed on a ProFlex™ PCR Thermal Cycler (Applied Biosystems®) under the following conditions: initial denaturation at 95 °C for 4 min followed by 35 cycles of denaturation at 95 °C for 30 s, annealing at 50.8 °C for 30 s, and extension at 72 °C for 1 min, and a final extension at 72 °C for 5 min. Ultrapure water was used as a negative control and cDNA from one of the vaccines as a positive control.
In all samples that tested positive for the P gene, the full-length H gene and the Fsp-coding region were amplified using the Maxima Hot Start PCR Master Mix Kit in accordance with the manufacturer's instructions. The H gene was detected using the primers CDVff1 and HS2 32 for amplifying a 2099 bp fragment of the CDV genome that includes the H gene and flanking regions at both ends. The Fsp-coding region was amplified using the primers CDV-F4854 and CDV-R5535 10 or F5/R5 21 flanking the Fsp-coding region. In all cases, 4 µl cDNA was added to a PCR reaction mix, which consisted of 25 µl Maxima Hot Start PCR Master Mix (2X), 15 µl nuclease-free water, and 3 µl (10 µM) of each of the primers (Table 1). PCR was performed on a ProFlex™ PCR Thermal Cycler (Applied Biosystems®) under the following conditions: initial denaturation at 95 °C for 4 min followed by 35 cycles of denaturation at 95 °C for 30 s, annealing for 30 s, and extension at 72 °C for 2 min, and a final extension at 72 °C for 10 min. The annealing temperature for the H gene was 48.2 °C, and for the Fsp-coding region the temperature was 50.8 °C for the F5/R5 primers and 58 °C for the CDV-F4854/R5535 primers.
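The reaction mix described above can be checked with simple dilution arithmetic. The following is a minimal sketch, assuming only the volumes and stock concentrations stated in the text; the dictionary keys and function names are hypothetical and chosen for illustration.

```python
# Reaction components in microlitres, as listed in the protocol text.
REACTION_UL = {
    "cDNA": 4.0,
    "master_mix_2x": 25.0,
    "nuclease_free_water": 15.0,
    "forward_primer": 3.0,
    "reverse_primer": 3.0,
}
PRIMER_STOCK_UM = 10.0  # stock concentration of each primer (µM)


def total_volume_ul(components):
    """Sum all component volumes to get the final reaction volume."""
    return sum(components.values())


def final_concentration(stock, added_ul, total_ul):
    """Dilution rule C1 * V1 = C2 * V2, solved for C2."""
    return stock * added_ul / total_ul


total = total_volume_ul(REACTION_UL)
primer_final_um = final_concentration(
    PRIMER_STOCK_UM, REACTION_UL["forward_primer"], total
)
# The master mix is supplied at 2X; its working strength depends on dilution.
master_mix_final_x = final_concentration(2.0, REACTION_UL["master_mix_2x"], total)

print(f"total volume: {total} µl")
print(f"final primer concentration: {primer_final_um} µM each")
print(f"master mix working strength: {master_mix_final_x}X")
```

Under these volumes the reaction totals 50 µl, each primer ends up at 0.6 µM, and the 2X master mix is diluted to 1X.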
Following PCR, 5 µl of the amplicons were analyzed by gel electrophoresis on a 1.5% agarose gel (AGAROSE I™, Amresco, Solon, OH, USA) at 110 V for 60 min. The gels were stained using EZ-VISION™ dye (Amresco, Solon, OH, USA) and viewed by transillumination with UV light using the Molecular Imager® GelDoc™ XR+ System with the image acquisition software ImageLab™ (Bio-Rad, Hercules, CA, USA). Amplification product sizes were estimated using a 100-3000 bp molecular weight ladder (GeneRuler™ 100 bp Plus DNA Ladder, Thermo Scientific®). PCR amplicons of the H gene and Fsp-coding region were submitted to Macrogen Inc. (Seoul, Korea) for purification and sequencing. An additional set of eight primers for the H gene, published elsewhere 32,33 (Table 1), was used for sequencing on an ABI3711™ automatic sequencer (Macrogen™).
Phylogenetic analysis. Sequence data were assembled and edited using the SeqMan program (DNAStar Lasergene™ V15.0 software package, Madison, Wisconsin, USA). Nucleotide BLAST (Basic Local Alignment Search Tool) was used for exploring similarity between Colombian CDV strain sequences and all CDV sequences available in the NCBI nucleotide databases. For the H gene, a total length of 1824 nucleotides and the corresponding deduced amino acid sequences were obtained only from dogs from Medellín and Bucaramanga, whereas for the Fsp-coding region (405 nucleotides and the corresponding deduced amino acid sequence), sequences were obtained from all three studied cities (Bogotá DC, Medellín, and Bucaramanga). Phylogenetic analyses were carried out with at least two sequences for each reported lineage and vaccine strains from different geographical regions using MEGA™ 7 34 and the MUSCLE algorithm, and nucleotide and amino acid differences were assessed as uncorrected (p) distances.
Phylogenetic relationships based on the nucleotide alignment of complete H gene sequences were inferred using distance-based (neighbor-joining) and character-based (maximum likelihood, Bayesian) approaches
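The uncorrected (p) distance used above is the proportion of aligned sites at which two sequences differ. The sketch below illustrates that calculation under stated assumptions: the sequences are already aligned to equal length, sites with a gap in either sequence are skipped (pairwise deletion), and the toy sequences are invented for illustration. It is not the MEGA implementation.

```python
from itertools import combinations

# Characters treated as missing data (assumption: pairwise deletion).
MISSING = {"-", "?"}


def p_distance(seq_a, seq_b):
    """Proportion of compared aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in MISSING or b in MISSING:
            continue  # skip sites with a gap in either sequence
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def percent_identity(seq_a, seq_b):
    """Identity expressed as a percentage, i.e. 100 * (1 - p)."""
    return 100.0 * (1.0 - p_distance(seq_a, seq_b))


def mean_pairwise_distance(sequences):
    """Overall mean distance: average p-distance over all sequence pairs."""
    pairs = list(combinations(sequences, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Toy aligned nucleotide sequences (hypothetical, not CDV data).
toy = ["ATGCATGC", "ATGCATGA", "ATGAATGA"]
print(p_distance(toy[0], toy[1]))        # one difference in eight sites
print(percent_identity(toy[0], toy[1]))
print(mean_pairwise_distance(toy))
```

The same functions apply to amino acid alignments, since only character equality is tested.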

Detection of P gene and clinical features.
A fragment of 429 bp of the phosphoprotein gene was detected in 68 (79.1%) clinical specimens from the 86 dogs sampled. In total, 44.1% of the CDV positive animals were male and 55.9% were female. Young dogs from one to six months old were the most affected (46.9%), although the disease also presented in dogs older than 12 months (21.9%). Concerning clinical manifestations in affected dogs, nervous and respiratory symptoms accounted for 25% of the cases, closely followed by the presentation of respiratory signs alone (21.4%). An equal proportion (17.9%) of clinically ill dogs presented with tegumentary/respiratory/nervous signs or respiratory/digestive symptoms alone.

Sequence analysis of the H gene and the Fsp-coding region.
We were able to amplify and sequence the 2099 bp fragment of the H gene from only six clinical specimens. Next, 405 bp of the Fsp-coding region was assessed, and we obtained positive amplifications and sequences from 23 of the 68 P gene-positive samples. Information regarding the age, gender, breed, vaccination status, and clinical signs of the dogs, as well as outcome and accession numbers from H gene-positive samples, is summarized in Table 2. As we previously reported 17 , BLAST analysis of H sequence data from commercial Colombian vaccines used as positive controls in the present study revealed that vaccines had 99-100% identity with the vaccine strains from the North America-1 lineage, and one vaccine showed 99% identity with a Rockborn vaccine strain.
Colombian H sequences subjected to analysis displayed high identity with each other (93.5-99.9% nt; 93-99.9% aa) with an overall mean distance of 0.039. Alignment of the H gene of Colombian CDV strains and the Onderstepoort vaccine strain (AF378705) showed an identity that varied between 89.6% and 91.1% at the aa level and between 90.8% and 91.8% at the nt level. As expected, higher variability was found in the Fsp-coding region (81.39-99.01% nt, 63.91-97.7% aa) with an overall mean distance of 0.1014. Moreover, the Colombian Fsp-coding region sequences displayed very low identity with those of the Onderstepoort vaccine strain (80.9-83.6% nt, 57.6-67.4% aa).
Phylogenetic relationships based on the nucleotide alignment of complete H gene sequences inferred by distance (neighbor-joining) and character approaches (maximum likelihood and Bayesian inference) resulted in trees with similar topology. The phylogenetic tree of the H gene showed 16 lineages with a defined geographical distribution pattern (the Asia-3 lineage was grouped with strains of the America-1 lineage), while the Fsp tree only showed 15 lineages, primarily because there are no available Fsp sequences for the European Wildlife lineage.
Interestingly, we showed that Colombian CDV sequences cluster in two different branches in both the Fsp and H gene trees (Figs 1 and 2); one group of Colombian Fsp-coding sequences cluster in the same clade as Ecuadorian strains (Fig. 1), and also, interestingly, with two recently reported North America-4 lineage sequences 21 (97.1% identity). Other Colombian CDV sequences cluster with the South America-3 lineage previously reported in Colombia (Fig. 2). Unfortunately, no Ecuadorian H sequences have been reported to date.
Keeping in mind that Colombian viruses showed high variability at both the nucleotide and amino acid levels, we analyzed the identity of Colombian CDV H sequences and the previously reported CDV lineages (Table 3). By analysis of uncorrected (p) distances, we found that the amino acid sequences of Colombian and Ecuadorian CDV strains of the South America-4 lineage differed by less than 4% from those of strains of the previously reported North America-4 lineage. Accordingly, we propose that this lineage should be termed "South America/North America-4" due to its intercontinental distribution. Remarkably, we observed high variation (approximately 10%) between Colombian CDV lineages and the North America-1 lineage, which includes most of the commercial vaccine strains (Table 3).
To evaluate this subclassification for the Fsp-coding region, we arbitrarily extrapolated the classification and found five subgenotypes in the South America-3 lineage (3A to 3E in Fig. 1). Subgenotype A included strains 42-CO-2012, 1aM-CO-2017, and BUC-12-CO-2016 (aa variation 1.2-2.5%) and subgenotype B included strains 18-CO-2012 and 44-CO-12 (aa variation 2-3.5%); both of these subgenotypes were also found with the H gene.
[...] America-4" lineage displayed the following unique substitutions: Q5R, L38S, T193I, V198I, V235I, T291M, E333V, H339D, S341L, T348K, and F353I.

CDV subgenotype analysis. In CDV subgenotype analysis, based on criteria for measles Paramyxovirus
In the present study, three sequences from three dogs that had not been completely vaccinated were analyzed. Two were grouped in the "South America/North America-4" lineage: one strain had a very long branch (Fig. 1; sequence Mde-19a-CO-2017), while another was shown to be related to CDV strains circulating in North America (Fig. 2; sequence Mde-18a-CO-2017). The final strain, 1aM-CO-2017, was shown to belong to the South America-3 lineage (Fig. 2). A comparison of the linear hemagglutinin noose epitope (HNE) between the vaccine strains and these Colombian strains showed the presence of multiple substitutions (Supplemental material Fig. S2). Regarding the analysis of potential glycosylation sites in the Fsp peptide of the South America-3 lineage, we found two potential glycosylation sites (NXS/T) at positions 62-64 and 108-110 that are common to the other lineages. However, no potential glycosylation sites were found in any of the "South America/North America-4" sequences.
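Potential N-glycosylation sites of the NXS/T type can be located with a simple pattern scan. The sketch below is an illustration under stated assumptions, not the tool used in the study: it applies the conventional sequon definition in which X is any residue except proline, reports 1-based start and end positions in the same style as the text (for example 62-64), and uses an invented toy peptide rather than a CDV sequence.

```python
import re

# N, then any residue except proline, then S or T.
# The lookahead lets overlapping sequons be reported individually.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(protein):
    """Return (start, end, motif) tuples with 1-based inclusive positions."""
    hits = []
    for match in SEQUON.finditer(protein.upper()):
        start = match.start() + 1      # convert 0-based index to 1-based
        end = start + 2                # a sequon spans three residues
        hits.append((start, end, match.group(1)))
    return hits


# Toy peptide (hypothetical): N2-G3-S4 and N11-A12-T13 qualify,
# while N7-P8-T9 is excluded because of the proline at the X position.
toy_peptide = "MNGSAANPTQNAT"
for start, end, motif in find_sequons(toy_peptide):
    print(f"{start}-{end}: {motif}")
```

A sequence with no asparagine followed by a suitable residue pair simply returns an empty list, which corresponds to the "no potential glycosylation sites" outcome described above.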

CDV H and Fsp sites under positive selection.
We evaluated non-neutral selection using the FUBAR method. For the H gene, we found that sequences harbored three sites under positive selection: 522, 549, and 582, with a posterior probability of 0.9 and a Bayes factor of 28.3, 220.9, and 44.7 respectively. We also found 247 sites under negative selection with a posterior probability of 0.9 and Bayes factor < 1.

H gene and Fsp phylogeography.
For this analysis, the vaccine sequences were eliminated since there were no exact dates of isolation for these strains, which would have produced biases in the substitution rates. Table 4 shows the evolutionary model for each gene. Also, Table 5 shows the estimated TMRCA for the H gene. The South America-3 and "South America/North America-4" lineages have a TMRCA corresponding to 1964 and 1925, respectively (see Table 5). The phylogeography of the Fsp and H gene is shown in Figs 3 and 4, respectively.

Discussion
Phylogenetic characterization of CDV is performed on the basis of the H gene sequence because this gene shows high nucleotide variability in CDV between field strains and vaccine strains in comparison with other paramyxoviruses 42,43 . Using this method, it is accepted that two strains belong to the same lineage when their amino acid diversity is less than 4% 42 . Presently, there are 17 known lineages worldwide with a geographical distribution pattern. However, a fragment that codes for the Fsp of CDV has been suggested as an alternative for the classification of CDV strains since it is highly divergent and has given similar classification results to the H gene. Using this approach, two strains are considered to belong to the same CDV lineage if their amino acid divergence is less than 19% 10 .
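The two lineage criteria stated above reduce to a threshold comparison on amino acid divergence. The following minimal sketch encodes them; the function and dictionary names are hypothetical, and the example values are the divergences quoted elsewhere in this Discussion.

```python
# Lineage cut-offs from the text: amino acid divergence must be
# below 4% for the H gene and below 19% for the Fsp-coding region.
LINEAGE_THRESHOLD_PCT = {"H": 4.0, "Fsp": 19.0}


def same_lineage(aa_divergence_pct, region):
    """True if the divergence is below the cut-off for the given region."""
    if region not in LINEAGE_THRESHOLD_PCT:
        raise ValueError(f"unknown region: {region!r}")
    if aa_divergence_pct < 0:
        raise ValueError("divergence cannot be negative")
    return aa_divergence_pct < LINEAGE_THRESHOLD_PCT[region]


# Values quoted in this Discussion, used here only as examples.
print(same_lineage(2.8, "H"))     # divergence within the proposed lineage
print(same_lineage(12.0, "Fsp"))  # Fsp divergence within the proposed lineage
print(same_lineage(11.0, "H"))    # field lineages versus vaccine strains
```

The first two calls return True and the third returns False, matching the conclusions drawn in the text from the same figures.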
In 2012, the South America-3 lineage was characterized by full H gene sequencing, based on samples from Medellín only 17 . However, it was not possible to compare the circulation of this CDV lineage with viruses from other regions of Colombia or from neighboring countries. In the present study, the Fsp-coding region was sequenced from samples obtained in Medellín between 2012 and 2017 and from other [...]
Through the phylogenetic analysis of the Fsp-coding region in the present study, we observed that the sequences of the Ecuadorian strains and the lineage formerly called "America-4" formed a monophyletic group with a 12% amino acid divergence, which was supported by the table of distances (Table 3). This led us to the conclusion that those viral sequences belong to the same lineage. To confirm this finding, we attempted to amplify the H gene from the same samples. In the resulting phylogenetic tree, we observed a monophyletic group comprising the strains of the America-4 lineage and the Ecuadorian strain, with an amino acid divergence in the H gene of 2.8%, showing that these strains belong to the same CDV lineage. For this reason, we suggest calling this lineage "South America/North America-4."
To compare the topologies of the phylogenetic trees, we used the North America-1/Vaccine lineage as an outgroup. We observed that although the lineages characterized in this work are sister groups, they present different ancestry. In the H gene tree, the oldest clades are the Africa-1, Arctic-like, and Asia-2 lineages. From this node, the Asia-4, South America-3, and "South America/North America-4" lineages appear as sister groups. A polytomy then emerges, from which the following lineages arise as sister groups: North America-3 and South America-2; North America-2, European Wildlife, Asia-1, and Rockborn-like; and Africa-2 and Europe/South America-1.
In the Fsp-coding region tree, the ancestral clades of the characterized lineages arise from a polytomy from which sister groups originate: Arctic-like and Asia-2; Africa-1; North America-1 and North America-2; and Rockborn-like (Fig. 1). From this node, another polytomy emerges, giving rise to more sister groups as follows: Europe/South America-1, South America-2, and Africa-2; and Asia-1, North America-3, South America-3, and "South America/North America-4." In their 2013 study, Sarute et al. described topological differences between the Fsp and H gene trees 10 . Although the topological structures they identified did not entirely correspond with those found in the present study (Figs 1 and 2), it should be taken into account that a higher number of Fsp-coding sequences is now available, representing most of the lineages characterized to date; in addition, Fsp-coding region sequences can now be obtained from full CDV genomes in GenBank. However, standard phylogenetic studies are still performed with the H gene, so a greater number of H gene sequences is available in GenBank.
On the basis of the classification system for measles, a subgenotype consists of H gene sequences that have an amino acid identity of 98% and a high bootstrap value (>70%) 28 . On the basis of these criteria, both the South America-3 and "South America/North America-4" lineages present three subgenotypes. When we arbitrarily extrapolated this classification to the Fsp region, a different set of subgenotypes was found (subgenotypes A-E for South America-3 and A-C for "South America/North America-4"), although fewer than for Europe/South America-1, which reportedly contains at least eight subgenotypes (A-H), including CDV H gene sequences from at least seven different countries 28 .
In measles, different subgenotypes are not geographically restricted, although some appear to be mainly endemic in different areas of the world 44 . In the present study, it was not possible to determine the geographic pattern of CDV subgenotypes on the basis of the H gene as has been previously reported for the Europe/South America-1 and South African subgenotypes 6,45 and for measles 46 . However, with the Fsp fragment, distribution patterns can be observed between regions (Fig. 1); subgenotype 3A circulates only in Medellín, while subgenotype 3D circulates only in Bogotá. Also, in the "South America/North America-4" lineage, subgenotype 4A was only reported in Colombian strains and subgenotype 4D in Ecuadorian strains. A higher number of CDV sequences collected from different areas within those countries would be necessary for a better understanding of the circulation history of CDV subgenotypes in the Americas.
A temporal pattern of distribution has been reported for some of the Europe/South America-1 and South African subgenotypes as well as for measles virus genotypes and subgenotypes 28,44,46 . Our results showed a similar temporal pattern of distribution in most of the subgenotypes in both the South America-3 and the "South America/North America-4" lineages (Figs 1 and 2). These results must be carefully evaluated: although they may show a temporal pattern of CDV distribution and a possible strain displacement pattern, sampling bias could be another possible explanation. Routine international determination of CDV lineages and subgenotypes plus molecular surveillance could be useful for gaining a more accurate epidemiological understanding of the temporal distribution of CDV.
The uncontrolled commercialization of puppies from South America in the USA could be the route of transmission of the "South America/North America-4" lineage between these two regions of the continent. It is important to highlight that this is the second lineage reported to be actively circulating in two different continental regions, the first being Europe/South America-1 20 . It is imperative that wider phylogeographic studies of the "South America/North America-4" lineage are conducted to establish its origin and geographical spread throughout the American continent; it may have originated in Ecuador and spread through Colombia to the USA, or vice versa. Since CDV is a re-emerging infection in the USA, with at least five different lineages in circulation 15 , deeper phylogenetic analysis could help in gaining an understanding of the epidemiology of CDV on this continent.
In the present study, an amino acid divergence close to 11% in the H protein was observed between vaccine strains and the lineages South America-3 and "South America/North America-4" (Table 3). CDV is presently recognized as a single serotype 47 as there is little evidence of antigenic divergence as a result of genetic divergence. Recently, significant differences were reported in the evaluation of neutralizing titers between "South America/North America-4" lineage strains and an America-1 type vaccine strain 27 . Given those results and the fact that multiple recognized CDV cases have been recorded in vaccinated animals 17,21,28,48 , it is necessary to perform wider, updated antigenic analyses of CDV to understand the antigenic differences between the multiple lineages circulating worldwide and, potentially, to produce a vaccine update that includes the most prevalent antigenic types.
In the positive selection analysis, we observed that the "South America/North America-4" lineage possesses a unique substitution (V79I) in the Fsp fragment at a site that is under positive selection; the South America-3 lineage also possesses a unique substitution (I102S) at a site under positive selection. This was determined using the FUBAR method, which assumes that the selection pressure for each site is constant throughout the phylogeny 38 . In this way, it was also determined that sites 98, 99, 101, and 102 of Fsp are under positive selection. These changes must be studied to understand the role of such substitutions in vaccine failures and interspecies host changes.
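Substitution labels such as V79I and I102S follow the usual convention of original residue, 1-based position, and replacement residue. The sketch below shows one way to parse such labels and test whether a query sequence carries a given substitution relative to a reference; the function names and toy sequences are hypothetical, and the reference sequence against which the reported positions are numbered is not specified here.

```python
import re

# One-letter residue, 1-based position, one-letter residue (e.g. V79I).
SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(label):
    """Split a label like 'V79I' into ('V', 79, 'I')."""
    match = SUBSTITUTION.match(label.strip().upper())
    if match is None:
        raise ValueError(f"not a valid substitution label: {label!r}")
    original, position, replacement = match.groups()
    return original, int(position), replacement


def carries_substitution(reference, query, label):
    """True if the query has the replacement residue at the labeled site.

    Both sequences are assumed to be aligned and gap-free, so that
    position N in the label is index N - 1 in each string.
    """
    original, position, replacement = parse_substitution(label)
    if position < 1 or position > min(len(reference), len(query)):
        raise ValueError("position lies outside the sequences")
    if reference[position - 1] != original:
        raise ValueError("reference residue does not match the label")
    return query[position - 1] == replacement


# Toy sequences (hypothetical, not CDV proteins).
reference = "MKVLA"
query = "MKILA"
print(parse_substitution("V79I"))
print(carries_substitution(reference, query, "V3I"))
print(carries_substitution(reference, query, "K2R"))
```

Checking the reference residue before the query residue guards against off-by-one numbering errors, which are a common source of mislabeled substitutions.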
Likewise, in the H gene we found sites 522, 549, and 582 to be under positive selection. Site 549 has been proposed as key to crossing the species barrier 4 . However, sites 522 and 582 have not previously been reported in relation to pathogenicity, vaccine failure, or species barrier jumps, and should be studied in depth to understand the role of these sites in pathogenesis and interspecies transmission.
The linear HNE located at amino acids 364-392 of the CDV H protein is conserved among the morbilliviruses 49 . This is the region of the H protein that is recognized by antibodies 50,51 . Recently, it has been suggested that substitutions in this epitope may interfere with the ability of the vaccine to provide adequate protection against infection with wild-type viruses 15 . As reported recently for the strains previously called "North America-4", we observed the presence of multiple substitutions in the HNE of CDV from vaccinated animals (Supplemental Material Fig. S2). However, from the bioinformatic approach used in this study, we can only suggest that these substitutions could be interfering with the capacity of the vaccine to neutralize wild strains. For this reason, it is necessary to perform neutralization studies of vaccines with wild strains. Currently, structural biology studies of Colombian CDV strains are underway in an effort to understand the role of structural substitutions in the HNE of these viruses in viral neutralization. In this regard, it is important to consider that the glycosylation sites found in the Fsp region of the South America-3 lineage (62-64 and 108-110) could be involved in the evasion of the immune response or could be the result of epistatic interactions in the H gene 52 .
CDV has one of the highest reported substitution rates in the Paramyxoviridae family (10.53-11.65 × 10⁻⁴ substitutions/base/year) 43 . Our results show that CDV circulating in Colombia exhibits high variability and includes two lineages and multiple subgenotypes (Figs 1 and 2 and Table 3). The temporal and geographical scope of our sampling was not sufficient to accurately explain the variability of CDV in the region. However, considering that the H gene has undergone genetic drift in different geographical regions 26 , we hypothesize that selective and nonselective processes may play important roles in the co-circulation of multiple lineages in an area, as has been reported previously 52 .
Phylogeographic analysis of the Fsp region and the H gene shows an evolutionary rate for the H gene similar to that reported by Fischer et al. 53 . However, the two resulting trees differ in topology in such a way that the ancestry of the lineages of interest is very different (Figs 3 and 4). These differences in topology in comparison to other trees 54 may be due to the fact that, in the present analysis, we excluded reported vaccine strains, because vaccine strains have been adapted to cell culture and have different evolutionary rates in comparison with wild CDV strains 52 , and we also added three newly reported lineages, including "South America/North America-4." By comparing both trees, we observed that the "South America/North America-4" lineage apparently circulated first in Colombia and Ecuador, then in the United States, and again in Colombia (Figs 3 and 4). It is unclear whether this spatiotemporal pattern of the "South America/North America-4" lineage is more likely due to sampling bias, as has been reported in other viral models 55 . In addition, since we suspected that most of the ancestral Colombian sequences of both trees are immune escape mutants, deeper analyses must be performed to avoid misleading results regarding the dynamics of the "South America/North America-4" CDV lineage.
(2019) 9:15747 | https://doi.org/10.1038/s41598-019-52345-9
Outbreaks of CDV occur in endemic and acute epidemic cycles, leading to transmission among susceptible host populations 54 . In the presence of full or partial vaccination, lifelong immunity could lead to the survival of the remaining coexisting lineages driven by nonselective epidemiological processes 44,56 . Our results, based on unvaccinated and/or incompletely vaccinated populations, support this hypothesis (Table 2).
In contrast with measles, the only natural host of which is humans, a broad range of host species is susceptible to CDV infection, which complicates the selection pressures acting on this virus. It is noteworthy that the "South America/North America-4" lineage characterized in the USA was isolated from domestic dogs and foxes, indicating that this lineage has the ability to jump the species barrier 3,21 . In Colombia, there have been reports of CDV infection in wildlife 57 ; however, no phylogenetic analysis has been performed on viruses from those infected animals.
Reported substitutions in the H protein of CDV circulating in wildlife include E276V, Q392R, R519I, I542F, and Y549H. The South America-3 and "South America/North America-4" lineages show the same substitutions at these sites, which indicates the potential of these viruses to jump the species barrier. However, there is no statistical association that supports this hypothesis 58 .
In conclusion, we report the co-circulation of two CDV lineages in Colombia: the South America-3 lineage circulating in Medellín, Bucaramanga, and Bogotá, and the concurrent circulation of a new lineage not previously described in the country that mainly infects dogs in Medellín. The latter lineage is evolutionarily related to strains reported in domestic dogs in Ecuador and in domestic dogs and wildlife in the USA. Given the intercontinental circulation of this lineage, we propose to name it "South America/North America-4."

Data availability
The datasets generated during and/or analysed during the current study are available from the corresponding author upon request.