Introduction

Canine distemper virus (CDV) belongs to the Paramyxoviridae family, genus Morbillivirus, which includes viruses with epidemiological relevance to human and animal populations1,2. The etiological agent of a highly prevalent viral infectious disease of domestic and wild carnivores, CDV poses a conservation threat to endangered species worldwide3,4.

Clinical symptoms in all affected species are influenced by strain virulence, environmental conditions, host age, and immune status. The gastrointestinal tract and the respiratory and nervous systems are the most affected in all species5,6. Clinical signs of CDV in dogs include the onset of a cutaneous rash, serous nasal and ocular discharge, conjunctivitis, and anorexia, followed by gastrointestinal and respiratory signs, which are often complicated by secondary bacterial infections and neurological disorders7.

Of the six encoded viral proteins, the hemagglutinin (H) and fusion (F) glycoproteins are responsible for virus attachment to and fusion with the host cells8. The H protein is of principal importance as it determines viral tropism in vivo and in vitro and thus determines host-specific immunity9.

Due to external pressures on the host immune system, the H gene displays higher genetic variability when compared with other CDV genes9, which makes it suitable for lineage identification and phylogenetic analysis. Moreover, researchers have established that a short region of the F gene, which encodes the signal peptide of the F protein (Fsp), is extremely variable, suggesting that this region could also be a useful marker for evolutionary studies as it allows for the straightforward identification of CDV lineages10,11,12,13,14.

Phylogenetic studies based on the complete sequence of the H gene or the Fsp-coding region of CDV strains worldwide have revealed multiple distinct lineages/genotypes, most of which follow a geographical pattern of distribution. These lineages are known as America-1 (which includes the commercially available vaccines), America-2 to -5, Arctic-like, Asia-1 to -4, Africa-1 and -2, European Wildlife, Europe/South America-1, and South America-2 and -3 (refs. 9,15,16,17,18,19,20,21,22,23,24).

Although immunization with attenuated vaccines has been widely used for preventing CDV, researchers have hypothesized that the strong genetic diversity and wide variability of the H gene could mean that the antigenic profile of these new genetic variants is altered relative to the vaccine strain if specific sites on the H protein associated with immune neutralization are affected9,25,26. In fact, a recent analysis suggested the need for an updated CDV vaccine, because cross-neutralization assays revealed wide antigenic differences among wild-type CDV isolates and the vaccine strains currently used in the U.S.27.

Several publications have reported a worldwide increase in the incidence of the disease, even in vaccinated dog populations12,17,28,29. Because most commercial CDV vaccines are formulated with strains belonging to the America-1 lineage21, antigenic differences between vaccine and circulating strains could explain this increase.

Researchers have suggested that the South American continent has one of the highest CDV antigenic variabilities in the world13. Phylogenetic analyses based on the CDV H gene from South America have been performed for establishing the evolutionary patterns of the virus in the region and have revealed multiple circulating lineages of CDV, each differing in prevalence. In Brazil, Uruguay, and Argentina, the most prevalent lineage is Europe/South America-120,28; the second-most prevalent, known as South America-2, is restricted to Argentinian canine populations and appears to be associated with strains isolated from wild carnivore species in Europe30.

In the northern part of South America, the circulation of different lineages has been reported; in Colombia, a third lineage (South America-3) that causes disease even in vaccinated dogs was identified on the basis of complete H gene sequencing17. In the same year, a possible different CDV lineage was described based on analysis of the Fsp-coding region of Ecuadorian strains11. Owing to differences in methodologies, complete H gene sequences are not comparable with Fsp-coding region sequences available from this region of the continent. It is possible that a genetic relationship exists between CDV strains from Colombia and Ecuador because they have similar geographic distributions in the northern region of South America.

To determine the genetic diversity of CDV from Colombia in relation to CDV from other regions of South America, we analyzed the Fsp-coding region and H gene of CDV strains currently circulating in different regions of Colombia and compared them with those of wild-type CDV circulating throughout the Americas and worldwide, as well as with vaccine strains.

Materials and Methods

Ethical considerations

This study was approved by the Ethics Committee for Animal Experimentation of the Universidad Cooperativa de Colombia in Bucaramanga. All experiments were performed in accordance with relevant guidelines and regulations. Dog owners signed informed consent forms approved by the ethics committee. Viral samples are subject to contract for access to genetic resources and derived products No. 132 of 2016 RGE0177, signed with the Ministry of Environment and Sustainable Development of the Republic of Colombia. In addition, the authors declare that the implementation of this work followed all scientific, technical, and administrative rules for animal research.

Clinical specimens and vaccine strains

A total of 86 clinical samples from dogs exhibiting signs suggestive of CDV were obtained from three main Colombian provinces (Supplemental Material Fig. S1). Of these, 48 clinical specimens were taken from Bogotá D.C. between January 2014 and June 2015, 11 were taken from Bucaramanga between June 2016 and November 2016, and 27 were taken from Medellín between May and September 2017. Samples were taken in different veterinary hospitals and basic data were gathered at the time of sampling, including the dogs’ age, sex, breed, vaccination status, municipality and/or neighborhood of origin, and clinical signs. Clinical specimens included serum and ocular discharge.

RNA extraction

Total RNA was extracted from 140 µl of the supernatant of ocular discharge, serum, and four commercial CDV vaccines using QIAamp Viral RNA Mini Kit (QIAGEN®, Hilden, Germany) in accordance with the manufacturer’s instructions. The quality and quantity of the RNA was determined using spectrophotometric analysis with a NanoDrop™ One UV-Vis Spectrophotometer (Thermo Scientific, Wilmington, Delaware, USA), and RNA aliquots were stored at −80 °C until use.

cDNA synthesis

Complementary DNA (cDNA) was synthesized using the RevertAid™ Premium First Strand cDNA Synthesis Kit (Thermo Scientific®, Glen Burnie, MD) in accordance with the manufacturer’s instructions. A denaturation mix consisting of 1 µl (100 pmol/µl) random hexamers, 1 µl dNTP Mix (10 mM), and 13 µl (0.02–4.6 µg) total RNA was initially denatured at 65 °C for 5 min and immediately incubated on ice. The RT mix solution consisted of 4 µl 5X Reverse Transcriptase Buffer and 1 µl RevertAid™ Premium Enzyme Mix. The RT mix was added to the denaturation mix and reverse transcription was performed in a total volume of 20 µl in a ProFlex™ PCR Thermal Cycler (Applied Biosystems®, Foster City, California, USA) for 10 min at 25 °C followed by 30 min at 50 °C; the reaction was terminated by heating to 85 °C for 5 min. The reaction product was stored at −80 °C until use. Commercially available vaccines were used as positive controls for RT-PCR reactions.

PCR and sequencing

Next, cDNAs from clinical specimens were screened by PCR of the phosphoprotein (P) gene using the Maxima Hot Start PCR Master Mix (2X) (Thermo Scientific®) reagent kit in accordance with the manufacturer’s instructions. Viral cDNA was detected using morbillivirus universal primers31 for amplifying a 429 bp fragment of the phosphoprotein gene. For PCR, 4 µl cDNA was added to a reaction mix, which consisted of 25 µl Maxima Hot Start PCR Master Mix (2X), 15 µl nuclease-free water, and 3 µl (10 µM) of each of the forward and reverse primers. PCR was performed on a ProFlex™ PCR Thermal Cycler (Applied Biosystems®) under the following conditions: initial denaturation at 95 °C for 4 min followed by 35 cycles of denaturation at 95 °C for 30 s, annealing at 50.8 °C for 30 s, extension at 72 °C for 1 min, and a final extension at 72 °C for 5 min. Ultrapure water was used as a negative control and cDNA from one of the vaccines as positive control.
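As a quick check of the reaction composition described above, the listed component volumes sum to 50 µl, which implies a final concentration of 0.6 µM for each primer and a 1X master mix. The short sketch below only reproduces this dilution arithmetic; the variable and function names are illustrative and are not part of the original protocol.

```python
# Component volumes (µl) of the P gene PCR, as listed in the text.
volumes_ul = {
    "cDNA": 4.0,
    "master_mix_2x": 25.0,
    "nuclease_free_water": 15.0,
    "forward_primer": 3.0,
    "reverse_primer": 3.0,
}
PRIMER_STOCK_UM = 10.0   # primer stock concentration (µM), from the text
MASTER_MIX_STOCK_X = 2.0  # master mix is supplied at 2X

total_volume_ul = sum(volumes_ul.values())


def final_concentration(stock, volume_added, total_volume):
    """Dilution formula C1 * V1 = C2 * V2, solved for C2."""
    return stock * volume_added / total_volume


primer_final_um = final_concentration(
    PRIMER_STOCK_UM, volumes_ul["forward_primer"], total_volume_ul
)
master_mix_final_x = final_concentration(
    MASTER_MIX_STOCK_X, volumes_ul["master_mix_2x"], total_volume_ul
)

print(f"Total reaction volume: {total_volume_ul:.0f} µl")
print(f"Final primer concentration: {primer_final_um:.1f} µM each")
print(f"Final master mix concentration: {master_mix_final_x:.0f}X")
```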

In all samples that tested positive for the P gene, the full-length H gene and the Fsp-coding region were amplified using the Maxima Hot Start PCR Master Mix Kit in accordance with the manufacturer’s instructions. The H gene was amplified using the primers CDVff1 and HS2 (ref. 32), which amplify a 2099 bp fragment of the CDV genome that includes the H gene and flanking regions at both ends. The Fsp-coding region was amplified using the primers CDV-F4854 and CDV-R5535 (ref. 10) or F5/R5 (ref. 21), which flank the Fsp-coding region. In all cases, 4 µl cDNA was added to a PCR reaction mix, which consisted of 25 µl Maxima Hot Start PCR Master Mix (2X), 15 µl nuclease-free water, and 3 µl (10 µM) of each of the primers (Table 1). PCR was performed on a ProFlex™ PCR Thermal Cycler (Applied Biosystems®) under the following conditions: initial denaturation at 95 °C for 4 min followed by 35 cycles of denaturation at 95 °C for 30 s, annealing for 30 s, extension at 72 °C for 2 min, and a final extension at 72 °C for 10 min. The annealing temperature was 48.2 °C for the H gene and, for the Fsp-coding region, 50.8 °C for the F5/R5 primers and 58 °C for the CDV-F4854/R5535 primers.

Table 1 Oligonucleotides used for CDV P gene detection and for full length H gene and Fsp-coding region amplification and sequencing.

Following PCR, 5 µl of each amplicon was analyzed by gel electrophoresis on a 1.5% agarose gel (AGAROSE I™, Amresco, Solon, OH, USA) at 110 V for 60 min. The gels were stained using EZ-VISION™ dye (Amresco, Solon, OH, USA) and viewed by transillumination with UV light using the Molecular Imager® Gel Doc™ XR+ System with the image acquisition software ImageLab™ (Bio-Rad, Hercules, CA, USA). Amplification product sizes were estimated using a 100–3000 bp molecular weight ladder (GeneRuler™ 100 bp Plus DNA Ladder, Thermo Scientific®).

PCR amplicons of the H gene and Fsp-coding region were submitted to Macrogen Inc. (Seoul, Korea) for purification and sequencing. An additional set of eight previously published primers for the H gene was used for sequencing32,33 (Table 1) on an ABI3711™ automatic sequencer (Macrogen™).

Phylogenetic analysis

Sequence data were assembled and edited using the SeqMan program (DNAStar Lasergene™ V15.0 software package, Madison, Wisconsin, USA). Nucleotide BLAST (Basic Local Alignment Search Tool) was used for exploring similarity between Colombian CDV strain sequences and all CDV sequences available in the NCBI nucleotide databases. For the H gene, sequences of 1824 nucleotides and the corresponding deduced amino acid sequences were obtained only from dogs from Medellín and Bucaramanga; for the Fsp-coding region (405 nucleotides and the corresponding deduced amino acid sequence), sequences were obtained from all three studied cities (Bogotá D.C., Medellín, and Bucaramanga). Phylogenetic analyses were carried out with at least two sequences for each reported lineage and vaccine strains from different geographical regions using MEGA™ 7 (ref. 34) and the MUSCLE algorithm, and nucleotide and amino acid differences were assessed as uncorrected (p) distances.
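The uncorrected (p) distance mentioned above is simply the proportion of aligned sites at which two sequences differ. The following minimal sketch illustrates the calculation for a pair of pre-aligned sequences; it is not the MEGA implementation, the function name is hypothetical, and the pairwise removal of gapped sites is an assumption (the text does not state which gap treatment was used).

```python
def p_distance(seq_a, seq_b, gap_chars="-?"):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or missing-data symbol are
    skipped (pairwise deletion, an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("Sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in gap_chars or b in gap_chars:
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("No comparable sites")
    return differences / compared


# Toy sequences for illustration only (not CDV data).
print(p_distance("ATGCATGC", "ATGAATGC"))  # 1 difference over 8 sites
print(p_distance("ATG-ATGC", "ATGAATGT"))  # 1 difference over 7 sites
```

Percent identity, as reported in the Results, corresponds to (1 − p) × 100 under the same assumptions.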

Phylogenetic relationships based on the nucleotide alignment of complete H gene sequences were inferred using distance-based (neighbor-joining) and character-based (maximum likelihood, Bayesian) approaches implemented in MEGA™ 7 and MrBayes 3.2.6 (ref. 35) software, respectively. The best-fit model for nucleotide substitution was identified by MEGA™ 7 as Tamura 3-parameter with gamma-distributed rate heterogeneity (T92 + G) according to the Bayesian information criterion for the H gene and Hasegawa-Kishino-Yano with gamma distribution (G) (HKY + G) for the Fsp-coding region. Maximum likelihood analysis was performed using the latter model; however, Bayesian inference analysis was performed with a general time-reversible model plus gamma-distributed rate heterogeneity (GTR + G) because the T92 + G model cannot be implemented in MrBayes 3.2.6. For this method, two parallel analyses were run for 1,000,000 generations with a 25% burn-in period. The convergence of the Markov chain Monte Carlo (MCMC) chains was assessed by the standard deviation of split frequencies, which fell below 0.01. The America-1 lineage was used as an outgroup to root the phylogenetic trees. The consensus trees were edited in FigTree software version 1.4 (ref. 36).

Amino acid analysis of the H protein and the Fsp peptide

The deduced amino acid sequences of the H protein (607 aa) and the Fsp peptide (135 aa) of the Colombian wild-type CDV strains were aligned with multiple CDV protein sequences from different geographical regions using MEGA™ 7 for exploring their amino acid profiles and potential differences between vaccine and wild-type strains of known CDV lineages. Potential N-linked glycosylation sites were predicted using NetNGlyc 1.0 (ref. 37).
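For orientation, potential N-linked glycosylation sites are conventionally defined by the sequon N-X-S/T, where X is any residue except proline. The sketch below scans an ungapped protein sequence for that motif and reports 1-based positions. It is only a pattern search and does not reproduce NetNGlyc, which additionally scores each candidate site with a trained predictor; the function name and the example peptide are hypothetical.

```python
import re

# N, then any residue except proline, then S or T.
# The lookahead allows overlapping sequons to be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein):
    """Return (start, end) 1-based positions of N-X-S/T motifs."""
    return [(m.start() + 1, m.start() + 3) for m in SEQUON.finditer(protein.upper())]


# Toy peptide for illustration only (not a CDV sequence).
toy_peptide = "MNGSANPTNNTS"
for start, end in find_sequons(toy_peptide):
    print(f"{start}-{end}: {toy_peptide[start - 1:end]}")
```

In the toy peptide, the N-P-T triplet at positions 6–8 is skipped because proline occupies the middle position.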

Sites under positive selection

To identify amino acid sites under positive selection in the CDV H protein and Fsp peptide, the ratio of non-synonymous (dN) to synonymous (dS) substitutions was calculated by ML phylogenetic reconstruction using the general reversible nucleotide substitution model available through the Datamonkey web server. To detect non-neutral selection, Fast Unconstrained Bayesian AppRoximation (FUBAR) within the HyPhy software package was implemented in Datamonkey38. The range of significance for the posterior probability was 0–1. Generally, posterior probabilities > 0.9 are strongly suggestive of positive selection. Finally, a Bayes factor = 50 was used for estimating the rates of dN and dS within each codon. Values of dN/dS > 1, dN/dS = 1, and dN/dS < 1 were used for defining positive selection (adaptive molecular evolution), neutral mutations, and negative selection (purifying selection), respectively.
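The interpretation rules stated above can be summarized in a few lines. The sketch below is illustrative only: it is not the Datamonkey or HyPhy interface, the function names are hypothetical, and the input values are invented. It classifies a codon by its dN/dS ratio and, separately, applies the 0.9 posterior probability cutoff described in the text.

```python
def classify_omega(dn, ds, tol=1e-9):
    """Classify a site by its dN/dS ratio (omega)."""
    if ds <= 0:
        raise ValueError("dS must be positive to compute dN/dS")
    omega = dn / ds
    if abs(omega - 1.0) <= tol:
        return "neutral"
    return "positive" if omega > 1.0 else "negative"


def call_site(prob_positive, prob_negative, threshold=0.9):
    """Apply a posterior probability cutoff to a single codon.

    prob_positive: posterior probability that dN > dS at the site
    prob_negative: posterior probability that dN < dS at the site
    """
    if prob_positive > threshold:
        return "positive"
    if prob_negative > threshold:
        return "negative"
    return "not significant"


# Invented values for illustration only.
print(classify_omega(0.6, 0.2))   # dN/dS > 1
print(classify_omega(0.1, 0.5))   # dN/dS < 1
print(call_site(0.95, 0.01))
print(call_site(0.50, 0.30))
```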

H gene and Fsp phylogeography

The mean substitution rate (substitutions per site per year), the time to the most recent common ancestor (TMRCA), the geographic origin, and the overall spatial dynamics of the major CDV clades were inferred using the Bayesian MCMC approach implemented in the BEAUti/BEAST v1.8.4 package39. The analysis was implemented using a strict molecular clock with a constant population size, and 3E07 generations were run in order to ensure an effective sample size (ESS) greater than 200 for the evaluated parameters, as assessed with the Tracer v1.7 program39. The initial 10% of the MCMC, which corresponds to low-probability states at the beginning of the chain, was discarded. The maximum clade credibility (MCC) tree was built with TreeAnnotator and visualized with FigTree v1.4.3 (ref. 40).
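To make the chain lengths concrete, the sketch below computes how many generations are discarded and retained for the two Bayesian analyses described in the Methods (3E07 generations with 10% burn-in here, and 1,000,000 generations with 25% burn-in for the MrBayes runs). It is plain arithmetic with a hypothetical helper function and does not call BEAST or MrBayes.

```python
def split_burn_in(total_generations, burn_in_fraction):
    """Return (discarded, retained) generation counts for a given burn-in."""
    if not 0 <= burn_in_fraction < 1:
        raise ValueError("burn_in_fraction must be in [0, 1)")
    discarded = int(round(total_generations * burn_in_fraction))
    return discarded, total_generations - discarded


# Chain lengths and burn-in fractions as stated in the text.
beast_discarded, beast_retained = split_burn_in(30_000_000, 0.10)
mrbayes_discarded, mrbayes_retained = split_burn_in(1_000_000, 0.25)

print(f"BEAST:   discard {beast_discarded:,}, keep {beast_retained:,}")
print(f"MrBayes: discard {mrbayes_discarded:,}, keep {mrbayes_retained:,}")
```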

Results

Detection of P gene and clinical features

A fragment of 429 bp of the phosphoprotein gene was detected in 68 (79.1%) clinical specimens from the 86 dogs sampled. In total, 44.1% of the CDV positive animals were male and 55.9% were female. Young dogs from one to six months old were the most affected (46.9%), although the disease also presented in dogs older than 12 months (21.9%). Concerning clinical manifestations in affected dogs, nervous and respiratory symptoms accounted for 25% of the cases, closely followed by the presentation of respiratory signs alone (21.4%). An equal proportion (17.9%) of clinically ill dogs presented with tegumentary/respiratory/nervous signs or respiratory/digestive symptoms alone.

Sequence analysis of the H gene and the Fsp-coding region

We were able to amplify and sequence the 2099 bp fragment containing the H gene from only six clinical specimens. Next, the 405 bp Fsp-coding region was assessed, and we obtained positive amplifications and sequences from 23 of the 68 P gene-positive samples. Information regarding the age, gender, breed, vaccination status, and clinical signs of the dogs, as well as outcome and accession numbers from H gene–positive samples, is summarized in Table 2.

Table 2 Clinical features of Colombian dogs infected with canine distemper virus (CDV).

The Fsp-coding region sequences of the strains from Medellín, D/Mde_19a/CO/2017, D/Mde_2aP/CO/2017, D/Mde_13b/CO/2017, and D/Mde_16a/CO/2017, were 100% identical. Likewise, the sequences from Bogotá, D/Bog-4/CO/2015, D/Bog-5/CO/2015, and D/Bog-6/CO/2015, were 100% identical to each other. Consequently, only one of those sequences (D/Bog-4/CO/2015) was included with the rest of the Colombian strains in the phylogenetic analysis. Vaccines used as positive controls in the PCR reactions were also sequenced.

As we previously reported17, BLAST analysis of H sequence data from commercial Colombian vaccines used as positive controls in the present study revealed that vaccines had 99–100% identity with the vaccine strains from the North America-1 lineage, and one vaccine showed 99% identity with a Rockborn vaccine strain.

Colombian H sequences subjected to analysis displayed high identity with each other (93.5–99.9% nt; 93–99.9% aa) with an overall mean distance of 0.039. Alignment of the H gene of Colombian CDV strains and the Onderstepoort vaccine strain (AF378705) showed an identity that varied between 89.6% and 91.1% at the aa level and between 90.8% and 91.8% at the nt level. As expected, higher variability was found in the Fsp-coding region (81.39–99.01% nt, 63.91–97.7% aa) with an overall mean distance of 0.1014. Moreover, the Colombian Fsp-coding region sequences displayed very low identity with those of the Onderstepoort vaccine strain (80.9–83.6% nt, 57.6–67.4% aa).

Phylogenetic relationships based on the nucleotide alignment of complete H gene sequences inferred by distance (neighbor-joining) and character approaches (maximum likelihood and Bayesian inference) resulted in trees with similar topology. The phylogenetic tree of the H gene showed 16 lineages with a defined geographical distribution pattern (the Asia-3 lineage was grouped with strains of the America-1 lineage), while the Fsp tree only showed 15 lineages, primarily because there are no available Fsp sequences for the European Wildlife lineage.

Interestingly, we showed that Colombian CDV sequences cluster in two different branches in both the Fsp and H gene trees (Figs 1 and 2); one group of Colombian Fsp-coding sequences cluster in the same clade as Ecuadorian strains (Fig. 1), and also, interestingly, with two recently reported North America-4 lineage sequences21 (97.1% identity). Other Colombian CDV sequences cluster with the South America-3 lineage previously reported in Colombia (Fig. 2). Unfortunately, no Ecuadorian H sequences have been reported to date.

Figure 1

Phylogenetic relationships between 64 CDV strains based on Fsp fragment gene sequences. The phylogenetic tree was inferred by the maximum likelihood method using 1000 replicates. GenBank accession numbers, the species from which each isolate was obtained, name of the strain, country of origin, and year of isolation are indicated in the tip labels if available. Numbers at the nodes are bootstrap values for the clade. Abbreviations for animal species: AL: African lion (Panthera leo), AWD: African wild dog (Lycaon pictus), B: badger (Meles meles), BS: Baikal seal (Pusa sibirica), D: dog (Canis lupus familiaris), F: ferret (Mustela putorius furo), FX: fox (Vulpes urocyon), H358: human lung cells, M: mink (Neovison vison), R: raccoon (Procyon lotor), RD: raccoon dog (Nyctereutes procyonoides), S: seal (Phoca vitulina), SH: spotted hyena (Crocuta crocuta). Abbreviations for countries: AR: Argentina, BR: Brazil, CN: China, CO: Colombia, DE: Germany, EC: Ecuador, GA: Gabon, IT: Italy, JP: Japan, KZ: Kazakhstan, RU: Russia, TZ: Tanzania, US: United States, UY: Uruguay, ZA: South Africa.

Figure 2

Phylogenetic relationships between 68 CDV strains based on H gene sequences. The phylogenetic tree was inferred by the maximum likelihood method using 1000 replicates. GenBank accession numbers, the species from which each isolate was obtained, name of the strain, country of origin, and year of isolation are indicated in the tip labels if available. Numbers at the nodes are bootstrap values for the clade. Abbreviations for animal species: AL: African lion (Panthera leo), B: badger (Meles meles), CT: Cerdocyon thous, D: dog (Canis lupus familiaris), F: ferret (Mustela putorius furo), FX: fox (Vulpes urocyon), GJ: golden jackal (Canis aureus), GP: giant panda (Ailuropoda melanoleuca), J: javelina (Tayassu pecari), LP: lesser panda (Ailurus fulgens), M: mink (Neovison vison), MP: Martes pennanti, R: raccoon (Procyon lotor), RD: raccoon dog (Nyctereutes procyonoides), S: seal (Phoca vitulina), SH: spotted hyena (Crocuta crocuta). H358: human lung cells. Abbreviations for countries: AR: Argentina, AT: Austria, BR: Brazil, CN: China, CO: Colombia, DE: Germany, DK: Denmark, HU: Hungary, IT: Italy, JP: Japan, KR: South Korea, KZ: Kazakhstan, MX: Mexico, SE: Sweden, TZ: Tanzania, US: United States, UY: Uruguay, ZA: South Africa.

Keeping in mind that Colombian viruses showed high variability at both the nucleotide and amino acid levels, we analyzed the identity of Colombian CDV H sequences and the previously reported CDV lineages (Table 3). By analysis of uncorrected (p) distances, we found that the amino acid sequences of Colombian and Ecuadorian CDV strains of the South America-4 lineage differed by less than 4% from those of strains of the previously reported North America-4 lineage. Accordingly, we propose that this lineage should be termed “South America/North America-4” due to its intercontinental distribution. Remarkably, we observed high variation (approximately 10%) between Colombian CDV lineages and the North America-1 lineage, which includes most of the commercial vaccine strains (Table 3).

Table 3 Uncorrected distances (p) observed in pairs of amino acid sequences of the Fsp and H genes between CDV lineages.

CDV subgenotype analysis

In CDV subgenotype analysis, based on criteria for measles Paramyxovirus (H amino acid identity of 98% and bootstrap values >70%), we identified at least three subgenotypes in the South America-3 lineage. Subgenotype A included strains 13-CO-12, 19-CO-2012, 26-CO-12, and 1aM-CO-2017 (aa variation 0.2–0.06%); subgenotype B included strains 18-CO-2012, 40-CO-2012, and 44-CO-2012 (aa variation 0.8–1%); and subgenotype C included strains 14-CO-2012 and 39-CO-2012 (aa variation 2.2%) (subgenotypes 3A to 3C in Fig. 2).

Also, we identified three subgenotypes within the “South America/North America-4” lineage. Subgenotype A included strains 16a-CO-2017, 2aP/CO-2017, 13b-CO-2017, and 2a-CO-2017 (aa variation 0.2–1.4%); subgenotype B included strains 13-1941-US-2013 and 13-2262-US-2013 (aa variation 2.6%) (American strains); and subgenotype C included strain 18a-CO-2017 (aa variation 4.3%) (subgenotypes 4A to 4C in Fig. 2).

For evaluating this subclassification for the Fsp-coding region, we arbitrarily extrapolated the classification and found five subgenotypes in the South America-3 lineage (3A to 3E in Fig. 1). Subgenotype A included strains 42-CO-2012, 1aM-CO-2017, and BUC-12-CO-2016 (aa variation 1.2–2.5%) and subgenotype B included strains 18-CO-2012 and 44-CO-12 (aa variation 2–3.5%); both of these subgenotypes were also found with the H gene. Two other subgenotypes were not characterized with the H gene: subgenotype D, including eight Bogotá 2015 strains (aa variation 4.7–5.7%) and subgenotype E, including strains BOG2-CO-2014 and BOG11-CO-2015 (aa variation 5.5–5.7%).

Likewise, in the Fsp analysis of the “South America/North America-4” lineage, we observed four well-defined subgenotypes (4A to 4D in Fig. 1). Subgenotype A included strains 16-CO-2017, 2aP/CO-2017, 13b-CO-2017, 9-CO-2017, and 2-CO-2017 (aa variation 0–1.5%); subgenotype B included strains 13-1941-US-2013, 13-2262-US-2013, and 19-CO-2017 (aa variation 5.5–11.2%); and subgenotype D included Ecuadorian strains not characterized with the H gene (aa variation 1.5–2%).

Amino acid analysis of the H protein

Analysis of the deduced amino acid sequences of the full-length H protein (607 amino acids) of Colombian CDV viruses showed the presence of exclusive substitutions also found in the South America-3 (N261S, G488R, T544S) and the “South America/North America-4” lineages (E333V, T348K). Remarkably, we found a set of substitutions that are common to both Colombian lineages (T193I, V198I, E333V, S343L, T348K, and A365T); however, the substitutions in the South America-3 lineage appear in the new sequences and not in the previously reported 2012 sequences.

Colombian strains showed the same residue, isoleucine, at position 506 as vaccine strains belonging to the America-1 lineage. Several substitutions in Colombian strains were also present in other wild-type strains reported in different lineages: S22R (America-2), V41I (European Wildlife), N128S (Africa-1), K281R (America-2), G314S (Asia-2 and America-2), and I315V (European Wildlife). All Colombian South America-3 lineage sequences carried asparagine at position 530, a highly variable residue linked to interspecies transmission of the virus (McCarthy et al. 2007). The South America-4 sequences included a serine at position 530, while North America-4 displayed an aspartic acid in this position. The H sequences of the 2017 “South America/North America-4” lineage displayed the following unique substitutions: Q5R, L38S, T193I, V198I, V235I, T291M, E333V, H339D, S341L, T348K, and F353I.

In the present study, three sequences from three dogs that had not been completely vaccinated were grouped in the “South America/North America-4” lineage; one strain had a very extensive branch (Fig. 1; sequence Mde-19a-CO-2017), while another was shown to be related to CDV strains circulating in North America (Fig. 2; sequence Mde-18a-CO-2017), and the final strain, 1aM-CO-2017, was shown to belong to the South America-3 lineage (Fig. 2). A comparison of the linear hemagglutinin noose epitope (HNE) between the vaccine strains and these Colombian strains showed the presence of multiple substitutions (Supplemental material Fig. S2). Sequence 18a-CO-2017 presented the following substitutions: A367V, E372D, G376N, and T386S, while strain 1aM-CO-2017 of the South America-3 lineage presented A367V, G376N, and T386S.
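Substitutions such as A367V are written as reference residue, position, and observed residue. The sketch below shows how such a list can be derived from two aligned amino acid sequences. It assumes the alignment has no insertions relative to the reference, so that alignment columns map directly onto reference numbering; the function name and the short example peptides are invented for illustration and are not CDV sequences.

```python
def list_substitutions(reference, query, start_position=1):
    """List substitutions (e.g. 'A367V') of query relative to reference.

    Both sequences must be aligned to the same length. Columns where the
    query has a gap are skipped. Assumes no gaps in the reference.
    """
    if len(reference) != len(query):
        raise ValueError("Sequences must be aligned to the same length")
    if "-" in reference:
        raise ValueError("This sketch assumes an ungapped reference")
    substitutions = []
    for offset, (ref_aa, qry_aa) in enumerate(zip(reference.upper(), query.upper())):
        if qry_aa == "-":
            continue
        if ref_aa != qry_aa:
            substitutions.append(f"{ref_aa}{start_position + offset}{qry_aa}")
    return substitutions


# Toy five-residue window, numbered from position 367 for illustration.
print(list_substitutions("ACDEG", "VCDEN", start_position=367))
```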

Amino acid analysis of the Fsp peptide

The Fsp peptide possesses 95 variable amino acids (of 135 total). We found 16 substitutions in its sequence that are exclusive to the South America-3 lineage: S9P, T32I, A35S, T40P, D54N, R55K, S58N, Y59H, M61T, R67F, H80C, H83R, I102S, Q115H, C116F, and L129F. In the Fsp peptide of the “South America/North America-4” lineage, 20 exclusive substitutions were found: D28S, E29G, A35T, N62D, S71G, H80R, V94I, N108T, S112P, S114P, and K134E. Furthermore, a different set of amino acid substitutions with a specific geographic pattern was found only in South America-4: T13V, T40K, S45F, S58Q, V79I, S95P, R105W, and G113C. In the most similar “South America/North America-4” strains, we found only the substitutions T13M and N76S, indicating autapomorphic characteristics.

CDV H and Fsp glycosylation analysis

Potential glycosylation sites for the CDV H protein from the South America-3 lineage have previously been reported17. No new potential glycosylation sites were found in the 2017 samples belonging to this lineage in new sampling areas in Colombia. For the “South America/North America-4” lineage, we identified the presence of eight potential glycosylation sites (NXS/T) at positions 19–21, 149–151, 309–311, 391–393, 422–424, 456–458, 587–589, and 603–605, which are common to other lineages. The previously reported Asia-1 potential glycosylation sites at positions 584–58641 were not present in the sequences of the “South America/North America-4” lineage.

Regarding the potential glycosylation analysis of the Fsp peptide of the South America-3 lineage, we found two potential glycosylation sites (NXS/T) at positions 62–64 and 108–110 that are common to the other lineages. However, no potential glycosylation sites were found in any of the “South America/North America-4” sequences.

CDV H and Fsp sites under positive selection

We evaluated non-neutral selection using the FUBAR method. For the H gene, we found that sequences harbored three sites under positive selection: 522, 549, and 582, with a posterior probability of 0.9 and a Bayes factor of 28.3, 220.9, and 44.7 respectively. We also found 247 sites under negative selection with a posterior probability of 0.9 and Bayes factor < 1.

In the Fsp-coding region, we found nine sites under positive selection using the FUBAR method: 21, 39, 46, 51, 79, 98, 99, 101, and 102, with a posterior probability of 0.9 and a Bayes factor of 67, 306, and 88 in sites 21, 76, and 98, respectively. Conversely, we found five sites under negative selection: 45, 83, 89, 121, and 133, with a posterior probability of 0.9 and a Bayes factor < 1.

H gene and Fsp phylogeography

For this analysis, the vaccine sequences were eliminated since there were no exact dates of isolation for these strains, which would have produced biases in the substitution rates. Table 4 shows the evolutionary model for each gene. Also, Table 5 shows the estimated TMRCA for the H gene. The South America-3 and “South America/North America-4” lineages have a TMRCA corresponding to 1964 and 1925, respectively (see Table 5). The phylogeography of the Fsp and H gene is shown in Figs 3 and 4, respectively.

Table 4 Evolutionary Parameters in Gene H and Fsp coding region.
Table 5 H gene molecular clocks in different CDV lineages.
Figure 3

Time-measured Bayesian MCC tree for CDV Fsp fragment. Branches are colored according to the country color code in the upper left. Colombian sequences are depicted in red.

Figure 4

Time-measured Bayesian MCC tree for CDV H gene. Branches are colored according to the country color code in the upper left. Colombian sequences are depicted in red.

Discussion

Phylogenetic characterization of CDV is performed on the basis of the H gene sequence because this gene shows high nucleotide variability in CDV between field strains and vaccine strains in comparison with other paramyxoviruses42,43. Using this method, it is accepted that two strains belong to the same lineage when their amino acid diversity is less than 4%42. Presently, there are 17 known lineages worldwide with a geographical distribution pattern. However, a fragment that codes for the Fsp of CDV has been suggested as an alternative for the classification of CDV strains since it is highly divergent and has given similar classification results to the H gene. Using this approach, two strains are considered to belong to the same CDV lineage if their amino acid divergence is less than 19%10.
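The two lineage criteria cited above (amino acid divergence below 4% for the H protein and below 19% for the Fsp peptide) amount to a simple threshold test. The sketch below encodes them and applies them to divergence values reported elsewhere in this article; the dictionary and function names are hypothetical, and the test deliberately ignores tree topology and bootstrap support, which the authors also consider.

```python
# Amino acid divergence cutoffs (as fractions) for assigning two strains
# to the same lineage, as cited in the text for each marker.
LINEAGE_THRESHOLDS = {"H": 0.04, "Fsp": 0.19}


def same_lineage(aa_divergence, marker):
    """Return True if the divergence is below the cutoff for the marker."""
    if marker not in LINEAGE_THRESHOLDS:
        raise KeyError(f"Unknown marker: {marker}")
    return aa_divergence < LINEAGE_THRESHOLDS[marker]


# Divergence values reported in this article.
print(same_lineage(0.028, "H"))    # 2.8% H divergence
print(same_lineage(0.12, "Fsp"))   # 12% Fsp divergence
print(same_lineage(0.10, "H"))     # ~10% vs the North America-1 lineage
```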

In 2012, the South America-3 lineage was characterized by full H gene sequencing of samples obtained only from Medellín17. However, it was not possible to compare the circulation of this CDV lineage with viruses from other regions of Colombia or from neighboring countries. In the present study, the Fsp-coding region was sequenced from samples obtained in Medellín between 2012 and 2017 and from other Colombian cities such as Bucaramanga (2016) and Bogotá (2015), establishing the wide circulation of the CDV South America-3 lineage in these regions (Fig. 1). In agreement with Sarute et al.10, the amino acid divergence among the strains (1.5–6.7%) showed that they belonged to the same lineage (data not shown).

Through the phylogenetic analysis of the Fsp-coding region in the present study, we observed that the sequences of the Ecuadorian strains and the lineage formerly called “America-4” formed a monophyletic group with an amino acid divergence of 12%, which was supported by the table of distances (Table 3). This led us to conclude that those viral sequences belong to the same lineage. To confirm this finding, we amplified the H gene of the same samples. In the resulting phylogenetic tree, the America-4 strains and the Ecuadorian strain again formed a monophyletic group, with an amino acid divergence in the H gene of 2.8%, showing that these strains belong to the same CDV lineage. For this reason, we suggest calling this lineage “South America/North America-4.”

To compare the topologies of the phylogenetic trees, we used the North America-1/Vaccine lineage as an outgroup. We observed that although the lineages characterized in this work are sister groups, they present different ancestry. In the H gene tree, the oldest clades are the Africa-1, Arctic-like, and Asia-2 lineages. From this node, the Asia-4, South America-3, and “South America/North America-4” lineages appear as sister groups. A polytomy then emerges, giving rise to the following sister groups: North America-3 and South America-2; North America-2, European Wildlife, Asia-1, and Rockborn-like; and Africa-2 and Europe/South America-1.

In the Fsp-coding region tree, the ancestral clades of the characterized lineages arise from a polytomy from which sister groups originate: Arctic-like and Asia-2; Africa-1; North America-1 and North America-2; and Rockborn-like (Fig. 1). From this node, another polytomy emerges, giving rise to more sister groups as follows: Europe/South America-1, South America-2, and Africa-2; and Asia-1, North America-3, South America-3, and “South America/North America-4.” In their 2013 study, Sarute et al. described topological differences between the Fsp and H gene trees10. Although the topological structures they identified did not entirely correspond with those found in the present study (Figs 1 and 2), it should be taken into account that a larger number of Fsp-coding sequences is now available, representing most of the lineages characterized to date; in addition, Fsp-coding region sequences can now be obtained from full CDV genomes in GenBank. However, standard phylogenetic studies are still performed with the H gene, so a greater number of H gene sequences is available in GenBank.

On the basis of the classification system for measles, a subgenotype consists of H gene sequences that have an amino acid identity of 98% and a high bootstrap value (>70%)28. On the basis of these criteria, both the South America-3 and “South America/North America-4” lineages present three subgenotypes. When we arbitrarily extrapolated this classification to the Fsp region, a different set of subgenotypes was found (subgenotypes A–E for South America-3 and A–C for “South America/North America-4”), although fewer than for Europe/South America-1, which reportedly contains at least eight subgenotypes (A–H), including CDV H gene sequences from at least seven different countries28.

In measles, different subgenotypes are not geographically restricted, although some appear to be mainly endemic in different areas of the world44. In the present study, it was not possible to determine the geographic pattern of CDV subgenotypes on the basis of the H gene as has been previously reported for the Europe/South America-1 and South African subgenotypes6,45 and for measles46. However, with the Fsp fragment, distribution patterns can be observed between regions (Fig. 1); subgenotype 3A circulates only in Medellín, while subgenotype 3D circulates only in Bogotá. Also, in the “South America/North America-4” lineage, subgenotype 4A was only reported in Colombian strains and subgenotype 4D in Ecuadorian strains. A higher number of CDV sequences collected from different areas within those countries would be necessary for better understanding of the circulation history of CDV subgenotypes in the Americas.

A temporal pattern of distribution has been reported for some of the Europe/South America-1 and South African subgenotypes as well as for measles virus genotypes and subgenotypes28,44,46. Our results showed a similar temporal pattern of distribution in most of the subgenotypes in both the South America-3 and the “South America/North America-4” lineages (Figs 1 and 2). These results must be evaluated carefully: although they may show a temporal pattern of CDV distribution and a possible strain displacement pattern, sampling bias is another possible explanation. Routine international determination of CDV lineages and subgenotypes, together with molecular surveillance, could be useful for gaining a more accurate epidemiological understanding of the temporal distribution of CDV.

The uncontrolled commercialization of puppies from South America in the USA could be the route of transmission of the “South America/North America-4” lineage in these two regions of the continent. It is important to highlight that this is the second lineage that is reportedly actively circulating in two different continental regions, the first being Europe/South America-120. It is imperative that wider phylogeographic studies of the “South America/North America-4” lineage are conducted to establish its origin and geographical spread throughout the American continent; it may have originated in Ecuador and spread through Colombia to the USA, or vice versa. Since CDV is a re-emerging infection in the USA, with at least five different lineages in circulation15, deeper phylogenetic analysis could help in gaining an understanding of the epidemiology of CDV on this continent.

In the present study, an amino acid divergence close to 11% in the H protein was observed between vaccine strains and the lineages South America-3 and “South America/North America-4” (Table 3). CDV is presently recognized as a single serotype47 as there is little evidence of antigenic divergence as a result of genetic divergence. Recently, significant differences were reported in the evaluation of neutralizing titers between “South America/North America-4” lineage strains and an America-1 type vaccine strain27. Given those results and the fact that multiple recognized CDV cases have been recorded in vaccinated animals17,21,28,48, it is necessary to perform wider, updated antigenic analyses of CDV to understand the antigenic differences between the multiple lineages circulating worldwide and, potentially, to produce a vaccine update that includes the most prevalent antigenic types.

In the positive selection analysis, we observed that the “South America/North America-4” lineage possesses a unique substitution (V79I) in the Fsp fragment at a site that is under positive selection; the South America-3 lineage also possesses a unique substitution (I102S) at a site under positive selection. This was determined using the FUBAR method, which assumes that the selection pressure for each site is constant throughout the phylogeny38. In this way, it was also determined that sites 98, 99, 101, and 102 of Fsp are under positive selection. These changes must be studied to understand the role of such substitutions in vaccine failures and interspecies host changes.

Likewise, in the H gene we found sites 522, 549, and 582 to be under positive selection. Site 549 has been proposed as key to the species barrier jump4; however, sites 522 and 582 have not previously been reported in relation to pathogenicity, vaccine failure, or species barrier jumps, and should be studied in depth to understand their role in pathogenesis and interspecies transmission.

The linear HNE located at amino acids 364–392 of the CDV H protein is conserved among the morbilliviruses49. This is the region of the H protein that is recognized by antibodies50,51. Recently, it has been suggested that substitutions in this epitope may interfere with the ability of the vaccine to provide adequate protection against infection with wild-type viruses15. As reported recently in the previously called “North America-4” strains, we observed the presence of multiple substitutions in the HNE of CDV viruses in vaccinated animals (Supplemental Material Fig. S2). However, from the bioinformatic approach used in this study, we can only suggest that these substitutions could be interfering with the capacity of the vaccine to neutralize wild strains. For this reason, it is necessary to perform neutralization studies of vaccines with wild strains. Currently, structural biology studies of Colombian CDV viruses are underway in an effort to understand the role of structural substitutions in the HNE epitope of Colombian CDV viruses and their role in viral neutralization. In this regard, it is important to consider that the glycosylation sites found in the Fsp region of the South America-3 lineage (62–64 and 108–110) could be involved in the evasion of the immune response or could be the result of epistatic interactions in the H gene52.
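Potential N-linked glycosylation sites are commonly located by scanning a protein sequence for the N-X-S/T sequon, where X is any residue except proline. A minimal sketch of such a scan follows. It assumes that the three-residue sites mentioned above refer to N-linked sequons; the function name and the toy peptide are hypothetical and are not taken from this study. Positions are reported as 1-based start and end coordinates, the same style as “62–64.”

```python
import re

# N-X-[ST] with X != P; the lookahead lets overlapping sequons be reported.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_n_glyc_sequons(protein: str) -> list[tuple[int, int, str]]:
    """Return (start, end, motif) for each sequon, with 1-based inclusive positions."""
    hits = []
    for match in SEQUON.finditer(protein.upper()):
        start = match.start() + 1
        hits.append((start, start + 2, match.group(1)))
    return hits


if __name__ == "__main__":
    # Toy peptide (hypothetical, not a real CDV sequence).
    toy = "MANGSLLNPTQNNTSA"
    for start, end, motif in find_n_glyc_sequons(toy):
        print(f"{start}-{end}: {motif}")
```

A sequon is only a candidate site; whether it is actually glycosylated has to be confirmed experimentally.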

CDV has one of the highest-reported substitution rates in the Paramyxoviridae family (10.53–11.65 × 10⁻⁴ substitutions/base/year)43. Our results show that CDV circulating in Colombia exhibits high variability and includes two lineages and multiple subgenotypes (Figs 1 and 2 and Table 3). The temporal and geographical scope of our sampling was not sufficient to accurately explain the variability of CDV in the region. However, considering that the H gene has undergone genetic drift in different geographical regions26, we hypothesize that selective and nonselective processes may play important roles in the co-circulation of multiple lineages in an area, as has been reported previously52.
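To give a sense of scale, the rate range quoted above can be converted into an expected number of substitutions per year over a given stretch of sequence. The sketch below assumes an H gene coding length of 1,824 nucleotides and a constant per-site rate; both are assumptions introduced here for illustration and are not values stated in this section.

```python
# Rate range quoted in the text, in substitutions/base/year.
RATE_LOW = 10.53e-4
RATE_HIGH = 11.65e-4

# Assumed H gene coding length in nucleotides (illustrative, not from this section).
H_GENE_LENGTH_NT = 1824


def expected_substitutions(rate: float, length_nt: int, years: float = 1.0) -> float:
    """Expected substitutions over a sequence, assuming a constant per-site rate."""
    return rate * length_nt * years


if __name__ == "__main__":
    low = expected_substitutions(RATE_LOW, H_GENE_LENGTH_NT)
    high = expected_substitutions(RATE_HIGH, H_GENE_LENGTH_NT)
    print(f"expected substitutions per year: {low:.2f} to {high:.2f}")
```

Under these assumptions the estimate is on the order of two substitutions per year across the gene.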

Phylogeographic analysis of the Fsp region and the H gene shows an evolutionary rate for the H gene similar to that reported by Fischer et al.53. However, the two resulting trees differ in topology to such an extent that the ancestry of the lineages of interest is very different (Figs 3 and 4). These differences in topology in comparison to other trees54 may arise because, in the present analysis, we excluded reported vaccine strains, which have been adapted to cell culture and have different evolutionary rates from wild CDV strains52; we also added three newly reported lineages, including “South America/North America-4.” Comparing both trees, the “South America/North America-4” lineage apparently circulated first in Colombia and Ecuador, then in the United States, and again in Colombia (Figs 3 and 4). It is unclear whether the spatiotemporal variation observed for the “South America/North America-4” lineage is more likely due to sampling bias, as has been reported in other viral models55. In addition, since we suspect that most of the ancestral Colombian sequences in both trees are immune escape mutants, deeper analyses must be performed to avoid misleading results regarding the dynamics of the “South America/North America-4” CDV lineage.

Outbreaks of CDV occur in endemic and acute epidemic cycles, leading to transmission among susceptible host populations54. In the presence of full or partial vaccination, lifelong immunity could lead to the survival of the remaining coexisting lineages driven by nonselective epidemiological processes44,56. Our results, based on unvaccinated and/or incompletely vaccinated populations, support this hypothesis (Table 2).

In contrast with measles, the only natural host of which is humans, broad ranges of host species are susceptible to CDV infection, which results in complications in terms of selection pressure for this virus. It is noteworthy that the “South America/North America-4” lineage characterized in the USA was isolated from domestic dogs and foxes, indicating that this lineage has the ability to jump the species barrier3,21. In Colombia, there have been reports of CDV infection in wildlife57; however, no phylogenetic analysis has been performed in viruses from those infected animals.

Substitutions reported in the H protein of CDV circulating in wildlife include E276V, Q392R, R519I, I542F, and Y549H. The same substitutions are found at these sites in the South America-3 and “South America/North America-4” lineages, which suggests that these viruses have the potential to jump the species barrier. However, no statistical association has been demonstrated to support this hypothesis58.

In conclusion, we report the co-circulation of two CDV lineages in Colombia: the South America-3 lineage circulating in Medellín, Bucaramanga, and Bogotá, and the concurrent circulation of a new lineage not previously described in the country that mainly infects dogs in Medellín. The latter lineage is evolutionarily related to strains reported in domestic dogs in Ecuador and in domestic dogs and wildlife in the USA. Given the intercontinental circulation of this lineage, we propose to name it “South America/North America-4.”