Introduction

Avian paramyxoviruses (APMVs) are a group of paramyxoviruses known to infect a variety of bird species across the globe. To date, 21 species of APMVs have been identified, and the list is expected to grow as viral surveillance in wild and domesticated birds increases. Under the ICTV 2019 classification, these viruses belong to three genera, Metaavulavirus (APMV-2, -5, -6, -7, -8, -10, -11, -14, -15 and -20), Orthoavulavirus (APMV-1, -9, -12, -13, -16, APV-A, APV-B and APV-C) and Paraavulavirus (APMV-3 and -4), within the new subfamily Avulavirinae of the family Paramyxoviridae in the order Mononegavirales1. APMV-1 to -9 were isolated before 1980, APMV-10 to -13 were identified by 2015, and all the other APMVs were reported in recent years2.

Avian paramyxoviruses are enveloped viruses with a single-stranded, non-segmented, negative-sense RNA genome of 13 to 17 kb2. The prototype virus, APMV-1 of genus Orthoavulavirus, better known as Newcastle disease virus (NDV), is the most extensively studied virus in this group. NDV causes severe, economically important disease in poultry. There are five pathotypes of NDV based on the clinical signs exhibited by infected chickens: (a) viscerotropic velogenic, a highly virulent, pantropic NDV causing severe mortality; (b) neurotropic velogenic, a highly virulent NDV specifically causing neurological illness and high mortality; (c) mesogenic, a moderately virulent NDV causing mortality as high as 50% and reduced egg production; (d) lentogenic, a respiratory or enteric NDV of low virulence causing a small reduction in egg production; and (e) asymptomatic or avirulent NDV. However, the pathotype classification is not always clear-cut3. More than a thousand strains of NDV have been isolated and sequenced, and they exhibit a wide spectrum of virulence. The viral RNA genome carries six genes arranged in tandem, 3’-N-P-M-F-HN-L-5’, each coding for one structural protein4,5. N is the nucleocapsid protein; each N protomer binds exactly 6 nucleotides of genomic and antigenomic RNA in most paramyxoviruses, thus imposing a hexamer phase on the entire RNA genome. In nature, the genomic length of paramyxoviruses is polyhexameric (6n + 0), which is necessary for efficient replication; this is called the ‘rule of six’6,7. N, together with P (phosphoprotein) and L (large polymerase protein), forms the viral RNA-dependent RNA polymerase complex essential for viral genome transcription and replication; M, the matrix protein, is located within the envelope and aids in virus assembly and budding; two viral glycoproteins, F (fusion protein) and HN (hemagglutinin-neuraminidase protein), stud the envelope and mediate fusion of the virus with the host membrane and receptor binding, respectively4. The NDV F protein is known as the virulence determinant: virulent strains carry, at the cleavage site, multiple basic amino acids (at least three arginine (R) or lysine (K) residues) starting at amino acid position 113 and a phenylalanine residue at position 117 of the fusion protein3. APMV-6 is also known to express an additional small hydrophobic (SH) protein from an SH gene located between the F and HN genes8. Further, by co-transcriptional RNA editing of the P gene, two additional mRNAs, V and W, are expressed4,9,10,11,12. Also, in certain paramyxoviruses, accessory C proteins are generated by alternative translation initiation in the P gene (+1 reading frame)13,14. By these mechanisms, paramyxoviruses are able to efficiently utilize over 95% of their small RNA genome for expression of viral proteins12.

The P gene carries a slippery sequence, a stretch of adenosine (A) and guanosine (G) nucleotides called the ‘editing site’, where 1 or 2 G nucleotides are inserted during transcription of the P gene by the stuttering viral polymerase, which reiteratively reads the template base12,15. A single G nucleotide addition leads to a +1 frameshift in the ORF, generating V mRNA at a frequency of 25 to 35%; addition of two G nucleotides leads to a +2 frameshift, generating W mRNA at a frequency of 2 to 8.5%; and the unedited mRNA (60 to 70%) codes for the P protein in NDV12,16. Among the paramyxoviruses, members of the subfamily Rubulavirinae and APMV-11 of genus Metaavulavirus encode the V protein from their unedited transcript, while the P protein is encoded by a +2 frameshift and the W protein by a +1 frameshift17,18. The three proteins translated from the P gene mRNAs (P, V and W) share a common N terminal sequence and differ in both length and amino acid composition in their C terminal regions. Their specific functions are dictated by their unique C terminal sequences. Studies on the V protein of APMV-1 and other paramyxoviruses have revealed that it is multifunctional: it targets STAT1 for degradation, interferes with MDA5 and acts as an interferon antagonist19,20,21,22, inhibits apoptosis23,24, assists in viral replication20,25, and plays important roles in tissue tropism, virulence determination11,20,26 and host range restriction21. In contrast, very little is known about the function of the W protein. The W protein of Nipah virus has been shown to affect viral pathogenesis and to help the virus evade host immunity27,28,29. In APMV-1, the nuclear localization of the W protein and its incorporation into the virion have recently been reported16,30.

The complete genome sequences of all 21 species of APMV have been described individually17,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62. A comprehensive comparative analysis of the complete genomes and structural genes of 20 species of avulaviruses has recently been published2. Phylogenomic analysis of these 20 APMVs resolved three clades: clade I included APMV-2, -5, -6, -7, -8, -10, -11, -14, -15 and -20 (currently classified under the new genus Metaavulavirus); clade II comprised APMV-1, -9, -12, -13, -16, APV-A, -B and -C (currently assigned to the genus Orthoavulavirus); and clade III included APMV-3 and -4 (currently under the new genus Paraavulavirus)1,2. One virus previously classified as APMV-17 (South Korean “avian paramyxovirus 17”) has now been proposed as a separate species (APMV-21) based on phylogenies of complete genomes and complete F and L genes, and on PASC and STD analysis2,63. Nevertheless, very little is known about the accessory proteins produced by P gene editing in APMVs. We examined 55 viruses belonging to all 21 APMV species identified to date and discuss here the genetic diversity and molecular evolution of the P gene edited proteins, V and W.

Materials and methods

Sequence information

The full-length P gene sequences available for all 21 species of APMVs were obtained from the National Center for Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov/). Publications reporting the complete genome sequences of these viruses were also consulted to identify the P gene editing site and to predict the sequences of the V and W proteins. A total of 55 viruses belonging to 21 species of APMVs were analyzed in this study (GenBank accession numbers are provided in Table 1): four strains each of avirulent, moderately virulent and highly virulent APMV-1; eight isolates of APMV-6; five isolates each of APMV-2 and APMV-8; four isolates of APMV-10; three isolates of APMV-13; two isolates each of APMV-3, APMV-4 and APMV-5; and one isolate each of APMV-7, APMV-9, APMV-11, APMV-12, APMV-14, APMV-15, APMV-16, APMV-20, APMV-21, APV-A, APV-B and APV-C. Detailed information on these isolates, including host, year and location of isolation, is provided in Table 1 along with their sequence information. The complete sequences of the P gene ORFs, V proteins and W proteins used in this study, either collected directly from NCBI or predicted using the DNASTAR software suite, are provided in Supplementary Files S1, S2 and S3.

Table 1 Detailed information on V and W proteins of APMV species and strains analyzed in this study.

Sequence alignment, comparison and prediction of conserved motifs/domains

Multiple sequence alignments of V and W proteins were performed using the TCOFFEE multiple alignment algorithm, mode ‘expresso’ and the sequence similarities were colored through ESPript64,65,66. All residues/amino acid positions mentioned in the results and discussion correspond to APMV-1 strain KJ808820.1, the strain that appears first in the alignment file. Individual sequences were also analyzed in NCBI’s interface, conserved domain (CD)-search67 and aligned sequences were run in DREME version 5.0.5 software68 to identify conserved motifs/domains. The intraclade amino acid percentage identity was estimated using Megalign software from DNASTAR.
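As a rough illustration of the percentage identity calculation mentioned above, the following Python sketch computes pairwise identity between two rows of an existing alignment. The function name and the choice to skip any column containing a gap are assumptions made for this example; MegAlign may apply a different formula, so values need not match those reported here.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumption: both strings are rows taken from the same multiple alignment,
    so they have equal length and column i corresponds in both.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # columns with a gap in either sequence are ignored
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


if __name__ == "__main__":
    # Toy aligned fragments (hypothetical, not real APMV sequences)
    print(percent_identity("MKKG-A", "MKRGLA"))
```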

Prediction of Nuclear Localization Signal (NLS) and Nuclear Export Signal (NES) in V and W proteins

The nuclear localization signals (NLS) in V and W proteins of APMV species were identified using the online tool cNLS Mapper with a cut-off score of 5.0, which predicts NLS specific to the importin αβ pathway69. The presence of nuclear export signals (NES) in V and W proteins of APMV species was predicted using the online NetNES 1.1 server, which predicts leucine-rich NES using a combination of neural networks and hidden Markov models70, and using LocNES, which predicts classical NESs in CRM1 cargoes71.

Phylogenetic analysis and evolutionary divergence

Phylogenetic analysis was performed using MEGA7 software. For drawing the phylogenetic trees, evolutionary history was inferred by using the Maximum Likelihood method with JTT matrix-based model72 for V proteins and Dayhoff matrix based model for W proteins73. For drawing the phylogenetic tree of V proteins, bootstrap consensus tree inferred from 500 replicates was taken to represent the evolutionary history of the taxa analyzed74. Branches corresponding to partitions, reproduced in less than 80% bootstrap replicates, were collapsed. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (500 replicates) are shown next to the branches. Initial trees for heuristic search were obtained automatically by applying Neighbor-Joining and BioNJ algorithms to a matrix of pairwise distances estimated using a JTT model, and then topology with superior log likelihood value was selected. A discrete gamma distribution was used for V proteins tree to model evolutionary rate differences among sites (16 categories (+G, parameter = 2.4693)). The rate variation model allowed for some sites to be evolutionarily invariable ([+I], 5.67% sites). For drawing the phylogenetic tree of W proteins, a discrete gamma distribution was used to model evolutionary rate differences among sites (5 categories (+G, parameter = 1.1488)). The rate variation model allowed for some sites to be evolutionarily invariable ([+I], 0.70% sites). Analysis of both the trees involved 55 amino acid sequences. All positions containing gaps and missing data were eliminated. There were a total of 141 and 71 positions in the final dataset for drawing V and W proteins’ phylogenetic trees, respectively.
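The elimination of all positions containing gaps and missing data (often called complete deletion) can be sketched as follows. This is only an illustration of the column filtering step, not MEGA7's implementation; the function name and the symbols treated as gap or missing data ("-" and "?") are assumptions.

```python
def complete_deletion(alignment: list[str], excluded: str = "-?") -> list[str]:
    """Drop every alignment column in which any sequence has a gap or missing symbol."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have equal length")
    # Keep only the columns that are fully resolved in every sequence
    keep = [i for i in range(length)
            if all(seq[i] not in excluded for seq in alignment)]
    return ["".join(seq[i] for i in keep) for seq in alignment]


if __name__ == "__main__":
    toy = ["MK-GA", "MKRG?", "M-RGA"]  # hypothetical aligned fragments
    print(complete_deletion(toy))
```

With many divergent sequences, this filter can remove most columns, which is consistent with the small number of positions retained in the final datasets.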

Estimates of evolutionary divergence over sequence pairs between groups were computed in MEGA7 for V and W proteins of all 21 APMV species75. Briefly, based on the maximum likelihood fits of 56 different amino acid substitution models, the final analyses were conducted using the JTT matrix-based model72 for V proteins and the Dayhoff matrix-based model73 for W proteins. The rate variation among sites was modeled with a gamma distribution (shape parameter = 1). The analysis included 55 amino acid sequences. All positions containing gaps and missing data were eliminated.

Selection pressure analysis

The number of nonsynonymous substitutions per nonsynonymous site (dN), the number of synonymous substitutions per synonymous site (dS) and the dN/dS ratio were analyzed for the nucleotide sequences encoding the V and W proteins of all 21 species, both for the entire sequence and separately for the shared N terminal and unique C terminal regions. The N terminal portion shared by all three proteins was taken to extend up to the RNA editing site (KKG motif), and the C terminal regions of V and W proteins were taken to begin after the RNA editing site. The dN/dS ratios were estimated using DnaSP v6.12.03 software76. A protein was considered to be under positive (diversifying) selection when dN/dS > 1 and under negative (purifying) selection when dN/dS < 1.
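The decision rule above can be written as a small Python sketch. It assumes that dN and dS have already been estimated (for example by DnaSP); the function name and the handling of dS = 0 and of a ratio exactly equal to 1 are assumptions added for this example.

```python
def classify_selection(dn: float, ds: float) -> str:
    """Classify selection pressure from precomputed dN and dS estimates."""
    if ds <= 0:
        # The ratio is undefined without synonymous substitutions
        return "undefined"
    ratio = dn / ds
    if ratio > 1:
        return "positive"   # diversifying selection
    if ratio < 1:
        return "negative"   # purifying selection
    return "neutral"        # assumption: a ratio of exactly 1 is treated as neutral


if __name__ == "__main__":
    # Hypothetical values, not taken from Table 6
    print(classify_selection(0.02, 0.10))
```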

Evolutionary rate analysis

To estimate the evolutionary rates of V and W nucleotide sequences in different APMV species, substitution rate analysis was performed using BEAST v1.10.4 software77. The GTR substitution model with the G + I site heterogeneity model was found to be the best fit by MEGA7 and was used to estimate the substitution rates of V and W sequences. A constant-size coalescent tree prior was used for the individual species and for all species together. An uncorrelated relaxed clock with a lognormal distribution was implemented. An MCMC chain length of 4 × 10⁸ cycles was used to reach an effective sample size (ESS) above 200, indicating convergence, except for W proteins of APMV-8 strains, for which a chain length of 2 × 10⁸ cycles was used. The final data analysis was performed using Tracer v1.7.1 software.

Results

RNA editing site and prediction of V and W protein sequences

Previous reports suggested identical P gene RNA editing sites for APMV-1, -2, -4, -5, -6, -7, -8, -9, -10, -12, -13, -15, -20, APV-A, -B and -C and varied P gene editing site sequences for APMV-3, -11, -14 and -16. We observed the following conserved patterns in the RNA editing sequences among APMVs: U3C6 for APMV-4; U4C4 for APMV-14; U4C5 for APMV-20; U5C3 for APMV-1 (all strains except KJ736742.1 and KJ808820.1), APMV-8 and APMV-10; U5C4 for APMV-1 (strains KJ736742.1 and KJ808820.1) and APMV-2; U5C6 for APMV-15; U6C3 for APMV-5 (strain GU206351.1), APMV-7, -9, -12, -13, -16, -19, -21; U6C2 for APMV-5 (strain LC168750.1); U6C4 for APMV-6, -17, -18 and UUCUUC5 for APMV-11. Further, variations were observed in the cis-acting sequence at the editing site: the sequences immediately upstream of the editing site were 3’AA in APMV-2, -3, -4, -5, -7, -8, -10, -11, -14, -15, -17, -18, -19, -20 and 3’GA in APMV-1, -9, -12, -13, -16 and -21 (all orthoavulaviruses), while 3’AG was conserved among APMV-6 strains.
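To make the UnCm notation concrete, the following Python sketch reports the lengths of a U run immediately followed by a C run in a genomic-sense sequence written 3’ to 5’. The function name, the minimum run lengths in the regular expression (three U, two C) and the toy sequences are assumptions for illustration and were not part of the analysis described here. Interrupted sites such as the UUCUUC5 site of APMV-11 are deliberately not handled and return None.

```python
import re


def editing_site_pattern(genome_3to5: str):
    """Return a label such as 'U5C3' for the first U run followed by a C run.

    Assumption: the input is the genomic (template) strand written 3' to 5';
    DNA-style input (T instead of U) is converted to RNA letters first.
    """
    seq = genome_3to5.upper().replace("T", "U")
    match = re.search(r"(U{3,})(C{2,})", seq)
    if match is None:
        return None
    return f"U{len(match.group(1))}C{len(match.group(2))}"


if __name__ == "__main__":
    print(editing_site_pattern("AGAUUUUUCCCGA"))   # toy sequence, not a real genome
    print(editing_site_pattern("AAAAUUCUUCCCCC"))  # interrupted site: no match
```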

The hexamer phase of the start of the template C run in the P gene editing site (Table 1) revealed that this position was conserved within each species except in APMV-3 and for one strain of APMV-6 (KT962980.1). The hexamer phasing positions for APV -A, -B, -C were not determined as their genome lengths did not conform to ‘rule of six’. The start of the C run was at hexamer position 1 for APMV-20; at hexamer position 2 for APMV -4, -11 and -14; at hexamer position 3 for APMV -2, -3 (EU782025), -5, -8 and -10; at hexamer position 4 for APMV-3 (EU403085) and APMV-7; at hexamer position 5 for APMV-1, one strain of APMV-6 (KT962980.1), APMV -9, -12, -13, and -21 and at hexamer position 6 for APMV -15, -16 and all strains of APMV-6 except one strain (KT962980.1).
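The hexamer phase of a nucleotide and the ‘rule of six’ check both reduce to simple modular arithmetic, sketched below in Python. The sketch assumes that positions are numbered from 1 starting at the 3’ end of the genome and that each consecutive block of six nucleotides is covered by one N protomer; this numbering convention and the function names are assumptions for the example and should be checked against the convention used for Table 1.

```python
def follows_rule_of_six(genome_length: int) -> bool:
    """True when the genome length is a multiple of six (6n + 0)."""
    return genome_length > 0 and genome_length % 6 == 0


def hexamer_phase(position: int) -> int:
    """Hexamer position (1 to 6) of a 1-based nucleotide position.

    Assumption: position 1 is the first nucleotide at the genome 3' end,
    so positions 1-6 form the first hexamer, 7-12 the second, and so on.
    """
    if position < 1:
        raise ValueError("position must be 1-based")
    return (position - 1) % 6 + 1


if __name__ == "__main__":
    # Hypothetical genome length and C-run start position
    print(follows_rule_of_six(15186), hexamer_phase(2285))
```

Under this convention the phase is only meaningful when the genome length itself is a multiple of six, which is why no phase was assigned to APV-A, -B and -C.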

All APMVs except APMV-11 expressed P protein from unedited mRNA. The editing site of APMV-11 resembled that of other paramyxoviruses that insert 2G for generating P mRNA or the ‘genomic V’ viruses12. The P protein of APMV-11 is derived from 2G nucleotides insertion, W protein from single G nucleotide addition and unedited mRNA expresses the V protein17. The V and W protein sequences of the other 20 APMV species were predicted by insertion of single G and two G nucleotides at the P gene RNA editing site, respectively.
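The prediction step described above, inserting one or two G nucleotides at the editing site and translating the shifted reading frame, can be sketched in Python as follows. This is a minimal illustration using a toy mRNA; the function names, the insertion index and the sequence are hypothetical, and the study itself used the DNASTAR software suite rather than this code.

```python
# Standard genetic code built from the conventional UCAG ordering
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(mrna: str) -> str:
    """Translate from the first nucleotide until a stop codon or the last full codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


def edit_and_translate(mrna: str, site: int, n_g: int) -> str:
    """Insert n_g G nucleotides at 0-based index `site` of the mRNA, then translate."""
    edited = mrna[:site] + "G" * n_g + mrna[site:]
    return translate(edited)


if __name__ == "__main__":
    toy_mrna = "AUGAAAAAGGGCCAUUAA"  # hypothetical ORF ending in a G run, not a real P gene
    site = 11                         # index just after the G run
    for n_g in (0, 1, 2):             # unedited, +1 G and +2 G transcripts
        print(n_g, edit_and_translate(toy_mrna, site, n_g))
```

In the toy example the three products share the same N terminal residues and diverge only after the insertion point, mirroring the shared N terminal and unique C terminal regions of P, V and W.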

Amino acid sequence analysis: percentage identity and conservations

The V and W protein sequences of all 21 APMV species shared a common N terminal region with the P protein. Variation in the amino acid sequences was minimal within the first 60 amino acids of the N terminal region. The N terminal portions of P, V and W proteins of metaavulaviruses and orthoavulaviruses showed closer identity than those of paraavulaviruses. In ortho- and paraavulaviruses, the N and C terminal regions showed high homology, up to 93%. Metaavulaviruses showed 100% identity in all their C and N terminal regions. The C terminal region of W protein sequences showed 0.0–100% identity, as variation in both amino acid composition and length of the C terminal portion was higher across species (Table 2). As described previously, the soyuz1 and soyuz2 motifs were observed within the N terminal region in all APMVs except APMV-3 strains78. Additionally, conserved domains (CD) were predicted in APMV-1 mesogenic strain Komarov (CD between aa 25 and 167) for large tegument protein UL36 (superfamily member PHA03247), in APMV-14 (CD between aa 31 and 144) for gene regulated by oestrogen in breast cancer, GREB1 (superfamily member pfam15782), and in APMV-21 (CD between aa 53 and 120) for tumor necrosis factor receptor superfamily member cd13415.

Table 2 Intraclade percentage amino acid sequence identity for P-gene products in APMV species.

Comparison of V protein sequences of APMV species

The V protein of APV-C was the shortest (221 aa, MW: 23.73 kDa) and that of APMV-21 was the longest (304 aa, MW: 31.98 kDa) among the 21 APMV species. The length of V protein (in aa) conserved within species was as follows: APMV-2 strains (232 aa), APMV-4 strains (224 aa), APMV-5 strains (277 aa), APMV-8 strains (238 aa), APMV-10 strains (246 aa) and APMV-13 strains (241 aa). However, in APMV-1, -3 and -6, variation in the length of V protein was observed between strains within the same species. Between species, the similarity in the V protein length was observed as follows: V protein length of 252 aa was observed in APMV-14, -15 and one strain of APMV-3; APMV-11 and -5 showed 277 aa long V protein; APMV -9 and -20 had 263 aa long V protein while the V protein length of APMV-16 and two lentogenic strains of APMV-1 were 245 aa. The lowest amino acid identity (10.4%) was observed between V proteins of APMV-3 strain Wisconsin and APV-A. The lowest amino acid divergence was noticed between APV-B and APV-C (70.4) and both showed identity of 53.7% at amino acid level which was the highest between APMV species (Supplementary File S4).

The multiple sequence alignment of V protein sequences of APMV species revealed higher amino acid conservation in both the N terminal (mainly in the first 60 amino acids) and C terminal regions (Fig. 1). All viruses in this study had the following conserved motifs, similar to V proteins of other known paramyxoviruses: (i) the KKG motif in the N terminal region, which is the coding sequence at the P gene mRNA editing site (residues 132-134, corresponding to APMV-1 strain KJ808820.1, the first strain in the alignment file), except in APMV-3, APMV-4, APMV-12, APMV-13, APMV-14 and APMV-20; (ii) the HRRE motif (residues 177-180, corresponding to APMV-1 strain KJ808820.1); (iii) the WCNP motif (residues 195-198, corresponding to APMV-1 strain KJ808820.1); and (iv) a conserved cysteine-rich domain with seven cysteine residues. Interestingly, the following amino acids were also conserved in the C terminal region in the majority of APMV species, with few exceptions: proline residues at five positions: (a) position 175 in ten species (except in APMV-3, -4, -6, -7, -8, -9, -10, -11, -15, -20 and -21), (b) position 198 in all APMVs, (c) position 202 (except in APMV-4), (d) position 207 in all APMVs and (e) position 218 in all APMVs; glycine residues at three positions: (a) position 176 (except in APMV-4 and -6), (b) position 188 (except in certain strains of APMV-1 and APMV-7, -9, -11, -12, -13, -14, -15, -16, APV-A and -B), (c) position 215 (except in a single lentogenic strain, KM885162, of APMV-1, all strains of APMV-3 and all strains of APMV-10; interestingly, in all these viruses the glycine residue was replaced with an arginine residue); serine residues at two positions: (a) position 182 (except in APMV-4, -11, -20) and (b) position 194 (except in APMV-3, -4, -5, -6, -12, -14); an arginine residue at position 208 (except in APMV-3, -4, -5, -7, -8, -11, -13, -16, APV-A, -B and -C); a leucine residue at position 223 (except in APMV-5, -6, -7, -8 and -11); and an aspartic acid residue at position 227 (except in APMV-2, -3, -7, -9, -11 and -21). The percentage amino acid conservation in the C terminal region of V proteins of all 21 APMV species was between 30% (in the longest V protein, that of APMV-21) and 48% (in the shortest V protein, that of APV-C). NLS were predicted in V proteins of APMV-5 and APMV-20 by cNLS Mapper with a cut-off score of 5.0. NES were identified only in APMV-5 strains (Table 3b).

Figure 1

Multiple sequence alignment of V proteins of 21 APMV species. The conserved motifs and conserved amino acids are highlighted. The dots represent the gaps in the alignment. Highly conserved motifs and amino acids are in red and highlighted. The C terminal region is considered beyond the conserved KKG motif.

Table 3 Predicted nuclear localization (NLS) and nuclear export signals (NES) in W (3a) and V (3b) proteins of 21 APMV species.

Comparison of W protein sequences of APMV species

The length of W protein varied between 125 and 227 amino acids (aa) with calculated molecular weights between 13.30 kDa and 24.38 kDa. APMV-3 strain Netherland and APMV-7 had the shortest W protein (125 aa) and two strains of APMV-1, mesogenic strain KX761866.1 and velogenic strain KJ808820.1 (227 aa) had the longest W protein.

The length (in aa) of W protein was conserved among all the strains of APMV-2 (207 aa), all the strains of APMV-4 (137 aa), all the strains of APMV-5 (187 aa), all the strains of APMV-10 (172 aa) and all the strains of APMV-13 (150 aa). Also, similarity in the W protein length was noticed between the following species: W protein length of 172 aa was observed in all strains of APMV-10 and all strains of APMV-8 except strain FJ215863.2; W protein length of 177 aa was deduced in one strain of APMV-1 (JQ015296.1) and 3 strains of APMV-6 (EU622637.2, AY029299.1, EF569970.1), while the W protein length of APMV-12 and APMV-5 were 187 aa. Variations in the W protein length between strains within the same species were observed in APMV-1 (227, 221, 196, 183, 179, 177, 137 aa), APMV-3 (125, 127 aa), APMV-6 (157, 162, 177, 197 aa) and APMV-8 (172, 203 aa). APMV-1 strains analyzed in this study, had the longest unique C terminal region when compared to other APMV species (Table 1 and Fig. 2).

Figure 2

Comparison of the C terminal regions of W proteins of APMV species. The C terminal region is considered to lie beyond the conserved KKG motif (see Fig. 1). The number of amino acids in the unique C terminal region is given in parentheses. The basic amino acids (R, K and H) are in bold and enlarged. Amino acid residues that are conserved between strains of the same species are in bold.

The lowest amino acid percentage identity (2.4) was observed between W proteins of APMV-12 and APMV-3 strain Netherland. Incidentally, APMV-3 strain Netherland also showed the lowest amino acid identity with W proteins of other APMV species. The lowest amino acid divergence (83.7) was noticed between APMV-1 isolate HN1007 (KX761866.1) and APMV-16, their W protein amino acid identity was 48.6% which was the highest homology observed between APMV species (Supplementary File S5).

NLS were identified in the C terminal region of W proteins in five of the twelve strains of APMV-1, in one of the eight strains of APMV-6 (GQ406232.1) and in APMV-9, while the NLS in the W protein of APMV-20 was located in the N terminal region (shared with P and V proteins). NES were predicted in these viruses except in one strain of APMV-1 (JQ015296.1) and in APMV-20 (Table 3a).

Phylogenetic tree and evolutionary distance analysis

Based on their V protein sequences, phylogenetically the APMV species formed three distinct groups: group 1 consisted of APMV -3 strains, group 2 consisted of APMV -1, -9, -12, -13, -16, -21, APV -A, -B and -C (all orthoavulaviruses) and group 3 consisted of APMV -2, -5, -6, -7, -8, -10, -11, -14, -15, -20 and -4 (all metaavulaviruses and one paraavulavirus) (Fig. 3). The highest evolutionary divergence of 2.20 was observed between APMV-3 & APMV-4 and APMV-3 & APMV-12 followed by a divergence value of 1.91 between APMV-3 & APMV-5 and APMV-3 & APMV-11. The lowest divergence was between APV-A and APV-B (0.47) followed by APMV-9 & APMV-21 (0.50). The distance between the strains of the same species was noticed more in APMV-3 (0.4) followed by APMV-1 (0.297), which was further reiterated by their lower percentage of amino acid homology (Table 4).

Figure 3

Phylogenetic tree derived from analysis of V proteins of APMV species by Maximum Likelihood method. The evolutionary history was inferred by using the Maximum Likelihood method based on the JTT matrix-based model. The bootstrap consensus tree inferred from 500 replicates is taken to represent the evolutionary history of the taxa analyzed. Evolutionary analyses were conducted in MEGA7. All the orthoavulaviruses clustered together, all metaavulaviruses grouped along with APMV-4 (Paraavulavirus) while APMV-3 strains (Paraavulavirus) formed a separate branch.

Table 4 Estimates of Evolutionary Divergence over Sequence Pairs between Groups, analyzed for V proteins of 21 APMV species.

The phylogenetic tree obtained from W protein sequences analysis showed clustering of strains of the same species (Fig. 4). The evolutionary distance analyses of W proteins of APMV species revealed that APMV-3 species is more divergent than other APMV species. The highest evolutionary divergence was noticed between APMV-3 & APV-C species (10.322) followed by APMV-3 & APMV-13 (8.594) and APMV-3 & APMV-12 (8.270). The lowest divergence was observed between APMV-9 & APMV-21 (0.400) followed by APV-A & APV-B (0.407). The distance between the strains of the same species was more in APMV-3 (0.619) followed by APMV-1 (0.256) which was also apparent from their lower percentage of amino acid homology (Table 5).

Figure 4

Phylogenetic tree derived from analysis of W proteins of APMV species by Maximum Likelihood method. The evolutionary history was inferred by using the Maximum Likelihood method based on the Dayhoff matrix-based model. The bootstrap consensus tree inferred from 500 replicates is taken to represent the evolutionary history of the taxa analyzed. Evolutionary analyses were conducted in MEGA7. The tree shows clustering together of strains of the same species.

Table 5 Estimates of Evolutionary Divergence over Sequence Pairs between Groups, analyzed for W proteins of 21 APMV species.

Selection pressure analysis

The dN/dS ratio was used to determine the natural selection pressure acting on the P gene edited products. The ratio was estimated with DnaSP v6.12.03 for APMV species represented by more than one strain. The dN/dS values were significantly less than 1 for both V and W sequences (complete, N terminal and C terminal regions) of most species, indicating that they are under negative selection pressure. Only the C terminal region of V proteins of APMV-3 strains showed positive selection, with dN/dS > 1 (Table 6).

Table 6 Selection Pressure Analysis of V and W proteins of 21 APMV species.

Evolutionary rate analysis

The substitution rates of the V and W nucleotide sequences were estimated for APMV species represented by more than two strains, using an uncorrelated relaxed clock with a lognormal distribution in BEAST. APMV-10 was represented by four strains that were 98.65% to 100% identical to each other, and hence its substitution rate could not be determined. The substitution rate was highest in APMV-13, followed by APMV-2 and APMV-6, for both V and W proteins. The overall substitution rate was 7.37 × 10⁻⁵ for V protein and 8.07 × 10⁻⁵ for W protein (Table 7).

Table 7 Evolutionary Rate analysis by Molecular Clock- Estimated nucleotide substitution rates for V and W nucleotide sequences of all 21 APMV species.

Discussion

Avian paramyxoviruses are known to infect a variety of bird species across the globe. Currently, 21 species (previously called serotypes) of APMVs have been characterized, and more viruses could be identified in future with improved viral surveillance programs. Paramyxoviruses, with their small genomes, have a unique strategy of maximizing their genomic information by expressing viral proteins through co-transcriptional RNA editing. This helps to avoid the error catastrophe caused by higher mutation rates, often associated with larger genomes. Additionally, these viruses follow the ‘rule of six’ for efficient replication. Although detailed studies of APMV structural genes and complete genomes are available, comparative knowledge of the accessory proteins expressed through RNA editing is lacking. In this study, using a bioinformatics approach, we analyzed the P gene editing site and predicted and studied the protein sequences of the edited products, V and W, of all 21 APMV species (55 viruses) known to date.

The hexamer phasing at the P gene editing site is conserved within each virus group7. We observed conserved hexamer phasing between certain APMV species and also within each APMV species, except in APMV-3 and for one strain of APMV-6. The hexamer phase is known to regulate the mRNA editing pattern; the effect is subtle but important. For example, in human and bovine parainfluenza virus type 3 (PIV-3), in which the hexamer positions at the P gene editing site are 2 to 5, a higher mRNA editing frequency (~70%) and a larger number of G insertions (1 to 6 at equal frequencies) are observed, whereas in Sendai virus, in which the hexamer phase position at the P gene editing site is 1, editing is least frequent (~30%) with only 1 to 3 G insertions7,79,80. Based on the hexamer phasing position, it is anticipated that in all APMVs except APMV-20 the editing frequency could be extensive, with the possibility of a larger number of G insertions. However, the cis-acting sequence of the P gene editing site in APMV-20 (3’AA) suggests a higher mRNA editing frequency and an increased number of G insertions, as reported for PIV-3 of human and bovine origin81,82. Thus APMVs seem to follow the PIV-3 RNA editing phenotype.

Another interesting observation is the unique editing site sequence of APMV-11 (3’A4UUCUUC5), in which the unedited mRNA is translated to V protein, and it has been suggested that mRNA with 2G insertions is translated to P protein17. In rubulaviruses, with the P gene editing site sequence 3’A3UUCUC4, realignment of the nascent mRNA/template hybrid during 1G insertion would require non-permissible A:C base pairing; hence at least 2G are expected to be inserted83. Base pairing between the nascent chain and the template genome to form a hybrid is important to prevent transcriptional slippage by the polymerase80. Similarly, in APMV-11, 1G and 2G insertions would lead to unstable A:C base pairing; hence a 3G insertion (V protein) could be the minimum number of insertions expected, while a 4G insertion would be translated to W protein and a 5G insertion would lead to P protein synthesis. It remains to be explored whether APMV-11 expresses more V protein (from both unedited mRNA and 3G insertions) than other paramyxoviruses. Furthermore, it will be interesting to study whether deletions, in addition to G insertions, can occur in APMV-11 and other APMVs with longer C runs at the editing site, as described previously with recombinant Sendai virus and PIV-3 minigenomes83.

Three factors, the editing site (sequence and the length of C runs), the type of sequences immediately upstream of the editing site (cis-acting sequence) and the hexamer phase positions are known to decide the editing phenotype (i.e. number of G insertions, deletions and frequency of mRNA editing) which further can influence the virus pathogenicity81,82,84,85. Among APMVs, variations are observed in (i) P gene editing site, (ii) hexamer phase at the editing site and also (iii) the sequences immediately upstream of the editing site, all of which will determine the expression levels and relative proportions of P, V and W proteins in the APMVs which in turn could explain their differences in replication, pathogenicity and virus-host interactions.

With respect to the length and amino acid composition of V and W proteins, variation was far greater between species than within species, which was also supported by their dN/dS estimates. The V proteins were more conserved than the W proteins. Higher sequence identity for V proteins was observed between strains of the same species (APMV-3 being the exception) more often than between species. Phylogenetically, V protein analysis grouped the APMV species similarly to the individual gene-based phylogeny2: all members of genus Orthoavulavirus clustered into one group, all members of genus Metaavulavirus along with one of the avian paraavulaviruses (APMV-4) formed the second group, while the other avian paraavulavirus, APMV-3, formed an outgroup. This was further affirmed by evolutionary distance analysis. The phylogenetic analysis and the evolutionary distance data of both V and W proteins clearly showed that APMV-3 strains are the most divergent.

The N terminal region of V and W proteins, which is shared with P protein, showed the highest conservation among APMVs. In their C terminal region, the V proteins of paramyxoviruses carry conserved arginine and isoleucine residues upstream of seven highly conserved cysteine residues (zinc binding domain), known to play important roles in MDA5 interference, STAT1 degradation and blocking interferon signaling to evade host immunity86,87,88. The V proteins of all 21 APMV species analyzed in this study had the seven cysteine residues and, remarkably, many other amino acids were also conserved in their C terminal region. Though similar observations have been made earlier in other paramyxovirus V proteins, their functional importance is not yet known86.

The V and W genes were found to be under negative selection pressure, with dN/dS < 1 in all species. This shows the conserved nature of these non-structural viral proteins within species and probably indicates their functional importance, which is yet to be fully explored. Furthermore, the substitution rates of APMV species determined by molecular clock analysis varied across species and were higher between species than within species. The substitution rates for W proteins were higher than those for V proteins, except in APMV-1, for which more strains were available for comparison. Though the substitution rate of APMV-1 was slightly higher for the V nucleotide sequence (5.15 × 10⁻⁴) than for the W nucleotide sequence (1.0 × 10⁻⁴), this did not lead to changes in amino acid sequence, as is evident from the dN/dS ratio estimates suggesting negative selection pressure. The higher conservation of the V protein sequence implies a significant role in virus biology, such as replication, pathogenesis and immune evasion.
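
The interpretation of the dN/dS ratio used here can be written out as a short Python sketch. It does not estimate dN or dS from sequences; it only shows how a ratio below, near or above 1 is read as negative, neutral or positive selection. The function name, the tolerance `tol` around 1 and the example numbers are hypothetical and are not values from this study.

```python
def selection_regime(dn, ds, tol=0.05):
    """Classify selection pressure from dN and dS estimates.

    dn: non-synonymous substitutions per non-synonymous site
    ds: synonymous substitutions per synonymous site
    tol: arbitrary illustrative band around 1 treated as neutral
    """
    if ds <= 0:
        raise ValueError("dS must be positive to form the ratio")
    omega = dn / ds
    if abs(omega - 1.0) <= tol:
        label = "neutral"
    elif omega < 1.0:
        label = "negative"
    else:
        label = "positive"
    return omega, label


# Made-up inputs for illustration.
print(selection_regime(0.02, 0.10))
```

In practice a statistical test on the estimated ratio would replace the fixed tolerance used in this sketch.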

In contrast to V proteins, the APMV W proteins were highly disordered, showed little sequence conservation and had higher divergence values. The evolutionary data analysis of W proteins suggested higher sequence identity among strains of the same species and higher variability between species. There were no conserved sequences or motifs in the C terminal region of W proteins, except that most of them carried a large number of basic amino acids, suggesting that the W protein is highly basic, as described previously12. The exceptions were one strain of APMV-1 (KJ736742.1), APMV-4, APMV-7 and APV-C, all of which had shorter W proteins. The genetic diversity seen in the W proteins may determine the degree of pathogenesis, variable interferon antagonistic activity and the wide host range exhibited by the APMV species.
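
One simple way to quantify how basic a W protein C terminal region is would be to count its basic residues. The sketch below assumes that lysine (K), arginine (R) and histidine (H) are the residues counted as basic and that the input is a one-letter amino acid string. The function name and the example sequence are hypothetical and do not correspond to any APMV sequence.

```python
BASIC_RESIDUES = frozenset("KRH")  # assumption: K, R and H counted as basic


def basic_residue_fraction(sequence):
    """Return the fraction of basic residues in a one-letter protein sequence."""
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    n_basic = sum(1 for residue in seq if residue in BASIC_RESIDUES)
    return n_basic / len(seq)


# Made-up C terminal fragment for illustration only.
example_c_terminus = "KRHA"
print(basic_residue_fraction(example_c_terminus))
```

A fraction computed this way is only a crude proxy; a predicted isoelectric point would give a fuller picture of net charge.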

The likelihood of W mRNA occurrence could be lower than that of V mRNA because of the two unstable base pairs created by the two mismatches (2G) during polymerase stuttering. This skepticism becomes more compelling, and doubts arise as to whether W protein is expressed at all, in those APMV species whose predicted W protein sequences have only a few amino acid residues in their C terminal region: one (in APMV-7), two (in APMV-4, APMV-13 and APMV-15), three (in APMV-1 isolate R75/98), four (in APMV-6, JX522537.1 and APV-C) or five (in APMV-5 and APMV-16). However, equal or higher frequencies of 1G and 2G insertions during RNA editing have been reported in certain paramyxoviruses89,90, such as Nipah virus and bovine parainfluenza virus type 3.

The presence of W mRNA of APMV-1 was first reported in 1993, with a frequency of about 10%12, and W protein expression from the APMV-1 lentogenic strain Clone 30, and from the lentogenic strain La Sota and velogenic strain SG10, was recently confirmed16,91. We had earlier shown, using a plasmid system, that the W protein of APMV-1 mesogenic strain Komarov localized to the nucleus30, and the same has been documented during virus infection of cells in the above two studies. Here, we report that NLS and NES were predicted in the W proteins of only certain APMV species. Moreover, we could identify an NLS in only five out of twelve strains of APMV-1, implying that not all W proteins of APMV-1 strains localize to the nucleus. W protein sequence analysis of nearly 1000 strains of APMV-1 in our lab shows variation in W protein length between strains (unpublished data), which was also reported recently in an analysis of 286 strains of NDV91. Furthermore, the W proteins of only about 50% of the strains analyzed by us are predicted to localize to the nucleus (data not shown), leading us to speculate that these differences in W proteins may contribute to the wide spectrum of pathogenicity and virulence observed in Newcastle disease.

Among paramyxoviruses, the W proteins of Nipah and Hendra viruses are the best characterized. The nuclear localization of W protein of Nipah virus was found to modify p53 expression and activity92, sequester inactive STAT1 within the nucleus93, prevent IRF3 phosphorylation, inhibit both virus- and TLR-3-mediated IFN signaling29, modulate host immunity, and influence the disease course and viral pathogenesis, specifically neurovirulence27,28,94. Intriguingly, neither the lack of W protein nor its cytoplasmic localization in APMV-1 strain Clone 30 had any effect on viral replication in cell culture16. Though no conserved motifs could be identified between the W proteins of APMVs and Nipah virus, it will be interesting to study whether similar roles are executed by W proteins of any or all of the APMV species. To our knowledge, this is the first comprehensive and comparative evolutionary study of the P gene edited accessory viral proteins of APMVs. The information obtained from this study will enable the design of future studies to understand the specific functions of conserved motifs and amino acids of V and W proteins and to decipher their evolutionary significance for the virus as well as for the host.