Insight into the global evolution of Rodentia associated Morbilli-related paramyxoviruses

One portion of the family Paramyxoviridae is a group of Unclassified Morbilli-Related Viruses (UMRV) recently recognized in wild small mammals. At a global level, the evolutionary history of these viruses is not properly understood, and the relationships between UMRV and their hosts remain largely unstudied. The present study revealed, for the first time, that Rodentia-associated UMRV emerged from a common ancestor in southern Africa more than 4000 years ago. Sequenced UMRV originating from different regions of the world clustered into four well-supported viral lineages, which suggests that strain diversification occurred during host dispersal and associated exchanges, with purifying selection as the principal evolutionary force. In addition, multiple introductions of Rodentia-associated UMRV on different continents and islands, and spillover between rodent species, most probably involving Rattus rattus, were detected, indicating that these animals are implicated in the vectoring and worldwide emergence of this virus group. The natural history and evolutionary dynamics of these zoonotic viruses, originating from and hosted by wild animals, are most likely shaped by commensalism related to human activities.

The map was reconstructed using SPREAD 1 and visualized using Google Earth (http://earth.google.com). This figure is similar but not identical to the original image, and is therefore for illustrative purposes only.

Supplementary Text 1: Comparing partial L-gene to complete L-gene sequences.
To determine whether partial L-gene sequences can be used for phylogenetic analyses at the level of viral species, we carried out a double phylogenetic analysis using a representative set of the Paramyxoviridae containing 179 different sequences, and compared the resulting phylogenetic tree with the one derived from the analysis of the polymerase locus across its full length (8373 base pairs). In Supplementary Fig. 1  problematic when using long sequences: if we assume an equal saturation process (substitution accumulation), the background noise could prevail and reduce the phylogenetic signal. Finally, the seven topological inconsistencies found between the two trees cannot be attributed to either of the trees. It is probable that both phylogenetic constructions are valid. Thus, we consider the use of partial L-gene sequences a viable alternative for Paramyxoviridae phylogenetic reconstruction, particularly given that no complete sequence has yet been reported for UMRV.
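One way to count topological inconsistencies between two trees is to compare the sets of clades each tree contains. The following is a minimal sketch under stated assumptions: trees are written as rooted nested tuples of taxon names, and the function names (`clades`, `conflicting_clades`) and the four-taxon example are illustrative only. It is not the procedure used to obtain the seven inconsistencies reported above.

```python
def clades(tree):
    """Return the set of clades (as frozensets of leaf names) in a rooted tree.

    A tree is either a leaf name (str) or a tuple of subtrees.
    """
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        # Union of the leaf sets of all children defines this node's clade.
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    leaves(tree)
    return found


def conflicting_clades(tree_a, tree_b):
    """Clades present in exactly one of the two trees (symmetric difference)."""
    return clades(tree_a) ^ clades(tree_b)


# Hypothetical four-taxon example: the trees disagree on the placement of B and C.
partial_tree = ((("A", "B"), "C"), "D")
complete_tree = ((("A", "C"), "B"), "D")

differences = conflicting_clades(partial_tree, complete_tree)
print(len(differences), sorted(sorted(c) for c in differences))
```

A clade that appears in only one tree marks a disagreement, but such a count alone does not say which tree is correct, which is consistent with the conclusion that the inconsistencies cannot be attributed to either tree.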
To consider phylodynamics in Paramyxoviridae, it is necessary to demonstrate that the partial L-gene region allows predictions to be extrapolated correctly, which we have already carried out for this viral family using orthologous genes other than the polymerase gene (Supplementary Text 2).
It has often been suggested that sequences of a few hundred bases do not provide sufficient information to correctly infer phylogenetic relationships 4. Even though this point is justified in certain cases, one should be cautious about such generalizations. Verifying the validity of phylogenetic analyses based on short sequences remains an appropriate approach, particularly when complete sequences are not available 5. The most common challenge in molecular phylogenetic analyses is the discordance between phylogenies. Even if simply adding sequences at a genomic scale helps clarify different phylogenetic inconsistencies, in some cases it may be insufficient to resolve relationships.
Philippe et al. 6  Consequently, genomic information may not be considered as an absolute, and hence genomic-level phylogenies for paramyxoviruses may not reflect their true evolutionary trajectory. Besides, in situations where the "background noise" confounds the genuine phylogenetic signal, which is often the case for RNA viruses because of their notably rapid evolution, data must be used either with the strongest phylogenetic signal or with shorter sequence lengths 9. In the case of Paramyxoviridae, and particularly for the UMRV, recent studies have shown that the short region of the L polymerase gene, namely the partial L-gene, can provide satisfactory phylogenetic reconstruction at the genus level 10,11,12,13. As discussed elsewhere by Horreo 14, we consider the partial L-gene region a useful "representative" region of the Paramyxoviridae for phylogenetic inference.

Supplementary Text 2: Estimate of the evolutionary rate change for the partial L-gene among the Measles virus.
As the Rodentia PV presented in this study are very close Morbilli-related viruses 10,11,15 and, for some authors, could probably be considered sister genera 13 Table 3). Besides, one might think that using only 24 MeV sequences may lead to inaccurate estimates; however, to compensate for this, the L-gene may provide the most reliable evolutionary signal at the family level. The best-fit substitution model was K2 + G, but we used HKY85 + G, the closest related model available in Beauti. Runs were carried out with chain lengths of 100-200 million. The output from Beast was analyzed using the program Tracer 1.6 (http://beast.bio.ed.ac.uk/Tracer). The mean rate obtained was used for the subsequent UMRV phylogeographic analysis.
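The choice of HKY85 as a stand-in for K2 can be understood from how the two models relate: K2 (Kimura two-parameter) distinguishes transitions from transversions but assumes equal base frequencies, whereas HKY85 allows unequal frequencies. The sketch below builds an unnormalized HKY85 rate matrix and shows that it reduces to the K2 pattern when all four frequencies are 0.25. The names `hky85_rate_matrix`, `kappa`, and `freqs` are illustrative assumptions, the gamma rate heterogeneity (+G) is deliberately omitted, and this is not the Beast implementation.

```python
BASES = "ACGT"
# Transitions are purine<->purine (A<->G) and pyrimidine<->pyrimidine (C<->T).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def hky85_rate_matrix(kappa, freqs):
    """Return an unnormalized HKY85 rate matrix as a nested dict q[i][j].

    kappa : transition/transversion rate ratio
    freqs : dict mapping each base to its equilibrium frequency
    """
    q = {}
    for i in BASES:
        row = {}
        for j in BASES:
            if i == j:
                continue
            rate = freqs[j]            # rate is proportional to the target base frequency
            if (i, j) in TRANSITIONS:
                rate *= kappa          # transitions are scaled by kappa
            row[j] = rate
        row[i] = -sum(row.values())    # each row of a rate matrix sums to zero
        q[i] = row
    return q


# With equal base frequencies, HKY85 collapses to the K2 pattern:
# one rate for all transitions, another for all transversions.
equal_freqs = {base: 0.25 for base in BASES}
q_k2_like = hky85_rate_matrix(kappa=2.0, freqs=equal_freqs)
print(q_k2_like["A"])
```

With unequal `freqs`, the same function yields rates that differ by target base, which is the extra flexibility HKY85 has over K2.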
The majority of the 145 UMRV sequences fell within too narrow a time interval, introducing too much bias to determine the evolutionary history and migration patterns of these viruses empirically from our data set alone. This is because the field campaigns to collect samples were all conducted between 2009 and 2013.
However, it is reasonable to assume that estimating the evolutionary rate from a related virus of the same genus, or from a virus that descends from the same common ancestor, remains a good alternative and a reliable approximation in our case (L-gene). Parameter estimates were consistent among models, and similar values were obtained for all the analyses.
The evolutionary rate of currently circulating MeV for the L-gene was estimated to be 3.48 × 10⁻⁴ substitutions per site per year (95% HPD 3.94 × 10⁻⁵ to 6.74 × 10⁻⁴), and coalescent estimates place its recent emergence at around 1900 (~116 years before 2014, 95% HPD 30.46 to 245.53). The mean evolutionary rate and the tMRCAs derived from the partial L-gene were consistent with and quite similar to those already described for various other genes of currently circulating MeV. However, our values were slightly lower than those obtained for the structural H and N genes 18,19; this was not unexpected, given that the polymerase gene is the most conserved among the Paramyxoviridae, since it must maintain crucial structural domains and multiple functions related to different replicative mechanisms 13,20. This molecular clock was used for the phylogeographic analysis.
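The calendar dates quoted above follow from simple arithmetic on the reported ages. As a minimal sketch, assuming the ages are counted back from 2014 as stated, the snippet below converts the mean tMRCA and its 95% HPD bounds into calendar years and computes the substitutions per site expected along a single lineage at the mean rate. The helper names (`age_to_year`, `expected_substitutions`) are illustrative and not part of the original analysis, and the last figure assumes a constant rate.

```python
REFERENCE_YEAR = 2014  # ages in the text are counted back from 2014


def age_to_year(age_years, reference_year=REFERENCE_YEAR):
    """Convert an age (years before the reference year) into a calendar year."""
    return reference_year - age_years


def expected_substitutions(rate_per_site_per_year, years):
    """Expected substitutions per site along one lineage at a constant rate."""
    return rate_per_site_per_year * years


mean_rate = 3.48e-4          # substitutions per site per year (reported mean)
tmrca_mean = 116.0           # years before 2014 (reported mean)
tmrca_hpd = (30.46, 245.53)  # 95% HPD on the age, in years before 2014

emergence_year = age_to_year(tmrca_mean)
# The older age bound gives the earlier calendar year, so the order flips.
hpd_years = (age_to_year(tmrca_hpd[1]), age_to_year(tmrca_hpd[0]))
divergence = expected_substitutions(mean_rate, tmrca_mean)

print(f"Mean emergence year: {emergence_year:.0f}")
print(f"95% HPD in calendar years: {hpd_years[0]:.2f} to {hpd_years[1]:.2f}")
print(f"Expected substitutions per site since the tMRCA: {divergence:.4f}")
```

The mean age of 116 years corresponds to 1898, which matches the "around 1900" stated in the text.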