Abstract
Tomato brown rugose fruit virus (ToBRFV) poses a significant threat to tomato production worldwide, prompting extensive research into its genetic diversity, evolutionary dynamics, and adaptive strategies. In this study, we conducted a comprehensive analysis of ToBRFV at the codon level, focusing on codon usage bias, selection pressures, and evolutionary patterns across multiple genes. Our analysis revealed distinct patterns of codon usage bias and selection pressures within the ToBRFV genome, with varying levels of genetic diversity and evolutionary constraints among different genes. We observed a transition/transversion bias of 2.07 across the entire ToBRFV genome, with the movement protein (MP) gene exhibiting the highest transition/transversion bias and SNP density, suggesting potential evolutionary pressures or a higher mutation rate in this gene. Furthermore, our study identified episodic positive selection primarily in the MP gene, highlighting specific codons subject to adaptive changes in response to host immune pressures or environmental factors. Comparative analysis of codon usage bias in the coat protein (CP) and RNA-dependent RNA polymerase (RdRp) genes revealed gene-specific patterns reflecting functional constraints and adaptation to the host's translational machinery. Our findings provide valuable insights into the molecular mechanisms driving ToBRFV evolution and adaptation, with implications for understanding viral pathogenesis, host-virus interactions, and the development of control strategies. Future research directions include further elucidating the functional significance of codon usage biases, exploring the role of episodic positive selection in viral adaptation, and leveraging these insights to inform the development of effective antiviral strategies and crop protection measures.
Introduction
The agricultural sector is a crucial pillar for global food security, providing sustenance for the world's burgeoning population. Among the myriad challenges faced by the sector, plant diseases caused by viruses are particularly pernicious, posing a grave threat to crop yields and quality. In the realm of viral plant pathogens, the Tomato brown rugose fruit virus (ToBRFV) has emerged as a particularly troublesome entity due to its adverse impact on tomato crops around the globe1,2,3. The virus, a member of the Tobamovirus genus, has been responsible for significant fruit losses, leading to economic hardship for farmers and stakeholders within the agricultural supply chain4. The battle against ToBRFV is a testament to the ongoing struggle to maintain crop health and productivity in the face of evolving plant diseases.
The genetic diversity of ToBRFV and its mechanisms of resistance breaking are of paramount importance in formulating effective management strategies aimed at mitigating the virus's impact on tomato production5,6. This introduction highlights the critical role that genetic analysis plays in deciphering the virus's modus operandi and in the development of resistant tomato cultivars. A robust understanding of the genetic underpinnings of ToBRFV is essential for tackling the challenges posed by this virus7.
The genome of ToBRFV consists of a single-stranded RNA molecule encoding four open reading frames (ORFs), including the coat protein (CP), movement protein (MP), RNA-dependent RNA polymerase (RdRp), and a protein of unknown function. The CP, MP, and RdRp genes play crucial roles in viral replication, movement, and pathogenicity. Consequently, analyzing these genes can provide insights into the genetic variability and evolutionary relationships among different ToBRFV isolates2,6.
Given the virus's capacity to evolve rapidly and adapt to different environments, it is vital to investigate the genetic diversity within the ToBRFV population. The study of nucleotide substitutions in the ToBRFV movement protein gene, particularly those associated with the breaking of resistance in wild tomato species such as Solanum habrochaites and Solanum peruvianum, underscores the virus's ability to circumvent the genetic defenses of its hosts8. The emergence of such resistance-breaking strains necessitates constant vigilance and an adaptable approach to resistance breeding.
In this study, we conducted a comprehensive analysis of 215 whole-genome sequences of ToBRFV obtained from the National Center for Biotechnology Information (NCBI). Our analysis focused on elucidating the phylogenetic relationships among these isolates, assessing the genetic diversity within the viral population, and characterizing the molecular evolution of key genomic regions. The findings contribute to our understanding of viral evolution and adaptation, with implications for disease management strategies and breeding programs aimed at enhancing tomato resistance to ToBRFV infection.
Materials and methods
Sample collection and genome sequencing
A total of 215 whole-genome sequences of ToBRFV from Asia, Europe, Africa and America were obtained from the National Center for Biotechnology Information (NCBI) database (Supplementary data 1). These sequences were derived from diverse geographical locations and represented a broad spectrum of ToBRFV isolates. Each sequence was carefully curated to ensure high-quality data for downstream analysis. We excluded sequences flagged as unverified in NCBI, as well as sequences containing gaps or ambiguous nucleotides.
Sequence alignment and phylogenetic analysis
The amino acid sequences of three key genes (MP, CP, and RdRp) were extracted from each ToBRFV genome sequence. These sequences were concatenated to create a composite sequence representing the entire coding region. Multiple sequence alignment (MSA) of the concatenated sequences was performed using MUSCLE, ensuring accurate positioning of homologous amino acids.
Phylogenetic analysis was conducted using the maximum likelihood (ML) method implemented in MEGA 11 (ref. 9). The best-fitting substitution model was selected based on the Bayesian Information Criterion (BIC), and phylogenetic trees were constructed with 1000 bootstrap replicates to assess the robustness of the inferred tree topology. Bootstrap values ≥ 60 were considered to indicate reliable support for the branching patterns.
Ancestral sequence reconstruction and evolutionary time estimation
Ancestral sequence reconstruction was performed to infer putative ancestral sequences using the ML method. This analysis aimed to identify the most likely ancestral states at each internal node of the phylogenetic tree, allowing us to trace the evolutionary history of ToBRFV. The evolutionary divergence over time was estimated using the UPGMA (Unweighted Pair Group Method with Arithmetic Mean)10 method, which provided insights into the temporal dynamics of ToBRFV evolution.
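As a rough illustration of how UPGMA turns pairwise distances into an ultrametric tree, the following Python sketch clusters a toy distance matrix and prints the result in Newick format. The function name `upgma`, the taxon labels, and the distance values are hypothetical and are not taken from the study; the software settings actually used for the analysis are not reproduced here.

```python
def upgma(labels, dist):
    """Cluster taxa with UPGMA and return a Newick string.

    labels: list of taxon names; dist: symmetric distance matrix (list of lists).
    """
    n = len(labels)
    # Each cluster stores (newick_text, number_of_leaves, node_height).
    clusters = {i: (labels[i], 1, 0.0) for i in range(n)}
    # Pairwise distances keyed by (smaller_id, larger_id).
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        # Pick the closest pair of clusters.
        (a, b), d_ab = min(d.items(), key=lambda item: item[1])
        del d[(a, b)]
        text_a, size_a, height_a = clusters.pop(a)
        text_b, size_b, height_b = clusters.pop(b)
        # Ultrametric assumption: the new node sits halfway between the two clusters.
        height = d_ab / 2
        # Distance from the merged cluster to each remaining cluster is the
        # size-weighted average of the two old distances.
        for c in clusters:
            d_ac = d.pop((min(a, c), max(a, c)))
            d_bc = d.pop((min(b, c), max(b, c)))
            d[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        newick = (f"({text_a}:{height - height_a:.4f},"
                  f"{text_b}:{height - height_b:.4f})")
        clusters[next_id] = (newick, size_a + size_b, height)
        next_id += 1
    return clusters.popitem()[1][0] + ";"


# Toy example with hypothetical distances (not data from this study).
toy_labels = ["A", "B", "C"]
toy_dist = [[0.0, 0.2, 0.6],
            [0.2, 0.0, 0.6],
            [0.6, 0.6, 0.0]]
print(upgma(toy_labels, toy_dist))
```

Because UPGMA assumes a constant rate of change along all lineages, node heights can be read as relative divergence times only to the extent that this assumption holds for the isolates being compared.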
Genetic variation analysis
To assess genetic variation between ToBRFV isolates, several analyses were conducted. We first aligned all sequences using Clustal Omega version 1.2.2 (ref. 11). The number of single nucleotide polymorphisms (SNPs) was then determined for each gene and for the entire genome using the “Find Variation/SNPs” tool in Geneious Prime 2023 with default parameters (https://www.geneious.com), highlighting regions of genetic diversity. Furthermore, the lengths of the CP, MP, and RdRp genes were compared to identify variations in gene size and potential functional implications.
To identify and visualize SNPs in amino acid sequences, we utilized the Biopython library (version 1.84) in Python 3 to analyze the multiple sequence alignment (MSA) data. The MSA was stored in a fasta file. The analysis was performed using a custom Python script, which first loaded the alignment data using the AlignIO module from Biopython. The script then iterated over the alignment columns to identify positions where at least one SNP was present, defined as columns containing more than one unique amino acid residue. For each SNP position identified, the script recorded the position and the number of unique amino acids observed at that position. This information was used to generate a bar plot using the Matplotlib library (version 3.7.2), with the x-axis representing the SNP position and the y-axis representing the number of unique amino acids (i.e., SNP frequency) at each position. The plot was designed to facilitate easy interpretation of the SNP distribution across the amino acid sequence. The resulting plot was saved as a PDF file with a resolution of 300 DPI for inclusion in this manuscript. The Python code used for this analysis is in Supplementary data 2.
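The column scan described above can be sketched as follows. This is a minimal, dependency-free illustration and not the script in Supplementary data 2: a small FASTA parser stands in for Biopython's AlignIO, the plotting step is omitted, and the function names, the toy alignment, and the choice to ignore gap characters are all assumptions made for the example.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict of {name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def variable_columns(sequences, ignore="-X?"):
    """Return (position, n_unique_residues) for each variable alignment column.

    Positions are 1-based. Characters in `ignore` (gaps, unknown residues)
    are not counted as residues; this is an assumption of the sketch.
    """
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    result = []
    for position, column in enumerate(zip(*sequences), start=1):
        residues = {r for r in column if r not in ignore}
        if len(residues) > 1:
            result.append((position, len(residues)))
    return result


# Hypothetical toy alignment, not data from this study.
toy_alignment = """>iso_a
MKTAY
>iso_b
MRTAY
>iso_c
MKTSY
>iso_d
MQT-Y
"""
toy_records = read_fasta(toy_alignment)
toy_sites = variable_columns(list(toy_records.values()))
print(toy_sites)
```

The resulting list of positions and counts is the kind of input a bar plot such as the one described above would take, with the position on the x-axis and the number of distinct residues on the y-axis.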
Evolutionary divergence analysis
Average evolutionary divergence over sequence pairs within and between groups was calculated using MEGA 11. This analysis provided quantitative measures of genetic distance, allowing us to assess the degree of divergence among ToBRFV isolates. By comparing divergence levels within and between groups, we gained insights into the genetic relationships and population structure of ToBRFV.
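To make the within-group and between-group averages concrete, the sketch below computes them for a toy set of aligned sequences. It uses the simple p-distance (proportion of differing sites) as a stand-in for the model-based distances computed in MEGA 11; the function names, group labels, and sequences are hypothetical.

```python
from itertools import combinations, product


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or ambiguous base are skipped,
    which mimics a pairwise-deletion treatment.
    """
    compared = [(x, y) for x, y in zip(seq_a.upper(), seq_b.upper())
                if x in "ACGT" and y in "ACGT"]
    if not compared:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in compared) / len(compared)


def mean_within(group, dist=p_distance):
    """Average distance over all sequence pairs inside one group."""
    pairs = list(combinations(group, 2))
    if not pairs:
        return None  # undefined for a group with a single sequence
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


def mean_between(group_a, group_b, dist=p_distance):
    """Average distance over all pairs with one sequence from each group."""
    pairs = list(product(group_a, group_b))
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy groups, not data from this study.
groups = {
    "country_1": ["ACGTACGT", "ACGTACGA"],
    "country_2": ["TCGTACGT"],
}
print(mean_within(groups["country_1"]))
print(mean_between(groups["country_1"], groups["country_2"]))
```

Any other pairwise distance function with the same two-argument signature can be passed through the `dist` parameter.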
The number of base substitutions per site from averaging over all sequence pairs within each group and between groups is shown. Analyses were conducted using the Maximum Composite Likelihood model. This analysis involved 215 nucleotide sequences. All ambiguous positions were removed for each sequence pair (pairwise deletion option). There was a total of 6395 positions in the final dataset.
The transition/transversion bias (R) and the substitution pattern and rates were estimated under the Kimura (1980) 2-parameter model. For estimating ML values, a tree topology was automatically computed. The maximum log-likelihood for this computation was −19,082.264. This analysis involved 215 nucleotide sequences, with a total of 6395 positions in the final dataset.
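For readers unfamiliar with the Kimura 2-parameter model, the sketch below counts transitions and transversions between two aligned sequences and applies the standard K2P distance formula, d = −½ ln[(1 − 2P − Q)·√(1 − 2Q)], where P and Q are the proportions of transitional and transversional differences. It is a pairwise counting illustration only; the value of R reported here was a maximum likelihood estimate over the full alignment and tree in MEGA 11, which this code does not reproduce. Function names and sequences are hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_substitutions(seq1, seq2):
    """Return (transitions, transversions, compared_sites) for two aligned sequences."""
    transitions = transversions = sites = 0
    s1 = seq1.upper().replace("U", "T")
    s2 = seq2.upper().replace("U", "T")
    for x, y in zip(s1, s2):
        if x not in "ACGT" or y not in "ACGT":
            continue  # pairwise deletion of gaps and ambiguous bases
        sites += 1
        if x == y:
            continue
        pair = {x, y}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1  # purine<->pyrimidine
    return transitions, transversions, sites


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned sequences."""
    ts, tv, sites = count_substitutions(seq1, seq2)
    if sites == 0:
        raise ValueError("no comparable sites")
    p = ts / sites
    q = tv / sites
    # math.log raises ValueError if the sequences are too divergent (saturation).
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Hypothetical toy sequences: one transition (A>G) and one transversion (A>C).
toy_ts, toy_tv, toy_sites = count_substitutions("AAAAAAAAAA", "GAAAAAAAAC")
toy_d = k2p_distance("AAAAAAAAAA", "GAAAAAAAAC")
print(toy_ts, toy_tv, toy_sites, round(toy_d, 4))
```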
Genetic distance matrix
The genetic distances between ToBRFV sequences from various geographic locations were calculated and organized into a matrix using MEGA 11. This matrix served as the input data for the Principal Component Analysis (PCA). To explore the independence of genetic distances from geographic distances, PCA was performed on the genetic distance matrix using a custom Python 3 script (Supplementary data 2). The genetic distance matrix was organized as a nested list, where each row and column corresponded to a country, and each value represented the genetic distance between the respective countries. The countries included in the analysis were Jordan, Peru, Israel and the State of Palestine, Germany, Mexico, Italy, United Kingdom, Canada, Greece, Netherlands, Egypt, USA, China, Turkey, and Belgium. The nested list was converted into a pandas DataFrame, with countries as both row and column labels, for ease of manipulation and analysis. PCA was performed using the PCA class from the sklearn.decomposition module in Python. The analysis reduced the dimensionality of the data to the two principal components that explained the most variance in the genetic distances. The results of the PCA were visualized using a scatter plot, where each point represented a country. The countries were color-coded for clarity, and the first two principal components were plotted to illustrate the genetic relationships.
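The procedure above treats each row of the distance matrix as a feature vector for one country. The following sketch shows the same idea without external dependencies: it centers the columns, builds the covariance matrix, and extracts the two leading components by power iteration with deflation. This is a swapped-in technique for illustration; the study itself used the PCA class from sklearn.decomposition, which obtains the same components by singular value decomposition. The labels and distance values below are hypothetical, and the function names are not from Supplementary data 2.

```python
import math


def center_columns(rows):
    """Subtract each column mean, as PCA requires."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def covariance_matrix(centered):
    """Sample covariance matrix of centered data (features in columns)."""
    n, p = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def leading_eigenpair(matrix, iterations=1000):
    """Dominant eigenvalue and eigenvector of a symmetric matrix by power iteration."""
    p = len(matrix)
    vec = [float(i + 1) for i in range(p)]  # arbitrary non-zero start vector
    norm = math.sqrt(sum(v * v for v in vec))
    vec = [v / norm for v in vec]
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0.0:
            return 0.0, vec
        vec = [v / norm for v in nxt]
    # Rayleigh quotient gives the eigenvalue for the converged unit vector.
    value = sum(vec[i] * sum(matrix[i][j] * vec[j] for j in range(p))
                for i in range(p))
    return value, vec


def pca(rows, n_components=2):
    """Return (scores, explained_variance_ratios) for the leading components."""
    centered = center_columns(rows)
    cov = covariance_matrix(centered)
    size = len(cov)
    total_variance = sum(cov[i][i] for i in range(size))
    components, ratios = [], []
    for _ in range(n_components):
        value, vec = leading_eigenpair(cov)
        components.append(vec)
        ratios.append(value / total_variance)
        # Deflation: remove the component just found before searching for the next.
        cov = [[cov[i][j] - value * vec[i] * vec[j] for j in range(size)]
               for i in range(size)]
    scores = [[sum(x * w for x, w in zip(row, comp)) for comp in components]
              for row in centered]
    return scores, ratios


# Hypothetical 4x4 distance matrix, not the values used in this study.
toy_countries = ["A", "B", "C", "D"]
toy_distances = [[0.0, 0.1, 0.5, 0.6],
                 [0.1, 0.0, 0.4, 0.5],
                 [0.5, 0.4, 0.0, 0.2],
                 [0.6, 0.5, 0.2, 0.0]]
toy_scores, toy_ratios = pca(toy_distances, n_components=2)
for name, (pc1, pc2) in zip(toy_countries, toy_scores):
    print(name, round(pc1, 3), round(pc2, 3))
```

The explained-variance ratios returned by this sketch correspond to what scikit-learn exposes as `explained_variance_ratio_`, which is the quantity quoted as a percentage on PCA plot axes.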
Selective pressure analysis
To investigate selective pressures acting on ToBRFV genes, the ratio of nonsynonymous to synonymous substitutions (dN/dS) was calculated for the CP, MP, and RdRp genes. This analysis, performed using the Datamonkey web server (https://www.datamonkey.org/meme/)12, assessed the relative contributions of positive selection, purifying selection, and neutral evolution to the genetic diversity of ToBRFV.
MEME (Mixed Effects Model of Evolution) estimates a site-wise synonymous rate (α) and a two-category mixture of non-synonymous rates (β⁻ with proportion p⁻, and β⁺ with proportion [1 − p⁻]), and uses a likelihood ratio test to determine if β⁺ > α at a site. The estimates aggregate information over a proportion of branches at a site, so the signal is derived from episodic diversification, which is a combination of the strength of selection [effect size] and the proportion of the tree affected. A subset of branches can be selected for testing as well, in which case an additional (nuisance) parameter will be inferred from the non-synonymous rate on branches NOT selected for testing.
Recombination analysis of ToBRFV isolates
We investigated the potential for recombination events among ToBRFV isolates. We performed a whole genome recombination analysis using GARD, a genetic algorithm for recombination detection (https://www.datamonkey.org/gard)13. All available complete genome sequences of ToBRFV isolates from this study were included in the analysis. Default parameters were employed for the recombination detection algorithms.
Codon usage bias analysis
A total of 215 sequences encoding the MP, CP, and RdRp of ToBRFV were retrieved from NCBI in FASTA format. These sequences underwent preprocessing, including sequence trimming, quality assessment, and the removal of non-coding regions. Codon usage bias was analyzed using the R programming language (R-Studio version 9) with the coRdon package (Supplementary Data 2)14. Codon frequencies were calculated to determine the prevalence of each codon across the dataset. Relative Synonymous Codon Usage (RSCU) values were estimated to assess the non-uniform usage of synonymous codons, while the Codon Adaptation Index (CAI) was computed to measure bias towards codons associated with highly expressed genes. The Effective Number of Codons (ENC) was calculated to evaluate overall codon usage bias. Additionally, GC content, including GC3S (GC content at synonymous third positions), was determined. Descriptive statistics, including mean, median, standard deviation, and range, were calculated for CAI, ENC, GC content, and GC3S, and their distributions were visualized using histograms. The results were then exported as CSV files for further interpretation, with a focus on their implications for gene expression and evolutionary dynamics within the ToBRFV coding regions.
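Two of the measures listed above, RSCU and GC3S, can be illustrated with a short sketch. RSCU for a codon is its observed count divided by the count expected if all synonymous codons for that amino acid were used equally; GC3S is the fraction of G or C at the third position of codons that have synonymous alternatives. The code below is an assumption-laden Python stand-in, not the coRdon-based R analysis used in the study: it uses the standard genetic code, and the function names and test sequence are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Synonymous codon families, excluding stop codons.
FAMILIES = defaultdict(list)
for _codon, _aa in CODON_TABLE.items():
    if _aa != "*":
        FAMILIES[_aa].append(_codon)


def count_codons(cds):
    """Count in-frame codons of a coding sequence (RNA 'U' is read as 'T')."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    codons = (cds[i:i + 3] for i in range(0, usable, 3))
    return Counter(c for c in codons if c in CODON_TABLE)


def rscu(cds):
    """Relative synonymous codon usage for every amino acid present in `cds`."""
    counts = count_codons(cds)
    values = {}
    for codons in FAMILIES.values():
        total = sum(counts[c] for c in codons)
        if total == 0 or len(codons) == 1:
            continue  # absent amino acid, or Met/Trp with a single codon
        for c in codons:
            values[c] = counts[c] * len(codons) / total
    return values


def gc3s(cds):
    """G+C fraction at third positions of codons with synonymous alternatives."""
    counts = count_codons(cds)
    synonymous = [c for codons in FAMILIES.values() if len(codons) > 1
                  for c in codons]
    total = sum(counts[c] for c in synonymous)
    if total == 0:
        return None
    return sum(counts[c] for c in synonymous if c[2] in "GC") / total


# Hypothetical toy sequence: four alanine codons (GCT, GCT, GCC, GCA).
toy_rscu = rscu("GCTGCTGCCGCA")
toy_gc3s = gc3s("GCTGCTGCCGCA")
print(toy_rscu, toy_gc3s)
```

An RSCU value above 1 marks a codon used more often than expected under equal usage, and a value below 1 marks an under-used codon.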
Results and discussion
Comprehensive phylogenetic analysis of ToBRFV isolates
The circular phylogenetic tree representing 215 ToBRFV isolates based on their amino acid sequences revealed a complex network of evolutionary relationships. Set at a scale of 0.01, the tree allowed for a detailed analysis of genetic distances between sequences. Color-coded branches highlighted distinct phylogenetic clades, with the majority represented in blue, a specific subset (including Tomato mosaic virus amino acid sequences of CP, MP, and RdRp as outgroup) in yellow, Iranian isolates in dark blue, and red dots indicating bootstrap values for each node (Fig. 1). This visualization emphasized the remarkable genetic diversity within the ToBRFV population and provided a platform for discussing the observed evolutionary patterns and divergence in our study.
Interestingly, the phylogenetic analysis showed that isolates from different geographic locations were randomly distributed throughout the tree, without clear subgrouping based on geographical origin. This may suggest that the virus has spread globally in recent years, leading to increased genetic similarity among isolates from various regions. Our comprehensive tree construction, incorporating all coding regions of the virus (CP, MP, and RdRp), facilitated a thorough examination of the relationships between virus isolates. Notably, the Iranian isolates formed a distinct cluster, indicating the need for further investigation into their genetic characteristics and potential implications for virus spread and management.
Comparing our findings with previous studies that assessed genetic differentiation and migration patterns among ToBRFV populations from Europe, Asia, Africa and America, our results revealed differing trends. While Güller et al. reported high gene flow among geographic populations based on CP and MP gene domains, our study highlighted a lack of clear genetic differentiation between isolates from different regions. The absolute values of Fst among geographic populations being less than 0.33 and the high migration rates (> 1) observed in our study support the notion of extensive gene flow among ToBRFV populations from European, Asian, and American variants. Furthermore, the low values of Kst* and Z* metrics, and the non-significant p-values in pairwise comparisons for both gene regions, suggest a lack of significant genetic divergence between geographic populations. Particularly, the Snn metric results < 0 indicate minimal genetic differences between the populations, further supporting the concept of ongoing gene flow and genetic similarity among ToBRFV isolates globally15.
SNP distribution analysis reveals differential evolutionary pressures on ToBRFV genes
The SNP distribution plot provides a visual comparison of amino acid changes across the three genes of ToBRFV: MP, CP, and RdRp. Each column represents the location of an amino acid change within the respective gene sequence. The absolute number of amino acid changes is markedly higher in the RdRp gene; however, when the number of changes is normalized by gene length, MP is the most variable (0.218 SNPs per unit of gene length), suggesting greater variability and potential for evolutionary adaptation (Table 2). This disparity in the distribution of amino acid changes offers insights into the differential conservation of these genes and may have implications for the virus’s infectivity and resistance mechanisms (Fig. 2).
Ancestral relationships of ToBRFV isolates
Our study utilized whole genome nucleotide sequences of ToBRFV (215 isolates) to construct a circular phylogenetic tree, offering insights into the virus’s evolutionary history. The tree, with a scale set at 0.01 to reflect genetic distances between sequences, indicates significant evolutionary events. The analysis revealed two distinct groups in green representing ancestors and Jordan isolates, while the yellow section denoted TMV as the outgroup. This phylogenetic analysis not only unveiled ancestral relationships but also suggested potential evolutionary pathways of the virus, illuminating its genetic makeup and transmission dynamics (Fig. 3). Confirming the findings of Salem et al.2, our results supported Jordan as the origin of the virus, emphasizing the importance of investigating transmission dynamics, possibly through seed dispersal.
Interestingly, the Iranian isolate formed a clade distinct from the other isolates, signaling the presence of a potentially new strain or a sequencing artifact16. The Iranian isolate showed more variation than other isolates in the RdRp gene, where more SNPs were found, whereas its CP and MP genes were more similar to those of other isolates. This observation underscores the necessity for comprehensive genomic screening to identify any novel changes that could lead to the emergence of more destructive biotypes or strains. Further investigation and sequencing of this isolate are recommended to elucidate its unique genetic characteristics and implications for viral spread and management strategies.
Moreover, insights from previous studies shed light on the emergence and evolutionary history of ToBRFV. Analysis by Yan et al. revealed a distinct clustering of tobamoviruses17, with ToBRFV and TMV forming sister branches, while ToMV and ToMMV grouped together. Recombination events involving other tobamoviruses, as suggested by Salem et al.2, may have contributed to the origin of ToBRFV. The detection of a recombination event involving strains of TMV and ToMMV further underscores the complex evolutionary dynamics shaping the genetic diversity of ToBRFV6,18. Our analysis using the GARD algorithm did not find any evidence of recombination events among the ToBRFV isolates included in this study. A total of 2647 models were examined at a rate of 0.73 models per second. The alignment contained 638 potential breakpoints, giving a search space of 638 models with up to 1 breakpoint. The genetic algorithm therefore evaluated the equivalent of 414.89% of this search space (2647/638 ≈ 4.15).
Additionally, findings by Esmaeilzadeh et al. propose Peru as a potential center for the emergence of ToBRFV5, emphasizing the importance of studying isolates from diverse geographical locations. The sequencing of a ToBRFV genome from tomato seeds in Peru (MW314111) provided valuable insights into the genetic diversity of the virus, suggesting a potential origin in South America rather than the Middle East5. However, our analysis included a larger number of whole-genome nucleotide sequences, so we believe our result (Fig. 3) is more reliable; it is also consistent with the origin of the first report of the virus2.
Tomato brown rugose fruit virus diversity
Table 1 presents the estimates of average evolutionary divergence over sequence pairs within each geographic group, with isolates grouped by country. These values represent the genetic variation within each group, with a higher number indicating greater divergence. Jordan, the United Kingdom, Canada, and Mexico exhibit relatively low divergence values (0.0016 to 0.0017), suggesting a high degree of genetic similarity among the ToBRFV sequences within these populations. Peru, Israel and the State of Palestine, the USA, and Belgium show moderate divergence (0.0022 to 0.0027), indicating a fair amount of genetic variation. Germany, the Netherlands, and China have higher divergence values (0.0029 to 0.0034), which could reflect a more diverse set of sequences or a longer period of viral evolution within these groups. Italy and Greece stand out with the highest divergence values (0.0039 and 0.0041, respectively), pointing to significant genetic diversity within the ToBRFV sequences from these countries. Egypt shows an exceptionally low divergence value (0.0001), which might suggest a recent introduction of the virus or a very stable viral population with little genetic change. Notably, Turkey has a divergence value of 0, indicating no detectable variation among the sampled sequences, which could be due to a very recent spread or a highly conserved virus population. The genetic data reflect a bottleneck caused by eradication efforts, indicating that the virus is still undergoing geographical expansion4. Despite the geographic diversity, ToBRFV isolates from different regions exhibit a high level of interrelatedness, with low genetic diversity and random mutations across genomes, attributed to the introduction of infected seeds4.
These divergence estimates are crucial for understanding the genetic landscape of ToBRFV across different regions. They provide insights into the virus’s spread, mutation rates, and potential adaptation to diverse environmental conditions or host varieties. This information can be instrumental in developing targeted strategies for monitoring and controlling the spread of ToBRFV.
The study aimed to analyze the transition/transversion bias and density of single nucleotide polymorphisms (SNPs) across different regions of the ToBRFV genome. The obtained data revealed interesting patterns in genetic variation among ToBRFV genes (Table 2).
Our analysis indicated a transition/transversion bias of 2.07 across the entire ToBRFV genome, with an SNP density of 0.198 per gene length. This suggests a higher frequency of transitions compared to transversions, in line with observations in RNA viruses4. Moreover, the CP gene exhibited a slightly higher bias of 2.43, indicating a similar SNP density to the overall genome but with a preference for transitions. In contrast, the MP gene displayed a significantly higher transition/transversion bias of 3.5 and the highest SNP density at 0.218, suggesting potential evolutionary pressures or a higher mutation rate in this gene19.
Interestingly, the RdRp gene showed a bias of 2.73 and an SNP density of 0.195, hinting at a higher rate of transitions compared to the CP gene. These varying biases across ToBRFV genes may reflect distinct evolutionary dynamics and constraints in each gene, underscoring the importance of considering gene-specific factors in mutational processes20.
Comparing our findings with previous studies, we observed a high level of genetic similarity among ToBRFV sequences, with up to 43 SNPs identified7. The CP gene emerged as the most conserved region, displaying low genetic variation and high conservation levels, possibly linked to elicitor recognition mechanisms in host plants21.
In contrast, the MP gene exhibited the highest nucleotide diversity, consistent with its role in overcoming plant resistance mechanisms such as the Tm-2² gene. Notably, specific amino acids in the MP gene have been identified as critical for evading host resistance, emphasizing the significance of genetic variation in viral adaptation22.
The analysis of the average evolutionary divergence between geographic groups of ToBRFV isolates revealed insightful observations regarding the genetic relationships and variability among different populations. Our findings indicate distinct patterns of divergence within various regions, shedding light on the evolutionary dynamics of ToBRFV. Notably, the PCA of the genetic distance data, as illustrated in the provided plot (Fig. 4), reveals significant differences among geographic groups. The PCA scatter plot shows the distribution of countries based on the first two principal components, with PC1 (33.67% variance) and PC2 (12.07% variance) together capturing 45.74% of the total variance in the genetic distances. Countries such as Jordan and Turkey are positioned close to each other, indicating low genetic distances and suggesting a close genetic relationship or recent common ancestry. In contrast, Peru, Germany, Italy, Greece, and Belgium are spread out further along the principal components, displaying higher genetic distances. This suggests a greater degree of genetic variation, possibly due to prolonged separate evolution or adaptation to diverse environments. Israel and the State of Palestine, Mexico, the United Kingdom, Canada, the Netherlands, Egypt, and the USA exhibit moderate divergence. These countries balance genetic similarity and diversity, reflecting intermediate positions on the PCA plot. China, positioned distinctly on the PCA plot, shows moderate to high divergence, suggesting a unique evolutionary path or a diverse set of isolates within the region.
Our results align with previous research indicating low gene flow and limited genetic variability in ToBRFV populations. The high nucleotide identities among isolates from different regions imply a common origin, possibly linked to contamination through infected seeds or the exchange of infected fruit between countries5.
Furthermore, Abrahamian et al.4 observed that ToBRFV diverges from neutral evolutionary theory, indicating the virus is not undergoing natural selection and that accumulated mutations are low-frequency and random. The divergence from neutrality is most likely caused by a population expansion of ToBRFV, supported by the absence of any structuring in the phylogenetic tree. These insights are crucial for ensuring the continued efficacy of current diagnostic tools4.
Codon-level analysis reveals episodic positive selection in the ToBRFV MP gene
Our study conducted a selection analysis on specific codons within the ToBRFV genome to investigate episodic positive selection and evolutionary dynamics. Utilizing a likelihood ratio test (LRT), we identified episodic positive selection primarily in the MP gene, with significant findings at codons 123 and 192 (Fig. 5). These findings suggest that certain codons within the ToBRFV genome are subject to episodic positive selection, which may be indicative of adaptive changes in response to host immune pressures or other environmental factors. The detection of these sites is crucial for understanding the evolutionary dynamics of the virus and could have implications for antiviral strategies.
For codon 123 (GTt > ACt), belonging to the first set of codons analyzed, we observed a non-synonymous rate (beta) of 1568.25 with a weight of 1.00, indicating strong positive selection. The LRT value of 8.712 (p = 0.0057) confirmed episodic selection at this site. Similarly, codon 192 exhibited a significant non-synonymous rate of 228.83, with an LRT value of 4.490 (p = 0.0491), signifying episodic positive selection (Table 3).
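The relationship between the LRT statistics and the p-values quoted above can be sketched numerically. The code below assumes that the null distribution of the MEME test statistic is a mixture of χ² distributions with 0, 1, and 2 degrees of freedom in proportions 0.33, 0.30, and 0.37, as we recall from the original MEME description; these weights are an assumption of this sketch and are not stated in the present analysis. The function names are hypothetical.

```python
import math


def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def chi2_sf_df2(x):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-x / 2.0)


def mixture_p_value(lrt, weights=(0.33, 0.30, 0.37)):
    """p-value of an LRT statistic under an assumed chi-square mixture.

    `weights` are the assumed proportions of the 0-, 1-, and 2-df components.
    The 0-df component is a point mass at zero, so it contributes nothing to
    the upper tail for any positive statistic.
    """
    if lrt <= 0:
        return 1.0
    _, w1, w2 = weights
    return w1 * chi2_sf_df1(lrt) + w2 * chi2_sf_df2(lrt)


# LRT values reported in the text for codons 123 and 192.
p_codon_123 = mixture_p_value(8.712)
p_codon_192 = mixture_p_value(4.490)
print(round(p_codon_123, 4), round(p_codon_192, 4))
```

Under these assumed weights the computed values are close to the reported p-values (about 0.0057 and 0.049), which suggests the reported significance levels follow from the LRT statistics in this way; small differences may reflect rounding of the reported statistics.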
These findings suggest that certain codons in the ToBRFV genome undergo adaptive changes in response to environmental factors or host pressures, which are crucial for understanding the virus’s evolutionary dynamics and have implications for vaccine design and antiviral strategies (Table 3).
Comparing our results to previous studies, Güller et al. and Esmaeilzadeh et al. reported strong purifying (negative) selection on the MP and CP gene domains of ToBRFV. Our study corroborates these findings, indicating that negative selection is the predominant force shaping the evolution of the ToBRFV genome. Additionally, Çelik et al. highlighted strong negative evolutionary constraints on the ORFs of ToBRFV5,15,20. However, we found two codons of the MP gene under episodic positive selection, suggesting potential adaptive changes at these sites.
Furthermore, our study aligns with the findings of Hak and Spiegelman and Yan et al. regarding specific residues in the ToBRFV MP gene involved in evading host resistance mechanisms. Hak and Spiegelman identified residues in the central region of MP critical for escaping recognition, while Yan et al. demonstrated the importance of residues H67, N125, K129, A134, I147, and I168 in evading Tm-2²-mediated resistance19,22. Our identification of codon positions 123 and 192 in the central region of MP is consistent with these studies and suggests that these residues may also be important in resistance breaking. This indicates that the evolutionary pressures on the MP gene may be driving adaptations that enhance the virus's ability to overcome host defenses. The identification of these adaptive changes highlights the importance of further investigating these codons to understand their role in the virus's ability to evade host resistance mechanisms. Future studies should focus on the functional implications of these residues to develop strategies for managing ToBRFV infections.
Codon usage bias analysis of the tomato brown rugose fruit virus genome
The comparative analysis of the codon usage bias in the MP, CP, and RdRp genes of ToBRFV revealed distinct patterns of codon usage bias and adaptation to the host’s translational machinery (Fig. 6, Supplementary Table 1–3). The codon usage patterns observed in our study align with previous research findings in plant viruses23,24. Our results indicate that there are significant differences in codon preferences among the genes, reflecting the functional importance and evolutionary pressures experienced by each gene.
In our study, the GC content in the CP gene was centered around 55%, suggesting a moderate GC bias that could influence codon usage and protein structure (Fig. 6). The Effective Number of Codons (ENC) values for CP indicated a moderate level of codon bias, reflecting the balance between mutational pressure and natural selection, as similarly observed in various plant RNA viruses25. The Codon Adaptation Index (CAI) for CP peaked at around 0.7, indicating a moderate level of adaptation to the host’s translational machinery26. In contrast, Gómez et al. reported that potato virus Y (PVY) genes had lower CAI values compared to their hosts, suggesting that PVY is less adapted to its hosts than ToBRFV, which highlights the variability in codon adaptation among different plant viruses27.
For the MP gene, the GC content peaked at about 50%, slightly lower than CP, potentially impacting the amino acid composition of the protein. The ENC distribution for MP suggested a similar level of codon bias as CP, while the CAI peaks at around 0.8 indicated a higher adaptation to the host’s translational efficiency compared to CP26.
In the RdRp gene, the GC content peaked slightly above MP, hinting at a potential for a more stable RNA structure. The broader distribution of ENC values for RdRp suggested less codon usage bias, indicating a more diverse set of codons used for encoding amino acids. The CAI for RdRp, although similar to MP, was slightly lower, suggesting a lesser degree of adaptation to the host’s translational machinery compared to MP26.
Our findings support the idea that codon usage patterns are influenced by a combination of factors, including mutational bias and translational selection, rather than solely translational selection23,24,28. He et al. highlighted that these factors, along with gene length, secondary protein structure, and selective transcription, play significant roles in shaping codon usage bias in plant RNA viruses25. This is corroborated by the ENC–GC3S plot analysis in PVY, which showed clustering below the expected curve, indicating the influence of GC content on codon usage27. The observed differences in codon preferences among genes within the same genome may be attributed to varying evolutionary pressures and functional constraints, similar to the findings in Potato Virus M28. While Cheeran et al. demonstrated that natural selection predominantly shapes codon usage bias in TMV genes, our analysis of ToBRFV suggests a nuanced interplay between mutational bias and translational selection, reflecting unique evolutionary trajectories in different viral genomes29.
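The ENC–GC3s comparison mentioned above can be illustrated with a short sketch. It assumes Wright's standard expected curve, ENC = 2 + s + 29/(s² + (1 − s)²), where s is the GC content at the third position of synonymous codons (GC3s). The function names `gc3s` and `expected_enc` are hypothetical and are not taken from the software used in this study; the input sequence is a toy example, not ToBRFV data.

```python
import math

# Codons without synonymous alternatives (Met, Trp) and stop codons
# are excluded when computing GC3s.
EXCLUDED = {"ATG", "TGG", "TAA", "TAG", "TGA"}


def gc3s(cds: str) -> float:
    """Fraction of G or C at the third position of synonymous codons."""
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    thirds = []
    for i in range(0, usable, 3):
        codon = seq[i:i + 3]
        # Skip excluded codons and codons with ambiguous bases.
        if codon in EXCLUDED or any(b not in "ACGT" for b in codon):
            continue
        thirds.append(codon[2])
    if not thirds:
        return math.nan
    return sum(b in "GC" for b in thirds) / len(thirds)


def expected_enc(s: float) -> float:
    """Expected ENC when codon usage is shaped only by GC3s (mutational bias)."""
    return 2.0 + s + 29.0 / (s ** 2 + (1.0 - s) ** 2)


# Toy sequence: ATG and TGG are skipped, leaving CCG and CCA.
s = gc3s("ATGCCGCCATGG")
print(s, expected_enc(s))
```

A gene whose observed ENC falls well below `expected_enc(gc3s(gene))` is usually read as being shaped by factors other than base composition alone, which is the interpretation applied to the PVY plot cited above.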
Understanding these patterns of codon usage bias and adaptation to the host’s translational machinery in ToBRFV genes can provide valuable insights into the virus’s evolution and host-virus interactions. He et al. demonstrated that host selection pressure significantly influences the codon usage patterns of plant RNA viruses, suggesting that ToBRFV may similarly evolve to optimize its replication and survival in its host25. This is further supported by the work of Gómez et al., who demonstrated that PVY strains also show similar codon usage preferences, underscoring shared evolutionary mechanisms among plant-infecting viruses27.
The RSCU values for the CP gene of ToBRFV revealed pronounced codon bias patterns (Table 4). These patterns resemble those found in potato virus Y (PVY) strains, which also exhibit a preference for codons ending in A or U27. RSCU compares the observed frequency of a codon with the frequency expected if all synonymous codons for that amino acid were used equally, so a value of 1 indicates no bias, values above 1 indicate over-representation, and values below 1 indicate under-representation. Codons for Phenylalanine (Phe), Leucine (Leu), Serine (Ser), and Proline (Pro) exhibited varying degrees of bias. For instance, among the Proline codons, CCG was strongly preferred (RSCU = 2.450116), whereas CCC was used at close to the expected frequency (RSCU = 1.00464). Similarly, Histidine (His) and Glutamine (Gln) codons displayed a pronounced bias towards CAC and CAA, respectively, with RSCU values approximating 2. The Arginine (Arg) codons CGC and AGA also showed elevated RSCU values, indicating preferential usage within the CP gene of ToBRFV.
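To make the RSCU definition concrete, the following is a minimal sketch of the calculation: for each amino acid, the count of a codon is divided by the mean count of all its synonymous codons. It assumes the standard genetic code and an in-frame coding sequence; the function name `rscu` is hypothetical, the input is a toy sequence rather than ToBRFV data, and this is not the implementation used to produce Table 4.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def rscu(cds: str) -> dict[str, float]:
    """Relative synonymous codon usage for an in-frame coding sequence."""
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group codons into synonymous families by amino acid.
    families: dict[str, list[str]] = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values: dict[str, float] = {}
    for aa, family in families.items():
        # Stop codons and single-codon amino acids (Met, Trp) carry no bias.
        if aa == "*" or len(family) == 1:
            continue
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from this sequence
        for c in family:
            # observed count divided by the mean count per synonymous codon
            values[c] = counts[c] * len(family) / total
    return values


# Toy example: four Proline codons, three of them CCG.
print(rscu("CCGCCGCCGCCC"))
```

In the toy sequence, CCG receives an RSCU of 3.0 and CCC an RSCU of 1.0, mirroring the distinction drawn above between a strongly preferred codon and one used at the expected frequency.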
Moreover, the analysis of the w_cai column, reflecting the relative adaptiveness of each codon, unveiled crucial insights into the codon usage patterns of ToBRFV. These findings are in line with Gómez et al., who reported that PVY strains also showed a high preference for A/U-ending codons, suggesting a common evolutionary strategy among plant viruses to optimize codon usage for host adaptation27. Noteworthy, values closer to 1 in the w_cai column signify higher adaptiveness, thus delineating the efficiency of gene expression and protein synthesis within ToBRFV. This comprehensive analysis not only enhances our understanding of codon usage dynamics in ToBRFV but also bears significant implications for gene expression regulation and the formulation of targeted control strategies against the virus.
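The relationship between per-codon relative adaptiveness and the gene-level CAI can be sketched as follows. This assumes the classical definition, in which the weight w of a codon is its count in a reference gene set divided by the count of the most used synonymous codon, and CAI is the geometric mean of w over the codons of a gene. The function names, the reference counts, and the two codon families are hypothetical illustrations; how the w_cai column was computed in this study is not specified here.

```python
import math


def relative_adaptiveness(ref_counts: dict[str, int],
                          families: list[tuple[str, ...]]) -> dict[str, float]:
    """w = codon count / count of the most frequent synonymous codon."""
    w: dict[str, float] = {}
    for family in families:
        top = max(ref_counts.get(c, 0) for c in family)
        if top == 0:
            continue  # family not observed in the reference set
        for c in family:
            w[c] = ref_counts.get(c, 0) / top
    return w


def cai(cds: str, w: dict[str, float]) -> float:
    """Geometric mean of w over the codons of an in-frame sequence."""
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    # Codons without a weight, or with w == 0, are skipped in this sketch.
    logs = [math.log(w[c]) for c in codons if w.get(c, 0) > 0]
    if not logs:
        return math.nan
    return math.exp(sum(logs) / len(logs))


# Hypothetical reference counts for the Pro and His codon families only.
families = [("CCT", "CCC", "CCA", "CCG"), ("CAT", "CAC")]
ref_counts = {"CCT": 5, "CCC": 2, "CCA": 5, "CCG": 10, "CAT": 8, "CAC": 2}
w = relative_adaptiveness(ref_counts, families)
print(w["CCG"], w["CCA"], cai("CCGCCACAT", w))
```

A gene built only from the most used codon of each family reaches a CAI of 1, which is why w values closer to 1 are read as higher adaptiveness.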
He et al. investigated the codon usage patterns of the CP genes of Narcissus late season yellows virus (NLSYV), Narcissus yellow stripe virus (NYSV), and Narcissus degeneration virus (NDV) and observed a recurring preference for A/U in the third codon position across these narcissus viruses30. Comparing our findings with theirs sheds light on the shared characteristics and distinctive features of codon usage among different viruses, providing insights into the evolutionary mechanisms and selective pressures governing codon bias in viral genomes.
Therefore, the meticulous analysis of RSCU values and codon adaptation index in the CP gene of ToBRFV offers the essential groundwork for deciphering the intricate interplay between codon bias, gene expression, and viral evolution. These findings not only enrich our understanding of viral genome dynamics but also hold promise for informing future antiviral strategies and therapeutic interventions.
We also conducted an RSCU analysis for the MP and RdRp genes of ToBRFV (Tables 5 and 6). The RSCU values reveal distinct patterns of codon usage in both genes, which could have implications for gene expression efficiency, protein synthesis, and the development of control strategies against ToBRFV.
Analyzing the MP gene of ToBRFV, our results indicate specific codon preferences. Notably, Phenylalanine (Phe) codons TTT and TTC exhibit a slight preference for TTT (RSCU = 1.090909), whereas Leucine (Leu) codons favor TTG over TTA (RSCU = 1.23 and 0.76, respectively). Additionally, Serine (Ser) codons demonstrate a strong preference for TCA (RSCU = 1.777778), and Proline (Pro) codons display a significant bias towards CCG (RSCU = 2.23). Furthermore, Histidine (His), Glutamine (Gln), and Arginine (Arg) codons show notable biases towards CAC, CAG, and CGC, respectively, with RSCU values close to 2.
For the RdRp gene of ToBRFV, our analysis reveals distinct codon usage preferences. Phenylalanine (Phe) codons favor TTT over TTC, while Leucine (Leu) codons exhibit no bias between TTA and TTG. Serine (Ser) codons display a moderate bias, with TCT showing the highest RSCU value. Proline (Pro) codons show a strong bias towards CCA and CCG, and Histidine (His) and Glutamine (Gln) codons exhibit a strong bias towards CAC and CAA, respectively. Moreover, Arginine (Arg) codons display notable biases, particularly for CGA and CGC. Comparing our findings with previous studies on cucurbit-infecting tobamoviruses26, we observe similarities in codon usage preferences, particularly in the preference of U over C in most synonymous third codon positions. This consistency across tobamoviruses underscores the potential for targeted interventions in viral gene control and attenuation. Strategies such as deoptimizing synonymous codons less used by both the virus and host may offer avenues for reducing viral gene expression and virulence. However, considerations of codon-specific biases, such as increasing codons ending with CpA to align with host preferences, must be carefully evaluated to avoid adverse effects on viral survival.
Comparing our findings with those of Cheeran et al. on Tobacco Mosaic Virus (TMV), we observe commonalities in codon usage preferences, particularly in the prevalence of high-frequency codons ending with nucleotide T, indicative of shared evolutionary pressures shaping viral codon bias29. The codon usage patterns in ToBRFV show a mix of U- and G-ending codons, similar to what has been observed in Potato Virus M28. This preference, despite the GC-rich or AU-rich composition, suggests the influence of mutation pressure on codon selection.
Moreover, our study contributes to understanding the mutation pressures shaping codon usage in ToBRFV genes. For comparison, a preference for A/G has been reported in TuMV protein-coding regions, indicative of mutation pressures31. This underscores the dynamic nature of codon usage and its implications for viral evolution and adaptation. The analysis of codon usage bias in ToBRFV genes, alongside comparative studies such as that of Potato Virus M28, provides deeper insights into the evolutionary dynamics of plant viruses. These insights are essential for developing effective antiviral strategies and improving our understanding of virus-host interactions.
Overall, our analysis provides valuable insights into the codon usage patterns of ToBRFV, which could inform strategies for controlling viral gene expression and virulence. Further research into the functional implications of codon biases and their interplay with host factors is warranted to deepen our understanding of virus-host interactions and aid in the development of effective control measures against ToBRFV.
Conclusion
In conclusion, our multi-faceted analysis of ToBRFV provides valuable insights into its genetic diversity, evolutionary dynamics, and adaptive strategies. Through comprehensive phylogenetic analysis, we revealed a complex network of evolutionary relationships among ToBRFV isolates, indicating extensive global spread and significant genetic diversity. Our findings suggest ongoing gene flow among ToBRFV populations from different geographic regions, with limited genetic differentiation and notable genetic similarity observed across diverse populations. Furthermore, SNP distribution analysis unveiled differential evolutionary pressures on ToBRFV genes, emphasizing the importance of considering gene-specific factors in mutational processes. At the codon level, our study identified distinct patterns of codon usage bias and selection pressures within the ToBRFV genome. We observed varying levels of genetic diversity and evolutionary constraints among different genes, with episodic positive selection primarily observed in the MP gene. This indicates adaptive changes in response to host immune pressures or environmental factors. Comparative analysis of codon usage bias in the CP and RdRp genes provided further insights into functional constraints and adaptation to the host's translational machinery. These findings underscore the importance of understanding codon usage dynamics in the context of viral evolution and host-virus interactions. Overall, our study enhances the understanding of ToBRFV evolution, host-virus interactions, and the molecular mechanisms driving viral adaptation. These insights can inform the development of targeted strategies for monitoring and controlling the spread of ToBRFV, as well as guide future research into viral pathogenesis and the development of antiviral interventions. Continued collaboration and surveillance efforts are essential to stay ahead of emerging viral threats and safeguard global tomato production.
Data availability
All data are available in this manuscript.
References
Zhang, S., Griffiths, J. S., Marchand, G., Bernards, M. A. & Wang, A. Tomato brown rugose fruit virus: An emerging and rapidly spreading plant RNA virus that threatens tomato production worldwide. Mol. Plant Pathol. 23, 1262–1277 (2022).
Salem, N., Mansour, A., Ciuffo, M., Falk, B. & Turina, M. A new tobamovirus infecting tomato crops in Jordan. Arch. Virol. 161, 503–506 (2016).
Ghorbani, A., Rostami, M., Seifi, S. & Izadpanah, K. First report of tomato brown rugose fruit virus in greenhouse tomato in Iran. New Dis. Rep. 44, e12040 (2021).
Abrahamian, P. et al. Comparative analysis of tomato brown rugose fruit virus isolates shows limited genetic diversity. Viruses 14, 2816 (2022).
Esmaeilzadeh, F., Santosa, A. I., Çelik, A. & Koolivand, D. Revealing an Iranian isolate of tomato brown rugose fruit virus: Complete genome analysis and mechanical transmission. Microorganisms 11, 2434 (2023).
Maayan, Y. et al. Using genomic analysis to identify tomato Tm-2 resistance-breaking mutations and their underlying evolutionary path in a new and emerging tobamovirus. Arch. Virol. 163, 1863–1875 (2018).
van de Vossenberg, B. T., Dawood, T., Woźny, M. & Botermans, M. First expansion of the public tomato brown rugose fruit virus (ToBRFV) Nextstrain build; inclusion of new genomic and epidemiological data. PhytoFrontiers 1, 359–363 (2021).
Jewehan, A. et al. Isolation and molecular characterization of a tomato brown rugose fruit virus mutant breaking the tobamovirus resistance found in wild Solanum species. Arch. Virol. 167, 1559–1563 (2022).
Tamura, K., Stecher, G. & Kumar, S. MEGA11: Molecular evolutionary genetics analysis version 11. Mol. Biol. Evol. 38, 3022–3027 (2021).
Rédei, G. UPGMA (unweighted pair group method with arithmetic means). Encycl. Genet. Genom. Proteome. Inf. 8, 2068–2068 (2008).
Sievers, F. & Higgins, D. G. Clustal omega. Curr. Prot. Bioinf. 48, 3–13 (2014).
Weaver, S. et al. Datamonkey 2.0: A modern web application for characterizing selective and other evolutionary processes. Mol. Biol. Evol. 35, 773–777 (2018).
Kosakovsky Pond, S. L., Posada, D., Gravenor, M. B., Woelk, C. H. & Frost, S. D. GARD: A genetic algorithm for recombination detection. Bioinformatics 22, 3096–3098 (2006).
Elek, A. coRdon: An R Package for Codon Usage analysis and Prediction of Gene Expressivity (University of Zagreb. Faculty of Science, 2018).
Güller, A., Mustafa, U. & Randa-Zelyüt, F. Genetic diversity and population structure of tomato brown rugose fruit virus (ToBRFV) variants from Antalya province, Turkey. Notulae Botanicae Horti Agrobotanici Cluj-Napoca 51, 13356–13356 (2023).
Bananej, K., Keshavarz, T., da Silva, J. P. H. & Zerbini, F. M. Isolation and whole-genome sequencing of tomato brown rugose fruit virus from pepper in Iran. J. Plant Dis. Protect. 131, 1–7 (2023).
Yan, Z.-Y. et al. Biological and molecular characterization of tomato brown rugose fruit virus and development of quadruplex RT-PCR detection. J. Integr. Agric. 20, 1871–1879 (2021).
Caruso, A. G. et al. Tomato brown rugose fruit virus: A pathogen that is changing the tomato production worldwide. Ann. Appl. Biol. 181, 258–274 (2022).
Hak, H. & Spiegelman, Z. The tomato brown rugose fruit virus movement protein overcomes Tm-2² resistance in tomato while attenuating viral transport. Mol. Plant-Microbe Interact. 34, 1024–1032 (2021).
Çelik, A., Coşkan, S., Morca, A. F., Santosa, A. I. & Koolivand, D. Insight into population structure and evolutionary analysis of the emerging tomato brown rugose fruit virus. Plants 11, 3279 (2022).
Taraporewala, Z. F. & Culver, J. N. Structural and functional conservation of the tobamovirus coat protein elicitor active site. Mol. Plant-Microbe Interact. 10, 597–604 (1997).
Yan, Z.-Y. et al. Identification of genetic determinants of tomato brown rugose fruit virus that enable infection of plants harbouring the Tm-2² resistance gene. Mol. Plant Pathol. 22, 1347–1357 (2021).
Adams, M. & Antoniw, J. Codon usage bias amongst plant viruses. Arch. Virol. 149, 113–135 (2003).
Cardinale, D. J., DeRosa, K. & Duffy, S. Base composition and translational selection are insufficient to explain codon usage bias in plant viruses. Viruses 5, 162–181 (2013).
He, Z., Qin, L., Xu, X. & Ding, S. Evolution and host adaptability of plant RNA viruses: Research insights on compositional biases. Comput. Struct. Biotechnol. J. 20, 2600–2610 (2022).
He, M., He, C.-Q. & Ding, N.-Z. Evolution of cucurbit-infecting tobamoviruses: Recombination and codon usage bias. Virus Res. 323, 198970 (2023).
Gómez, M. M., de Mello Volotão, E., Assandri, I. R., Peyrou, M. & Cristina, J. Analysis of codon usage bias in potato virus Y non-recombinant strains. Virus Res. 286, 198077 (2020).
He, Z., Gan, H. & Liang, X. Analysis of synonymous codon usage bias in potato virus M and its adaption to hosts. Viruses 11, 752 (2019).
Cheeran, K., Suresh, K. P., Jacob, S. S., Gowda, C. S. S. & Gejendiran, N. Analysis of codon usage bias of six genes of replicase/coat protein of tobacco mosaic virus. Indian J. Agric. Res. 1, 7 (2023).
He, Z., Ding, S., Guo, J., Qin, L. & Xu, X. Synonymous codon usage analysis of three narcissus potyviruses. Viruses 14, 846 (2022).
Qin, L., Ding, S., Wang, Z., Jiang, R. & He, Z. Host plants shape the codon usage pattern of turnip mosaic virus. Viruses 14, 2267 (2022).
Acknowledgements
The study was supported by grants awarded to Abozar Ghorbani by the Nuclear Science and Technology Research Institute (NSTRI).
Author information
Contributions
AG collected the data, performed the data analysis, wrote the first draft, and revised the final manuscript. All results in this manuscript were generated by the author, who agrees to the publication of these data.
Ethics declarations
Competing interests
The author declares no competing interests.
Ethics approval and consent to participate
The research reported here did not involve experimentation with human participants or animals.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Ghorbani, A. Genetic analysis of tomato brown rugose fruit virus reveals evolutionary adaptation and codon usage bias patterns. Sci Rep 14, 21281 (2024). https://doi.org/10.1038/s41598-024-72298-y
DOI: https://doi.org/10.1038/s41598-024-72298-y