Identification of two novel poleroviruses and the occurrence of Tobacco bushy top disease causal agents in natural plants

Tobacco bushy top disease (TBTD) is a devastating tobacco disease in the southwestern region of China. TBTD in Yunnan Province is often caused by co-infection with several viral agents: tobacco bushy top virus (TBTV), tobacco vein distorting virus (TVDV), tobacco bushy top virus satellite RNA (TBTVsatRNA) and tobacco vein distorting virus-associated RNA (TVDVaRNA). In this study, two new poleroviruses were identified in two TBTD-symptomatic tobacco plants and tentatively named tobacco polerovirus 1 (TPV1) and tobacco polerovirus 2 (TPV2). RT-PCR analysis of 244 tobacco samples collected from tobacco fields in Yunnan Province showed that a total of 80 samples were infected with TPV1 and/or TPV2, and the infection rates of TPV1 and TPV2 were 8.61% and 29.51%, respectively. Thirty-three TPV1- and/or TPV2-infected tobacco samples were selected and further tested for TBTV, TVDV, TBTVsatRNA and TVDVaRNA infections. The results showed that many TPV1- and/or TPV2-infected plants were also infected with two or more of the other assayed viruses. We also surveyed TBTV, TVDV, TBTVsatRNA and TVDVaRNA infections in a total of 1713 leaf samples collected from field plants belonging to 29 plant species in 13 plant families from 11 provinces/autonomous regions in China. TVDV had the highest infection rate (37.5%), while TVDVaRNA, TBTV and TBTVsatRNA were detected at 23.0%, 12.4% and 8.1%, respectively. In addition, co-infection with TVDV, TBTV, TBTVsatRNA and TVDVaRNA was detected for the first time in 10 plant species, including broad bean, pea, oilseed rape, pumpkin, tomato and crofton weed, and one to four of the TBTD causal agents were present in samples collected from Guizhou, Hainan, Henan, Liaoning and the Inner Mongolia and Tibet autonomous regions. The results indicate that the TBTD causal agents are expanding their host range and pose a risk to other crops in the field.

High throughput sequencing and data analyses. To verify the viruses infecting the two field tobacco samples showing TBTD symptoms, the YBSh and YKMPL leaf samples were collected, quick-frozen in liquid nitrogen and stored temporarily at −80 °C. The two samples were sent to Biomarker Technologies (Beijing, China) for high throughput sequencing (HTS; RNA-Seq). After depletion of the rRNAs with the Epicentre Ribo-Zero™ kit, the RNA was sequenced on the Illumina HiSeq X-ten platform with PE150 bp (Illumina, San Diego, CA, USA). Sequence data were analyzed using CLC Genomic Workbench 9.5 (QIAGEN, Hilden, Germany) as described 19 . Reads that showed no sequence similarity to, and did not map to, the reference tobacco genome were assembled de novo with the Trinity program. The generated contigs were used as queries for BLAST searches; contigs that did not match sequences already included in the databases were sorted out as candidate genomic fragments of novel viruses.

Full genome amplification and sequencing of the viruses in samples YBSh and YKMPL.
The sequence gaps between the aligned contigs were filled by RT-PCR using virus-specific primers. The 5′- and 3′-end sequences of TBTV, TVDV, TBTVsatRNA and TVDVaRNA were determined by rapid amplification of cDNA ends (RACE) using the SMARTer RACE 5′/3′ Kit (Clontech, USA).
Scientific Reports | (2021) 11:21045 | https://doi.org/10.1038/s41598-021-99320-x www.nature.com/scientificreports/
The genomic sequences of the viruses were assembled using the DNASTAR 7.0 package (DNASTAR Inc., Madison, WI, USA), and then submitted to the GenBank database at NCBI. To characterize the two newly identified viruses, the ORFfinder software (https://www.ncbi.nlm.nih.gov/orffinder/) was used to predict their ORFs. Pairwise comparisons were performed using the EMBOSS Needle Pairwise Sequence Alignment software available at http://www.ebi.ac.uk/Tools/psa/emboss_needle/nucleotide.html. The phylogenetic relationship between the two new viruses and the other known poleroviruses was determined with the MEGA 5.0 software. The sequences were all linearized at the start of the RdRp gene and then aligned using MEGA 5.0. The alignments were used to infer neighbor-joining trees in MEGA 5.0 with the p-distance model and 1000 bootstrap replicates as described 20 .
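The p-distance model named above is simply the proportion of aligned sites at which two sequences differ. The following minimal Python sketch illustrates the calculation on already aligned sequences; the function names are hypothetical, gap columns are skipped (pairwise deletion) as an assumption, and it does not reproduce the exact behavior of MEGA or EMBOSS Needle, which treat gaps and scoring in their own ways.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns in which either sequence has a gap ('-') are skipped
    (an assumption of this sketch, not necessarily the authors' setting).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore gap columns
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable (ungapped) columns")
    return differing / compared


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity over the compared columns, i.e. 100 * (1 - p)."""
    return 100.0 * (1.0 - p_distance(seq_a, seq_b))


if __name__ == "__main__":
    # Toy alignment: five ungapped columns, one mismatch.
    print(p_distance("ACGT-A", "ACGTTT"))
    print(percent_identity("ACGT-A", "ACGTTT"))
```

Because gap columns are excluded here, the identity returned can be higher than a value computed over the full alignment length.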
Virus detection and sequence confirmation of the two new poleroviruses in the field tobacco samples. To detect the two newly identified poleroviruses in the field-collected samples, total RNA was extracted from 817 tobacco leaf samples using the TRIpure Reagent (Bioteke, Beijing, China) for reverse transcription-polymerase chain reaction (RT-PCR). RT-PCR was performed using specific primers based on the sequences of the two new poleroviruses (Supplementary Table 2) and the PrimeScript™ One-Step RT-PCR Kit Ver. 2 (TaKaRa Biotechnology, Dalian, China) according to the manufacturer's instructions. Positive RT-PCR products were gel purified and cloned individually into the pMD19-T vector (TaKaRa). The resulting plasmid DNAs were sequenced by BGI (Guangzhou, China) and the resulting viral sequences were assembled using the DNASTAR 7.0 package (DNASTAR Inc., Madison, WI, USA).

Detection of TBTV, TVDV, TBTVsatRNA and TVDVaRNA infections in the field-collected samples.
To determine the occurrence of TBTV, TVDV, TBTVsatRNA and TVDVaRNA in the field samples, virus-specific primers (Supplementary Table 2) were used to detect the four agents simultaneously in the field-collected samples by multiplex RT-PCR as previously described 15 .
Identification and preservation of plant samples. The

Results
Symptomology of TBTD and viruses detected in the two TBTD affected tobacco plants by HTS. The most frequently observed TBTD symptoms on flue-cured tobacco (Nicotiana tabacum) in the field include small leaves, irregular necrotic lesions on leaves, yellowing or chlorosis, internode shortening and stunting (Fig. 1). Tobacco plants infected at an early stage became chlorotic and significantly stunted and failed to flower (Fig. 1A, B); such fully infected plants were unmarketable. Plants infected late developed proliferating lateral branches, small leaves, foliar yellowing or chlorosis and stunting, but flowering was not impaired (Fig. 1C, D); only the lower, uninfected leaves of these plants were marketable. Two tobacco plants with TBTD symptoms, YBSh and YKMPL, were collected for HTS (RNA-Seq) to verify the viruses infecting TBTD-symptomatic field tobacco plants. After removal of failed reads, a total of 36,744,321 and 33,356,805 clean RNA reads were obtained from the YBSh and YKMPL samples, respectively. These clean reads, generated on the Illumina HiSeq X-ten platform with PE150 bp, were assembled using CLC Genomic Workbench 9.5 (QIAGEN, Beijing, China) as described 19 . De novo assembly generated 56,964 contigs for YBSh and 52,916 contigs for YKMPL that were larger than 200 bp. In sample YBSh, 122,028 reads were associated with TBTVsatRNA, 22,370 with TVDVaRNA, 1775 with TBTV, 15,430 with TVDV, 6397 with TPV1 and 8594 with TPV2. In sample YKMPL, 116,779 reads were associated with TBTVsatRNA, 1124 with TVDVaRNA, 14,388 with TBTV, 2923 with TVDV, 277 with TPV1 and 339 with TPV2.
The resulting contigs were subjected to BlastX and BlastN searches against the NCBI databases; the results revealed that both tobacco plants, YBSh and YKMPL, were infected with six viruses.
The results showed that both the YBSh and YKMPL tobacco samples were co-infected with six different viral agents: TBTV, TBTVsatRNA, TVDV, TVDVaRNA and two novel poleroviruses (designated as isolates YBSh and YKMPL, respectively). Based on the results of sequence alignment, we tentatively named these two new poleroviruses tobacco polerovirus 1 (TPV1) and tobacco polerovirus 2 (TPV2). To validate the HTS results, total RNA isolated from the YBSh and YKMPL samples and from a healthy tobacco sample was analyzed by RT-PCR. TPV1 and TPV2 detection primers were designed according to the virus contigs identified through HTS, while the primers and multiplex one-step RT-PCR used to detect TBTV, TBTVsatRNA, TVDV and TVDVaRNA were described previously 15 . The results showed that PCR products representing the six viruses were indeed present in the YBSh and YKMPL samples, but not in the sample from the healthy plant (Figure S1A-C).
and YKMPL isolates, an overlapping-amplicon cloning strategy was used in a series of sequential RT-PCRs with virus-specific primers designed according to the virus sequences from HTS. The primers, amplification strategies (positions in the virus genomes), amplicon sizes and the specific sequencing chemistry are provided in the supplementary materials (Fig. S2; Table S3). At least three clones from each amplicon were sequenced on both strands using M13 forward and reverse primers as well as specific sequencing primers if necessary. The full-length genome sequences of the two TBTV isolates were both determined to be 4152 nt (GenBank accession numbers: TBTV-YBSh, MW579556; TBTV-YKMPL, MW579557). Pairwise comparison of the complete nucleotide sequences showed that TBTV-YBSh shared 97.0% nt sequence identity with TBTV-YKMPL.
TBTV-YBSh and TBTV-YKMPL shared 94.7% (TBTV-MD-II, KM067277) to 98.7% (TBTV-MD-I, KM016225) and 85.6% (TBTV-YDHo, KX216406) to 98.8% (TBTV-MD-I) nt sequence identity, respectively, with other TBTV isolates available in GenBank. The full-length genomic sequences of the two TVDV isolates were determined to be 5920 nt (GenBank accession numbers: TVDV-YBSh, MW579560; TVDV-YKMPL, MW579561). Pairwise comparison of the complete nucleotide sequences showed that TVDV-YBSh shared 99.5% nt sequence identity with TVDV-YKMPL.

Sequence analysis and genome organization of TPV1 and TPV2. Two new poleroviruses, TPV1 and TPV2, were both found in the YBSh and YKMPL field samples through HTS. The nearly full-length genome sequences of isolates TPV1-YBSh (GenBank accession number: MW579552), TPV1-YKMPL (MW579553), TPV2-YBSh (MW579554) and TPV2-YKMPL (MW579555) were confirmed to be 5722 nt, 5725 nt, 5907 nt and 5912 nt, respectively, by a series of sequential RT-PCRs and the SMARTer® RACE 5′/3′ kit (Clontech Laboratories, Inc., USA) with virus-specific primers based on the HTS data, followed by Sanger sequencing. Pairwise comparison of the nearly complete sequences showed that TPV1-YBSh shared 99.7% nt identity with TPV1-YKMPL, and TPV2-YBSh shared 98.9% nt identity with TPV2-YKMPL. The genomic sequences of TPV1-YBSh and TPV2-YBSh were therefore used in the subsequent sequence analysis. The genomic nucleotide sequence identity between TPV1 and TPV2 is 54.2%, suggesting that they are two distinct species. Blast search results indicated that TPV1 and TPV2 had the highest nt sequence identity with known poleroviruses. The genome structures of TPV1 and TPV2 were predicted using the ORFfinder software (https://www.ncbi.nlm.nih.gov/orffinder). The genomic organization and structure of TPV1 and TPV2 are typical of poleroviruses when compared with PLRV, and both TPV1 and TPV2 contain seven ORFs: ORF0, ORF1, ORF1-ORF2, ORF3a, ORF3, ORF4 and ORF3-ORF5 (Fig. 2) (Table 3). It is worth noting that TPV1 ORF5 showed the greatest divergence from those of other poleroviruses. In contrast, TPV1 ORF3 has the highest nt or aa sequence identities with those of other poleroviruses. The aa sequence identities between the TPV1 P4 or the ORF3-ORF5 readthrough protein and those of 19 other poleroviruses are all less than 90%.
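As a rough illustration of what ORF prediction involves, the sketch below scans the three forward reading frames of a sequence for ATG-to-stop stretches. It is a simplified, hypothetical stand-in and not the ORFfinder algorithm: the function name and the `min_codons` parameter are assumptions, only the forward strand is scanned, and the ribosomal frameshift (ORF1-ORF2) and stop-codon readthrough (ORF3-ORF5) products typical of poleroviruses would not be recovered by such a simple scan.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 2):
    """Return (frame, start, end) tuples for forward-strand ORFs.

    start is the 0-based index of the ATG; end is the index just past
    the stop codon, so seq[start:end] is the ORF including its stop.
    min_codons counts codons from the ATG up to, but excluding, the stop.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA notation as well
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if n_codons >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None  # look for the next ORF in this frame
    return orfs


if __name__ == "__main__":
    toy = "CCATGAAATTTTAGGG"  # toy sequence, not viral data
    for frame, start, end in find_orfs(toy):
        print(frame, start, end, toy[start:end])
```

In practice, predicted ORFs would still need to be compared with a reference genome organization such as that of PLRV, as done in the text.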
The results also showed that TPV1 shared its highest nt and aa sequence identities (over 96%) with TV2 in ORF0, ORF1 and ORF1-ORF2, but only 54.0-65.9% nt and 38.5-62.6% aa identity with TV2 in ORF3, ORF4 and ORF3-ORF5. Excluding TV2, the highest aa sequence identities that TPV1 shared with other poleroviruses were 54.7% (ORF0), 67.0% (ORF1), 71.7% (ORF1-ORF2), 94.1% (ORF3), 88% (ORF4) and 77.7% (ORF3-ORF5). As shown in Table 3, TPV1 shares high identities with TV2 in the 5′-proximal ORFs and with TuYV in the 3′-proximal ORFs, which suggests that recombination events may have occurred among TPV1, TV2 and TuYV. These values fall below the threshold of the current species demarcation criteria for the Solemoviridae 9 , indicating that TPV1 should be a novel species in the genus Polerovirus.
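The comparison against the demarcation threshold can be made explicit with a short check. The sketch below is illustrative only: it assumes the criterion is applied as "aa identity below 90% in any gene product" (the 90% figure is the one used elsewhere in this paper), the function names are hypothetical, and the input values are the highest TPV1 aa identities with poleroviruses other than TV2 as quoted in the paragraph above.

```python
# Threshold assumed for this sketch: less than 90% aa identity
# in any gene product indicates a distinct species.
AA_IDENTITY_THRESHOLD = 90.0

# Highest aa identities (%) of TPV1 with other poleroviruses, excluding TV2,
# copied from the text above.
TPV1_BEST_AA_IDENTITY = {
    "ORF0": 54.7,
    "ORF1": 67.0,
    "ORF1-ORF2": 71.7,
    "ORF3": 94.1,
    "ORF4": 88.0,
    "ORF3-ORF5": 77.7,
}


def products_below_threshold(identities, threshold=AA_IDENTITY_THRESHOLD):
    """List gene products whose best aa identity is below the threshold."""
    return [name for name, value in identities.items() if value < threshold]


def meets_demarcation(identities, threshold=AA_IDENTITY_THRESHOLD):
    """True if at least one gene product falls below the threshold."""
    return len(products_below_threshold(identities, threshold)) > 0


if __name__ == "__main__":
    print(products_below_threshold(TPV1_BEST_AA_IDENTITY))
    print(meets_demarcation(TPV1_BEST_AA_IDENTITY))
```

With these inputs only ORF3 stays above the assumed threshold, which matches the statement that the coat protein gene is the most conserved.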

Survey of the TPV1 and TPV2 infections in field tobacco plants. To verify the occurrence of TPV1 and TPV2 in the field, 244 leaf samples were randomly selected from 817 tobacco field samples showing virus-like symptoms that had been collected from 2013 to 2018 in Yunnan Province, and were tested for TPV1 and TPV2 infections by RT-PCR. The results showed that 8 samples were infected with TPV1 alone (detection rate of 3.28%) and 59 samples were infected with TPV2 alone (24.18%) (Table 6). In addition, 13 samples were infected with both TPV1 and TPV2 (5.33%). The overall detection rate of TPV1 and/or TPV2 was 32.79%, suggesting that TPV1 and TPV2 have been common on tobacco in recent years.
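The detection rates quoted above, and the per-virus infection rates of 8.61% and 29.51% given in the abstract, all follow from the three counts in this paragraph. The short sketch below reproduces that arithmetic; the function name and dictionary keys are hypothetical, and only the counts (8, 59, 13 and 244) come from the text.

```python
def detection_rates(only_a: int, only_b: int, both: int, total: int) -> dict:
    """Percent detection rates (two decimals) from single/double infection counts."""
    def pct(count: int) -> float:
        return round(100.0 * count / total, 2)

    return {
        "only_a": pct(only_a),
        "only_b": pct(only_b),
        "both": pct(both),
        "a_total": pct(only_a + both),            # every sample carrying virus A
        "b_total": pct(only_b + both),            # every sample carrying virus B
        "a_or_b": pct(only_a + only_b + both),    # samples carrying either virus
    }


if __name__ == "__main__":
    # A = TPV1, B = TPV2; counts taken from the survey paragraph.
    rates = detection_rates(only_a=8, only_b=59, both=13, total=244)
    for name, value in rates.items():
        print(f"{name}: {value:.2f}%")
```

Note that the per-virus totals count the 13 doubly infected samples for both TPV1 and TPV2, so they do not sum to the overall rate.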
Then 33 samples infected with TPV1, TPV2, or TPV1 + TPV2 were selected and tested for TBTV, TVDV, TBTVsatRNA and TVDVaRNA infections by RT-PCR with virus-specific primers. The results showed that TPV1 and TPV2 always co-infected field plants with two to four TBTD causal viruses (Table 7). For example, five samples were co-infected with all six viruses, and 11 samples were co-infected with five different viruses. No single TPV1 or TPV2 infection was detected, and TPV1 or TPV2 always occurred together with both TVDV and TVDVaRNA. We speculate that TPV1 and TPV2 may have a synergistic relationship with the causal agents of TBTD; the interactions among TPV1, TPV2 and the causal agents of TBTD merit further study.
Table 1). In addition, 65 pepper leaf samples from nine provinces/autonomous regions and 83 tomato leaf samples from eight provinces/autonomous regions of China were also collected. All the sampled plants showed virus-like symptoms. Eleven crofton weed leaf samples were also collected from Guizhou, and 1 purple perilla and 3 dahlia samples were collected from Liaoning. These collected samples were then tested for TBTV, TVDV, TBTVsatRNA and TVDVaRNA infections by RT-PCR with virus-specific primers as described by (Table 8). Among the 12 plant families found to be infected, Fabaceae, Brassicaceae, Cucurbitaceae, Caricaceae, Poaceae, Araceae, Araliaceae, Dioscoreaceae, Liliaceae and Amaranthaceae had not previously been reported as hosts of TBTV, TVDV, TBTVsatRNA and TVDVaRNA. In this study, sticktight, broad bean, pea, oilseed rape, pumpkin, tomato, crofton weed and black nightshade plants were found for the first time to be infected with all four assayed agents, TBTV, TVDV, TBTVsatRNA and TVDVaRNA (Table 8). The presence of the causal agents of TBTD was confirmed for the first time in Guizhou, Hainan, Henan, Liaoning, Inner Mongolia and Tibet, in addition to Yunnan. These results suggest that the causal agents of TBTD are widely distributed in China and have spread to a broader range of plant hosts.
In this study, 663 of the 1713 tested leaf samples were found to be infected with at least one of TBTV, TVDV, TBTVsatRNA and TVDVaRNA, an overall detection rate of 38.70% (Table 9). Among the four assayed agents, the infection rate of TVDV was the highest (37.5%) and that of TBTVsatRNA was the lowest (8.1%) (Fig. 4). Combinations of two to four causal agents of TBTD were common in the field. In total, 364 samples were co-infected with two causal agents of TBTD (TBTV + TVDV, TBTV + TBTVsatRNA or TVDV + TVDVaRNA), accounting for 54.9% of the 663 infected samples. Fifty-one samples were co-infected with three causal agents (TBTV + TVDV + TVDVaRNA or TBTV + TVDV + TBTVsatRNA; 7.7% of the 663 infected samples), and 86 samples were co-infected with all four causal agents (13.0% of the 663 infected samples). Co-infection with two agents was thus the predominant combination in the field and was more common than co-infection with three or four agents. In addition, 156 samples were infected with TVDV alone (9.11%) and 6 samples were infected with TBTV alone (0.35%), whereas no single TBTVsatRNA or TVDVaRNA infection was detected. TVDV was found in most of the agent combinations, indicating that TVDV plays a very important role in the occurrence of TBTD in the field. Twenty-one samples were infected with TBTV or TBTVsatRNA in the absence of TVDV, which suggests that other viral agents may assist TBTV and TBTVsatRNA in completing vector transmission.
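The counts in this paragraph form a partition of the 663 infected samples, which can be checked together with the quoted percentages. The sketch below is a minimal consistency check; the variable and function names are hypothetical, and only the counts are taken from the text. As in the text, shares for the co-infection classes are expressed relative to the 663 infected samples and single-infection rates relative to all 1713 tested samples.

```python
TESTED = 1713  # leaf samples tested

# Infected samples by number of TBTD causal agents detected (from the text).
INFECTION_CLASSES = {
    "TVDV only": 156,
    "TBTV only": 6,
    "two agents": 364,
    "three agents": 51,
    "four agents": 86,
}


def share(count: int, total: int, digits: int) -> float:
    """Percentage of total, rounded to the given number of decimals."""
    return round(100.0 * count / total, digits)


def summarize(classes: dict, tested: int) -> dict:
    """Overall detection rate plus each class as a share of infected samples."""
    infected = sum(classes.values())
    summary = {"infected": infected, "overall_rate": share(infected, tested, 2)}
    for name, count in classes.items():
        summary[name] = share(count, infected, 1)
    return summary


if __name__ == "__main__":
    for key, value in summarize(INFECTION_CLASSES, TESTED).items():
        print(key, value)
    # Single infections as a share of all tested samples.
    print(share(156, TESTED, 2), share(6, TESTED, 2))
```

The five classes sum to 663, so the single and mixed infection counts reported here are mutually consistent.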

Discussion
Several viral agents have been reported to cause TBTD in some countries. For example, an early study suggested that TBTD in Zimbabwe was caused by a co-infection of TVDV and TBTV 1 21,22 . In virus-infected plants, large numbers of virus-derived siRNAs (vsiRNAs) are generated along with the viral genomic RNAs, and these vsiRNAs can be identified and assembled into virus contigs or even full-length viral genomes [23][24][25] . This technology can also help to identify new viruses associated with TBTD. In this study, HTS was used to analyze two tobacco samples showing typical TBTD-like symptoms from two different locations. Based on the assembled sequences, we determined the near full-length sequences of two new poleroviruses, TPV1 and TPV2. Sequence alignment showed that TPV1 shares its highest nucleotide sequence identity (79.1%) with TV2. The deduced amino acid sequences of the TPV1 P4 protein and the readthrough protein (P3-P5) share less than 90% identity with those of viruses in the genus Polerovirus. Sequence alignment also showed that TPV2 shares its highest nucleotide sequence identity (70.4%) with TVDV. The predicted amino acid sequences of the TPV2 proteins, except P3, share less than 90% identity with those of viruses in the family Solemoviridae. Therefore, we conclude that TPV1 and TPV2 are two novel poleroviruses.
The symptoms of Ethiopian tobacco bushy top disease are similar to those of TBTD in China, and the disease is also caused by several different poleroviruses and umbraviruses 10 . A recent study showed that ETBTV can complete its vector transmission with the assistance of cowpea polerovirus 1 (genus Polerovirus) as well as PLRV 10,26 , and results from our group also showed that TBTV can complete its aphid transmission with the assistance of barley yellow dwarf virus GAV (genus Luteovirus; unpublished data). It can be inferred that other poleroviruses may assist TBTV in accomplishing its aphid transmission. Our survey results also revealed that 0.35% and 0.88% of the field samples were infected with TBTV or TBTV + TBTVsatRNA, respectively, without TVDV co-infection (Table 9). This means that a polerovirus other than TVDV could help TBTV accomplish its aphid transmission, and TPV1 and/or TPV2 may be potential helper viruses for the aphid transmission of TBTV in nature. The phylogenetic analysis showed that TPV1 is closely related to PLRV and TV2, and that TPV2 is closely related to TVDV. Because TPV1 and TPV2 often co-infect with one or more of the other four known TBTD causal agents, we speculate that both TPV1 and TPV2 might have important roles in the induction of TBTD-like symptoms and/or in the TBTD disease cycle. However, whether TPV1 and TPV2 are responsible for TBTD in China remains unknown.
[16][17][18] . In this study, we have determined that, in addition to tobacco, a total of 21 plant species in 12 families can be infected with at least one of the TBTD causal viruses. In addition, we have found TVDV + TVDVaRNA + TBTV + TBTVsatRNA co-infection in crofton weed in Guizhou Province, tomato plants in Hainan Province, and tobacco, sticktight, broad bean, pea, oilseed rape, pumpkin, tomato as well as black