Characterization of Papaya ringspot virus isolates infecting transgenic papaya ‘Huanong No.1’ in South China

In 2006, the release and cultivation of the genetically modified papaya cultivar ‘Huanong No.1’ successfully controlled the destructive papaya ringspot disease caused by Papaya ringspot virus (PRSV) in South China. However, some transgenic papaya plants from Guangdong and Hainan are found infected by PRSV. In this study, Field investigation was carried out and susceptible transgenic papaya samples were collected during 2012–2016. Twenty representative isolates were artificially inoculated into Cucurbita pepo and commercialised ‘Huanong No.1’ papaya, and results indicated that the plants showed obvious disease symptoms. Phylogenetic analysis of CP genes of 120 PRSV-infected isolates showed that PRSV can be divided into three groups. Isolates from Guangdong and Hainan belong to Group III, which is further divided into two subgroups. The isolates collected in this study have greatly diverged from the previously reported dominant strains Ys, Vb and Sm in South China, indicating that they belong to a new lineage. Further analysis showed a highly genetic differentiation between isolates, and 27.1% of the isolates were identified as recombinants on the basis of CP nucleotide sequences. These results indicate that the genetic variation of PRSV and the formation of the new virus lineage may explain the loss of transgenic papaya resistance in South China.

Papaya ringspot virus (PRSV) of the genus Potyvirus and the family Potyviridae is a threat to the papaya and cucurbit industries in tropical and subtropical regions. Papaya infected with PRSV exhibits symptoms such as leaf chlorosis and mosaic, water-soaked oily streaks on the petiole, ringspots on the fruit, young leaf distortion and even shoestring-like symptoms. Consequently, fruit production and quality can be severely decreased, and fruit sugar levels reduce by 50% or more. PRSV is grouped into papaya-infecting type-P (PRSV-P) and non-papaya-infecting type-W (PRSV-W). PRSV-P isolates infect plants of the families Caricaceae, Cucurbitaceae and Chenopodiaceae, whereas PRSV-W isolates infect plants of the families Cucurbitaceae and Chenopodiaceae 1 . PRSV-P and PRSV-W are serologically indistinguishable, but can be distinguished by specific hosts and are transmitted non-persistently by several species of aphids 2 . The virus has a positive single-stranded RNA of approximately 10 kb and encodes a polyprotein that is cleaved into 10 mature functional proteins 3 , namely, protein 1 protease, helper component protease, P3 protein, six kilodalton peptide, cytoplasmic inclusion, 6K2, viral protein genome-linked, nuclear inclusion A protease, nuclear inclusion b RNA-dependent RNA polymerase (NIb), coat protein (CP) 4 and additional peptide P3N-PIPO 5 . Several reports on PRSV diversity, strain identification, phylogeny and recombination focused on the CP gene [6][7][8][9][10] .
From a global perspective, the most effective approach to prevent and control PRSV is to cultivate virus-resistant transgenic papaya cultivars. The world's first commercially genetically modified papaya containing the PRSV CP gene was developed in Hawaii (USA) in 1998. Commercialised cultivars 'SunUp' and 'Rainbow' show high resistance to Hawaii PRSV isolates [11][12][13] . These two cultivars have been planted in Hawaii and remained resistant over the past two decades (personal communication). However, those transgenic papayas exhibit varying defence mechanisms against PRSV isolates from other geographical regions. For example, symptoms in papaya infected with isolates from the Bahamas, Florida or Mexico are delayed and mitigated, whereas those in papaya

Field investigation of the virus infection. A total of 495 susceptible samples showing symptoms of leaf
mosaic and malformation, streaks on stems and petioles, and ringspot on the fruits were collected from 'Huanong No.1'papaya plantations in Guangdong and Hainan during 2012-2016 (Fig. 1). In 2016, the disease incidence in Guangdong was generally 5-30% but ≥60% in few seriously diseased areas, whereas that in Hainan was usually 80% but 90-100% in some areas. This finding suggests that 'Huanong No.1' papaya plants in South China are susceptible to viruses and that the disease incidence in Hainan is generally higher than that in Guangdong. Then, all of these samples were identified by RT-PCR and then by sequence analyses. Results showed that 128 of those samples were infected with PRSV, including 69 samples in Guangdong (41 in  Symptoms of squash and papaya inoculated with representative isolates. The 128 samples infected with PRSV isolates were divided into four groups according to geographical regions (Guangdong and Hainan) and leaf symptom characteristics (mosaic and distortion). Five samples were selected from each group, and 20 purified isolates were obtained by inoculation on Chenopodium quinoa. Then, isolates from the four groups were utilised to inoculate squash cultivar 'Zaoqingyidai' (C. pepo) and 'Huanong No.1' papaya. Results showed that all of the isolates caused similar viral symptoms in the inoculated plants of squash and papaya. In specific, irregular chlorosis along the leaf veins on squash, and distortion, irregular chlorosis and green islands on papaya were observed (Fig. 2). The RNAs of those diseased plants were extracted, and PRSV CP genes were cloned and sequenced. The amino acid sequences of the infected PRSV isolates were essentially coincident with those of the original inoculated isolates. These results reveal that the PRSV isolates obtained from South China can indeed infect transgenic papaya cultivar 'Huanong No.1' and break the resistance against PRSV in South China.

Analysis of sequence variation in CP genes.
Sequencing results showed that the CP genes of the 128 isolates in this study were approximately 864-867 bp in nucleotide length and were deposited in GenBank (Table 1S). Among the 128 isolates, 69 were from Guangdong and 59 from Hainan. In addition, five PRSV complete CP sequences originated from Hainan were retrieved from GenBank. Consequently, 133 complete CP sequences were included in the population genetic analysis of the virus. In the meantime, 23 complete CP sequences from different countries and regions, including the three previously reported dominant strains (Ys, Vb and Sm) in South China 18 , were downloaded from the GenBank as references (Table 1S) (Fig. 1S). These results indicated that the collected PRSV isolates differed from Ys but were relatively similar to Vb and Sm in South China. Therefore, five Guangdong isolates (NS1, P1, Z1, ZS1 and GZ21) and five Hainan isolates (SD1, SD50, S1, S18 and HA1), as well as Ys, Vb and Sm, were chosen for alignment analysis of the amino acid sequence of CP genes. A large number of EK (glutamic acid and lysine) repeats, including three conserved DAG, WCIEN and QMKAAA regions, were found in all isolates, but several domain sites of amino acid sequences showed a distinct variation (Fig. 3). Among them, three to four EK repeats were added and two sites were changed (Thr 34 to Ala 38    Arg 235 , Leu 233 to Arg 237 , Cys 244 to Leu 248 , Asp 245 to Arg 249 , Arg 246 to Asn 250 , Pro 247 to Thr 251 , Ala 248 to Ser 252 , Ala 249 to Arg 253 and His 276 to Asp 280 ) in 10 isolates as compared with that in Ys, and two sites were different (Val 59 to Ala 59 , Asn 127 to Ser 127 ) between the Guangdong and Hainan isolates. Diversity analysis at the nucleotide level showed that 107 haplotypes were identified in the 133 complete sequences of CP genes with an overall haplotype diversity of 0.995 and a nucleotide diversity of 0.057 when the sequences from the Guangdong and Hainan isolates were considered together (Table 1). When the sequences were considered according to individual geographic location, higher nucleotide (0.054 vs. 0.045) but lower haplotype (0.986 vs. 0.995) diversity was found in Guangdong than in Hainan. These results indicated that the sequence differences of the Guangdong isolates were greater than those of the Hainan isolates, but the genetic diversity of population in Hainan was higher than that in Guangdong. Overall, the haplotype diversity in both regions was greater than 0.5 and the nucleotide diversity was higher than 0.005, indicating that the PRSV population in both regions had a high degree of genetic diversity.
Analysis of genetic differentiation. Genetic differentiation of 133 CP gene sequences of the Guangdong and Hainan isolates showed that the K ST , Snn and F ST of the CP gene sequences were 0.128, 0.985 and 0.227, respectively ( Table 2). The P values of these indicator values from the outcome of 1000 Bernoulli trials were 0.000, which showed statistically significant differences at the 0.001 level. Hence, the null hypothesis that the two populations from Guangdong and Hainan are not genetically differentiated was rejected. The above three indicator values revealed a significant differentiation between the two PRSV populations from Guangdong and Hainan. China, NS1, P1, Z1, ZS1, and GZ21 are Guangdong isolates, while SD1, SD50, S1, S18, and HA1 are Hainan isolates. Different letters in the same column indicate different amino acids in that position. The yellow regions mean the glutamic acid and lysine (EK) repeat patterns, the gray and purple regions mean the different amino acids between this study's isolates and three previously reported dominant strains in South China. Multiple alignments were performed with CLUSTALW program included in MEGA 6.  Results showed that 36 of the isolate sequences (27.07% of total 133 isolates) were identified with a high level of confidence as recombinants (Table 3), which could be divided into two recombination types according to the breakpoints of the sequence alignment. Twenty-eight isolate sequences from Guangdong and four sequences from Hainan (breakpoints at 829-411 bp) were identified as recombinants by all seven methods, which were assigned as type I; four isolate sequences from Guangdong (breakpoints at 854-423 bp) were identified as recombinant by six methods except GENECONV(G), which were assigned as type II. These results showed that the recombinants of the Guangdong isolates (46.3% of 69 isolates) were significantly higher than those of the Hainan isolates (6.3% of 64 isolates), and most of the recombinants belonged to recombination type I.

Phylogenetic analysis.
To rule out the interference of recombination in phylogenetic tree construction 25 , we removed the sequences of 36 recombinants and used the remaining 97 CP sequences of the Guangdong and Hainan isolates to construct a phylogenetic tree. The phylogenetic tree results showed that 120 isolates were divided into three groups based on the nucleotide sequences of the CP genes ( Fig. 4). All American and Indian isolates were clustered into Group I, whereas Asian isolates from Thailand, South Korea, China including Taiwan and South China were clustered into Group II, including three Hainan isolates (Vb_HN, P_HN and LM_HN) and three previously reported dominant South China strains (Ys, Vb and Sm). All 91 isolates collected from Guangdong and Hainan were clustered into Group III, and Group III was further divided into two subgroups. Most Hainan isolates were grouped into subgroup I, and all Guangdong isolates were grouped into subgroup II.
The genetic distance values between groups and within groups of the isolates are shown in Table 2S. The genetic distance values within three groups ranged from 0.059 ± 0.005 to 0.089 ± 0.005, which were lower than those between groups (0.079 ± 0.006 to 0.115 ± 0.008). In particular, the genetic distance values between Groups III and II and between Groups III and I were 0.115 ± 0.008 and 0.115 ± 0.009, respectively, which were higher than those between Groups II and I (0.079 ± 0.006). This result indicates that Group III is a new emergent lineage of PRSV.
Selection pressures acting on PRSV CP genes. A large number of selection pressure sites (Table 4) were detected in the 133 CP gene sequences of PRSV from the Guangdong and Hainan isolates with the fixed effects likelihood (FEL), internal branches fixed-effects likelihood (IFEL) and mixed effects model of evolution (MEME) methods in HyPhy package. A total of 92 purifying (negative) (94.8%) and 5 positive selection sites (5.2%) were detected by FEL, and 64 purifying (92.8%) and 5 positive selection pressure sites (7.2%) were detected by IFEL. The number of purifying selection pressure sites was much higher than that of positive selection pressure sites, indicating that the CP genes of PRSV were mainly acted under the pressure of purifying (negative) selection.

Neutrality tests. The Tajima's D, Fu & Li's D and Fu & Li's F values for PRSV Group III from the Guangdong
and Hainan isolates were negative, but the data were not significant (Table 5). This result indicates that the PRSV populations in Guangdong and Hainan were relatively stable.

Discussion
Papaya is an important fruit crop which is widely cultivated in the tropical and subtropical regions of South China and other countries worldwide. PRSV is the most widespread and damaging virus that infects papaya. The large-scale cultivation and commercialisation of antiviral transgenic papaya have been successful in Hawaii and South China for a long time 17,20,26,27 . However, recent studies have found that PRSV infects 'Huanong No.  Table 2. Statistical tests for differentiation between two Papaya ringspot virus populations from Guangdong and Hainan based on the CP gene nucleotide sequences. ns, not significant; *0.01 < P < 0.05; **0.001 < P < 0.01; ***P < 0.001. K ST and S nn were implemented in DnaSP 5.10, while F ST was in Arlequin 3.5. The hypothesis of deviation from null population differentiation was tested by 1000 permutations of the raw data.
mosaic and Lc with leaf curl. In the present study, 20 representative isolates from Guangdong and Hainan were inoculated onto C. pepo, and the leaves exhibited irregular chlorosis along the leaf veins (Fig. 2). This symptom was obviously different from those caused by the four strains mentioned above. Furthermore, the CP sequence alignment between these strains (Ys, Vb and Sm) and the 133 isolates from Guangdong P-value range   Z5,FM26,GZ30,GZ35,NK11,K6,K9,NO13,T5,  T6,TF3,TF4,WS1,Y10,Y9,YJ4,ZC1,ZC2,HZ1,  H22,LD1,MZ1,QY1,ZS2,ZS5,ZS6,SD24,      that these 133 isolates were significantly different from Ys in South China but relatively similar to Vb and Sm. The amino acid sequences of 10 representative Guangdong and Hainan isolates were also compared with Ys, Vb and Sm (Fig. 3), and results showed several domain sites that had been changed. In the 10 Guangdong and Hainan isolates, three to four E or K were added in the sites of EK repeats as compared with Ys, Vb and Sm. Bateson et al. 28 and Jain et al. 29 analysed the CP-coding region of PRSV isolates from Vietnam and India, respectively, and found that the number of amino acids also changed in the 'EK' region. The EK region is an important component located on the outer surface of the CP 30 protein and is closely related to the aphid transmission element 31-33 .
Mulot et al. 34 revealed that the membrane-bound Ephrin receptor (Eph) in Myzus persicae is a novel aphid protein which is involved in the transmission of the Turnip yellows virus (TuYV) and further confirmed that the minor capsid protein of TuYV, essential for aphid transmission, was able to bind the external domain of Eph in yeast. Therefore, the addition or deletion of amino acids in the 'EK' region may influence aphids on the transmission of PRSV, so that the original dominant strain cannot be effectively and continuously transmitted. After the virus evolves a new 'EK' region suitable for aphid transmission, the new isolates gradually disseminate and substitute for the original strains. In addition, the conserved DAG, WCIEN and QMKAAA domains in the CP region were also postulated to be associated with virus transmission by aphids 30,31,33 . Meanwhile, outside the 'EK' domain, we found that two amino acid residues (Thr 34 to Ala 38 , Thr 82 to Ser 86 ) of the 10 representative Guangdong and Hainan isolates were different from the Ys, Vb and Sm strains, 12 amino acids residues (11 at the C-terminal and 1 at the N-terminal) were different from the Ys strain and two sites (both at the N-terminal) were different between the Guangdong and Hainan isolates. The core region of CP is associated with viral assembly 35,36 , plasmodesmatal gating 37 and cell-to-cell movement 38  In the present study, most of the variation sites were located at the N-or C-terminals of CP, indicating that their changes may lead to the differentiation of the concentrations or systemic movement capacity of PRSV isolates in papaya. In addition, Zamora et al. 42 analysed the RNA binding mode in Potyvirus and revealed that mutations in the conserved Arg and Asp residues of the CP impaired in vitro assembly of the potyvirus and blocked the assembly and cell-to-cell movement of the potexvirus in plants.
In present study, we speculated that the CP of PRSV might have RNA binding function. Thus, various patterns of conserved elements and the recombinations may affect the RNA binding function. Further studies should be conducted for verification. Phylogenetic analysis of PRSV CP nucleotide sequences from South China and other countries were performed. Results showed that PRSV isolates were distinctly divided into three groups (Fig. 4). Almost all Guangdong and Hainan isolates were clustered into Group III, although they belonged to two subgroups based on geographical locations. However, Ys, Vb and Sm in South China were clustered into Group II, along with Asian isolates from Thailand, South Korea and China Taiwan. The genetic distance values between Groups III and II and between Groups III and I were 0.115 ± 0.008 and 0.115 ± 0.009, respectively, which were higher than those between Groups II and I (0.079 ± 0.006), indicating that Group III belongs to a new lineage. We suppose that the long-term cultivation of genetically modified papaya in Guangdong and Hainan may have led to the emergence of the new PRSV isolate lineage.
Recombination is an important factor to promote virus evolution, which can increase genomic biodiversity 43 , reduce mutations in specific genome sequences 44 and restore genome integrity 45 . In addition, recombination enhances the virulence of the virus and extends its host ranges 46 . This process has been found in many Potyvirus species [47][48][49][50] . The recombination hotspot of PRSV is the P1 gene, followed by P3 and CI, HC-Pro and CP 51 . According to the two breakpoints of nucleotides (746-853 and 591-861), Mangrauthia et al. 51 discovered two recombination types in the CP region. In present study, 36 (27.1%) of the 133 PRSV Guangdong and Hainan isolates showed clear recombination in the CP region. Two recombination types were also detected, but the recombination sites were respectively located in positions 829-411 and 854-423, which were different from those of the types above. The breakpoints found in the former 51 are mainly located at the C-terminal of CP, whereas those in the present study are located at the N-and C-terminals of CP, respectively. We also compared the amino acid sequences of CP from the Guangdong and Hainan isolates and found that the variation sites were mostly located in the N-and C-terminals of CP. Thus, we supposed that the virus can reduce harmful mutations and maintain genome stability by variation or recombination of the CP region, which probably led to the rapid spread of new group populations.
In the present study, the CP genes of the PRSV Guangdong and Hainan isolates exhibited a high degree of genetic diversity. Three indices were used to test the population differentiation between the Guangdong and Hainan isolates based on the PRSV CP nucleotide sequences. K ST and F ST may measure the relative proportions of total genetic diversity attributable to among populations, and range from 0.00 to 1.00. A value of 1.00 for K ST or F ST indicates that populations are completely differentiated, while a value of 0.00 indicates the populations are   53 . The Snn value may indicate the frequency of the most similar sequences in the same population, and Snn values close to 1.0 indicate that the population is highly differentiated while values near 0.5 54 indicate that the population is identical. In the current study, the K ST and F ST values were close to or higher than 0.15 and the Snn values were close to 1, suggesting that the Guangdong and Hainan isolates in this study composed a highly differentiated new population of PRSV in South China. Positive selection of virus population may endow the virus more fitness to adapt a new hosts or environments, whereas rapid divergence driven by positive selection has been seldom demonstrated 55 . Similar to other viral genes, majority of the codons in the CP genes of the Guangdong and Hainan isolates in the present study were detected to be under the status of negative (purifying) selection. This result suggests that most of the codon mutations in the PRSV genome are detrimental and thus are easily eliminated by natural selection. In this case, the selection may come from living environment differences between the two regions, such as differences in papaya cultivars and climatic conditions. After a long-term tracking survey of transgenic cultivars in Guangdong and Hainan, we found that 'Huanong No.1' was the dominant cultivar grown in Guangdong, while more various cultivars from other countries and regions, including 'Huanong No.1' , were grown in Hainan. Guangdong Province is located at 20° 13′N-25° 31′N, 109° 39′E-117° 19′E, while Hainan Island is located at 19° 20′N-20° 10′N, 108° 21′E-111° 03′E. The former belongs to subtropical and tropical monsoon climate with an annual average temperature of 19-23 °C, whereas the latter belongs to tropical monsoon climate with an annual average temperature of 23-25 °C. These two regions were separated by Qiongzhou Strait, resulting in a certain geographical isolation 56 . These differences may lead to the gradual differentiation of PRSV between the Guangdong and Hainan isolates and eventually induce those isolates to evolve into two subgroups.
Tajima 57 has developed a statistical method for testing the neutral mutation hypothesis by using the average number of nucleotide differences and the number of segregating sites. If a population experiences a bottleneck or balance, Tajima's D values are significantly higher than 0; if a population experiences a size expansion or directional selection, Tajima's D values are significantly less than 0. Since both balance and directional selection fall into the category of positive selection, natural selection may be accepted as long as the Tajima's D value deviates significantly from 0, whereas the null hypothesis that neutral selection cannot be rejected when the Tajima Table 1S). More than 10 diseased samples were randomly collected from papaya farms to investigate the virus occurrence. Samples were stored on ice-cold package, taken back to the laboratory, frozen in liquid nitrogen and then stored at −80 °C.
Artificial inoculation. The samples were divided into four groups according to geographical regions (Guangdong and Hainan) and symptomatic characteristics (mosaic and distortion), and 0.1 M phosphate buffer (pH 7.0) was used to grind leaf samples as inoculum. Five samples were selected from each group, inoculated firstly into C. quinoa for purification and then inoculated into squash cultivar 'Zaoqingyidai' (C. pepo) and transgenic papaya cultivar 'Huanong No.1' 18 . Phosphate buffer was administered in the control group. Symptoms were recorded at 7-15 days post-inoculation from 15 plants per treatment, and the experiment was repeated thrice. The plants were maintained in a greenhouse at 25 °C under natural light. The RNAs were extracted from susceptible squash and papaya leaves, and the coat protein genes of PRSV were cloned to compare with the sequences of the original inoculated PRSV isolates. Results of this experiment were used to verify Koch's Postulate and compare the isolates' biological characteristics. by 1% agarose gel electrophoresis, extracted and then purified from the gel with a Tiangen gel extraction kit (TianGen, Beijing, China). The purified products were cloned into the pMD18-T vector (Takara) in accordance with the manufacturer's protocol and then transformed into Escherichia coli DH5α competent cells. Three positive clones from each transformation were selected and sequenced (Abi 3130xl Genetic Analyzer; Hitachi). All sequences generated in this study were deposited in the GenBank database. In addition, 23 nucleotide sequences of different isolates from different countries and regions were downloaded from GenBank as the reference sequences (Table 1S).
Genetic diversity and population genetic differentiation. Nucleotide sequence identity matrices were calculated using BioEdit software 59 after all gaps were removed. Boxplots of the 133 Guangdong and Hainan isolates and previously reported dominant strains (Ys, Vb and Sm) in South China were mapped using R 2.9.1 (R Project for Statistical Computing website). Haplotype and nucleotide diversities were estimated using DnaSP 5.0 60 . Haplotype diversity refers to the frequency and number of haplotypes in the population. Nucleotide diversity estimates the average pairwise differences among the sequences. Genetic differentiation among populations was calculated by F ST using Arlequin 3.5 61 . Genetic differentiation among populations was also calculated by K ST and S nn using DnaSP 5.10 52,54,60 . The hypothesis of deviation from the null population differentiation was tested by 1000 permutations of the raw data.
Recombination analysis. The nucleotide sequences of 133 PRSV isolates from Guangdong and Hainan were subjected to recombination analysis using seven methods (RDP, GENECONV, BOOTSCAN, MaxChi, CHIMAERA, SiSCAN and 3SEQ) implemented in RDP v4.71 software 62 . The probability of a putative recombination event was corrected by a Bonferroni procedure with a cutoff of p-value less than 0.01. To avoid misidentification, only events supported by at least four of the seven methods were considered to be recombinants. Recombinants were excluded from the reconstruction of phylogenetic trees.
Phylogenetic analysis. The nucleotide sequences of the CP genes were aligned using the Muscle algorithm 63 implemented in MEGA 6 64 . Phylogenetic tree of PRSV isolates excluding the recombinants was reconstructed by the Neighbour-Joining method 65 implemented in MEGA 6. The CP gene of one isolate of Moroccan watermelon mosaic virus (MWMV) (Accession No. NC_009995.1) was used as a outgroup 8 . Bootstrap analysis was repeated 1000 times to evaluate the significance of the internal branches. The intra-and inter-group genetic distances were calculated using MEGA 6.
Selection pressure analysis. The selection pressure was estimated by d N /d S ratio, where d N represents the average number of non-synonymous substitutions per non-synonymous site and d S represents the average number of synonymous substitutions per synonymous site. HyPhy 2.10b 66 was used to identify the nucleotide sites in CP cistrons that may be involved in PRSV adaptation. Three codon-based approaches, FEL, IFEL and MEME, were included in the HyPhy package 67-69 . Neutrality test. Tajima