Multiple begomoviruses found associated with cotton leaf curl disease in Pakistan in early 1990 are back in cultivated cotton

The first epidemic of cotton leaf curl disease (CLCuD) in early 1990’s in the Indian subcontinent was associated with several distinct begomoviruses along with a disease-specific betasatellite. Resistant cotton varieties were introduced in late 1990’s but soon resistance was broken and was associated with a single recombinant begomovirus named Burewala strain of Cotton leaf curl Kokhran virus that lacks a full complement of a gene encoding a transcription activator protein (TrAP). In order to understand the ongoing changes in CLCuD complex in Pakistan, CLCuD affected plants from cotton fields at Vehari were collected. Illumina sequencing was used to assess the diversity of CLCuD complex. At least three distinct begomoviruses characterized from the first epidemic; Cotton leaf curl Multan virus, Cotton leaf curl Kokhran virus and Cotton leaf curl Alabad virus, several distinct species of alphasatellites and cotton leaf curl Multan betasatellite were found associated with CLCuD. These viruses were also cloned and sequenced through Sanger sequencing to confirm the identity of the begomoviruses and that all clones possessed a full complement of the TrAP gene. A new strain of betasatellite was identified here and named CLCuMuBVeh. The implications of these findings in efforts to control CLCuD are discussed.

single βC1 protein and depends on the helper virus for movement, replication and encapsidation 11 . Alphasatellites (~1.4 kb in size) encode their own replication associated (Rep) protein, replicate independently from their helper virus and are usually not required for pathogenicity 11 .
During the first epidemic, CLCuD was caused by begomovirus-betasatellite complex. CLCuD was associated with several distinct monopartite begomoviruses -Cotton Leaf curl Multan virus (CLCuMuV), Cotton leaf curl Kokhran virus (CLCuKoV), Cotton leaf curl Alabad virus (CLCuAV) and Papaya leaf curl virus (PaLCuV) 4,6,12 , and a single betasatellite (Cotton leaf curl Multan betasatellite [CLCuMuB]) 13,14 . Following the appearance of the second CLCuD epidemic in cotton, to present time, a predominant single recombinant begomovirus named as Cotton leaf curl Kokhran virus-Burewala (CLCuKoV-Bu), previously known as Cotton leaf curl Burewala virus (CLCuBuV), was associated with CLCuD in Pakistan and India 9 . However, the occasional occurrence of four other virus species and strains [CLCuKoV-Ko (2005), CLCuKoV-La (2008), CLCuKoV-Sha (2004 and CLCoMuV- Dar (2006)] was also observed on cotton in Pakistan (Table S3). CLCuKoV-Bu is a recombinant virus, consisting of sequences encoding complementary-sense genes derived from CLCuMuV and sequences encoding virion-sense genes and origin of replication derived fromCLCuKoV 9,15 . The most interesting feature of CLCuKoV-Bu isolates, associated with resistance breaking, was the lack of a full-length transcriptional activator protein (TrAP) and a mutated C2 protein of only 35 amino acids (aa) 9 . Recently CLCuKoV-Bu, with a full complement TrAP, has also been identified showing severe symptoms of CLCuD 16 . The betasatellite associated with CLCuKoV-Bu was also found to be recombinant with most of its sequence from CLCuMuB, but also containing a small fragment of SCR region derived from tomato leaf curl betasatellite 16 .
The CLCuD complex is in a state of continuous change, evolving rapidly to overcome resistance by component capture, recombination and mutation 17 . Recently a bipartite begomovirus, Tomato leaf curl New Delhi virus (ToLCNDV) was identified associated with CLCuD in Pakistan 18,19 . Interestingly, ToLCNDV isolated from cotton maintained a full complement of TrAP. Another study identified Chickpea chlorotic dwarf virus (CpCDV), a Mastrevirus in cotton showing leaf curl virus disease symptoms 20 . These results suggest that CLCuD complex has captured viruses that may have contributed to complete breakdown of resistance.
Here we have characterized begomoviruses and associated satellites from symptomatic samples of cotton collected in Vehari from lines that were being screened for virus resistance. In this study, we tried to understand the evolving nature and recent changes in begomovirus disease complexes by using rolling circle amplification (RCA) followed by next generation sequencing (NGS) and Sanger sequencing. Based on NGS and Sanger sequencing data, three distinct begomoviruses (CLCuMuV, CLCuKoV, CLCuAV) characterized from the first epidemic and several distinct alphasatellites and a single betasatellite species were identified associated with CLCuD. We also performed Southern blot hybridization for semi-quantification of begomoviruses and betasatellites. An important feature of the complex found in recent samples from Vehari is the absence of CLCuKoV-Bu in recent samples. Implications of these findings on begomovirus disease complexes are discussed.

Material and Methods
Sample collection, DNA extraction and rolling circle amplification. Cotton leaf samples from a total of six lines showing the typical CLCuD disease symptoms were collected from the Cotton Research Station (CRS) Vehari (Punjab province, Pakistan) in July 2015 (Fig. 1). Genomic DNA was extracted from infected samples using Cetyl trimethyl ammonium bromide (CTAB) method 21 , followed by ethanol precipitation and DNA quantification. To amplify circular molecules RCA 22 was performed using phi 29 DNA polymerase (Thermo Fisher Scientific, Waltham, MA USA). RCA product was purified, enriched and processed for NGS. Library preparation and Illumina sequencing. The Illumina NeoPrep automation system (Illumina, San Diego, CA) was used with library kit, Illumina #NP-101-1001, "TruSeq Nano DNA Library Kit for NeoPrep", which includes the adapter set "TruSeq LT". The target insert size was 350 bp, with size selection performed by the NeoPrep instrument. Actual lower size limit of the libraries was ~300 bp as measured by the Agilent 2200 TapeStation. Sequencing was performed on the Illumina MiSeq, v2 chemistry, 2 × 150 bp.
Nucleotide sequence assembly and analysis. The MiSeq Reporter software was set to automatically trim the adaptors. These short sequences were processed using CLC Genomics Work Bench 7.5. The paired-end reads obtained from the Illumina MiSeq Sequencer pipeline were subjected to quality filtering using quality score 0.001 and Phred quality score of 30. De novo as well as reference-guided assemblies were made. Reference-guided mapping was performed using begomovirus and satellite sequences present in Genbank. Based on good quality of data, twelve sequences from six plant samples (MW 6, MW 7, MW 8, MW 9, MW 10 and MW 11) were selected as shown in Table 1. All sequences were searched for similarity in NCBI non-redundant nucleotides database (nt), using BLASTn tool, provided by NCBI.  (Table S4). The PCR products of ~2.8 kb, ~1.4 kb partial fragments for virus, and ~1.4 kb for alphasatellite and betasatellite, were cloned in a TA cloning vector ((pTZ57R/T; Thermo Fisher Scientific, Waltham, MA USA). An average of five clones per plants were selected for sequencing. Plasmids of the desired clones were purified using AxyPrep ™ plasmid miniprep kit (Axygen, USA) and sequenced commercially on an Applied Biosystems 3730XL DNA sequencer (USA). The sequences were assembled and analyzed using Lasergene software (DNAStar Inc., Madison, USA). All the sequences were analyzed and compared to the sequences available in the data bank (NCBI) using BLASTn search tool.
SDT and phylogenetic analysis. Sequence demarcation tool (SDT) v1.2 26 was used for identification of species and strains of begomoviruses based on muscle alignment and identity score matrixes. Full length viruses and satellites sequences were aligned using pairwise multiple MUSCLE alignment algorithm in MEGA6 27 . This alignment was used to construct phylogenetic trees supported with 1000 bootstrap values through a neighbor-joining algorithm. Phylogenetic trees were edited and labelled in MEGA 6.

Recombination analysis.
Recombination among viruses and satellites were detected using RDP4 Beta 4.74 28 . RDP using 9 recombination analysis methods, RDP 29

Results
Begomoviruses, alphasatellites and betasatellites obtained through NGS. The de novo assembly of the high-throughput Illumina sequencing data was carried out on CLC Genomics Work bench 7.5 (CLC bio) software. Resulting contigs of size ~2.8 kb and 1.4 kb, based on putative genome size of begomoviruses and alpha/betasatellites respectively, were selected from the data. The selected contigs were searched by using the BLASTn search tool for closely related begomoviruses and satellites molecules available at NCBI-GenBank database. From the NGS data, BLASTn indicated three kinds of begomoviruses, Cotton leaf curl Multan virus-Pakistan (CLCuMuV-PK), Cotton leaf curl Alabad virus-Multan (CLCuAlV-Mu) and cotton leaf curl Kokhran virus-Kokhran (CLCuKoV-Ko) were present. However, CLCuKoV-Ko sequences identified here with NGS have low percent (50-60%) coverage as shown in Table 1.

Analysis of TrAP gene of begomoviruses.
A detailed analysis of TrAP encoding gene showed that isolates, CLCuMuV-PK, CLCuMuV-Raj and CLCuKoV-Sha obtained here through Sanger sequencing encoded a putatively full-length TrAP of 150 amino acids as shown in Table 2.

Recombination analysis of begomoviruses.
To determine recombination among begomoviruses isolates identified here, RDP4 Beta 4.74 28 analysis was conducted. The isolates of CLCuMuV-Raj, CLCuKoV-Sha and CLCuMuV identified here were aligned with 260 full genome sequences of begomoviruses available and retrieved from the database (Table S1). The RDP result for begomoviruses is shown in Fig. 2 and further details including p-values are provided in Table S2.
Recombination analysis of CLCuMuV-Raj isolates show that CLCuMuV-Raj is a recombinant of two viruses CLCuMuV and CLCuKoV-Ko as reported previously 40   CLCuMuV-Raj and CLCuKoV-Sha isolates shows that they are separated with closely related begomoviruses interspersed with the three begomoviruses as shown in Fig. 3.

Analysis of alphasatellites.
Three types of alphasatellites GDarSLA, CLCuBuA and AConSLA were identified from Sanger sequencing data. The phylogenetic tree of alphasatellites indicated that they clustered with closely related GDarSLA, CLCuBuA and AConSLA as shown in Fig. 4. RDP analysis of alphasatellites shows that sequence (MZ-40) consists of recombinant fragments of CLCuBuA and Croton yellow vein mosaic alphasatellite (CYVMA) sequences. RDP analysis of alphasatellites is shown with detail in Figure S2.
Betasatsallites identified and analysis of SCR region. A single type of betasatellite, CLCuMuB was obtained from Sanger sequencing data. Analysis of the SCR region of these betasatellites showed that three types of betasatellites separated into three groups as shown in Fig. 5. The first group of betasatellites consisting of 3 clones (MZ-33, MZ-34, MZ-36) segregated with CLCuMuB which was associated with CLCuD in Pakistan during the 1990s. It was first identified by Briddon et al. 41 , and known as Multan betasatellite (CLCuMuB Mul ). The second group consisted of a single clone (MZ-35) and is closely related to recombinant CLCuMuB associated with CLCuBuV known as (CLCuMuB Bur ). The clone (MZ-35) contains approximately 95 nt derived from Tomato leaf curl betasatellite (ToLCuB) within the SCR region 17,42 . The third group consisted of two clones (MZ-32, MZ-37) and their sequence in SCR does not match either CLCuMuB Mul or ToLCuB. Instead they contain approximately 90 nt within the SCR derived from cotton leaf curl virus betasatellite defective interfering DNA (KT228331), recently identified from India, and here we designate it as Vehari betasatellite (CLCuMuB Veh ). Phylogenetic trees of full length betasatellites and their SCR regions is shown in Fig. 5.

Southern blot hybridization of begomoviruses and betasatellites.
To determine relative titer of begomoviruses and betasatellites, Southern blot hybridization of the infected cultivated cotton samples was performed. Southern blot analysis of DNA samples extracted from symptomatic leaves showed weak hybridization signals for virus probe in sample 1 and 2 indicating that virus titer was low in these samples, while virus titer was high in samples 3, 4, 5 and 6 showing good hybridization signals for virus probe as shown in (Fig. 6A). The titer of betasatellites was high in all the samples showing good hybridization signals with beta probe (Fig. 6B).

Discussion
The begomovirus disease complex can severely affect yields of cultivated cotton in Pakistan and India, and previously often occurred as an infection of multiple begomoviruses. The CLCuKoV-Bu strain, with a truncated TrAP, is the only begomovirus strain associated with the resistance breaking of CLCuD currently found throughout Pakistan, with the occasional identification of other strains (Table S3) 9,43 . We have found that some recent isolates of CLCuKoV-Bu encode a full length TrAP (Hassan et al., unpublished data). Our current study shows that the CLCuD complex can occur as result of infection of multiple begomoviruses. The begomovirus disease complex can easily evolve due to component capture, recombination and mutation in order to overcome disease resistance and to expand its host range 17,44 .
To understand ongoing changes in the CLCuD complex, symptomatic leaves of previously resistant cotton varieties were collected from Vehari. Vehari is a major cotton growing area of Pakistan and exhibits a high diversity of begomoviruses. To evaluate the samples collected, a highly sensitive method of RCA coupled with NGS was used followed by confirmation with Sanger sequencing. Our NGS data showed that the three distinct begomoviruses, CLCuMuV, CLCuAlV and CLCuKoV-Ko associated with the first epidemic of CLCuD in early 1990s, a betasatellite, CLCuMuB and 3 alphasatellites GDarSLA, GDavSLA and CLCuBuA were identified in infected samples from Vehari. The reason for the low coverage (50-60%) of CLCuKoV-Ko in NGS data is not known. To confirm our NGS data results, genomic DNA from the collected samples were amplified with universal begomo, alpha and beta primers, cloned and dideoxy nucleotide chain termination sequencing was performed. From the Sanger sequencing data, two strains of Multan, CLCuMuV-Mu, CLCuMuV-Raj and one strain of Kokhran, CLCuKoV-Sha associated with the first epidemic of CLCuD, were identified. Our results from both NGS and Sanger sequencing data shows that CLCuKoV-Bu was not identified in infected cotton samples from Vehari. The presence of previously identified begomoviruses and the absence of CLCuKoV-Bu which was the only begomovirus found in  infected cotton plants, indicate a potential major change in the begomovirus complex associated with CLCuD in Pakistan. A single species of betasatellite (CLCuMuB) and three species of alphasatellites (GDarSLA, CLCuBuA and AConSLA) were identified in these samples. Sanger sequencing data complementing our NGS data, identified and confirmed the presence of CLCuMuV, CLCuMuV-Raj and CLCuKoV-Sha. However, CLCuAlV identified with NGS was not amplified with universal begomo primers. The CLCuAlV was amplified from infected samples with CLCuAlV specific primers designed according to the assembled sequences of NGS (Table S4).
A closer detailed analysis of the TrAP gene of CLCuMuV, CLCuMuV-Raj and CLCuKoV-Sha isolates identified here encode a full length TrAP protein of 150 aa. The CLCuKoV-Bu which is the dominant strain associated with CLCuD from 2000 onward in cultivated cotton, encodes a truncated C2 protein of 35 amino acids (aa) 9,14 . TrAP encoded by begomoviruses is a multi-functional protein, functioning as a pathogenicity determinant 44 and also a strong suppressor of post transcriptional gene silencing activity (PTGS) of the host 45 . Full length and truncated TrAP both exhibited PTGS activity but truncated ones have lower PTGS activity as compared to full length TrAP 46 .
Recombination is a major process of evolution in begomoviruses 47,48 . The recombination analysis shows that diversification of begomoviruses identified here occurs as a result of recombination between CLCuMuV and CLCuKoV. Most of begomoviruses identified here are recombinant between CLCuMuV and CLCuKoV exchanged as either virion-sense or complementary-sense genes but some isolates of CLCuMuV contain fragments of PeLCV and ToLCPalV in virion and complementary sense. Three types of betasatellites, replicating with begomoviruses, were identified from infected samples. A new type of recombinant betasatellite named as CLCuMuB Veh , has a recombinant region within the SCR region different from the previously identified CLCuMuB Mul and recombinant CLCuMuB Bur .
The emergence of viruses associated with the first epidemic of CLCuD in cultivated cotton indicate that the begomovirus complex may be changing to resemble the original CLCuD complex found during the first epidemic of the 1990s before the introduction of resistant varieties derived from LRA5166 and CP15/2 as a source of resistance. The arrival CLCuMuV, CLCuAlV, and CLCuKoV back into cultivated cotton (Table S3) is a sign of changes in the CLCuD complex and might be a sign that a new epidemic is possible. A further study of begomovirus diversity at Vehari and surrounding areas using RCA and NGS will help to understand CLCuD in a broader sense. The changing scenario indicates the need for identification and incorporation of new sources of resistance with several recently developed technologies, into the cultivated cotton [49][50][51] .