Introduction

Tomato yellow leaf curl disease (TYLCD) is caused by tomato yellow leaf curl virus (TYLCV) and is one of the most important diseases of tomato 1. TYLCV belongs to the genus Begomovirus within the family Geminiviridae. The whitefly, Bemisia tabaci, transmits TYLCV from diseased plants to healthy plants2,3, and the host range in plants is very wide, which include various types of vegetables, flowers, and weeds4. Tomatoes that are infected with TYLCV display leaf curling, yellowing, and stunting of plant growth. Severe damage due to TYLCV result in plant death and subsequently no crop yield 5, ultimately leading to substantial economic losses. In 2006, TYLCD outbreak was reported in many areas of China, resulting in a significant decrease in tomato production6,7. The primary strategies for controlling TYLCV transmission involve controlling populations of the whitefly vector and choosing tomato cultivars that are resistant to the virus. However, it is generally difficult to control whiteflies as these have strong reproductive and migratory behaviors. Furthermore, previous applications of large quantities of insecticides may have led to whitefly resistance to insecticides8,9. Consequently, planting of cultivars that are more resistant to TYLCV infection has become a more prominent strategy to maintain tomato production 10.

At the onset of a TYLCV outbreak, genes related to resistance were not present in cultivated tomato strains. However, several wild tomato species exhibited high levels of resistance to TYLCV11, and six major TYLCV resistance genes have been discovered in wild species. The Ty-1 and Ty-3 alleles were introgressed into tomato via Solanum chilense accessions LA1969 and LA2779, respectively. These alleles map to chromosome 6 and encode an RNA-dependent RNA polymerase12,13. Ty-2 originated from S. habrochiates strain B6013 and is located on the long arm of chromosome 1114,15. Ty-4 is located on the long arm of chromosome 3 and was introgressed from S. chilense strain LA193216. Marker SlNAC1 is linked to ty-517,18 and was derived from a S. peruvianum accession. Lastly, Ty-6 is a novel resistance gene that is located on chromosome 10 and was also derived from S. chilense strain LA277919.

Of the above resistance genes, Ty-1, Ty-2, and Ty-3 have been primarily introgressed into hybrid tomato cultivars in China and have prevented substantial TYLCV-related losses in the tomato industry. However, the virus exhibits a very high mutation rate, and the introgression of other resistance genes into hybrid cultivars is necessary20. The TYLCV resistant line AVTO1227 was introduced from the World Vegetable Center in 2013. AVTO1227 is highly resistant to TYLCV after inoculation with whiteflies that are viruliferous for the TYLCV strain. Genotypic analysis using the marker SlNAC1 indicated that their resistance is conferred by ty-5. Consequently, developing tomato cultivars with ty-5 is of great interest for tomato breeding programs of China. In particular, fine-mapping of ty-5 may substantially aid these efforts.

Recently, whole genome resequencing (WGR) has been widely adopted in gene mapping21,22. WGR is faster and more efficient in developing linkage markers compared to traditional methods. In this study, we conducted WGR of two genomic DNA pools representing tomatoes that are resistant and susceptible to TYLCV and were developed from plants in a TYLCV-inoculated F2 population. Polymorphic SNPs and INDELs were then identified between the two genome pools. The region of the chromosome where the TYLCV resistance gene was located was further analyzed. Combined genotypic and phenotypic analyses of the F2 population indicated that pelota is the gene corresponding to ty-5. Finally, two transversions within this region were detected.

Results

Comparison of tomato line phenotypes after TYLCV inoculation

Tomato lines with different levels of TYLCV resistance were inoculated, and the severity of disease symptoms was subsequently evaluated. Two inoculation methods were used to evaluate disease severity: whiteflies viruliferous for the TYLCV-IL strain and Agrobacterium carrying the infectious TYLCV-IL clone (Table 1). In contrast to the high disease severity index (DSI) results for the control tomato line Money maker that was susceptible to infection, all the other lines were highly resistant to TYLCV using both inoculation methods. The CLN2777A and AVTO1227 lines did not display any typical TYLCV disease symptoms, although PCR analysis indicated that all five tomato lines carried the TYLCV. Quantitative RT-PCR analysis also confirmed that these lines harbored the TYLCV. Comparison of TYLCV levels at 7 and 14 day post inoculation (dpi) indicated significant differences. The observed increase in TYLCV content in material 1227 was not as high as that in the other materials. TYLCV content in Money maker was significantly higher compared to the other lines regardless of the number of days after inoculation (7 or 14 dpi) (Fig. 1).

Table 1 Tomato yellow leaf curl virus disease severity index values for five inbred tomato lines with two inoculation methods.
Figure 1
figure 1

TYLCV abundances in tomato lines with different genotypes 7 and 14 days post inoculation with an infectious TYLCV-IL clone. TYLCV abundances are based on RT-PCR analysis of gene copies. The line designations TY52, CLN2777A, 10-A-GY, AVTO1227, and Money maker refer to those with the Ty-1, Ty-2, Ty-3, and ty-5 resistance genes and no resistance genes, respectively. Asterisks above the bars represent significant differences between 7 and 14 days post inoculation. (*P < 0.05, **P < 0.01).

Inheritance of TYLCV resistance in line AVTO1227

AVTO1227 (P1) and Money maker (P2) were used to develop the F1, F2, BC1P1 and BC1P2 populations. Population phenotypes were then evaluated and recorded 45 days following TYLCV inoculation. All F1 and BC1P2 seedlings showed stunting and yellowing symptoms. In contrast, F2 and BC1P1 seedlings infected with TYLCV exhibited variations in symptoms (Table 2). The segregation ratio of resistant-susceptible individuals indicated that a single recessive gene conferred resistance to TYLCV in AVTO1227. Analysis of the molecular marker SlNACI then indicated that the candidate recessive gene in AVTO1227 is ty-5.

Table 2 Variation in resistance to TYLCV in different generations.

BSA-seq analysis

Two DNA genome pools, namely, resistant (R-) and susceptible (S-), were constructed for BSA-seq analysis using Illumina high-throughput sequencing. A total of 253,249,960 and 239,118,590 clean reads were obtained from the R and S-pools, respectively. Raw data were deposited in the NCBI Sequence Read Archive under the accession number PRJNA312569. About 91.7% of these clean reads were mapped onto the tomato genome, and resulted in 96.66% and 96.86% coverage, with at least 10× depth in the R- and S-pools, respectively. A total of 1,709,042 SNPs and 94,066 INDELs were identified as differential between the R- and S-pools. Circos software was used to analyze the distribution of the polymorphisms, which indicated that the distribution of SNPs and INDELs across chromosomes is not uniform (Fig. 2). For example, there were only 1,983 differential INDELs on chromosome 5, whereas there were 46,592 differential INDELs on chromosome 9.

Figure 2
figure 2

Distribution of polymorphic variations between resistant and susceptible tomato genomes. The external ring is the reference genome. The middle ring shows the distribution of SNPs as indicated by a scatter plot. The inner ring is the distribution of INDELs as indicated by a linear histogram.

Gene location association analysis using BSA-seq

The Δ(SNP_index), Δ(INDEL_index), and ED values were calculated to conduct association mapping. Peak regions above the threshold value are defined as regions where ty-5 may be located. SNP analysis indicated that the region encompassing the genomic positions 2,084,876–3,198,109 on chromosome 4 may be the candidate location of ty-5. Concordantly, INDEL analysis suggested that the interval encompassing the genomic positions 2,227,907–3,198,109 region on chromosome 4 was identified as candidate region of ty-5. Combining both results, the TYLCV resistance gene ty-5 was mapped to the interval encompassing the genomic positions 2,227,907–3,198,050 region on tomato chromosome 4 (Fig. 3). A total of 129 genes are located in this 970-kb region. Some of these genes contain NB-ARC domains, and functional annotation suggests that these play a role in disease resistance. The PCR product of the linkage marker SlNACI was then sequenced, and sequence comparison indicated that this marker is located within the ty-5 interval region.

Figure 3
figure 3

Euclidean distance association analysis of the ty-5 candidate interval in the tomato genome. (a) ED calculated based on the SNP-index. The red arrow indicates the chromosomal location of the ty-5. (b) ED calculated based on the INDEL-index. The blue arrow indicates the chromosomal location of the ty-5.(c) Enlarged view of chromosome 4 from Fig. 3a (left) and 3b (right). Different colors in Fig. 3a and b represent different tomato chromosomes. The red dotted line indicates a 0.5% empirical outlier threshold for the BSA sequencing.

Mapping of ty-5 gene

The 970-kb interval harbored 940 SNPs and 184 INDELs between the two experimental groups. One SNP was identified in nearly every kilobase and polymorphisms between parents can be used in ty-5 mapping. DNA samples from 10 TYLCV-resistant and 10 TYLCV-susceptible plants were used to determine linkage of markers to resistance genes. The genotypes of 2,136 F2 plants were assessed using the SlNACI marker and ty5–17. The resistance gene was identified in the interval between these two markers, and 64 recombinant plants were detected between them. Fine-mapping was conducted to better resolve the location of ty-5, and linkage markers between SlNACI and ty5–17 were used in genotyping (Table 3). Twenty-five recombinants were ultimately identified between ty5–13 and ty5–17, narrowing down the region harboring the resistance gene to an interval of 101 kb.

Table 3 Molecular marker information for mapping the ty-5 gene.

Genotyping did not identify a content marker that could be directly used in electrophoretic analysis. Consequently, SNP polymorphisms were further assessed by sequencing PCR products. Three sequence markers, ty5–25, ty5–26, and ty5–29 were used to analyze the 25 recombinants (Fig. 4). All 25 recombinants with ty5–26 and ty5–29 exhibited the same genotype (Table 4). Seven of these plants indicated that recombination occurred between ty5–25 and ty5–29. The F2 phenotype of these plants then indicated that the ty-5 gene is located within the chromosomal interval between ty5–25 and ty5–13. Three of the seven recombinant plants, namely, 45, 1198, and 1683, were selfed and received F3 seeds. The three F3 populations were inoculated with TYLCV, and disease severity was then analyzed. All individuals of the F3 population of plant 1683 exhibited TYLCV resistance, which coincides with the genotype of the F2 progeny. In contrast, all individuals of the F3 population from plant 45 showed susceptibility to TYLCV. The F3 population from plant 1198 exhibited variations in TYLCV resistance. Order of Markers on the chromosome 4 is from SlNACI to ty5–17, just as that showed in the first row of Table 4. All these results indicate that ty-5 is located between markers ty5–25 and ty5–29 (Fig. 5), which are located on chromosome 4, between genomic positions 3116418 of ty5–29 and 3,130,934 of ty5–25. Over this 14.5 kbp genomic interval, only one gene, pelota, is present. The pelota gene, which corresponds to ty-5, confers TYLCV resistance in AVTO1227. In the interval harboring the pelota gene, two transversions within the promoter region and a SNP in the exon were detected. The transversion A-to-C in the first exon of pelota resulting in a Valine16-to-Glycine substitution in AVTO1227 (Table 5).

Figure 4
figure 4

Identification of SNP polymorphisms via PCR product sequencing. Red boxes indicate identified polymorphisms. Groups are labeled such that R indicates the homozygous resistant genotype, H indicates the heterozygous genotype, and S indicates the homozygous susceptible genotype.

Table 4 Genotype and phenotype analysis of plants that underwent recombination between ty5–13 and ty5–17.
Figure 5
figure 5

Genetic map of the TYLCV resistance locus ty-5 on tomato chromosome 4. Linkage maps were constructed from F2 populations that were derived from a cross between the susceptible tomato line Money maker and the resistant line AVTO1227. Numbers on the left of the vertical line indicate genetic distances (in centimorgans).

Table 5 Sequence difference of R- and S-pool within the interval between markers ty5–25 and ty5–29.

Expression of ty-5 under different conditions

Three tomato lines, namely, AVTO1227, Money maker, and CLN2777A, were inoculated with TYLCV. Seven days post-inoculation, inoculated and uninoculated control samples were collected for assessment of ty-5 expression levels. Quantitative RT-PCR (qPCR) analysis indicated that ty-5 and its allele, Ty-5, were expressed in all samples in both experimental groups. Furthermore, the expression level of these alleles did not significantly differ among all six samples (P = 0.05, Fig. 6).

Figure 6
figure 6

Relative expression of ty-5 transcripts based on real-time RT-PCR analysis in three tomato lines seven days after agroinfiltration. The GAPDH gene was used as internal reference. Error bars represent standard errors of three biological replicates. Significant differences were analyzed at the 0.05 level.

Discussion

Since 2006, TYLCV has emerged in the majority of tomato-producing areas in China6. At that time, almost all of the primary cultivated varieties of tomatoes were susceptible to TYLCV. However, since then, several tomato varieties with TYLCV resistance have been bred. In particular, Ty-1, Ty-2, and Ty-3 have been widely used in tomato cultivars to confer resistance. However, long-term use of these genes may lead to the development of new mechanisms by which the pathogen can overcome resistance. Consequently, new TYLCV resistance genes are needed, and one such gene, ty-5, has been shown to confer superior resistance, thereby indicating that this may be potentially applied to the field.

The identification of gene locations that control favorable phenotypes is a critical step in plant breeding and gene cloning, where tightly linked markers can be used in marker-assisted selection during plant breeding. Traditional gene mapping is labor-intensive, time-consuming, and costly. Numerous DNA markers have been used to detect polymorphic sites between parents or DNA pools such as those identified by bulked-segregant analysis (BSA). Parts of these markers are polymorphic and only a few of these are linked to target genes. However, polymorphisms can readily be identified using next-generation sequencing (NGS). BSA has been recently combined with NGS technologies for gene mapping. Using these methodologies, we located ty-5 within a 970-kb interval on chromosome 4 of tomato. SNP indices and analysis of ED values among polymorphic sites within this interval have allowed us to easily identify linkage markers for ty-5, which include CAPS, INDELs, and SNPs that can be subsequently used for further investigations.

Compared to the susceptible control line, Money maker, plants carrying TYLCV resistance genes still show low virus levels, but do not exhibit severe symptoms after inoculation with TYLCV. TYLCV can replicate and spread across tomato lines, regardless of the presence of resistance genes in the lines. Tomato lines with ‘resistance’ genes are thus not technically resistant, but rather tolerate low levels of TYLCV. We have thus named these genes based on the phenotype of each tomato line after inoculation with TYLCV. Genes that are associated with TYLCV susceptibility have thus been referred to as ‘S-genes’23. The discovery and utilization of S-genes is a novel strategy for disease resistance breeding.

In this study, a recessive gene in AVTO1227, namely, ty-5, was identified within a 14.5-kb interval on chromosome 4, where only a single gene, pelota, is located. Pelota has been demonstrated as a surveillance factor in the dissociation process of ribosomes into subunits24,25. TYLCV resistance genes differ from typical resistance genes, and of these, ty-5 is the only one that was associated with recessive resistance. Two other examples of recessive resistance genes, ol-2 and pot-1, have been reported in powdery mildew fungus and potyviruses, respectively26,27. These recessive resistance genes play key roles in pathogenesis, and natural or TILLING mutations involving these genes can result in broad-spectrum resistance in tomato. Resequencing of ty-5 and its allele, Ty-5, demonstrated that there are two nucleotide transversions within the promoter region and one nucleotide transversion in an exonic region. The expression of these alleles did not change regardless of TYLCV inoculation. Consequently, variations in the exon may lead to protein changes. Considering that ty-5 is a recessive resistance gene similar to ol-2 and pot-1, we hypothesize that Ty-5 may be an S-gene that is important to TYLCV invasion. CRISPR/Cas9 methods may also be employed to edit the Ty-5 allele of the Money maker tomato line.

Two different polymorphic regions were identified between ty-5 and its susceptibility allele, Ty-5, in AVTO1227. Two transversions were found within the promoter of the ty-5 gene, whereas a SNP was detected in the exon of ty-5. No difference in expression of ty-5 or Ty-5 was detected among all samples, regardless of whether the sample was inoculated with TYCLV. Thus, the two transversions in the ty-5 promoter may not contribute to TYLCV resistance in AVTO1227. Future investigations should use the SNP within the exon to generate an amino acid substitution in ty-5 of AVTO1227. The pelota gene, which is located within the ty-5 region, has been demonstrated as a messenger RNA surveillance factor that plays a role in the dissociation of ribosomes into subunits when ribosomes are stalled. However, there is no current evidence that pelota participates in TYLCV resistance. TYLCV have six partially overlapping open reading frames, and similar to most viruses, TYLCV relies on the host cell machinery to complete their infection cycle. Consequently, investigations involving mutations in the pelota gene may elucidate its role in viral protein biosynthesis.

Methods

Plant materials and growth conditions

The TYLCV resistance gene accessions AVTO1227 and CLN2777A that carry the ty-5 and Ty-2 genotypes, respectively, were obtained from the Asian Vegetable Research and Development Center. The Money maker accession is susceptible to TYCLV. A cross between AVTO1227 (P1) and Money maker (P2) were performed, and the resulting F1 population was used as the female parent in generating the BC1P1 and BC1P2 lines. The F1 plants were then self-pollinated to obtain the F2 populations. Lastly, the F2 individuals were self-pollinated to generate F2:3 lines. The TYLCV resistant inbred line CLN2777A and the TYLCV susceptible line 9210 were used as controls to assess the success of TYLCV inoculation. All plants were grown at 26 °C under the same conditions.

Whitefly reproduction and TYLCV inoculation

Whitefly feeding and reproduction was conducted in a phytotron. Six-week-old 9210 line seedlings were inoculated with whiteflies viruliferous for the TYLCV-IL strain. Whitefly-mediated TYLCV inoculation was considered successful when the disease index of the seedlings reached >55%. Disease evaluation was conducted when the abundance of viruliferous whitefly was high enough for inoculation. Three leaf-stage seedlings were used to evaluate disease states. Trays with seedlings were moved to the phytotron, and viruliferous whiteflies on line 9210 were introduced to the inoculated seedlings, followed by shaking of the seedlings three times each day to achieve uniform inoculation. Seven days after inoculation, the trays were moved to another phytotron. Whiteflies were killed using imidacloprid, and the seedlings were transplanted to the glasshouse. One single seedling was conducted per pot plant, as previously described28.

Agroinfiltration was used in RT-PCR analysis. An infectious TYLCV clone (kindly provided by Dr. Baolong Zhang, Jiangsu Academy of Agricultural Science, Nanjing, Jiangsu, China) was transformed to Agrobacterium tumefaciens strain LBA4404 and used to agroinoculate the tomato seedlings. A. tumefaciens containing the TYLCV clone at an OD600 of 0.5 was used in the inoculations. The leaves of three-week-old seedlings were infiltrated by pressure inoculation using a needleless syringe.

Disease evaluation

Forty days after inoculation, the plants were assessed for TYLCV infection severity. The index of disease severity ranged from 0 to 4, where 0 = no visible symptoms and inoculated plants were similar to non-inoculated plants; 1 = mild yellowing of leaves at the apical point and no curly leaves; 2 = some yellowing and curling of apical leaves; 3 = strong yellowing and curling across a wide area of the inoculated plant, and slowed, but not arrested, growth; 4 = severe stunting, yellowing and curling, and plant growth ceased. Intermediate scores (e.g., 0.5 and 1.5) represent intermediate disease morphologies based on the above scale and previously described methods29.

DNA extraction and genome pool construction

DNA was extracted using a modified CTAB technique, as described elsewhere 30, and DNA concentrations were quantified with an Eppendorf BioSpectrometer machine (Eppendorf, Germany). Two genomic DNA pools were constructed for BSA-seq analysis, namely, the R-pool (resistant to TYLCV) and the S-pool (susceptible to TYLCV). Pools were constructed by mixing an equal concentration of DNA from 27 TYLCV resistant (grade = 0) and 27 TYLCV susceptible (grade = 4) F2 individuals.

Whole genome resequencing data analysis

Illumina libraries for the R- and S-pools were prepared according to the manufacturer’s protocols. Libraries with mixtures of 300–500 bp DNA fragments were constructed following fragmentation, adapter ligation, size selection, and PCR enrichment. Paired-end sequencing of fragments was performed using the Illumina High-seq 2500 sequencing platform at Berry Genomics Co., Ltd. To generate high-quality clean reads, raw sequence reads were filtered and trimmed using the following criteria: reads that matched a minimum of 25 nt of the adaptor sequence on the 5′ end were trimmed; reads with >10% unknown nucleotides or ambiguous bases were removed; and reads in which the percentage of low-quality bases (base quality ≤3) was ≥50% were removed. The clean reads from both pools were aligned to the Heinz 1706 reference genome using BWA and SAMTools software. Duplicate removal was conducted using the Picard software. SNPs and INDELs were then assessed22,31.

Gene location association analysis

A SNP index was used to indicate the proportion of reads harboring SNPs that differed from reference sequences. An Euclidean distance value (ED) was calculated by comparing SNPs across the two DNA genome pools as follows: SNP-indexalt = Nalt/(Nalt + Nref), Δ(SNP − indexalt) = SNP − indexalt (R-pool) − SNP − indexalt (S-pool), SNP-indexref = Nref/(Nalt + Nref), Δ(SNP − indexref) = SNP − indexref (R-pool) − SNP − indexref (S-pool), ED = [Δ(SNP − indexref)2 + Δ(SNP − indexalt)2]1/2. Nalt indicates the number of short reads that differed from the reference sequence. Nref indicates the number of short reads that were identical to reference sequences. Using these formulae, we then assessed whether the measured values fell within the following ranges, −1 ≤ Δ(SNP − index) ≤ 1, 0 ≤ ED ≤ 1.414. An ED of 1.414 would indicate that all of the SNP-containing reads from one DNA genome pool differ from the reference sequences, and that the SNP sites from the other pool do not differ from the reference sequences. When the ED is near 0, the proportion of reads with SNPs is the same in both the R- and S-pools. Loess regression fitting was then used to determine the association and location of the resistance gene, based on previously described methods21. Genomic regions with values over the threshold were considered candidate regions that were associated with ty-5. The ED of INDEL sites was calculated using the INDEL-index and Δ(INDEL - index) as described above for calculating the ED for SNP regions. Sliding window analysis was applied to ED plots using 2-Mb window sizes and in 10-kb increments. The average ED of the SNPs in each window was then calculated as previously described22,32.

Development of molecular markers for linkage map construction

Polymorphic SNP and INDEL sites among candidate ty-5 regions were used to design molecular markers. Primers for each polymorphic site were designed using the Primer Premier 5.0 software. Endonuclease enzymes were used when SNPs could be used as cleaved amplified polymorphic sequence (CAPSA) markers. Alternatively, if no endonuclease was suitable for the SNP site, then PCR reaction products were amplified and sequenced to directly identify SNPs. CAPS and INDEL markers were detected on 2% agarose and 6% polyacrylamide gels. The PCR and enzyme digestion reactions were as described by Wang et al. 201230. The genetic linkage map was constructed using the JOINMAP4.0 software and a LOD threshold of 3.033.

Expression analysis and ty-5 sequence comparisons

Candidate gene expression experiments were conducted after genetic linkage mapping. Leaf samples were collected from 10 individuals of both AVTO1227 and Money maker lines before inoculation and seven days after inoculation. Three biological replicates were conducted for each sample. Total leaf RNA was extracted using the RNA simple Total RNA Kit (TIANGEN, Beijing, China). Reverse transcription of RNA to cDNA was conducted using the PrimeScript 1st Strand cDNA synthesis kit (TaKaRa, Japan). Primers for quantitative RT-PCR of ty-5 were also designed using the Primer Premier 5.0 software. Primer specificity was evaluated by BLAST searches against the NCBI database and also via melt curve analysis after qPCR amplification. PCR amplifications were performed in the QuantStudio 6 Flex real-time thermal cycler (Thermo Fisher Scientific, USA) with 20-μL final reaction volumes containing 2.0 μL of cDNA, 0.4 μL of each primer (10 μM), 6.8 μL of sterile water, 0.4 μL of ROX Reference Dye II, and 10 μL (2×) SYBR Premix ExTaq™ II Kit (TaKaRa, Japan). The conditions for amplification were as follows: denaturation at 95 °C for 30 s, followed by 40 cycles of 95 °C for 5 s and 60 °C for 34 s. Expression levels of the selected genes were normalized to GAPDH expression levels. Three technical replicates of each sample were performed for ty-5 and GAPDH expression. Relative gene expression was calculated using the 2−ΔΔCT method34.

Accession codes

The sequencing data of this study has been deposited in the NCBI Sequence Read Archive under the accession number PRJNA312569.