Genomic analyses provide insights into the history of tomato breeding


The histories of crop domestication and breeding are recorded in genomes. Although tomato is a model species for plant biology and breeding, the nature of the human selection that altered its genome remains largely unknown. Here we report a comprehensive analysis of tomato evolution based on the genome sequences of 360 accessions. We provide evidence that domestication and improvement focused on two independent sets of quantitative trait loci (QTLs), resulting in modern tomato fruit that is 100 times larger than the fruit of its ancestor. Furthermore, we discovered a major genomic signature of modern processing tomatoes, identified the causative variants that confer pink fruit color and precisely visualized the linkage drag associated with wild introgressions. This study outlines both the accomplishments and the costs of historical selection and provides molecular insights toward further improvement.


Figure 1: Genome-wide relationship and fruit morphology in cultivated tomato and its wild relatives.
Figure 2: Evolution of fruit mass during domestication and improvement.
Figure 3: A major genomic signature of modern processing tomatoes and three causative variants for pink fruit.
Figure 4: Introgressions and sweeps.

Accession codes

Primary accessions

Sequence Read Archive


Acknowledgements

We thank J. Maloof (University of California, Davis) for providing tomato RNA sequencing data and L.A. Mueller and N. Menda (Cornell University) for setting up a genome browser of SNPs. This work was supported by funding from the National Program on Key Basic Research Projects in China (973 program; 2012CB113900 and 2011CB100600), the National Science Fund for Distinguished Young Scholars (31225025 to S.H.), the National HighTech Research Development Program in China (863 Program; 2012AA100101 and 2012AA100105), the National Natural Science Foundation of China (31272160, 31230064, 31272171 and 31171962), the Chinese Ministry of Finance (1251610601001), CAAS (an Agricultural Science and Technology Innovation Program grant to S.H.), the China Agriculture Research System (CARS-25-A-09 and CARS-25-A-15), the Special Fund for Agro-Scientific Research in the Public Interest of China (201303115), the Major Special Science and Technology Project during the Twelfth Five-Year Plan Period of Xinjiang (201230116-3) and the US National Science Foundation Plant Genome Program (IOS-0923312). This work was also supported by the Shenzhen municipal and Dapeng district governments.

Author information




S.H., Y.D., Z.Y. and Jingfu Li conceived and designed the research. T.L., G.Z., J.Z., X.X., Q.Y., Z. Zheng, Y.L., S.L., T.W. and Yuyang Zhang performed DNA sequencing and biological experiments. T.L., G.Z., Z. Zhang, K.L., Yancong Zhang, C.L., Y.X., X.W., Z.H., D.Z., Junming Li, G.X., C.Z., A.M., M.C., Z.F., J.J.G., R.T.C., A.W. and T.S. performed the data analysis. S.H., G.Z., T.L., J.Z., X.X., Q.Y. and Z. Zhang wrote the manuscript. Y.D., Z.Y., Jingfu Li, Z. Zhang, C.L., Y.X., A.M., M.C., Z.F., J.J.G., R.T.C., D.Z. and T.S. revised the manuscript.

Corresponding authors

Correspondence to Jingfu Li, Zhibiao Ye, Yongchen Du or Sanwen Huang.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Distribution of fruit weight in the three tomato groups.

Supplementary Figure 2 Determination of ΔK using STRUCTURE.

ΔK analysis for different numbers of clusters (K) in a tomato population of 331 accessions (excluding 10 wild accessions). ΔK peaked at K = 2, suggesting that two clusters are optimal.
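The ΔK statistic in Supplementary Figure 2 appears to be the Evanno et al. (2005) second-order rate of change of the STRUCTURE log-likelihood L(K) across K, divided by the standard deviation of L(K) over replicate runs. The Python sketch below computes it from the replicate means of L(K), a common simplification; the function name `delta_k` and the ln P(D) values are hypothetical and are not taken from the paper.

```python
import statistics


def delta_k(lnp_by_k):
    """Delta K = |mean L(K+1) - 2 mean L(K) + mean L(K-1)| / sd(L(K)).

    lnp_by_k maps K to the ln P(D) values of replicate STRUCTURE runs.
    Delta K is only defined for K whose neighbours K-1 and K+1 were also run,
    and each K needs at least two non-identical replicates (sd must be > 0).
    """
    means = {k: statistics.mean(runs) for k, runs in lnp_by_k.items()}
    result = {}
    for k, runs in lnp_by_k.items():
        if k - 1 in means and k + 1 in means:
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            result[k] = second_diff / statistics.stdev(runs)
    return result


# Hypothetical ln P(D) values (two replicate runs per K), not data from the paper.
example = {
    1: [-1000.0, -1002.0],
    2: [-800.0, -802.0],
    3: [-779.0, -781.0],
    4: [-769.0, -771.0],
}
dk = delta_k(example)
best_k = max(dk, key=dk.get)
print(best_k)  # 2
```

The largest ΔK marks the K at which the gain in likelihood levels off most sharply, which is why a peak at K = 2 is read as support for two clusters.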

Supplementary Figure 3 Principal-component analysis (PCA) of 331 tomato accessions.

A total of 2,340,973 whole-genome SNPs (minor allele frequency (MAF) > 10%, missing rate ≤ 5%) were used for PCA. Two-dimensional coordinates are plotted for the 331 tomato accessions. CER (orange) and BIG (blue) accessions cluster relatively tightly, whereas PIM accessions (green) are widely dispersed.
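The SNP filters in Supplementary Figure 3 (MAF > 10%, missing rate ≤ 5%) and the PCA itself can be illustrated with a short NumPy sketch. The caption does not state the PCA software or how missing genotypes were handled, so the mean imputation, per-SNP standardization and SVD below are assumptions, and the function names are hypothetical.

```python
import numpy as np


def filter_snps(geno, min_maf=0.10, max_missing=0.05):
    """Keep SNP columns with MAF > min_maf and missing rate <= max_missing.

    geno: accessions x SNPs array of alt-allele dosages (0, 1, 2),
    with np.nan marking missing calls.
    """
    missing_rate = np.isnan(geno).mean(axis=0)
    alt_freq = np.nanmean(geno, axis=0) / 2.0
    maf = np.minimum(alt_freq, 1.0 - alt_freq)
    keep = (maf > min_maf) & (missing_rate <= max_missing)
    return geno[:, keep]


def pca_coordinates(geno, n_components=2):
    """Project accessions onto the top principal components of the SNP matrix."""
    col_means = np.nanmean(geno, axis=0)
    filled = np.where(np.isnan(geno), col_means, geno)  # mean-impute missing calls
    centered = filled - col_means
    scale = centered.std(axis=0)
    scale[scale == 0] = 1.0  # avoid dividing by zero for monomorphic columns
    standardized = centered / scale
    u, s, _ = np.linalg.svd(standardized, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


# Hypothetical toy matrix: 4 accessions x 4 SNPs (not data from the paper).
geno = np.array([
    [0, 2, 0, np.nan],
    [2, 2, 0, np.nan],
    [0, 2, 2, 0],
    [2, 2, 2, 2],
], dtype=float)
kept = filter_snps(geno)  # drops the monomorphic column and the 50%-missing column
coords = pca_coordinates(kept)
print(kept.shape, coords.shape)  # (4, 2) (4, 2)
```

On a real panel the same two calls would be applied to the full accession-by-SNP matrix, and the two columns of `coords` would be the plotted coordinates.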

Supplementary Figure 4 Distribution of private SNPs in three tomato groups.

Private SNPs are presented for each chromosome in PIM (green), CER (orange) and BIG (blue).
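The caption does not define "private SNP". One plausible reading, assumed here, is a SNP whose alternative allele is observed in accessions of only one group. Under that assumption, per-chromosome counts like those in Supplementary Figure 4 could be tallied as below; the function name, accession identifiers and toy SNPs are hypothetical.

```python
from collections import Counter


def count_private_snps(snps, groups):
    """Count, per group and chromosome, SNPs whose alt allele is seen in one group only.

    snps: iterable of (chromosome, position, {accession: alt_allele_count}).
    groups: {accession: group_name}, e.g. "PIM", "CER" or "BIG".
    Accessions missing from `groups` are ignored.
    """
    counts = {name: Counter() for name in set(groups.values())}
    for chrom, _pos, genotypes in snps:
        carrier_groups = {
            groups[acc]
            for acc, alt_count in genotypes.items()
            if acc in groups and alt_count > 0
        }
        if len(carrier_groups) == 1:
            (group,) = carrier_groups
            counts[group][chrom] += 1
    return counts


# Hypothetical accessions and SNPs, for illustration only.
groups = {"pim1": "PIM", "pim2": "PIM", "cer1": "CER", "big1": "BIG"}
snps = [
    ("ch01", 100, {"pim1": 2, "pim2": 0, "cer1": 0, "big1": 0}),  # private to PIM
    ("ch01", 200, {"pim1": 2, "cer1": 2, "big1": 0}),  # shared, not private
    ("ch02", 50, {"pim1": 0, "cer1": 0, "big1": 2}),  # private to BIG
    ("ch02", 80, {"pim1": 2, "pim2": 2}),  # private to PIM
]
private = count_private_snps(snps, groups)
print(dict(private["PIM"]))  # {'ch01': 1, 'ch02': 1}
```

If "private" instead means polymorphic within only one group, the test on `carrier_groups` would change accordingly.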

Supplementary Figure 5 Genome-wide average LD decay in three tomato groups.

LD decay is shown as the squared correlation coefficient of allele frequencies (r²) between pairs of polymorphic sites, plotted against the distance between the sites, in PIM (green), CER (orange) and BIG (blue).
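LD decay curves such as those in Supplementary Figure 5 are usually built by computing r² for every pair of sites within a maximum distance and averaging r² within distance bins. The sketch below assumes genotype dosages from largely inbred accessions and therefore uses the squared Pearson correlation of dosages as a stand-in for haplotype r²; the bin size, maximum distance and function names are hypothetical, and the tool behind the figure may estimate r² differently.

```python
import numpy as np


def genotype_r2(g1, g2):
    """Squared Pearson correlation between the genotype dosages of two sites."""
    r = np.corrcoef(g1, g2)[0, 1]
    return r * r


def ld_decay(geno, positions, bin_size, max_dist):
    """Mean r^2 of all site pairs within max_dist, grouped into distance bins.

    geno: accessions x sites array of 0/1/2 dosages (no missing values).
    positions: physical positions (bp) of the sites, sorted ascending.
    Returns {bin_start_bp: mean_r2}.
    """
    sums, counts = {}, {}
    n_sites = geno.shape[1]
    for i in range(n_sites):
        for j in range(i + 1, n_sites):
            dist = positions[j] - positions[i]
            if dist > max_dist:
                break  # positions are sorted, so later sites are even farther away
            if geno[:, i].std() == 0 or geno[:, j].std() == 0:
                continue  # r^2 is undefined when either site is monomorphic
            b = (dist // bin_size) * bin_size
            sums[b] = sums.get(b, 0.0) + genotype_r2(geno[:, i], geno[:, j])
            counts[b] = counts.get(b, 0) + 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}


# Hypothetical toy data: 4 inbred accessions x 4 sites (not data from the paper).
geno = np.array([
    [0, 0, 2, 0],
    [0, 0, 2, 2],
    [2, 2, 0, 0],
    [2, 2, 0, 2],
], dtype=float)
positions = [100, 600, 1200, 2500]
decay = ld_decay(geno, positions, bin_size=1000, max_dist=2000)
print(decay)  # mean r^2 is about 1.0 in the 0-1 kb bin and about 0.33 in the 1-2 kb bin
```

Plotting the bin means against bin start for each group gives one decay curve per group.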

Supplementary Figure 6 Distribution of fruit weight and size of the 500 F2 individuals.

(a) Frequency distribution of fruit weight in the F2 population, with the fruit weights of both parents and the F1 indicated. (b) Fruits of individuals from the F2 population. The big bulk and the small bulk were constructed from the fruits shown in the first two and last two rows, respectively.
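Supplementary Figure 6 describes bulks drawn from the two extremes of the F2 fruit-weight distribution. The caption does not say how the bulks were analysed; a common choice for such bulk-segregant designs is QTL-seq, which compares the fraction of reads carrying the alternative allele (the SNP-index) between bulks. The sketch below assumes that approach; the bulk size, plant identifiers, weights and read counts are hypothetical.

```python
def select_bulks(fruit_weight, n_per_bulk):
    """Split F2 plants into a big bulk (heaviest fruit) and a small bulk (lightest fruit).

    fruit_weight: {plant_id: mean fruit weight in g}. Returns two lists of plant ids.
    """
    if n_per_bulk < 1 or 2 * n_per_bulk > len(fruit_weight):
        raise ValueError("need n_per_bulk >= 1 and two non-overlapping bulks")
    ranked = sorted(fruit_weight, key=fruit_weight.get, reverse=True)
    return ranked[:n_per_bulk], ranked[-n_per_bulk:]


def delta_snp_index(alt_big, depth_big, alt_small, depth_small):
    """QTL-seq style Delta SNP-index: alt-read fraction in big bulk minus small bulk."""
    return alt_big / depth_big - alt_small / depth_small


# Hypothetical fruit weights (g) for six F2 plants; the paper's population had 500.
weights = {"F2_001": 12.0, "F2_002": 85.5, "F2_003": 30.1,
           "F2_004": 4.2, "F2_005": 60.0, "F2_006": 9.8}
big, small = select_bulks(weights, n_per_bulk=2)
print(big, small)  # ['F2_002', 'F2_005'] ['F2_006', 'F2_004']
# Hypothetical read counts at one SNP: 27/30 alt reads in the big bulk, 3/30 in the small bulk.
print(delta_snp_index(27, 30, 3, 30))  # close to 0.8
```

Under this approach, SNPs whose ΔSNP-index departs strongly from zero would point to regions linked to fruit weight, since the two bulks differ mainly at loci affecting the trait.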

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–6, Supplementary Tables 2, 3, 5, 6 and 11, and Supplementary Note. (PDF 1319 kb)

Supplementary Tables 1, 4 and 7–10

Supplementary Tables 1, 4 and 7–10. (XLS 4822 kb)

Supplementary Data Set

Scripts and pipelines. (ZIP 226 kb)

Source data

About this article

Cite this article

Lin, T., Zhu, G., Zhang, J. et al. Genomic analyses provide insights into the history of tomato breeding. Nat Genet 46, 1220–1226 (2014).
