Genomic analyses provide insights into the history of tomato breeding


The histories of crop domestication and breeding are recorded in genomes. Although tomato is a model species for plant biology and breeding, the nature of human selection that altered its genome remains largely unknown. Here we report a comprehensive analysis of tomato evolution based on the genome sequences of 360 accessions. We provide evidence that domestication and improvement focused on two independent sets of quantitative trait loci (QTLs), resulting in modern tomato fruit 100 times larger than its ancestor. Furthermore, we discovered a major genomic signature for modern processing tomatoes, identified the causative variants that confer pink fruit color and precisely visualized the linkage drag associated with wild introgressions. This study outlines the accomplishments as well as the costs of historical selection and provides molecular insights toward further improvement.

Figure 1: Genome-wide relationship and fruit morphology in cultivated tomato and its wild relatives.
Figure 2: Evolution of fruit mass during domestication and improvement.
Figure 3: A major genomic signature of modern processing tomatoes and three causative variants for pink fruit.
Figure 4: Introgressions and sweeps.

Accession codes

Primary accessions

Sequence Read Archive



We thank J. Maloof (University of California, Davis) for providing tomato RNA sequencing data and L.A. Mueller and N. Menda (Cornell University) for setting up a genome browser of SNPs. This work was supported by funding from the National Program on Key Basic Research Projects in China (973 program; 2012CB113900 and 2011CB100600), the National Science Fund for Distinguished Young Scholars (31225025 to S.H.), the National High-Tech Research Development Program in China (863 Program; 2012AA100101 and 2012AA100105), the National Natural Science Foundation of China (31272160, 31230064, 31272171 and 31171962), the Chinese Ministry of Finance (1251610601001), CAAS (an Agricultural Science and Technology Innovation Program grant to S.H.), the China Agriculture Research System (CARS-25-A-09 and CARS-25-A-15), the Special Fund for Agro-Scientific Research in the Public Interest of China (201303115), the Major Special Science and Technology Project during the Twelfth Five-Year Plan Period of Xinjiang (201230116-3) and the US National Science Foundation Plant Genome Program (IOS-0923312). This work was also supported by the Shenzhen municipal and Dapeng district governments.

Author information




S.H., Y.D., Z.Y. and Jingfu Li conceived and designed the research. T.L., G.Z., J.Z., X.X., Q.Y., Z. Zheng, Y.L., S.L., T.W. and Yuyang Zhang performed DNA sequencing and biological experiments. T.L., G.Z., Z. Zhang, K.L., Yancong Zhang, C.L., Y.X., X.W., Z.H., D.Z., Junming Li, G.X., C.Z., A.M., M.C., Z.F., J.J.G., R.T.C., A.W. and T.S. performed the data analysis. S.H., G.Z., T.L., J.Z., X.X., Q.Y. and Z. Zhang wrote the manuscript. Y.D., Z.Y., Jingfu Li, Z. Zhang, C.L., Y.X., A.M., M.C., Z.F., J.J.G., R.T.C., D.Z. and T.S. revised the manuscript.

Corresponding authors

Correspondence to Jingfu Li, Zhibiao Ye, Yongchen Du or Sanwen Huang.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Spectra of fruit weight for three tomato groups.

Supplementary Figure 2 Determination of ΔK using STRUCTURE.

ΔK analysis for different numbers of clusters (K) in a tomato population of 331 accessions (excluding 10 wild accessions). ΔK peaked at K = 2, suggesting that two clusters best describe the population.
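
As a rough illustration of how a ΔK curve can be derived from replicate STRUCTURE runs, the sketch below uses one common formulation: the absolute second-order difference of the mean log likelihood, divided by the standard deviation of the log likelihoods at that K. The function name `delta_k` and the input layout are hypothetical, and this may differ in detail from the procedure the authors used.

```python
from statistics import mean, stdev

def delta_k(log_likelihoods):
    """Compute a delta-K value for every K that has both neighbours.

    log_likelihoods maps K -> list of ln P(D|K) values from replicate runs
    (hypothetical input layout; at least two replicates per K are needed).
    """
    result = {}
    for k in sorted(log_likelihoods):
        if k - 1 not in log_likelihoods or k + 1 not in log_likelihoods:
            continue  # second-order difference needs K-1 and K+1
        second_diff = abs(
            mean(log_likelihoods[k + 1])
            - 2 * mean(log_likelihoods[k])
            + mean(log_likelihoods[k - 1])
        )
        spread = stdev(log_likelihoods[k])
        if spread == 0:
            continue  # delta-K is undefined when replicates are identical
        result[k] = second_diff / spread
    return result

# Made-up log likelihoods, for illustration only (not data from the paper)
example_runs = {
    1: [-1000.0, -1002.0],
    2: [-800.0, -802.0],
    3: [-790.0, -792.0],
    4: [-785.0, -787.0],
}
values = delta_k(example_runs)
best_k = max(values, key=values.get)
```

The K with the largest ΔK is taken as the most likely number of clusters, which is how the peak at K = 2 in this figure is read.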

Supplementary Figure 3 Principal-component analysis (PCA) of 331 tomato accessions.

A total of 2,340,973 whole-genome SNPs (MAF > 10%, missing ≤ 5%) were used for PCA. Two-dimensional coordinates were plotted for the 331 tomato accessions. CER (orange) and BIG (blue) accessions have a relatively concentrated distribution, whereas PIM accessions (green) are dispersed widely.
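
The SNP filter stated in this legend (MAF > 10%, missing ≤ 5%) can be sketched as follows. The thresholds come from the legend. The genotype encoding (0, 1 or 2 copies of the alternate allele, `None` for a missing call) and the function name `passes_filter` are assumptions made for illustration, not the authors' pipeline.

```python
def passes_filter(genotypes, min_maf=0.10, max_missing=0.05):
    """Return True if one SNP passes the MAF and missing-rate thresholds.

    genotypes: one entry per accession, coded as 0, 1 or 2 alternate-allele
    copies, or None when the genotype is missing (assumed encoding).
    """
    total = len(genotypes)
    called = [g for g in genotypes if g is not None]
    if total == 0 or not called:
        return False
    # Missing rate must not exceed the threshold (missing <= 5%)
    if (total - len(called)) / total > max_missing:
        return False
    # Each called diploid genotype contributes two alleles
    alt_freq = sum(called) / (2 * len(called))
    maf = min(alt_freq, 1 - alt_freq)
    return maf > min_maf

# Made-up genotype vectors for 20 accessions, for illustration only
common_snp = [0] * 10 + [2] * 10        # both alleles at 50%
rare_snp = [0] * 19 + [1]               # alternate allele at 2.5%
sparse_snp = [None] * 2 + [0] * 9 + [2] * 9  # 10% of calls missing
kept = [s for s in (common_snp, rare_snp, sparse_snp) if passes_filter(s)]
```

Only SNPs that survive a filter of this kind would be passed on to the PCA.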

Supplementary Figure 4 Distribution of private SNPs in three tomato groups.

Private SNPs are presented for each chromosome in PIM (green), CER (orange) and BIG (blue).
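
A per-chromosome tally of private SNPs can be sketched as below, assuming that "private" means a SNP is polymorphic in one group and in no other. That definition, the function name `private_snp_counts` and the input layout are assumptions for illustration; the legend itself does not spell them out.

```python
from collections import Counter

def private_snp_counts(snps_by_group):
    """Count, per chromosome, the SNPs found in exactly one group.

    snps_by_group maps a group name to the set of (chromosome, position)
    pairs at which that group is polymorphic (hypothetical input layout).
    """
    counts = {}
    for group, snps in snps_by_group.items():
        # Union of the SNPs seen in every other group
        others = set().union(
            *(s for g, s in snps_by_group.items() if g != group)
        )
        private = snps - others
        counts[group] = Counter(chrom for chrom, _pos in private)
    return counts

# Made-up SNP sets, for illustration only
example = {
    "PIM": {("ch01", 100), ("ch01", 200), ("ch02", 50)},
    "CER": {("ch01", 200), ("ch03", 10)},
    "BIG": {("ch01", 200), ("ch02", 70)},
}
per_group = private_snp_counts(example)
```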

Supplementary Figure 5 Genome-wide average LD decay in three tomato groups.

LD decay is estimated as the squared correlation of allele frequencies (r²) plotted against the distance between polymorphic sites in PIM (green), CER (orange) and BIG (blue).
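
For two biallelic sites, the r² statistic is D² / (pA(1 − pA) pB(1 − pB)), where D = pAB − pA·pB. The sketch below computes it from phased haplotypes coded 0/1. The phased input and the function name `r_squared` are assumptions for illustration, and the authors' own software may estimate r² differently.

```python
def r_squared(hap_a, hap_b):
    """Squared allele correlation (r^2) between two biallelic sites.

    hap_a, hap_b: equal-length lists of 0/1 alleles, one entry per phased
    haplotype (assumed input; both sites must be polymorphic).
    """
    if len(hap_a) != len(hap_b) or not hap_a:
        raise ValueError("haplotype lists must be non-empty and equal length")
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    # Frequency of haplotypes carrying allele 1 at both sites
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined for a monomorphic site")
    return d * d / denom

# Made-up haplotypes, for illustration only
linked = r_squared([0, 0, 1, 1], [0, 0, 1, 1])
unlinked = r_squared([0, 0, 1, 1], [0, 1, 0, 1])
```

Averaging such pairwise values within bins of physical distance gives a decay curve of the kind shown in this figure.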

Supplementary Figure 6 Distribution of fruit weight and size of the 500 F2 individuals.

(a) Frequency distribution of fruit weight in the F2 population. The fruit weight of both parents and the F1 are shown. (b) Fruit appearances of individuals from the F2 population. The two bulks (big bulk and small bulk) are constructed by selecting the fruits shown in the first and last two rows, respectively.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–6, Supplementary Tables 2, 3, 5, 6 and 11, and Supplementary Note. (PDF 1319 kb)

Supplementary Tables 1, 4 and 7–10

Supplementary Tables 1, 4 and 7–10. (XLS 4822 kb)

Supplementary Data Set

Scripts and pipelines. (ZIP 226 kb)

About this article

Cite this article

Lin, T., Zhu, G., Zhang, J. et al. Genomic analyses provide insights into the history of tomato breeding. Nat Genet 46, 1220–1226 (2014).
