Modern tomatoes have narrow genetic diversity limiting their improvement potential. We present a tomato pan-genome constructed using genome sequences of 725 phylogenetically and geographically representative accessions, revealing 4,873 genes absent from the reference genome. Presence/absence variation analyses reveal substantial gene loss and intense negative selection of genes and promoters during tomato domestication and improvement. Lost or negatively selected genes are enriched for important traits, especially disease resistance. We identify a rare allele in the TomLoxC promoter selected against during domestication. Quantitative trait locus mapping and analysis of transgenic plants reveal a role for TomLoxC in apocarotenoid production, which contributes to desirable tomato flavor. In orange-stage fruit, accessions harboring both the rare and common TomLoxC alleles (heterozygotes) have higher TomLoxC expression than those homozygous for either and are resurgent in modern tomatoes. The tomato pan-genome adds depth and completeness to the reference genome, and is useful for future biological discovery and breeding.


Data availability

Raw genome and RNA-Seq reads have been deposited into the National Center for Biotechnology Information Sequence Read Archive under accession codes SRP150040, SRP186721 and SRP172989, respectively. The nonreference genome sequences and annotated genes of the tomato pan-genome and SNPs called from the RIL population are available via the Dryad Digital Repository (https://doi.org/10.5061/dryad.m463f7k).

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Change history

  • 23 May 2019

    In the version of the article originally published, the URL https://doi.org/10.5061/dryad.m463f7k in the ‘Data availability’ section was hyperlinked incorrectly. In addition, the copyright holder was listed as ‘The Author(s)’, but the copyright line should have read ‘This is a U.S. government work and not under copyright protection in the U.S.; foreign copyright protection may apply, 2019’. The errors have been corrected in the HTML and PDF versions of the article.




Acknowledgements

This research was supported by grants from the US National Science Foundation (IOS-1339287 to Z.F. and J.J.G.; IOS-1539831 to Z.F., J.J.G. and H.J.K.; and IOS-1564366 to E.v.d.K., J.C. and D.M.T.), BARD, the US–Israel Binational Agricultural Research and Development Fund, a Vaadia-BARD Postdoctoral Fellowship Award (FI-508-14 to I.G.) and the USDA Agricultural Research Service.

Author information

Author notes

  1. These authors contributed equally: Lei Gao, Itay Gonda.


Affiliations

  1. Boyce Thompson Institute for Plant Research, Cornell University, Ithaca, NY, USA

    Lei Gao, Itay Gonda, Honghe Sun, Qiyue Ma, Kan Bao, Kaitlin A. Stromberg, Yimin Xu, James J. Giovannoni & Zhangjun Fei

  2. Unit of Aromatic and Medicinal Plants, Newe Ya’ar Research Center, Agricultural Research Organization, Ramat Yishay, Israel

    Itay Gonda

  3. Horticultural Sciences, Plant Innovation Center, University of Florida, Gainesville, FL, USA

    Denise M. Tieman & Harry J. Klee

  4. Department of Food Science, Cornell University, Ithaca, NY, USA

    Elizabeth A. Burzynski-Chang & Gavin L. Sacks

  5. US Department of Agriculture–Agricultural Research Service, Robert W. Holley Center for Agriculture and Health, Ithaca, NY, USA

    Tara L. Fish, Theodore W. Thannhauser, James J. Giovannoni & Zhangjun Fei

  6. Department of Plant Science, The Pennsylvania State University, University Park, PA, USA

    Majid R. Foolad

  7. Institute for the Conservation and Improvement of Agricultural Biodiversity, Polytechnic University of Valencia, Valencia, Spain

    Maria Jose Diez, Jose Blanca & Joaquin Canizares

  8. Department of Horticulture, University of Georgia, Athens, GA, USA

    Esther van der Knaap

  9. Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China

    Sanwen Huang




Contributions

Z.F., J.J.G., H.J.K., S.H. and E.v.d.K. designed and managed the project. I.G., E.A.B.-C., K.A.S., T.L.F., G.L.S., T.W.T., D.M.T., Y.X., M.J.D., J.B., J.C., M.R.F. and E.v.d.K. collected samples and performed experiments. L.G., I.G., H.S., Q.M. and K.B. performed data analyses. L.G. and I.G. wrote the manuscript. Z.F. and J.J.G. revised the manuscript.

Competing interests

The authors declare no competing interests.

Corresponding authors

Correspondence to James J. Giovannoni or Zhangjun Fei.

Supplementary information

  1. Supplementary Information

    Supplementary Figs. 1–14 and Supplementary Note

  2. Reporting Summary

  3. Supplementary Tables

    Supplementary Tables 1–20
