Whole-genome resequencing of 445 Lactuca accessions reveals the domestication history of cultivated lettuce

Abstract

Lettuce (Lactuca sativa) is an important vegetable crop worldwide. Cultivated lettuce is believed to be domesticated from L. serriola; however, its origins and domestication history remain to be elucidated. Here, we sequenced a total of 445 Lactuca accessions, including major lettuce crop types and wild relative species, and generated a comprehensive map of lettuce genome variations. In-depth analyses of population structure and demography revealed that lettuce was first domesticated near the Caucasus, which was marked by loss of seed shattering. We also identified the genetic architecture of other domestication traits and wild introgressions in major resistance clusters in the lettuce genome. This study provides valuable genomic resources for crop breeding and sheds light on the domestication history of cultivated lettuce.

Fig. 1: Phylogeny and population structure of cultivated lettuce and its wild relative species.
Fig. 2: Proposed domestication center of cultivated lettuce near the Caucasus.
Fig. 3: Identification of selective sweeps associated with domestication traits in cultivated lettuce.
Fig. 4: Introgressive contribution of L. serriola to lettuce resistance breeding.

Data availability

All raw sequencing data were deposited into the Sequence Read Archive (under BioProject accession PRJNA693894) and CNGB Nucleotide Sequence Archive (CNSA; under the accession number CNP0000335). Variant files, genome assemblies and annotation files are stored in CNSA under the same accession number. Source data are provided with this paper.

Code availability

All of the code used in this study is available at https://github.com/popgenome/lettuce2020.

Acknowledgements

This research was supported by grants from the National Key Research and Development Program of China (2019YFC1711000 to H.L.), Shenzhen Municipal Government of China (JCYJ20170817145512467 to H.L.) and Guangdong Provincial Key Laboratory of Genome Read and Write (2017B030301011 to X.X.). The contributions of R.v.T. and T.v.H. were part of the Fundamental Research Programme ‘Circular and Climate Neutral’ (KB-34-013-001) funded by the Dutch Ministry of Agriculture, Nature and Food Quality. We thank S. Feng, S. K. Sahu and T. Chiu (BGI-Shenzhen) for helpful discussion on population structure.

Author information

Contributions

H. Liu, Xin Liu, J. Wang, H.Y., X.X., J.C.C. and T.v.H. conceived of the study idea. T.W., Xinjiang Liu, S.H., X.W., Z.X., Yaqiong Liu and J. Wei carried out the sampling process. H. Lu performed the library preparation and sequencing. J.C., P.S. and H.K. performed the RNA-seq and bulked segregant analysis experiments. T.W., Xinjiang Liu, Z.Z., Yang Liu, S.D. and T.Y. performed the analyses. T.W. and R.v.T. drafted the manuscript. T.L., Yang Liu, X.N., H.K., H. Liu and T.v.H. revised the manuscript.

Corresponding authors

Correspondence to Rob van Treuren or Xin Liu or Huan Liu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Genetics thanks Xiaowu Wang, Aureliano Bombarely and Thomas Schmutzer for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Phylogenetic relationships of the investigated Lactuca species.

a, A coalescence-based phylogenetic tree inferred from 4,513 single-loci nuclear genes identified from 12 Lactuca species and the outgroup Helianthus annuus. Branches are maximally supported by ASTRAL posterior probability/IQ-TREE bootstrap score/MrBayes posterior probability unless otherwise indicated. PhyParts pie charts on the nodes show, in blue, green, red and gray, the proportions of gene trees that are concordant, support the top conflict, support other conflicts, and carry no signal, respectively. b, A maximum-likelihood phylogenetic tree based on RAxML analysis of 75 plastid genes. Branches are maximally supported by RAxML bootstrap score/MrBayes posterior probability unless otherwise indicated. The primary (GP1), secondary (GP2) and tertiary (GP3) gene pool species are indicated by the species name.

Extended Data Fig. 2 Population structure of 440 Lactuca accessions.

a, Model-based clustering analysis with the number of ancestral populations (K) ranging from 1 to 20. Species names and geographic origins are indicated in two colored bars at the bottom. L. serriola groups from Central Asia, Caucasus, Western Asia, Southern Europe and Eastern Europe are indicated below the color bars, and Turkish and admixed accessions are indicated by arrows. b, Cross-validation errors for each K; those for K from 6 to 20 are shown enlarged in the right panel. Each box plot represents the D statistics from 20 independent runs with randomly chosen seeds. The internal line in each box represents the median and the lower and upper hinges represent the 25th and 75th percentiles, respectively. The whiskers represent 1.5 multiplied by the interquartile range, and dots beyond the whiskers are outliers.
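The box plot elements described in this caption can be reproduced with a short calculation. The following is a minimal sketch, not the authors' plotting code: it assumes the common convention that whiskers stop at the most extreme observation within 1.5 times the interquartile range of the hinges, and the function name and example values are hypothetical.

```python
from statistics import quantiles


def box_plot_summary(values):
    """Return the quantities a box plot of `values` would draw."""
    data = sorted(values)
    # Quartiles by linear interpolation over the observed range.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "median": median,
        "hinges": (q1, q3),
        # Assumed convention: whiskers end at the most extreme
        # observations that still lie inside the fences.
        "whiskers": (inside[0], inside[-1]),
        "outliers": outliers,
    }


# Hypothetical values standing in for repeated runs at one K.
example_runs = [1, 2, 3, 4, 5, 6, 7, 8, 100]
summary = box_plot_summary(example_runs)
```

Here the value 100 lies beyond the upper fence and would be drawn as a dot, while the upper whisker stops at 8.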

Extended Data Fig. 3 PCA plot of L. sativa and L. serriola accessions.

a,b, PCA plot of L. sativa and L. serriola accessions colored by species (a) and origins (b). c,d, PCA plot of 199 L. serriola accessions showing the first and second components (c), and the third and fourth components (d). Colors denote geographic origins, with Western Europe (WEU) in light green, Southern Europe (SEU) in blue, Central Europe (CEU) in dark green, Western Asia (WAS) in purple, the Caucasus (CAU) in red, Central Asia (CAS) in orange, admixed samples and those with no collection information (other) in gray. The proportions of variance explained by the PCs are presented in the axis legends.

Extended Data Fig. 4 Linkage disequilibrium decay measured by r² in four Lactuca species (a), four lettuce crop types (b) and six L. serriola phylogeographic groups (c).

The six phylogeographic groups of L. serriola are Western Europe (WEU), Southern Europe (SEU), Eastern Europe (CEU), Western Asia (WAS), the Caucasus (CAU), and Central Asia (CAS).
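For readers unfamiliar with the statistic plotted in this figure, the squared correlation between two biallelic sites can be computed from haplotype frequencies as D² divided by the product of the four allele frequencies, where D is the difference between the observed and expected frequency of the joint haplotype. The sketch below is illustrative only: it assumes phased haplotypes coded 0/1 and a hypothetical function name, and it is not the software used in the study, which is not named in this section.

```python
def ld_r_squared(haplotypes):
    """Squared correlation (r^2) between two biallelic sites.

    `haplotypes` is a list of (allele_at_site_1, allele_at_site_2)
    pairs, with each allele coded as 0 or 1 (assumed phased data).
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n   # frequency of allele 1 at site 1
    p_b = sum(b for _, b in haplotypes) / n   # frequency of allele 1 at site 2
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a site is monomorphic")
    return d * d / denom


# Two toy cases: complete linkage and complete independence.
linked = [(0, 0), (0, 0), (1, 1), (1, 1)]
independent = [(0, 0), (0, 1), (1, 0), (1, 1)]
```

An LD decay curve is then obtained by averaging r² over site pairs binned by physical distance.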

Extended Data Fig. 5 Change of effective population size (Ne) over time in cultivated and wild lettuce.

a,b, Change of Ne in L. sativa and L. serriola inferred by a SMC++ estimate analysis with two replicates. c,d, Divergence between L. sativa and L. serriola inferred by a SMC++ split analysis with two replicates. SMC++ estimate and split analyses were performed on the same two sets of 30 randomly chosen accessions from each species. The generation time was set to 1 and the mutation rate to 4 × 10⁻⁸ per site per generation.

Extended Data Fig. 6 Neighbor-joining trees of L. sativa and L. serriola accessions using genome-wide SNPs (a), and the SNPs associated with leaf morphology (b), seed shattering (c) and leaf vein spine (d).

L. sativa samples are in black, and L. serriola ones are colored according to their geographic origins, with Western Europe (WEU) in light green, Southern Europe (SEU) in blue, Eastern Europe (CEU) in dark green, Western Asia (WAS) in purple, the Caucasus (CAU) in red, Central Asia (CAS) in orange, and admixed samples and those with no collection information (other) in gray. L. serriola accessions close to the L. sativa clade are indicated by black triangles with their accession numbers and country of origin.

Extended Data Fig. 7 Genome-wide association analysis (GWAS) of flowering time in cultivated lettuce.

a, Manhattan plot and quantile-quantile plot of the GWAS results for flowering time in cultivated lettuce. b, Manhattan plot of the GWAS results within Chr. 7:163–166 Mb. The red horizontal dashed line in the Manhattan plot represents the Bonferroni-corrected threshold for genome-wide significance (α = 0.05). Red and blue lines underneath the plot represent genes from the plus and minus DNA strands, respectively. The position of PHYTOCHROME C (PHYC) is indicated by the blue arrow in (a) and the red triangle in (b). c, Genotypes of PHYC in 133 L. sativa accessions. The left color bar represents flowering time. Oilseed lettuce accessions are indicated by arrows on the right. A key variant, the Chr. 7:164,643,259 G-to-GA indel that causes a frameshift mutation, is indicated by the black box. d, Boxplot of the flowering date in lettuce accessions carrying the reference (blue; n = 84 independent samples) and alternative (red; n = 49 independent samples) allele. The internal line in each box represents the median and the lower and upper hinges represent the 25th and 75th percentiles, respectively. The whiskers represent 1.5 multiplied by the interquartile range, and dots beyond the whiskers are outliers. Statistical significance is examined by a two-sided Student’s t-test.
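The test statistic behind a Student's t-test such as the one named in this caption can be sketched as follows. This is a minimal illustration under the classical equal-variance assumption, not the authors' analysis code. The function name and the toy samples are hypothetical and are not flowering dates from the study.

```python
from math import sqrt
from statistics import mean, variance


def student_t(sample_a, sample_b):
    """Return (t statistic, degrees of freedom) for two independent samples.

    Uses the pooled-variance (equal-variance) form of Student's t-test.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Pooled estimate of the common variance of the two groups.
    pooled_var = ((n_a - 1) * variance(sample_a)
                  + (n_b - 1) * variance(sample_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / standard_error, df


# Toy samples standing in for the two allele groups.
t_stat, dof = student_t([1, 2, 3], [4, 5, 6])
```

The two-sided P value is the probability, under a t distribution with `dof` degrees of freedom, of a value at least as extreme as `abs(t_stat)`. In practice a library routine such as `scipy.stats.ttest_ind` returns both the statistic and the P value.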

Extended Data Fig. 8 Genome-wide association (GWAS) of anthocyanin biosynthesis in cultivated lettuce.

a,b, Photos of cultivated accessions with various leaf anthocyanin content (a) and flower anthocyanin presence (b). Bar = 5 cm in a; bar = 1 cm in b. c,d, Manhattan plots and quantile-quantile plots of the GWAS results for leaf anthocyanin content (c) and flower anthocyanin presence (d). The positions of RED LETTUCE LEAF2 (RLL2) and Anthocyanin synthase (ANS) are indicated by arrows in (c,d). The red horizontal dashed line in the Manhattan plots represents the Bonferroni-corrected threshold for genome-wide significance (α = 0.05). e, Genotypes of RLL2 and ANS in 124 L. sativa accessions recorded with leaf anthocyanin content. The left color bar represents leaf anthocyanin content from low (colored in green) to high (in red). f, Genotypes of ANS in 84 L. sativa accessions recorded with flower anthocyanin presence or absence. The left color bar represents flower anthocyanin absence (colored in green) or presence (in red). A key variant in ANS, an A-to-C substitution at Chr. 9:152,765,187 that causes a stop codon loss, is indicated by the black box.
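A Bonferroni-corrected significance line of the kind drawn on these Manhattan plots is obtained by dividing the family-wise error rate by the number of tests. The sketch below assumes a family-wise α of 0.05 and a purely hypothetical count of one million tested SNPs, since the actual number of markers is not given in this section.

```python
from math import log10


def bonferroni_threshold(alpha, n_tests):
    """Per-test P value threshold that keeps the family-wise error at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


# Hypothetical marker count, chosen only for illustration.
n_snps = 1_000_000
threshold = bonferroni_threshold(0.05, n_snps)

# Manhattan plots show -log10(P), so the dashed line sits at this height.
line_height = -log10(threshold)
```

With these assumed inputs, only SNPs with P below about 5 × 10⁻⁸ would rise above the dashed line.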

Extended Data Fig. 9 Variants in a downy mildew resistance gene, Dm7.

a, Median-joining network of Dm7 haplotypes in the investigated L. sativa accessions with records of resistance to Bremia lactucae isolate 14 (Bl14). b, Dm7 genotypes in all the L. sativa accessions. Seven haplotypes are indicated on the left. c, A neighbor-joining tree of L. sativa and L. serriola accessions using the 260 SNPs within Dm7. Those showing full resistance to Bl14 are indicated by blue ticks. The Dm7.b clade is indicated by a red arrow. d, Geographic distribution of the tested L. serriola accessions. Colors denote resistance to Bl14 as shown in (a). The world map was drawn using the R/ggplot2 package with the Natural Earth data set (http://www.naturalearthdata.com).

Extended Data Fig. 10 Proposed lettuce domestication and breeding history.

Domestication, improvement, and breeding are indicated by arrows. The photos of cultivated lettuce are in green frames, L. virosa is in a purple frame, and the SEU and CAU groups of L. serriola are in blue and red frames, respectively. Scale bar, 2 cm. Potential introgression processes are indicated by “×”. qLFD, qSHT and qSPN represent three loci controlling leaf morphology, seed shattering, and leaf spine. The world map was drawn based on the Natural Earth data set (http://www.naturalearthdata.com).

Supplementary information

Supplementary Information

Supplementary Note and Figs. 1–12

Reporting Summary

Supplementary Tables

Supplementary Tables 1–19

Source data

Source Data Figs. 1–4

Statistical source data.

Source Data Extended Data Figs. 1–9

Statistical source data.

Cite this article

Wei, T., van Treuren, R., Liu, X. et al. Whole-genome resequencing of 445 Lactuca accessions reveals the domestication history of cultivated lettuce. Nat Genet 53, 752–760 (2021). https://doi.org/10.1038/s41588-021-00831-0
