A genomic variation map provides insights into the genetic basis of cucumber domestication and diversity

Journal name:
Nature Genetics

Most fruits in our daily diet are the products of domestication and breeding. Here we report a map of genome variation for a major fruit that encompasses ~3.6 million variants, generated by deep resequencing of 115 cucumber lines sampled from 3,342 accessions worldwide. Comparative analysis suggests that fruit crops underwent narrower bottlenecks during domestication than grain crops. We identified 112 putative domestication sweeps; 1 of these regions contains a gene involved in the loss of bitterness in fruits, an essential domestication trait of cucumber. We also investigated the genomic basis of divergence among the cultivated populations and discovered a natural genetic variant in a β-carotene hydroxylase gene that could be used to breed cucumbers with enhanced nutritional value. The genomic history of cucumber evolution uncovered here provides the basis for future genomics-enabled breeding.

At a glance


    Figure 1: Cucumber populations.

    (a) The core collection of 115 lines sequenced in this study has a wide geographic distribution. Color codes indicate geographic groups. (b) Fruit morphology of the four groups. The cucumber line CG1601 (East Asian) bears fruits with dense, white spines and an elongated stalk. Fruits of cucumber line CG5278 (Eurasian) lack spines and have a short fruit stalk. Cucumber line CG9164 (Xishuangbanna) bears melon-like fruits with a low fruit shape index (length/width) and a unique orange endocarp. Cucumber line CG0002 (Indian) bears small, oval fruits with sparse, black spines. Note that the images differ in scale. (c) Model-based clustering analysis of the core set, given different numbers of groups (K = 3, 4 or 5), using STRUCTURE. The y axis quantifies subgroup membership, and the x axis shows the different accessions. (d) Decay of LD, measured by r², in the four groups. (e) Summary of nucleotide diversity and population divergence across the four groups. Values in parentheses represent measures of nucleotide diversity for the group, and values between pairs indicate population divergence (FST).
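As a point of reference for panel d, the r² statistic for a pair of biallelic loci can be sketched as below. This is a minimal illustration, not the authors' pipeline: it assumes phased haplotypes encoded as 0/1 alleles, and the function name `ld_r2` is hypothetical.

```python
def ld_r2(locus_a, locus_b):
    """Return r^2 between two biallelic loci.

    Each argument is a list of 0/1 alleles, one entry per haplotype,
    in the same haplotype order (an assumption of this sketch).
    """
    if len(locus_a) != len(locus_b) or not locus_a:
        raise ValueError("loci must be non-empty and of equal length")
    n = len(locus_a)
    p_a = sum(locus_a) / n          # frequency of allele 1 at locus A
    p_b = sum(locus_b) / n          # frequency of allele 1 at locus B
    # frequency of haplotypes carrying allele 1 at both loci
    p_ab = sum(1 for a, b in zip(locus_a, locus_b) if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return float("nan")         # r^2 is undefined if a locus is monomorphic
    d = p_ab - p_a * p_b            # linkage disequilibrium coefficient D
    return d * d / denom
```

An LD decay curve of the kind shown in panel d would then plot the average of such values against the physical distance between the two loci.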

    Figure 2: Detection and functional annotation of domestication sweeps.

    (a) Detected domestication sweeps on the seven chromosomes. A total of 112 regions with both the top 5% of πw/πc values (genetic diversity in the cultivated groups compared to the wild group) and the top 5% of XP-CLR scores were considered to be candidate sweeps (purple bars). The horizontal dashed line indicates the threshold (15.4) defining the top 5% of πw/πc values. Gold bars represent windows that are not considered to be candidate sweeps. Note that some gold bars above the πw/πc threshold are excluded as candidate sweeps because they do not pass the threshold defining the top 5% of XP-CLR scores. (b) A sweep within the physical interval of the fl3.1 QTL for fruit length. The QTL was mapped in the interval defined by two markers (SSR13466 and SSR16680) on chromosome 3 by analysis of four single-fragment introgression lines (IL4–IL7) with the wild cucumber accession CG0002 (PI183967) as the donor parent and the cultivated cucumber line 931 as the recurrent parent. (c) The fl6.1 QTL for fruit length was mapped by genetic analysis of the F2:F3 population from the cross of CG0002 (PI183967) and CG1601 (179 F3 families). The peak of the QTL (SSR23284) is located within the sweep region (23.205–23.755 Mb) on chromosome 6. (d) Signal for a domestication sweep at the Bt locus that confers fruit bitterness. Bt resides between two SNP markers (SNPBt5-25 and SNPBt5-27), corresponding to a 442-kb region on chromosome 5. Numbers below the horizontal line indicate the numbers of recombinants between two neighboring markers in a large segregating population containing 1,822 F2 individuals. The genetic order of all markers is consistent with their physical order. The mapped region for Bt overlaps with a large sweep region (5.290–6.070 Mb on chromosome 5) that shows almost no nucleotide diversity in the three cultivated groups (Xishuangbanna, Eurasian and East Asian).
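The selection rule in panel a (a window must fall in the top 5% for both πw/πc and XP-CLR) can be sketched as follows. This is an illustrative reading of the caption only: the window dictionaries, the key names `pi_ratio` and `xpclr`, and both function names are assumptions, and the sketch does not merge adjacent windows into regions.

```python
def top_fraction_threshold(values, fraction=0.05):
    """Return the smallest value that still lies in the top `fraction`."""
    ranked = sorted(values, reverse=True)
    k = max(1, int(len(ranked) * fraction))   # number of windows in the top tail
    return ranked[k - 1]


def candidate_sweeps(windows, fraction=0.05):
    """Keep windows that pass BOTH top-fraction thresholds.

    `windows` is a list of dicts with hypothetical keys:
      "pi_ratio": diversity in the wild group / diversity in cultivated groups
      "xpclr":    XP-CLR score for the same window
    Windows tied with a threshold value are kept.
    """
    ratio_cut = top_fraction_threshold([w["pi_ratio"] for w in windows], fraction)
    xpclr_cut = top_fraction_threshold([w["xpclr"] for w in windows], fraction)
    return [
        w for w in windows
        if w["pi_ratio"] >= ratio_cut and w["xpclr"] >= xpclr_cut
    ]
```

Requiring both criteria is what excludes the gold bars that exceed the πw/πc threshold (15.4 in the figure) but fall short on XP-CLR.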

    Figure 3: Population divergence and identification of a key mutation responsible for the accumulation of β-carotene in the fruit of the Xishuangbanna cucumbers.

    (a) Highly divergent regions (top 5%; FST ≥ 0.57) and nonsynonymous SNPs (top 5%; FST ≥ 0.70) between the East Asian and Eurasian groups. Green vertical bars higher than the dashed line (FST = 0.70) indicate highly divergent regions; purple dots indicate highly divergent nonsynonymous SNPs. (b) Physical positions of the genetically mapped ore gene and the 43 nonsynonymous SNPs with FST = 1 between the Xishuangbanna group (n = 19) and all other cucumbers (n = 96). Blue diamonds below the seven chromosomes indicate the positions of the SNPs. (c) A key mutation changed the conserved amino acid of a putative β-carotene hydroxylase (CsaBCH1). Residue 257 is located in the conserved PF04116 domain (the fatty acid hydroxylase domain). Xishuangbanna group cucumbers carry asparagine, whereas all other cucumbers and homologous proteins from ten other species carry alanine. Proteins used in the alignment refer to the corresponding accessions in GenBank: AEK86567 (C. moschata), XP_002327604 (P. trichocarpa), AAM77007.1 (V. vinifera), ABB49053 (C. sinensis), ADZ14893 (C. papaya), ABM54182 (B. napus), NP_194300 (A. thaliana), NP_001234348 (S. lycopersicum), ABF93742 (O. sativa) and ADC96676 (Z. mays). (d) CsaBCH1 mRNA levels in the East Asian cucumber line CG4210 are significantly elevated during the period 40–60 d after pollination when the Xishuangbanna cucumber line CG9164 rapidly accumulates β-carotene. The orange endocarps of Xishuangbanna cucumber fruits represent the accumulation of large amounts of β-carotene. Three replicate RT-PCR assays were performed. The value obtained from the sample at 20 d was taken as 100%, and the values for other samples were normalized to that for the 20-d sample. Data are represented as average values with s.d.
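The normalization described for panel d (the 20-d sample set to 100%, other time points expressed relative to it, reported as mean with s.d.) can be sketched as below. The function name and input layout are hypothetical, and the caption does not state whether scaling was applied per replicate or to the means; this sketch scales each replicate by the mean of the reference day.

```python
from statistics import mean, stdev


def normalize_to_reference(replicates_by_day, reference_day=20):
    """Express replicate measurements as a percentage of the reference day.

    `replicates_by_day` maps days after pollination to a list of replicate
    measurements (at least two per day, so that s.d. is defined).
    Returns {day: (mean_percent, sd_percent)}.
    """
    ref_mean = mean(replicates_by_day[reference_day])
    if ref_mean == 0:
        raise ValueError("reference-day mean is zero; cannot normalize")
    summary = {}
    for day, reps in replicates_by_day.items():
        scaled = [100.0 * r / ref_mean for r in reps]   # percent of reference mean
        summary[day] = (mean(scaled), stdev(scaled))
    return summary
```

With this convention the reference day always has a mean of 100%, while its s.d. still reflects the spread among its own replicates.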

Accession codes

Primary accessions

Sequence Read Archive



Author information

  1. These authors contributed equally to this work.

    • Jianjian Qi,
    • Xin Liu,
    • Di Shen,
    • Han Miao,
    • Bingyan Xie &
    • Xixiang Li


  1. Institute of Vegetables and Flowers of the Chinese Academy of Agricultural Sciences, Key Laboratory of Biology and Genetic Improvement of Horticultural Crops of the Ministry of Agriculture, Sino-Dutch Joint Laboratory of Horticultural Genomics, Beijing, China.

    • Jianjian Qi,
    • Di Shen,
    • Han Miao,
    • Bingyan Xie,
    • Xixiang Li,
    • Shenhao Wang,
    • Yi Shang,
    • Xingfang Gu,
    • Yongchen Du,
    • Ying Li,
    • Tao Lin,
    • Jinhong Yuan,
    • Xueyong Yang,
    • Xingyao Xiong,
    • Zhonghua Zhang &
    • Sanwen Huang
  2. BGI-Shenzhen, Shenzhen, China.

    • Xin Liu &
    • Peng Zeng
  3. State Key Laboratory of Crop Genetics and Germplasm Enhancement, College of Horticulture, Nanjing Agricultural University, Nanjing, China.

    • Jinfeng Chen
  4. Hunan Vegetable Research Institute, Hunan Academy of Agricultural Sciences, Changsha, China.

    • Huiming Chen
  5. Hunan Provincial Key Laboratory for Germplasm Innovation and Utilization of Crop, Horticulture & Landscape College, Hunan Agricultural University, Changsha, China.

    • Xingyao Xiong &
    • Ke Huang
  6. Boyce Thompson Institute for Plant Research, US Department of Agriculture (USDA) Robert W. Holley Center for Agriculture and Health, Ithaca, New York, USA.

    • Zhangjun Fei &
    • Linyong Mao
  7. Department of Plant Sciences, University of California, Davis, Davis, California, USA.

    • Li Tian
  8. Plant Ecological Genetics, Institute of Integrative Biology, Eidgenössische Technische Hochschule (ETH) Zurich, Zurich, Switzerland.

    • Thomas Städler
  9. Department of Biology, University of Munich, Munich, Germany.

    • Susanne S Renner
  10. The Sainsbury Laboratory, Norwich Research Park, Norwich, UK.

    • Sophien Kamoun
  11. Department of Plant Biology, University of California, Davis, Davis, California, USA.

    • William J Lucas


S.H. and Z.Z. conceived and designed the experiments. J.Q., D.S., H.M., X.G., S.W., Y.L., T.L., Y.S., X.Y., H.C., X.X., K.H., J.C. and L.T. performed the experiments. Z.Z., J.Q., X. Liu, B.X., X. Li, P.Z., J.Y., Y.D., Z.F., L.M., T.S., S.S.R., W.J.L., S.K. and S.H. analyzed the data. S.H., Z.Z., X. Liu and J.Q. wrote the manuscript. Z.F., T.S., S.S.R., W.J.L. and S.K. revised the manuscript.

Competing financial interests

The authors declare no competing financial interests.


Supplementary information

PDF files

  1. Supplementary Text and Figures (4,742 KB)

    Supplementary Note, Supplementary Tables 2, 3, 5, 6 and 14, and Supplementary Figures 1–14

Excel files

  1. Supplementary Table 1 (65 KB)

    Summary of the sampled core collection

  2. Supplementary Table 4 (147 KB)

    Presence and absence variation (PAV) genes identified in the core collection of 115 cucumber accessions

  3. Supplementary Table 7 (58 KB)

    The SNP loci chosen for validation by PCR and Sanger sequencing

  4. Supplementary Table 8 (76 KB)

    Putative regions identified to be under domestication sweeps

  5. Supplementary Table 9 (553 KB)

    Genes within the putative regions identified to be under domestication sweeps

  6. Supplementary Table 10 (80 KB)

    Summary of the genes present within the Bt region

  7. Supplementary Table 11 (59 KB)

    Highly differentiated regions across the cultivated groups

  8. Supplementary Table 12 (1,126 KB)

    Genes located in the highly differentiated regions

  9. Supplementary Table 13 (453 KB)

    Genes containing nonsynonymous SNPs of significantly high FST values

  10. Supplementary Dataset (6,661 KB)

    Supplementary dataset for Supplementary Figures 1–5 and 7–10
