Abstract
Three pumpkin species Cucurbita maxima, C. moschata, and C. pepo are commonly cultivated worldwide. To identify genome-wide SNPs in these cultivated pumpkin species, we collected 48 F1 cultivars consisting of 40 intraspecific hybrids (15 C. maxima, 18 C. moschata, and 7 C. pepo) and 8 interspecific hybrids (C. maxima x C. moschata). Genotyping by sequencing identified a total of 37,869 confident SNPs in this collection. These SNPs were filtered to generate a subset of 400 SNPs based on polymorphism and genome distribution. Of the 400 SNPs, 288 were used to genotype an additional 188 accessions (94 F1 cultivars, 50 breeding lines, and 44 landraces) with a SNP array-based platform. Reliable polymorphisms were observed in 224 SNPs (78.0%) and were used to assess genetic variations between and within the four predefined populations in 223 cultivated pumpkin accessions. Both principal component analysis and UPGMA clustering found four major clusters representing three pumpkin species and interspecific hybrids. This genetic differentiation was supported by pairwise Fst and Nei’s genetic distance. The interspecific hybrids showed a higher level of genetic diversity relative to the other three populations. Of the 224 SNPs, five subsets of 192, 96, 48, 24, and 12 markers were evaluated for variety identification. The 192, 96, and 48 marker sets identified 204 (91.5%), 190 (85.2%), and 141 (63.2%) of the 223 accessions, respectively, while other subsets showed <25% of variety identification rates. These SNP markers provide a molecular tool with many applications for genetics and breeding in cultivated pumpkin.
Similar content being viewed by others
Introduction
Pumpkin (Cucurbita spp.; 2n = 2x = 40) is a major crop in the Cucurbitaceae family including cucumber, melon, and watermelon. The Cucurbita genus consists of at least 12 diverse species; three major cultivated species are C. maxima Duchesne, C. moschata Duchesne, and C. pepo L.1. The cultivated varieties in these species produce thicker, more highly colored, and less fibrous fruit flesh relative to wild species2. Pumpkin provides an excellent source of pro-vitamin A, carotenoids, sugars, and minerals3,4. In addition, the world production of pumpkin including squash and gourd exceeded 27.6 million tons from 2.04 million ha in 20185. Due to the nutritional and economic value of pumpkin, breeders have made many efforts to develop new varieties in public and private breeding programs. Therefore, plant variety protection (PVP) is important to prevent unauthorized use of new varieties and support breeding activities6.
The International Union for the Protection of New Varieties of Plant (UPOV) has harmonized PVP systems among 76 member countries and organizations (as of February 2020). Within this PVP system, a new variety must have distinctness, uniformity, and stability (DUS) to be eligible for registration and protection. UPOV provides a total of 331 guidelines for DUS tests in crop species (as of February 2020). Current DUS testing is mainly based on phenotypic evaluations during two growing seasons, and is labor-intensive, time-consuming, and environment-sensitive7. Therefore, the biochemical and molecular techniques (BMT) working group of UPOV have suggested models for the application of molecular markers in variety registration8,9. Recent advances in high-throughput genotyping technology have made molecular markers a more attractive option for supplementing or even replacing phenotype-based DUS testing7,10.
Molecular markers, especially DNA markers, are an effective tool to explore genetic variations in crop species. Of these markers, simple sequence repeats (SSRs) have been commonly used for DNA fingerprinting and genetic diversity assessment due to advantages such as co-dominant and multi-allelic natures11,12,13,14,15,16,17. However, SSR markers are not suitable for high-throughput genotyping with a large number of markers. Single nucleotide polymorphisms (SNPs) are amenable to automation for high-throughput and cost-effective genotyping. Next-generation sequencing (NGS) technologies have accelerated the identification of genome-wide SNPs, making SNPs ideal to many applications in plant breeding18. In pumpkin, NGS-based transcriptome sequencing of C. pepo found over 9,000 SNPs19. A total of 8,660 SNPs were also identified from genotyping by sequencing (GBS) in the F2 population (n = 186), which were derived from two inbred lines of C. maxima20. These SNPs were used to construct high-density genetic maps and to detect QTL associated with a dwarf vine. In addition, the GBS-based SNP studies were conducted in the pumpkin species (C. pepo, C. moschata, and C. okeechobeensis subsp. martinezii) and the resulting SNPs were used to map loci associated with powdery mildew resistance and fruit-related traits21,22. Recently, the Cucurbita genomes were assembled in C. maxima, C. moschata, and C. pepo23,24. For C. pepo, the genomes of seven morphotypes were also studied by resequencing with an average of 33.5x coverage25. These genome resources have accelerated genome-wide SNP discovery in cultivated pumpkin.
Although a large number of SNPs were previously identified in pumpkins, this genomic resource is limited to investigation of genetic variations and variety identification in cultivated pumpkin germplasm including commercial F1 cultivars. Therefore, we generated genome-wide SNPs with a GBS approach in a collection of 48 commercial F1 cultivars representing intraspecific hybrids for each of three pumpkin species (C. maxima, C. moschata, and C. pepo) and interspecific hybrids (C. maxima × C. moschata). Of these, 288 SNPs were used to genotype an additional collection of 188 accessions with the Fluidigm platform. The genetic variations between and within pumpkin populations were assessed based on these SNP markers. In addition, several subsets of SNP markers were generated for variety identification in commercial F1 cultivars. These SNP markers are a useful resource for developing a cost-effective and rapid DNA-based system for DUS testing and thus benefit breeders by protecting their ownerships of new pumpkin varieties.
Results
Genome-wide SNP discovery in commercial F1 pumpkin cultivars
The sequencing of GBS libraries for the 48 F1 cultivars generated a total of 389.9 million reads ranging from 2.4 million to 13.2 million per cultivar with an average of 8.1 million (Table 1). All of these reads represented 39.4 Gb that is 102x coverage for the genome assembly (386.8 Mb) of C. maxima24. The 381.6 million reads (97.9%) showed expected barcodes and 953,780 tags were mapped to the C. maxima genome. The TASSEL-GBS pipeline detected a total of 232,256 variants including 202,722 SNPs (Table 1). Of these, we obtained 37,869 bi-allelic SNPs with >5% of minor allele frequency and <10% of missing data. These SNPs were unevenly distributed on 20 chromosomes ranging from 1,270 to 3,741 SNPs per chromosome (Fig. 1a). Furthermore, the number of SNP in four predefined populations varied from 26,707 (C. maxima × C. moschata) to 34,869 (C. pepo) (Table 2). We detected 23,703 transition SNPs (62.6%) and 14,166 transversion SNPs (37.4%) in the collection of 48 F1 cultivars. Similarly, the number of transition SNPs was ~1.7 times higher relative to transversion SNPs in all four populations (Table 2). Two transition types (A/G and C/T) showed similar numbers, while the number of the A/T transversion type was higher than the other types (A/C, G/T, and C/G).
The polymorphic information content (PIC) values of 37,869 confident SNPs were calculated based on polymorphisms in the 48 F1 cultivars. The 24,684 SNPs with ≥0.3 PIC values were used to select core sets of SNP markers for variety identification. First, a subset of 400 SNPs was filtered based on their physical positions relative to the C. maxima genome. These SNPs were distributed across 20 chromosomes with the average intervals ranging from 0.41 Mb (chromosome 13) to 0.69 Mb (chromosome 19) (Fig. 1b). Based on the C. maxima annotation24, 287 SNPs (71.7%) were derived from coding sequences, consisting of 126 non-synonymous and 161 synonymous SNPs (Table 3). The remaining 113 SNPs (28.2%) were non-coding sequence variants. Of these, 47 and 24 SNPs were upstream and downstream gene variants, respectively, while the other 42 SNPs were from UTRs, introns, splice sites, and intergenic regions (Table 3).
SNP chip-based genotyping for validation
We used 288 of 400 SNPs to genotype an additional collection of 188 pumpkin accessions in the Fluidigm assay. The 174 accessions (92.6%) were genotyped with call rates >90%. One accession of C. pepo (breeding line ‘Zhdana’) showed a call rate of 83.8%. Since the other 13 accessions (4 F1 cultivars and 9 landraces) showed call rates of 36.6–62.0%, these were excluded from further analyses. In the 174 accessions, 224 of 288 SNPs (77.8%) were polymorphic and 13 SNPs (4.5%) were monomorphic (Table 4). Of these polymorphic SNPs, 165 SNPs (73.7%) were derived from coding sequences and 109 SNPs (48.7%) were non-synonymous. In addition, the genotypes of 19 SNPs were undetermined due to ambiguous clustering patterns and 32 SNPs showed no call.
Most of the SNPs showed two or three clusters corresponding to two homozygous genotypes (XX and YY) and a heterozygous genotype (XY) (Fig. 2a, b). However, we observed different clustering patterns for several SNPs (Fig. 2c, d). For example, the SNP marker ‘S14_2246878’ showed two clusters for a homozygous genotype (Fig. 2c). Similarly, the same heterozygous genotypes were separated into two clusters for the SNP marker ‘S14_9835352’ (Fig. 2d). The Sanger sequencing for the flanking sequences of these SNPs detected additional SNPs in the binding sites of specific target amplification and locus specific primers (Fig. S1a and S1b). These secondary SNPs could lead to inaccurate amplification for the target alleles and thus result in these clustering patterns.
Genetic variations between and within cultivated pumpkin populations
The genotypic data of 224 SNP markers were used to investigate genetic variations in the 223 pumpkin accessions used for GBS and Fluidigm assays. This collection consisted of 73 C. maxima, 63 C. moschata, 45 C. pepo, 31 interspecific hybrids (C. maxima × C. moschata), and 11 unknown accessions. In PCA, the 223 pumpkin accessions were divided into four major clusters using the first two principal components (PC1 and PC2), which explained 65.9% and 15.3% of the total variance, respectively (Fig. 3a). Of the 73 C. maxima accessions, 59 (38 F1 cultivars, 12 breeding lines, and 9 landraces) were grouped with two C. moschata (all F1 cultivars), five C. pepo (one F1 cultivar and four breeding lines), and two unknown accessions (all F1 cultivars) in cluster 1. We found 30 of 31 interspecific hybrids in cluster 2, which also included 10 intraspecific hybrids (two C. maxima, six C. moschata, and two C. pepo) and three unknown F1 cultivars. These accessions in cluster 2 were further divided into two sub-clusters based on PC1 (Fig. 3a). One sub-cluster contained seven interspecific and seven intraspecific hybrids, while another sub-cluster contained 23 interspecific and six intraspecific hybrids (Table S1). In cluster 3, 53 of the 63 C. moschata accessions were found with four C. maxima (two F1 cultivars and two breeding lines), one interspecific hybrid, and four unknown (all F1 cultivars) accessions. Cluster 4 consisted of 38 C. pepo, eight C. maxima (five breeding lines and three landraces), and two C. moschata (one F1 cultivar and one breeding line), and two unknown (all F1 cultivars) accessions. Most of the pumpkin accessions were separated according to their predefined populations based on a priori knowledge using PC1 and PC2. However, we found no obvious genetic differentiation among F1 cultivars, breeding lines, and landraces (Fig. 3a, b). The F1 cultivars were found in all four clusters, while breeding lines and landraces were distributed across clusters 1, 3, and 4.
The UPGMA dendrogram based on Euclidian genetic distances showed four clusters in the 223 pumpkin accessions (Fig. 4a). Moreover, the 138 F1 cultivars were also separated into four clusters as shown in PCA (Fig. 4b). Only four of the 223 accessions (three F1 cultivars and one landrace) were differently clustered between the UPGMA and PCA methods, indicating a consistent result (Table S1). The magnitude and significance of genetic differentiation between the four predefined populations were measured using pairwise Fst and Nei’s genetic distance (D). For this analysis, we excluded the 11 unknown accessions. The four populations were all significantly differentiated by pairwise Fst at P < 0.001 (Table 5). We found the highest level of genetic differentiation between C. maxima and C. moschata populations (Fst = 0.63 and D = 0.58). The C. pepo population was separated from C. maxima (Fst = 0.49 and D = 0.30) and C. moschata (Fst = 0.46 and D = 0.24) populations. The pairwise estimates of Fst and D suggested that the interspecific hybrids were more similar to C. maxima (Fst = 0.21 and D = 0.12) than C. moschata (Fst = 0.41 and D = 0.30) (Table 5).
Allelic richness (A), expected heterozygosity (He), and PIC were used to investigate levels of genetic diversity within each of the predefined populations (Table 6). The C. maxima × C. moschata hybrids showed the highest estimates of these descriptive statistics (A = 1.98, He = 0.42, and PIC = 0.32), while the C. pepo population showed the lowest estimates (A = 1.77, He = 0.18, and PIC = 0.15). Similar levels of genetic diversity were found in the C. maxima (A = 1.90, He = 0.27, and PIC = 0.22) and C.moschata (A = 1.90, He = 0.22, and PIC = 0.19) populations (Table 6).
Core SNP markers for variety identification
The 224 SNP markers differentiated 211 (94.6%) of the 223 pumpkin accessions including all inbred accessions and 126 of 138 F1 cultivars (Fig. 4a, b). From these markers, we selected 192 SNP markers as a core set for variety identification based on their polymorphisms. These core markers were effective in identifying 204 (91.5%) of the 223 accessions. The remaining 19 accessions, which were not separated by the 192 SNP markers, consisted of 11 interspecific hybrids and 8 intraspecific hybrids (4 C. maxima and 4 C. moschata). Interestingly, all 45 C. pepo accessions including F1 were distinct using this core set of SNP markers (Fig. 4a, b). Four additional subsets of 96, 48, 24, and 12 SNPs were generated from the 192 SNP markers to evaluate their performance for variety identification (Figs. S2 and S3). The 92 and 48 SNP markers detected genetic variations to distinguish 190 (85.2%) and 141 (63.2%) of the 223 pumpkin accessions, while the 24 and 12 SNP markers identified 54 (24.2%) and 11 (4.9%) accessions, respectively (Fig. S2). The 92 and 48 marker sets revealed four clusters representing the predefined populations using PC1 and PC2 as the 192 marker set (Fig. S3). Although accessions in each cluster were loosely grouped, the 24 marker set was also able to detect these four clusters. Therefore, the 24 marker set can be used for the pre-identification of pumpkin varieties based on species.
Discussion
Next-generation sequencing (NGS) technologies have led to rapid SNP discovery and high-throughput genotyping. As an NGS-based method, genotyping by sequencing (GBS) is a cost-effective approach based on genome complexity reduction for identifying genome-wide SNPs26. Therefore, GBS have been commonly used as a powerful tool for high-resolution genetic mapping, genome-wide association study, and genetic diversity analysis in crop species20,27,28,29. In this study, a total of 37,869 confident SNPs were generated using GBS in the collection of 48 F1 pumpkin cultivars representing three main Cucurbit species (C. maxima, C. moschata, and C. pepo) and interspecific hybrids (C. maxima × C. moschata). In addition, 26,707–34,869 SNPs were found in each of these four populations. Previous studies in pumpkins reported relatively small numbers of SNPs using inbred accessions. Blanca et al.19 identified 9,043 filtered SNPs between two C. pepo subspecies using NGS-based transcriptome sequencing. The GBS study generated 8,660 SNPs in the F2 population of C. maxima20. We also found similar percentages of transition (~63%) and transversion (~37%) SNPs relative to those in the study of Blanca et al.19. The genome-wide SNPs from the present study may be biased to C. maxima because the GBS reads of 48 F1 cultivars were mapped to the C. maxima genome assembly for SNP discovery. Therefore, the GBS reads are deposited in the Sequence Read Archive (SRA) of NCBI (PRJNA633011) for a customized SNP identification with the C. moschata or C. pepo genomes. Our results contribute to developing a large SNP collection that is a useful resource to investigate genetic variations in three major pumpkin species.
A subset of 288 SNPs with ≥0.3 PIC values was used to genotype the 188 pumpkin accessions (94 F1 cultivars, 50 breeding lines, and 44 landraces) in the Fluidigm assay. Of these, 224 SNPs (77.8%) showed clear polymorphism in this collection. With these SNP markers, the 223 pumpkin accessions including 48 F1 cultivars used for GBS were separated into four clusters in both PCA and UPGMA dendrogram. Most of the accessions in each cluster were derived from C. maxima, C. moschata, C. pepo, or C. maxima × C. moschata populations. The pairwise Fst and Nei’s genetic distance also indicated significant genetic differentiation between these four populations. Similar genetic relationships between these Cucurbit species were also previously found using SSR markers16,30,31. In addition, the sub-division in the C. maxima × C. moschata accessions suggests that the interspecific hybrids used in this study are differentiated from two different genetic backgrounds. These results demonstrate that the SNP markers are a powerful tool to detect species-specific loci and/or alleles in discriminating Cucurbit species.
Both C. maxima (n = 73) and C. moschata (n = 63) populations displayed higher estimates of allelic richness, expected heterozygosity, and PIC than the C. pepo population (n = 45). In the study of Gong et al.30, the C. pepo accessions showed the highest levels of genetic diversity among these three species, despite having a smaller number of accessions (18 C. maxima, 20 C. moschata, and 7 C. pepo). Cultivated C. pepo, which is known to have a great diversity of morphology, consisted of two subspecies: ssp.pepo (Pumpkin, Vegetable Marrow, Cocozelle, and Zucchini) and ssp.ovifers (Acorn, Scallop, Crookneck, and Straightneck)32,33. In the PCA analysis, our C. pepo accessions were tightly aggregated in a cluster relative to the other two species, suggesting that the C. pepo population represents few morphotypes including Zucchini. Therefore, it is possible that this discrepancy in genetic diversity of C. pepo is due to the sampling of accessions.
To recognize breeder’s intellectual property rights, new varieties must satisfy three criteria in the plant variety protection (PVP) system: distinctness, uniformity and stability (DUS). The current DUS testing based on phenotypic evaluation involves laborious and time-consuming tasks. Therefore, a DNA-based system with molecular markers has been considered as an alternative to improve the efficiency and accuracy of DUS testing7,10. With NGS technologies and high-throughput genotyping platforms, genome-wide SNP markers have been widely used across numerous applications in crop species18. The core set of 192 SNP markers in this study were sufficient in detecting genetic variations for identification of all 85 pumpkin inbred accessions. For the collection of 138 commercial F1 cultivars, 119 accessions were identified using these SNP markers. The unidentified 19 F1 cultivars, which consist of 11 interspecific hybrids (C. maxima × C. moschata) and 8 intraspecific hybrids (4 C. maxima and 4 C. moschata), are likely to have narrow genetic bases. Kong et al.31 also found similar genetic backgrounds between the commercial F1 cultivars of C. maxima x C. moschata. A possible explanation for these reduced genetic bases is to use a few elite inbred lines as parents to develop different F1 cultivars in breeding programs. Therefore, variety identification in the F1 cultivars with high levels of genetic similarity is often challenging. Our study revealed that the 224 SNP markers identified six more interspecific hybrids relative to the 192 SNP markers, suggesting that increasing the number of SNP markers allows to us distinguish these F1 cultivars. Since we identified a total of 26,707 SNPs in the interspecific hybrid population using GBS, it is possible to find additional markers that are effective in detecting minimal genetic variations in unidentified F1 cultivars.
Although the core set of 192 SNP markers is a powerful tool for variety identification, their subsets can also be useful in providing additional options for genotyping with different platforms. Of the four subsets, the 96 and 48 SNP markers were able to identify 85.2% and 63.2% of the 223 pumpkin accessions, respectively. The other two subsets of 24 and 12 SNP markers showed low identification rates (24.2% and 4.9%, respectively). This result indicates that the subsets of 96 and 48 SNPs are suitable for pre-screening tests using cost-effective genotyping platforms. The 24 marker set can also be sufficient when detecting species-specific genetic variations between intraspecific hybrids. Thus, these subsets of SNP markers are valuable resources for developing a DNA-based system for PVP in pumpkin.
In conclusion, a large collection of SNPs was generated for three major pumpkin species (C. maxima, C. moschata, and C. pepo) using GBS and commercial F1 cultivars. These SNPs contribute to an expansion of genomic resources for both basic and applied researches in cultivated pumpkin. Our results also demonstrate that the core sets of SNP markers are useful for exploring genetic variations between and within the four pumpkin populations representing three species and interspecific hybrids (C. maxima × C. moschata). Furthermore, these SNP markers provide a rapid and accurate option for variety identification and facilitate development of a DNA-based system for DUS testing in the PVP system. Other applications of these markers include seed purity tests and background selection in breeding programs.
Materials and methods
Plant materials and DNA isolation
A total of 48 commercial F1 cultivars was collected to identify genome-wide SNPs via genotyping by sequencing (GBS). This collection included 40 intraspecific hybrids of three species (15 C. maxima, 18 C. moschata, and 7 C. pepo) and eight interspecific hybrids (C. maxima × C. moschata) derived from 25 seed companies (Table S1). We also used an additional collection of 188 germplasm consisting of 94 F1 cultivars and 94 inbred accessions (50 breeding lines and 44 landraces) for SNP validation (Table S1). These inbred accessions were collected from the National Agrobiodiversity Center in Rural Development Administration in the Republic of Korea (ROK). Their countries of origin are ROK (13 breeding lines and 8 landraces), the United States (5 breeding lines and 10 landraces), Russia (9 breeding lines and 5 landraces), China (12 breeding lines), Bulgaria (one breeding line and 8 landraces), Turkey (2 breeding lines and 7 landraces), Ukraine (8 breeding lines), and Nepal (6 landraces).
Genomic DNA was extracted from fresh, young leaves using a modified cetyl trimethyl ammonium bromide (CTAB) method34. The quality and quantity of DNA was measured using the NanoDrop 1000 spectrophotometer (ThermoFisher Scientific, Wilmington, DE 19810, USA). The final concentration of DNA was adjusted to 50 ng/μL for GBS and Fluidigm assay.
Genotyping by sequencing (GBS)
GBS libraries of 48 F1 cultivars were prepared according to the protocol described by Elshire et al.26. The 200 ng of genomic DNA for each cultivar were digested using a methylation-sensitive restriction enzyme, ApeKI (NEB, Ipswich, MA, USA). After digestion, the DNA fragments were ligated to different barcode adapters that were assigned to each cultivar. These DNA samples were pooled and amplified by PCR to generate GBS libraries. The libraries were sequenced with the pair-end method in the HiSeq 2500 platform (Illumina Inc., San Diego, CA, USA). For SNP calling, the filtered, high-quality sequencing reads were mapped to the C. maxima (Rimu) genome24 using the Burrows-Wheeler Alignment (BWA) method35 in the TASSEL-GBS pipeline36. The resulting bi-allelic SNPs with 5x of minimum depth were filtered based on >5% of minor allele frequency and <10% of missing data for further analysis.
Fluidigm genotyping with SNP markers
A subset of SNPs was selected based on polymorphism information content (PIC) value and physical position on 20 chromosomes for SNP genotyping with the Fluidigm JunoTM system (Fluidigm, San Francisco, CA, USA) in the 188 pumpkin accessions. The PIC value for each SNP was calculated using the following equation:
where n is the number of alleles and pi is the frequency of the ith allele37.
For the Fluidigm SNP genotyping, three types of primers were designed using the 300 bp flanking sequence of each SNP and the D3 Assay Design software (Fluidigm, San Francisco, CA, USA). Both specific target amplification and locus specific primers were used for pre-amplification and two allele specific primers were used for PCR amplification in the Juno 96.96 Genotyping IFC (Integrated Fluidic Circuit). The resulting end-point fluorescence images were analyzed for SNP calling using the Fluidigm SNP genotyping analysis software v4.5.1.
Data analysis
The genotypic data from both GBS and Fluidigm SNP genotyping were used to investigate genetic variations in the cultivated pumpkin germplasm (Table S2). Principal component analysis (PCA) was performed using the pcaMethods package38 as implemented in R39. The Euclidean genetic distances were also calculated between pumpkin accessions with the dist function and hierarchical cluster analysis was then conducted using the unweighted pair group method with arithmetic mean (UPGMA) and the hclust function in R. The UPGMA dendrogram was visualized using the R package dendextent40. Pairwise Fst41 and Nei’s genetic distance were estimated between four predefined populations of the pumpkin collection using the Microsatellite Analyzer (MSA) software v4.0542. The P-value for the pairwise Fst was obtained from 10,000 permutations of genotypes and an applied Bonferroni correction. In addition, allelic richness (A)43,44 and expected heterozygosity (He)45 were calculated for the pumpkin populations using MSA.
Data availability
All of the GBS reads of 48 F1 pumpkin cultivars generated in this study are deposited in the Sequence Read Archive (SRA) of National Center of Biotechnology Information (NCBI) with the BioProject accession number PRJNA633011 (Release data: 08-30-2020). The genotypic data of SNP markers for the 223 pumpkin accessions are included as supplementary information.
References
Sanjur, O. I., Piperno, D. R., Andres, T. C. & Wessel-Beaver, L. Phylogenetic relationships among domesticated and wild species of Cucurbita (Cucurbitaceae) inferred from a mitochondrial gene: Implications for crop plant evolution and areas of origin. Proc. Natl. Acad. Sci. USA99, 535–540 (2002).
Whitaker, T. W. & Bemis, W. P. Evolution in the genus Cucurbita. Evolution18, 553–559 (1964).
Aamir, H. D., Sofi, S. A. & Rafiq, S. Pumpkin the functional and therapeutic ingredient: a review. Int. J. Food Sci. Nutr.2, 165–170 (2017).
Zhou, T., Kong, Q., Huang, J., Dai, R. & Li, Q. Characterization of nutritional components and utilization of pumpkin. Food1, 313–321 (2007).
Food & Agriculture Organization of the United, N. FAOSTAT statistical database. https://search.library.wisc.edu/catalog/999890171702121 (2020).
Jones, H. et al. Evaluation of the use of high-density SNP genotyping to implement UPOV Model 2 for DUS testing in barley. Theor. Appl. Genet.126, 901–911 (2013).
Jamali, S. H., Cockram, J. & Hickey, L. T. Insights into deployment of DNA markers in plant variety protection and registration. Theor. Appl. Genet.132, 1911–1929 (2019).
UPOV, International Union for the Protection of New Varieties of Plants. Combining morphological and molecular distance in the management of the reference collection of potato. BMT/13/10, https://www.upov.int/edocs/mdocs/upov/en/bmt_13/bmt_13_1 0.pdf (2011).
UPOV, International Union for the Protection of New Varieties of Plants. Guidance on the use of biochemical and molecular markers in the examination of distinctness, uniformity and stability (DUS). TGP/15, https://www.upov.int/edocs/tgpdocs/en/tgp_15.pdf (2013).
Jones, H. & Mackay, I. Implications of using genomic prediction within a high-density SNP dataset to predict DUS traits in barley. Theor. Appl. Genet.128, 2461–2470 (2015).
Bae, K. M. et al. Development of genomic SSR markers and genetic diversity analysis in cultivated radish (Raphanus sativus L.). Hortic. Environ. Biotechnol.56, 216–224 (2015).
Choi, S. P. et al. Genetic characterisation of commercial Chinese cabbage varieties using SSR markers. Seed Sci. Technol.44, 595–608 (2016).
Jamali, S., Sadeghi, L. & Sadeghin-Motahhar, S. Identification and distinction of soybean commercial cultivars using morphological and microsatellite markers. Iran. J. Crop. Sci.13, 131–145 (2011).
Kwon, Y. S. & Choi, K. J. Construction of a DNA profile database for commercial cucumber (Cucumis sativus L.) cultivars using microsatellite marker. Korean J. Hortic. Sci. Technol.31, 344–351 (2013).
Phan, N. T., Kim, M.-K. & Sim, S.-C. Genetic variations of F1 tomato cultivars revealed by a core set of SSR and InDel markers. Sci. Hortic.212, 155–161 (2016).
Sim, S. C., Hong, J. H. & Kwon, Y. S. DNA profiling of commercial pumpkin cultivars using simple sequence repeat polymorphisms. Hortic. Environ. Biotechnol.56, 811–820 (2015).
Kong, Q. S., Liu, Y., Xie, J. J. & Bie, Z. L. Development of simple sequence repeat markers from de novo assembled transcriptomes of pumpkins. Plant Mol. Biol. Rep.38, 130–136 (2020).
Hao, N. et al. Genome-based breeding approaches in major vegetable crops. Theor. Appl. Genet. https://doi.org/10.1007/s00122-019-03477-z (2019).
Blanca, J. et al. Transcriptome characterization and high throughput SSRs and SNPs discovery in Cucurbita pepo (Cucurbitaceae). BMC Genomics12, 104 (2011).
Zhang, G. et al. A high-density genetic map for anchoring genome sequences and identifying QTLs associated with dwarf vine in pumpkin (Cucurbita maxima Duch.). BMC Genomics16, 1101 (2015).
Holdsworth, W. L., LaPlant, K. E., Bell, D. C., Jahn, M. M. & Mazourek, M. Cultivar-based introgression mapping reveals wild species-derived Pm-0, the major powdery mildew resistance locus in squash. PLoS ONE11, e0167715 (2016).
Montero-Pau, J. et al. An SNP-based saturated genetic map and QTL analysis of fruit-related traits in Zucchini using Genotyping-by-sequencing. BMC Genomics18, 94 (2017).
Montero-Pau, J. et al. De novo assembly of the zucchini genome reveals a whole-genome duplication associated with the origin of the Cucurbita genus. Plant Biotechnol. J.16, 1161–1171 (2018).
Sun, H. et al. Karyotype stability and unbiased fractionation in the paleo-allotetraploid Cucurbita genomes. Mol. Plant.10, 1293–1306 (2017).
Xanthopoulou, A. et al. Whole-genome resequencing of Cucurbita pepo morphotypes to discover genomic variants associated with morphology and horticulturally valuable traits. Hortic. Res.6, 94 (2019).
Elshire, R. J. et al. A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLoS ONE6, e19379 (2011).
Kishor, D. S. et al. Development of SNP markers and validation assays in commercial Korean melon cultivars, using Genotyping-by-sequencing and Fluidigm analyses. Sci. Hortic. 263, 109113 (2019).
Phan, N. T. et al. Identification of loci associated with fruit traits using genome-wide single nucleotide polymorphisms in a core collection of tomato (Solanum lycopersicum L.). Sci. Hortic.243, 567–574 (2019).
Spindel, J. et al. Bridging the genotyping gap: using genotyping by sequencing (GBS) to add high-density SNP markers and new value to traditional bi-parental mapping and breeding populations. Theor. Appl. Genet.126, 2699–2716 (2013).
Gong, L. et al. Genetic relationships and evolution in Cucurbita as viewed with simple sequence repeat polymorphisms: the centrality of C. okeechobeensis. Genet. Resour. Crop Evol.60, 1531–1546 (2013).
Kong, Q. S. et al. Genetic diversity of Cucurbita rootstock germplasm as assessed using simple sequence repeat markers. Sci. Hortic.175, 150–155 (2014).
Ferriol, M., Pico, B. & Nuez, F. Genetic diversity of a germplasm collection of Cucurbita pepo using SRAP and AFLP markers. Theor. Appl. Genet.107, 271–282 (2003).
Paris, H. A proposed subspecific classification for Cucurbita pepo. Phytologia61, 133–138 (1986).
Kabelka, E., Franchino, B. & Francis, D. M. Two loci from Lycopersicon hirsutum LA407 confer resistance to strains of Clavibacter michiganensis subsp. michiganensis. Phytopathology92, 504–510 (2002).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics25, 1754–1760 (2009).
Glaubitz, J. C. et al. TASSEL-GBS: a high capacity genotyping by sequencing analysis pipeline. PLoS ONE9, e90346 (2014).
Botstein, D., White, R. L., Skolnick, M. & Davis, R. W. Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am. J. Hum. Genet.32, 314–331 (1980).
Stacklies, W., Redestig, H., Scholz, M., Walther, D. & Selbig, J. pcaMethods—a bioconductor package providing PCA methods for incomplete data. Bioinformatics23, 1164–1167 (2007).
R: A language and environment for statistical computing (R Foundation for Statistical Computing Vienna, Austria, 2010).
Galili, T. dendextend: an R package for visualizing, adjusting and comparing trees of hierarchical clustering. Bioinformatics31, 3718–3720 (2015).
Weir, B. S. & Cockerham, C. C. Estimating F-statistics for the analysis of population structure. Evolution38, 1358–1370 (1984).
Dieringer, D. & Schlotterer, C. MICROSATELLITE ANALYSER (MSA): a platform independent analysis tool for large microsatellite data sets. Mol. Ecol. Notes3, 167–169 (2003).
Hurlbert, S. H. The nonconcept of species diversity: a critique and alternative parameters. Ecology52, 577–586 (1971).
El Mousadik, A. & Petit, R. High level of genetic differentiation for allelic richness among populations of the argan tree [Argania spinosa (L.) Skeels] endemic to Morocco. Theor. Appl. Genet.92, 832–839 (1996).
Nei, M. Estimation of average heterozygosity and genetic distance from a small number of individuals. Genetics89, 583–590 (1978).
Acknowledgements
This work was supported by the Korea Institute of Planning and Evaluation for Technology in Food, Agriculture, Forestry and Fisheries (IPET) through the Agri-Bioindustry Technology Development Program funded by the Ministry of Agriculture, Food and Rural Affairs (MAFRA) (No. 317011-04-3-HD030) to S.S.
Author information
Authors and Affiliations
Contributions
S.S., J.J., E.S., S.C., Y.P., and G.L. conceived and designed the project. N.N. and M.K. performed experiments and analyzed the data. N.N. and M.K. wrote the first draft of the paper and S.S. critically revised the paper. All authors reviewed and approved the submitted version of the paper.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Nguyen, N.N., Kim, M., Jung, JK. et al. Genome-wide SNP discovery and core marker sets for assessment of genetic variations in cultivated pumpkin (Cucurbita spp.). Hortic Res 7, 121 (2020). https://doi.org/10.1038/s41438-020-00342-9
Received:
Revised:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41438-020-00342-9
This article is cited by
-
Evaluation of homozygosity levels of onion (Allium cepa L.) inbred lines using molecular markers evenly distributed in a high-resolution linkage map
Euphytica (2023)
-
Genome-wide analysis-based single nucleotide polymorphism marker sets to identify diverse genotypes in cabbage cultivars (Brassica oleracea var. capitata)
Scientific Reports (2022)
-
Discovery of SNP markers of red shrimp Aristeus antennatus for population structure in Western Mediterranean Sea
Conservation Genetics Resources (2021)