A high-density linkage map and QTL mapping of fruit-related traits in pumpkin (Cucurbita moschata Duch.)

Pumpkin (Cucurbita moschata) is an economically worldwide crop. Few quantitative trait loci (QTLs) were reported previously due to the lack of genomic and genetic resources. In this study, a high-density linkage map of C. moschata was structured by double-digest restriction site-associated DNA sequencing, using 200 F2 individuals of CMO-1 × CMO-97. By filtering 74,899 SNPs, a total of 3,470 high quality SNP markers were assigned to the map spanning a total genetic distance of 3087.03 cM on 20 linkage groups (LGs) with an average genetic distance of 0.89 cM. Based on this map, both pericarp color and strip were fined mapped to a novel single locus on LG8 in the same region of 0.31 cM with phenotypic variance explained (PVE) of 93.6% and 90.2%, respectively. QTL analysis was also performed on carotenoids, sugars, tuberculate fruit, fruit diameter, thickness and chamber width with a total of 12 traits. 29 QTLs distributed in 9 LGs were detected with PVE from 9.6% to 28.6%. It was the first high-density linkage SNP map for C. moschata which was proved to be a valuable tool for gene or QTL mapping. This information will serve as significant basis for map-based gene cloning, draft genome assembling and molecular breeding.


Results
Analysis of ddRAD data and tags development. DNA samples from two parents and 200 F2 individuals derived from CMO-1 × CMO-97 was subjected to ddRAD seq. ddRAD library was constructed using EcoRI and NlaIII for digestion. By ddRAD library construction and high throughput sequencing, a total of 86.77 G data was generated with an average of 429.54 Mb for each sample (Supplementary Table S1). Among the high-quality SNPs, a total of 74,899 SNPs were polymorphic. After filtering low quality sequences, a total of 18,314 co-dominant loci SNPs which were all present in the two parents were applied for neighbor-joining tree construction ( Supplementary Fig. S1). In the neighbor-joining tree, all F2 individuals were clustered normally and could be employed for downstream analysis. After discarding markers with missing rate under 20%, a total of 3,470 biallelic SNP markers were targeted for genetic map construction.
Characteristics of the high resolution linkage genetic maps. The 3,470 markers were mapped onto 20 LGs with a LOD (logarithm of odds threshold) of 6.0, designated LG1-LG20 using Joinmap 4.0 software (Fig. 1). The total length of the map was determined to be 3087.03 cM with the average interval of 0.89 cM and the largest interval of 22.298 cM. About 80% of the marker intervals were within 5 cM. The sizes of LGs  (Table 1). LG1 was most saturated than other LGs with an average marker interval of 1.01 cM and the smallest LG, LG15, contained 128 markers with an average marker interval of 0.68 cM. Map length calculations were performed based on recombination frequencies. The information of all markers on the LGs was presented in Supplementary Table S2. Considering full genome sequences of C. moschata were not reported, all the markers in the map were involved in genome comparative mapping to genome sequences of cucumber, watermelon and melon. With sequencing identity of ≥80%, 2,073, 2,422 and 2,167 markers were selected for anchoring in cucumber, watermelon and melon genomes, respectively (Fig. 2). A part of markers on the same LGs could be anchored to the same chromosome, especially those markers in the close location.
Characterization of fruit-related traits of two parents and correlation of fruit traits. Two parents, CMO-1 and CMO-97, derived from different regions, had considerable differences in fruit morphologies including pericarp color, flesh color, stripes, tuberculate, fruit shape and width (Fig. 3). The fruit flavor of two parents was also different with sweet of CMO-97 but glutinous of CMO-1. Hence, we measured fruit morphologies and quantified fruit metabolites by high performance liquid chromatography (HPLC) with a total of 14 traits ( Table 2, Supplementary Fig. 2). Phenotype data (including pericarp color, stripe, lutein, β-carotene, α-carotene, total carotenoids, glucose, sucrose, glucose versus sucrose ratio, tuberculate, hollow, fruit diameter, pulp thickness and chamber width) of two parents, F1 individuals and F2 individuals were presented in Supplementary Table S2. Fruit of CMO-1 had dark green pericarp with tuberculates and no stripes while fruit of CMO-97 had light green pericarp with stripes and no tuberculates (Fig. 3). A hollow was detected in the fruit chamber of CMO-1 but not in CMO-97 (Fig. 3). Fruit diameter and chamber width were larger in CMO-1 than CMO-97 while pulp of CMO-97 was thicker than CMO-1. Carotenoid compositions were obviously different for the two parents: CMO-1 accumulated more lutein and total carotenoid contents than CMO-97 but less β-carotene and α-carotene contents than CMO-97. There were three sugars, glucose, sucrose and fructose detected in the two parents. CMO-1 accumulated higher glucose content but lower sucrose content and sucrose versus glucose ratio than CMO-97 ( Table 2, Supplementary Fig. 2). There were small percentages of fructose in both parents. As a result, fructose QTL was not identified with LOD threshold of 3.0. Using person's correlation analysis, 14 traits among the F2 population were highly correlated (Table 2), for instance, lutein was correlated with all traits except α-carotene, glucose, sucrose and tuberculate. The lowest correlation coefficients were found between β-carotene and glucose, total carotenoid and sucrose, α-carotene and pulp thickness, and glucose and tuberculate. Morphological traits were highly correlated with each other. β-Carotene, lutein, α-carotene and total carotenoid contents were highly correlated. Even though sucrose content was not correlated with glucose content, the ratio of sucrose versus glucose was correlated with each content.
Fine mapping of pericarp color and pericarp stripe. Pericarp color of CMO-1 was dark green with no stripes in immature fruit and not easy to turn orange during maturing while the pericarp color in CMO-97 was light green with many yellow stripes and easy to turn orange (Fig. 3) QTL mapping of fruit carotenoid, sugar and shape traits. In addition to fine mapping of pc and ps, 29 QTLs for other fruit-related traits were mapped based on the phenotype data ( Fig. 4., Table 3, Supplementary  Fig. 3  glucose ratio, one QTL on LG19 was identified in different region as glucose with PVE of 12.4%, respectively. For the fruit morphological traits, there were 2 QTLs for fruit tuberculate trait with PVE from 12.3% to 16.9%, 2 QTLs for fruit hollow with PVE from 10.3% to 16.9%, 2 QTLs for fruit diameter with PVE from 11.1% to 19.0%, 3 QTLs for fruit thickness with PVE from 10.3% to 15.2%, 3 QTLs for chamber width with PVE value from 9.6% to 11.4%. With genome LOD threshold, there were two significant QTLs for lutein and one for traits including β-carotene, total carotenoids, tuberculate, hallow, diameter and thickness among all the QTLs. Same significant QTL located on LG20 was identified for lutein, β-carotene and total carotenoids which were correlated with each other described before. Thickness and diameter were correlated in F2 population which shared the same significant QTL in the LG8. With the group threshold, the region distances for all 29 QTLs were from 0.1 cM to 18.30 cM. There were 2 to 23 markers in the QTL loci for all the QTLs. The common QTL on LG20 for lutein, β-carotene and total carotenoids had a long region of 13.69 cM which included 23 markers and could be divided into three small possible regions including 1.17cM-2.54 cM, 4.69 cM-6.152 cM and 7.12 cM-7.64 cM.

Discussion
This study described the construction of the first high-density linkage map of C. moschata using SNP markers developed by ddRAD technology. Consequently, a high-density linkage map consisting of 20 LGs was constructed based on 3,470 SNPs filtered through 74,899 polymorphic SNPs. Compared with reported two genetic maps for C. moschata which individually comprised 205 SSR markers and 95 SSR markers, the density of this map was improved greatly 20 . Thus, for the first time, a dense linkage map of C. moschata was generated. Compared with other density maps of C. maxima and C. pepo, there were more individuals with 200 individuals involved in map construction 31,33 ; the map length, 3087.03 cM, was longer than that reported for C. maxima with 2,566.8 cM and C. pepo with 2817.6 cM. Moreover, we identified more SNPs than C. maxima but lower than C. pepo 31,33 . It can be explained that the variability of C. moschata is probably higher than C. maxima but lower than C. pepo. Based on this map, we identified a novel gene locus for pericarp color and stripe traits and 29 QTLs for other 12 fruit-related traits explaining 9.6-93.6% of the phenotypic variance. Compared with fruit-related QTL mapping of C. pepo using RIL population by GBS technology explaining 1.5-62.9%, the genetic map constructed in our study was sufficient for QTL mapping 33 .
In this study, pc and ps were identified to be controlled by the same single gene which was fine mapped to a region of 0.31 cM on LG8 with estimated PVE of 93.6% and 90.2%, respectively. It was previously reported that dark green with multiple mottling was dominant to light green with slight mottling in pericarp of mature fruits 20,48 . Green color locus in one population had a distance of 3.3 cM away from locus in another population and locus region were unknown as there was only a single marker in the locus 20 . In contrast, we found light green   with stripes was dominant to dark green with no stripes which was predicted to be controlled by a novel gene. Based on the dense linkage map, pericarp color and stripes were fine mapped to the same small region of 0.3 cM.
In zucchini, there were multiple genes controlling rind color and major QTLs controlling the rind color of the immature and mature fruit were present in different regions of LG 4 with PVE of 40.6% and 18.5%, respectively 33 . In other cucurbits, mature fruit color in cucumber was identified to be fine mapped in a single locus on chromosome 4 48 and in wax gourd it was controlled by a single gene which was fine mapped on LG5 32 . In additon, there were 2 QTLs for stripe trait with PVE of 19.8% and 61.7% in melon 43 . Afterall, it was not consistent for genetic controlling patterns of pericarp color and stripes in different cucurbits. The pc and ps controlled by a single and novel gene mapped in this study would be used for reference for other Cucurbita species. Abundance of lutein, a yellow carotenoid, in CMO-1 contributes the yellow pulp while high percentages of orange carotenoids, β-carotene and α-carotene, contribute the orange pulp of CMO-97. Due to the difference in carotenoid composition of two parents, 2 loci in LG11 and LG20 separately explained over 20% and 3 loci in LG8 and LG11 separately explained over 10% of the variation for lutein content, suggesting the high efficiency in QTL detection using ddRAD technology (Table 2, Fig. 5). There were 4 identical QTLs for lutein and total carotenoids and 2 identical QTLs for β-carotene and lutein since the three carotenoid traits were closely correlated with each other. Moreover, it was the same region controlling lutein, β-carotene and total carotenoids traits in LG20 with the PVE over 20% which suggested that QTL locus in LG20 could contain a gene regulating the carotenoid metabolic pathway in general. QTLs for minor carotenoids, α-carotene, were localized on different positions compared with other three carotenoid traits. The main reason suggested for α-carotene localized on different position as other carotenoid traits was that α-carotene was not correlated with lutein and had small correlation coefficients with other two carotenoid traits. It was reported that variation in carotenoid composition, such as, high β-carotene content, was determined by structure genes of the carotenoid biosynthetic pathway in tomato 49 . In cucumber, a locus with genetic variant in a β-carotene hydroxylase gene was identified to explain high β-carotene content 28 . Subsequently, further QTL cloning study is required to explain carotenoid traits.
Sugar is a significant quality trait of pumpkin. There are considerable different sugar composition between two parents since sucrose was the dominant sugar in CMO-97 while glucose was the dominant in CMO-1. Even though no QTLs related to Brix and fructose were detected with LOD threshold of 3.0 in present study, three single QTLs located in different locus with PVE of about 10% were identified for glucose, sucrose and ratio of glucose versus sucrose, respectively. Similarly, two single QTLs for glucose and sucrose detected were detected in watermelon using genetic linkage map constructed by CAPS and SSR markers but was in the identical locus with PVE of less than 20% 50 . In the integrated watermelon map, there were one QTL for glucose and 3 QTLs for  Table 3. sucrose with PVE from 6.39% to 16.71% 51 . In melon, 7 QTLs for sucrose and one for glucose were identified using RIL population with PVE from 11.3% to 19.7% 43 . It is possible that there are multiple genes controlling sucrose and glucose content in cucurbits. Nevertheless, due to no sugar QTLs cloned for cucurbits previously, more efforts on QTL cloning is necessary to reveal molecular regulation of sugar biosynthesis.
Fruit morphological traits were significant for yield and fruit quality. In this study, we identified 12 QTLs for tuberculate, hollow, diameter, thickness and chamber width traits with PVE from 9.6% to 19.0%. There were common loci for the 12 QTLs, such as in LG8 and LG13, because the fruit morphological traits were highly correlated (Table 2, Fig. 4). Interestingly, we found in LG8 which were located with several QTLs and a gene locus which could be explained that LG8 contained multiple fruit-related genes (Fig. 4). In pumpkin species, C. pepo, 2 QTLs for mature fruit width were identified with PVE of 7.42% to 15.14% and other fruit morphological traits, such as fruit length and shape were mapped 33 . In other cucurbits, cucumber, melon, watermelon, a number of fruit morphological QTLs have been identified 30,34,37,41,43,50,52 . However, fruit morphological candidate genes were seldom identified 33,41 . Our study firstly identified fruit morphological QTLs for C. moschata which paved the way for cloning candidate gene. The developed SNP markers linked to fruit-related traits in the loci facilitates molecular breeding of new varieties through marker-assisted selection (MAS). It has been proved effective for improving quantitative and qualitative traits in cucumber [53][54][55][56] , melon 57,58 , watermelon 59,60 even Cucurbita species, C. pepo 61 . However, it was still in its infancy for application in C. moschata. Flanking markers with small mapping regions (<1 cM) have the potential to increase MAS efficacy by reducing errors during selection associated with double crossing-over. Here, the average interval of SNP markers was lower than 1 cM and there were 2-28 markers in the identified QTL regions. Thus, SNP markers in the loci were prime candidates for MAS breeding to increase gain from selection of pericarp color, stripe traits, fruit morphology traits and fruit carotenoid, sugar content traits and assisting the new variety generation. Our research will also be valuable for future genome alignment genetic improvement since genome sequences of C. moschata were not released. By comparing genome sequence of other cucurbits, it can provide useful information for genetic analysis prediction, linear relation and gene location speculation and evolutionary relationships. Moreover, our study will facilitate the identification of candidate genes underlying of fruit-related QTLs which can be exploited for searching new attractive market productions. The study of some traits, such as carotenoid and sugar composition, which were poorly studied, opened the new possibilities to modify the nutritional value. Subsequently, this map will allow genetic studies and molecular breeding within this species and also be elucidating for the evolution of different Cucurbita species.

Materials and Methods
Plant materials, growth condition and fruit collection. Two C. moschata highly inbred lines, CMO-1 and CMO-97, were employed in this study. CMO-1 was derived from Thailand and CMO-97 was derived from South China which was a 'miben' type germplasms. Two germplasms all bear small fruits (~1.5 kg) but exhibit contrasting phenotypes for other fruit traits, such as pericarp color, pericarp strip, flesh color and sweetness. The phenotypes and different traits of two parents were showed in detail in the results section. A population of 200 F2 individuals was generated from a single F1 plant by cross of CMO-1 × CMO-97. 20 parental lines, 20 F1 individuals and 200 F2 individuals were cultivated in the research experiment field of Vegetable Research Institute, Guangdong Academy of Agricultural Sciences, Guangzhou, China in February 2016. Germinated seeds were sown in plastic pots filled with growth medium for 2 weeks in greenhouse. Subsequently, seedlings were transplanted to field in the ridges with 30 cm in height and 200 cm in width and distance of the seedlings in rows and column was 60 cm. The ridges were filled with basal fertilizer in advance and irrigation was carried out when water was needed and fertilizer was added once before fruiting stage. Vines were leaded to climb to the trellises when it was about 2 meters long. Female flowers were pollinated with pollen from male flowers of the same plant before flowering. After one fruits sited successfully, the end flower bud in the major tendrils and lateral tendrils were cut to make sure enough nutrition for fruit. Young and healthy leaves from the two parents as well as F2 individuals were picked and frozen in liquid nitrogen immediately, and stored at − 80 °C freezer for DNA extraction which was employed for construction of high genetic density map. All self-hybridized fruits were harvested at 45 days after pollination (DAP) which were applied for fruit-related phenotypes analysis. Fruit flesh without peel and seeds were fragmented and placed in blender (Philips HR7628/00) to obtain the homogeneous mass. Fresh samples were temporary stored at −80 °C refrigerator and were then orderly freeze dried to remove water. Dried fruit samples were grounded to powder and stored permanently at −80 °C refrigerator for carotenoid and sugar measurements.
Phenotype analysis of the parents and population. A total of 14 traits, including pericarp color, stripe, lutein, β-carotene, α-carotene, total carotenoids, glucose, sucrose, glucose versus sucrose ratio, tuberculate, hollow, fruit diameter, pulp thickness and chamber width, were analyzed. Pericarp stripe was analyzed using 10 DAP fruits of all 200 F2 individuals and pericarp color was analyzed from 20 DAP to 45 DAP fruits of 148 F2 individuals for three times through visual assessment. Other fruit shape traits were measured using fruits of 148 F2 individuals harvested at 45 DAP. Each pumpkin fruit was divided into half by longitudinal cuts at the widest cross section to measure the diameter, chamber width (chamber diameter) and pulp thickness using ruler. Hallow was visually estimated from the cross section. Tuberculate fruit was evaluated using 5 grades as follows: CMO-97 with no tuberculate was estimated as 1st grade; CMO-1 with many tuberculates in the pericarp was estimated as 5th grade; the other three type fruits with little to some tuberculates were estimated as 2st to 4st grades. Among 148 F2 individuals, fruits of 127 F2 individuals with BRIX > 4.0 representing normal accumulation of nutrient were chosen for further carotenoid and sugar measurement. Carotenoids were separated by HPLC connected with photo diode array detector (Waters, Milford, MA, USA) on a Waters Spherisorb 5 μm ODS2 4.6 mm × 250 mm column (Waters, Milford, MA, USA) as reported 62 . Carotenoids were eluted at a flow rate of 1.2 ml/min with a linear gradient from 100% solvent A [acetonitrile/methanol/0.1 M TRIS-HCl (pH 8.0), 84:2:14, v/v/v] to 100% solvent B (methanol/ethylacetate, 68:32, v/v) over a 15-min period, followed by 10 min of 100% solvent B. Carotenoids including lutein, β-carotene and α-carotene were measured at 450 nm and identified by comparing retention times and spectra against standard compounds which were finally quantified by integrating peak areas and converted them to concentrations by comparison with authentic standards that purchased from Sigma. Total carotenoid content was the sum of lutein, β-carotene and α-carotene contents. Sugars were measured using HPLC connected with a refractive index (RI) detector (Waters, Milford, MA, USA) as described by Chávez-Servıń et al. with some modification 63 . The chromatographic separation was achieved using a Xbridge Amide (Waters, 4.6 mm × 150 mm, 3.5 μm) operating at 40 °C. The mobile phase was acetonitrile/deionized water, 8:2 (v/v) which flowed at a flow rate of 1 ml/min for 30 mins. Sugars including glucose, sucrose and fructose were quantified by comparison with authentic standards purchased from Sigma. Ratio of glucose versus sucrose was calculated in EXCEL. DNA extraction. Genomic DNA from two parents and their offspring was isolated by traditional phenol-chloroform extraction method in combination with RNase treatment and stored at -20 °C 64 . Before construction of ddRAD libraries, all DNA samples were quantified using a NanoDrop instrument (Thermo Scientific, DE, USA) and agarose gel electrophoresis. 202 individuals' genomic DNA with high purity (OD 260 / 280 = between 1.8 ~ 2.0; OD 260 / 230 = 1.8~ 2.0) and good integrity (molecular size of the primary band > 20 kb) were finally chosen for ddRAD libraries construction. Their concentrations were adjusted to 50 ng/mL using Tris-EDTA buffer.
Library construction and sequencing. ddRAD libraries were constructed according to the method described by Peterson et al. 28 . Briefly, 500 ng of DNA template from each individual was double-digested using two restriction enzymes, EcoRI and NlaIII (New England Biolabs [NEB], Ipswich, MA, USA; 20 U/reaction) in one combined reaction for 30 min at 37 °C. Subsequently, each fragmented sample was purified using a QiagenMinElute Reaction Cleanup Kit (Qiagen, Valencia, CA, USA) and eluted in 20 μL elution buffer (EB). Fragments were then ligated to P1 adapters (including a unique 4-8-bp multiplex identifier [MID] used to distinguish each individual) that bound to EcoRI-created restriction sites and P2 adapters that bound to overhangs generated by NlaIII. In each reaction, 500 ng DNA, 1 μL P1 adapter (10 mM), 1 μL P2 adapter (10 mM), 1 μL T4 ligase (1,000 U/mL), 4 μL of 10 × T4 ligation buffer, and double-distilled water were combined into a total volume of 40 μL. The ligation was processed on a polymerase chain reaction (PCR) machine using the following conditions: 37 °C for 30 min, 65 °C for 10 min, followed by a decrease in temperature by 1.3 °C/min until the temperature reached 20 °C. Samples were pooled and size-selected (400-600 bp) from an agarose gel. Then DNA product was purified using a QiagenMinElute Gel Purication Kit and eluted in 10 μL EB. Paired-end (150 bp) sequencing of the ddRAD products from the 202 individuals was performed using an IlluminaHiSeqXten sequencing platform (Illumina, Inc., San Diego, CA, USA). Sequencing data for each individual were then extracted according to the specific MID. SNP discovery and genotyping. We first filtered out Illumina short reads lacking sample-specific MIDs and expected restriction enzyme motifs. Then, reads were filtered on the basis of quality score using Trimmomatic (v0.32) 65 in three steps: (1) removing adapters; (2) removing reads with bases from the start or end of a read, if below the quality threshold; and (3) scanning the reads with a 4-bp sliding window, removing the read when the average Phred quality per base was below 10. The STACKS pipeline was used to assemble loci de novo from the sequencing data for SNP calling 66  Genome comparative mapping. To detect cross-species synteny, each amplicon or unigene was BLASTN searched against the genome sequences of cucumber 17 , watermelon 19 and melon 18 and the sequences were considered orthologous if sharing ≥80% sequence identity with an e-value ≤1e-5. In cases where multiplehits occurred, only the best hits were used. The software Circos was employed to visualize the genome syntenic relationships 68 . QTL mapping. QTLs were identified using MapQTL 5.0 69 with multiple QTL mapping (MQM). Automatic cofactor selection (backward elimination, P < 0.05) was used for the detection of significantly associated markers as cofactors. LOD significance threshold levels were determined on the basis of 1,000 permutations at significance levels of P < 0.05. The location of each QTL was determined according to its LOD peak location and surrounding region. The percentage of the phenotypic variance explained by a QTL (R2) was estimated at the highest probability peak.