The wax gourd genomes offer insights into the genetic diversity and ancestral cucurbit karyotype

The botanical family Cucurbitaceae includes a variety of fruit crops with global or local economic importance. How their genomes evolve and the genetic basis of diversity remain largely unexplored. In this study, we sequence the genome of the wax gourd (Benincasa hispida), which bears giant fruit up to 80 cm in length and weighing over 20 kg. Comparative analyses of six cucurbit genomes reveal that the wax gourd genome represents the most ancestral karyotype, with the predicted ancestral genome having 15 proto-chromosomes. We also resequence 146 lines of diverse germplasm and build a variation map consisting of 16 million variations. Combining population genetics and linkage mapping, we identify a number of regions/genes potentially selected during domestication and improvement, some of which likely contribute to the large fruit size in wax gourds. Our analyses of these data help to understand genome evolution and function in cucurbits.


3
The current manuscript describe a comprehensive genomic effort focused on wax 4 gourd and its relation with other cucurbit species. The authors started by creating a 5 draft de novo assembly of the genome of this plant, which may be used as a first 6 reference genome for this species. Then they performed various comparative and 7 evolutionary analyses to describe ancestry and evolutionary processes across cucurbit 8 species. Finally, they describe the population structure and genetic variation across 9 wax gourd diverse collection and the mapping of fruit morphology traits. Assembly of 10 a genome for additional cucurbits species is valuable for the evolutionary analysis of 11 this important family and will provide valuable data for that purpose. Assembly of a 12 reference genome for Benincasa hispida is of interest to this crop community and will 13 encourage and facilitate further and more effective genetic research in this species. 14 While this manuscript includes comprehensive genomic information, its major 15 weakness is in the part of the trait mapping data and the associations that are proposed 16 between genetic variation and fruit morphology traits. In that respect, the claim in the 17 abstract ("we found that genes involved in plant hormone signaling and cell cycle 18 regulation likely contribute to the large fruit size ") is an overstatement as it is 19 supported by weak experimental results in the manuscript. 20

Response: 21
Thanks for your comments. To facilitate the identification of the candidate genes for 22 fruit size, we have now added the RNA-seq data of fruit at three (0,10 and 20 days 23 after pollination[DAP]) developmental stages for both wax gourd accession B227 24 bearing large fruit and B214 with small fruit. The differentially expressed genes 25 (DEGs) were identified and analyzed, and in total, 1,642, 4,320 and 4,307 genes were 26 identified as DEGs (Response Data 1-3) at 0, 10 and 20 DAP, respectively (Response 27 Fig.1). To avoid confusion and overstatement, we have also rephrased the abstract. 28 fw3.1 is described and shown to be located within a domestication sweep interval, 127 and a candidate gene (Bhi03G000723) is proposed based on annotated function. 128 The authors also refer to the high expression in fruit of this gene but this is not 129 reflected in Fig. 5f as the expression looks uniform also in leaf and root. In line 130 243, " Fig. 6f" should be corrected to Fig. 5f. It is not mentioned whether this QTL 131 also showed on the GWAS analysis. Taken together this is a speculative 132 discussion on a candidate gene 133 -the co-occurrence of QTL within domestication sweep interval is very probable 134 to be random considering the high proportion of domestication sweep regions. 135

Response: 136
We used an F2 segregating population including 146 individuals which were derived 137 from a cross between landrace accession B214, with fruit of ~2.0 Kg, and cultivated 138 accession 'B227' with fruit of ~20 Kg. This information has been incorporated into 139 our revised manuscript. The physical intervals of these QTLs have been presented in 140 Supplementary data 8. For the candidate gene, Bhi03G000723, we identified five 141 missense variant SNPs in its coding region. However, as mentioned by Reviewer 1, 142 the level of analysis of this candidate gene is limited, and thus any conclusion as to 143 function remains speculative. In view of this fact, we chose a new example 144 candidate gene having more supportive evidence. 145 10. The authors can use syntheny analyses to identify and focus on candidate genes 146 from other cucurbits. Specifically, there are several QTL studies on fruit size and 147 shape QTLs in melon, including the mapping of candidate genes from other 148 species (i.e. tomato). Alignment of QTLs in syntenic regions could add another 149 layer to candidate genes identification. 150

Response: 151
In response to your suggestions, we collected the genes responsible for fruit size in 152 tomato and other cucurbits. By aligning protein sequences of these genes against the 153 genes within sweep and fruit size-related QTL regions in wax gourd, an orthologous (Bhi10G000196) gene was identified. Bhi10G000196 is located at the physical 155 interval of the fl10.1 QTL for fruit length and the domestication sweep from 5.2 Mb 156 to 6.5 Mb. The mutation of its homologue SlFIN (Solyc11g064850) in tomato can 157 cause enlarged fruit 4 . RNA-seq data showed that Bhi10G000196 is down-regulated in 158 the large-fruited accession, at three developmental stages (Response Fig. 2a), 159 compared to that in small fruited-accession, supporting a role in enlarged fruit during 160 wax gourd domestication. This gene, as a new candidate gene with more evidence, 161 was added into the revised manuscript. 166 11. GWAS is described in lines 245-253. Again, in this part, the co-localization of 167 GWAS hits with domestication/improvement sweeps has high probability and 168 could occur by chance alone. There is no description or definition of 169 co-localization parameters (how is the confidence interval for GWAS hit defined?). 170 The candidate gene mentioned in this part (Bhi11G001327) is also indicated 171 within a QTL interval (based on Fig 5b but  The confidence intervals for our GWAS results are defined on the basis of p-values, and this has been added in the revised manuscript. 177 The candidate gene Bhi11G001327 is indeed within the interval of QTL fd11.1. To 178 avoid undue speculation, in the revised manuscript, we provide only a list of genes 179 with significant signals.

Response: 187
We revised this figure and added detail information into the legend. Expression 188 profiles for the candidate genes were generated by RNA-seq. 189 13. Analysis of differential expression of candidate genes between large and small 190 fruited-accession is required to further support the claimed hypothesis on 191 candidate genes. 192

Response: 206
Full term for TE was used. 207 16. Line 269: the verb "prefer" should probably be replaced with a passive verb that 208 describe evolutionary advantage. 209

Response: 210
We combined the Results and Discussion sections in the revised manuscript and this 211 sentence was rephrased. this information used to asses candidate genes in QTL intervals? 232

Response: 233
Yes, this information was used to assess candidate genes in sweeps and QTL intervals. 234 For example, within the physical interval of the fl10.1 QTL for fruit length, one 235 domestication sweep from 5.20 to 6.56 Mb contained 55 genes. Among these genes, 236 Bhi10G000196 is a homologue of SlFIN (Solyc11g064850) in tomato, the mutation of 237 which can cause enlarged tomato fruit 4 . Moreover, differential expression analysis 238 supports its role in affecting fruit size. To provide further evidence, we examined the analyses, the authors concluded that the wax gourd genome represents the most 250 ancestral karyotype of cucurbits, and they identified potential genome regions that 251 have been affected during wax gourd domestication and improvement, as well as 252 candidate genes contributing to large fruit size of wax gourd. The reported wax gourd 253 draft genome provides a valuable resource for future comparative and evolutionary 254 genomic studies, and the study provides certain insights into cucurbit genome 255 evolution and wax gourd domestication. 256 Major: 257 1. The title is quite misleading and inaccurate. First, the genome does not bear a 258 giant fruit. Second, the sequenced wax gourd genome is not an ancestral cucurbit genome; it could have just retained the most ancestral cucurbit karyotype. This 260 needs to be fixed throughout the entire manuscript. 261

Response: 262
Thanks for your insightful comments. The title has been revised to "The wax gourd 263 genomes offer insights into the ancestral cucurbit karyotype and the genetic basis of 264 diversity". The revised manuscript has been written to reflect this situation. 265 2. Genome assembly: The authors need to provide the statistics of assembled contigs 266 as well as the size of gapped regions. 267

Response: 268
We added this information in the revised manuscript (Supplementary Table 2 and 3).

Response: 273
We added more detailed information on ancestral genome reconstruction in the 274 Methods section on "Evolutionary scenario of cucurbit genomes", as follows: 275 By comparing different cucurbit genomes, phylogenetically, we adopted a bottom-up 276 approach to reconstruct the ancestral cell karyotypes of cucurbit plants. Firstly, by 277 inferring putative homologous genes and collinear genes, we drew homologous gene 278 dot plots within a genome and between genomes. Ks values were estimated to infer 279 collinear genes produced by different events, and the information was integrated into 280 the dot plots. Secondly, in that pumpkin is the outgroup of other studied cucurbits, we 281 checked the dot plots to assess whether its chromosomes or main structures of its 282 chromosomes, were shared by other cucurbit plants. Thirdly, the fusion and fission 283 events during genome evolution of cucurbit species from their ancestral chromosomes 284 were determined. 285 Then in the Results and Discussion we revised the section on ancestral chromosomes 286 reconstruction. Actually, we found that, ignoring some intra-chromosome breakages and inversions, seven wax gourd chromosomes, ., were shared with at least one of the 288 other cucurbits, showing that they are most likely proto-chromosomes before the 289 divergence of these cucurbits, ordinally named proto-chromosomes 1-7. For an extra 290 whole-genome duplication in pumpkin, if one copy of a duplicated chromosome is 291 shared with other cucurbits, this would mean that it represents a proto-chromosome in 292 the cucurbit common ancestor. We determined that wax gourd and pumpkin share six 293 out of these 7 proto-chromosomes, suggesting the conserved nature of their genomes. 294 Furthermore, for the wax gourd chromosomes, Bhi2, Bhi3, and Bhi10, both 295 homoeologous copies produced by the cucurbit-common whole-genome duplication 296 was preserved in pumpkin (Cma), showing that they could represent 297 proto-chromosomes before the event. The authors need to provide an explanation of this inconsistency. 317

Response: 358
Thanks for your comments. This question was similar to the fifth question raised by 359 Reviewer #1. We employed several cutoffs (top 1%, 5%, 10%) to detect the sweep 360 regions. Using the top 1% or 5% as cutoff, several QTL regions (fw3.1, fd3.1and ft3.1) 361 related to fruit size in the wax gourd were excluded, but they show obvious sweep 362 signal (see Fig. 5b). Based on this finding, we selected the top 10% as the cutoff, as 363 with maize 1 and pear 2 . To improve the identification efficiency of domestication and 364 improvement sweeps, we further calculated the XP-CLR scores, and retained those 365 with top 50% of XP-CLR scores, as in apple 3 and pear 2 (Response Table 1). 366  between cucumber and melon (about 10 MYA) 6 . This information was added to the 386 revised manuscript. 387 9. Fig. 1 and Fig. 2 legends: please add the meanings of sWGD, CCT, ECH… 388

Response: 389
The abbreviations sWGD, CCT and ECH are specific whole genome duplication, 390 cucurbit-common tetraploidization, and eudicot-common hexaploidization, 391 respectively. This information was added in the revised manuscript. 392 10. Non-wax group and wax group could be compared to identify some interesting 393 genome regions that may underlie the phenotypic difference. 394

Response: 395
Population fixation index (FST) between Cultivar2 (non-wax) and Cultivar1 (wax) 396 group was calculated using HIERFSTAT 7 R package, base on the high confidence 397  shared with other cucurbits, this would mean that it represents a proto-chromosome in 465 the cucurbit common ancestor. We determined that wax gourd and pumpkin share six 466 out of these 7 proto-chromosomes, suggesting the conserved nature of their genomes. 467 Furthermore, for the wax gourd chromosomes, Bhi2, Bhi3, and Bhi10, both 468 homoeologous copies produced by the cucurbit-common whole-genome duplication 469 was preserved in pumpkin (Cma), showing that they could represent 470 proto-chromosomes before the event.

Response: 493
We have attempted to use several population genetic methods such as ROD, Tajima D, XP-CLR to investigate signatures of domestication and improvement. About 74.5% 495 regions overlapped between the results from different methods, using the same cutoff. 496 To improve the accuracy, we first define the regions with the top 10% reduction of 497 nucleotide diversity, and then excluded those without top 50% of XP-CLR scores, as 498 in apple 3 (Response Table 1). 499 Section 'Candidate genes conferring fruit traits': 500 -How the candidate genes are selected among the GWAS/QTL intervals and 501 associated improvement/domestication sweeps? Several candidates may probably 502 pass the considered criteria, why and how a single candidate is then 503 presented/selected? How the interval boundaries are defined? How many annotated 504 genes in the intervals? How many gene with selection footprints? How a single gene 505 is finally selected as candidate? Any functional validation (using publicly available 506 resources among the Cucurbitaceae) have been conducted to propose the delivered 507 candidate genes? 508

Response: 509
To facilitate the identification of candidate genes responsible for fruit size, we 510 generated RNA-seq data for fruit at three (0, 10 and 20 DAP) developmental stages, 511 for the large (B227) and small (B214) fruited accessions. Differentially expressed 512 genes were then used to identify potential candidate genes for fruit size during 513 domestication and improvement. 514 In addition, we collected the genes responsible for fruit size in tomato and other 515 cucurbits. By aligning protein sequences of these genes against the genes within 516 sweep and fruit-size related QTL regions, in wax gourd, one orthologous 517 (Bhi10G0001968) gene was identified. In addition, a gene with more supporting 518 evidence for a role in fruit enlargement was included in the revised manuscript. 519 Minor concerns 520 The authors should make sure that all the data are provided as supplementary datasets; 521 -OrthoMCl gene families. 522 -List of orthologous and paralogous genes. 523 -List of genes with domestication and improvement sweeps. 524 -List of GWAS and QTL regions and associated annotated genes. 525 Among many others, so that that "skilled in the art" can reproduce the results and can 526 integrate the delivered data in complementary analyses. 527

Response: 528
We have now provided all the data in our revised manuscript.  Relatively limited further experimental data was provided to the current version of the 564 manuscript with respect to the candidate genes classification, prioritization or 565 validation (differential gene expression data between the parental lines on developing 566 fruits was added). This part of the manuscript was adjusted and slightly improved and 567 is now more coherent and focused. While it now better reflecting the message that the 568 combined approaches of genetic mapping and selection signature regions detection 569 can be informative to detect candidate domestication genes, none of the candidates 570 presented pass beyond this level of hypothesis. 571 572 Specific comments: 573 1. Lines 302-309: Bhi10G001538 gene is candidate only based its localization within 574 a domestication sweep region and on differential expression between two large and 575 small-fruited accessions. This, in my view, is still very weak and speculative. In 576 addition, it seems that a fruit size QTL was not mapped in this interval, which further 577 weakens this hypothesis. 578

Response: 579
Thanks for your comments. To provide more evidence for the candidate gene 580 Bhi10G001538, we analyzed its ortholog (Csa2G258100) in cucumber. It was found 581 that Csa2G258100 was mapped in the QTL interval fd2.1 for fruit diameter 1 . 582

RNA-seq data of two near isogenic cucumber lines bearing different fruit in length 583
show that this gene is significantly up-regulated in long-fruited line 2 (Response Fig.  584 1), which is consistent with its role in wax gourd. 585 In addition, to test the function of SAUR(Bhi10G001538) gene in cell expansion in 586 wax gourd, we transiently expressed 35S-MYC-SAUR and a vector control in 587 cotyledons of wax gourd by agroinfiltration. The expression of MYC-SAUR protein 588 was detected by immunoblotting using anti-MYC antibodies when infiltrated with 589 OD 600 at 0.9 (Response Fig. 2a). We further investigated the effects of expression of 590 SAUR protein on the cell size of epidermal pavement cells at 5 days post infiltration. 591 The results revealed that the cell size was significantly larger in cotyledon expressing 592 MYC-SAUR (Response Fig. 2b and 1c). These results provide further evidence that 593 the SAUR gene plays an important role in cell expansion and plant organ size. 594 For wax gourd, we have not developed a high-efficiency transformation system. Thus, 595 presently, it is not possible to validate the function of this gene using transformation 596 system. In a future project, we plan to validate these genes for wax gourd. In this 597 manuscript, we focus on the novel insights into the ancestral cucurbit karyotype and 598 the genetic basis of wax gourds' diversity through our analyses of these genome data. 599 These new findings have been incorporated into the revised manuscript. Couldn't find it at the provided reference (#27). Also, on line 365 it is stated that it is 612 highly expressed in the fruit ( Figure S14). This is not completely correct. The highest 613 expression of this gene according to figure S14 is in the leaf and root. 614

Response: 615
We apologize for the error in the reference which, unfortunately, was mis-linked by 616 endnote automatically. This error has been corrected in the revised manuscript. SlFIN 617 (Solyc11g064850) is expressed in different tissues and at different stages in tomato 3 . 618 We also investigated the SlFIN ortholog in other cucurbit species, such as cucumber 619 and melon. In cucumber, the highest expression of FIN is in the leaf 4 and for melon it 620