Population-scale peach genome analyses unravel selection patterns and biochemical basis underlying fruit flavor

Yu, Yang; Guan, Jiantao; Xu, Yaoguang; Ren, Fei; Zhang, Zhengquan; Yan, Juan; Fu, Jun; Guo, Jiying; Shen, Zhijun; Zhao, Jianbo; Jiang, Quan; Wei, Jianhua; Xie, Hua

doi:10.1038/s41467-021-23879-2

Article
Open access
Published: 14 June 2021

Yang Yu ORCID: orcid.org/0000-0001-6123-4992¹^na1,
Jiantao Guan¹^na1,
Yaoguang Xu¹^na1,
Fei Ren²^na1,
Zhengquan Zhang¹,
Juan Yan³,
Jun Fu ORCID: orcid.org/0000-0001-9969-8944¹,
Jiying Guo²,
Zhijun Shen³,
Jianbo Zhao²,
Quan Jiang²,
Jianhua Wei¹ &
…
Hua Xie ORCID: orcid.org/0000-0003-2867-4114¹

Nature Communications volume 12, Article number: 3604 (2021)

Abstract

A narrow genetic basis in modern cultivars and strong linkage disequilibrium in peach (Prunus persica) have restricted the resolution power of association studies in this model fruit species, thereby limiting our understanding of economically important quality traits including fruit flavor. Here, we present a high-quality genome assembly for a Chinese landrace, Longhua Shui Mi (LHSM), a representative of the Chinese Cling peaches that have been central in global peach genetic improvement. We also map the resequencing data for 564 peach accessions to this LHSM assembly at an average depth of 26.34× per accession. Population genomic analyses reveal a fascinating history of convergent selection for sweetness yet divergent selection for acidity in eastern vs. western modern cultivars. Molecular-genetics and biochemical analyses establish that PpALMT1 (aluminum-activated malate transporter 1) contributes to their difference in malate content and that increased fructose content accounts for the increased sweetness of modern peach fruits, as regulated by PpERDL16 (early response to dehydration 6-like 16). Our study illustrates the strong utility of these genomics resources for both basic and applied efforts to understand and exploit the genetic basis of fruit quality in peach.

Introduction

Fruits are an indispensable component of healthy human diets, providing vitamins, minerals, dietary fibers, antioxidants, and calories¹. Sweetness and acidity are two important flavor determinants that influence consumer preference and acceptability². Recent genomic research has strengthened our understanding of the genetic basis underlying these two internal quality properties, supporting fruit flavor improvement in many fruit crops^3,4,5,6,7.

Domesticated peach (Prunus persica (L.) Batsch) is a model for the genetics and genomics of the genus Prunus and other related Rosaceae perennial fruit crops⁸, especially for studies of the mechanisms underlying fruit quality⁹. It originated in China over two million years ago (MYA)^10,11 and has undergone thousands of years of cultivation and improvement, particularly for fruit quality, in China^12,13. Chinese peach germplasm has been foundational in the development of virtually all modern peach cultivars¹². Two phases of peach dispersal from China have together profoundly impacted the genetic diversity of modern cultivars worldwide: an initial dispersal of primitive peach landraces (presumably) from northwestern China (dating from the final centuries BC) and the later dispersal of landraces with excellent fruit quality (particularly low-acid and sweet peaches) from eastern China (dating from the mid-19th century) to locations around the world^12,14,15,16,17. It is notable that current preferences for peach flavors differ substantially around the world, forming two typical flavor types: a sweet, low-acid taste and a sweet, acid taste, respectively favored by eastern and western consumers^18,19. However, the molecular mechanisms that explain how past genetic improvement has shaped such alternative fruit flavors are still not well characterized.

Recent genomic studies of cultivated peaches and some of their wild relatives have identified specific genome regions targeted by human selection, some of which are related to fruit flavor, clarifying that such selection occurred both during domestication^11,20 and during subsequent improvement efforts^20,21. However, much remains unknown about how specific improvement-related loci/genes have contributed to peach fruit flavor. Although previous studies in peach have reported some QTLs and/or candidate genes for sweetness- and acidity-related flavor traits^22,23,24,25,26,27,28,29, the actual genetic determinant(s) underlying these QTLs have not been identified. Partially accounting for the difficulty of advancing from peach QTLs down to the gene level, the resolution power of linkage studies has been restricted in peach by its narrow genetic basis and high level of linkage disequilibrium (LD)¹⁵. Ultimately, these are related to its long generation time and self-compatibility³⁰ (reflected in few recombination events and the small sizes of the segregating populations examined for linkage analysis), as well as to the relatively limited number of germplasm collections examined in GWAS (genome-wide association study) analyses^15,27.

The current peach reference genome Lovell v2.0 (227.4 Mb, assembled based on Sanger sequencing data)^31,32 is from the doubled haploid PLOV2-2N of a western cultivar Lovell that has been widely used as rootstock³³. Notably, the Chinese Cling peaches are regarded as the most influential germplasm in the history of global peach breeding^15,34, yet the absence of the genome assembly of this fundamental material has hindered full exploration of the genetic basis of peach improvement.

Here, we present a high-quality P. persica reference genome (257.2 Mb) of Longhua Shui Mi (hereafter referred to as LHSM) (Supplementary Table 1), a typical eastern “juicy honey peach” (Shui Mi Tao in Mandarin Chinese) and a representative of the Chinese Cling peaches that feature a pleasant sweet and low-acid flavor³⁵. We also collect genome data for a total of 548 diverse P. persica accessions representing Chinese landraces as well as modern eastern and western cultivars, and for 15 accessions of the close wild relative P. kansuensis. Population genomic analyses of these genomes identify a set of improved landraces (ILs), notable for their obvious contributions as elite germplasm for modern peach breeding worldwide, and our analyses show a clear trend of eastward dispersal of these landraces in the historical period before formal modern peach breeding was initiated. We also perform GWAS based on multi-year fruit flavor-related phenotypic data, and identify loci underlying the sweetness- and acidity-related flavor traits of peach fruits. Biochemical analyses of candidate genes using peach mesocarp tissues confirm that PpALMT1 (aluminum-activated malate transporter 1) promotes malate accumulation and that PpERDL16 (early response to dehydration 6-like 16) is associated with the increase in fructose content during peach improvement.

Results

A high-quality LHSM reference genome

The genome of LHSM was de novo assembled using 30.90 gigabases (Gb) of PacBio long reads (~120.13× coverage), 27.71 Gb of Illumina short reads (~107.73× coverage), and 37.87 Gb (~147.25× coverage) of Hi-C data (Supplementary Fig. 1 and Supplementary Table 2). Based on a k-mer analysis using all Illumina reads, the LHSM genome size was estimated to be ~271 Mb, with a heterozygosity of 0.32% (Supplementary Table 3). The final assembled genome size reached up to ~257.2 Mb, covering ~95% of the estimated genome (Table 1), and the assembly comprised 243 contigs with a contig N50 of 5.17 Mb. A total of 145 contigs, which accounted for 95.7% (~246.0 Mb) of the total assembled genome, were anchored into eight pseudo-chromosomes using the Hi-C reads (Fig. 1a, Supplementary Fig. 2, and Supplementary Table 4).
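The assembly statistics above can be sanity-checked with a short sketch. The function names `n50` and `depth` are illustrative, the contig lengths are toy values rather than the LHSM contigs, and taking the 257.2 Mb assembled size as the depth denominator is an assumption (it gives a PacBio depth close to the reported ~120.13×).

```python
def n50(lengths):
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("empty list of contig lengths")


def depth(total_bases, genome_size):
    """Nominal sequencing depth = sequenced bases / genome size."""
    return total_bases / genome_size


# Toy contig lengths (hypothetical, in bp), not the real LHSM contigs.
toy_contigs = [8, 7, 5, 4, 3, 2, 1]
toy_n50 = n50(toy_contigs)

# PacBio depth, assuming the 257.2 Mb assembly size as the denominator.
pacbio_depth = depth(30.90e9, 257.2e6)

# Share of the k-mer-estimated genome (~271 Mb) covered by the assembly.
covered = 257.2 / 271
```

With real data, `lengths` would come from the contig FASTA index; the same function applies unchanged.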

Table 1 Summary statistics for the LHSM genome assembly in comparison with the Lovell v2.0 reference genome.

The LHSM genome assembly exhibited a significantly high Pearson correlation coefficient (R) (ranging from 0.95 to 0.99 for different chromosomes) with the recently reported peach genetic map³⁶ (Supplementary Fig. 3), suggesting an excellent linear agreement between the physical and the genetic map. The accuracy and completeness of the LHSM genome were supported by a high mapping rate for the Illumina reads (98.63% of 185,951,324) and the expressed sequence tags (ESTs) (94.11% of 80,805) of P. persica from NCBI (Supplementary Tables 5 and 6). The LHSM genome assembly exhibited a high LAI (LTR Assembly Index) score (20.67) and 97.4% (2066 out of 2121) of complete BUSCO genes could be aligned to the assembly, similar to the level obtained for the Lovell v2.0 genome (LAI: 21.29; BUSCO: 96.8%, 2054 of 2121) (Supplementary Table 7).

We predicted a total of 35,215 protein-coding genes and 40,072 transcripts (Table 1 and Supplementary Table 8), which were comparable with those of the Lovell v2.0 genome (31,972 genes and 47,089 transcripts) using the same integrative strategy combining in silico de novo gene prediction, protein-based homology searches, and transcript data from RNA sequencing analysis of various tissues (Supplementary Table 9). An analysis of TE overlap with CDS regions showed that TEs overlapped the CDS of 10,118 protein-coding genes; the percentage of CDS overlapped by TEs was 28.7% on average (Supplementary Table 10). Apart from the different methodologies used, the large difference in the number of protein-coding genes between the LHSM and Lovell v2.0 genome assemblies is likely due to a conservative selection criterion against TEs in the Lovell genome annotation: that pipeline required that TEs overlap less than 20% of a gene's CDS regions³¹.
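A minimal sketch of the kind of TE/CDS overlap calculation described above, assuming half-open coordinate intervals and a single hypothetical gene. The helper names and the coordinates are illustrative; this is not the annotation pipeline used for either genome.

```python
def merge_intervals(intervals):
    """Merge overlapping half-open intervals [start, end)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def overlap_fraction(cds, tes):
    """Fraction of CDS bases covered by any TE interval."""
    cds_merged = merge_intervals(cds)
    te_merged = merge_intervals(tes)
    cds_len = sum(end - start for start, end in cds_merged)
    covered = 0
    for c_start, c_end in cds_merged:
        for t_start, t_end in te_merged:
            covered += max(0, min(c_end, t_end) - max(c_start, t_start))
    return covered / cds_len


# Hypothetical gene with two CDS segments and two TE annotations.
cds = [(100, 200), (300, 400)]
tes = [(150, 220), (390, 450)]
frac = overlap_fraction(cds, tes)

# A strict filter of the kind described for Lovell keeps only genes below 20%.
keep_under_strict_filter = frac < 0.20
```

In this toy case 60 of 200 CDS bases are covered, so the gene would be dropped by a 20% cutoff but retained by a more permissive annotation.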

The annotated protein-coding genes in the LHSM genome covered 94.1% (1996 out of 2121) of the complete BUSCO genes (Supplementary Table 7), and 88.29% of these genes could be annotated by at least one public database (Pfam, InterPro, NR, GO, and KEGG) (Supplementary Table 11). Notably, we also annotated 118.35 Mb of repetitive elements accounting for 46.01% of the LHSM assembly (Supplementary Table 12), a level slightly higher than that (44.26%) of the Lovell v2.0 genome. Collectively, these multiple lines of evidence attest to the high quality of our de novo LHSM genome assembly, supporting its utility as an excellent reference for genomic-variation mining and genome-wide comparative analyses in peach.

We next analyzed genome evolution for 12 dicot plant species, including seven Rosaceae species (among them peach) and five other species, based on their 367 single-copy gene families (Supplementary Fig. 4). The maximum-likelihood phylogenetic tree revealed that P. persica and cultivated almond (P. dulcis) diverged about 4.6–16.6 MYA, consistent with previous reports^11,37. We found 425 significantly expanded gene families (P < 0.01) comprising 4104 genes in peach as compared to the common ancestor of peach and almond (Supplementary Data 1); intriguingly, these expanded genes were significantly enriched in categories associated with defense response, ATPase activity, response to auxin, pollination, pectinesterase activity, and malate transport (Supplementary Fig. 5). Also notably, the aluminum-activated malate transporter (ALMT) gene family (gene family OG0000394 in Supplementary Data 1), which has made large contributions to fruit acidity by affecting malate content in some fruit crops, such as apple, tomato, and grapevine^5,6,38,39,40, was found to have a higher copy number in peach (seven copies) than in almond (four copies).

A total of 705,879 SNPs and 181,788 InDels were identified between the LHSM and Lovell v2.0 genomes (Supplementary Table 13), potentially exerting effects on 10,234 (29.06%) protein-coding genes through the detected non-synonymous substitutions, frameshift insertions/deletions, and other large-effect mutations (stop gain, stop loss, and splicing) (Supplementary Table 14). We also identified 2309 LHSM-specific genomic segments (2.01 Mb) and 910 Lovell-specific genomic segments (0.74 Mb) (Supplementary Data 2), as well as a total of 263 LHSM-specific PAV (presence–absence variation) genes and 141 Lovell-specific PAV genes positioned within these specific segments (Supplementary Data 3). Compared with the Lovell v2.0 genome, among the syntenic regions, a total of 2653 deletions and 2068 insertions were found to affect 2.24 and 2.17 Mb genomic regions, respectively; among the rearranged regions in the LHSM genome assembly, we found 45 inversions (6.10 Mb), 391 translocations (11.22 Mb), and 1320 duplications (8.60 Mb) (Supplementary Table 15). Notably, we found a region at Chr3: 13.31—18.86 Mb, including the top-three ranked largest inversions (0.87, 0.83, and 1.27 Mb) and the adjacent translocations (0.71 and 1.53 Mb for two translocated segments); ~9% (3193) of protein-coding genes were located within or overlapped with these InDels and rearranged regions (Supplementary Data 4 and 5). Thus, we further examined this region through comparison between the Hi-C contact matrices of the LHSM and Lovell v2.0 assemblies constructed using LHSM Hi-C data (Fig. 1b, c), and through synteny analysis between LHSM genome assembly and scaffolds of Lovell v2.0 genome. 
Beyond showing the complexity of this region in the Lovell genome, which was highlighted by Verde et al.³², these results supported the putative misordering or misorientation of some scaffolds in the corresponding region of the Lovell v2.0 genome; for example, Super_27 and Super_451 were misordered, and their order in the pseudomolecule should be inverted in a future release (Supplementary Table 16).

In addition to variations in genomic sequences, we also explored the gene copy number variations between the LHSM and Lovell v2.0 genomes. Based on clustering analysis of orthologous genes, we found 22,166 species-conserved orthogroups covering 23,726 genes, and 2419 and 944 species-expanded orthogroups covering 7727 and 2988 genes for the LHSM and Lovell v2.0 genomes, respectively (Supplementary Table 17). GO functional enrichment analysis revealed that genes in the species-expanded orthogroups of the LHSM genome were enriched for functions related to defense response, whereas there was enrichment for genes involved in proteolysis and reproduction process in the Lovell v2.0 genome (Supplementary Fig. 6).

Peach population structure and pre-breeding improvement in fruit quality

We identified a total of 6.97 million SNPs and 1.23 million InDels across 548 P. persica genomes from various geographic regions and 15 genomes of the close wild relative P. kansuensis, with an average depth of 26.34×, based on mapping to the LHSM reference genome (Supplementary Data 6–8). Using the P. kansuensis accessions as the outgroup, a neighbor-joining (NJ) phylogenetic tree for all P. persica accessions provided a first separation of group I (including all ornamental peaches and most of the landraces) and group II (mainly including most of the modern cultivars) (Fig. 2a). Group II was further classified into two subgroups (group II-1 and II-2); group II-1 mainly contained eastern cultivars (ECs) from China and other Asian regions and group II-2 mainly contained western cultivars (WCs), notably from the Americas and Europe (Supplementary Data 6). These classifications were also supported by the principal component analysis (PCA) (Fig. 2b), the model-based clustering analysis (K = 3 and 4) using ADMIXTURE (Fig. 2a), and a previous study²⁰.

**Fig. 2: Population structure and genetic divergence of primitive landraces and improved landraces.**

Group II-1 showed clear admixture within some ILs; another NJ tree for all the landraces and ornamental peaches supported that these ILs from eastern China (Fig. 2c), including most of the famous Chinese Cling peaches from the Yangtze River Delta region and some elite landraces from the adjacent Huang-Huai region, are genetically derived from the primitive landraces (PLs) across western, central, and eastern China in group I. Regarding their fruit quality traits, ILs displayed markedly higher fructose content and lower fruit acidity relative to PLs (Fig. 2d), suggesting selection of ILs by agriculturalists (an early improvement process) prior to modern peach breeding programs. A multiple sequentially Markovian coalescent (MSMC) analysis showed that PLs had an earlier expansion as well as a larger effective population size than ILs (Fig. 2e and Supplementary Fig. 7). Moreover, ILs had markedly elevated LD and reduced genetic diversity (θπ) compared to PLs (Fig. 2f), suggesting that a bottleneck (θπ_PL/θπ_IL = 1.37) occurred during the early improvement along with the eastward dispersal. Notably, the protein-coding genes within the selective sweep regions in the comparison of ILs and PLs showed enrichment for GO terms including sucrose biosynthetic process (GO:0005986), sugar-phosphatase activity (GO:0050308), malate metabolic process (GO:0006108), malate dehydrogenase activity (GO:0046554), organic acid biosynthetic process (GO:0016053), and regulation of pH (GO:0006885) (Supplementary Data 9 and 10), indicating a potential shift in fruit flavor during this early improvement process.
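To make the θπ ratio above concrete, here is a small sketch of per-base nucleotide diversity for biallelic SNPs and the ratio between two groups. The allele counts, sample size, and window length are hypothetical and are not meant to reproduce the reported 1.37; the function names are illustrative.

```python
def site_pi(alt_count, n_chrom):
    """Expected pairwise differences at one biallelic site, given the
    number of alternate alleles among n_chrom sampled chromosomes."""
    k, n = alt_count, n_chrom
    return 2 * k * (n - k) / (n * (n - 1))


def window_pi(alt_counts, n_chrom, window_len):
    """Per-base nucleotide diversity over a window.
    Invariant sites contribute zero, so only SNP sites are listed."""
    return sum(site_pi(k, n_chrom) for k in alt_counts) / window_len


# Toy windows: three SNPs each, 10 sampled chromosomes, 1 kb window.
pl_pi = window_pi([5, 3, 4], n_chrom=10, window_len=1000)  # "primitive" group
il_pi = window_pi([1, 2, 1], n_chrom=10, window_len=1000)  # "improved" group

# A ratio above 1 means lower diversity in the improved group.
ratio = pl_pi / il_pi
```

A genome-wide ratio would average such windows across all chromosomes before dividing.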

We compared each accession of the modern cultivars for signatures of introgressed fragments inherited from the PLs or ILs based on rIBD (relative identity by descent) analysis^41,42,43 (Supplementary Fig. 8a). The result indicated that the modern cultivars had larger proportions of genomic introgressions from the ILs than from the PLs. Through investigations of the genomic segments introgressed from the ILs into modern cultivars (ECs or WCs), we found genes putatively encoding enzymes or proteins known to function in the synthesis or transport of major organic acids (e.g., ALMTs⁶, NADP-malic enzyme⁴⁴, isocitrate dehydrogenases⁴⁵, and H⁺-ATPase⁷) and sugars (e.g., SWEET sugar transporters⁴⁶, tonoplast monosaccharide transporter⁴⁷, sugar transporter, polyol/monosaccharide transporter⁴⁸, sucrose synthase, phosphofructokinase⁴⁹, beta-galactosidase, and beta-glucosidases) (Supplementary Fig. 8b and Supplementary Data 11–14). These findings suggest that IL peaches were a genetic source of fruit flavor-related traits during modern peach breeding.

Divergent selection for fruit acidity during modern peach breeding

Given the genetic divergence between ECs and WCs (Fig. 2a), we performed selective sweep analysis to search for genome regions bearing strong selective signatures in the comparison between ECs and WCs; we were also interested in identifying possible genes under selection in such regions (Fig. 3a). Notably, we found enriched GO terms related to malate (dicarboxylic acid) transport (GO:0015743), citrate (tricarboxylic acid) metabolic process (GO:0006101), dicarboxylic acid metabolic process (GO:0043648), and tricarboxylic acid metabolic process (GO:0072350) among the protein-coding genes within the selective regions, indicating potential alteration of malate and citrate accumulation (Supplementary Data 15 and 16). Moreover, we found genes encoding putative ALMT^5,6, ATP citrate lyase (ACL)⁵⁰, lactate/malate dehydrogenases (LDH/MDH)⁵¹, isocitrate/isopropylmalate dehydrogenases (IDH/ISDH)⁵², and H⁺-ATPase⁷ (Fig. 3a and Supplementary Data 17 and 18) among the enriched GO terms; homologs of these proteins have been previously implicated in the metabolism or transport of organic acids in fruit crops. These findings, which suggest divergent selection for fruit-acidity-related traits during peach breeding, prompted us to quantify acidity-related phenotypes in ripe fruits, including the content of the organic acids quinic acid and shikimic acid, as well as the two major contributors to peach fruit acidity, malate and citrate⁵³. We examined these phenotypes for accessions over two consecutive years (2016 and 2017), and found significantly higher levels of both malate and citrate in WCs compared to ECs (Fig. 3b and Supplementary Fig. 9). We further measured the pH of their ripe fruits, and also collected titratable acidity (TA) data for all accessions in 2017. Our phenotypic analysis of fruit acidity, as measured by TA and pH, showed that WCs have significantly higher TA and lower pH than ECs, providing multiple lines of empirical evidence for divergent selection on fruit acidity in ECs vs. WCs.

We noted that malate, which is the predominant organic acid in peach¹⁸, showed the strongest correlation with pH (R = −0.62, P < 0.001 in 2017) and TA (R = 0.76, P < 0.001 in 2017) among the examined organic acids (Supplementary Data 19), apparently accounting for a large part of the divergence in fruit acidity between ECs and WCs. Of particular note, we found that five putative ALMT-encoding genes were among the genes located in selective sweep regions (Fig. 3a). We examined the expression of these ALMT genes in mesocarp tissues at 48 DAA (days after anthesis), a period corresponding to the primary phase of malate accumulation in peach based on our data (Supplementary Fig. 10) as well as a previous study¹⁹. One ALMT gene (Pp.LH.06G01819) was expressed at a significantly higher level (P = 0.004, two-sided Student’s t-test) in fruits of three high-malate WC accessions compared to fruits of three low-malate EC accessions (Fig. 3c). Phylogenetic analysis showed that Pp.LH.06G01819 clustered into the corresponding Arabidopsis ALMT clade I together with the previously reported TaALMT1, a malate channel in wheat⁵⁴ (Supplementary Fig. 11). We named Pp.LH.06G01819 as PpALMT1, and peach mesocarp tissues transiently overexpressing PpALMT1 had significantly increased malate content compared to vector control mesocarp tissues (Fig. 3d). These results indicate that PpALMT1 functions to increase malate content in peach fruit and support the inference that differential expression of PpALMT1 has likely contributed to the divergence of ECs and WCs in fruit acidity during modern peach breeding.

To further explore whether the genetic loci associated with acidity have undergone divergent selection, we performed GWAS analysis of four acidity-related traits: pH, TA, malate content, and citrate content. For pH, malate, and citrate, we respectively detected 11, 8, and 4 significant loci in 2016 and 20, 11, and 3 loci in 2017, and for TA, a total of 16 significant loci were detected in 2017 (Supplementary Table 18 and Supplementary Data 20). One strongly associated locus (Chr5: 21,714–1,812,811 bp) explained a large proportion of the phenotypic variance across these four traits (ranging from 9.43 to 38.04%) (Supplementary Fig. 12); this overlapped with the known D locus on chromosome 5, which has been variously reported to exert a large effect on TA or pH^22,23,25,28,29. In addition to this locus on chromosome 5, there were other significant loci with relatively high PVE (phenotypic variance explained) values (6.46–27.11%), implying a complicated genetic regulation mechanism underlying fruit acidity. In particular, a significantly associated locus (Chr2: 29,927,641 bp) was among the very top-ranking loci in terms of both P and PVE values for all four traits in at least 1 year, suggesting its potential contribution to fruit acidity. It also bears mention that 26.7% to 100.0% of the peak SNPs for each trait positioned within acidity-associated loci overlapped with, or were near (<100 kb), the selective sweep regions between ECs and WCs (Supplementary Table 19 and Supplementary Data 20). Beyond indicating that these genetic loci have contributed to the divergent selection of fruit acidity traits between ECs and WCs, these GWAS results provide an empirical basis for investigating causal variations for fruit acidity, a major organoleptic determinant of fruit flavor quality.
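The overlap criterion mentioned above (a peak SNP inside a sweep region or within 100 kb of one) can be sketched as follows. All coordinates are hypothetical, and `near_sweep` is an illustrative helper rather than the authors' implementation.

```python
def near_sweep(chrom, pos, sweeps, max_dist=100_000):
    """True if a SNP lies inside a sweep region on the same chromosome
    or within max_dist bp of either boundary."""
    for s_chrom, start, end in sweeps:
        if s_chrom != chrom:
            continue
        if start - max_dist <= pos <= end + max_dist:
            return True
    return False


# Hypothetical sweep regions: (chromosome, start, end).
sweeps = [("Chr2", 1_000_000, 1_200_000), ("Chr5", 500_000, 900_000)]

# Hypothetical GWAS peak SNPs: (chromosome, position).
peaks = [("Chr2", 1_250_000), ("Chr5", 1_200_000), ("Chr1", 100)]

flags = [near_sweep(chrom, pos, sweeps) for chrom, pos in peaks]

# Share of peak SNPs that overlap or neighbor a sweep region.
fraction = sum(flags) / len(flags)
```

For genome-scale data, an interval tree or sorted-coordinate search would replace the linear scan, but the criterion stays the same.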

Genetic loci associated with major sugars underlying peach fruit sweetness

Another major organoleptic aspect of fruit flavor is sweetness, which is determined by both the type and content of soluble sugars, including for example sucrose, fructose, glucose, and sorbitol^27,55. We quantified the content of these four major sugars for the ripe fruits of the P. persica accessions over two consecutive years (2016 and 2017). Specifically, for the 2016 and 2017 data, we found that sucrose accounts on average for ~76.93 and 75.10% of the examined sugar content, followed by glucose (8.65 and 12.16%), fructose (9.98 and 9.58%), and sorbitol (4.44 and 3.16%) (Supplementary Fig. 13), trends similar to those in previous studies^27,56,57.
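As a simple illustration of how such composition percentages are derived, the sketch below converts absolute sugar contents into shares of the total and checks that the mean shares quoted above sum to about 100% in each year. The per-accession contents are hypothetical.

```python
def sugar_shares(contents):
    """Convert absolute sugar contents into percentages of their total."""
    total = sum(contents.values())
    return {name: 100 * value / total for name, value in contents.items()}


# Hypothetical contents for one accession (arbitrary units).
toy = {"sucrose": 60.0, "glucose": 10.0, "fructose": 20.0, "sorbitol": 10.0}
toy_shares = sugar_shares(toy)

# Mean shares quoted in the text for each year.
reported = {
    2016: {"sucrose": 76.93, "glucose": 8.65, "fructose": 9.98, "sorbitol": 4.44},
    2017: {"sucrose": 75.10, "glucose": 12.16, "fructose": 9.58, "sorbitol": 3.16},
}
year_totals = {year: sum(shares.values()) for year, shares in reported.items()}
```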

We performed GWAS to identify significantly associated loci for the content of these four sugars based on 1,067,831 SNPs with minor allele frequency (MAF) ≥0.05 (Supplementary Fig. 14). A major locus on chromosome 5 (Chr5: 614,754—1,109,368 bp) explained 8.6 and 7.6% of the phenotypic variance for sucrose content in 2016 and 2017, respectively (Fig. 4a), and this overlapped with previously reported QTLs for sucrose content on chromosome 5 identified by using the hybrid populations^22,23,24 (Supplementary Data 21). It is notable that a gene (PpTST1) encoding a tonoplast sugar transporter (TST) is positioned adjacent to this locus from our GWAS. TST proteins can load soluble sugars into the vacuole^58,59, and the PpTST1 was recently reported to affect sucrose content in peach fruit⁶⁰. We identified four loci significantly associated with glucose content in 2017 (Fig. 4b) (Chr1: 30,732,072—30,732,099 bp, Chr3: 15,707,662—15,707,662 bp, Chr4: 10,736,973—12,413,438 bp, and Chr8: 14,342,373—14,343,414 bp), respectively explaining 7.1, 6.8, 12.5, and 7.1% of the phenotypic variance for glucose content. The major locus on chromosome 4 was found to overlap with a previously reported glucose-related QTL (Supplementary Data 21). Within this major locus (Chr4: 10,736,973—12,413,438 bp), Pp.LH.04G02050 encoding a putative β-glucosidase that catalyzes hydrolysis of β-D-glucoside or oligosaccharide substrates⁶¹ may regulate glucose accumulation. Sorbitol is universally found in stone fruits and is a significant contributor to sweetness in peaches^19,62. We detected two significant signals on chromosome 6 associated with sorbitol content in 2016, and one signal each for chromosome 1 and chromosome 3 in 2017 (Supplementary Fig. 14); these respectively explained 7.5, 8.1, 7.9, and 7.7% of phenotypic variation, thus identifying candidate loci for investigations about the genetic determinants of sorbitol accumulation. 
Of these, one signal on chromosome 6 (Chr6: 22,350,242—22,451,210) was found to overlap with a recently reported QTL for sorbitol⁶³ (Supplementary Data 21).
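The GWAS in this section was restricted to SNPs with MAF ≥ 0.05. A minimal sketch of such a filter is shown below, assuming diploid genotypes coded as alternate-allele counts (0, 1, 2) with `None` for missing calls; the coding scheme and function names are assumptions for illustration.

```python
def minor_allele_frequency(genotypes):
    """MAF from diploid genotypes coded as alt-allele counts (0, 1, 2).
    Missing calls (None) are ignored; returns None if nothing is called."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)


def passes_maf(genotypes, threshold=0.05):
    """Keep a SNP only if its MAF reaches the threshold."""
    maf = minor_allele_frequency(genotypes)
    return maf is not None and maf >= threshold


# Hypothetical SNPs across a handful of accessions.
common_snp = [0, 1, 2, 0, 1, None, 0, 0, 1, 0]  # alt frequency 5/18
rare_snp = [0] * 19 + [1]                       # alt frequency 1/40

kept = [passes_maf(common_snp), passes_maf(rare_snp)]
```

Rare variants are dropped because association tests have little power when only a few accessions carry the minor allele.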

**Fig. 4: Genetic loci associated with the content of sucrose, glucose, and fructose affecting peach fruit sweetness.**

Fructose has a higher sweetness impact than sucrose (about 1.7-fold), glucose, or sorbitol⁶⁴. Selection for elevated fructose content in tomato has been applied to develop sweeter cultivars^65,66. In this study, we identified a major GWAS locus (Chr1: 11,738,129–12,006,040 bp) for fructose content (Fig. 4c); this overlapped the previously reported FRU QTL on chromosome 1 identified using a hybrid population²⁴ (Supplementary Data 21). The peak SNP (P = 6.49e-16) in the major locus could explain up to 13.87% of the phenotypic variation for fructose content in our panel. It was notable that this locus showed a strong selection signature, supported by significantly reduced nucleotide diversity (θπ) from primitive (PLs) to improved (ECs, WCs, or ILs) germplasm (Fig. 4d). This finding is particularly interesting when considering the results of our comparative sugar content analyses collectively: our data support that only fructose content has been elevated during peach improvement (Fig. 4e and Supplementary Fig. 15). Accordingly, the raised sweetness levels of improved peach germplasm appear to have resulted from elevated accumulation of fructose. This conclusion agrees with the previous suggestion that commercial high-quality peaches have higher fructose content than native peach accessions^27,67.

Identification of the PpERDL16 gene and its contribution to increased fructose accumulation during peach improvement

The haplotype blocks were estimated using PLINK in our candidate region for fructose content; this effort further narrowed this region into only two haplotype blocks harboring significantly associated SNPs (block1: Chr1: 11,735,344—11,784,598 bp and block2: Chr1: 11,912,057—11,962,326 bp) (Fig. 5a). A qPCR analysis of ripening fruits showed that one gene (Pp.LH.01G01754) out of all the 13 protein-coding genes found within these two blocks had notably higher expression in the three tested low-fructose accessions compared to the three high-fructose accessions (Supplementary Fig. 16). We also found that its expression level was significantly negatively correlated with fruit fructose content (R = −0.56, P = 3e-04, two-sided Student’s t-test) in a larger panel of 37 peach accessions (Fig. 5b), helping to explain the earlier report that the FRU QTL region displayed a strong negative effect on fructose content throughout fruit development⁶⁸.
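The reported correlation (R = −0.56 across 37 accessions) can be related to its significance test with the standard conversion from a Pearson coefficient to a t statistic with n − 2 degrees of freedom. The sketch below only computes the statistic; it assumes this conversion is the test behind the reported P value, which the text does not spell out.

```python
import math


def correlation_t(r, n):
    """t statistic for testing a Pearson correlation r estimated
    from n paired observations (degrees of freedom = n - 2)."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Values quoted in the text: R = -0.56 across 37 accessions.
t_stat = correlation_t(-0.56, 37)
degrees_of_freedom = 37 - 2
```

A statistic of about −4.0 with 35 degrees of freedom is of the magnitude expected for a two-sided P value on the order of 3e-04.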

**Fig. 5: Identification of *PpERDL16* and its contribution to increased fructose accumulation during peach improvement.**

Phylogenetic analysis showed that Pp.LH.01G01754 belongs to the ERD6-like subfamily of monosaccharide transporters and it has the closest relationship with ERD6-like 16 (early response to dehydration 6-like 16) protein of Arabidopsis (Supplementary Fig. 17), so it was designated as PpERDL16. Previous studies showed that AtERDL6 in Arabidopsis and MdERDL6-1 in apple are symporter proteins that function in glucose export from the vacuole into the cytosol^69,70, and transgenic Arabidopsis lines overexpressing AtERDL6 showed lower levels of glucose and fructose in leaves as compared to wild type plants⁶⁹. We confirmed the tonoplast localization of the PpERDL16-GFP fusion protein in tobacco leaf cells using the Atγ-TIP-mCherry fusion protein as the positive control (Fig. 5c). We also examined peach mesocarp tissues transiently overexpressing PpERDL16, and found that mesocarp tissues infiltrated with the PpERDL16 vector had significantly reduced levels of both glucose and fructose compared to empty vector control mesocarp tissues (Fig. 5d). Viewed collectively, these results support that PpERDL16 is very likely the causal gene underlying the previously reported major FRU QTL locus for fruit fructose accumulation.

We found that the θπ values of PpERDL16 (both its CDS and the upstream (~5 kb) region harboring potential cis-regulatory elements) were lower among the modern cultivars (ECs or WCs) compared to PLs (Supplementary Fig. 18), which could hypothetically have resulted from selection for PpERDL16. This motivated an additional detailed analysis of the PpERDL16 throughout peach improvement. After filtering 17 low frequency haplotypes (i.e., only carried by one accession), all 76 SNPs in the genic region of PpERDL16 could be classified into eight haplotypes for all peach accessions (including 15 P. kansuensis accessions) (Fig. 5e). Haplotype network analysis showed that the primitive haplotypes (Hap6–8) were only carried by wild relative P. kansuensis (Fig. 5e and Supplementary Fig. 19), whereas Hap1–5 occurred in ornamental peaches, peach landraces, and cultivars, with the highest frequency (86.5%) for Hap4, followed by Hap1 (8.7%), Hap5 (5.4%), Hap2 (1.7%), and Hap3 (0.2%) (Fig. 5e). Moreover, the fructose content of the accessions carrying Hap4 (average 9.58 mg/ml) or Hap5 (average 8.61 mg/ml) was significantly higher than those carrying haplotype Hap1 (average 2.39 mg/ml), Hap2 (average 3.89 mg/ml), and Hap3 (average 2.39 mg/ml) (Fig. 5f). The frequencies of the Hap4 and Hap5 were increased among ILs, ECs, and WCs, as compared to ornamental peaches and PLs (Fig. 5g). Finally, consistent with our speculations about PpERDL16’s function, the result that Hap4 was carried by 92.8% of ECs and 98.3% of WCs highlighted apparently convergent selection for increased fructose content in both ECs and WCs.
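The haplotype tabulation described above (collapsing SNP alleles into haplotypes, discarding those carried by a single accession, and computing frequencies) can be sketched as follows. One haplotype string per accession is a simplification of phased diploid data, the allele strings are hypothetical, and using only retained haplotypes as the frequency denominator is an assumption.

```python
from collections import Counter


def haplotype_frequencies(haplotypes, min_carriers=2):
    """Count identical haplotypes, drop those carried by fewer than
    min_carriers accessions, and return frequencies among the rest."""
    counts = Counter(haplotypes)
    kept = {hap: n for hap, n in counts.items() if n >= min_carriers}
    total = sum(kept.values())
    return {hap: n / total for hap, n in kept.items()}


# Hypothetical three-SNP haplotypes, one per accession.
observed = ["AAG"] * 6 + ["AAT"] * 3 + ["CAT"]
freqs = haplotype_frequencies(observed)
```

Here the singleton `CAT` is filtered out, leaving two haplotypes with frequencies of two thirds and one third.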

Discussion

We present a high-quality LHSM reference genome and map resequencing data for a large natural population comprising 564 peach accessions to this genome. Together, these resources capture a substantial portion of the genetic variation in peach, thereby supporting peach genetic studies by increasing the resolution power of association studies. This is significant, because resolution power has long been limited by the narrow genetic basis and high levels of LD in peach^15,27. It is notable that our study revealed a historical eastward dispersal and continuous improvement trend for domesticated peaches that occurred before modern peach breeding efforts. These early efforts led to the low-acid and sweet ILs, including the typical Chinese Cling peaches, that have subsequently served as elite germplasm, a situation reflected in the overwhelming contribution of the ILs to the modern cultivars as compared to PLs. Nevertheless, our data also show that the PLs have much higher genetic diversity than the ILs, supporting their utility for breeding and improvement applications that require an expanded genetic basis for introducing economically important traits into modern peaches (e.g., potential resistance to viral pathogens). Additionally, our LHSM genome, as a typical IL genome, should facilitate mining of valuable genomic information for peach genetic improvement in general and for efforts to modulate fruit quality traits in particular.

Eating quality is an important aspect for the improvement of fruit-bearing crops as well as seed crops like rice, maize, wheat, and soybean⁷¹. Sweetness and acidity are understood as the two most impactful organoleptic attributes for fruit flavor. In peach, the common consumer demand for sweeter taste, coupled with differentiated cultural preferences for acidity, has resulted in the formation of two typical flavor types: sweet, low-acid taste vs. sweet, acid taste, favored by eastern and western consumers, respectively^18,19. Similar preferences are evident for apple cultivars: North Americans and Europeans favor sweet, sub-acid apples, whereas sweet apples with barely any acid flavor are preferred in Asia and India⁷². Our data revealed signatures of selection in the peach genome that underlie the divergent selection for fruit acidity that has occurred between eastern and western peach breeding programs. We used these detected differences to pursue specific acidity-related loci and candidate genes. Among the candidate genes, PpALMT1 was found to affect accumulation of malate, the predominant acid in peach fruits, thus illustrating the utility of our data and providing specific information to support genetic improvement of acidity. Despite the dominant role of malate in the divergence of fruit acidity between ECs and WCs, it bears mention that the significantly increased citrate content in WCs, as compared to the PLs (Supplementary Fig. 9), could also be a non-negligible factor in elevating fruit acidity in WCs.

We show that PpERDL16 is a causal gene that controls fructose accumulation in peach fruit, and haplotype analysis clearly highlighted how this locus has driven the elevation of sweetness across multiple stages of peach improvement. Our study also provides an excellent example of how a single phenotype (sweetness) desired by consumers can be obtained via separate selection trajectories in multiple fruit crop species involving distinct biochemical mechanisms. For example, the selection of ClTST2, which encodes a TST, led to the increased accumulation of sucrose and hexoses in watermelon⁴⁷, whereas the increased sweetness in peach and some tomato varieties⁶⁶ results from elevated fructose content as controlled by PpERDL16 and SlFgr (encoding a tomato SWEET transporter), respectively. More broadly, the differentiation of maize into field corn and sweet corn varieties resulted from altered starch biosynthesis as mediated by a mutation in ZmSUGARY1 (encoding an isoamylase-type starch-debranching enzyme)⁷³.

In summary, our study shows how harnessing a high-quality genome assembly for a long-prized improved Chinese landrace ultimately supported development of additional genome-scale germplasm diversity resources at a population scale. Beyond providing valuable genomic resources for peach genomic and genetic research, our study provides insight into the improvement of peach flavor, revealing the genetic basis underlying fruit flavor. Our findings also provide a genomic framework for fruit crops that can deepen understanding of fruit quality trait physiology and that suggests strategies for flavor improvement.

Methods

Plant materials

The sequenced peach (P. persica) accessions used in this study were obtained from the Beijing and the Nanjing National Peach Germplasm Repositories, China. The 15 P. kansuensis accessions were collected in the Gansu province of China. A representative Chinese Cling peach (cv. Longhua Shui Mi (LHSM)) was collected from the Nanjing National Peach Germplasm Repository, China.

DNA extraction and sequencing

Extraction and purification of high molecular weight DNA was performed using the DNeasy Plant Maxi Kit (Qiagen, Germany). DNA concentration was measured using a NanoDrop spectrophotometer (Thermo Fisher Scientific, USA) and the Qubit 2.0 Fluorometer (Invitrogen, USA). Illumina short-read data were obtained using the Illumina NovaSeq platform, which generated a total of 184.73 million reads with a total length of 27.71 Gb. Single-molecule real-time (SMRT) cells were sequenced on the PacBio Sequel platform (Pacific Biosciences, CA, USA), generating a total of 3.54 million reads with a total length of 30.90 Gb. Hi-C libraries were created from young leaves, which were fixed with formaldehyde and then lysed before the cross-linked DNA was digested overnight with DpnII. Sticky ends were biotinylated and proximity-ligated to form chimeric junctions that were enriched for, and then physically sheared to a size of 500−700 bp. Chimeric fragments representing the original cross-linked long-distance physical interactions were processed into paired-end sequencing libraries. This allowed us to generate a total of 126.25 million paired-end reads and 37.87 Gb of sequencing data on an Illumina NovaSeq platform. The alignment of the Hi-C reads was implemented using the HiC-Pro program⁷⁴ and revealed a high proportion (82%) of valid interactions that confirmed the high quality of the Hi-C data (Supplementary Fig. 20).

Genome assembly

To estimate the genome size of LHSM, the Illumina short reads were used to determine the k-mer distributions using the GenomeScope software⁷⁵. The PacBio long-read data were de novo assembled into PacBio contigs using Canu version 1.9⁷⁶, generating a total of 2212 contigs with an N50 of 686.03 kb. We then used the Highly Efficient Repeat Assembly (HERA) method⁷⁷ based on the Canu-corrected PacBio long-read data in order to extend the PacBio contigs to 243 contigs (HERA contigs v1) with an N50 of 5.17 Mb. The Illumina short-read data were used for error correcting the contigs using Pilon⁷⁸. Subsequently, in order to anchor the corrected contigs (HERA contigs v2) into chromosomes, we aligned the Hi-C sequencing data to these contigs using Juicer v1.8.9⁷⁹. The contigs were finally linked into eight distinct chromosomes by 3D-DNA⁸⁰.

Repeats and gene annotation

The annotation of transposable elements was performed using RepeatMasker (http://www.repeatmasker.org). The repeat libraries included the RepBase-20170127 and the de novo repeat library created using RepeatModeler (http://www.repeatmasker.org) (with the parameter -LTRStruct). The LTRharvest⁸¹ and the LTR_FINDER⁸² programs were used to identify intact LTRs in the genome assembly and to calculate the LAI index⁸³.

The pipeline for ab initio gene annotations included de novo gene predictions of the repeat-masked genome using AUGUSTUS⁸⁴ and SNAP⁸⁵, as well as evidence-based gene annotations using MAKER2⁸⁶. For de novo gene prediction, we used the AUGUSTUS and SNAP programs trained on the homologous protein-coding genes of Arabidopsis thaliana, Oryza sativa, and P. persica. The homologous sequences were collected from the Swiss-Prot database. Transcript evidence included transcripts assembled from RNA-Seq data obtained from different tissues (root, leaf, flower stages, and fruit; see Supplementary Table 9) using HISAT and StringTie⁸⁷. This evidence was submitted to MAKER2, and the output was refined by the AED metric (AED < 0.7). Gene functional annotation was achieved using BLASTP (E-value < 1e-5) against the Swiss-Prot, Pfam⁸⁸, and the NR databases⁸⁹, as well as using InterProScan version 5.27-66.0⁹⁰ against the InterPro database⁹¹. Gene Ontology terms were obtained for each gene from the corresponding InterPro entries. The pathways associated with each gene were assigned by BLASTP⁹² against the KEGG database⁹³, with an E-value cut-off of 1e-5.

Evaluation of genome assembly

The flanking sequences of the molecular markers obtained from the high-density and the multi-population consensus genetic linkage map for peach³⁶ were mapped against the LHSM genome assembly using BLASTN. The Pearson correlation coefficient was computed between the genetic distance and the physical position of the uniquely aligned markers. The Illumina short-read data were also used to evaluate assembly accuracy and completeness using BWA-MEM version 0.7.17-r1188⁹⁴. The completeness of the genome assembly and the gene annotations was assessed with a plant database composed of 2121 conserved plant genes (eudicotyledons_odb10) using BUSCO version 3.0.2⁹⁵. The EST sequences that were retrieved from NCBI were aligned to the genome assembly using GMAP (version 2019-09-12)⁹⁶.

Gene families and phylogenetic analysis

We used OrthoFinder (v2.3.9)⁹⁷ to identify shared gene families between peach and 12 other plant species, including six Rosaceae (almond, apricot, European pear, apple, black raspberry, and woodland strawberry), one Brassicaceae (Arabidopsis), one Rutaceae (orange), one Salicaceae (Populus trichocarpa), one Vitaceae (grape), one Solanaceae (tomato), and one monocot (rice). Based on the protein sequences of 367 single-copy ortholog families, the phylogenetic relationship among these species was estimated using RAxML (v8.2.12)⁹⁸. Divergence times were estimated by the MCMCtree program embedded in PAML (v4.9)⁹⁹. We measured the expansion and contraction of orthologous gene families based on the maximum likelihood tree using the software CAFE v4.2 (https://github.com/hahnlab/CAFE).

Comparative genomics

Genome alignment between LHSM and Lovell v2.0 was performed using the NUCmer program embedded in MUMmer¹⁰⁰ with the parameters “-mumreference -g 1000 -c 90 -l 40”. The delta-filter program was used to remove the mapping noise and to determine the one-to-one alignment blocks with parameters “-r -q”. SNPs and InDels were identified using the show-snps program (-ClrT -x 1). Gene synteny analysis was performed using the MCScanX package¹⁰¹ and BLASTP with the parameters “-evalue < 1e-10, -v 5, -b 5” in order to determine the pairwise similarity between the protein sequences of the LHSM and the Lovell v2.0 genomes.

To identify the presence/absence variations (PAVs) in the LHSM genome, we divided it into 500 bp overlapping windows with a step size of 100 bp. Each 500 bp window was then aligned against the Lovell v2.0 genome using BWA-MEM with the parameters “-w 500 -M”. The sequences within windows that failed to align to the Lovell v2.0 genome, or that aligned with less than 25% coverage, were defined as LHSM-specific sequences. Overlapping windows that could not be aligned were merged together. The Lovell-specific sequences were then identified following the same method.
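
The windowing and merging steps of this PAV procedure can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function names are hypothetical, the per-window alignment coverage is assumed to have been computed beforehand from the BWA-MEM output, and windows that fail to align are represented by a coverage of 0.

```python
def sliding_windows(length, size=500, step=100):
    """Yield half-open (start, end) windows over a sequence of the given length."""
    start = 0
    while start < length:
        end = min(start + size, length)
        yield (start, end)
        if end == length:
            break
        start += step


def merge_intervals(intervals):
    """Merge overlapping or touching half-open intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def specific_regions(windows_with_coverage, min_coverage=0.25):
    """Keep windows whose aligned fraction is below min_coverage, then merge them.

    windows_with_coverage: iterable of ((start, end), aligned_fraction),
    where aligned_fraction is 0.0 for a window that did not align at all.
    """
    unaligned = [w for w, cov in windows_with_coverage if cov < min_coverage]
    return merge_intervals(unaligned)
```

For example, windows (0, 500) and (100, 600) with coverage below 25% would be reported as a single candidate region (0, 600).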

In order to identify structural rearrangements, we used Minimap2 v2.17-r941¹⁰² to align the LHSM assembly to the reference Lovell v2.0 genome with the following parameter setting “-ax asm5 --eqx”. Structural rearrangements and local variants (>50 bp) were detected using SyRI¹⁰³. To identify gene copy number variation, we first performed the gene family clustering using OrthoFinder version 2.3.9⁹⁷ based on the protein sequences from the LHSM and the Lovell v2.0 genomes, and identified CNVs using a PERL script developed in-house.

SNP and small InDel calling

We collected Illumina resequencing data for 564 peach accessions (Supplementary Data 6) with an average depth of 26.34×. These included 379 newly sequenced accessions. The quality control for the raw re-sequencing data was performed using fastp version 0.20.1¹⁰⁴ with default settings. For SNP calling, Illumina short reads were aligned to the LHSM genome using BWA-MEM; PCR duplicates were removed using Picard version 1.118 (http://broadinstitute.github.io/picard/). SNPs and InDels were identified using HaplotypeCaller available from the Genome Analysis Toolkit (GATK, version 4.1.5.0)¹⁰⁵, and subsequently filtered following ref. ³. SNPs with a read depth <5 and non-biallelic SNPs were removed from further analyses.

Phylogenetic and population structure analyses

A total of 337,386 SNPs with a MAF ≥0.05, missing rate ≤50%, and a Hardy–Weinberg Equilibrium (HWE) P value >1e-6 were used to build a maximum likelihood phylogenetic tree, as well as to perform population structure analysis and PCA. The phylogenetic tree was built using the FastTree2 program (version 2.1.10)¹⁰⁶. Population structure was investigated using ADMIXTURE¹⁰⁷, evaluating each K from 2 to 12. PCA was performed using the smartPCA program embedded in the Eigensoft package version 7.2.1¹⁰⁸.

Relative IBD and introgression analysis

To investigate introgression from the PLs and ILs to each accession of the modern peach cultivars, we performed pairwise IBD analysis by first phasing the genotypes using Beagle (v5.1)¹⁰⁹ and then detecting shared IBD tracks between any two accessions using RefinedIBD (v17Jan20.102)¹¹⁰. After this, we counted the number of shared IBD tracks in 10-kb sliding windows (in steps of 5 kb) between each modern cultivar and the PLs or ILs. These counts were then normalized as nIBD = (shared IBD number)/(number of PLs or ILs), and the rIBD was calculated as rIBD = nIBD_IL − nIBD_PL. Average rIBD values of individuals in ECs or WCs were calculated along each window and then normalized to a standard normal distribution (Z-scores). Windows with Z-scores greater than 2 were considered as putative introgressed regions.
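
The rIBD calculation described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the shared IBD track counts per window are assumed to be available already, and the Z-transformation uses the population standard deviation, which the text does not specify.

```python
from statistics import mean, pstdev


def n_ibd(shared_tracks, n_donors):
    """Normalized IBD: shared track count divided by the number of PLs or ILs."""
    return shared_tracks / n_donors


def r_ibd(shared_il, n_il, shared_pl, n_pl):
    """Relative IBD for one cultivar in one window: nIBD_IL - nIBD_PL."""
    return n_ibd(shared_il, n_il) - n_ibd(shared_pl, n_pl)


def z_scores(values):
    """Z-transform a list of window values (population SD is an assumption)."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        return [0.0] * len(values)
    return [(v - mu) / sd for v in values]


def introgressed_windows(mean_ribd_per_window, z_cutoff=2.0):
    """Return indices of windows whose Z-score exceeds the cutoff."""
    return [i for i, z in enumerate(z_scores(mean_ribd_per_window)) if z > z_cutoff]
```

Here `mean_ribd_per_window` stands for the rIBD values averaged over all ECs (or all WCs) in each 10-kb window.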

Multiple sequentially Markovian coalescent analysis

MSMC2 (v2.1.1)¹¹¹ was used to infer the demographic history of peach. To improve reliability, genome regions were masked with the SNPable tool (http://lh3lh3.users.sourceforge.net/snpable.shtml) when the coverage depth was <15× after removing reads with mapping quality <20. First, we split the reference genome into overlapping 35-mers and then mapped these back to the reference genome using BWA (bwa aln -R 1000000 -O 3 -E 3). Only regions where the majority of 35-mers were uniquely mapped and without mismatch were retained for further analysis. We selected the top ten samples in each population with the highest coverage after masking. The eight most frequent haplotypes were randomly selected from the ten samples in order to infer the demographic history of each population. We repeated this procedure 20 times. Scaled times were converted to years by assuming a generation time of 3 and 4 years, respectively and a mutation rate of 7.7 × 10^–9 per site per generation for peach, following Xie et al.¹¹².
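
The conversion of scaled times to years can be illustrated with a short sketch. It assumes the usual MSMC convention that time is reported in units of the per-generation mutation rate, so that time in generations equals the scaled time divided by the mutation rate; the function name and the example value are hypothetical.

```python
MUTATION_RATE = 7.7e-9  # per site per generation, as stated in the text


def scaled_time_to_years(scaled_time, generation_time, mu=MUTATION_RATE):
    """Convert a mutation-scaled time to years.

    Assumes generations = scaled_time / mu and years = generations * generation_time.
    """
    return scaled_time / mu * generation_time


# Example: the same scaled time under the two generation times used in the text.
example_scaled_time = 7.7e-6  # hypothetical value, equal to 1000 generations
years_g3 = scaled_time_to_years(example_scaled_time, 3)
years_g4 = scaled_time_to_years(example_scaled_time, 4)
```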

Linkage disequilibrium

To estimate and compare the patterns of LD decay in each population, we computed the mean squared correlation coefficient (r²) values between any two SNPs within 500 kb using the software PopLDdecay (v3.41)¹¹³. To eliminate the potential effects of sample size, we randomly sampled ten accessions for each population (we repeated this procedure 100 times). We used a 500 bp bin size to generate the plot.
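
For readers unfamiliar with the statistic, the sketch below shows one common way to compute r² between two biallelic SNPs and to average it in 500 bp distance bins. It is only an illustration: it assumes phased haplotypes coded as 0/1, which may differ from how PopLDdecay handles genotypes internally, and the function names are hypothetical.

```python
from collections import defaultdict


def r_squared(hap_a, hap_b):
    """Squared correlation between two biallelic SNPs from phased 0/1 haplotypes."""
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None  # at least one SNP is monomorphic in this sample
    return d * d / denom


def mean_r2_by_bin(pairs, bin_size=500, max_dist=500_000):
    """Average r2 per distance bin; pairs is an iterable of (distance_bp, r2)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for dist, r2 in pairs:
        if r2 is None or dist > max_dist:
            continue
        b = dist // bin_size
        sums[b] += r2
        counts[b] += 1
    return {b * bin_size: sums[b] / counts[b] for b in sorted(sums)}
```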

Genetic diversity

Genetic nucleotide diversity (θπ, the average number of pairwise nucleotide differences per site between any two randomly chosen DNA sequences from the population) was calculated using VCFtools (v0.1.17)¹¹⁴ on 20 kb sliding windows (with a step size of 10 kb) across the peach genome.
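
The definition of θπ given above can be made concrete with a small sketch for biallelic SNPs. The function names are hypothetical, and dividing by the full window length is an assumption; the exact handling of missing data and window boundaries in VCFtools may differ.

```python
def site_pi(alt_count, n_chrom):
    """Expected pairwise difference at one biallelic site.

    alt_count: number of sampled chromosomes carrying the alternative allele.
    n_chrom: total number of sampled chromosomes (2 per diploid accession).
    """
    return 2 * alt_count * (n_chrom - alt_count) / (n_chrom * (n_chrom - 1))


def window_pi(alt_counts, n_chrom, window_len):
    """Per-site nucleotide diversity in a window: summed site values / window length."""
    return sum(site_pi(c, n_chrom) for c in alt_counts) / window_len
```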

Selective sweeps

We used multiple methods to detect regions and genes under positive selection. SNPs with MAF below 5% were removed from this analysis. To identify potential selective sweeps between population A and population B, log₂(π_B/π_A) and F_ST were calculated together using VCFtools (v0.1.17)¹¹⁴ on a 20 kb sliding window with a step size of 10 kb. Windows that contained fewer than ten SNPs were excluded from further analysis. The windows that were simultaneously (1) in the top 5% of Z-transformed F_ST values and (2) in the bottom 5% of log₂(π_B/π_A) were considered as candidate selective regions in population A. XP-CLR¹¹⁵ is a method that uses allele frequency differentiation at linked loci between two populations to detect selective sweeps. Each chromosome was analyzed using the XP-CLR (v1.0) program with parameters “-w1 0.0005 200 200 1 -p1 0.9”. The average XP-CLR scores were calculated for each 20 kb sliding window with a step size of 2 kb. The windows in the top 1% of the XP-CLR scores were considered as candidate selective regions. XP-EHH¹¹⁶ was implemented using the program Selscan (v1.1.0)¹¹⁷. The results were normalized on a 20 kb window basis and the ratio of extreme scores (|score| ≥ 2) was calculated in each window. The top 1% of windows (with the highest ratio of extreme scores) were considered as candidate selective regions. Subsequently, the results from each of the above methods were combined. The genes contained within the merged candidate selective regions along the peach genome were considered as candidate selective genes.
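
The joint F_ST and diversity-ratio filter can be sketched as follows, using the thresholds exactly as stated in the text. This is an illustration only: the function names are hypothetical, per-window statistics are assumed to be precomputed, and windows are ranked on raw F_ST because the Z-transformation is monotonic and does not change which windows fall in the top 5%.

```python
import math


def extreme_indices(scores, fraction, highest=True):
    """Indices of the windows in the top (or bottom) fraction of the scores."""
    k = max(1, math.floor(len(scores) * fraction))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=highest)
    return set(order[:k])


def candidate_windows(fst, pi_a, pi_b, n_snps, fraction=0.05, min_snps=10):
    """Windows in the top fraction of F_ST and the bottom fraction of log2(pi_B/pi_A)."""
    # Drop windows with too few SNPs or with zero diversity (log ratio undefined).
    keep = [i for i in range(len(fst))
            if n_snps[i] >= min_snps and pi_a[i] > 0 and pi_b[i] > 0]
    fst_kept = [fst[i] for i in keep]
    ratio_kept = [math.log2(pi_b[i] / pi_a[i]) for i in keep]
    high_fst = extreme_indices(fst_kept, fraction, highest=True)
    low_ratio = extreme_indices(ratio_kept, fraction, highest=False)
    return sorted(keep[j] for j in high_fst & low_ratio)
```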

GO enrichment

The R package ClusterProfiler (v3.18.0)¹¹⁸ was used to perform GO enrichment analysis. GO terms with a P value < 0.05 were considered significantly enriched.

Phenotypic analysis for fruit flavor related traits

We harvested ten mature peach fruits per plant and prepared the crushed mixed fruit juice for phenotypic analysis. SSC, pH, sugar (sucrose, fructose, glucose, and sorbitol), and organic acid (malate, citrate, quinate, and shikimate) contents were measured in 2016 and 2017, and the TA was measured in 2017. The pH was measured using a pH electrode (Sartorius, PB-10). The TA was measured by titrating 25 ml of fruit juice with 0.1 mol/L NaOH to pH 8.1, according to “Fruit and vegetable products—Determination of titratable acidity” (GB/T 12456, 2008)¹¹⁹. High performance liquid chromatography (HPLC) was used to determine the sugar and organic acid contents following Filip et al.¹²⁰. The fruit juice was mixed with ethanol (in a proportion of 3:7 (v/v)) prior to centrifugation at 8050 × g for 5 min. The resulting supernatant was passed through PVDF 0.22-μm syringe filters and then injected into the HPLC system (LC-20A, Shimadzu). The organic acid contents were detected using a photo diode array detector (SPD-M20A) and an InertSustain C18 column (250 mm × 4.6 mm ID, 5 μm, GL Sciences Inc.). The samples were eluted with 20 mM monopotassium phosphate (KH₂PO₄, pH = 2.6) at 40 °C at a flow rate of 1 mL/min. The eluted compounds were detected by UV absorbance at 210 nm. The sugars were detected using a refractive index detector (RID-10A) and a Luna® 5 μm NH2 100 Å column (250 mm × 4.6 mm, Phenomenex). The mobile phase was 80% acetonitrile with a flow rate of 3 mL/min for peak separation at 40 °C. Organic acids and sugar contents were calculated from calibration curves obtained from the corresponding external standards.

Transient overexpression assay

Transient overexpression analysis in peach mesocarps was performed following previously described procedures¹²¹. Briefly, two pairs of primers (see Supplementary Table 20) were designed to amplify the full-length coding sequences of PpALMT1 and PpERDL16, and the PCR products were then inserted into a pGreen0029 62-SK vector. The recombinant constructs and the vector control were then chemically transformed into Agrobacterium tumefaciens GV3101 (pSoup). Flesh slices were taken from peeled mesocarps and then precultured on MS medium at 24 °C for 24 h. The flesh slices were submerged in an A. tumefaciens suspension and subjected to vacuum conditions (−70 kPa). After vacuum infiltration, the flesh slices were rinsed with sterile water and cultured on MS medium in a growth chamber (24 °C, RH 85%) for 48 h. The flesh slices were then used for phenotypic and gene expression analyses.

GWAS analysis

We retained peach SNPs with a MAF ≥0.05 and a missing rate ≤50% to perform the GWAS analysis. After imputation using Beagle (v4.1)¹⁰⁹ with default parameters, the GWAS analysis was performed based on a linear mixed model using the program Fast-LMM v2.06.20130802¹²². The P value threshold for significance was estimated as 0.05/n (where n corresponds to the SNP number). The phenotypic variance that was explained by each SNP was estimated¹²³. The haplotype blocks were estimated using the default parameter (–hap) in Plink v1.90b6.10¹²⁴.
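
The significance threshold described above is a Bonferroni correction, which can be sketched as follows. The function names and the SNP count in the usage note are hypothetical; the actual number of SNPs tested is not restated in this section.

```python
def bonferroni_threshold(n_snps, alpha=0.05):
    """Genome-wide P value threshold: alpha divided by the number of SNPs tested."""
    return alpha / n_snps


def significant_snps(p_values, n_snps, alpha=0.05):
    """Return the IDs of SNPs whose P value falls below the Bonferroni threshold.

    p_values: mapping of SNP ID to association P value.
    """
    threshold = bonferroni_threshold(n_snps, alpha)
    return sorted(snp for snp, p in p_values.items() if p < threshold)
```

With an illustrative one million SNPs, the threshold would be 5 × 10^–8, so a SNP with P = 1 × 10^–9 would pass and one with P = 1 × 10^–3 would not.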

Validation and quantification of gene expression

qRT-PCR analysis was used to quantify the expression levels of the 13 candidate genes within the two significantly associated haplotype blocks from six peach accessions (three with the highest fructose content and three with the lowest fructose content as measured in 2017). A total of 37 peach accessions were used to quantify the expression of PpERDL16. Total RNA was extracted from the mesocarp of pre-ripened fruits using the Trelief^TM RNAprep Pure Plant Kit (polysaccharides and polyphenolics-rich) (Tsingke, China). The first-strand cDNA was synthesized using a PrimeScript^TM RT Reagent Kit with gDNA Eraser (Takara, Japan). Quantitative PCR was performed using the TSINGKE Master qPCR Mix (SYBR Green I with UDG) (Tsingke, China), on a StepOnePlus^TM Real-Time PCR System (Applied Biosystems, USA) following the manufacturer’s instructions. cDNA transcript levels were normalized to those of the reference gene actin using the 2^-ΔΔCT method^125,126. The entire set of primers (see Supplementary Table 20) was designed to span an intron in order to avoid the amplification of genomic DNA. PCR reactions were performed in triplicate for each biological replicate; three or more biological replicates were used in all of the PCR reactions.
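
The 2^-ΔΔCT normalization can be sketched as follows. The function name and the CT values in the usage note are hypothetical; the reference gene corresponds to actin in this study, and the choice of calibrator sample is an assumption of the sketch.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_calibrator, ct_ref_calibrator):
    """Relative expression by the 2^-ddCT method.

    dCT = CT(target gene) - CT(reference gene), computed for the sample and
    for the calibrator; ddCT = dCT(sample) - dCT(calibrator).
    """
    d_ct_sample = ct_target_sample - ct_ref_sample
    d_ct_calibrator = ct_target_calibrator - ct_ref_calibrator
    dd_ct = d_ct_sample - d_ct_calibrator
    return 2 ** (-dd_ct)
```

For instance, a sample with CT values of 24 (target) and 18 (reference), compared to a calibrator with 26 and 18, has ΔΔCT = −2 and therefore a fourfold higher relative expression.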

Analysis of the subcellular localization of PpERDL16

The coding region of Atγ-TIP (At2g36830), which encodes a vacuolar membrane protein¹²⁷, was synthesized without the stop codon and cloned into the pMD85-mcherry vector between the CaMV35S promoter and the mCherry coding sequence in order to generate the 35S-γ-TIP-mCherry construct. The PpERDL16 full-length CDS lacking the stop codon was amplified from the cDNA of Redhaven fruit using PF2 primer pairs and subsequently introduced into the pMD85-GFP vector. The resulting fusion vector pMD85-PpERDL16-GFP was co-transformed with pMD85-γ-TIP-mCherry into tobacco (Nicotiana benthamiana) leaves via A. tumefaciens (strain EHA105). The infected tissues were analyzed under a fluorescence microscope (A1R; Nikon, Japan) 72 h after infiltration.

Haplotype analysis and median-joining network

PpERDL16 haplotypes were constructed using the entire set of SNPs present in the gene. SNPs were phased using Beagle (v5.1)¹⁰⁹. Haplotypes with frequency less than 2 were removed. Median Joining Networks for the PpERDL16 haplotypes were built using PopART (v1.7.1)¹²⁸ with default parameters.
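
The counting and filtering of haplotypes can be sketched as follows. This is an illustration only: haplotypes are represented as strings of phased alleles, the function names are hypothetical, "frequency less than 2" is interpreted as an observed count below two, and frequencies are computed among the retained haplotypes, which is an assumption.

```python
from collections import Counter


def filter_rare_haplotypes(haplotypes, min_count=2):
    """Count haplotype strings and drop those observed fewer than min_count times."""
    counts = Counter(haplotypes)
    return {hap: n for hap, n in counts.items() if n >= min_count}


def haplotype_frequencies(counts):
    """Convert retained haplotype counts to relative frequencies."""
    total = sum(counts.values())
    return {hap: n / total for hap, n in counts.items()}
```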

Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

Data supporting the findings of this work are available within the paper and its Supplementary Information files. A reporting summary for this Article is available as a Supplementary Information file. The datasets and plant materials generated and analyzed during the current study are available from the corresponding author upon request. The raw resequencing data have been deposited in the Sequence Read Archive of the National Center for Biotechnology Information (NCBI) under BioProjects PRJNA715782 and PRJNA663114. The genome assembly has been deposited at GenBank under the accession JAGEPH000000000. The raw PacBio data and Hi-C data are available in the NCBI Sequence Read Archive under BioProject PRJNA707388. Online tools used in this paper include: Pfam [http://pfam.xfam.org/], InterPro [https://www.ebi.ac.uk/interpro], NR [https://www.ncbi.nlm.nih.gov/refseq/about/nonredundantproteins/], GO [http://geneontology.org], KEGG [https://www.genome.jp/kegg/]. Source data are provided with this paper.

References

Bento, C., Gonçalves, A. C., Silva, B. & Silva, L. R. Peach (Prunus Persica): phytochemicals and health benefits. Food Rev. Int. 3, 1–32 (2020).
Hui, Y. H. et al. Handbook of Fruit and Vegetable Flavors Vol. 64 (Wiley, 2010).
Guo, S. et al. Resequencing of 414 cultivated and wild watermelon accessions identifies selection for fruit quality traits. Nat. Genet. 51, 1616–1623 (2019).
Zhao, G. et al. A comprehensive genome variation map of melon identifies multiple domestication events and loci influencing agronomic traits. Nat. Genet. 51, 1607–1615 (2019).
Bai, Y. et al. A natural mutation-led truncation in one of the two aluminum-activated malate transporter-like genes at the Ma locus is associated with low fruit acidity in apple. Mol. Genet. Genomics 287, 663–678 (2012).
Ma, B. et al. Genes encoding aluminum-activated malate transporter II and their association with fruit acidity in apple. Plant Genome 8, 1–14 (2015).
Strazzer, P. et al. Hyperacidification of Citrus fruits by a vacuolar proton-pumping P-ATPase complex. Nat. Commun. 10, 1–11 (2019).
Abbott, A. et al. Peach: the model genome for Rosaceae. Acta Hortic. 1, 145–156 (2002).
Arús, P., Verde, I., Sosinski, B., Zhebentyayeva, T. & Abbott, A. G. The peach genome. Tree Genet. Genomes 8, 531–547 (2012).
Su, T., Wilf, P., Huang, Y., Zhang, S. & Zhou, Z. Peaches preceded humans: fossil evidence from SW China. Sci. Rep. 5, 1–7 (2015).
Yu, Y. et al. Genome re-sequencing reveals the evolutionary history of peach fruit edibility. Nat. Commun. 9, 1–13 (2018).
Faust, M. & Timon, B. Origin and dissemination of peach. Hort. Rev. 17, 331–379 (1995).
Zheng, Y., Crawford, G. W. & Chen, X. Archaeological evidence for peach (Prunus persica) cultivation and domestication in China. PloS ONE 9, e106595 (2014).
Layne, D. R. & Bassi, D. (eds) The Peach: Botany, Production and Uses (CABI, 2008).
Aranzana, M. J., Abbassi, E. K., Howad, W. & Arús, P. Genetic variation, population structure and linkage disequilibrium in peach commercial varieties. BMC Genet. 11, 1–11 (2010).
Li, X. W. et al. Peach genetic resources: diversity, population structure and linkage disequilibrium. BMC Genet. 14, 1–16 (2013).
Micheletti, D. et al. Whole-genome analysis of diversity and SNP-major gene association in peach germplasm. PloS ONE 10, e0136803 (2015).
Baccichet, I. et al. Characterization of fruit quality traits for organic acids content and profile in a large peach germplasm collection. Sci. Hortic. 278, 109865 (2021).
Moing, A. et al. Compositional changes during the fruit development of two peach cultivars differing in juice acidity. J. Am. Soc. Hortic. Sci. 123, 770–775 (1998).
Li, Y. et al. Genomic analyses of an extensive collection of wild and cultivated accessions provide new insights into peach breeding history. Genome Biol. 20, 1–18 (2019).
Akagi, T., Hanada, T., Yaegaki, H., Gradziel, T. M. & Tao, R. Genome-wide view of genetic diversity reveals paths of selection and cultivar differentiation in peach domestication. DNA Res. 23, 271–282 (2016).
Dirlewanger, E. et al. Mapping QTLs controlling fruit quality in peach (Prunus persica (L.) Batsch). Theor. Appl. Genet. 98, 18–31 (1999).
Etienne, C. et al. Candidate genes and QTLs for sugar and organic acid content in peach [Prunus persica (L.) Batsch]. Theor. Appl. Genet. 105, 145–159 (2002).
Quilot, B. et al. QTL analysis of quality traits in an advanced backcross between Prunus persica cultivars and the wild relative species P. davidiana. Theor. Appl. Genet. 109, 884–897 (2004).
Boudehri, K. et al. Phenotypic and fine genetic characterization of the D locus controlling fruit acidity in peach. BMC Plant Biol. 9, 1–14 (2009).
Salazar, J. A. et al. Quantitative trait loci (QTL) and Mendelian trait loci (MTL) analysis in Prunus: a breeding perspective and beyond. Plant Mol. Biol. Rep. 32, 1–18 (2014).
Cirilli, M., Bassi, D. & Ciacciulli, A. Sugars in peach fruit: a breeding perspective. Hort. Res 3, 1–12 (2016).
Rawandoozi, Z. J. et al. Identification and characterization of QTLs for fruit quality traits in peach through a multi-family approach. BMC Genomics 21, 1–18 (2020).
Zheng, B. et al. Assessment of organic acid accumulation and its related genes in peach. Food Chem. 334, 127567 (2021).
Velasco, D. et al. Evolutionary genomics of peach and almond domestication. G3-Genes Genom. Genet. 6, 3985–3993 (2016).
Verde, I. et al. The high-quality draft genome of peach (Prunus persica) identifies unique patterns of genetic diversity, domestication and genome evolution. Nat. Genet. 45, 487–494 (2013).
Verde, I. et al. The Peach v2.0 release: high-resolution linkage mapping and deep resequencing improve chromosome-scale assembly and contiguity. BMC Genomics 18, 1–18 (2017).
Wheeler, W., Wytsalucy, R., Black, B., Cardon, G. & Bugbee, B. Drought tolerance of Navajo and Lovell peach trees: precision water stress using automated weighing lysimeters. HortScience 54, 799–803 (2019).
Xie, R. et al. Evaluation of the genetic diversity of Asian peach accessions using a selected set of SSR markers. Sci. Hortic. 125, 622–629 (2010).
Werner, D. J. & Okie, W. R. A history and description of the Prunus persica plant introduction collection. HortScience 33, 787–793 (1998).
da Silva Linge, C. et al. High-density multi-population consensus genetic linkage map for peach. PloS ONE 13, e0207724 (2018).
Delplancke, M. et al. Combining conservative and variable markers to infer the evolutionary history of Prunus subgen. Amygdalus under domestication. Genet. Resour. Crop. Ev. 63, 221–234 (2016).
Lu, J. et al. Molecular cloning and functional characterization of the Aluminum-activated malate transporter gene MdALMT14. Sci. Hortic. 244, 208–217 (2019).
Ye, J. et al. An InDel in the promoter of Al-ACTIVATED MALATE TRANSPORTER9 selected during tomato domestication determines fruit malate contents and aluminum tolerance. Plant Cell 29, 2249–2268 (2017).
De Angeli, A. et al. The vacuolar channel VvALMT9 mediates malate and tartrate accumulation in berries of Vitis vinifera. Planta 238, 283–291 (2013).
Bosse, M. et al. Genomic analysis reveals selection for Asian genes in European pigs following human-mediated introgression. Nat. Commun. 5, 1–8 (2014).
Wang, X., Chen, L. & Ma, J. Genomic introgression through interspecific hybridization counteracts genetic bottleneck during soybean domestication. Genome Biol. 20, 1–15 (2019).
Hao, C. et al. Resequencing of 145 landmark cultivars reveals asymmetric sub-genome selection and strong founder genotype effects on wheat breeding in China. Mol. Plant 13, 1733–1751 (2020).
Knee, M. & Finger, F. L. NADP+-malic enzyme and organic acid levels in developing tomato fruits. J. Am. Soc. Hortic. Sci. 117, 799–801 (1992).
Sadka, A., Dahan, E., Or, E. & Cohen, L. NADP+-isocitrate dehydrogenase gene expression and isozyme activity during citrus fruit development. Plant Sci. 158, 173–181 (2000).
Wei, X., Liu, F., Chen, C., Ma, F. & Li, M. The Malus domestica sugar transporter gene family: identifications based on genome and expression profiling related to the accumulation of fruit sugars. Front. Plant Sci. 5, 569 (2014).
Ren, Y. et al. A tonoplast sugar transporter underlies a sugar accumulation QTL in watermelon. Plant Physiol. 176, 836–850 (2018).
Reuscher, S. et al. The sugar transporter inventory of tomato: genome-wide identification and expression analysis. Plant Cell Physiol. 55, 1123–1141 (2014).
Lü, H. et al. Genome-wide identification, expression and functional analysis of the phosphofructokinase gene family in Chinese white pear (Pyrus bretschneideri). Gene 702, 133–142 (2019).
Hu, X. M. et al. Genome-wide identification of citrus ATP-citrate lyase genes and their transcript analysis in fruits reveals their possible role in citrate utilization. Mol. Genet. Genomics 290, 29–38 (2015).
Beeler, S. et al. Plastidial NAD-dependent malate dehydrogenase is critical for embryo development and heterotrophic metabolism in Arabidopsis. Plant Physiol. 164, 1175–1190 (2014).
Meléndez-Hevia, E., Waddell, T. G. & Cascante, M. The puzzle of the Krebs citric acid cycle: assembling the pieces of chemically feasible reactions, and opportunism in the design of metabolic pathways during evolution. J. Mol. Evol. 43, 293–303 (1996).
Etienne, A., Génard, M., Lobit, P., Mbeguié-A-Mbéguié, D. & Bugaud, C. What controls fleshy fruit acidity? A review of malate and citrate accumulation in fruit cells. J. Exp. Bot. 64, 1451–1469 (2013).
Kovermann, P. et al. The Arabidopsis vacuolar malate channel is a member of the ALMT family. Plant J. 52, 1169–1180 (2007).
Nookaraju, A. et al. Molecular approaches for enhancing sweetness in fruits and vegetables. Sci. Hortic. 127, 1–15 (2010).
Bassi, D. & Selli, R. Evaluation of fruit quality in peach and apricot. Adv. Hortic. Sci. 4, 107–112 (1990).
Brooks, S. J., Moore, J. N. & Murphy, J. B. Quantitative and qualitative changes in sugar content of peach genotypes [Prunus persica (L.) Batsch.]. J. Am. Soc. Hortic. Sci. 118, 97–100 (1993).
Yan, N. Structural advances for the major facilitator superfamily (MFS) transporters. Trends Biochem. Sci. 38, 151–159 (2013).
Chen, L. Q., Cheung, L. S., Feng, L., Tanner, W. & Frommer, W. B. Transport of sugars. Annu. Rev. Biochem. 84, 865–894 (2015).
Peng, Q. et al. Functional analysis reveals the regulatory role of PpTST1 encoding tonoplast sugar transporter in sugar accumulation of peach fruit. Int. J. Mol. Sci. 21, 1112 (2020).
Bisaria, V. S., Mishra, S. & Eveleigh, D. E. Regulatory aspects of cellulase biosynthesis and secretion. Crit. Rev. Biotechnol. 9, 61–103 (1989).
Walker, R. P. et al. Non-structural carbohydrate metabolism in the flesh of stone fruits of the genus Prunus (Rosaceae)–A review. Front. Plant Sci. 11, 549921 (2020).
Cao, K. et al. Comparative population genomics identified genomic regions and candidate genes associated with fruit domestication traits in peach. Plant Biotechnol. J. 17, 1954–1970 (2019).
Kroger, M., Meister, K. & Kava, R. Low-calorie sweeteners and other sugar substitutes: a review of the safety issues. Comp. Rev. Food Sci. F. 5, 35–47 (2006).
Levin, I., Gilboa, N., Yeselson, E., Shen, S. & Schaffer, A. A. Fgr, a major locus that modulates the fructose to glucose ratio in mature tomato fruits. Theor. Appl. Genet. 100, 256–262 (2000).
Shammai, A. et al. Natural genetic variation for expression of a SWEET transporter among wild species of Solanum lycopersicum (tomato) determines the hexose composition of ripening tomato fruit. Plant J. 96, 343–357 (2018).
Robertson, J. A., Meredith, F. I. & Scorza, R. Characteristics of fruit from high-and low-quality peach cultivars. HortScience 23, 1032–1034 (1988).
Desnoues, E. et al. Dynamic QTLs for sugars and enzyme activities provide an overview of genetic control of sugar metabolism during peach fruit development. J. Exp. Bot. 67, 3419–3431 (2016).
Poschet, G. et al. A novel Arabidopsis vacuolar glucose exporter is involved in cellular sugar homeostasis and affects the composition of seed storage compounds. Plant Physiol. 157, 1664–1676 (2011).
Zhu, L. et al. MdERDL6-mediated glucose efflux to the cytosol promotes sugar accumulation in the vacuole through up-regulating TSTs in apple and tomato. Proc. Natl Acad. Sci. USA 118, e2022788118 (2021).
Meyer, R. S. & Purugganan, M. D. Evolution of crop species: genetics of domestication and diversification. Nat. Rev. Genet. 14, 840–852 (2013).
Tarjan, S. Autumn Apple Musing. News and notes of the UCSC farm and garden. Center for Agroecol. Sustain. Food Syst. 109, 1–2 (2006).
Dinges, J. R., Colleoni, C., James, M. G. & Myers, A. M. Mutational analysis of the pullulanase-type debranching enzyme of maize indicates multiple functions in starch metabolism. Plant Cell 15, 666–680 (2003).
Servant, N. et al. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol. 16, 1–11 (2015).
Vurture, G. W. et al. GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics 33, 2202–2204 (2017).
Koren, S. et al. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 27, 722–736 (2017).
Du, H. & Liang, C. Assembly of chromosome-scale contigs by efficiently resolving repetitive sequences with long reads. Nat. Commun. 10, 1–10 (2019).
Walker, B. J. et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS ONE 9, e112963 (2014).
Durand, N. C. et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 3, 95–98 (2016).
Dudchenko, O. et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 356, 92–95 (2017).
Ellinghaus, D., Kurtz, S. & Willhoeft, U. LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons. BMC Bioinforma. 9, 1–14 (2008).
Xu, Z. & Wang, H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res. 35, W265–W268 (2007).
Ou, S., Chen, J. & Jiang, N. Assessing genome assembly quality using the LTR Assembly Index (LAI). Nucleic Acids Res. 46, e126 (2018).
Stanke, M., Steinkamp, R., Waack, S. & Morgenstern, B. AUGUSTUS: a web server for gene finding in eukaryotes. Nucleic Acids Res. 32, W309–W312 (2004).
Korf, I. Gene finding in novel genomes. BMC Bioinforma. 5, 1–9 (2004).
Holt, C. & Yandell, M. MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects. BMC Bioinforma. 12, 1–14 (2011).
Pertea, M., Kim, D., Pertea, G. M., Leek, J. T. & Salzberg, S. L. Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. Nat. Protoc. 11, 1650–1667 (2016).
El-Gebali, S. et al. The Pfam protein families database in 2019. Nucleic Acids Res. 47, D427–D432 (2019).
O’Leary, N. A. et al. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res. 44, D733–D745 (2016).
Jones, P. et al. InterProScan 5: genome-scale protein function classification. Bioinformatics 30, 1236–1240 (2014).
Blum, M. et al. The InterPro protein families and domains database: 20 years on. Nucleic Acids Res. 49, D344–D354 (2021).
Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).
Kanehisa, M. & Goto, S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28, 27–30 (2000).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015).
Wu, T. D. & Watanabe, C. K. GMAP: a genomic mapping and alignment program for mRNA and EST sequences. Bioinformatics 21, 1859–1875 (2005).
Emms, D. M. & Kelly, S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 20, 1–14 (2019).
Stamatakis, A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313 (2014).
Yang, Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591 (2007).
Marcais, G. et al. MUMmer4: a fast and versatile genome alignment system. PLoS Comput. Biol. 14, e1005944 (2018).
Wang, Y. et al. MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res. 40, e49 (2012).
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
Goel, M., Sun, H., Jiao, W. B. & Schneeberger, K. SyRI: finding genomic rearrangements and local sequence differences from whole-genome assemblies. Genome Biol. 20, 1–13 (2019).
Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890 (2018).
McKenna, A. et al. The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
Price, M. N., Dehal, P. S. & Arkin, A. P. FastTree: computing large minimum evolution trees with profiles instead of a distance matrix. Mol. Biol. Evol. 26, 1641–1650 (2009).
Alexander, D. H., Novembre, J. & Lange, K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655–1664 (2009).
Price, A. L. et al. Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 38, 904–909 (2006).
Browning, B. L. & Browning, S. R. Genotype imputation with millions of reference samples. Am. J. Hum. Genet. 98, 116–126 (2016).
Browning, B. L. & Browning, S. R. Improving the accuracy and efficiency of identity-by-descent detection in population data. Genetics 194, 459–471 (2013).
Schiffels, S. & Durbin, R. Inferring human population size and separation history from multiple genome sequences. Nat. Genet. 46, 919–925 (2014).
Xie, Z. et al. Mutation rate analysis via parent–progeny sequencing of the perennial peach. I. A low rate in woody perennials and a higher mutagenicity in hybrids. Proc. R. Soc. B. 283, 20161016 (2016).
Zhang, C., Dong, S. S., Xu, J. Y., He, W. M. & Yang, T. L. PopLDdecay: a fast and effective tool for linkage disequilibrium decay analysis based on variant call format files. Bioinformatics 35, 1786–1788 (2019).
Danecek, P. et al. The variant call format and VCFtools. Bioinformatics 27, 2156–2158 (2011).
Chen, H., Patterson, N. & Reich, D. Population differentiation as a test for selective sweeps. Genome Res. 20, 393–402 (2010).
Sabeti, P. C. et al. Genome-wide detection and characterization of positive selection in human populations. Nature 449, 913–918 (2007).
Szpiech, Z. A. & Hernandez, R. D. selscan: an efficient multithreaded program to perform EHH-based scans for positive selection. Mol. Biol. Evol. 31, 2824–2827 (2014).
Yu, G., Wang, L. G., Han, Y. & He, Q. Y. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS. J. Integr. Biol. 16, 284–287 (2012).
Gong, L. & Xu, Q. Determination of total acid in foods. Vol. GB/T 12456–2008 (China standard Press, Beijing, 2008).
Filip, M., Vlassa, M., Coman, V. & Halmagyi, A. Simultaneous determination of glucose, fructose, sucrose and sorbitol in the leaf and fruit peel of different apple cultivars by the HPLC–RI optimized method. Food Chem. 199, 653–659 (2016).
Cao, X. et al. Peach carboxylesterase PpCXE1 is associated with catabolism of volatile esters. J. Agric. Food Chem. 67, 5189–5196 (2019).
Lippert, C. et al. FaST linear mixed models for genome-wide association studies. Nat. Methods 8, 833–835 (2011).
Shim, H. et al. A multivariate genome-wide association analysis of 10 LDL subfractions, and their response to statin treatment, in 1868 Caucasians. PLoS ONE 10, e0120758 (2015).
Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
Guan, J. et al. Genome structure variation analyses of peach reveal population dynamics and a 1.67 Mb causal inversion for fruit shape. Genome Biol. 22, 1–25 (2021).
Livak, K. J. & Schmittgen, T. D. Analysis of relative gene expression data using real-time quantitative PCR and the 2−ΔΔCT method. Methods 25, 402–408 (2001).
Hunter, P. R., Craddock, C. P., Di Benedetto, S., Roberts, L. M. & Frigerio, L. Fluorescent reporter proteins for the tonoplast and the vacuolar lumen identify a single vacuolar compartment in Arabidopsis cells. Plant Physiol. 145, 1371–1382 (2007).
Leigh, J. W. & Bryant, D. popart: full-feature software for haplotype network construction. Methods Ecol. Evol. 6, 1110–1116 (2015).
Zeballos, J. L. et al. Mapping QTLs associated with fruit quality traits in peach [Prunus persica (L.) Batsch] using SNP maps. Tree Genet. Genomes 12, 1–17 (2016).

Acknowledgements

This research was supported by the National Key Research and Development Program (grant no. 2018YFD1000200) and by the Financial Special Foundation (grant no. KJCX201907-2), the Innovation Capacity Building Foundation (grant no. KJCX20210432), and the Youth Foundation (grant no. QNJJ202120) of the Beijing Academy of Agriculture and Forestry Sciences.

Author information

These authors contributed equally: Yang Yu, Jiantao Guan, Yaoguang Xu, Fei Ren.

Authors and Affiliations

Beijing Agro-Biotechnology Research Center, Academy of Agriculture and Forestry Sciences/Beijing Key Laboratory of Agricultural Genetic Resources and Biotechnology, Beijing, China
Yang Yu, Jiantao Guan, Yaoguang Xu, Zhengquan Zhang, Jun Fu, Jianhua Wei & Hua Xie
Beijing Academy of Forestry and Pomology Sciences, Beijing Academy of Agriculture and Forestry Sciences, Beijing, China
Fei Ren, Jiying Guo, Jianbo Zhao & Quan Jiang
Institute of Pomology, Jiangsu Academy of Agricultural Sciences/Jiangsu Key Laboratory for Horticultural Crop Genetic Improvement, Nanjing, China
Juan Yan & Zhijun Shen

Contributions

H.X. and J.W. designed the research. Q.J., F.R., J.Y., J.G., Z.S., and J.Z. provided materials and information. J.G., Y.Y., Z.Z., and J.F. performed data analyses. Y.X. performed experiments and drafted related methods; H.X., Y.Y., and J.G. wrote and revised the manuscript with input and comments from the other authors.

Corresponding authors

Correspondence to Quan Jiang, Jianhua Wei or Hua Xie.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Communications thanks Takashi Akagi, Xiangchao Gan, and other, anonymous reviewers for their contributions to the peer review of this work. Peer review reports are available.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Peer Review File

Supplementary data 1-21

Description of additional supplementary files

Reporting Summary

Source data

Source data file

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Yu, Y., Guan, J., Xu, Y. et al. Population-scale peach genome analyses unravel selection patterns and biochemical basis underlying fruit flavor. Nat Commun 12, 3604 (2021). https://doi.org/10.1038/s41467-021-23879-2

Received: 03 February 2021
Accepted: 17 May 2021
Published: 14 June 2021
DOI: https://doi.org/10.1038/s41467-021-23879-2
