Genetic diversity, population structure, and relationships of apricot (Prunus) based on restriction site-associated DNA sequencing

Single-nucleotide polymorphisms (SNPs) are the most abundant form of genomic polymorphisms and are widely used in population genetics research. Here, high-throughput sequencing was used to examine the genome-level diversity, population structure, and relationships of apricot, which are important for germplasm conservation and molecular breeding. Restriction site-associated DNA sequencing (RAD-seq) was adopted to sequence 168 Prunus spp. accessions distributed in five ecological groups, including 74 accessions of cultivated Prunus armeniaca L. and 94 accessions of wild apricots (P. armeniaca L. and Prunus sibirica L.), which generated 417,961 high-quality SNPs. We used cluster, genetic structure, and principal component analyses to examine the genetic diversities and genetic relationships of the 168 accessions. The Dzhungar-Ili ecological group accessions showed the highest genetic diversity in terms of private allele number, observed heterozygosity, and nucleotide diversity. We speculate that the Central Asian ecological group accessions were domesticated from the Dzhungar-Ili ecological group accessions. The population structure and gene flow of the North China and European ecological group accessions suggested a genetic background of P. sibirica. We argue that the two groups should be considered hybrid swarms connected to P. sibirica by continuous and extensive gene flow. P. armeniaca originated in Northwest China (Ili Valley), subsequently spread throughout Central Asia, and eventually spread to Europe. In addition, selective sweep signatures in P. armeniaca during domestication from wild to cultivated apricots, combined with differentially expressed genes, underlie distinct fruit traits, including sugars, aromas, organic acids, and carotenoids. This study provides substantive and valuable genomic resources that will significantly advance apricot improvement and effective utilization.

Chinese scholars 8 divided apricot into six ecological groups based on geographical distribution. The Central Asian ecological group (CAG) accessions are the most abundant, with vigorous tree growth, many slender twigs, and small leaves. These accessions flower late, are selffruiting, and produce soft, small fruits with a sweet kernel 8 . The European ecological group (EG) accessions are medium-sized trees with thick branches, weak growth, and a short dormancy period, and mostly produce firm fruits through self-fertilization. The flowers, leaves, and fruits of EG accessions are usually large 8 . Accessions in the Dzhungar-Ili ecological group (DZG) are small trees that produce small fruits with a pubescent peel and exhibit strong cold, drought, and waterlogging resistance 8 . The North China ecological group (NCG) accessions are tall trees with vigorous growth. These accessions have intermediate cold and drought resistance and produce fruits via self-fertilization that are large and juicy 8 . The Northeast Asian ecological group (NAG) accessions are small shrubs that have strong cold resistance. These accessions can tolerate temperatures as low as −50°C in winter, exhibit early leaf fall, and produce small and dry fruits with bitter flesh that are not suitable for consumption 8 . The East China ecological group (ECG) accessions include shrubs or medium-sized trees and tall trees and have strong disease resistance. The seed-setting rate of self-pollinated flowers is very low, the colors of the flowers vary, and the fruits are mostly green 8 . Therefore, the groups exhibit large morphological and physiological differences and wide adaptability to ecological conditions 9 .
According to Vavilov 10 , apricots have three origin centers: China, Central Asia, and the Near East. Most scholars 5,6,11,12 support the history of ancient apricots in Central Asia and China, regarding these regions as independent centers of domestication. However, the early history of apricot is not completely clear, and the main question of whether the cultivation history in Central Asia is longer or shorter than that in China remains unanswered 13 . De Candolle et al. 14 argued that apricot cultivation in China dates back to the end of 3000 BC. However, there is no archeological evidence for the beginning of apricot domestication and cultivation. In Central Asia, apricot cultivation was recently introduced in~1000 to 2000 BC 15 . Many scholars have shown that the genetic diversity of apricots in China is the richest worldwide [16][17][18] .
The Kashgar, Hotan, and Kuqa oasis areas around the Tarim Basin in the southern part of the Xinjiang Uygur Autonomous Region of China are the main apricotproducing areas and contain the greatest abundance of apricot cultivars. The wild apricot forest in Ili, Xinjiang, China, is a relic of a broad-leaved forest from the late Tertiary community that played a decisive role in the domestication and cultivation of apricot worldwide 18 . It is an important part of the deciduous broad-leaved forest under the mountain coniferous forest and abovemountain grassland in Xinjiang 8 . There is only one mountain between the southern Xinjiang and Ili areas in the northern Tianshan Mountains, and there are several corridors between the northern and southern Tianshan Mountains. Geographically, the apricots cultivated in Xinjiang, southern Tianshan Mountains (CAG), most likely evolved from the spread of wild apricots in the Ili Valley (DZG) 8 .
Crop domestication not only changes economic and agronomic traits but also retains the genetic characteristics that affect the genetic diversity and population structure of domesticated plants 19 . Identifying genetic variations in wild and cultivated populations and within populations can provide insights into the general mechanisms of plant domestication and diversification and guide crop genetic improvement in future breeding programs 20 . There is evidence that domestication can lead to a decrease in the genetic diversity of cultivated crops 21 . The genetic information present before domestication and artificial selection may be retained in wild populations, which are special resources for studying the effects of variation in the apricot genome. However, the natural apricot population in China is strongly influenced by human activities, such as logging and deforestation. Consequently, protective measures must be taken to prevent further declines in wild apricot resources. Information on genetic diversity and population structure is essential for the development of management and conservation approaches. We studied the genetic diversity and population structure of apricot and identified the representative core germplasm. We can preserve the genetic diversity of the whole population to the greatest extent by using the smallest amount of genetic resources, and further improve the protection efficiency of apricot germplasm resources 22 . Developing genomic resources, increasing the knowledge of the apricot gene pool, and obtaining information about genetic diversity and population structure can accelerate biological research and genetic improvement.
The rapid development of high-throughput sequencing technology has provided whole-genome information for nonreference crops. The use of this technology has produced large data sets of short sequences within millions of genomes with deep coverage. These methods, including restriction site-associated DNA sequencing (RAD-seq) and genotyping-by-sequencing (GBS), have been successfully applied in population genomics studies [23][24][25] . Previous studies used nuclear DNA, mitochondrial DNA, and chloroplast DNA 3,26-28 sequence fragments to study the genetic diversity of apricots and the relationships between apricot and other related species 29 . Compared with genome-level sequencing, plastid and nuclear gene molecular markers have a lower accuracy and resolution. The limited number of polymorphic loci produced by conventional molecular markers hinders genetic research on apricot.
The main purposes of this study were to evaluate the genetic diversity, population structure, and relationships of apricots in different ecological groups using RAD-seq of the whole genome, clarify the phylogenetic relationships between five ecological groups and provide a theoretical basis for further improving and effectively utilizing apricot germplasm resources.

Sequence data quality and processing
A collection of 168 Prunus spp. accessions were successfully sequenced using the HiSeq 2000 system (Illumina Inc., USA) producing a total of 192 Gb of raw reads, with an average of 1.14 Gb for each accession. After filtering low-quality reads, a total of 180 Gb of clean data were retained, with an average of 1.07 Gb per accession and an average success rate of 98.44%. The quality of the sequencing data was high (Q20 > 90%, Q30 > 85%), the GC content was stable at 37.33-41.25%, and all samples were free from contamination (Supplementary Data- Table 2). The enzyme capture rate ranged from 92.84 to 98.92%. The mapping rate ranged from 88.7 to 94.37%. A total of 1,475,632 small nucleotides (SNPs) were obtained from the 168 Prunus spp. accessions using SAMtools software 30 , and 417,961 SNPs were obtained after filtering, which ensured the accuracy and reliability of subsequent genetic diversity and population structure analyses. At the population level, more than half of the SNPs (with an average of 55.72%) had base transitions, and the transition vs. transversion (ts/tv) ratios for the populations ranged from 1.787 to 1.799. The ts/tv ratios of SNPs in the two wild populations (NAG and DZG) were larger than those in the cultivated populations ( Fig. 1; Supplementary Data- Table 3).

Population phylogenetic relationships
A phylogenetic tree of all 168 accessions was constructed to better understand their relationships (Fig. 2). In the phylogenetic tree, most branches had high bootstrap values, which indicated that the maximum likelihood (ML) tree was highly reliable. The phylogenetic tree divided the collected accessions into five groups. One group included the NAG accessions, located in Northeast China and belonging to P. sibirica. These accessions are adapted to and grown in extremely cold regions in China, and are thus considered very hardy. The other four groups belonged to P. armeniaca. These groups include (1) the DZG (group I), with four subgroups based on the strong geographical distribution pattern among the branches (Yining County (group I-1), Huocheng County (group I-2), Xinyuan County (group I-3), and Gongliu County (group I-4)) in the Ili region, Xinjiang; (2) the CAG (group II), with apricot cultivars mainly from the Kashgar, Hotan, and Kuqa areas in Xinjiang that could not be clearly distinguished based on their geographical distributions, which may be due to the high genetic diversity and frequent gene exchange generated by selection or introduction, reflecting the complex domestication history of cultivated apricots 18 (notably, as revealed by the population structure analysis (Fig. 3), the CAG was mixed with the genetic background of DZG; therefore, Xinjiang cultivated apricots originated from Ili wild fruit forests); (3) the EG (group III), with apricot cultivars mainly from North America, Italy, and France; and (4) the NCG (group IV), with apricot cultivars mainly from Northeast China, North China, and Korea.

Population genetic structure
Population genetic structure analysis was conducted based on high-quality SNPs filtered by VCFtools. Based on the ML, K = 2-8 ancestral populations were assumed in this study. The optimal population K value was determined according to the coefficient of variation (CV), and the optimal value obtained in this study was K = 5 (Supplementary Data- Fig. 1). According to Fig. 4, when the K value was 5, 168 materials were not assigned to the five groups, which was inconsistent with the results of the phylogenetic tree analysis. Due to divergence in the population structure and phylogenetic tree results, it was necessary to analyze population structure under different K values. Therefore, we examined the population structure for K values ranging from 2 to 8 (Fig. 3).
When K = 3, the CAG and DZG accessions were distinguished, while the EG, NCG, and NAG accessions could not be separated. When K = 6, in addition to the CAG and DZG accessions, the NCG and EG accessions could be distinguished from each other. When K = 8, the population structure of the CAG, NCG, and EG accessions remained unchanged, and internal differentiation occurred in only the DZG accessions, which were divided into four subgroups: Huocheng, Yining, Gongliu, and Xinyuan. The genetic background of the CAG accessions was complex, but most of the genetic background was found to originate from the DZG accessions, which supports the hypothesis that the cultivated apricots in Xinjiang originated from wild apricots in the Ili region. However, it was interesting to note that the population structure of the EG and NAG accessions changed very little as K increased from 2 to 8. The highly polymorphic and diverse properties of the DZG, CAG, and NCG might have influenced the population structure of the EG and NAG. Combined with the phylogenetic tree, these results clearly divided apricot into five groups: CAG, DZG, NCG, EG, and NAG accessions.

Population genetic relationships
The collection of 168 accessions was analyzed by principal component analysis (PCA) to better understand the relationships between ecological groups (Fig. 4), and the PCA scores were used to evaluate genetic variance. The first three components explained 16.49% of the total genetic variation (8.28, 5.88, and 2.33%, respectively). According to the first three components, the accessions were divided into five groups: the CAG, DZG, NCG, EG, and NAG.

Gene flow analysis
We analyzed the gene flow between the five ecological groups (Supplementary Data-Fig. 2). Allowing just one migration event (m = 1) (Fig. 5a), we observed that gene flow occurred from the NAG to NCG accessions. In terms of geographical location, these two ecological groups were close to each other, which may have caused the germplasm of the NCG accessions to be mixed with that of P. sibirica. Allowing two admixture events (m = 2) (Fig. 5b), the EG, NCG, and CAG accessions exhibited extensive gene flow, which likely reflects their many shared genomic components due to hybridization in their domestication and breeding histories. This result was consistent with that observed on the basis of population structure (Fig. 3).

Population genetic diversity
The expected heterozygosity (He) of the P. armeniaca populations ranged from 0.378 to 0.407; the observed heterozygosity (Ho) of the P. armeniaca populations ranged from 0.218 to 0.260; Wright's F-statistic (F IS ) of the P. armeniaca populations ranged from 0.030 to 0.078; and the number of private alleles (A P ) in the populations of P. armeniaca ranged from 11-11,069. The average nucleotide diversity (π) and polymorphism (Waterson's theta, θw) estimates of the P. armeniaca population mutation parameters were 0.0027 and 0.0016 at the genome level, respectively (Table 1, Fig. 6). The mean nucleotide variation of P. armeniaca was higher than that of other perennial crops, such as peach (π = 0.0015) 31 and cassava (π = 0.0026) 32 , but lower than that of date palm (π = 0.0092) 33 .
Further analysis of the DZG accessions (Table 2) showed that the observation heterozygosity (Ho) ranged from 0.373 to 0.383, the expected heterozygosity (He) ranged from 0.256 to 0.268, the inbreeding coefficient (F IS ) ranged from 0.015 to 0.055, and the private allele number (A P ) ranged from 2026 to 8847. The Ho and A P in Xinyuan County were higher than those in Gongliu, Yining, and Huocheng Counties.

Genetic differentiation and AMOVA
The genetic differentiation index (Fst) between populations ranged from 0.1080 to 0.1946 (Fig. 6), and the  (Fig. 6), indicating that the genetic differences within populations were greater than those between populations and the occurrence of gene flow between populations 34,35 . The Fst values of the EG and NAG accessions were the largest among all the groups analyzed (Fig. 6), which may have been due to geographical isolation, low gene flow between populations, and large genetic differences 34,35 .
Assessment of genetic differences amDNA was extracted from young leavesong the five geographical groups by analysis of molecular variance (AMOVA) revealed that 12.02% of the genetic variation occurred between populations and 87.98% occurred within populations (Table 3). Although the variation among the five geographical groups of accessions mainly occurred within populations, the differences between populations were also large.    Table 9).

Scans for selective sweeps
One carotenoid biosynthesis gene, one cellulose biosynthesis gene, two serine and threonine metabolism genes, and two GTPase regulatory genes were identified in       Table 4). Eight serine and threonine metabolism genes, three ethylene biosynthesis genes, two aromatic metabolism genes, two organic acid metabolism genes, one high-affinity transport system gene, two stress tolerance-related genes, one pectin degradation gene, one pathogen-related gene, and one salicylic acid biosynthesis gene were detected in the EG accessions (Supplementary Data- Table 6). One auxin-related gene, thirteen serine and threonine metabolism genes, five GTPase regulatory genes, five ethylene biosynthesis genes, one organic acid metabolism gene, one stress tolerance-related gene, one pathogen-related gene, one salicylic acid biosynthesis gene, three sugar metabolism genes, one auxin-related gene, one alanine metabolism gene, one aspartate metabolism gene, one glutamate metabolism gene, and one glucuronoxylan biosynthesis gene were detected in the NCG accessions (Supplementary Data- Table 8).
The candidate genes identified in selective sweeps of P. armeniaca during domestication from wild to cultivated apricots differ greatly in function, indicating that different ecological groups of P. armeniaca adapted to different selection pressures for desired agronomic characteristics in cultivated apricot. Thus, the domestication of cultivated apricot was a process of differential selection.

Discussion
Genomic data provide a novel perspective for studying the genetic diversity and domestication of apricot. In this study, we collected 168 accessions from five geographical groups and performed whole-genome restriction enzyme digestion to quickly obtain highly accurate variation information based on markers. The sample sizes averaged 1.14 Gb, ensuring the accuracy of the population genetic analysis. We used RAD-seq to detect 417,961 SNPs, which exceeded the numbers of amplified fragment length polymorphism (AFLP) and random amplified polymorphic DNA (RAPD) markers.
Substitution between purine and purine or between pyrimidine and pyrimidine is called transition, and substitution between purine and pyrimidine is called transversion. Point mutations are common in organisms and cause a wide variety of phenotypic changes. This nucleotide mutation pattern is also observed in other plants, such as peanut 36 , maize 37 , Amorphophallus paeoniifolius 38 , and Arabidopsis 39 . Due to the biased mutation process in the plant genome, the predicted number of transitions is much larger than the predicted number of transversions 38 . Consistent with these predictions, we found that the base transition values were higher than the base transversion values and that the ts/tv ratios of the five geographical groups ranged from 1.787 to 1.799 (Fig. 1), indicating strong transition bias. Further analysis showed that the ts/tv ratio of the cultivated population was lower than that of the wild population, which may have been due to long-term, strict artificial selection, and the loss of evolutionary potential. The genetic diversity index of the cultivated population was much lower than that of the wild population (P. armeniaca) (Table 1, Fig. 6), and the genetic diversity of the EG accessions was lower than that of the other groups of accessions. The A P was most significant in the wild population (DZG), measuring 11,069, indicating a large pool of genetic variation, which could guide the genetic improvement of crops in future breeding programs. In contrast, A P values of only 11-3784 (Table 1) were found in the domesticated population (P. armeniaca). These differences may be due to inbreeding or continuous directional selection during the domestication process, which narrows the genetic basis of germplasm, reduces genetic diversity, and increases the possibility of genetic drift during the domestication process. Similar results were obtained by Li et al. 26 .
According to Vavilov 10 , China and Central Asia were the two main centers of apricot domestication. Most cultivated apricots belong to P. armeniaca. Among the four main ecological groups of P. armeniaca species, Kostina 40 argued that DZG accessions are the most primitive and that CAG accessions have the longest cultivation history. The cultivation of EG accessions has a relatively recent history, being traced back to~2000 years ago 16 . In Central Asia, local cultivars are most likely to have originated from wild apricots, moving southward from the Kazakhstan-China border (Dzhungar-Ili) to Kashmir and westward into the mountains of Afghanistan 16 . In our study, 168 accessions were divided into five ecological groups based on population structure, PCA and ML tree analysis. The CAG and DZG accessions were tightly clustered, indicating that the DZG (wild apricot) may be the ancestor of the CAG. Zhebentyayeva et al. 18 and Li et al. 26 reached similar conclusions. The genetic background of the CAG was most similar to that of the DZG (Fig. 4). The genetic diversity indexes (Ho, A P , and π) of the DZG accessions were higher than those of accessions in the other ecological groups (Table 1, Fig. 6), which showed the lowest genetic differentiation for CAG accessions (Fig. 6), further supporting the CAG accessions as having originated from the DZG accessions. We speculate that the DZG accessions are the common ancestors of the CAG, NCG, and EG accessions. P. armeniaca originated in Northwest China (Ili Valley), subsequently dispersed throughout Central Asia, and eventually spread to Europe. These findings are reasonable from a historical perspective, as there was extensive cultural contact along the Silk Road from 207 BCE to 220 CE 41 . Therefore, historical and commercial influences may have contributed to the development of this unique species of cultivated apricot.
Further analysis of the DZG (Fig. 2) clustered the Xinyuan County and Gongliu County accessions into one group and the Yining County and Huocheng County accessions into another group, providing further geographical support for the relationships among these accessions. According to the genetic diversity indexes (Table 2), the Xinyuan County accessions were the most diverse. He et al. 42 also found that wild apricots in the Yili Valley maintained a high level of diversity and that genetic variation mainly occurred within populations.
The population structure and gene flow analyses of the NCG and EG accessions suggested a genetic background of P. sibirica. We argue that the two groups should be considered hybrid swarms connected to P. sibirica by continuous and extensive gene flow.
Human intervention via the artificial selection of favorable phenotypic traits to enhance production and improve desirable agronomic traits can both reduce the levels of genetic variability and skew allele frequencies 43 . The different genes identified in selective sweeps of CAG, EG, and NCG apricots were found to be enriched in biological processes, including the metabolism of sugars, aromas, organic acids, and carotenoids. These genes may have been involved in different domestication pathways, leading to differences in traits among apricot ecological groups. Some sugar-related genes have been found in selective sweeps, suggesting a preference for sweet fruit during domestication. For the NCG accessions (Supplementary Data- Table 8), a total of three sugar-related genes were identified in the selected regions, including two genes that encode enzymes (vacuolar invertase and neutral invertase). Two organic acid genes were identified in the EG accessions (Supplementary Data- Table 6), and only one was identified in the NCG accessions, indicating that the dominant acid components of apricot differ among ecological groups. Fewer genes involved in carotenoid biosynthesis were identified in selective sweeps. One carotenoid biosynthesis gene was identified in the CAG accessions (Supplementary Data- Table 4), indicating a preference for a certain color during domestication. For volatile compounds in apricot fruit, two genes related to aromatic metabolism (fatty acid desaturase and cytochrome P450) were identified in selective sweeps of the EG accessions (Supplementary Data- Table 6), indicating that aroma in these groups of apricots is regulated by different genes in the metabolic pathway.

Conclusion
In this study, we report genomic information for 168 wild and cultivated apricot accessions. Our findings provide insights into the dissemination of P. armeniaca. Selective sweep signatures in P. armeniaca during domestication from wild to cultivated apricots, combined with differentially expressed genes, underlie distinct fruit traits, including sugars, aromas, organic acids, and carotenoids. Population structure analysis provided new evidence for the mixed genetic background of apricot in different ecological groups. Our study provides abundant genomic resources for wild and cultivated apricots and makes important contributions to the genetic improvement and molecular breeding of apricot.
Seventy-four of the accessions were provided by the Luntai National Fruit Germplasm Resources Garden of the Xinjiang Academy of Agricultural Sciences, the Yingjisha County Apricot National Forest Germplasm Bank, and the Xiongyue National Germplasm Resources Garden of the Liaoning Institute of Pomology. Ninety wild apricots (P. armeniaca) were collected from the natural distribution areas of wild fruit forests in Xinyuan County, Gongliu County, Yining County and Huocheng County in Ili, Xinjiang. Four wild apricots (P. sibirica) were provided by the Xiongyue National Germplasm Resources Garden of the Liaoning Institute of Pomology (Supplementary Data- Table 1).
DNA extraction, library preparation, and sequencing DNA was extracted from young leaves using a Plant Genomic DNA Kit (Tiangen, Beijing, China) 44 . The purity and integrity of DNA were analyzed by 1% agarose gel electrophoresis and a NanoDrop® spectrophotometer (ND-1000, Thermo Fisher Scientific, USA). The DNA concentration was accurately quantified with a Qubit TM 2.0 fluorometer (Invitrogen, Carlsbad, CA, USA), after which the DNA was stored at −20°C 34 . RAD library preparation and sequencing were conducted by Novogene Bioinformatics Technology Co. Ltd., Beijing. The DNA concentration used for library preparation was 75 ng/μl, and the qualified DNA was digested by the EcoRI (GAATTC) restriction enzyme. Each sample was physically randomly fragmented by the Covaris S220 system (Covaris, Woburn, MA, USA), and then subjected to barcode ligation, gel fragment selection, adapter connection and PCR fragment amplification, and 300-500 bp sequences were recovered 38 . Paired-end sequencing was performed on the Illumina HiSeq 2000 platform (Illumina, San Diego, CA, USA), and each sample generated 2.32 Gb of raw data. The clean sequencing data have been submitted to the NCBI Short Read Achieve with accession number PRJNA592274.

Mapping and SNP calling
Qualified sequencing data were aligned to the CAG 09 reference genome (P. armeniaca cv. "Cuijianali"; NCBI accession number PRJNA592274) by Burrows-Wheeler Aligner (version 0.7.8-r455) 45 . We used SAMtools software 30 to detect SNPs in the population. The final product was converted to variant call format (VCF).

Sequencing and SNP-calling statistics
To obtain high-quality data, the raw data were sequenced, compared, assessed for mutations, screened, and analyzed. After the reads were successfully aligned to the reference genome through quality control, SAMtools 30 software was used to calculate read lengths. The ts/tv ratio was obtained with BCFtools (version 1.8) 30 .
To clarify the genetic relationships among the five ecological groups, phylogenetic trees were constructed. We used the published RAD-seq data of M. sieversii 46 (NCBI accession number SRX2934932) as an outgroup. TreeBeST (version 1.9.2) 47 was used to calculate the distance matrix, RAxML (version 8.2.12) 48 was used to construct the ML phylogenetic tree, and 1000 bootstrap replicates were used.
Population structure was analyzed by the frappe program 49 by setting the number of genetic groups from 2 to 8 and using 1000 bootstrap replicates. To determine the optimal group structure classification, the optimal group number was identified as that with the minimum CV 50 . Group structure was analyzed by PCA using GCTA 51 software, and the significance level of feature vectors was determined by using a Tracy-Widom test. Genome-wide complex trait analysis (GCTA) software integrates the layered correction method of Eigenstart 52,53 , which can clearly distinguish cases and controls.

Gene flow analysis
Based on the SNPs, we used TreeMix (version 1.13) to evaluate the gene flow between the five ecological groups with the "-se-bootstrap-k 1000 -m" setting, where the number (-m) varied from 1 to 3.

Population genetic diversity and differentiation analysis
The genetic diversity indexes included A P , Ho, He, π 43 , θw 54 , F IS , and Tajima's D 43 . Sliding windows with a 5 kb step size and 10 kb size were used to calculate Ho, He, π, θw, and Tajima's D. For each window, these values were run using the Bio::PopGen package with a custom Perl script 55 . Paired Fst values 56 were calculated in the same window to measure the difference between populations. F IS 29 was calculated using the POPULATION program in the STACKS pipeline. The statistical values were all calculated using VCFtools (version 0.1.14) 57 .
Using AMOVA in Arlequin (version 3.5.1) 58 , different genetic structures were defined and statistically tested by classifying and dividing the populations at different levels.

Identification of selective sweeps
Qualified sequencing data were aligned to the P. mume L 59 . reference genome to identify selective sweeps in the 168 Prunus spp. accessions distributed in five ecological groups. The results were plotted using the ggplot2 package in R software.