Phylogeography of Prunus armeniaca L. revealed by chloroplast DNA and nuclear ribosomal sequences

To clarify the phylogeography of Prunus armeniaca L., two chloroplast DNA (cpDNA) fragments (trnL-trnF and ycf1) and the nuclear ribosomal DNA internal transcribed spacer (ITS) were used to assess genetic variation across 12 P. armeniaca populations. Analysis of the cpDNA and ITS sequence data showed a high level of genetic diversity (cpDNA: HT = 0.499; ITS: HT = 0.876) and a low level of genetic differentiation (cpDNA: FST = 0.1628; ITS: FST = 0.0297) in P. armeniaca. Analysis of molecular variance (AMOVA) revealed that most of the genetic variation in P. armeniaca occurred among individuals within populations. The number of substitution types (NST) was significantly higher than the interpopulation differentiation (GST), indicating genealogical structure in P. armeniaca. P. armeniaca shared genotypes with related species and may be associated with them through continuous and extensive gene flow. The haplotypes/genotypes of cultivated apricot populations in Xinjiang, North China, and foreign apricot populations were mixed with large numbers of haplotypes/genotypes of wild apricot populations from the Ili River Valley. The wild apricot populations in the Ili River Valley contained the ancestral haplotypes/genotypes with the highest genetic diversity and were located in an area considered a potential glacial refugium for P. armeniaca. Since population expansion occurred 16.53 kyr ago, the area has provided a suitable climate for the population and protected the genetic diversity of P. armeniaca.

The evolutionary history of organisms, including their genetic diversity, population structure and historical dynamics, is critical to species conservation 1 . Understanding the effects of climate change on the spatial genetic patterns of species, particularly endangered species, can help not only to reveal their evolutionary history but also to inform conservation strategies 2,3 . The origins of mountain biodiversity are complex and may include the immigration of preadapted lineages [4][5] , in situ diversification 6 , or the continuation of ancestral lineages 7 . Compared with those of other mountains, such as the Hengduan Mountains, the organismal evolution and diversity of the Tianshan Mountains are still poorly understood. The Ili River Valley is located in the western part of the Tianshan Mountains in China and is surrounded by mountains on three sides. This valley was the main northern crossing of the ancient Silk Road. In the late Tertiary period, a large number of species, including wild apricot, wild apple and wild hawthorn, remained in the Ili River Valley and were important components of the deciduous broad-leaved forest at elevations below the coniferous forest and above the mountain grassland belt in the Xinjiang Uygur Autonomous Region, China 8 . However, there have been few studies on the phylogeography of plant species in the arid region of Northwest China [9][10] .
In recent decades, combining molecular data with paleoclimatic and geographical evidence has proved effective in phylogeographic research [11][12] . Many phylogeographic studies have shown that the Last Glacial Maximum (LGM) of the Pleistocene strongly influenced the genetic variation and biodiversity of plants throughout the Northern Hemisphere [13][14] . During the Quaternary glacial period in particular, species in the ice-free zone were mainly affected by the cold and dry climate 15 . Climatic fluctuations cause the distribution of species to shrink and expand 16 , and cold and dry climates prompt plant and animal species to retreat to refugia,

Materials and methods
Sample collection. The samples used for cpDNA analysis included 123 individuals from 20 populations. A total of 171 individuals from 19 populations were used for the ITS analysis, of which 38 samples were obtained from the NCBI database (Table S1). The samples studied were from P. armeniaca and related species (Prunus sibirica, NAG; Prunus mandshurica, LX; Prunus dasycarpa, ZX; Prunus mume, ECG; Prunus zhengheensis, ZHX 33 ; Prunus limeixing, LMX 33 and Prunus brigantina, FGX). Prunus davidiana (T) was used as the outgroup.
Our collection of wild apricot (P. armeniaca) populations covered most of the natural distribution in China, including Huocheng County (DZGhcmd, DZGhcy and DZGhcm populations), Yining County (DZGyn population), Gongliu County (DZGglb and DZGgld populations) and Xinyuan County (DZGxyt, DZGxya and DZGxyz populations). The distance between individuals sampled in each population was at least 100 m. Young leaves were collected and dried immediately with silica gel.
The cultivated populations of P. armeniaca included the Xinjiang apricot group (CAG, cultivated apricots in Xinjiang), the North China apricot group (NCG, cultivated apricots in Shandong, Shaanxi, Gansu, Liaoning and Ningxia) and the foreign apricot group (EG, cultivated apricots in the USA, France, Italy and Australia). Detailed sample information is provided in Table 1 and Table S2. The main characteristics of different populations can be found in Zhang et al. 8 .
DNA sequencing. Total genomic DNA was extracted from the silica gel-dried leaf materials using a Plant Genomic DNA Kit (Tiangen Biotech, Beijing, China) 34 . The quality and concentration of the extracted DNA were determined by 1% agarose gel electrophoresis and ultraviolet spectrophotometry, respectively.
cpDNA and nrDNA sequences from 15 samples were initially screened using universal primers. The sequencing results showed that the cpDNA regions (trnL-trnF and ycf1) and the two nuclear ribosomal ITS regions (ITS1 and ITS2) were polymorphic. cpDNA and ITS fragments were amplified by polymerase chain reaction (PCR); primer details are provided in Table S3 [35][36][37] . PCR was performed in a total volume of 25 µL containing 1 µL DNA, 5.5 µL PCR mix, 16.5 µL double-distilled water and 1 µL each of the forward and reverse primers. PCR amplifications were performed under the following conditions: 5 min of initial denaturation at 94 °C and 35 cycles of 0.5 min at 94 °C, 0.5 min of annealing at 58 °C, and 0.5 […]
Haplotype/genotype diversity (Hd) and nucleotide diversity (π) were calculated using DnaSP ver. 5.10 software 44 . The within-population gene diversity (HS), gene diversity in all populations (HT), interpopulation differentiation (GST) and number of substitution types (NST) were calculated using PERMUT ver. 2.0 45 . The last two indexes (GST and NST) were compared via permutation tests with 1,000 permutations. An NST value greater than GST indicates the existence of phylogeographic structure 45 . Analysis of molecular variance (AMOVA) was performed using Arlequin ver. 3.5.2.2 46 to partition the genetic variation at different levels, with statistical significance determined by 1,000 permutations.
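The two diversity indexes named above have simple textbook definitions, and a small worked example may help readers interpret the reported values. The following is a minimal sketch, assuming aligned sequences of equal length with no gaps or missing data; it is not the DnaSP implementation, and the function names and toy alignment are hypothetical.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(haplotypes):
    """Nei's haplotype diversity: Hd = n/(n-1) * (1 - sum(p_i^2))."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sequences")
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


def nucleotide_diversity(seqs):
    """Mean number of pairwise differences per site (pi)."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2))
        for s1, s2 in combinations(seqs, 2)
    )
    n_pairs = n * (n - 1) // 2
    return total_diffs / (n_pairs * length)


# Hypothetical toy alignment (not data from this study):
# one common haplotype plus two single-step variants.
toy = ["ACGTACGT", "ACGTACGT", "ACGTACGA", "ACGAACGT"]
hd = haplotype_diversity(toy)
pi = nucleotide_diversity(toy)
```

A dataset with many distinct haplotypes that differ at only a few sites gives a high Hd together with a low π, which is the pattern reported here for P. armeniaca.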
The outgroup was P. davidiana, and the time point of peach-apricot differentiation was used as the calibration point 50 . The BEAUti interface was used to create an input file for BEAST 51 , for which the GTR + I + G nucleotide substitution model was used. The data were analyzed using a relaxed log-normal clock model and the Yule
A phylogenetic tree of all 57 genotypes was constructed to better understand their relationships (Fig. 3). The phylogenetic tree roughly divided the collected accessions into two groups. One group included the P. armeniaca […] (T2, T3, T11, T25, T8, T9, and T12) as P. armeniaca, indicating that they are associated with P. armeniaca through continuous and extensive gene flow.
Based on the concatenated cpDNA sequences (trnL-trnF and ycf1), 19 haplotypes (G1-G19) were identified among 107 individuals from 12 populations of P. armeniaca (Fig. 2). Variable sites among the 19 haplotypes are shown in Table S6. The Hd and π values detected at the cpDNA sequence level in P. armeniaca were 0.548 and 1.9 × 10⁻³, respectively. The geographic distribution of the 19 haplotypes is shown in Fig. 2A, illustrating that haplotype G1 was distributed in all populations. The cpDNA haplotype network (Fig. 2B) showed that 7 haplotypes were differentiated from haplotype G1 by a one-step mutation, with G1 at the center. Three haplotypes were differentiated from G11 by a one-step mutation. The overall network showed a "star-like" distribution pattern.
Based on the ITS sequences, 34 haplotypes (P1-P34) were identified among 124 individuals from 12 populations of P. armeniaca (Figure S2). The Hd and π values detected at the ITS sequence level in P. armeniaca were 0.865 and 5.4 × 10⁻³, respectively.
AMOVA revealed significant genetic differentiation among all populations of P. armeniaca (cpDNA: FST = 0.1628, P < 0.001; ITS: FST = 0.0297, P < 0.001), with most of the genetic diversity occurring within the populations and relatively little occurring among them (Table 4).
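In a two-level AMOVA (among populations and within populations), the fixation index equals the among-population share of the total variance. The sketch below checks that the FST values above agree with the variance percentages reported in the Discussion (cpDNA: 16.28% among and 83.72% within; ITS: 2.97% among and 97.03% within). It assumes a simple two-level partition; the function name is hypothetical and this is not Arlequin's code.

```python
def fst_from_variance_components(among, within):
    """Fixation index for a two-level AMOVA: FST = among / (among + within).

    The inputs may be raw variance components or percentages,
    since only their ratio matters.
    """
    total = among + within
    if total <= 0:
        raise ValueError("total variance must be positive")
    return among / total


# Percentages of variation as reported in the Discussion of this paper.
cpdna_fst = fst_from_variance_components(16.28, 83.72)
its_fst = fst_from_variance_components(2.97, 97.03)
```

Both results round to the reported values, so the low FST and the statement that most variation lies within populations are two views of the same partition.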
Demographic history and estimation of divergence times. The mismatch distributions based on the cpDNA and ITS datasets were multimodal for the cultivated populations and for all populations combined, indicating demographic equilibrium (Fig. 4, Figure S3). Both the neutrality tests based on Tajima's D (cpDNA: −2.272, P < 0.05; ITS: −0.966, P < 0.05) and Fu's FS (cpDNA: −5.8253, P < 0.05; ITS: −2.223, P < 0.05) and the mismatch distribution analysis (Fig. 4) based on the cpDNA and ITS datasets suggested recent range or demographic expansion in wild populations of P. armeniaca. In addition, neither the SSD (cpDNA: 0.037, P > 0.05; ITS: 0.002, P > 0.05) nor the HRI (cpDNA: 0.179, P > 0.05; ITS: 0.017, P > 0.05) was significant (Table 5), indicating no deviation of the observed mismatch distribution from that obtained via model simulation under sudden demographic expansion. Thus, we concluded that the demographic expansion of the wild populations of P. armeniaca occurred 16.53 kyr ago.
Table 2. Sample information and summary of haplotype/genotype distribution and genetic diversity for each population. N, sample size; Hd, haplotype/genotype diversity; π, nucleotide diversity.
The cpDNA dataset was employed to estimate the onset of divergence between P. armeniaca and its related species (Fig. 5, Table S7). Thirty-three haplotypes were divided into two groups: those of P. armeniaca (blue) and those of related species (green) (Fig. 5). The divergence time estimation revealed that the differentiation of P. armeniaca from its related species occurred during the middle Eocene, approximately 45.68 Ma (95% highest posterior density (HPD) = 28.47-61.87 Ma). The onset of intraspecific divergence in P. armeniaca was estimated to have occurred 25.55 Ma (95% HPD = 12.93-39.63 Ma).
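The Tajima's D statistic used in the neutrality tests of this section compares two estimators of diversity: the mean number of pairwise differences and the number of segregating sites scaled by sample size. The sketch below follows the standard published formula. It assumes gap-free aligned sequences and counts every variable column as one segregating site; it is not the software used in this study, and the function name and toy alignments are hypothetical.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for a list of aligned, equal-length sequences."""
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least four sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")

    # S: number of alignment columns with more than one state.
    seg_sites = sum(1 for col in zip(*seqs) if len(set(col)) > 1)
    if seg_sites == 0:
        raise ValueError("no segregating sites; D is undefined")

    # k: mean number of pairwise differences.
    pairs = list(combinations(seqs, 2))
    k = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs) / len(pairs)

    # Constants that depend only on the sample size n.
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (k - seg_sites / a1) / sqrt(variance)


# Hypothetical toy alignments (not data from this study).
# Star-like sample: one central haplotype plus singleton variants.
star_like = ["AAAAAA", "TAAAAA", "ATAAAA", "AATAAA", "AAATAA"]
# Two deeply divergent haplotypes at equal frequency.
balanced = ["AAAA", "AAAA", "TTTT", "TTTT"]

d_star = tajimas_d(star_like)
d_balanced = tajimas_d(balanced)
```

An excess of rare variants, as in the star-like sample, pushes D below zero, which is why the significantly negative values reported here are read as a signal of recent expansion. Significance testing, which normally relies on simulation, is outside the scope of this sketch.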

Discussion
Biparentally inherited nuclear markers are often combined with uniparentally inherited organelle markers in population genetics studies. Li et al. 32 used cpDNA trnL-trnF, rpl16 and nrDNA ITS sequences to infer the evolutionary history of S. sinomontana. Yang et al. 52 used cpDNA psbA-trnH, trnL-trnF, ycf1, and matK sequences to assess the demographic history and genetic diversity of a deciduous oak (Quercus liaotungensis) in Northern China. Zhang et al. 53 used cpDNA markers to successfully determine the genetic diversity, genetic structure, and demographic history of 7 Michelia yunnanensis populations. Several studies [54][55][56][57] have reported that the diversity of wild apricot is richest in the Ili River Valley, with low levels of genetic differentiation and genetic variation occurring mainly within populations. Hu et al. 56 used simple sequence repeat markers to analyze the diversity of 212 apricot germplasms from 14 populations in the Ili River Valley. Among these populations, the one from Tuergen township in Xinyuan County had the highest genetic diversity, and the genetic distance between populations was significantly correlated with geographical distance. The self-incompatibility and wide distribution of apricot, together with long-distance pollen dispersal by insects and strong winds, are the main factors affecting its genetic structure 56 . Based on cpDNA and ITS data, we concluded that the haplotype/genotype diversity of wild apricot populations distributed in the Ili River Valley was relatively high (Table 2), with that of the DZGhcmd and DZGyn populations being the highest. The results of AMOVA (Table 4) showed that the genetic diversity in P. armeniaca mainly occurs within populations (cpDNA: 83.72%; ITS: 97.03%), but there were also significant differences among populations (cpDNA: 16.28%; ITS: 2.97%), which was consistent with previous results based on simple sequence repeat markers 56 .
The relatively high genetic diversity also supports the Tianshan Mountains as the center of origin of cultivated apricot 56 . The limited number of informative mutation sites among the ITS genotypes gave very little resolution for reconstructing genotype relationships (Figure S2), suggesting rapid intraspecific differentiation in the recently derived species P. armeniaca, similar to the results found in S. sinomontana 32 .
The related species shared genotypes (T2, T3, T11, T25, T8, T9, and T12) with P. armeniaca, indicating that they are associated with P. armeniaca through continuous and extensive gene flow. Liu et al. 58 concluded, based on microsatellite markers, that P. sibirica was divided into two groups, one of which may have undergone gene exchange with P. armeniaca, which supports our results. In addition, the authors found an extensively mixed genetic background in the germplasm of cultivated apricots in China. This study indicated that the cultivated and wild populations of P. armeniaca had the same ancestral haplotype, G1. The haplotypes/genotypes of the CAG, EG and NCG populations were mixed with large numbers of haplotypes/genotypes from the wild populations of P. armeniaca. According to coalescent theory 59 , chloroplast haplotype G1, which was widely distributed and located in the center of the chloroplast network (Fig. 2), should be considered the oldest haplotype. The Kashgar, Hotan and Aksu oasis areas around the Tarim Basin in the southern part of the Xinjiang Uygur Autonomous Region of China are the main apricot-producing areas and contain the greatest abundance of apricot cultivars. Only one mountain range separates southern Xinjiang from the Ili Valley, and there are several corridors between the northern and southern Tianshan Mountains. Therefore, the apricots cultivated in Xinjiang, south of the Tianshan Mountains (CAG), most likely evolved from wild apricots that spread from the Ili River Valley. Liu et al. 58 argued that apricots have experienced at least three domestication events, giving rise to apricots in Europe (the United States and continental Europe), southern Central Asia (Turkmenistan, […] 58 . In this study, the nucleotide diversity and haplotype/genotype diversity of wild apricots (DZG accessions) were higher than those of CAG and NCG accessions.
These findings are reasonable from a historical perspective, as there was extensive cultural contact along the Silk Road from 207 BCE to 220 CE 60 . Therefore, historical and commercial influences may have contributed to the development of this unique species of cultivated apricot.
The theory and methods of phylogeography can reveal the historical dynamics of species or populations, such as expansion, differentiation, isolation, migration and extinction 29 . Such knowledge is of great significance for understanding the origin of species and the evolution of geographical patterns, and for better protecting existing biodiversity. Phylogeographic studies have shown that ancient haplotypes and high genetic diversity can be used to identify refugia 1,61 . Populations in refugia usually display more genetic diversity and more exclusive haplotypes than migratory populations 14 . Many scholars 17,19,24 have suggested that the complex geographic history of Northwest China may have provided refugia for species during glacial periods. By combining two markers, we showed that all the wild populations of apricots distributed in the Ili River Valley contained ancestral haplotypes/genotypes and had high genetic diversity (Table 2). These populations were located in areas considered glacial refugia for P. armeniaca, which appears to be a relic of Quaternary glaciation. The region provides a suitable climate for the biological community and protects the genetic diversity of P. armeniaca. Climatic changes during Pleistocene glacial-interglacial cycles had a dramatic effect on species distribution ranges, causing migration and/or extinction of populations, followed by periods of isolation, divergence and subsequent expansion 14 . Both neutrality tests and mismatch distribution analysis based on the cpDNA and ITS datasets suggested recent range or demographic expansion of wild populations of P. armeniaca. We estimated that the recent demographic expansion of the wild populations of P. armeniaca occurred 16.53 kyr ago, that is, at the end of the LGM 49 .
The choice of taxon sampling and fossil calibration strategies influences age estimation [62][63] . Due to the lack of fossil evidence for P. armeniaca, we used the peach-apricot divergence time as the calibration point. In this study, we used a cpDNA dataset to estimate divergence times, and the results were acceptable. However, compared with the median ages estimated by Chin et al. 50 (mean age of 31.1 Ma), the divergence time estimates in this study should be interpreted with caution because the limited coverage and low number of calibration points may lead to an overly high divergence time estimate for P. sibirica (mean age of 33.43 Ma).

Conclusion
Based on cpDNA and ITS data, the haplotype/genotype diversity of wild apricot populations distributed in the Ili River Valley was relatively high, and the haplotype/genotype diversity of the DZGhcmd and DZGyn populations was greater than that of other populations. P. armeniaca exhibits genealogical structure. Affected by the Quaternary glaciation of the Pleistocene, the Ili River Valley in Northwest China served as a glacial refugium for P. armeniaca, providing the species with a suitable climate and preserving its genetic diversity. During the interglacial period, the species underwent a recent expansion in the face of favorable climatic and environmental […]
Research involving plants. The