Population structure and adaptive variation of Helichrysum italicum (Roth) G. Don along eastern Adriatic temperature and precipitation gradient

Immortelle (Helichrysum italicum (Roth) G. Don; Asteraceae) is a perennial plant species native to the Mediterranean region, known for many properties with wide application mainly in perfume and cosmetic industry. A total of 18 wild H. italicum populations systematically sampled along the eastern Adriatic environmental gradient were studied using AFLP markers to determine genetic diversity and structure and to identify loci potentially responsible for adaptive divergence. Results showed higher levels of intrapopulation diversity than interpopulation diversity. Genetic differentiation among populations was significant but low, indicating extensive gene flow between populations. Bayesian analysis of population structure revealed the existence of two genetic clusters. Combining the results of FST - outlier analysis (Mcheza and BayeScan) and genome-environment association analysis (Samβada, LFMM) four AFLP loci strongly associated with the bioclimatic variables Bio03 Isothermality, Bio08 Mean temperature of the wettest quarter, Bio15 Precipitation seasonality, and Bio17 Precipitation of driest quarter were found to be the main variables driving potential adaptive genetic variation in H. italicum along the eastern Adriatic environmental gradient. Redundancy analysis revealed that the partitioning of genetic variation was mainly associated with the adaptation to temperature oscillations. The results of the research may contribute to a clearer understanding of the importance of local adaptations for the genetic differentiation of Mediterranean plants and allow the planning of appropriate conservation strategies. However, considering that the identified outlier loci may be linked to genes under selection rather than being the target of natural selection, future studies must aim at their additional analysis.

. Sampling locations and genetic diversity revealed by AFLP markers in 18 H. italicum populations from eastern Adriatic coast. a N-North; E-East; Coordinates are in degree decimal format; n -sample size; %P -proportion of polymorphic bands; Npr -number of private bands; I -Shannon's information index; H E -gene diversity of population assuming Hardy-Weinberg equilibrium; DW -frequency down-weighted marker values. www.nature.com/scientificreports/ Both analyzes, with or without spatially informative priors classified the populations identically into two clusters as revealed by the STRU CTU RE analysis (Fig. S2). The correlation between the matrix of F ST / (1−F ST ) values and the matrix of natural logarithms of geographic distances (in km) between the analyzed populations was r = 0.510 and was highly significant (P Mantel < 0.0001). The coefficient of determination was R 2 = 0.260, revealing that 26% of the genetic differentiation between the analyzed populations can be explained by their spatial distance (Fig. S3).
Bioclimatic variation. Nineteen bioclimatic variables accounted for in this study were highly correlated.
A strong positive correlation (r > 0.70) was found in 26 cases, and in five cases a strong negative correlation (r < −0.70), out of 171 pairs examined, respectively (Table S2. Principal component analysis (PCA), based on the correlation matrix, showed that the first four principal components had eigenvalues greater than 1 and together explained 93.82% of the variance ( Table 2). The first principal component explained 39.36% of the total variance. A strong positive correlation with the first principal component (PC1) was found for five environmental variables (Bio01 Average annual temperature, Bio06 Minimum temperature of the coldest month, Bio09 Average temperature of the driest quarter, Bio10 Average temperature of the warmest quarter, Bio11 Average temperature of the coldest quarter). The second principal component explained 24.48% of the total variance and was positively correlated with four environmental variables (Bio12 Annual precipitation, Bio13 Precipitation in the wettest month, Bio16 Precipitation in the wettest quarter, and Bio19 Precipitation in the coldest quarter). The first principal component separated the southern Adriatic populations (P16 Živogošće, P15 Omiš, P13 Hvar and P12 Brač) from sampled sites characterized by higher temperatures and lower precipitation, from the northern Adriatic populations (P01 Krk, P02 Cres, P03 Lošinj, P04 Rab, P05 Zrće, Pag and P06 Miškovići, Pag) and central Adriatic populations (P14 Sinj, P11 Seget, P10 Unešić, P09 Kistanje, P08 Benkovac, and P07 Obrovac) where lower temperatures and higher precipitation were recorded. The second principal component separated the populations from the Kvarner Bay where higher precipitation and lower temperatures were recorded, from the populations from central Dalmatia where higher temperatures and lower precipitation were recorded (Fig. 2).  www.nature.com/scientificreports/ www.nature.com/scientificreports/ Adaptive genetic variation. The outlier loci were detected on a set of 446 AFLP markers (markers with a frequency under 3% or above 97% were removed). With a 99% confidence level using the Mcheza program, a total of 21 outlier loci (4.71%) under possible selection were detected, of which ten (2.24%) were under positive selection (directional selection) and 11 (2.47%) under balancing selection (Fig. 3a). BayeScan identified nine (2.02%) loci under positive selection that exceeded the critical value of log 10 (PO) threshold used to determine the significance of loci that showed atypical values at a false-positive probability level of 0.01 [FDR < 0, 01; PO = 29.03; log 10 (PO) = 1.463]. No locus was under balancing selection (Fig. 3b). Mcheza and BayeScan together detected a set of seven (1.57%) markers that were potentially under positive selection. By calculating logistic regressions between all possible marker pairs and bioclimatic variable (8,474 models), the Samβada program revealed 184 (2.17%) significant models that included 50 (11.21%) markers associated with one to 13 bioclimatic variables. The Mcheza and BayeScan, identified 12 loci, and the Samβada program 41. Bioclimatic variable associated with more than 20 markers were Bio08 average temperature of the wettest quarter, Bio14 amount of precipitation in the driest month, Bio15 coefficient of precipitation variation, Bio17 amount of precipitation in the driest quarter, and Bio18 amount of precipitation in the warmest quarter (Table 3). Of 50 markers detected by Samβada, five markers were also identified by Mcheza and BayeScan to be under positive selection. Latent factor mixed model (LFMM) identified 50 markers associated with one to 15 bioclimatic variables. Out of the total 446 markers, four markers (0.89%) were identified by all four methods, as shown in the Wenn diagram (Fig. 3c). The bioclimatic variables Bio03 Isothermality, Bio08 Mean temperature of wettest quarter, Bio15 Precipitation seasonality, and Bio17 Precipitation of driest quarter, were correlated with the highest number of loci and may have played a key role in adaptive divergence in populations H. italicum in the eastern Adriatic (Table 3). Six uncorrelated bioclimatic variables, four temperature-related (i.e., Bio03, Bio04, Bio08, Bio10) and two precipitation-related (i.e., Bio18, Bio19) were selected for linear model redundancy analysis (RDA). The optimal RDA model included three variables: Bio18, Bio04 and Bio03. The model was highly significant (P < 0.0001) and explained 20.17% (adjusted R 2 = 0.2017) of the inherent genetic variation, indicating an important role of these environmental variables in shaping the distribution of AFLP genotypes. The first two RDA axes were significant and explained 19.82% and 8.10% of the variation, respectively. Variable Bio18 Precipitation of warmest quarter, significantly associated with RDA axis 1, contributed to the partitioning among northern, central and southern H. italicum populations. Both variables Bio03 Isothermality and Bio04 Temperature seasonality were associated with RDA axis 2 differentiating central H. italicum populations from the rest (Fig. 4).

Discussion
Molecular analyzes of the genetic diversity and structure of H. italicum populations from the eastern Adriatic coast using AFLP markers revealed high intrapopulation diversity and low differentiation between populations as well as the population structure characterized by a pattern of isolation by distance. The dynamics of gene set in species is strongly influenced by life history traits such as life form, geographic range, reproductive system, seed dispersal mechanism and successional status [52][53][54] .
Recent studies have been conducted in the study area and have contributed significantly to the knowledge of species that are typical representatives of the Mediterranean climate, e.g., Dalmatian pyrethrum (Tanacetum cinerariifolium (Trevir.) Sch. Bip.) 55 and sage (Salvia officinalis L.) 56 . Our results are comparable with studies on the above species, because the species have similar life traits. They are thermophilic, xerophytic, outcrossing perennials. Although these species generally have a wider range, their distribution pattern overlaps on the eastern Adriatic coast. The overall genetic diversity of H. italicum populations, indicated by the Shannon information index, showed intermediate values (0.355) compared to the populations of S. officinalis (0.387) and T. cinerariifolium (0.223). All three species are of great economic value, either as medicinal plants or, in the case of T. cinerariifolium, as a source of the potent natural insecticide pyrethrin. Overexploitation has been observed in all three species but may have most affected the genetic diversity of T. cinerariifolium, the species that has been extensively collected over long periods. In the case of H. italicum, collection of wild populations has increased more recently, in the last decade, and the impact on overall genetic diversity will become apparent in the future. Collection of immortelle from natural habitats is legally regulated in Croatia by the Nature Protection Act 57 and the Ordinance on Collection of Native Wild Species 58 . Nevertheless, it is necessary to systematically monitor habitats, as harvesting regulations are often not respected. Results of the research showed that differentiation between populations of H. italicum was low, indicating extensive gene flow between populations 59 , and compared to S. officinalis and T. cinerariifolium it was the lowest. The AMOVA results showed greater intrapopulation genetic variation in H. italicum, which is typical for outcrossing plant species 54 . The greater genetic variation within populations compared to variation between populations was also observed in another outcrossing plant species from the family Asteraceae Xanthium italicum Moretti, naturally distributed on Corsica 60 . The similar partitioning of genetic diversity between and within populations was also found in the studies by Grdiša et al. 55 and Jug-Dujaković et al. 56 .
The results of the model-based methods showed clustering of H. italicum populations into two distinct cluster: A, with populations from the north (Kvarner Bay) and B, with populations from the central and southern part of the study area (Dalmatia). These findings are congruent with the Fitch-Margoliash tree. The eastern Adriatic coast and the coastal Dinarides were heavily glaciated at various times during the Pleistocene 61-63 , which is well documented in the study of Quaternary sediments along the north of east Adriatic coast and the islands of Krk, Rab and Pag, as well as northern Dalmatia 63 . Much evidence suggests that glaciers were the barriers along the eastern Adriatic coast 64 and may have been the reason for the separation of these two ancestral groups for a long time. We assume that the populations of H. italicum survived in the mini-refugia (microecologically favorable pockets) during various unfavorable climatic events and reconnected when climatic conditions became favorable again. This theory is confirmed by the existence of four refugia for the eastern Adriatic coast and the coastal  www.nature.com/scientificreports/  www.nature.com/scientificreports/ Obrovac area. Grdiša et al. 55 , also identified strong genetic differentiation of the north Adriatic genetic group for T. cinerariifolium with similar phylogeographic split at Kvarner Bay, near the Zrmanja canyon. In another study, Liber et al. 75 found a distinct cluster at Kvarner Bay for S. officinalis, which also suggests the existence of refugia in the northern Adriatic. In the study by Rešetnik et al. 72 , a phylogeographic split is found between genetic clusters of S. officinalis located further south (at the border between northern and central Dalmatia). Glasnović et al. 73 , in their studies of Edraianthus tenuifolius, also indicated the possible presence of these two separate "refugia within refugia" during the LGM. The results can also be compared with studies of Edraianthus tenuifolius 68 , Campanula pyramidalis 76 and Cardamine maritime 77 , which showed a similar phylogeographic split, but in the area of the Neretva Valley (Central Dalmatia). The rarity (DW value) in the northwestern H. italicum populations was higher than in the other parts of the study area, indicating long-term survival of the populations in the north 5 . The Shannon information index of H. italicum was also higher in the northern Adriatic populations. The high rarity of the northern Adriatic populations (Kvarner Bay) was also reported by Grdiša et al. 55 for T. cinerariifolium and Liber et al. 75 25 reported that 5-10% of loci are outliers when summarizing data from available studies. In the study of the species Mikania micrantha Kunth 78 , 14 outlier loci (2.9%) were identified using the Dfdist and BayeScan programs, while in the analysis of the species Eruca sativa Mill. 30 nine loci were identified, but only three of them were confirmed by another method (1.6%). In mountain populations of Sideritis scardica Giseb. 79 , a lower number of loci were also identified under selection by BayeScan (3.10%) compared to Mcheza (5.31%). Similar results were obtained by other authors who emphasize the more conservative approach of the Bayesian method, that works more efficiently as it detects a larger number of outlier loci with a lower presence of false positives [80][81][82] . Therefore, it is recommended to use more methods that incorporate different approaches and use different algorithms to increase the probability that the detected loci are truly adaptive.
When adaptive loci are combined with climate variables, there is also the possibility of determining which climate factors are responsible for adaptive evolution 83 . The correlation between allele frequency variation and climatic variables were assessed by using the Spatial Analysis Method (Samβada), which does not depend on genetic models and operates at the individual level 84 21 used both frequency based (Dfdist and BayeScan) and correlation based (MLM) methods and showed that six outlier loci were strongly associated with at least one climate factor (temperature, precipitation, and radiation being the most important factors) in Liriodendron chinense (Hemsl.) Sarg. Authors Oberprieler et al. 34 reported that using three methods (Mcheza, BayeScan, and Samβada) 732 AFLP loci screened revealed only 1.6% (full dataset) and 0.4% (reduced data set) of all loci were found to be under selection in Diplotaxis harra (Forssk.) Boiss. In the study of Sideritis scardica Giseb. 79 , seven outlier loci were identified by Mcheza, BayeScan, and Samβada and associated with bioclimatic variables with precipitation identified as a significant environmental factor driving adaptive genetic variation. Temperature and precipitation were the most important environmental factors triggering adaptation in Rhododendron oldhamii 85 , Keteleeria davidiana var. formosana 86 , Salix species 87 , and alpine species 11,[88][89][90][91] . Another method, LFMM has been used to test gene-environment associations while estimating the effects of hidden factors that represent background residuals of population structure. The advantage of the LFMM method is that it has a low rate of false positives and negatives 92 , and it also offers the best compromise between detection capabilities and error rates when dealing with complex hierarchical neutral genetic structure 93 . The LFMM has recently been used in several studies. For example 94 genetic divergence in Pinus bungeana Zucc. ex Endl. was investigated and six environmental variables were identified that were related to the ecological habitat of the species and were correlated with the highest number of environmentally associated loci. In the study of Li et al. 95 , annual mean temperature, annual precipitation and slope were considered to be the most important environmental factors associated with adaptive genetic divergence in Cunninghamia konishii Hayata. Our results obtained by combining F ST -outlier analysis (Mcheza and BayeScan) and genomeenvironment association analysis (Samβada, LFMM) showed that the most important environmental variables for adaptive genetic divergence in H. italicum populations on the eastern Adriatic coast were: Bio3 Isothermality, Bio08 Mean temperature of wettest quarter, Bio15 Precipitation seasonality, and Bio17 Precipitation of driest quarter. The observed natural populations of H. italicum inhabit sites under Mediterranean climate characterized by hot summers and cool winters with unevenly distributed precipitation prevailing in winter. Mediterranean plants cope with drought by developing different mechanisms that allows them to survive unfavorable or insufficient precipitation distribution 96 . In our study, the variable Bio15 Precipitation seasonality describes the differences between minimum and maximum precipitation values; therefore, our results suggest that H. italicum populations might be adapted to large fluctuations/amplitudes in the amount of precipitation. In Mediterranean climates, large rainfall amplitudes are common, and Mediterranean species strive to adapt and survive extremes. As Mediterranean region is highly vulnerable to climate change 96 future projections for global climate change state that more uneven temporal distributions of precipitation are expected 97 . Climate models suggest that the hydrological cycle will intensify due to rising temperatures and increasing evaporation, leading to more storms www.nature.com/scientificreports/ and precipitation in particular areas, while drought will occur in areas far from storms 98 . According to Giorgi and Lionello 99 and Matusik et al. 100 the decrease of precipitation and increase of warming is expected in the future, together with increased inter-annual variability of both precipitation and temperature, which will cause greater frequency of extremely arid periods. Plant adaptation to seasonal water stress in Mediterranean climate is an important driver of genetic differentiation 96 .
The RDA approach used to study the contribution of bioclimatic variables to the genetic structure of natural populations of H. italicum also identified Bio03 Isothermality as important bioclimatic variable, together with Bio04 Temperature seasonality and Bio18 Precipitation of the warmest quarter as the main bioclimatic variables distinguishing northern, southern and central Adriatic populations. The variable Bio18 Precipitation of the warmest quarter contributed to the partitioning of the northern populations of H. italicum, as they are exposed to greater amounts of precipitation during the warmer months, leading to an adjustment of the populations in this direction. The two variables Bio03 Isothermality and Bio04 Temperature seasonality influenced the differentiation of the central H. italicum populations from the rest. Both variables are a measure of temperature heterogeneity, with Bio03 Isothermality quantifying how temperatures vary in relation to annual oscillations, while Bio04 temperature seasonality measures temperature changes throughout the year 101 . As mentioned above, the Mediterranean region is characterized by strong seasonality, both in temperature and precipitation, and the climate results in strong, opposing pressures on species 102 . Seasonality is an important component of climate that influences the availability of resources and thus the distribution of species in the environment at both temporal and spatial scales 103 . Central inland H. italicum populations cope with the wide range of daytime and nighttime temperatures, as well as cold winters and high summer temperatures, and are most likely adapted to temperature fluctuations, making them the most likely candidates for species persistence under ongoing climate change.
The development of numerous genome scanning and spatial statistical methods has facilitated analyzes and our knowledge of adaptability in non-model species, but there are some limitations to these methods. AFLPs are useful marker systems with high reproducibility and ability for discovering polymorphism without previous information of the genome 89 , but they are also poorly informative because they are dominant and biallelic, involving only reduced and anonymous part of the genome. The key weakness of genome scans is that they often detect false positives due to deviations from Hardy-Weinberg equilibrium and population structure model assumption 104 . Demographic events such as bottleneck, allele surfing during population expansion, secondary contact, and isolation by distance can mimic selection 12,105 , making it difficult to differ selection from demography 106 and consequently to draw inferences about selection. Sampling strategy helps reduce occurrence of false positives by including abundant number of sampled individuals, more than 10 per site for allele frequency-based methods (e.g., F ST ) 107,108 . Another way to reduce number of false positives and have more confidence in the outliers found is to use different approaches simultaneously and set strict thresholds, as was performed in this research. Four (0.89%) loci detected by all four methods (Mcheza, BayeScan, Samβada and, LFMM) are expected since only small number of them are affected by natural selection. These loci are possibly linked to genes under selection due to the the 'hitchhiking effect' , but their location and function is unknown due to a lack of prior knowledge about the genome structure of the species under study. Therefore, this research is the first step in finding evidence for adaptive divergence of H. italicum and additional analyzes involving the discovery of the location and function of detected loci are needed. Adaptation to precipitation and temperature oscillations in H. italicum populations appears to be the most important trait to be further investigated using genotype-phenotype association studies.
The analyzes performed in this study and those proposed for future investigations could be the basis for future conservation strategies of H. italicum on eastern Adriatic coast. Extended sampling in the Mediterranean region should also be included in future analyses as the results obtained and conclusion derived in this study may not be applicable to the entire distribution range of the species. On the eastern Adriatic coast, in addition to longer-term problems such as degradation, habitat loss and climate change, increased demand for H. italicum has led to overharvesting, which could also lead to a long-term decline in genetic diversity. Evaluation of genetic diversity allows the identification of populations that have lower genetic diversity and are more vulnerable so conservation measures can be focused on them. Conservation of entire habitats through monitoring in the wild (in situ) can help legislators respond in a timely manner by adopting regulatory measures regarding the amounts of plant material allowed to be collected. Ex situ conservation should include seed banks with special attention to covering the most genetic diversity of the species 109 . Another possible solution is to promote the cultivation of H. italicum based on knowledge gained from molecular analysis and the adaptive potential of the species, which should be incorporated into breeding programs.  www.nature.com/scientificreports/ area, populations P07-P10 in the central part and populations P11 to P18 in the southern part of the study area. The average distance between sampling sites was 145.83 km, with minimum and maximum distance of 7.93 km and 415.94 km, respectively. The climatic data on precipitation and temperature conditions of each sampling site were taken from the WorldClim database at a spatial resolution of approximately 1 km 2110 (available at: www. world clim. org). Generally, the average annual temperature increases from the northern to the southern parts of the study area, while the amount of precipitation decreases. Precipitation deficit becomes greater and lasts longer from the northern to southern part of the eastern Adriatic coast 111 .

Materials and methods
DNA extraction and AFLP fingerprinting. DNA samples were extracted from 23 to 25 individuals per each of the 18 populations. Total genomic DNA was isolated from 25 mg of silica gel dried leaf tissue using a DNeasy Plant Mini Kit (Qiagen®). DNA concentrations were measured using a P300 NanoPhotometer (Implen®).

Data analysis.
Within-population diversity. AFLP fragments were scored using the GeneMapper v. 4.0 software (Applied Biosystems) and fragments from 100 to 500 bp were analyzed. Error rates were estimated using the scanAFLP v. 1.2 114 . Genetic diversity within the populations was estimated by calculatating the percentage of polymorphic bands (%P), the number of private alleles (N pr ) and the Shannon's information index (I) 115 . The Shannon's information index was calculated as I = − Σ (p i log 2 p i ), where p i is the phenotypic frequency 115 . The frequency down-weighted marker values (DW) were calculated according to Schönswetter and Tribsch 5 using AFLPdat 116 in R 117 .
Population differentiation and structure. Analysis of molecular variance (AMOVA 118 ) was used to partition the total AFLP diversity between and within H. italicum populations. Analysis was performed using the program Arlequin v. 3.5.2.2 119 and the significance of the φ ST values was calculated based on 10,000 permutations. Pairwise population comparisons examined with AMOVA yielded a matrix of φ ST values corresponding to the proportion of total variance shared between two populations that can be interpreted as the distance average between any two populations 120 .
Based on the frequency of amplified fragments of each AFLP marker in each analyzed population, allelic frequencies were calculated using the Bayesian method with a non-uniform prior distribution of allele frequencies according to Zhivotovsky 121 , implemented in AFLP-Surv v. 1.0 122 . We assumed that populations were in Hardy-Weinberg equilibrium (F IS = 0) due to the outcrossing nature of H. italicum. The calculated allele frequencies were used to analyze genetic diversity within and between populations as described in Lynch and Milligan 123 . Total gene diversity (H T ), average gene diversity within populations (H W ), average gene diversity between populations that exceeds that observed within populations (H B ), and Wright's F ST were used to describe the genetic structure of populations.
Standard Nei genetic distance (D NEI72 ) was calculated between all populations pairs of using AFLP-Surv v. 1.0 122 . An unrooted Fitch-Margoliash tree based on pairwise Nei's standard genetic distances between populations was created using the FITCH program v. 3.6b (PHYLIP 124 ). The bootstrap method was used to create 1,000 distance matrices using the AFLP-Surv, and the bootstrap values were calculated using the FITCH and CONSENSE programs within PHYLIP software package.
The genetic structure of H. italicum populations was assessed using two Bayesian model-based clustering methods as implemented in STRU CTU RE v. 2.3.4 125 and BAPS v. 6 126,127 . In STRU CTU RE, the number of clusters (K) was set from 1 to 11 and 30 runs per K were performed on the Isabella computer cluster at the University of Zagreb, University Computing Centre (SRCE). An admixture model with correlated allele frequencies was applied with a burn-in period of 200,000 steps and 1,000,000 MCMC replicates after burn-in. The detection of the most likely number of clusters was performed as described by Evanno et al. 128 , and implemented in STRU CTU RE HARVESTER v. 0.6.94 129 . Results from independent runs were clustered and averaged using Clumpak 130 to obtain the Q-value matrix.
Another Bayesian model-based analysis was performed using BAPS 126,127 to verify data obtained from STRU CTU RE. Mixture analysis was performed both without the geographic coordinates as an informative prior ('Clustering of individuals') and with this prior ('Spatial clustering of individuals' 131 . BAPS was run with a maximal number of clusters (K) set to 20 with each run replicated 10 times. The best K value was assessed using the log marginal likelihood values of the best partitions and the distribution of posterior probabilities for different K values. To detect admixture between clusters the results of the mixture analysis were used as input to the population admixture analysis 132  Analyzes were restricted to loci with dominant allele frequencies between 5 and 95% across the whole dataset to avoid a bias in global F ST estimates 140 . The methodology used in Mcheza is based on the assumption that loci under directional selection have significantly higher F ST values than the majority of neutral loci in a sample. In contrast, loci under balancing selection are expected to exhibit significantly lower F ST values. The neutral distribution of F ST values was determined by 1,000,000 iterations using the 'Neutral mean F ST ' and 'Force mean F ST options. Outlier loci were selected with a confidence interval (CI) of 99% and a false discovery rate (FDR) of 0.1 135,136 . BayeScan evaluates individual loci within a hierarchical Bayesian model that decomposes genetic variation into population-and locus-specific effects. For each locus, two models are defined that include or exclude the effect of natural selection. The posterior probabilities of these two models are then estimated using a reversible-jump MCMC approach. We used 20 pilot runs with 5,000 iterations to fit the proposal distribution to acceptance rates between 0.25 and 0.45. Analyzes were performed with a burn-in of 50,000 iterations, a sample size of 10,000, and a thinning interval of 50, resulting in a total number of 550,000 iterations. The logarithm of Posterior Odds [log 10 (PO)] greater than 1.5 was considered as a 'very strong' evidence for selection 24,141 . The false discovery rate 142 (FDR) was set to 0.01 to fit the corresponding log 10 (PO) significance threshold.
In Samβada, the spatial analysis method was used to calculate multiple univariate logistic regressions to test the probability of the presence of an allelic variant given the values of environmental variables of the sampling sites. Significance was assessed with both the log-likelihood G ratio and Wald test using Bonferroni correction (P < 0.01). A model was considered significant only if both tests rejected the corresponding null hypothesis 84 .
Gene-environment associations have also been identified using latent factor mixed models (LFMM 92,143 ) implemented in the R package lfmm 137 . In LFMMs, associations between genetic variation and environmental variables are tested, while estimating the effects of hidden factors representing background residual levels of population structure. Thus, in LFMMs, environmental variables are introduced as fixed effects, while population structure is modeled by latent factors. The number of latent factors (K) was set to two based on the results of STRU CTU RE and BAPS analyses. Regularized least squares were calculated using a ridge penalty (lfmm_ridge) and estimated the genomic inflation factor (GIF) was estimated based on median z-scores (lfmm_test). To account for multiple testing, p values were converted to q values using the R package qvalue 144 . Significant associations were selected based on a false discovery rate (FDR) of 5% (q < 0.05).
Linear model redundancy analysis (RDA), implemented in the R package vegan v. 2.5-7 139 , was used to analyse the effects of environmental variables on AFLP variation among populations. The highly correlated environmental variables (|r|> 0.70) were excluded before analysis. Hellinger-transformed allele frequencies 145 were calculated in vegan based on the AFLP allele frequencies estimated using AFLP-Surv. The optimal model was determined using the ordiR2step function in vegan using a forward selection procedure with 10,000 permutations. To test the significance of the RDA model and constrained-axis, the vegan function anova.cca was run with 10,000 permutations.

Data availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.