Introduction

Environmental factors play a key role in shaping patterns of genetic diversity and phenotypic variation in plant species1. Plants often display plasticity and genetic variation along environmental gradients, as these gradients can affect gene flow among populations or create extreme abiotic conditions. Interactions between the environment and phenotypic and genetic diversity have been studied in different plant species at the population level, emphasizing the effect of climate gradients (e.g., temperature and precipitation)2,3. Genetic diversity is crucial since it can act as a buffer against abiotic or biotic stress, anthropogenic disturbance, inbreeding depression, and interactions with invasive species2,4. Populations with a history of overexploitation and fragmentation can face genetic depletion that can severely impact the survival and evolution of the species by negatively affecting fitness characteristics5,6,7. Under extreme conditions such as prolonged drought or high temperature, high genetic diversity is a significant advantage because it raises the likelihood that a population contains genotypes/alleles that vary in their response to extremes8, enabling adaptation to environmental changes9. The loss of genetic diversity can also reduce a species' capability to occupy new niches and impair population growth and breeding10,11,12. A population's response to the environment (land cover, climate, etc.) can additionally be examined using functional traits, that is, traits thought to have a profound influence on plant fitness13,14,15. The plasticity of these traits under various environmental conditions is a fundamental mechanism that plants rely on to react to environmental changes. The complex relationship between genetic diversity, functional traits, and environmental heterogeneity at the population level has been the subject of several studies16,17 but deserves more attention.

Tuberous orchids have a rich history of traditional use and have suffered from overcollection. The dried tubers of specific wild terrestrial orchids are milled to make a flour known as 'salep'18, which traditionally has been an essential ingredient of confectioneries19. In addition to conferring aroma and flavor to formulations, it also acts as a thickening and stabilizing agent20. Salep is integral to several medicines, beverages, and ice-creams21. Its critical rheological properties are due to the presence of polysaccharides, especially glucomannan (GLN; 16–55%)18, a natural fiber with high water solubility. These polysaccharides also have specific medicinal benefits such as normalizing blood sugar, inhibiting liver irregularities, and alleviating pancreas stress22. In recent years the harvest of tuberous orchids has increased due to high prices fueled by international demand, mainly from Iran's neighboring countries such as India, Turkey, and Pakistan23. Increasing public awareness of salep's benefits combined with growing demand is a dangerous combination that may jeopardize the long-term viability of natural populations of these orchids.

Besides the ongoing over-exploitation of the wild population of tuberous orchids, climate change may also hasten the extinction of orchid species24,25. Agricultural expansion, urbanization, and silviculture have exacerbated the loss of genetic diversity in orchid populations26,27. Such problematic circumstances have already taken their toll on orchids in Iran, and some orchid species, including Orchis mascula L., have been classified as endangered26. Therefore, conservation strategies are needed to maintain the genetic diversity in the remaining populations and assure the species' long-term viability28,29. However, the absence of reliable information on the orchid's population status or the ecological consequences of severe over-collection prevents the implementation of robust conservation plans.

Although legislators in Iran, as in other countries, have enacted detailed regulations in favor of germplasm protection, these regulations have not been enforced properly, and current measures are not limiting the collection or marketing of natural resources, particularly orchid tubers23,27. Recently, numerous small-scale regional studies on the population genetic status of tuberous orchids in the west of Iran have been reported30,31,32. However, the data required to develop conservation programs to protect genetic diversity or to develop the germplasm stocks in breeding programs can only be gained through sufficiently large-scale analyses, preferably at the landscape level33,34. O. mascula is a relatively small perennial orchid widely distributed across Europe, western Asia, and northern Africa. In Iran, this species is distributed primarily in the north (the Alborz Mountains) and the west (the Zagros Mountains)35,36. It is one of several widely collected terrestrial orchids with numerous traditional medicinal applications. Overcollection of tubers has reduced O. mascula population sizes to the point that this species is now endangered; thus, fundamental large-scale studies of its functional traits and the responses of its genetic diversity are needed. In this study, we sampled a large number of individuals from 18 populations of the terrestrial orchid O. mascula in northern Iran to address the following questions: (1) does phenotypic plasticity in functional traits exist in response to environmental heterogeneity? (2) how are genetic diversity and population structure affected by the land cover pattern and climatic variables? (3) do any population(s) have superior biochemical and phenotypic traits that could be utilized in breeding programs?

Results

Phenotypic and chemotypic plasticity in response to environmental conditions

Land cover significantly affected most of the traits (p ≤ 0.05 and p ≤ 0.01); the same applies to the population effect (Table 1). The variance components of PC1 and PC2 under the land cover effect were considerably high (54.88 and 34.71, respectively). Among the agriculturally important characters, starch content and bulb fresh weight (BFW; 154.75 and 32.34, respectively) had the highest variance components, while floral traits generally had the lowest values (Table 1). Plant height (PLH) with 80.97% and PC1 with 74.52% showed the highest percent of variation (%var). For the other source of variation, populations as a 'random effect', the variance component was generally low, and PC1 (48.77) and glucomannan (GLN, 22.71) had the highest values. The two essential traits, BFW and dry bulb weight (BDW), were observed to have the highest %var (83.10 and 64.49) among the traits. Interestingly, %var among individuals within populations (error) was often higher for floral traits than for other traits.

Table 1 Results of unbalanced REML mixed-model ANOVA of first two principal components and 19 phenotypic and 3 biochemical traits for 198 individual plants from 18 natural populations.

Phenotypic traits varied substantially between populations. For example, for traits associated with stem and leaves, the populations from the east side of the Hyrcanian ecoregion, Klaleh (KAH), Ziarat (ZIR), Azarshahr (AZS), and Maraveh Tapeh (MVT), generally had the highest values, with MVT holding the highest values of all. For floral traits, by contrast, the variation across populations was almost negligible (Table 2). The MVT and Talesh (TAS) populations, with 3.79 and 3.27 g, had the highest BFW, noticeably more than the majority of the populations. MVT, with 2.22 g, also had the highest value for BDW. Another population from the eastern region, KAH, with 1.23 g, also showed a high value for this economically important trait. The difference between the populations became more apparent for starch and GLN contents: populations from the western part of the region had higher starch levels, whereas those from Golestan, in the east, were generally superior in GLN content. This pattern was further analyzed by assessing the relation of these two compounds to land cover, where an inverse relationship emerged (Fig. 1): GLN content increased in pastures/grasslands and decreased in woodland. Starch seemed even more affected by land cover type, but in the opposite direction: in pastureland/grassland its content dropped to 2.5 g/100 g, and it reached its maximum in woodland.

Table 2 Means ± standard errors of 19 phenotypic and 3 biochemical traits in 198 individuals from 18 natural populations of O. mascula.
Figure 1

The graphs display the influence of land cover types on glucomannan (GLN; g/100 g) and starch (g/100 g) content. The difference in GLN content between pasture and woodland was significant at *P < 0.05 and nonsignificant between pasture and shrubland. Starch content under woodland land cover type versus pasture and shrubland was significant at *P < 0.05.

Hierarchical clustering on principal components (HCPC) and cluster analysis

The HCPC combines the factor map from the principal component analysis of the populations' phenotypic and biochemical traits with a three-dimensional hierarchical tree, thereby providing a clearer view of the clustering pattern (Fig. 2). Dimension 1 explains 66.59% of the phenotypic variation, while the second dimension accounts for only 23.94%. The results indicate three clusters with distinct positioning of the populations.

Figure 2

Hierarchical Clustering on Principal Components (HCPC) based on phenotypic and biochemical traits of 18 O. mascula natural populations. Dimension 1 explained 66.59% of the phenotypic variation, while the second dimension was only 23.94%. The populations clustered into three groups reflecting the influence of land cover types. For an explanation of abbreviations, see Tables 6 and 7.

Two-way cluster analysis (heatmap) identified two main groups for phenotypic and biochemical traits (Fig. 3, column). Among biochemical traits, starch content and TAC formed a separate cluster (B), while the rest were placed in A. Traits such as GLN, number of leaves/plant (NLP), BFW, and BDW were grouped in one subcluster, and floral and leaf traits were placed in AII. Two main clusters appeared at the population level (Fig. 3, row); cluster A included two subclusters, and cluster AI encompassed populations from Mazandaran, Golestan, and Guilan provinces. However, populations within the subclusters were grouped by their phenotypic or biochemical similarity; for instance, Talesh (TAS), Khalkhal (KHAL), Masouleh (MSL), Darestan (DRS), Chapoul (CHP), Khasib-Dasht (KHD), and Garmab-Dasht (GRD), all from Guilan province, clustered together. The three populations from the eastern side (MVT, KAH, and ZIR) stood out by forming a separate subcluster (AII). The four populations with the lowest trait values, Palamjan (PAJ), Mirkhamand (MKH), Kalech (KCH), and Kjour (KJR), were placed in the second cluster (B). Although discrepancies were observed in grouping patterns, clustering mainly succeeded in grouping the populations according to their location and the traits according to their identity (i.e., biochemical, tuber characteristics, leaves, floral features, etc.).

Figure 3

Heatmap visualization of two-way cluster analyses for 19 phenotypic and 3 biochemical traits (row tree) and 18 natural populations (column tree) evaluated in the growing season of 2020 in northern Iran, mainly Hyrcanian ecoregion. Column annotations indicate grouping based on 18 populations, 3 provinces, bulb dry weight, GLN, starch, and spike length. For an explanation of abbreviations, see Tables 6 and 7.

Relationship between phenotypic and chemotypic traits and climatic variables

Correlation analyses revealed that the phenotypic traits were negatively correlated with annual precipitation and altitude and positively correlated with longitude. The important traits PLH, number of flowers (NF), and NLP responded positively to annual mean temperature (r = 0.7 and 0.6, P < 0.01); the same applied to GLN (r = 0.7, P < 0.01), with a weaker correlation with longitude. BFW and BDW were weakly and negatively correlated with land cover (r = −0.4), but there was no correlation between land cover and the other phenotypic traits (Fig. 4). Starch content was significantly correlated with land cover and altitude (r = 0.7, r = 0.6; P < 0.01) and weakly with annual precipitation (r = 0.5). It was also negatively correlated with longitude (r = −0.4) and showed no correlation with annual mean temperature. GLN content, in contrast, decreased significantly with increasing altitude, annual precipitation, and land cover (r = −0.7, −0.6, −0.6, P < 0.01, respectively). A strong inverse correlation was found between GLN and starch (r = −0.9, P < 0.01), as observed earlier in Table 2 and Fig. 1. GLN correlated weakly with TAC, whereas starch showed a relatively positive relationship with it.

Figure 4

Correlogram of 19 phenotypic and 3 biochemical traits with 9 environmental and climatic variables evaluated in 18 O. mascula populations in northern Iran, Hyrcanian ecoregion. For an explanation of abbreviations, see Tables 6 and 7.

Population genetics

Genetic diversity and differentiation of O. mascula populations

At the population level, TAS, MSL, and CHP had the highest observed alleles (Na = 2), while AZS had the lowest value (1.65) among populations with an average of 1.89. The highest effective alleles (Ne = 1.64) were related to the TAS population, and the lowest Ne (1.21) belonged to the ZIR population with an average of 1.46 (Table 3). Among the eighteen studied populations, the average polymorphism percentage (P%) was 31.14%, while MVT from Golestan had the highest (65.63%), and PAJ was the lowest (18.75%). With an average of 0.35, the highest value of Shannon's information index (I) was observed in the JAD population (0.49), while MSL and PBD populations (0.25) had the lowest level. An average of 0.42 expected heterozygosity (He) and 0.30 for Nei's gene diversity index (H) were observed. The parameters of He and H were the highest in the KAH population (0.55, 0.39, respectively), and the MSL population (0.31, 0.18, respectively) showed the lowest. The average of the populations' total genetic diversity (Ht) was 0.37, and the genetic diversity within populations (Hs) was 0.24.
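The per-locus quantities behind indices such as Ne, I, and H can be illustrated with a minimal sketch for a dominant marker like AFLP. It assumes Hardy-Weinberg equilibrium, so that the frequency of the null (band-absent) allele is the square root of the band-absence frequency; this estimator, the function name, and the dictionary keys are illustrative assumptions and not necessarily what the software used in this study computes.

```python
import math


def dominant_locus_stats(band_frequency):
    """Diversity statistics for one dominant (band present/absent) locus.

    Assumption: Hardy-Weinberg equilibrium, so the null-allele frequency q
    is the square root of the proportion of individuals lacking the band.
    """
    if not 0.0 <= band_frequency <= 1.0:
        raise ValueError("band_frequency must lie in [0, 1]")
    q = math.sqrt(1.0 - band_frequency)  # null (band-absent) allele
    p = 1.0 - q                          # band-present allele
    homozygosity = p * p + q * q
    return {
        "effective_alleles": 1.0 / homozygosity,  # Ne
        "gene_diversity": 1.0 - homozygosity,     # Nei's H = 2pq
        # Shannon's information index; alleles with zero frequency add nothing
        "shannon": -sum(x * math.log(x) for x in (p, q) if x > 0.0),
    }


def mean_over_loci(band_frequencies, key):
    """Average one statistic across loci to get a population-level value."""
    values = [dominant_locus_stats(f)[key] for f in band_frequencies]
    return sum(values) / len(values)


# Hypothetical locus: the band is present in 75% of sampled individuals.
stats = dominant_locus_stats(0.75)
```

A monomorphic locus (band frequency 1.0) yields Ne = 1 and zero diversity, which is why populations with a low percentage of polymorphic loci also tend to show low averages for these indices.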

Table 3 Genetic diversity statistics of 18 populations of Orchis mascula based on AFLP.

The coefficient of genetic differentiation among populations (G'st) was 0.35, indicating that 35% of the genetic variation was among the populations and 65% was within the populations. This G'st is high, considering that values between 0.05 and 0.15 are defined as moderate and values over 0.30 as high37. Thus, G'st reflected significant genetic differentiation among the populations. These findings were consistent with the relatively low gene flow value (Nm = 0.46) observed among populations (Table 3). The principal coordinate analysis (PCoA) for 198 individuals of O. mascula revealed the presence of two groups (Fig. 5). Almost all 118 individuals from the Guilan (7 populations) and Mazandaran (5 populations) populations gathered into group I, and group II contained 80 individuals from 6 populations of Golestan. The pattern observed in PCoA confirms the clustering pattern revealed in the heatmap visualization of the two-way cluster analyses. Analysis of molecular variance (AMOVA) showed that a significant share of total genetic diversity (48%) occurred among populations (PhiPR = 0.388, p = 0.001), whereas a larger proportion of variation (52%) existed within populations (PhiPT = 0.509, p = 0.001; Table 4). Differentiation among the three regions accounted for 20% of the genetic variance (PhiRT = 0.374, p = 0.001).
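The relationship between the reported Ht, Hs, differentiation coefficient, and gene flow can be checked with a short sketch. It assumes Nei's plain formula Gst = (Ht − Hs)/Ht and the estimator Nm = 0.25(1 − Gst)/Gst; whether these are exactly the estimators behind the reported G'st and Nm is an assumption, although they do reproduce the rounded values given in the text. The function names are hypothetical.

```python
def differentiation_coefficient(ht, hs):
    """Nei's Gst: the share of total gene diversity found among populations."""
    if ht <= 0:
        raise ValueError("total gene diversity Ht must be positive")
    return (ht - hs) / ht


def gene_flow(gst, factor=0.25):
    """Indirect gene-flow estimate Nm = factor * (1 - Gst) / Gst.

    factor=0.25 is an assumption chosen because it matches the reported Nm;
    some software uses 0.5 instead.
    """
    if not 0 < gst <= 1:
        raise ValueError("Gst must lie in (0, 1]")
    return factor * (1.0 - gst) / gst


# Rounded averages reported in the text: Ht = 0.37, Hs = 0.24.
gst = differentiation_coefficient(0.37, 0.24)
nm = gene_flow(gst)
print(round(gst, 2), round(nm, 2))  # 0.35 0.46
```

Under this estimator, Nm below 1 is conventionally read as gene flow too weak to counteract differentiation by drift, which fits the interpretation of limited gene flow given above.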

Figure 5

Principal coordinates analysis of 198 individuals of 18 populations of Orchis mascula based on the genetic similarity matrix derived from AFLP markers divided the populations into two groups. For an explanation of abbreviations, see Table 6.

Table 4 Result of analysis of molecular variance (AMOVA) for 18 natural populations of Orchis mascula.

Population structure

STRUCTURE analysis divided the populations into two clusters at K = 2 (Figs. 6 and 7a), Cluster I and Cluster II (Fig. 7b). Cluster I contained 117 individuals, of which 75 were from Guilan province and 42 from Mazandaran province (Fig. 7a, red). Cluster II included 63 individuals from Golestan province (Fig. 7a, green). However, 18 Golestan and Mazandaran individuals were identified as genetic admixtures (Fig. 7b, cyan). The analysis was further evaluated at K = 4 to check the consistency of the populations' membership, which changed to some extent (Fig. 7c). Cluster I included 33 individuals, all from Guilan province. Cluster II had 24 individuals from Guilan province. Cluster III comprised 57 individuals, of which 15 were from Guilan and 42 from Mazandaran. However, 24 individuals in this cluster shared part of their ancestry with Cluster IV (57 individuals), all of whose members belonged to Golestan populations. The admixed individuals were located in the east of Mazandaran, toward Golestan province (Fig. 7c, cyan color).

Figure 6

Map of 18 populations and Bayesian admixture proportions identified by STRUCTURE for 198 individual plants of Orchis mascula. (A) The optimum number of subpopulations was determined using ΔK in the Bayesian clustering method, K = 2; (B) K = 2 and (C) K = 4. The colors used in the maps are approximately associated with STRUCTURE clusters. Cyan-colored circles represent the 'admixed' populations. For an explanation of abbreviations, see Table 6.

Figure 7

Heatmap visualization of two-way cluster analyses for 18 natural populations of Orchis mascula (row tree) and alleles generated by AFLP markers (column tree) evaluated in the growing season of 2020 in north of Iran, Hyrcanian ecoregion. For an explanation of abbreviations, see Tables 3 and 6.

Influence of environmental variables on genetic diversity

Correlation analyses revealed that population polymorphism significantly decreased with increasing annual precipitation and geographical altitude (r = −0.9, −0.7, p < 0.01, respectively) (Fig. 8), suggesting a pattern of low genetic diversity at high precipitation and elevation. Population polymorphism was weakly and negatively correlated with land cover (r = −0.4), weakly and positively correlated with longitude (r = 0.4), and strongly correlated with annual mean temperature (r = 0.6, p < 0.01). Nei's genetic diversity was negatively correlated with annual precipitation (r = −0.6, p < 0.01) and positively with longitude (r = 0.6, p < 0.01). This parameter was also strongly and negatively correlated with land cover (r = −0.6, p < 0.01), and negatively with precipitation in June (r = −0.4) and geographical altitude (r = −0.5). Expected heterozygosity was also significantly correlated with annual precipitation (r = −0.7, p < 0.01), negatively correlated with geographical altitude (r = −0.5), precipitation in June (r = −0.3), and land cover (r = −0.4), and positively correlated with longitude (r = 0.5), annual mean temperature (r = 0.4), mean temperature in May (r = 0.4), and mean temperature in June (r = 0.4) (Fig. 8).

Figure 8

Correlogram of genetic diversity indices and geoecology variable. Abbreviations: precipitation in May and June (Prec. June, Prec.May); Polymorphism (%; Polymo.); Shannon Diversity Index (Shann.Diver); Expected heterozygosity (Expec.Hetero); Nei's genetic diversity (Nei's.Gen); Mean Temperature annual (MT.ann), in May and June (MT.June and MT.May); Annual precipitation(Prec.ann); Altitude(Alt.), Longitude(Long.). For an explanation of other abbreviations, see Table 3.

In testing for isolation by distance (IBD), the relationships between geographic and genetic distance and between geographic and phenotypic distance were nonsignificant (Table 5), indicating a lack of IBD. By contrast, in testing for isolation by environment (IBE), the relationships of genetic and phenotypic distances with environmental variables were significant, showing the influence of essential variables such as mean temperature or annual precipitation on the variation among the populations. Other tests also exhibited significant correlations between phenotypic and genetic distance. In other partial Mantel tests, however, the correlation between residuals of genetic × geographic with phenotypic distance or phenotypic × genetic with geographic was nonsignificant. The dominant effect of the environment was observed in all the comparisons (Table 5).
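The logic of a simple Mantel test, as used for the IBD and IBE comparisons, can be sketched as a permutation test on two distance matrices. This is a minimal, standard-library illustration with a one-sided p-value; it is not the software used in this study, it does not cover the partial Mantel tests, and the function names and toy matrices are hypothetical.

```python
import math
import random


def upper_triangle(matrix):
    """Flatten the entries above the diagonal of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(dist_a, dist_b, permutations=999, seed=0):
    """Simple Mantel test: correlation of two distance matrices.

    Rows and columns of dist_b are permuted together, which keeps each
    permuted matrix a valid distance matrix. Returns (r, one-sided p).
    """
    rng = random.Random(seed)
    n = len(dist_a)
    flat_a = upper_triangle(dist_a)
    observed = pearson(flat_a, upper_triangle(dist_b))
    hits = 1  # the observed arrangement counts as one permutation
    order = list(range(n))
    for _ in range(permutations):
        rng.shuffle(order)
        permuted = [[dist_b[order[i]][order[j]] for j in range(n)]
                    for i in range(n)]
        if pearson(flat_a, upper_triangle(permuted)) >= observed:
            hits += 1
    return observed, hits / (permutations + 1)


# Hypothetical toy data: five sites on a line, with an "environmental"
# distance that is exactly twice the geographic distance.
coords = [0.0, 1.0, 3.0, 6.0, 10.0]
geo = [[abs(a - b) for b in coords] for a in coords]
env = [[2.0 * d for d in row] for row in geo]
r, p = mantel(geo, env)
```

Permuting whole rows and columns, rather than individual distances, is what distinguishes the Mantel test from an ordinary correlation test: pairwise distances sharing a population are not independent observations.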

Table 5 Correlations between Nei's genetic distance, phenotypic distance, climatic (19 environmental variables), and geographic (km) differences among 18 populations of Orchis mascula tested with simple and partial Mantel tests for IBD and IBE.

Marker-trait association

Applying two models, MLM1 and MLM2, to find possible marker-trait associations (Table S1) indicated strong associations between traits and AFLP markers with r2 > 0.40 at P < 0.01, where three of the markers revealed simultaneous associations with several important traits. For example, marker P-CCA + M-AGA-49 showed significant simultaneous associations with quantitative traits PLH, STL, NLP, LL, IL, NF, LLI, LMLL, BFW, and BDW in both the MLM1 and MLM2 models (Table S1). Marker P-TGG + M-CTT-33 had concurrently significant associations with quantitative traits LW, IL, NF, LET, LIT, LLI, LMLL, LMLW, and biochemical traits GLN and TAC based on MLM1, and showed significant associations with quantitative traits STL, LL, LW, NF, LET, LIT, and biochemical trait GLN in the MLM2 model. In addition, marker E-AGG + M-CGT-22 showed significant associations with quantitative traits PLH, IL, NF, WL, LMLL, and biochemical traits GLN and TAC based on MLM1, and with quantitative traits PLH, SL, NLP, IL, NF, LLI, WL, and biochemical traits GLN and TAC based on the MLM2 model (Table S1).

Discussion

Phenotypic plasticity in the context of environmental heterogeneity

It is often asserted that relying on phenotypic markers for genetic diversity assessment is an inferior approach given the significant influence of environmental variables38,39,40. While this is to some extent true, phenotypic traits are still valuable tools to reveal the responsiveness of a species across climatic gradients. Such information can reflect the fundamental pattern for species biodiversity conservation and help identify agronomically important traits with possible application in crop improvement programs. In this study, we examined the effects of environmental factors on genetic diversity and functional traits in O. mascula. To uncover the impact of sources of variation at the population level on phenotypic plasticity, an unbalanced REML mixed-model ANOVA was employed, showing the notable influence of two factors, land cover and population (Table 1).

In this study, we observed a strong pattern of phenotypic response to environmental variables, revealing the importance of land cover type and precipitation. O. mascula is relatively tolerant to shade; however, deep shade can significantly jeopardize its survival. Several comprehensive studies have used this species to understand the influence of light availability on reproductive performance41,42. The dominant pattern in this study was the better performance of O. mascula in pasturelands/grasslands of east Hyrcanian, which could be linked to the availability of light and soil nutrients (i.e., not content) as opposed to the woodlands of Mazandaran and low altitudes of Guilan. The determinative effect of light was studied by Jacquemyn, et al.41, who found that the overall growth rate was higher in coppiced than in undisturbed woodland. Although plant species respond differently to shade, many do not succeed in flowering under deep shade42. In addition to light, higher nutrient availability in clear-cuts and pastures43 may have facilitated better growth performance.

High variation in floral traits, particularly in labellum length and width, has been reported recently in Orchis purpurea Huds.44, and in O. mascula and O. pauciflora Ten.45. The results of the current study were, to a large extent, consistent with these studies. In the comprehensive phenotypic study of Ebrahimi, et al.26, quantitative floral traits showed considerably higher variation than the results reported here; however, the number of flowers had the highest coefficient of variation (41.34%), similar to our results (62.07%, Table 1). Their study area was Abr Forest, where O. mascula manifested considerably higher variation among almost all qualitative and quantitative floral traits, indicating high ornamental value in these populations.

Growth and the probability of flowering are, in most tuberous orchids, primarily associated with access to essential nutrients and the carbohydrates stored in the belowground organs46,47. Interestingly, two primary compounds in underground organs of O. mascula (GLN and starch) showed an inverse relationship (Fig. 1), similar to what has been previously reported by Tekinşen and Güner48 for several salep species, including Orchis italica Poir. and Orchis morio L. The starch and GLN contents reported for round-type tubers like O. mascula in Iran (6.15% and 22.1%, respectively)22 were notably lower than our observations (Table 2; 11.13 g/100 g starch and 38.15 g/100 g GLN). The dry matter produced in bulbs was considerably higher in populations in the east or at high elevations in the west. High nutrient and light availability, together with a higher amount of assimilated carbohydrates, can significantly contribute to better growth and reproduction. Nonetheless, in terrestrial orchids, relying on material stored belowground may not be sufficient at the beginning of growth or reproduction, when demand reaches its peak49. Improving local light conditions for woodland orchids by coppicing may directly or indirectly benefit orchids50 and lead to larger plants with a higher flower number and larger inflorescence size. Similarly, belowground dry matter can result from improved light conditions, which may explain why the difference in land cover significantly influenced plant performance in this study (Fig. 4).

Another major factor shaping the observed patterns of phenotypic variation is precipitation, which decreased clinally from 800 mm in the west of Hyrcanian to 250 mm in the east. The low rainfall may impose abiotic stress on O. mascula populations in Golestan and lead to increased phenotypic variation. It is possible that the increased dry matter of tubers in areas with lower precipitation reflects the accumulation of carbohydrate compounds in O. mascula, as the accumulation of carbohydrate compounds is a common drought tolerance strategy51. Low predictability of precipitation may contribute to a relative increase in floral trait variation on the eastern side. March-Salas, et al.52 reported that less predictable precipitation patterns enhanced within-individual and within-population variability in flowering phenology in Onobrychis viciifolia Scop. populations in a multigeneration experiment.

Population genetics and structure

Preserving genetic diversity is crucial since it can buffer against upcoming short-term and long-term climate-change-driven stress, disturbance, inbreeding depression, and interactions with hostile species2,53,54. High genetic diversity can be a significant advantage when a population faces unpredictable levels of abiotic stress in the environment, such as high temperature or water stress8. Population genetic diversity reflects the mating system, habitat fragmentation, and climatic elements, and the survival of plant species depends on it: the greater the genetic diversity, the higher the chance of coping with various climatic changes.

In this study, we found that genetic variation among O. mascula populations follows the pattern of environmental heterogeneity and is significantly affected by climatic variables such as temperature and precipitation, as well as by land cover. Genetic diversity (H) at the population level, with an average of 0.30, was higher than in previous studies on O. mascula; for example, Jacquemyn, et al.41 reported an average of 0.16 and Gholami, et al.31 an average of 0.10. However, the latter used ISSR markers with a small sample size (5). The pattern of genetic diversity among populations showed a strong geographical affinity. Analogous to phenotypic variation, the level of genetic diversity was notably higher in the populations of Golestan, where KAH had the highest H (0.39) and other populations from this province similarly showed high H (Table 3). As an orchid species with a mixed breeding system, O. mascula in this study was possibly affected by several factors. Land cover type appears to have been a significant factor because the genetic variation among populations was considerably higher in pastureland/grassland populations. This response could be explained by the greater light availability in the eastern populations compared with the woodlands of the west and Mazandaran province; brighter conditions favor a higher prevalence of pollinators in addition to a higher number of flowers (as was observed for phenotypic diversity). Thus, given the food-deceptive nature of this species, pollinators are more likely to visit several of the available flowers before leaving; however, the mean gene flow (Nm) was low.

The genetic structuring of populations was strong at K = 2 and K = 4 (Figs. 6 and 7a), further supporting the high genetic differentiation (G'st: 0.35, Table 3). It could be argued that the large geographic distance from the west to the east of the Hyrcanian ecoregion (≈ 500 km) strongly contributes to the limited gene flow. However, this does not seem to be the case, as we found no evidence for IBD. Instead, the genetic structure mainly supports a prominent effect of the environment in shaping this pattern among populations. Low gene flow is often caused by habitat loss and fragmentation, leading to reductions in population size and increased spatial isolation55,56. The low within-population genetic diversity (52%) and high differentiation (G'st: 0.35) in this study are contrary to previous results reported for O. mascula, which showed high genetic diversity within populations (92%) and low G'st (0.125)41. Moreover, in that study the woodland type did not affect genetic partitioning. This is in sharp contrast with the results of this study, where 20% of the total genetic diversity was partitioned among three groups of land covers (woodland, shrubland, and pastureland). Differences in scale can explain these contrasting results. Whereas Jacquemyn, et al.57 studied 15 populations in a local landscape with slight variation in climatic conditions, in our study populations were studied across a much larger scale, leading to strong differences in climatic conditions between eastern and western populations and significant IBE. This suggests that populations have genetically adapted to the different precipitation regimes across Hyrcania and that this genetic adaptation has contributed to the observed high genetic differentiation. These results further support the findings of Siepielski, et al.58, who showed that precipitation was a dominant driver of natural selection in plant and animal populations for over 150 species.

Implication for breeding efforts

Given its economic importance, O. mascula population sizes have been reduced beyond recovery in many areas in the west, northwest, and north of Iran. Developing a breeding program could be a sustainable option to counteract the further loss of populations. Selection of the most suitable populations and cultivation of local plant material could represent the first steps in the sustainable management of O. mascula populations in the wild. Our results showed that populations from Golestan possessed several important desirable traits, including a high GLN content and a bulb dry weight that meets the high-quality threshold48. Another approach to probe for favorable traits is the percent of variation (%var) that populations display59. The traits mentioned above also had the highest variation percentage under the effects of land cover and population (Table 1). A phenotypic study on O. mascula populations by Ebrahimi, et al.26 also indicated high variation for tuber characteristics, BFW and BDW in particular.

Additionally, marker-assisted phenotypic trait selection may provide heritable traits with high economic value. In this context, marker-trait associations based on the MLM1 and MLM2 models revealed significant associations of P-CCA + M-AGA-49 with important traits such as PLH, BFW, and BDW, and of P-TGG + M-CTT-33 and E-AGG + M-CGT-22 with PLH and GLN (Table S1). The associated traits observed here are important for the adaptability of O. mascula, as they were found to be the most responsive in populations coping with stressful conditions. Markers associated with more than one quantitative trait might reflect a pleiotropic effect or loci that are located very close to each other60. Marker-trait association in terrestrial orchids, including O. mascula, has been previously reported by Gholami, et al.31, who also observed a significant association between ISSR markers and tuber dry weight. Notable traits with high potential for selection have been identified in this study. However, it should be noted that all measurements were conducted on samples collected from natural populations; a common garden experiment could provide better insights into these trait relationships and their reliability for further application.

Our study illustrates the functional trait variation in response to environmental variables in relation to genetic diversity, offers empirical evidence that supports a combination of different datasets in population-level studies, and observes the interactions from a broader perspective to gain a more reliable estimation. Readily measurable functional traits and their incorporation with population genetics provide a deeper understanding of species' ability to adapt to environmental changes and facilitate using this variation for applications such as plant breeding61. Our findings in O. mascula can be used as the foundation of a plant breeding program that allows for the cultivation of O. mascula in an agricultural setting, thereby reducing the demand for wild-collected tubers. Given the current limitations of the legal protection provided for O. mascula, this may be the best way to conserve natural populations of this species, as well as other species of tuberous orchids.

Material and methods

Study sites and sampling

In the spring of 2020, eighteen populations of O. mascula were sampled in the Hyrcanian ecoregion (Table 6), following Flora Iranica and Flora of Iran62,63. This is an area of exceptionally rich biodiversity, containing relict species in Tertiary-period forests that cover extensive regions between Iran and Azerbaijan. Anthropogenic threats to wild species in these areas include exploitation of natural populations and habitat destruction through the construction of housing and industrial activity. It should be noted that in 2020, access to the populations of O. mascula for this study was possible because graduate researchers were automatically allowed to pursue research within their field of study, exempt from the common regulations on collecting plant material, unless the plant of interest was in protected, disputed areas near borders or the collection was for commercial purposes. Our study was a conservation/breeding effort entirely in line with the rules for accessing endangered plant species in force at the time; thus, official permission or a license was not needed. However, in Jul. 2021, new regulations were announced (the complete list can be found at https://rc.majlis.ir/fa/law/show/1667090) under which access to any wild population requires official permission. Dr. Gholizadeh confirmed the species identity of the samples, and voucher specimens from the populations were deposited in the University of Guilan Herbarium (GUM; no. 5631 to 5649), available for botanical studies upon official request.

Table 6 Details of geographical locations where populations of Orchis mascula were sampled.

A total of 198 individuals were collected from the 18 populations, with at least 20 m distance between individuals within each population. Nineteen quantitative phenotypic traits related to the size of the plants and flowers (Table 7) were recorded for each individual. A digital caliper was used to measure the quantitative traits with a precision of 0.01 mm. The fresh weight of the bulbs was recorded using a digital scale; the bulbs were subsequently dried in an oven at 72 °C and weighed again to obtain the dry weight.

Table 7 Characteristics used in the phenotypic and biochemical analyses of Orchis mascula populations.

Young and fresh leaves were collected from each individual, immediately dried with silica gel, and transported to the laboratory for DNA extraction. For each population, the land cover was characterized by assessing the percentage of vegetation (e.g., woodland, shrubland, and pastureland/grassland). Using Google Earth Pro®, high-resolution aerial images of the locations were acquired. The interpretation of the images was based on visual classification using the ArcGIS 10.4® software (URL: https://www.esri.com/en-us/arcgis/products/arcgis-desktop/resources), a program used to calculate the areas and percentages of land use and land cover by each category.

Chemotyping

Glucomannan and starch contents

To produce salep powder from bulbs, a traditional method previously described by Tekinşen and Güner48 was used. For this purpose, the bulbs were washed with cold water, cleaned from dirt and mud, and boiled for 10–15 min in milk; then, the samples were dried at 21 ± 2 °C until they hardened (7–10 days). The dried specimens were first cut into small pieces and then powdered into flour by milling. Further analysis was carried out to measure the components of interest in the flour.

The samples were prepared according to the procedures specified for GLN and starch using GLN (K-GLUM 10/04) and total starch (AA/AMG 11/01, AOAC Method 996.11) assay kits (Megazyme International Ireland Limited, 2004a, b). The GLN and starch contents of the samples were determined by measuring the absorbance values of the prepared blank and sample solutions (for GLN A1, A2, A3; for starch ΔA, F) in a UV–Vis spectrophotometer (Shimadzu – UV Mini 1240) at 340 nm (for GLN) and 510 nm (for starch) and using formulas 1 and 2, respectively64,65:

$$\begin{aligned} & \Delta {\text{A}}_{{\text{glucomannan}}} = \left( {\text{A}}3 - {\text{A}}1 \right)_{{\text{sample}}} - \left( {\text{A}}3 - {\text{A}}1 \right)_{{\text{blank}}} \\ & {\text{Glucomannan}} = \Delta {\text{A}}_{{\text{glucomannan}}} \times 36.8 \, \left[ {\text{g}}/100\;{\text{g}} \right] \quad (1) \\ & {\text{Starch}} = \Delta {\text{A}} \times \left( {\text{F}}/{\text{W}} \right) \times 90 \, \left[ {\text{g}}/100\;{\text{g}} \right] \quad (2) \\ \end{aligned}$$

where ΔA = absorbance value of the sample solution compared with the blank solution; F = 100 (μg of glucose control)/absorbance value of glucose control (1.03); and W = the weight of the sample (100 mg). The final content was reported as g/100 g of dry matter.
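As a worked illustration of formulas 1 and 2, the following minimal Python sketch evaluates both expressions. It assumes that the glucomannan formula is read as the blank-corrected absorbance difference multiplied by 36.8; the function names and the example absorbance readings are hypothetical and are not taken from the study.

```python
# Constants as stated in the text; all function names are hypothetical.
GLN_FACTOR = 36.8            # conversion factor in formula 1
STARCH_FACTOR = 90           # conversion factor in formula 2
GLUCOSE_CONTROL_UG = 100     # micrograms of glucose control
GLUCOSE_CONTROL_ABS = 1.03   # absorbance of the glucose control
SAMPLE_WEIGHT_MG = 100       # sample weight W in mg


def glucomannan_content(a1_sample, a3_sample, a1_blank, a3_blank):
    """Formula 1 (assumed grouping): blank-corrected delta-A times 36.8, in g/100 g."""
    delta_a = (a3_sample - a1_sample) - (a3_blank - a1_blank)
    return delta_a * GLN_FACTOR


def starch_content(delta_a,
                   glucose_abs=GLUCOSE_CONTROL_ABS,
                   weight_mg=SAMPLE_WEIGHT_MG):
    """Formula 2: delta-A x (F / W) x 90, in g/100 g."""
    f = GLUCOSE_CONTROL_UG / glucose_abs  # F as defined in the text
    return delta_a * (f / weight_mg) * STARCH_FACTOR


# Invented absorbance readings, for illustration only.
gln = glucomannan_content(a1_sample=0.10, a3_sample=0.95,
                          a1_blank=0.05, a3_blank=0.10)
starch = starch_content(delta_a=0.05)
print(f"GLN: {gln:.2f} g/100 g, starch: {starch:.2f} g/100 g")
```

Keeping the constants as named module-level values makes it easy to substitute a different glucose control absorbance or sample weight for another batch.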

Total anthocyanin content

To measure the anthocyanin content (purple pigment) of the tepals and labellum, the pigment was first extracted with 0.5 mL methanol/0.1% HCl for 24 h in the dark. The absorbance was then read at 510 nm using a UV–Vis spectrophotometer (UV1601; Shimadzu, Kyoto, Japan)66. The anthocyanin content is expressed as mg per gram of fresh weight.

Genotyping

Genomic DNA was extracted from silica-gel-dried young leaves of the individuals using the CTAB procedure67. The quality and quantity of the extracted DNA were assessed by 1% agarose gel electrophoresis and a NanoDrop® spectrophotometer, and the DNA was diluted with sterile distilled water to a final concentration of 100 ng/μL. The method described by Vos, et al.68 was employed with slight modification for AFLP analysis. For each sample, 250 ng of genomic DNA was mixed with EcoRI and MseI in a 20 µL master mix (2 µL Buffer, 2 µL 10 X BSA, 0.5 µL MseI, 0.5 µL EcoRI, 4.5 µL double-distilled water). Primers without a selective base were used in pre-amplification and primers with three selective bases in selective amplification. The mixture of genomic DNA and restriction enzymes was incubated for 12 h at 30 °C. Double-stranded EcoRI (Eco) and MseI (Mse) linkers were then ligated to the restriction fragments without additional nucleotides, and the fragments were afterward amplified by primers with three additional nucleotides at the 3′-end (Table 1). In selective amplification, combinations of these primers were used. PCR thermal cycling consisted of 35 three-step cycles of 94 °C for 1 min, 56 °C for 30 s, and 72 °C for 1 min. The DNA fragments were separated on a 6% acrylamide gel by vertical electrophoresis (Bio-Rad), and the bands were revealed by silver staining69. The banding pattern of the amplified DNA fragments was converted into binary codes as presence (1) or absence (0), and only the consistently repeatable bands were scored. Weak or smeared bands were excluded, and fragments of the same molecular weight were counted as the same locus.

Data analysis

Phenotypic data

A principal component analysis (PCA) was performed on the phenotypic data. The first two principal components were retained as they explained the highest portion of the total variation and were treated like traits and subjected to further analysis. To determine how phenotypic variation is partitioned between different types of land covers (fixed effect), among populations (random effect), and among individuals (error), an unbalanced restricted maximum likelihood (REML) mixed-model analysis of variance (ANOVA) was conducted for the 22 measured traits and PC1 and PC2 (Table 1). Means ± standard errors of the traits based on each population were also provided (Table 2). The analyses were implemented in PROC MIXED (SAS Institute, 2008, Cary, NC).

Hierarchical Clustering on Principal Components (HCPC) using "Factoextra" and "FactoMineR" packages70,71 was used to assess relationships among phenotypic traits. The phenotypic diversity expressed by the sampled O. mascula populations was further explored with a two-way clustering analysis using the "Heatmaply" package in the R software72. Given the difference in the scaling of the traits, each trait was normalized using Z-scores before cluster analysis.
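The Z-score normalization mentioned above can be sketched in a few lines of Python. This is an illustration only: the analyses themselves were run in R, the use of the sample standard deviation is an assumption, and the trait values are invented (only the abbreviations PLH and BDW come from the text).

```python
from statistics import mean, stdev


def z_scores(values):
    """Center a trait on its mean and scale by its sample standard deviation."""
    m = mean(values)
    s = stdev(values)  # assumption: sample (n - 1) standard deviation
    if s == 0:
        # A constant trait carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - m) / s for v in values]


def normalize_traits(table):
    """Apply Z-score normalization to every trait (column) independently."""
    return {trait: z_scores(values) for trait, values in table.items()}


# Invented measurements on very different scales.
traits = {"PLH": [20.0, 30.0, 40.0], "BDW": [0.5, 1.0, 1.5]}
normalized = normalize_traits(traits)
print(normalized)
```

After normalization each trait has mean 0 and unit variance, so traits measured in centimeters and in grams contribute equally to the distances used for clustering.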

A correlation analysis between phenotypic traits was carried out using Pearson's correlation coefficients (significance level of 0.01) with the "CorrPlot" package73. The same package was utilized to assess the relationship between genetic diversity indices and environmental factors (average temperature and precipitation, both annual and during the growing season, altitude, longitude, and latitude).

Molecular data

Genetic diversity

Genetic variation indices including the observed number of alleles (Na), the effective number of alleles (Ne), expected heterozygosity (He), Shannon's diversity index (I), Nei's genetic diversity (H), total heterozygosity (Ht), and intra-population genetic diversity (Hs) were computed using POPGENE v3.074 (Table 3). All these calculations assumed that populations were in Hardy–Weinberg equilibrium.
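To make the Hardy–Weinberg assumption concrete, the sketch below computes three of these indices for a single dominant (presence/absence) locus. It follows the textbook treatment of dominant markers, in which the null-allele frequency is estimated as the square root of the band-absence frequency; it is not claimed to reproduce POPGENE's internal implementation, and the function name and example scores are hypothetical.

```python
import math


def locus_indices(band_scores):
    """Ne, Nei's H, and Shannon's I for one dominant locus scored 1/0.

    Assumes Hardy-Weinberg equilibrium, so band-absent individuals are
    null homozygotes and the null-allele frequency is q = sqrt(absent share).
    """
    n = len(band_scores)
    absent_share = band_scores.count(0) / n
    q = math.sqrt(absent_share)   # null (band-absent) allele frequency
    p = 1.0 - q                   # band-present allele frequency
    homozygosity = p * p + q * q
    h = 1.0 - homozygosity        # Nei's gene diversity
    ne = 1.0 / homozygosity       # effective number of alleles
    shannon = -sum(f * math.log(f) for f in (p, q) if f > 0)
    return {"Ne": ne, "H": h, "I": shannon}


# Invented scores: the band is absent in one of four individuals.
print(locus_indices([1, 1, 1, 0]))
```

Averaging these per-locus values over all scored loci gives population-level estimates; a monomorphic locus contributes H = 0 and Ne = 1.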

Genetic structure

A cluster (heatmap) analysis was performed at the population level based on the matrix of Nei's unbiased genetic distance using the "ggplot2"75 and "Pheatmap"76 packages in the R software. A principal coordinate analysis (PCoA) was carried out using the "logisticPCA" package77 in the R software to visualize the spatial structure. Bayesian model-based cluster analysis using STRUCTURE v2.3.478 was also used to infer genetic structure and define the number of clusters in the data set. The admixture model with correlated allele frequencies was applied with a burn-in of 50,000 and 100,000 MCMC iterations. The assumed number of groups (K) varied from 1 to 10, and 10 runs per K were performed. STRUCTURE HARVESTER79 was used to determine the optimum K based on L(K) and ΔK. The STRUCTURE results at K = 2 and K = 4 were mapped according to geographical locations using the ArcGIS 10.4® software (URL: https://www.esri.com/en-us/arcgis/products/arcgis-desktop/resources).
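The ΔK criterion can be illustrated with a short Python sketch. It assumes that ΔK is evaluated from the per-K mean and standard deviation of L(K) across replicate runs, which is one common way of computing the statistic; the function name and the log-likelihood values are invented and do not come from this study.

```python
from statistics import mean, stdev


def delta_k(log_likelihoods):
    """Delta-K for every K that has both neighbors K - 1 and K + 1.

    log_likelihoods maps K to the list of L(K) values from replicate runs.
    """
    means = {k: mean(values) for k, values in log_likelihoods.items()}
    result = {}
    for k in sorted(log_likelihoods):
        if k - 1 in means and k + 1 in means:
            # Absolute second-order difference of the mean L(K).
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            sd = stdev(log_likelihoods[k])
            # Identical replicate runs give sd = 0; report infinity then.
            result[k] = second_diff / sd if sd > 0 else float("inf")
    return result


# Invented L(K) values for three replicate runs per K.
runs = {
    1: [-1000, -1002, -998],
    2: [-800, -801, -799],
    3: [-790, -795, -785],
    4: [-788, -780, -796],
}
dk = delta_k(runs)
best_k = max(dk, key=dk.get)
print(dk, best_k)
```

ΔK is undefined for the smallest and largest K tested, so a range of K = 1 to 10 yields values only for K = 2 to 9.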

Genetic differentiation

To assess the magnitude of genetic differentiation, the partitioning of total genetic variation at different scales was computed with analysis of molecular variance (AMOVA) in "vegan"80 in the R software. Shannon differentiation coefficient (G'st) was calculated based on the following formula, G'st = (Hsp − Hpop)/Hsp (Hsp, total Shannon information index; Hpop, average Shannon information index within the population), and the gene flow (Nm) among populations was estimated using POPGENE v3.074.
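The Shannon differentiation coefficient defined above reduces to a one-line calculation. The sketch below is a minimal illustration with invented index values and a hypothetical function name.

```python
from statistics import mean


def shannon_gst(h_species, h_populations):
    """G'st = (Hsp - Hpop) / Hsp.

    h_species is the total Shannon index (Hsp); h_populations holds the
    within-population Shannon indices, whose mean is Hpop.
    """
    h_pop = mean(h_populations)
    return (h_species - h_pop) / h_species


# Invented values: total index 0.40, three populations averaging 0.26.
gst = shannon_gst(0.40, [0.24, 0.26, 0.28])
print(round(gst, 2))
```

A value near 0 means most diversity lies within populations, whereas a value near 1 means most diversity lies among them.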

Isolation by distance/isolation by environment

To test the effects of geographic distance and environmental differences on the genetic and phenotypic structure, the genetic distances among populations were calculated using Nei's genetic distance implemented in the R package "Poppr"81, and the phenotypic distances using the "dist" function in the R software. Euclidean geographic distances were calculated using the "fossil" package82. For the sampling sites, climatic variables were collected from WorldClim version 2.067. The 19 bioclimatic variables were treated as the dimensions of an environmental space, and the Canberra distance was used to calculate the distance between populations in this space. For isolation by distance (IBD) and isolation by environment (IBE), the relationships of genetic and phenotypic distances with geographic and environmental distances, respectively, were analyzed using simple and partial Mantel tests implemented in the R package "vegan"80. Additionally, all other possible comparisons (geographic, phenotype, genetic, and environment) were tested.
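The logic of the Canberra distance and the simple Mantel test can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the "vegan" implementation used in the study: the permutation scheme is the standard one (rows and columns of one matrix shuffled together), the one-sided P-value convention is an assumption, the partial Mantel test is not covered, and all names and numbers are hypothetical.

```python
import math
import random


def canberra(u, v):
    """Canberra distance: sum of |a - b| / (|a| + |b|), skipping 0/0 terms."""
    total = 0.0
    for a, b in zip(u, v):
        denom = abs(a) + abs(b)
        if denom > 0:
            total += abs(a - b) / denom
    return total


def distance_matrix(rows, metric):
    """Square matrix of pairwise distances between rows."""
    return [[metric(r1, r2) for r2 in rows] for r1 in rows]


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def upper_triangle(matrix, order):
    """Upper-triangle entries after relabeling populations by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel(d1, d2, permutations=999, seed=0):
    """Simple Mantel test: Pearson r of two distance matrices, with a
    one-sided permutation P-value for a positive association."""
    rng = random.Random(seed)
    identity = list(range(len(d1)))
    x = upper_triangle(d1, identity)
    r_obs = pearson(x, upper_triangle(d2, identity))
    hits = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # permute population labels of d2 only
        if pearson(x, upper_triangle(d2, order)) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)


# Invented data: two climate variables for four populations and a
# made-up genetic distance matrix for the same populations.
env = [[10.0, 1200.0], [11.0, 1100.0], [15.0, 400.0], [16.0, 300.0]]
env_dist = distance_matrix(env, canberra)
gen_dist = [
    [0.00, 0.05, 0.20, 0.30],
    [0.05, 0.00, 0.18, 0.28],
    [0.20, 0.18, 0.00, 0.07],
    [0.30, 0.28, 0.07, 0.00],
]
r, p = mantel(gen_dist, env_dist)
print(r, p)
```

Because the entries of a distance matrix are not independent, significance is assessed by permuting population labels rather than by a parametric test of the correlation.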

Trait-marker association analysis

Two models were compared to test the marker-trait association between AFLP markers and phenotypic and chemotypic traits using TASSEL 4.0.1 (http://www.maizegenetics.net/bioinformatics)83: a Mixed Linear Model (MLM) using the kinship matrix (K) estimated by the STRUCTURE HARVESTER79 as a random effect (MLM1) and an MLM using both the population structure matrix (Q) and K (MLM2). The results were compared to determine the better model. The significance of associations between loci and traits was assessed using P-values at a probability level of 0.001.