Introduction

Climate warming is affecting the biodiversity and functioning of plant communities across the globe, and in response to this changing climate, species are now shifting their geographical distributions and expanding their ranges across elevational and latitudinal gradients1,2. This range expansion can increase the survival chances of species that might get affected by climate change driven shrinkage of suitable habitats; however, individual species' range responses vary greatly, with some species changing their ranges and/or local abundances rapidly while others do so slowly or not at all3,4. A growing body of literature seeks to explain this variation in range shifts using species’ ecological and life-history traits, with expectations that these shifts are determined by the capacity of species to disperse, establish new populations, and proliferate in the new environments5,6. In this race for survival, clonal plants appear to be at a particular disadvantage due to their limited mobility and limited capacity for adaptation7. Although, clonal plants have persisted for thousands, or even millions of years during past environmental changes8, the future climate change is predicted to advance much faster than during post-glacial times9. If a species can neither adapt to the modified environmental conditions nor migrate fast enough, then population extirpation or in the worst-case extinction of entire species is expected10,11. This extinction can be avoided if the populations undergo evolutionary adaptation12,13,14; however, the evolutionary potential of a population in turn depends on the pre-existing genetic variation within the species, and a high level of standing variation may allow a faster response to environmental changes15,16. Thus, empirical studies that estimate genetic diversity in clonal species with limited distribution ranges are important for understanding their possible survival chances in anticipated climate change.

Although the advent of new molecular tools had a major impact on the study of clonality in plant species, the results from genotyping have quite often been interpreted without consideration of the restrictions of the genetic markers used17, probably resulting in biased estimation of the extent of clonality. Moreover, the low resolution of allozyme markers often leads to an overestimation of the extent of clonality18. On the other hand, due to failure to account for small differences between multi-locus genotypes because of PCR and scoring errors or somatic mutations, the use of high-resolution markers such as Amplified Fragment Length Polymorphisms (AFLPs) may lead to an under-estimation of the extent of clonality19. Some molecular markers, e.g., Simple Sequence Repeats (SSRs) have been the marker of choice due to their high polymorphism detection, high reproducibility, co-dominant nature, cost-effectiveness, and ease of study20. In addition to molecular markers, Internal Transcriber Spacers (ITS) have been frequently used for phylogenetic analysis21 and the study of genetic diversity. ITS ribosomal DNA length variants have been reported for several plant species22,23. In the present study, eight SSR markers, initially designed for Sambucus nigra, and ITS1–ITS4 region Sanger sequencing, were used to quantify the genetic diversity of Sambucus wightiana (an endemic clonal plant species) to predict its survival chances and potential spread during anticipated climate change. The species is commonly found in forest gaps making way into the canopy, threatening the suitable habitat of many important medicinal plants, decreasing understory plant diversity, and potentially hindering the natural regeneration of conifer saplings.

Results

Allelic and gene diversity

SSR markers used during the present study successfully amplified expected genomic regions in a set of 37 S. wightiana accessions/genotypes. Multiple loci were amplified by most of the SSR markers as evident by the fact that these 8 SSR makers amplified 17 loci ranging from 1 locus to 3 loci with an average of 2.12 loci per SSR marker. While counting the number of alleles per marker, it was revealed that a total of 73 alleles were amplified by all the 8 SSR markers using 37 S. wightiana genotypes. The number of alleles varied from 7 for SSR marker “EMSn016” and “EMSn003” to 1 for SSR marker “EMSn023” and “EMSn025” with an average of 9.12 alleles/marker. The average number of alleles detected per loci was 4.29 (Total alleles = 73/Total loci = 17) and the percent polymorphic loci was 70.59%. The analysis of gene diversity data revealed moderate gene diversity (He) noticed for all SSR marker loci, which varied from 0 (for monomorphic loci) and 0.818 with an average of 0.526 (SE = 0.086). Except for the monomorphic loci of Sn023 and Sn025 the average He was high with an average of 0.745. The mean values uHe and I were 0.534 (SE = 0.087) and 1.086 (SE = 0.179) respectively (Table 1). Further analysis of marker data based on 37 genotypes of 7 watersheds is presented in Table 2. The analysis revealed that the average number of alleles (Na) ranged from 1.412 for sub-population 6, 7 to 3.059 for sub-population 1. The private alleles were found in only 2 populations: Pop1 and Pop4 with 0.143 and 0.167 frequency with an overall mean of 0.11. The mean effective number of alleles ranged from 1.412 to 2.458, which were comparable to the average number of alleles. The gene diversity (He) and uHe values ranged from 0.206 to 0.445 and 0.275 to 0.475 respectively. Also, Shannon’s information index (I) was calculated for populations with the highest diversity found in Pop1 (I = 0.828, SE = 0.151) and lowest in Pop6 (I = 0.285, SE = 0.085).

Table 1 Genetic diversity estimates for Sambucus wightiana at seventeen loci and averaged over all sub-populations.
Table 2 Mean allelic patterns across populations of Sambucus wightiana.

Clustering and principal coordinate analysis (PCoA)

To understand the patterns of variations of genetic diversity by SSR data, two multivariate methods were employed; clustering and principal coordinate analysis.

Clustering

The clustering analysis of the 37 genotypes collected across the geographical spectrum of the S. wightiana in the Kashmir valley revealed three main clusters (Fig. 1). Cluster I contain two small sub-clusters Ia (Samb-1 and Samb-3) and sub-cluster Ib. The sub-cluster Ib is further divided into 2 more small clusters Ib1 (Samb-13 and Samb-32) and Ib2 (Samb-33, 34, and 36). Cluster II is sub-divided into two clusters, sub-cluster IIa (Samb-25, 26, and 27) and sub-cluster IIb. The sub-cluster IIb is further divided into two small clusters cluster IIb1 (Samb-7, 14, 15, 16, 17, and 21) and IIb2 (Samb-11, 18, 19, 20, 23, 24) with several smaller sub-clusters. Cluster III is divided into two sub-clusters, sub-cluster IIIa and sub-cluster IIIb. Sub-cluster IIIa is further divided into two smaller clusters cluster IIIa1 (samb-11, 12, and 35) and cluster IIIa2 (Samb-8, 9, 10, and 28). Sub-cluster IIIb is further divided into IIIb1 (Samb-2) and IIIb2 (samb-4, 5, 6, 29, 30, 31, and 37). The clustering does not show any rigid correspondence between genotypes and their geographic location.

Figure 1
figure 1

UNJ dendrogram showing clustering pattern of S. wightiana Samples (DARwin ver. 6.0; http://darwin.cirad.fr/darwin).

Principal co-ordinate analysis (PCoA)

Principal Coordinate Analysis (PCoA) was carried by using the data matrix/scores (1-presence and 0-absence) of all 8 SSR markers that led to the uniform spread of all 37 S. wightiana genotypes into all 4 coordinates (Fig. 2). Principal Coordinates Analysis (PCoA) could not identify significant isolation in populations as all genotypes were widely distributed in all the quadrants (I and IV) of factorial analysis. PCoA could not identify any significant isolation and rigid assemblies of genetic proximity to spatial distribution.

Figure 2
figure 2

Results of the principal coordinates analysis of 17 microsatellite loci in S. wightiana (DARwin ver. 6.0; http://darwin.cirad.fr/darwin).

The analysis of molecular variance (AMOVA)

AMOVA was carried to analyze the distribution of genetic variation among sub-population from 7 different watersheds representing the different populations and among and within individuals/accessions/genotypes. The AMOVA revealed that the majority of the genetic variation is partitioned among individuals (i.e., within site/population) i.e., 90% and only 10% variation is present among populations (Table 3). Based on AMOVA the genetic differentiation values for global Fst, Fst max, and F’st were 0.104, 0.403, and 0.258 respectively indicating a moderate level of genetic differentiation among populations. Pairwise Fst genetic distances between each pair of population/subpopulations were also estimated. The values for pairwise Fst between populations/ sub-populations 1–4 and 6–7 were high (0.125 and 0.417 respectively) with an average of 0.285 (Table 4). Pairwise Nei’s Genetic Distance, Nei’s Genetic identity, and corresponding unbiased Nei’s Genetic Distance and Nei’s Genetic identity values (Table 5) were in agreement with pairwise Fst calculations reflecting a moderate genetic differentiation among populations. Chi-Square Tests for Hardy–Weinberg Equilibrium revealed the probability at all loci except for 2 monomorphic loci; Sn023 and Sn025 were statistically significant (Table 6) confirming the deviation from the HWE and nonrandom mating of S. wightiana with subtle signs of inbreeding.

Table 3 Summary of analysis of molecular variance (AMOVA).
Table 4 Pairwise population Fst values.
Table 5 Genetic Diversity Indices analyzed in S. wightiana populations; Pairwise Population Matrix of (a) Nei’s Genetic Distance, (b) Nei’s Unbiased Genetic Distance, (c) Nei’s Genetic Identity and (d) Nei’s Unbiased Genetic Identity.
Table 6 Summary of chi-square tests for Hardy–Weinberg equilibrium.

ITS region sequence analysis

The Internal Transcribed Spacer (ITS) region that lies between nuclear small rDNA and nuclear large rDNA is considered most important for phylogenetic inference at the generic and intrageneric levels in plants (Alvarez and Wendel 2003). The ITS sequence data has been used in ~ 66% of studies to analyze the phylogeny and 34% of all published phylogenetic hypotheses have been based exclusively on ITS sequences23,24. Therefore, efforts have been made during the present study to sequence the ITS region of 37 S. wightiana accessions/genotypes collected from the North-Western Himalayas of Jammu and Kashmir. The sequence data generated during the present study was also compared with ITS region sequencing data of 18 other Sambucus species including S. wightiana. The analysis of sequencing data led to the identification of 10 SNPs (SNP density: 1 SNP/ 57.2 bp) in the 645 bp sequence of 37 S. wightiana genotypes. Among the 10 SNPs, five SNPs were found very promising (Table 7), and the frequency of these SNPs was either 20% or > 20%. In addition, one insertion has also been noticed with a frequency of 34%. A large number of SNPs were also noticed while comparing the S. wightiana genotypes with the ITS sequences of 18 other Sambucus species. One large insertion of ~ 25 bp was identified in S. wightiana or deletion of 25 bp in other 18 Sambucus species. Sequence variation between different genotypes has been identified at ITS sequence in several earlier studies in different plants including Cinnamomum25, Jujube26, Common bean21,24, Coneflowers and Relatives27, and Sambucus and Adoxa28. The important SNPs identified during the present study could be converted into user-friendly PCR-based markers for species discrimination, genetic characterization, and phylogeny analysis.

Table 7 List of SNPs detected in the ITS sequences of S. wightiana.

ITS sequences in phylogenetic analysis of S. wightiana

‘Fasttree’, built using online software tool “Clustal W”, with slow NNI and MLACC = 3 based on phylogenetic reconstructions using the function ‘build’ of ETE3 v.3.1.1 was comparable to MEGA 6 results. The optimal tree by MEGA 6 was constructed with the sum of branch length for ITS1 (0.48689146 for NJ and 0.52255238 for UPGMA) and ITS4 (1.61554106 for NJ and 1.80660053 for UPGMA). Both NJ and UPGMA methods show similar results, yet UPGMA showed the best fit results (Suppl S2 and S3). The results didn’t show any geographic-specific clustering among the genotypes.

The sequences of 17 other species of Sambucus and one S. wightiana sequence downloaded from gene bank (https://www.ncbi.nlm.nih.gov/genbank/) were used as outgroups genotype during the analysis. The cluster analysis led to a clear-cut separation of 17 Sambucus species from S. wightiana (Suppl S4). It is important to note that our genotypes clustered with the out-group S. wightiana genotype sequence downloaded from the gene bank. The results of the present study show the diverse nature of S. wightiana genotypes growing wild in the natural habitats of the North-Western Himalayas.

Structure analysis

The analysis of the population structure of 37 S. wightiana genotypes revealed the presence of two sub-populations (Fig. 3). Sub-population one possesses 17 genotypes, and the second sub-population possesses 20 genotypes. Among 37 genotypes, a set of 4 genotypes were found admixed i.e., their affiliation probability with a particular sub-population is < 80% and these individuals tend to have mixed ancestry and tend to group with different sub-populations. The information of population structure is particularly important for working out marker-trait association and avoiding spurious associations.

Figure 3
figure 3

Estimation of the number of groups based on output from STRUCTURE-software (STRUCTURE 2.3.4; https://web.stanford.edu/group/pritchardlab/structuresoftware/releaseversions/v2.3.4/html/structure.html). (a) ΔK over K from 1 to 10 with the 8 SSR markers. (b) Bar plot showing grouping of 37 genotypes into two different groups.

Marker-trait associations (MTAs)

The study of MTAs using the General Linear Model (GLM) and Mixed Linear Model (MLM) was analyzed in the software program TASSEL led to the identification of a total of 5 SSR markers associated with 11 traits in S. wightiana. Through GLM, one allele/marker each was found associated with plant height, leaf weight, and INL, two alleles/markers with RGR, stem weight, plant weight, MSL, MA, and TL, and three alleles/markers with fruit weight and rhizome weight. Marker SN-04 was found associated with 8 traits, marker SN-07 was found associated with three traits, marker SN-08 was found associated with 5 traits, marker SN-12 with 3 traits, and marker SN-06 with only one trait (Table 8), while as through MLM approach, only one marker (SN-12) was found associated with two different traits (stem weight and fruit weight). The marker SN-12 found associated with 3 traits through GLM and two traits through MLM is the most important marker/genomic loci for S. wightiana.

Table 8 Association of marker alleles with phenotypic traits using GLM and MLM.

Discussion

Type of reproduction has an important effect on the maintenance of particular populations and species persistence in time and space. This trait significantly influences the ecological and genetic structure of populations and in consequence the evolution of species29. The prevailing opinion is that clonality reduces genetic diversity30. Many studies document lower levels of genetic variation in clonal species preceding repeated genetic bottlenecks leading to a prolonged lag phase31,32 and in populations of such species extinction is prevented with more intense vegetative reproduction with extensive human-mediated propagule dispersal33,34,35,36. In contrast to this opinion, comparable, or even higher genetic variation for 21 clonal plant species has been documented than non-clonal ones37. The results of many other studies also confirm this assertion, concerning the high level of genetic variation and genotypic diversity of species with clonal reproduction35,38,39. The most plausible explanation for high levels of genetic variation in populations of clonal plants is the presence of gene flow via long-distance dispersal and multiple introductions reducing the ecological and environmental constraints on genetic diversity whereas enhancing the potential of sexual reproduction, even if it is periodic and meagre40,41,42,43. Thus clonality helps in the maintenance and expansion of existing populations while the occurrence of sexual reproduction helps in the recruitment of new individuals/ populations however maintaining the genetic diversity44. Clonal plants are more successful in heterogeneous environments benefiting from clonal integration, which may enhance sexual fitness45. However, clonality does not substitute sexual mode of reproduction and genetic rather than ecological factors prove crucial for long term success46,47,48.

The well-known genetic paradox of how spreading species maintain genetic diversity despite going through founder effects and genetic bottlenecks during the range expansion49. However, some studies have shown spreading plant species can maintain a remarkable level of genetic diversity across distances, in such cases the gene flow via long-range seed dispersal or pollen might be more important than commonly suggested50,51. This may be also associated with the multiple introductions leading to diverse genetic admixtures52,53,54,55,56. Local adaptation to environmental conditions is also considered one among the other dependent factors providing an advantage to spreading species57,58,59,60,61,62. Genetic variation plays important role in the evolution-related traits for local adaptation63. Also, evolutionary modifications to reproductive systems provide the spreading capability for widespread species64.

The non-availability of any data on genetic diversity and the aggressive clonal nature led to the perception of low diversity in S. wightiana. Against the anticipated low genetic diversity in clonal plants, S. wightiana showed a remarkable genetic diversity which is consistent with some other studies65,66. The number of alleles per marker is a good indicator of genetic variability67. The detection of 73 alleles in 37 S. wightiana genotypes by 8 SSR primers with 70.59 percent polymorphic loci indicates moderate to high levels of genetic diversity. The high value of average gene diversity for all polymorphic loci (He = 0.74) also points to a good amount of overall genetic variability. Also, the variable ITS regions are used as a standard to measure genetic variability for a long time. The identification of several SNPs among different S. wightiana genotypes with SNP density of 1 SNP/57.2 bp also indicates high diversity in S. wightiana at the sequence level. The present study revealed a good magnitude of genetic diversity and a moderate differentiation with no geographical pattern among populations68.

Our study does not find any geographically distinct clustering of S. wightiana individuals from both clusterings, PCoA by SSR scores and phylogeny by ITS markers69. This can be attributed to the mixing of the genotypes across the regions via gene flow between populations70. AMOVA showed that most of the variation is due to individuals/ genotypes only i.e., 90 percent and only 10 percent of the variation between geographically distant populations. These results suggest the recruitment of new genotypes into the populations with equivalent genetic differentiation as of non-clonal species. Yet clonality remains a vital strategy, shown by the deviance from Hardy–Weinberg equilibrium for the studied microsatellite loci71. Clonality associated with perineal growth can moderate the negative effects of stochastic population events like genetic drifts, etc.72. This magnitude of polymorphism at the studied microsatellite loci and the resultant genetic diversity may point to the spreading success of this clonal species. The potential for both sexual and clonal reproduction of a species provides the greater ability for landscape spread47. Significant pairwise Fst values (0.125–0.417) were observed which agree with the Nei’s genetic distance and identity parameters, indicating the substantial differentiation between the populations73.

The structure analysis using SSR marker data during the present study revealed different sub-populations (two sub-populations) in our collection of 37 S. wightiana genotypes. The presence of two sub-populations indicated differences in allele frequency and population differentiations in different sub-populations, indicating a good diversity available in S. wightiana genotypes in the Western Himalayas. The analysis of genotypic data in combination with trait data of the traits led to the identification of genomic loci/genomic regions that are associated with these traits. Using different models (GLM and MLM) 5 markers/loci were found associated with 11 traits, with some markers controlling more than one trait (pleiotropy). The markers controlling more than one trait are considered the most important markers/genomic loci and may be due to pleiotropy or linkage of multiple genes. Identification of more genomic loci through GLM than MLM is attributed to more stringent criteria and incorporation of kin-ship matrix into the analysis in MLM, in addition to genotypic data, trait data, and population structure matrix. In GLM association analysis, only genotypic data, trait data, and population structural matrix are used to work out MTAs. The genes/genomic regions/loci identified will prove useful in future Sambucus breeding programs and provided insights into the genetics of these traits and the kind of gene action in Sambucus.

Conclusion

Clonal endemic plant species like S. wightiana can acquire a moderate amount of genetic diversity, creating a wider genetic base that can withstand the stress environments like the conditions created by changing climate of the North-Western Himalaya. In this study, no particular genotype was found dominant across the regions. Lack of correlation between genetic diversity and geographic distances in S. wightiana could be associated with successful sexual reproduction between genets followed by long-range seed dispersion by human or animal agents, and subsequent recruitment in isolated S. wightiana populations enabling the gene flow between populations and forming diverse genetic admixtures74. The present study also found the successful transferability of these microsatellite markers, which could be used for further studies at a finer scale.

Methods

Plant species

Sambucus wightiana Wall. Ex Wight & Arn. is a clonal sub-shrub, mostly found in the forests of Kashmir valley within an altitudinal range of 1700–3300 m.a.s.l. Commonly called Kashmir elder, the species is a rhizomatous geophyte growing up to 2 m in height with white or yellowish-white flowers. The umbels consist of orange or red individual berries that reach full ripeness in late summer. The species is native to the North-Western Himalaya and the primary range spans from Indian administered Kashmir to Chitral in Pakistan75. Recently the species has been reported from other parts of the Indian Himalayas like Uttarakhand, Himachal Pradesh, and Sikkim76, pointing towards its spread due to adaptive evolution in response to climate change61. Sambucus wightiana is closely allied to S. adnata Wall., from which it differs in its glabrous or almost glabrous inflorescence.

Study area

The present study was carried out in the Kashmir Himalaya (32° 20′ to 34° 54′ N and 73° 55′ to 75° 35′ E, total area of 15,948 km2), that represents a unique biogeographical province in the Indian Himalayan region77. The altitude of the region ranges from 1600 m.a.s.l. at Srinagar to 5420 m.a.s.l. at Kolahoi peak. Within this altitudinal range, Kashmir elder is found from 1700 to 3300 m.a.s.l. in the forests of Kashmir valley. The climate of the region is predominantly of a continental temperate type with wet and cold winters and relatively dry and hot summers. The temperature ranges from an average daily maximum of 31 °C and minimum of 15 °C during summer to an average daily maximum of 4 °C and minimum of − 4 °C during winter. The region has been reported to be a hotspot for climate change due to its complex topography, enormous glacial and water resources, and quick responding watersheds with intense seasonal and climatic variability over a small spatial scale78.

For the present study, we sampled 37 S. wightiana individuals (Table 9) from seven watersheds (Vishaw, Ferozpur, Lidder, Sindh, Doodhganga, Pohru, and Brinji) across Kashmir Himalaya, aiming to cover a wide geographic area and range of elevations (Fig. 4). The samples from each watershed were clubbed as populations for further analysis.

Table 9 Description of the sampling sites.
Figure 4
figure 4

Study area showing sampling sites marked as black dots on the map inset (ArcGIS 10.2; http://resources.arcgis.com/en/help/main/10.2/index.html).

DNA extraction

Genomic DNA was extracted from the freshly collected young foliar parts of 37 S. wightiana accessions/genotypes by Qiagen® DNeasy™ Plant mini kit following the manufacturer’s protocol. The quality and concentration of the DNA were determined by visual comparison with the known amount of λ DNA on 0.8% agarose gel.

Selection of SSR markers and PCR amplification

Eight polymorphic microsatellite (SSR) markers, previously developed from the S. nigra genome (EMSn002, 003, 010, 016, 017, 019 023, and 025) were selected, which have also been found informative in Sambucus canadensis79 (Table 1). Their sequences were downloaded from NCBI (https://www.ncbi.nlm.nih.gov/genbank/). The primers for these eight SSR markers were synthesized on contract from Sigma Aldrich, Bangalore, India. The PCR was carried out in 15 µL reaction volume with the constituents: 2 µL 20 ng DNA template, 2µL (5.0 pmol) forward and reverse primers, 0.4 µL (2.5 mM) dNTPs, 0.3 µL (1.5 U) Taq Polymerase (Sigma), 1.5 µL buffer containing MgCl2 with a PCR profile of; 5 min at 95 °C, 40 cycles of 95 °C for 1 min, 54 °C for 1 min 30 S of annealing temperature, 72 for 1 min and a final extension of 72 °C for 10 min in a Peqlab PCR machine.

Selection of ITS1 and ITS4 primer sequences and their PCR

In addition to SSR markers, the ITS1-5.8S-ITS2 rDNA region was amplified using primers (ITS1: 5-GTCCACTGAACCTTATCATTTAG-3 and ITS-4: 5-TCCTCCGCTTATTGATATGC-3) synthesized on contract from Sigma Aldrich, Bangalore (for more details about primer see Choudhary et al. 2018). The PCR was done in a 25 µL reaction volume with the following constituents: 25 ng DNA template, 5.0 pmol forward and reverse primers, 2.5 mM of each dNTPs, 1X buffer, 2.0 mM MgCl2, and 1.0 U of Taq DNA polymerase (Sigma) by using the following PCR profile; 5 min at 94 °C, 45 cycles of 94 °C for 30 S, 52 °C for 45 S of annealing temperature, 72 °C for 1 min and final extension at 72 °C for 5 min in a Peqlab PCR machine.

Agarose/poly-acrylamide gel electrophoresis (PAGE)

After successful amplification, the PCR products of both ITS1–ITS4 markers and all the eight SSR markers were tested on 1.6% agarose gel to check for any amplification errors and robustness of amplification (Suppl S1a). Amplification of ITS led to a single conspicuous band of about 750 bp. The PCR product was cleaned, and Sanger sequenced by ABI 3730xl sequencer at SciGenome Genomic Services facility, Cochin, Kerala (www.scigenome.com). All SSR amplifications for 37 S. wightiana genotypes were subjected to PAGE analysis using PeqLab Perfect Blue Dual Gel Vertical Electrophoresis system. The gels were silver stained and digitally photographed (Suppl S1b) and the data were scored manually.

SSR marker data analysis

For the characterization of genetic variation in the 37 S. wightiana genotypes and to test the transferable use of the microsatellite markers, a set of 8 SSR markers were selected. Out of the 8 SSR primers, 6 primers were polymorphic. Primers EMSn0023 and 0025 amplified a single monomorphic band. To determine the various parameters of genetic diversity, different software packages and online tools were used. The SSR marker data scored were analyzed using the software program DARwin ver. 6.0 (http://darwin.cirad.fr/darwin) 80, for multivariate analysis (clustering and PCoA). Both clustering and PCoA were based on Jaccard’s dissimilarity co-efficient with 1000 bootstraps. Two clustering methods (UPGMA Hierarchical clustering and Neighbor-Joining clustering) were followed. The final inferences were made on the Unweighted Neighbor-Joining clustering (UNJ), which provided the best fit to the data, and a dendrogram was constructed81.

Analysis of Molecular Variance (AMOVA) based on 999 permutations using FST statistics was performed with the scored SSR fragment lengths (8 markers at 17 loci) to determine the segregation of total genetic variation between, among, and within populations using GenAlEx v.6.5 (https://biology-assets.anu.edu.au/GenAlEx/Welcome.html)82. The parameters included the number of alleles for a given locus (N), Number of effective alleles (Ne), Shannon’s Information Index (I), Observed Heterozygosity (Ho), Expected Heterozygosity (He), Unbiased Expected Heterozygosity (uHe), and Fixation Index (F) along with their mean and Standard Error (SE). GenAlEx was also used to calculate the mean allelic patterns across the populations with the above-mentioned parameters.

Since our study was confined to a small number of samples, pairwise Nei’s Genetic Distance, Nei’s Genetic identity, and corresponding unbiased Nei’s Genetic Distance and Nei’s Genetic identity values were calculated across populations83,84. Also, Chi-Square Tests for Hardy–Weinberg Equilibrium were implemented in GenAlEx v.6.5. The Polymorphism Information Content (PIC) values of individual primers at all loci were calculated based on the formula PIC = 2 × F (1 − F)85.

An analysis of population structure was done using the software STRUCTURE v.2.3.4 (https://web.stanford.edu/group/pritchardlab/structure_software/release_versions/v2.3.4/html/structure.html)86. Population structure was analyzed by setting the number of sub-populations (k-values) from 1 to 10 and each run was repeated five times. The program was set to 300,000 burn-in iterations, followed by 600,000 Markov Chain Monte Carlo (MCMC) replications along with the admixture model. The STRUCTURE HARVESTER web version v0.6.94 (http://taylor0.biology.ucla.edu/structureHarvester/)87 was used to derive the appropriate number of sub-populations using a modified delta K (∆K) method88.

A study of Marker-Trait Associations (MTAs) was done using genotypic data, trait data, and population structure information using General Linear Models (GLM), and Mixed-Linear Model (MLM) in the software program TASSEL (https://tassel.bitbucket.io/)89. Kinship matrix was also used in addition to genotypic data, trait data, and population structure information. The kinship matrix was calculated from genotypic data using the software program TASSEL.

Analysis of ITS1–ITS4 Sanger sequencing data

The Sanger sequences of ITS1 and ITS4 were manually checked for sequencing errors. The sequences were aligned using the online ClustalW tool (http://www.genome.jp/tools-bin/clustalw) and phylogenetic reconstructions were performed using the function ‘build’ in ETE3 v.3.1.190, as implemented on the GenomeNet (http://www.genome.jp/tools/ete/). The tree was constructed using the function ‘fasttree’ with slow NNI and MLACC = 3 (to make the maximum-likelihood NNIs more exhaustive)91. The values at nodes are SH-like local support. To compare these tree results for validity and reliability, MEGA 6 (Build#: 6140226; https://www.megasoftware.net/)92 was also used for tree construction. The evolutionary history was inferred using the UPGMA method93 and Neighbor-Joining method94. The tree is drawn to scale, with branch lengths in the same units as those of the evolutionary distances used to infer the phylogenetic tree. The evolutionary distances were computed using the Maximum Composite Likelihood method95 and are in the units of the number of base substitutions per site. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (1000 replicates) are shown next to the branches96. Codon positions included were 1st + 2nd + 3rd + Noncoding. All positions containing gaps and missing data were eliminated. There were a total of 528 and 537 positions for ITS1 and ITS4 respectively in the final dataset. Also, the aligned sequences of all 37 accessions were analyzed in BioEdit v.7.2.6 (http://en.bio-soft.net/format/BioEdit.html)97,98 for SNP identification.