Genetic diversity may help evolutionary rescue in a clonal endemic plant species of Western Himalaya

Habitat loss due to climate change may cause the extinction of the clonal species with a limited distribution range. Thus, determining the genetic diversity required for adaptability by these species in sensitive ecosystems can help infer the chances of their survival and spread in changing climate. We studied the genetic diversity and population structure of Sambucus wightiana—a clonal endemic plant species of the Himalayan region for understanding its possible survival chances in anticipated climate change. Eight polymorphic microsatellite markers were used to study the allelic/genetic diversity and population structure. In addition, ITS1–ITS4 Sanger sequencing was used for phylogeny and SNP detection. A total number of 73 alleles were scored for 37 genotypes at 17 loci for 8 SSRs markers. The population structural analysis using the SSR marker data led to identifying two sub-populations in our collection of 37 S. wightiana genotypes, with 11 genotypes having mixed ancestry. The ITS sequence data show a specific allele in higher frequency in a particular sub-population, indicating variation in different S. wightiana accessions at the sequence level. The genotypic data of SSR markers and trait data of 11 traits of S. wightiana, when analyzed together, revealed five significant Marker-Trait Associations (MTAs) through Single Marker Analysis (SMA) or regression analysis. Most of the SSR markers were found to be associated with more than one trait, indicating the usefulness of these markers for working out marker-trait associations. Moderate to high genetic diversity observed in the present study may provide insurance against climate change to S. wightiana and help its further spread.

Principal co-ordinate analysis (PCoA). Principal Coordinate Analysis (PCoA) was carried by using the data matrix/scores (1-presence and 0-absence) of all 8 SSR markers that led to the uniform spread of all 37 S. wightiana genotypes into all 4 coordinates (Fig. 2). Principal Coordinates Analysis (PCoA) could not identify significant isolation in populations as all genotypes were widely distributed in all the quadrants (I and IV) of factorial analysis. PCoA could not identify any significant isolation and rigid assemblies of genetic proximity to spatial distribution.
The analysis of molecular variance (AMOVA). AMOVA was carried to analyze the distribution of genetic variation among sub-population from 7 different watersheds representing the different populations and among and within individuals/accessions/genotypes. The AMOVA revealed that the majority of the genetic variation is partitioned among individuals (i.e., within site/population) i.e., 90% and only 10% variation is present among populations (   Structure analysis. The analysis of the population structure of 37 S. wightiana genotypes revealed the presence of two sub-populations (Fig. 3). Sub-population one possesses 17 genotypes, and the second sub-population possesses 20 genotypes. Among 37 genotypes, a set of 4 genotypes were found admixed i.e., their affiliation probability with a particular sub-population is < 80% and these individuals tend to have mixed ancestry and tend to group with different sub-populations. The information of population structure is particularly important for working out marker-trait association and avoiding spurious associations.  www.nature.com/scientificreports/

Marker-trait associations (MTAs). The study of MTAs using the General Linear Model (GLM) and
Mixed Linear Model (MLM) was analyzed in the software program TASSEL led to the identification of a total of 5 SSR markers associated with 11 traits in S. wightiana. Through GLM, one allele/marker each was found associated with plant height, leaf weight, and INL, two alleles/markers with RGR, stem weight, plant weight, MSL, MA, and TL, and three alleles/markers with fruit weight and rhizome weight. Marker SN-04 was found associated with 8 traits, marker SN-07 was found associated with three traits, marker SN-08 was found associated with 5 traits, marker SN-12 with 3 traits, and marker SN-06 with only one trait (Table 8), while as through MLM approach, only one marker (SN-12) was found associated with two different traits (stem weight and fruit weight). The marker SN-12 found associated with 3 traits through GLM and two traits through MLM is the most important marker/genomic loci for S. wightiana.

Discussion
Type of reproduction has an important effect on the maintenance of particular populations and species persistence in time and space. This trait significantly influences the ecological and genetic structure of populations and in consequence the evolution of species 29 . The prevailing opinion is that clonality reduces genetic diversity 30 . Many studies document lower levels of genetic variation in clonal species preceding repeated genetic bottlenecks leading to a prolonged lag phase 31,32 and in populations of such species extinction is prevented with more intense vegetative reproduction with extensive human-mediated propagule dispersal [33][34][35][36] . In contrast to this opinion, comparable, or even higher genetic variation for 21 clonal plant species has been documented than non-clonal ones 37 . The results of many other studies also confirm this assertion, concerning the high level of genetic variation and genotypic diversity of species with clonal reproduction 35,38,39 . The most plausible explanation for high levels of genetic variation in populations of clonal plants is the presence of gene flow via long-distance dispersal and multiple introductions reducing the ecological and environmental constraints on genetic diversity whereas enhancing the potential of sexual reproduction, even if it is periodic and meagre [40][41][42][43] . Thus clonality helps in the maintenance and expansion of existing populations while the occurrence of sexual reproduction helps in the recruitment of new individuals/ populations however maintaining the genetic diversity 44 . Clonal plants are more successful in heterogeneous environments benefiting from clonal integration, which may enhance sexual fitness 45 . However, clonality does not substitute sexual mode of reproduction and genetic rather than ecological factors prove crucial for long term success [46][47][48] .
The well-known genetic paradox of how spreading species maintain genetic diversity despite going through founder effects and genetic bottlenecks during the range expansion 49 . However, some studies have shown spreading plant species can maintain a remarkable level of genetic diversity across distances, in such cases the gene flow via long-range seed dispersal or pollen might be more important than commonly suggested 50,51 . This may be also associated with the multiple introductions leading to diverse genetic admixtures [52][53][54][55][56] . Local adaptation to environmental conditions is also considered one among the other dependent factors providing an advantage to spreading species [57][58][59][60][61][62] . Genetic variation plays important role in the evolution-related traits for local adaptation 63 . Also, evolutionary modifications to reproductive systems provide the spreading capability for widespread species 64 .
The non-availability of any data on genetic diversity and the aggressive clonal nature led to the perception of low diversity in S. wightiana. Against the anticipated low genetic diversity in clonal plants, S. wightiana showed a remarkable genetic diversity which is consistent with some other studies 65,66 . The number of alleles per marker is a good indicator of genetic variability 67   Our study does not find any geographically distinct clustering of S. wightiana individuals from both clusterings, PCoA by SSR scores and phylogeny by ITS markers 69 . This can be attributed to the mixing of the genotypes across the regions via gene flow between populations 70 . AMOVA showed that most of the variation is due to individuals/ genotypes only i.e., 90 percent and only 10 percent of the variation between geographically distant populations. These results suggest the recruitment of new genotypes into the populations with equivalent genetic differentiation as of non-clonal species. Yet clonality remains a vital strategy, shown by the deviance from Hardy-Weinberg equilibrium for the studied microsatellite loci 71 . Clonality associated with perineal growth can moderate the negative effects of stochastic population events like genetic drifts, etc. 72 . This magnitude of polymorphism at the studied microsatellite loci and the resultant genetic diversity may point to the spreading success of this clonal species. The potential for both sexual and clonal reproduction of a species provides the greater ability for landscape spread 47 . Significant pairwise Fst values (0.125-0.417) were observed which agree with the Nei's genetic distance and identity parameters, indicating the substantial differentiation between the populations 73 .
The structure analysis using SSR marker data during the present study revealed different sub-populations (two sub-populations) in our collection of 37 S. wightiana genotypes. The presence of two sub-populations indicated differences in allele frequency and population differentiations in different sub-populations, indicating a good diversity available in S. wightiana genotypes in the Western Himalayas. The analysis of genotypic data in combination with trait data of the traits led to the identification of genomic loci/genomic regions that are associated with these traits. Using different models (GLM and MLM) 5 markers/loci were found associated with 11 traits, with some markers controlling more than one trait (pleiotropy). The markers controlling more than one trait are considered the most important markers/genomic loci and may be due to pleiotropy or linkage of multiple genes. Identification of more genomic loci through GLM than MLM is attributed to more stringent criteria and incorporation of kin-ship matrix into the analysis in MLM, in addition to genotypic data, trait data, and population structure matrix. In GLM association analysis, only genotypic data, trait data, and population structural matrix are used to work out MTAs. The genes/genomic regions/loci identified will prove useful in Table 8. Association of marker alleles with phenotypic traits using GLM and MLM. With P value = < 0.01.

S. No
Trait Marker Allele F ratio P value R 2 (%) GLM www.nature.com/scientificreports/ future Sambucus breeding programs and provided insights into the genetics of these traits and the kind of gene action in Sambucus.

Conclusion
Clonal endemic plant species like S. wightiana can acquire a moderate amount of genetic diversity, creating a wider genetic base that can withstand the stress environments like the conditions created by changing climate of the North-Western Himalaya. In this study, no particular genotype was found dominant across the regions. Lack of correlation between genetic diversity and geographic distances in S. wightiana could be associated with successful sexual reproduction between genets followed by long-range seed dispersion by human or animal agents, and subsequent recruitment in isolated S. wightiana populations enabling the gene flow between populations and forming diverse genetic admixtures 74 . The present study also found the successful transferability of these microsatellite markers, which could be used for further studies at a finer scale. Recently the species has been reported from other parts of the Indian Himalayas like Uttarakhand, Himachal Pradesh, and Sikkim 76 , pointing towards its spread due to adaptive evolution in response to climate change 61 . Sambucus wightiana is closely allied to S. adnata Wall., from which it differs in its glabrous or almost glabrous inflorescence.

Methods
Study area. The present study was carried out in the Kashmir Himalaya (32° 20′ to 34° 54′ N and 73° 55′ to 75° 35′ E, total area of 15,948 km 2 ), that represents a unique biogeographical province in the Indian Himalayan region 77 . The altitude of the region ranges from 1600 m.a.s.l. at Srinagar to 5420 m.a.s.l. at Kolahoi peak. Within this altitudinal range, Kashmir elder is found from 1700 to 3300 m.a.s.l. in the forests of Kashmir valley. The climate of the region is predominantly of a continental temperate type with wet and cold winters and relatively dry and hot summers. The temperature ranges from an average daily maximum of 31 °C and minimum of 15 °C during summer to an average daily maximum of 4 °C and minimum of − 4 °C during winter. The region has been reported to be a hotspot for climate change due to its complex topography, enormous glacial and water resources, and quick responding watersheds with intense seasonal and climatic variability over a small spatial scale 78 .
For the present study, we sampled 37 S. wightiana individuals (Table 9) from seven watersheds (Vishaw, Ferozpur, Lidder, Sindh, Doodhganga, Pohru, and Brinji) across Kashmir Himalaya, aiming to cover a wide geographic area and range of elevations (Fig. 4). The samples from each watershed were clubbed as populations for further analysis. DNA extraction. Genomic DNA was extracted from the freshly collected young foliar parts of 37 S. wightiana accessions/genotypes by Qiagen® DNeasy™ Plant mini kit following the manufacturer's protocol. The quality and concentration of the DNA were determined by visual comparison with the known amount of λ DNA on 0.8% agarose gel.

Selection of SSR markers and PCR amplification. Eight polymorphic microsatellite (SSR) markers,
previously developed from the S. nigra genome (EMSn002, 003, 010, 016, 017, 019 023, and 025) were selected, which have also been found informative in Sambucus canadensis 79 (Table 1). Their sequences were downloaded from NCBI (https:// www. ncbi. nlm. nih. gov/ genba nk/). The primers for these eight SSR markers were synthesized on contract from Sigma Aldrich, Bangalore, India. The PCR was carried out in 15 µL reaction volume with the constituents: 2 µL 20 ng DNA template, 2µL Agarose/poly-acrylamide gel electrophoresis (PAGE). After successful amplification, the PCR products of both ITS1-ITS4 markers and all the eight SSR markers were tested on 1.6% agarose gel to check for any amplification errors and robustness of amplification (Suppl S1a). Amplification of ITS led to a single conspicuous band of about 750 bp. The PCR product was cleaned, and Sanger sequenced by ABI 3730xl sequencer at wightiana genotypes were subjected to PAGE analysis using PeqLab Perfect Blue Dual Gel Vertical Electrophoresis system. The gels were silver stained and digitally photographed (Suppl S1b) and the data were scored manually.
SSR marker data analysis. For the characterization of genetic variation in the 37 S. wightiana genotypes and to test the transferable use of the microsatellite markers, a set of 8 SSR markers were selected. Out of the 8 SSR primers, 6 primers were polymorphic. Primers EMSn0023 and 0025 amplified a single monomorphic band.
To determine the various parameters of genetic diversity, different software packages and online tools were used. The SSR marker data scored were analyzed using the software program DARwin ver. 6.0 (http:// darwin. cirad. fr/ darwin) 80 , for multivariate analysis (clustering and PCoA). Both clustering and PCoA were based on Jaccard's dissimilarity co-efficient with 1000 bootstraps. Two clustering methods (UPGMA Hierarchical clustering and Neighbor-Joining clustering) were followed. The final inferences were made on the Unweighted Neighbor-Joining clustering (UNJ), which provided the best fit to the data, and a dendrogram was constructed 81 . Analysis of Molecular Variance (AMOVA) based on 999 permutations using FST statistics was performed with the scored SSR fragment lengths (8 markers at 17 loci) to determine the segregation of total genetic variation between, among, and within populations using GenAlEx v.6.5 (https:// biolo gy-assets. anu. edu. au/ GenAl Ex/ Table 9. Description of the sampling sites. a Based on the geographic location, all the samples from nearby locations were clubbed as populations for analysis.  82 . The parameters included the number of alleles for a given locus (N), Number of effective alleles (Ne), Shannon's Information Index (I), Observed Heterozygosity (Ho), Expected Heterozygosity (He), Unbiased Expected Heterozygosity (uHe), and Fixation Index (F) along with their mean and Standard Error (SE). GenAlEx was also used to calculate the mean allelic patterns across the populations with the above-mentioned parameters. Since our study was confined to a small number of samples, pairwise Nei's Genetic Distance, Nei's Genetic identity, and corresponding unbiased Nei's Genetic Distance and Nei's Genetic identity values were calculated across populations 83,84 . Also, Chi-Square Tests for Hardy-Weinberg Equilibrium were implemented in GenAlEx v.6.5. The Polymorphism Information Content (PIC) values of individual primers at all loci were calculated based on the formula PIC = 2 × F (1 − F) 85 .
An analysis of population structure was done using the software STRU CTU RE v.2.3.4 (https:// web. stanf ord. edu/ group/ pritc hardl ab/ struc ture_ softw are/ relea se_ versi ons/ v2.3. 4/ html/ struc ture. html) 86 . Population structure was analyzed by setting the number of sub-populations (k-values) from 1 to 10 and each run was repeated five times. The program was set to 300,000 burn-in iterations, followed by 600,000 Markov Chain Monte Carlo (MCMC) replications along with the admixture model. The STRU CTU RE HARVESTER web version v0.6.94 (http:// taylo r0. biolo gy. ucla. edu/ struc tureH arves ter/) 87 was used to derive the appropriate number of sub-populations using a modified delta K (∆K) method 88 .
A study of Marker-Trait Associations (MTAs) was done using genotypic data, trait data, and population structure information using General Linear Models (GLM), and Mixed-Linear Model (MLM) in the software program TASSEL (https:// tassel. bitbu cket. io/) 89 . Kinship matrix was also used in addition to genotypic data, trait data, and population structure information. The kinship matrix was calculated from genotypic data using the software program TASSEL.
Analysis of ITS1-ITS4 Sanger sequencing data. The Sanger sequences of ITS1 and ITS4 were manually checked for sequencing errors. The sequences were aligned using the online ClustalW tool (http:// www. genome. jp/ tools-bin/ clust alw) and phylogenetic reconstructions were performed using the function 'build' in ETE3 v.3.1.1 90 , as implemented on the GenomeNet (http:// www. genome. jp/ tools/ ete/). The tree was constructed using the function 'fasttree' with slow NNI and MLACC = 3 (to make the maximum-likelihood NNIs more exhaustive) 91 . The values at nodes are SH-like local support. To compare these tree results for validity and reliability, MEGA 6 (Build#: 6140226; https:// www. megas oftwa re. net/) 92 was also used for tree construction. The evolutionary history was inferred using the UPGMA method 93 and Neighbor-Joining method 94 . The tree is drawn to scale, with branch lengths in the same units as those of the evolutionary distances used to infer the phylogenetic tree. The evolutionary distances were computed using the Maximum Composite Likelihood method 95 and are in the units of the number of base substitutions per site. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (1000 replicates) are shown next to the branches 96 . Codon positions included were 1st + 2nd + 3rd + Noncoding. All positions containing gaps and missing data were eliminated. There were a total of 528 and 537 positions for ITS1 and ITS4 respectively in the final dataset. Also, the