The influence of breeding history, origin and growth type on population structure of barley as revealed by SSR markers

Natural and mass selection during domestication and cultivation favored particular traits of interest in barley. In the present study, population structure and genetic relationships among 144 accessions of barley landraces and breeding materials from various countries were studied using a set of 77 EST-SSR and 72 gSSR markers distributed on the seven chromosomes of barley. In total, 262 and 429 alleles were amplified at the 77 EST-SSR and 72 gSSR loci, respectively. Of these, 185 private/group-specific alleles were identified in the landraces compared with 14 in "cultivars and advanced breeding lines", indicating the possibility of introgressing favorable alleles from landraces into breeding materials. Comparative analysis of genetic variation among breeding materials, Iranian landraces, and exotic landraces revealed higher genetic diversity in Iranian landraces compared with the others. A total of 37, 15, and 14 private/group-specific alleles were identified in Iranian landraces, exotic landraces, and breeding materials, respectively. The most likely number of groups for the 144 barley genotypes was three, as inferred using model- and distance-based clustering as well as principal coordinate analysis, which assigned the landraces and breeding materials into separate groups. The distribution of alleles was found to be correlated with population structure, domestication history and eco-geographical factors. The high allelic richness in the studied set of barley genotypes provides insights into the available diversity and allows the construction of core groups based on maximizing allelic diversity for use in barley breeding programs.

Genetic diversity across origins, growth habits and number of ear rows groups. To compare genetic variation among groups based on gSSR and EST-SSR data, the barley genotypes were grouped based on (i) their origins and breeding history, (ii) growth habits and (iii) the number of ear rows. Genetic diversity parameters for these groups are presented in Table 4. The mean number of different alleles (Na), effective number of alleles (Ne), Shannon's index (I), unbiased expected heterozygosity (uHe) and number of private alleles (Npa) calculated based on the allelic variation of 77 EST-SSRs were slightly higher for Iranian landraces (3.29, 2.18, 0.83, 0.47 and 5, respectively) compared with exotic landraces (3.20, 2.10, 0.80, 0.46 and 4, respectively), but the differences were not significant (Wilcoxon test, p-value = 0.10). However, the means of all parameters were lower for "varieties and advanced breeding lines" (2.71, 1.86, 0.66, 0.39 and 2, respectively) compared with the landraces. Comparison of the parameters calculated based on the 72 gSSR data showed that the Iranian landraces and exotic landraces were equivalent in mean values of Na, Ne, I and uHe, but the number of private alleles in Iranian landraces (32 alleles) was significantly higher compared with exotic landraces (11 alleles) (Wilcoxon test, p-value ≤ 0.001). For this marker set, "varieties and advanced breeding lines" showed the lowest mean values of Na (4.12), Ne (2.73), I (1.05) and uHe (0.56); the exception was Npa, which was 12 compared with 11 for exotic landraces.
The mean values of Na, Ne, I and uHe estimated based on EST-SSR data were slightly higher in winter growth habit genotypes (3.31, 2.25, 0.85 and 0.49, respectively) than in spring growth habit genotypes (3.19, 1.92, 0.72 and 0.41, respectively), and significantly higher than in facultative genotypes (2.42, 1.77, 0.59 and 0.36, respectively) (Wilcoxon, p-values ≤ 0.01). A total of 14, 3 and zero EST-SSR private alleles were detected in winter, spring and facultative growth habits, respectively. Higher diversity was observed among the genotypes with various growth habits using gSSR markers compared with EST-SSRs. The mean values of Na, Ne, I, uHe and Npa in winter growth habit genotypes were 5.64, 3.85, 1.32, 0.65, and 44, respectively, compared with 5.25, 3.10, 1.12, 0.55 and 17, respectively, in spring growth habit genotypes and 3.39, 2.36, 0.86, 0.48 and zero, respectively, in facultative genotypes.
Considering ear row number, mean Na, Ne, I, uHe and Npa of EST-SSRs in two-row genotypes were 3.21, 1.94, 0.72, 0.411 and 8, respectively, compared with 3.29, 2.22, 0.84, 0.48 and 14, respectively, in six-row genotypes. The means of these parameters for gSSRs were 5.37, 3.06, 1.10, 0.54 and 13, respectively, in two-row barley and 5.58, 3.77, 1.31, 0.65 and 41, respectively, in six-row barley. The analysis revealed slightly higher genetic diversity in six-row genotypes compared with two-row genotypes, but the difference was only significant for Npa. However, allelic diversity calculated based on gSSR data was higher in both groups compared with EST-SSRs.
Table 2. Description of the used EST-SSR loci including chromosome location (Ch), major allele frequency (MAF), number of alleles (Na), effective number of alleles (Ne), Shannon's information index (I), gene diversity (He) and polymorphic information content (PIC).
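The private-allele counts (Npa) reported for the groups above can be illustrated with a minimal sketch. It assumes that a private allele is one observed in exactly one group at a given locus; the group names, locus names and allele sizes below are hypothetical and are not data from this study.

```python
def private_alleles(groups):
    """Return, for each group, the set of (locus, allele) pairs seen only in that group.

    groups: dict mapping group name -> dict mapping locus name -> set of alleles.
    """
    result = {}
    for name, loci in groups.items():
        private = set()
        for locus, alleles in loci.items():
            # Collect every allele observed at this locus in all other groups.
            others = set()
            for other_name, other_loci in groups.items():
                if other_name != name:
                    others |= other_loci.get(locus, set())
            private |= {(locus, allele) for allele in alleles - others}
        result[name] = private
    return result


# Hypothetical fragment sizes (bp) at two made-up loci in three made-up groups.
example_groups = {
    "group1": {"locusA": {150, 152, 154}, "locusB": {200}},
    "group2": {"locusA": {152}, "locusB": {200, 204}},
    "group3": {"locusA": {152, 154}, "locusB": {200}},
}
npa = {name: len(pairs) for name, pairs in private_alleles(example_groups).items()}
print(npa)
```

Counting (locus, allele) pairs rather than bare allele sizes keeps alleles of the same size at different loci distinct.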
Inferring the population structure. In the model-based clustering using gSSR, EST-SSR, and combined data sets, the log-likelihood value [LnP(D)] increased continuously as K changed from 1 to 10, but an inflection was evident when K increased from 2 to 3 (Figs. 1c, 2c, 3c). Thus, the most likely value of K was 3. The optimal number of clusters (K) was further validated using the second-order statistic ΔK. The ΔK value showed a peak at K = 3 (Figs. 1b, 2b, 3b), which supported the classification of the studied populations into three major sub-populations corresponding to Iranian landraces, exotic landraces and "varieties and advanced breeding lines" (Figs. 1d, 2d, 3d). Considering a probability of membership threshold of 70%, Iranian landraces, exotic landraces, and "varieties and advanced breeding lines" were, with few exceptions, assigned into three distinct sub-populations. The distance-based cluster analysis using the Neighbor-Joining algorithm and based on gSSR, EST-SSR, and combined data sets assigned the 144 barley genotypes into three clusters, which were fully in agreement with the results of model-based clustering implemented in STRUCTURE (Figs. 1a, 2a, 3a). In the resulting phylogenetic trees, the Iranian landraces were grouped along with two landraces from Egypt and one from Spain. All the varieties and advanced breeding lines formed a separate group and the exotic landraces were assigned into a distinct cluster.
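The 70% membership rule mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the membership coefficients are hypothetical, and treating genotypes below the threshold as unassigned (returned as None) is an assumption of this sketch rather than a rule stated in the text.

```python
def assign_clusters(q_matrix, threshold=0.70):
    """Assign each genotype to its most probable cluster (1-based index).

    q_matrix: list of rows, one per genotype, each holding membership
    coefficients that sum to 1 across the K clusters.
    Genotypes whose highest coefficient is below the threshold get None.
    """
    labels = []
    for q in q_matrix:
        best = max(range(len(q)), key=q.__getitem__)
        labels.append(best + 1 if q[best] >= threshold else None)
    return labels


# Hypothetical membership coefficients for three genotypes at K = 3.
example_q = [
    [0.90, 0.05, 0.05],
    [0.40, 0.35, 0.25],
    [0.10, 0.20, 0.70],
]
print(assign_clusters(example_q))
```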
The PCoA based on origin and breeding history of the barley genotypes using EST-SSR data could separate Iranian landraces, exotic landraces, and "cultivars and advanced breeding lines" into distinct groups. The 1st and 2nd coordinates explained 18.47 and 15.73% of the molecular variation, respectively. The first coordinate discriminated Iranian landraces from the other genotypes (Fig. 4a). In PCoA using gSSR data, 16.42 and 13.06% of the molecular variation were explained by the 1st and 2nd coordinates, respectively. Plotting the genotypes on the first two coordinates revealed some admixture between Iranian and exotic landraces (Fig. 4b). When PCoA was performed using EST-SSR + gSSR data, the 1st and 2nd coordinates explained 16.58 and 15.79% of the variation, respectively, and the biplot of the genotypes on the 1st and 2nd coordinates could discriminate all three groups (Fig. 4c). The principal coordinate analysis was also conducted to discriminate genotypes based on their growth habits (spring, winter, and facultative). Using EST-SSR data, the 1st coordinate, explaining 37.97% of the genetic variation, distinguished the winter habit from the spring and facultative types, while on the 2nd coordinate, accounting for 25.83% of the variation, high admixture was observed among all growth habits (Fig. 5a). Analysis using gSSR data and the combination of EST-SSR and gSSR data revealed better discrimination of winter, spring and facultative genotypes. In both analyses, the 1st coordinate could separate winter types from spring and facultative genotypes, and spring and facultative types were clustered into two distinct groups (Fig. 5b,c). Analysis of population structure between two- and six-row genotypes using PCoA based on EST-SSR, gSSR and EST-SSR + gSSR data showed discrimination of the two groups in all analyses. The 1st coordinate, accounting for 37.98, 39.35 and 40.89% of the molecular variation of EST-SSR, gSSR and EST-SSR + gSSR data, respectively, could distinguish two- and six-row barley (Fig. 6a-c).
[Figure caption, partial] Neighbor-Joining and model-based cluster analyses could assign "Iranian landraces", "exotic landraces" and "varieties and advanced breeding lines" into separate groups.
The allelic patterns of EST-SSR and gSSR markers across barley genotypes grouped based on origin and breeding history, growth habit, and number of ear rows are presented in Table 5. The numbers of different alleles per locus (Na) in Iranian landraces, exotic landraces and "cultivars and advanced breeding lines" were 3.29, 3.20 and 2.71 for EST-SSRs and 5.53, 5.17 and 4.12 for gSSRs, respectively. The effective number of alleles (Ne) was also higher for gSSR markers (4.07, 3.88 and 2.73) compared with EST-SSR markers (2.18, 2.10, and 1.86) across Iranian landraces, exotic landraces and "cultivars and advanced breeding lines". Shannon's index (I) and unbiased expected heterozygosity (uHe) ranged from 0.66 to 0.88 and 0.39 to 0.47, respectively, for EST-SSR markers and from 1.05 to 1.25 and 0.56 to 0.61 for gSSR markers. The number of private alleles was higher in Iranian landraces for both markers (5 and 32 for EST-SSR and gSSR, respectively) compared with exotic landraces (4 and 11 for EST-SSR and gSSR, respectively) and "cultivars and advanced breeding lines" (2 and 12 for EST-SSR and gSSR, respectively). EST-SSR and gSSR diversity within spring barley as revealed by Na, Ne, I and uHe did not differ significantly from that of the winter type but was higher compared with facultative genotypes. A higher number of private alleles was detected in winter genotypes (14 and 44 for EST-SSR and gSSR, respectively) compared with spring genotypes (3 and 17 for EST-SSR and gSSR, respectively) and facultative genotypes (no private allele). Genetic diversity based on ear row type revealed no major differences in the number of different alleles per locus and the effective number of alleles found in the two-row genotypes (EST-SSR: 3

Discussion
Knowledge of population structure and genetic relationships among genotypes is a prerequisite for plant breeding programs as well as for emerging genome-wide association studies (GWAS), an alternative approach to QTL detection compared with linkage map-based QTL analysis 23 . Besides, to sustain future production under changing environments, greater genetic diversity than is present in current elite germplasm will be needed 24 . Fortunately, an extensive reservoir of biodiversity has been stored in genebanks as seeds of historical breeding materials, locally adapted landraces, and crop wild relatives, which could be used to enrich cultivated gene pools. Although the ease of mobilization of favorable alleles into breeding materials is inversely related to the degree of adaptation, advances in genomics and molecular breeding technologies can accelerate the use of exotic germplasm for crop improvement 25 . However, access to novel allelic combinations available in the genebank collections requires thoughtful and renewed genotypic and phenotypic characterization of the materials 6 .

Allelic variation and genetic diversity.
In the present study, a panel of 144 barley genotypes (80 two-rowed and 64 six-rowed) including 69 landraces from various regions of Iran and 50 landraces from other countries, mainly China and Egypt, along with 9 advanced breeding lines and 16 commercial cultivars was genotyped using 149 SSR markers. This set of SSR markers amplified 691 alleles in the genotypes under study and could provide a reasonable genotyping tool owing to its relatively high allele number, informativeness, and genomic coverage. The mean number of alleles/locus (4.64), gene diversity (0.55) and PIC (0.50) indicate a relatively high level of genetic variation in our study compared with some previous studies. Averages of 3.20, 4.32, and 8.10 alleles/locus and mean PIC values of 0.38, 0.52, and 0.60 were reported in analyses of diverse germplasm including wild and cultivated barley using SSR markers [26][27][28] .
The number of alleles/locus and PIC values have been suggested as criteria to assess the level of genetic diversity in germplasm collections 29 . The presence of 17 SSR loci with a PIC value of 0.80 or higher and a high number of alleles/locus could provide unique fingerprints for the studied barley genotypes and could be used for rapid analysis of genetic diversity and population structure in barley germplasm collections. The total of 691 polymorphic alleles detected in our study was enough to assess genetic relationships among barley landraces and cultivars. Previous work concluded that 350-550 alleles are enough for an objective assessment of genetic relationships among wheat accessions, and this could be applicable to barley accessions as well 30,31 .
Overall genetic diversity in our study, considering all types of materials, was slightly lower compared with some studies carried out using barley landraces 18,28 . In our study, significant differences were observed between landraces and improved genotypes in number of alleles, number of effective alleles, number of private alleles, and gene diversity. In particular, the number of private alleles was 185 for the landraces but only 14 for cultivars and advanced breeding lines, suggesting that modern breeding programs have reduced the level of genetic variation in breeding materials. Although the number of landraces (119) was higher than that of cultivars and advanced breeding lines (25), the low level of genetic variation in the modern cultivars and breeding materials indicates the need to introduce new alleles from sources such as landraces and wild relatives to tackle climate change through plant breeding and better use of plant genetic resources. Compared with exotic landraces and "cultivars and advanced breeding lines", Iranian landraces had a higher level of gene diversity as revealed by Na, Ne, He, and number of private alleles. Significant differentiation assessed by FST (P = 0.001) was observed among the three groups. The number of private alleles in Iranian landraces (37 alleles at 29 loci) was also higher compared with exotic landraces (15 alleles at 14 loci) and "cultivars and advanced breeding lines" (14 alleles at 11 loci). Although Iranian landraces comprised more individuals than exotic landraces and "cultivars and advanced lines" (69 vs 50 and 25), the higher number of private alleles was consistent with the higher genetic variation in the Iranian landraces as revealed by number of alleles, number of effective alleles, gene diversity and Shannon's index (Table 2).
The higher genetic diversity in Iranian landraces could be due to the fact that a second domestication event may have occurred in barley, possibly in Central Asia at the eastern edge of the Iranian Plateau, and that this separate origin may have been the progenitor of present-day barleys found in East and South Asia 32,33 .
Informative markers. Among the SSR loci with specific alleles in Iranian barley landraces, the following markers were associated with QTLs for malting quality, disease resistance, agronomic and physiological traits as reported by previous studies: Bmac209 (barley scald) 34 ; Bmag345 (malting quality: hot water extract, diastatic power, alpha-amylase activity and free alpha-amino acid) 35 ; Bmag382 (malting quality: hot water extract, grain protein content) 36 ; Bmag518 (flag leaf physiological traits (intercellular CO2 concentration) 37 , chlorophyll content 38 , septoria speckled leaf blotch 39 , net type net blotch 39 and days to maturity 41 ); Bmag751 (carbon isotope discrimination and grain yield 41 ); EBmac541 (dehydration gene for drought tolerance 42 ); GBM1482 (lodging and lodging components 43 ); GBMS183 (plant height and spike number per plant); GBMS141 (endosperm hardness 40 ); GMS89 (grain yield 26 ); HVM40 (nitrate accumulation 44 , days to heading 26 ) and HVM74 (major grain protein, soluble α-NH2 and grain protein concentration 44 ). Assessment of group-specific alleles across the genome in plant germplasm should identify both the regions of the genome that should be conserved and the regions where there are opportunities to introgress new allelic diversity into the breeding materials without disrupting desirable gene complexes.
Population structure. We employed various statistical approaches to ensure the reliability of the inferences made regarding population structure in the collection.
The results of model-based clustering were corroborated by the distance-based cluster analysis (Figs. 1, 2, 3) and PCoA (Figs. 4, 5, 6): the dendrograms resulting from NJ clustering were mostly in accord with the groups inferred by model-based clustering. In the PCoA biplots, some of the inferred groups overlapped, blurring the distinction among them; in particular, clusters I and III and clusters II and IV showed some overlap. The row type in barley is an important determinant of population structure 45 . In our study, with K = 2, two- and six-rowed genotypes were assigned into distinct sub-populations. With an optimal number of clusters K = 4, two- and six-rowed genotypes were grouped separately, except in cluster III. Clusters I, II, and IV consisted of 89% six-row, 88% two-row, and 74% two-row accessions, respectively. Cluster III comprised 45% two-row and 55% six-row accessions. Our observations showed that Iranian barley landraces differ from the exotic landraces and mostly grouped separately. The different evolutionary and domestication history of barley from Iran offers a plausible explanation for the observed differences 32,33 .
To further explore the genetic diversity and relationships among and within the clusters, allelic patterns and various diversity statistics were assessed for each of the clusters inferred by STRUCTURE (Table 2). Cluster III (a relatively mixed group of two- and six-row accessions consisting of cultivars, advanced breeding lines and some landraces from various countries), which had the highest number of members among the clusters (42), had the maximum mean number of alleles, number of effective alleles, gene diversity and Shannon index (4.29, 3.09, 0.57 and 1.09, respectively). The number of group-specific rare alleles detected in cluster III was also high compared with the others, emphasizing the presence of higher diversity in this group. A total of 52 group-specific alleles at 36 loci were identified in cluster III. Cluster I had 19 group-specific alleles at 18 gSSR loci, cluster IV had one, and no specific alleles were detected for cluster II. The higher heterogeneity in cluster III may be due to eco-geographical diversity and multiple barley domestication sites apart from the Fertile Crescent. A comparison of the genetic structure of Western and Eastern cultivated barleys proposed a secondary domestication site of barley somewhere 1500-3000 km east of the Fertile Crescent and found greater allelic differences between these groups 32 .

Conclusion
In the present study, we utilized microsatellite markers to assess the effect of breeding history, origin, and growth type on genetic diversity and population structure of barley genotypes. The assignment of Iranian landraces, exotic landraces and "varieties and advanced breeding lines" into separate groups based on both gSSR and EST-SSR data could be due to their different evolutionary, domestication, and breeding histories. The higher genetic diversity in Iranian landraces, revealed by high allelic diversity and number of private alleles, could be due to the fact that a second domestication event may have occurred in barley, possibly in Central Asia at the eastern edge of the Iranian Plateau. The allelic diversity present in the studied collection, especially in Iranian landraces, was found to be correlated with population structure, domestication, and eco-geographical factors. The high allelic richness in the studied set of barley genotypes, as revealed by genetic and statistical analyses, provides insights into the available diversity and allows the construction of core groups based on maximizing allelic diversity for use in Iranian barley breeding programs. The Iranian landrace panel comprised lines that were well distributed across all eco-geographical regions with different climatic regimes, which could be used for introgression of favorable alleles into breeding lines.

Materials and methods
Plant materials. A set of 144 barley genotypes (80 two-rowed and 64 six-rowed) including 69 landraces from various regions of Iran, 23 from China, 12 from Egypt and 15 from the USA, England, India, Pakistan and Algeria along with 9 advanced breeding lines and 16 commercial varieties was used. Of the 144 genotypes, 61, 69, and 14 had spring, winter, and facultative growth habits, respectively. The majority of six-row and two-row barley genotypes were winter types (88.8% and 70.0%, respectively) (Table 5).
Genomic and EST-SSR genotyping. Genomic DNA was extracted from bulked fresh leaf samples of each genotype using the CTAB method 46 . Although the genotypes used in our study are homozygous lines, a pool of leaves from 15 plants of each genotype was used for DNA extraction to provide a reliable measure of the possible genetic heterogeneity within each genotype 47,48 . Agarose gel electrophoresis (0.8% w/v) and spectrophotometry were used to examine the quality and quantity of the DNA samples, respectively. Each genotype was analyzed using a set of 77 polymorphic EST-SSR and 72 polymorphic gSSR markers distributed on the seven chromosomes of barley. Bin locations of the markers were adapted from the GrainGenes Map Data Report: Barley, Steptoe × Morex, SSR (https://wheat.pw.usda.gov/cgi-bin/graingenes/report). The PCR reaction mixture was prepared in a final volume of 10 [...] 7 min. A Gel-Scan 3000 electrophoresis system (a real-time laser scanning electrophoresis system, Corbett Co.) based on 4% ultra-thin (0.2 mm) non-denaturing polyacrylamide gel stained with ethidium bromide was used to visualize PCR products.
Statistical analysis. Number of alleles (Na), number of effective alleles (Ne), major allele frequency (MAF), Shannon's index (I), gene diversity (He) and polymorphic information content (PIC) were determined for all the analyzed markers across the total population as well as landraces and "varieties and advanced breeding lines" using PowerMarker version 3.25 49 and GenAlEx 6.5 software 50 . PIC values were determined as $\mathrm{PIC} = 1 - \sum_{i=1}^{n} p_i^2 - \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} 2 p_i^2 p_j^2$ 51 . Gene diversity was calculated as $\mathrm{He} = 1 - \sum_{i=1}^{n} p_i^2$ 52 . Shannon's information index was estimated as $I = -\sum_{i=1}^{n} p_i \ln p_i$, where $p_i$ and $p_j$ are the frequencies of the ith and jth alleles of a given locus, respectively, and n is the number of alleles at that locus.
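As a minimal sketch of how the per-locus statistics defined above can be computed from allele frequencies, the function below evaluates He, PIC and Shannon's index directly from their formulas. The effective number of alleles is taken here as Ne = 1/Σp_i², which is the usual definition but is not spelled out in the text, so it should be read as an assumption; the example frequencies are hypothetical. The study itself used PowerMarker and GenAlEx, not this code.

```python
import math


def locus_stats(freqs):
    """Diversity statistics for one locus from its allele frequencies."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    n = len(freqs)
    sum_sq = sum(p * p for p in freqs)
    he = 1.0 - sum_sq  # gene diversity: He = 1 - sum(p_i^2)
    # Cross term of PIC: sum over all allele pairs i < j of 2 * p_i^2 * p_j^2.
    cross = sum(
        2.0 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(n - 1)
        for j in range(i + 1, n)
    )
    return {
        "Na": sum(1 for p in freqs if p > 0),
        "MAF": max(freqs),
        "Ne": 1.0 / sum_sq,  # assumed definition: Ne = 1 / sum(p_i^2)
        "I": -sum(p * math.log(p) for p in freqs if p > 0),
        "He": he,
        "PIC": he - cross,
    }


# Hypothetical locus with two equally frequent alleles.
stats = locus_stats([0.5, 0.5])
print(stats)
```

For two equally frequent alleles this gives He = 0.5 and PIC = 0.375, which shows why PIC is always somewhat lower than He at the same locus.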
Population structure in the collection of 144 barley genotypes including Iranian and exotic landraces, advanced breeding lines, and cultivars was assessed by two statistical methods. The model-based clustering implemented in STRUCTURE 2.3.4 53 was performed by running 100,000 Markov chain Monte Carlo (MCMC) iterations after a burn-in of 100,000 replicates with 5 independent runs per K ranging from 1 to 10, using the admixture model with correlated allele frequencies, which estimates the fractions of individual genomes that belong to different ancestry groups. The optimal number of clusters (K) was determined by calculating the mean posterior probability for each K value, LnP(D), which is based on the estimated maximum log-likelihood values 53 . The ΔK values (the rate of change in the log probability of data between successive K values) were also calculated as suggested by 54 using STRUCTURE HARVESTER, version 0.6.94 55 .
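The ΔK criterion described above can be sketched as follows. This illustration assumes the commonly used form ΔK = |L(K+1) − 2L(K) + L(K−1)| / sd[L(K)], where L(K) is the mean LnP(D) over replicate runs and sd is the standard deviation across those runs; the LnP(D) values are hypothetical, and the study obtained ΔK from STRUCTURE HARVESTER rather than from this code.

```python
from statistics import mean, stdev


def delta_k(lnp):
    """Compute delta K for every K that has both neighbours K-1 and K+1.

    lnp: dict mapping K -> list of LnP(D) values from replicate runs.
    """
    result = {}
    for k in sorted(lnp):
        if k - 1 not in lnp or k + 1 not in lnp:
            continue  # delta K is undefined at the ends of the K range
        sd = stdev(lnp[k])
        if sd == 0:
            continue  # avoid division by zero when runs are identical
        second_order = abs(mean(lnp[k + 1]) - 2 * mean(lnp[k]) + mean(lnp[k - 1]))
        result[k] = second_order / sd
    return result


# Hypothetical LnP(D) values, two replicate runs per K.
example_lnp = {
    1: [-100.0, -102.0],
    2: [-80.0, -82.0],
    3: [-50.0, -52.0],
    4: [-45.0, -47.0],
}
dk = delta_k(example_lnp)
best_k = max(dk, key=dk.get)
print(dk, best_k)
```

Because ΔK needs both neighbours, it cannot be evaluated at the smallest or largest K tested, which is why it is read together with the LnP(D) curve.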
We used principal coordinate analysis (PCoA) to assess genetic differentiation between two- and six-row genotypes, among spring, winter, and facultative genotypes, and among Iranian landraces, exotic landraces, and "varieties and advanced breeding lines". The PCoA was performed in GenAlEx 6.501 software 27 . Finally, the unrooted neighbor-joining (N-J) clustering algorithm under the Reynolds (1983) distance coefficient was applied using the software PowerMarker 3.25 26 to investigate the relationships among barley accessions.
A hierarchical analysis of molecular variance (AMOVA) implemented in GenAlEx 6.501 27 was used to partition the observed molecular variation among and within the three clusters inferred using STRUCTURE and NJ cluster analyses, corresponding to Iranian landraces, exotic landraces and "cultivars and advanced breeding lines". The genetic parameters including the number of different alleles with a frequency ≥ 0.05, the number of effective alleles, Shannon's index (I), and unbiased expected heterozygosity (uHe = (2N/(2N − 1)) × He) were estimated for each cluster, where N is the group size.
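The small-sample correction in uHe and the frequency filter on alleles can be illustrated with a short sketch. The formulas follow the text, with He taken as 1 − Σp_i²; the function names, allele frequencies and group size are hypothetical.

```python
def unbiased_he(freqs, n_individuals):
    """Unbiased expected heterozygosity: uHe = (2N / (2N - 1)) * He."""
    he = 1.0 - sum(p * p for p in freqs)
    two_n = 2 * n_individuals
    return two_n / (two_n - 1) * he


def common_alleles(freqs, threshold=0.05):
    """Number of different alleles with a frequency at or above the threshold."""
    return sum(1 for p in freqs if p >= threshold)


# Hypothetical locus in a hypothetical group of 25 individuals.
print(unbiased_he([0.5, 0.5], 25))
print(common_alleles([0.60, 0.37, 0.03]))
```

The correction factor 2N/(2N − 1) matters most for small groups: for N = 25 it raises He by about 2%, while for N = 69 the increase is under 1%.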