Genetic diversity and population structure of an African yam bean (Sphenostylis stenocarpa) collection from IITA GenBank

African yam bean, AYB (Sphenostylis stenocarpa), is an underutilized legume of tropical Africa. AYB can boost food and nutritional security in sub-Saharan Africa through its nutrient-rich seeds and tubers. However, inadequate information on germplasm with desirable agro-morphological traits, including insufficient genomic data, has prevented the full exploitation of its food and breeding potential. Assessing the genetic diversity and population structure of a species is a prerequisite for its improvement and successful exploitation. The present study evaluated the population structure and genetic diversity of 169 accessions from the International Institute of Tropical Agriculture (IITA) collection using 26 phenotypic characters and 1789 single nucleotide polymorphism (SNP) markers. Both the phenotypic traits and the SNP markers proved useful in uniquely distinguishing each AYB accession. Hierarchical clustering of the phenotypic data grouped the accessions into three sub-populations, and the SNP analysis likewise clustered them into three sub-populations. The genetic differentiation (FST) among the three sub-populations was high (0.14–0.39) and significant at P = 0.001. The combined analysis also revealed three sub-populations: accessions in sub-population 1 were high yielding, while those in sub-population 2 showed a high percentage of polymorphic loci and high heterozygosity. This study provides essential information for the breeding and genetic improvement of AYB.


Results
Phenotypic differentiation and diversity across 169 AYB accessions. Principal component analysis (PCA) revealed the most discriminative phenotypic traits across the 169 accessions. The traits that contributed most to the variation along each PC axis are shown in bold (Table 1). Days to 1st flowering, days to 50% flowering, dry seed matter, petiole length, seed moisture content (SDMC), terminal leaf length (TLL), terminal leaf width (TLW), 100-seed weight, and seed color had high loadings on more than one principal component (PC). The first eight PCs cumulatively explained 68.68% of the total phenotypic variation, with eigenvalues ranging from 1.11 to 4.81. PC1 made the largest contribution (18.48% of the total variation), and nine quantitative traits contributed most to this axis. PC2 accounted for 13.63% of the total variation; two quantitative traits (dry seed matter and seed moisture content) and four qualitative traits (mainstem pigmentation (MASPIG), branch pigmentation (BRAPIG), petiole pigmentation (PETPIG), and seed color) contributed most to this axis. Dry seed matter, petiole length, terminal leaf length, and terminal leaf width were the main contributors to PC3. PC4 accounted for 7.42% of the total variation across the accessions, with days to 1st flowering, days to 50% flowering, and 100-seed weight contributing most. PC5 (seed thickness, 100-seed weight), PC6 (seed variegation, seed color), PC7 (flower color), and PC8 (growth habit) accounted for 5.93%, 5.40%, 5.05%, and 4.25% of the total variation, respectively.
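The percentages above are shares of the total variance (inertia): each axis's eigenvalue divided by the sum of all eigenvalues, with the cumulative figure being a running sum of those shares. A minimal Python sketch of that arithmetic follows; the eigenvalues in it are hypothetical, not the study's values, and it assumes the complete set of eigenvalues is supplied so that their sum equals the total inertia.

```python
def explained_variance(eigenvalues):
    """Per-axis and cumulative percentage of total variance.

    Assumes `eigenvalues` holds every eigenvalue of the analysis, so that
    their sum equals the total variance (inertia).
    """
    total = sum(eigenvalues)
    per_axis = [100.0 * ev / total for ev in eigenvalues]
    cumulative = []
    running = 0.0
    for share in per_axis:
        running += share
        cumulative.append(running)
    return per_axis, cumulative


# Hypothetical eigenvalues, for illustration only.
per_axis, cumulative = explained_variance([4.0, 3.0, 2.0, 1.0])
print([round(p, 2) for p in per_axis])    # [40.0, 30.0, 20.0, 10.0]
print([round(c, 2) for c in cumulative])  # [40.0, 70.0, 90.0, 100.0]
```

Read in reverse, the figures reported for PC1 imply a total inertia of about 4.81 / 0.1848 ≈ 26.0.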
The genetic distance among the accessions based on their phenotypic evaluation ranged from 0.06 to 0.57, with an average of 0.27. The maximum distance was observed between TSs-363 and TSs-446, and the minimum between TSs-445 and 59B. The hierarchical cluster dendrogram grouped the 169 accessions into three major clusters representing three sub-populations (Fig. 1). Sub-population 1 had the highest number of accessions (72), followed by sub-population 2 (61) and sub-population 3 (36). The dendrogram fitted the data well, with a high cophenetic correlation coefficient of 0.89. The sub-population means (Supplementary Table S3) showed that accessions in sub-population 1 produced more grain (66.93 g) than, and differed significantly from, those in sub-population 3 (53.06 g). Likewise, the number of seeds per pod (12.18), pod length (16.77 cm), and seed moisture content (7.00%) of accessions in sub-population 1 were significantly higher than those of sub-population 3, which had a pod length of 15.77 cm, a seed moisture content of 6.58%, and 11.50 seeds per pod. About 54% of the accessions in sub-population 1 were non-pod-shattering, which distinguished this sub-population from sub-population 3.
The correlation among the 10 qualitative traits (Supplementary Fig. S1) was positive for all the qualitative traits evaluated. A strong correlation (0.95) in both the forward and backward directions was observed between main stem pigmentation (MASPIG) and branch pigmentation (BRAPIG). Likewise, a moderate correlation (0.55) was obtained between seed variegation (SEDVAR) and seed color (SEDCOL) in the backward direction. The correlation among the 16 quantitative traits (Supplementary Fig. S2) was statistically significant at P < 0.001 for most of the traits evaluated. Seed moisture content (SDMC) and dry seed matter (DRMAT) showed a highly significant (P < 0.001), perfect negative correlation (−1.00). A highly significant (P < 0.001), strong correlation (0.67) was observed between days to 1st flowering (DISFL) and days to 50% flowering. Moreover, a highly significant, moderate, positive correlation (0.58) was observed between total seed weight (TSDWT) and seed moisture content, whereas total seed weight and dry seed matter showed a highly significant negative correlation (−0.58).
www.nature.com/scientificreports/
Genetic diversity and population structure of AYB accessions. A total of 1789 SNPs from DArTseq were used to study the genetic diversity of the 169 AYB accessions from the IITA collection. The number of effective alleles (Ne) in the population was 1.61, and Shannon's information index (I) was 0.59. The expected heterozygosity (He) and observed heterozygosity (Ho) of the population were 0.35 and 0.15, respectively. Across the 1789 SNPs, the minor allele frequency ranged from 0.05 to 0.5 with an average of 0.22, and the major allele frequency ranged from 0.50 to 0.95 with an average of 0.78 (Table 2). The genetic distance among the accessions based on the identity-by-state (IBS) dissimilarity matrix ranged from 0.004 to 0.41, with an average of 0.29.
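For a biallelic SNP with allele frequencies p and q = 1 − p, the per-locus statistics behind these summaries are He = 1 − (p² + q²), Ne = 1 / (p² + q²), I = −(p ln p + q ln q), and MAF = min(p, q), while Ho is the proportion of heterozygous calls; GenAlEx reports such values averaged over loci. The Python sketch below computes them for a single locus. The 0/1/2 genotype coding (copies of the alternative allele, None for a missing call) is an assumption for illustration, not the DArTseq output format, and the genotypes are hypothetical.

```python
import math


def locus_diversity(genotypes):
    """Diversity statistics for one biallelic SNP.

    genotypes: one call per accession, coded 0/1/2 as the number of copies
    of the alternative allele, or None when the call is missing (assumed coding).
    """
    called = [g for g in genotypes if g is not None]
    n = len(called)
    p = sum(called) / (2 * n)          # alternative-allele frequency
    q = 1.0 - p                        # reference-allele frequency
    homozygosity = p * p + q * q       # sum of squared allele frequencies
    return {
        "maf": min(p, q),
        "he": 1.0 - homozygosity,      # expected heterozygosity
        "ne": 1.0 / homozygosity,      # effective number of alleles
        "I": -sum(f * math.log(f) for f in (p, q) if f > 0),  # Shannon index
        "ho": sum(1 for g in called if g == 1) / n,  # observed heterozygosity
    }


# Hypothetical locus scored on 11 accessions (one missing call).
stats = locus_diversity([0, 0, 0, 0, 0, 1, 1, 1, 2, 2, None])
print({k: round(v, 3) for k, v in stats.items()})
```

Averaging these per-locus values over all filtered SNPs gives genome-wide summaries of the same kind as the Ne, I, He, and Ho reported here.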
The maximum distance (0.41) was observed between accessions TSs-109 and TSs-23C, whereas the minimum distance (0.004) was obtained between TSs-151B and TSs-449. The cophenetic correlation coefficient between the dissimilarity matrix and the dendrogram was 0.73, indicating that the dendrogram represents the pairwise distances reasonably well. The hierarchical cluster dendrogram separated the accessions into three major clusters representing three sub-populations (1, 2, and 3) (Fig. 2). Sub-population 3 had the highest number of accessions (138), followed by sub-population 1 (20), while sub-population 2 had the fewest (11). The ΔK plot from the population structure analysis peaked at K = 2 and K = 3 (Supplementary Fig. S2). Based on the hierarchical cluster dendrogram, K = 3 was selected as optimally describing the population structure, indicating three sub-populations within the 169 accessions (Fig. 2). The assignment of accessions to sub-populations followed the same pattern as the dendrogram clustering (Fig. 2). The STRUCTURE analysis identified 27 admixed individuals in sub-population 3, 3 in sub-population 1, and 2 in sub-population 2. Similarly, principal coordinate analysis (PCoA) based on the pairwise genetic distance matrix split the 169 AYB accessions into three groups representing three sub-populations (Fig. 3). The first PCoA axis explained 5.87% of the variation, while the second and third axes explained 3.98% and 3.28%, respectively (Supplementary Table S4).
Genetic diversity of identified sub-populations.
Accessions in sub-population 3 were relatively genetically diverse, as shown by the number of unique alleles (154), Shannon's information index (0.58 ± 0.004), expected heterozygosity (0.35 ± 0.003), observed heterozygosity (0.17), and percentage of polymorphic loci (100%). Sub-population 2 had the highest number of unique (private) alleles (400), compared with sub-population 1 (0) and sub-population 3 (154), as well as the highest number of effective alleles (1.64 ± 0.011). Sub-population 1 showed low values for all the estimated diversity parameters and was the least diverse, whereas sub-population 3 was the most diverse, followed by sub-population 2 (Table 3). The genetic distances among accessions within each sub-population also revealed considerable genetic diversity. In sub-population 1, distances ranged from 0.004 to 0.314, with a mean of 0.194; the maximum distance was observed between TSs-431 and TSs-47, and the minimum between TSs-151B and TSs-449. In sub-population 2, the distance between TSs-69 and TSs-95 was the highest (0.34), whereas TSs-109 and TSs-89 showed the lowest (0.14); the average distance in this sub-population was 0.28. Accessions in sub-population 3 showed genetic distances ranging from 0.60 to 0.99, with an average of 0.71. TSs-60 and TSs-82 were the most divergent accessions in this sub-population, whereas TSs-166 and TSs-2015-07 were more closely related than any other pair within it.
Expected heterozygosity (He) was higher than observed heterozygosity (Ho) in all sub-populations, namely sub-population 1 (He = 0.23 ± 0.004, Ho = 0.05), sub-population 2 (He = 0.34 ± 0.004, Ho = 0.07), and sub-population 3 (He = 0.35 ± 0.003, Ho = 0.17), indicating inbreeding.
Analysis of molecular variance (AMOVA). The genetic distance matrix was used for analysis of molecular variance (AMOVA). AMOVA of the three sub-populations identified by STRUCTURE revealed that 13% of the total variation was among populations, whereas the remaining 87% was among individuals within populations (Supplementary Fig. S3). Pairwise FST among the three sub-populations ranged from 0.14 to 0.39 and was significant (P = 0.001), while the standardized F'ST ranged from 0.12 to 0.28. The highest differentiation was observed between sub-populations 1 and 2 (0.39). The differentiation between sub-populations 1 and 3 (0.20) was slightly higher than that between sub-populations 2 and 3 (0.18) (Table 4).
Combined analysis of phenotypic and genotypic data. The distance matrix of the combined phenotypic and genotypic data revealed a maximum genetic distance of 0.89, between TSs-446 and TSs-363. The minimum distance, 0.12, was observed between TSs-151B and TSs-87B, and the average distance across the accessions was 0.56. A hierarchical cluster dendrogram generated from the sum of the phenotypic and genotypic distances revealed three clusters representing three sub-populations (Fig. 4). Sub-population 3 had the highest number of accessions (61), followed by sub-population 1 (59) and sub-population 2 (49). The high cophenetic correlation coefficient (0.84) of the combined matrix further confirms the goodness of fit of the combined hierarchical cluster dendrogram.
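Because the combined matrix is an element-wise sum of the phenotypic and genotypic matrices, its mean pairwise distance equals the sum of the two component means, which is consistent with the averages reported here (0.27 + 0.29 = 0.56). Its maximum, by contrast, need not equal the sum of the component maxima (0.57 + 0.41 = 0.98, versus 0.89 observed), because the largest phenotypic and genotypic distances occur between different pairs of accessions. The Python sketch below checks the mean identity on small hypothetical matrices, not on study data.

```python
def combined_distance(d_phen, d_gen):
    """Element-wise sum of two square distance matrices over the same accessions."""
    return [[a + b for a, b in zip(row_p, row_g)] for row_p, row_g in zip(d_phen, d_gen)]


def mean_offdiagonal(d):
    """Mean of the upper-triangle (pairwise) entries of a square distance matrix."""
    n = len(d)
    values = [d[i][j] for i in range(n) for j in range(i + 1, n)]
    return sum(values) / len(values)


# Hypothetical 3 x 3 matrices, for illustration only.
d_phen = [[0.0, 0.2, 0.4], [0.2, 0.0, 0.3], [0.4, 0.3, 0.0]]
d_gen = [[0.0, 0.1, 0.3], [0.1, 0.0, 0.2], [0.3, 0.2, 0.0]]
d_comb = combined_distance(d_phen, d_gen)
print(round(mean_offdiagonal(d_comb), 3))                            # 0.5
print(round(mean_offdiagonal(d_phen) + mean_offdiagonal(d_gen), 3))  # 0.5
```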
The grouping of accessions based on phenotypic, genotypic, and combined (phenotypic and genotypic) analyses showed that most accessions remained together in a cluster across the different dendrograms. Comparing the dendrogram drawn with the phenotypic data to the dendrogram […]
Table 3. Mean allelic patterns across three sub-populations. Ne, number of effective alleles; I, Shannon's information index; He, expected heterozygosity; Ho, observed heterozygosity.
The Mantel test revealed a low correlation (r = 0.02) between the dissimilarity matrices of the phenotypic and genotypic data, whereas the correlation between the genotypic and combined matrices (r = 0.22) suggests a moderate association. A high positive association (r = 0.96) was observed between the combined and phenotypic matrices (Supplementary Table S5; Supplementary Fig. S5). Mean comparisons of the three sub-populations generated from the combined (phenotypic + genotypic) analysis showed that accessions in sub-population 1 reached 50% flowering (117.79 days) earlier than accessions in sub-population 2 (121.70 days) and sub-population 3 (118.26 days), and differed significantly from sub-population 2. Accessions in sub-population 3 germinated earlier (12.32 days) than, and differed significantly from, those in sub-population 2 (12.84 days). Furthermore, accessions in sub-population 1 yielded more seed (67.31 g) than those in sub-populations 2 (56.28 g) and 3 (61.14 g), and the difference from sub-population 2 was significant. Across the three sub-populations, accessions in sub-population 2 showed more diversity in flower color (2.02) and differed significantly from those in sub-population 3 (1.92). Diversity in seed color was also more prominent in sub-population 2 than in sub-populations 1 and 3.
Moreover, nearly half (49%) of the accessions in sub-population 3 showed no seed variegation, and this sub-population differed significantly from sub-populations 1 and 2. Similarly, 33% of the accessions in sub-population 3 exhibited pod shattering, again differing significantly from sub-populations 1 and 2. Although the diversity parameters of the genotypic data varied across sub-populations, the heterozygosity estimates showed that sub-population 2 was more diverse than the other sub-populations; the SNP markers also showed 100% polymorphic loci in sub-population 2, followed closely by sub-population 3 (99.94%) and sub-population 1 (99.27%) (Table 5).

Discussion
Despite the food and nutrition potential of AYB, farmers' interest in cultivating the crop is perceived to be dwindling 10,12 ; this lack of interest could be linked to identified limitations, including a prolonged cooking time of about 6–24 h, an abundance of anti-nutritional factors in the seeds, and a long maturity cycle of about 9–10 months. Understanding the population structure and identifying genetic variation within the crop's germplasm can facilitate its improvement 18 . Phenotypic and molecular methods are widely used for genetic studies in plant species 18,37 , and neither is superior to the other 29 ; they can therefore be used independently or in a complementary manner 31 . The present study used phenotypic traits, DArTseq-derived SNPs, and a combined approach to study the genetic diversity and population structure of selected AYB germplasm. The value of PCA in studying the extent and pattern of variation across populations has been documented by Sharma et al. 38 and Nadeem et al. 15 . Previous characterization studies in AYB likewise reported the relevance of phenotypic traits for understanding genetic diversity in the crop 17,32 . In the present study, the phenotypic analysis indicated substantial diversity among the accessions, with PC1 to PC8 accounting for 68.68% of the phenotypic variability. In particular, days to 1st flowering, days to 50% flowering, dry seed matter, petiole length, 100-seed weight, and seed color contributed strongly to the observed variation, as shown by their loadings and their contribution to more than one PC axis. These traits can therefore be used to assess diversity in AYB collections efficiently. A phenotypic genetic distance range of 0.06–0.57 was observed in the present study, and the accessions clustered into three sub-populations. In a similar study using phenotypic traits, Aina et al. 17 obtained distances of 0.0003–0.59 across 50 AYB accessions sourced from IITA.
The variation across the means of phenotypic traits observed in our study, e.g., days to 1st flowering (95.31–98.67 days), days to 50% flowering (117.17–124.33 days), and total seed weight (53.06–66.39 g), indicates the existing diversity in the crop. However, our mean values for days to 1st flowering and days to 50% flowering differ from earlier findings: in a phenotypic evaluation of 16 AYB accessions grown in Nigeria, days to 1st flowering ranged from 139.40 to 159.21 days 35 . Aina et al. 17 obtained mean values between 65.00 and 97.00 days for days to 50% flowering in 50 accessions characterized in Nigeria, and Ojuederie et al. 32 reported days to 50% flowering between 97.50 and 115.83 days across 40 accessions evaluated in Nigeria. These differences between our findings and previous studies could be due to differences in environmental conditions and sample size.
In addition, the correlation analysis of the 26 phenotypic traits in the present study showed significant associations among most traits; for instance, days to 1st flowering showed a significant positive correlation with days to 50% flowering (0.67), which is promising for breeding for early maturity. The availability of accessions that mature in less than 9–10 months could encourage farmers to cultivate the crop. Seed moisture content correlated positively with total seed weight (0.58), showing the importance of this trait in assessing seed yield. Positive correlations between seed weight and other characteristics were also reported in earlier studies 14,34,35 . Accessions in sub-population 1, including TSs-2015-07, TSs-1, TSs-12, TSs-10, and TSs-109, characterized by fewer days to 50% flowering (117.17), could be choice materials for breeding for early maturity in the crop. Sub-population 1 was likewise associated with high seed yield (66.93 g) and number of seeds per pod (12.18) and could therefore be exploited for improving seed yield. The selection of such materials has been recommended as an important improvement strategy for the crop 39,40 . Non-shattering accessions in sub-population 1 could also be useful in breeding for reduced pod shattering; consistent with our findings, TSs-1 and TSs-12 were previously identified as non-pod-shattering accessions 39 . Furthermore, improved cultivars could be developed by hybridizing the distantly related accessions identified in this study by phenotypic (TSs-363 and TSs-446) and genotypic (TSs-431 and TSs-47) analyses. Past genetic diversity studies in AYB using AFLP, RAPD, ISSR, and SSR markers transferred from cowpea reported considerable diversity in the crop [33][34][35] .
Of the three sub-populations, sub-population 3 was the most genetically diverse, followed closely by sub-population 2 and then sub-population 1, as indicated by their expected heterozygosity (He), Shannon's information index (I), and percentage of polymorphic loci. In all three sub-populations, the observed heterozygosity was lower than the expected heterozygosity, which can be attributed to non-random mating among individuals and suggests inbreeding. This finding could be explained by the high rate of self-pollination in AYB 2,3 . The SNP-based approaches implemented in the present study (STRUCTURE, hierarchical cluster dendrogram, PCoA, and AMOVA) consistently supported three sub-populations across the 169 AYB accessions. Such consistency in clustering patterns agrees with reports in Camelina 41 , rice 42 , and cowpea 43 . The genetic differentiation among the three sub-populations was significant (P = 0.001), and the pairwise fixation index (FST) ranged from 0.14 to 0.39, indicating moderate to high genetic differentiation 42,44 . Therefore, accessions from different sub-populations could be crossed and tested for heterosis.
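FST expresses how much of the total gene diversity is due to allele-frequency differences among sub-populations. The pairwise values reported here come from the AMOVA in GenAlEx; the Python sketch below instead uses Nei's GST, FST = (HT − HS)/HT, a simpler and different estimator swapped in purely to illustrate the idea, so it would not reproduce the study's numbers. The allele frequencies are hypothetical and equal weighting of sub-populations is assumed.

```python
def _diversities(freqs):
    """Mean within-population (H_S) and pooled (H_T) gene diversity at one locus."""
    k = len(freqs)
    h_s = sum(2 * p * (1 - p) for p in freqs) / k
    p_bar = sum(freqs) / k
    h_t = 2 * p_bar * (1 - p_bar)
    return h_s, h_t


def nei_fst(freqs):
    """Nei's G_ST for one biallelic locus; freqs = one allele's frequency per sub-population."""
    h_s, h_t = _diversities(freqs)
    return (h_t - h_s) / h_t if h_t > 0 else 0.0


def multilocus_fst(loci):
    """Ratio of sums across loci: sum(H_T - H_S) / sum(H_T)."""
    pairs = [_diversities(freqs) for freqs in loci]
    num = sum(h_t - h_s for h_s, h_t in pairs)
    den = sum(h_t for _, h_t in pairs)
    return num / den


print(round(nei_fst([0.2, 0.8]), 2))                       # 0.36
print(round(multilocus_fst([[0.2, 0.8], [0.5, 0.5]]), 2))  # 0.18
```

Identical allele frequencies give FST = 0, and larger frequency differences between sub-populations push it towards 1.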
In the present study, the combined genetic distance generated from phenotypic and genotypic data also indicated three sub-populations. The high cophenetic correlation coefficients (≥ 0.7) observed for the three distance matrices used to construct the hierarchical cluster dendrograms show that each dendrogram fits its data well and that distortion is minimal. Subjectively, the degree of fit is interpreted as very good for r ≥ 0.9 and very poor for r < 0.7 45,46 . The Mantel test 47 showed a low correlation between the phenotypic and genotypic distance matrices, similar to findings reported in diversity analyses of pepper 44 and winged yam 18 . The absence of a strong association between the phenotypic and genotypic data could be because the SNPs are not associated with the phenotypic traits evaluated; it could also be because molecular markers generally detect non-adaptive variation that is not subject to the natural and/or artificial selection acting on phenotypic traits 18,48 . Because of the inconsistency observed in studies involving both phenotypic and genotypic evaluations, authors have recommended combining genotypic and phenotypic data as the most efficient option for diversity assessment [48][49][50] . The grouping of accessions in the three dendrograms (phenotypic, genotypic, and combined) showed a high degree of similarity. The accessions grouped in sub-population 3 of the combined dendrogram retained 100% of their membership in sub-population 2 of the phenotypic dendrogram. Also, 83% of the accessions in sub-population 1 of the combined dendrogram clustered together in sub-population 1 of the phenotypic dendrogram, while the remaining 7% grouped in sub-population 3 of the phenotypic dendrogram.
Similarly, 86% of the accessions in sub-population 1 of the phenotypic dendrogram remained together in sub-population 3 of the genotypic dendrogram, while 14% clustered in sub-populations 2 and 3 of the genotypic dendrogram. Furthermore, 86% of the accessions grouped in sub-population 1 of the combined dendrogram retained their membership in sub-population 3 of the genotypic dendrogram. The high correlation between the phenotypic and combined dendrograms observed in this research is similar to findings in winged yam 18 , whereas the correlation obtained between the genotypic and combined dendrograms differs from that reported in winged yam 18 . In our study, the genetic diversity across the AYB population was further confirmed by the high percentage of polymorphic SNP loci in each sub-population of the combined analysis. For example, sub-population 2 showed 100% polymorphic loci and high heterozygosity, indicating high genetic diversity.
In conclusion, a sufficient level of genetic diversity was revealed among and within the 169 AYB accessions evaluated with phenotypic descriptors, DArT-SNP markers, and combined analysis. The correlations observed between traits, including early maturity, seed yield, and main stem pigmentation, are valuable for AYB breeding. The polymorphic DArT-SNP markers were efficient in detecting population structure and genetic diversity and can therefore be explored for use in genome-wide association studies (GWAS) and marker-assisted selection (MAS) in AYB. The complementary approach of combining phenotypic and genotypic data can be used to select divergent parental materials for hybridization, MAS, and GWAS.

Materials and methods
Plant material. A total of 169 AYB accessions obtained from the GenBank of the International Institute of Tropical Agriculture (IITA) were evaluated in the present study; their passport data are shown in Supplementary Table S1. The accessions were obtained in accordance with all rules governing plant material transfer between Nigeria and Ethiopia.
Phenotypic characterization. The 169 accessions were planted over two cropping seasons (2019/2020 and 2020/2021) at Jimma Agricultural Research Center (JARC), Jimma, Ethiopia. The field evaluation was carried out in accordance with JARC's regulations for field experimentation. The experimental field is located at 1739 masl (N07°39.962′, E036°46.74′), and the trial was laid out in an alpha-lattice design with two replications of ten plants per accession. After sowing, each plant was staked with a 3 m stick. Each accession was characterized using 26 phenotypic traits (16 quantitative and 10 qualitative), selected for their ability to capture the existing diversity across the crop's key developmental stages through to yield attributes. Trait selection was guided by the IITA AYB descriptor list 51 . The phenotypic traits evaluated, the assessment period, and the method are presented in Supplementary Table S2.
DArT sequencing. Two weeks after planting, about 1 g of young, healthy leaves was collected into labeled 1.2 ml cluster tubes. The tubes were immediately capped, placed in an ice bucket, and transferred to the Plant Molecular Laboratory at Jimma University, where they were kept in a −80 °C freezer before lyophilization. The lyophilized leaves were shipped to the SEQART Africa laboratory at the International Livestock Research Institute (ILRI), Nairobi, for DNA extraction and genotyping. Genomic DNA was extracted using the NucleoMag Plant kit and purified with a genomic DNA clean and concentrator kit. The purified DNA was quantified by 0.8% agarose gel electrophoresis. DArT genotyping was performed using the SEQART Africa genotyping protocol 52 . In brief, genomic DNA was digested with two restriction enzymes, PstI as the rare cutter and MseI as the frequent cutter. The digested fragments were ligated to a common adapter and a barcoded adapter, and fragments carrying both adapters were selectively amplified.
The PCR products were pooled, purified using a QIAquick PCR purification kit, and sequenced on an Illumina HiSeq 2500 using single-end reads. After sequencing, the FASTQ files generated by DArT were aligned against an unpublished African yam bean draft genome (provided by Biosciences Eastern and Central Africa, BeCA-ILRI), and a HapMap file was generated.
Multivariate analysis and cluster generation of phenotypic data. The phenotypic data were analyzed in R (version 4.1.1) 53 . Analysis of variance (ANOVA) for each quantitative trait across the two years was performed using the PBIB.test function of the agricolae R package, and Tukey's HSD test was used to test for significant differences among means. The ANOVA model was
$$Y = \mu + E + B(E) + G + GE + e$$
where Y is the trait, µ is the grand mean, E is the environment (year) effect, B(E) is the effect of blocks within environment, G is the genotype effect, GE is the genotype-by-environment interaction, and e is the error. Ordinal qualitative data were analyzed using the Kruskal–Wallis test followed by Dunn's post hoc test, and binary data were analyzed using the chi-square test. Principal component analysis (PCA) of the LSmeans of the phenotypic traits obtained from the genotype-by-environment analysis was computed using the PCAmix function of the PCAmixdata package, an R package suited to multivariate analysis of mixed qualitative and quantitative data. The daisy function of the cluster package was used to generate the dissimilarity matrix using Gower's distance 54 , and the ape (Analyses of Phylogenetics and Evolution) package was used to construct the hierarchical cluster dendrogram with the Ward.D2 option. The goodness of fit of the hierarchical dendrogram was estimated using the cophenetic correlation coefficient. Finally, correlations among the qualitative phenotypic traits were computed using the GoodmanKruskal package, and the chart.Correlation function of the PerformanceAnalytics package was used for the quantitative traits.
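Gower's distance lets quantitative and qualitative traits share one dissimilarity scale: each quantitative trait contributes |x − y| divided by that trait's range, each qualitative trait contributes 0 for a match and 1 for a mismatch, and the result is the mean over the traits scored in both accessions. The Python sketch below illustrates that idea under simplifying assumptions; cluster::daisy additionally distinguishes ordinal and binary variable types and supports weights, which this sketch does not, and the trait values are hypothetical.

```python
def trait_ranges(rows, kinds):
    """Range (max - min) of each quantitative trait; None for qualitative traits."""
    ranges = []
    for j, kind in enumerate(kinds):
        if kind != "num":
            ranges.append(None)
            continue
        values = [row[j] for row in rows if row[j] is not None]
        ranges.append(max(values) - min(values))
    return ranges


def gower_distance(a, b, kinds, ranges):
    """Gower dissimilarity between two accessions scored on mixed traits.

    kinds: "num" for quantitative traits, anything else for nominal traits.
    Traits missing (None) in either accession are skipped.
    """
    total, used = 0.0, 0
    for x, y, kind, r in zip(a, b, kinds, ranges):
        if x is None or y is None:
            continue
        if kind == "num":
            d = abs(x - y) / r if r else 0.0   # range-scaled absolute difference
        else:
            d = 0.0 if x == y else 1.0         # simple matching for categories
        total += d
        used += 1
    return total / used


# Hypothetical accessions: days to 50% flowering, 100-seed weight (g), seed color.
rows = [[118.0, 20.0, "brown"], [124.0, 25.0, "brown"], [121.0, 22.0, "white"]]
kinds = ["num", "num", "cat"]
ranges = trait_ranges(rows, kinds)
print(round(gower_distance(rows[0], rows[1], kinds, ranges), 3))  # 0.667
print(round(gower_distance(rows[0], rows[2], kinds, ranges), 3))  # 0.633
```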
Analysis of molecular data. A total of 7930 SNPs were generated by DArTseq. The HapMap file was loaded into TASSEL software version 5.2.73 55 for further filtering. Sites were filtered to retain SNPs with at most 20% missing values and allele frequencies between 0.05 and 0.95 (i.e., a minor allele frequency of at least 0.05). Filtering yielded 1789 SNPs, for which the major and minor allele frequencies were computed. The pairwise identity-by-state (IBS) dissimilarity matrix among individuals was calculated using PLINK software 56 . The IBS matrix was imported into R version 4.1.1 53 , and the ape package was used to construct a hierarchical cluster dendrogram with the Ward.D2 option. The goodness of fit of the dendrogram to the pairwise matrix was estimated using the cophenetic correlation coefficient computed in R.
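The two SNP filters and the IBS distance can be sketched in Python as below. Genotypes are assumed to be coded 0/1/2 (copies of one allele) with None for missing calls; the IBS distance here is one minus the proportion of alleles shared identical by state over SNPs called in both accessions, which follows the usual definition, although PLINK's exact handling of missing data and its output options may differ. All genotypes are hypothetical.

```python
def filter_snps(loci, max_missing=0.20, min_maf=0.05):
    """Keep loci with at most `max_missing` missing calls and MAF >= `min_maf`.

    loci: list of SNPs, each a list of genotypes (0/1/2 or None) across accessions.
    """
    kept = []
    for locus in loci:
        called = [g for g in locus if g is not None]
        if not called:
            continue
        missing = 1 - len(called) / len(locus)
        p = sum(called) / (2 * len(called))
        if missing <= max_missing and min(p, 1 - p) >= min_maf:
            kept.append(locus)
    return kept


def ibs_distance(g1, g2):
    """1 - IBS similarity between two accessions over SNPs called in both."""
    shared, n = 0, 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue
        shared += 2 - abs(a - b)   # alleles shared identical by state: 0, 1 or 2
        n += 1
    return 1 - shared / (2 * n)


# Hypothetical data: three SNPs scored on ten accessions.
monomorphic = [0] * 10
informative = [0, 1, 2, 1, 0, 0, 1, 2, 0, 0]
too_missing = [0, 1, None, None, None, 2, 0, 1, 0, 0]
print(len(filter_snps([monomorphic, informative, too_missing])))  # 1

print(round(ibs_distance([0, 2, 1, None], [0, 0, 1, 2]), 3))  # 0.333
```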
The population structure of the 169 AYB accessions was analyzed using STRUCTURE software version 2.3.4 (July 2012) 57 . First, the run parameters were set to a burn-in period of 30,000 followed by 30,000 Markov chain Monte Carlo (MCMC) iterations. Second, the admixture model was selected under the "Ancestry Model" option; this model accounts for historical admixture between populations and estimates the number of natural genetic clusters. Next, the number of sub-populations was explored for K = 1 to K = 10, with five independent runs for each K. Finally, STRUCTURE HARVESTER 58 was used, and Evanno's ΔK method 59 was applied to estimate the K value that best describes the likely sub-populations in the data set.
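Evanno's ΔK is a second-order rate of change of the log-likelihood ln P(D) across successive K values, scaled by the spread of ln P(D) among replicate runs: ΔK = |L(K+1) − 2L(K) + L(K−1)| / sd[L(K)]. The Python sketch below applies the second difference to the mean ln P(D) of the replicate runs at each K; the ln P(D) values are hypothetical, and consecutive K values with at least two runs each are assumed.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_runs):
    """Evanno's delta K from replicate STRUCTURE runs.

    lnp_runs: dict mapping K -> list of ln P(D) values, one per replicate run.
    Consecutive K values are assumed; delta K is undefined at the smallest and
    largest K, so those are omitted from the result.
    """
    ks = sorted(lnp_runs)
    m = {k: mean(lnp_runs[k]) for k in ks}
    delta = {}
    for k in ks[1:-1]:
        second_diff = abs(m[k + 1] - 2 * m[k] + m[k - 1])
        sd = stdev(lnp_runs[k])   # spread of ln P(D) across runs at this K
        delta[k] = second_diff / sd if sd > 0 else float("inf")
    return delta


# Hypothetical ln P(D) values from three runs at each K.
runs = {
    1: [-100.0, -101.0, -99.0],
    2: [-80.0, -81.0, -79.0],
    3: [-70.0, -72.0, -68.0],
    4: [-69.0, -71.0, -67.0],
}
delta = evanno_delta_k(runs)
print(delta)                      # {2: 10.0, 3: 4.5}
print(max(delta, key=delta.get))  # 2
```

ΔK cannot be evaluated at the smallest or largest K tested, so those values never appear as candidates.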
GenAlEx software version 6.501 60,61 was used to calculate basic diversity parameters, including the number of private alleles, the number of effective alleles (Ne), Shannon's information index (I), observed heterozygosity (Ho), expected heterozygosity (He), fixation index (F), and percentage of polymorphic loci across the 169 accessions and within each sub-population. The clustering pattern of the accessions was validated using principal coordinate analysis (PCoA) implemented in GenAlEx. Pairwise population differentiation statistics (FST), their standardized values (F'ST), and the Shannon index of the observed populations were generated using the analysis of molecular variance (AMOVA) implemented in GenAlEx.
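The fixation index F is computed as F = (He − Ho)/He = 1 − Ho/He, so positive values indicate a deficit of heterozygotes relative to Hardy–Weinberg expectations. GenAlEx averages F over loci; applying the formula to the sub-population means of He and Ho reported in the Results, as in the Python sketch below, is therefore only a rough approximation and not a reproduction of the study's F values.

```python
def fixation_index(he, ho):
    """Wright's fixation index F = (He - Ho) / He."""
    return (he - ho) / he


# Mean He and Ho per sub-population, as reported in the Results.
reported = {
    "sub-population 1": (0.23, 0.05),
    "sub-population 2": (0.34, 0.07),
    "sub-population 3": (0.35, 0.17),
}
for name, (he, ho) in reported.items():
    print(name, round(fixation_index(he, ho), 2))
# sub-population 1 0.78
# sub-population 2 0.79
# sub-population 3 0.51
```

All three values are positive, consistent with the heterozygote deficit attributed to inbreeding in the Results and Discussion.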
Combined phenotypic and genotypic analysis. The IBS distance matrix from the genotypic evaluation and the Gower distance matrix from the phenotypic evaluation were loaded into R. The R package dendextend was used to generate a combined genetic distance by summing the phenotypic and genotypic distance matrices. The combined distance matrix was used to construct a hierarchical cluster dendrogram with the Ward.D2 method, and the cophenetic correlation coefficient was used to measure the goodness of fit of the dendrogram.
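The clustering and goodness-of-fit steps can be sketched in plain Python. The sketch below performs agglomerative clustering with Ward's criterion applied to squared distances through the Lance–Williams update, the approach behind R's hclust(method = "ward.D2"), records for every pair of accessions the height at which they first join (the cophenetic distance), and correlates those heights with the input distances. It is an illustration on a hypothetical matrix rather than the code used in the study, and ties between equally close clusters may be broken differently than in R.

```python
import math
from itertools import combinations


def _pair(a, b):
    return (a, b) if a < b else (b, a)


def ward_d2_cophenetic(d):
    """Ward clustering on squared distances; returns the cophenetic matrix."""
    n = len(d)
    d2 = {(i, j): d[i][j] ** 2 for i, j in combinations(range(n), 2)}
    members = {i: [i] for i in range(n)}   # cluster id -> accession indices
    coph = [[0.0] * n for _ in range(n)]
    next_id = n
    while len(members) > 1:
        i, j = min(d2, key=d2.get)          # closest pair of clusters
        dij = d2[(i, j)]
        height = math.sqrt(dij)             # merge height on the original scale
        for a in members[i]:
            for b in members[j]:
                coph[a][b] = coph[b][a] = height
        ni, nj = len(members[i]), len(members[j])
        updated = {}
        for k, mk in members.items():
            if k in (i, j):
                continue
            nk = len(mk)
            # Lance-Williams update for Ward's method on squared distances
            updated[k] = ((ni + nk) * d2[_pair(i, k)]
                          + (nj + nk) * d2[_pair(j, k)]
                          - nk * dij) / (ni + nj + nk)
        d2 = {p: v for p, v in d2.items() if i not in p and j not in p}
        for k, v in updated.items():
            d2[(k, next_id)] = v            # new cluster gets the largest id
        members[next_id] = members.pop(i) + members.pop(j)
        next_id += 1
    return coph


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cophenetic_correlation(d):
    coph = ward_d2_cophenetic(d)
    pairs = list(combinations(range(len(d)), 2))
    return pearson([d[i][j] for i, j in pairs], [coph[i][j] for i, j in pairs])


# Hypothetical distances: two tight pairs (0, 1) and (2, 3) far from each other.
D = [[0, 1, 10, 10], [1, 0, 10, 10], [10, 10, 0, 1], [10, 10, 1, 0]]
print(round(cophenetic_correlation(D), 6))  # 1.0
```

A cophenetic correlation close to 1 means the dendrogram preserves the input distances closely; the values of 0.89, 0.73, and 0.84 reported in the Results are judged on that scale.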
Furthermore, the dendrograms generated from the phenotypic, genotypic, and combined evaluations were compared with each other using the R package dendextend. The significance of the correlations between the phenotypic and genotypic matrices, the phenotypic and combined matrices, and the genotypic and combined matrices was estimated using the Monte Carlo option of the Mantel test 47 with 9999 permutations. The clusters generated from the combined dendrogram were used as the grouping factor for ANOVA, and the significance of differences among cluster means was assessed with Tukey's HSD post hoc test.
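The Mantel test asks whether two distance matrices over the same accessions are correlated. A Monte Carlo version computes Pearson's r between the matching pairwise entries, then repeatedly permutes the rows and columns of one matrix together and counts how often the permuted r is at least as large as the observed one. The Python sketch below returns a one-tailed p-value of (count + 1)/(permutations + 1); the matrices, random seed, and permutation count are illustrative assumptions, and R implementations (for example in the ade4 or vegan packages) may differ in details.

```python
import math
import random
from itertools import combinations


def _pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def _upper(d, order):
    """Upper-triangle entries of d after reordering rows and columns by `order`."""
    return [d[order[i]][order[j]] for i, j in combinations(range(len(d)), 2)]


def mantel(d1, d2, permutations=9999, seed=1):
    """Observed Mantel r and one-tailed Monte Carlo p-value."""
    identity = list(range(len(d1)))
    x = _upper(d1, identity)
    r_obs = _pearson(x, _upper(d2, identity))
    rng = random.Random(seed)
    order = identity[:]
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)  # permute rows and columns of d2 together
        if _pearson(x, _upper(d2, order)) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)


# Hypothetical accessions placed on a line; both matrices share the same distances.
pos = [0, 1, 3, 7, 15, 31]
d = [[abs(a - b) for b in pos] for a in pos]
r, p = mantel(d, d, permutations=999)
print(round(r, 3), p)
```

With 169 accessions and 9999 permutations this pure-Python loop is slow; it is meant to show the logic of the test rather than to replace an R implementation.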

Data availability
The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request.