Association study of morpho-phenological traits in quinoa (Chenopodium quinoa Willd.) using SSR markers

In this study, the genetic and molecular diversity of 60 quinoa accessions was assessed using agronomically important traits related to grain yield as well as microsatellite (SSR) markers, and informative markers linked to the studied traits were identified using association study. The results showed that most of the studied traits had a relatively high diversity, but grain saponin and protein content showed the highest diversity. High diversity was also observed in all SSR markers, but KAAT023, KAAT027, KAAT036, and KCAA014 showed the highest values for most of the diversity indices and can be introduced as the informative markers to assess genetic diversity in quinoa. Population structure analysis showed that the studied population probably includes two subclusters, so that out of 60 quinoa accessions, 29 (48%) and 23 (38%) accessions were assigned to the first and second subclusters, respectively, and eight (13%) accessions were considered as the mixed genotypes. The study of the population structure using Structure software showed two possible subgroups (K = 2) in the studied population and the results of the bar plot confirmed it. Association study using the general linear model (GLM) and mixed linear model (MLM) identified the number of 35 and 32 significant marker-trait associations (MTAs) for the first year (2019) and 37 and 35 significant MTAs for the second year (2020), respectively. Among the significant MTAs identified for different traits, the highest number of significant MTAs were obtained for grain yield and 1000-grain weight with six and five MTAs, respectively.


Plant materials and phenotypic assessment
A collection of 60 quinoa accessions was used in this study.All genotypes were obtained from the IPK Gene Bank, Leibniz Institute of Plant Genetics and Crop Plant Research, Germany.The origin and the accession number of the studied genotypes are presented in Table 1.The phenotypic evaluations were carried out at the experimental fields in Kuhdasht (47° 36′ E, 33° 32′ N, and altitude of 1195 m) and Poldokhtar (47° 42′ E, 33° 09′ N, and altitude of 673 m) counties, Lorestan province, Iran, in two cropping years, 2019-2020 and 2020-2021.To perform the experiments, a randomized complete block design with three replications was used.Seeds of each genotype were planted in three rows of 10 m at a depth of 2 cm with a distance of 40 cm between rows and 3-5 cm between plants on the rows.To measure the studied traits, a random sample consisting of 15 plants was used from each plot by removing the marginal effect.The measured traits included number of days to flowering, days to physiological maturity, plant height (cm), panicle length (cm), grain length(mm), grain saponin (%), grain protein (%), 1000-grain weight (g), grain yield (kg ha −1 ), and harvest index.

DNA extraction and genotypic assessment
Genomic DNA was extracted from the young and fresh leaves at 2-4 leaf stage (25 days old) according to the CTAB method 25 .The quality and quantity of the extracted DNA were evaluated by electrophoresis on a 1% agarose gel and spectrophotometer (LAMBDA 1050UV).A total of 40 microsatellite (SSR) markers were used for molecular analysis and amplification of quinoa genomic DNA.The quality and quantity of extracted DNA on 1% agarose gel and spectrophotometric method were determined.A total of 40 SSR markers were selected using previous studies 19,26 (Table 2).

Data analysis
For phenotypic data, descriptive statistics including minimum, maximum, mean, standard deviation, and coefficient of variation were calculated.The molecular data were manually scored using the binary coding system, (1) for presence and (0) for absence of each specific allele in all studied genotypes according to molecular weight using DNA size marker 100 bp Fermentas.The number of observed and polymorph alleles for each SSR locus in all quinoa genotypes was calculated.The effective number of alleles, gene diversity index and Shanon's diversity index were evaluated as described by Kimura and Crow 29 , Nei 30 and Lewontin 31 respectively.The polymorphic information content (PIC) index, indicating the ability of each marker to distinguish the genotypes 32 , was also calculated as expected heterozygosity based on the studies of Botstein et al. 33 .All genetic diversity indices were calculated using the PopGene software ver.1.32 34 , and the Power Marker software ver.3.25 35 .Cluster analysis using the neighbor-joining method was used to group and determine the differences and/or similarities between genotypes.Also, principal coordinates analysis (PCoA) used to display the two-dimensional distribution of quinoa genotypes using the studied microsatellite markers.
To perform association study, the effective population structure of the studied genotypes into separate subpopulations and identification of mixed genotypes was first performed using the Bayesian method by Structure software 36 .This method attributes each of the genotypes to a hypothetical subpopulation with a probability level, so that the degree of linkage disequilibrium in each subpopulation is minimum and the gametic stage equilibrium is maximum.The initial hypothetical subpopulation values were considered from 1 to 10 and to increase the accuracy, ten replications were performed for each subpopulation.Therefore, the admixture model and the allelic frequency independence with 10,000 burn-in and 10,000 Markov Chain Monte Carlo (MCMC) models, were used to obtain the maximum likelihood curve 37 .Structure software calculates a Qst matrix for each K value (actual number of subpopulations), which includes estimating the probability coefficients of membership of each genotype in each subpopulation.In the resulting plot, when the membership percentage of a genotype to a cluster is greater than or equal to 0.7, the genotype is assigned to that cluster, but if the membership percentage www.nature.com/scientificreports/ is less than 0.7, it is defined as a mixed genotype 38 .The actual number of subpopulations (K) was determined based on ΔK statistic using Evano et al. 39 method.General linear model (GLM) and mixed linear model (MLM) were used to identify informative and linked markers with the evaluated traits using TASSEL 3.0 software 40 .In the general linear model, the population structure coefficient matrix (Q matrix) is only used to determine the relationship between markers and traits, while in the mixed linear model, kinship relationships matrix (K matrix) along with Q matrix (K + Q matrices) are used to avoid false correlations between markers-traits 2 .

Plant ethics statement
In this research, experimental research/field studies/collection of plant/plant material, complied with the relevant institutional, national, and international guidelines and legislation.

Descriptive statistics
Descriptive statistics of the studied traits in 60 quinoa genotypes are presented in Table 3.The results showed that most of the studied traits had a relatively high diversity.The highest diversity was observed in grain saponin and grain protein with 18.54% and 16.24%, respectively.Since in association study, the relationship between phenotypic diversity of the studied traits with polymorphisms in the genome (genotypic diversity) are assessed and markers related to the target traits are identified, the mapping results will be more accurate by increasing the diversity of the studied traits.This diversity is the result of gene recombinations throughout the evolutionary history in natural populations, and in fact, all meiotic events during the evolutionary history of the organism are considered in association study 41 .Therefore, the existence of high diversity in the studied population in this research can provide more reliable and credible results for association studies.

Correlation coefficients
The results of phenotypic and genotypic correlation coefficients (Table 4) among the studied traits showed that the highest positive and significant phenotypic and genotypic correlations were observed between grain yield and harvest index (0.99, 0.98), 1000-grain weight (0.98, 0.92), grain protein (0.91, 0.86), grain length (0.89, 0.78), and panicle length (0.86, 0.75) (Table 4).Also, the phenotypic and genotypic correlations between harvest index with 1000-grain weight, grain length, grain protein and panicle length, as well as between grain protein with panicle length and grain length were positive and highly significant (Table 4).The traits with high correlations with grain yield can be used as indirect selection criteria to improve grain yield in future breeding programs.
Other researchers have reported similar results.Saddiq et al. 42 found a positive and significant correlation between grain yield with panicle length and 1000-grain weight.Also, a positive and significant correlation between spike length and seed length by Manjarres-Hernández et al. 43 and between grain yield and harvest index by Hussain et al. 44 was previously reported, which was consistent with the results of the current study.In contrast, Reguera et al. 45 , Hussain et al. 44 , Granado-Rodríguez et al. 46 , and Matías et al. 47 reported a negative and significant correlation between grain yield and grain protein, which was not consistent with the results of this study.Different results in different studies can be obtained due to the use of different genetic materials as well as different environmental conditions of the experiment.www.nature.com/scientificreports/most of the diversity indices in this research and can be recommended as the highest appropriate markers to evaluate genetic diversity in quinoa.On the other hand, it seems that this marker type is the most informative for assessing genetic diversity in quinoa and for discriminating among the varieties.Several studies had previously reported the high PIC values and polymorphism for microsatellite markers in quinoa populations 1, 19,26,48 .Maughan et al. 11 used microsatellite markers to study quinoa germplasm to introduce highly reproducible and informative microsatellite markers.A total of 1276 gene loci with ATG, ATT, and CA repeats in the genomic library were sequenced, and the microsatellite markers were designed for 397 gene loci.The number of observed alleles in each gene locus was between 2 and 13 alleles with an average of 4 alleles in each locus, and the heterozygosity of markers was 57% on average.They observed high heterozygosity in 67 markers and introduced these markers as appropriate for linkage mapping as well as use in quinoa breeding programs 11 .Jarvis et al. 19 also used 216 SSR markers to construct a quinoa linkage map.The average heterozygosity value of these markers was calculated to be 0.57.They constructed the first linkage map of quinoa with 216 SSR markers in a recombinant inbred lines (RILs) population consisting of 38 linkage groups with a total length of 913 cM 19 .

Population structure
In the genetic studies, population structure which shows the relationships of individuals inter-and intra-population, provides a perspective on the evolutionary relationships between individuals in a population.In an ideal association study, the studied population should not have a specific structure and not be structurally divided into different subpopulations.If the effect of population structure and kinship relationships on association studies is not considered, false positive results will be obtained 49,50 .To identify the population structure in this research, the studied quinoa genotypes were first grouped by cluster analysis using Neighbor-Joining method (Fig. 1).The results showed that 60 quinoa genotypes could be divided into two subpopulations with 38 and 22 genotypes, respectively.This grouping did not correspond to origin or seed color, so that genotypes with different origin or seed color were grouped in each subpopulation.Grouping of genotypes with different origins and seed colors only in two subpopulations can probably be due to the exchange of genetic materials and/or the mixing of different seeds, as well as the existence of two different types, highland and lowland, among the quinoa genotypes studied in this research.Patiranage et al. 32 using whole-genome sequencing of 310 quinoa accessions, clustered the quinoa accessions into two main groups, highland and lowland.Furthermore, to analyze the population structure and determine the actual number of subpopulations, the STRU CTU RE software ver.4.0 was used and the optimal number of subpopulations was calculated based on the change of ΔK versus K values.The results indicated that the number of possible subpopulations in the studied quinoa genotypes is equal to two subpopulations (Fig. 2).Therefore, K = 2 was considered as the optimal number of subpopulations to analyze the population structure www.nature.com/scientificreports/and calculate the membership coefficient of individuals in each cluster (Q-matrix).As shown in Fig. 3, among 60 genotypes studied in the current research, 29 (48%) and 23 (38%) genotypes belonged to the first and second subpopulations, respectively, and eight genotypes (13%) had a membership coefficient of less than 0.7 and were considered mixed genotypes.El-Harty et al. 21studied the population structure of 32 quinoa genotypes using STRU CTU RE software and identified two possible clusters (K = 2), with 13 genotypes (41%) belonging to the first cluster and 8 genotypes (25%) belonging to the second cluster, and 11 genotypes (34%) also having mixed structure.
In addition to the population structure, the linkage disequilibrium (LD) is another factor affecting association study.Coinheritance of two or more loci on a chromosome within a genetic region is called genetic linkage 27 .Linkage equilibrium is the random combinations of alleles at various loci where observed haplotype frequencies agree with the predicted haplotype frequencies in a population, but LD is a non-random association of alleles at various loci and describes non-equal haplotype frequencies 51 .Linkage disequilibrium values between the studied marker pairs in the present study along with their significant probability levels are shown in Fig. 3. Several demographic and genetic factors are affecting LD significantly, out of which mutation and recombination are the key factors 52 .Population structure, new mutations, autogamy, genetic drift, epistasis, genomic rearrangement, selection and kinship significantly increase LD, while higher recombination and mutation rates, repetitive mutations, gene conversion and outcrossing significantly decrease LD 53 .Nevertheless, significant LD is often observed between distant loci located even on different chromosomes causing spurious associations in association studies.It should be noted that the selection of a population with higher or lower LD levels depends on the objective of the mapping study 27 .The extensive level of LD (long-stretched or genetically unlinked LD) reduces the number of markers required for marker-trait association (MTAs) but lowers the mapping resolution, whereas less extensive or short-stretched LD needs a relatively larger number of markers to identify a gene but increases mapping resolution 51 (Fig. 4).

Principal coordinates analysis (PCoA)
Principal coordinates analysis (PCoA) can be used to display a two-dimensional distribution of people (and/ or genotypes and populations) based on genetic markers.The gathering of people in one area of the plot shows their genetic similarity 54,55 .The results of PCoA of the studied quinoa genotypes using microsatellite markers for the is presented in Table 6.The results showed that the first and second components explained 9.78% and 7.63%, and the first ten components accounted for 61.20% of the total variance of the studied population.In studying genetic diversity using molecular data, the markers should have a uniform and appropriate distribution on the  genome level, so that they can cover the entire genome well 56,57 .The justification of lower values of the total variance by the first few components in this study indicates the appropriate distribution and optimal sampling of the markers used in the whole genome.The diagram of the two-dimensional distribution of the quinoa genotypes based on the first and second principals resulting from the PCoA is presented in Fig. 5.As shown in Fig. 5, 60 quinoa genotypes were divided into two separate groups, including 22 and 38 genotypes, respectively.The result of PCoA confirmed the number of groups derived from the cluster analysis, although the genotypes within the groups were not the same in these two analyses.This difference is somewhat natural and obvious, because the cluster analysis grouped the genotypes based on 100% of the information obtained from the markers, while the grouping of the genotypes in the PCoA was done based on the first and second components, which explained 17.42% of the total variance (Table 6).

Association study using GLM and MLM models
The main challenge in association study is to identify real and false relationships between the studied markers and traits according to the population structure and kinship relationships, although the influence of kinship relationships in providing false positive results is more than the population structure [58][59][60] .To compare the identified markers between the first and second years, as well as to detect the stable and true markers associated with  www.nature.com/scientificreports/traits in both experimental years to use them in breeding programs, an association study using GLM and MLM models was separately done for data of the first and second years.
The results of the association study in the first year (2019) showed 35 and 32 significant marker-trait associations (MTAs) using the GLM and MLM models, respectively (Table 7).Grain yield traits with five MTA and days to flowering and grain saponin with two MTA had the highest and lowest significant associations with the markers used in this research, respectively.Coefficient of determination (R 2 ) statistics shows the percentage of the phenotypic variance explained by each marker (Table 7).R 2 values exhibited by the investigated markers in the first year (2019) varied from 0.11 to 0.33 in the GLM model and from 0.07 to 0.30 in the MLM model.The highest and lowest R 2 values in the GLM model belonged to the KAAT001 and KGA114 markers associated with grain yield and grain saponin, respectively, and in the MLM model belonged to the KGA002 and KGA111 markers both associated with panicle length (Table 7).Therefore, two markers KAAT001 and KGA002 are useful and informative markers to identify the genes encoding respective traits.These markers can be used as reliable markers to improve target traits using marker-assisted selection methods.
The results of the association study in the second year (2020) showed 37 and 35 significant marker-trait associations (MTAs) using the GLM and MLM models, respectively (Table 7).Grain yield traits with six MTA and plant height with two MTA had the highest and lowest significant associations with the markers used in this research, respectively.Coefficient of determination (R 2 ) statistics shows the percentage of the phenotypic variance explained by each marker (Table 7).R 2 values exhibited by the investigated markers in the second year (2020) varied from 0.09 to 0.29 in the GLM model and from 0.06 to 0.28 in the MLM model.The highest and lowest R 2 values in the GLM model belonged to the KCAA015 and KAAT033 markers associated with grain yield and Grain length, respectively, and in the MLM model belonged to the KCAA015 and KAAT026 markers associated with 1000-grain weight and grain saponin (Table 7).Therefore, marker KCAA015 is a useful and informative marker to identify the genes encoding respective traits.These markers can be used as reliable markers to improve target traits using marker-assisted selection methods 61 .
The results showed that some significant MTAs in the first year (2019) were only identified using the GLM model, such as the association of KAAT030 with days to maturity, KCAA106 and KGA117 with plant height, KGA006 with grain length, KGA041 with grain saponin, KAAT006 with grain yield, and KGA118 with harvest index, and some others only using the MLM model, such as KAAT025 with days to flowering, KGA111 with plant height, KGA114 and KGA003 with grain saponin, and KAAT047 with grain yield.Also, in the second year (2020), some significant MTAs were only identified using the GLM model, such as KAAT025 with days to flowering, KGA109 with panicle length, KCAA022 with grain length, and KAAT026 with grain saponin, and some others only using the MLM model, such as KGA117 with plant height, and KAAT033 with grain length.These results indicated that the difference in the number and type of the identified and associated markers with the studied traits depends on the type of input data and the type of statistical model used to identify significant marker-trait associations.
However, some significant MTAs were commonly identified by both GLM and MLM models (Table 8).Also, many markers identified by both GLM and MLM models had a significant association with more than one trait, among which the association of KAAT030 and KGA111 with plant height and the relationship of KGA003, KGA006 and KAAT023 with grain yield in the first year (2019) and association of KGA117 and KAAT026 with Grain saponin and the relationship of KAAT040, KGA002 and KAAT023 with grain yield in the second year (2020) can be mentioned.This can be due to the pleiotropic effects of a gene or the effect of linkage between adjacent genes.Identifying markers that are simultaneously associated with several traits will be of great importance to breeders, as they can be used in the simultaneous improvement of multiple traits 22 .Furthermore, some markers identified in this study had a low R 2 value, indicating the nature of quantitative and multigenic inheritance of the investigated traits.
Based on the results of the association study in two experimental years, ten SSR markers linked to the evaluated traits were identified in both years (Table 8).Among which the association of KAAT030 and KGA117 with plant height and the relationship of KGA003, KGA116, KGA002 and KAAT040 with grain yield can be mentioned which can be used as the informative and stable markers for the genetic diversity and mapping studies in quinoa.
Mizuno et al. 62 generated 136 quinoa inbred lines for the molecular identification and characterization of gene functions, and by using genotyping-by-sequencing analysis of the inbred lines, identified 5753 single nucleotide polymorphisms (SNPs) in the quinoa genome, and grouped the inbred lines into three genetic subpopulations.They measured physiological characteristics such as salt tolerance and key growth traits including 1000-grain weight, plant height, stem diameter, leaf dry weight, grain yield per plant, and days to flowering, and generated They also performed an association study using 3156 SNPs and 12 phenotypic traits and identified 3091 and 4 significant MTAs based on GLM and MLM models, respectively.They indicated that the MLM model detected far fewer MTAs than the GLM model and suggested that the MLM may yield fewer false-positive results than the GLM because the MLM used the population structure and the kinship relationships 62 .Patiranage et al. 32 using whole-genome sequencing of 310 quinoa accessions, identified 2.9 million polymorphic high-confidence SNP loci and clustered the quinoa accessions into two main groups, highland and lowland, with F ST divergence of 0.36 and fast LD decay of 6.5 and 49.8 kb, respectively.They also investigated the relationship between SNP markers and 17 agronomic traits using a genome-wide association study method and identified two candidate genes associated with thousand seed weights and a resistance gene analog associated with downy mildew resistance.They identified pleiotropically acting loci for four agronomic traits that highly responded to photoperiod, hence important for the adaptation to different environments.

Conclusion
In this study, the genetic diversity of quinoa accessions was assessed using some important morpho-phenological characteristics including yield and yield components as well as microsatellite markers.Furthermore, the population structure and the possible number of subpopulations as well as the significant marker-trait associations were identified.The results showed that most of the studied traits had a relatively high diversity.A total of 140 scorable alleles with an average of 3.5 alleles per marker were amplified by 40 SSR markers, of which 136 alleles (97%) were polymorph.The relatively high diversity was also observed for most of the studied markers, but the markers KAAT023, KAAT027, KAAT036, and KGA006 showed the highest values for most of the diversity indices in this research and can be recommended as the appropriate and informative markers to evaluate the genetic diversity of quinoa.
The analysis of population structure using cluster analysis with the neighbor-joining method as well as the determination of the actual number of subpopulations by ΔK statistics using STRU CTU RE software grouped the studied quinoa accessions into two possible subpopulations (K = 2).Out of the 60 genotypes studied in this research, 29 (48%) genotypes were allocated to the first and 23 (38%) to the second subpopulation, and eight genotypes (13%) were also considered as mixed genotypes.
Association studies using the GLM and MLM models identified the number of 35 and 32 significant MTAs for the first (2019) and 37 and 35 significant MTAs for the second (2020) experimental years, respectively.Among the significant MTAs identified for different traits, the highest number of significant MTAs were obtained for grain yield and 1000-grain weight with six and five MTAs, respectively.Also, some markers showed a significant relationship with more than one trait (including markers KAAT030 and KGA003 which are associated with www.nature.com/scientificreports/days to maturity, plant height, grain length, grain saponin, grain yield, and harvest index), which can be due to the pleiotropic effects of a gene or the effect of linkage between adjacent genes.In total, based on the results of the association study in two experimental years, ten SSR markers linked to the evaluated traits were identified in both years, which can be used as the informative and stable markers for the genetic diversity and mapping studies in quinoa.

Figure 2 .
Figure 2. Bilateral charts to determine the number of sub-populations in the studied quinoa genotypes (K = 2) based on microsatellite markers.

Figure 3 .
Figure 3. Barplot of structure analysis of the 60 studied quinoa genotypes based on 40 microsatellite markers using Bayesian model.Each color indicates one subpopulation or cluster (K = 2).The vertical axis shows the membership coefficient of each genotype into each cluster.

Figure 4 .
Figure 4. Linkage disequilibrium (LD) plot.The upper and lower diagonal represent linkage disequilibrium and p-value statistics for each marker pair, respectively.

Figure 5 .
Figure 5. Two-dimensional plot of principal coordinate analysis based on the first two components in 60 quinoa genotypes.

Table 1 .
Accession number, origin and seed color of the studied quinoa genotypes.

Table 2 .
Microsatellite (SSR) markers used for molecular evaluation of quinoa genotypes in this study.

Table 3 .
Descriptive statistics of the studied traits in 60 quinoa genotypes.

diversity of the studied population based on SSR markers
The genetic diversity indices of SSR markers in the sixty studied quinoa genotypes are shown in Table5.A total of 140 scorable alleles with an average of 3.5 alleles were amplified by 40 SSR markers, of which 136 alleles (97%) were polymorph.Among the studied markers, KAAT027, KAAT036, KAAT040, KCAA014, KGA042, KGA055 and KGA059 with five alleles showed the highest number of polymorph alleles.The effective number of alleles varied from 1.08 (marker KAAT006) to 3.72 and 3.75 (markers KAAT036 and KAAT027, respectively), with an average of 1.81 alleles.The polymorphic information content (PIC) ranged from 0.23 (marker KAAT007) to 0.89 (marker KAAT023) with an average of 0.60.The average values of Shannon's diversity index and Nei's gene diversity index were 0.54 and 0.44 per each SSR locus, respectively.Marker KAAT027 indicated the highest values for both Shannon's (0.82) and Nei's gene (0.80) diversity indices and marker KAAT007 showed the lowest values (0.15 and 0.09, respectively) (Table5).In total, relatively high diversity was observed for most of the studied markers, which indicates the appropriate selection and high efficiency of the selected markers in this study.Therefore, markers KAAT023, KAAT027, KAAT036, and KCAA014 showed the highest values for

Table 5 .
Diversity indices calculated for the studied SSR markers in 60 quinoa genotypes.

Table 6 .
Variance percentage explained by first ten components of the principal coordinate analysis in 60 quinoa genotypes.

Table 7 .
SSR markers linked to measured traits in the studied quinoa genotypes based on GLM and MLM models in both years.

Table 8 .
Significant MTAs identified by both GLM and MLM models in in 2019-2020 years and stable MTAs identified in both years.