Introduction

Aegilops tauschii (2n = 2x = 14, DD), the D-genome progenitor of hexaploid wheat (Triticum aestivum), is a diploid, self-pollinated plant. The D genomes of Ae. tauschii and hexaploid wheat are closely related because of the recent origin of the latter by hybridisation of T. durum and Ae. tauschii1. Ae. tauschii is an important source of genetic variation for wheat breeding and its genome is an invaluable reference for wheat genomics, as revealed by its utility for studying wheat gene space2,3. Although T. aestivum originated by hybridisation of T. turgidum (AABB) with Ae. tauschii (DD)4, the participation of the latter in the hexaploidisation of common wheat was very low and as a result the genetic diversity of hexaploid wheat is less than that of Ae. tauschii5. Therefore, understanding the genetic diversity and adaptive evolution of Ae. tauschii may provide important insights for breeding elite wheat varieties. Additionally, the high genetic variability of morphological traits in Ae. tauschii may indicate the presence of several loci or alleles that remain to be uncovered.

Conventional linkage mapping is the most common approach to detect quantitative trait loci (QTLs) underlying complex traits in plants. However, linkage mapping using bi-parental crosses can reveal information on only two alleles at a given locus, or on the few loci segregating in the study population. In addition, the resolution of the detected QTLs is poor, ranging from 10 to 30 cM, due to the limited number of recombination events that occur during the development of mapping populations6,7. Moreover, the development of mapping populations is an expensive and time-consuming process.

The use of single nucleotide polymorphism (SNP) markers, in conjunction with statistical approaches for association mapping (AM), provides dense genome coverage, decreases genotypic errors and allows the accurate identification of loci8. AM, also known as linkage disequilibrium mapping, exploits the non-random association of alleles at different loci and is considered to be a powerful tool for resolving complex trait variation and identifying different loci and novel alleles in natural populations9,10. AM has been extensively used to identify genes or QTLs in many plant species including Arabidopsis11, rice12, maize13,14, potato15 and wheat16,17,18. In particular, genome-wide association studies (GWAS), which exploit marker polymorphisms across all chromosomes, have become increasingly popular and powerful since they were successfully used in human and animal genetics19,20. GWAS has been employed to study several important traits in rice21, barley22 and wheat23,24.

In the present study, we used the GWAS approach employing 7,185 SNP markers in a core collection of 322 Ae. tauschii accessions of diverse origin in order to: 1) investigate marker-trait associations for 29 morphological traits and 2) scan for candidate genes that control the corresponding morphological traits. Furthermore, we aimed to provide a comprehensive overview of the genetic diversity of morphological traits, as well as subspecies classification based on phenotypic data and genotypic comparison. Overall, this study was designed to provide useful information for understanding the genetic mechanism of morphological traits in Ae. tauschii and to further unlock the regulatory network of complex morphological traits in this species.

Results

Phenotypic evaluation

Analysis of variance (ANOVA) revealed significant variation among genotypes for all 29 morphological traits (Table 1). The level of variation was also reflected by the distribution of traits in 2012 and 2013 (see Supplementary Fig. S1 1a to 29a available online). Significant differences (P < 0.001) were observed between 2012 and 2013 for all traits, except for AL2 and GL2. Year × genotype interactions were also significant (P < 0.001) for all traits.

Table 1 Analysis of variance for the tested 29 morphological traits in year 2012 and 2013.

Phenotypic variation among genotypes for each trait was confirmed by its range, mean, standard deviation and coefficient of variation (Table 2). The coefficient of variation in 2012 ranged from 8.08 to 22.19%, while in 2013 it ranged from 7.53 to 23.05%. Compared to 2012, the mean values of SL, IL1, IL2, IL3, IL4, SNN and AL1 were significantly increased (P < 0.001) in 2013, while those of the remaining 22 morphological traits, including PH, FL, FW, LN, SPL, SPW, SPN, GL1, GW1, GW2, GT1, GT2, LL1, LL2, LW1, LW2, PL1, PL2, PW1 and PW2, were significantly decreased (P < 0.001) in 2013.
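The coefficient of variation reported above is the standard deviation expressed as a percentage of the mean. A minimal sketch of that calculation follows; the function name and the example measurements are hypothetical and are not data from this study, and the use of the sample (n − 1) standard deviation is an assumption.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return CV% = 100 * sd / mean for a list of trait measurements.

    Assumption: the sample standard deviation (n - 1 denominator) is used.
    """
    if len(values) < 2:
        raise ValueError("need at least two observations")
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100.0 * stdev(values) / abs(m)


# Hypothetical measurements (cm) for one trait; not taken from the study.
example_values = [10.0, 12.0, 14.0]
print(f"CV = {coefficient_of_variation(example_values):.2f}%")
```

Because CV% is unit-free, it allows the spread of traits measured on different scales (for example plant height and glume width) to be compared directly.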

Table 2 Range, mean, standard deviation (Sd), coefficients of variation (CV%) for the tested 29 morphological traits in year 2012 and 2013.

Broad-sense heritability estimates were calculated for all 29 morphological traits (see Supplementary Table S1). Among them, GW1 and GW2 had the highest heritability (0.94), while LN had the lowest heritability (0.27). Heritability estimates for all other traits ranged from 0.58 (PH) to 0.90 (SPW).

Pearson’s correlation was used to investigate the relationship of each trait between 2012 and 2013; all traits were significantly correlated (P < 0.01) between the two years (see Supplementary Table S2). In both 2012 and 2013, GW1 and GW2 were highly correlated (r = 0.891**, P < 0.01 in both years), while all other traits were moderately correlated. The best linear unbiased predictor (BLUP) was calculated from the fixed effects of the phenotypic data to obtain unbiased mean estimates and was used in the correlation analysis among the study traits. The correlation coefficients of this combined analysis are presented in Supplementary Table S3. GW1 and GW2 were highly positively correlated (r = 0.989**, P < 0.01), while IL4 and SPN were highly negatively correlated (r = −0.589**, P < 0.01). AL2, SPW, GT1, GT2, LW1, LW2, PL2, PW1 and PW2 were significantly correlated (P < 0.05) with all other traits. The remaining traits showed moderate to weak correlations with each other.
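For readers who want to reproduce this kind of comparison, the sketch below computes Pearson's r between two vectors of per-accession values, such as one trait measured in two years. The function name and the example numbers are hypothetical; the study itself ran its correlation analysis in SAS.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squares of deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical trait values for four accessions in two years (not study data).
year_1 = [1.0, 2.0, 3.0, 4.0]
year_2 = [1.1, 1.9, 3.2, 3.8]
print(f"r = {pearson_r(year_1, year_2):.3f}")
```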

Ae. tauschii classification based on BLUP values

Discriminant function analysis (Fisher’s method) based on BLUP values was used to show the distance of four Ae. tauschii subspecies (Ae. tauschii ssp. tauschii_I, Ae. tauschii ssp. tauschii_II, Ae. tauschii ssp. strangulata_I and Ae. tauschii ssp. strangulata_II). The results were concordant with those of genotypic comparison (see Supplementary Table S4) for 291 out of 322 accessions and as a result, a large proportion (90.4%) of Ae. tauschii accessions was classified correctly (Fig. 1 and Supplementary Table S5).

Figure 1

Scatter plots for function 1 and function 2 in discriminant function analysis.

1, Aegilops tauschii ssp. tauschii_I; 2, Aegilops tauschii ssp. tauschii_II; 3, Aegilops tauschii ssp. strangulata_I; 4, Aegilops tauschii ssp. strangulata_II.

Cluster analysis (Ward’s method) was performed using the squared Euclidean distance matrix also based on BLUP values and all accessions were divided into four clusters (see Supplementary Table S6 and S7). Cluster I included 113 accessions from 13 different areas of origin and 3 different subspecies, while the most frequent subspecies type was Ae. tauschii ssp. tauschii_II. Cluster II included 78 accessions from 16 different areas of origin and 4 different subspecies; the most frequent subspecies was Ae. tauschii ssp. tauschii_I. Clusters III and IV comprised 44 and 87 accessions, respectively, that originated from 3 (Cluster III) and 13 (Cluster IV) different areas. Cluster III included 3 and Cluster IV included 4 different subspecies, whereas the most frequent subspecies was Ae. tauschii ssp. strangulata_II in Cluster III and Ae. tauschii ssp. strangulata_I in Cluster IV (see Supplementary Table S6).

It was observed that Ae. tauschii accessions from different areas of origin were grouped in the same cluster, while accessions from the same area of origin were grouped into different clusters. For instance, all the accessions from Iran (65 accessions) were grouped into four clusters, suggesting the high levels of genetic diversity in each centre of origin. We also observed that Cluster I had a closer relationship with Cluster II and the main subspecies in both clusters was Ae. tauschii ssp. tauschii. Cluster III had a closer relationship with Cluster IV and the main subspecies in both clusters was Ae. tauschii ssp. strangulata (see Supplementary Table S7). Overall, this analysis showed that there was no relationship between the morphological traits and the centres of origin, revealing high levels of genetic diversity among the accessions.

Marker-trait association analysis

The Bonferroni-corrected threshold (−log10(p) > 3.84) was used as a cut-off to identify marker-trait associations (MTAs). Using the mean phenotypic values from 2012 and 2013, 12,444 significant SNPs were detected by the GLM and 28 significant SNPs by the MLM (Table 3). Of these, 18 SNPs were detected by both methods. The GLM detected far more markers than the MLM because it is much less stringent, as shown by the quantile-quantile (Q–Q) plots (see Supplementary Fig. S1 1e to 29e, 1h to 29h).

Table 3 GWAS of the investigated 29 morphological traits in years 2012 and 2013 detected by GLM and MLM.

In 2012, the GLM detected significant SNPs for all traits, while the MLM did so for only six traits (SL, AL1, SPN, GW1, GW2 and PW2). The average r2 values, which ranged from 2.04 to 9.35%, provided an estimate of the phenotypic variation explained by SNPs. In 2013, the GLM detected significant SNPs for all traits, while the MLM did so for only 13 traits (IL1, FL, FW, SNN, SPL, GL2, GW1, GW2, GT1, GT2, LL2, LW1 and LW2). The average r2 values ranged from 2.74 to 7.09% (Table 3).

We also detected associations between SNPs and BLUP values (see Supplementary Fig. S2). A total of 7,809 significant SNPs were detected by the GLM and 10 significant SNPs by the MLM. Only six significant SNPs were detected by both methods. The r2 values ranged from 2.05 to 8.65% (see Supplementary Table S8).

Significant loci and putative candidate genes

In this study, a total of 15 significant SNPs associated with 10 traits and 21 putative genes were identified (Table 4). Of these SNPs, four were located on chromosome (chr) 2D, two on chr 3D, five on chr 4D, two on chr 5D, one on chr 6D and one on chr 7D. Location information for four significant SNPs inferred from the genetic map constructed by Luo et al.3 differed from that inferred as the best hit from the International Wheat Genome Sequencing Consortium (IWGSC). SNP markers GBQ4KXB02HJM7P_431, contig11810_520, GBUVHFX01CI5PL_126 and GBF1XID02IP0NJ_181 were located on chr 2D, 3D, 6D and 7D, respectively, according to the genetic map and on chr 2BL, 3B, 6BS and 7AS, respectively, according to the best hit from the IWGSC (Table 4).

Table 4 SNPs significantly associated with morphological traits and candidate genes/flanking genes.

The study traits in 2013 were associated with a higher number of significant SNPs than those in 2012 and only GW1 was associated with significant SNPs in both years. Based on BLUP values, only GW1 and GW2 were associated with 3 significant SNPs each and a total of seven candidate genes (116F2, 115G1, 1J9.1, 1J9.2, Rht-D1b, Ig1, and TSAlike) were identified (Table 4).

In 2012, SPN was associated with the highest number of significant SNPs and a total of four candidate genes (LR34, cytochrome P450, glutathione-S-transferase 2 and glutathione-S-transferase 1) were identified. In 2013, SNN was associated with the highest number of significant SNPs and a total of five candidate genes (LR34, cytochrome P450, ZCCT2, ZCCT1 and SNF2P) were identified. GT1 and GT2 were associated with the same number of significant SNPs and the same candidate gene (CKX2.5) was identified. Similarly, GW1 and GW2 were also associated with the same number of significant SNPs and two candidate genes (116F2 and 115G1) were identified (Table 4).

Pleiotropy and multigenic effect revealed by GWAS

Significant association of the same SNPs with multiple traits might be the result of pleiotropy. We observed that a SNP at 55.616 cM on chr 2D, a SNP at 132.198 cM on chr 4D and a SNP at 151.266 cM on chr 5D were significantly associated with both GW1 and GW2. Also, a SNP at 113.167 cM on chr 5D was significantly associated with both GT1 and GT2 (Table 4). These associations were also supported by Pearson’s correlation analysis based on BLUP values (see Supplementary Table S3, r = 0.988** for GW1 and GW2; r = 0.975** for GT1 and GT2; P < 0.01).

Furthermore, several different SNPs were significantly associated with the same trait. SNP markers GBQ4KXB02HJM7P_431, GDEEGVY02FLOCP_398 and GDRF1KQ02F8V30_278 were significantly associated with GW1 and GW2. SNP markers GBB4FNX02JQNSU_161 and GBUVHFX01CI5PL_126 were significantly associated with SNN and SNP markers contig15239_471, contig11810_520, GCE8AKX01ALM0H_152 and F5MV3MU01BU5XD_286 were significantly associated with SPN (Table 4). These results suggested that some morphological traits were not controlled by a single gene but were quantitative.

Discussion

Hexaploid wheat (T. aestivum) originated by the hybridisation of T. turgidum (AABB) with Ae. tauschii (DD)4, a cross that most probably occurred south or west of the Caspian Sea4,25. The distribution centre of Ae. tauschii is along the southern shores of the Caspian Sea and in Azerbaijan and this species has mainly spread eastwards from the centre of origin26, probably due to its diverse adaptability. The study of genetic diversity in Ae. tauschii collections may help us in transferring desirable traits to common wheat.

In this study, Ae. tauschii accessions showed significant (P < 0.001) levels of diversity, as revealed by ANOVA of all morphological traits, and the majority of traits were highly heritable, showing broad variation among the accessions. It is well known that the genotype, the environment and their interaction play an important role in morphological traits. In the two different years, the means of some morphological traits were significantly higher or lower, suggesting that traits were probably affected much more strongly by environmental factors (i.e. year) than by genotype. Discriminant function analysis based on BLUP values was not fully congruent with the classification based on the genotypic comparison or the areas of origin27 and 9.6% of the accessions were misclassified. Cluster analysis using BLUP values grouped accessions into four clusters, while accessions in the same cluster were not from the same subspecies28 or from the same area of origin. In each cluster, Ae. tauschii ssp. tauschii and ssp. strangulata did not separate from each other entirely. Cluster I included 110 accessions of Ae. tauschii ssp. tauschii versus 3 of ssp. strangulata. Cluster II included 66 accessions of Ae. tauschii ssp. tauschii versus 12 of ssp. strangulata. The corresponding counts for Cluster III and Cluster IV were 1 versus 43 and 27 versus 60, respectively. The intermediate forms and hybrids between the two subspecies reported by Kihara et al.29 reveal possible events of migration that probably led to a decrease in genetic differentiation and may explain the results of this study. Jaaska30 reported that Ae. tauschii ssp. tauschii and ssp. strangulata were not closely related to each other; however, it seems that intraspecific branching probably occurred at the same time as major changes in the genetic structure of Ae. tauschii collections.

We identified many significant SNPs and related candidate genes associated with morphological traits in Ae. tauschii by employing the GWAS approach. It is known that linkage mapping can also detect QTLs using different segregating populations tested in different environments. Although few QTLs have been identified in Ae. tauschii, a large number of QTLs related to agronomic traits have been identified in common wheat by the conventional mapping approach. Since the D genomes of Ae. tauschii and common wheat are homologous, the identification of QTLs in Ae. tauschii by the GWAS approach may also offer useful information for understanding the genetic mechanism of agronomic traits in wheat.

In the present study, significant SNPs identified by the GLM or the MLM were distributed on chr 2D–7D. We identified a locus related to SL on chr 4D; Sourdille et al.31 also reported a locus for this trait, but located it at Xgwm261 on chr 2DS. We identified four loci related to SPN on chr 2D, 3D and 4D. These results were partially consistent with those of McCartney et al.32, who reported a QTL related to SPN on chr 4DL between Xbarc48 and Xgwm194, and with those of Rasheed et al.33, who reported a genomic region related to the horizontal principal component 4 (HPC4) trait on 3D (53.86 cM) in synthetic hexaploid wheat. The physical maps of wheat and Ae. tauschii are yet to be finished and therefore little information on chromosome locations can be provided. Hence, the loci identified herein as being associated with morphological traits cannot be directly compared with QTLs reported by other researchers.

Previous genetic research has uncovered many genes that affect important agronomic traits, but only a few have been practically used in plant breeding. For instance, it is known that numerous Rht (reduced height) genes affect plant height; however, only Rht-B1b (Rht1), Rht-D1b (Rht2) and Rht8c have been used extensively in agriculture34. Other important agronomic genes include the Ppd (response to photoperiod) genes, the Vrn (response to vernalisation) genes and the Eps (earliness per se) genes35. These genes play a vital role in the processes involved in plant growth and they are homologous between different plant species. In this study, we identified a few candidate genes associated with phenotypic traits. These genes are partially homologous to genes of Hordeum vulgare and Zea mays and highly homologous to genes of different species of Triticum or Aegilops. They included enzyme genes, such as Acc-2, CKX2.5, Ig1, TSAlike, Acc-1, BAM3, LR34, cytochrome P450, glutathione-S-transferase 2 and glutathione-S-transferase 1; hormone response genes, such as the Aux/IAA gene family and ERF1; regulatory element genes, such as 1J9.1 and 1J9.2; and other genes, such as AXAH3, 116F2, 115G1, Rht-D1b, ZCCT2, ZCCT1 and SNF2P. Additionally, the semi-dwarfing gene Rht-D1b was found to affect GW2 in our study, indicating that it may control more than one agronomic trait. Pleiotropic and multigenic effects were also observed in this study: SNPs GBQ4KXB02HJM7P_431 (at 55.616 cM on chr 2D), GDEEGVY02FLOCP_398 (at 132.198 cM on chr 4D) and GDRF1KQ02F8V30_278 (at 151.266 cM on chr 5D) were associated with both GW1 and GW2, and SNP GBB4FNX01BXV4Q_52 (at 113.167 cM on chr 5D) was associated with both GT1 and GT2. Pleiotropic or closely linked genes36 allowed us to unravel the origin of genetic correlations among the morphological traits, while multigenic effects revealed that the traits of Ae. tauschii were complex and affected by polygenes.

Our findings are a tool that can assist in the genetic dissection of the D genome of bread wheat. When the desired gene resides in the D genome of Ae. tauschii, homologous pairing is expected between the donor chromosome and the recipient bread wheat chromosome (of the D genome) and no pairing induction is required. The most effective approach is to hybridise tetraploid wheat, which carries the AB genomes, with Ae. tauschii and to double the chromosomes of the F1 hybrid using colchicine treatment. The product is a fertile synthetic hexaploid ABD genotype fully homologous to bread wheat. Such wheat–Ae. tauschii hybrids have a high level of sterility. Homologous chromosomes of the hexaploid wheat will readily recombine in a hybrid, and such synthetic lines can serve as a gene pool derived from Ae. tauschii that is ready for screening for any desired trait and allows for an easy transfer of the responsible genes37.

In summary, we performed genome-wide association studies for morphological traits in a population containing 322 Ae. tauschii accessions using 7,185 polymorphic SNP markers. Fifteen significant markers were detected by both GLM and MLM. At significant loci and flanking regions, we identified candidate genes for morphological traits including enzyme genes, hormone response genes and other genes that may affect morphological traits. Additionally, discriminant function analysis and cluster analysis showed that there was no correlation between the morphological traits and the centres of origin and revealed high levels of genetic diversity among the tested populations. The identified SNPs and genes offer essential knowledge for cloning genes related to morphological traits in Ae. tauschii and wheat. These findings provide useful information for further unlocking the genetic mechanisms of morphological traits in Ae. tauschii and, subsequently, of agronomic traits in wheat.

Methods

Plant Material

A total of 322 Ae. tauschii accessions used in this study were obtained from the Triticeae Research Institute of Sichuan Agricultural University (SAU). Detailed information for each accession is given in Supplementary Table S9 available online.

Phenotypic evaluation

Ae. tauschii accessions were evaluated in the field at Wenjiang, Chengdu, China, during the growing season (April–June) in 2012 and 2013. All accessions were sown during the planting season (October 2011 and 2012). Every accession was planted in three rows of five plants each; each row was 1.5 m long and the spacing between plants and between rows was 0.3 m. Thus, each accession comprised 15 plants in three replications of five plants each. These 15 plants of each accession were used to investigate morphological traits. A total of 29 morphological traits were investigated, including plant height (PH), spike length (SL), internode length 1 (IL1), internode length 2 (IL2), internode length 3 (IL3), internode length 4 (IL4), flag leaf length (FL), flag leaf width (FW), leaf number (LN), stem node number (SNN), awn length 1 (AL1), awn length 2 (AL2), spikelet length (SPL), spikelet width (SPW), spikelet number (SPN), glume length 1 (GL1), glume length 2 (GL2), glume width 1 (GW1), glume width 2 (GW2), glume thickness 1 (GT1), glume thickness 2 (GT2), lemma length 1 (LL1), lemma length 2 (LL2), lemma width 1 (LW1), lemma width 2 (LW2), palea length 1 (PL1), palea length 2 (PL2), palea width 1 (PW1) and palea width 2 (PW2). Measurements were conducted using a straightedge and a Vernier calliper. A brief description of each trait is summarised in Supplementary Table S10 available online.

Descriptive statistics, ANOVA, correlation analysis and heritability estimates of all traits were conducted in SAS 9.2 (SAS Institute Inc., Cary, NC) using 2-year data. Broad-sense heritability was defined as H = VG/(VG + VE), where VG and VE are the estimates of genetic and environmental variance, respectively38. Phenotypic BLUP was estimated taking into account the genotype by environment interaction39 and was used to perform correlation analysis, discriminant function analysis, cluster analysis and further association analysis. Discriminant function analysis (Fisher’s method40) and cluster analysis (Ward’s method41) were conducted using SPSS 20.0 (IBM Corp., Armonk, NY).
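The heritability definition above, H = VG/(VG + VE), can be illustrated with a short sketch. The function name and the variance components are hypothetical; in the study the components were estimated in SAS, and how they were estimated is not shown here.

```python
def broad_sense_heritability(v_g, v_e):
    """Return H = VG / (VG + VE) from genetic and environmental variance estimates."""
    if v_g < 0 or v_e < 0:
        raise ValueError("variance components must be non-negative")
    total = v_g + v_e
    if total == 0:
        raise ValueError("total variance is zero; H is undefined")
    return v_g / total


# Hypothetical variance components (not estimates from the study).
print(broad_sense_heritability(v_g=3.0, v_e=1.0))
```

A value close to 1 means that most of the observed variation is attributable to genotype, while a value close to 0 means it is mostly environmental.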

10K Infinium iSelect SNP array and SNP genotyping

A total of 7,185 polymorphic SNP markers in the array were uniquely mapped on the genetic map and the physical map of Ae. tauschii built from bacterial artificial chromosome clones3. SNPs were assayed according to the manufacturer’s protocol (Illumina Inc., San Diego, CA) at the Genome Centre, University of California, Davis, USA. Normalised Cy3 and Cy5 fluorescence intensities for each DNA sample were graphed using GenomeStudio software (Illumina Inc., San Diego, CA), resulting in genotype clustering for each SNP marker. Detailed information on SNP genotyping of Ae. tauschii accessions has been described in our previous study28.

Population structure

The Bayesian inference program STRUCTURE 2.3.342,43 was used to assess population structure using a set of 7,185 polymorphic SNP markers mapped on the genetic map of Ae. tauschii3. We used the linkage ancestry model and the allele frequency correlated model. A total of 100 burn-in iterations followed by 100 Markov Chain Monte Carlo (MCMC) iterations for K = 1–10 clusters were used to identify the optimal range of K. For each K, five independent runs were produced. The optimal value of K was determined using the delta K method44. Here, K = 4 was used and the whole panel was divided into Subpopulation (Subp) 1 (Ae. tauschii ssp. tauschii_I), Subp 2 (Ae. tauschii ssp. tauschii_II), Subp 3 (Ae. tauschii ssp. strangulata_I) and Subp 4 (Ae. tauschii ssp. strangulata_II) based on our previous study28.
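The text cites the delta K method without giving the calculation. The sketch below shows one common formulation, assumed here for illustration: for each K, the absolute second-order difference of the mean ln P(D) across replicate runs, divided by the standard deviation of ln P(D) at that K. The function name, the input layout and the log-likelihood values are hypothetical and are not results from this study; other implementations may average the second differences per run instead.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Compute delta K for each interior K.

    lnp_by_k maps K (int) to a list of ln P(D) values from replicate runs.
    Assumed formulation: |mean(K+1) - 2*mean(K) + mean(K-1)| / sd(K).
    """
    means = {k: mean(v) for k, v in lnp_by_k.items()}
    sds = {k: stdev(v) for k, v in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        # delta K needs both neighbours and a non-zero spread at K
        if k - 1 not in means or k + 1 not in means or sds[k] == 0:
            continue
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        result[k] = second_diff / sds[k]
    return result


# Hypothetical ln P(D) values for K = 1..6, three runs each (not study data).
runs = {
    1: [-5000, -5002, -4998],
    2: [-4500, -4504, -4496],
    3: [-4200, -4203, -4197],
    4: [-3800, -3801, -3799],
    5: [-3780, -3785, -3775],
    6: [-3770, -3776, -3764],
}
dk = delta_k(runs)
best_k = max(dk, key=dk.get)
print(best_k, dk)
```

Delta K cannot be evaluated at the smallest or largest K tested, because each value needs a neighbour on both sides.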

Marker-trait associations

MTAs of 6,905 SNP markers with minor allele frequency (MAF) > 0.05 were evaluated based on the phenotypic mean data from 2012 and 2013, as well as BLUP values, using Tassel 2.141,42. Two models, 1) the GLM adjusted using the Q-matrix and 2) the MLM adjusted using both the Q- and kinship (K)-matrix, were employed to reduce errors from population structure. Bonferroni-corrected thresholds at α = 1 were used as cut-offs. With 6,905 SNPs and α = 1, the Bonferroni-corrected threshold for the p value (α divided by the number of markers) was 144.823 × 10⁻⁶, with a corresponding −log10(p) value of 3.84. Significant markers were displayed in Manhattan plots generated in R 3.03 (http://www.r-project.org/). The distribution of p-values (observed p-values against cumulative p-values on a negative log10 scale) was displayed in Q–Q plots, also generated in R.
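The threshold arithmetic in this paragraph can be checked with a few lines. This is only a sketch of the Bonferroni calculation using the marker count and α stated in the text; the function name is hypothetical.

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Return the per-test p-value cut-off: alpha divided by the number of tests."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Values stated in the text: alpha = 1 and 6,905 SNP markers.
p_cutoff = bonferroni_threshold(alpha=1.0, n_tests=6905)
neg_log10_cutoff = -math.log10(p_cutoff)
print(f"p cut-off = {p_cutoff:.6e}, -log10(p) = {neg_log10_cutoff:.2f}")
```

A marker is then declared significant when its −log10(p) exceeds this cut-off, which is the line drawn on a Manhattan plot.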

Putative candidate genes analysis

Putative candidate genes were proposed for each significant MTA using two resources: the corresponding extended SNP marker sequence from the National Centre for Biotechnology Information (NCBI, http://www.ncbi.nlm.nih.gov/) GenBank non-redundant database, and the extended SNP marker sequence, covering 5 kb around each SNP marker, obtained by BLAST against the IWGSC (http://www.wheatgenome.org/).

Additional Information

How to cite this article: Liu, Y. et al. Genome-wide association study of 29 morphological traits in Aegilops tauschii. Sci. Rep. 5, 15562; doi: 10.1038/srep15562 (2015).