High molecular weight glutenin gene diversity in Aegilops tauschii demonstrates unique origin of superior wheat quality

Central to the diversity of wheat products was the origin of hexaploid bread wheat, which added the D-genome of Aegilops tauschii to tetraploid wheat, giving rise to superior dough properties in leavened breads. The polyploidization, however, imposed a genetic bottleneck, with only limited diversity introduced in the wheat D-subgenome. To understand genetic variants for quality, we sequenced 273 accessions spanning the known diversity of Ae. tauschii. We discovered 45 haplotypes in Glu-D1, a major determinant of quality, relative to the two predominant haplotypes in wheat. The wheat allele 2 + 12 was found in Ae. tauschii Lineage 2, the donor of the wheat D-subgenome. Conversely, the superior quality wheat allele 5 + 10 originated in Lineage 3, a recently characterized lineage of Ae. tauschii, showing a unique origin of this important allele. These two wheat alleles were also quite similar relative to the total observed molecular diversity in Ae. tauschii at Glu-D1. Ae. tauschii is thus a reservoir for unique Glu-D1 alleles and provides the genomic resource to begin utilizing new alleles for end-use quality improvement in wheat breeding programs.


Originating in the Fertile Crescent some 10,000 years ago, hexaploid wheat (Triticum aestivum) is now grown and consumed around the world 1. The global consumption of wheat as a staple crop is owed principally to the unique viscoelastic properties of wheat dough that lend it the capacity to make diverse baked products such as leavened bread, tortillas, chapati, pastries, and noodles. The uniqueness of wheat dough can also be described as the strength to resist deformation and elasticity to recover the original shape as well as the viscosity to permanently deform under persistent stress. Elasticity is important for the product to hold shape, while viscosity allows the dough to be worked and formed. The balance of the competing properties determines what baked goods a dough is suitable for, such as a dough with greater strength for leavened pan bread compared to the more extensible dough that is desired for a chapati or tortilla.

Bread wheat is an allohexaploid with the A-, B- and D-subgenomes contributed by different, but related, species. The closest relative to the wheat A-subgenome is diploid Triticum urartu, with other diploid A-genome species including the wild and domesticated Einkorn wheat (Triticum monococcum). While the exact ancestor of the B-genome is unknown and presumed extinct, it is believed that Ae. speltoides (S-genome) is the closest living relative. These two species were brought together to form a tetraploid

Here we characterized the Glu-D1 allelic diversity in a panel of 273 sequenced Ae. tauschii accessions. The panel spans the known genetic diversity of Ae. tauschii and is a powerful resource for association mapping and gene identification 23. From the sequenced Ae. tauschii panel, we discovered hundreds of genetic variants which defined dozens of unique haplotypes.
This gives the needed molecular information to track these alleles in breeding germplasm, which will in turn enable targeted assessment of the novel Ae. tauschii HMW glutenin alleles in hexaploid backgrounds leading to utilization of favorable alleles for wheat quality improvement.

Through the Open Wild Wheat Consortium, we obtained Illumina 150 bp paired-end short reads from 234 unique Ae. tauschii accessions each sequenced to greater than 7-fold coverage 23. These were aligned to the Ae. tauschii AL8/78 reference genome and sequence variants at the annotated Glu-D1 locus were extracted. We also included three wheat cultivars in this analysis to compare Ae. tauschii variants to the common 5+10 (variety 'CDC Stanley') and 2+12 (varieties 'Chinese Spring' and 'LongReach Lancer') alleles. From this panel, we identified a total of 310 variants at Glu-D1, which were used to generate haplotypes and evaluate molecular diversity at this locus.

From the Ae. tauschii germplasm collection we identified 32 and 33 haplotypes within the coding sequence for the x and y subunits of the Glu-D1 locus, respectively (Figure 1, Supplemental File S1). When considering the complete Glu-D1 locus with combination of the x and y subunit, a total of 45 haplotypes were identified (Table 1). The various x and y subunit haplotypes were almost exclusively associated with each other, demonstrating the close physical association and limited recombination between the two genes. We included the 2500 bp up- and downstream sequences in our analysis to see if this resulted in further differentiation of alleles, as short-read sequences often do not align uniquely to the central, highly repetitive region of the HMW glutenin genes. Including the flanking regions did not result in additional haplotypes. Thus, it appears that the identified variants are sufficient for faithfully differentiating alleles at Glu-D1.
We then calculated genetic distances and determined a gene-level phylogeny at Glu-D1 for all of the Ae. tauschii accessions (Figure 1). Haplotypes clustered into three major clades, two of which were associated predominantly with Lineage 2 and one with Lineage 1. A unique group of Glu-D1 alleles from the newly characterized Lineage 3 accessions were found within a narrow clade with Lineage 2. Among the three major clades, we designated 16 subclades that were clearly distinguished by variants and coincided with a Euclidean distance of 4. Of the 16 subclades, eight were associated exclusively with Lineage 2, five with Lineage 1, and one with Lineage 3. The Lineage 3 accessions all fell within the Lineage 2 major clades, but occupied a unique subclade therein. Thus, the gene-level phylogeny at this locus agrees very closely with the overall previously described population structure of the Ae. tauschii lineages 23,24. We also observed one subclade (9) that had representative accessions from both Lineage 1 and 2. This could represent an ancestral haplotype found in both lineages which underwent incomplete lineage sorting, or a case of recent interlineage haplotype exchange. Cases of haplotypes shared across Lineages 1 and 2 were also observed for pest (Cmc4) and disease resistance (Sr46) genes 23.

Lineage 2, the recognized ancestral diploid donor of the D-subgenome of hexaploid wheat 3, had greater Glu-D1 molecular haplotype diversity than Lineage 1. Not only were there more subclades associated with Lineage 2, there were also more haplotypes (Supplemental Tables S1 and S2). As expected, the haplotypes of wheat clustered within Lineage 2 subclades (Figure 1). Within Lineage 2, we observed Ae.

Given the large difference in quality between wheat cultivars carrying 2+12 and 5+10 alleles, we hypothesized that these two haplotypes would not be similar at a molecular level.
However, we found that 2+12 and 5+10 clustered relatively closely within major-clade III, with much greater overall diversity detected across Ae. tauschii, particularly when including the Lineage 1 accessions which had very different haplotypes. When comparing the 2+12 and 5+10 haplotypes to those found in Lineage 1, it becomes apparent that Ae. tauschii carries alleles that are very unlike anything seen in bread wheat and may offer unique functional characteristics when introgressed into hexaploid backgrounds.
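The subclade designation described above (groups that coincide with a Euclidean distance of 4 on the dendrogram) can be illustrated with a minimal sketch. It is written in Python rather than the R used in the study, it assumes complete linkage (the default of R's `hclust()`, since no linkage method is stated in the text), and the variant profiles in `toy_profiles` are hypothetical values chosen only to show the cut.

```python
import math
from itertools import combinations

def euclidean(a, b):
    """Euclidean distance between two numeric profiles of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cut_complete_linkage(profiles, height):
    """Agglomerate with complete linkage and stop once every remaining
    merge would exceed `height` (equivalent to cutting the tree there)."""
    clusters = [[i] for i in range(len(profiles))]

    def linkage(c1, c2):
        # Complete linkage: the largest pairwise distance between members.
        return max(euclidean(profiles[i], profiles[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: linkage(clusters[ab[0]], clusters[ab[1]]),
        )
        if linkage(clusters[a], clusters[b]) > height:
            break
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)

# Hypothetical profiles: two tight pairs and one distant singleton.
toy_profiles = [(0, 0), (1, 0), (10, 0), (11, 0), (30, 0)]
subclades = cut_complete_linkage(toy_profiles, height=4)
print(subclades)
```

In the study itself the distances were computed on the A matrix of the variants, so real subclade boundaries depend on that scaling rather than on raw coordinates like these.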

Geographic diversity
Given the known geographic structure and distribution of Ae. tauschii which is associated with various levels of population structure 24, we evaluated the Glu-D1 diversity relative to the geographic origin of the Ae. tauschii accessions. Molecular haplotypes were strongly associated with geographic origin, consistent with the overall genome-wide picture 24, and genetic distances between alleles increased with the geographic distance between collection sites of the Ae. tauschii accessions (Figure 2). The greatest concentration of haplotype diversity was located along the shores of the Caspian Sea in Iran (Figure 2). Consistent with a hypothesis of admixture between Lineage 1 and Lineage 2 leading to shared gene-level haplotypes across the lineages, the accessions from Lineage 1 and 2 with the same Glu-D1 haplotype (within subclade 9) were collected very near one another.
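One way to examine the relationship between genetic and geographic distance described above is to compare pairwise great-circle distances between collection sites with pairwise genetic distances. The sketch below is an illustration only: the accession names, coordinates, and variant profiles are hypothetical, and the haversine formula and Pearson correlation are assumptions about a reasonable approach, not the method reported by the authors.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two latitude/longitude points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical accessions: (name, latitude, longitude, recoded variant profile).
accessions = [
    ("accA", 37.0, 50.0, (-1, -1, -1)),
    ("accB", 37.0, 51.0, (-1, -1, 1)),
    ("accC", 37.0, 55.0, (1, 1, 1)),
]

geo, gen = [], []
for (_, la1, lo1, p1), (_, la2, lo2, p2) in combinations(accessions, 2):
    geo.append(haversine_km(la1, lo1, la2, lo2))
    gen.append(euclidean(p1, p2))

r = pearson(geo, gen)
print(f"correlation between geographic and genetic distance: {r:.2f}")
```

Because pairwise distances are not independent observations, a permutation-based test such as a Mantel test would be needed before attaching a significance level to such a correlation.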

Molecular haplotypes identify novel Glu-D1 alleles
We employed SDS-PAGE analysis, the traditional standard for differentiating HMW glutenin loci, to determine if the haplotype molecular sequence diversity would also reflect differences in protein mobility. We evaluated a total of 72 unique accessions with SDS-PAGE and differentiated 9 alleles for the x subunit and 8 alleles for the y subunit from this protein mobility assay. Analysis of the Lineage 1 and Lineage 2 variants revealed that molecular haplotypes were consistent with the proteins differentiated by SDS-PAGE (Supplemental Tables S1 and S2). For the majority of the alleles that were differentiated by SDS-PAGE, we were able to unambiguously correlate the observed SDS-PAGE alleles with the molecular variants. Although specific molecular haplotypes were associated with specific SDS-PAGE mobilities, there was little concordance between gene-level variation and SDS-PAGE mobility: similar alleles at the molecular level were observed with very different SDS-PAGE mobilities and, conversely, very different molecular haplotypes were observed with the same SDS-PAGE mobility. This supports our hypothesis that the observed sequence variants are effectively in complete linkage disequilibrium and tagging the size variants from the central repeat region. The SDS-PAGE diversity was also lower, giving SDS-PAGE less differentiating power than the molecular haplotypes. As noted, the same SDS-PAGE mobilities were observed in both Lineage 1 and Lineage 2 haplotypes, but the molecular haplotypes were clearly differentiated (Figure 1). The protein mobility differences are considered to be primarily due to variation in the central repetitive region and therefore are not directly detectable with

One of the most valuable findings of this study was the high prevalence of cryptic molecular haplotypes hidden within SDS-PAGE mobilities.
Within every SDS-PAGE mobility pattern there were multiple molecular haplotypes, often from very different subclades and occasionally from entirely different clades (Figure 1). The cryptic SDS-PAGE haplotypes, accordingly, were geographically dispersed (Figure 3). For example, within SDS-PAGE 2+12 were four haplotypes: one which was the same as wheat 2+12 (Dx1a+Dy1a), another which was within the same subclade (Dx1c+Dy1d), and two from entirely different major-clades (Dx9a+Dy9b and Dx13b+Dy13a). Also, within subclade 9 were the SDS-PAGE mobilities Dx2+Dy10 and Dx2+Dy11, and within subclade 13 were the SDS-PAGE mobilities 1t+12, 2.1*+12.1*, and 4+10, further supporting that these haplotypes are not all similar to the wheat 2+12 haplotype at the molecular level. However, the proteins still migrate similarly on an SDS-PAGE gel. These results suggest that SDS-PAGE alone is insufficient when characterizing HMW glutenin diversity in wild relatives and will not be a suitable tool for tracking novel alleles in the hexaploid wheat germplasm.

While most molecular haplotypes delineated along the three Ae. tauschii lineages (Figure 1), a notable exception was within the predominantly Lineage 1 major-clade, subclade 9, where the same three haplotypes (Dx9a+Dy9a, Dx9a+Dy9b, and Dx9a+Dy9c) were observed in both Lineage 1 and Lineage 2 accessions. Interestingly, while there were three haplotypes at the y subunit, there was only a single x haplotype associated with all three of these. The x subunit mobility was the same for all three haplotypes, indicating that the x allele is in fact the same. However, the y subunit was differentiated, with the mobility of Dy9b being faster than that of Dy9a and Dy10c (Supplemental Figure S1).
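The notion of cryptic haplotypes can be made concrete by grouping molecular haplotypes under the SDS-PAGE mobility they share. The sketch below uses only the four haplotypes named above for the 2+12 mobility. Reading the subclade number out of the haplotype name (for example, 13 from Dx13b) follows the naming pattern visible in the text, and the helper names are hypothetical.

```python
import re
from collections import defaultdict

# (SDS-PAGE mobility, molecular haplotype) pairs taken from the text above.
observations = [
    ("2+12", "Dx1a+Dy1a"),     # same haplotype as wheat 2+12
    ("2+12", "Dx1c+Dy1d"),     # same subclade as wheat 2+12
    ("2+12", "Dx9a+Dy9b"),     # different major-clade
    ("2+12", "Dx13b+Dy13a"),   # different major-clade
]

def x_subclade(haplotype):
    """Subclade number encoded in the x-subunit name, e.g. 'Dx13b' -> 13."""
    match = re.match(r"Dx(\d+)", haplotype)
    if match is None:
        raise ValueError(f"unexpected haplotype name: {haplotype}")
    return int(match.group(1))

haplotypes_by_mobility = defaultdict(set)
for mobility, haplotype in observations:
    haplotypes_by_mobility[mobility].add(haplotype)

for mobility, haplotypes in haplotypes_by_mobility.items():
    subclades = sorted({x_subclade(h) for h in haplotypes})
    print(f"SDS-PAGE {mobility}: {len(haplotypes)} haplotypes in subclades {subclades}")
```

A mobility pattern with more than one haplotype, as here, is one that SDS-PAGE alone cannot resolve.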

Recombinant haplotypes identified
The close proximity of the glutenin genes results in such tight linkage that recombination is extremely rare. To date, a recombination between the x and y subunit of any HMW-GS locus has yet to be verified. Among the 242 Ae. tauschii accessions studied here, we found a clear example of a historical recombination at Glu-D1 in the accession TA1668 (Lineage 2). SDS-PAGE mobility of TA1668 matches that of TA10081 (Dx2+Dy10.2), and though the y haplotype of TA1688 is the same as the y haplotype of TA10081, the x subunit is very different and matches the Lineage 1 clade (Figure 4). Within this clade, subclade 9 contains both Lineage 1 and Lineage 2 accessions, indicating that there was incomplete lineage sorting or admixture between the two lineages that led to the introgression of a Lineage 1 Glu-D1 haplotype into the Lineage 2 population. In the presence of both haplotypes, it appears there was a rare recombination between the Lineage 1 and Lineage 2 Glu-D1 haplotypes, leading to the recombinant haplotype Dx9a+Dy5a found in TA1688.

The Lineage 3 accession TA2576 also appears to carry a recombinant haplotype (Dx7b+Dy15b) (Figure 4). However, our dataset did not contain the exact haplotypes involved in the recombination that led to Dx7b+Dy15b. The closest x subunit haplotype is Dx7a, the only other Lineage 3 haplotype, from major-clade III, and the closest y subunit haplotype is the Lineage 2 haplotype Dy15a from major-clade II. We therefore designated the x and y subunit haplotypes of TA2576 as haplotypes within subclades 7 and 15. Geographical analysis reveals that TA2576 was collected from a region shared with other Lineage 3 accessions. However, the accessions containing the Dy15a haplotype were not collected from a region shared with the Lineage 3 accessions.
Although not conclusive, the most parsimonious explanation is therefore that Dx7b+Dy15b represents a recombinant haplotype between the x and y subunits from two different alleles. Within our current panel, however, we are unable to differentiate exactly which original haplotypes gave rise to this recombinant haplotype.

were then centrifuged for 2 min at 10,000 rpm, and the supernatant containing the gliadins was discarded. The pellet was then mixed with 0.1 ml of a 1.5% (w/v) DTT solution in a Thermomixer for 30 min at 65 °C, 1,400 rpm, and centrifuged for 2 min at 10,000 rpm. A 0.1 ml volume of a 1.4% (v/v) vinylpyridine solution was then added to the tube, which was subsequently placed again in a Thermomixer for 15 min at 65 °C, 1,400 rpm, and centrifuged for 5 min at 13,000 rpm. The supernatant was mixed with the same volume of sample buffer (2% SDS (w/v), 40% glycerol (w/v), and 0.02% (w/v) bromophenol blue, pH 6.8) and incubated in the Thermomixer for 5 min at 90 °C and 1,400 rpm. Tubes were centrifuged for 5 min at 10,000 rpm, and 8 µl of the supernatant was used for the glutenin gel. Glutenins were separated in polyacrylamide gels (15% or 13% T) prepared using 1 M Tris buffer, pH 8.5. Gels were run at 12.5 mA for ~19 h. Alleles were identified using the nomenclatures proposed by Payne and Lawrence (1983) 17 for bread wheat high molecular weight glutenins and Lagudah and Halloran (1988) 22 for previously described Ae. tauschii high molecular weight glutenins.

were identified as sharing greater than 99.8% variant calls.

Molecular Haplotype Analysis
Ae. tauschii and hexaploid wheat variant call format (vcf) files were merged in R. Variant calls were recoded to reference (-1) and alternate (1) alleles, and heterozygous calls were set to missing. Variants were filtered on the following criteria: a variant must be present in either hexaploid wheat or Ae. tauschii, must have a quality score greater than 30, and must be present in greater than 50% of samples. Given that we expected novel alleles present in single accessions, no minimum minor allele frequency was set. Samples sharing the same variants were considered to share the same molecular haplotype. Genetic distances were calculated as the Euclidean distance on the A matrix of the variants in R. The A matrix was calculated with 'A.mat()' from the rrBLUP package 35 and Euclidean distances with 'dist()'. Hierarchical clustering of the genetic distances was performed using 'hclust()' and the result was converted to a dendrogram object before plotting with the dendextend package 36.

Cryptic haplotypes within SDS-PAGE alleles. The molecular haplotypes for Ae. tauschii accessions at the sites where the accessions were collected. Corresponding SDS-PAGE allele is noted for 2+12 mobility alleles.
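The recoding, filtering, and haplotype-grouping steps of the Molecular Haplotype Analysis can be sketched as follows. This is an illustration in Python, not the authors' R code. The genotype strings, quality scores, and accession names are hypothetical, and two readings of the criteria are assumptions: "present in greater than 50% of samples" is taken to mean a non-missing call in more than half of the samples, and "present in either hexaploid wheat or Ae. tauschii" is taken to mean at least one alternate call among the samples.

```python
from collections import defaultdict

def recode_call(gt):
    """Map a VCF genotype string to -1 (reference), 1 (alternate), or None."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles or len(set(alleles)) != 1:
        return None  # missing or heterozygous calls are set to missing
    return -1 if alleles[0] == "0" else 1

def filter_variants(variants, min_qual=30.0, min_call_rate=0.5):
    """Keep recoded variants that pass the quality and call-rate thresholds."""
    kept = []
    for variant in variants:
        calls = [recode_call(gt) for gt in variant["gts"]]
        called = [c for c in calls if c is not None]
        call_rate = len(called) / len(calls)
        # Assumption: a variant is "present" if any sample carries the alternate allele.
        if variant["qual"] > min_qual and call_rate > min_call_rate and 1 in called:
            kept.append(calls)
    return kept

def group_haplotypes(sample_names, kept):
    """Group samples with identical recoded profiles (missing counts as a state)."""
    groups = defaultdict(list)
    for i, name in enumerate(sample_names):
        profile = tuple(calls[i] for calls in kept)
        groups[profile].append(name)
    return list(groups.values())

# Hypothetical toy input: four accessions, four candidate variants.
samples = ["acc1", "acc2", "acc3", "acc4"]
variants = [
    {"qual": 50, "gts": ["0/0", "0/0", "1/1", "1/1"]},  # kept
    {"qual": 20, "gts": ["0/0", "1/1", "1/1", "0/0"]},  # dropped: low quality
    {"qual": 60, "gts": ["0/1", "./.", "./.", "1/1"]},  # dropped: call rate 25%
    {"qual": 45, "gts": ["0/0", "0/0", "1/1", "0/0"]},  # kept
]

kept = filter_variants(variants)
haplotypes = group_haplotypes(samples, kept)
print(haplotypes)
```

No minor allele frequency filter is applied, matching the text, so a haplotype carried by a single accession (such as `acc3` here) is retained.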