Genetic characterization of cysteine-rich type-b avenin-like protein coding genes in common wheat

The wheat avenin-like proteins (ALPs) are considered atypical gluten constituents, and transgenic studies have shown that they have positive effects on dough properties. However, the genetic architecture of ALP genes has remained unclear, which has so far prevented their use in wheat breeding. In the current study, three type-b ALP genes were identified and mapped to chromosomes 7AS, 4AL and 7DS. The coding sequences of TaALP-7A and TaALP-7D were both 855 bp long, encoding identical homologous 284-amino-acid proteins. TaALP-4A was 858 bp long, encoding a 285-amino-acid protein variant. Three alleles were identified for TaALP-7A and four for TaALP-4A. The TaALP-7A alleles were of two types: type-1, comprising TaALP-7A1 and TaALP-7A2, encodes mature proteins, whereas type-2, represented by TaALP-7A3, contains a premature stop codon in the coding region and therefore does not encode a mature protein. Dough quality testing of 102 wheat cultivars revealed a highly significant association between the type-1 TaALP-7A allele and better wheat processing quality. These allelic effects were confirmed across a range of commercial wheat cultivars. Our research makes ALPs the first source of this kind of genetic variation that can be readily utilized in wheat breeding.

Scientific Reports | 6:30692 | DOI: 10.1038/srep30692

…residues present in the mature protein 15. The existence of proteins related to LMW gliadins, constituting a new family of grain prolamins, has also been confirmed in barley 16,17 and rye 18. DuPont and co-workers described a protein isolated from wheat grains as 'avenin-like', based on partial amino acid sequences determined by mass spectrometry 19. Similarly, Vensel and co-workers 20 identified five avenin-like proteins in the albumin and globulin proteome during early and late stages of grain development. Kan and co-workers reported two classes of cDNAs encoding two types of ALPs, namely type-a and type-b 21. In a phylogenetic analysis of the prolamin superfamily, the ALP genes formed a single cluster whose closest neighbors were oat avenins and the sulphur-rich prolamins (α-gliadins, γ-gliadins and LMW glutenin subunits). In the same study, the authors observed that type-a ALPs contain 14 cysteine residues, eight of which form the conserved cysteine skeleton characteristic of typical prolamins (such as avenins, α-gliadins, γ-gliadins and LMW glutenin subunits) 22. Notably, type-a ALPs can form seven intra-chain disulfide bonds, which is typical of monomeric LMW gliadins. Type-b ALPs contain two repetitive domains (R1, R2), each with eight cysteine residues in positions homologous to the cysteines of γ-gliadin and oat avenin. Type-b ALPs also differ in cysteine distribution, with a total of 18 or 19 cysteine residues. In particular, Mamone and co-workers 23 detected type-b ALPs in the glutenin fraction of the durum wheat cultivar Svevo, while Kan and co-workers 21 found that the two cysteines in the N-terminal domain are not conserved in various Aegilops species, suggesting that they could be involved in inter-chain linkages to polymeric glutenin subunits.
The identification of type-b ALPs was supported by sequences from a reasonable number of tryptic peptides matching the expected molecular weights and pI values 23. The higher number of cysteines in type-b ALPs was expected to have a significant effect on folding and on the arrangement of disulfide linkages, not only by stabilizing the molecular structure but also by influencing glutenin polymer formation. Chen and co-workers 24 predicted that type-b ALPs can form eight intra-molecular disulfide bonds, leaving three free cysteine residues available for inter-molecular disulfide bond formation. They further reported that type-b ALPs can act as "chain branches", increasing the probability of glutenin macropolymer (GMP) formation and incorporating other glutenin subunits 24. Ma and co-workers overexpressed type-b ALPs in two transgenic wheat lines, which resulted in a highly significant improvement in dough mixing properties and provided strong evidence for their incorporation into gluten polymers 25.
Until now, when selecting for dough and baking quality improvement, wheat breeders have mainly relied on the genetic variation underlying gluten proteins. The effect on dough mixing properties associated with ALPs represents a novel genetic effect that has not been utilized in a targeted way in wheat grain functionality breeding thus far. Marker-assisted selection targeting ALPs depends on both natural allelic variation of ALPs and their validated effects on dough mixing properties. The objectives of this study were to locate the type-b ALP coding genes, find the number of available alleles and quantify the allelic effects for each locus, and to develop allele-specific markers for wheat grain functionality breeding.

Results
Type-b ALP coding genes in Triticum aestivum. ALP-specific primers were used to amplify the complete coding sequences of type-b ALP genes from the genomic DNA of 19 cultivars. The amplified products, spanning the start and stop codons, were about 902 bp long (Fig. 1). The resulting nucleotide sequences were highly similar (99%) to the previously reported type-b ALP gene sequence (Accession No. FJ529695).
Sequence alignment and analysis. The amplified ALP gene sequences were used to search the EnsemblPlants (http://plants.ensembl.org/Triticum_aestivum/Info/Index) and International Wheat Genome Sequencing Consortium (IWGSC) databases. The results showed that type-b ALP genes were highly transcribed and consisted of a single uninterrupted exon, consistent with previous studies 26. In addition, the type-b ALP gene sequences closely matched three survey sequences (Chinese Spring) on chromosomes 7DS (99%), 4AL (98%) and 7AS (97%).
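The identity percentages reported above come from database searches. As a minimal illustration of what such a figure measures, the Python sketch below computes percent identity for two sequences that have already been aligned, with gaps written as "-". It assumes one common definition (identical non-gap columns divided by all alignment columns, skipping columns that are gaps in both sequences); BLAST and other tools define the denominator differently, so the function name and its conventions are illustrative assumptions, not the procedure used in this study.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Assumed definition: identical, non-gap columns divided by all columns,
    skipping columns where both sequences have a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if not (x == gap and y == gap)
    ]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for x, y in columns if x == y and x != gap)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    # Hypothetical toy alignment: one gap column, all other columns identical.
    print(percent_identity("ATGCA-GT", "ATGCAAGT"))  # 87.5
```

Because gap handling changes the denominator, identity values from different tools are only comparable when they use the same convention.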
Gene locations. Three pairs of specific primers were designed to target ALP genes on chromosomes 7AS, 4AL and 7DS and to verify the BLAST results (Table 1). These primers were tested across the entire set of Chinese Spring deletion lines. The results were consistent with the survey sequence databases and confirmed the chromosomal locations of the gene products ALP-7A, ALP-4A and ALP-7D (Fig. 2). We therefore named the three ALP gene loci TaALPb-7A, TaALPb-4A and TaALPb-7D.

SNP and indel analyses. Genomic DNA of 19 cultivars was amplified using the primer pairs specific for ALP-7A, ALP-4A and ALP-7D, with each primer pair amplifying a single sequence in all cultivars. The full-length sequences at the three loci were either 855 or 858 bp, encoding proteins of 284 and 285 amino acid residues, respectively. SNP and indel polymorphisms were discovered among the amplicons of different cultivars at loci TaALPb-7A and TaALPb-4A. Seven polymorphic sites were detected among the TaALPb-7A amplicons, including one three-base deletion and six SNPs (five transversions and one transition) (Fig. 3). Eighteen polymorphic sites were detected among the TaALPb-4A amplicons, including seventeen SNPs (14 transversions and 3 transitions) and one indel (Fig. 4). No variation was found at the TaALPb-7D locus (Fig. 5). These results indicate that multiple alleles exist for TaALPb-7A and TaALPb-4A, whereas little or no genetic variation exists at the TaALPb-7D locus. Further comparison revealed that the TaALPb-7A gene had three alleles in the current study (Table 2): TaALPb-7A1 (GenBank accession no. KU286147) and TaALPb-7A2 (GenBank accession no. KU286148) (combined frequency 50.98%), and TaALPb-7A3 (GenBank accession no. KU286149) (frequency 49.02%). The TaALPb-4A gene had four alleles: TaALPb-4A1 (GenBank accession no. KU286150), TaALPb-4A2 (GenBank accession no. KU286151), TaALPb-4A3 (GenBank accession no. KU286152) and TaALPb-4A4 (GenBank accession no. KU286153). The TaALPb-7D locus (GenBank accession no. KU286154) showed no allelic variation across the cultivars screened in this study. Analysis of the translated protein sequences revealed that the signal peptide and the N- and C-terminal regions were highly conserved, with hardly any variation detected; the sequence differences occurred mainly in the repetitive region. Major variations were detected among the 7AS and 4AL alleles. Among the 7AS alleles, TaALPb-7A1 and TaALPb-7A2 encode mature proteins, whereas allele TaALPb-7A3 contains a premature stop codon (a C→T SNP that changes a CAA glutamine codon into a TAA stop codon; Fig. 3), leading to early termination of translation in 10 cultivars (Supplementary Figure 1). Silenced ALP genes have been reported previously 27-29. No in-frame stop codons were detected in the 4AL alleles, although many variations occurred within the mature proteins (Supplementary Figure 2). In addition, 18 cysteine residues were detected in the 7AS and 7DS ALPs, whereas the 4AL ALPs contained 19, more cysteine residues than previously reported for endosperm-specific storage proteins.

Table 1. Chromosome-specific primer sets for cloning type-b ALP genes (columns: Marker; Primer sequences, 5′-3′).
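The variant classes described in the SNP and indel analyses (transitions versus transversions, and a SNP that turns a CAA glutamine codon into a TAA stop codon) can be illustrated with a short Python sketch. It assumes two equal-length, in-frame coding sequences without indels and uses the standard genetic code; the function names and the toy sequences are hypothetical and do not represent the analysis pipeline of this study or the real TaALPb-7A alleles.

```python
from itertools import product

# Standard genetic code in the conventional TCAG codon order ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
PURINES = {"A", "G"}


def substitution_type(ref_base: str, alt_base: str) -> str:
    """Purine<->purine or pyrimidine<->pyrimidine is a transition; otherwise a transversion."""
    same_class = (ref_base in PURINES) == (alt_base in PURINES)
    return "transition" if same_class else "transversion"


def annotate_snps(ref_cds: str, alt_cds: str) -> list:
    """Annotate single-base differences between two in-frame coding sequences.

    Indels are not handled: both sequences must have the same length, a
    multiple of three. Returns (1-based position, ref base, alt base,
    substitution type, ref codon, alt codon, effect) tuples.
    """
    if len(ref_cds) != len(alt_cds) or len(ref_cds) % 3:
        raise ValueError("need equal-length, in-frame sequences (no indels)")
    calls = []
    for pos, (ref_base, alt_base) in enumerate(zip(ref_cds, alt_cds)):
        if ref_base == alt_base:
            continue
        start = pos - pos % 3  # first base of the codon containing this SNP
        ref_codon = ref_cds[start:start + 3]
        alt_codon = alt_cds[start:start + 3]
        ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
        if alt_aa == "*" and ref_aa != "*":
            effect = "nonsense"  # premature stop codon
        elif alt_aa == ref_aa:
            effect = "synonymous"
        else:
            effect = "missense"
        calls.append((pos + 1, ref_base, alt_base,
                      substitution_type(ref_base, alt_base),
                      ref_codon, alt_codon, effect))
    return calls


def first_stop_codon(cds: str):
    """0-based codon index of the first in-frame stop codon, or None."""
    for i in range(0, len(cds) - 2, 3):
        if CODON_TABLE[cds[i:i + 3]] == "*":
            return i // 3
    return None


if __name__ == "__main__":
    # Hypothetical toy sequences only.
    ref = "ATGCAACAGTGCTGA"  # ATG CAA CAG TGC TGA
    alt = "ATGTAACACTGCTGA"  # CAA->TAA (stop) and CAG->CAC (Gln->His)
    for call in annotate_snps(ref, alt):
        print(call)
    print(first_stop_codon(ref), first_stop_codon(alt))  # 4 1
```

A stop codon at any index before the last codon, as in the toy mutant, truncates translation, which is the kind of change that silences the TaALPb-7A3 allele.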
In general, type-b ALPs can be considered glutamine- and proline-rich proteins, although less so than gliadins and LMW glutenins because they lack extensive repetitive sequences. At the same time, ALPs exhibited a conserved distribution of cysteines (Supplementary Figures 1 and 2), and of their 18 or 19 cysteine residues, enough are predicted to pair to form seven or eight intra-molecular disulfide bonds. The remaining cysteines (at least two) may form inter-molecular disulfide bonds with adjacent storage protein subunits.
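Each intra-molecular disulfide bond pairs two cysteines, so the number of cysteines left for inter-molecular bonds follows directly from the counts above: 18 − 2 × 8 = 2 and 19 − 2 × 8 = 3 (or 4 and 5 with seven bonds), which is where "at least two" comes from. The Python sketch below encodes only this bookkeeping; it does not predict which cysteines actually pair, since that depends on folding. The helper names and the toy peptide are hypothetical.

```python
def count_residues(protein: str, residues: str) -> int:
    """Number of positions in `protein` whose one-letter code is in `residues`."""
    return sum(1 for aa in protein.upper() if aa in residues)


def free_cysteines(n_cys: int, n_intra_bonds: int) -> int:
    """Cysteines left unpaired after `n_intra_bonds` intra-molecular disulfide bonds.

    Each disulfide bond consumes exactly two cysteines.
    """
    if n_intra_bonds < 0 or 2 * n_intra_bonds > n_cys:
        raise ValueError("not enough cysteines for that many disulfide bonds")
    return n_cys - 2 * n_intra_bonds


if __name__ == "__main__":
    # Scenarios from the text: 18 or 19 cysteines, 7 or 8 intra-molecular bonds.
    for n_cys in (18, 19):
        for bonds in (7, 8):
            print(n_cys, bonds, free_cysteines(n_cys, bonds))
    # Hypothetical toy peptide, used only to show composition counting.
    toy = "MCQQPQCPQQ"
    print(count_residues(toy, "C"), count_residues(toy, "QP"))
```

The same `count_residues` call with "QP" gives the glutamine-plus-proline count that underlies statements about Q/P richness.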
Phylogenetic analysis. The phylogenetic relationships of the 42 cloned type-b ALP sequences were analyzed by applying UPGMA to the aligned complete coding sequences of all clones and of wheat storage protein genes, as well as reported ALPs of wheat-related species available from various databases (Table 3). As shown in Fig. 6, the cloned type-b ALP sequences clustered according to their chromosomal origin. They were closest to the reported type-b ALP sequences of related species, followed by the HMW-GS and LMW-GS sequences, while the ω-gliadins were the most distant in evolutionary terms (Fig. 6).

Allelic effects. Because the TaALPb-7A locus has two types of alleles, active and silent, its allelic effects could be studied directly. Allele-specific PCR markers were designed to differentiate the two types of TaALPb-7A alleles, and a total of 102 wheat cultivars or lines were selected for quality testing and marker analysis (Fig. 7). Mixograph analyses were conducted to assess dough strength using previously published procedures 30-34. Significant differences were detected between the active and silent alleles of TaALPb-7A: the active allele was associated with higher dough strength parameters, including Midline Peak Time (P < 0.0443) and Midline Time × Width (P < 0.0096) (Table 2). The HMW-GS composition, protein content and gluten content of the 102 wheat cultivars or lines were also analyzed (Supplementary Table 1). The HMW-GS alleles were randomly distributed between the two allelic types; the favorable subunit Dx5 was found in 33% of the silent lines and 29% of the active lines. No significant association was detected between allelic type and grain protein content or gluten content (Table 4), indicating that the higher dough strength of the active allelic type derives from the expression of TaALPb-7A.
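UPGMA, used for the phylogenetic analysis above, builds a tree by repeatedly merging the two closest clusters and setting the distance from the merged cluster to every other cluster to the size-weighted average of the two old distances. The Python sketch below illustrates that procedure on a hypothetical four-sequence distance matrix. It assumes pairwise distances have already been computed from the alignment (for example as p-distances; the distance measure used for Fig. 6 is not stated here), and it is not the software that produced Fig. 6.

```python
from itertools import combinations


def upgma(labels, matrix):
    """UPGMA clustering of `labels` from a symmetric distance matrix.

    Returns (tree, heights): `tree` is a nested tuple of labels and `heights`
    lists the node height (half the merge distance) of each merge, in order.
    """
    # Distances between clusters, keyed by the unordered pair of clusters.
    dist = {
        frozenset((labels[i], labels[j])): matrix[i][j]
        for i, j in combinations(range(len(labels)), 2)
    }
    sizes = {label: 1 for label in labels}  # cluster -> number of leaf sequences
    heights = []
    while len(sizes) > 1:
        # Merge the closest pair of current clusters.
        a, b = min(combinations(sizes, 2), key=lambda pair: dist[frozenset(pair)])
        d_ab = dist[frozenset((a, b))]
        n_a, n_b = sizes.pop(a), sizes.pop(b)
        merged = (a, b)
        # Distance from the new cluster = size-weighted mean of the old distances.
        for c in sizes:
            dist[frozenset((merged, c))] = (
                n_a * dist[frozenset((a, c))] + n_b * dist[frozenset((b, c))]
            ) / (n_a + n_b)
        sizes[merged] = n_a + n_b
        heights.append(d_ab / 2)
    (tree,) = sizes
    return tree, heights


if __name__ == "__main__":
    # Hypothetical distances between four sequences (not data from this study).
    names = ["A", "B", "C", "D"]
    d = [
        [0, 2, 6, 10],
        [2, 0, 6, 10],
        [6, 6, 0, 10],
        [10, 10, 10, 0],
    ]
    tree, heights = upgma(names, d)
    print(tree)     # ('D', ('C', ('A', 'B')))
    print(heights)  # [1.0, 3.0, 5.0]
```

Placing each node at half its merge distance is what makes UPGMA trees ultrametric, i.e. it implicitly assumes a constant rate of change across lineages.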
Expression analysis revealed that cultivars carrying the active allele showed normal gene expression, with ratios of expressed gene copy numbers between TaALPb-7A and actin ranging from 1:2.54 to 1:3.36 (Table 5), whereas the four cultivars carrying the silent allele showed no expression.

Discussion
Cysteine-rich wheat grain storage avenin-like proteins (ALPs) capable of forming intra-molecular disulfide bonds were discovered in recent years and are considered atypical gluten components of the wheat grain storage protein complement. However, the presence of similar low-molecular-weight subunits in glutenins and gliadins was reported in the 1970s 35,36, and these appear capable of forming strong in vivo associations among themselves and with HMW-GS and LMW-GS, apparently through inter-chain disulfide bonds. ALPs make up about 1% of total endosperm proteins 37. Unlike typical gluten proteins, which are characterized by large repetitive central domains, these non-traditional gluten proteins lack repeating sequences. In 2D gels, type-b ALPs migrate only slightly faster than LMW glutenins and α- or γ-gliadins, because sequence duplication in the central domain (R1, R2) largely compensates for the missing repetitive domain 37,38. These unique properties make type-b ALPs ideal components of elastic disulfide-linked aggregates. In this study, phylogenetic analysis clearly showed that the type-b ALP sequences of common wheat clustered in the same class, and the sequences cloned in the current study formed a small class of their own. Furthermore, the type-b ALP gene sequences indicated a genetic relationship to the unique C-terminal domains of gluten proteins (LMW-GS and gliadins) and notably lack the extensive repetitive domains of typical HMW-GS. Given the high homology of ALP genes to gliadins and avenins, these genes might be primitive versions of earlier storage proteins that predate the development of the repetitive domains of the traditional gluten proteins. Alternatively, they might have evolved by losing the repetitive domains of the ancestral genes. Further work is needed to establish a clear evolutionary context for ALPs in relation to the traditional gluten proteins.
It is noteworthy that almost all previously reported type-b ALP genes were derived from chromosome 7D, suggesting that the genes on chromosomes 7A and 4A in the current study are new discoveries. More importantly, the allelic effects identified in this study were attributed to the newly discovered 7A locus, representing a class of novel, non-traditional gluten protein variation that can be readily utilized in breeding for wheat grain functionality. Our results confirmed that ALP genes belong to a multigene family, like other gluten protein genes 26,39-41. Cole and co-workers 42 reported that tetraploid (AABB) wheats are heterogeneous for the diploid donors of the A and B genomes, which helps explain the genetic variability at the 7A and 4A loci. However, the addition of the D genome to the tetraploid ancestor of bread wheat, even though it occurred on several separate occasions, seems to have relied on hybridization with a rather conservative Aegilops spp. genome 42. Consistent with this, we found no genetic variation at the 7D type-b ALP locus in the lines and varieties investigated in this study.
Despite the potential of type-b ALPs to form intermolecular bonds, their low abundance and the absence of a repetitive domain might limit their ability to play a major role in determining dough functional properties, so further work is needed to establish the potential of individual ALPs for improving dough viscoelasticity. Research on transgenic type-b ALP wheat lines confirmed the presence of free cysteines capable of improving dough mixing properties by forming extra inter-chain disulfide linkages with glutenins (HMW-GS and LMW-GS) 43. Future research on ALP expression, aimed at a more detailed understanding of peptide chain interactions, disulfide bond arrangements and tertiary structure formation, will allow us to examine their molecular interactions with gluten proteins in more depth. Combination and association analyses using targeted allelic ALP combinations will shed further light on the highly complex interactions arising from the allelic composition of sulfur-rich proteins (γ- and α-gliadins, LMW-GS) and sulfur-poor proteins (ω-gliadins and HMW-GS).
Although many researchers have noted that type-b ALP genes of wheat belong to a multigene family 26,39-41, genetic information on their chromosomal location, number of loci and alleles, and allelic effects has remained scarce. The current study identified, for the first time, the chromosomal locations of type-b ALP genes and the number of alleles at each locus, mapping the three type-b ALP gene loci to chromosomes 7AS, 4AL and 7DS. Because bread wheat is allohexaploid (AABBDD), the three loci would theoretically be expected on the three homeologous chromosome arms 7AS, 7BS and 7DS. The unusual chromosomal locations can be explained by the evolutionary relationships of wheat chromosome arms 44-47, i.e., a 4AL/7BS translocation, a pericentric inversion and a paracentric inversion that took place in the tetraploid progenitor of hexaploid wheat 48. This provides a theoretical basis for the localization of the type-b ALP loci on 7AS, 4AL and 7DS. Several factors, including natural evolution and artificial selection, contribute to the different types of allelic variation. In this study, the TaALP-7A1 allele was detected in five Chinese cultivars (lines) (Jimai13J494, Jimai13P414, Jimai23, Jimai24 and Jimai44), while the TaALP-7A2 allele came from four Australian cultivars (Kauz, Yitpi, Gregory and Chara). New alleles may be discovered by expanding the number of lines and varieties screened. In addition, multilocus analyses of the experimental wheat lines have shown that striking, non-random associations of alleles develop over certain loci, i.e., the wheat lines develop a highly organized genetic structure featuring multilocus gene complexes. Among the eight type-b ALP alleles across the three loci, the functional (50.98%) and silent (49.02%) TaALP-7A alleles occurred at nearly equal frequencies in the tested wheat cultivars.
This balanced distribution of functional and non-functional alleles indicates that they could be used for marker-assisted screening for improved wheat flour processing quality. The frequency of the active TaALP-7A1 and TaALP-7A2 alleles (50.98%) underlines their potential utility in wheat breeding programs.
The use of functional markers (FMs) is especially important for accurately discriminating among alleles in marker-assisted selection (MAS) 49,50. Thus far, 56 FMs for processing quality traits have been developed for 16 loci, covering 62 alleles associated with HMW glutenins, LMW glutenins, polyphenol oxidase activity, lipoxygenase activity, yellow pigment content, kernel hardness and starch properties 50. These FMs play an important role in MAS-based breeding for improved wheat grain functionality. However, selection for dough properties and breadmaking quality has been limited to the genetic variability of gluten, using the available HMW and LMW glutenin markers. The ALP allelic variation associated with dough quality discovered in the current study represents a novel class of natural genetic variation that has not previously been utilized in wheat breeding, and the FM developed for the active 7AS allele can be efficiently applied to track it. An important genetic and cytogenetic aspect of wheat grain functionality that still requires attention is how the expression of genes associated with dough processing properties (HMW-GS, LMW-GS, gliadins and ALPs) relates to the response of wheat storage protein accumulation to environmental and physiological processes.

Methods
Plant material and experimental design. All wheat lines used in the current study are listed in Table 6.

Allele-specific marker development. Primers targeting the type-1 TaALP-7A allele were designed based on SNP/indel information: F: 5′-TGCAGCAGCTTAGCAGCTGCCAT-3′; R: 5′-GCTGGTAGGCTGATCCACCGGA-3′. A total of 102 wheat cultivars and lines (Table 6) were screened using the allele-specific primers.
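The two allele-specific primers can be checked quickly for GC content and an approximate melting temperature. The Python sketch below uses the basic GC-based approximation Tm = 64.9 + 41 × (nG + nC − 16.4) / N, which is only a rough estimate; nearest-neighbor thermodynamic models that account for salt and primer concentration are more accurate. The primer sequences are those given above; the helper functions are illustrative assumptions and not part of the study's methods.

```python
PRIMERS = {
    "F": "TGCAGCAGCTTAGCAGCTGCCAT",
    "R": "GCTGGTAGGCTGATCCACCGGA",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_content(seq: str) -> float:
    """Percentage of G and C bases in `seq`."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def basic_tm(seq: str) -> float:
    """Rough Tm (deg C) from the basic formula 64.9 + 41 * (nGC - 16.4) / N.

    Ignores salt and primer concentration; a nearest-neighbor model is
    preferable for actual primer design.
    """
    seq = seq.upper()
    n_gc = seq.count("G") + seq.count("C")
    return 64.9 + 41 * (n_gc - 16.4) / len(seq)


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    for name, primer in PRIMERS.items():
        print(name, len(primer), f"GC={gc_content(primer):.1f}%",
              f"Tm~{basic_tm(primer):.1f} C")
    # Top-strand sequence that the reverse primer anneals to.
    print(reverse_complement(PRIMERS["R"]))
```

The two estimates fall within a few degrees of each other, which is what one would want for a primer pair run at a single annealing temperature; the exact PCR conditions used in the study are not given in this section.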

HMW-GS electrophoretic analysis.
The HMW-GS proteins for SDS-PAGE were extracted from wheat grains using a modification of the method of Singh et al. 52. In detail, 500 μl of 55% (v/v) isopropanol was mixed with crushed individual kernels by continuous vortexing for 5 min, followed by incubation (30 min at 65 °C), vortexing (5 min) and centrifugation (5 min at 10000 rpm). This step was repeated three times to completely remove gliadins. Then 600 μl of 62.5 mM Tris-HCl (pH 6.8) buffer containing 10% (w/v) glycerol, 2% (w/v) sodium dodecyl sulfate (SDS), 0.003% (w/v) bromophenol blue and 5% β-mercaptoethanol was added. The samples were boiled for 2 hours and then centrifuged for 5 minutes at 10000 rpm, and 15 μl of the supernatant of each sample was loaded onto the gel. Proteins were separated by SDS-PAGE according to Jackson et al. 53 using stacking and separating gels containing, respectively, 4% acrylamide, 0.3% bis-acrylamide, 0.1% SDS and 0.125 M Tris-HCl (pH 6.8), and 8.7% acrylamide, 0.3% bis-acrylamide, 0.1% SDS and 0.38 M Tris-HCl (pH 8.8). HMW-GS bands on SDS-PAGE were scored according to the nomenclature system described by Payne and Lawrence 54.
Quality testing. A 10-gram mixograph (National Manufacturing Co., Lincoln, NE) was used to evaluate dough mixing properties, as described by Zhang and co-workers 55. Four parameters were measured to evaluate dough quality: Mixograph Peak Time (MPT, min), Peak Integral (MPI, cm²), Peak Width (MPW, %) and Midline Time × Width (MT × W, min). The statistical significance of the mixograph data was assessed with t-tests using the SAS/STAT software, Version 8.0 (SAS Institute Inc., Cary, NC) 55. Protein content and gluten content were analyzed with a DA7200 near-infrared analyzer (Perten, Sweden) following the manufacturer's instructions.
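The text states only that t-tests were run in SAS and does not say whether variances were pooled. As an illustration, the Python sketch below computes Welch's unequal-variance t statistic and its Welch-Satterthwaite degrees of freedom for two groups of mixograph values, for example one parameter measured in active- versus silent-allele lines. Using Welch's test is an assumption, and the input values are hypothetical. A two-sided P value can then be obtained from the t distribution, for instance with `scipy.stats.t.sf(abs(t), df) * 2`, or the whole test can be run with `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    `a` and `b` are samples (e.g. one mixograph parameter for active- and
    silent-allele lines); each needs at least two values.
    """
    if len(a) < 2 or len(b) < 2:
        raise ValueError("each group needs at least two observations")
    se2_a = variance(a) / len(a)  # squared standard error of mean(a)
    se2_b = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df


if __name__ == "__main__":
    # Hypothetical values, not measurements from this study.
    active = [3.1, 3.4, 2.9, 3.6, 3.3]
    silent = [2.7, 2.9, 2.5, 3.0, 2.8]
    t, df = welch_t(active, silent)
    print(f"t = {t:.3f}, df = {df:.2f}")
```

Welch's version does not assume equal variances in the two allele groups; when the variances happen to be similar it gives nearly the same result as the pooled test.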