Fasciclin-like arabinogalactan protein gene expression is associated with yield of flour in the milling of wheat

A large portion of the global wheat crop is milled to produce flour for use in the production of foods such as bread. Pressure to increase food supplies sustainably can be address directly by reducing post-harvest losses during processes such as flour milling. The recovery of flour in the milling of wheat is genetically determined but difficult to assess in wheat breeding due to the requirement for a large sample. Here we report the discovery that human selection for altered expression of putative cell adhesion proteins is associated with wheats that give high yields of flour on milling. Genes encoding fasciclin-like arabinogalactan proteins are expressed at low levels in high milling wheat genotypes at mid grain development. Thirty worldwide wheat genotypes were grouped into good and poor millers based flour yield obtained from laboratory scale milling of mature seeds. Differentially expressed genes were identified by comparing transcript profiles at 14 and 30 days post anthesis obtained from RNA-seq data of all the genotypes. Direct selection for genotypes with appropriate expression of these genes will greatly accelerate wheat breeding and ensure high recoveries of flour from wheat by resulting in grains that break up more easily on milling.

. Flour yield of harvested grain, identified by as % of total flour yield, was determined using a Buhler mill. Genotypes were grouped as poor milling (PM) wheats and high milling (HM) wheats if their milling yield was equal to/above or below 77%, respectively. Grain hardness Index (HI) was measured for 300 mature wheat grains using Single Kernel Characterization System (SKCS). SD; standard deviation. Blue colour indicates genotypes with Pina-D1a/Pinb-D1a, orange colour indicates genotypes with Pina-D1a/Pinb-D1b, and green colour indicates genotypes with Pina-D1b/Pinb-D1a. Wheat Seeds were sourced from the Australian Winter Cereal Collection, Tamworth, Australia. and stiffness measured by SKCS has also been shown to correlate with milling yield 15 . Much of this may explain environmental rather than genetic variation in milling performance. Despite of all this analysis the genetic and environmental components of variation and their molecular basis remain unclear.
Genetics controls a large part of the variation in flour yield. For example, Laidig et al. 16 (2017) reported that genotype was a large contributor to variation in yield of wheat in Germany . Knowledge of the molecular basis 17 of the intense human selection for high flour yield in breeding modern wheat genotypes would greatly accelerate the breeding of wheat combining high productivity and the grain quality required by consumers. This capability will be of great value in rapidly adapting wheat to changing and variable climates. The main objective of this study was to identify candidate genes which may control flour yield of wheat. RNA Seq of 30 diverse worldwide wheat genotypes was conducted at 14 and 30 days post anthesis (DPA) and gene expression was analysed relative to flour yield obtained on laboratory milling of the mature grain.   Table 3. Differentially expressed genes identified in developing wheat seeds at 30 days post anthesis of wheat genotypes with a high flour yield on milling. Transcriptome data from developing seeds at 30 days post anthesis of 13 poor milling and 19 high milling wheat genotypes obtained by next generation sequencing was subjected to RNA-Seq analysis followed by application of the Baggerley's test to identify statistically significant differences in gene expression. The list of candidate genes shown are at FDR corrected P value of < 0.01 and sorted based on weighted proportion difference.

Results
Milling yield. Milling yield of the thirty wheat genotypes measured as % of total flour yield is shown in Table 1. Among the thirty genotypes, 17 were identified as high milling wheat genotypes (HM-group) (>/ = 77%) and 13 were identified as poor milling wheat genotypes (PM-group) (<77%). The milling yield of genotype NW51A was the lowest (71.4%) while that of Ellison was the highest (79.7%), respectively ( Table 1). The milling yield of Gregory and Bobwhite could not be determined due to lack of grain samples from these trails. However, these genotypes are commercially known as high milling wheats and thus were placed into the HM-group. Among the thirty genotypes, 14 genotypes had Pina-D1a/Pinb-D1a alleles of puroindoline genes within which 7 genotypes showed soft grain texture (SKCS HI < 50) and 7 showed hard grain texture (SKCS HI > 50), 8 genotypes were hard with Pina mutation (Pina-D1b/Pinb-D1a), and 7 were hard wheats with Pinb mutation (Pina-D1a/Pinb-D1b) ( Table 1) 18 . Genotypes in HM-group and PM-group were mixture of soft and hard wheats containing different set of Pin alleles. All the hard wheats with Pinb mutation were good millers, except Giza 139.

Identification of Differentially expressed genes in good millers using Baggerley's test. RNA-Seq
analysis of genotypes corresponding to the PM-group and HM-group, followed by application of the Baggerley's test for identification of significant differences led to the identification of twelve genes at 14 DPA and eight genes at 30 DPA that were statistically differentially expressed at a false discovery rate (FDR) p-value of <0.01 (Tables 2  and 3). Sorting the gene list based on proportion difference led to the identification of both up-and down-regulated genes. At 14 DPA, highly significantly differentially expressed transcripts GH726097 and TC379245 showed high homology with a fasciclin-like arabinogalactan-8 (FLA-8). GH726097 and TC379245 were expressed at much lower levels in wheat genotypes with high flour yield; 28.2 and 6.1 fold, respectively (Fig. 1). These two transcripts were also down regulated at 30 DPA in the HM-group. A third transcript (TAGI; TC429180) with homology with Characterization of three putative FLA sequences. The alignment of three differentially expressed putative FLA genes GH726097 (536 bp), TC379245 (1267 bp) and TC429180 (1252 bp) is shown in (Fig. 2). When BLASTn was performed on these three genes using Ensemble with IWGSC survey sequences, two transcripts Traes_2BS_D44E4B43A (cDNA; 1026 bp) and Traes_4BS_03EB5D160 (594 bp) showed the highest similarity. Traes_4BS_03EB5D160 sequence is exactly identical to Traes_2BS_D44E4B43A from base 433 to base 1026. The first of these is located on chromosome 4BS and the later on 2BS. A single FAS domain unique to FLA   genes was identified in both these sequences. Limited annotation is available for these genes to date (http://www. wheatgenome.org/). NCBI BLASTn identified DQ872381 as a highly similar gene to GH726097, TC379245 and TC429180. DQ872381 has been annotated as a fasciclin-like arabinogalactan-8 19 .
Nucleotide and the polypeptide sequence alignment performed on Clone Manager (Sci Ed, USA) showed that TC379245 shares 99% sequence identity with Traes_2BS_D44E4B43A and DQ872381 in the coding region (Figs 3 and 4). In TC379245 conserved regions were identified in the FAS domain (Fig. 2), H1:Leu-Thr-Val-Phe-Cys-Pro-Glu-Asp-Lys-Val, [YF]:Val-Leu-Leu-Tyr-His-Gly-Ala-Ala-Val-Cys and H2: Val-Tyr-Val-Ile-Asp-Val-Ile-Ile-Pro. Two notable conflicts were observed in this alignment. First, nucleotide substitution of 'C' (cytosine) in the cDNA of TC379245 at position 486 (425 in CDS) was observed to cause a frame shift mutation. However, none of the reads mapped to TC379245 for any of the wheat genotypes showed that substitution. Therefore, the nucleotide 'C' at the position 486 was removed from the TC379245 sequence. Second, a deletion of a nucleotide codon was observed in Traes_2BS_D44E4B43A at position between 91^^^92 (21^^^22 in CDS) (corresponding position in TC379245 CDS; 22,23,24). This missing codon codes for Leucine at the 7 th position in the polypeptide chain of DQ872381 and Traes_2BS_D44E4B43A. All the nucleotide conflicts and the corresponding changes in the amino acid polypeptide are shown in Table 4.
The TC429180 nucleotide sequence showed 87% identity with TC379245. Deletion of a 78 bp sequence was observed in TC429180, from position 461 to 541, corresponding to TC379245. TC429180 RNA-seq mapping files from all the wheat genotypes were manually checked to determine coverage at the conflict region. In none of the  files were reads observed to span the region of conflict. In none of the hits obtained for TC429180 on NCBI and Ensemble BLAST was the 78 bp deletion observed. Therefore, true identity of this deletion was not confirmed. A FAS domain was identified in the nucleotide sequence of TC429180 (Fig. 2). The translated amino acid sequence shows 99% sequence similarity to the TC379245 polypeptide before the start of deletion site but the later part of the sequence shows no similarity due to the apparent frameshift in the TC429180 polypeptide. However, as the 78 bp deletion couldn't be confirmed the true sequence of the TC429180 polypeptide couldn't be identified. GH726097 showed 90% sequence similarity to the TC379245 in the aligned region (Fig. 2). The GH726097 sequence is less than half the length of TC379245 and no open reading frame was identified within the sequence and thus it is likely to be an incomplete transcript sequence. No FAS domain was identified in this nucleotide sequence. Reads mapped to these two transcripts were sequence specific/unique which provides the evidence for them to be two different genes.
Analysis of published wheat transcriptome data was used to examine the tissue specificity of expression of these genes. In the grain, both transcripts have been reported in pericarp (inner and outer) but not in endosperm tissue (Fig. 5, transcriptome data from 20 ).

Discussion
Variation in the gene pool provides the resource for breeding improved genotypes 17 . The relationship between gene expression and flour yield in milling was analysed for 30 diverse wheat genotypes. Gene expression varies throughout seed development 21 with genes expressed at different stages contributing to the final composition and properties of the grain. Much of the variation in hardness of the wheat genotypes was explained by differences in Pin genes 18 and these loci have been associated with flour yield variation in some germplasm 8 but hardness did not explain most of the variation in milling performance (Fig. 1).
Down regulation of expression of transcripts (TAGI; TC379245 and GH726097) encoding cell adhesion proteins, FLA-8, was found at the critical mid grain development stage (14 DPA) and also at a later stage, close to maturity (30 DPA) in genotypes that gave high flour yield ( Table 2). Reduced expression of these genes may be associated with a reduction in the structural strength of the grain. These changes are likely to result in a grain that breaks up more easily in the mill and yields greater quantities of flour.
The arabinogalactan proteins (AGPs) are highly glycosylated proteins that are rich in hydroxyproline and found in plant cell wall and plasma membrane 22 . AGPs that contain putative cell adhesion domain known as fasciclin domain are known as fasciclin like arabinogalactan proteins. FLA has been implicated in cell adhesion and may link the cell membrane and cell wall. AGPs carry glycosylphosphotidylinositol (GPI) anchor at the C-terminus which signals the peptide to the plasma membrane. However, wheat FLA-8 has a shorter GPI-anchor which raises a question about its binding to plasma membrane 19 . But it might have a role in cell to cell and cell to extracellular environment interaction as suggested by Faik et al. 19 . Modification of a FLA in poplar 23 has been shown to reduce the mechanical strength of plant tissues 24,25 . Human selection for higher flour yield has apparently resulted in the selection of wheat genotypes with greatly reduced levels of expression of this gene. Nanospherical arabinoxylan proteins have been recently identified as the adhesive component secreted by the climbing plant, English ivy 26 .
A third transcript (TAGI; TC429180) with homology with FLA-8 was expressed at higher levels in high milling wheats at 30 DPA. The tissue specificity of expression is not known in this system but varies significantly in other plants 25 . Most gene expression at 30 DPA may be in the embryo as the endosperm is terminally differentiated by this stage. Recently Wilkinson et al. 27 (2017) reported that the product of the granule softness protein gene was an AGP. This would complicate any attempts to relate FLA levels to milling yield. Methods for distinguishing the FLAs from other AGPs would be required. Hard wheats are generally considered to be better milling wheats 14 and a reduced expression of AGP encoded by the Gsp-1 gene at the hardness locus might contribute to improved flour yield. However this study showed that flour yield was not associated with the pin genotype at the hardness locus (Fig. 1).
FLA-8 related genes were found to be located on chromosome 2BS and 4BS in the Chinese Spring wheat genome assembly with limited annotation to date (http://www.wheatgenome.org/). A major QTL for flour yield has been reported from chromosome 2B 7,12 . The QTL on chromosome 2BS had the highest overall LOD score (13.1) in analysis of the progeny of a Sunco X Tasman cross, explaining 14.4-31.3% of the variation in flour yield at different sites 7 . Sunco was shown to have low expression of the FLA gene in the current study and was the source of the high flour yield in the Sunco X Tasman cross 7 . A QTL on chromosome 4B was the second most significant flour yield QTL in this cross. However in another cross, Katepwa X CD87, the 4B QTL was the most important and explained 13.6-23.8% of variation in flour yield. The coincidence of the structural genes encoding FLA and the chromosomal locations of the major know QTLs for milling yield in these crosses on chromosomes 4B and 2 A provides strong confirmation of the significance of these associations.
The strength of the endosperm may be influenced by properties of the protein bodies and starch granules 28 that fill the endosperm cells and account for most of the grain content. The expression of storage protein genes was significantly altered in wheat with high flour yield. Several genes may contribute to changes in protein synthesis in the better milling wheat genotypes. The expression of 40 S ribosomal protein was significantly higher in the better milling wheat genotypes at both 14 and 30 DPA (Tables 2 and 3). This protein may regulate protein synthesis and down regulation of this gene has been reported in a late ripening citrus genotype 29 . Selection for protein composition that better suits end uses such as bread making may have been more intense in the development of genotypes also selected for milling quality.
Earlier studies have suggested that starch granule size distribution may be associated with flour yield differences 9,14 . This suggests that modification of starch may contribute to flour yield. Jing Hoang No.1 had a very high level of expression of β-amylase and was the only genotype giving high flour extraction without significantly reduced expression of the FLA-8 genes. High flour yield in wheat may be the result of unintended human selection for different types of genetic variation in different genotypes. New deliberate combinations of these individual genes may have application in development of wheat genotypes with very high flour yields. Other wheat is processed by directly grinding the grain. This type of flour is used to make products such as chapatti. Wheat genotypes in regions where these products predominate may not have been subjected to selection for flour yield in roller milling.
The tissue specificity 30 of the genes identified in this study may be important as changes in the expression of many of these genes may have been achieved by human selection for flour yield without perturbing critical gene expression in other parts of the plant. These genes are specifically expressed in the pericarp where they may influence the adherence of the pericarp to the Aleurone/starchy endosperm. In hexaploid wheat modification of sub-genome specific expression (in the A, B or D sub-genomes) may allow tissue specific alteration of function. Recent analysis of expression of defence genes and response to pathogen infection has been shown to result in highly sub-genome specific expression 31 .
Analysis of the wheat grain transcriptome has identified a gene controlling bread quality 32 and new pathways for carbon assimilation 33 . Although the gene controlling bread quality has at least five copies in the wheat genome with high sequence homology, the cause of the differential expression of this gene was identified to be the promoter region of a specific allele of this gene. A simple PCR-test was develop which allowed the identification of wheat genotypes for the presence or absence of the specific allele of gene controlling bread quality 32 which has been used by CIMMYT 34 . A similar strategy can be used to identify the presence or absence of the specific FLA allele which corresponds to high FLA-8 expression and thereby select for wheat genotypes for good milling.
Selection for these genes and the flour yield genes identified in the present study provides a new set of tools for wheat breeders that should reduce the constraints of selection for wheat quality 35 on the rate of genetic gain in wheat breeding and assist the selection of new genotypes to support the adaptation of wheat production to climate change 36,37 .

Experimental Procedures
Plant Material. Wheat grains of a worldwide set of wheat genotypes were sourced from Australian Winter Cereal Collection (AWCC), Tamworth, NSW, Australia; now known as Australian Grains Genebank (AGG), Horsham, Victoria, Australia (http://www.seedpartnership.org.au/associates/agg). Plants of all thirty wheat genotypes were grown under field conditions at Narrabri (NSW, Australia) and harvested at maturity. Another trial was grown at Biloela (Qld, Australia). The conditions at this site were harsher resulting in some varieties not performing well. Milling of these samples gave flour yield data that showed a highly significant (P = 0.01) correlation with that from the Narrabri site but the data was not included in the association analysis because it was considered a far less reliable measure of the flour milling qualities of these genotypes when grown in unfavourable environments.
Test milling and grain hardness. Milling yield of harvested grain, identified by as % of total flour yield, was determined using a Buhler mill as explained by Edwards et al. 9 . Wheat genotypes were grouped as poor milling (PM) wheats and high milling (HM) wheats if their milling yield was equal to/above or below 77%, respectively.
SCIentIFIC REPORts | 7: 12539 | DOI:10.1038/s41598-017-12845-y Grain hardness for all the genotypes was measured using Single Kernel Characterization System (SKCS) as described by Nirmal et al. 18 . Puroindoline alleles in all the genotypes were also characterized as described in Nirmal et al. 18 .
RNA isolation, cDNA preparation and NGS sequencing. Total RNA was isolated from the whole caryopsis at 14 days post anthesis (DPA) and at 30 dpa 38 . cDNA was prepared and used to produce indexed Illumina NGS libraries which were then multiplexed to allow the sequencing of eight indexed libraries in one lane on a GA IIx Illumina sequencing platform to obtain 100 bp paired-end reads. RNA sequencing of the thirty wheat genotypes generated a total of ~2.5 million to ~7.2 million reads varying with the genotype.
RNA-Seq analysis. All NGS data was analysed using CLC Genomic Workbench Version 9.0 (CLC Bio, Qiagen). All the reads were trimmed for quality to exclude calls with Fred quality scores less than 20. Following trimming all sequence data was subjected to quality analysis. After quality check reads were subjected to RNA-seq analysis. Triticum aestivum gene index (TAGI) database (ftp://occams.dfci.harvard.edu/pub/bio/tgi/ data/Triticum_aestivum/) which contains 221,925 tentative consensus sequences obtained from EST, cDNA and mRNA libraries was used as a reference database for RNA-seq analysis. Read alignment parameters length fraction and similarity fraction were set to 0.8 and 0.9, respectively. Paired reads were counted as two. Gene expression was normalised by calculating reads per kilo base per million mapped reads (RPKM).
Tissue specificity of expression of FLA genes in wheat grain was investigated in transcriptome data available from published data 20 for outer pericarp, inner pericarp, and endosperm tissues at 12 days-post-anthesis in developing wheat grains (cv. Holdfast) by RNA-Seq analyses using the TAGI reference in CLC genomics workbench. The data included three biological replicates and since there is was no significant difference between the replicates the mean values are shown in Fig. 5.
Identification of differentially expressed genes. RNA-seq data of the PM-group and the HM-group was subjected to proportion-based Baggerley's test 39 within CLC workbench to identify differentially expressed genes (DEGs) in the HM-group when compared to the PM-group. Differentially expressed genes with FDR (false discovery rate) p-value of < 0.01 were considered as statistically significant candidate genes associated. A further sorting of the DEGs was undertaken based on Baggerley's weighted proportion difference, to identify those genes with consistent expression difference in genotypes within the PM and HM wheat genotypes. Data Availability Statement. All NGS sequence data as raw data was submitted to NCBI at the Sequence Read Archive (SRA) and is available as SRA Submission # SUB2954840 (under BioProject# PRJNA392390 and BioSample #SUB2843912).