Genome wide association study of the whiteness and colour related traits of flour and dough sheets in common wheat

Flour whiteness and colour are important factors that influence the quality of wheat flour and end-use products. In this study, a genome-wide association study of flour and dough sheet colour was conducted in a panel of 205 elite winter wheat accessions grown at two locations over 2 years, using a high-density genetic map constructed with the 90K single nucleotide polymorphism array. Eighty-six significant marker-trait associations (MTAs) were detected for flour whiteness and for the brightness index (L* value), the redness index (a* value) and the yellowness index (b* value) of flour and dough sheets (P < 10⁻⁴) on homoeologous groups 1, 2, 5 and 7 and on chromosomes 3A, 3B, 4A, 6A and 6B. Four, three, eleven and eleven MTAs for flour whiteness and the L*, a* and b* values, respectively, and one MTA for the dough sheet L* value were identified in more than one environment. Based on these MTAs, some important new candidate genes were identified. Of these, two candidate genes for BS00000020_51, TraesCS5D01G004300 and Gsp-1D, were found in wheat and relate to grain hardness. Other candidate genes were associated with proteins, the fatty acid biosynthetic process, the ketone body biosynthetic process, etc.


Results
Extensive phenotypic variation for these traits was observed among the 205 winter wheat accessions across four environments (i.e., two growing seasons and two locations). The whiteness, L*, a* and b* values of the flour and dough sheets were continuously distributed in the population (Fig. S1 and Fig. S2), as is typical of quantitative traits, indicating that they were genetically controlled by multiple genes. Analysis of variance showed significant differences in flour whiteness and in the L*, a* and b* values of the flour and dough sheets (P < 0.0001) among genotypes and environments, as well as significant G × E interactions (Table S1). The h² values of flour whiteness and the b* value were 98% and 99%, respectively (Table 1). The h² values of the a* and b* values of the fresh dough sheets (FDS) and the b* value of the dry dough sheets (DDS) were 50%, 48% and 46%, respectively (Table 2).

Marker-trait associations (MTAs) of colour traits of flour.
In total, 24,355 mapped SNPs were used for MTA analysis 28,53. Twenty-two MTAs associated with flour whiteness (P < 10⁻⁴) were distributed on chromosomes 1A, 1B, 2B, 3A, 5D, 6A and 7B (Table 3; Fig. 1). Four SNP loci (Excalibur_c4709_576, TA001505-1171, BS00000020_51 and RAC875_c34446_396) were detected in two or more environments. The BS00000020_51 locus had the highest phenotypic variation explained (PVE), at 15.95%. Alleles C and T of marker BS00000020_51 on chromosome 5D were associated with the largest phenotypic difference (2.13) (Table S2). The flour whiteness associated with BS00000020_51-C was significantly higher than that associated with BS00000020_51-T across all four environments, indicating that the C allele is the more favourable allele for flour whiteness (Table S2).
Eleven MTAs on different chromosomes, i.e., 1A, 3B, 4A, 5B, 5D and 6A, were associated with flour colour yellowness (b* value) (P < 10⁻⁴) (Table 6; Fig. 1). Of these, TA003858-0637 (6A) and BS00000020_51 (5D) were found in three and four environments, respectively. Alleles A and G of marker Kukri_c1214_437 on chromosome 5B were associated with the largest phenotypic difference (0.79). The flour b* value associated with Kukri_c1214_437-G was significantly higher than that associated with Kukri_c1214_437-A across all four environments, indicating that G is the allele that increases the b* value (Table S2).
In general, MTAs consistently identified in more than one environment were considered to be stable. There were four, three, eleven and eleven stable SNP loci for flour whiteness and the L*, a* and b* values, respectively. There was one SNP marker, BS00000020_51, associated with both flour whiteness and flour L* and b* values.
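The allele contrasts and PVE values reported in this section can be illustrated with a minimal sketch. It assumes a plain single-marker comparison in which PVE is approximated by the between-allele sum of squares divided by the total sum of squares; this is a simplification and not the mixed linear model used in the study. The function name and the toy whiteness values are hypothetical.

```python
from statistics import mean


def allele_contrast(values_by_allele):
    """Return (difference in allele means, share of variance explained).

    values_by_allele maps each of two allele labels to a list of phenotype
    values. The variance share is SS_between / SS_total, i.e. the R^2 of a
    one-way model, used here only as a rough stand-in for MLM-based PVE.
    """
    if len(values_by_allele) != 2:
        raise ValueError("expected exactly two alleles")
    xs, ys = values_by_allele.values()
    all_values = list(xs) + list(ys)
    grand = mean(all_values)
    ss_total = sum((v - grand) ** 2 for v in all_values)
    ss_between = (len(xs) * (mean(xs) - grand) ** 2
                  + len(ys) * (mean(ys) - grand) ** 2)
    diff = mean(xs) - mean(ys)
    pve = ss_between / ss_total if ss_total > 0 else 0.0
    return diff, pve


# Hypothetical whiteness values for two allele groups, not data from the study.
toy = {"C": [76.0, 77.0, 78.0], "T": [74.0, 75.0, 73.0]}
diff, pve = allele_contrast(toy)
print(f"mean difference = {diff:.2f}, approximate PVE = {pve:.1%}")
```

In a structured panel, a naive contrast of this kind can overstate marker effects, which is one reason association models usually correct for population structure and kinship.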
MTAs of the colour traits of dough sheet. Ten MTAs significant at the P ≤ 10⁻⁵ level were detected for the L* value of dough sheet colour, as was the case for the a* and b* values of the FDS and DDS (Table 7; Fig. 2). Of these, only one locus, GENE-1258_171 on chromosome 2A, was found in two environments, with a maximum PVE of 10.51%.
Prediction of candidate genes for colour traits of flour. Markers with high PVE values were selected from loci significantly associated with flour whiteness and flour colour for prediction (Table S3). Some important loci were identified, and twenty-one candidate genes were predicted. Of these, the marker BS00000020_51 on chromosome 5D had two candidate genes, TraesCS5D01G004300 and Gsp-1D, from wheat (Table S3). The functions of these genes are related to grain hardness, that is, puroindoline-b and an arabinogalactan peptide, respectively. The candidate gene of the marker Excalibur_c4709_576 on 2B in wheat and Arabidopsis (mouse-ear cress) participates in glycerol-3-phosphate O-acyltransferase activity and protein self-association. The marker BS00027770_51 on chromosome 6B had the candidate gene TraesCS6B01G419200, whose function is related to the flavonoid biosynthetic process and regulation of the jasmonic acid mediated signalling pathway. The candidate gene TraesCS5A01G003800 for the marker BS00099534_51 on chromosome 5A is related to the fucose metabolic process and protein phosphorylation. The marker TA005690-1190 on chromosome 6A had the candidate gene TraesCS6A01G241200, whose function is related to the ketone body biosynthetic process, leucine catabolic process and lipid metabolic process. These candidate genes may be related to flour colour, and their functions will be explored in future research.

Table 2. Phenotypic analysis of fresh and dry dough sheet colour in two locations in 2 years. TA Tai'an location, DZ Dezhou location, L* value the brightness index, a* value the redness index, b* value the yellowness index, FDS fresh dough sheet, DDS dry dough sheet.
Prediction of candidate genes for dough sheet colour. Markers with high PVE values were selected from loci significantly associated with dough sheet colour for prediction. Five new genes were predicted. The marker BS00065510_51 on chromosome 1D had the candidate gene TraesCS1D01G070200 from wheat (Table S4). The functions of this gene are related to ATP binding, RNA binding and RNA helicase activity. The candidate gene TraesCS1D01G070300 of the marker BS00065722_51 on 1D in Oryza barthii also participates in the RNA catabolic process. The marker GENE-1258_171 on chromosome 2A had the candidate gene TraesCS2A01G593500 from Oryza barthii, Aegilops tauschii and wheat. The functions of this gene are related to protein tyrosine kinase activity and protein serine/threonine kinase activity. The gene TraesCS5A01G003800 was predicted on chromosome 5A for the marker BS00099534_51; its function is related to catalytic activity and transferase activity, and it participates in the fucose metabolic process. The marker BobWhite_c15802_72 on chromosome 6A had the candidate gene TraesCS6A01G024900 from wheat, soybean (Glycine hispida) and Arabidopsis (mouse-ear cress). The functions of this gene are related to the response to oxidative stress, adenine salvage activity and the carbohydrate metabolic process. These candidate genes may be related to dough sheet colour, and their functions will be explored in future research.

Discussion
Flour whiteness and flour and dough colour-related traits are critical determinants of the end-use product quality of wheat. Therefore, it is important to identify major and stable loci for these traits and then transfer the favourable alleles into commercial varieties. Although some loci for these traits were found in previous studies in different populations 11,30-34 using QTL mapping methods, genome-wide association studies (GWAS) of these traits using single nucleotide polymorphism (SNP) markers have rarely been conducted. Therefore, it is still important to use GWAS to find new major and stable loci that can be introgressed into commercial cultivars. Zhai et al. 34 constructed a genetic map that included 8227 SNP markers using an RIL population and found fifty-six QTLs. In the present study, by contrast, a total of 24,355 mapped SNP markers were used for MTA analysis in a panel of varieties, and more new loci were found than in that previous study. In previous QTL mapping studies, the chromosomes involved in flour colour traits were mainly those of homoeologous groups 1, 2, 5 and 7, and chromosomes 3B, 4A, 4B and 6B 34,35. Some QTLs detected for the a* value were found on chromosomes 1B and 3B 36, and the other chromosomes involved were mainly 1B, 1D, 2D, 4A, 4D and 7B. In this study, loci associated with these traits were found on chromosomes 6A, 4A and 3A in addition to the above chromosomes. Zhai et al. 37 performed GWAS on 166 bread wheat cultivars using the wheat 90K and 660K SNP arrays and 10 allele-specific markers, and identified 100 MTAs for flour colour-related traits. This indicates that GWAS and QTL mapping can be complementary to each other.
PPO activity and the yellow pigment content have been reported to affect flour whiteness and colour 38-40. Previous studies showed that PPO activity is mainly controlled by genes on the homoeologous group 2 chromosomes, particularly 2A and 2D 41,42. In the present study, significant SNP markers on chromosomes 2B, 2D and 2A were associated with flour whiteness, the flour L* value and dough sheet colour, but a comparison of physical positions showed that these SNP loci differ from the PPO gene loci. Excalibur_c4709_576 and TA001505-1171 on chromosome 2B were significantly associated with flour whiteness in two environments. Their candidate gene prediction indicated that the function was related to glycerol-3-phosphate O-acyltransferase activity, which participates in the fatty acid biosynthetic process. Previous research showed that lipoxygenase affects flour whiteness and colour 43. Therefore, these loci may influence flour whiteness through lipoxygenase rather than PPO activity. Kukri_c33486_128 on chromosome 2D, which was significantly associated with the flour L* value, was stably identified in three environments, but no candidate gene or function was predicted for it in the BLAST search, which suggests that this locus is new and should be studied further. GENE-1258_171 on chromosome 2A was significantly associated with the dough sheet L* value in two environments. Its predicted gene is TraesCS2A01G593500, whose function is related to protein kinase activity, so this locus may affect dough sheet colour through the grain protein content. Previous studies showed that the yellow pigment content is mainly controlled by chromosomes 7A and 7B; moreover, the Psy1 gene was reported to co-segregate with the b* value and yellow pigment content 35,44,45. In the present study, significantly associated loci were also found on these two chromosomes. Only one SNP marker associated with flour whiteness was detected on chromosome 7B. Although no loci were found on these two chromosomes for the flour b* value, nine SNP markers were significantly associated with the flour a* value. Moreover, they seemed to be stable in multiple environments. Three SNP markers were found on chromosome 7A, and their predicted function is related to leucine-rich repeat family protein expression. The other six SNP markers on chromosome 7B have one candidate gene, TraesCS7B01G482200, which is related to sucrose synthase activity. Therefore, the mechanism by which they influence flour colour differs from that of the Psy gene and yellow pigment content and needs further study. In addition, flour whiteness and colour are also affected by milling characteristics 32,46. Milling characteristics are in turn influenced by grain hardness, so grain hardness affects flour whiteness and colour. Zhai et al. identified the QTL QFL.caas-5D-1, which is close to the Pin-b gene, with a distance of 2.1 cM. The QTL for the L* value on chromosome 5DS coincided with the hardness (Ha) locus in previous studies 32,46. In the present study, the SNP marker BS00000020_51 on chromosome 5D was significantly associated with flour whiteness and the L* and b* values and was stably detected in multiple environments. Therefore, this locus is important for flour whiteness and colour. A comparison with the results of Zhai et al. 34 showed that this SNP marker (BS00000020_51) was also present on their genetic map, 2.1 cM from BobWhite_s67669_117, the marker closest to the QTL. This suggests that the locus tagged by BS00000020_51 influences flour whiteness and colour through grain hardness. We found two candidate genes, TraesCS5D01G004300 and Gsp-1D, by candidate gene prediction. Of these, Gsp-1D is the grain softness protein-1 gene, which is linked to grain hardness, and grain hardness affects flour particle size. Moreover, flour whiteness was negatively correlated with flour particle size 47, so it is possible that Gsp-1D affects flour whiteness. These findings support the reliability of the results.

Table 6. SNP markers significantly associated with the yellowness index (b* value) of flour colour in two locations in 2 years (P < 10⁻⁴). Chr. chromosome, Pos. position, Env. environment.
Most interestingly, the SNP markers found on chromosome 6A were significantly associated with flour whiteness and colour. The markers RAC875_c34446_396 and GENE-4011_91 were at the same position, i.e., 49 cM, and were associated with flour whiteness and the flour L* value, respectively. The function of the candidate gene PUP88 was related to hydrolase activity and the hydrolysis of O-glycosyl compounds, participating in the carbohydrate metabolic process. However, at position 79 cM, five SNP markers detected in more than one environment were associated with the flour a* and b* values. The functions of the candidate gene TraesCS6A01G241200 of the SNP marker TA005690-1190 were different from those of other candidate genes. The TraesCS6A01G241200 gene participates in the ketone body biosynthetic process and lipid metabolic processes, which affect flour colour. Other candidate genes were mainly related to proteins. Moreover, the SNP marker BobWhite_c15802_72 on chromosome 6A was associated with dough sheet colour. Its function is related to peroxidase activity in soybean, but in wheat, the biological process is not clear. In addition, one special locus, the BS00027770_51 marker, associated with the flour a* value, was identified on chromosome 6B. Its candidate gene participates in the flavonoid biosynthetic process and regulation of jasmonic acid in Arabidopsis thaliana, but the function in wheat remains unknown. The above loci were not found in previous studies.
Flour whiteness and colour-related traits are inherently correlated. Three important loci on chromosomes 5D and 6A were identified. The SNP locus at genetic position 103 cM on chromosome 5D was involved in both flour whiteness and the b* value; the locus at 49 cM on chromosome 6A influenced flour whiteness and the flour L* value; and the locus at 79 cM on chromosome 6A was associated with the flour a* and b* values. These relationships were reflected in the correlation coefficients (Table S5; Table S6), in agreement with previous studies 34. Their physical positions are given in Table S7. Therefore, genes with pleiotropic effects may explain the genetic basis of the trait correlations. Pleiotropic effects were also observed for dough sheet colour-related traits.

Conclusions
This study provides important molecular-level information about the influence of proteins, lipoxygenase and grain hardness, in addition to PPO activity and yellow pigment, on flour whiteness and colour. GWAS is an effective method for identifying new, important and stable loci. The SNP markers significantly associated with flour whiteness and colour detected in this study provide opportunities for marker-assisted selection (MAS) of traits that are difficult to phenotype at the early stages of wheat breeding.

Materials and methods
Ethics statement. All samples analysed in our study adhered to all local, national or international guidelines and legislation, and no ethical approval was required. During the growing seasons, all recommended local crop management practices were followed, and damage attributed to lodging, disease, or pests was not observed.

Phenotypic trait evaluation.
Flour was milled on a Bühler experimental mill (Bühler mill, Bühler-Miag Company, Braunschweig, Germany) with a flour extraction yield of approximately 70%; all samples had been stored for approximately 1 month after harvest. Before milling, the samples were tempered overnight to 14%-16% moisture content according to grain texture. Flour whiteness was determined with a WSB-IV intelligent whiteness meter (Dajiguangdian Instruments, Hangzhou, China) 36. The instrument measures absolute spectral diffuse reflectance using a photometric integrating sphere. The spectral power distribution of the Y10 whiteness optical system has a peak wavelength of 475 nm and a half-wave width of 44 nm. The Y10 optical system conforms to the Chinese national standard GB3979.
Dough making was performed according to 49 with minor modifications. Flour and water were mixed to achieve 44% absorption by slow mixing at a low speed for 5 min, followed by mixing at a medium speed for 2 min using a Kitchen Aid Professional Mixer (KPM5, St. Joseph, MI, USA). During the resting stage, crumbly dough was placed in a stainless steel bowl for 20 min at room temperature. The crumbly dough was then hand kneaded into a stiff mass and passed through an automatic noodle maker (JMTZ-14, Dongfang Fude Technology Development Center, Beijing, China) three times to form a noodle sheet at a 2.0 mm roll gap-setting. The dough sheet was then folded twice and passed through six different roll gaps (3.5, 3.0, 2.5, 2.0, 1.5, and 1.0 mm). Then, the fresh dough sheet was cut into approximately 6 small sheets (length 10 cm, width 5 cm, thickness 1.0 mm). The fresh sheet was dried in the oven for 24 h at 40 °C.
The colour-related traits of the flour and dough sheet. The flour colour parameters (L*, a* and b*) were measured with a Minolta colorimeter (CR-300, Minolta Camera Co., Ltd., Osaka, Japan) using the Commission Internationale de l'Éclairage (CIE) L* a* b* colour system 50. The L* value indicates the lightness of flour on a scale of 0-100 from dark to light (L* = 0 is black, L* = 100 is white, and intermediate values are greys of increasing brightness). The a* value indicates the red-green direction: positive values denote redness and negative values denote greenness. The b* value indicates the yellow-blue direction: a higher b* value denotes a greater amount of yellow 51.
The dough sheet colour was also measured with the Minolta CR-300 colorimeter. For each noodle sheet, three points at different locations on the same side of the surface were measured for the fresh (uncooked) dough sheet (FDS) at 0 h and for the dry dough sheet (DDS) at 24 h 52.
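As a minimal sketch of how three point readings per sheet can be reduced to one value per sample and interpreted on the CIE L* a* b* axes, the following Python fragment may help. The class name, function names and readings are hypothetical and are not instrument output from the study.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class LabReading:
    L: float  # lightness: 0 is black, 100 is white
    a: float  # positive is red, negative is green
    b: float  # positive is yellow, negative is blue


def average_readings(readings):
    """Average several L*a*b* readings taken at different points on one sheet."""
    readings = list(readings)
    return LabReading(
        L=mean(r.L for r in readings),
        a=mean(r.a for r in readings),
        b=mean(r.b for r in readings),
    )


def describe(reading):
    """Give a short verbal reading of the signs of a* and b*."""
    red_green = "reddish" if reading.a > 0 else "greenish" if reading.a < 0 else "neutral"
    yellow_blue = "yellowish" if reading.b > 0 else "bluish" if reading.b < 0 else "neutral"
    return f"L*={reading.L:.1f}, {red_green} (a*={reading.a:.2f}), {yellow_blue} (b*={reading.b:.2f})"


# Hypothetical readings at three points on one fresh dough sheet.
points = [LabReading(90.0, -1.0, 9.0), LabReading(91.0, -0.5, 10.0), LabReading(92.0, -1.5, 11.0)]
sheet_mean = average_readings(points)
print(describe(sheet_mean))
```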
The colorimeter parameters of each sample were measured three times, and the mean values were used for subsequent statistical analysis. Heritability (h²) was estimated as

$$h^2 = \frac{\sigma_g^2}{\sigma_g^2 + \sigma_{ge}^2 / r + \sigma_e^2 / re}$$

where $\sigma_g^2$, $\sigma_{ge}^2$ and $\sigma_e^2$ are estimates of the genotype, genotype × environment and residual error variances, respectively, r is the number of replicates, and re is the product of the number of replicates and the number of environments. The estimates of $\sigma_g^2$, $\sigma_{ge}^2$ and $\sigma_e^2$ were obtained from the variance estimates in the ANOVA, which was performed using the PROC GLM procedure of SAS 8.0 (SAS Institute Inc., Cary, NC, USA).
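The heritability calculation from variance components can be sketched as follows. This assumes the estimator h² = σg² / (σg² + σge²/r + σe²/re), which matches the symbols defined in the text (r replicates, re the product of replicates and environments). The function name and the numeric inputs are hypothetical and are not estimates from this study.

```python
def heritability(var_g, var_ge, var_e, n_reps, n_envs):
    """Heritability from ANOVA variance components.

    Assumed estimator: var_g / (var_g + var_ge / r + var_e / (r * e)),
    where r is the number of replicates and e the number of environments.
    """
    if n_reps <= 0 or n_envs <= 0:
        raise ValueError("replicates and environments must be positive")
    denominator = var_g + var_ge / n_reps + var_e / (n_reps * n_envs)
    if denominator <= 0:
        raise ValueError("variance components give a non-positive denominator")
    return var_g / denominator


# Hypothetical variance components with 2 replicates and 4 environments.
h2 = heritability(var_g=2.0, var_ge=1.0, var_e=4.0, n_reps=2, n_envs=4)
print(f"h2 = {h2:.3f}")
```

With these toy inputs the denominator is 2.0 + 0.5 + 0.5 = 3.0, so the estimate is 2/3.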
Genome-wide association analysis. SNP markers, genotyping and the population structure of the samples were reported previously 27,28. Based on this information, significant marker-trait associations (MTAs) were identified using a mixed linear model (MLM) in TASSEL 3.0. The P-value was used to determine whether a QTL was associated with a marker, and the R² value was used to evaluate the magnitude of the MTA effects. The genome-wide significance threshold was set at P ≤ 0.001, and SNPs meeting this threshold were considered significantly associated with phenotypic traits. An MTA locus detected in two or more environments was considered a stable association site 48.
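The rule for calling a stable association (significant in two or more environments) can be sketched as a simple filter over per-environment results. This is an illustrative post-processing step, not TASSEL code; the record layout, marker names and environment labels are hypothetical, and the default threshold follows the P ≤ 0.001 stated in the text.

```python
from collections import defaultdict


def stable_mtas(records, p_threshold=1e-3, min_envs=2):
    """Return marker-trait pairs significant in at least min_envs environments.

    records is an iterable of (marker, trait, environment, p_value) tuples.
    The result maps (marker, trait) to the sorted list of environments in
    which the association met the threshold.
    """
    significant_envs = defaultdict(set)
    for marker, trait, env, p_value in records:
        if p_value <= p_threshold:
            significant_envs[(marker, trait)].add(env)
    return {
        key: sorted(envs)
        for key, envs in significant_envs.items()
        if len(envs) >= min_envs
    }


# Hypothetical per-environment association results.
toy_records = [
    ("snp_A", "whiteness", "E1", 2e-5),
    ("snp_A", "whiteness", "E2", 8e-4),
    ("snp_A", "whiteness", "E3", 0.2),
    ("snp_B", "whiteness", "E1", 5e-4),
    ("snp_B", "b_value", "E2", 1e-6),
]
stable = stable_mtas(toy_records)
print(stable)
```

Here snp_B is significant once for each of two different traits, so it is not counted as stable for either trait; only the pairing of the same marker with the same trait across environments qualifies.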

Forecasting candidate genes for flour whiteness and dough sheet colour-related traits.
To identify the positions of important MTA loci on a physical map and possible candidate genes, the significant markers detected in this study were used to identify putative candidate genes. A BLAST (Basic Local Alignment Search Tool) search was performed on the International Wheat Genome Sequencing Consortium database (IWGSC; http://www.wheatgenome.org/, 20th November 2020) using the sequences of the significant SNP markers identified by GWAS. When an SNP marker sequence was 100% identical to a wheat contig in the IWGSC database, the sequence was extended 2 Mb for each marker using the IWGSC BLAST results. Then, the extended sequence was used to run a BLAST search against the National Center for Biotechnology Information (NCBI) database (http://www.ncbi.nlm.nih.gov, 20th November 2020) and Ensembl Plants (http://plants.ensembl.org/Triticum_aestivum/Tools/Blast, 20th November 2020) to confirm possible candidate genes and functions.
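The window extension step can be sketched as follows, under the assumption that the sequence is extended by 2 Mb on each side of the marker position; the text does not state whether the 2 Mb is per side or in total, so the flank size is a parameter. The function names, gene names and coordinates are hypothetical and do not come from the IWGSC annotation.

```python
def flanking_window(position, flank=2_000_000, chrom_length=None):
    """Return the (start, end) interval around a marker position, in bp.

    The interval is clipped to position 1 at the lower end and to
    chrom_length at the upper end when a chromosome length is supplied.
    """
    start = max(1, position - flank)
    end = position + flank
    if chrom_length is not None:
        end = min(end, chrom_length)
    return start, end


def genes_in_window(genes, window):
    """List the names of genes whose (start, end) overlaps the window.

    genes is an iterable of (name, gene_start, gene_end) tuples.
    """
    start, end = window
    return [name for name, g_start, g_end in genes if g_start <= end and g_end >= start]


# Hypothetical marker position and gene annotations on one chromosome.
window = flanking_window(3_000_000)
toy_genes = [
    ("gene_1", 900_000, 1_100_000),    # overlaps the lower boundary
    ("gene_2", 5_500_000, 5_600_000),  # lies outside the window
    ("gene_3", 4_000_000, 4_200_000),  # lies inside the window
]
candidates = genes_in_window(toy_genes, window)
print(window, candidates)
```

Genes returned by such an overlap query are only positional candidates; their functions would still need to be checked by the BLAST-based annotation step described in the text.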
Ethics approval and consent to participate. Wheat is a common crop extensively cultivated in the world. This study does not contain any research requiring ethical consent or approval.

Data availability
All data used during the current study are included in this published article or are available from the corresponding author on reasonable request.