Introduction

Maize (Zea mays L.) has become one of the most important crops globally for food, feed, and fuel since it appeared and spread widely1. In the past few years, more and more people paid attention to the quality of maize grain due to the rapid development of animal husbandry and processing industry. However, the nutritional quality of maize grain remains poor, especially the deficiency of lysine in the maize grain, which can not meet the nutritional and health requirements of people2. Thus, the genetic enhancement of nutritional quality in maize grains is essential to increase the nutritional value and conduct high-quality maize breeding2.

The results of genetic studies have indicated that variations in nutritional quality in maize grain characterize quantitative traits. Over the past two decades, genetic dissection of nutritional quality in maize kernels by classical QTL mapping has resulted in the identification of numerous nutritional quality QTLs. Mangolin et al.3 detected 13 QTLs by QTL mapping of maize kernel oil content in F2:3 population. Liu et al.4 detected seven QTLs associated with protein content, six QTLs associated with starch content, and five QTLs associated with oil content using F2:3 population and BC2F2 population. Wang et al.5 detected 38 QTLs for maize grain quality traits using three RIL populations in three environments. To date, many genes related to maize proteins have been cloned, such as opaque1 (o1), floury4 (fl4) and Mucronate (Mc)6,7,8. Some genes such as linoleic acid1 (ln1), Oleic acid content1 (olc1), fatty acid desaturation 2 (fad2), and fad6 have been reported to influence oil content in maize9,10,11. A few starch content-related genes such as Shrunken1 (Sh1), Sh2 and Brittle2 (Bt2) have been identified12,13.

Genome-wide association study (GWAS) is becoming a powerful tool to address interspecies genotype–phenotype association based on the development of next-generation sequencing technology. In maize, GWAS has made a significant progress in the past decade. For example, Li et al.14 used GWAS to dissect the genetic architecture of oil biosynthesis in maize kernels. Luo et al.15 used GWAS to detect 57 loci significantly associated with salt tolerance, and 49 candidate genes from these loci. It can be seen that GWAS is widely used in maize. However, there is still a huge problem about how to obtain phenotypic data accurately and quickly. Traditionally, phenotyping method for grain quality, such as chemical method, is not only laborious and time-consuming, but also damages the integrity of maize kernels. By contrast, Near Infrared Reflectance Spectroscopy (NIRS) is a fast, reliable, and non-destructive method. NIRS has been increasingly used in plant phenotyping measurements, such as maize kernel starch content16 and wheat protein content17. Therefore, NIRS can fully measure maize grain nutritional quality.

Although some potential grain quality genes and QTLs have been identified in maize, the genetic studies of grain quality are limited. In this study, we used NIRS to measure the main nutritional quality traits of 248 maize inbred lines and used 83,057 single nucleotide polymorphism (SNPs) markers to conduct GWAS. Our study was designed to accomplish the following objectives: (1) perform GWAS to identify SNPs responsible for moisture, protein, oil, starch and lysine contents in maize kernels, (2) compare our GWAS results with previous QTL mapping results, and (3) predict and identify candidate genes of these quality traits for future studies.

Results

Phenotypic variations of grain quality traits

The phenotypes of grain quality traits are shown in Tables 1 and 2 and Fig. 1. As displayed in Table 1, the results indicated that there were abundant phenotypic variations in the 248 inbred lines and all grain quality traits followed a normal distribution, which benefited the dissection of the genetic architecture of the grain.

Table 1 Statistical analysis of grain quality traits in different environments.
Table 2 Analysis of variance (ANOVA) for grain quality traits.
Figure 1
figure 1

Correlation analysis of grain quality traits at BLUP. *, ** and *** are significant correlation at P < 0.05, P < 0.01, P < 0.001, respectively.

Significant correlations were detected among grain quality traits, except for the correlation between moisture content and protein content (p = − 0.13) and the correlation between oil content and lysine content (p = 0.079) (Fig. 1). In addition, starch content had significant negative correlation with protein content, oil content, and lysine content, whereas the correlation among the remaining traits was positive.

Analysis of variance (ANOVA) indicated that highly significant variations for genotypes and environments were found (Table 2). However, the genotype-by-environment interaction was not significant except for the genotype-by-environment interaction of moisture content and oil content. The broad-sense heritability (\({h}^{2}\), %) for grain quality traits across the four environments in the 248 inbred lines ranged from 70.16 (lysine content) to 80.37 (starch content), indicating the predominant role of genetic factors in determining these traits (Table 2). Overall, the grain quality exhibited significant genetic variations and it was suitable for association analysis.

Genome-wide association analysis

With the BLUP value of each grain quality trait, we conducted a GWAS with 83,057 genome-wide SNPs. In total, we detected 3, 7, 21, 8 and 10 SNPs to be significantly associated with moisture content, protein content, oil content, starch content and lysine content, respectively (Table 3, Fig. 2). For moisture content, three SNPs were located on chromosomes 1, 3, and 9, which individually explained 10.53%–23.16% of the phenotypic variation. For protein content, seven SNPs were located on chromosomes 1, 2, 3, and 4, which individually explained 5.45%–32.79% of the phenotypic variation. For oil content, 21 SNPs were located on chromosomes 1, 3, 4, 5, 6, 7, 8, 9, and 10, which individually explained the phenotypic variation of 9.04–31.24%. For starch content, eight SNPs were located on chromosomes 1, 3, 4, and 6, which individually explained the phenotypic variation of 3.77–23.61%. For lysine content, ten SNPs were located on chromosomes 2, 3, 4, 5, 8, and 9, which individually accounted for 5.71%–32.42% of the phenotypic variation.

Table 3 Analysis of correlated SNP with grain quality traits at BLUP.
Figure 2
figure 2

Manhattan plots of GWAS results for (a) moisture content, (b) protein content, (c) oil content, (d) starch content and (e) lysine content. Black solid lines correspond to the threshold (− log10 (p) = 4).

Comparing the localization results, we found eight co-localized sites and the physical position between SNPs was not more than 200 Kb (Table 4). Notably, four co-localized sites between different traits were detected. 1_60098266-60,098,312 was associated with oil content, starch content and protein content, which explained 9.60%, 16.70%, 16.64% and 13.48% of the phenotypic variation, respectively. 3_1462112-1,462,147 was associated with starch content and protein content, which explained 3.77% and 5.45% of the phenotypic variation, respectively. 3_133182128-133,206,096 was associated with oil content and starch content, which accounted for 22.52%, 4.03% and 4.16% of the phenotypic variation, respectively. 9_97538609-97,538,609 was associated with moisture content and oil content, which explained up to 23.16% and 17.51% of the phenotypic variation, respectively.

Table 4 Co-localized SNPs of grain quality traits in natural population.

Candidate genes associated with significant SNPs

Previous study have shown that the correlation coefficient (r2) between SNP markers is less than 0.1, which is considered to be no correlation18. Therefore, we choose r2 = 0.1 as the LD decay distance. Candidate genes were predicted based on LD decay (r2 = 0.1) in the MaizeGDB genome browser. A total of 208 candidate genes were found and detailed descriptions were summarized in Table S1.

The candidate genes can be classified into 11 biological processes, 13 cellular components, and six molecular functions. The number of candidate genes involved in the grain quality traits of moisture, protein, oil, starch and lysine contents was 77, 46, 103, 136 and 49, respectively. Among them, the candidate genes in biological processes were mainly concentrated in the cellular process and the metabolic process; the candidate genes in cellular component were mainly concentrated in organelle, cell and cell part; and the candidate genes in molecular function were mainly concentrated in catalytic activity and binding (Fig. 3). As for the KEGG analysis of the candidate genes, a total of 12 pathways were identified. These pathways included the biosynthesis of secondary metabolites, and that of amino acids, starch and sucrose metabolism, inositol phosphate metabolism, and phosphatidylinositol signaling system, which could be related to grain quality (Fig. 4). In addition, a protein classification analysis tool was used to classify candidate gene proteins, 46 of which matched the PANTHER database. A further analysis showed that these 46 proteins fell into nine categories (Fig. 5), which contained the largest number of proteins— metabolite interconversion enzyme (PC00262). Furthermore, we identified 29 candidate genes to be associated with grain quality (Table 5). Annotation information suggested that these candidate genes may control multiple traits during maize growth and development.

Figure 3
figure 3

GO-second class of candidate genes.

Figure 4
figure 4

Analysis of KEGG pathway based on candidate genes. (The figure was created by R version 3.6.1 based on KEGG pathway database www.kegg.jp/kegg/kegg1.html).

Figure 5
figure 5

Protein classification of candidate genes.

Table 5 Putative candidate gene of grain quality traits.

Discussion

Genetic basis of grain quality traits

Maize grain quality traits are complex quantitative traits, controlled by main effect genes and lots of micro effect genes. In this study, there was a wide variety of grain quality traits in the natural population, which were normally distributed. To reduce the influence of the environment on the genotype, phenotypic BLUP values across four environments were used for association studies. Phenotypic correlations were observed among the five grain quality traits. For instance, oil content had significant positive correlation with protein content and significant negative correlation with starch content, which is consistent with previous results4,19. Meanwhile, starch content had significant positive correlation with protein content and lysine content, which is consistent with previous studies20,21. Moreover, all of the five grain quality traits had higher broad-sense heritability. Among them, the heritability for protein content, oil content, starch content and lysine content was higher than that in previous studies21. The above results indicated a stable genetic association among these grain quality traits of maize.

It is well known that there are hard choices between yield and quality. Previous studies showed negative correlation between quality traits and yield22. Therefore, how to carry out quality breeding while continuing to improve the yield of maize will be a new subject for maize breeders in the twenty-first century. At present, maize quality breeding is mainly to increase protein content and improve the composition of base acids, especially to increase the content of essential amino acids such as lysine and tryptophan23. In this study, a total of 83,057 SNP markers were used to scan the whole genome, combined with moisture, protein, starch, oil, lysine content and other phenotypic traits and genotypes for association analysis. The purpose of this study was to find the main genes to control the quality traits of maize, and then to introduce the genes into the parents of maize high-yield hybrids from molecular level, and then to obtain high-yield and high-quality hybrids and to provide a theoretical basis for genetic improvement of the quality traits of maize.

Significant SNPs for grain quality traits

Nowadays many researchers at home and abroad apply linkage analysis to locating the QTL of regulating grain quality traits, but few GWAS studies are on grain quality traits. In addition, the QTL detected by linkage analysis and association analysis have consistency in position24. The GWAS analysis is performed with a Bonferroni correction, however this was found to be too strict for less significant trait associations. In order to better detect micro-effect polygenes and identify genetic sites, we reduced the significance threshold to -log10(P) = 4 for all traits25. In this study, a total of 49 SNPs significantly associated with grain quality were detected, and the phenotypic variation explained (PVE) value by a single SNP ranged from 3.77% (3_1462112 and 3_1462147 of starch content) to 32.79% (2_43189825 of protein content). In addition, four co-localized SNPs were detected by multiple traits and single phenotypic variation explained value over 3.77%, indicating that starch content, protein content, oil content, and moisture content are interrelated in the components of corn kernels.

The SNPs detected in this study were compared with previous studies, and some SNPs were found to be located in the localized QTL confidence interval. Among them, five SNPs were located in the QTL interval with Zhang et al.26, where 1_190758142 for oil content was located on chromosomes 1 (bnlg2086-umc1122 interval), 4_85966198 for oil content was located on chromosomes 4 (phi096-bnlg1755 interval), co-localized site 1_60098266-60,098,312 for protein content and starch content was located on chromosomes 1 (phi001-umc1988 interval). Two SNPs were located in the QTL interval with Wang et al.27, where 1_19252635 for oil content was located on chromosome 1 (umc1685-umc1044 interval), and 1_190758142 for oil content was located on chromosome 1 (umc1395-umc2237 interval). In addition, 2_169733611 for lysine content was located on chromosome 2 (bnlg1138-umc1065 interval) of study by Zhang et al.2. Moreover, three SNPs were located in the QTL interval with Yang et al.28, where 1_190758142 for oil content was located on chromosome 1 (umc1590-bnlg1556 interval), 3_213791486 for protein content were located on chromosome 3 (umc2275-umc1594 interval), 10_127895301 for oil content was located on chromosome 10 (umc1272-bnlg1839 interval).

However, some SNPs were not found in previous studies. There are several reasons for these differences. First, the population in our investigation mightnot be different enough for grain quality. Second, different estimating methods also caused the variations. Third, these SNPs were newly discovered and needed testing further. All in all, the results of this study can serve as a reference for other studies.

Putative genes and pathways involved in grain quality

In this study, a total of 208 candidate genes were searched, of which 17 possible candidate genes for grain quality traits were predicted.

For moisture content, three candidate genes were detected. GRMZM2G069024 encoded Beta-glucosidase 11, an important component of the cellulase system29. Previous studies reported that dehydration rapidly induced polymerization of AtBG1, a beta-glucosidase30. GRMZM2G032852 encoded putative calcium-dependent protein kinase family protein and GRMZM2G321041 encoded putative RING zinc finger domain superfamily protein. The two proteins emerged as key proteins in response to drought stresses in plants31. The three enzymes are closely related to moisture content, therefore, three candidate genes may affect moisture content by influencing enzymes activities and genes expression levels.For protein content, three candidate genes were detected. One of the candidate genes (GRMZM2G047129) encoded alpha-L-fucosidase 2. Alpha-L-fucosidase has been reported in only a few plants, such as Arabidopsis,32 and pea33. It was reported that alpha-L-fucosidase can hydrolyze fucose residues from glycoproteins34, thus this candidate gene may affect protein content. Another candidate gene (GRMZM2G466833) encoded malate dehydrogenase 3 (MDH), that is, one of the key enzymes to synthesize malic acid. MDH played a key role in many physiological metabolic pathways, such as C4 pathway, crassulacean acid metabolism, gluconeogenesis, tricarboxylic acid cycle and photosynthesis, linking the metabolism of sugars, proteins and lipids in the body35. Therefore, this candidate gene may affect protein content by influencing the metabolism of proteins. Finally, GRMZM2G071714 encoded lipoyl synthase (LS) that analyzes the final step of lipoyl cofactor biosynthesis36. Protein lipoylation was denovo lipoylation pathway in plastids, and two octanoyltransferases and one LS provided protein lipoylation autonomy to plastids of Arabidopsis37. Therefore, this candidate gene may affect protein content by influencing the protein lipoylation.

For oil content, three candidate genes were identified. GRMZM2G433942 encoded palmitoyltransferase ZDHHC9. Serine palmitoyltransferase (SPT) is the key enzyme of sphingolipids biosynthesis, and sphingolipids are essential components of plant cells38. As can be seen, it has a certain effect on the oil content of maize. GRMZM2G134308 encoded putative beta-14-xylosyltransferase IRX10L. Xylose is a kind of glycosyl component widely found in plants. Plant glycosyltransferases are enzymes that are closely related to the metabolism of glycolipids, polysaccharides, glycoproteins, nucleic acids, plant secondary products, and so on39. Therefore, this candidate gene may affect oil content by influencing beta-14-xylosyltransferase activities and genes expression levels. GRMZM2G033544 encoded cyclopropane-fatty-acyl-phospholipid synthase that is synonymous with cyclopropane fatty acid (CFA) synthase. CFA is an important membrane fatty acid in the stress-resistant mechanism and the presence of CFA can enhance membrane rigidity40. CFA synthase is a key enzyme regulating synthetic CFA. Therefore, this candidate gene may affect oil content by influencing CFA synthase activities and genes expression levels.For starch content, five candidate genes were identified. Three of the candidate genes (GRMZM2G175218, GRMZM2G082034 and GRMZM2G347708) encoded beta-amylase, which was directly involved in the synthesis of starch and metabolic process of polysaccharides41. Two of the candidate genes (GRMZM2G000520 and GRMZM2G404453) encoded ethylene-responsive transcription factor (ERF), a member of a transcription factor family involved in plant growth and environmental stress responses42. In addition, ethylene has been shown to affect starch biosynthesis by influencing enzymes activities and genes expression levels involved in starch synthesis in maize43.

For lysine content, three candidate genes were detected. GRMZM2G050570 encoded threonine synthase which catalyzed the terminal reaction in the biosynthetic pathway of threonine44. From a nutritional point of view, lysine and threonine were essential amino acids in maize, and GRMZM2G050570 may indirectly affect lysine content by influencing threonine synthase activities and genes expression levels.. GRMZM2G129209 encoded omega-3 fatty acid desaturase which was a key enzyme for α-linolenic acid (ALA) biosynthesis45. Moreover, previous studies demonstrated that ALA was a crucial component in storing lipids in plants46. Therefore, the gene may regulate grain lysine content indirectly by regulating lipid synthesis. Finally, GRMZM2G076307 encoded glycosyltransferases (GTs), which belong to a multi-member genes family. According to previous studies, GTs played a very important role in the growth and development of plants, such as regulating plant hormone levels, participating in the synthesis, modification and transportation of secondary metabolites in plants, and participating in plant defense reactions47,48. Therefore, the gene may regulate grain lysine content indirectly by regulating plant hormone and secondary metabolites synthesis.

All in all, these candidate genes are closely related to grain quality and future work will include functional validation of these genes and illustrate the molecular mechanisms for controlling grain quality in maize plants.

Materials and methods

Association mapping panel and genotyping

The association panel consisted of 248 diverse lines, including some excellent backbone inbred lines in China and some high-quality inbred lines introduced from abroad. Details on 248 of these lines could be found in previous studies20. The DNA of all the maize inbred lines were extracted using CTAB method49 and genotyped using Genotyping-By-sequencing (GBS) method50. The methods of SNP filtering and calculating linkage disequilibrium (LD) decay were described in previous studies51. A total of 83,057 SNPs were used and the LD decay was 120 kb (r2 = 0.1) in this study.

Field experiments and phenotyping investigation

All 248 maize inbred lines of the association panel were planted in four environments, that is, Baoding in Hebei Province in 2016 and 2017, and Shijiazhuang in Hebei Province in 2016 and 2017. In each environment, all the maize inbred lines were planted in a single row plot using a randomized block design with two replications. Each experimental plot consisted of a row length of 3 m and 0.6 m between adjacent rows. After maturity, all corns except the head and tail of each row were harvested. Perten DA7200 Near Infrared Grain Analyzer was applied to determinate the moisture, protein, oil, starch and lysine contents of maize. Each material was repeatedly measured two times.

Phenotype statistical analysis

The IBM SPSS 21.0 software was used to make the descriptive statistical analysis and the analysis of variance (ANOVA). The broad-sense heritability (\({h}^{2}\)) for each trait was estimated as \({h}^{2}={\sigma }_{g}^{2}/\left({\sigma }_{g}^{2}+{\sigma }_{gy}^{2}/r+{\sigma }_{e}^{2}/yr\right)\)52, where \({\sigma }_{g}^{2}\), \({\sigma }_{gy}^{2}\) and \({\sigma }_{e}^{2}\) are genetic variance, genotype-by-environment interaction variance and error variance, respectively, \(y\) is the number of environments, and \(r\) is the number of replications. The “PerformanceAnalytics” package in the R software was used to perform correlations analysis. For each trait, BLUP value was evaluated by using the following mixed linear model in the “lme4” package of the R software53: Y = (1|rep%in%env) + (1|env) + (1|lines) + (1|env:lines), where Y stands for trait data, the parentheses indicate random effects, “1|” means groups, “:” means interactions, “lines” means all materials and “env” means the environment.

Genome-wide association study of grain quality traits

The BLUP value and 83,057 SNPs were used to conduct the GWAS by using FarmCPU model54 implemented in the GAPIT package in the R software55, with both K and Q matrix taken into account.

Prediction of candidate genes

Candidate genes were predicted based on the significant SNPs and their extension regions from 120 kb upstream to 120 kb downstream (LD decay) in the MaizeGDB (https://www.maizegdb.org/) genome browser B73 reference genome version v2. The MaizeGDB, NCBI (https://www.ncbi.nlm.nih.gov/) and Uniprot (https://www.uniprot.org/) were used to obtain annotation of candidate genes. Then these candidate genes were performed GO analysis on the GENE ONTOLOGY website (http://www.geneontology.org/). The KOBAS 3.0 website (http://kobas.cbi.pku.edu.cn/kobas3/?t=1) was used to performe Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis56.

Statement

Experimental research and field studies complies with relevant institutional, national, and international guidelines and legislation.