Maize (Zea mays L.) is the largest grain crop in China. The planting area for maize and maize production were 41.3 million ha and 2.6 billion tons in 2019 in China, respectively1. Mechanical grain harvesting is the key technology in maize production, and the kernel moisture concentration (KMC) is the main controlling factor in mechanical maize harvesting in China2. A low KMC at harvest may facilitate mechanical harvesting, shelling efficiency, and grain quality and reduce additional drying cost and shrinkage penalties3,4,5. When the KMC at harvest is greater than 25 percent, the breakage rate quickly increases, which significantly reduces farmers' incomes6. The kernel dehydration rate (KDR) is defined as the rate of moisture loss between two adjacent periods after pollination, which is closely related to the KMC. It is the key measurements to select maize hybrids with a low KMC and a high KDR at mature period for achieving mechanical maize harvesting.

Changes in the KMC and KDR occur in two distinct phases7,8. The first phase spans the time from pollination to physiological maturity (PM) and is defined as physiological dehydration. The second phase spans the time from PM to harvest and is defined as a natural drying process. During the first phase, kernel dehydration is an internal process under the control of growth and development regulatory processes. Environmental factors have no significant effects on such dehydration9. Beginning as early as the 1960s, the KMC and KDR have been controlled by multi-genes and could be stably inherited10,11,12. Previous research has also shown that selection based on a low KMC and a high KDR before PM was an effective strategy to achieve a low KMC at harvest13,14,15. Therefore, measuring the KMC and KDR in various maize cultivars before PM and conducting quantitative trait locus (QTL) mapping, genome-wide association studies (GWAS), and candidate gene mining for these two traits in maize are crucial tasks.

Extensive research has been conducted to map QTLs for the KMC and KDR in maize3,4,16,17,18,19. Some intervals or genes associated with the KDR in maize have been obtained3,4, but only Dai et al.20 and Zhang et al.21 have conducted GWAS for these two traits in maize.

The most popular method for GWAS is the mixed linear model (MLM)22,23. In the past decade, several MLM algorithms, such as compressed MLM24 and enriched compressed MLM25, were developed to improve the computational efficiency. However, all these models perform one-dimensional genome scans and require multiple corrections. Generally, the above traditional methods had significant limitations in mapping QTLs with relatively small effects. Therefore, Wang et al.26 proposed a new model. Then, GWAS methods based on a multi-locus random-SNP-effect MLM (mrMLM) were proposed. The methods included iterative modified-sure independence screening expectation–maximization (EM)-Bayesian LASSO (ISIS EM-BLASSO), polygenic-background-control-based least angle regression plus empirical Bayes (pLARmEB), fast multi-locus random-SNP-effect efficient mixed model association (FASTmrEMMA), and fast multi-locus random-SNP-effect mixed linear model (FASTmrMLM)26,27,28,29,30. These methods could effectively detect small-effect quantitative trait nucleotides (QTNs) and improve the efficiency and accuracy of GWAS.

In this study, the mrMLM was used to perform a GWAS based on Axiom Maize 55K SNP Array data from 132 maize inbred lines. QTNs associated with the KMC and KDR were detected using five methods in the package mrMLM (mrMLM, FASTmrMLM, FASTmrEMMA, pLARmEB, and ISIS EM-BLASSO). Associated candidate genes were predicted and submitted to Gene Ontology (GO) enrichment analysis. The results have theoretical significance and application value for improving maize kernel dehydration characteristics using marker-assisted selection (MAS).


Phenotypic data analysis of 132 inbred lines

The trends of both the KMC and KDR were very similar in 2014, 2015 and 2016 (Fig. 1). The KMC gradually decreased over time, while the KDR gradually increased over time (Table 1). The different phenotypic traits also showed different levels of variation. The KMC had relatively low coefficients of variation (CV), and the lowest CVs were found at 15 days after pollination (DAP), with values of 1.34%, 1.48%, and 1.44% in 2014, 2015, and 2016, respectively. CVs for the KMC gradually increased over time, with values of 12.68%, 13.56% and 13.71% in 2014, 2015, and 2016 at 40 DAP, respectively. By contrast, the KDR had relatively high CV, with the maximum values found at 35 DAP, 42.09%, 39.4% and 37.06% in 2014, 2015, and 2016, respectively. This result suggests that the optimal time for maize KDR measurement may be after 35 DAP. The heritability ranged from 68.54 to 76.42% for the KMC, and ranged from 61.75 to 74.16% for the KDR. The highest heritability of both traits appeared at 40 DAP. The results from analysis of variance for maize KMC and KDR revealed highly significant differences across cultivars and across periods, except KDR15.

Figure 1
figure 1

The changes of the KMCs and KDRs in 2014, 2015 and 2016 across time.

Table 1 Statistical analysis for the KMCs and KDRs of the 132 inbred lines.

Population structure and linkage disequilibrium (LD) analysis

The Axiom Maize 55K SNP Array (CapitalBio Corp., Beijing, China) had 55,229 SNPs31, and the 132 inbred lines were genotyped based on the Affy AXIOM Array 2.0 platform. The raw genotyping data were filtered using PLINK based on a minor allele frequency (MAF) > 0.05 and a missing genotype rate (GENO) < 0.1. After filtering, the remaining 41,357 SNPs were used for LD analysis and GWAS.

When r2 = 0.1, the attenuation distance of LD was approximately 200 kb (Fig. 2A), and genes within 200 kb upstream or downstream of the marker exceeding the threshold were taken as candidates. The number of subpopulations (k) was plotted based on the delta k calculated using STRUCTURE software (Fig. 2B). The line chart showed a peak at k = 6, indicating that the natural population could be divided into six subpopulations (Fig. 2C).

Figure 2
figure 2

Linkage disequilibrium decay and population structure of 132 lines. (a) LD decay; (b) Plot of delta K against putative K ranging from 2 to 10; (c) Stacked bar plot.

The quality and accuracy of the SNP markers and lines

Measurement of the KDR is relatively difficult, and natural populations have rarely been used in GWAS. In this study, we conducted the GWAS using cob colour, a qualitative trait in maize, to verify the validity and accuracy of the experimental populations and markers. Pericarp colour 1 (P1) regulates red pigmentation in cob, pericarp, tassel glumes, and husks, and its gene is located at mk187 on chromosome (chr.) 1 at position 48.1 Mb. A total of seven significant QTNs were detected on chr.1 using the package mrMLM (Table 2). Among them, AX-86284808 and AX-91425354 were detected by two and four methods, respectively, which explained up to 32.12% of the phenotypic variation. When r2 = 0.1 and the attenuation distance of LD was approximately 200 kb, AX-86284808 and AX-91425354 were mapped onto B73 Ref Gen_V4, and both QTNs were internally scanned to Zm00001d028850 (P1), which is consistent with the results of a previous study screening genes involved in cob colour32. These results suggest that we should identify candidate genes within the attenuation distance of LD around the QTNs shared in common by multiple detection methods to achieve accurate candidate gene mining (Table 2).

Table 2 QTNs for cob colour traits by multi-locus GWAS methods.

QTNs associated with the KMC and KDR

A total of 334 significant QTNs were identified for the KMC and KDR in maize, using five multi-locus GWAS methods in the package mrMLM, with a critical LOD (logarithm of odds) score of 3 (Supplementary Table S1, Fig. 3). A total of 175 significant QTNs were found for the KMC, including 28, 35, 23, 34, 35, 23, and 30 significant QTNs associated with KMC10, KMC15, KMC20, KMC25, KMC30, KMC35, and KMC40, respectively, in which their proportions of phenotypic variation explained (PVE) by each QTN ranged from 1.22 to 25.34%. Of these QTNs, 41 and 21 QTNs were detected in 2 and 3 years, respectively. For the KDR, 178 significant QTNs were detected, including 30 28, 37, 39, 22, and 23 significant QTNs associated with KDR15, KDR20, KDR25, KDR30, KDR35, and KDR40, respectively, in which their PVEs ranged from 0.54 to 24.62%. Among these QTNs, 19 and 5 were detected in 2 and 3 years, respectively. Moreover, 1, 3, 5, and 32 QTNs were found to be simultaneously associated with the above 5, 4, 3 and 2 traits, respectively.

Figure 3
figure 3

Genome-wide distribution of SNPs throughout the 132 lines’ genomes. The outermost box with the scale represents the 10 maize chromosomes. The red histogram represents the density of SNPs. The yellow, red and blue scatters represent significant QTNs for the KMC and KDR traits in 2014, 2015 and 2016, respectively.

Common QTNs by multiple methods and years

Among the above-mentioned 334 QTNs, 116 and 58 were detected by two and more than two methods, respectively (Supplementary Tables S2, S3). Nine of the 58 QTNs were detected in 2 years, which were associated with KMC10, KMC20, KMC20, KMC30, KMC35, KMC40, KDR15, KDR30, and KDR40, respectively. Five of the 58 QTNs were detected in 3 years, which were associated with KMC10, KMC30, KMC35, KDR20 and KDR30, respectively. Among the 14 QTNs across multiple environments (Table 3), AX-91654966 on chr.5 was found by five methods in 2015 and 2016 to be associated with KDR40; AX-91629217 on chr.4 was found by four methods in 3 years to be associated with KMC30; and their PVEs ranged up to 19.76% and 14.50%, respectively.

Table 3 Fourteen QTNs co-detected by multiple methods and years for KMC and KDR traits.

Validation of fourteen common QTNs

Based on the alleles of 14 QTNs, the 132 inbred lines were divided into two groups, and we tested for significant differences in mean phenotypes between the two groups (Fig. 4). The results showed that the mean values of the groups with favourable alleles were larger than those of the other groups with alternative alleles, indicating significant or extremely significant differences.

Figure 4
figure 4

Boxplots for validation of the 14 co-detected QTNs (a ~ n). For each QTN, the population was divided into two groups according to allele types. The X-axis represents the two alleles for each QTN, while the Y-axis corresponds to the phenotype.

GO enrichment analysis

According to the genomic information of B73 Ref Gen_V4, a total of 2,970 genes were present in the intervals adjacent to the 334 QTNs detected for the KMC and KDR in maize (Supplementary Tables S1, S4). A total of 142 genes were detected in the intervals adjacent to the 14 QTNs in two and 3 years. Significance was observed for cellular component in the GO enrichment analysis (P ≤ 0.05; Supplementary Table S5), and this category contained five genes (Supplementary Table S6). This information may be useful for further mining of genes associated with the KMC and KDR in maize.


Measurements and physiological mechanisms of KMC and KDR

According to Hart and Golumbic33, methods to measure seed KMC can be divided into direct and indirect methods. With direct methods, the KMC in a seed sample is measured directly. With indirect methods, the KMC is measured according to the relationship between the chemical and physical characteristics of seed water and its KMC, with calibration to a reference. Direct methods have the advantage of high repeatability, but require a substantial workload in the field and are therefore unsuitable for multiple repeated measurements of many materials. In this study, we directly measured the KMC in maize following the experimental method of Reid et al.34. This method can not only reduce the workload in the field but also decrease experimental errors during shelling, thus saving time and labour.

The KMC in maize kernels is closely linked to their dry weight35. In the early stage of kernel development, water accumulates more than dry matter. As the dry weight increases, the KMC gradually decreases. Therefore, kernel dehydration before PM is an internal process controlled by growth and development, and environmental factors have no significant effects on dehydration in this process9.

Genotypic differences lead to variations in the KDR, and the KDR shows a marked variation between various cultivars before PM, which is still inheritable. This study showed that in the early stage of kernel development, the KMC decreased slowly, while the KDR was relatively low; and with maturation of the kernels, the KMC and KDR increased sharply after 35 DAP (the kernels were close to PM); these trends were the same for 3 years. Although the slope of the KDR or KMC is a parameter used for evaluation of the dehydration process, the index calculated by the fitting curve has no biological meaning. The purpose of the paper is to ensure the key periods and detect the key genes. Meanwhile, the specific moment is also the key point of phenotype identification.

The feasibility of multi-locus GWAS for detection of QTNs associated with the KMC and KDR

The multi-locus GWAS model has a higher discrimination ability and a lower false-positive rate for detection of animal and plant genes compared with the single-locus GWAS model36,37. Geneticists have introduced pleiotropy and population structure into the single-locus GWAS model to reduce the errors in effect estimation by controlling the genetic background24,38,39. Although the modification of the single-locus GWAS model improves its detection accuracy to a certain degree, the multiple-testing correction (e.g., Bonferroni correction) of significance thresholds for the single-locus model is too strict. This strict correction leads to the exclusion of important loci, especially when large experimental errors occur in field trials of crop genetics. To solve this problem, the application of multi-locus mrMLM is essential.

Previous studies have applied the multi-locus GWAS model to detect QTNs associated with important traits in various crops. Based on this model, Guan et al. detected 149 QTNs associated with the fatty acid content and composition of rape40, while Misra et al. detected 224 significant QTNs for the grain quality of rice, 97 of which were detected by at least two methods41. Additionally, Li et al. identified 42 SNPs and 5 QTNs for quality traits of forage sorghum42. Peng et al. detected 328 significant QTNs associated with free amino acid (FAA) levels in wheat, including 66 QTNs that were detected by more than two models, which revealed the complexity of metabolic and genetic regulation43. Lü et al. detected 159 QTNs for the photosynthetic response of soybean under low-phosphorus stress and discovered 52 candidate genes44. Hou et al.45 detected 20 QTNs that were significantly associated with drought resistance traits in cotton and further identified 1326 relevant genes, 205 of which were induced after drought stress treatment. Hu et al.46 detected 913 QTNs for agronomic traits of barley, including 39 QTNs that were repeatedly detected in various environments and by different methods, with 10 candidate genes identified through gene annotation In maize, Ma et al. 47 detected 63 QTNs for the regenerative capacity of embryonic callus and found 40 candidate genes based on these common QTNs. Moreover, Xu et al.48 identified a total of 60 QTNs for the gelatinization properties of maize starch. Zhang et al.49 detected 423 QTNs for maize stalk traits related to lodging resistance, 29, 34, and 48 of which were associated with stem diameter, stalk bending strength, and rind penetrometer resistance, respectively, as detected by multiple methods or across multiple environments. Based on these studies, conducting multi-locus GWAS for KMC- and KDR-associated traits in maize is feasible.

Detection of QTLs and QTNs associated with the KMC and KDR

The KMC and KDR of maize are complex quantitative traits with different impact factors in different periods. Physiological dehydration is controlled by genotype and could be stably inherited11,22,50. Natural drying is jointly determined by the KMC at PM, drying time, and the KDR and KMC at harvest, which are susceptible to environmental conditions9. Both the KMC and KDR have high heritability and can be stably inherited. It is feasible to map major-effect QTLs for the KMC and KDR, to mine associated candidate genes, and to develop practical functional markers for marker-assisted selection, which will be valuable for the breeding of maize cultivars for rapid dehydration. Many studies have mapped QTLs for the KMC and the KDR in maize3,4,16,17,18,19, but GWAS for these two traits have rarely been reported. Dai et al.20 selected 80 maize inbred lines to conduct a field survey for two consecutive years, performed a GWAS using 1,490,007 high-quality SNPs, and eventually detected 19 SNPs associated with natural KDR in the field. Zhang et al. conducted GWAS on kernel dehydration traits in maize using 290 inbred lines in combination with 201 simple sequence repeat (SSR) molecular markers, and 17 SSRs associated with natural KDR in the field were finally detected21. This study conducted a multi-locus GWAS for KMC- and KDR-associated traits in maize based on the Axiom Maize 55K Array data of 132 maize inbred lines. A total of 116 and 58 QTNs were detected by two and more than two methods, respectively. Among the 58 QTNs detected by three or more methods, 9 and 5 were detected in 2 or 3 years.

The QTNs identified in this study were correlated with QTLs reported earlier than the present report. Among these 14 QTNs, AX-123944713 and AX-90786057 were located in the qKdr-2-1 interval detected by Wang et al.51 and the intervals of q9GDR13-2-1 and qcGDR23-2-1 by Li et al.18; AX-90614081 were located in the q8GDR14-2-1 and qcGDR14-2-2 interval detected by Li et al.18 and the qKdr-2-2 interval by Liu et al.19; AX-90843001 was located in the intervals of q8GWC20-5-1, q9GWC10-5-2, qcGWC10-5-1, qcGWC20-5-1, q8GDR12-5-1 and qcGDR12-5-1 detected by Li et al.18; AX-90848774 was located in the qKdr-3-6 interval detected by Wang et al.51; AX-91652225 was located in the q9GWC10-5-11 interval detected by Li et al.18; AX-91654966 and AX-91442789 were located in the intervals of q8GWC10-5-1, q8GWC30-5-1, q8GWC40-5-1, q9GWC20-5-1, q9GWC30-5-1, q9GWC40-5-1, qcGWC30-5-1, qcGWC40-5-1, q8GDR13-5-1, q8GDR14-5-1 and qcGDR13-5-1 detected by Li et al.18; AX-91760347 was located in qKdr-8-2 interval detected by Wang et al.51; AX-91785266 was located in intervals of q9GWC30-9-1q9GDR13-9-1q9GDR34-9-1 and qcGDR13-9-1 detected by Li et al.18; AX-86286026, AX-91629217 and AX-86291963 have not been reported. Furthermore, a significant enrichment in cellular component was obtained through GO enrichment analysis of candidate genes in the intervals adjacent to the 14 QTNs detected in 2 or 3 years, and this category contained five genes.


The 132 inbred lines were genotyped using a maize 55K single-nucleotide polymorphism array. QTNs for the KMC and KDR in maize were detected based on five methods (mrMLM, FASTmrMLM, FASTmrEMMA, pLARmEB, and ISIS EM-BLASSO) in the package mrMLM. A total of 334 significant QTNs were identified for the KMC and the KDR, 116 and 58 of which were detected by two and more than two methods, respectively, while 9 and 5 QTNs were detected in 2 and 3 years, respectively. A significant enrichment in cellular component was revealed by GO enrichment analysis of candidate genes in the intervals adjacent to the fourteen QTNs, and this category contained five genes. The information provided in this study may be useful for further mining of genes associated with the KMC and the KDR in maize.

Materials and methods

Plant materials

A total of 132 maize inbred lines were used in this study, which were provided by the Institute of Crop Resources, Jilin Academy of Agricultural Sciences (JAAS) (Supplementary Table S7). These lines represented six subpopulations: BSSS (American BSSS including Reid), PA (group A germplasm derived from modern U.S. hybrids in China), PB (group B germplasm derived from modern U.S. hybrids in China), Lan (Lancaster Surecrop), LRC (derivative lines from Lvda Red Cob, a Chinese landrace), and SPT (derivative lines from Si-ping-tou, a Chinese landrace).

Field design and phenotypic measurement

The 132 lines were grown at the Gongzhuling (124°47′ N and 43°27′ E, Jilin Province, China) Experimental Base, JAAS in 2014, 2015 and 2016, with a final plant density of 75,000 plants ha-1. A randomized block experimental design with three replications was adopted. In 3 years, each plot had 5 rows, with a row length of 5 m, row spacing of 0.65 m, plant spacing of 0.20 m and plot area of 16.25 m2. Normal agronomic practices for maize were used in field management. To avoid border effects, for each plot, two border rows and the first two plants at each end of the middle three rows were not used for future trait determination.

The ears were bagged before silking (50% of plants in the row with extruded silks). Then the bagged ears were pollinated by hand. One week later, the bags were removed and five tested ears were randomly selected, tagged and labelled in each plot. The moisture concentration was recorded from 10 to 40 DAP, with one measurement of every five days. At 9:00 a.m., per the method published by Reid et al.34, for each ear, a SK-300 probe for moisture concentration measurement (manufactured by Harbin Yuda Electronic Technology Co., Ltd., China) was used to pierce into kernels after penetrating the bract leaves in the middle of the ear.

KMCs at 10, 15, 20, 25, 30, 35, and 40 DAP were measured, which were designated as KMC10, KMC15, KMC20, KMC25, KMC30, KMC35, and KMC40, respectively. KDRs were then calculated based on two consecutive KMC measurements. KDR = (KMC at a given time—KMC at the next time)/number of days during the time span. The KDRs for the six time spans (namely, 10–15, 15–20, 20–25, 25–30, 30–35 and 35–40 DAP) were denoted as KDR15, KDR20, KDR25, KDR30, KDR35, and KDR40, respectively.

Statistical analysis of phenotypic data

The F-values for genotypes (FG) and environments phenotypic data (FE) were analysed by SPSS 22 (IBM Corp., Armonk, NY, USA). Broad-sense heritability was calculated using the formula proposed by Knapp et al.52: \({H}^{2}={\delta }_{g}^{2}/\left[{\delta }_{g}^{2}+\left({\delta }_{gl}^{2}/n\right)+{\delta }_{e}^{2}/nr\right]\), where \({\delta }_{g}^{2}\) is the genetic variance, \({\delta }_{gl}^{2}\) is the variance of the genotype-by-environment interaction, \({\delta }_{e}^{2}\) is the error variance, n is the number of sites, and r is the number of replicates.\(CV=SD/Mean\).

Genotyping and filtering

Genomic DNA was extracted from young leaf samples of the 132 maize inbred lines using the modified cetyltrimethylammonium bromide (CTAB) method53. The quality of DNA was assessed using 0.8% agarose gel electrophoresis and a NanoDrop 2000 spectrophotometer (NanoDrop, Wilmington, DE, USA). Genotyping was performed using the Axiom Maize 55K SNP Array (CapitalBio Corp., Beijing, China) 31, which contained a total of 55,229 SNPs. The 132 inbred lines were genotyped based on the Affy AXIOM Array 2.0 platform. Genotyping data of the 132 inbred lines were filtered using the software PLINK54 with the settings of MAF > 0.05, GENO < 0.1, and genotype heterozygous loci missing.

Linkage disequilibrium and population structure analysis

LD among markers was calculated using PLINK software. The window size for LD calculation was set based on the number of SNPs located in the genome. Pair-wise linkage disequilibrium was measured using the squared allele frequency correlations, according to Weir55, and assessed by calculating r2 for pairs of SNP loci.

The population structure of 132 lines was assessed using STRUCTURE software 2.3.456. A burn-in of 5000 iterations followed by 50 000 Markov Chain Monte Carlo (MCMC) replicates was implemented to estimate the number of subpopulations (k) in a putative range of 2–10. To estimate the robustness of the inferred population structure, five replications were conducted for each k. The subpopulation number was estimated using an ad hoc statistic delta k based on the rate of change in the log probability of data between successive values57.

Genome-wide association studies

All the phenotypic and genotypic information in the above mapping population was used to detect QTNs using the mrMLM26, FASTmrEMMA29, FASTmrMLM30, pLARmEB28, and ISIS EM-BLASSO27 approaches, implemented by the software programme mrMLM v4.0 ( The unified parameter settings for the five methods were as follows: (1) the Q + K model was used, where the population structure value Q was calculated by Structure 2.3.4 software56, and the kinship value K was analyzed by the “mrMLM” software package; and (2) the significance threshold p value was set as 0.0002 (limit of detection (LOD) = 3.0). In addition, while using mrMLM and FASTmrEMMA, the search radius of candidate genes was specified as 20 kb; using pLARmEB, 50 potential association loci were selected on each chromosome.

Annotation of candidate genes analysis

QTNs for the KMC and the KDR in maize detected by multiple methods were mapped to the maize reference genome B73 RefGen_V4 to identify associated candidate genes. The obtained candidate genes were subjected to GO enrichment analysis using AgriGO v2.058, and the final set of genes associated with the KMC and the KDR were identified.