Detection of QTNs for kernel moisture concentration and kernel dehydration rate before physiological maturity in maize using multi-locus GWAS

Maize is China’s largest grain crop. Mechanical grain harvesting is the key technology in maize production, and the kernel moisture concentration (KMC) is the main controlling factor in mechanical maize harvesting in China. The kernel dehydration rate (KDR) is closely related to the KMC. Thus, it is important to conduct genome-wide association studies (GWAS) of the KMC and KDR in maize, detect relevant quantitative trait nucleotides (QTNs), and mine relevant candidate genes. Here, 132 maize inbred lines were used to measure the KMC every 5 days from 10 to 40 days after pollination (DAP) in order to calculate the KDR. These lines were genotyped using a maize 55K single-nucleotide polymorphism array. QTNs for the KMC and KDR were detected based on five methods (mrMLM, FASTmrMLM, FASTmrEMMA, pLARmEB, and ISIS EM-BLASSO) in the package mrMLM. A total of 334 significant QTNs were found for both the KMC and KDR, including 175 QTNs unique to the KMC and 178 QTNs unique to the KDR; 116 and 58 QTNs were detected among the 334 QTNs by two and more than two methods, respectively; and 9 and 5 QTNs among 58 QTNs were detected in 2 and 3 years, respectively. A significant enrichment in cellular component was revealed by Gene Ontology enrichment analysis of candidate genes in the intervals adjacent to the 14 QTNs and this category contained five genes. The information provided in this study may be useful for further mining of genes associated with the KMC and KDR in maize.

Maize (Zea mays L.) is the largest grain crop in China. The planting area for maize and maize production were 41.3 million ha and 2.6 billion tons in 2019 in China, respectively 1 . Mechanical grain harvesting is the key technology in maize production, and the kernel moisture concentration (KMC) is the main controlling factor in mechanical maize harvesting in China 2 . A low KMC at harvest may facilitate mechanical harvesting, shelling efficiency, and grain quality and reduce additional drying cost and shrinkage penalties [3][4][5] . When the KMC at harvest is greater than 25 percent, the breakage rate quickly increases, which significantly reduces farmers' incomes 6 . The kernel dehydration rate (KDR) is defined as the rate of moisture loss between two adjacent periods after pollination, which is closely related to the KMC. It is the key measurements to select maize hybrids with a low KMC and a high KDR at mature period for achieving mechanical maize harvesting.
Changes in the KMC and KDR occur in two distinct phases 7,8 . The first phase spans the time from pollination to physiological maturity (PM) and is defined as physiological dehydration. The second phase spans the time from PM to harvest and is defined as a natural drying process. During the first phase, kernel dehydration is an internal process under the control of growth and development regulatory processes. Environmental factors have no significant effects on such dehydration 9 . Beginning as early as the 1960s, the KMC  www.nature.com/scientificreports/ controlled by multi-genes and could be stably inherited [10][11][12] . Previous research has also shown that selection based on a low KMC and a high KDR before PM was an effective strategy to achieve a low KMC at harvest [13][14][15] . Therefore, measuring the KMC and KDR in various maize cultivars before PM and conducting quantitative trait locus (QTL) mapping, genome-wide association studies (GWAS), and candidate gene mining for these two traits in maize are crucial tasks. Extensive research has been conducted to map QTLs for the KMC and KDR in maize 3,4,[16][17][18][19] . Some intervals or genes associated with the KDR in maize have been obtained 3,4 , but only Dai et al. 20 and Zhang et al. 21 have conducted GWAS for these two traits in maize.
The most popular method for GWAS is the mixed linear model (MLM) 22,23 . In the past decade, several MLM algorithms, such as compressed MLM 24 and enriched compressed MLM 25 , were developed to improve the computational efficiency. However, all these models perform one-dimensional genome scans and require multiple corrections. Generally, the above traditional methods had significant limitations in mapping QTLs with relatively small effects. Therefore, Wang et al. 26 proposed a new model. Then, GWAS methods based on a multi-locus random-SNP-effect MLM (mrMLM) were proposed. The methods included iterative modified-sure independence screening expectation-maximization (EM)-Bayesian LASSO (ISIS EM-BLASSO), polygenic-backgroundcontrol-based least angle regression plus empirical Bayes (pLARmEB), fast multi-locus random-SNP-effect efficient mixed model association (FASTmrEMMA), and fast multi-locus random-SNP-effect mixed linear model (FASTmrMLM) [26][27][28][29][30] . These methods could effectively detect small-effect quantitative trait nucleotides (QTNs) and improve the efficiency and accuracy of GWAS.
In this study, the mrMLM was used to perform a GWAS based on Axiom Maize 55K SNP Array data from 132 maize inbred lines. QTNs associated with the KMC and KDR were detected using five methods in the package mrMLM (mrMLM, FASTmrMLM, FASTmrEMMA, pLARmEB, and ISIS EM-BLASSO). Associated candidate genes were predicted and submitted to Gene Ontology (GO) enrichment analysis. The results have theoretical significance and application value for improving maize kernel dehydration characteristics using marker-assisted selection (MAS).

Results
Phenotypic data analysis of 132 inbred lines. The trends of both the KMC and KDR were very similar in 2014, 2015 and 2016 (Fig. 1). The KMC gradually decreased over time, while the KDR gradually increased over time ( Table 1). The different phenotypic traits also showed different levels of variation. The KMC had relatively low coefficients of variation (CV), and the lowest CVs were found at 15  Population structure and linkage disequilibrium (LD) analysis. The Axiom Maize 55K SNP Array (CapitalBio Corp., Beijing, China) had 55,229 SNPs 31 , and the 132 inbred lines were genotyped based on the Affy AXIOM Array 2.0 platform. The raw genotyping data were filtered using PLINK based on a minor allele frequency (MAF) > 0.05 and a missing genotype rate (GENO) < 0.1. After filtering, the remaining 41,357 SNPs were used for LD analysis and GWAS.
When r 2 = 0.1, the attenuation distance of LD was approximately 200 kb ( Fig. 2A), and genes within 200 kb upstream or downstream of the marker exceeding the threshold were taken as candidates. The number of subpopulations (k) was plotted based on the delta k calculated using STRU CTU RE software (Fig. 2B). The line chart showed a peak at k = 6, indicating that the natural population could be divided into six subpopulations (Fig. 2C). www.nature.com/scientificreports/ The quality and accuracy of the SNP markers and lines. Measurement of the KDR is relatively difficult, and natural populations have rarely been used in GWAS. In this study, we conducted the GWAS using cob colour, a qualitative trait in maize, to verify the validity and accuracy of the experimental populations and markers. Pericarp colour 1 (P1) regulates red pigmentation in cob, pericarp, tassel glumes, and husks, and its gene is located at mk187 on chromosome (chr.) 1 at position 48.1 Mb. A total of seven significant QTNs were detected on chr.1 using the package mrMLM (Table 2). Among them, AX-86284808 and AX-91425354 were detected by two and four methods, respectively, which explained up to 32.12% of the phenotypic variation. When r 2 = 0.1 and the attenuation distance of LD was approximately 200 kb, AX-86284808 and AX-91425354 were mapped onto B73 Ref Gen_V4, and both QTNs were internally scanned to Zm00001d028850 (P1), which is consistent with the results of a previous study screening genes involved in cob colour 32 . These results suggest that we should identify candidate genes within the attenuation distance of LD around the QTNs shared in common by multiple detection methods to achieve accurate candidate gene mining ( Table 2).
QTNs associated with the KMC and KDR. A total of 334 significant QTNs were identified for the KMC and KDR in maize, using five multi-locus GWAS methods in the package mrMLM, with a critical LOD (logarithm  Validation of fourteen common QTNs. Based on the alleles of 14 QTNs, the 132 inbred lines were divided into two groups, and we tested for significant differences in mean phenotypes between the two groups (Fig. 4). The results showed that the mean values of the groups with favourable alleles were larger than those of the other groups with alternative alleles, indicating significant or extremely significant differences.  Table S6). This information may be useful for further mining of genes associated with the KMC and KDR in maize.    www.nature.com/scientificreports/ ods, the KMC in a seed sample is measured directly. With indirect methods, the KMC is measured according to the relationship between the chemical and physical characteristics of seed water and its KMC, with calibration to a reference. Direct methods have the advantage of high repeatability, but require a substantial workload in the field and are therefore unsuitable for multiple repeated measurements of many materials. In this study, we directly measured the KMC in maize following the experimental method of Reid et al. 34 . This method can not only reduce the workload in the field but also decrease experimental errors during shelling, thus saving time and labour. The KMC in maize kernels is closely linked to their dry weight 35 . In the early stage of kernel development, water accumulates more than dry matter. As the dry weight increases, the KMC gradually decreases. Therefore, kernel dehydration before PM is an internal process controlled by growth and development, and environmental factors have no significant effects on dehydration in this process 9 .

GO enrichment analysis.
Genotypic differences lead to variations in the KDR, and the KDR shows a marked variation between various cultivars before PM, which is still inheritable. This study showed that in the early stage of kernel development, the KMC decreased slowly, while the KDR was relatively low; and with maturation of the kernels, the KMC and KDR increased sharply after 35 DAP (the kernels were close to PM); these trends were the same for 3 years. Although the slope of the KDR or KMC is a parameter used for evaluation of the dehydration process, the index calculated by the fitting curve has no biological meaning. The purpose of the paper is to ensure the key periods and detect the key genes. Meanwhile, the specific moment is also the key point of phenotype identification.

The feasibility of multi-locus GWAS for detection of QTNs associated with the KMC and KDR.
The multi-locus GWAS model has a higher discrimination ability and a lower false-positive rate for detection of animal and plant genes compared with the single-locus GWAS model 36,37 . Geneticists have introduced pleiotropy and population structure into the single-locus GWAS model to reduce the errors in effect estimation by controlling the genetic background 24,38,39 . Although the modification of the single-locus GWAS model improves its detection accuracy to a certain degree, the multiple-testing correction (e.g., Bonferroni correction) of significance thresholds for the single-locus model is too strict. This strict correction leads to the exclusion of important loci, especially when large experimental errors occur in field trials of crop genetics. To solve this problem, the application of multi-locus mrMLM is essential.
Previous studies have applied the multi-locus GWAS model to detect QTNs associated with important traits in various crops. Based on this model, Guan   www.nature.com/scientificreports/ induced after drought stress treatment. Hu et al. 46 detected 913 QTNs for agronomic traits of barley, including 39 QTNs that were repeatedly detected in various environments and by different methods, with 10 candidate genes identified through gene annotation In maize, Ma et al. 47 detected 63 QTNs for the regenerative capacity of embryonic callus and found 40 candidate genes based on these common QTNs. Moreover, Xu et al. 48 identified a total of 60 QTNs for the gelatinization properties of maize starch. Zhang et al. 49 detected 423 QTNs for maize stalk traits related to lodging resistance, 29, 34, and 48 of which were associated with stem diameter, stalk bending strength, and rind penetrometer resistance, respectively, as detected by multiple methods or across multiple environments. Based on these studies, conducting multi-locus GWAS for KMC-and KDR-associated traits in maize is feasible.
Detection of QTLs and QTNs associated with the KMC and KDR. The KMC and KDR of maize are complex quantitative traits with different impact factors in different periods. Physiological dehydration is controlled by genotype and could be stably inherited 11,22,50 . Natural drying is jointly determined by the KMC at PM, drying time, and the KDR and KMC at harvest, which are susceptible to environmental conditions 9 . Both the KMC and KDR have high heritability and can be stably inherited. It is feasible to map major-effect QTLs for the KMC and KDR, to mine associated candidate genes, and to develop practical functional markers for marker-assisted selection, which will be valuable for the breeding of maize cultivars for rapid dehydration. Many studies have mapped QTLs for the KMC and the KDR in maize 3,4,16-19 , but GWAS for these two traits have rarely been reported. Dai et al. 20 selected 80 maize inbred lines to conduct a field survey for two consecutive years, performed a GWAS using 1,490,007 high-quality SNPs, and eventually detected 19 SNPs associated with natural KDR in the field. Zhang et al. conducted GWAS on kernel dehydration traits in maize using 290 inbred lines in combination with 201 simple sequence repeat (SSR) molecular markers, and 17 SSRs associated with natural KDR in the field were finally detected 21 . This study conducted a multi-locus GWAS for KMC-and KDRassociated traits in maize based on the Axiom Maize 55K Array data of 132 maize inbred lines. A total of 116 and 58 QTNs were detected by two and more than two methods, respectively. Among the 58 QTNs detected by three or more methods, 9 and 5 were detected in 2 or 3 years. The QTNs identified in this study were correlated with QTLs reported earlier than the present report. Among these 14 QTNs, AX-123944713 and AX-90786057 were located in the qKdr-2-1 interval detected by Wang et al. 51 and the intervals of q9GDR13-2-1 and qcGDR23-2-1 by Li et al. 18 ; AX-90614081 were located in the q8GDR14-2-1 and qcGDR14-2-2 interval detected by Li et al. 18 and the qKdr-2-2 interval by Liu et al. 19 ; AX-90843001 was located in the intervals of q8GWC20-5-1, q9GWC10-5-2, qcGWC10-5-1, qcGWC20-5-1, q8GDR12-5-1 and qcGDR12-5-1 detected by Li et al. 18 ; AX-90848774 was located in the qKdr-3-6 interval detected by Wang et al. 51 ; AX-91652225 was located in the q9GWC10-5-11 interval detected by Li et al. 18 ; AX-91654966 and AX-91442789 were located in the intervals of q8GWC10-5-1, q8GWC30-5-1, q8GWC40-5-1, q9GWC20-5-1, q9GWC30-5-1, q9GWC40-5-1, qcGWC30-5-1, qcGWC40-5-1, q8GDR13-5-1, q8GDR14-5-1 and qcGDR13-5-1 detected by Li et al. 18 ; AX-91760347 was located in qKdr-8-2 interval detected by Wang et al. 51 ; AX-91785266 was located in intervals of q9GWC30-9-1、q9GDR13-9-1、q9GDR34-9-1 and qcGDR13-9-1 detected by Li et al. 18 ; AX-86286026, AX-91629217 and AX-86291963 have not been reported. Furthermore, a significant enrichment in cellular component was obtained through GO enrichment analysis of candidate genes in the intervals adjacent to the 14 QTNs detected in 2 or 3 years, and this category contained five genes.

Conclusions
The 132 inbred lines were genotyped using a maize 55K single-nucleotide polymorphism array. QTNs for the KMC and KDR in maize were detected based on five methods (mrMLM, FASTmrMLM, FASTmrEMMA, pLARmEB, and ISIS EM-BLASSO) in the package mrMLM. A total of 334 significant QTNs were identified for the KMC and the KDR, 116 and 58 of which were detected by two and more than two methods, respectively, while 9 and 5 QTNs were detected in 2 and 3 years, respectively. A significant enrichment in cellular component was revealed by GO enrichment analysis of candidate genes in the intervals adjacent to the fourteen QTNs, and this category contained five genes. The information provided in this study may be useful for further mining of genes associated with the KMC and the KDR in maize.

Materials and methods
Plant materials. A total of 132 maize inbred lines were used in this study, which were provided by the  The ears were bagged before silking (50% of plants in the row with extruded silks). Then the bagged ears were pollinated by hand. One week later, the bags were removed and five tested ears were randomly selected, tagged www.nature.com/scientificreports/ and labelled in each plot. The moisture concentration was recorded from 10 to 40 DAP, with one measurement of every five days. At 9:00 a.m., per the method published by Reid et al. 34 , for each ear, a SK-300 probe for moisture concentration measurement (manufactured by Harbin Yuda Electronic Technology Co., Ltd., China) was used to pierce into kernels after penetrating the bract leaves in the middle of the ear. KMCs at 10,15,20,25,30,35, and 40 DAP were measured, which were designated as KMC10, KMC15, KMC20, KMC25, KMC30, KMC35, and KMC40, respectively. KDRs were then calculated based on two consecutive KMC measurements. KDR = (KMC at a given time-KMC at the next time)/number of days during the time span. The KDRs for the six time spans (namely, [10][11][12][13][14][15][15][16][17][18][19][20][20][21][22][23][24][25][25][26][27][28][29][30][30][31][32][33][34][35] were denoted as KDR15, KDR20, KDR25, KDR30, KDR35, and KDR40, respectively. Statistical analysis of phenotypic data. The F-values for genotypes (F G ) and environments phenotypic data (F E ) were analysed by SPSS 22 (IBM Corp., Armonk, NY, USA). Broad-sense heritability was calculated using the formula proposed by Knapp et al. 52 : H 2 = δ 2 g / δ 2 g + δ 2 gl /n + δ 2 e /nr , where δ 2 g is the genetic variance, δ 2 gl is the variance of the genotype-by-environment interaction, δ 2 e is the error variance, n is the number of sites, and r is the number of replicates.CV = SD/Mean. Genotyping and filtering. Genomic DNA was extracted from young leaf samples of the 132 maize inbred lines using the modified cetyltrimethylammonium bromide (CTAB) method 53 . The quality of DNA was assessed using 0.8% agarose gel electrophoresis and a NanoDrop 2000 spectrophotometer (NanoDrop, Wilmington, DE, USA). Genotyping was performed using the Axiom Maize 55K SNP Array (CapitalBio Corp., Beijing, China) 31 , which contained a total of 55,229 SNPs. The 132 inbred lines were genotyped based on the Affy AXIOM Array 2.0 platform. Genotyping data of the 132 inbred lines were filtered using the software PLINK 54 with the settings of MAF > 0.05, GENO < 0.1, and genotype heterozygous loci missing.
Linkage disequilibrium and population structure analysis. LD among markers was calculated using PLINK software. The window size for LD calculation was set based on the number of SNPs located in the genome. Pair-wise linkage disequilibrium was measured using the squared allele frequency correlations, according to Weir 55 , and assessed by calculating r 2 for pairs of SNP loci.
The population structure of 132 lines was assessed using STRU CTU RE software 2.3.4 56 . A burn-in of 5000 iterations followed by 50 000 Markov Chain Monte Carlo (MCMC) replicates was implemented to estimate the number of subpopulations (k) in a putative range of 2-10. To estimate the robustness of the inferred population structure, five replications were conducted for each k. The subpopulation number was estimated using an ad hoc statistic delta k based on the rate of change in the log probability of data between successive values 57 .
Genome-wide association studies. All the phenotypic and genotypic information in the above mapping population was used to detect QTNs using the mrMLM 26 , FASTmrEMMA 29 , FASTmrMLM 30 , pLARmEB 28 , and ISIS EM-BLASSO 27 approaches, implemented by the software programme mrMLM v4.0 (https ://cran.r-proje ct.org/web/packa ges/mrMLM .GUI/index .html). The unified parameter settings for the five methods were as follows: (1) the Q + K model was used, where the population structure value Q was calculated by Structure 2.3.4 software 56 , and the kinship value K was analyzed by the "mrMLM" software package; and (2) the significance threshold p value was set as 0.0002 (limit of detection (LOD) = 3.0). In addition, while using mrMLM and FAST-mrEMMA, the search radius of candidate genes was specified as 20 kb; using pLARmEB, 50 potential association loci were selected on each chromosome.
Annotation of candidate genes analysis. QTNs for the KMC and the KDR in maize detected by multiple methods were mapped to the maize reference genome B73 RefGen_V4 to identify associated candidate genes. The obtained candidate genes were subjected to GO enrichment analysis using AgriGO v2.0 58 , and the final set of genes associated with the KMC and the KDR were identified.