QTL mapping and GWAS for field kernel water content and kernel dehydration rate before physiological maturity in maize

Kernel water content (KWC) and kernel dehydration rate (KDR) are two main factors affecting maize seed quality and have a decisive influence on the mechanical harvest. It is of great importance to map and mine candidate genes related to KWCs and KDRs before physiological maturity in maize. 120 double-haploid (DH) lines constructed from Si287 with low KWC and JiA512 with high KWC were used as the mapping population. KWCs were measured every 5 days from 10 to 40 days after pollination, and KDRs were calculated. A total of 1702 SNP markers were used to construct a linkage map, with a total length of 1,309.02 cM and an average map distance of 0.77 cM. 10 quantitative trait loci (QTLs) and 27 quantitative trait nucleotides (QTNs) were detected by genome-wide composite interval mapping (GCIM) and multi-locus random-SNP-effect mixed linear model (mrMLM), respectively. One and two QTL hotspot regions were found on Chromosome 3 and 7, respectively. Analysis of the Gene Ontology showed that 2 GO terms of biological processes (BP) were significantly enriched (P ≤ 0.05) and 6 candidate genes were obtained. This study provides theoretical support for marker-assisted breeding of mechanical harvest variety in maize.

www.nature.com/scientificreports/ QTL mapping for KDR after PM and KWC at harvest 3,4,[12][13][14][15][16][17] . Recently, QTL for three traits related with KWCs at 30, 40, 60 and 80 days after pollination (DAP) was conducted by Capelle et al. 3 , using Recombinant Inbred Lines (RILs) F 3:4 populations derived from a cross between F 2 and F 252 . Obvious stage-specific QTL were revealed for all traits. QTL for KWCs at 10, 20, 30 and 40 DAP and for KDRs during all periods was conducted by Li et al. 18 , using 258 RILs developed from a cross between N04 and Dan232. The results showed that 45 QTLs were stage/ period specific. Besides, there were no other records in the literature regarding QTL for KWC and KDR at different stages before pollination. In this study, 120 derived double-haploid (DH) lines developed from a cross between two contrasting genotypes, a Tangsipingtou inbred Si287 with low KWC, and a Iodent inbred JiA512 with high KWC were used to map QTLs by genome-wide composite interval mapping (GCIM) 19 and QTNs by multi-locus random-SNP-effect mixed linear model (mrMLM) 20 , and to mine related candidate genes, which is for KWCs and the corresponding KDRs from 10 to 40 DAP. The results are of important theoretical significance and application value in the mining of candidate gene and the marker-assisted breeding of the field KWC and KDR-related characteristics in maize.

Results
Phenotypic evaluation of the DH populations. From Table 1 and Supplementary Fig. S1, the KWCs of both parents of the DH line population, Si287 and JiA512, were significantly or extremely significantly different at all the sampling times from 10 to 40 days after pollination in 2015 and 2016. There existed variations in the target traits among different lines, and the coefficients of variation for all the KWCs were less than 10%. The heritability for the KWCs ranged from 77.324 to 79.631%. The correlations between these KWCs for various periods in three environments were more than 90%. From Fig. 1a, we could know that the changing tendency of KWCs in three environments was similar; all continuously declined with increasing days after pollination and generally conformed to linear curve.
From Table 1 and Supplementary Fig. S2, the KDRs of Si287 and JiA512, were significantly or extremely significantly different at a majority of the above sampling times in 2015 and 2016. There existed variations in the target traits among different lines, and the coefficients of variation for all the KDRs were more than 10% and ranged from 28.76 to 47.63%. The heritability for the KDRs ranged from 63.235 to 73.295%. In the DH population, only the kurtosis for KWC30 was > 1, and the absolute values of skewness and kurtosis for other traits were < 1, which met the QTL mapping requirements for mapping studies. The KDRs for various periods were different among different lines of the DH population, but the correlations between these indicators were poor. From Fig. 1b, we could also know that the changing tendency of KDRs in the mean environment was continuously increased with increasing days after pollination.
Genetic map construction. Using the Axiom Maize55K 21 chip and upon filtration per the criteria described in section "DNA extraction and genotype analysis", there remained 12,861 polymorphic SNPs. The bin function in IciMapping software was used to delete redundant markers, the recombination frequency between them will be estimated as 0. A genetic linkage map containing 1702 markers was eventually obtained. 1702 markers covered 1,309.02 cM on 10 chromosomes (Chr.) with an average marker interval of 0.77 cM (Supplementary Fig. S3). Total length of the map for each chr. ranged from 98.96 cM (chr.8) to 234.93 cM (chr.1). Chr.4 and 1 had the least (115 markers) and most (335 markers) markers, respectively; chr.3 and 4 had the minimum (0.63 cM) and maximum (0.97 cM) average marker-intervals, respectively. Only one gap ≥ 10 cM existed on Chr.4 (10.46 cM); 1,680 gaps ≤ 5 cM existed (  Table S1) and 6 QTLs were detected in two or three environments. These QTLs were distributed on Chr. 1, 3-5, 7 and had an LOD range of 2.54-3.87 and could explain 3.06-16.03% of the phenotypic variation (PVE). For 6 QTLs related to KWC, qKWC35-3-1, qtlKWC35-7-1 and qtlKWC40-3-2 derived from the maternal line Si287, which had an LOD range of 2.76-3.41 and the range of PVE was 3.72-8.85%; qtlKWC35-4-1, qtlKWC35-7-2 and qtlKWC40-3-1 derived from the paternal line JL001, which had an LOD range of 2.76-3.41 and the range of PVE was 3.06-12.10%. For 4 QTLs related to KDR, qtlKDR30-5-1, qtlKDR35-7-1 and qtlKDR40-3-1 derived from the maternal line Si287, which had an LOD range of 2.54-3.47 and the range of PVE was 7.24-12.80%; qtlKDR15-1 derived from the paternal line JL001, which had an LOD 2.64 and the PVE was 16.03%. qtlKWC35-7-2 and qtlKDR35-7-1 were located at the same interval; qtlKWC35-3-1, qtlKWC40-3-1 and qtlKDR40-3-1 had an interval adjacent to each other. The above results are consistent with previous studies, which indicated that KDR is a maternal effect 24 but has a paternal effect as well 25 .
GWAS for KWCs and KDRs. Structure 2.3.4 software 26 was used to calculate the population structure (Q value),by setting the range of K value to 1-10 and based on the kinship and ΔK value of the parents Si287 and JiA512, K = 2 was specified ( Supplementary Fig. S7). Using population structure (Q) and kinship (K) as covariates, the Q + K model in the mixed linear model mrMLM was used to perform GWAS for KWC and KDR. A total of 27 QTNs associated with KWC and KDR were detected (Table 4, Fig. 2, Supplementary Fig. S8) and were distributed on Chr. 2-4 and Chr.6-8, and the range of PVE was 0.34-11.58%, in which 7 QTNs were detected by more than two environments or two methods. qtnKDR25-4, qtnKWC35-7-2 and qtnKWC40-3-2 were detected by two environments and the range of PVE was 2.88-8.23%. qtnKWC35-8 and qtnKWC40-3-3 were respectively detected by two or three models and the range of PVE was 4.56-11.58%. qtnKDR30-7 and qtnKWC35-4 were respectively detected by two or three models in three environments. www.nature.com/scientificreports/ Co-mapping analysis of QTLs and significantly associated loci in GWAS and candidate gene mining. The 5 KWC-related QTNs detected by the mrMLM model in the GWAS were consistent with the intervals of KWC-and KDR-related QTLs (Fig. 2). qtnKWC35-3-1 and qtnKWC40-3-2 were within the physical interval of qtlKDR40-3-1. qtnKDR35-3 and qtnKWC40-3-3 were within the physical interval of qtlKWC40-3-1. qtnKWC35-3-2 was within the physical interval of qtlKWC35-3-1. qtnKWC35-4 was within the physical interval of qtlKWC35-4-11. qtnKDR35-7, qtnKWC35-7-2 and qtnKDR20-7 were within the physical interval of qtlKDR35-7-1 and qtlKWC35-7-2. To explore genes potentially related to KWCs and KDRs in maize, we analyzed the above 3 hotspot regions. The 3 regions have 98, 33, and 363 genes, respectively. These genes are annotated for enrichment analysis using AGRIGO V2 software (https ://syste msbio logy.cau.edu.cn/agriG Ov2/index .php). According to the functions in the GO database, the terms can be grouped into 3 categories (Fig. 3): molecular function (MF), cellular component (CC) and biological process (BP). For MF, there are 6 terms; For CC, there are 3 terms and for BP, there Table 1. Statistical analysis for KWC and KDR of the parents and the DH population. 1 SD., standard deviation; 2 CV., coefficient of variation; 3 Ske., skewness; 4 Kur., kurtosis; 5 Her., heritability; BLUP, best linear unbiased prediction, ** is significant at 0.01 levels. www.nature.com/scientificreports/   Table S3).

Discussion
The availability of a reliable methodology to measure KWC under field conditions is a bottleneck in selection for KDR 28 . The traditional oven method is destructive and not suitable for rapid detection of KWC. Instead, a moisture determination metric, which reveals kernel moisture via detection of electric capacity variation, has been developed 29 . This method was listed in the International Seed Testing Protocol in 2003. The hand-held moisture  www.nature.com/scientificreports/ meter has been reported to be useful for selecting and evaluating genetic materials 4,11,12,23,28,30 . In this study, an SK-300 probe (manufactured by Harbin Yuda Electronic Technology Co., Ltd., China) was used to measure KWC. Many studies suggested that there is close relation between KWC and KDR and environmental factors including air temperature, air humidity, rainfall, etc [30][31][32] . Hence, the following measures were taken in this study so as to avoid the effect of the environmental factors on KWC and ensure the determination accuracy. (1) To avoid border effects, for each plot, 2 border rows and the first 2 plants at each end of the middle 3 rows were not used for future determination. (2) The ears were bagged before silking and pollinated by hand. One week later, the bags were removed and 5 tested ears were randomly selected, tagged and labeled in each plot. (3) The measurement time was established for 9:00 A.M. to remove the effect of the dew and the difference of measurement time. (4) If it rains, the KWC was measured after wiping the outer bracts of the ears to eliminate the effect of the rainfall.
RIL and DH population are all permanent mapping populations. The former has a high degree of recombination, but the constructed period is very long and dominant effect couldn`t be estimated. The latter has a bad degree of recombination, but the plants are homozygous and could be used to study the interactions between genotypes and environments.
The quality of the genetic map directly affects the accuracy of QTL mapping. Increasing marker density can improve the resolution of genetic map [33][34][35] . With the development of high-throughput sequencing and resequencing of the whole genome of the B73, numerous SNP markers have become effective means for constructing high-density genetic maps for maize 36,37 . SNPs provide abundant genetic variation loci at the genome level, which greatly improves genome coverage and marker saturation [38][39][40] . In previous studies, very significantly distorted markers were discarded in the construction of linkage maps, but the more markers increase total genetic distance and marker density on the chromosome 41 , and use of fewer distorted markers in all the RILs decreases the impact of distorted marker on map construction 42,43 . In this study, the Axiom Maize55K chip was used for genotyping DH lines and their parents to screen out 12,861 polymorphic markers, and upon removing redundant markers that the recombination frequency between them will be estimated as 0, a linkage map containing 1,702 markers including segregation distortion markers was obtained, with a total full length of 1,309.02 cM and an average map distance of 0.77 cM.
Linkage analysis was the most widely-used method in QTL mapping, which included the composite interval mapping (CIM), the inclusive composite interval mapping (ICIM), etc. Up to now, most of the numerous QTLs have small effects on complex traits 44 , and some are closely linked 45 . Although QTL mapping has proven to be useful for detecting major QTL with relatively large effects, it may lack power in accurately modeling small-effect QTL 46 . To address this issue, Genome-wide association study (GWAS) was developed to reconsider the model and improve the way that polygenic background is controlled. The GWAS data often includes a large number of markers, making co-factor selection infeasible. Thus, polygenic effects are often fitted to a mixed linear model to capture the genomic background information 47,48 . This treatment can help us improve the methods of QTL mapping, and overcome the subjectivity nature of the CIM in co-factorselection 49,50 . A series of simulated and real www.nature.com/scientificreports/ datasets was used to compare the different methods. The results showed that GWAS analysis had higher power in QTL detection, greater accuracy in QTL effect estimation, and stronger robustness under various backgrounds as compared with the CIM and empirical Bayes methods 19 . KWC and KDR after field pollination in maize are complex quantitative traits susceptible to environmental conditions and are controlled by multiple genes 8,51 . As maize KWC and KDR are affected by additive genetic effects and have high heritability 7,51-55 , it is feasible to carry out mapping of major QTLs for KWC and KDR, to mine candidate genes and to develop practical functional markers for marker-assisted selection. In this study, 10 QTLs and 27 QTNs were detected, in which 4 QTLs and 29 QTNs were consistent with previous studies. The others in this study have not been reported. 2 GO terms of BP were significantly enriched (P ≤ 0.05), the KWCs and KDRs may be related to certain BP terms. 6 candidate genes were obtained, in which Zm00001d022326 coded gibberellin receptor GID1L2 (Zea mays). These co-located QTL are reliable and will be valuable for marker assisted selection in maize genetic improvement.

Conclusions
The KWCs and the KDRs of both parents of the DH line population, Si287 and JiA512, were significantly or extremely significantly different at the sampling times from 10 to 40 days after pollination in 2015 and 2016. There existed variations in the target traits among different lines. The heritability for the KWCs and KDRs was very high. 10 quantitative trait loci (QTLs) and 27 quantitative trait nucleotides (QTNs) were detected by genomewide composite interval mapping (GCIM) and multi-locus random-SNP-effect mixed linear model (mrMLM), respectively. One and two QTL hotspot regions were found on Chromosome 3 and 7, respectively. Analysis of the Gene Ontology showed that 2 GO terms of biological processes (BP) were significantly enriched (P ≤ 0.05) and 6 candidate genes were obtained. This study provides theoretical support for marker-assisted breeding of mechanical harvest variety in maize.

Materials and methods
Plant materials. Si287 (low KWC) and the self-selection line JiA512 (high KWC) were selected as parents based on their similitude in time to flowering and their difference in KDR, which were part of the Tangsipingtou and Iodent heterotic groups, respectively. Specifically, Si287 was the maternal of the maize hybrid Jidan 27, which has been continuously grown for fifteen years in Heilongjiang Province, China with the most annual planting acreage, reaching up to 160,000 hectares. A DH population of 120 lines was developed from a cross between Si287 (maternal) and JiA512 (paternal).
The development of DH population was briefed as follows: In the summer of 2013, at the Gongzhuling (Jilin Province, China) Experimental Base of the Jilin Academy of Agricultural Sciences (JAAS), Si287 and JiA512 were crossed to obtain F1. In the winter of 2013, at the Ledong (Hainan Province, China) winter nursery of JAAS, the F1 plants as the maternal parent were made to obtain induced progenies using the induction line "Jiyou 101" as the paternal parent. In the summer of 2014, at the Gongzhuling Experimental Base, the induced progenies were chromosome-doubled using colchicine, followed by kernel identification; upon field identification and selection, 120 DH lines were obtained. In the winter of 2014, at the Ledong winter nursery, the 120 DH lines were multiplied in large number and used in subsequent experiments. The ears were bagged before silking (50% of plants in the row having extruded silks). Then the bagged ears were pollinated by hand (Supplementary Table S4). One week later, the bags were removed and 5 tested ears were randomly selected, tagged and labeled in each plot. The water content was recorded from 10 to 40 day after pollination, with one measurement of every 5 days. At 9:00 a.m., per the method published by Reid et al. 29 , for each ear, a SK-300 probe for water content measurement (manufactured by Harbin Yuda Electronic Technology Co., Ltd., China) was used to pierce into kernels after penetrating the bract leaves in the middle of the ear.
KWCs on day 10, 15,20,25,30,35,40 after pollination were measured, which were designated as KWC15, KWC20, KWC25, KWC30, KWC35 and KWC40, respectively. KDRs were then calculated based on KWCs for 2 adjacent times. KDR = (KWC at a given time-KWC at the next time)/number of days during the time span. The KDRs for the 6 time spans (namely, 10-15, 15-20, 20-25, 25-30, 30-35 and 35-40 days after pollination) were respectively denoted as KDR15, KDR20, KDR25, KDR30, KDR35, and KDR40. CV(Coefficient of variation) = SD(Standard Deviation)/Mean . One-way analysis of variance (ANOVA) between parents and the descriptive statistics for the DH population was conducted using SPSS 22.0 (SPSS, Chicago, IL, United States). R software was used to analyze the correlation between various traits. The best linear unbiased predictions (BLUPs) for each trait of 2 years were calculated using the R package Lme4 56 with the following model: y = Imer Trait ∼ 1|Genetype + (1|Year) . DNA extraction and genotype analysis.. Genomic DNA of 120 DH lines and their parents were extracted from young leaves by a modified cetyltrimethyl ammonium bromide (CTAB) method. DNA quality was determined by agarose gel electrophoresis (0.8%) and spectrophotometry (NanoDrop 2000). Genotyping was performed using an Axiom Maize55K biochip 21  www.nature.com/scientificreports/ The Axiom Maize55K biochip contains 55,229 single-nucleotide polymorphisms (SNPs). Based on the Affy Axiom Array 2.0 platform, the 120 DH lines and their parents were genotyped. Upon genotyping, the original data were filtered based on the following criteria: (1) minor allele frequency (MAF) > 0.05 and missing genotype rate < 0.1 (24,622 remained); (2) missing SNP loci for any or both of the parents (24,481 remained); (3) no polymorphisms at loci between parents (17,124 remained); and 4) heterozygous loci for any of the parents (12,861 remained). The PLINK program (version 1.9) 57 was obtained SNPs with MAF > 0.05 and missing genotype rate < 0.1 for association analyses.
Genetic map construction. The bin function in QTL IciMapping V4.1 58 software was used to delete redundant SNP markers, the recombination frequency between them will be estimated as 0, and the remaining markers were bin markers and used to construct a genetic linkage map. The threshold value of the logarithm of odds (LOD) was set as 3.0, and the Kosambi function 59 was used to start the program. Centi-Morgan (cM) was used to represent the intervals of markers on the map. QTL mapping and comparison with previous studies. All the above phenotypes, along with marker genotypic information and linkage maps, were used to identify QTLs using genome-wide composite interval mapping (GCIM) 19 , implemented by the software program QTL.gCIMapping.GUI 60 , where the threshold for significant QTL was set at LOD = 2.5 and the walking speed was 1 cM. Considering that all potential QTLs were selected in the first stage, we decided to place a slightly more stringent criterion of 0.000691, which is converted from LOD score 2.50 of the test statistics using P r χ 2 v > 2.50 × 4.605 = 0.000691 . The above-mentioned QTL nomenclature refers to the method in McCouch et al. 61 . The QTL nomenclature was designated as: qtl + trait abbreviation + chromosome number + QTL number. MapChart 2.3 software 62 was used to draw genetic linkage maps and label QTLs. When a QTL in the current study shared the same physical region as the previous QTL, it was regarded as a repeated identification of the previous QTL; otherwise, the current QTL was regarded as a new one.
Genome-wide association studies. All the above phenotypic and genotypic information in the above mapping population was used to detect QTNs using the mrMLM 20 , FASTmrEMMA 63 , FASTmrMLM 64 , pLARmEB 65 , pKWmEB 66 and ISIS EM-BLASSO 67 approaches, implemented by the software program mrMLM v4.0. The aboved six methods belonged to the "mrMLM" software package, which was developed by Professor Yuanming Zhang from College of Plant Science and Technology of Huazhong Agricutural University. The unified parameter settings for the six methods were as follows: (1) the Q + K model was used, in which the population structure value Q was calculated by Structure 2.3.4 software23 and the kinship value K was analysed by the "mrMLM" software package; and (2) the significant threshold FPR (the false positive rate) value was set as 0.0002 (LOD = 3.0), which was calculated as the ratio of the number of false positive effects to the total number of zero effects considered in the full model. In addition, while using mrMLM and FASTmrEMMA, the search radius of candidate genes was specified as 20 kb; using pLARmEB, 50 potential association loci were selected on each chromosome. The QTN nomenclature was designated as: qtn + trait abbreviation + chromosome number + QTN number.
Candidate genes identification. All QTLs and QTNs related to maize KWC and KDR detected by QTL gCIMapping software and the "mrMLM" software package were mapped to the maize reference genome B73 RefGen_V4, and candidate genes were identified in hotspot regions where QTL intervals overlapped QTNs. The resultant candidate genes were subjected to Gene Ontology (GO) enrichment analysis for selecting candidate genes related to maize KWC and KDR.