Mapping quantitative trait loci for yield-related traits and predicting candidate genes for grain weight in maize

Quantitative trait loci (QTLs) mapped in different genetic populations are of great significance for marker-assisted breeding. In this study, an F2:3 population were developed from the crossing of two maize inbred lines SG-5 and SG-7 and applied to QTL mapping for seven yield-related traits. The seven traits included 100-kernel weight, ear length, ear diameter, cob diameter, kernel row number, ear weight, and grain weight per plant. Based on an ultra-high density linkage map, a total of thirty-three QTLs were detected for the seven studied traits with composite interval mapping (CIM) method, and fifty-four QTLs were indentified with genome-wide composite interval mapping (GCIM) methods. For these QTLs, Fourteen were both detected by CIM and GCIM methods. Besides, eight of the thirty QTLs detected by CIM were identical to those previously mapped using a F2 population (generating from the same cross as the mapping population in this study), and fifteen were identical to the reported QTLs in other recent studies. For the fifty-four QTLs detected by GCIM, five of them were consistent with the QTLs mapped in the F2 population of SG-5 × SG-7, and twenty one had been reported in other recent studies. The stable QTLs associated with grain weight were located on maize chromosomes 2, 5, 7, and 9. In addition, differentially expressed genes (DEGs) between SG-5 and SG-7 were obtained from the transcriptomic profiling of grain at different developmental stages and overlaid onto the stable QTLs intervals to predict candidate genes for grain weight in maize. In the physical intervals of confirmed QTLs qKW-7, qEW-9, qEW-10, qGWP-6, qGWP-8, qGWP-10, qGWP-11 and qGWP-12, there were 213 DEGs in total. Finally, eight genes were predicted as candidate genes for grain size/weight. In summary, the stable QTLs would be reliable and the candidate genes predicted would be benefit for maker assisted breeding or cloning.

different populations. For example, the trichome density trait was confirmed with four recombinant inbred lines (RIL) populations of A. thaliana 13 , the stay green trait was confirmed with two RILs populations of sorghum 14 , and the kernel length trait of barley was confirmed by different RILs population 15 .
Grain weight is one of the most important yield-related trait in crops. It is of great significance to clone the genes controlling grain weight and then to clarify their molecular genetic mechanism. Great achievements have been made in genes/QTLs identification and dissection for grain size and grain weight in many crops, such as rice [16][17][18][19][20] , soybean 21,22 , wheat 23,24 . Especially many genes related to grain weight or grain size in rice, including GS3 25 , GS5 16 , qGL3 26 , GW2 18 , GW8 27 , GS2 28 , qGW7/GL7 29 , have been successfully cloned. The evidence showed that the grain size trait was regulated by multiple signaling pathways, and the main pathway included ubiquitin-proteasome degradation pathway 30 , phytohormone signaling pathway and G protein independent pathway. Earlier studies showed that crop's yield was greatly influenced by grain size and grain weight. Although great achievements on the genes controlling grain size and grain weight have been made in maize in recent years 5,[31][32][33] , it was relatively low compared to rice and Arabidopsis thaliana. It is necessary for maize yield-related traits to confirm stable QTLs, to decrease functional gene number in stable QTL intervals and to predict candidate genes. All these works would provide basis for cloning functional genes and marker-assisted breeding.
The purposes of this study were: (1) to identify QTLs for yield-related traits with an F 2:3 population from SG5 × SG7; (2) to compared these QTLs detected in this study with the QTLs identified in other populations, including an F 2 population from the same parents as the F 2:3 used in this study and recent studies; (3) RNA-seq technology was applied to identify transcriptional variations between maize inbred lines SG5 and SG7 subjected to grain weight; (4) to identify the DEGs related to grain weight between SG5 and SG7 and to predict candidate genes.

Results
phenotype evaluation of the mapping population. The phenotypic data of seven yield traits, i.e., ear length (EAL), ear diameter (EAD), cob diameter (CD), kernel row number (KRN), ear weight (EW), grain weight per ear (GWP) and 100-kernel weight (KW), were collected from the F 2:3 mapping population in 2016. The mean values of the seven traits were shown in Table 1. The phenotypic values of the two parents SG5 and SG7 were different in all the seven traits. The seven yield traits all displayed in bell-shaped normal distribution (Supplemental Fig. 1). Table 2 listed the pearson's correlation coefficients and the significance tests between every two of the seven observed traits. The closes relation occurs between GWP and EW (0.975).
QtL analysis using a high-density linkage map. CIM and GCIM procedures were applied to identify QTLs associated with the seven yield traits based on a high-density linkage map. The map constructed in previous study 34 , had 3305 bin-markers. Using CIM, a total of thirty-three QTLs were detected for the seven yield traits. These QTLs were distributed on all of the 10 maize chromosomes. Among them, four QTLs controlled KW trait and were located on maize chromosomes 3, 7, 8, and 9; five related to EAL trait were located on chromosomes 1, 2, 3, and 5; four associated with EAD trait were located on chromosomes 1, 4, and 10; four controlling CD trait were located on chromosomes 1, 3, and 6; five controlled KRN trait and were located on chromosomes 2, 3, 4, and  9; four were associated with EW trait and located on 2, 5, 7, and 9; and seven controlling GWP trait were located on chromosomes 2, 3, 5, 7, and 9. QTL mapping result from CIM procedure is shown in Fig. 1 and .6%, respectively. The logarithm of odds (LOD) scores ranged from 3.0 (qKW-12) to 7.4 (qEAD1-1). For GCIM, a total of fifty-four QTLs were detected. Among these QTLs, fourteen were also detected by CIM (Table 4), three KW QTLs were located on chromosomes 7, 8 and 9, two EAD QTLs on chromosomes 1, and 2, nine CD QTLs on chromosomes 1, 2, 4, 5, 6, and 7, thirty-four KRN QTLs on over all chromosomes except chromosomes 6 and 9, two EW QTLs on chromosomes 2 and 7, and four GWP QTLs on chromosomes 5, 7, and 9. Their LOD scores ranged from 3.1 (qKW-7) to . The GCIM approach seems more powerful in detecting small effects QTLs, especially for early generation population. The related information is summarized in Table 4 .

Confirmation of QTLs in different generation materials.
It is essential to confirm QTLs before used in marker assisted breeding 35 . We compared the QTLs mapped with the F 2:3 population with those detected using other populations, including an F 2 generating the same parents as the mapping population in this study 34 . For the 33 QTLs mapped by CIM in this study ( Table 3), eight of them located on chromosomes 1, 2, 3, and 7 were consistent with those detected in the F 2 population (Fig. 2). There were fifteen major QTLs overlapped with the those identified in previous studies [36][37][38] (Fig. 2), and they were located on chromosomes 1, 2, 3, 5, 7 and 9. For the fifty-four QTLs detected by GCIM, five of them(qKW-7, qEAD1-1, qCD-3, qKRN2-1, qKRN-1) were also mapped in the F 2 population from the same parents, and twenty one were identical to the reported QTLs in other studies ( Table 4). The QTL detected in more than one mapping populations is considered to be stable QTL.
To sum up, for all the QTLs detected by CIM and GCIM methods, thirty of them were stable QTLs, and fourteen and twenty-two were confirmed by CIM and GCIM, respectively. The detailed information about confirmed QTLs were listed in Tables 3 and 4. The confirmed QTLs for grain weight include one major QTL for KW on chromosome 7 (qKW-7, mapped by CIM and GCIM), two major QTLs for EW on chromosome 7 and 9 respectively (qEW-9, mapped by CIM and GCIM; and qEW-10, mapped by CIM only), two major QTLs for GWP on chromosome 2 and 7 respectively (qGWP-6, mapped by CIM only; qGWP-10, mapped by CIM and GCIM), two major QTLs for GWP on chromosome 5 (qGWP-8 and qGWP-13, detected by CIM and GCIM respectively), and three major QTLs for GWP on chromosome 9 (qGWP-11 and qGWP-12, mapped by CIM; and qGWP-14, mapped by GCIM).
comparison of QtL regions. In this study, a total of thirty-three QTLs and fifty-four QTLs were respectively mapped by using CIM and GCIM methods for the seven yield-related traits. However, more than half of these QTLs were not stable, implying that these traits were controlled by multiple minor genes. The closely www.nature.com/scientificreports www.nature.com/scientificreports/ connected QTLs might be one locus, such asqEAD1 and qEAD1-1 on chromosome 1, qCD-3 and qCD3-1 on chromosome 1, qKRN-2 and qKRN-2-1 on chromosome 2. In addition, some of the detected QTLs in this study tended to have pleiotropism phenomenon. For example, qEAD-1 and qCD-3 were located on the same position on chromosome 1, but they were related to EAD and CD respectively. qEW-8 and qGWP-9 were located on the same position on chromosome 5, but they were related to EW and GWP respectively. qEW-10 and qGWP-11 located on the same position on chromosome 9 were related to EW and GWP respectively. And qKW-7, qEW-9 and qGWP-10 were located on the same position on chromosome 7, but they were related to KW, EW and GWP respectively.
Identification of candidate DEG for grain weight. RNA-seq procedure was carried out on an Illumina NovaSeq instrument. Totally, 18 RNA samples from SG5 and SG7 were collected form grains at three developmental stages (three biological replicates per stage) and used for RNA-seq analysis. The Pearson correlation analysis for RNA-seq data between samples was showed in Supplemental Table 2 and Fig. 2. A total of 932,738,890 with average 51, 818,827 clean reads on each chromosome were obtained after we filtering and performing quality control against the raw reads. (Supplemental Table 1). The clean reads were mapped onto the maize B73 genome by TopHat v. 2.0.12 39 software. For each replicate, 40,277,620-62,763,408 reads were obtained. Of these reads, 81.97-86.23% were mapped to the reference genome as unique and multiple matches (Supplemental Table 1). Based on these mapped reads, DEGs, novel transcripts, alternative splicing and events were detected. The fragments per kilobase of exon per million fragments mapped (FPKM) model was used for calculating all genes expression values. The threshold of corrected P-value 0.05 and log 2 (Fold change) of 0.5 (absolute value) were set as the thresholds for pair-wise comparison to detect DEGs between SG5 and SG7. To analyze the functions of DEGs, Gene Ontology function enrichment was conducted by Blast swiss prot database (website). The top significant biological process (BP), cellular component (CC), and molecular function (MF) terms were shown in Supplemental Fig. 3. In the BP group, DEGs are enriched in carbohydrate metabolic and lipid metabolic processes. In the MF group, DEGs are enriched in catalytic activity, hydrolase activity, oxidoreductase activity and so on. In the CC group, DEGs enrichment do not reach up to a significant level.
To decrease the candidate DEG number, we focused on those overlapped within the physical intervals of those confirmed QTLs related to grain weight. These QTLs were located on chromosomes 1, 2, 5, 7, and 9, including qKW-7, qEW-9, qEW-10, qGWP-6, qGWP-8, qGWP-10, qGWP-11, qGWP-12, qGWP-13 and qGWP-14. There were more than 1000 protein-coding genes in the mapped physical intervals in total, but after removing those genes that were not DEGs between parents, only 213 genes left and applied for further comparative genomics analysis and candidate gene prediction (Supplemental Table 3). candidate gene prediction. In ubiquitin-proteasome degradation pathway, grain size/weight was regulated by the interactions among three kinds of protein enzymes, i.e., ubiquitin protein ligase E3, ubiquitin-binding enzyme E2 and ubiquitin-activating enzyme E1 30 .The functional mechanism of DA2 40 in Arabidopsis and GW2 18 in rice in regulating grain size were both involved in the ubiquitin-proteasome degradation pathways. Previous www.nature.com/scientificreports www.nature.com/scientificreports/ studies have proved that abscisic acid (ABA) plays functions during seed development, and ABA-deficient Arabidopsis mutants produce seeds with increased size and mass 41 . MADS-box transcription factor genes have also been proven to play critical role in regulating grain size/weight indirectly 42 . FEM111/AGL80 is such a kind of gene in Arabidopsis 43 . Among the candidate genes in the intervals of qGWP-6, there were two genes GRMZM2G097089 and GRMZM2G158191 encoding E3 ubiquitin protein ligase. At qGWP-8 interval, one gene GRMZM2G169994 encoding E3 ubiquitin protein ligase was located. At the physical interval of the three identical QTLs qKW-7, qEW-9 and qGWP-10, there were two genes, one was GRMZM2G113039 that encodes E3 ubiquitin protein ligase and the other was GRMZM2G134480 that encodes ubiquitin-activating enzyme E1. At the physical interval of the two consistent QTLs qEW-10 and qGWP-11, two genes GRMZM2G057959 and GRMZM2G128953 were located and encoded ABA receptor MADS-box transcription factor respectively. And at qGWP-12 interval, one gene GRMZM2G036697 that encodes probable E3 ubiquitin-protein ligase was located.
These eight genes were chosen as candidate genes for grain weight in maize for the future prospects (Table 5).

Discussion
High yield is a permanent objective for maize breeders. Studies on QTL mapping and gene dissection for yield-related traits have become a research focus in maize in recent years 37,[44][45][46] . It is of great significance for maize breeder to map QTLs, to predict candidate genes, to clone functionlal genes and to clarify gene's genetic mechanism for yield-related traits, especially for grain weight. Compared to rice and Arabidopsis thaliana, the studies on maize grain weight candidate genes were relatively slow. Stable QTLs are useful for marker-assisted breeding. QTLs mapped in one population might well be not detected in another population. Thus, it is critical important for QTLs to be confirmed to rule out statistical anomalies while used in marker assisted breeding 7 . To validate QTLs, the first scheme is to confirm them in other mapping populations, the second scheme is to confirm them in different generations from the same crossing, and the third is to confirm QTLs using the same population evaluated in multiple locations and in multiple years. Mapping QTLs in early generations is that the QTLs with beneficial effect for breeding lines in early generations can be transferred to the high generations directly. In addition, the QTLs detected in early generation are more than those in late generation, especially for those minor QTLs 47 . In this study, an F 2:3 population developed from the cross between SG-5 and SG-7 was used for mapping QTLs for seven yield-related traits. In terms of phenotypic segregation issue in F 2:3 , Zhang et al. (2004) proposed to combine the F 2 phenotypes with the F 2:3 average phenotypes to further increase the power of QTL mapping 48 .
In this study, the F 2:3 phenotypic value were the mean value of eight F 2:3 individuals of F 3 progeny derived from F 2 plant selfing. The confirmed QTLs for yield-related traits are considered as stable QTLs and could be applied in marker assisted breeding, gene cloning and function analyzing, etc. Compared to low-density linkage map, Hori et al. 49 indicated that the higher-density linkage maps were more beneficial for QTL mapping, that is, the markers tightly linked to target QTL are more effective in marker assisted breeding 49 . Furthermore, it is possible to separate two closely linked QTLs by a high-density map with high resolution 49 . In this study, a high -density linkage map developed from an earlier study result 34 was applied in QTL mapping. However, the confidence intervals of identified QTLs always were not narrow enough to predict candidate genes directly. Gene density is within a wide range of 0.5-10.7 genes per 100 kb in maize genome 50 . It is of great significance to study how to decrease the number of candidate genes located on confirmed QTL intervals. For the purpose, a common approach is to develop advanced generation population such as NIL(near isogenic line) population, thus to narrow down the QTL confidence interval significantly, even to clone the QTL or gene based on primary mapping results 51,52 . However, the construction of NILs is time -consuming and tedious, which limit the usage of large amount of objective QTLs in marker assisted breeding to a certain extent. In this study, the DEGs between parents were obtained from transcriptomic profiling of grains at different developmental stages and overlaid onto the confirmed QTL intervals to predict candidate genes for grain weight. Based on these DEGs, the candidate genes in the physical intervals of the ten confirmed QTLs (qKW-7, qEW-9, qEW-10,qGWP  were decreased from over 1000 to 213 (Supplemental Table 3). Comparative genomics analysis was carried out to further predict candidate genes. Finally, a total of eight genes that might be involved in ubiquitin-proteasome degradation 30 , phytohormone signaling and transcription factor pathways were chosen as candidate genes controlling grain size and grain weight in maize.
In this study, a total of ten QTLs associated with grain weight were confirmed as stable QTLs, and eight candidate genes were predicted. All these results would be basis for cloning related functional genes and marker-assisted breeding.

Methods
Development of the F 2:3 mapping population and field trails' investigation. The F 1 hybrid seeds were obtained from an intraspecific cross between two inbred lines SG5 and SG7 in Liupanshui, Guizhou in 2013 summer. There are significant differences in yield-related traits for the two inbred lines ( Table 1) www.nature.com/scientificreports www.nature.com/scientificreports/ High density linkage map, QtL analysis and validation. Methods of extracting genomic DNA, sequencing genotype, grouping sequence data, identifying single nucleotide polymorphisms (SNPs) and constructing high-density bin map were exhibited in our previous study 34 . QTL were detected by two methods: one is CIM method included in QTL Cartographer v. 2.5, and the other is genome-wide composite interval mapping (GCIM) method 53 . GCIM method was applied by using random model with maximum likelihood (ML) function. For CIM and GCIM mapping procedures, the walking speed was 1 cM and the logarithm of odds (LOD) threshold value was set up as 3.0. The position of a significant QTL was determined by its LOD peaks. The positive or negative additive effect of a QTL indicated that the increase or decrease in phenotypic value for a trait is provided by the alleles from SG5 or SG7. The graphic of QTLs positions on 10 maize chromosomes were drawn by MapChart 2.32 software 54 . The physical intervals between the QTLs identified in F 2:3 population were compared with the intervals of QTLs mapped using a F 2 population previously 34 , and Those QTLs both detected from F 2 and from F 2:3 populations show overlapped physical intervals will be regarded as confirmed QTLs.
RnA sample preparation. The two maize inbred lines SG5 and SG7 were grown at the Panxian Maize Breeding Station, in Hainan, China in November 2016. Ear shoot were covered before silking. Some parental plants were hand pollinated when the length of corn silk was about 5 cm. Kernels were sampled from SG5 and SG7 on the 5-, 10, and 15-the days after pollination. Each sample consisted of three biological replicates in parallel. All samples were collected and frozen immediately with liquid nitrogen and stored in refrigerator at −80 °C. The total RNA of kernels at different growing stages was extracted by using TRIzol reagent (Invitrogen). One percent of agarose gels were used for RNA degradation and contamination. The NanoPhotometer ® spectrophotometer (IMPLEN, CA, USA) was used for checking RNA purity. Qubit ® RNA Assay Kit in Qubit ® 2.0 Flurometer (Life Technologies, CA, USA) was used for measuring RNA concentration. The RNA Nano 6000 Assay Kit of the Bioanalyzer 2100 system (Agilent Technologies, CA, USA) was used for assessing RNA integrity. The Illumina NovaSeq platform was then applied to RNA-seq.
illumina sequencing and data analysis. A total of 18 samples with three repeats were collected and sequenced at the Illumina NovaSeq platform. Raw reads with fastq format were firstly handled by in-house perl scripts. Clean reads were then obtained after deleting reads containing adapter and ploy-N and removing reads of low quality in raw data. The GC content, Q20 and Q30 of the clean reads were calculated. High quality clean data were then carried out for further downstream analyzing. Maize reference genome and correlated files of gene annotation were downloaded directly from website (https://www.maizegdb.org/). Bowtie v. 2.2.3 and TopHat v2.0.12 39 were used for building reference genome index and aligning paired-end clean reads to the reference genome, respectively. The reads mapped to every gene were counted by HTSeq v. 0.6.1.
For each gene, the expected number of Fragments Per Kilobase of transcript sequence per Millions base pairs (FPKM) was calculated by analyzing the gene length and mapped reads. FPKM is a widely accepted methodfor evaluating gene expression levels based on sequencing depth effect and gene length of the read count simultaneously 55 . The DEGSeq R package (v. 1.20.0) was used for analyzing differential expression genes between two conditions. The P value was corrected by using the Benjamini & Hochberg method. The threshold of corrected P-value 0.05 and log 2 (Fold change) of 0.5 (absolute value) were considered as significantly differential expression. The GOseq R package was used for analyzing Gene Ontology (GO) enrichment of DEGs. The GO term with corrected P-value less than 0.05 was considered as DEG.
Screening of candidate DEGs associated with QTLs for grain weight. In this study, the obtained RNA-seq data were used to explore DEGs between parental lines SG5 and SG7. Pair-wise comparison of transcriptomes between SG5 and SG7 was conducted for detecting DEGs. The DEGs were overlaid onto the physical intervals of confirmed QTLs to predict candidate genes for grain weight in maize.  Table 5. Predicted candidate genes for grain weight. h Start physical position of the gene; m End physical position of the gene; n Gene lenth; p LogFC: Log 2 ratio, number of folds the gene is differentially expressed in RNAseq; q RCP1/RCP2: different of readcounts between P 1 and P 2 ; Positive sign of logFC indicates gene transcript expressed high in SG5 while negative sign indicates gene transcript expressed high in SG7.