Identification of novel QTLs for grain fertility and associated traits to decipher poor grain filling of basal spikelets in dense panicle rice

High grain number is positively correlated with grain yield in rice, but it is compromised because of poor filling of basal spikelets in dense panicle bearing numerous spikelets. The phenomenon that turns the basal spikelets of compact panicle sterile in rice is largely unknown. In order to understand the factor(s) that possibly determines such spikelet sterility in compact panicle cultivars, QTLs and candidate genes were identified for spikelet fertility and associated traits like panicle compactness, and ethylene production that significantly influences the grain filling using recombinant inbred lines developed from a cross between indica rice cultivars, PDK Shriram (compact, high spikelet number) and Heera (lax, low spikelet number). Novel QTLs, qSFP1.1, qSFP3.1, and qSFP6.1 for spikelet fertility percentage; qIGS3.2 and qIGS4.1 for panicle compactness; and qETH1.2, qETH3.1, and qETH4.1 for ethylene production were consistently identified in both kharif seasons of 2017 and 2018. The comparative expression analysis of candidate genes like ERF3, AP2-like ethylene-responsive transcription factor, EREBP, GBSS1, E3 ubiquitin-protein ligase GW2, and LRR receptor-like serine/threonine-protein kinase ERL1 associated with identified QTLs revealed their role in poor grain filling of basal spikelets in a dense panicle. These candidate genes thus could be important for improving grain filling in compact-panicle rice cultivars through biotechnological interventions.


Results
Phenological variation in the parents and RIL mapping population. The total spikelet number per panicle was much higher in PDK Shriram (350-360) than Heera (100-110) ( Table 1, Fig. 1). The inter-grain space (IGS) was much less in PDK Shriram (Mean: 0.268 cm) compared with Heera (Mean: 0.661 cm). In contrast to SFP and IGS, the ethylene production was more in PDK Shriram in basal spikelets (Mean: 0.0319 pmo l g −1 frwt ml −1 air h −1) than in Heera (Mean: 0.0165 pmol g −1 frwt ml −1 air h −1 ) for 3 day after anthesis (Fig. 2). Furthermore, the production of ethylene was always more in the basal spikelets compared with the apical spikelets in all the days after fertilization in both the cultivars (Fig. 2). The phenological data for parents and two RIL lines 166A and 14A were shown in Table 1 Table S1). The IGS in the RIL population varied from 0.28 to 0.95 cm with an average of 0. 51 Table S1).
It was also observed the difference in the production of ethylene was higher between the basal spikelets of the two cultivars than between the apical spikelets with PDK Shriram showing more production of ethylene www.nature.com/scientificreports/ than Heera. In addition, the production of ethylene increased greatly in the basal spikelets in PDK Shriram with an increase in the days after anthesis from 0 to 3 days, but the increase was not much in Heera (Fig. 2). Supplementary Table S1 shows the values for skewness and kurtosis for SPF, IGS, and ethylene production for the kharif seasons of 2017 and 2018. The normal frequency distribution of RILs was observed in SPF and IGS Table 1. Morphological feature of panicles of parental cultivars and two RILs selected based on high spikelet number (compact) and low grain number (lax) panicle and their fertility percentage. Inter-grain space = Total length of primary branch / total number of spikelets per panicle. All values are mean of five replicates.  Fig. 1a, 1b). The ethylene production recorded for 3 DAA in the basal spikelets of RILs, showed highly positive skewness and kurtosis (Supplementary Table S1).

Correlation analysis between traits.
Inter-grain space (IGS) showed a significant positive correlation with spikelet fertility percentage (SFP) for the seasons' data, kharif 2017 (0.313) and kharif 2018 (0.325) at the level p ≤ 0.01 (Table 3). IGS, on the other hand, showed a significant negative correlation (− 0.128 in kharif 2017 and-0.191 in kharif 2018) with ethylene production (ETH) at an early stage after anthesis (3 DAA) in the RIL population at the level p ≤ 0.05. However, no significant correlation was observed between ETH and SFP in both the kharif seasons ( Table 2).
Identification of QTLs for spikelet fertility percentage, inter-grain space, and ethylene production. A total of 118 (9.83%) out of 1200 SSR markers and 22 (9.78%) out of 225 SNP markers (Supplementary Table S2, S3) were found polymorphic between parents, PDK Shriram and Heera. Seven markers that showed highly segregation distortion (P < 0.0001) were not used in linkage map construction. The remaining 133 polymorphic markers were used to construct a linkage map. A total of 17 linkage groups were constructed after screening with LOD threshold 3 using the IciMapping V4.1 software. The linkage map covered a total genetic distance of 1878.3 cM (Supplementary Table S5) with an average marker interval of 19.17 cM. One-dimensional scanning of the whole genome was carried out with mapping parameters of step size 1 cM with PIN (probability value for entering variables) value 0.001 and a threshold LOD score of 3 to identify significant QTLs associated with spikelet fertility percentage (SFP), inter grain space (IGS) and ethylene production (ETH) in kharif seasons of 2017 and 2018 (Fig. 3). A total of 14 QTLs was identified in kharif 2017, while 15 QTLs were identified in kharif 2018 on five different chromosomes of rice (  and lax-panicled cultivar Heera on different days after anthesis (0, 3 and 6). The data are the mean (± SD) of five observations. There was significant difference (p ≤ 0.05) for the ethylene production between apical and basal spikelets in HGN. A denotes apical spikelets; B denotes, basal spikelets. Analysis of candidate genes associated with identified QTLs. In silico search was done to identify genes present in the chromosomal regions associated with identified QTLs. NCBI gene bank database (https:// www. ncbi. nlm. nih. gov/) and rice genome annotation project (RAP-DB) were used to search genes in QTL LG-Linkage group.  3) were carried out at the exponential stage of grain filling at 6 days, 9 days, and 12 days after anthesis in both apical and basal spikelets of parents (PDK Shriram and Heera) and two RILs, one lax-panicle low grain (RIL-14A) and one compact-panicle high grain number (RIL-166A). One ethylene-responsive transcription factor 8 also named as ethylene-responsive factor 3 (ERF3) associated with qETH1.2 on chromosome 1 expression was higher in basal spikelets as compared to apical spikelets in compact panicle PDK Shriram and RIL-166A, while their expression was not significant between apical and basal spikelets in lax panicle Heera and RIL-14A ( Supplementary Fig. 2a)   www.nature.com/scientificreports/  www.nature.com/scientificreports/ with the basal spikelets in PDK Shriram and RIL-166A on all the days of post-anthesis ( Supplementary Fig. 2b).
In Heera and RIL-14A, the expression of the enzyme was more or less similar in both apical and basal spikelets ( Supplementary Fig. 2d). E3 ubiquitin-protein ligase (GW2)(LOC107276853), which was linked to the QTL, qSFP6.1, and qIGS6.3 in the marker region RM20500-RM20506 on chromosome 6 showed higher expression in the basal spikelets compared with the apical spikelets in the compact panicle cultivar, PDK Shriram and RIL-166A ( Supplementary Fig. 3a). However, the opposite was the case with regard to the expression of E3 ubiquitin-protein ligase in the lax panicle Heera and RIL-14A sample ( Supplementary Fig. 3a). The expression pattern of serine carboxypeptidase II-2 (LOC4340344), which was also identified on chromosome 6 in the marker region RM19480-RM19496 of QTL qIGS6.2 was similar to E3 ubiquitin-protein ligase. The expression of the protein was higher in the basal spikelets compared with the apical ones in PDK Shriram (HGN) and RIL-166A, while the protein had a lower expression in the basal spikelets compared with the apical ones in Heera and RIL-14A on all the days post-anthesis, except on 6 DAA ( Supplementary Fig. 3b). The expression pattern of the LRR receptor-like serine/ threonine-protein kinase ERL1 genes, which was also identified in the QTL region of qIGS6.2 and qETH6.1 on chromosome 6, was somewhat different from E3 ubiquitin-protein ligase and serine carboxypeptidase II-2. The expression of the protein was although mostly higher in the basal spikelets compared with the apical ones in general in PDK Shriram and RIL-166A, the expression was low in the basal spikelets than in the apical spikelets on 6 DAA in RIL-166A ( Supplementary Fig. 3c). The expression of the protein on the other hand was much lower in the basal spikelets than in the apical spikelets in Heera, particularly on 9 and 12 DAA (Supplementary Fig. 3C). The expression of LRR receptor-like serine/threonine-protein kinase ERL1, however, remained more or less similar in both apical and basal spikelets in RIL-14A ( Supplementary Fig. 3c), in contrast to the expression of E3 ubiquitin-protein ligase ( Supplementary Fig. 3a) and serine carboxypeptidase II-2 ( Supplementary Fig. 3b). Their expression was found upregulated in apical spikelets in Heera as compared to basal spikelets. However, in compact panicle cultivar PDK Shriram and RIL-166A, its expression was more in basal spikelets as compared to apical.

Discussion
The grain filling stage is crucial and very complex in the rice life cycle. Several factors regulate the grain filling process determining the final grain yield. The number of spikelets per panicle plays an important role in determining the grain yield of the crop. Hence, increasing the spikelet number per panicle is essential to enhance the yield potential of rice. However, breeding programs to increase the number of spikelets per panicle is accompanied by poor filling of the basal spikelets particularly in the secondary branch that limits the grain yield and quality, as is evident from the poor grain filling percentage of PDK Shriram bearing numerous spikelets compared with the high grain filling percentage of Heera bearing fewer spikelets (Table 1). Similar findings have been noted in other studies as well 2,5,6 . The major factor that has come to knowledge in the poor filling of grains on a panicle bearing numerous spikelets is a decrease in inter-grain space (IGS), which was also observed in the present study in PDK Shriram (Table 1), as well as reported by others 5,17,23 . In addition, higher production of ethylene in the basal spikelets of the compact panicle has also been reported to influence grain filling negatively 8,17,23,25 , similar to that found in PDK Shriram (Fig. 2). However, the mechanistic details of inhibition of the grain filling process in basal spikelets of compact-panicle bearing numerous spikelets are yet to be understood. Nevertheless, in the present study, the identification of QTLs controlling the traits and their associate genes may unmask the cause of inhibition of grain filling in high spikelet number compact panicle cultivars at the genetic level. The RIL population developed from PDK Shriram and Heera showed normal distribution for two phenotypic traits, SFP and IGS ( Supplementary Fig. 1a, 2b, Supplementary Table S1) in both kharif seasons of 2017 and 2018. No significant skewness or kurtosis was found for these traits, while ethylene (ETH) production showed skewed distribution 31 . No significant correlation was found between ETH and SFP (Table 2), indicating that ethylene production might not be tightly linked with SFP, in contrast to reports available that high ethylene production in the spikelets is the cause of poor grain filling in basal spikelets of the compact panicle 8,23 . Rather, SFP might be linked to factors other than ETH, like IGS. Support to no possible link of SFP with ETH comes from the fact that in the present case, a lesser SFP was also observed in the RIL population showing low ethylene production ( Supplementary Fig. 1). Nevertheless, the inhibitory role of ethylene may not be totally ruled out, as the ethylene-induced signalling component has been found to be inhibitory to granule bound starch synthase 23,24 . A significant positive correlation between SFP and IGS, is, however, certainly visible from the data of the RIL population (Table 2). It means inter-grain space (IGS) in panicle has an inhibitory role of grain filling that is lower the IGS, higher the poor grain filling particularly seen in basal spikelets. Wang et al. 32 reported a negative relationship of poor grain filling with IGS. The compact panicle cultivars having lower inter-grain space showed lower grain weight and width, and higher chalky grain percentage and amylose content among grains within a panicle than the lax panicle cultivars 33 .
In our study, 118 (9.83%) out of 1200 SSR markers, and 22 (9.78%) out of 225 SNP markers showed polymorphism between parents, PDK Shriram (HGN) and Heera. These polymorphic markers were used to genotype 188 RILs of the cross between PDK Shriram (HGN) and Heera. Among these polymorphic markers, seven SSR markers showed highly segregation distortion (P < 0.00001) (Supplementary Table S4). These seven SSR markers were not considered for linkage mapping and QTL analysis because segregation distortion (SD) significantly affects linkage map construction and identification of QTLs 34 . Segregation distortion can be caused by competition among gametes for preferential fertilization and by abortion of the male or female gametes or zygotes at some stage of the development cycle 34,35 . SD can also occur as a result of conscious or unconscious selection during the development of mapping populations (while forwarding the materials over generations or while sampling). SD in permanent mapping populations such as RILs derived following the single-seed-decent method is due to the cumulative effect of both genetic and environmental factors on multiple generations 36 www.nature.com/scientificreports/ the accuracy of linkage map construction by introducing errors in map distance estimation, and marker order and thus could affect mapping quantitative trait loci (QTLs) when many SDs are present 35,39 . A total of seven novel QTLs associated with ethylene production (qETH1.1, qETH1.2, qETH3.1, qETH4.1,  qETH4.2, qETH6.1, and qETH6.2) were identified. No QTL has been identified earlier for ethylene production in these QTL regions. A total of four QTLs, qSFP1.1, qSFP3.1, qSFP6.1, and qSFP8.1 for spikelet fertility percentage have been identified. Three QTLs, qSFP1.1, qSFP3.1, and qSFP6.1 associated with spikelet fertility percentage are found to be novel as no QTL has been previously identified in regions between RM10552 and HVSSR1-31 on chromosome 1, RM14906, and RM16 on chromosome 3, and between RM20500 and RM20506 on chromosome 6, respectively. A total of seven QTLs (qIGS1.1, qIGS3.1, qIGS3.2, qIGS4.1, qIGS4.2, qIGS6.1 and qIGS6.2) have been identified for inter-grain space. Further, qIGS3.2 and qIGS4.1 was novel as no QTL for panicle density or inter-grain space has been detected previously in the associated QTL region. Previously, QTL on chromosome 1 named sf1 associated with spikelet fertility in the region of 34,264,553 bp-36,658,883 bp and on chromosome 6, QTL for spikelet density named sd6 in the region of 6,023,974 bp-9,537,572 bp have been identified 40 . These QTLs are present in the marker region of detected QTLs qIGS6.1 and qIGS6.2. A total of four QTLs have been identified for spikelet fertility previously, qFER-6 in the region of 3,416,533 bp-3416728 41 , qSPTF6 in the region of 4,234,080 bp-5,096,867 bp 42 , S5 in the region of 6,283,432 bp-9284248 bp 43 , and spf6 in the region of 6,283,432 bp-7177183 bp 44 . The traits, SFP, IGS, and ETH may have controlled with expression of certain genes associated with identified QTLs. The important to note was, however, that QTLs, qSFP1.1, qIGS1.1 and qSFP6.1,  qIGS6.3 were identified in the same regions of chromosome 1 and chromosome 6, respectively (Table 3 and Fig. 4). Interestingly, these two traits also bear a significant correlation between them (Table 2), suggesting that the genes are associated with QTLs for SFP and IGS. Further, qETH6.2 shared the same region with qIGS6.2 on chromosome 6, suggesting that the inhibition in grain filling could be linked to the production of ethylene or inter-grain space at least some of the genes regulating IGS and ETH share a common region on the chromosome, and the genes associated with these QTLs may have pleiotropic effects on regulation of these traits.
The apical spikelets were fully filled with good quality in dense panicled rice while in lax panicled (low grain number per panicle) rice cultivar, both apical and basal spikelets comparatively were good filled 5 . Hence, the comparative expression studies of identified candidate genes in apical and basal spikelets in both dense and laxpanicled rice cultivar PDK Shriram with RIL-166A and Heera with RIL-14A were done to know how their expression affects the grain filling in basal spikelets of dense panicled rice cultivar. The gene, ERF3 (Os01g0797600), an ethylene-responsive downstream signaling component associated with QTL, qETH1.2 on chromosome 1 shows higher expression ( Supplementary Fig. 2a) in basal spikelets as compared to apical spikelets in compact panicle cultivar, while its expression was almost the same in apical and basal spikelets of lax panicle cultivar, Heera and RIL-14A, indicate that its expression was reciprocated the poor grain filling in basal spikelets in dense panicled rice cultivar. Similar results were also found for the expression of AP2-like ethylene-responsive transcription factor (Os06g0145700) (Supplementary Fig. 2b) and EREBP transcription factor (Os06g0194000) (Fig. 2c) associated with QTL, qETH6.1 on chromosome 6. The results indicated that due to the higher evolution of ethylene in basal spikelets of compact panicle cultivar, the perception of ethylene might be higher that leads to higher expression of downstream ethylene signaling component in basal spikelets 23 . Further, AP2 like ethylene-responsive element-binding protein family transcription factor, also known as rice starch regulator (RSR1) and it negatively regulates the expression of type I starch synthesis genes 24 . Hence, due to higher expression of AP2 like EREBP family transcription factor in basal spikelets of compact panicle cultivar, there might be inhibition of expression of starch synthesis related genes may lead to poor grain filling in inferior spikelets. Further, type 1 starch synthesis-related gene, granule bound starch synthase 1 (GBSS1) was also identified in the QTL region of qIGS6.1 and qIGS6.2. Their Spatio-temporal expression analysis of the enzyme reciprocated the grain filling pattern in the lax-as well as the compact-panicle, as the expression of the enzyme was more or less similar in both the apical and basal spikelets of the lax-panicle cultivar, Heera and RIL-14A, while the expression of the enzyme was downregulated in the basal spikelets compared with the apical ones in compact panicle cultivar, PDK Shriram and RIL-166A indicating the negative regulator of the expression of type I starch synthesis genes by AP2/EREBP family transcription factor. So, we propose a model for ethylene-mediated poor grain filling in basal spikelets of dense panicle rice cultivars (Fig. 5). Further, overexpression of AP2/ERF family transcription factor OsEATB (ERF protein associated with tillering and panicle branching was also reported to promote the branching and involve in reduction of panicle length through restricting internode elongation by down-regulating a GA biosynthetic gene ent-kaurene synthase A 45 that leads to lower inter-grain space and increases compactness having numerous spikelets per panicle. Harrop et al. 46 also reported that AP2/EREBPlike genes were involved in inflorescence branching and architecture in domesticated rice. E3 ubiquitin-protein ligase (GW2) has been reported to play a very important role in the regulation of weight of spikelets; the loss of function of GW2 increases grain filling rate and larger spikelets hull 47 . Thus, a greater expression of this gene in the basal spikelets in the compact-panicle cultivar, PDK Shriram and RIL-166A (Supplementary Fig. 3a) might be one of the possible causes of poor filling of grains in them compared with the apical spikelets. Song et al. 47 has also reported that GW2 negatively regulates cell division by targeting its substrate(s) to proteasomes for regulated proteolysis. Furthermore, Choi et al. 48 reported that GW2 negatively regulates the seed size by targeting EXPLA1 for degradation through its E3 ubiquitin ligase activity. In addition, GW2 regulates the seed size through direct interactions with proteins involved in carbohydrate metabolism by modulating their activity or stability and controlling disulfide bond formation in various proteins during seed development 49 . Since both the parent PDK Shriram and RIL-166A shows poor grain filling in the basal spikelets, and the same is accompanied by the compactness of their panicle, a low IGS in them could be a result of the pleiotropic effect of greater expression of E3 ubiquitin-protein ligase in the basal spikelets compared with apical, the possible cause of poor grain filling in the former than in the latter. The relationship between the spatial difference in expression of E3 ubiquitin-protein ligase and panicle compactness or panicle architecture, however, certainly needs further detailed investigation.  Fig. 3b). The role of serine carboxypeptidase II-2 has so far not been investigated. However, some other serine carboxypeptidase like GS5/OsSCP26 and SCP46 has been reported to be a positive regulator of grain width and weight 50,51 . Higher expression of LRR receptor-like serine/threonine-protein kinase in the basal spikelets compared to the apical ones in PDK Shriram and RIL-166A ( Supplementary Fig. 3c) could also be the reason for a low IGS and compactness of panicle in them in contrast to Heera and RIL-14A. The possible role of LRR receptor-like serine/threonine-protein kinase in regulating IGS in panicle is reflected from the fact that the enzyme is the largest receptor-like protein kinase having a diverse role in plant growth, development including organogenesis, morphogenesis, hormone signaling, and abiotic and biotic stress response in plants 52,53 . It has also been reported that LRR receptor-like serine/threonine-protein kinase ERL1 regulates inflorescence architecture and organ shape as well as stomatal patterning, including density and clustering, together with ER and ERL2 [54][55][56][57][58][59][60] . Thus, the QTLs, qIGS6.1, qIGS6.2, and qETH6.1 on chromosome 6 that harbour genes encoding proteins like LRR receptor-like serine/threonine-protein kinase remains to be answered how their higher expression in the basal spikelets in comparison with the apical spikelets would change the entire panicle architecture from lax to compact type.

Conclusions
Grain filling in rice is a complex process, particularly keeping in view the differential filling of the grains in the spikelets based on their spatial location in compact panicles. The differential filling of grains in the compact panicle in which the basal spikelets remain largely unfilled or poorly filled has, in fact, been a major obstacle in rice breeding programs intended to increase rice production. An attempt was made to understand the molecular basis of differential grain filling through the identification of QTLs for spikelet fertility and its associated traits like inter-grain space, and ethylene production in the mapping RIL population of Heera (lax panicle) and PDK Shriram (compact panicle). Novel QTLs, qSFP1.1, qSFP3.1, and qSFP6.1 for spikelet fertility qIGS3.2 and qIGS4.1 for inter-grain space and qETH1.2, qETH3.1, and qETH4.1 for ethylene production are the useful QTLs that may have role in regulation of grain filling of basal spikelets in dense panicle rice. Five genes associated with these traits like ERF3, AP2-like ethylene-responsive transcription factor (Os06g0145700) and EREBP transcription factor (Os06g0194000), granular bound starch synthase 1 and E3 ubiquitin-protein ligase stand distinct are involved in the regulation of grain filling in basal spikelets of dense panicles, while the role of serine carboxypeptidase II-2, LRR receptor-like serine/threonine-protein kinase in compactness of panicle and grain filling was not known. These genes thus could be important candidate genes in improving the grain filling in compact-panicle rice cultivars showing poor grain filling through biotechnological interventions. The QTLs harbouring these genes may be useful to transfer the grain filling trait into the rice cultivars of interest through the molecular breeding approaches. www.nature.com/scientificreports/ ber of spikelets (100-110) (Fig. 1). A mapping population consisting of 188 F 11 recombinant inbred lines was developed from the cross PDK Shriram and Heera by single-seed descent method.

Materials and methods
Phenotypic evaluation of RIL mapping population for spikelet fertility and panicle compactness. All  Estimation of ethylene production in the spikelets. Ethylene production in the apical and basal spikelets of the parents PDK Shriram and Heera was measured on the early days of anthesis, such as 0, 3, and 6 days after anthesis (DAA). The primary tillers were used to harvest for the collection of spikelets at various stages of anthesis. The differences in anthesis of spikelets on basal branches were 4-5 days after anthesis of apical spikelets and it was taken as 0 day for basal spikelets. The apical and basal spikelets sampled were moistened and immediately inserted separately in rubber stopper 10 ml tubes and sealed after 20 min. The sealed tubes with spikelets were incubated in darkness for 5 h. After incubation, the headspace gas (1 ml) was drawn with the help of airtight chromatography syringe and injected into the GC column of gas chromatography (Varian CP-3800) fitted with a flame ionization detector. The data were represented as the mean ± SD of five observations and expressed as pmol ethylene produced. h −1 g −1 frwt ml −1 headspace air 23 . Ethylene production in all 188 RILs was also measured for 3 days after the anthesis of basal spikelets to generate phenological data for ethylene production.
Genotyping of RIL mapping population. Genomic DNA isolation was carried out from the leaf tissues of the parents and all 188 RILs using the Cetyl trimethyl ammonium bromide (CTAB) method 61 . The quality of the DNA was checked by electrophoresis on 0.8% agarose gel and the quantity was determined using a Nano spectrophotometer. A set of 1200 random SSR markers and 225 SNP markers were used to identify polymorphic SSR and SNP (single nucleotide polymorphic) markers between parents, PDK Shriram and Heera. PCR was performed in a 20 μl reaction mixture using 5 pM (pico-mole) of each SSR marker forward and reverses primer, 200 μM of dNTP (dATP, dCTP, dTTP, dGTP), 1U of Taq DNA polymerase, 10 mM Tris-HCl (pH = 8.3), 50 mM KCl and 2 mM MgCl2 and 40 ng of genomic DNA. The PCR amplification was carried out in a thermal cycler (Applied Biosystems, USA) with the following parameters: initial denaturation at 94 °C for 3 min followed by 35 cycles of denaturation at 94 °C for 30 s, annealing at 55-65 °C for 1 min and extension at 72 °C for 1.5 min and then a final extension for 10 min at 72 °C. The amplified products were separated on 3-3.5% agarose gels using 1 × TBE buffer, stained with Ethidium bromide (0.5 μg/ml). The gels were visualized under UV radiation and were photographed using Gel-Doc System (Syngene, USA) to detect amplified fragments. All the microsatellite markers used in this study were available in the Gramene database (http:// www. grame ne. org). 118 polymorphic SSR markers (Supplementary Table S2) were used to genotype 188 RILs along with parents. 225 SNP sequences were downloaded from OryzaSNP@MSU databases (http:// rice. plant biolo gy. msu. edu) and the MassARRAY Assay Design Suite (ADS) software from Agena (www. Agena Cx. com) was used to design assay panels (nine assay panels) for the PCR and iPLEX extension primers for each SNP of interest. These nine assay design panels were used to survey parental polymorphism using the Agena MassArray Analyzer4 (MA4) genotyping system. Depending upon the complexity of the assay, a single multiplex reaction contained 20-33 SNP markers. Two PCR amplifications were carried out. 1 st PCR amplification was carried out in a 96-well microtiter plate using 40 ng DNA of each genotype and reagents supplied in the iPLEX reagent kits. PCR conditions like reaction volume or conditions of the PCR program were determined by the kit. After completion of 1st PCR, excess nucleotides were dephosphorylated by using 2 μl of shrimp alkaline phosphatase (SAP) treatment for about 40 min followed by inactivation of SAP enzyme for 5 min at 85 °C. Then 2nd PCR with the iPLEX single base extension reaction in which a mix of oligonucleotide extension primers designed to anneal to the amplified DNA fragments was added together with an extension enzyme and mass-modified dideoxynucleotide terminators. After completion of the 2nd PCR reaction, 30 μl of sterile molecular grade water was added to each well. Then the 96-well microtiter plate was loaded to Agena MassArray Analyzer4 system. The extension products (analytes) were desalted using 13 μl of Clean Resin by transferring to each well of microtiter plate and then 9 nanoliters of reaction PCR products from each well of microtiter plate were loaded onto a SpectroCHIP Array by automated nanodispenser, where they crystalized with a pre-spotted MALDI matrix. The SpectroCHIP Array was then loaded into the main chamber of the Agena MassARRAY Analyzer4 system, where very low pressure is maintained. The UV laser beam was then bombarded precisely on to the each sample (as analyte crystals) present in SpectroCHIP Array. The sample (analyte crystals) was ionized, and the positively charged molecules move from the bottom towards the top (detector) of the chamber. The separation of ionized molecules occurs during movement inside the chamber, which is proportional to the mass of the individual molecule. After each laser bombardment, the detector records the relative time of arrival of each ionized molecule. The lightest ionized molecule arrives first, www.nature.com/scientificreports/ while the heaviest ionized molecule arrives last at the detector. Typer software automatically generates reports in the form of peaks that identify the SNP alleles (one peak for homozygous while two peaks for heterozygous alleles) in each sample. Twenty-two polymorphic SNP markers (Supplementary Table S3) were identified and subjected to again design assay panel (one panel). SNP genotyping was performed in 188 RILs and parents using the Agena MassArray Analyzer4 genotyping system.

Construction of linkage map and identification of QTLs.
The genotype data on 188 RILs generated by 118 polymorphic SSR and 22 single nucleotide polymorphic (SNP) markers was used for the construction of linkage map using integrated QTL IciMapping, Version 4.1 software 62 . LOD score 3 was used to construct the linkage group. The Kosambi mapping function was used to convert the recombination frequency into genetic distances in centi-Morgan (cM) 63 . The data of linkage map construction was used for QTL identification using the same QTL IciMapping, Version 4.1 software. The inclusive composite interval mapping (ICIM) was used to identify QTLs associated with spikelet fertility percentage, inter-grain space (panicle compactness), and phytohormone ethylene production. The inclusive composite interval mapping (ICIM) software provided more efficient background control via a two-step mapping strategy as compared to composite interval mapping (CIM). This software avoids the possible increase of sampling variance and complicated background marker selection process in CIM 64 . The one-dimensional scanning of the whole genome was carried out with mapping parameters of 1 cM, the probability value for entering variables (PIN) of 0.001 with a threshold LOD score of 3 to identify significant QTLs 65 .
In silico analysis for identification of the candidate genes associated with QTLs. Expression studies of the candidate genes. Apical and basal spikelets were sampled on 6, 9, and 12 days after anthesis in both parents PDK Shriram and Heera, frozen into liquid nitrogen, and stored at -80 °C until use. Among RILs, one high spikelet number compact panicle line, 166A, and one lax panicle low spikelet number line, 14A were also considered for the sampling of the spikelets, similar to that of the parents. Total RNA was isolated from the collected samples using TRIZOL reagent (Invitrogen) following the reagent user manual. The quality and quantity of the RNA isolated from the individual samples were checked on Agarose gel and Nanodrop spectrophotometer, respectively. Quanti-Tect Reverse Transcription Kit (Qiagen) was used for conversion of total isolated RNA to cDNA following the protocol outlined in the kit's manual. The study of the expression of some genes associated with identified QTLs regions were conducted using the cDNAs prepared from the individual samples. Expression of a gene was studied by qRT-PCR taking the cDNA as template and SYBR green (Agilent). Primers specific to the gene of interest were designed using Primer Blast software at the NCBI site. The required amount of SYBR green, cDNA template, and the primers for a gene was mixed in a final volume of 20 µl, and PCR was run in Real-Time PCR (Bio-Rad CFX). Rice actin was used as an internal control. The relative level of templates of the individual gene in the apical and basal spikelets was quantified following Pfaffl 66 , and the result was expressed as a fold change in expression in the basal spikelets over apical ones. Seven candidate genes were selected from three QTL regions and used in the expression studies (Supplementary Table S6).