Superior haplotypes towards development of low glycemic index rice with preferred grain and cooking quality

Increasing trends in the occurrence of diabetes underline the need to develop low glycemic index (GI) rice with preferred grain quality. In the current study, a diverse set of 3 K sub-panel of rice consisting of 150 accessions was evaluated for resistant starch and predicted glycemic index, including nine other quality traits under transplanted situation. Significant variations were noticed among the accessions for the traits evaluated. Trait associations had shown that amylose content possess significant positive and negative association with resistant starch and predicted glycemic index. Genome-wide association studies with 500 K SNPs based on MLM model resulted in a total of 41 marker-trait associations (MTAs), which were further confirmed and validated with mrMLM multi-locus model. We have also determined the allelic effect of identified MTAs for 11 targeted traits and found favorable SNPs for 8 traits. A total of 11 genes were selected for haplo-pheno analysis to identify the superior haplotypes for the target traits where haplotypes ranges from 2 (Os10g0469000-GC) to 15 (Os06g18720-AC). Superior haplotypes for RS and PGI, the candidate gene Os06g11100 (H4-3.28% for high RS) and Os08g12590 (H13-62.52 as intermediate PGI). The identified superior donors possessing superior haplotype combinations may be utilized in Haplotype-based breeding to developing next-generation tailor-made high quality healthier rice varieties suiting consumer preference and market demand.

is one of the crucial determinant factors for all the stakeholders including breeders, producers and consumers 2 . Currently, more efforts have been put forth to improve the rice quality traits by utilizing the key allelic variants from the germplasm variation to meet quality rice demand of consumers and potential markets. But, what are the key grain quality determinants? They are broadly classified in to, milling, appearance, cooking/eating parameters and nutritional quality. Among these, the later two are associated with the end-users. The physicochemical characters which determine the cooking and eating quality are amylose content (AC), gel consistency (GC) and gelatinization temperature (GT) or alkali spreading value (ASV). Generally, rice with intermediate amylose, GT and soft to medium gel consistency are preferred by consumers in South Asia 3 . Apart from quality, these days people have more concerns about their health, and prefer low glycemic index rice varieties which could be safe for diabetics and obesity. Hence, grain quality improvement of rice together with the lower glycemic index are of vital importance for the rice breeders in the current scenario of increasing diabetic population all over the world, especially in the Asia. Low GI diets effectively prevent type II diabetes 4,5 and consumption of high amylose class rice varieties with soft-textured on cooking might be an alternative for intermediate amylose rice varieties especially for those who are suffering from type II diabetes 6 . The most critical factor responsible for breeding low glycemic index (GI) rice is resistant starch (RS). It is the starch portion resistant to enzyme hydrolysis that escapes the small intestine and enters the large intestine where it gets fermented and slowly releases glucose to the bloodstream. Generally, it has been reported that increased level of RS content in rice grains lowers the GI value ensuring a negative correlation with amylose content 7 . Recently low-to-intermediate RS phenotypic variations in rice panel were identified, resulting novel RS associations to numerous genes associated with amylopectin biosynthesis and degradation 7 .
In any crop species, genetic diversity plays important role in breeding elite varieties. Identification of favorable alleles 8 and its superior haplotype of the various genes associated with traits linking to cooking and eating quality of rice are pre-requisite of breeding to develop healthier rice. Nowadays, with the improvement of highthroughput sequencing technologies with reduced cost makes genome wide association studies (GWAS) as on of the prominent techniques to identify marker-trait associations (MTAs). Research on the glycemic index identified a novel association of candidate loci Os05g03600 reporting intermediate to high GI variations 9 . Another hotspot on chromosome 6 was found to include GBSSI, hydrolase genes and genes involved in signaling and chromatin modification with differential methylation patterns in GI6.1 region. Alternative splicing of GBSSI promoter region resulted in intermediate to high GI variations. Novel SNP associations on chromosomes namley 2, 5, 6 and 11 has been reported and these SNPs influences the final viscosity variations but no significant association with GI 9 . Besides, the predicted glycemic index (PGI) is estimated by in-vitro enzymatic action of starch digestion leads to hydrolysis and the glycemic index (GI) estimated in-vivo, which requires human clinical evaluation of the two to three hours of blood glucose response after food intake. Thus, it is more time consuming and resource demanding. The in-vitro digestion methods have been developed to measure the starch hydrolysis index and it is used to calculate PGI using the formula developed 10 . Significant positive association (r = 0.946) between pGI and GI of rice samples by using bread as the reference 11 .
Rice grain quality is one of the major determinants in selecting parents for any breeding program and the genetic constitution of the genotypes determine the effect of prevailing environmental conditions on the developmental process involved in seed formation and maturation. The good quality grain is an important factor influencing its acceptance by consumers and thus, is one of the major traits in rice breeding to withstand marketability in trade and commerce. However, one of the significant challenges to date has been the lack of extensive knowledge of the genetic and molecular basis of several grain quality traits including GI. The availability of 3000 rice genomes (3 K-RGP) offers opportunities for harnessing haplotype diversity for GI along with other critical quality-related genes and enhances the possibility for the identification of ideal haplotypes for deployment in rice breeding. The current study was undertaken with the objective to identify the candidate genes and their superior haplotypes associated with RS, PGI and nine major grain quality traits across the 3 K-RGP subset under transplanted (TPR) situation ( Fig. 1) and explore the possibility of developing healthy rice varieties with preferred grain quality by assembling superior haplotypes via haplotype-based breeding.

Results
Trait variation for resistant starch, glycemic index and grain and cooking quality. The subset of 3 K RGP was extensively phenotyped to study the variations for RS, PGI along with nine other grain qualityrelated traits. The results revealed significant variations among the accessions (Table S1). The RS content ranged Correlation among grain quality traits. Association among the traits was observed. RS and PGI showed significant negative correlations. The important quality trait AC showed a significant positive (r = 0.19*, p-value-0.048) and negative correlation (r = − 0.16*, p-value-0.049) with RS and PGI, respectively whereas; it showed a negative significant relationship with ASV and GC. KL had a significant positive correlation with LBR and KLAC whereas it was negatively correlated with KB (Fig. 2).

Figure 1.
Overall methodology flow of the study to unravel the SNP associations, identifying favorable SNPs, locating the candidate genes and superior haplotype identification in transplanted rice (TPR) by haplotype based breeding to assemble superior haplotypes to design the rice varieties suitable for diabetics with improved grain quality parameters. www.nature.com/scientificreports/ Genome-wide association study (GWAS) of targeted traits. Single locus GWAS (MLM) was performed using the phenotypic data with approximately more than 0.5 million (500 K) SNPs of concerned accessions and the peak associations were predicted based on considering the value of PVE (phenotypic variance explained) with suggestive significant p-value < 0.00001 (− log 10 (p) > 5). In general, modification of bonferroni correction significant threshold value to control the false positive rate in a single-locus GWAS are so conservative that some associated SNPs may be eliminated. To avoid this problem multi-locus GWAS of mrMLM was also conducted to confirm the MTAs with the LOD score of greater than 3. A set of 41 causatives MTAs were detected for all the investigated quality traits in MLM (Table 1). In MLM, single MTA was identified for KL, KBAC, L/B ratio and LER, and up to nine MTAs for AC, RS and PGI. We identified nine MTAs associated with RS located on chromosomes 1, 2, 6, 8 and 11 with PVE ranging between 15.92 to 19.31% (Fig. 3). For PGI, nine MTAs were identified on chromosomes 1, 3, 7, 8, 9 and 11 with PVE ranging between 23.47 to 17.77% (Fig. 3). For AC, nine MTAs were detected on chromosomes 2, 6, 8, 10 and 11, respectively with the PVE ranging between 14.38 to 17.96% (Fig. S1). For KL, one MTA was identified on chromosome 11 with the PVE of 11.87% (Fig. S2 S3). LBR recorded an MTA on chromosome 5 with the PVE of 16.99% (Fig. S4). For ASV, an indicator for GT, one peak was detected on chromosome 6 with a PVE of 25.53% (Fig. S5). Three MTAs on chromosome 1, 5 and 12 were detected for KLAC with PVE of of 23.55, 24.17 and 18.05%, respectively (Fig. S6). For KBAC only one association was detected on chromosome 8 with the PVE of 18.35 (Fig. S7). For LER, one MTA was identified on chromosome 9 explaining PVE of 16.69%. Two MTAs for GC on chromosome 10 and 7 with the PVE of 18.39 and 18.11% were detected in TPR (Fig. S8).
To confirm the real associations, multi-locus GWAS (mrMLM) was also conducted with three models mrMLM 37 , FastmrMLM 38 and FASTmrEMMA 39 by adopting the critical threshold of significance for SNP-trait association was set at a LOD score of greater than 3. On comparing both MLM and mrMLM, we found that all the MTAs identified by MLM were also detected by any one method of mrMLM namely mrMLM, Fastm-rMLM and FASTmrEMMA with significant LOD score ( Table 1, Table S2a). In mrMLM GWAS, S6_6896749 and S11_28809145 exhibited with AC with a high PVE range of 11.52 to 21.83 and 9.26 to 11.50%, S5_5361877 for KB and LBR with PVE of 1.01 to 5.29% and 1.01 to 12.26%, S6_6711302 for ASV with the PVE of 10.55 to 27.19%, S12_17888167 for KLAC with 16.62-17.19% of PVE, S9_17873147 for LER with 1.00-16.07% PVE, S7_24356119 for GC with 37.55% PVE and S11_5737145 for PGI with PVE of 5.71 to 23.55%. As per the model of mrMLM GWAS, the common MTAs identified shown higher values of PVE in the range of 5.29% (S5_5361877

Detection of favorable SNP alleles associated with a trait of interest. Associated MTAs of RS,
PGI and nine quality traits were subjected for the identification of favorable SNP alleles (Table S2b). In this investigation, the positive effect of candidate SNP alleles that led to increase in AC, KL, L/B ratio, ASV, KLAC, LER, GC, and RS or decrease in KB, KBAC and PGI were defined as "favorable alleles", and those that resulted in decrease of AC, KL, L/B ratio, ASV, KLAC, LER, GC, and RS or an increase in KB, KBAC and PGI were defined as "unfavorable alleles". As a result, one MTA for each trait like AC, ASV, GC and RS had an increased phenoptypic effect whereas KB, KBAC and PGI revelaed decreased effect in their phenotype are designated as favorable alleles. S6_6896749 possessing 'G' allele had strong phenotypic effect (0.88%) on AC, S5_5361877 possessing ' A' allele with 0.18 of L/B ratio, S6_6711302 with 'T' allele had 1.96 phenotypic effects on ASV, S6_5819355 with ' A' allele had 1.74% increase in phenotypic effect in RS and S8_7447826 with ' A' allele had resulted in decrease of − 2.91 in PGI (Table S2b). Findings of this study indicated that the favorable SNP significantly increased or decreased the phenotypic effect of the trait in the genotypes they were present. Identification of superior haplotypes of these mined favorable alleles would be beneficial to develop elite entries with preferred grain quality and lower GI value.
Haplotype analysis of identified candidate loci. All the MTAs associated with the quality-related traits were subjected to the identification of candidate loci using RAP database ( Table S2b). Out of the 41 MTAs detected in this investigation, we found 27 MTAs were within the candidate loci and 14 MTAs were found to be flanked by two candidate loci. Of these, nine loci for PGI, nine loci for AC, nine loci for RS, four loci each for KB, KLAC, two loci for GC, one loci each for KL, LBR, ASV, LER and KBAC were identified in this study. The identified candidate loci were subjected to haplotype analysis by rice SNP seek database to estimate the number of haplotypes present in the sub-set of 3 K-RGP for all the investigated quality-related traits and to use superior haplotypes to breed quality rice varieties. Haplotype analysis reported a minimum of one haplotype to the maximum of 15 haplotypes for the identified loci. The loci Os08g0276000, Os02g0135800 and Os10g09430 Table 1. GWAS for grain quality traits, RS content and predicted glycemic index by single locus (MLM) and multi-locus model (mrMLM). AC-Amylose content, KL-Kernel length, KB-Kernel breadth, LBR-Length/ Breadth ratio, ASV-Alkali spreading value, KLAC-Kernel length after cooking, KBAC-Kernel breadth after cooking, LER-Linear elongation ratio, GC-Gel consistency, RS-Resistant starch, PGI-Predicted glycemic index, TPR-Transplanted rice and significant associations were detected with the P value < 0.00001 in MLM and LOD > 3 in mrMLM. Bolded MTAs shown strong association confirmed by MLM and mrMLM with high value of PVE.   (Table S4).

Discussion
Rice, one of the essential staple food crop species is consumed by half of the global population especially in Asian continents. The ever-increasing population of the world demands more than 50% additional rice production to be attained by 2030 to meet future needs 40 . Dissecting the complexity of grain quality is possible by means of genome wide association mapping using a diverse set of population with a higher precision approach to map the causative alleles than the bi-parental mapping population since GWAS population consist higher evolutionary recombination events in their genomes 41 . During the past three decades, tremendous efforts have been put in by rice breeders and geneticists to detect the causative QTLs/genes responsible for quality improvement in rice and many researchers came up with remarkable results. Grain quality includes cooking and eating quality of rice grains and these are complex traits governed by multiple genes with various gene expression and regulatory network pathways and several QTLs/genes associated with grain quality were identified and cloned in rice 42 . Starch biosynthesis in rice grains during grain formation and development is one of the critical factors in improving the quality aspects of rice varieties and regulation of this complex metabolic pathway controlled by a network of various genes and gene combinations are poorly understood. Waxy gene (GBSSI) plays a key role in amylose biosynthesis and the formation of long-chain amylopectin 43 . Another gene Starch synthase IIa (SSIIa) modifies amylopectin structure and improving grain quality 44,45 . Likewise, several other genes namely SSI, SSIIIa, isoa-mylase1 (ISA1), pullulanase (PUL), branching enzyme (BEI and BEIIb) [46][47][48][49][50] , have also been reported to control starch structure formation and modifies the physicochemical properties of rice grains. Therefore, amylopectin content and structure alteration have a significant impression on the modification of starch granules and affect cooking properties in rice.
In recent times, non-infectious diseases cause several health-related issues like cardiovascular problems, obesity and diabetics due to the high consumption of calories and improper physical exercises. Recent study reported that high amylose starch-rich RS decreases the glycemic value of starch, insulin response in human metabolism and reduce the risk of type II diabetes mellitus and other health-related issues 51 . Currently, breeders and geneticists are making efforts to develop high RS rice varieties. Cultivars rich in amylose content of rice, maize and barley developed by either mutational breeding or biotechnological approaches have been found to possess high RS content 52 www.nature.com/scientificreports/ In the present investigation, diverse panel of rice accessions (3 K-RGP sub-set) were not only phenotyped for grain quality-related traits but also phenotyped for the most important health/well-being related traits such as RS and PGI to map candidate SNPs, functional annotations and to unravel the haplotype diversity of identified candidate loci across the 3 K-RGP. Then, the haplo-pheno analysis was carried out to ascertain superior or appropriate haplotype for the improvement of grain quality in rice breeding programs.
RS, a valuable trait was found to have a significant positive correlation with amylose content and was negatively correlated with glycemic index as suggested by several previous reports and thus a suitable parameter for rice consumption to improve the health status of those who are suffering from diabetics, colon cancer, and obesity 7 . We identified nine strong MTAs for RS with significant PVE on chromosomes 1, 2, 6, 8 and 11 (Table 1). Primarily, previous reports found that sbe3-rs on chromosome 2 55,56 and SSIIIa (Os08g0191433) on chromosome 8 45 involved in the production of RS content and it may be associated with expression of Waxy gene as well. In our study, we report two MTAs in proximity to the candidate regions mentioned above of chromosome 2 (Os02g0594100) and chromosome 8 (Os08g0335500) with not much significant haplotype diversity but significant variations in the trait value was observed among the accessions 57 . We also identified a locus on chromosome 6 (Os06g11100) near (1 Mb away) to the position of starch synthase IIa (SSIIa: Os06g0229800, 6,748,398-6,753,302) gene, significantly affecting GC, degree of starch crystallinity by increasing the fractional Table 2. Haplo-pheno analysis of favourable candidate loci for grain quality traits, RS content and predicted glycemic index in rice. AC-Amylose content, KL-Kernel length, KB-Kernel breadth, LBR-Length/ Breadth ratio, ASV-Alkali spreading value, KLAC-Kernel length after cooking, KBAC-Kernel breadth after cooking, LER-Linear elongation ratio, GC-Gel consistency, RS-Resistant starch, PGI-Predicted glycemic index, TPR-Transplanted rice.  8,44,58 . All the identified MTAs and their corresponding candidate loci exhibit a significant number of variable haplotypes for the traits studied. In recent past, the discovery of mutation/ variants each in GBSSI, SSI, SSIIa, and SSIIIa with a potential to increase RS content and hydrolysis index in rice has been reported 59,60 . For PGI, we identified a total of 9 associations on chromosome 1 (Os01g0548000), 3 (Os03g38070), 7 (Os07g0675200), 8 (Os08g12580 and Os08g0520300), 9 (Os09g13060 and Os09g29090) and 11 (Os11g10500 and Os11g22020) ( Table 1). Similar kind of results were reported 9 and we also found significant MTAs influencing PGI in close proximity on chromosome 1 (snp_01_30302075 and snp_01_36980087), Chr 3 (snp_03_22422723), Chr 9 (snp_09_21456056 and snp_09_21523496) and Chr 11 (snp_11_334055 and snp_11_28758733). Similar is the case for GC; our results match with "qAC7" associated with RM8261 in the physical position Chr7: 25,866,581 61 . Here, we identified a candidate Os07g0597400 on chromosome 7 associated with GC. A QTL qRS7-2 on chromosome 7 associated with RS content between RM3404 (chr7:20,105,832) and RM478 (chr7:25,949,521) was identified and reported 36 .

Trait Gene ID (RAP) Allelic effect Gene function
Over several decades, quality traits like AC, GC, ASV, LER, etc., of rice grains have been extensively studied and it has been reported that AC is governed by waxy gene 62 , by several other loci 63 and even by unidentified non-waxy genes 64 . Our study reported nine associations for AC (Fig. S1) and the major SNP S6_6896749 located with the candidate loci of Os06g0232700 on chromosome 6 found adjacent to ALK gene (6,726,252) [65][66][67] . Several other loci were identified on Chromosome 2, Os02g0135800 encoding for WD domain, G-beta repeat domaincontaining protein, expressed and Os02g05180 (2,474,176) encrypting for retrotransposon protein, putative Ty3-gypsy subclass. Starch branching enzyme 3 (SBEIIb-Os02g32660) is essential for amylopectin synthesis in the endosperm 68,69 . Several reports found that high amylose rice and maize reveals in amylose extender (ae) phenotype due to the inactivation of SBEIIb 46 . These results clearly confirmed that a network of genes are involved in the biosynthesis of amylose. Our association analysis study provides a stronger picture of how genetic network of starch synthesizing genes involved in cooking and eating quality improvement in rice. We also identified an association on chromosome 11, S11_28809145 on Os11g0704000 with the function of selT/selW/selH selenoprotein domain-containing protein, expressed slightly closer to Os11g31330 70 .
Kernel length, kernel breadth, and LBR are the important traits to assess the grain size and dimensions for selecting appropriate breeding lines for their consumer's preference and market acceptability. Medium slender www.nature.com/scientificreports/ rice is more preferable than bold rice varieties. For kernel length, an MTA was identified on chromosome 11 (S11_28909306) with 9 haplotypes. For kernel breadth, one MTA was observed on chromosome 5, two on chromosome 3 and one on chromosome 2. LBR determines the grain size and is controlled by a complex network of genes and also affected by environmental factors. Our study reported similar findings of 13 , who reported quite a lot of QTLs for grain length, grain width and grain length-width ratio on chromosome 3, 5 and 7 by utilizing 3 k panel sub-set. To date, several hundred QTLs associated with grain size have been identified and functionally characterized viz., GS3, GW2, qSW5/GW5, GS5 and qGL3/qGL3.1 14,16,17,19,[71][72][73][74] . Among the above said QTLs GS3 and qSW5 showed consistent and sturdy effect on grain size variations reported in our investigation also 71,[75][76][77] . For instance, the locus Os05g0187500 identified having large haplotype diversity responsible for the putative function of IQ calmodulin-binding motif family protein exhibits peak association close to the SNP position of S5_5361877 reported for KB and LBR 27,31,41,78 . ASV is an indicator for gelatinization temperature, we identified a strong and peak association on chromosome 6 at the position of S6_6711302 (between Os06g0229000 and Os06g12380) with worthy haplotype diversity in the range of 4 to 15 haplotypes, respectively. The same QTL on chromosome 6 at the position of 6,726,252 associated with ALK gene was reported [65][66][67]78 . A study conducted with 258 accessions of 3 k panel also reported the same genic location (S6_6752888) for GT and a candidate of ALK gene encoding SSIIa affecting the chainlength distribution of amylopectin causing alkali disintegration of rice grains 13,65,79 . For gel consistency, we identified two candidates on chromosome 7 (Os07g0597400) and 10 (Os10g0469000) with minimum haplotype variations of 8 and 2. Previously, a qGC6 was reported between close proximity to WAXY gene 13 . More importantly, a new novel association for GC is identified on chromosome 7 had near vicinity and it was not previously reported. Linear elongation ratio is an important cooking quality trait and we identified an associations for LER on chromosomes 9. A QTL for LER was reported on chromosome 4 with the marker interval of C933 to C946 using RIL population of indica x indica hybrids 80 .
On a whole, amylose and amylopectin production is a complex phenomenon governed by several networks of genes and their coordinating expression during grain filling and seed maturation affecting the grain textural properties and even prevailing environmental conditions also. Noteworthy, it is not an easy breeding protocol to improve the quality aspects of rice varieties by selecting a notable gene that may be either superior or inferior for a specific trait and introgression by markers assisted pyramiding and selection. In this, we identified several superior haplotypes by studying haplotype diversity and linking haplotype variations with the trait value. Assembling superior haplotype combinations for most of the grain quality traits viz., AC, ASV, GC, LBR and RS coupled with GI in to one background to maximize the textural properties of grains, yield and most of morpho-agronomic traits to develop tailor-made new generation rice with enhanced genetic gain compared to available popular mega rice varieties 81,82 . For example, AC (S6_10612428) had maximum haplotype diversity with the superior haplotype of H3 (Os06g18710 -25.50%) possessing high amylose class, LBR (Os05g0187500) owns H9 (2.81) as superior one among 15 haplotypes, RS (Os06g11100) holds H4 (3.28%) as topmost one from all the 4 haplotypes.
In this investigation we also identified better donors for RS, PGI and other quality-related traits which can be utilized in the haplotype-based breeding program to develop elite lines with low glycemic index value with desirable quality traits by assembling superior haplotypes suited for different situations (Table S5). Our results were compared with previous reports of glycemic index, we found UQUIHUA::IRGC 117,037-1 had registered intermediate GI of 66.88 with the RS content of 3.23% whereas 9 reported as low GI (< 55). Apart from this we also selected different classes of glycemic index lines from the 3 K sub-panel grown in 2019WS and different quality-related traits were phenotyped and result showed that four lines namely BAIANG 6::IRGC 6129-1 from Indonesia, MAKRO::IRGC 74,763-1 from India, AUS 329::IRGC 29,116-1 from Bangladesh and KOTTEYARAN::IRGC 47,383-1 from Srilanka possesses medium slender grains with good elongation ratio and medium to soft gel consistency after cooking ( But when considering ASV for GT, the identified entries for the low glycemic index showed the score of 1 to 2 which represents high GT requires more water, cooking time and poor in texture not suitable for cooking and eating (Table S6). Amylose content alone does not describe the cooking and eating quality of rice grains, as varieties with the same range of AC possess variable variations in the cooking and eating quality 55 . In our study, intermediate ASV of 4 to 5 score possesses high PGI values. Superior haplotypes associated with the low glycemic index, intermediate ASV and soft GC in the preferred combinations can be utilized for the development of low glycemic rice varieties with desirable grain quality. Hence, the approach of haplotype-based breeding is anticipated to assist in the development of -premium quality rice varieties with low GI to meet the increasing demands of the rice consuming population.

Conclusion
Significant variations were observed for the grain quality traits, especially for AC, ASV, GC and RS. We captured several novel and significant associations of SNPs for the target traits and studied haplotype differences of identified candidate genes. Newly identified candidate genes might be useful for futher functional characterization and pathway elucidation for grain quality traits in rice. Here we proposed an exclusive strategy on use of superior haplotype-possessing elite donors and incorporating superior/appropriate haplotypes for the majority of quality traits in a genetic background as a prominent way to develop high yielding and quality rich rice varieties suitable for the consumption of diabetics, obese population and as preferred by consumer's needs and demands.  (Table S7). These accessions were evaluated for RS, PGI and 9 quality parameters including physico-chemical properties, cooking and eating properties. The analysis was carried out at the rice quality lab at IRRI SA Hub, ICRISAT campus. The agronomic management of the transplanted experiments including was carried as per the procedure detailed 83 .
Phenotyping of physicochemical parameters and cooking and eating quality of rice. Traits namely, KL, KB and LBR were measured with 10 kernels using Standard Evaluation System of Rice (SES) 84 and the mean was calculated. Amylose content of the grains was estimated by the rapid protocol of cut grain dip method 85 . For cooking-related parameters, 20 kernels from each entry were soaked in 5 ml of distilled water in 15 ml test tubes for 20 min and cooked in a boiling water bath at 100 °C intended for the time period of 8 min to determine the KLAC, KBAC and LER. Gel consistency was determined by the method formulated by 86 , where known amount of rice flour was placed in the culture tubes and wetted with 0.2 ml 95% ethanol containing 0.03% thymol blue and 2 ml 0.2 N KOH added and mixed with a Vortex Genie mixer set at specified speed. Tubes were covered with glass marbles and heated in a vigorously boiling water bath for 8 min. Then, the tubes were removed from the water bath and kept at room temperature for 5 min, cooled in an ice-water bath for 20 min, and laid flat on a laboratory Phenotyping for RS and predicted GI. Measurement of RS. RS content of the rice samples was determined by using a resistant starch assay kit (K-RSTAR, Megazyme, Irishtown, Ireland) by ten-fold downscaling of sample and reagents with slight modifications 88 . Fine powder of polished rice flour (10 mg) was taken in 2 ml eppendorf tube and it was digested with 400 μl of enzymatic mixture-containing 10 mg/ml of pancreatic α-amylase and 3.0U/ml of amyloglucosidase (AMG) and the mixture was incubated at 37 0 C for 16 h with continuous shaking (200 strokes/minute). After incubation, 400 μl of 99% ethanol was added to the mixture and vortexed to stop the reaction. Then, the samples were centrifuged for 10 min at 12,000 rpm and the supernatant was collected and the pellets were washed repeatedly with 200 μl (vortexed) and 600 μl of 50% ethanol. Washing with 50% ethanol was made twice to remove all the non-resistant starch in the samples and supernatants were pooled for the measurement of non-resistant starch. Then, the pellets were air-dried for 30 min to remove moisture content in the residue. Around, 200 μl of 2 M KOH was added to the residue and mixed properly to avoid the formation of clumps and incubated at 5 0 C for 1 h in a shaker (200 strokes/minute). Then, 800 μl of 1.2 M sodium acetate buffer and 10 μl of AMG (3,300 U/ml), vortexed and tubes were placed in a water bath for 30 min at 50 0 C with intermittent mixing with the help of vortex mixer for every 10 min. After incubation, samples were cooled and centrifuged at 12,000 rpm for 10 min. Then, 100 μl of sample aliquot was taken in a fresh test tube (15 ml), 3 ml of GOPOD reagent (D-Glucose assay kit, Megazyme, Irishtown, Ireland) and incubated in a water bath at 50 0 C for 30 min. The absorbance was determined with the help of UV-1800 (Shimadzu Corporation, Japan) at 510 nm against a blank containing buffer and GOPOD reagent.
Estimation of predicted GI (PGI). The glycemic index was determined by the protocol given by 10 with little modifications. Fine powder of polished rice flour (50 mg) was cooked in 5 ml of water for 30 min and 10 ml of HCl-KCl buffer (pH = 1.5) was added to the sample. Then, 0.2 ml of a solution containing 1 g of pepsin in 10 ml of HCl-KCl buffer was added to each sample and incubated at 40 °C for 1 h in a shaking water bath. After incubation, the volume was adjusted to 25 ml with Tris maleate buffer (pH = 6.9). Then, 5 ml of α-amylase (2.6 units) in Tris maleate buffer (pH 6.9) was added and samples were incubated at 37 °C in a shaking water bath. After incubation, 1 ml of the sample aliquot was collected serially at intervals of 30 min for 3hrs (30,60,90,120 and 180 min). The enzyme activity in the aliquot was inactivated by heating at 100 °C for 5 min and refrigerated until the end of the incubation period. 3 ml of 0.4 M Sodium acetate buffer (pH 4.75) and 60 µl amyloglucosidase was added to hydrolyze the digested starch to glucose. The samples were incubated at 60 °C for 45mins in a water bath and glucose content in each aliquot was estimated using the GOPOD kit (D-Glucose assay kit, Megazyme, Irishtown, Ireland). Glucose was converted into starch by multiplying with 0.9. Kinetics of starch digestion was estimated by non-linear first-order equation; C = Cα (1-e -kt ). Where, C = Concentration of starch hydrolyzed at the time (t) , Cα = equilibrium concentration (i.e., % of starch hydrolyzed after 180 min which is the glucose content after 120 min divided by the total starch) and k = kinetic constant. The area under the hydrolysis curve (AUC) was calculated using the equation; AUC = Cα (tf-t0) -Cα/k (1-e-k (tf -to) ). Where, C∞ corresponds to the concentration at equilibrium (t 180 ), t f is the final time (180 min), t 0 is the initial time (0 min) and k is the kinetic constant. A hydrolysis index (HI) was calculated by comparison with the AUC of a reference food (white bread). The predicted glycemic index (PGI) was estimated using the formula PGI = 39.7 + 0.548 (HI).

GWAS, Haplotype analysis and
Haplo-Pheno analysis. Mean data was used for GWAS analyses using approximately more than 500 K SNP data. The analysis was carried out by using GAPIT (MLM model) (https:// CRAN.R-proje ct. org/), R based approach considering both kinship values (k-values) and population structure (Q-matrix) 89 . Of the several methods suggested to correct false positive in association analysis even www.nature.com/scientificreports/ keeping stringent p-value benchmark, the most stringent correction method called "Bonferroni Correction" was used in the present analysis.The bonferroni threshold calculated by the formula 1/m where 'm' is the number of test performed which resulted in -log 10 (1/m) = 5.69. As it is too conservative and suggestive value of p = 0.00001 (− log 10 (p) > 5) 31,90 was used for the identification of peak associations (MTAs) with the targeted grain quality traits. Multi-locus GWAS (mrMLM) was also conducted to validate the MTAs identified by single locus model of MLM by adopting the significant LOD score of 3 91 . The phenotypic allele effect (ai) was determined by the formula given by 29 and the favorable alleles of each trait were subsequently identified according to the breeding objective. Strongly associated SNPs with the targeted traits were utilized to find the causative QTLs/genes by the RAP database. In-built tool of SNP seek database 89 was utilized to conduct haplotype analysis for entire candidate locus, by employing default parameters with Calinski criteria for k-group determination. Nipponbare was used as the reference genome. All of the 150 lines belonging to 8 sub-populations namely, aro, aus, admix, ind1A, ind1B, ind2, ind3, and indx was considered for the haplotype analysis. We utilized the '3kfiltered' SNP set present in the SNP seek database for the entire analysis. The filtered was obtained from the Base SNP set by applying the following filtering criteria: (1) alternative allele frequency at least 0.01, (2) proportion of missing calls per SNP at most 0.2 92 (http:// snp-seek. irri. org/_ downl oad. zul) and this SNP set was already available in the SNP seek database which was directly utilized in this study. Haplotype analysis for candidate locus have been carried out considering only the nonsynonymous SNPs and indels in the exon region that results in aminoacid change. The information regarding haplotype and their diversity was obtained from the SNP-seek database (http:// snp-seek. irri. org/_ downl oad. zul) to detect the superior haplotype by categorizing haplotypes by using phenotyping data of concerned genotype trait means for the associated genes. www.nature.com/scientificreports/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/.