Identification of rice landraces with promising yield and the associated genomic regions under low nitrogen

With low-input, sustainable rice cultivation a priority for environment-friendly agriculture, improving the nitrogen use efficiency (NUE) of rice has become the need of the hour. A set of 472 rice genotypes comprising landraces and breeding lines was evaluated for two seasons under field conditions with low and recommended nitrogen (N), and >100 landraces were identified with relatively higher yield under low N. Donors were identified for higher N uptake, N translocation into grains and grain yield under low N. Grains on secondary branches, N content in grain and yield appear to be the selection criteria under low N. Through association mapping using a minimum marker set of 50 rice SSR markers, 12 genomic regions were identified for yield and yield-associated traits under low N. Four associated genomic regions on chromosomes 5, 7 and 10 were fine mapped, and QTL for yield under low N were identified from the marker-delimited regions. Three candidate genes, viz., 2-oxoglutarate/malate translocator (Os05g0208000), alanine aminotransferase (Os07g0617800) and pyridoxal phosphate-dependent transferase (Os10g0189600), from the QTL regions showed enhanced expression in the genotypes with promising yield under low N. Marker-assisted selection using SSR markers associated with the three candidate genes identified two stable breeding lines, confirmed through multi-location evaluation.

Genome-wide Association Study (GWAS) has become a powerful tool in rice to identify genomic regions associated with traits of interest using single nucleotide polymorphisms (SNPs) [16][17][18].
Since NUE is ultimately calculated on a yield basis, information on the variability of yield and yield-related components in rice germplasm under low N is important for developing rice varieties with improved NUE. Thus, using rice landraces adapted to low inputs and the strategy of association mapping, an attempt was made in the present study to identify promising donors for yield, the associated traits/parameters and the genomic regions for yield under low N through i) evaluation of 472 genotypes comprising landraces and breeding lines under field conditions in wet (WS) and dry seasons (DS) with low and recommended N; ii) characterization of correlations among nine agro-morphological traits and parameters under low N; iii) assessment of panicle component traits contributing towards yield under low N; and iv) identification of genomic regions associated with yield under low N through association mapping with a minimum set of 50 polymorphic rice SSRs across the 12 chromosomes, followed by fine mapping of a few associated genomic regions as a proof of concept.

Results
Identification of promising donors for yield under low N. Wide variation, with a general trend of reduction, was observed for the nine agro-morphological traits and parameters in the 472 genotypes under low N compared to recommended N across the seasons under field evaluation (Table 1) (Supplementary Fig. S1). ANOVA showed significant differences among genotypes for all the parameters except panicles m−2. N treatment was found to be significant only for grain yield. In the present study, NUE was analysed using three indices, viz., physiological NUE (PNUE, kg kg−1) as total dry weight/total N uptake to study conversion efficiency, internal efficiency (IE, kg kg−1) as grain yield/total N uptake, and N harvest index (NHI, %) as N in grain/N in plant to study partitioning efficiency. For the three NUE indices, despite the wide range of values obtained, N treatment was not significant, whereas significant genotypic variation was observed (Table 1). The interaction between treatment and genotypes was significant for plant height, total N uptake and the NUE indices. Significant differences were also detected for all the interactions between genotypes and seasons and also among genotypes, treatments and seasons. The analysis of variability for seven additional traits, viz., SPAD during vegetative stage, number of tillers per hill, number of productive tillers per hill, straw weight (kg ha−1), total dry matter (kg ha−1), N in grain (kg ha−1) and N in straw (kg ha−1), under low N compared to recommended N is presented in Supplementary Table S1.
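The three NUE indices defined above can be illustrated with a minimal sketch. The function name and the example values are hypothetical, and total N uptake is assumed here to be the sum of N in grain and N in straw, which the text does not state explicitly.

```python
def nue_indices(grain_yield, total_dry_weight, n_grain, n_straw):
    """Return (PNUE, IE, NHI%) for one plot; all inputs in kg per ha.

    Assumption: total N uptake (N in plant) = N in grain + N in straw.
    """
    total_n_uptake = n_grain + n_straw
    if total_n_uptake <= 0:
        raise ValueError("total N uptake must be positive")
    pnue = total_dry_weight / total_n_uptake   # conversion efficiency, kg per kg
    ie = grain_yield / total_n_uptake          # internal efficiency, kg per kg
    nhi = 100.0 * n_grain / total_n_uptake     # partitioning efficiency, %
    return pnue, ie, nhi


# Illustrative (made-up) plot values, not data from the study.
pnue, ie, nhi = nue_indices(
    grain_yield=3200.0, total_dry_weight=7500.0, n_grain=35.0, n_straw=15.0
)
print(pnue, ie, nhi)
```

With these made-up inputs the plot takes up 50 kg N per ha, so each index is simply the corresponding numerator divided by 50 (and scaled to a percentage for NHI).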
Under low N, a minimum mean reduction of 17.8% in plant height, 16.1% in panicles m−2, 18.3% in grain yield and 33.8% in total N content was observed across the seasons. The trend of reduction was also seen for the additional traits such as tillers, productive tillers, straw weight, total dry matter and N in grain and straw (Supplementary Table S1). However, some landraces showed buffering capacity under low N, with a high number of panicles, total N uptake, total dry matter and yield, indicating efficient uptake of the available N and its utilization for yield. Analysis of the variation for the nine traits/parameters across the genotypes grouped as high yielders (>3000 kg ha−1 in WS; >2500 kg ha−1 in DS), moderate yielders (>1500 kg ha−1) and low yielders (<1500 kg ha−1) clearly showed significant differences in total N uptake between the high yielders and the other two groups. The internal efficiency (kg kg−1) and NHI (%) of the low yielders were found to be significantly different from those of the high and moderate yielders (Fig. 1). In both seasons, N in grain was found to differentiate the high, moderate and low yielders (Supplementary Fig. S2).
Thus, 103 donors with promising grain yield >3000 kg ha −1 -WS and 122 donors >2500 kg ha −1 -DS were identified under low N and 26 genotypes were found to be common across seasons as listed in Supplementary  Table S2. As our focus is yield along with NUE, 30 genotypes (WS) and 24 genotypes (DS) with high IE (>68 kg kg −1 ) were further selected among the genotypes with higher yield under low N.
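The grouping and donor selection described above can be sketched as follows. The thresholds are those given in the text; the function names and the sample records are hypothetical, and treating a yield of exactly 1500 kg ha−1 as "low" is an assumption, since the text leaves that boundary undefined.

```python
# Thresholds from the text (kg per ha for yield, kg per kg for IE).
HIGH_YIELD_CUTOFF = {"WS": 3000.0, "DS": 2500.0}
MODERATE_CUTOFF = 1500.0
IE_CUTOFF = 68.0


def yield_class(grain_yield, season):
    """Classify a genotype as a high, moderate or low yielder under low N."""
    if grain_yield > HIGH_YIELD_CUTOFF[season]:
        return "high"
    if grain_yield > MODERATE_CUTOFF:
        return "moderate"
    return "low"  # assumption: exactly 1500 kg per ha is grouped with low yielders


def is_nue_donor(grain_yield, internal_efficiency, season):
    """High yielder under low N that also has high internal efficiency."""
    return (
        yield_class(grain_yield, season) == "high"
        and internal_efficiency > IE_CUTOFF
    )


# Hypothetical records: (genotype, season, grain yield, IE).
records = [
    ("G1", "WS", 3400.0, 72.0),
    ("G2", "WS", 3100.0, 60.0),
    ("G3", "DS", 2600.0, 70.0),
    ("G4", "DS", 1400.0, 75.0),
]
donors = [name for name, season, gy, ie in records if is_nue_donor(gy, ie, season)]
print(donors)
```

Only genotypes that pass both the season-specific yield cutoff and the IE cutoff are retained, mirroring the two-step selection in the text.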
A subset of 206 genotypes from the set of 472 genotypes was investigated for the distribution of spikelets and grains across the panicle. Each panicle was divided into upper and lower portions, and the spikelet number and grain number were counted on primary and secondary branches. ANOVA showed significant differences for all the traits except grain filling % across the panicle (Table 2). For total number of spikelets and grains, significant reductions of 21.9% and 20.4% were observed under low N. A similar reduction was observed in the lower portion of the panicle, by 24.2% for number of spikelets and by 23.1% for number of grains under low N, indicating an adverse effect of limited N supply on the lower portion of the panicle. The spikelets and grains on secondary branches were observed to be reduced by 25.6% and 24.6% in comparison to the primary branches. The magnitude of reduction for spikelets and grains across the panicle is shown in Table 3. The highest reduction (28.8%) was found for spikelets on the secondary branches of the lower portion, followed by grains on the secondary branches of the lower portion (27.6%), under low N. In the present study, genotypes were identified with a maximum of ~135 spikelets and ~100 grains on secondary branches in the lower portion of the panicle, which can be deployed as donors for grain number under low N. Similarly, genotypes were also identified with a high number of spikelets and grains on primary branches of the lower portion and on primary and secondary branches of the upper portion of the panicle (Table 2).
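As a worked illustration of the panicle statistics reported above, the sketch below computes a percentage reduction under low N relative to recommended N and a grain filling percentage from filled and unfilled spikelet counts. The function names and numbers are hypothetical, and the reduction formula is the conventional one, assumed rather than taken from the text.

```python
def percent_reduction(recommended_n_value, low_n_value):
    """Reduction under low N expressed as a percentage of the recommended-N value."""
    if recommended_n_value <= 0:
        raise ValueError("recommended-N value must be positive")
    return 100.0 * (recommended_n_value - low_n_value) / recommended_n_value


def grain_filling_percent(filled_spikelets, unfilled_spikelets):
    """Filled spikelets as a percentage of all spikelets on the panicle part."""
    total = filled_spikelets + unfilled_spikelets
    if total <= 0:
        raise ValueError("no spikelets counted")
    return 100.0 * filled_spikelets / total


# Made-up counts for the secondary branches of one panicle portion.
reduction = percent_reduction(recommended_n_value=200.0, low_n_value=150.0)
filling = grain_filling_percent(filled_spikelets=90, unfilled_spikelets=30)
print(reduction, filling)
```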
Positive correlation of SPAD and panicles m−2 with high yield under low N. To understand the relationships among the traits/parameters and to explore the possibility of using the phenotype data as selection criteria for yield under low N, multiple correlations were performed across genotypes with high and low yields under low N. These yielded only the expected significant positive correlations, such as those of grain yield with total N and total dry matter across the seasons. The interesting observation was the wide variation between the correlation analyses of high and low yielders, especially for SPAD and panicles m−2, across the seasons (Fig. 2a-d).
Grouping of germplasm by population structure. A total of 177 alleles were obtained using 50 SSR markers among the 472 genotypes, with an average of 3.54 alleles per SSR locus, ranging from 2 to 6. The average gene diversity was 0.54, ranging from 0.34 to 0.72, with an average PIC value of 0.45, ranging from 0.29 to 0.67 (Supplementary Table S3). The model-based simulation of population structure showed the highest ΔK at K = 4 for the 472 rice genotypes (Supplementary Fig. S3). The population structure showed four subgroups, and the grouping was consistent with clustering based on genetic distance calculated by DARwin 5.0 (Supplementary Fig. S4). AMOVA estimates showed 9.66% of the total variation among the four subgroups and 90.34% within subgroups (Supplementary Table S3). The overall F ST value was 0.097 among subgroups, and the gene diversity among the subgroups ranged from 0.43 to 0.55 (Supplementary Table S3).

MLM considers both population structure and kinship for association mapping and is thus more suitable for the material of the present study than GLM, where only population structure is considered. Under low N in the wet season, seven marker-trait associations with five SSRs (RM495, RM22, RM55, RM169 and RM1381) were identified under field conditions. Interestingly, NHI% was associated with four markers (RM495, RM169, RM22 and RM1381). RM22 also showed association with two morphological traits, plant height and SPAD. HI was found to be associated with RM55. For the dry season, grain yield was associated with RM514 and RM507. Total N uptake and IE were found to be associated with RM1381 and RM22, while RM495 was associated with panicles m−2. For panicle traits, 21 trait associations with 10 SSRs were observed.
RM271 was found to be the most promising marker, with associations with grain filling (%) of secondary branches, spikelets on primary branches of the lower portion, grain filling (%) of the lower portion, grains on the lower portion and total grains. The length of the panicle was associated with RM507. Spikelets on primary branches of the upper portion were found to be associated with RM208 and those of the lower portion with RM271. Spikelets on secondary branches of the lower portion were associated with RM22 and RM210. The spikelets on branches of the lower portion were associated with RM169 and the total spikelets on primary branches with RM154. Grains on primary branches of the lower portion were associated with RM22 and those on secondary branches of the lower portion with RM154. The total grains on branches of the lower portion were found to be associated with RM495, on the lower portion with RM271 and on primary branches with RM514. The grain filling (%) of secondary branches on the upper portion was associated with RM154; on the lower portion with RM271 and RM169; of secondary branches with RM271; and the total grain filling (%) with RM507. The total grains were associated with RM271, RM455 and RM514. The alleles of RM22 and RM271 showed positive and negative allelic effects across subgroups (Table 4). The marker-trait associations for recommended N are listed in Supplementary Table S4.

The SSR markers identified from association mapping of the 472 genotypes were evaluated using an existing population of 218 recombinant inbred lines (RILs) developed between BPT5204 and Varadhan. BPT5204 is a check variety for yield and quality, and Varadhan was found to be promising for yield under low N in our earlier study 19; the characters of BPT5204 and Varadhan are given in Supplementary Table S5. Mapping in the RIL population showed four SSR markers to be associated under low N: RM507 and RM169 (with grain yield), RM495 (with panicles m−2) and RM271 (with total grains).

Moderate level of linkage disequilibrium (LD) and LD decay.
Novel genomic regions for yield under low N identified through association mapping. Out of 12 associated genomic regions under low N identified in this study, five genomic regions (38.5%) viz., RM495, RM154, RM507, RM1381 and RM271 are novel genomic regions for yield under low N. Seven genomic regions (RM208, RM22, RM55, RM514, RM169, RM455 and RM210) were co-localized with reported QTL for NUE and related traits (Supplementary Table S6).
In silico analysis of identified genomic regions from association mapping. From the analysis of a 5 Mb region spanning each of the 12 associated SSR markers in the rice genome sequence, 103 to 277 candidate genes, with an average of 166.5, were noted, and a minimum of two candidate genes directly related to N metabolism were observed in each region (Supplementary Table S7).
Local linkage map construction of four genomic regions based on map-based cloning approach and QTL analyses. Out of the 12 genomic regions identified to be associated with traits of interest under low N, four genomic regions, viz., 0 to 5 Mb (RM507) and 5.1 to 9.5 Mb (RM169) of chromosome 5, 24.7 to 29.5 Mb (RM455-RM118) of chromosome 7 and 1 to 6 Mb (RM271) of chromosome 10, were selected based on the number of associations in the present study and reported QTL. Following the map-based cloning strategy, SSR markers in these four regions were surveyed for polymorphism between the parents of the existing RIL mapping population, BPT5204 (a high-yielding variety popular for its quality) and Varadhan (a variety with high yield under low N) 19. Seven polymorphic SSRs out of 23 SSRs in the region of RM507, six out of 19 SSRs spanning RM169 of chromosome 5, eight out of 32 SSRs in the region of RM455 of chromosome 7 and seven out of 14 SSRs spanning RM271 were used for local linkage map construction of BPT5204/Varadhan. QTL analyses for yield under low N showed a major QTL, qGYLN7, explaining a large share of the phenotypic variation (PV) (~20%), located between RM1365 and RM8044 of chromosome 7 (RM455), and another QTL, qGYLN5-1 (~15% PV), associated with RM18076 and RM2998 of chromosome 5 (RM169). Two minor QTL (~10% PV), viz., qGYLN5-2 between RM6300 and RM413 of chromosome 5 (RM507) and qGYLN10 between RM5348 and RM6207 of chromosome 10 (RM271), were also identified (Fig. 3) (Supplementary Table S8).
Table 3. The magnitude of reduction of panicle components under low N in comparison to recommended N.

Expression analysis of candidate genes in the associated genomic regions with yield under low N. As per the map-based cloning strategy, putative candidate genes were listed using the rice genome sequence in the marker-delimited region of each identified QTL. For expression analysis, two novel genomic regions and two genomic regions associated with reported QTL were targeted. In qGYLN7, 25 putative genes were found in a 405.5 kb region. Spanning an 899.2 kb region, 34 putative genes were listed in qGYLN5-1. In qGYLN5-2, 92 putative genes were found in a 1.5 Mb region, and 52 putative genes were identified in the 1.5 Mb region of qGYLN10 (Supplementary Table S8). Only candidate genes associated with N metabolism in each genomic region were considered for expression analyses in the present study, viz., two in qGYLN7, three in qGYLN5-1, four in qGYLN5-2 and two in qGYLN10. After preliminary screening of the 11 selected candidate genes for differential expression among 12 genotypes with contrasting yield under low N, three genes showed significant differential expression. Further analyses of these three genes in 36 genotypes with higher yield and 12 genotypes with lower yield under low N showed significantly increased fold changes of expression for 2-oxoglutarate/malate translocator (Os05g0208000) in qGYLN5-1, alanine aminotransferase (Os07g0617800) in qGYLN7 and pyridoxal phosphate-dependent transferase (Os10g0189600) in qGYLN10. The fold changes of 12 genotypes with higher yield under low N are shown in Fig. 4a-c, and the fold changes of an additional set of 24 genotypes with higher yield are presented in Supplementary Table S9. For the remaining eight genes, clear differential expression between the genotypes with differential yield was not observed (Supplementary Table S10).
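The fold changes reported here are relative expression values. A minimal sketch of the standard 2^(−ΔΔCT) calculation is given below, assuming that the 18S rRNA gene is the reference, recommended N is the control condition and low N is the treatment, with technical duplicates averaged before use. The function name and the CT values are hypothetical.

```python
from statistics import mean


def fold_change(ct_target_low_n, ct_ref_low_n, ct_target_rec_n, ct_ref_rec_n):
    """Relative expression of a target gene under low N versus recommended N.

    Each argument is a list of technical-replicate CT values.
    """
    # Normalise the target gene to the reference gene within each condition.
    delta_ct_low = mean(ct_target_low_n) - mean(ct_ref_low_n)
    delta_ct_rec = mean(ct_target_rec_n) - mean(ct_ref_rec_n)
    # Compare the treated condition (low N) with the control (recommended N).
    delta_delta_ct = delta_ct_low - delta_ct_rec
    return 2.0 ** (-delta_delta_ct)


# Made-up CT values for one genotype, run in duplicate.
fc = fold_change(
    ct_target_low_n=[24.1, 23.9],
    ct_ref_low_n=[12.0, 12.0],
    ct_target_rec_n=[26.0, 26.0],
    ct_ref_rec_n=[12.0, 12.0],
)
print(fc)
```

A value above 1 indicates higher expression under low N than under recommended N for that genotype; the calculation assumes comparable amplification efficiency for the target and reference genes.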
Although a QTL was identified between RM6300 and RM413, differential expression was not observed for the selected candidate genes between the genotypes with relatively high yield and low yield under low N.

Table S11). On evaluation in three locations, these two lines showed relatively higher yield than the parents under 50 kg N ha−1 (Supplementary Table S12).

Discussion
Landraces of rice, the genotypes selected by farmers over time for their grain yield and for traits like quality and stress tolerance, are grown mostly under low or minimum inputs and are therefore likely to harbour traits for resource use efficiency, especially NUE. True to our premise, we could identify >100 genotypes with promising yield under low N and efficient utilization of absorbed N. The wide and variable response to low N in the present study confirms the reported variability in the N metabolism capabilities of individual genotypes. By using a large number of genotypes, we could pinpoint the traits and mechanisms critical for yield under low N, viz., grains on secondary branches and partitioning of N into the reproductive parts.
Variations between wet and dry seasons and their correlations can be attributed to difference of the light, sunshine hours, humidity, rainfall and temperature. The wet season appears to be more vulnerable for low N conditions for plant height, whereas, the tillers, productive tillers and panicle number were relatively reduced more in  20 . Thus the limited translocation of N from culms to leaves leads to the reduction of photosynthates, therefore decrease the translocation of nutrients to developing panicles, thereby reducing the total biomass and yield. Landraces have general tendency for vigorous vegetative growth, thus several genotypes in the present study were identified with high PNUE indicating their conversion efficiency of the absorbed N into total dry matter, mostly the straw. Genotypes were also identified with high internal use efficiency reflecting the efficiency of assimilated N for grain yield production through the differences in internal N requirements of the individual genotypes for expansion growth, mass accumulation and organ formation 21 . The variability of IE across the genotypes may also be due to the differential rate of radial and axial flow of total N, its distribution to sources and sinks (leaves or roots or panicles) at different stages, remobilization from older organs to active tissues, flag leaf N import/export, leaf senescence patterns, efficiency in converting CO 2 to carbohydrate and retention of N in the straw at maturity 22,23 .
While expected correlations were obtained with grain yield and total N uptake in the analyses of genotypes as high, moderate and poor yielders, an interesting observation was the high yielders across the season efficiently partition their N into grain yield, whereas the low yielders convert their N for increasing their biomass. In our study, we could identify several genotypes with high NHI% (>70) with higher conversion efficiency. NHI was reported to be around 60% and suggested to be genotype specific in the earlier studies 6,23 . Some land races with higher IE and NHI lodged because of their height and weak culm, thus those landraces were identified only as donors of high IE and NHI. Crosses are being made to get the favourable recombinants using the landraces with undesirable plant type, but positive alleles for NUE.
The distribution of spikelets and grains across the panicle showed the importance of the contribution of spikelets and grains on secondary branches towards the total grain number under low N. The maximum reduction in the number of spikelets and grains on secondary branches of the lower portion under low N suggests that assimilates were limiting for the production of spikelets and their filling on these branches. In fact, the yield increase in the high-yielding varieties of rice under the green revolution was credited to the increase in grain as well as spikelet number on secondary branches, which is dependent on N fertilization 24,25. Under resource-limited conditions, the plant tries to survive and propagate itself by producing the necessary grains; thus, grains were mostly observed on primary branches. Even the grain filling % was found to be better under low N, owing to the propensity of the panicle to fill the available spikelets under resource-limited conditions. Because of the wide genetic variability of the material in the present study, genotypes with 235 grains per panicle, including the maximum number of grains on the secondary branches of the lower portion (100), were identified under low N. From the screening of 472 genotypes for agro-morphological, yield and N-related traits and their correlations, we suggest grains on secondary branches, total N uptake and grain yield as the selection criteria for promising genotypes under low N.
Association mapping takes advantage of historic linkage disequilibrium (LD), i.e., the non-random association of alleles at different loci existing in the germplasm, to link genotypes to phenotypic characteristics. Thus, in the present study, LD was exploited to identify the genetic associations with yield and related components under low N using SSR markers [13][14][15][16][17][18]. Many association mapping studies in rice have used SSR markers to identify genomic regions associated with traits ranging from germination to HI, as well as with various abiotic and biotic stresses 13,14,[26][27][28][29]. So far, ~192 QTL through biparental mapping populations and eight genomic regions through association mapping have been identified for N-related traits under low N 15. The five novel genomic regions identified for yield under low N, despite the several regions already reported, suggest the potential of landraces as new genetic resources for identifying new genomic regions and genes for NUE in rice. An interesting observation for panicle traits was the association of different markers with spikelets and grains on primary and secondary branches of the lower and upper portions of the panicle and with their filling, suggesting the possibility of pyramiding the favourable alleles of these markers for yield across the panicle under low N. Another interesting observation is the involvement of a single marker with several panicle traits under low N, implying its utility as a marker and the scope for finding the candidate gene linked to it.
As followed in map based cloning/positional strategy, the differential expression was observed for 2-oxoglutarate/malate translocator (Os05g0208000), alanine aminotransferase (Os07g0617800) from the co-localized QTL and pyridoxal phosphate-dependent transferase (Os10g0189600) in the novel genomic region. The role of alanine aminotransferase in NUE of rice has already been demonstrated by genetic engineering of alanine aminotransferase with tissue specific expression 30 . The enhanced expression of 2-oxoglutarate/malate translocator gene and its association with yield under low N through the identified QTL is being reported for the first time in the present study. The role of 2-oxoglutarate in GS/GOGAT pathway and cell carbon/nitrogen status is known and 2-oxoglutarate is now being considered as master regulator metabolite 31,32 . Pyridoxal phosphate-dependent enzymes are reported to be primarily involved in the biosynthesis of amino acids and amino acid-derived metabolites 33 . The enhanced expression of the 2-oxoglutarate/malate translocator and pyridoxal phosphate-dependent transferase in genotypes with promising yield under low N need further validation. We have validated only the genes with known function related to nitrogen metabolism from the identified QTL in the present study and the role of all the other genes present in the identified genomic regions is being attempted through further studies. The differential expression of the three candidate genes in the high and poor yielders under low N indirectly confirms the genomic regions identified through association mapping for NUE in rice.
The outcome of present association mapping study was deployed for MAS using associated SSR marker alleles of alanine aminotransferase, 2-oxoglutarate/malate translocator and pyridoxal phosphate-dependent transferase among RIL. Two RILs of Varadhan/ BPT 5204 have shown relative higher yield confirming the efficacy of the identified genomic regions for yield in rice under 50 kg N ha −1 .
In conclusion, after evaluating a substantial number of genotypes under low N, we identified >100 rice landraces with relatively higher yield under low N as donors and characterized them for higher total N uptake, N translocation into grains and grain yield under low N. Grains on secondary branches, total N uptake and grain yield appear to be the selection criteria under low N. Through association mapping, we could show that a minimum set of SSR markers can identify the genomic regions associated with the traits of interest, and we fine mapped these regions using a map-based cloning strategy. As a proof of concept, two novel regions and two regions co-localized with reported QTL were selected, local linkage maps were constructed with polymorphic SSR markers, and QTL for yield under low N were identified in the existing mapping population. Taking the putative candidate genes from the marker-delimited regions, enhanced expression of three candidate genes was studied in the landraces with promising yield under low N for confirmation. Lines carrying the positive alleles of the identified genomic regions were selected using SSR markers in the existing cross between a popular variety and a check variety; these breeding lines were evaluated across three locations for their performance under low N, and two promising lines for yield under low N were identified. The study brings out the possibility of using landraces as a source for NUE by identifying the donors and native genes for yield under low N.

Table S14). The experiment was conducted in a split plot design, with the N treatments (without and with N application) as main plots and genotypes as subplots, in three replications, and the fertilizer applications followed earlier studies 4,19. Nitrogen fertilizer @ 100 kg ha−1 was supplied in the form of urea (46.5%) in three equal split applications to the recommended N treatment (at basal, maximum tillering and panicle initiation stages).
Phosphorus (@40 kg ha −1 ), potassium (@40 kg ha −1 ) and zinc (@25 kg ha −1 ) were applied to both plots.

Methods
One-month-old seedlings of 472 rice genotypes were transplanted at a spacing of 10 × 20 cm (Supplementary Table S13). From each line, five representative plants were harvested at maturity, divided into vegetative and reproductive parts, dried and weighed to determine the dry matter of the various plant parts. Grain and straw yield were adjusted to 14% grain moisture content and expressed in kg ha−1. Straw and grain samples were analyzed for N by the Kjeldahl method. A total of nine traits/parameters covering morphology, yield and nitrogen content were recorded under low and recommended N conditions in wet and dry seasons (Supplementary Table S15). The variation for the nine traits/parameters was analysed across the genotypes grouped as high yielders (>3000 kg ha−1 in WS; >2500 kg ha−1 in DS), moderate yielders (>1500 kg ha−1) and low yielders (<1500 kg ha−1). For a random subset of 206 genotypes of the first set of germplasm, three panicles from each plant were collected and characterized for panicle length (from base to tip) and for filled and unfilled spikelets on primary and secondary branches (Supplementary Table S13). The distal half of the panicle was taken as the upper portion and the proximal half as the lower portion. The grain filling percentage was calculated as the ratio of filled spikelets to the total number of spikelets (Supplementary Table S15). The second set of 218 RILs, along with the two parents, was grown during the wet season (Kharif) 2015 in low and recommended N plots, and data for grain yield were collected.

Data analysis. NUE indicators were calculated for low and recommended N individually, viz., Physiological Nitrogen Use Efficiency (PNUE) 25, Internal Efficiency 34 and Nitrogen Harvest Index (%) 23 (Supplementary Table S15). Two-way analysis of variance (ANOVA) and correlations were performed using the open-source software R (R Core Team, 2012) with the agricolae package. Multiple correlations were performed for the nine traits/parameters across the genotypes grouped as high yielders (>3000 kg ha−1 in WS; >2500 kg ha−1 in DS) and low yielders (<1500 kg ha−1).

Association mapping. Total genomic DNA from the 472 genotypes and the mapping population was isolated using a modified protocol 24. A total of 50 rice microsatellite markers based on simple sequence repeats (SSRs) across the 12 chromosomes were selected for association mapping analysis (Supplementary Table S8). All primer sequences of the microsatellites were obtained from http://www.gramene.org/. For amplification of the microsatellite markers, PCR was carried out in a Thermal Cycler (Bio-Rad C1000) using 40 ng template DNA, 0.125 mM of each dNTP, 2.5 pmol each of forward and reverse primers, 1 U Taq polymerase (Bangalore Genei, India) and 1× buffer (Bangalore Genei, India) in a total reaction volume of 10 µl. The cycling conditions included initial denaturation at 94 °C for 4 min, followed by 35 cycles of 30 sec at 94 °C, 30 sec at 55 °C and 1 min at 72 °C, and a final extension at 72 °C for 7 min. Amplified products were resolved in 8% polyacrylamide gel using mini vertical polyacrylamide gel electrophoresis (PAGE) (CBS Scientific Co. Inc., USA).
Scientific Reports | (2018) 8:9200 | DOI:10.1038/s41598-018-27484-0

Allelic diversity, population structure and association. The number of alleles, gene diversity (GD) and polymorphism information content (PIC) per locus were calculated with PowerMarker 3.25 35. The LD coefficients r2 and D′ for all pairs of markers were evaluated using the software TASSEL 2.1. The LD among SSR markers was determined as per the distances indicated in the Cornell SSR 2001 map (www.gramene.org). To assess the genetic structure of the 472 genotypes, both model-based and distance-based approaches were used. For the model-based approach, 20 independent simulations were run for each K (from 1 to 10) with a burn-in length of 100,000 and a model of admixture and correlated allele frequencies using the Structure 2.3.4 program 36. To determine the K value, the LnP(D) value in the Structure output and Evanno's ΔK between successive K values were used 37. Based on the highest ΔK, at K = 4 clusters of genotypes, the Q matrix was calculated. The hierarchical distribution of the molecular variance (AMOVA) within and between the subgroups defined by Structure and pairwise Wright's fixation index (F ST) values were assessed using Arlequin ver. 3.11 (cmpg.unibe.ch/software/arlequin3/). For the distance-based approach, an unweighted neighbor-joining (NJ) tree was constructed from a dissimilarity matrix based on a shared allele index with DARwin 5.0 (http://darwin.cirad.fr/darwin).
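For readers unfamiliar with the per-locus diversity statistics named above, the sketch below computes gene diversity and PIC from allele calls at one SSR locus using the textbook definitions (gene diversity as one minus the sum of squared allele frequencies, and the Botstein-type PIC). This is an illustrative assumption: the estimators implemented in PowerMarker may differ in detail, for example through sample-size corrections, and the function names and allele labels are hypothetical.

```python
from collections import Counter


def allele_frequencies(allele_calls):
    """Frequencies of the alleles observed at one locus (missing calls as None)."""
    counts = Counter(a for a in allele_calls if a is not None)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no allele calls at this locus")
    return [count / total for count in counts.values()]


def gene_diversity(freqs):
    """Textbook gene diversity: 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """Textbook PIC: 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    freqs = list(freqs)
    cross_term = 0.0
    for i in range(len(freqs)):
        for j in range(i + 1, len(freqs)):
            cross_term += 2.0 * freqs[i] ** 2 * freqs[j] ** 2
    return 1.0 - sum(p * p for p in freqs) - cross_term


# Hypothetical calls at one SSR locus with two equally frequent alleles.
calls = ["a160", "a160", "a172", "a172", None]
freqs = allele_frequencies(calls)
print(gene_diversity(freqs), pic(freqs))
```

Under these definitions PIC is never larger than gene diversity at the same locus, which matches the pattern of the averages reported in the Results (0.45 versus 0.54).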
Associations between SSR markers and phenotypic traits were analyzed with TASSEL 3.0. For each SSR locus, rare alleles in the population (defined as alleles with frequency <5%) were treated as null alleles. Two statistical models (GLM and MLM) were used to evaluate the effects of population structure (Q) and relative kinship (K) on marker-trait associations. A P value of 0.05 was used to identify statistically significant loci. A false discovery rate (FDR) of 0.05 was used as the threshold for significant associations as per the Benjamini and Hochberg (1995) correction method 38. The allelic effect of each locus (average allelic effect, AAE) associated with a given trait was estimated by comparing the mean phenotypic value for that trait for each allele with the mean phenotypic value of the null allele:

$$a_i = \frac{\sum_j x_{ij}}{n_i} - \frac{\sum_k N_k}{n_k}$$

where $a_i$ is the phenotypic effect of allele $i$; $x_{ij}$ is the phenotypic measurement value of variety $j$ carrying allele $i$; $n_i$ is the number of materials carrying allele $i$; $N_k$ is the phenotypic value of variety $k$ carrying the null allele; and $n_k$ is the number of materials carrying the null allele 38. The four subgroups were analyzed for marker-trait associations with phenotypic variance >1% and AAE >1% (positive or negative) to identify significantly associated genomic regions.
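The Benjamini and Hochberg step-up procedure used as the FDR threshold can be sketched as follows. This is a generic implementation of the published procedure, not the code used in the study, and the function name and P values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans marking the tests declared significant.

    Step-up rule: find the largest rank r (1-based, P values sorted ascending)
    with p_(r) <= alpha * r / m, then reject the r smallest P values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_passing_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            largest_passing_rank = rank
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= largest_passing_rank:
            significant[idx] = True
    return significant


# Made-up P values for four marker-trait tests.
flags = benjamini_hochberg([0.5, 0.03, 0.02, 0.024], alpha=0.05)
print(flags)
```

In this made-up example the smallest P value (0.02) exceeds its own rank threshold (0.0125) but is still declared significant, because a larger P value passes its threshold at a higher rank; this is the step-up behaviour that distinguishes the procedure from a simple per-test cutoff.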
In silico analysis of identified genomic regions from association mapping. A 5 Mb region spanning each of the 12 associated SSR markers was analyzed for reported putative candidate genes (http://www.ncbi.nlm.nih.gov/nuccore/JN193288.1) (Supplementary Table S7).
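A window search of this kind could be sketched as below. The text does not state how the 5 Mb region was positioned relative to each marker, so the sketch assumes a window centred on the marker. The function name, marker position, gene names and coordinates are all hypothetical.

```python
def genes_in_window(marker_pos_bp, genes, window_bp=5_000_000):
    """Return names of genes overlapping a window centred on the marker.

    genes is a list of (name, start_bp, end_bp) tuples on the marker's chromosome.
    Assumption: the window extends window_bp / 2 on either side of the marker.
    """
    half = window_bp // 2
    start = max(0, marker_pos_bp - half)
    end = marker_pos_bp + half
    # A gene is kept if any part of it overlaps [start, end].
    return [name for name, g_start, g_end in genes if g_start <= end and g_end >= start]


# Hypothetical annotations, for illustration only.
toy_genes = [
    ("gene_a", 7_600_000, 7_605_000),    # inside the window
    ("gene_b", 12_499_000, 12_502_000),  # straddles the window end
    ("gene_c", 13_000_000, 13_004_000),  # outside the window
]
hits = genes_in_window(10_000_000, toy_genes)
```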
Mapping. Of the 12 significantly associated genomic regions, four were selected for validation based on their association with the traits in the present study and on reported QTL (www.gramene.org). Around each associated genomic region, an interval of ~5 Mb was targeted for selection of SSR markers. A total of 88 SSR markers, comprising 23 SSRs for RM507 and 19 SSRs for RM169 on chromosome 5, 32 SSRs for RM455-RM118 on chromosome 7 and 14 SSRs for RM271 on chromosome 10, were surveyed for polymorphism between the parents, viz., BPT5204 and Varadhan (Supplementary Table S8). The polymorphic markers were screened in the RIL population for local linkage map construction. From the genotypic and phenotypic data of the RIL population, four local linkage maps were constructed using JoinMap 4.1, and QTL for grain yield were identified using MapQTL 6 [40].
Panicle samples from a first set of 12 genotypes along with the two parents of the mapping population, and from a second set of 48 genotypes, were collected at booting stage and immediately frozen in liquid nitrogen for RNA isolation. Total RNA was isolated using Trizol reagent (Invitrogen, USA) and the quality of the RNA was assessed using a Nanodrop® ND1000 spectrophotometer (Thermo Scientific, USA). RNA samples were treated with RNase-free DNase (Invitrogen, USA). Approximately 1 µg of total RNA from each sample was used as template for first-strand cDNA synthesis using Superscript III reverse transcriptase (Invitrogen, USA). qRT-PCR was performed on an Applied Biosystems 7500 Real Time PCR system (Life Technologies, USA) in a final volume of 20 μl containing 10 μl of Platinum® SYBR® Green qPCR Super Mix (Invitrogen, USA), 500 nM each of forward and reverse primers and 20 ng of cDNA. The real-time PCR cycling conditions comprised pre-incubation at 50 °C for 2 min and denaturation at 95 °C for 10 min, followed by 40 cycles of denaturation at 95 °C for 15 s and annealing and extension at 60 °C for 1 min. qRT-PCR was performed with three biological replicates. Samples were run in duplicate on the same plate, along with controls set up for each sample in duplicate, using the 18S rRNA gene for normalization of gene expression. The data were analyzed using the 7500 Sequence Detection Software (Applied Biosystems, USA) with default baseline and threshold. The relative expression levels of genes were calculated using the 2^−ΔΔCT method, which represents the difference of CT between the control products and the target gene products. The recommended N condition was taken as the control and low N was considered as the treatment. Initially, all the primers were checked for differential expression in 12 genotypes, and the three genes differentially expressed in the preliminary analyses were screened in 48 genotypes.
Development of BPT5204 with positive alleles from Varadhan. Based on the markers associated with three candidate genes, viz., RM18076 and RM2998 for 2-oxoglutarate/malate translocator, RM1365 and RM8044 for alanine aminotransferase and RM5348 and RM6207 for pyridoxal phosphate-dependent transferase, selections were made in the segregating population of BPT5204/Varadhan by applying MAS, and two stable lines (F6-7) were identified. Two of the promising lines (BPT5204/Varadhan/6 and BPT5204/Varadhan/10) were evaluated in a trial entitled "Evaluation of Radiation and Nitrogen use efficient promising rice genotypes" under the All India Coordinated Rice Improvement Program (AICRIP) at three locations across India with two N treatments (50 kg ha⁻¹ and 100 kg ha⁻¹) for two years [40][41].
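For the qRT-PCR analysis described earlier, the relative expression calculation can be sketched as follows. The sketch assumes the standard form of the 2^−ΔΔCT method: technical duplicates are averaged, ΔCT is the target CT minus the 18S reference CT, and ΔΔCT is the low N (treated) ΔCT minus the recommended N (control) ΔCT. The function names and CT values are hypothetical and are not data from the study.

```python
from statistics import mean


def delta_ct(target_cts, reference_cts):
    """Mean CT of the target gene minus mean CT of the reference gene."""
    return mean(target_cts) - mean(reference_cts)


def fold_change(treated_target, treated_ref, control_target, control_ref):
    """Relative expression of treated versus control as 2 ** (-ddCT)."""
    ddct = delta_ct(treated_target, treated_ref) - delta_ct(control_target, control_ref)
    return 2 ** (-ddct)


# Hypothetical duplicate CT values for one gene in one genotype.
low_n_target, low_n_18s = [21.5, 22.5], [12.0, 12.0]  # treated (low N)
rec_n_target, rec_n_18s = [23.5, 24.5], [11.5, 12.5]  # control (recommended N)

fc = fold_change(low_n_target, low_n_18s, rec_n_target, rec_n_18s)
```

A value of `fc` above 1 indicates higher expression under low N than under recommended N for that genotype.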