Genome-wide DNA polymorphisms in low Phosphate tolerant and sensitive rice genotypes

Soil Phosphorus (P) deficiency is one of the major challenges to rice crop world-wide. Modern rice genotypes are highly P-responsive and rely on high input of P fertilizers. However, low P tolerant traditional cultivars and landraces have genetic potential to sustain well under low P. Identification of high resolution DNA polymorphisms (SNPs and InDels) in such contrasting genotypes is largely missing for low P response at gene levels. Here, we report high quality DNA polymorphisms in low P sensitive genotype, PB1 and tolerant traditional genotype, Dular. We performed whole genome resequencing using Illumina NGS platform and identified a total of 5,157,939 sequence variants in PB1 and Dular with reference to Nipponbare genome. We have identified approximately 2.3 million and 2.9 million high quality polymorphisms in PB1 and Dular, respectively, with an average read depth of ≥24X. We further mapped several DNA polymorphisms (non-synonymous and regulatory variants) having potential functional significance to key Phosphate Starvation Responsive (PSR) and root architecture genes in Dular and Kasalath using a compiled list of low P responsive genes. These identified variants can serve as a useful source of genetic variability for improving low P tolerance and root architecture of high yielding modern genotypes.

Scientific RepoRts | 5:13090 | DOi: 10.1038/srep13090 few desirable traits, are highly P responsive but have low tolerance to P deficiency. Therefore, identifying the genetic basis of low P tolerance mechanisms in traditional genotypes can pave the way for improving the elite rice cultivars for low P tolerance.
Rich genetic diversity exists among the traditional and modern cultivars for important agronomical traits including low P tolerance [11][12][13] . However, comprehensive phenotypic and genotypic screening of such diverse collections for elucidating the mechanisms of low P tolerance is required. Few studies have led to the identification of major and minor QTLs for low P tolerance by conventional genetic mapping approaches based on RFLP, AFLP and SSR markers in population generated from low P tolerant and susceptible genotypes [14][15][16] . However, utilising conventional markers is a time consuming and laborious process. With the availability of high quality rice genome sequence and advent of next generation DNA sequencing (NGS) technologies, it has now become easier to explore high level of genetic variability for low P tolerance at genome-wide scale by resequencing diverse rice genotypes. Whole genome re-sequencing studies have also been utilized in genes/QTLs identification, genetic mapping, genome diversification, evolutionary and phylogenetic analysis [17][18][19] . Apart from bridging the gap of genotype to phenotype, NGS has immense potential to unravel the functional complexity of rice genome and can enhance the pace of molecular breeding program to improve the beneficial traits.
Genetic diversity for low P tolerance in rice genotypes can be therefore, investigated using NGS-based whole genome resequencing of genotypes with contrasting low P tolerance/sensitive traits. Therefore, in the present study, we have carried out the whole genome re-sequencing of low P tolerant traditional genotype "Dular" 13,20 and sensitive modern genotype "PB1" using Illumina HiSeq ® NGS platform followed by their comparison with another low P tolerant (Kasalath) and sensitive (Nipponbare) genotypes. Further, we have identified numerous SNP and InDel markers in these genotypes at a genome-wide scale and evaluated functional significance of these SNPs by correlating their presence (structural and functional annotations) in PSR (Phosphate Starvation Response) genes. The genome-wide high resolution SNP and InDel markers discovered from contrasting rice genotypes for low P tolerance would accelerate the identification and functional validation of novel PSR genes for genomics-assisted crop improvement.

Results and Discussion
Growth behaviour of PB1 and Dular under different P regimes. Dular has been proven to be one of the most tolerant genotype under low P in field conditions 13 . We found similar trend in our experiments under low P media in hydroponic system (Fig. 1a). While PB1 showed a drastic reduction in shoot and root biomass; Dular still maintains higher biomass accumulation than PB1 under low P (1 μ M) (Figure 1a-c). Our analysis further revealed that PB1 is highly P responsive as it fosters its growth under increasing concentrations of Pi in media (Fig. 1b,c). However, Dular being a traditional genotype is able to sustain its growth under low P environment. Root hairs (RH) are one of the most important root traits for P uptake 5 . Increase in RH length in Dular (lesser in PB1) also reflects its better adaptation resources under low P (Fig. 1d). Similarly, Dular showed better performance in terms of growth parameters like tiller number, leaf width, root number, shoot biomass and root biomass in pot system under low P (Fig.S1). Analysis of total P content revealed significantly higher P demand of PB1 under P sufficient conditions (100 and 320 μ M Pi) as compared to Dular (Fig. 1e). But under P deficient condition (1 μ M Pi) Dular has relatively higher P content both in root and shoot tissues. All this information collectively showed that Dular is a 'P-efficient' genotype 21 whereas PB1 is a 'P-responsive' genotype 22,23 . This contrasting behaviour of PB1 and Dular towards P homeostasis prompted us to investigate nucleotide polymorphism in these genotypes at genome levels.
Whole genome resequencing of rice genotypes. Whole genome resequencing of Dular and PB1 revealed the occurrence of DNA polymorphisms at a genome-wide scale and their probable effect on differential low P tolerance in these genotypes. We obtained ~173 and 224 million sequence reads (average 101 bp reads) each in PB1 and Dular, respectively, with ~85% high quality sequences (Q30 passed quality score) ( Table 1). Overall ~87% of the total pre-processed sequence reads from PB1 and Dular mapped to the reference Nipponbare genome with an average mapping quality (MAPQ) of 32.0 ( Figure S2a) and average insert size of ~275-280 bp ( Figure S2b). The average read depth and coverage obtained for each chromosome was 35.31X and 94.53% for PB1 and 46.14X and 94.04% for Dular, respectively (Table S1). Overall 58% of the total reads passed the alignment filter's criteria (see methods). Total filtered reads were uniformly distributed across all the 12 rice chromosomes (Fig. 2). Approximately, 82% (306.1 Mb) of the reference genome of Nipponbare was covered by filtered reads of both genotypes with an average depth of 24X and 30.6X in PB1 and Dular, respectively (Table S2). Such a high coverage and read depth indicates the high quality assembly and sequencing data in accordance with earlier reports 24 . Discovery and validation of genome-wide sequence variants. We identified a total of 5,157,939 variants in low P sensitive PB1 and tolerant Dular with reference to low P sensitive Nipponbare reference genome (Table S3). Further, approximately 2.3 million high quality variants were mined in PB1, while 2.9 million variants were discovered in traditional low P tolerant genotype Dular. Out of which, 2,526,661 and 345,504 variants were high quality SNPs and InDels, respectively (Table S3). In silico validation of identified polymorphisms was performed with earlier SNP data (dbSNPs), which revealed ~26 to 28% SNPs are common between previous and present study. The remaining ~72% and 74% SNPs along with Scientific RepoRts | 5:13090 | DOi: 10.1038/srep13090 96% InDels discovered in each of PB1 and Dular as compared to Nipponbare are novel. Further, we validated the identified variants by Sanger-based amplicon sequencing of 50 randomly selected SNPs and InDels from both genotypes (Table S4). All the variants were successfully validated giving 100% experimental validation success rate. Majority of the variants (~87% of SNPs and ~92% of InDels) identified were of homozygous type. Majority of InDels identified in both genotypes had single base insertion or deletion. However, we also detected 7,779 and 10,524 insertions and 15,282 and 20,591 deletions of up to ≥ 9 nucleotides in PB1 and Dular, respectively ( Figure S3b). Thus, aus genotype Dular showed higher DNA polymorphisms with reference to japonica genotype, Nipponbare as compared to indica genotype PB1. This is in concordance with previous study 18 where highest numbers of SNPs were detected in aus rice genotype due to its higher evolutionary divergence from japonica than indica rice 18 .
The percentage of homozygous and heterozygous variants was almost similar in both the genotypes. More than 96% of all the identified variants have quality score ≥ 100 (Table S5a). About 62% of the reads in Dular and 40% of the reads in PB1 had a read depth ≥ 30, which ensures high quality and reliable variant calling (Table S5b). Therefore, heterozygous SNPs (~13%) in both of the rice genotypes are valid and non-erroneous. Similar level of heterozygous SNPs has also been estimated in previous genome-wide SNP discovery studies in diverse rice genotypes 18,24 . The total number of transitional changes (Ts) observed were more than twice as compared to transversion changes (Tv) in both PB1 and Dular ( Figure S3a). A higher Ts/Tv ratio of 2.4 indicates 'transition bias' . Transitions are usually favoured over transversions because transitions provide easy tolerance from selection pressure as they result into synonymous substitutions, which do not alter the conformational structures of protein unlike transversions 25   transitions affect the RNA secondary structure less severely than transversions. Therefore, transition bias has been frequently reported in rice and many other plant species [26][27][28] . However, in our study, the detected Ts/Tv ratio is higher than the previously reported Ts/Tv ratios in rice 24,28 . A higher Ts/Tv ratio is also suggestive of low level of genetic divergence. These ratios are expected to decline with increasing genetic distance between the comparative genotypes as in due course of time; transversions erase the record of frequent transitions 29 . Collectively, this suggests that resequencing of low P tolerant (Dular) and sensitive (PB1 and Nipponbare) rice genotypes are reliable and robust for utilization in large-scale genotyping applications.
Genomic distribution of variants. Variant distribution was not uniform across the chromosomes (Table S6, Fig. 3). We detected an average of 578 SNPs and 74 InDels in PB1 and 677 SNPs and 92 InDels in Dular per 100 kb (Table S6). In PB1, highest SNP density (666 SNPs/100 kb) was observed in chromosome 11, whereas it was maximum in Dular chromosome 10 (780 SNPs/100 kb). We found lowest SNP density on chromosome 2 (403 SNPs/100 kb) of PB1 and on chromosome 4 (516 SNPs/100kb) of Dular. Interestingly, Dular had a higher density of SNPs than PB1 across all 12 rice chromosomes which could be due to its higher phylogenomic divergence from japonica rice 18 . Further, we identified the high and low resolution SNP regions with a stringent criterion of ≥1000 SNPs/100 kb and ≤ 20 SNPs/100 kb, respectively 24 . Based on this, 281 SNP high and 79 low resolution genomic regions were identified in PB1. Similarly, 514 SNP high and 27 low resolution genomic regions were detected in Dular. Highest numbers of high resolution genomic regions (59) were identified in chromosome 6 of Dular. However in PB1, chromosome 11 possessed a maximum number of SNP high resolution regions (47). Such variation in chromosomal distribution of polymorphisms has been attributed to the selective sweep-based natural selection 30 and has been frequently reported in rice 27,31 . Structural and functional annotation of variants. Of the total variants, ~18% of the SNPs and ~21% of the InDels fell in the genic regions. In terms of genotypes, Dular had more number of variants in genic region (Table 2). Approximately, 27% of the genic SNPs and 10% of the genic InDels lied in coding region in both genotypes. This frequency of variants in coding is higher than earlier reports 24,28,31 . A large number of polymorphisms were also detected in up and down stream regulatory regions ( Table 2). These polymorphisms in regulatory regions might be functionally relevant and can regulate the functions of associated genes 32 . It is noteworthy that many UTR-based regulatory polymorphisms have also been successfully employed to generate potential trait-associated markers in plants 33,34 . Interestingly, majority of the genic variants were intronic. Approximately, 52% of genic SNPs were identified in intronic regions. SNPs in intronic regions may reside at splice junctions 35 or in intron splice enhancer or silencer elements 36 . Thus, such intronic SNPs may potentially change the fate of splicing and can alter protein structure and function 37,38 . Intronic SNPs may also shift the expression dynamics of crucial genes or regulate the transcript level of these genes by changing the binding sites of miRNAs which sometime reside in the intronic regions 39 .
All coding variants were classified as silent, missense, nonsense, startloss, stoploss, inframe and frameshift polymorphisms to dissect their functional relevance (Table S7). The number of identified non-synonymous polymorphisms (~50%) was higher than the synonymous polymorphisms (44%) in both genotypes. Nonsynonymous or missense variants can lead to drastic phenotypic consequences which could be either deleterious or beneficial 40,41 . Deleterious polymorphisms are mostly eliminated from the population through negative selection whereas beneficial polymorphisms could be fixed in the population leading to differential responses of cultivars towards biotic and abiotic stresses 42 .

Structural and functional annotation of variants between PB1 and Dular.
To extract the potential polymorphisms responsible for low P tolerance in Dular, we identified a total of 2,442,979 polymorphisms (SNPs and InDels) directly between Dular and PB1 (Table S8). Among these, 958,296 variants (SNPs and InDels) were specifically present in PB1 with respect to Nipponbare. Remaining 1,484,683 variants were absent in low P sensitive genotype PB1. It is noteworthy that the reference genome, Nipponbare is also a low P sensitive genotype 13,16,43,44 . Of the total polymorphisms between Dular and PB1, 86% were SNPs and remaining ~14% were InDels.
Structural annotation of polymorphisms between Dular and PB1 revealed that approximately 85% and 15% of total were intergenic and genic, respectively (Fig. 4a,b). Further, among the genic polymorphisms, we detected 1,73,842 SNPs and 31,112 InDels in exonic regions. A relatively higher percentage of genic variants comprising 52% SNPs and 62% InDels were intronic, whereas 27% SNPs and 13% InDels were detected in coding regions. Many InDels were also identified in regulatory regions. Functional annotation of these variants revealed 42,124 missense and 35,738 silent variants (Fig. 4c).
We further mapped these variants to 23,218 rice genes and assigned them GO classes (Fig. 4d). Variants were falling in various functional classes, including cell wall organisation, primary and secondary metabolisms, signalling, stress pathways, regulatory pathway and other miscellaneous classes. Importantly, the energy metabolism pathways remain largely unaffected due to their vital roles for sustaining life. These genotypes differ in their response to environmental stresses as also reflected by high number of variants in "stress" genes. Whether these variants also determine their behaviour under low P would be a subject of future research. Variants in low P responsive genes. Mechanisms for low P tolerance are orchestrated by a set of PSR genes. We mapped all the variants between Dular and PB1 on these selected PSR (2152) genes (see methods). We found 1,731 PSR genes containing 18,872 SNPs and 4,271 InDels between Dular and PB1 (Table S9). To deduce more promising SNP/InDel variants potentially associated with low P tolerance, we compared all variants between Dular and PB1, underlying PSR genes with another low P tolerant aus genotype, Kasalath variants 45 . Interestingly, we found 30% variants identical in Dular and Kasalath (Table  S10). This could be due to the same 'aus' group of both Dular and Kasalath as well as low P tolerance of both genotypes. These common variants in Dular and Kasalath can be further validated in natural and mapping population to establish their potential association with low P tolerance. Moreover, to address the question of differential genotypic behaviour of low P sensitive and tolerant cultivars, we also identified variants of PSR genes between two low P sensitive genotypes (Nipponbare and PB1) and two low P tolerant genotypes (Dular and Kasalath). From this analysis, we found 930 PSR genes containing 5,948 SNPs and 923 InDels in sensitive vs tolerant group (Table S10). However, to discern the differences between contrasting genotypes for low P tolerance, we annotated all such PSR gene variants. We found 2,481 exonic SNPs underlying PSR genes (Fig. 5a). We detected 1211 and 1270 PSR SNPs in CDS and regulatory regions (UTR), respectively. Thus, variation in PSR gene polymorphisms between sensitive and tolerant genotypes further strengthens the role of these genes in their differential low P response.
In order to explore the potential use of these SNP variants (sensitive vs tolerant genotypes) in breeding programmes, we also identified 149 restriction sites disrupted by these SNPs in exonic regions of PSR genes (Table S10). These 149 SNPs falling in 116 PSR genes can be converted into allele-specific PCR based marker system for their validation and genotyping in a larger set of germplasm lines and bi-parental mapping population. We have successfully verified and converted five of these randomly selected SNP loci into CAPS (Cleavage Amplified Polymorphic Sequences) markers by use of a common set of restriction enzymes in Dular, PB1, Nipponbare and Kasalath genotypes (Fig. 6). Conversion of SNPs into CAPS markers can be an efficient and cost effective system for large scale genotyping of SNPs in natural and mapping population to expedite marker-assisted selection in developing high yielding low P tolerant cultivars 46 . The validation of SNPs with CAPS markers further suggests the reliability and robustness of our SNPs discovery through whole genome resequencing.

Functional relevance of sequence variants in PSR genes between tolerant and sensitive genotypes.
To further, dissect the significance of identified PSR variants between sensitive (PB1 and Nipponbare) vs tolerant (Dular and Kasalath) group, we assigned them functional classes. We identified 564 synonymous and 465 nonsynonymous substitutions in PSR genes containing variants. There were 259 PSR genes associated with nonsynonymous substitutions (Fig. 5b). Some genes like peptidyl-prolyl isomerase, protein kinases, zinc-finger family protein, glucan endo-1,3-beta-glucosidase possessed 8 to 16 missense variants (Table S10)  in our analysis of missense variants. About 34 kinases contained nonsynonymous substitutions between sensitive vs tolerant group. Besides this, we detected 25 large effect variants in 21 PSR genes. These large effect variants include loss of stop codon, introduction of premature stop codon, loss of start codon and frameshifts in coding frame leading to generation of non-functional protein products 18,47 . We further explored the identified PSR gene variants for their presence in SNP high resolution genomic region. Our analysis revealed about 28 PSR genes (Table S11) containing SNPs encompassed SNP high resolution genomic regions. These genes include ABC transporter, Ser/Thr phosphatase, glycosyl hydrolase, WD domain containing protein and phosphatidylinositol 3-kinase. Interestingly, these families of genes are significantly known to be involved in PSR.

Variants in phosphatases and genes involved in lipid metabolism. Phosphatases play vital roles
in signaling and internal P utilization under P deficiency. We found 7 missense SNPs between sensitive vs tolerant genotypes in 5 different phosphatases ( Table 3, Table S10). It is noteworthy that protein phosphatase 2C (Os02g0471500) contained 3 missense SNPs and one InDel. Besides phosphatases, variants were also identified in lipid metabolism genes. We found a missense SNP in glycerophosphoryl diester phosphodiesterase gene (Os08g0535700). This PSR gene is involved in release of sn-glycerol 3-phosphate under P starvation. SNP in CDS region of this biomarker gene can change the fate of this GDPD under P deficiency. Moreover, a MGD2 (monogalacatosyl diacyl glycerol synthase) gene showed one InDel. This PSR gene is involved in galactolipids synthesis under P deficiency [48][49][50] . Further investigation of these genes can shed light on differential role of this variant in galactolipids accumulation under P deficiency. Variants in genes involved in carbohydrate metabolism. Carbohydrate metabolism genes alter the rate of P consumption under its deficiency through switching the alternate glycolytic pathway which utilises lesser amount of P 51 . We found five missense SNPs in some key carbohydrate metabolic genes. Interestingly, we identified SNPs in pyrophosphate-fructose 6-phosphate 1-phosphotransferase gene (Os02g0714200), which use a diphosphate and convert this diphosphate to monophosphate. Moreover, we detected higher number of variants associated with another carbohydrate metabolic PSR gene, phosphoenolpyruvate carboxylase (Os01g0208700) which showed 33 intronic SNPs. This gene plays crucial role in releasing phosphate during carbon fixation.

Variants in genes encoding membrane transporters.
Here, we identified 20 missense SNPs between low P sensitive and tolerant genotypes in various membrane transporters. It is noteworthy that only low P sensitive genotype PB1 showed accumulation of 28 SNPs in 5′ UTR and one missense SNP in coding region of inorganic P transporter gene (Os04g0186400) with respect to all other 3 genotypes (Table S9). Recently same gene (OsPht1;4) has been found to be highly expressing in root tissues and essential for phosphate acquisition under P deficiency 52 . Since PB1 is a high yielding variety which being P responsive demands higher accumulation of P as compared to low P tolerant traditional genotypes Dular and Kasalath, investigating this transporter in diverse genotypes would help understand the role of these variants affecting its transcriptional regulation under low P.
Variants in genes involved in RSA. RSA modulation is a key strategy to alleviate the P deficiency by increasing top soil foraging 5 . Traditional and modern genotypes differ significantly in their RSA under P deficiency. Higher density of root hairs, increased lateral root length and density contribute significantly to improve the P deficiency by enhancing P uptake from soil solution to plant cells 53 . We identified 7 missense variants in PSR genes involved in RSA modulation like lateral root development (Table S10). It is noteworthy here that Dular responds to low P by altering its root architecture while sensitive genotype failed to do so. PSTOL1 gene is responsible for RSA alteration in low P tolerant traditional genotype Kasalath 9 . However, we found PSTOL1 locus between Dular and PB1 without any nucleotide change; therefore, variants identified here become more relevant to investigate the novel PSR candidates. In conclusion, we have identified many DNA polymorphisms in key PSR genes in low P tolerant and sensitive genotypes. Dular and Kasalath are highly tolerant to low P due to efficient root system architecture. Given this, variants identified here can serve as an important resource to improve root system architecture of elite rice cultivars for low P tolerance. Large number of sequence variants were detected between tolerant (Dular and Kasalath) and sensitive (PB1 and Nipponbare) genotypes; and also in PSR genes, probably due to their divergence from japonica and indica group. However, partly this could also be due to their unique adaptation to marginal soil. A larger population screen is needed to investigate this notion.

Methods
Plant material and phenotyping under low P environment. Seeds of PB1 and Dular genotypes were surface-sterilized with 0.1% mercuric chloride containing few drops of Triton X-100 for 15 minutes. Seeds were then washed 5-7 times with sterile H 2 O to wash off detergent. Sterilized seeds were germinated on wet filter paper for two days in dark. Uniformly germinated seedlings were transferred to different concentrations of Pi (1, 100 and 320 μ M of NaH 2 PO 4 ) in Yoshida medium with iron supplemented as FeNaEDTA. Seedlings of both genotypes were raised in separate containers in growth chamber with 16 h day (30 °C)/8 h night (28 °C) photoperiod, 250-300 μ M photons/m 2 /sec photon density and ∼ 70% relative humidity. 30 seedlings per genotype were raised in 15 litres containers filled with nutrient solution. Nutrient medium was changed every day and pH was maintained strictly around 5.5. Root and shoot biomass was recorded after 15 days of growth in hydroponics. For measurement of total P content, root and shoot tissue of both genotypes was oven dried at 80 °C till constant weight. Dried plant material was weighed and subjected to ashing at 550 °C for 5 hrs in a muffle furnace. Total P content was measured colorimetrically by Ammonium Vanadate-Ammonium Molybdate yellow method as described 54 .
For analysis of root hair, excised embryos of PB1 and Dular seeds were surface sterilized and germinated in dark on MS 55 medium containing 0.2% phytagel (Sigma P8169). Only embryo part was used to minimize the intrinsic variability of different seed vigour. After two days, uniformly germinated seedlings were transferred to MS medium containing 0, 100, 320 μ M of KH 2 PO 4 with 0.2% (w/v) phytagel. In -P medium, KH 2 PO 4 was replaced with equimolar concentration of K 2 SO 4 . After 5 days of growth, root images were analysed in ImageJ 1.46r (http://imagej.nih.gov/ij) for root hair length measurement. Seeds were also sown in pots filled with a mixture of sand and perlite in order to study the effect of Pi deficiency in soil-grown plants. Plants were kept in a field under natural conditions and irrigated with Yoshida medium carrying low and high Pi for six weeks.
Extraction and sequencing of genomic DNA. Genomic DNA (gDNA) was extracted from leaf tissue using Qiagen DNeasy kit (Qiagen USA) as per manufacturer's instructions. Integrity of gDNA was assessed by Bioanalyzer 2100 (Agilent Technologies, Singapore). gDNA sample preparation for sequencing was done using Illumina TruSeq DNA sample preparation kit (Illumina, USA). Briefly, one μ g high quality gDNA was subjected to fragmentation by Covaris shearing to obtain a final library of 300 to 400 bp average insert size. Sheared DNA was then subjected to end repairing, adenylation of 3′ ends followed by ligation of indexed paired end adapters. Ligated products falling in size range of 400-500 bp were purified using Qiagen MinElute gel extraction kit and selectively enriched by PCR. The libraries so generated were validated for right fragment size, concentration and pooled as per manufactures' instructions. Finally, libraries were sequenced by Illumina HiSeq 2000 NGS platform.
Read alignment and filtering. Read quality check and alignment were performed according to standard Illumina analysis pipeline. Low quality sequence reads were excluded from further analysis and only high quality sequences with mapping quality (MAPQ) ≥ 20 were retained for read alignment. Before performing the alignment, raw reads were first trimmed based on base quality, base composition, and adapter sequence. After trimming, reads < 30 bp was removed from further analysis. The trimmed reads were aligned to the reference genome, Nipponbare (IRGSP-1.0 pseudomolecule/MSU7). Alignment was performed using BWA program with -q20 parameter to trim the low quality portion of the read. The aligned reads were first sorted using Picard tool and SortSam subsequently. Duplicate reads were removed using Picard Mark Duplicates command. After removing duplicate reads the reads were realigned to reference genome.

Variant identification and annotation. After performing realignment, Genome Analysis
TKLite-2.3-9 toolkit Unified Genotyper 56 was used to identify single nucleotide polymorphisms (SNPs) and short InDels. After calling, variants were further filtered for read depth and quality score parameters in order to retain the good quality variants. Variants with stringent criteria of read depth ≥ 10 and quality  score ≥ 50 at corresponding regions in both the genotypes were considered for further analysis. Genomic distribution of variants (SNPs/InDels) was analysed in 100 kb sliding window size across all rice chromosomes in PB1 and Dular. Genomic distribution of SNPs/InDels/100 kb was represented pictorially using Circos program 57 . The identified variants were annotated using VariMAT program. The gene models used for annotation were downloaded from plant Ensembl database. The VariMAT (SciGenome, India) programme was used for genic and intergenic, exon, intron, 5′ UTR, 3′ UTR, coding-region, splice-site annotation of SNPs and InDels. The same programme was also used for variant class prediction (silent, missense, non-sense, stop-loss, start-loss, inframe, frameshift) and mapping of variants to all transcript form of the gene.

Validation of SNPs and InDels.
A total of 50 SNPs/InDels in Dular were randomly selected from all 12 rice chromosomes. Primers were designed from ~150 bp flanking region of each variant. Approximately, 300 bp amplicons for each variant were amplified by PCR from both PB1 and Dular genomic DNA. The amplified PCR products were sequenced using automated AB1 3730xl DNA Analyzer.

Development and validation of CAPS marker.
About 500 bp flanking region of each Dular-specific SNP variant underlying PSR genes (identical to Kasalath) were retrieved from Rice Genome Browser (http://rice.plantbiology.msu.edu/cgi-bin/gbrowse/rice). These sequences were searched and annotated for the presence of restriction sites altered by the introduction of SNP at specific genomic position employing NEB cutter (http://nc2.neb.com/NEBcutter2/). The primer pairs (Fig. 6) amplifying 800-1000 bp fragments possessing the restriction sites alleles with SNPs were designed and amplified using genomic DNA of Dular, PB1, Nipponbare and Kasalath genotypes. PCR amplicons were digested with specific predicted restriction endonuclease as per manufacturer's protocol (New England BioLabs, Inc). Polymorphisms were analysed through agarose gel electrophoresis.

Identification and annotation of variants within PSR genes.
For elucidating the functional relevance of variants in context of low P tolerance, a selected set of PSR genes, previously reported in five independent transcriptome studies 43,44,58-60 under P deficiency in different rice genotypes, were downloaded. A total of 2,152 differentially expressing unique PSR genes present in at least ≥2 transcriptomic studies were identified and selected for further analysis. All Dular vs PB1 variants present on these selected PSR genes were structurally and functionally annotated. Variants were also compared between low P sensitive genotypes (PB1 and Nipponbare) and low P tolerant genotypes (Dular and Kasalath) by utilizing publically accessible Kasalath genome sequence 45 .