Multi-trait association study identifies loci associated with tolerance of low phosphorus in Oryza sativa and its wild relatives

We studied variation in adaptive traits and genetic association to understand the low P responses, including the symbiotic association of arbuscular mycorrhizal (AM) fungal colonization in Oryza species (O. sativa, O. nivara, and O. rufipogon). In the present experiment, we performed the phenotypic variability of the morphometric and geometric traits for P deficiency tolerance and conducted the association studies in GLM and MLM methods. A positive association between the geometric trait of the top-view area and root traits suggested the possibility of exploring a non-destructive approach in screening genotypes under low P. The AMOVA revealed a higher proportion of variation among the individuals as they belonged to different species of Oryza and the NM value was 2.0, indicating possible gene flow between populations. A sub-cluster with superior-performing accessions had a higher proportion of landraces (42.85%), and O. rufipogon (33.3%) was differentiated by four Pup1-specific markers. Association mapping identified seven notable markers (RM259, RM297, RM30, RM6966, RM242, RM184, and PAP1) and six potential genotypes (IC459373, Chakhao Aumbi, AC100219, AC100062, Sekri, and Kumbhi Phou), which will be helpful in the marker-assisted breeding to improve rice for P-deprived condition. In addition, total root surface area becomes a single major trait that helps in P uptake under deficit P up to 33% than mycorrhizal colonization. Further, the phenotypic analysis of the morphometric and geometric trait variations and their interactions provides excellent potential for selecting donors for improving P-use efficiency. The identified potential candidate genes and markers offered new insights into our understanding of the molecular and physiological mechanisms driving PUE and improving grain yield under low-P conditions.

) and maximum root length (max root lth); (c) leaf width (leaf wd) and root dry weight (RDW); (d) stem thickness and root P content (R_P); (e) shoot dry weight (SDW) and root-shoot dry weight ratio (R/S_DW); (f) tiller number (Tiller/pl) and root volume (RV); (g) total dry weight (TDW) and root-shoot length ratio (R/S_lth); (h) root tips and top-view area (TVA); (i) SPAD, total root projected area (TRPA), and mycorrhiza colonization (%) (M_col); (j) average root diameter (ARD) and shoot P content (S_P); and (k) total root length (T_root_lth) and (l) total root surface area (TRSA). The upper, median, and lower quartiles of the boxes represent the 75th, 50th, and 25th percentiles of the population, respectively. The square box inside the quartile box represents mean and the asterisk represents outliers. www.nature.com/scientificreports/ Correlation analyses were executed to determine the correlation coefficient among the parameters associated with the traits responsible for low P in rice plants. Among the 23 traits studied, total dry weight had a positive correlation with 10 traits, followed by stem thickness, maximum root length, total root length, and total root surface area, which exhibited a positive association with eight traits (Fig. 2). In addition, a highly significant positive association was observed between shoot length and leaf length (r = 0.937), leaf number with tiller number (r = 0.8606), shoot dry weight (0.7066), and total dry weight (r = 0.7212). Similarly, shoot and root dry weight had a strong positive association between them. Also, the total root projected area exhibited a strong association with most of the root parameters, such as maximum root length (r = 0.8056), total root surface area (r = 0.9865), root volume (r = 0.8340), and number of root tips (r = 0.7378). The total root surface area also positively associated with root volume, root tips, and total root length. In contrast, root-shoot dry weight ratio exhibited a strong negative association with shoot length (r = −0.7486) and leaf length (r = −0.7114), wherein root P content also registered a negative association with shoot dry weight (r = −0.5524), root dry weight (r = −0.5198), and total dry weight (r = −0.5678).
A multivariate principal component analysis was carried out to partition the traits associated with low P. A trait biplot analysis was composed and displayed for the first two principal components with a lay hold of 54.28% of the total variability depicted in Fig. 3. Principal component 1 (PC1) and PC2 accounted for 38.52% and 15.76%, respectively, toward the variability. Among the various traits studied, PC1 was strongly correlated with root-and shoot-related traits. PC1 increases with an increase in root volume, total root surface area, projected area, total root length, dry weight (shoot and root), stem thickness, and shoot and leaf length. It suggests that if root volume increases, the rest of the associated traits will increase, and genotypes would tend to have more root volume under P-starved conditions. PC2 had a positive association with SPAD and leaf width. However, root P content and root dry weight were negatively associated with PC1 and PC2, respectively. This suggests that the roots should translocate observed P and assimilate to expand the above-ground portion in genotypes under deficient P. Further, the biplot graph (Fig. 3) categorized the whole population into five major groups. Among these five groups, group 3 in quadrant 3 (bottom right) registered high biomass with low tissue P in shoot and root. On the contrary, group 2 in quadrant 2 exhibited more root growth with high root P. Other groups (1, 4, and 5) were presented on the opposite plane of the biplot and showed susceptibility under low P with high tissue P.

Marker segregation and diversity index.
A total of 78 primer pairs consisting of microsatellites and gene-based markers covering all 12 chromosomes were used to assess the genetic diversity of 120 genotypes. Fifty-seven of 78 (73.07%) primers used were polymorphic among the 120 selected genotypes. A total of 154 alleles were amplified with 57 polymorphic markers, with an average of 2.70 alleles per locus and the number of alleles ranging from 2 to 5 ( Table 2). The amplification size of the markers ranged from 70 bp (RM30) to 995 bp Table 1. Analysis of variance, mean, range, and broad-sense heritability of 23 traits under P-deficiency conditions.   www.nature.com/scientificreports/ Analysis of molecular variance (AMOVA) and principal coordinate analysis (PCoA). In this study, 120 genotypes were divided into five populations based on their ecology and species level (irrigated, lowland, upland, O. nivara, and O. rufipogon) to determine the genetic differentiation among them. Maximum variation (71%) was found among the individuals, followed by within individuals (23%), and minimum variation existed among the populations (6%) ( Table 3). The deviation from Hardy-Weinberg's prediction was calculated using Wright's F statistics. The F IS and F IT values for all the loci were 0.757 and 0.771, respectively, while F ST was 0.056 between the populations. The NM value of the assumed population was 2.003. The PCoA explained that the first two components accounted for 20.97% of the total genetic variation among the assumed population ( Fig. 4). Most of the genotypes were plotted on the right side of the plot, including wild species. However, all wild accessions from populations 4 (O. nivara) and 5 (O. rufipogon) were plotted in the separate quadrant on the plot's right side. Twenty-four accessions of populations 1 (irrigated; 10 in number), 2 (shallow lowland; 6 in number), and 3 (upland; 8 in number) were plotted on the extreme left side of the plot. Nei genetic distance congregated the five populations into three major clusters. Populations 4 and 5 were grouped into separate clusters, while populations 1 and 2 stayed together. The highest pairwise Nei genetic distance was noticed between population 1 (irrigated) and population 4 (O. nivara) (0.1158), followed by population 5 (O. rufipogon) with population 1 (0.1046) and population 2 (lowland) with 4 (0.1023). Minimum genetic distance was observed between populations 1 and 2 (0.0248), followed by that between populations 2 and 3 (upland) (0.0269).
Genetic diversity and population structure. Cluster and population structure analyses were carried out with the data generated with all 78 primers collectively, with and without 13 Pup1-specific markers. The cluster analysis based on unweighted neighbor-joining with 10,000 bootstraps with all the primers understudy grouped 120 genotypes into three major clusters (Fig. 5a). Cluster-I (blue) constitutes 45 genotypes divided into three major sub-clusters; all the wild cultivars fell into this cluster. Sub-cluster I-1 constitutes only two genotypes, Kamesh and ASD 16, followed by five genotypes (Kouni, AC100142, AC100175, Phalguni, and AC100010) grouped into Sub-cluster I-2. The rest of the 38 genotypes, including all the wild species and a few improved varieties (Meher, Sadabahar, Subhadra, Parijat) and landraces (Dular, IC459373, Sekri, Sukhapanki, Longmanabi A, Dular, and Harishankar), were grouped into Sub-cluster I-3. Cluster-II (green) was divided into six subclusters with 65 genotypes, most of which are improved varieties, including CR Dhan 801, whereas Cluster-III (red) was grouped as a separate cluster with nine genotypes (Akhiyaturfa, KumbhiPhou, ChakhaoAubi, Kabuk-Phou, Rajeshwari, Longmanabi, Khitish, CR-Dhan103, LeimaPhou) of the northeastern states of India with two improved lines (Fig. 5a). Similarly, cluster analysis carried out with 65 low-P linked markers separated 120 geno- www.nature.com/scientificreports/ types into three groups (Fig. 5b). Cluster-I (green) represented only improved lines of 65 in number, whereas Cluster-II (red) grouped 47 genotypes comprising wild species and a few O. sativa (landraces and improved lines). Cluster-III (blue) had eight genotypes, with seven improved varieties and one wild accession. Cluster analysis with 13 Pup1-specific markers grouped all 120 genotypes into three major clusters (Fig. 5c). Cluster-I (red) was further divided into three sub-clusters. A mono-genotypic cluster with genotype Hiranmayi was formed separately as Sub-cluster I-1, while Sub-cluster I-2 consisted of 15 genotypes with 12 O. sativa (landraces and improved varieties) and three wild accessions. In Sub-cluster I-3, 29 genotypes of improved varieties and landraces were grouped. On the other hand, Cluster-II (green) separated 45 genotypes into three sub-clusters: II-1 (10), II-2 (16), and II-3 (19). Sub-clusters II-2 and II-3 consisted of most of the wild accessions. Cluster-III (blue) represented 30 genotypes, which were further divided into three sub-clusters with 14 (III-1), 13 (III-2), Figure 5. Unrooted tree using unweighted neighbor-joining (UNJ) method depicting clustering pattern of a panel population of 120 accessions in response to all 78 primers collectively, with and without Pup1 markers. (a) 78 primers grouped genotypes into three major clusters. Cluster-I (blue) constitutes 45 genotypes that were divided into three major sub-clusters. Cluster-II (green) was divided into six sub-clusters with 65 genotypes, most of them being improved varieties, including CR Dhan 801 and Kasalath. Cluster-III (red) was grouped as a separate cluster with nine genotypes of northeastern states of India with two improved lines. (b) The cluster analysis with 65 low-P linked markers separated 120 genotypes into three groups. Cluster-I (green) represented only 65 improved lines Cluster-II (red) grouped 47 genotypes comprising wild species and a few O. sativa (landraces and improved lines) and Cluster-III (blue) had eight genotypes with seven improved varieties and one wild accession. (c) The cluster analysis with Pup1-specific markers grouped 120 genotypes into three major clusters. Cluster-I (red) consisted of 48 genotypes and was further divided into three sub-clusters. Cluster-II (green) separated 45 genotypes into three sub-clusters and Cluster-III (blue) represented 30 genotypes that were further divided into three sub-clusters with 14 (III-1), 13 (III-2), and 3 (III-3) genotypes. The positive checks Dular and Kasalath were grouped into sub-cluster III-2 with IC459373, multiple-stress-tolerant CR Dhan 801, Poongar, Sekri, Kouni, AC10062, AC100326, AC100284, AC 100281, AC 100135, and AC 100117. www.nature.com/scientificreports/ www.nature.com/scientificreports/ and 3 (III-3) genotypes. Those with known low-P tolerance, such as Dular and Kasalath, were grouped into subcluster III-2 with IC459373, multiple-stress-tolerant CR Dhan 801, Poongar, Sekri, Kouni, AC10062, AC100326, AC100284, AC 100281, AC 100135, and AC 100117. Similarly, the genetic relationship among the genotypes was also determined by a model-based simulation, STRU CTU RE 2.3.4, separately for all 78 markers under study, 13 low-P gene-specific markers (Pup1), and 65 primers without Pup1 markers (Figs. 6a-c, 7a-c). The simulation, by using all 78 primers, identified the highest log-likelihood with the number of populations set at three (K = 3) (Pritchard et al. 2000). Structure harvester of Evano table (http:// taylo r0. biolo gy. ucla. edu) analysis showed that, at K = 3, ΔK = 248.51, where the value was the highest in both independent burns (Fig. 6a). At K = 3, all 120 genotypes were divided into three subpopulations (SP). Among the 120 accessions studied, 25 accessions of wild species and landraces were placed in SP1, and 44 accessions fitted in SP2 comprising only improved genotypes. Similarly, 24 improved genotypes were grouped in SP3, and the rest of the wild species and landraces (27 genotypes) were grouped as an admixture (Fig. 7a). The fixation index (Fst) value ranged from 0.1880 in sub-population 1 of wild species to 0.5507 in subpopulation 3 of improved fixed lines. The average distance (expected heterozygosity) was maximum (0.1261) between sub-population 1 of wild species and sub-population 3 of improved lines, whereas minimum distance was observed among sub-populations 1 to 2 (0.0797). The value of alpha (0.1270) reflects the relative admixture level between populations of less than 1, signifying origin mainly from one population (and each population is equally probable).
On the other hand, without a Pup1-specific marker, the 65 low-P linked markers identified the highest log-likelihood with the number of populations set at three (K = 3), with ΔK of 179.01 (Fig. 6b). Among the 120 genotypes, 46 genotypes of improved varieties were placed in SP1, and 24 genotypes fitted in SP2 comprising wild species and upland (landrace and improved) varieties. Similarly, 24 improved genotypes were grouped in SP3, and the rest of the wild species, landraces, and improved (26) accessions were grouped as an admixture (Fig. 7b). Fst values of the sub-populations are 0.2111 for sub-population 2 (wild and upland), 0.2760 for sub-population 1 (improved), and 0.5471 for sub-population 3 (improved). Allele frequency divergence between sub-populations 1 and 3 was maximum (0.1283), while minimum divergence was observed between sub-populations 1 and 3 (0.0768). The average distance (expected heterozygosity) between genotypes is high in population 2 (0.2863) and minimum with sub-population 3 (0.1623). The alpha value was nearer to zero (0.1226), suggesting that no admixture and population might have originated from one population.
The simulation using 13 Pup1-specific markers identified the highest log-likelihood with the number of populations set at two (K = 2), with ΔK = 106.29 ( Fig. 6c). At K = 2, all 120 genotypes were divided into two sub-populations. Twenty-five genotypes of only improved varieties were placed in SP1, 63 genotypes fit in SP2 comprising improved varieties (including CR Dhan 801), wild species, and a few landraces (Dular, Kasalath, and IC459373), and the rest of the 32 genotypes of wild species and improved varieties were grouped as an admixture (Fig. 7c). The Fst value of sub-population 1 was 0.6382, and that of sub-population 2 was 0.3718. The allele frequency divergence among the two sub-populations was 0.2847, and the average distance (expected heterozygosity) among genotypes in the same sub-population was high in population 2 (0.2252) and minimum in population 1 (0.1548). The alpha value of the Pup1-specific markers is 0.3285, which signifies origin mainly in one population (and each population is equally probable).
Association of markers with traits related to low P. In this study, both a generalized linear model (GLM) and mixed linear model (MLM) were used for association analysis at P-value < 0.005, FDR at the 5% level, and R 2 > 7.5. In the GLM, 61 markers were associated with 110 QTLs (data not shown), and in MLM 16 markers were associated with 29 QTLs (Table 4). The R 2 value of the associated markers for GLM ranged from 0.075 (RM283) to 0.190 (RM297), and for MLM it was from 0.076 (K29-3) to 0.149 (RM297). The marker F-value for GLM ranged from 7.95 (RM283) to 23.75 (RM297), and for MLM it ranged from 7.53 (K29-3) to 14.85 (RM297). A total of 19 QTLs for different traits were predicted on four chromosomes (1, 8, 11, and 12). A maximum of five QTLs (qARD8.1, qLL8.1, qSL8.1, qTRP8.1, and qTSA8.1) associated with average root diameter, leaf length, shoot length, total root projected area, and total surface area were identified on chromosome 8, followed by five QTLs (qNT11.1, qRDW11.3, qRV11.3, qSDW11.2, and qTDW11.2) found associated with tiller number, root dry weight, root volume, shoot dry weight, and total dry weight on chromosome 11 (Table 5). Five QTLs were harbored on chromosome 12, followed by four QTLs on chromosome 1, three on chromosome 3, and two (qRL6.1 and qTRL6.1) on chromosome 6. In the MLM, nine of 23 traits under study were associated with more than one marker. The traits tiller number, root biomass (RDW), and root length were associated with a maximum of three markers on different chromosomes. Similarly, AM fungal root colonization (C%) was associated with two markers on chromosomes 2 and 4, with phenotypic variability (PV) ranging from 8.0 to 10.0%.
Among 29 QTLs observed in the MLM, 12 were associated with root parameters such as root dry weight (RDW), root length (RL), total root length (TRL), total surface area (TSA), average root diameter (ARD), total root projected area (TRPA), root volume (RV), and root biomass (RDW), with recorded PV from 10.32% to 14.97% on six chromosomes (1, 3, 6, 8, 11, and 12). Out of 16 markers associated with 29 QTLs, marker RM6966 on chromosome 8 had an association with five traits, LL, ARD, TRPA, TSA, and SL (shoot length), and explained PV of 7.7% to 10.7%. Similarly, RM5926 on chromosome 11, RM297 on chromosome 1, RM242 on chromosome 9, and K41 on chromosome 12 are associated with four, three, two, and two different traits, respectively, linked with low-P tolerance. Among the Pup1-specific markers, K20-2, K29-3, and K41 on chromosome 12 were associated with root dry weight, stem thickness, root length, and top-view area, with PV of 7.6% to 11.6%. The marker PAP1 developed from NRRI on chromosome 11 was associated with tiller number with a PV of 7.7%.   Table 1S. www.nature.com/scientificreports/ Prediction of QTLs underlying candidate genes for traits associated with low-P tolerance. Twenty-nine QTLs related to traits measured under P starvation associated with 15 markers were considered to identify candidate genes (Table 6). Three markers, RM283 (chromosome 1), RM297 (chromosome 1), and RM242 (chromosome 9), associated with leaf number (qNL1.1), biomass (qRDW 1.1, qTDW 1.1, qSDW 1.1), leaf length (qLL9.), and tiller number (qNT 9.1) were found to have an association with growth promoter genes (auxin (Os09t0491740), brassinosteroid (Os01t0178500), and strigolactone (Os01t0746400)). Traits related to roots such as root length (qRL6.1), average root diameter (qARD8.1), root projected area (qTRPA8.1), and root surface area (qTSA8.1), associated with markers RM30 and RM6966, were found to be linked with candidate genes of phosphate starvation regulator (Os06t0664800) and phosphate transporter (OsPT6, Os08g0564000), respectively. On the other hand, tiller number (qNT3.1 and qNT11.1) was associated with markers RM2334 and PAP1 on chromosomes 3 and 11, respectively, linked with locus Os03t0672900 and Os11t0149100 responsible for the regulation of plant growth and purple acid phosphatase (EC:3.1.3.2) involved in the acquisition and use of organic P. The network analysis of these candidate genes in the QTL interval region was analyzed using the riceFREND database, and it revealed the coexpression pattern of the genes (Fig. 1S). A total of seven candidate genes were explored to construct gene networks, and their interactions were disposed of in Supplementary Fig. S1. Besides, the loci information and functions of the associated genes were given in Table 2S. These reported genes are functionally associated with physiological and molecular pathways in biotic and abiotic stress tolerance mechanisms and also uptake the nutrient element regulated by the phytohormone biosynthesis pathways. Insilico expression analysis of these genes indicates that most genes are highly expressed in root tissues based on the RiceXPro database. The sum of nodes and edges of coexpressed genes of the seven-candidate genes were 72 and 134 respectively. Among the seven putative candidate genes, the locus Os08g0564000 located on chromosome 8 Table 4. Association of marker alleles with associated traits for low P in rice detected by MLM (Q + K) analysis in a set of 120 genotypes. SL shoot length (cm), NT tillers plant −1 , NL leaf number plant −1 , LL leaf length (cm), LW leaf width (cm), SG stem thickness (mm), RL max. root length (cm), SPAD, SDW shoot dry weight (g), RDW root dry weight (g), TDW total dry weight (g), TRL total root length (cm), TRPA total root projected area (cm 2 ), TSA total root surface area (cm 2 ), ARD average root diameter (mm), RV root volume (cm 3 ), RT root tips, TPA top-view area (mm 2 ), C% mycorrhiza colonization (%), SP shoot P (mg g −1 ), and RP root P (mg g −1 ). Identification of rice germplasm with QTLs related to low-P tolerance traits such as root growth and shoot and root biomass. A total of seven rice accessions were found to tolerate P-deficient conditions, including tolerant checks Dular and Kasalath. Among them, the genotypes IC459373, Chakhao Aumbi, AC100219, AC100062, and Sekri exhibited a par or higher shoot and root dry weight with better root growth than the checks in P-starved conditions. Genotype IC459373 outperformed the checks, while Chakhao Aumbi and AC100219 performed significantly better under low P. Amid the favorable QTLs detected, six QTLs (Table 7) were found in Sekri, followed by five QTLs detected in Chakhao Aumbi having a high tiller number and biomass. The genotypes IC459373 and Kasalath had four QTLs each. Therefore, these accessions could be used as donors in a breeding program to improve PUE in rice.

Discussion
Researchers globally for long been trying to understand the critical role of P in plant growth and development mediated through signaling and metabolism to develop P-efficient cultivars in several crops. Considering the area occupied and the importance of rice, improving PUE is catching the attention of rice researchers. Because Table 5. Allelic variant of associated markers and favorable allelic and related phenotypic traits of the panel population. Mean with ± is the standard deviation of the specific allele. Values with different alpabhet across allelic variant are significantly different with each other at p < 0.05. Values in parenthesis next to the favorable allele denotes the number of genotypes carrying that specific favorable allele. SL shoot length (cm), NT tillers plant −1 , NL leaf number plant −1 , LL leaf length (cm), LW leaf width (cm), SG stem thickness (mm), RL max. root length (cm), SPAD, SDW shoot dry weight (g), RDW root dry weight (g), TDW total dry weight (g), TRL total root length (cm), TRPA total root projected area (cm 2 ), TSA total root surface area (cm 2 ), ARD average root diameter (mm), RV root volume (cm 3 ), RT root tips, TPA top-view area (mm 2 ), C% mycorrhiza colonization (%), SP shoot P (mg g −1 ), RP root P (mg g −1 ). www.nature.com/scientificreports/ of finite P fertilizer and its availability only in specific regions around the globe 4 , the applied P fertilizer becomes unavailable to plants because of high reactivity with soil particles and microbial activity 25,26 . Exploring the large unavailable form of soil-bound P would be an alternate strategy to mitigate these factors. In addition, increasing the tolerance level through PUE, acquiring P from the soil by exploring the adjacent area and by modifying the root architecture, secretion of root exudates, and symbiotic association with AM fungi would be advantageous. Attempts were made to improve P efficiency in rice through classical plant breeding by transferring targeted traits with limited success 27 . The available germplasm in gene banks has sufficient genotypic variation to improve P efficiency, but the major obstruction in identifying and developing P-efficient genotypes is the lack of a screening facility with low-P soil. Predominantly, the identification of P-efficient genotypes is driven by biomass production of shoot or root under P-deficient soil. To overcome this impediment, QTLs/genes related to traits that improve P-use efficiency need to be identified to introgress them into an elite background. Wild species and Table 6. Co-localization of significant markers and candidate genes believed to be involved in low-P tolerance identified in the panel population. SL shoot length (cm), NT tillers plant −1 , NL leaf number plant −1 , LL leaf length (cm), LW leaf width (cm), SG stem thickness (mm), RL max. root length (cm), SPAD, SDW shoot dry weight (g), RDW root dry weight (g), TDW total dry weight (g), TRL total root length (cm), TRPA total root projected area (cm 2 ), TSA total root surface area (cm 2 ), ARD average root diameter (mm), RV root volume (cm 3 ), RT root tips, TPA top-view area (mm 2 ), C% mycorrhiza colonization (%), SP shoot P (mg g −1 ), RP root P (mg g −1 ).  Table 7. List of genotypes identified with their QTL details and reaction against P deficiency in rice. SL shoot length (cm), NT tillers plant −1 , NL leaf number plant −1 , LL leaf length (cm), LW leaf width (cm), SG stem thickness (mm), RL max. root length (cm), SPAD, SDW shoot dry weight (g), RDW root dry weight (g), TDW total dry weight (g), TRL total root length (cm), TRPA total root projected area (cm 2 ), TSA total root surface area (cm 2 ), ARD average root diameter (mm), RV root volume (cm 3 ), RT root tips, TPA top-view area (mm 2 ), C% mycorrhiza colonization (%), SP shoot P (mg g −1 ), RP root P (mg g −1 ). www.nature.com/scientificreports/ landraces possess higher genetic diversity than modern/improved varieties, which serve as a reservoir of genetic diversity and help to use the beneficial alleles in breeding programs for crop development. This research work attempted to study the genetic diversity, structure, and association between markers and traits measured under low-P conditions among the selected subset of the population comprising wild species, landraces, and improved varieties with multiple traits. The line indicates the fitted results representing the relationship between total P of plant tissue and possible parameters (TRSA and mycorrhizal colonization) involved in P uptake under P deficient condition. The contribution of improved P uptake of TRSA was high compared to mycorrhizal colonization.  28 for total above-ground biomass, total above-ground P uptake, and P translocation efficiency in P-starved conditions. Therefore, the presence of moderate to high genetic advance with high h 2 offers an opportunity to improve the selection of traits at the early generation based on shoot biomass, total root length, tissue P content, mycorrhizal colonization, and the geometric trait of top-view area for improving low-P tolerance in rice. The strong positive correlation between shoot and root biomass and inter-correlation among the root parameters show the possibility of improving P-efficient genotypes. The negative association between root P content and shoot dry weight (Fig. 8a) suggests that genotypes with desired root architecture QTLs such as Pup1 coupled with high-affinity P transporters (PHT1 family) would be supportive of using available P efficiently in P-deprived soil 14 . Besides, TRSA had become the higher contributor for improved P uptake under deficit P than mycorrhizal colonization. This was clear from Fig. 8b, and non-linear regression showed that 33% of P uptake would be through TRSA, while the role of mycorrhizae in P uptake was minimum (4%) in the present experiment. The multivariate principal component analysis also suggests the importance of roots under P-starved conditions to improve shoot growth. Genotypes such as upland landraces of northeast India, a few genotypes of improved upland varieties, O. nivara, O. rufipogon, and known tolerant genotypes Dular and Kasalath were categorized under group 3 with high biomass and low tissue P in both shoot and root, whereas genotypes of the improved varieties were put in groups 1 and 2 with high tissue P in the root and low P in the shoot. Conversely, O. nivara and O. rufipogon were grouped together (group 5) with high tissue P, root diameter, root-shoot weight ratio, and root-shoot length ratio, while group 4 had a mixture of all species with more shoot P. This suggests that genotypes of group 5 (O. nivara and O. rufipogon) can extract, upload, and translocate P from the rhizosphere to the shoot in low-P soil. The symbiotic association between AM fungi and plants ( Fig. 9) was significantly higher in groups 3 and 2 (41%), in which they have a good root system (Fig. 3). The improved genotypes of group 1 had minimum colonization of 30% with a poor root system. It is well-known that rice plants with more root biomass positively correlate with AM fungal root colonization 29 .

S. no
Genetic diversity and distribution of Pup1 among Oryza spp.. Elucidating molecular genetic diversity helps breeders estimate rice germplasm's genetic constitution and select donors to develop a systematic and effective breeding program. Fifty-six of 78 markers exhibited polymorphism (73%), which gave rise to 154 alleles, less than in our previous study 30 . We obtained 128 alleles from 39 polymorphic primers, and more than reported by Donde et al. 31 , 154 alleles from 65 polymorphic markers. The present average of 2.7 alleles per locus was lower than that of many other research reports; for example, Noyer et al. 32 analyzed 419 rice accessions using 16 SSR markers and reported an average of 9.1 alleles per locus. The higher average number of alleles per locus than in the present study (2.24) may be due to the use of a larger number of genotypes. This study's average PIC www.nature.com/scientificreports/ value was 0.326, which was a bit higher than that of Xu et al. 33 (0.31), who studied the association mapping of cold tolerance in improved japonica rice germplasm. Cluster analysis based on unweighted pair group method with arithmetic mean (UPGMA) with all 78 markers and excluding gene-specific Pup1 grouped 120 genotypes into three clusters. Under both conditions (excluding and including the Pup1), the wild species were grouped into a single cluster and might have a similar type of P-efficiency mechanism under P deficiency. However, all explored wild species (30 in number) might have lower diversity because they share common geographical boundaries such as Odisha and neighboring province West Bengal of India. This coincides with several studies on assessing genetic diversity [34][35][36] . On the other hand, those wild species were separated from O. sativa L. (landraces and improved genotypes). Similarly, Xu et al. 33   , SG stem thickness (mm), RL max. root length (cm), SPAD, SDW shoot dry weight (g), RDW root dry weight (g), TDW total dry weight (g), TRL total root length (cm), TRPA total root projected area (cm 2 ), TSA total root surface area (cm 2 ), ARD average root diameter (mm), RV root volume (cm 3 ), RT root tips, TPA top-view area (mm 2 ), MC mycorrhiza colonization (%), SP shoot P (mg g −1 ), RP root P (mg g −1 )]. www.nature.com/scientificreports/ grouped both of them in separate clusters, which are in support of our study. The UPGMA cluster analysis with 13 gene-specific Pup1 markers grouped all 120 genotypes into three major clusters (Fig. 5c). Cluster-III (blue) grouped 25% (30 accessions) of the studied population, including the tolerant genotypes Dular and Kasalath having Pup1. However, this cluster again sub-divided into three sub-clusters (III-1 (14 in number), III-2 (13 in number), and III-3 (3 in number)) based on the presence of gene-specific Pup1 markers. Among the three subclusters, sub-cluster III-2 had grouped 13 genotypes that include six landraces (42.85% of the total population), five O. rufipogan (33.3%), one O. nivara (AC 100117) (6.66%), and one improved variety (CR Dhan 801) (1.31%). The performance of those 13 genotypes registered 28.42% shoot length, 83.41% root biomass, and 90.49% root volume, with 35.88% top-view area, higher than in the genotypes secured in the other two sub-clusters. On the other hand, out of 13 gene-specific Pup1 markers, K29-3, K41, K43, and K45 played an important role in differentiating genotypes between these sub-clusters. To corroborate this, Chin et al. 17 reported K29-1, K29-3, K41, K43, K45, and K46-1 as core markers to differentiate genotypes that differ in Pup1 allele constitutions. Further, Neelam et al. 37 substantiated that O. rufipogon collected from India has high PUE with a differential functional allele of PSTOL1. By considering all 78 markers and excluding gene-specific Pup1 markers, the model-based simulation STRU CTU RE 2.3.4 divided the population into three sub-populations, similar to the distance-based model. Conversely, the gene-specific Pup1 markers grouped the population into two sub-populations. The grouping of genotypes with distance and model-based analysis was similar at the genetic level with markers linked to low P, including exclusive markers or exclusive gene-specific Pup1 markers. It suggests that traits other than early root growth (Pup1-specific markers) in differentiating low-P tolerance must be documented. AMOVA revealed a higher proportion of variation among the individuals, while a lower proportion of variation was observed among the populations. A similar observation of a higher proportion of variation among the individuals was reported by Verma et al. 38 and Islam et al. 39 . Genotypes of the present study belonged to different species of Oryza (sativa, rufipogon, and nivara), wherein the species sativa includes genotypes of landraces and from improved upland, lowland, and irrigated ecosystems, which resulted in a higher proportion of variation among the individuals than among the populations. Within-individuals that had registered higher variation indicated a high level of heterozygosity at each locus in wild genotypes and landraces. The Fst value was minimum in the case of subpopulations containing wild genotypes (0.20), while a high Fst index of 0.55 was observed in improved genotypes, suggesting that they are fixed and homozygous in nature. The alpha value determined by the model-based simulation STRU CTU RE 2.3.4 under three circumstances was nearer to zero (0.12) in two instances, which suggests that the individuals are from one population, while the Pup1 gene marker represented with 0.3285 suggests that the variation arising in a specific region might be variation in the Pup 1 indel region. The value of NM (2.0) indicated possible gene flow between the populations and low genetic differentiation for the trait studied 40,41 . Therefore, a hypothesis has been proposed that the Pup1 region or gene with low-P tolerance might have been introgressed from O. rufipogon into O. sativa. This may be supported in this study by the frequency of tolerant accessions of O. rufipogon outnumbering those of O. nivara. Further, Neelam et al. 37 substantiated that O. rufipogon (IRGC 106506) accessions performed significantly better under limited P, with 2.5 times higher root weight than the positive control.

Scientific Reports
In the present association study, both GLM and MLM were used to assess the association between traits and markers. However, the number and degree of association-related traits are based on the dataset and model used. MLM, unlike GLM, is more accurate and has a robust algorithm to improve the calibration. It integrates structure and kinship matrix, which rectifies the false-positive error expected due to population structure and relatedness 42 . Hence, MLM has been popularly used by several researchers for studying marker-trait association. In the GLM, 61 markers were associated with 110 QTLs; in the MLM, 16 markers were associated with 29 QTLs. The total number of QTLs and distribution of the markers are displayed in Fig. 10. The unique population exploited in this study was associated with 78 markers linked to low P with multiple traits screened under deficient P. Not much study was carried out on the traits selected in this study in reference to the association with tolerance of P deficiency. This resulted in the identification of novel and interesting QTLs for multiple traits. We reported 16 markers (RM6966, RM1272, RM200, RM2334, RM242, RM283, RM297, RM30, RM3343, RM521, RM574, RM5926, PAP1, K20-2, K29-3, and K41) associated with 29 QTLs for 17 traits (ARD, C%, LL, LW, NL, NT, RDW, RL, RV, SDW, SG, SL, TDW, TPA, TRL, TRPA, and TSA) under low-P conditions in rice. Overlapping of markers for tolerance of low P suggests a high correlation between the traits. For instance, marker RM6966 was found to be associated with several root-related traits necessary for low-P conditions. Similarly, RM242 and RM297 were linked to shoot and root biomass. Thus, the pleiotropic effect of markers unravels the genetic correlation among the traits 30 measured under low P. Marker RM30 on chromosome 6 was observed to be significantly associated with root length (R 2 = 9.9%), and was co-localized with a QTL reported for root elongation ratio under P deficiency with PV of 19.9%. The tolerance imparted by this QTL might be due to the positive regulation of the candidate gene Phosphate starvation regulator (Os06t0664800) at 0.15 Mb to the right of qRL6.1 on chromosome 6. The network analysis of locus Os06t0664800 by the riceFREND database revealed that it has 11 nodes and 19 edges. It was found that coexpression patterns of the genes were highly expressed in roots, specifically in protophloem sieve elements and promote root elongation (OsbHLH068) 43 . Similarly, marker RM6966 from this study suggested that three QTLs were controlling root traits of low-P tolerance in rice, which were co-localized with the QTLs qRS8b, qRDW8, and qRN8b reported by Li et al. 19 for root traits located in the same genomic region under P deficiency with a PV of 11.04%. The R 2 values of traits associated with marker RM6966 varied from 7.7% to 10.7%. The well-documented high-affinity phosphate transporter (OsPT6) gene located on chromosome 8 at 28.3 Mb within this QTL region proved to be involved in long-distance transport of P from root to shoot, and it aids in the accumulation of biomass 44 under P-deficient conditions. Insilico coexpression pattern of the gene Os08g0564000 revealed that most of the genes involved in root development and signaling pathways during abiotic stress (phosphoesterase family protein; OsNPC4) 45  www.nature.com/scientificreports/ expression analysis suggest that the resulting gene network provides the molecular pathways and the biological function of each gene. This piece of information would be useful to explore the role of novel genes related to the target trait of interest in the breeding program.
On the other hand, marker RM242 located on chromosome 9 was a hotspot for biomass accumulation under P-deficient conditions, for which two important QTLs, qLL 9.1 and qNT 9.1, were identified to be associated with low-P conditions, with R 2 ranging from 8.50% to 9.97%. Marker RM242 was reported to be associated with increasing root length under irrigated conditions and drought stress and was used in a marker-assisted backcross 46 to improve root length in rice genotype Kalinga III. Goncharova et al. 42 also reported RM242 to be involved in increasing relative biomass and nitrogen-use efficiency. The locus Os09t0491740 is located 0.33 Mb away from marker RM242 involved in auxin efflux situated in the plasma membrane. Similarly, markers RM283 and RM297 were also linked with the growth promoter genes brassinosteroid (Os01t0178500) and strigolactone (Os01t0746400).
Four PAP markers (1 to 4) associated with purple acid phosphatase were developed at ICAR-NRRI to identify genotypes having candidate gene OsPAP21b. The gene OsPAP21b was reported to be up-regulated under phosphate deprivation 47 and proved to be a major plant enzyme involved in releasing P from organophosphates that remain unavailable for plants before mineralization 48 . Among the four markers, PAP1 at 2.42 Mb on chromosome 11 exhibited polymorphism across genotypes, which co-localized with tiller number with PV of 7.6% under P deprivation. Among the Pup1-specific markers, the R 2 value of K20-2 registered a maximum of 11.5% for stem thickness and 10.5% for geometric trait top-view area, followed by the Pup1 core marker K41, which exhibited association with stem thickness of 9%. Another core marker, K29-3, was associated with root dry weight. Therefore, it is believed that Pup1 is involved in the accumulation of biomass in addition to early rooting in P-deficient conditions. On the other hand, markers RM521 and RM1272 were associated with mycorrhizal colonization % on chromosomes 2 and 4, respectively, with an R 2 value of 10.3% and 8.3%. The symbiotic association between the rice plant and soil fungi mycorrhiza is well known under P-deficient conditions and fulfills 80% of P [21][22][23] .
Here, several reliable associations between traits measured under P deprivation and molecular markers were established. Notably, six markers (RM297, RM30, RM6966, RM242, RM283, and PAP1) would be more useful in predicting the identification of genotypes under P-deprived conditions. Typically, an association panel is a diverse set of populations that harbors multiple alleles at any given locus (which requires identifying favorable alleles rather than using the whole genome for hybridization), which are identified phenotypically. Identification and breeding for favorable alleles of the trait of interest from wild genotypes and landraces into modern cultivars will help breeders arrive at a new improved variety with wider adaptability. Therefore, a set of five potential genotypes (IC459373, Chakhao Aumbi, AC100219, AC100062, and Sekri) having a set of favorable alleles would be used appropriately to improve the PUE of rice. Among the selective potential genotypes, IC459373 and AC100219 contain the QTLs qNT3.1 and qNT11.1 for tiller numbers, while Dular had only one QTL qNT11.1. However, the tiller number between the three genotypes were uniform even they had additional QTLs for the same trait. A more number of QTLs for same trait might have complimentary effect under stress condition for maximizing the root length, which is highly essential for low P condition. This may be observed in the correlation between tiller number and maximum root length (Fig. 2). Therefore, genotypes having multiple QTL for same trait may be explored to utilize them in the breeding program to improve the PUE in rice. As these genotypes (IC459373 and AC100219) having better performance with additional QTLs, they need to be studied further and we are in it. Besides , the identified alleles need to be evaluated to understand the advantage of using them and their effect on different genetic backgrounds.
Identifying suitable donors and improving PUE in rice, potential candidate genes, and markers associated with these traits under low P is a central research area in rice breeding. Therefore, in this study, we demonstrated a selected population panel having significant trait variations. The positive correlation between the traits explained the possibility of improving genotypes for P-deprived soil in a holistic manner. Notably, a negative association was observed between root P content and biomass. The trait total root surface area had become the major contributor for improved P uptake under deficit P than mycorrhizal colonization. The geometric trait of the top-view area exhibited a positive association with maximum root length and root volume, suggesting using a non-destructive approach in screening genotypes under low-P conditions, which is the first report to the authors knowledge as well as based on available scientific documentation. The genotypes were divided into three groups based on distance and model-based analysis, while gene-specific Pup1 markers classified the panel into two groups. This suggests that traits other than early root growth (Pup1-specific markers) in differentiating low-P tolerance must be documented. This unique panel serves to find the association between traits and primers and facilitates the identification of donors with a combination of adaptive traits necessary for low-P conditions. Further, the identified linked markers are highly valued when determining variation for a target trait. The markers RM259, RM297, RM30, RM6966, RM242, RM184, and PAP1 were highly associated with traits responsible for low-P conditions in rice. The multiple traits related to these genome regions are promising resources for improving and understanding PUE and are also more useful as a breeding tool in predicting genotypes under P-deprived conditions.

Methods
Plant materials. The population consisted of 155 rice accessions (Table 1S)  www.nature.com/scientificreports/ Growing conditions and experimental design. Phenotyping for phosphorus stress in a cement tank. All the accessions under study were screened in a cement tank for a low-P tolerance facility located at NRRI, Cuttack (20 o 27′09" N, 85 o 55′57" E, 26 masl). The genotypes were direct-seeded in tanks containing low-P soil (< 3 kg/ha; 0-15-cm layer, pH 4.9) with 20 cm X 15 cm between genotypes in three replicates during June 2019. Average day/night temperatures were 33.6/26.0 °C, and relative humidity was 85.9% in bright sunlight. Before sowing, seeds of all the accessions were heat-treated at 50 °C for 45 h in a hot-air oven to break seed dormancy. The soil was irrigated every other day and, 15 days after sowing, the seedlings were thinned, leaving only two seedlings per hill. Chlorophyll content was measured for three plants of each accession on the 44th day from the date of sowing by using a SPAD meter (SPAD-502, Konica Minolta). On the next day (45th day), the plants were uprooted and morphological traits, shoot length (cm), number of tillers, number of leaves, 3rd-leaf length (cm) and width (cm), stem thickness (mm), maximum root length (cm), and root-shoot length ratio were recorded for three plants of each genotype. Subsequently, root traits (total root length (cm), projected root area (cm 2 ), root surface area (cm 2 ), average root diameter (mm), root volume (cm 3 ), and number of root tips) were recorded for each genotype per replication and analyzed by WinRHIZO Pro 2013e (LA 2400, Regent Instruments Inc.). The plant samples were dried in a hot-air oven at 60 °C for 5-6 days and shoot, root, and total dry weight and rootshoot dry weight ratio were recorded in grams. In addition, the plant's top-view area was measured using opensource Image J software 49 . Images of each accession were taken using a 12-megapixel Nikon camera at 1.5-m distance, and the pixels were changed into mm 2 as the top-view area of the plant 49,50 . To quantify total P, the dried plants were powdered around 300 mg shoot and 90 mg root samples following the phospho molybdo vanadate colorimetric method. P concentration in the digest was determined using a Cystronicx UV Spectrophotometer at 420 nm, and total shoot and root P contents were determined on a mg/g dry weight basis. Additionally, to strengthen these data, mycorrhiza colonization was observed.
To study AM fungal root colonization, root samples were collected from all the genotypes, washed thoroughly, cut into small pieces (0.5 to 0.7 cm), and put in falcon tubes containing 15 to 20 ml of 10% KOH. Care was taken when the root samples became immersed inside the KOH solution. Further, the samples were autoclaved at 121 °C at 15 psi pressure for 15 min and washed under running tap water until the roots became clear in color. Each sample was treated with 10 to 15 ml of 2% HCL for 5 min and washed under running tap water. Finally, 10 to 15 ml of 0.05% trypan blue in lactoglycerol was added depending upon root volume and this was left overnight at room temperature. Root samples were rinsed with lactoglycerol (without stain) and slides were prepared with 10 root pieces on each slide for observation under a stereomicroscope to check the root colonization. Percentage of root colonization was determined as per the formula described by McGongigle et al. 51 : Percentage of colonization = (number of root segments colonized/total number of root segments) × 100.

Statistical analysis.
To study the effect of variation due to P in 155 accessions, descriptive statistics, box plot, ANOVA, and broad-sense heritability (H 2 ) were calculated using Windostat 7.5 software with 23 traits measured under low-P conditions. Principal component analysis (PCA) was performed on a matrix of morphometric and geometric traits to assess the variability in traits for tolerance of P deficiency with 155 rice genotypes to differentiate the genotypes based on their performance under low-P conditions. The PCA analysis was executed using the PCA function from the FactoMine R package 52 in R. The corrplot functions from the corrplot package 53 in R (version 3.6.3) were used to find the effect of low P on the various genotypes.
Genotyping. DNA isolation, quantification, and PCR amplification. Based on the level of tolerance and phenotypic information, a panel of 120 accessions was shortlisted from 155 accessions for genotypic study based on their distribution pattern. The total genomic DNA was extracted from young leaves following the modified CTAB method 54 . Further, the DNA samples were quantified by a Nanodrop spectrophotometer (Thermo Scientific) and diluted to a working stock of 50 ng µl −1 . PCR amplified the isolated DNA samples with a 10 μl reaction containing 20 ng of DNA, 10 mM Tris-HCl, 1.5 mM MgCl 2 , 0.2 unit of Taq DNA polymerase (New England Biolabs), 50 μM of dNTPs (New England Biolabs), and 0.1 μM each of forward and reverse primers using a T100 (Bio-Rad, USA) thermal cycler. Thermal cycler reaction was performed as mentioned: initial denaturation at 94 ºC for 10 min, the mixture was cycled 35 times at 94 ºC for 45 s for denaturation, followed by annealing at 55-60 ºC for 45 s (following the TM values of the primer) and 72 ºC of extension for 60 s, and followed by a final extension at 72 ºC for 10 min. The samples were run on a 3.5% agarose gel by using bromophenol blue as a dye and a 50 bp ladder (New England Biolabs) for 1 h in 0.5X tris-acetic acid-EDTA (TAE) buffer. The resolved PCR bands were documented in a gel documentation system (Gel Doc XR + , Bio-Rad, USA), and the images were stored for analysis. In this study, with the prior information, 78 low-P QTL-linked primers (Table 2), including 13 Pup1-specific markers 55 , were used to map the QTLs associated with traits related to low P and the genetic architecture of the population panel.
Genetic diversity, population structure, and association analysis. The molecular size of the amplified fragments was determined by image lab software using 50 bp DNA ladders as the standard. Amplification of DNA samples with the primers was scored as per the molecular size or '1' for amplified regions and '0' for unamplified regions. A data matrix with '0' and '1' or the molecular size was prepared depending on the amplification, and the data matrix underwent further analysis. Genetic diversity parameters, major allele frequency, gene diversity, heterozygosity, and the polymorphic information content (PIC) were calculated for each SSR locus using Power marker software version 3.25 56 . An un-rooted tree with a bootstrap value of 1000 was used to construct an unweighted neighbor-joining tree (Nei 1972) using Darwin 6.0 software 57 . Analysis of molecular variance (AMOVA) was generated using the GenAlex 6.502 program to describe the presence of molecular variance components within and between the population differentiation among the five assumed sub-populations 58  www.nature.com/scientificreports/ to estimate the F statistics comprising the deviations from Hardy-Weinberg expectation across the population (F IT ), within-population (F IS ), and for correlation of alleles between the sub-populations (F ST ). In addition, Shannon's Information index observed and expected homozygosity, observed and expected heterozygosity, Nei's genetic diversity index, and the number of migrants (NM) between the assumed sub-populations were calculated by POPGENE program version 1.31 59 .
The genotypic data of the genotypes under study were analyzed for possible population structure with the model-based program STRU CTU RE 2.3.4 60 using a length of the burn-in period of 1,00,000, followed by 10,000 Markov chain Monte Carlo (MCMC) replications. At least ten runs of STRU CTU RE were performed by setting the number of sub-populations (K) from K = 1 to K = 10. To find the true K-value, ad hoc statistics ∆K was followed 61 using Structure Harvester version 0.6.94 62 . In this study, both model and distance-based cluster analysis were carried out with the data generated with all 78 primers collectively, excluding 13 Pup1-specific markers and only with Pup1-specific markers to realize the distribution of the PSTOL1 gene across species and the significance of other markers under low P. Further, to determine the genetic relatedness between traits measured under low P, the rice genotypes and primers were analyzed by the general linear model (GLM) and mixed linear model (MLM) in TASSEL version 5.2.63 63 . False discovery rate (FDR) was used to obtain q-values (adjusted P-values) as described by earlier studies 64 . A significant association between markers and traits was identified based on their R 2 and P-value.
Candidate gene analysis underlying a QTL region and coexpression of gene assay. The associated traits and SSR markers aligned to the IRGSP 1.0 genome in The Rice Annotation Project (https:// rapdb. dna. affrc. go. jp/ viewer/ gbrow se/ irgsp1/) to retrieve the genes associated with linked markers. The sequence on either side of each linked marker extended from 500 kb left and 500 kb right and was marked as a QTL region and meticulously searched to find the genes associated with low P tolerance. The interval on either side of the associated marker can be called an interval of the QTL genome 65 . Furthermore, we have performed the coexpression of gene assay for each putative candidate gene based on the guide gene approach using the riceFREND database (https:// ricef rend. dna. affrc. go. jp/) to identify the coexpressed genes. This approach provided the details of coexpressed genes, functional information associated with reported genes, and transcription factors. The direct link of this database with RiceXPro (https:// ricex pro. dna. affrc. go. jp/) helps to understand the insilico expression of these candidate genes. This integrative process in two databases allows the number of nodes of the gene interactions, and the ranking of nodes helps identify the groups of functionally related genes 66 . Ethics approval and consent to participate. The authors declare that this research review was conducted in the absence of any commercial or economic associations that could be construed as potential conflicts of interest.