Variability Assessment for Root and Drought Tolerance Traits and Genetic Diversity Analysis of Rice Germplasm using SSR Markers

The studies on genetic variation, diversity and population structure of rice germplasm of North East India could be an important step for improvements of abiotic and biotic stress tolerance in rice. Genetic diversity and genetic relatedness among 114 rice genotypes of North East India were assessed using genotypic data of 65 SSR markers and phenotypic data. The phenotypic diversity analysis showed the considerable variation across genotypes for root, shoot and drought tolerance traits. The principal component analysis (PCA) revealed the fresh shoot weight, root volume, dry shoot weight, fresh root weight and drought score as a major contributor to diversity. Genotyping of 114 rice genotypes using 65 SSR markers detected 147 alleles with the average polymorphic information content (PIC) value of 0.51. Population structure analysis using the Bayesian clustering model approach, distance-based neighbor-joining cluster and principal coordinate analysis using genotypic data grouped the accession into three sub-populations. Population structure analysis revealed that rice accession was moderately structured based on FST value estimates. Analysis of molecular variance (AMOVA) and pairwise FST values showed significant differentiation among all the pairs of sub-population ranging from 0.152 to 0.222 suggesting that all the three subpopulations were significantly different from each other. AMOVA revealed that most of the variation in rice accession mainly occurred among individuals. The present study suggests that diverse germplasm of NE India could be used for the improvement of root and drought tolerance in rice breeding programmes.

annual Oryza nivara found in NE India, Bangladesh and Northern Myanmar 9,10 . Kim et al. 9 stated that "the cultivated aus cultivars and its wild ancestor represent an underappreciated genetic resource". These aus rice cultivars are early maturing, photoperiod insensitive and drought tolerant 8,11 . The genetic diversity among aus genotypes is abundant and they are enriched with various abiotic and biotic stress resistance genes 6,10,12,13 . For example, a traditional aus cultivar FR 13A harbour Sub1 gene which confers submergence tolerance 14,15 ; likewise Kasalath an aus type cultivar was the donor of Pstol1 gene which confers phosphorus starvation tolerance 13 . Rayada, N-22 and Dular aus type cultivars have a large root length and high root density and drought resistance 16,17 . Nagina22 deep-rooted an aus type rice cultivar is the donor for heat and drought tolerant traits [18][19][20][21][22][23] . BG1222 an aus type cultivars harbour xa34 bacterial blight resistance genes 24 . Most of the drought-tolerant rice are originated from aus rice germplasm which has been cultivated in Northeast India and Bangladesh 12,25 . Knowledge of the genetic diversity of rice genotypes is useful for core collection development and effective conservation strategy. In the present study, an attempt was made to study genetic diversity in a few rice genotypes of North East India, the majority of which were agronomically identified as aus/ahu cultivars along with few indica rice.
The root system is the main part of plants for the absorption of water and nutrients from soil 26 . Root is the first organ to experience water stress and thus root system plays a vital role under drought stress conditions [27][28][29] . Root traits such as small fine root diameter, long specific root length and root length density play a major role in water uptake and maintaining plant productivity under drought. The nature and type of root characteristics are the main factors deciding their survival and adaptation to drought. The distribution of the root system and its density indicate water uptake potential 30 . In rice, deep rooting is governed by DEEPER ROOTING1, which confers improved drought resistance in drought stress environment 31 . In upland rice, the long root system is highly associated with drought tolerance 32 .
Study on root traits particularly in the field condition is very challenging and thus is very limited. Similarly, no attempt has been made to study the genetic architecture of the root traits in relation to drought tolerance using rice from North East India which necessitates the study of the diversity for such traits for supplement rice improvement works. It is envisaged that the selection of diverse parents based on root traits and drought tolerance would facilitate the development of transgressive segregates, as well as, heterotic groups for hybrid crop breeding in the population. However, genetic diversity analysis solely based on phenotypic traits may not be a reliable measure of genetic differences as they are influenced by environmental factors [33][34][35] . Thus, DNA based markers such as RAPD (Random Amplified Polymorphic DNA), SSR (Simple Sequence repeat), AFLP (Amplified Fragment Length Polymorphism) and SNP (Single Nucleotide Polymorphism) have been routinely used to assess the genetic divergence among the genotypes as they are not influenced by environmental factors.
Multi-allelic nature and high polymorphism of SSR markers help to establish the relationship among the individuals even with less number of markers 36 . SSR markers are preferred as they are abundance in the genome, well-distributed throughout the genome, hyper-variable, multi-allelic and co-dominant nature, ease of assaying, highly reproducible and highly informative 37,38 . SSR markers are immensely valuable in studies of variation detection, diversity analysis, phylogeny, population structure, gene mapping and association studies 39,40 .
The knowledge of the extent of genetic variation, diversity and genetic relationships between genotypes of the crop is vital and foundation for developing an improved cultivar possessing high yield, good grain quality and adapted to various abiotic and biotic stresses situations. Knowledge of the genetic diversity of rice genotypes is useful for core collection development and effective conservation strategy. Thus in this present study attempt has been made to (a) estimate the extent of genetic diversity in indigenous landraces using both morphological traits (drought tolerance, root and shoot traits) as well as SSR markers genotypic data. (b) To study the genetic structure in a few rice germplasm from NE India.

Materials and Methods
plant materials. The experimental material for the present investigation comprised of 114 genotypes of rice (Table 1). Pure seeds of 114 genotypes were collected from Regional Agricultural Research Station (RARS), Titabar farm of Assam Agricultural University, Jorhat. These genotypes were directly sown in moist soil and evaluated in three replication for various root and shoot traits and drought tolerance using PVC pipes in a rain shelter at ICR (Instrumental cum Research) farm of Assam Agricultural University, Jorhat. Mean performance was recorded for root length, fresh root weight, dry root weight, root volume, root angle, bottom root number, peripheral root number, shoot length, dry shoot weight, fresh shoot weight, root to shoot ratio and ratio of deep rooting ( Table 1). The root angle, bottom root number and peripheral root number were measured using basket method 41 . The root length, shoot length, fresh root weight and shoot weight data of 45-day-old plants were recorded using PVC pipes following standard method 42 . Plants were sampled and dried at 80 °C for seven days and root dry weight and shoot dry weight was recorded. The seedling stage drought tolerance score was recorded separately by growing the genotypes in the seedbed of three-row of 2-meter length per genotype in three replication under rain protected condition following Swain et al. 43 . The observation for drought tolerance was recorded at 32 and 34 days after withdrawal of life-saving irrigation (DAWW) when the soil moisture content was around 7-8% (W/V) and susceptible check (IR64 and Ranjit) showed complete drought stress symptoms using "Standard Evaluation System for Rice (SES)" 44 . The Soil moisture status was determined by the gravimetric method 45 . SSR genotyping. DNA isolation and PCR amplification. A total of 140 SSR markers and 30 genes specific (root related traits and aquaporin) SSR markers covering all 12 chromosomes were used to assess the level of genetic diversity among 114 rice accessions. Finally, 65 markers including four gene-specific markers out of 170 SSR markers were chosen for genetic diversity analysis because they were found polymorphic and showed prominent distinguishable banding patterns among genotypes. The genetic sequence of genes related to root traits and drought tolerance factors were downloaded from http://rapdb.dna.affrc.go.jp/ and http://rice.plantbiology.msu.edu/. Gene-specific primers were designed using Primer 3 software. SSR marker sequences, annealing temperature and www.nature.com/scientificreports www.nature.com/scientificreports/ chromosomal locations are obtained from the GRAMENE database. The total genomic DNA from each of the genotypes included in the present study was extracted following the protocol of Plaskhe et al. 46 with slight modification. The quantity of genomic DNA was measured using a Nanodrop instrument. The final concentration of DNA was adjusted to 30 ng/μl for PCR reaction. The amplification conditions were based on the procedure of Panaud et al. 47 . The PCR reaction volume was 10 μl. The PCR reaction mixture of 10 μl consists of 0.4 mMdNTPs, 4 mM of MgCl2, 150 mM of Tris-HCl, 10 pmoles of forward and reverse primer and 0.05 U Taq polymerase with 30 ng of DNA. The reagents were mixed thoroughly and then placed in a Thermal Cycler (PCR Gene AMP ® 2400, Applied Biosystems, USA) for cyclic amplification using the amplification programme Step 1 (Initial denaturation) 94 °C for 5 min.
Gel electrophoresis, photography and allele scoring. Amplified products were separated based on their size using 3% agarose gel electrophoresis and 1x TBE buffer in the horizontal electrophoresis tank. The photograph of the gel was digitally documented in Gel Documentation System (UVP, UK). The molecular weight of distinct bands or amplified fragments was measured in base pair by comparing with the band size of 100 bp ladder (GeNeI Company) with IR-36 as molecular weight reference 48 .
Data analysis. The root traits and drought tolerance were subjected to analysis of variance (ANOVA), clustering based on the algorithm of unweighted pair group method with arithmetic mean (UPGMA) and Principal Component Analysis (PCA) using R packages 49 .
To identify the genetic structure of the given population and assign individuals to populations, the software STRUCTURE version 2.3.4 was used 50 . To derive the optimal number of groups (K), STRUCTURE was run with K varying from 1 to 10, with five runs for each K value. To determine the true value of K, ad hoc statistic ΔK was followed. Parameters were set to 1,00,000 burn-in periods and 5,00,000 Markov Chain Monte Carlo (MCMC) replications after burn-in with an admixture and allele frequencies correlated model. The method described by Evano et al. 51 was used to estimate the most probable K value for the analyzed data, using the web tool Structure Harvester ver. 0.6. application 52 .
The number of alleles (N), Ne (Allelic richness) Shannon information index (I), observed heterozygosity (Ho), expected heterozygosity (He), and fixation index (I)) were determined by using GenAlEx 6.502 programme 53 . A PIC value of each marker was determined as suggested by Botstein et al. 54 . Further, the allelic data were subjected to estimation of genetic distances among genotypes using simple matching coefficients by bootstrapping 1000 times and they were clustered using a neighbor-joining method using Darwin software version 6.0 55 . Further, analysis of molecular variance (AMOVA) was performed to describe variance components among individuals and the population differentiation among the seven assumed subpopulations using GeneAlEx 6.502 program 53 with 1000 permutations. Principal coordinate analysis (PCoA) was performed to highlight the resolving power of the ordination and the first two components were used to represent the genotypes in the graphical form. PCoA and dissimilarity matrix was performed by using DARwin software version 6.0 55 . Genetic differentiation among the assumed subpopulation was analysed using Nei's gene diversity statistics using GenAlEx program version 6.502. Venn diagram analysis was performed to identify common varieties between the model-based cluster and neighbor-joining based grouping using online interactive tool Venny 2.1 56 . A Mantel test was done to test similarities of distance and kinship matrices, with 30,000 permutations for a two-tailed Mantel test using R packages 57 .
Result phenotypic traits. Analysis of variance revealed significant differences among the genotypes for all traits measured in the study (Table 2). UPGMA based dendrogram analysis grouped the 114 rice accession into 7 clusters using fifteen roots and shoot traits. Clusters III, V, VI and VII comprised of 11, 3, 3 and 97 rice germplasm, respectively (Fig. 1). The cluster I composed a solitary genotype 'Inglongkiri' which was the best genotype among www.nature.com/scientificreports www.nature.com/scientificreports/ 114 rice germplasm considering several root traits together. The cluster II composed of 'Horin Kajuli' which showed drought tolerance with high shoot and fresh shoot weight. The cluster III genotypes have a relatively low root angle. The genotypes ' ARC10372' and ' As 313' were the best genotypes in this cluster for root and shoot traits. The cluster IV was composed of ' As 1913/1' which was found to be drought susceptible with the shallow and fibrous root system. The cluster V composed of ' As 138/2' , 'Bizor' and 'Bizor-II' genotypes which showed to be drought tolerant. Genotypes of cluster VI were dwarf genotypes with the shallow but narrow root system. The cluster VII was the largest group with 97 genotypes, among them ' As 314' , ' As 93/1' , 'Jahinga' , 'Kosamoni' and 'Raja Ahu' showed drought tolerance. These results revealed that considerable variation existed among genotypes for drought, root and shoot traits.
PCA analysis revealed that the first principal component with eigenvalue of 5.21 accounted for 40.13% of contribution to total variation and a second component with eigenvalue of 2.0 accounted for 15.38% of contribution to the total variation in the population. These two principal components include fresh shoot weight, root volume, dry shoot weight, fresh root weight, drought score after 32 and 34 DAWW (Fig. 2).

SSR polymorphism among rice varieties. The polymorphism information content (PIC) of markers
along with allele information such as number, size, highest frequency detected among 114 accessions is presented in Table 3. In the present diversity analysis, PIC value ranged from 0.005 for RM 87 to 0.802 for RM 418 with an average of 0.51 for all the genotypes under study ( Table 3). The primer RM 474 and RM 320 showed higher discriminatory power to distinguish genotypes due to its high PIC value 0.747 and 0.749 respectively. The primer RM 87 and RM 480 showed lower PIC value 0.005 suggesting less discriminatory power of this primer under study. PIC value of microsatellite marker higher than 0.5 is considered highly informative 54 . The highest resolving power (3.05) was observed for RM219 and lowest (0.32) for RM480 with an average of 1.37. The most major frequent allele frequency of 0.90 was observed for the marker RM480 and the lowest (0.35) was observed for RM24 with a mean of 0.61. population genetic diversity. The population-level genetic diversity of the rice accessions under study is presented in Table 4. Altogether 147 alleles were detected using 65 SSR markers, with an average of 2.26 alleles per locus, which indicated that genotypes of the present study were diverse. The number of alleles amplified varied from 2-4 and the highest number of alleles (4 alleles) were detected for RM 219 and RM 592. These two markers RM 219 and RM 592 showed the highest Nei's Gene diversity of 0.658 and 0.601 respectively. Observed heterozygosity (H o ) ranged from 0 to 0.725 (RM219) with an average of 0.036 across all 65 loci. The majority of the SSR markers exhibited observed heterozygosity as zero, indicating that the majority of rice germplasm used in the present study were pure and completely homozygous for SSR markers used in the present study, which may be the result of the self-pollinated mode of reproduction of rice. Observed heterozygosity (0.041) was far lower than total expected heterozygosity (0.467 ) which is further supported by low gene flow (Nm) value for the majority of loci, except RM87, RM431, RM333, RM 336 and RM495. The average value of Nm was recorded at 3.264. Expected heterozygosity or gene diversity (He) estimated based on Nei distance varied from 0.118 (RM480) to 0658 (RM219) with an average of 0.334.

Genetic relationship among the germplasm. Population structure analysis using the model-based
approach. In the present study grouping in population was determined using STRUCTURE analysis. The population structure of the 114 genotypes was analyzed by Bayesian clustering model-based approach with admixture and k value ranging from 1 to 10 with 5 iterations using 65 polymorphic markers. The ΔK was found highest for the model parameter K = 3 then for other value of K (Fig. 3) and the standard deviation was least at K = 3. Hence the true number of subpopulations were considered as three (P1, P2 and P3) (Fig. 4) which indicated that the whole population can be stratified into three subpopulations. Based on the sharing of genomic regions, genotypes in different populations were classified as pure or admixture. The accessions with the probability of ≥80% were considered as pure and assigned to corresponding subgroups while <than 80% were categorized as admixture (Fig. 4). Among 114 genotypes, 84 were pure and 30 rice accessions were admixture. Subpopulation P1 showed 21 pure (58.3%) and 15 admixed (41.6%) landraces, P2 had 52 pure (82.5%) and 11 (17.5%) admixed landraces, and P3 had 11 pure (73.3%) and 4 (26.7%) admixed individuals. No significant grouping was observed based on drought tolerance and roots and shoot trait data in the present study. However, the P1 population has 'Kosamoni' as a tolerant genotype having the highest root angle (64°) indicating a narrow rooting system. The best performing genotypes in the population (P1) were 'Kosamoni' for drought tolerance and 'Hafa Ahu' for root volume, fresh root weight and root length. The genotypes 'Banglami' , 'Bizor' , 'Bizor-2' , 'Horin Kajuli' and 'Raja Ahu' genotypes were the best genotypes of the P2 population for drought tolerance and high recovery rate. 'Horin Kajuli' was the www.nature.com/scientificreports www.nature.com/scientificreports/ best genotype of this group in terms of drought tolerance, root volume, fresh root weight and dry shoot weight. In this group, 'Boga Gajeb' , 'Gajef Sali-3' and 'Norin 18/Patnai 23' showed moderate drought tolerance and rest genotypes were susceptible to drought. As many as, 30 genotypes were observed as admixtures. Among the genotypes of admixtured origin, the genotypes 'Inglongkiri' , ' ARC 10372' , ' As 38/2' showed drought tolerance. Among all 114 genotypes, 'Inglongkiri' was also found to be the best genotypes for root volume, fresh root weight and dry shoot weight and root length. This indicated that different population identified using SSR markers also showed variation for different traits under study. These drought-tolerant genotypes can be used in drought tolerance improvement breeding programmes in rice.
F ST statistics were calculated using STRUCTURE software to estimate the level of population structure. The F ST values of 0.319, 0.332 and 0.570 for sub-populations P1, P2 and P3 with an average value of 0.407 indicated moderate population structure. In model-based analysis mean alpha value was 0.1022. The genetic differentiation among subpopulations was very high (average F ST = 0.407) based on classification given by Wright 58 . Among the subpopulations, the P3 was highly differentiated followed by the P2 and P1. This genetic differentiation might be a result of natural selection favouring a different set of alleles in different ecologies and physical barrier which   www.nature.com/scientificreports www.nature.com/scientificreports/ leads to low interchange or migration of alleles between different subpopulations. This is further supported by low gene flow (Nm) value for the majority of loci, except RM566 and RM495 SSR marker gene flow value. The average distance between individuals in P1, P2 and P3 was observed 0.358, 0.323 and 0.240 respectively, which indicated that P1 was the most diverse and less differentiated as compared to P2 and P3 subpopulation.
Pairwise F ST values of sub-population range from 0.152 to 0.222 and showed significant differentiation among all the pairs which suggested that all the three groups were significantly different from each other. Based on pairwise F ST estimate, P2 and P3 showed the highest level of differentiation from each other and population P1 and population P2 exhibited less differentiation from each other ( Table 5).
The genetic diversity at the subpopulation level was studied in terms of the mean number of alleles (Na), No of effective alleles (Ne), observed heterozygosity (Ho), gene diversity (He), unbiased expected heterozygosity (uHe) and Wright's fixation index (F), which is presented in Table 6.
The Na, F and Ne were comparable among the three subpopulations ( Table 6). The gene diversity was highest in P1. In all subpopulations, the mean expected heterozygosity was higher than mean observed heterozygosity. This was supported by the Mean fixation index of all subpopulations which was varied from 0.819 to 0.914.

Analysis of molecular variance (AMOVA)
. AMOVA was done on population provided by model-based analysis, because of its reliability and consistency to provide detail information about the genetic constitution of the population. AMOVA revealed the presence of 17% of the variation was among populations, whereas, 77% of the variation among individuals and 6% of the variation among individuals within a population (Table 7). AMOVA revealed that most of the variation in rice accession mainly occurred among individuals. Wright's F statistic (F ST ) was 0.174, while F IS and F IT were 0.93 and 0.94, respectively. Higher F IS , which is measured at the subgroup level in the whole population, has indicated a lack of heterozygosity and high distinctness of populations, due to the autogamous nature of the crop. Determination of F ST has shown high genetic variation among the population.
Nei genetic distance ranged from 0.190 to 0.333. The maximum distance was observed between Pop 3 and pop 1 (0.333) and minimum distance was observed between pop1 and pop 2 (0.190), indicating that genomic differences between pop 3 and pop1 were more and it was less between pop1 and pop2 (Table 8).
Neighbor-joining based clustering. An unweighted neighbor-joining tree, based on the alleles detected by 65 SSR markers, showed the genetic relationships among the 114 accessions. Cluster analysis based on the unweighted neighbor-joining clustering method separated the accessions into three main groups along with admixture genotypes spreading over three different clusters (Fig. 5), which showed similar results as model-based analysis. Cluster I was the largest cluster consist of 64 genotypes. Cluster II consisted of 26 genotypes followed by Cluster III having 24 genotypes. This pattern of clustering confirmed the existence of a significant amount of diversity. The genetic relationship results of the model-based analysis were compared with the unweighted neighbor-joining clustering method using the Venn diagram. Cluster-I generated through neighbor-joining unweighted analysis has 63.2% similarity of genotypes with subpopulation P1 generated through model-based analysis Fig. 6(A), whereas cluster-II showed 85.3% correspondence with subpopulation 2, Fig. 6(B). Similarly, cluster-III showed 65.2% of similarity of genotype sharing with subpopulation 3, Fig. 6(C).
Principal coordinate analysis. PCoA using SSR markers allelic data determines the genetic relatedness among the genotypes. The first three axes of differentiation explained 27.87% of the total variation. The first coordinate explained 11.63% of the variation and second coordinate explained 9.12% of the variation (Fig. 7). PCoA analysis, similarly, grouped the genotypes as that of model-based analysis.
The results of unweighted neighbor-joining clustering tree and PCoA analysis were in close correspondence to results of model-based STRUCTURE analysis which further confirmed the population STRUCTURE results. The histogram showing the sampling distribution of our 1000 randomly-produced Pearson correlations with the diamond symbol showed the location of our observed correlation (0.059) (Fig. 8). Mantel test P value (0.937) revealed a lack of correlation between genotypic and phenotypic distances.  Continued A close correspondence was recorded in the results of AMOVA and F ST analysis with that obtained from model based analysis, PCoA and unweighted neighbor-joining clustering. These revealed that the population under study has high genetic diversity and moderate population structure.  Table 4. Genetic diversity of 65 SSR markers in the 114 rice genotypes. Na = Number of alleles, Ne = Number of effective alleles, Ht = Total expected hetrogygosity, He = Gene Diversity, Ho = Observed Hetrogygosity, FIS = inbreeding coefficient, FIT = inbreeding coefficient to total, FST = Fixation index and Nm = Gene flow.  www.nature.com/scientificreports www.nature.com/scientificreports/

Discussion
Analysis of genetic relationship among individual is an important component and play a major role in their effective utilization in the crop improvement programme. Genetic diversity study provides knowledge about the level of genetic diversity and genetic structure of the population and serve as a platform for the selection of superior genotypes to be used as parents in crop improvement breeding programmes. The popularization of few improved varieties in a crop like rice among the farming community leads to the narrow genetic base of crop species which results in high susceptibility of crops to various abiotic and biotic stress damages 59,60 . Therefore the inclusion of diverse valuable genetic base in breeding programmes can play a key role in the improvement of the degree of tolerance against various abiotic and biotic stress damages. The present investigation was aimed to throw some light on genetic variation for root traits and genetic diversity in North East Indian rice genotypes for their effective utilization in the breeding programme.
Dendrogram analysis based on root, shoot and drought tolerance traits revealed that the clustering pattern obtained is determined by mainly fresh shoot weight, root volume, dry shoot weight, fresh root weight and drought score traits. Cluster I genotype was characterized by the highest root volume, fresh root weight and dry root weight. Cluster II genotype is characterized by the highest fresh shoot weight and drought tolerance. Cluster IV genotype was characterized by drought susceptibility, root length and bottom root number. Cluster V is characterized by high drought tolerance and cluster 6 is characterized by low value for fresh shoot weight, dry shoot weight, root volume and fresh root weight. Cluster VII is characterized by drought susceptibility and intermediate value for root volume, fresh root weight, fresh shoot weight and dry shoot weight. The PCA analysis revealed that maximum diversity in a population of 114 rice germplasm was governed by fresh shoot weight, root volume, dry shoot weight, fresh root weight, drought score traits. This study revealed sufficient diversity and genotypes identified to be superior for one or more traits from different clusters might be useful in the hybridization programme to identify desirable segregants for the traits under study.    www.nature.com/scientificreports www.nature.com/scientificreports/ Genetic diversity refers to the presence of contrasting alleles of a gene in different individuals of the same species 61,62 . Diversity analysis using molecular markers is advantageous over the conventional approach based on phenotypic data, as molecular markers provide true information at a genetic level without the influence of environmental effects and provide information about the genetic constitution of genotypes such as which genomic regions or alleles have come from which population 63 . Genetic diversity study provides not only the phylogenetic relationship but also provide a chance of a finding a new and useful novel alleles present in a diverse set of accessions 64 .
Polymorphism information content (PIC) indicates the informativeness of a marker and allelic diversity of the population. PIC value of 1 indicates that marker is highly polymorphic, and would have an infinite number of alleles, and the marker is more informative, suggesting higher discriminatory power of marker 65 . SSR polymorphism analysis revealed an average PIC of 0.507 for 65 markers, which reflected the better discriminatory power of these markers to reveal the higher level of genetic diversity among genotypes and indicated the diverse nature of accessions under study. Similar results for average PIC were reported by Das et al. 66 in the landraces of northeast India. Behera et al. 67 reported a higher average PIC of 0.811 per locus, this might be due to the use of a more diverse set of rice accession in their study or due to the use of highly polymorphic markers. Shah et al. 68 and    Choudhary et al. 75 reported a higher number of alleles (3.69) per locus with a range of 3 to 7 alleles in 100 major rice cultivars using 52 hyper-variable SSR markers. Das et al. 66 also reported a higher number of alleles ranging from 2-11, with an average of 4.91 alleles per locus in Northeast rice germplasm. Edzesi et al. 76 reported average 11.3 alleles per locus in China and Vietnam rice accession. Roy et al. 77 also detected 2-21 alleles with an average of 8.49 alleles in rice accession comprised of Arunachal Pradesh landraces, basmati rice, local aromatic rice of Meghalaya, aus rice, japonica rice, indica rice. The discrepancy in the number of alleles detected might be due to the use of highly diverse genetic material and polymorphic DNA markers in their study.
Gene diversity similar to the present study has been reported by Chen et al. 78 . Ananadan et al. 40 also reported 0.30 gene diversity in 426 ARC accessions, 25 tropical japonica, 57indica landraces, 127 breeding lines. Roy et al. 79 reported 0.66 gene diversity in 26 rice accession of Arunachal Pradesh. Genotyping of 409 Asian rice accessions collected from 79 countries of the world using SSR markers revealed the genetic diversity of 0.68, which is higher than the present study observation 80 . Most of the diversity panel with global accessions have the gene diversity of 0.45 to 0.7 81,82 . These results on global accessions involving indica, tropical japonica, temperate japonica and wild relatives help to infer that, the diversity in a panel of 114 rice accession collected from North East India represents a considerable proportion of the genetic diversity that exists in the major rice-growing Asian continent. In the present study Nm value for some markers recorded more than 1, which is considered to be high as Nm value in self-pollinated crops lies below one 83 . The inclusion of sister lines or same landraces with different names or little amount of cross-pollination as farmers some time grows crops vary adjacent to each other might attributed to high Nm value.
Population structure analysis using model-based analysis is better than the frequentist approach of clustering since model-based clustering is based on Bayesian methods, in which certain parameters like correlated allele frequencies no-prior population information were used as defined in STRUCTURE software. Distance-based methods are usually easy to apply and are often visually appealing such as neighbor-joining to cluster multi-locus genotype data, but difficult to assess how confident one should be that the clusters obtained in this way are meaningful 84 .
The model-based Bayesian clustering algorithm approach assigns individuals to subpopulations. Among 114 genotypes 84 genotypes were pure and 30 rice accessions were admixture. The admixture was observed because of gene flow among genotypes through a small amount of natural cross-pollination as farmers grow rice crops adjacent to each other fields. The relatively small value of alpha (α = 0.1022) in the present study reveals that only a few individuals were admixed 72,85 . The assigning of genotypes to the subpopulation is based on the ancestry threshold of the individual. In the present study threshold of 80% was used for grouping which is more stringent and result in 30 genotypes as an admixture. A similar ancestral threshold of 80% to categorize an individual to a particular subpopulation was used by Salgotra et al. 59 , and Travis et al. 6 . However, if the ancestral threshold of 60% has been followed to categorize an individual according to Liakat Ali et al. 80 , then only 15 individuals have been categorized as an admixture.
In the present study, population structure analysis grouped 114 genotypes into three subpopulations named as P1, P2 and P3. Roy et al. 79 also reported three number of a subpopulation in North East rice collection. Singh et al. 72 reported three numbers of groups in Indian rice. Islam et al. 73 reported three clusters in aromatic rice accessions. Das et al. 66 reported four groups in a 91 rice landraces from Eastern and Northeastern India. Edzesi et al. 76 reported seven subgroups in 628 rice accessions from China (507) and Vietnam (121) because of the big population and highly diverse population. Rathi et al. 86 have reported ten subgroups in a population of 100 indica rice using 98 SSR markers. Anandan et al. 40 grouped 96 rice accessions involving 70 ARC into two groups due to less number of genotypes and low diversity among individuals. In the present study grouping, pattern or number was different from other studies might be attributed to the use of different marker system and a different set of genotypes. In the present study, Banglami, 'Rongadoria' and 'Kola Ahu' was grouped in different subpopulations and 'Inglongkiri' and 'Mazubiron' was identified as admixture while Travis et al. 6 reported 'Banglami' , 'BogaAhu' and 'Kola Ahu' in Aus-1 and 'Inglongkiri' and 'Mazubiron' in Aus-2. Such differences might be attributed to use of fewer markers in the present study.
The P1 comprised of genotypes like 'Banglami' , 'Bizor' , 'Bizor-2' , 'Horin Kajuli' and 'Raja Ahu' showing drought tolerance and high recovery rate. 'Horin Kajuli' was the best genotype of this group in terms of drought tolerance, root volume, fresh root weight and dry shoot weight. The best performing genotypes in P2 were 'Kosamoni' for drought tolerance and Hafa Ahu for root volume, fresh root weight and root length. The P3 consisted of 15 genotypes (Fig. 4), among which 'Boga Gajeb' , 'Gajef Sali-3' and 'Norin 18/Patnai 23' showed moderate drought tolerance and rest genotypes were susceptible to drought. Among the genotypes of admixtured origin, 28% genotypes were drought-tolerant and others were identified as drought susceptible. For example, 'Inglongkiri' , ' ARC 10372' , ' As 38/2' showed drought tolerance. Among all 114 genotypes, Inglongkiri was the best genotypes for root volume, fresh root weight and dry shoot weight and root length. These indicated that different population identified using SSR markers also showed variation for different traits under study.
Results of AMOVA indicated that there was a higher proportion of variation among individuals and a lower proportion of variation among populations. Similar to this Jasim Aljumaili et al. 87 , Islam et al. 39 , Salgotra et al. 59 and Singh et al. 72 also reported a higher proportion of variation among individuals in the rice population. All the genotypes of the present study were collected from different regions of Assam covering lowland (Sali rice) and varied upland (Ahu rice) situation, which results in higher variation among individuals than among populations. Within individuals 6% variance was observed, it indicated the high purity of germplasm and has been maintained carefully without any mixture. A very high F IT value has indicated a lack of heterozygosity most likely due to the www.nature.com/scientificreports www.nature.com/scientificreports/ inbreeding nature of rice (Nachimuthu et al. 74 ). The F ST inbreeding coefficient within subpopulations relative to the total provides a measure of the genetic differentiation between subpopulations 88 . The determination of F ST using structure analysis for the subpopulation of the present study was 0.407 which indicated high differentiation between subpopulation because genotypes were collected from a wide range of ecology and topography. Wright 58 proposed that values of F ST 0.25 explain a very great differentiation between subpopulations; the range of 0.15 to 0.25 indicates moderate differentiation; while differentiation is not negligible if F ST is 0.05 or less.
The PCoA analysis showed large genetic diversity and distinctness of populations, the first two principal coordinates explained 11.63 and 9.12% of the variance. A similar pattern of molecular variance was reported by Nachimuthu et al. 74 .
In all genotypic based clustering patterns of the present study, admixtures were distributed over subpopulation. Groupings of genotypes obtained through model-based analysis, unweighted neighbor-joining clustering and PCoA were incongruent to a large extent indicating real genetic differences among the genotypes under study at DNA level and perfectness of clustering of genotypes. Venn diagram analysis also showed more than 62% of co-linearity between rice germplasm grouping in neighbor-joining clustering and model-based population structure. A similar finding was also reported by Singh et al. 72 . Observed minor differences in a grouping of genotypes could be attributed to the difference in their methodology of grouping, as model-based analysis grouping is based on the Bayesian model approach whereas unweighted neighbor-joining clustering is distance-based approach in which genotypes with admixtures were not considered. The result of model-based analysis was more productive as it provides detail information about the genetic constitution of genotypes which help in the separation of admixture from pure genotypes. The result of Model-based analysis is robust for small population size 89 . Grouping or clustering of genotypes helps in identification of diverse parents to be used in the hybridization programme to create segregating progenies with maximum genetic variability for further selection 90 . Therefore the selection of genotypes from different population complimenting for different root traits and drought tolerance help in generating transgressive segregates for root traits and drought tolerance.
No correlation was observed between genotypic and phenotyping based clustering. which is obvious, because 61 SSR markers used in the present study were random SSR markers, not EST markers. SSR markers are present in both non-coding and coding region of genome 38,91,92 , but majorities of SSR markers are present in introns and might not affect the trait of interest directly 93 . Only a few of the SSR markers might be linked to genomic regions which influence the trait of interest. Phenotype is the resultant of genotype and environment effect. At molecular level phenotype is the outcome of many gene expression and interaction. Diversity based on phenotypic traits is influenced by environments [33][34][35] . Another reason could be that only a few markers may be linked with root, shoot and drought tolerance, therefore little variation is detected by the SSR markers in the present study. Farmers have selected land-races based on yield attributes and unknowingly they have selected for drought tolerance and better root system in upland rice germplasm of northeast India. Morphological traits were subjected to farmer's selection pressure while SSR (DNA) markers were not the target of selection 94 . These reasons might leads to high chances of no correlation between genotypic and phenotypic based clustering. Silva et al. 95 also reported a low correlation (0.35) between genotypic and phenotypic diversity in sorghum. Fufa et al. 96 also observed a low correlation between genotypic based clustering and yield attributes based clustering in red winter wheat. Nazaphy et al. 97 reported a low correlation of 0.049 between SSR marker-based clustering and phenotypic based clustering. According to Martinez et al. 98 correspondence between molecular and phenotypic based diversity might be improved by analysing more numbers of morphological and DNA markers. Silva et al. 95 stated that the poor correlation observed between the molecular and the phenotypic diversity matrices highlights the complementarity between the molecular and the phenotypic characterization to assist a breeding program. Hence, the combined use of phenotypic and molecular data is regarded as the best way to identify divergence among genotypes due to their complementary nature.
The present study revealed a high level of genetic diversity among the accession at DNA level, root phenotype and drought tolerance as the landraces of NE India are cultivated by farming community historically for many years in a diversified ecological niches such as near river basin, hills, hill slopes and plains, etc.

conclusion
In this present study, SSR based diversity analysis confirmed the existence of genetic diversity in a population of 114 rice genotypes. Based on various statistical methods, we identified three subpopulations along with 30 admixtures. These three subpopulations were highly differentiated from each other. The majority of variation was observed among individuals. The gene and allele based diversity analysis have indicated the existence of a broad genetic base in this population. The result of the model-based analysis is in close correspondence with the results of the neighbor-joining clustering and PCoA analysis. Thus, the results of this study indicate the scope for utilizing the genetic diversity results in association mapping analysis and selection of diversified genotypes for the development of variety from diversified groups complementing each other for various economical traits.