Morphological and molecular dissection of wild rices from eastern India suggests distinct speciation between O. rufipogon and O. nivara populations

The inter relationships between the two progenitors is interesting as both wild relatives are known to be the great untapped gene reservoirs. The debate continues on granting a separate species status to Oryza nivara. The present study was conducted on populations of Oryza rufipogon and Oryza nivara from Eastern India employing morphological and molecular characteristics. The cluster analysis of the data on morphological traits could clearly classify the two wild forms into two separate discrete groups without any overlaps i.e. lack of intermediate forms, suggesting the non-sympatric existence of the wild forms. Amplification of hyper variable regions of the genome could reveal 144 alleles suggesting high genetic diversity values (average He = 0.566). Moreover, with 42.37% of uncommon alleles between the two wild relatives, the molecular variance analysis (AMOVA) could detect only 21% of total variation (p < 0.001) among them and rest 59% was within them. The population structure analysis clearly classified these two wild populations into two distinct sub-populations (K = 2) without any overlaps i.e. lack of intermediate forms, suggesting the non-sympatric existence of the wild forms. Clear differentiation into two distinct groups indicates that O. rufipogon and O. nivara could be treated as two different species.


Results
). In the O. nivara population no variation was detected for panicle: awns (present in all accessions), leaf: shape of ligule (split) and leaf: auricles (present), all being monomorphic. Similarly, the characters like panicle: awns (present), panicle: curvature of main axis (straight), panicle: secondary branching (weak), leaf: shape of ligule (split) and leaf: auricles (present) could not identify any differ-  Table S2 and Fig. 2). O. rufipogon accessions were medium slender to long slender for their grain characteristics (grain length: 7.08-9.26; grain breadth: 1.76-2.94), but some short bold and short medium grain types (grain length: 4.27-9.46; grain breadth: 1.99-3. 25 (Table 1). RM8020 amplified a highest of 7 alleles, followed by 6 alleles with RM495, RM3866, RM16649, RM336 and RM72. The loci, RM6378, RM422, RM413 and RM3472 could amplify 5 alleles each, while RM423, RM5780, RM3392, RM163, RM528, RM510, RM253, RM547, RM215, RM447, RM8207, RM206, RM5918 and RM2529 showed 4 alleles in the entire set of wild accessions. One locus i.e. RM261 could detect only one allele which differentiated the accessions for its presence or absence. The number of effective alleles for the tested loci ranged from 1.000 to 4.331 for the markers RM261 and RM8020, respectively which was directly related to number of alleles amplified by the markers. Effective allele is the number of alleles amplified with equal frequencies in the entire population. PIC value of a marker determines the effectiveness of diagnostic power for that particular marker in the given set of genotype. In our study, the PIC value for the 36 SSRs varied from 0.428 (RM261) to 0.982 (RM16649). Of the 36 loci, 17 could detect a high PIC value of more than 0.900 and the average PIC value calculated with all the markers used in this study was 0.852. The markers could clearly reveal that expected homozygocity (Ho) of any given loci is inversely proportional to its heterozygocity/genetic diversity (He). RM8020 showed a highest genetic diversity (He) of 0.769 followed by RM16649 (He = 0.746). Similarly, RM244 showed lowest genetic diversity (He = 0.080). However, RM261 distinguishing the accessions with only one allele, could not generate any genetic diversity for the tested set of accessions. The average genetic diversity value detected with the markers was 0.566 and 7 of the loci were detected with a genetic diversity value of more than 0.700. Similar trend was detected for Shannon's genetic diversity index (I) for all the 36 markers.
The molecular markers were categorized based on their repeat motif. Out of 36 total loci, 17 were with di-nucleotide repeat motif, 13 having tri-nucleotide and 2 having tetra-nucleotide. Further, 4 loci had multi-nucleotide repeat motif. The average number of alleles was highest (4.750) for markers having multi-nucleotide repeat motif followed by tri-nucleotide (4.077), di-nucleotide (3.882) and tetra-nucleotide (3.000) ( Table 2). However, the average PIC value was highest with tetra-nucleotide repeat motif and was lowest in case of tri-nucleotide repeat motif. But, the genetic diversity value (He) was 0.586 (highest) for di-nucleotide repeat motif followed by 0.566 (multi-nucleotide repeat motif), 0.544 (tri-nucleotide repeat motif) and 0.536 (lowest) for tetra nucleotide repeat motif.
Rare and unique alleles. The alleles which were amplified in less than 5% of the entire population for any given loci were considered as rare alleles 31 Table S3). Another marker, RM2529 also amplified 4 rare alleles i.e. 150 bp (8 accessions), 180 bp (5 accessions), 200 bp (5 accessions) and 210 bp (2 accessions). Majority of the rare alleles were observed in O. rufipogon accessions (cumulative n = 116) than in O. nivara accessions (cumulative n = 71). Of these, some of the accessions were detected with more than one rare allele for more than one loci. For instance, AC100141 (O. rufipogon) showed two rare alleles of 120 bp for Unique allele addresses the potentiality of any given loci to detect an accessions with any distinct amplicon that is different from the entire population. In our study, a total of 5 unique alleles were identified with the tested set of loci. Interestingly, one O. nivara accession, AC100313, was detected with the maximum of 3 unique alleles of 250 bp (RM423), 280 bp (RM422) and 500 bp (RM3392) (Supplementary Table S4). Further, another O. nivara accession (AC100016) showed uniqueness for RM510 with an amplicon of 400 bp and O. rufipogon accession (AC100485) was detected to amplify a unique allele for RM3866 (450 bp).   Further, the analysis of molecular variance (AMOVA) was done to reveal the total existing variation within and between populations (both species and geographical region wise). When we grouped the total O. rufipogon population and total O. nivaraO. nivara population separately, regardless their collection states, 21% of the total variation was present among population while 79% was within population (Table 4). Secondly, in both the cases for O. rufipogon and O. nivara collected from two different states, the AMOVA could identify only 2% of variation among population and 98% within population. The neighbour-joining dendrogram based on pairwise genetic distance measure between any two pair was constructed to understand the genetic relationship among the accessions at molecular level. The dendrogram could clearly separate O. rufipogon and O. nivara accessions in to two distinct clusters. Further, the O. rufipogon accessions were observed to be closely related to each other with 4 major sub-clusters and some minor sub-clusters. But, the O. nivara accessions were distantly associated generating several sub-cluster (Fig. 3A). The O. rufipogon accessions were grouped together having less distance from each other as compared to the O. nivara accessions. Two distinct sub-clusters, both in O. rufipogon and O. nivara clusters were detected.
The population structure analysis revealed highest value of 959.28 at K = 2, suggesting that the entire collection could be divided into two distinct sub-populations i.e. SP1 and SP2 (Fig. 3B) Table S6). Similarly, the number of effective allele ranged from 2.697 for Bardhhaman to 0.028 for Mayurbhanj. The locally common allele for a particular district with a frequency of >25% was highest (0.389) for Bankura district, whereas it was lowest (0.028) for Mayurbhanj. However, the accessions collected from Bardhaman district showed highest diversity of 0.569 followed by Midnapore (0.548) and Mayurbhanj district was lowest counterpart with a diversity value of 0.349 for  its wild accessions. The analysis of molecular variance revealed that there existed 9% of the total variation among the districts and rest 91% were within them (Supplementary Table S7). The entire collection of wild accessions i.e. O. rufipogon and O. nivara were grouped based on their districts of collection from both the states to understand the genetic relationship among different districts. The neighbour-joining dendrogram constructed based on pair-wise genetic distance between districts grouped the 26 districts into two major clusters. The districts like Boudh, Anugul and Dhenkanal present in the middle part of Odisha were separated from rest of the districts and were grouped in a different cluster (Fig. 4). Critical observations showed that most of districts which share the adjacent geographical areaswere grouped together. For example, close relationship was observed between Malda, South-Dinajpur and North-Dinajpur of West Bengal, Puri and Ganjam of Odisha, Koraput and Malkangiri of Odisha. Further, the districts like Sundargarh, Keonjhar and Mayurbhanj of Odisha were closely associated with their nearby West Bengal districts like 24Parganas, Midnapore and Bankura were grouped together.

Discussion
The aim of the present study was to harness the potential of morphological and molecular variation available within and between two wild rice populations i.e. O. rufipogon and O. nivara collected from West Bengal and Odisha, two major rice growing states of eastern India. In these states, in general, the farmers are marginal having small land holdings and employ traditional practices. Rice cultivation in these areas is quite challenging as irrigated area is less and proportion of rainfed rice is high. Under this scenario, biotic and abiotic stresses are of common occurrence as most of the area is classified as uplands or low lands. However, the potential to enhance productivity and production in eastern India is high as a number of wild rice accessions are still available in these states and thorough characterization of them could lead to identification of suitable material for rice improvement. Since both O. rufipogon and O. nivara are closely related and also sharea close relationship with the Asian cultivated rice (O.sativa), the characterization of the diversity in them could lead to understand their dynamic relationship. At morphological level, as expected, a high level of variation was detected in both O. rufipogon and O. nivara accessions. Awn, a specificfeature of wild forms, wasdetected in all the accessions. Interestingly, high variation (4 alternative forms for each trait) was detected for traits like basal leaf sheath colour, leaf angle, culm attitude, internode colour and flag leaf attitude in both wild forms but the frequencies varied between the species. For example, in O. nivara, the green basal leaf sheath was predominant (52%), while majority (47%) of the O. rufipogon accessions were having purple basal leaf sheath. The dendrogram constructed based on the morphological traits clearly demarcates O. rufipogon and O. nivara into two distinct classes.
The wide variation observed in O. rufipogon and O. nivara for different agronomic traits provide us signs related to their ecological adaptation. While majority of the O. rufipogon and O. nivara accessions are of late duration, the medium duration accessions are of interest as they can lead us to understand the mechanisms involved in tolerance to drought and photo period sensitivity and the underlying genes involved. Earlier studies reported, fine type of grain in O. rufipogon and bold type in O. nivara 32 . In the accessions studied, variation for grain type was observed in both wild forms with medium slender and long slender grains observed in O. rufipogon while majority of the accessions of O. nivara are either short bold or short medium type. All the accessions under study are with weak culm and one of the major drawbacks in wild rices is culm thickness and for this reason, wild rices are of spreading type.
Marker technologies can dissect variation efficiently at molecular level and the 36 highly variable SSR markers employed in the study could amplify a total of 144 alleles, a value much higher than the number of alleles detected in wild rice population of Uttar Pradesh and Bihar of India 33 where a total of 106 alleles were detected. Further, Singh and his colleagues 24 could detect an average of 2.4 alleles per locus with 25 SSR markers in wild rice collection of Indo-Gangetic Plains of India whereas, we could detect an average of 4 alleles per locus. Instead of using random SSRs, targeting the hyper variable regions of rice chromosome to exploit their genetic diversity would have an edge over the random SSRs. Except one locus i.e. RM261, high allelic variationwas seen with 35 markers used and from the results it can be suggested that the present set of markers used in this study can serve as a reference set for genetic diversity studies in rice. This observation was further justified as these markers showed high resolving power with PIC values ranging from 0.428 for RM261 to 0.982 for RM16649 and 17 out of 36 loci had a PIC value of more than 0.900. The average PIC value in our study (0.852) was higher than the value reported earlier in wild rice of Indo-Gangetic plains of India (0.790) 24 , wild rice collection of UP and Bihar of India (0.5247) 32 and Indian rice mini-core collection (0.671) 34 .
As expected, the diversity detected in O. rufipogon and O. nivara accessions of eastern India (average He = 0.577) was higher than the diversity reported in Indian rice hybrids (He = 0.432) 35 , Thai rice lines (He = 0.436) 36 and rice germplasm of DPRKorea (He = 0.340) 37 . Further, this set of wild accessions were more diverse than the wild rice collections of eastern Indo-Gangetic plains 24 , while it was quite comparable with the wild rice collection from UP and Bihar of India 33 . As the Sundarbans delta of West Bengal and western part of Odisha are considered as two major biodiversity hot spotsfor different flora and fauna, the wild rice accessions of these regions and nearby areas are expected to display higher amount of genetic diversity 38,39 . Further, predominance of tribal communities who still prefer traditional land races and traditional farming practices in these two states (Odisha and West Bengal) as compared to other states could be a reason for low level of genetic erosion in these areas.
Interestingly, more than 20% of the total amplified alleles were identified as rare alleles. This observation confirms the earlier reports that wild rices could be preferred over cultivated varieties/landraces for mining valuable alleles for improvement of rice. The rare alleles may be associated with special traits which could be highly valuable in breeding programmes after their validation. In our study, of the five unique alleles detected, three were detected in a single O. nivara accession i.e. AC100313. Since O. nivara it the closer to O.sativa, genome sequencing of this particular O. nivara accession could unravel its genetic constitution. Different QTLs associated with drought tolerance were mapped on rice chromosome 2 and 3 and since O. nivara is adapted to upland conditions, the unique alleles identified on these chromosomes are of great interest.
Of the thirty six locitested, two loci could differentiate O. rufipogon from O. nivara effectively with distinct allelic variation between the two populations. The allele sharing between O. rufipogon and O. nivara was detected to be high (57.64%), revealing close genetic association between them. Close relationship between these wild forms of eastern India might be due to a common ancestry and/or regular gene flow between them. This observation is similar to the reports of Banaticla-Hilarioand colleagues 40 who also reported more than 50% of allele sharing between O. rufipogon and O. nivara. As expected, O. rufipogon accessions were more diverse (He = 0.512) as compared to O. nivara (He = 0.484) 40 . The higher level of gene flow (Nm = 0.068) detected in O. rufipogon population can be attributed to its higher out pollination rates.
Though there was extensive allele sharing, species differentiation (21% variation between populations of O. rufipogon and O. nivara) detected in the study is similar to the results on the collections from Indo-Gangetic plains of India 24 and Vientiane region of Laos 23 . However, we could not detect any regional differences between the collections of West Bengal and Odisha which could be due to their close geographical location or due to diversification at a place and subsequent spread to the other region. The Bayesian model based clustering showed clear molecular divergence of O. rufipogon and O. nivara into two separate and distinct populations at K = 2 23,40 . Similar to nuclear genome variation 25,41 , organelle level variation i.e. chloroplast, mitochondria, between O. rufipogon and O. nivara was reported earlier 42,43 . Another interesting observation in our study was the absence of admixtures in the entire population which is in contrast to the report of Vaughan and his group 44 who reported abundant presence of introgressed/intermediate forms in the Asian gene pool. The availability of true to type O. rufipogon and O. nivara accessions will be of immense value to the future rice improvement programs. However, further studies are needed to ascertain the allelic similarity/diversity between the wild and cultivated forms.
One of the basic questions is how to treat O. rufipogon and O. nivara as separate groups on the basis of variation within and between the populations. Rather than analyzing the sub-populations (already identified at K = 2) separately, manually increasing the K value of the previous analysis was preferred 45 . Interestingly, at K = 3, the O. rufipogon accessions collected mostly from North-western regions of Odisha, formed an independent group and at K = 4, the O. nivara accessions from 24 Parganas and its adjacent districts of West Bengal formed a separate group. From the neighbour-joining dendrogram of different districts, it can be observed that the collections from adjoining districts that a share a common boundary, were grouped together. Moreover, a close association was observed between a group of districts of West Bengal comprising 24 Parganas and its nearby districts and the North-western districts of Odisha suggesting similarity over a broad geographical area. In Odisha and West Bengal, rice is generally grown under rainfed conditions and the region has both uplands and low lands, the habitats suitable for the growth of O. nivara and O. rufipogon respectively. Zheng  But, in our case, where the populationsare non-sympatric, we could detect 21% variation among species, thus clearly demonstrating the level of differentiation was non-sympatric populations than in their sympatric counterparts. This result provides some substance to the first domestication events and suggest thepossibility of a common ancestor for these two wild ancestors of rice.
Origin of O. nivara has always been a debatable issue since 1960s and it was considered as a distinct species 7,10 while it was treated as a habitat shift related speciation from O. rufipogon 4 . The observation of highly diverse accessions in both O. nivara and O. rufipogon, with little sharing of common alleles among them indicates that these two forms can be treated as diverse forms. Normally, description and differentiation between O. rufipogon and O. nivara has been focused with sympatric population of different locations in earlier studies. However, population genetic structure of these two wild relatives in a non-sympatric population has not been exploited much. From the results obtained in the present study where genetic variation between O. rufipogon and O. nivara, a non-sympatric population of eastern India was focused, it is interesting to differentiate the two groups in separate cluster at minimal model based analysis. Though there is evidence for conservation of some of the morphological traits and sharing of allelic uniqueness between the two groups, existence of them in a non-sympatric manner and their clear differentiation in alternative groups provides insights to the idea that O. rufipogon and O. nivara could be treated as different species. This is possible either through (i) accessions of both could have been evolved separately or (ii) one could have evolved from the other and fixation of genetic constituent thereafter according to agro-ecological condition leading to some of the intermediate and alternative forms due to the ecological shift to form an annual that lack in photosensitivity trait and with a curtailed duration.  Table S8 and Fig. 5). Further, these collected wild rice accessions were categorized according to their districts of collection of both the states. All the accessions were grown in individual pots in net-houses of National Rice Research Institute, Cuttack, Odisha during 2014-16 under controlled environmental conditions. Twenty-two phenotypic characters 46 (Supplementary Table S1) were measured at different growth stages following the guidelines of International Rice Research Institute for two consecutive years i.e. 2015-2016. Further, data was recorded on agronomic traits like heading duration, culm length, culm diameter, culm number, leaf length, leaf width, ligule length, panicle length, grain length, grain breadth, L/B ratio and 100 grain weight were recorded for each accession 47 .

Methods
Isolation of genomic DNA, PCR assay and genotyping. Fresh leaf samples were harvested from one-month young seedlings and genomic DNA was extracted from each wild rice accessions (both O. rufipogon and O. nivara) following a modified cetyltrimethylammonium bromide extraction protocol 48 . Thirty-six SSR loci amplifying the hyper variable regions 49 , three from each of 12 chromosomes of rice were selected for the genetic diversity analyses. Details of the SSR loci used in the present study are given in Supplementary  Table S9 (depending on TM value of primer) at 50-60 °C for 45 sec, extension at 72 °C for 1 min and a final extension of 7 min at 72 °C. The final PCR products were resolved in 2.5% ethidium bromide stained (1 µg/ml) agarose gel. The separated PCR products were visualized under UV light and photographed using Typhoon FLA 7000 fluorescent image analyzer (GE Healthcare Bio-Sciences AB, Uppasala, Sweden). The size of each intense amplified fragment for all SSR loci was determined by comparison with the size standard (100 bp DNA ladder) and scored by incremental numbering from the lowest molecular weight band to the higher molecular weight bands and the genotype matrix was prepared. Data analysis. The amplified bands/alleles were scored as present (1) or absent (0) for each genotype and primer combination. The data were entered into a binary matrix and subsequently analyzed using different computer software packages. An allele that was observed in <5% was considered to be rare allele.The polymorphism information content (PIC) for each SSR marker locus was calculated using the formula: = − ∑ = Pij PICi 1 ( ) j n 1 2 , where n is the number of marker alleles for marker i and P ij is the frequency of the j th allele of marker I 50 . Genetic diversity parameters viz., number of alleles (Na), effective number of alleles (Ne), expected homozygosity (Ho), Nei's genetic diversity index/expected heterozygosity (He) 51 and Shannon Index (I) were evaluated using POPGENE v 1.32 (http://www.ualberta.ca/fyeh) with 1,000 permutations. The dendrogram based on unbiased genetic distances among genotypes was constructed by un-weighted neighbour joining tree employing MEGA 6 52 . Principal coordinate analyses (PCoA) of the accessions with 36 SSR loci were performed based on the simple matching coefficient using the dcenter and eigenvector matrices in the software GeneALEx6 53 with 1,000 random permutations. The Bayesian model-based clustering analysis of the genotypes was used for determining the optimal number of genetic clusters found among rice varieties using the STRUCTURE software 54 with 1,00,000 burn-in (iteration) periods and 1,00,000 Markov Chain Monte Carlo (MCMC) replicates with ten independent runs (K) ranging from 1 to 10. The ΔK based on the change in the log probability of the data between successive K values 55 was estimated using the software program Structure Harvester v6.0 56 and population clusters were produced by the software Structure Plot 57 (http://btismysore.in/strplot). Moreover, genotypes were further grouped based on their collection by geographical location (districts) and the genetic diversity parameters of the genotypes within each district were determined using POPGENE v 1.32. The genetic variation within and among the populations was calculated by the procedure of AMOVA (Analysis of Molecular Variance) using the software GeneALEx6 53 .