Draft genome sequence of first monocot-halophytic species Oryza coarctata reveals stress-specific genes

Oryza coarctata (KKLL; 2n = 4x = 48, 665 Mb), also known as Porteresia coarctata, is an extreme halophyte of the genus Oryza. Using Illumina and Nanopore reads, we assembled a genome of 569.9 Mb, accounting for 85.69% of the estimated genome size, with an N50 of 1.85 Mb and 19.89% repetitive content. We also found 230,968 simple sequence repeats (SSRs) and 5,512 non-coding RNAs (ncRNAs). Functional annotation of the 33,627 predicted protein-coding genes and 4,916 transcription factors revealed that the high salinity adaptation of this species is due to the exclusive or excessive presence of stress-specific genes compared with rice. We identified 8 homologs of the salt-tolerance gene SOS1, one of the three main components of the salt overly sensitive (SOS) signalling pathway. In addition, phylogenetic analysis of the assembled chloroplast (134.75 kb) and mitochondrial (491.06 kb) genomes supports the conservative nature of these organelle genomes within the Oryza taxon.


Materials and Methods
Genome size estimation. The plant was collected from the coastal region of the Sundarban delta of West Bengal, India (21°36′N, 88°15′E) and established in a net house through clonal propagation. To estimate the genome size, we used 20 mg of leaf extract from a 10 cm long pot-grown plant. DNA content was estimated from the fluorescence of propidium iodide (PI)-stained nuclei 17 of O. coarctata using Pisum sativum (pea) (1 C = 9.09 pg) as an external standard 18. The experiment was conducted on a BD-LSR II FACS cell sorter (BD-JH FACS Academy, Jamia Hamdard (Hamdard University), Hamdard Nagar, New Delhi, India) and the data were analyzed with BD FACS Diva v.8.0.1 (http://www.bdbiosciences.com/in/instruments/software/facsdiva/features/overview.jsp). The whole experiment was repeated 3 times with 8 samples each time.
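The ratio calculation behind a flow-cytometric estimate of this kind can be sketched as follows. This is a minimal illustration, not the authors' analysis: the function names and the peak values in the usage note are hypothetical, and the conversion factor of 978 Mb per pg is the commonly used value, assumed here because the text does not state which factor was applied.

```python
# Minimal sketch of a flow-cytometry genome size estimate with an external
# standard. All names are hypothetical; peak means would come from the
# cytometer software, not from this script.

MB_PER_PG = 978.0  # assumed conversion: 1 pg of DNA is about 978 Mb


def estimate_dna_content_pg(sample_peak: float,
                            standard_peak: float,
                            standard_pg: float) -> float:
    """Scale the standard's known DNA content by the ratio of the
    sample and standard fluorescence peak means."""
    if sample_peak <= 0 or standard_peak <= 0:
        raise ValueError("peak means must be positive")
    return standard_pg * (sample_peak / standard_peak)


def pg_to_mb(dna_pg: float) -> float:
    """Convert a DNA amount in picograms to megabase pairs."""
    return dna_pg * MB_PER_PG


def mean_estimate_mb(peak_pairs, standard_pg: float) -> float:
    """Average the estimate over replicate (sample_peak, standard_peak) pairs."""
    estimates = [pg_to_mb(estimate_dna_content_pg(s, r, standard_pg))
                 for s, r in peak_pairs]
    return sum(estimates) / len(estimates)
```

A call such as `mean_estimate_mb([(150.0, 2000.0), (151.0, 2010.0)], standard_pg=9.09)` would average two replicate measurements against the pea standard quoted above; the peak values in that call are invented for illustration only.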
Library preparation and sequencing. Genomic DNA (gDNA) was isolated by the CTAB method, following our previous protocol 2, from a young leaf of the same plant used to estimate the genome size. The quality of the isolated DNA was checked with a NanoDrop D-1000 spectrophotometer (NanoDrop Technologies, Wilmington, DE) and a Qubit Fluorometer. This DNA was used to prepare one Illumina short paired-end (PE) library of 151 bp reads and four mate-pair (MP) libraries with insert sizes of 2 kb, 4 kb, 6 kb and 8 kb, following the standard Illumina protocols (Illumina, San Diego, CA); the libraries were sequenced on the HiSeq4000 platform (Illumina, San Diego, CA). In addition, to improve the assembly we used Nanopore long reads, which were sequenced on a MinION Mk1b (Oxford Nanopore Technologies, Oxford, UK) using a SpotON flow cell (R9.4) in a 48 h sequencing protocol on MinKNOW 1.4.32. Base calling was performed using Albacore, and the base-called reads were processed using Poretools version 0.6.0 19. All sequencing work was carried out at M/S Genotypic Technology Private Limited, Bengaluru, India.
Genome annotation of nuclear genome. Putative protein-coding genes were predicted using a combined strategy that integrates the ab initio gene predictor AUGUSTUS 3.1 (http://bioinf.uni-greifswald.de/augustus/) and the sequence evidence-based annotation pipeline MAKER-P v2.31.8 (http://www.yandell-lab.org/software/maker-p.html), with O. sativa ssp. japonica as the reference gene model. In addition, 5 transcriptome SRA datasets [SRX248542 (Salt450 along with Submerged); SRX248538 (Control); SRX248541 (Submerged); SRX248540 (Salt700); SRX248539 (Salt450)] were used as expression evidence for this prediction (Supplementary Table S1). The resulting gene models were filtered for a valid start codon. The final set of predicted protein-coding genes was annotated with Blast2GO (version 4.01) 21 using a BLAST 22 based approach against a database of functional plant genes downloaded from NCBI, with an E-value cut-off of ≤1e-10. Genes with significant hits were assigned GO (Gene Ontology) terms and EC (Enzyme Commission) numbers. InterProScan searches and pathway analyses with the KEGG database were also performed using Blast2GO.
In order to predict the conserved and unique gene families among the O. coarctata genome and 11 other Oryza species, HMMScan from the HMMER 3.1 package (http://hmmer.org/) was run against the protein sequences of all these 12 species (Supplementary Table S1).
A keyword-based search was done for available SOS1 genes at NCBI with the keywords "plasma membrane Na+/H+ antiporter", "SOS1" or "salt overly sensitive 1". A non-redundant set of 149 SOS1 genes was found in different plant species, ranging in length from a minimum of 65 amino acids to a maximum of 1,213 amino acids. Of these 149 genes, 123 were found to have the "Na+/H+ antiporter or exchanger family" domain, as predicted by the CD-batch search tool. To predict the SOS1 genes in the O. coarctata genome, the 33,627 genes of O. coarctata were BLAST searched against these 123 SOS1 genes. The significant hits, with 60% identity, were checked for the presence of the "Sodium/hydrogen exchanger family" domain (PF00999) at an E-value threshold of ≤1e-10 using the CD search tool. MEGA 7.0 23 was used to perform alignment and phylogenetic analyses between the newly predicted SOS1 genes in O. coarctata and known SOS1 genes from other plant species.

Identification of transcription factors. To identify the transcription factors (TFs) among the O. coarctata genes, plant transcription factor sequences were downloaded from the Plant Transcription Factor Database v4.0 (http://planttfdb.cbi.pku.edu.cn/) and a BLAST search-based approach was used with cut-off values for E-value, identity and query coverage of ≤1e-10, ≥40% and ≥50%, respectively 24.
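The three cut-offs described above can be applied to tabular BLAST output with a short filter. The sketch below is illustrative and assumes a column layout the paper does not specify: the 12 standard tabular columns followed by a query-coverage column (as produced by BLAST+ with `-outfmt "6 std qcovs"`). The names `Hit`, `parse_hits`, `passes` and `tf_candidates` are hypothetical.

```python
# Minimal sketch: keep query genes with at least one TF database hit that
# satisfies E-value <= 1e-10, identity >= 40% and query coverage >= 50%.
# Assumed input: tab-separated BLAST output, 12 standard columns + qcovs.
from typing import Iterable, Iterator, NamedTuple, Set


class Hit(NamedTuple):
    query: str
    subject: str
    identity: float        # percent identity (column 3)
    evalue: float          # E-value (column 11)
    query_coverage: float  # percent query coverage (column 13, assumed)


def parse_hits(lines: Iterable[str]) -> Iterator[Hit]:
    """Yield one Hit per non-empty, non-comment tabular line."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 13:
            raise ValueError("expected 13 tab-separated columns")
        yield Hit(fields[0], fields[1], float(fields[2]),
                  float(fields[10]), float(fields[12]))


def passes(hit: Hit, max_evalue: float = 1e-10,
           min_identity: float = 40.0, min_qcov: float = 50.0) -> bool:
    """Apply the three cut-offs quoted in the text."""
    return (hit.evalue <= max_evalue
            and hit.identity >= min_identity
            and hit.query_coverage >= min_qcov)


def tf_candidates(lines: Iterable[str]) -> Set[str]:
    """Return the query IDs with at least one passing hit."""
    return {hit.query for hit in parse_hits(lines) if passes(hit)}
```

Collecting query IDs in a set means a gene with several passing hits is counted once, which is the natural reading of a per-gene TF count.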
Identification of repetitive elements and SSR markers. Repetitive elements, retrotransposons and DNA transposons were identified and masked in the assembled O. coarctata genome using RepeatMasker (http://www.repeatmasker.org/) against RepBase v.20170127 (http://www.girinst.org/repbase/), with the reference genomic repeats of O. sativa used for hard masking. LTR_FINDER v1.05 was used to identify full-length LTR (long terminal repeat) retrotransposons 25. SSRs were identified with the MIcroSAtellite identification tool (MISA) Perl script (http://pgrc.ipk-gatersleben.de/misa/). The minimum number of repeats searched during the SSR analysis was set to ten for mono-, six for di-, four for tri- and three for tetra-, penta- and hexa-nucleotide repeats, with a maximum of 100 bases interrupting 2 SSRs in a compound microsatellite.
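To make the repeat thresholds concrete, the sketch below finds perfect SSRs with the same minimum repeat counts using regular expressions. It is a simplified illustration and not the MISA script itself: compound microsatellites (the 100-base interruption rule) are not handled, and the rule that skips motifs built from a shorter unit (for example "ATAT" as a tetranucleotide) is an assumption of this sketch. The names `MIN_REPEATS`, `is_primitive` and `find_ssrs` are hypothetical.

```python
# Minimal sketch of perfect-SSR detection with the thresholds quoted above.
# Not MISA: compound SSRs are ignored and only A/C/G/T are considered.
import re

# unit length -> minimum number of tandem repeats
MIN_REPEATS = {1: 10, 2: 6, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif: str) -> bool:
    """True if the motif is not itself a tandem repeat of a shorter unit."""
    n = len(motif)
    for k in range(1, n):
        if n % k == 0 and motif[:k] * (n // k) == motif:
            return False
    return True


def find_ssrs(seq: str):
    """Return sorted (start, motif, repeat_count) tuples, 0-based start."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # a unit followed by at least (min_rep - 1) further copies of itself
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if not is_primitive(motif):
                continue  # e.g. report (AT)6 once, not also as (ATAT)3
            repeats = (match.end() - match.start()) // unit_len
            hits.append((match.start(), motif, repeats))
    return sorted(hits)
```

For a whole assembly the function would be called once per scaffold; the per-class counts reported by a tool such as MISA can differ slightly from this sketch because of how overlapping and compound repeats are resolved.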
Identification of non-coding RNAs (ncRNA). Ribosomal RNA (rRNA) and small nuclear RNA (snRNA) genes were identified with INFERNAL 26 using default parameters against the Rfam database (release 9.1) (http://rfam.xfam.org). To predict miRNAs, we used a two-step procedure: first, a homology search against the Rfam database with a cut-off of 90% identity and query coverage; second, a check for the presence of a hairpin structure in the sequence surrounding the predicted miRNA. The tRNAscan-SE algorithms (version 1.23) 27 were used with default parameters to identify and annotate transfer RNA (tRNA) genes. For the prediction and annotation of small nucleolar RNA (snoRNA) genes in the assembled genome, snoScan was used with the yeast rRNA methylation sites and yeast rRNA sequences provided with the snoScan distribution 28.
Phylogenetic analysis with single-copy genes among Oryza species. All protein sequences of the O. coarctata genome and 11 other sequenced Oryza species (Supplementary Table S1) were subjected to CD-HIT 29 for clustering at a cut-off of 90% coverage and similarity to form unique clusters. Clusters with a single-copy gene from all 12 genomes, termed Single-Copy-gene Clusters, were used for molecular phylogenetic analysis following the steps described by Kawano et al. 30.
Synteny analysis for conserved regions with other species. Conserved synteny is used as a measure of evolutionary divergence 31 or to calculate conserved coding or non-coding regions across different genomes 32. To identify the conserved regions across the genomes of interest, we performed synteny analysis of the assembled genome of O. coarctata against the reference genome of O. sativa ssp. japonica and the model dicot species A. thaliana, along with its wild halophytic relatives Schrenkiella parvula and E. salsugineum. The genome sequences of these 4 species were downloaded from NCBI (Supplementary Table S1), and a BLAST search was used to find the conserved regions between the assembled O. coarctata genome and these 4 reference genomes with an E-value cut-off of ≤1e-10. Homologous blocks with an alignment length of ≥500 bp between the references and the assembled genome were rendered in a synteny plot generated with the CIRCOS tool v0.67-7 33.
Reference-based assembly of chloroplast genome. We used the Nanopore long reads to assemble the chloroplast genome of O. coarctata. The Nanopore reads were BLAST searched against the chloroplast genomes of 11 Oryza species of interest (Supplementary Table S1). Of 1,717,607 Nanopore reads, 28,218 reads (95.26 Mb) showed significant hits (E-value ≤1e-10) against Oryza species and were used for a 2-round assembly with CLC Genomics Workbench 9.5.1 (CLC Bio, Aarhus, Denmark), with the chloroplast sequences of O. sativa ssp. indica and japonica as references. Circular orientation was checked, and chloroplast annotation was done with CpGAVAS 34. The four junctions between the 2 inverted repeat regions and the 2 single-copy regions were validated by PCR amplification using 4 pairs of primers (Supplementary Table S2). A GenBank file was prepared with Sequin and submitted to OGDraw v1.2 35 to generate the chloroplast gene map. We also performed a phylogenetic analysis of the assembled chloroplast genome together with the chloroplast sequences of 11 other sequenced Oryza species. All 12 sequences were aligned with ClustalW, and evolutionary analyses were conducted using the Maximum Likelihood method based on the Tamura-Nei model in MEGA7 23.

Results
Genome sequencing, assembly and quality assessment. Flow cytometric estimation of the O. coarctata genome size, using young leaf tissue of a single pot-grown plant and P. sativum as an external standard, gave an estimated genome size of 665 Mb (Supplemental Fig. S1a,b) 39. We generated a total of 166.69 Gb of high-quality raw data comprising short Illumina PE reads (123.78 Gb), MP libraries of 2 kb (11.12 Gb), 4 kb (3.06 Gb), 6 kb (11.23 Gb) and 8 kb (11.13 Gb) insert size, and Nanopore reads (6.35 Gb), yielding a genome sequencing depth of 250.66x (Supplemental Table S3). The average read length was 151 bp for Illumina reads and 3,697 bp for Nanopore long reads, with GC content in the range of 42-44% for the reads generated by each technology. The Nanopore technology used to generate long reads was quite successful, as more than 60% of the reads were longer than 1 kb, including ~10% longer than 10 kb, with a maximum read length of 677.71 kb (Supplemental Fig. S2a).
In this final assembly of 569.99 Mb, we achieved a longest contig size of 7.85 Mb with an average scaffold length of 9,766 bp. Both CEGMA and BUSCO were used to check the completeness of the assembled genome in terms of the presence of core genes. CEGMA registered 92.34% (97.18%, partial) completeness of the assembled genome, with 229 of 248 ultra-conserved core eukaryotic genes (CEGs) present in the genome (Supplemental Table S4). The genome completeness rose to 98.70% when normalized with respect to the reference genome of O. sativa ssp. japonica Nipponbare-IRGSP-1.0. Similar results were found with BUSCO, which registered 97.08% completeness with a normalized value of 99.43% (Supplemental Fig. S3).
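Two of the summary figures used in this work, sequencing depth and N50, follow from simple definitions that can be sketched in a few lines. The code below is an illustration under stated assumptions rather than the pipeline used here: depth is taken as total sequenced bases divided by estimated genome size, N50 is the standard definition (the length at which the cumulative sum of lengths, sorted longest first, reaches half the assembly), and the function names and the toy scaffold lengths in the usage note are hypothetical.

```python
# Minimal sketch of two assembly summary statistics.

def sequencing_depth(total_bases: float, genome_size: float) -> float:
    """Average depth = total sequenced bases / estimated genome size."""
    if genome_size <= 0:
        raise ValueError("genome size must be positive")
    return total_bases / genome_size


def n50(lengths) -> int:
    """Length L such that sequences of length >= L cover at least half
    of the total assembled bases."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequence lengths given")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return ordered[-1]  # not reached for non-empty input


# Depth from the totals quoted above: 166.69 Gb of reads, 665 Mb genome.
depth = sequencing_depth(166.69e9, 665e6)
print(f"{depth:.2f}x")
```

With the totals reported in the text this reproduces the stated depth of 250.66x. For N50, a toy call such as `n50([8, 8, 4, 3, 3, 2, 2, 2])` returns 8, because the two longest sequences already account for half of the 32 total bases.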
Gene prediction and annotation. Given the high level of completeness in terms of core genes in the assembled genome, as assessed with CEGMA and BUSCO, we proceeded with gene prediction and annotation in the 58,362 assembled scaffolds by integrating the AUGUSTUS and MAKER-P annotation pipelines. Of the total scaffolds, 1,938 (3.32%) were found to contain a total of 33,627 predicted protein-coding genes, with lengths ranging from 180 bp to 1926 bp and an average length of 1,147 bp. These numbers are comparable with the predicted gene numbers of the 11 other Oryza species sequenced so far (Supplemental Table S5).
Of the 33,627 predicted genes, 26,569 (79.01%) were assigned functions based on their significant best BLASTP hits against the plant genes database using Blast2GO. The species distribution showed that the majority (59.28%) of these 26,569 genes had top BLAST hits against O. sativa ssp. japonica, followed by O. brachyantha, Aegilops tauschii, Setaria italica and Zea mays with 28.13%, 2.15%, 2.27% and 2.01%, respectively (Fig. 1a). In total, 87.41% of genes showed homology to rice and O. brachyantha, despite the smaller number of genes from these genomes in the plant gene database used for annotation (Supplemental Table S6), suggesting that O. coarctata lies somewhere between the AA and FF genome types or between domesticated and wild relatives. To identify the genes associated with biological processes, molecular functions and cellular components, we carried out functional annotation in terms of Gene Ontology (GO) using the BLASTP results. A total of 35,690 GO terms were assigned to 16,357 genes associated with biological processes, cellular components and molecular functions (Fig. 1b). Among the biological processes, the most prevalent were metabolic process (6,564), cellular process (5,661) and single-organism process (3,169) (Fig. 1b). Apart from these three generally high-scoring categories, there were 19 more groups with at least one gene with assigned GO terms; the most noticeable were "response to stimulus" (827 genes), "signalling" (251 genes) and "developmental process" (130 genes). The 827 genes under the GO category "response to stimulus" were assigned 207 GO terms or sub-categories such as "response to stress", "salt-stress", "oxidative stress", "water deprivation", "auxin", "abscisic acid", "biotic-" and "abiotic-stimulus", pointing to a significant presence of stress-related genes in the O. coarctata genome that are either absent or present in much lower numbers in rice (IRGSP 1.0) (Supplemental Table S7).
Among genes associated with cellular components, ~40% were assigned GO terms in the cell or cell-part category and ~29% in the membrane or membrane-part category (Fig. 1b). Among genes associated with GO terms for molecular functions, the largest category was binding activity (51.59%; 8,732 genes), followed by catalytic activity (35.33%; 5,980 genes) (Fig. 1b). A total of 2,844 genes were annotated with enzyme codes; the most abundant classes were hydrolases (1,336) and transferases (707), followed by oxidoreductases (404), lyases (155), isomerases (127) and ligases (115). Additionally, 4,640 of the predicted genes were functionally annotated with the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. The observed differences in abundance among protein function classes may be important in supporting the different life-styles of plant species. Finally, there were 7,058 genes in O. coarctata without any significant BLAST hit (E-value ≤1e-10) against the plant genes database, and 2,216 of these genes had neither a significant GO hit nor an InterProScan match; these account for 22.33% and 6.60% of total genes, respectively, indicating that they could act as a gene pool of O. coarctata-specific and unique genes (Supplemental Table S8).
Transcription factors (TFs), as key regulators of gene expression, are best suited for understanding the molecular mechanisms of abiotic stress tolerance and for incorporating it in plants 40.
This comparison resulted in a core set of 652,000 genes representing 3,215 clusters shared among all 12 Oryza species, encoding ancestral gene families (Fig. 2). We found a total of 20 gene clusters comprising 22
There are 3 main genes in the SOS (salt overly sensitive) signalling pathway, i.e. SOS1, SOS2 and SOS3. Because SOS2 is activated by SOS3 after interaction, and their complex is required to fully activate SOS1, we focused on identifying SOS1, the gene with the most important role in plant salt tolerance, to get an overview of the abundance of SOS pathway-related genes in the assembled genome [47][48][49][50]. Moreover, O. coarctata is halophytic in nature, and SOS1 was found to have the activity needed for the halophytic characteristic, as reported in the halophytic dicot T. salsuginea 51. SOS1 is one of the very few genes required for plant salt tolerance; it encodes a plasma membrane Na+/H+ antiporter, which not only plays an important role in germination but, more importantly, aids the growth of plants in saline conditions 52. We therefore used a strategy based on the "Na+/H+ antiporter" domain or keyword to identify SOS1 genes, as described in the Materials and Methods. We identified a total of 8 SOS1 genes in the O. coarctata genome, either by homology search against known SOS1 genes or by BLAST-based functional annotation of predicted genes, in each case with the "Na+/H+ antiporter or exchanger family" domain present. To classify these eight SOS1 genes as plasma membrane or vacuolar Na+/H+ antiporter genes, we chose 18 SOS1 genes as used by Chen et al. 53.
A multiple sequence alignment and phylogenetic analysis was performed on these sequences, and close inspection of the resulting Neighbour Joining (NJ) tree suggested that only 1 of the 8 SOS1 genes (OcSOS1-1) clustered with the plasma membrane Na+/H+ antiporters of other plant species, while the remaining 7 SOS1 genes (OcSOS1-2 to OcSOS1-8) were part of the cluster containing vacuolar Na+/H+ antiporters from other plant species (Fig. 3). In both clusters, however, these genes were closely related to SOS1 homologs of O. sativa ssp. japonica. Three of the identified SOS1 genes, i.e. OcSOS1-6, OcSOS1-7 and OcSOS1-8, could be newly evolving ones, as they showed very low or no homology to known SOS1 genes and formed a separate cluster in the tree.
Among retroelements, LTR retrotransposons account for 14.48% (82.53 Mb) of the genome and non-LTRs (SINEs and LINEs) for just 0.47%. Copia and gypsy are the two main components of the LTR retrotransposons, accounting for 8.24% (46.94 Mb) and 6.04% (34.42 Mb) of the assembled genome, respectively. With LTR_FINDER, we identified 218 full-length LTR retrotransposons in 142 scaffolds (out of 58,362 scaffolds in total), comprising 1.30 Mb, which accounts for just 0.23% of the O. coarctata genome (Supplemental Table S10).
Non-coding RNA genes have important regulatory roles in a number of plant phenomena such as chromosomal silencing, regulation of transcription, developmental control and various stress responses 54. The types of ncRNA identified in the O. coarctata genome included miRNA, rRNA, snRNA, tRNA and snoRNA (Supplemental Table S12). The O. coarctata genome was found to have 200 miRNA genes, 118 snRNA genes and 3,110 rRNA (LSU, SSU, 5_8S_rRNA and 5S_rRNA) genes, identified with the "cmsearch" module of the INFERNAL package using the relevant covariance models from Rfam. The tRNAscan-SE algorithms, applied with default parameters to the O. coarctata genome assembly, identified 900 putative tRNAs. The snoRNA is a small RNA molecule that guides the chemical modification of other RNAs, including rRNAs, tRNAs and snRNAs. In total, 1,184 snoRNAs were identified in the O. coarctata genome.

Single copy genes: Phylogenetic analysis of O. coarctata along with other 11 Oryza species. To perform a genome-level phylogenetic analysis of O. coarctata together with the 11 other Oryza species, a total of 194,069 unique clusters were formed from their protein sequences with a cut-off of 90% for coverage and similarity. The cluster analysis resulted in a total of 170 putative single-copy gene clusters. These single-copy proteins from the 12 genomes were used for phylogenetic analysis. The tree generated with these single-copy orthologous genes placed O. coarctata as an individual clade and was consistent with the species tree obtained from the TimeTree database 55, which is generally used to retrieve divergence times among species. A consensus analysis based on both trees indicates that the FF genome (O. brachyantha) diverged from the AA genome types about 15 million years ago (MYA), followed by O. coarctata around 10 MYA and the BB genome (O. punctata) about 6 MYA. Among the AA type genomes, the Australian species O. meridionalis appears to have been the first to diverge from the other AA genome species (South American, Asian and African), around 2-3 MYA (Fig. 4).
Synteny analysis. For the representation of homologous blocks in the synteny plots, unplaced scaffolds from each reference were concatenated and represented as a single contig ("Un"), and the contigs smaller than 1 Mb in our assembled genome were concatenated and represented as a single scaffold ("Unplaced") (Supplemental Fig. S6). In the synteny analysis of O. coarctata with the three dicot species A. thaliana, E. salsugineum and S. parvula, high numbers of syntenic blocks were found, with just 1 scaffold (S128; 1,335,219 bp) of O. coarctata having no collinear or syntenic block with any of these three species (Supplemental Fig. S6a-c).
Analysis of chloroplast genome. The chloroplast genome exists as a circular molecule in angiosperms, with a size ranging from 120 to 160 kb 56. We used 11 chloroplast sequences of Oryza species, ranging in size from 134,558 bp to 135,525 bp, as references (Supplemental Table S1). Chloroplast reads were extracted from the Nanopore data by BLAST search and assembled by guidance-based assembly into a circular contig of 134,750 bp. It has the typical quadripartite structure of a plant chloroplast, with inverted repeat (IR), large single copy (LSC) and small single copy (SSC) regions of ~20.8 kb, ~80.8 kb and ~12.3 kb, respectively, comparable with those of other Oryza species [56][57][58]. The four junctions between these regions were further confirmed and validated by PCR amplification (Supplemental Table S2; Fig. S7). While IRa and IRb each spanned 20,800 bp of the assembled chloroplast, the LSC and SSC regions covered 80,816 bp and 12,334 bp, respectively (Fig. 5a and Supplemental Table S13). There were 82 protein-coding genes, 33 tRNA genes and 8 rRNA genes, making a total of 123 genes, with 18 genes present as one copy in each of the 2 IR regions (Supplemental Table S13). In total, it has 43 genes involved in photosynthesis, divided into 6 different classes (Supplemental Table S14). No ycf1 gene was found in this newly assembled chloroplast, which supports the early loss of this gene in the Poaceae family 58. The phylogenetic analysis with the chloroplast genomes of 11 other Oryza species indicates that the newly assembled chloroplast genome is close to the japonica subgroup of Oryza and is part of the clade consisting of the AA genome type, while the BB and FF genome types, i.e. punctata and brachyantha, respectively, each formed a separate single clade (Fig. 5b).
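The region lengths reported for the chloroplast genome can be checked against the assembled length, and the positions of the four junctions derived, with a short calculation. The sketch below is only a consistency check under an assumed convention: regions are laid out in the order LSC, IRb, SSC, IRa starting at position 1. That order and the IRa/IRb labelling are assumptions of this sketch (the text does not give coordinates), and `quadripartite_layout` is a hypothetical name.

```python
# Minimal sketch: derive 1-based region coordinates of a quadripartite
# chloroplast genome from its region lengths.
# Assumed order: LSC, IRb, SSC, IRa (a common convention, not from the text).

def quadripartite_layout(lsc: int, ir: int, ssc: int):
    """Return [(region, start, end), ...] with 1-based inclusive coordinates."""
    regions = []
    start = 1
    for name, length in (("LSC", lsc), ("IRb", ir), ("SSC", ssc), ("IRa", ir)):
        end = start + length - 1
        regions.append((name, start, end))
        start = end + 1
    return regions


# Region lengths reported above: LSC 80,816 bp, each IR 20,800 bp, SSC 12,334 bp.
layout = quadripartite_layout(lsc=80_816, ir=20_800, ssc=12_334)
total_length = layout[-1][2]
for name, start, end in layout:
    print(f"{name}\t{start}\t{end}")
print("total", total_length)
```

The last coordinate equals 134,750, which matches the reported length of the circular contig. Each boundary between consecutive regions (and the wrap-around from IRa back to LSC) corresponds to one of the four junctions that were validated by PCR.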
Analysis of mitochondrial genome. The assembled mitochondrial genome is 491,065 bp long (Fig. 6a) with a GC content of 43.07%, and comprises 46 protein-coding genes, 37 tRNA genes and 4 rRNA genes (Supplemental Table S15). The annotation also confirms the presence of 4 genes in the functional category 'Origins of replication', making a total of 91 genes in the mitochondrial genome spanning 48,960 bp (i.e. 9.97% of the assembled mitochondrial genome). The genome organization is comparable with the available mitochondrial genomes of 4 other Oryza species, with similar GC content and a similar range of protein-coding, tRNA and rRNA genes 59. The 46 annotated protein-coding genes included 25 genes related to the production of ATP synthase and the electron transport chain, with 9, 1, 3 and 5 subunits of complex I (nad1-9), III (cob), IV (cox1-3) and V (atp1, 4, 6, 8, 9), respectively (Supplemental Table S16). In addition, there were 11 ribosomal proteins and only a single maturase (mat-r) gene.

Discussion
We report for the first time the draft genome sequence of a monocot halophyte, O. coarctata, which provides an excellent platform for exploring the genetic potential of this wild species of rice. Our flow cytometry-based results indicated a genome size of 665 Mb, which is reasonably close to the previously reported estimate of 771 Mb 61. The results demonstrate several other interesting biological aspects and features of the genome. The Nanopore long reads significantly improved almost all the metrics related to assembly quality, including a reduced number of scaffolds, large scaffolds (up to 7.85 Mb), a significant N50 (1.85 Mb) or L50 value, a good range of mean and median lengths (9.76 kb) and a high percentage of genome completeness (up to 97.08%) based on the presence of core orthologous genes, and hence contributed to the best representation of the genome. They also helped us to retrieve the complete chloroplast genome sequence. To date, this is the largest sequenced genome of any halophyte, with the deepest sequencing coverage (250.66-fold), considering the available sequenced halophytic genomes such as T. parvula, T. salsuginea and/or E. salsugineum (Supplemental Table S17). The 33,627 predicted protein-coding genes that we discovered here are the highest number among sequenced halophytes and are comparable with other Oryza species. Of these genes, 22.33% were found to be specific and unique to O. coarctata, while the rest showed good homology with rice (domesticated, AA type) and O. brachyantha (wild, FF type). The small genomes of the chloroplast and mitochondria help in understanding the evolutionary background of O. coarctata within the genus Oryza and among the AA, BB and FF genome types, and support the conservative nature of chloroplast and mitochondrial genomes within the taxon. Further, the phylogenetic analysis based on single-copy genes among Oryza species points to the O. coarctata genome lying somewhere between the divergence of the FF and BB genomes from the AA genome. This could be true, as assumed by researchers 5,13,62,63, but needs to be validated by further study.
Further, in comparison with rice, 123 GO terms under the GO category "response to stimulus", including "response to salt-stress", "abscisic acid", "biotic stress", "desiccation" and "defence response to fungus", were found to be present in O. coarctata but completely absent or present in low numbers in rice, favouring the high salinity adaptation of this species. Moreover, the presence of 8 SOS1 gene copies in O. coarctata may favour and add to its salinity tolerance mechanism, as SOS1 contributes to salt tolerance by pumping sodium ions out of the cells once activated 50. The numbers of total identified TFs (4,916) and stress-responsive TFs (1,440) were also larger than those found in rice (2,478 and 1,408, respectively) in earlier reports 64. Oryza coarctata, despite being a tetraploid, was found to have low repetitive content, which is an exception; similar results were reported by earlier researchers 65. Although 82.53 Mb (14.48%) of the genome consists of LTR retrotransposons, only 1.30 Mb (i.e. 0.23% of the genome) were found to be full-length LTRs, suggesting a probably high occurrence of recombination or deletion events after retrotransposition 66. Slightly higher synteny was found for O. coarctata with E. salsugineum and S. parvula than with rice, which could be due to the halophytic nature of these species. This genome sequence therefore offers advantages as an ideal system for functional genomics to understand the salinity tolerance mechanisms of monocots, as the other sequenced halophyte genomes are from dicot species. More importantly, our genome information will complement the I-OMAP project along with the existing genomic resources of different wild and cultivated Oryza species, and will aid genome-level and salinity-based comparative genomics, evolutionary studies and the extension of the gene pool for the improvement of cultivated rice.