Mangrove tree (Avicennia marina): insight into chloroplast genome evolutionary divergence and its comparison with related species from family Acanthaceae

Avicennia marina (family Acanthaceae) is a halotolerant woody shrub that grows wildly and cultivated in the coastal regions. Despite its importance, the species suffers from lack of genomic datasets to improve its taxonomy and phylogenetic placement across the related species. Here, we have aimed to sequence the plastid genome of A. marina and its comparison with related species in family Acanthaceae. Detailed next-generation sequencing and analysis showed a complete chloroplast genome of 150,279 bp, comprising 38.6% GC. Genome architecture is quadripartite revealing large single copy (82,522 bp), small single copy (17,523 bp), and pair of inverted repeats (25,117 bp). Furthermore, the genome contains 132 different genes, including 87 protein-coding genes, 8 rRNA, 37 tRNA genes, and 126 simple sequence repeats (122 mononucleotide, 2 dinucleotides, and 2 trinucleotides). Interestingly, about 25 forward, 15 reversed and 14 palindromic repeats were also found in the A. marina. High degree synteny was observed in the pairwise alignment with related genomes. The chloroplast genome comparative assessment showed a high degree of sequence similarity in coding regions and varying divergence in the intergenic spacers among ten Acanthaceae species. The pairwise distance showed that A. marina exhibited the highest divergence (0.084) with Justicia flava and showed lowest divergence with Aphelandra knappiae (0.059). Current genomic datasets are a valuable resource for investigating the population and evolutionary genetics of family Acanthaceae members’ specifically A. marina and related species.

Comparative analysis of A. marina cp genome with related species. In order to further analyze the characteristics of the A. marina chloroplast genome, its assembled genome was compared with the chloroplast genomes of 10 other species of the same family. The results revealed that A. marina (150,279 bp) cp genome size is slightly bigger as compared to A. paniculata (150,249 bp), J. leptostachya (149,227 bp) and S. cusia (144,133 bp). However, it has a slightly smaller cp genome size when compared to the species A. knappiae, C. nutans, E. attenuates, E. lofouensis, E. longipes, E. longzhouensis and J. flava ( Table 1).
The GC contents of A. marina (38.6%) were found almost similar for species (A. knappiae, E. longipes, E. lofouensis and E. longzhouensis), however GC contents were found higher when compared to the species A. paniculata (38.3%), C. nutans (38.4%), E. attenuates (38.3%), J. flava (38.2%), J. leptostachya (38.2%) and S. cusia (38.2%). The number of genes, rRNA, tRNA, SSC size and the number of protein coding regions were found almost similar in these studied cp genomes Table 1. The LSC analysis showed that it is almost similar in all the species except E. longzhouensis where the size was found slightly smaller (79,203 bp) but significantly higher in S. cusia (92,666 bp) when compared to A. marina (82,522 bp).
The synteny of A. marina cp genome with ten other species from Acanthaceae was analyzed by mVISTA. Gene divergence was determined by determining pairwise alignment of A. marina with related species. A. marina chloroplast genome was used for reference to determine variation and sequence identity in the chloroplast genomes of related species. The results showed high sequence similarities among the cp genomes of several species, especially in protein-coding and IR regions ( Figure S1).

Phylogenetic relationships among A. marina and related species.
To determine the phylogenetic position of the A. marina, complete chloroplast genome was performed and compared with 24 other related species to build the phylogenetic tree. The phylogenetic position of A. marina is established using maximum likelihood (ML), maximum persimony (MP) and neighbour-jouining (NJ) methods in this study by utilizing 65 shared genes and complete cp genomes of related plant species. The phylogenetic trees constructed on three different methods shows same result and A. marina formed a clade near to A. paniculata and A. knappiae genomes from same family Acanthaceae. Similarly, Gesneriaceae was found the closest family to Acanthaceae ( Fig. 6; Figure S2).

Discussion
A. marina belongs to the family Acanthaceae which consist of more than 400 species distributed around the world 35 . The phylogenetic analysis is very important to correctly identify the taxonomic position of plants.
Recently chloroplast genome sequencing has played a major role in the phylogenetic analysis of plants 36,37 .
Though the chloroplast genomes are highly conserved when the size and genomic architecture are considered, however, the genes located on the borders of IR/SC varies tremendously in terms of size and type which make chloroplast distinctive for phylogenetic analysis 38,39 . The chloroplast genome sequence of all the 10 related species were also found conserved, however when compared with each other and with the chloroplast genome of A. marina, differences were observed for the genes located on the borders of IR/SC regions. The size of complete chloroplast genome of A. marina (150,279 bp) is greater than the previously reported J. laptostachya (149,227 bp) 40 and S. Cusia (144,133 bp) 41 , and almost same size of A. paniculata (150,249 bp) 42 and J. flava (150,888) 43 . However, the size of A. marina chloroplast genome that we sequenced was found lower than the previously reported chloroplast genomes of the related species such as A. knappiae 44 , C. nutans 45 , E. attenuates , E. lofouensis , E. longipes 46 and E. longzhouensis 43 . Our study also confirmed the LSC, SSC and IR region that were almost similar in size to the previously reported chloroplast genomes of Eucalyptus globulus 47 , Coffea arabica L. 48 and Camellia japonica L. 49 . In our study, we have found that the GC contents (38.6%) of A. marina and on the 3rd position of codons, the GC contents (32.25%) were found lower as compared to AT/U contents (67.8%) which support previous reports of chloroplast genome sequences of C. gileadensis (37.9%) and C. foliacea (37.8%) 50 . The intron containing genes (14 genes) were found in which only 2 genes were found to contain two introns. The introns are very important in the gene expression regulations studies and it has been observed that when present on specific sites/positions, it can positively regulate exogeneous gene expression 51 . Thus, introns can be valuable tools in order to improve the transformation efficiencies. The intron sequences of chloroplast DNA also has key role in the phylogenetic analysis 52 .  (87), rRNA (8) and tRNA (37) were found in our study that coincides with the previously reported studies on chloroplast genomes 50 . For example, in the same family (Acanthaceae), Ding, et al. 42 and Huang, et al. 44 previously reported the size of protein coding genes, rRNA, tRNA, number of genes, number of protein coding genes, number of rRNA and number of tRNA in A. paniculate and A. knappiae In the chloroplast genome, microsatellites or SSRs distributed in the genome with the length of almost 1 bp to 6 bp. Previously many studies reported SSRs at different positions in chloroplast genomes 55,56 . In the present study, SSRs distributed at different locations such as IR regions, LSC and SSC region were identified in the A. marina. Total of 126 SSRs were identified in A. marina and almost similar number of SSRs were found in the chloroplast genomes of related species such as E. Longipes (122) and E. longzhouensis (120). Other studies on chloroplast genomes have also confirmed the presence of uneven numbers of SSRs in the chloroplast genomes at different locations 50,[57][58][59] .
The difference in the size of chloroplast genome is considered a common evolutionary practice which can be attributed to the contraction and expansion of the inverted repeats (the most conserved regions) in the genome 63 . In majority of the plants, the junctions or borders of the genome particularly in the quadripartite structure are very conserved. However, inversion at the borders or junction can be found in some species as previously reported 64 as well as the loss of genes and the expansion and contraction which is the common cp genome event in the angiosperms 65,66 . The results in our study shows that rps19 gene at JLB junction in A. marina has extended to IRb (15 to 102 bp) while the gene ndhF partly on the IRa region (40 to 178 bp) shows similar results patterns as previously studied by Cheon, et al. 67 .
The phylogenetic analysis is very important for the evolutionary and taxonomic studies. Many phylogenetic analysis are now based on the chloroplast genomes 68,69 . Previously, the phylogenetic analysis of mangrove (A. marina) were mostly based on the RAPD and other molecular markers 70,71 . Sahu,et al. 72 reported that the role of multiple gene in the mangrove phylogeny. However, in this study we used both whole cp genomes and concatenated 65 protein coding genes to infer the phylogenetic position of A. marina. Both data sets showed same results and A. marina is closely related to A. paniculate and A. knappiae in family Acanthaceae. The present study provides a valuable analysis of the complete plastome of A. marina and related species, which may facilitate species identification and both biological and phylogenetic studies.

Conclusion
Complete chloroplast genome sequence of A. marina was found highly conserved in its structure and order of genes distribution as compared to the other mangrove species in Acanthaceae. The results showed location and distribution of SSRs as well as the sequence divergence among the chloroplast genome of A. marina and related mangrove species. Among genes, rpl22, ndhF, rps15 and ndhA were found the most divergent genes in the mangrove species. Additionally, the phylogenetic analysis shows that A. marina was closer to A. knappiae and A. paniculate species. It can be concluded from this study that complete chloroplast genome sequence may provide a better understanding of identification and phylogenetic studies of plant as compared to other strategies.

Materials and methods
Plant sample, Chloroplast DNA extraction and sequencing. A. marina seedlings were donated by the Center for Marine Conservation, Ministry of Environment, Sultanate of Oman. After shipping it to the greenhouse the leaves were collected in liquid nitrogen and ground to a fine powder. The powder samples of leaves were processed for chloroplast DNA extraction. Leaves were collected from the mangrove plants and were ground into fine powder using liquid nitrogen. Chloroplast DNA was extracted using the protocol of Khan et al. 73 . Manufacturer's instructions (Life Technologies, Carlsbad, CA, USA) were followed for the preparations of genomic libraries. Ion Shear Plus Reagents kit was used to shear chloroplast DNA into 400 bp fragments while Ion Xpress Plus gDNA Fragment Library kit was used for library preparation. Quantity and quality of libraries were checked using the Qubit 3.0 fluorometer and Bioanalyzer (Agilent 2100 Bioanalyzer system; Life Technologies, Carlsbad, CA, USA  81 was used for the genome annotation. For the position's identification of transfer and ribosomal RNA and coding genes BLASTX and BLASTN were used. Furthermore, tRNAscan-SE version 1.21 82 was used for the tRNA gene annotation. In order to compare the cp genome of A. marina with the related cp genomes, tRNAscan-SE and Geneious software were used. Intron boundaries, stop and start codons were adjusted by comparing it to the previously reported cp genome of the related specie from family Acanthaceae. OGDRAW 83 was used for the illustrations of structural features of A. marina cp genome. MEGA-X software 84 was used for the determination of deviations in synonymous codon usage and relative synonymous codon usage with amino acid composition influence. The pairwise gene divergence was determined by mVISTA 85 in Shuffle-LAGAN mode. Identification of repetitive sequences and SSRs. The REPuter software 86 was used for the identification of forward, reverse and palindromic repeats, the sequence identity with 90% and minimum 15 bp was the basic criteria in the identification. Microsatellite analysis of contig sequences was carried out with the MIcroS-Atellite (MISA) identification tool 87 . The parameters (unit_size, min_repeats) were defined as follows: 1-10, 2-8, 3-4, 4-4, 5-3, and 6-3; the minimum distance between two SSRs was set to 100. For the identification of tandem repeats, Tandem Repeat Finder version 4.087 88 b was used for tandem repeat identification.
Sequence divergence and Phylogenetic analysis. The pairwise distance among the cp genes shared by A. marina and related species as well as the complete cp genome of A. marina was determined. The ambiguous and missing genes annotation were identified and removed by the comparative sequence analysis. Complete cp genomes were aligned by using the MAFFT version 7.222 89 and Kimura's two-parameter (K2P) model 90 was used for calculation of pairwise sequence distance. The divergence of the new A. marina cp genomes from related species of family Acanthaceae was assessed using mVISTA 85 in Shuffle-LAGAN mode and by employing the new A. marina genome as reference.
For the determination of phylogenetic position of A. marina, 23 published cp genomes were downloaded from NCBI from four different families (Gesnerianceae, Lamiaceae, Bignoniaceae and Acanthaceae) respectively. Complete cp genomes and a separate partition containing only the 65 shared genes (concatenated) were used to infer the phylogenetic position of A. marina. MAFFT version 7.222 89 , with default parameters were used for the alignment of both complete genomes and shared genes data sets. Maximum likelihood (ML) and neighbourjoining (NJ) methods were used to infer the phylogenetic trees with MEGA-X 91 and parameters were adjusted with a BIONJ tree with 1000 bootstrap replicates using the Kimura 2-parameter model with gamma-distributed rate heterogeneity and invariant sites. Maximum parsimony (MP) by using PAUP 92 using previously described settings 33,93 .

Data availability
All data generated or analyzed during this study are included in this published article.