Low genetic diversity indicating the threatened status of Rhizophora apiculata (Rhizophoraceae) in Malaysia: declined evolution meets habitat destruction

Worldwide, many mangrove species are experiencing significant population declines, including Rhizophora apiculata, which is one of the most widespread and economically important species in tropical Asia. In Malaysia, there has been an alarming decline in R. apiculata populations driven primarily by anthropogenic activities. However, the lack of genetic and demographic information on this species has hampered local efforts to conserve it. To address these gaps, we generated novel genetic information for R. apiculata, based on 1,120 samples collected from 39 natural populations in Peninsular Malaysia. We investigated its genetic diversity and genetic structure with 19 transcriptome and three nuclear microsatellite markers. Our analyses revealed a low genetic diversity (mean He: 0.352) with significant genetic differentiation (FST: 0.315) among populations of R. apiculata. Approximately two-third of the populations showed significant excess of homozygotes, indicating persistent inbreeding which might be due to the decrease in population size or fragmentation. From the cluster analyses, the populations investigated were divided into two distinct clusters, comprising the west and east coasts of Peninsular Malaysia. The western cluster was further divided into two sub-clusters with one of the sub-clusters showing strong admixture pattern that harbours high levels of genetic diversity, thus deserving high priority for conservation.


Results
Identification and development of microsatellite markers in R. apiculata. A total of 25,938,686 raw RNA-seq reads were generated by the Illumina HiSeq 4000 sequencer, with a total of 25,627,792 clean reads remained for further analysis after the ambiguous and low-quality sequences were filtered. These clean reads were then assembled, resulting in 141,915 contigs with an overall GC content of 44%. Using MISA software, these contigs were analysed and found to harbour 18,674 microsatellites, with the highest distribution in dinucleotide (15,898,85.13%), followed by trinucleotide (2,403,12.87%) and tetranucleotide microsatellites (373, 2.00%). Three highest frequencies of dinucleotide motifs that were detected were CT (16.80%), AG (16.68%) and TC (13.45%). Out of the 60 primer pairs tested, 46 yielded successful amplification. These primers were then used to screen 24 individuals of R. apiculata via PCR amplification and fragment analysis. Nineteen out of 46 primer pairs which showed clear genetic polymorphisms along with three other primer pairs (RM111, RM116, and RM121) previously developed for R. mucronata 21 were selected ( Table 1). The assembled contig sequences from which the newly developed microsatellite markers derived were blasted against the genome sequence of R. apiculata 22 to check the reliability (Supplementary Table S1). These markers were subsequently used to genotype the 1,120 R. apiculata samples collected throughout Peninsular Malaysia ( Fig. 1 and Table 2). Genetic diversity. Descriptive statistics for the 39 populations of R. apiculata based on the 22 polymorphic microsatellite markers are listed in Table 2. These microsatellite markers had number of alleles per locus (A a ) ranging from 2.3 to 4.6 alleles for all samples, with an average of 3.2 alleles per locus. The mean values for observed heterozygosity (H o ) and expected heterozygosity (H e ) were 0.299 and 0.325, respectively. The expected heterozygosity (H e ) at the population level ranged from 0.247 (Balik Pulau) to 0.503 (Muar). The allelic richness (R s ) ranged from 2.07 to 3.63, showing marginal differences among the populations investigated. All loci deviated from Hardy-Weinberg equilibrium (HWE), with heterozygote frequencies being either lower or higher than expected (Table 2). Approximately 95.9% of the 39 tested populations had excess of homozygotes with positive inbreeding coefficient values (F IS ) ranging from 0.007 to 0.450. Of that, 66.7% (26 populations) were found to be statistically significant (p < 0.05), mainly of the populations with disturbed habitat, either due to logging for firewood, charcoal production and construction (Telok Gedong and Merchang) or development for human settlement, aquaculture, ecotourism, road construction and land use for plantation (Kubang Badak, Merbok, Balik Pulau, Trong, Sungai Batang, Banjar Utara, Pulau Ketam, Sepang Besar, Sepang Kecil, Pulau Besar, Muar, Pulau Kukup, Tanjung Piai, Kuantan, Peramu, Cherating, Kuala Kemaman and Kuala Terengganu). There were only two populations with negative F IS values (Pulau Tengah, F IS = − 0.035 and Merlimau Tambahan; F IS = − 0.023) or excess of heterozygotes but were not statistically significant (p < 0.05) ( Table 2).
Most of the total genetic diversity (H T = 0.532) was partitioned within population (H S = 0.370) ( Table 3). The values recorded for Wright index (F ST ) ranged from 0.079 to 0.638, while those for the Slatkin's divergence parameter (R ST ), varied from 0.038 to 0.617. The proportion of genetic variation distributed among populations (G ST ) was estimated at 0.305, indicating that 30.5% of genetic variability was distributed among populations ( Table 3). The mean F ST (0.315) estimate was slightly higher than G ST and was significantly greater than zero (p < 0.05), while the mean R ST (0.242) was lower than F ST (Table 3).
Population genetic structure. Genetic differentiation between populations increased as distance between populations increased (R 2 = 0.2602) (Fig. 2a). Additionally, the results from the model-based Bayesian clustering analysis using STRU CTU RE v2.3.1 indicated the presence of two main clusters within Peninsular Malaysia, one formed by the populations in western Peninsular Malaysia (Cluster 1: populations , and the other by those in eastern Peninsular Malaysia (Cluster 2: populations [28][29][30][31][32][33][34][35][36][37][38][39]. The STRU CTU RE algorithm showed the best clustering at K = 2, providing good biological explanation since the clusters coincided with two geographical groups 23 corresponding to the Straits of Malacca (western Peninsular Malaysia) and South China Sea (eastern Peninsular Malaysia) (Fig. 2b). A gradually increasing level of admixture was observed in populations between Kedah and west part of Johor (Cluster 1: populations 1-27). When genetic variation is hierarchically organ-  (Fig. 2c). Interestingly, the observed strong admixture pattern in Sub-cluster 1b suggests that the populations within this sub-cluster harbour higher levels of genetic diversity and could be an important genetic reservoir for R. apiculata in Peninsular Malaysia. Results from the principal component analysis (PCA) corroborates with the STRU CTU RE analysis, whereby the 39 populations were also divided into two distinct clusters (Fig. 2d), i.e. Cluster 1 along the Straits of Malacca (western Peninsular Malaysia) and Cluster 2 facing the South China Sea (eastern Peninsular Malaysia). Similarly, Cluster 1 was further divided into two sub-clusters (Fig. 2e). To complement the population structure analysis, a UPGMA dendrogram was constructed with MEGA v5.0 based on Nei's D A 24 , producing two main branches that illustrate the two major clusters and two sub-clusters within Cluster 1 (Fig. 3).
When the 39 populations were grouped based on the two main clusters, AMOVA revealed that 45% of the variation was apportioned between the western and eastern regions of Peninsular Malaysia, 13% among populations within regions, and 42% within populations (Table 4).

Discussion
From our transcriptome data, we identified a total of 18,674 microsatellites, with dinucleotide repeat motifs (85.13%) being the most frequent type of microsatellite. Similar observations, whereby dinucleotides was the most abundant motif in the plants' genomes, have been reported for other tree species such as grey mangrove (Avicennia marina) 25 , downy oak (Quercus pubescens) 26 and crape myrtle (Lagerstroemia spp.) 27 . A total of 19 transcriptome-based microsatellite markers (EST-SSR) were developed in this study. To increase the number of markers for genotyping, three additional nuclear microsatellite markers (gSSR) which were previously developed for R. mucronata 21 have been validated and utilised in our study. Increasing the number of polymorphic loci can greatly enhance the precision of estimates of genetic distance 28 . The high cross-species transferability of microsatellite markers between R. apiculata and R. mucronata has previously been demonstrated 9 .
Generally, low genetic diversity is common in mangrove plant species particularly the Rhizophora species 9,16,22 . Low levels of genetic diversity (mean H e = 0.352, Table 2) in R. apiculata were observed in the present study. The result is comparable to other mangrove species such as R. mucronata (H e = 0.354) 9 , R. stylosa (H e = 0.321) 9 , and Sonneratia alba (H e = 0.280) 29 . The low level of genetic diversity occurring within R. apiculata populations suggests that this species may have experienced severe habitat fragmentation or population size reduction.
Our results also showed significant excess of homozygotes or positive inbreeding coefficient values (F IS ) in most of the R. apiculata populations, consistent with the observed habitat destruction found in most of these populations (Table 2). Hence, the observed positive F IS values might be the result of selfing and/or mating between close relatives due to the reduction in population size caused by the anthropogenic activities. This is also a common result when drift proceeds in small population. As drift proceeds and each population becomes different from one other, the genetic variation among populations increases 30,31 . www.nature.com/scientificreports/ R. apiculata is considered a threatened mangrove species in Southeast Asia, and its population decline is due mainly to deforestation and land conversion 4,32,33 . The high wood density is an attractive selling factor for R. apiculata (0.600 gcm -3 ), causing extensive over-exploitation of the species 12,34 . Owing to this, decline in effective population size and aggravate loss of genetic diversity ultimately reduced resilience of populations to anthropogenic climate change 35 . The genetic fragility in R. apiculata is of concern, considering that future impacts of environmental changes, whether natural or otherwise, will likely to further reduce its genetic diversity and threaten its long-term viability.
Wright's F-statistics 36 is among the most common methods used to estimate the level of heterozygosity in a population. The F ST is more sensitive in detecting intraspecific differentiation as compared to its analogue, the R ST 37,38 . The R ST , however, is thought to be a better predictor for interspecific divergence because it can effectively reflect the mutation patterns of microsatellites 39 . Therefore, both models were applied in the present study, revealing a strong population genetic structure of R. apiculata (F ST = 0.315, R ST = 0.242). We found that approximately 68.5% and 31.5% of genetic variations were distributed within and among R. apiculata populations, respectively. Other mangrove studies also revealed high population differentiations, namely in Avicennia marina (0.410) 25 , Ceriops tagal (0.529) 40 , and A. germinans (0.410) 41 .  Table 2 for detailed information on sampling sites. The map was generated using the ArcGIS v10.5 (https ://deskt op.arcgi s.com/en/, ESRI). The Malaysian administrative boundary data for mapping was downloaded from the iGISMAP (https ://map.igism ap.com/share -map/expor t-layer /Malay sia_Polyg on/cedeb b6e87 2f539 bef8c 3f919 874e9 d7). www.nature.com/scientificreports/ All the three cluster analyses performed in this study showed similar results, whereby the 39 populations were divided into two major geographical clusters (Cluster 1: populations 1-27; and Cluster 2: populations 28-39), with strong admixture pattern observed between southern part of western and eastern regions of Peninsular Malaysia (Fig. 2b), sub-structuring further divided Cluster 1 (Fig. 2c) into two sub-clusters. Due to the strong admixture of alleles, Sub-cluster 1b (populations 20-27) harbours higher genetic diversity (Table 2, Fig. 2c). The observed pattern may be best explained by ocean current movements, in which the Straits of Malacca act as a channel connecting the Andaman Sea with South China Sea, flanked by Indonesian island of Sumatra and Peninsular Malaysia land mass, linking the mangrove ecosystems between the western and eastern regions through hydrological cycle 33,42 . The ocean current movements along the Straits of Malacca and South China Sea are www.nature.com/scientificreports/ highly dependent on the monsoon winds during the north-east (December-January) and south-west (June-July) monsoons 33 . This enables the mangroves propagules to float in ocean water for an extended period and follow the ocean currents, transported from the source to the sink 43 . Due to the shallow waters at water depth ~ 30 m, which is narrow in the southern part of the Straits of Malacca before meeting the South China Sea 42 , the propagules of either region travelled to the sink population and formed the observed admixed populations (Sub-cluster 1b). Studies have shown that R. apiculata are strong dispersers with high propagules survivorship in seawater with capability of long-distance dispersal 44 . Nevertheless, the potential of long-distance dispersal can be limited due to several factors such as ocean circulations, wind, large distances, longevity and land barriers 14,44,45 . Peninsular Malaysia has been reported to serve as a land barrier to gene flow 15,45 , thus promoting genetic differentiation between R. apiculata populations, particularly those from the eastern and western regions of Peninsular Malaysia. Previous study suggested that limited gene dispersal likely played an important role in the evolutionary history of Rhizophora species, as frequent sea level fluctuations associated with climate changes would negatively impact their effective population sizes 9 . In addition, human activities such as logging and developments near mangrove habitat (as recorded in Table 2) might have created anthropogenic barriers that pose serious threats to gene flow. Such barriers can effectively split a species' range into isolated fragments, and dispersal from one population to another can prove difficult.
The low genetic diversity of R. apiculata, along with environmental fragility caused primarily by anthropogenic activities, can decrease its fitness and affect its long-term survival. Prior to this study, there was a lack of genetic and demographic information on R. apiculata, hampering local efforts to develop a conservation plan for the species. The newly generated genetic information will enable the formulation of comprehensive conservation guidelines for R. apiculata in Peninsular Malaysia. Since Sub-cluster 1b of the western cluster exhibited strong admixture pattern that harbours higher levels of genetic diversity, as a reservoir rich with admixed alleles from both western and eastern clusters, this sub-cluster deserves high priority for conservation. Besides, the selection of in situ conservation areas can be considered independently from the two main clusters with priority given to habitat protection to combat the potentially detrimental effects of inbreeding in the remaining R. apiculata populations.

Methods
Plant material collection and habitat condition survey. Sampling was carried out along sheltered coasts of Peninsular Malaysia, where R. apiculata grow abundantly in water zones where saltwater meets fresh water. We collected a total of 1,120 samples from 39 natural populations of R. apiculata between 2017 and 2018, with an average of 29 samples per population (Fig. 1, Table 2). Leaf samples were collected randomly from individual trees, cleaned, and kept in liquid nitrogen prior to DNA extraction. Genomic DNA of R. apiculata was Table 3. Genetic diversity assessment in R. apiculata. *Nuclear microsatellite markers developed for R. mucronata 21 . **Significant at 95% confidence interval. www.nature.com/scientificreports/ extracted with a modified cetyl trimethylammonium bromide (CTAB) method 46 and purified using High Pure PCR Template Preparation Kit ver. 20 (Roche, USA). Besides, habitat condition survey was also carried out based on systematic observation of each population by boat and/or walking on foot, providing visual coverage on the anthropogenic activities of the surrounding area that could contribute to mangrove habitat destruction. We categorised the anthropogenic activities into two categories, 'logged' as logging for firewood, charcoal production and construction, and 'developed' as development for human settlements, aquaculture, ecotourism, road construction and land use for plantation, www.nature.com/scientificreports/ while undisturbed populations were recorded as 'undisturbed' . Most of the R. apiculata sampling sites have experienced some form of anthropogenic activities either due to logging or development ( Table 2). Several sites were considered undisturbed mostly due to poor accessibility and far from human settlements and activities.
Microsatellite marker development and genotyping. The total RNA was extracted using Qiagen RNeasy kit (Qiagen, USA) and purified using the TURBO DNA-free kit (Ambion, USA). The quality of RNA sample was then checked using NanoDrop 2000 spectrophotometer (Thermo Scientific, USA). Transcriptome sequencing was carried out on an Illumina HiSeq 4000 sequencer (Illumina, USA) with default parameters at Novogene Bioinformatics Technology Co. Ltd. (Beijing, China). The raw data underwent quality checking, trimming, and assembling using FastQC 47 , Trimmomatic v0.32 48 and Trinity v2.4.0 49 , respectively. Sequencing reads with many ambiguous (N) bases and more than 50% low-quality bases in raw reads were filtered out from the raw data set. MicroSAtellite program (MISA, https ://pgrc.ipk-gater slebe n.de/misa) 50 was used to identify microsatellite sequences of di-, tri, and tetranucleotide motifs. Markers were then designed with Primer3 (https ://bioin fo.ut. ee/prime r3-0.4.0/) 51 . We used the following criteria for marker selection: microsatellite motif length of ≤ 30 base pairs (bp); amplicon size of between 80 and 400 bp; and rejecting markers and amplicons with multiple mononucleotide repeat sequences. Based on these criteria, 60 primer pairs were synthesised and those that showed clear polymerase chain reaction (PCR) products on 1.5% agarose gel were fluorescently-labelled at the forward primer with HEX or 6-FAM. We performed PCR on a GeneAmp PCR System 9700 (Applied Biosystems, USA) in a final reaction volume of 10 μL. The PCR thermocycling parameters consisted of an initial denaturation step of 4 min at 94 °C followed by 40 cycles 94 °C for 1 min, 55 °C annealing temperature for 30 s, and 72 °C for 40 s, and the final extension step at 72 °C for 30 min. The PCR products were genotyped on an ABI 3130xl Genetic Analyzer (Applied Biosystems, USA) with ROX400 as the internal size standard. The alleles were scored using GeneMarker v2.6.4 (SoftGenetics, USA). The assembled contig sequences of the successfully developed microsatellite markers were assessed for reliability by blasting them to the R. apiculata genome sequence 22 using BLASTN.

Data analysis
Micro-Checker software was used to detect possible genotyping errors and null alleles 52 . We examined deviations from Hardy-Weinberg equilibrium (HWE) and linkage disequilibrium (LD) using Fisher's exact test in Genetic Data Analysis (GDA) v1.1 53 . The sequential Bonferroni's correction was conducted to adjust critical values for multiple comparisons, with a significance level of 5% 54 . Allelic frequencies for each locus in each population were obtained. The levels of genetic diversity within each population were estimated using Microsatellite Toolkit 55 based on several parameters, including the average number of alleles per polymorphic locus (A a ), effective number of alleles per polymorphic locus (A e ), observed heterozygosity (H o ), and expected heterozygosity (H e , Nei's genetic diversity) 28 . Allelic richness (R s ) was estimated using FSTAT v2.9.3 56 and GDA v1.1, respectively. On the other hand, the inbreeding coefficient (F IS ) was calculated using FSTAT v2.9.3 56 , and its significance was tested with GDA v1.1. Additional statistics, such as polymorphic information content (PIC) and Nei's genetic distance (Nei's D A ) 57 , were calculated using POWERMARKER v3.25 58 .
We also analysed the levels of genetic differentiation in R. apiculata populations using the molecular fixation index (F ST ) 59 , the divergence parameter (R ST ) 38 , and Analysis of Molecular Variance (AMOVA), which were all determined by GenAlEx v6.5 at the significance level of 5% 60 . A Mantel test based on 9,999 permutations was carried out to examine if there was correlation between geographical distance and genetic differentiation (F ST , R ST and F ST /1 − F ST ), by analysing the Isolation-by-Distance (IBD) in GenAlEx v6.5. To infer the populations' genetic structure, we applied a model-based Bayesian clustering method using STRU CTU RE v2.3.1 61 . Dataset was explored using admixture model, which can detect structure among populations that are potentially similar due to shared ancestry or migration, with a burn-in of 250,000 steps followed by 850,000 Markov Chain Monte Carlo (MCMC) iterations. The StructureSelector software 62 was then used to select and visualise the optimal number of clusters (K) in order to identify the highest level of genetic division hierarchy. Additionally, principal component analysis (PCA) was carried out using PCAGEN v1.2 63 to assess the goodness-of-fit between simulated and real datasets, visualizing genetic distance and relatedness between populations in a two dimensional (2D) standard plot. To complement the analysis of the population structure, a UPGMA dendrogram was constructed based on Nei's D A using MEGA v5.0 64 . Bootstrap of 1,000 times was applied to acquire a reliable tree with correct branch topology. www.nature.com/scientificreports/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creat iveco mmons .org/licen ses/by/4.0/.