Introduction

Southeast Asia holds the greatest diversity of mangrove species in the world1. Mangroves, which provide habitat for hundreds of species at all levels of near-shore food webs, deliver numerous ecosystem functions such as storm surge protection and carbon sequestration for climate change mitigation2,3. However, in recent decades, mangrove populations have plummeted, owing mainly to anthropogenic activities that cause widespread mangrove forest degradation through land clearing, illegal logging and over-exploitation of their highly valuable wood1,4,5. This decline is reflected in the fact that around 16% of mangrove species are in danger of extinction6.

Habitat fragmentation can affect the genetic structure of a species, primarily by increasing levels of genetic drift and inbreeding within populations, which in turn reduces the genetic diversity of the gene pool7. Given that populations lacking genetic diversity often face a greater risk of extinction8, examining the patterns of genetic diversity in threatened mangrove populations will be key to ensuring their long-term survival. These patterns can be shaped by both natural (evolutionary) processes and environmental events such as human-induced climate change. Understanding the mechanisms underlying them is crucial for the informed conservation and management of threatened species.

Rhizophora is the most widely distributed mangrove genus9. Among its species, Rhizophora apiculata is a dominant commercial mangrove species that plays an important ecological and economic role in Malaysia10. Locally known as Bakau Minyak, it is preferred for its hard, strong and heavy wood, which is harvested mostly for wood chips, furniture and charcoal11,12. Although the population size of R. apiculata has been reduced dramatically over the past few decades13, little is known about the population genetics of this species. Previous work on the genus Rhizophora revealed low genetic diversity and high genetic differentiation, which appear to be common traits of mangroves14,15,16,17,18. However, because most studies aimed to cover a large geographical area, sampling was not thorough; for Malaysia, for example, only a few populations were chosen to represent the country (see15,18). Genetic information generated from so few populations is not sufficient to develop sound conservation plans for the species in Malaysia.

Here, we developed a new set of genic microsatellite markers (EST-SSRs) for R. apiculata using a next-generation sequencing (NGS) approach, and used them to assess the level of genetic diversity and population differentiation throughout Peninsular Malaysia. Microsatellite markers have been widely used for population analyses because of their co-dominant inheritance and high degree of polymorphism (for recent work see19,20). The markers developed in this study provide a detailed genetic diversity database for R. apiculata that can support the formulation of conservation guidelines in Peninsular Malaysia.

Results

Identification and development of microsatellite markers in R. apiculata

A total of 25,938,686 raw RNA-seq reads were generated on the Illumina HiSeq 4000 sequencer, of which 25,627,792 clean reads remained for further analysis after ambiguous and low-quality sequences were filtered out. The clean reads were assembled into 141,915 contigs with an overall GC content of 44%. MISA analysis of these contigs identified 18,674 microsatellites, dominated by dinucleotide repeats (15,898; 85.13%), followed by trinucleotide (2,403; 12.87%) and tetranucleotide repeats (373; 2.00%). The three most frequent dinucleotide motifs were CT (16.80%), AG (16.68%) and TC (13.45%). Of the 60 primer pairs tested, 46 amplified successfully. These primers were then screened on 24 individuals of R. apiculata by PCR amplification and fragment analysis. Nineteen of the 46 primer pairs, which showed clear polymorphisms, were selected together with three primer pairs (RM111, RM116 and RM121) previously developed for R. mucronata21 (Table 1). To check their reliability, the assembled contig sequences from which the new markers were derived were searched with BLAST against the genome sequence of R. apiculata22 (Supplementary Table S1). The markers were subsequently used to genotype 1,120 R. apiculata samples collected throughout Peninsular Malaysia (Fig. 1 and Table 2).
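The motif counts above come from MISA. As an illustration of what such a scan does, the following is a minimal Python sketch, not the MISA implementation. It assumes minimum copy numbers of 6 for dinucleotides and 5 for tri- and tetranucleotides (commonly used MISA settings; the thresholds used in this study are not stated), reports perfect repeats only in the phase in which they are first encountered, and uses hypothetical helper names (`find_ssrs`, `is_shorter_period`). Because motifs are reported as read, CT and TC are counted separately, as in the frequencies listed above. The last lines recompute the class percentages from the reported counts.

```python
import re

# Assumed minimum numbers of tandem copies per motif length (not from the study)
MIN_REPEATS = {2: 6, 3: 5, 4: 5}


def is_shorter_period(motif):
    """True if the motif is itself a repeat of a shorter unit, e.g. 'AA' or 'ATAT'."""
    k = len(motif)
    return any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_ssrs(seq):
    """Hypothetical MISA-like scan for perfect di-, tri- and tetranucleotide
    repeats. Returns sorted (start, motif, copies) tuples."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # a k-mer immediately followed by at least (min_rep - 1) copies of itself
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_shorter_period(motif):
                continue  # mononucleotide runs or repeats of a shorter period
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


contig = "GG" + "CT" * 6 + "C" + "AAG" * 5 + "C"  # toy contig, not study data
print(find_ssrs(contig))  # [(2, 'CT', 6), (15, 'AAG', 5)]

# Class proportions recomputed from the counts reported above
reported = {"di": 15898, "tri": 2403, "tetra": 373}
total = sum(reported.values())
print(total, {c: round(100 * n / total, 2) for c, n in reported.items()})
# 18674 {'di': 85.13, 'tri': 12.87, 'tetra': 2.0}
```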

Table 1 Information on the 22 microsatellite markers for R. apiculata and the corresponding GenBank accession numbers.
Figure 1

An overview of sample collection in Peninsular Malaysia. See Table 2 for detailed information on sampling sites. The map was generated using ArcGIS v10.5 (https://desktop.arcgis.com/en/, ESRI). The Malaysian administrative boundary data for mapping were downloaded from iGISMAP (https://map.igismap.com/share-map/export-layer/Malaysia_Polygon/cedebb6e872f539bef8c3f919874e9d7).

Table 2 Sampling location, habitat condition and genetic diversity measures (number of alleles per locus, Aa; observed heterozygosity, Ho; expected heterozygosity, He; and allelic richness, Rs) and inbreeding coefficient (FIS) of the 39 populations of R. apiculata from Peninsular Malaysia.

Genetic diversity

Descriptive statistics for the 39 populations of R. apiculata based on the 22 polymorphic microsatellite markers are listed in Table 2. The number of alleles per locus (Aa) ranged from 2.3 to 4.6 among populations, with an average of 3.2 alleles per locus. The mean observed (Ho) and expected (He) heterozygosities were 0.299 and 0.325, respectively. At the population level, He ranged from 0.247 (Balik Pulau) to 0.503 (Muar). Allelic richness (Rs) ranged from 2.07 to 3.63, showing only marginal differences among the populations investigated. All loci deviated from Hardy–Weinberg equilibrium (HWE), with heterozygote frequencies either lower or higher than expected (Table 2). Thirty-seven of the 39 populations (94.9%) showed an excess of homozygotes, with positive inbreeding coefficients (FIS) ranging from 0.007 to 0.450. Of these, 26 populations (66.7% of all 39) were statistically significant (p < 0.05), mostly populations with disturbed habitats, due either to logging for firewood, charcoal production and construction (Telok Gedong and Merchang) or to development for human settlement, aquaculture, ecotourism, road construction and plantations (Kubang Badak, Merbok, Balik Pulau, Trong, Sungai Batang, Banjar Utara, Pulau Ketam, Sepang Besar, Sepang Kecil, Pulau Besar, Muar, Pulau Kukup, Tanjung Piai, Kuantan, Peramu, Cherating, Kuala Kemaman and Kuala Terengganu). Only two populations had negative FIS values, indicating an excess of heterozygotes (Pulau Tengah, FIS = −0.035; Merlimau Tambahan, FIS = −0.023), and neither was statistically significant at p < 0.05 (Table 2).
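The single-locus quantities behind Table 2 can be illustrated with a minimal Python sketch. It assumes diploid genotypes given as pairs of allele sizes, uses Nei's small-sample unbiased expected heterozygosity, and takes FIS as the simple ratio 1 − Ho/He; FSTAT's estimator (Weir and Cockerham's f) and its averaging over loci differ, so this shows the concepts rather than reproducing the study's values. The function name `locus_stats` and the toy genotypes are hypothetical.

```python
from collections import Counter


def locus_stats(genotypes):
    """Ho, Nei's (1978) unbiased He and FIS = 1 - Ho/He for one locus.

    genotypes: list of (allele1, allele2) tuples, one per diploid individual.
    """
    n_ind = len(genotypes)
    ho = sum(a != b for a, b in genotypes) / n_ind  # share of heterozygotes
    counts = Counter(allele for g in genotypes for allele in g)
    n_alleles = 2 * n_ind
    sum_p2 = sum((c / n_alleles) ** 2 for c in counts.values())
    he = n_alleles / (n_alleles - 1) * (1 - sum_p2)  # small-sample correction
    fis = 1 - ho / he if he > 0 else float("nan")
    return ho, he, fis


# Toy locus scored as allele sizes (bp); not study data
toy = [(150, 150), (150, 152), (152, 152), (150, 150), (152, 154)]
ho, he, fis = locus_stats(toy)
print(f"Ho={ho:.3f} He={he:.3f} FIS={fis:.3f}")  # Ho=0.400 He=0.644 FIS=0.379
```

A positive FIS, as here, indicates fewer heterozygotes than expected under random mating; an excess of heterozygotes gives a negative value.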

Most of the total genetic diversity (HT = 0.532) resided within populations (HS = 0.370) (Table 3). Values of Wright's fixation index (FST) ranged from 0.079 to 0.638, and those of Slatkin's divergence parameter (RST) from 0.038 to 0.617. The coefficient of gene differentiation among populations (GST) was estimated at 0.305, indicating that 30.5% of the genetic variability was distributed among populations (Table 3). The mean FST (0.315) was slightly higher than GST and significantly greater than zero (p < 0.05), whereas the mean RST (0.242) was lower than FST (Table 3).
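The GST value follows directly from the two diversity terms through Nei's relation GST = (HT − HS)/HT. The short Python check below, using only the values reported in Table 3, reproduces the estimate and the complementary share of diversity held within populations; the helper name `nei_gst` is illustrative.

```python
def nei_gst(ht, hs):
    """Nei's GST: share of total gene diversity (HT) found among populations."""
    return (ht - hs) / ht


HT, HS = 0.532, 0.370  # values reported in Table 3
print(f"GST = {nei_gst(HT, HS):.3f}")          # GST = 0.305
print(f"within populations = {HS / HT:.1%}")  # within populations = 69.5%
```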

Table 3 Genetic diversity assessment in R. apiculata.

Population genetic structure

Genetic differentiation between populations increased with geographical distance (R2 = 0.2602) (Fig. 2a). In addition, model-based Bayesian clustering with STRUCTURE v2.3.1 indicated two main clusters within Peninsular Malaysia: one formed by the populations in western Peninsular Malaysia (Cluster 1: populations 1–27) and the other by those in eastern Peninsular Malaysia (Cluster 2: populations 28–39). STRUCTURE showed the best clustering at K = 2, which has a clear biological interpretation because the clusters coincide with two geographical groups23 corresponding to the Straits of Malacca (western Peninsular Malaysia) and the South China Sea (eastern Peninsular Malaysia) (Fig. 2b). A gradually increasing level of admixture was observed in populations between Kedah and western Johor (Cluster 1: populations 1–27). When genetic variation is hierarchically organised, the algorithm underlying STRUCTURE detects only the uppermost level of population structure23. Hence, a separate STRUCTURE analysis within the western cluster further divided it into two sub-clusters: Sub-cluster 1a (populations 1–19) and Sub-cluster 1b (populations 20–27) (Fig. 2c). Interestingly, the strong admixture pattern observed in Sub-cluster 1b suggests that its populations harbour higher levels of genetic diversity and could be an important genetic reservoir for R. apiculata in Peninsular Malaysia.
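The isolation-by-distance result rests on a Mantel test, which correlates the entries of a genetic-distance matrix with those of a geographic-distance matrix and judges significance by permuting the population labels of one matrix. The following is a minimal pure-Python sketch under stated assumptions: Pearson correlation over the upper triangle, a one-tailed test for positive association, and 9,999 permutations as in the Methods. GenAlEx's own implementation may differ in detail, and the coordinates and distances are toy values, not study data.

```python
import math
import random


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def upper_triangle(m, order):
    """Off-diagonal upper-triangle entries of m, rows/columns relabelled by order."""
    n = len(order)
    return [m[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel(gen, geo, permutations=9999, seed=1):
    """One-tailed Mantel test for a positive association between two
    symmetric distance matrices. Returns (r, p)."""
    n = len(gen)
    ident = list(range(n))
    x = upper_triangle(geo, ident)
    r_obs = pearson(upper_triangle(gen, ident), x)
    rng = random.Random(seed)
    order = ident[:]
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)  # permute population labels of the genetic matrix
        if pearson(upper_triangle(gen, order), x) >= r_obs - 1e-12:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)  # observed data counted once


# Toy example: 8 populations along a coastline (km); not study data
km = [0, 10, 20, 35, 50, 70, 95, 120]
geo = [[abs(a - b) for b in km] for a in km]
gen = [[0.002 * d + (0.01 if d else 0.0) for d in row] for row in geo]
r, p = mantel(gen, geo)
print(f"r = {r:.3f}, p = {p:.4f}")
```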

Figure 2

(a) Result of Mantel test for Isolation-by-Distance using Nei’s genetic distance; (b) STRUCTURE showing the division of R. apiculata populations into two main clusters; (c) Sub-structuring of Cluster 1, forming Sub-clusters 1a and 1b; (d) Principal component analysis (PCA) based on pairwise FST of 39 R. apiculata populations, assigning the populations into two distinct clusters that coincide with western (orange color: Straits of Malacca) and eastern (blue color: South China Sea) Peninsular Malaysia and (e) PCA showing Cluster 1 further divided into two sub-clusters.

Results of the principal component analysis (PCA) corroborate the STRUCTURE analysis: the 39 populations were again divided into two distinct clusters (Fig. 2d), i.e. Cluster 1 along the Straits of Malacca (western Peninsular Malaysia) and Cluster 2 facing the South China Sea (eastern Peninsular Malaysia). Cluster 1 was likewise further divided into two sub-clusters (Fig. 2e). To complement the population structure analysis, a UPGMA dendrogram was constructed in MEGA v5.0 based on Nei's DA24, producing two main branches corresponding to the two major clusters, with two sub-clusters within Cluster 1 (Fig. 3).
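UPGMA builds the dendrogram by repeatedly joining the two closest clusters and replacing their distances to every other cluster with a size-weighted average. The following is a minimal pure-Python sketch, assuming a precomputed symmetric distance matrix such as Nei's DA. It is not MEGA's implementation and omits bootstrapping; the function name `upgma` and the toy distances are hypothetical. The tree is returned as nested `(left, right, height)` tuples, where height is half the joining distance.

```python
def upgma(dist, labels):
    """Hypothetical UPGMA on a symmetric distance matrix.

    Returns a nested tuple (left, right, height); leaves are labels and
    height is half the distance at which the two clusters were joined.
    """
    clusters = {i: (lab, 1) for i, lab in enumerate(labels)}  # id -> (subtree, size)
    d = {(i, j): dist[i][j]
         for i in range(len(labels)) for j in range(i + 1, len(labels))}
    next_id = len(labels)
    while len(clusters) > 1:
        (a, b), dab = min(d.items(), key=lambda kv: kv[1])  # closest pair
        tree_a, na = clusters.pop(a)
        tree_b, nb = clusters.pop(b)
        del d[(a, b)]
        new = next_id
        next_id += 1
        for c in clusters:
            dac = d.pop((min(a, c), max(a, c)))
            dbc = d.pop((min(b, c), max(b, c)))
            # size-weighted average distance (the "average" in UPGMA)
            d[(c, new)] = (na * dac + nb * dbc) / (na + nb)
        clusters[new] = ((tree_a, tree_b, dab / 2), na + nb)
    return next(iter(clusters.values()))[0]


# Toy distances between four populations (not Nei's DA values from the study)
labels = ["P1", "P2", "P3", "P4"]
dist = [[0.0, 0.1, 0.6, 0.6],
        [0.1, 0.0, 0.6, 0.6],
        [0.6, 0.6, 0.0, 0.2],
        [0.6, 0.6, 0.2, 0.0]]
print(upgma(dist, labels))  # (('P1', 'P2', 0.05), ('P3', 'P4', 0.1), 0.3)
```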

Figure 3

UPGMA dendrogram based on mean character differences estimated from microsatellite data of 1,120 R. apiculata individuals in 39 populations.

When the 39 populations were grouped based on the two main clusters, AMOVA revealed that 45% of the variation was apportioned between the western and eastern regions of Peninsular Malaysia, 13% among populations within regions, and 42% within populations (Table 4).

Table 4 Analysis of molecular variance (AMOVA) performed by dividing the 39 populations into geographical regions.

Discussion

From our transcriptome data, we identified a total of 18,674 microsatellites, of which dinucleotide repeats (85.13%) were the most frequent type. Dinucleotide repeats have likewise been reported as the most abundant motif type in other tree species such as grey mangrove (Avicennia marina)25, downy oak (Quercus pubescens)26 and crape myrtle (Lagerstroemia spp.)27. A total of 19 transcriptome-based microsatellite markers (EST-SSRs) were developed in this study. To increase the number of markers for genotyping, three additional nuclear microsatellite markers (gSSRs) previously developed for R. mucronata21 were validated and used. Increasing the number of polymorphic loci can greatly enhance the precision of genetic distance estimates28. High cross-species transferability of microsatellite markers between R. apiculata and R. mucronata has been demonstrated previously9.

Low genetic diversity is common in mangrove plant species, particularly in Rhizophora9,16,22. In the present study, R. apiculata likewise showed low levels of genetic diversity (mean He = 0.352, Table 2). This result is comparable to that of other mangrove species such as R. mucronata (He = 0.354)9, R. stylosa (He = 0.321)9 and Sonneratia alba (He = 0.280)29. The low level of genetic diversity within R. apiculata populations suggests that this species may have experienced severe habitat fragmentation or population size reduction.

Our results also showed a significant excess of homozygotes, i.e. positive inbreeding coefficients (FIS), in most R. apiculata populations, consistent with the habitat destruction observed at most of these sites (Table 2). The positive FIS values might therefore result from selfing and/or mating between close relatives following the reduction in population size caused by anthropogenic activities. This is also a common outcome of genetic drift in small populations: as drift proceeds and each population diverges from the others, genetic variation among populations increases30,31.

R. apiculata is considered a threatened mangrove species in Southeast Asia, and its population decline is due mainly to deforestation and land conversion4,32,33. Its high wood density (0.600 g cm−3) makes it commercially attractive and has led to extensive over-exploitation of the species12,34. The resulting decline in effective population size, and the accompanying loss of genetic diversity, ultimately reduce the resilience of populations to anthropogenic climate change35. The genetic fragility of R. apiculata is of concern, considering that future environmental changes, whether natural or otherwise, will likely further reduce its genetic diversity and threaten its long-term viability.

Wright's F-statistics36 are among the most common tools for quantifying how heterozygosity is distributed within and among populations. FST is more sensitive than its analogue RST in detecting intraspecific differentiation37,38. RST, however, is thought to be a better predictor of interspecific divergence because it reflects the stepwise mutation pattern of microsatellites39. We therefore applied both statistics, which revealed strong population genetic structure in R. apiculata (FST = 0.315, RST = 0.242). Approximately 68.5% and 31.5% of the genetic variation was distributed within and among R. apiculata populations, respectively. Other mangrove studies have also revealed high population differentiation, for example in Avicennia marina (0.410)25, Ceriops tagal (0.529)40 and A. germinans (0.410)41.

All three clustering analyses performed in this study gave similar results: the 39 populations were divided into two major geographical clusters (Cluster 1: populations 1–27; Cluster 2: populations 28–39), with a strong admixture pattern between the southern parts of the western and eastern regions of Peninsular Malaysia (Fig. 2b), and sub-structuring further divided Cluster 1 into two sub-clusters (Fig. 2c). Owing to this strong admixture of alleles, Sub-cluster 1b (populations 20–27) harbours higher genetic diversity (Table 2, Fig. 2c). The observed pattern may be best explained by ocean current movements: the Straits of Malacca, flanked by the Indonesian island of Sumatra and the Peninsular Malaysian land mass, act as a channel connecting the Andaman Sea with the South China Sea, linking the mangrove ecosystems of the western and eastern regions through the hydrological cycle33,42. Ocean currents along the Straits of Malacca and the South China Sea depend strongly on the north-east (December–January) and south-west (June–July) monsoon winds33. This allows mangrove propagules to float in seawater for extended periods and follow the ocean currents from source to sink populations43. Because the southern part of the Straits of Malacca is narrow and shallow (water depth ~30 m) before it meets the South China Sea42, propagules from either region travelled to the sink populations there and formed the observed admixed populations (Sub-cluster 1b).

Studies have shown that R. apiculata is a strong disperser, with high propagule survivorship in seawater and the capability of long-distance dispersal44. Nevertheless, long-distance dispersal can be limited by several factors such as ocean circulation, wind, large distances, longevity and land barriers14,44,45. Peninsular Malaysia has been reported to act as a land barrier to gene flow15,45, thus promoting genetic differentiation between R. apiculata populations, particularly between those from the eastern and western regions of Peninsular Malaysia. A previous study suggested that limited gene dispersal likely played an important role in the evolutionary history of Rhizophora species, as frequent sea-level fluctuations associated with climate change would have negatively affected their effective population sizes9. In addition, human activities such as logging and development near mangrove habitats (as recorded in Table 2) might have created anthropogenic barriers that pose serious threats to gene flow. Such barriers can effectively split a species' range into isolated fragments, making dispersal between populations difficult.

The low genetic diversity of R. apiculata, together with the environmental fragility caused primarily by anthropogenic activities, can decrease its fitness and affect its long-term survival. Prior to this study, genetic and demographic information on R. apiculata was lacking, which hampered local efforts to develop a conservation plan for the species. The newly generated genetic information will enable the formulation of comprehensive conservation guidelines for R. apiculata in Peninsular Malaysia. Because Sub-cluster 1b of the western cluster shows a strong admixture pattern and harbours higher genetic diversity, serving as a reservoir of admixed alleles from both the western and eastern clusters, it deserves high conservation priority. In addition, in situ conservation areas can be selected independently within each of the two main clusters, with priority given to habitat protection to counter the potentially detrimental effects of inbreeding in the remaining R. apiculata populations.

Methods

Plant material collection and habitat condition survey

Sampling was carried out along sheltered coasts of Peninsular Malaysia, where R. apiculata grows abundantly in zones where seawater meets fresh water. We collected a total of 1,120 samples from 39 natural populations of R. apiculata between 2017 and 2018, an average of 29 samples per population (Fig. 1, Table 2). Leaf samples were collected randomly from individual trees, cleaned and kept in liquid nitrogen prior to DNA extraction. Genomic DNA was extracted with a modified cetyl trimethylammonium bromide (CTAB) method46 and purified using the High Pure PCR Template Preparation Kit ver. 20 (Roche, USA).

A habitat condition survey was also carried out through systematic observation of each population by boat and/or on foot, providing visual coverage of anthropogenic activities in the surrounding area that could contribute to mangrove habitat destruction. We grouped these activities into two categories: 'logged' (logging for firewood, charcoal production and construction) and 'developed' (development for human settlements, aquaculture, ecotourism, road construction and plantations); populations without such activities were recorded as 'undisturbed'. Most R. apiculata sampling sites had experienced some form of anthropogenic activity, either logging or development (Table 2). Several sites were considered undisturbed, mostly because of poor accessibility and their distance from human settlements and activities.

Microsatellite marker development and genotyping

Total RNA was extracted using the Qiagen RNeasy kit (Qiagen, USA) and purified using the TURBO DNA-free kit (Ambion, USA). RNA quality was checked using a NanoDrop 2000 spectrophotometer (Thermo Scientific, USA). Transcriptome sequencing was carried out on an Illumina HiSeq 4000 sequencer (Illumina, USA) with default parameters at Novogene Bioinformatics Technology Co. Ltd. (Beijing, China). The raw data underwent quality checking, trimming and assembly using FastQC47, Trimmomatic v0.3248 and Trinity v2.4.049, respectively. Reads with many ambiguous (N) bases, or with more than 50% low-quality bases, were filtered out of the raw data set.
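The read-level filtering rule can be sketched as follows. The thresholds are assumptions, because the text specifies only "many" ambiguous bases and "more than 50%" low-quality bases: here a read is discarded if more than 10% of its bases are N, or if more than 50% of its bases have a Phred score of 5 or less. Phred+33 quality encoding and well-formed four-line FASTQ records are also assumed. The function names are hypothetical and do not reproduce the sequencing provider's pipeline.

```python
def keep_read(seq, qual, max_n_frac=0.10, low_q=5, max_low_frac=0.50, offset=33):
    """Hypothetical read filter (assumed thresholds): reject reads whose share
    of N bases exceeds max_n_frac, or in which more than max_low_frac of
    bases have Phred quality <= low_q (Phred+33 encoding assumed)."""
    n = len(seq)
    if n == 0 or seq.upper().count("N") / n > max_n_frac:
        return False
    low = sum(1 for ch in qual if ord(ch) - offset <= low_q)
    return low / n <= max_low_frac


def filter_fastq(lines):
    """Yield (header, seq, qual) for 4-line FASTQ records passing keep_read.
    Assumes well-formed records (no truncated final record)."""
    it = iter(lines)
    for header in it:
        seq = next(it).strip()
        next(it)  # '+' separator line
        qual = next(it).strip()
        if keep_read(seq, qual):
            yield header.strip(), seq, qual


records = ["@r1", "ACGTACGTAC", "+", "IIIIIIIIII",   # passes
           "@r2", "ACGTNNACGT", "+", "IIIIIIIIII",   # 20% N bases
           "@r3", "ACGTACGTAC", "+", "######IIII"]   # 60% of bases at Q2
print([h for h, _, _ in filter_fastq(records)])  # ['@r1']
```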

The MIcroSAtellite identification tool (MISA, https://pgrc.ipk-gatersleben.de/misa)50 was used to identify microsatellites with di-, tri- and tetranucleotide motifs. Primers were then designed with Primer3 (https://bioinfo.ut.ee/primer3-0.4.0/)51. Markers were selected using the following criteria: microsatellite motif length ≤ 30 base pairs (bp); amplicon size between 80 and 400 bp; and rejection of markers and amplicons with multiple mononucleotide repeat sequences. Based on these criteria, 60 primer pairs were synthesised, and those that produced clear polymerase chain reaction (PCR) products on a 1.5% agarose gel were fluorescently labelled at the forward primer with HEX or 6-FAM. PCR was performed on a GeneAmp PCR System 9700 (Applied Biosystems, USA) in a final reaction volume of 10 μL. The thermocycling profile consisted of an initial denaturation at 94 °C for 4 min; 40 cycles of 94 °C for 1 min, annealing at 55 °C for 30 s and 72 °C for 40 s; and a final extension at 72 °C for 30 min. PCR products were genotyped on an ABI 3130xl Genetic Analyzer (Applied Biosystems, USA) with ROX400 as the internal size standard, and alleles were scored using GeneMarker v2.6.4 (SoftGenetics, USA). The reliability of the assembled contig sequences of the successfully developed markers was assessed by BLASTN searches against the R. apiculata genome sequence22.
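The marker selection criteria can be expressed as a simple filter over candidate loci. This is a hypothetical sketch rather than the pipeline used in the study. The record fields (`motif`, `copies`, `amplicon`) are illustrative. "Motif length ≤ 30 bp" is read here as the total length of the repeat tract. A "mononucleotide repeat" is assumed to be a run of at least 8 identical bases, and "multiple" is read as two or more such runs.

```python
import re
from dataclasses import dataclass

# Assumption: a "mononucleotide repeat" is a run of >= 8 identical bases
MONO_RUN = re.compile(r"A{8,}|C{8,}|G{8,}|T{8,}")


@dataclass
class Candidate:
    """Hypothetical record for a candidate locus and its predicted amplicon."""
    name: str
    motif: str      # repeat unit, e.g. "CT"
    copies: int     # number of tandem copies
    amplicon: str   # predicted PCR product sequence


def passes_criteria(c, max_ssr_bp=30, min_amp=80, max_amp=400):
    """Apply the selection rules described in the Methods (as interpreted here)."""
    if len(c.motif) * c.copies > max_ssr_bp:        # repeat tract <= 30 bp
        return False
    if not min_amp <= len(c.amplicon) <= max_amp:   # amplicon 80-400 bp
        return False
    # reject amplicons with multiple (here: two or more) mononucleotide runs
    return len(MONO_RUN.findall(c.amplicon.upper())) < 2


flank = "ACGT" * 20
candidates = [
    Candidate("m1", "CT", 10, flank + "CT" * 10),                       # passes
    Candidate("m2", "CT", 16, flank + "CT" * 16),                       # 32 bp repeat
    Candidate("m3", "CT", 10, "A" * 10 + flank + "T" * 9 + "CT" * 10),  # two runs
    Candidate("m4", "CT", 10, "CT" * 10),                               # 20 bp amplicon
]
print([c.name for c in candidates if passes_criteria(c)])  # ['m1']
```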

Data analysis

Micro-Checker software was used to detect possible genotyping errors and null alleles52. Deviations from Hardy–Weinberg equilibrium (HWE) and linkage disequilibrium (LD) were examined using Fisher's exact test in Genetic Data Analysis (GDA) v1.153. Sequential Bonferroni correction was applied to adjust critical values for multiple comparisons, with a significance level of 5%54. Allele frequencies were obtained for each locus in each population. Within-population genetic diversity was estimated using Microsatellite Toolkit55 based on several parameters, including the average number of alleles per polymorphic locus (Aa), effective number of alleles per polymorphic locus (Ae), observed heterozygosity (Ho) and expected heterozygosity (He, Nei's genetic diversity)28. Allelic richness (Rs) was estimated using FSTAT v2.9.356 and GDA v1.1. The inbreeding coefficient (FIS) was calculated using FSTAT v2.9.356, and its significance was tested with GDA v1.1. Additional statistics, such as polymorphic information content (PIC) and Nei's genetic distance (Nei's DA)57, were calculated using POWERMARKER v3.2558.
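As usually described (Holm's step-down method), sequential Bonferroni correction sorts the m p-values and compares the i-th smallest with α/(m − i + 1). It stops at the first test that fails, and all larger p-values are then declared non-significant. The following is a minimal Python sketch assuming this variant; the exact procedure implemented in the software used is not specified in the text, and the p-values are toy numbers.

```python
def sequential_bonferroni(pvalues, alpha=0.05):
    """Holm's step-down (sequential) Bonferroni procedure.

    Returns booleans in the input order: True where the test remains
    significant after correction for len(pvalues) comparisons.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    significant = [False] * m
    for rank, i in enumerate(order):        # rank 0 = smallest p-value
        if pvalues[i] <= alpha / (m - rank):
            significant[i] = True
        else:
            break                           # all larger p-values also fail
    return significant


# Toy p-values, e.g. HWE tests at four loci in one population (not study data)
print(sequential_bonferroni([0.001, 0.04, 0.012, 0.3]))  # [True, False, True, False]
```

Compared with a plain Bonferroni threshold of α/m for every test, the step-down thresholds relax as tests are rejected, which retains more power while still controlling the family-wise error rate.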

We also analysed genetic differentiation among R. apiculata populations using the fixation index (FST)59, the divergence parameter (RST)38 and Analysis of Molecular Variance (AMOVA), all computed in GenAlEx v6.5 at a 5% significance level60. A Mantel test with 9,999 permutations was carried out in GenAlEx v6.5 to test for isolation by distance (IBD), i.e. a correlation between geographical distance and genetic differentiation (FST, RST and FST/(1 − FST)). To infer population genetic structure, we applied a model-based Bayesian clustering method using STRUCTURE v2.3.161. The dataset was analysed under the admixture model, which can detect structure among populations that are potentially similar owing to shared ancestry or migration, with a burn-in of 250,000 steps followed by 850,000 Markov chain Monte Carlo (MCMC) iterations. StructureSelector62 was then used to select and visualise the optimal number of clusters (K), in order to identify the uppermost level of the hierarchical genetic structure. Additionally, principal component analysis (PCA) was carried out using PCAGEN v1.263 to assess the goodness-of-fit between simulated and real datasets, visualising genetic distance and relatedness between populations in a two-dimensional (2D) plot. To complement the analysis of population structure, a UPGMA dendrogram was constructed based on Nei's DA using MEGA v5.064, with 1,000 bootstrap replicates to assess the reliability of the branch topology.
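Among the estimators StructureSelector reports is Evanno's ΔK. It divides the absolute second-order change in the log-likelihood L(K) by the standard deviation of L(K) across replicate STRUCTURE runs. The sketch below computes ΔK from mean log-likelihoods, a common implementation choice. It assumes consecutive K values with at least two replicate runs each; the Methods do not state the number of replicates or the range of K tested, and the log-likelihoods are invented for illustration, not study results.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp):
    """Evanno's delta K from replicate STRUCTURE runs.

    lnp maps each K (consecutive integers) to a list of ln P(D) values from
    replicate runs; at least two replicates per K are needed for stdev.
    Returns {K: delta K} for every K with both neighbours present.
    """
    ks = sorted(lnp)
    m = {k: mean(lnp[k]) for k in ks}
    delta = {}
    for k in ks[1:-1]:
        # |L''(K)| from mean likelihoods: |L(K+1) - 2 L(K) + L(K-1)|
        second = abs(m[k + 1] - 2 * m[k] + m[k - 1])
        sd = stdev(lnp[k])
        delta[k] = second / sd if sd > 0 else float("inf")
    return delta


# Invented log-likelihoods for three replicate runs per K (not study results)
lnp = {
    1: [-30000.0, -30010.0, -29990.0],
    2: [-26000.0, -26020.0, -25980.0],
    3: [-25800.0, -25900.0, -25700.0],
    4: [-25750.0, -25950.0, -25550.0],
}
dk = evanno_delta_k(lnp)
print(dk, max(dk, key=dk.get))  # {2: 190.0, 3: 1.5} 2
```

Because ΔK needs neighbours on both sides, it cannot evaluate the smallest or largest K tested, and in particular cannot support K = 1. This is one reason to inspect the L(K) values and the bar plots alongside it.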