Introduction

Mangifera casturi Kosterm., or Kalimantan mango, is an endogenous fruit plant in Kalimantan Selatan, Indonesia; it is classified as extinct in the wild according to the IUCN Red List1. M. casturi belongs to the Mangifera genus within the Anacardiaceae family2 and is classified as a common ancestor of the Mangifera species in Indonesia3. M. casturi is proposed to be the natural hybrid of M. indica and M. quadrifida according to single nucleotide polymorphisms (SNPs) analysis4. In Kalimantan Selatan, M. casturi are known by various local names, such as Kasturi, Cuban, Pelipisan, Pinari, and Rawa-rawa (Fig. 1)2. M. casturi bears small fruits, with an attractive purple color and a distinctive aroma; thus, it is a prospective genetic resource for improving mango varieties in the future5. M. casturi also contains useful secondary metabolites, such as lupeol, an antioxidant and anticancer agent6. However, the genomic information on M. casturi remains limited, with only one accession deposited (MF678493.1) in nucleotide repositories such as the NCBI, and one SRA study (SRP183190) reported.

Figure 1
figure 1

Appearance Fruits of M. casturi hybrids (from left to right: Pinari, Pelipisan, Kasturi, and Cuban) and closely related species (Rawa-rawa). This photo was taken by DDM.

Recently, sequencing has advanced significantly from Sanger sequencing to next-generation sequencing (NGS). For example, whole-genome sequencing can produce comprehensive genomic information on a species7. Also, from NGS data, it is easier to obtain genetic information such as microsatellite markers, which are superior to other markers like RAPD and AFLP and are already used in other Mangifera species8,9. Microsatellite markers can determine distinct variations at the level of species as they are codominant; as a result, they are widely used in population and genetic studies10. Microsatellite markers have also been used to determine the genetic variation in M. indica11. Although microsatellites are important for taxonomy and the study of genetic conservation, no M. casturi-specific microsatellite markers have been reported.

In recent years, Sanger sequencing approaches have been utilized for DNA barcoding. DNA barcoding methods based on chloroplast regions, such as rbcL, matK12, and trnH-psbA13, internal transcribed spacers (ITS), and second internal transcribed spacers (ITS2) from nuclear ribosomal DNA14, have been widely used for phylogenetic analysis at various taxonomic levels. These DNA barcoding markers from chloroplast regions can also be determined at the genus or family level because of their inheritance from a maternal ancestor. On the other hand, the ITS region can determine the barcoding of the paternal and maternal ancestor. However, DNA barcoding sequences for M. casturi have not been recorded in any public database. As a result, there have been no phylogenetic studies of M. casturi using DNA barcoding to achieve accurate identification at the taxonomy level.

This study aimed to collect genomic information from M. casturi using NGS and Sanger and to analyze and determine the genetic variation among the M. casturi hybrids. Microsatellite markers were used to assess genetic variation among M. casturi hybrids, Kasturi, Cuban, Pelipisan, and Pinari. Furthermore, at a higher taxonomy level, phylogenetic analysis of M. casturi hybrids and Mangifera species was performed using DNA barcoding. In addition, there is no clear information and proof about the genetic variation and relationship among M. casturi hybrids. In this study, we propose a candidate ancestor from a natural hybridization of M. indica and M. quadrifida.

Results

In this study, 11.01 Gbp of M. casturi DNA was obtained with high-throughput sequencing using an Illumina HiSeq 4000 system with paired end 150 bp reads. The raw data were registered in the DDBJ with accession number DRA011022. Clean reads were obtained via filtering, and 10.95 Gbp of de novo genome assembly was performed using a Ray Assembler. We obtained 259 872 scaffolds with an N50 value of 1 445 bp and a maximum scaffold length of 144 601 bp (Table 1). The genome assembly and annotation completeness were assessed using BUSCO, and complete ratio, universal single-copy orthologs were found to be 42.3% similar using a plant reference database (Table 2).

Table 1 Statistics of de novo assembly from M. casturi using Ray Assembler.
Table 2 Summarized benchmarks in BUSCO annotation from M. casturi scaffolds.

Microsatellite markers were identified using the MISA program, producing 11 040 sequences with at least one microsatellite motif, and 770 sequences with more than one site (Table 3). The trinucleotide motifs predominated with 52.77%, followed by the dinucleotide motif with 33.3%. Fourteen candidate sequences were selected and identified (Table 4). Finally, all the confirmed primers were amplified and registered in the DDBJ with accession numbers (Table 4).

Table 3 Number of microsatellite regions observed in M. casturi scaffolds, total and subdivided by motif.
Table 4 Summary statistics of the fourteen analyzed microsatellite loci.

Eight samples, namely Pelipisan, Cuban, Pinari, Kasturi, Rawa-rawa, M. foetida, M. quadrifida, and M. indica (Supplementary Table 1), were used to validate and determine the allele size of the microsatellite markers using QIAxcel capillary electrophoresis. The 14 primers produced 58 alleles in total and a mean of 4.14 alleles per locus (Table 4). All the loci were polymorphic (Table 5). The observed heterozygosity (Ho) ranged from 0 for 4 markers (mc21672, mc28107, mc88075, and mc88387) to 0.62 for the mc8693 locus with a mean Ho of 0.26. The expected heterozygosity (He) ranged from 0.40 for the mc167596 locus to 0.80 for the mc230178 locus, with a mean of 0.65. The fixation index (FST), reflecting the degree of genetic differentiation, ranged from 0.08 to 1.00 with an average of 0.60 per locus. Lastly, Shannon’s information index ranged from 0.69 and 1.75 with a mean of 1.21.

Table 5 Allele size information of microsatellite loci.

The mc230178 and mc58089 loci produced seven alleles from eight samples, while the mc122955 and mc88387 loci produced two alleles. Some loci, namely mc176197, mc21672, and mc88075, displayed the same alleles between Kasturi and Cuban. In the mc88387 locus, only the Kasturi sample was not amplified; thus, this locus might have a null allele in Kasturi. Therefore, mc88387 locus can be used to identify M. casturi accessions in a natural population, as it is dissimilar to other Mangifera species.

Furthermore, principal coordinate analysis was performed using the GENALEX 6.501, indicating that 33.85% of the variance within the microsatellite data was graphed by the first axis, and 19.43% by the second axis (Fig. 2). Additionally, the eight samples could clearly group into three clusters, Hambawang (M. foetida), Pelipisan, Pinari, M. indica in one cluster, while Kasturi and Cuban in the second cluster, and Rawa-rawa and M. quadrifida in the third cluster.

Figure 2
figure 2

Principal Coordinates Analysis (PCoA) via covariance matrix with data standardization in four M. casturi hybrids and three closely related species using 14 microsatellite loci (1: Hambawang (M. foetida), 2: Pelipisan, 3: Cuban, 4: Pinari, 5: Kasturi, 6: M. quadrifida, 7: Rawa-rawa, 8: M. indica).

The result was also presented in a UPGMA dendrogram (Fig. 3). M. quadrifida and Rawa-rawa were placed in the same clade. All the M. casturi accessions were in the same clade as M. indica and Hambawang (M. foetida). Cuban were most closely related to M. indica. However, Kasturi and Pelipisan were in the same clade where some loci showed similar alleles. Thus, these accessions had a closer genetic relationship to each other than to Cuban. However, Pinari also exhibited distinct genetic differences from the other M. casturi accessions, even though Cuban was quite distant from other M. casturi accessions.

Figure 3
figure 3

Dendrogram for UPGMA analysis in four M. casturi hybrids and four closely related species using 14 microsatellite loci.

Next, phylogenetic analysis was performed using three widely used chloroplast markers, matK, rbcL, and trnH-psbA (Fig. 4). The matK, rbcL, and trnH-psbA sequences of Kasturi, Cuban, Pelipisan, Pinari, and Hambawang were obtain using a 3500 Genetic Analyzer (Applied Biosystems) and were deposited in the DDBJ Nucleotide Sequence Submission System under the accession number LC602976- LC602993. However, the matK, rbcL, and trnH-psbA sequences from other Mangifera species were downloaded from the public nucleotide database of NCBI (Supplementary File 2). The matK phylogenetic tree showed that Kasturi, Cuban, Pelipisan, Pinari, and Hambawang belonged to one group with M. indica and M. sylvatica. In comparison, the rbcL phylogenetic tree placed Cuban and Pelipisan into the same clade as almost all M. indica accessions. Meanwhile, Pinari and Hambawang, separated from this clade, were joined with the M. laurina, M. flava, M. cochinchinensis, M. odorata, and M. duperreana clades. In contrast, the phylogenetic tree analysis using trnH-psbA placed Kasturi, Pelipisan, and Hambawang with M. indica. Lastly, Pinari was close to M. odorata, M. griffithii, M. pajang, M. andamanica, and M. indica.

Figure 4
figure 4

Phylogenetic analysis of M. casturi hybrids compared to the other Mangifera species (deposited in NCBI, Supplementary Table 2) using (a) matK, (b) rbcL, (c) trnH-psbA, and (d) internal transcribed spacers (ITS) by Maximum Likelihood method. (The evolutionary history was inferred by using the Maximum Likelihood method and Tamura 3-parameter model for matK, trnH-psbA, ITS and Jukes-Cantor model for rbcL. The bootstrap consensus tree was inferred from 10 000 replicates. The tree is rooted with the outgroup, Anacardium occidentale.).

On the other hand, the ITS phylogenetic tree produced three large groups: Indica 1, indica 2, and a group containing Kasturi, Cuban, Pelipisan, and Pinari. Hambawang was included in the indica 2 group. Meanwhile, Pinari was placed in a sub-group with M. oblongifolia, M. camptosperma, M. gedebe, and M. flava. Lastly, Kasturi, Cuban, and Pelipisan were included in the other sub-groups with M. casturi (MF678493.1), M. griffthii, M. quadrifida, M. kemanga, M. torquenda, and M. sumatrana.

Discussion

The Mangifera genus originates from southeast Asia and has polyembryonic seeds, derived from gametes or nucellar cell components2. Most Mangifera flowers are either hermaphrodites or males32; thus, self-crossing can occur in various species. However, self-incompatibility in the Mangifera genus has been reported in several types of mangos33, suggesting that various Mangifera species can cross-hybridize2,34. As a result, cross-hybridization in the natural populations has produced many interspecies, including M. odorata (Kuini), a natural hybrid between M. indica and M. foetida9.

Microsatellite markers have been used successfully to determine genetic variation among many plants35,36. Microsatellites in Mangifera species that were previously identified in M. indica have been useful in the genetic analysis of genus Mangifera and its related genera37,38. In This study, genetic analysis has revealed markers in 14 microsatellite loci and that different allele sizes have arisen from four accessions of M. casturi namely Kasturi, Cuban, Pelipisan, and Pinari. The expected heterozygosity (He) value of the microsatellite markers used in the M. casturi analysis ranged between 0.40 and 0.80, with an average of 0.65, which indicated that the highly informative microsatellite markers could be employed in genetic diversity studies of M. casturi. In this study, a high level of genetic variation was discovered in M. casturi accessions, likely arising from repetitive interspecific hybridization. In Petunia, microsatellite markers have determined genetic differentiation and hybrid identification39. The accessions of Kasturi, Cuban, and Pelipisan were more closely related than Pinari. Kasturi and Cuban are very similar in fruit size. However, morphologically, the fruit shape of Kasturi is more oval than Cuban. In contrast, a Pelipisan fruit is more oval and slightly larger than Kasturi and Cuban. Pinari has the largest fruit size among the M. casturi accessions. Lastly, Pinari is classified into the M. casturi group by the locals, based on its purplish skin similar to that of other M. casturi accessions 4.

Intraspecies genetic variation can occur because of multiple cross-hybridizations among several species. Using microsatellite markers, hybridization between Juglans regia and J. cathayensis indicated a rare phenomenon and backcrosses between hybrids and either of the parental species40. In addition, Kuini (M. odorata), a natural hybrid between M. indica and M. foetida was revealed by AFLP analysis to represent a simultaneous backcross between the F1 hybrids of Kuini and M. foetida4,9. On the other hand, SNP analysis using double-digest restriction-site-associated DNA (ddRAD)4, revealed M. casturi to be a natural hybrid between M. indica and M. quadrifida, whereas their F1 hybrid was a backcross with M. indica. Morphologically, M. casturi is very close to M. quadrifida, with the same purplish skin and small fruit size4. Therefore, M. casturi has several types known as Kasturi, Cuban, Pelipisan, and Pinari. These hybrids are believed to be hybrids between M. indica and M. quadrifida and backcrosses between hybrids and either of the ancestors.

In an allopolyploid plant such as mangosteen (Garcinia mangostana), microsatellite markers indicate cross-hybridization with multiple ancestors, including G. malaccensis, G. celebica, and G. porrecta22. In this study, microsatellite analysis results showed that four accessions of M. casturi, namely Kasturi, Cuban, Pelipisan, and Pinari had allelic differences in all the microsatellite loci. However, allele sharing between the four accessions was detected in the mc8693 locus with an allele size of 160/182, indicating that these accessions were derived from the same ancestor, M. indica. In contrast, the allele differences in the microsatellite loci suggested that the four M. casturi accessions underwent cross-hybridization with multiple ancestors. In Oaks (Quercus spp.), microsatellites indicate sharing most alleles in their hybrids than recurrent gene flow41.

Plant taxonomists have used the chloroplast coding regions matK, rbcL, and trnH-psbA intergenic spacers in DNA barcoding analysis42. DNA barcoding analysis using matK and rbcL implied very high nucleotide conservation between the four M. casturi accessions. Also, this evidence indicated that the maternal ancestor of these accessions was identical and that M. indica was one of their maternal ancestors. Additional evidence was found in the trnH-psbA phylogenetic tree, where Pinari had a different maternal ancestor from the other accessions. Therefore, one of the M. casturi hybrids may have crossed with other Mangifera species as the maternal ancestor.

In addition to chloroplast markers, the nuclear ribosomal internal transcribed spacer (ITS) region has also been indicated as a barcoding region43. ITS sequences are highly variable, conserved region, and biparentally inherited in most angiosperms and widely used to construct a phylogenetic tree for inferring the hybrid origin of species44,45,46,47. The ITS phylogenetic tree also revealed that three accessions of M. casturi, excluding Pinari, belonged to the same sub-group, in contrast to a previous hypothesis that M. casturi was a cross-hybrid between M. indica and M. quadrifida4. Lastly, the DNA barcoding results also supported the hypothesis that the F1 hybrids of M. casturi crossed with other Mangifera produce natural hybrids of M. casturi that had a high level of genetic variation.

In this study, the genomic data revealed the genetic variation and ancestral origin of M. casturi hybrids. Based on genomic data, we have identified that M. casturi, which is endemic to Kalimantan Selatan, consists of 4 types, namely Kasturi, Cuban, Pelipisan, and Pinari. In Addition, based on the combination of microsatellite data, and DNA barcoding, M. casturi hybrids are natural hybrids between M. indica and M. quadrifida. Moreover, the genomic data represent an important genetic resource for breeding and improving the characteristics of this local mango in the future. The habitat of M. casturi is severely threatened; as a result, it is classified as extinct in the wild. Thus, more intensive conservation efforts are necessary. Moreover, since M. casturi varieties have never been confirmed or registered by authorities, the results of this study can help breeders and the local government to officially document this local mango and one of their elite germplasm.

Methods

M. casturi accessions were collected from the Banjar district, Kalimantan Selatan, in the southern region of Kalimantan, three Mangifera species were collected from Banua Botanical Garden (M. quadrifida, M. foetida) and Dramaga, Bogor, West Java (M. indica) (Fig. 1; Supplementary Table 1). To analyze whole genome sequencing, genomic DNA was isolated from Kasturi accession using a DNeasy Power Plant kit (Qiagen) following the manufacturer's protocol. The quality and quantity of DNA were analyzed using a NanoPhotometer NP80 Touch (Implen) spectrophotometer. Genomic DNA samples were sent to Novogen-AIT Singapore with 150 paired-end (PE) reads collected using an Illumina HiSeq4000 system. Raw reads were quality controlled using FASTQC Version 0.11.9 (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/). 15, and clean reads were filtered using the Fastp version 0.20.1 (https://github.com/OpenGene/fastp) with default parameters16. Clean reads were assembled using a Ray version 2.1.0 (https://github.com/sebhtml/ray) with default parameters (-k 63 -minimum-contig-length 200)17 under the Maser Platform (https://cell-innovation.nig.ac.jp/). 18. The assessment of genome assembly was performed to check the assembled contig quality using BUSCO version 3.0.2 (https://busco.ezlab.org). 19 on the Maser Platform with default parameters.

Microsatellite marker development and validation

Microsatellite markers were extracted using the MISA version 2.1 (https://webblast.ipk-gatersleben.de/misa/). 20 from M. casturi scaffolds, the parameters being set to the following minimum repeat levels: six for two bases, and five for three, four, five, and six bases. The difference between microsatellite motifs was 100 bases. The microsatellite motif-containing sequences were selected based on parameters; (1) the flanking region are at least 150 bp long in both directions (2) microsatellite repeats have the longest repeat motifs. The primer was designed using the web version of Primer 3 with default parameters21.

Genomic DNA was isolated using the modified CTAB method, with a slight modification22. The quality and quantity of DNA were assessed using a NanoPhotometer NP80 Touch (Implen). A Type-it microsatellite PCR kit (Qiagen) was used to analyze the microsatellite markers. PCR master mix was prepared with a mixture of 3.2 μL RNase-free water, 5 μL 2 × Type-it Multiplex PCR Master Mix, 0.4 Q solution, 0.2 μL of 10 μM forward primer, and 0.2 μL of 10 μM reverse primer. PCR was performed using a SimpliAMP Thermo Cycler (Applied Biosystems). The PCR conditions were as follows: initial conditions of PCR pre-denaturation at 95 °C for 5 min, followed by 32 cycles of denaturation at 95 °C for 30 s, annealing at 57 °C for 1 min 30 s, extension at 70 °C for 30 s, and final extension at 60 °C for 30 min. The amplicons were checked using 1% electrophoresis gel in TAE buffer for 20 min at 100 v. Before loading the sample into QIAxcel capillary electrophoresis (Qiagen), the sample was diluted twice and then run using a QIAxcel DNA High Resolution Kit (Qiagen). Allele size data were confirmed and processed manually using QIAxcel ScreenGel version 1.4.0 (https://www.qiagen.com/us/products/instruments-and-automation/quality-control-fragment-analysis/qiaxcel-advanced-system/?catno=9021163).

Descriptive statistics were calculated using GENALEX version 6.501 (https://biology-assets.anu.edu.au/GenAlEx/) for each microsatellite marker, including the number of alleles (Na) per locus, and both observed (Ho), and expected (He) heterozygosity, fixation index (FST), and Shannon's information index (I). The principal coordinate analysis (PCoA) via Covariance matrix with data standardization was also performed using GENALEX 6.501. The microsatellite data were processed using the Phylip version 3.695 (https://evolution.genetics.washington.edu/phylip.html) with the unweighted pair group method and arithmetic mean (UPGMA) method. The resulting dendrogram was edited using the program MEGA-X Software (https://www.megasoftware.net)23.

DNA barcoding and phylogenetic analysis

For the DNA barcoding analysis, we used three chloroplast genes: matK24, rbcL25, and trnH-psbA26 and one nuclear DNA region of the internal transcribed spacer (ITS)27. PCR barcoding was performed using KOD Plus (Toyobo) according to the manufacturer's protocol. The PCR products were cleaned using ExoSAP-IT PCR Product Cleanup Reagent (Applied Biosystems). Then, PCR sequencing was carried out with a BigDye Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems), followed by purification using a BigDye XTerminator Purification Kit (Applied Biosystems) according to the manufacturer protocol. The sequencing products were performed using a 3500 Genetic Analyzer (Applied Biosystems). Sequence data were analyzed using Sequencing Analysis Software version 6.0 (https://www.thermofisher.com/order/catalog/product/4474950), and the data were processed with ATGC-MAC version 7 (https://www.genetyx.co.jp) and MEGA-X software (https://www.megasoftware.net). 23.

Phylogenetic trees were inferred using the maximum likelihood method and constructed using MEGA-X software28. The Mangifera sequences of matK, rbcL, and trnH-psbA, and ITS complete sequences were downloaded from NCBI (Supplementary Table 2). The best DNA model was calculated using MEGA X for each marker29,30. Phylogenetic trees were tested using 10 000 bootstrap replicates31.

Ethics approval and consent to participate

All experiments were performed in accordance with relevant guidelines and regulations. The experimental research has complied with Bogor Agricultural University research regulation (No: 11/SA-IPB/P/2016 on research and publication ethics), and the field study was in accordance with the national legislations of Indonesian Law Number 5/1990 on biological diversity conservation and Indonesia Law Number 11/2013 on the ratification of the Nagoya Protocol. M. casturi samples were collected and exported from Banjarbaru, South Kalimantan to Bogor, West Java with the permission (No: 2020.2.1702.0.K12.000044) from Plant Quarantine Division of National Agency for Agricultural Quarantine in Banjarbaru, South Kalimantan following permit approvals from South Kalimantan Natural Resources Conservation Agency/BKSDA of the Ministry of Environment and Forestry of the Republic of Indonesia (KLHK) as agency in charge of managing conservation areas including protected plant in the territory, particularly the nature reserve forests (wildlife, nature reserves) and national park. The M. casturi samples from Banjar, South Kalimantan as herbaria voucher (received by Agung Sriyono) were duplicated and stored in Banua Botanical Garden, Province of South Kalimantan, Banjarbaru, Indonesia.