The genetic variation and relationship among the natural hybrids of Mangifera casturi Kosterm

Mangifera casturi Kosterm., a mango plant from Kalimantan Selatan, Indonesia, has limited genetic information, severely limiting the research on its genetic variation and phylogeny. We collected M. casturi’s genomic information using next-generation sequencing, developed microsatellite markers and performed Sanger sequencing for DNA barcoding analysis. These markers were used to confirm parental origin and genetic diversity of M. casturi hybrids. The clean reads of the Kasturi accession were assembled de novo, producing 259 872 scaffolds (N50 = 1 445 bp). Fourteen polymorphic microsatellite markers were developed from 11 040 microsatellite motif-containing sequences. In total, 58 alleles were produced with a mean of 4.14 alleles per locus. Microsatellite marker analysis revealed broad genetic variation in M. casturi. Phylogenetic analysis was performed using internal transcribed spacers (ITS), matK, rbcL, and trnH-psbA. The phylogenetic tree of chloroplast markers placed Kasturi, Cuban, Pelipisan, Pinari, and Hambawang in one group, with M. indica as the female ancestor. Meanwhile, the phylogenetic tree of ITS markers indicated several Mangifera species as ancestors of M. casturi. Thus, M. casturi very likely originated from the cross-hybridization of multiple ancestors. Furthermore, crossing the F1 hybrids of M. indica and M. quadrifida with other Mangifera spp. may have generated much genetic variation. The genetic information for M. casturi will be a resource for breeding improvement, and conservation studies.

Also, from NGS data, it is easier to obtain genetic information such as microsatellite markers, which are superior to other markers like RAPD and AFLP and are already used in other Mangifera species 8,9 . Microsatellite markers can determine distinct variations at the level of species as they are codominant; as a result, they are widely used in population and genetic studies 10 . Microsatellite markers have also been used to determine the genetic variation in M. indica 11 . Although microsatellites are important for taxonomy and the study of genetic conservation, no M. casturi-specific microsatellite markers have been reported.
In recent years, Sanger sequencing approaches have been utilized for DNA barcoding. DNA barcoding methods based on chloroplast regions, such as rbcL, matK 12 , and trnH-psbA 13 , internal transcribed spacers (ITS), and second internal transcribed spacers (ITS2) from nuclear ribosomal DNA 14 , have been widely used for phylogenetic analysis at various taxonomic levels. These DNA barcoding markers from chloroplast regions can also be determined at the genus or family level because of their inheritance from a maternal ancestor. On the other hand, the ITS region can determine the barcoding of the paternal and maternal ancestor. However, DNA barcoding sequences for M. casturi have not been recorded in any public database. As a result, there have been no phylogenetic studies of M. casturi using DNA barcoding to achieve accurate identification at the taxonomy level.
This study aimed to collect genomic information from M. casturi using NGS and Sanger and to analyze and determine the genetic variation among the M. casturi hybrids. Microsatellite markers were used to assess genetic variation among M. casturi hybrids, Kasturi, Cuban, Pelipisan, and Pinari. Furthermore, at a higher taxonomy level, phylogenetic analysis of M. casturi hybrids and Mangifera species was performed using DNA barcoding. In addition, there is no clear information and proof about the genetic variation and relationship among M. casturi hybrids. In this study, we propose a candidate ancestor from a natural hybridization of M. indica and M. quadrifida.

Results
In this study, 11.01 Gbp of M. casturi DNA was obtained with high-throughput sequencing using an Illumina HiSeq 4000 system with paired end 150 bp reads. The raw data were registered in the DDBJ with accession number DRA011022. Clean reads were obtained via filtering, and 10.95 Gbp of de novo genome assembly was performed using a Ray Assembler. We obtained 259 872 scaffolds with an N50 value of 1 445 bp and a maximum scaffold length of 144 601 bp ( Table 1). The genome assembly and annotation completeness were assessed using BUSCO, and complete ratio, universal single-copy orthologs were found to be 42.3% similar using a plant reference database (Table 2).
Microsatellite markers were identified using the MISA program, producing 11 040 sequences with at least one microsatellite motif, and 770 sequences with more than one site ( Table 3). The trinucleotide motifs predominated with 52.77%, followed by the dinucleotide motif with 33.3%. Fourteen candidate sequences were selected and identified (Table 4). Finally, all the confirmed primers were amplified and registered in the DDBJ with accession numbers (Table 4).
Eight samples, namely Pelipisan, Cuban, Pinari, Kasturi, Rawa-rawa, M. foetida, M. quadrifida, and M. indica (Supplementary Table 1), were used to validate and determine the allele size of the microsatellite markers using QIAxcel capillary electrophoresis. The 14 primers produced 58 alleles in total and a mean of 4.14 alleles per locus (Table 4). All the loci were polymorphic (  ), reflecting the degree of genetic differentiation, ranged from 0.08 to 1.00 with an average of 0.60 per locus. Lastly, Shannon's information index ranged from 0.69 and 1.75 with a mean of 1.21. The mc230178 and mc58089 loci produced seven alleles from eight samples, while the mc122955 and mc88387 loci produced two alleles. Some loci, namely mc176197, mc21672, and mc88075, displayed the same alleles between Kasturi and Cuban. In the mc88387 locus, only the Kasturi sample was not amplified; thus, this locus might have a null allele in Kasturi. Therefore, mc88387 locus can be used to identify M. casturi accessions in a natural population, as it is dissimilar to other Mangifera species.
Furthermore, principal coordinate analysis was performed using the GENALEX 6.501, indicating that 33.85% of the variance within the microsatellite data was graphed by the first axis, and 19.43% by the second axis (Fig. 2). Additionally, the eight samples could clearly group into three clusters, Hambawang (M. foetida), Pelipisan, Pinari, M. indica in one cluster, while Kasturi and Cuban in the second cluster, and Rawa-rawa and M. quadrifida in the third cluster.
The result was also presented in a UPGMA dendrogram (Fig. 3). M. quadrifida and Rawa-rawa were placed in the same clade. All the M. casturi accessions were in the same clade as M. indica and Hambawang (M. foetida). Cuban were most closely related to M. indica. However, Kasturi and Pelipisan were in the same clade where some loci showed similar alleles. Thus, these accessions had a closer genetic relationship to each other than to Cuban. However, Pinari also exhibited distinct genetic differences from the other M. casturi accessions, even though Cuban was quite distant from other M. casturi accessions.
Next, phylogenetic analysis was performed using three widely used chloroplast markers, matK, rbcL, and trnH-psbA (Fig. 4). The matK, rbcL, and trnH-psbA sequences of Kasturi, Cuban, Pelipisan, Pinari, and Hambawang were obtain using a 3500 Genetic Analyzer (Applied Biosystems) and were deposited in the DDBJ Nucleotide Sequence Submission System under the accession number LC602976-LC602993. However, the matK, rbcL, and trnH-psbA sequences from other Mangifera species were downloaded from the public nucleotide database of NCBI (Supplementary File 2). The matK phylogenetic tree showed that Kasturi, Cuban, Pelipisan,

Discussion
The Mangifera genus originates from southeast Asia and has polyembryonic seeds, derived from gametes or nucellar cell components 2 . Most Mangifera flowers are either hermaphrodites or males 32 ; thus, self-crossing can occur in various species. However, self-incompatibility in the Mangifera genus has been reported in several types of mangos 33 , suggesting that various Mangifera species can cross-hybridize 2,34 . As a result, cross-hybridization in the natural populations has produced many interspecies, including M. odorata (Kuini), a natural hybrid between M. indica and M. foetida 9 .
Microsatellite markers have been used successfully to determine genetic variation among many plants 35,36 . Microsatellites in Mangifera species that were previously identified in M. indica have been useful in the genetic analysis of genus Mangifera and its related genera 37,38 . In This study, genetic analysis has revealed markers in 14 microsatellite loci and that different allele sizes have arisen from four accessions of M. casturi namely Kasturi, Cuban, Pelipisan, and Pinari. The expected heterozygosity (He) value of the microsatellite markers used in the M.    www.nature.com/scientificreports/ casturi analysis ranged between 0.40 and 0.80, with an average of 0.65, which indicated that the highly informative microsatellite markers could be employed in genetic diversity studies of M. casturi. In this study, a high level of genetic variation was discovered in M. casturi accessions, likely arising from repetitive interspecific hybridization. In Petunia, microsatellite markers have determined genetic differentiation and hybrid identification 39 . The accessions of Kasturi, Cuban, and Pelipisan were more closely related than Pinari. Kasturi and Cuban are very similar in fruit size. However, morphologically, the fruit shape of Kasturi is more oval than Cuban. In contrast, a Pelipisan fruit is more oval and slightly larger than Kasturi and Cuban. Pinari has the largest fruit size among the M. casturi accessions. Lastly, Pinari is classified into the M. casturi group by the locals, based on its purplish skin similar to that of other M. casturi accessions 4 . Intraspecies genetic variation can occur because of multiple cross-hybridizations among several species. Using microsatellite markers, hybridization between Juglans regia and J. cathayensis indicated a rare phenomenon and backcrosses between hybrids and either of the parental species 40 . In addition, Kuini (M. odorata), a natural hybrid between M. indica and M. foetida was revealed by AFLP analysis to represent a simultaneous backcross between the F1 hybrids of Kuini and M. foetida 4,9 . On the other hand, SNP analysis using doubledigest restriction-site-associated DNA (ddRAD) 4 , revealed M. casturi to be a natural hybrid between M. indica and M. quadrifida, whereas their F1 hybrid was a backcross with M. indica. Morphologically, M. casturi is very close to M. quadrifida, with the same purplish skin and small fruit size 4 . Therefore, M. casturi has several types known as Kasturi, Cuban, Pelipisan, and Pinari. These hybrids are believed to be hybrids between M. indica and M. quadrifida and backcrosses between hybrids and either of the ancestors.
In an allopolyploid plant such as mangosteen (Garcinia mangostana), microsatellite markers indicate crosshybridization with multiple ancestors, including G. malaccensis, G. celebica, and G. porrecta 22 . In this study, microsatellite analysis results showed that four accessions of M. casturi, namely Kasturi, Cuban, Pelipisan, and Pinari had allelic differences in all the microsatellite loci. However, allele sharing between the four accessions was detected in the mc8693 locus with an allele size of 160/182, indicating that these accessions were derived from the same ancestor, M. indica. In contrast, the allele differences in the microsatellite loci suggested that the four M. casturi accessions underwent cross-hybridization with multiple ancestors. In Oaks (Quercus spp.), microsatellites indicate sharing most alleles in their hybrids than recurrent gene flow 41 .
Plant taxonomists have used the chloroplast coding regions matK, rbcL, and trnH-psbA intergenic spacers in DNA barcoding analysis 42 . DNA barcoding analysis using matK and rbcL implied very high nucleotide conservation between the four M. casturi accessions. Also, this evidence indicated that the maternal ancestor of these accessions was identical and that M. indica was one of their maternal ancestors. Additional evidence was found in the trnH-psbA phylogenetic tree, where Pinari had a different maternal ancestor from the other accessions. Therefore, one of the M. casturi hybrids may have crossed with other Mangifera species as the maternal ancestor.
In addition to chloroplast markers, the nuclear ribosomal internal transcribed spacer (ITS) region has also been indicated as a barcoding region 43 . ITS sequences are highly variable, conserved region, and biparentally inherited in most angiosperms and widely used to construct a phylogenetic tree for inferring the hybrid origin of species [44][45][46][47] . The ITS phylogenetic tree also revealed that three accessions of M. casturi, excluding Pinari, belonged to the same sub-group, in contrast to a previous hypothesis that M. casturi was a cross-hybrid between M. indica and M. quadrifida 4 . Lastly, the DNA barcoding results also supported the hypothesis that the F1 hybrids of M. casturi crossed with other Mangifera produce natural hybrids of M. casturi that had a high level of genetic variation.
In this study, the genomic data revealed the genetic variation and ancestral origin of M. casturi hybrids. Based on genomic data, we have identified that M. casturi, which is endemic to Kalimantan Selatan, consists of 4 types, namely Kasturi, Cuban, Pelipisan, and Pinari. In Addition, based on the combination of microsatellite data, and DNA barcoding, M. casturi hybrids are natural hybrids between M. indica and M. quadrifida. Moreover, the genomic data represent an important genetic resource for breeding and improving the characteristics of this local mango in the future. The habitat of M. casturi is severely threatened; as a result, it is classified as extinct in the wild. Thus, more intensive conservation efforts are necessary. Moreover, since M. casturi varieties have never been confirmed or registered by authorities, the results of this study can help breeders and the local government to officially document this local mango and one of their elite germplasm.

Microsatellite marker development and validation.
Microsatellite markers were extracted using the MISA version 2.1 (https:// webbl ast. ipk-gater sleben. de/ misa/). 20 from M. casturi scaffolds, the parameters being set to the following minimum repeat levels: six for two bases, and five for three, four, five, and six bases. www.nature.com/scientificreports/ The difference between microsatellite motifs was 100 bases. The microsatellite motif-containing sequences were selected based on parameters; (1) the flanking region are at least 150 bp long in both directions (2) microsatellite repeats have the longest repeat motifs. The primer was designed using the web version of Primer 3 with default parameters 21 . Genomic DNA was isolated using the modified CTAB method, with a slight modification 22 . The quality and quantity of DNA were assessed using a NanoPhotometer NP80 Touch (Implen). A Type-it microsatellite PCR kit (Qiagen) was used to analyze the microsatellite markers. PCR master mix was prepared with a mixture of 3.2 μL RNase-free water, 5 μL 2 × Type-it Multiplex PCR Master Mix, 0.4 Q solution, 0.2 μL of 10 μM forward primer, and 0.2 μL of 10 μM reverse primer. PCR was performed using a SimpliAMP Thermo Cycler (Applied Biosystems). The PCR conditions were as follows: initial conditions of PCR pre-denaturation at 95 °C for 5 min, followed by 32 cycles of denaturation at 95 °C for 30 s, annealing at 57 °C for 1 min 30 s, extension at 70 °C for 30 s, and final extension at 60 °C for 30 min. The amplicons were checked using 1% electrophoresis gel in TAE buffer for 20 min at 100 v. Before loading the sample into QIAxcel capillary electrophoresis (Qiagen), the sample was diluted twice and then run using a QIAxcel DNA High Resolution Kit (Qiagen). Allele size data were confirmed and processed manually using QIAxcel ScreenGel version 1.4.0 (https:// www. qiagen. com/ us/ produ cts/ instr uments-and-autom ation/ quali ty-contr ol-fragm ent-analy sis/ qiaxc el-advan ced-syste m/? catno= 90211 63).
Descriptive statistics were calculated using GENALEX version 6.501 (https:// biolo gy-assets. anu. edu. au/ GenAl Ex/) for each microsatellite marker, including the number of alleles (Na) per locus, and both observed (Ho), and expected (He) heterozygosity, fixation index (F ST ), and Shannon's information index (I). The principal coordinate analysis (PCoA) via Covariance matrix with data standardization was also performed using GENALEX 6.501. The microsatellite data were processed using the Phylip version 3.695 (https:// evolu tion. genet ics. washi ngton. edu/ phylip. html) with the unweighted pair group method and arithmetic mean (UPGMA) method. The resulting dendrogram was edited using the program MEGA-X Software (https:// www. megas oftwa re. net) 23 .

DNA barcoding and phylogenetic analysis.
For the DNA barcoding analysis, we used three chloroplast genes: matK 24 , rbcL 25 , and trnH-psbA 26 and one nuclear DNA region of the internal transcribed spacer (ITS) 27 . PCR barcoding was performed using KOD Plus (Toyobo) according to the manufacturer's protocol. The PCR products were cleaned using ExoSAP-IT PCR Product Cleanup Reagent (Applied Biosystems). Then, PCR sequencing was carried out with a BigDye Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems), followed by purification using a BigDye XTerminator Purification Kit (Applied Biosystems) according to the manufacturer protocol. The sequencing products were performed using a 3500 Genetic Analyzer (Applied Biosystems). Sequence data were analyzed using Sequencing Analysis Software version 6.0 (https:// www. therm ofish er. com/ order/ catal og/ produ ct/ 44749 50), and the data were processed with ATGC-MAC version 7 (https:// www. genet yx. co. jp) and MEGA-X software (https:// www. megas oftwa re. net). 23 .
Phylogenetic trees were inferred using the maximum likelihood method and constructed using MEGA-X software 28 . The Mangifera sequences of matK, rbcL, and trnH-psbA, and ITS complete sequences were downloaded from NCBI (Supplementary Table 2). The best DNA model was calculated using MEGA X for each marker 29,30 . Phylogenetic trees were tested using 10 000 bootstrap replicates 31 . Ethics approval and consent to participate. All experiments were performed in accordance with relevant guidelines and regulations. The experimental research has complied with Bogor Agricultural University research regulation (No: 11/SA-IPB/P/2016 on research and publication ethics), and the field study was in accordance with the national legislations of Indonesian Law Number 5/1990 on biological diversity conservation and Indonesia Law Number 11/2013 on the ratification of the Nagoya Protocol. M. casturi samples were collected and exported from Banjarbaru, South Kalimantan to Bogor, West Java with the permission (No: 2020.2.1702.0.K12.000044) from Plant Quarantine Division of National Agency for Agricultural Quarantine in Banjarbaru, South Kalimantan following permit approvals from South Kalimantan Natural Resources Conservation Agency/BKSDA of the Ministry of Environment and Forestry of the Republic of Indonesia (KLHK) as agency in charge of managing conservation areas including protected plant in the territory, particularly the nature reserve forests (wildlife, nature reserves) and national park. The M. casturi samples from Banjar, South Kalimantan as herbaria voucher (received by Agung Sriyono) were duplicated and stored in Banua Botanical Garden, Province of South Kalimantan, Banjarbaru, Indonesia.

Data availability
All sequence data from the next generation sequencing during the current study have been submitted to the DDBJ Read Archive (DRA) under the BioProject accession number PRJDB10715: http:// trace. ddbj. nig. ac. jp/ BPSea rch/ biopr oject? acc= PRJDB 10715. All sequence data from DNA barcoding analysis during the current study have been submitted to the DDBJ Nucleotide Sequence Submission System under the accession number of LC602976-LC602993.