A First Phylogeny of the Genus Dimocarpus and Suggestions for Revision of Some Taxa Based on Molecular and Morphological Evidence

Dimocarpus longan, commonly known as the longan, belongs to the family Sapindaceae, and is one of the most economically important fruits commonly cultivated in several regions in Asia. There are various cultivars of longan throughout the Thai-Malay peninsula region, but until now no phylogenetic analysis has been undertaken to determine the genetic relatedness of these cultivars. To address this issue, 6 loci, namely ITS2, matK, rbcL, trnH-psbA, trnL-I and trnL-trnF were amplified and sequenced from 40 individuals consisting of 26 longan cultivars 2 types of lychee and 8 herbarium samples. The sequencing results were used to construct a phylogenetic tree using the neighbor-joining (NJ), maximum likelihood (ML) and Bayesian inference (BI) criteria. The tree showed cryptic groups of D. longan from the Thailand-Malaysia region (Dimocarpus longan spp.). This is the first report of the genetic relationship of Dimocarpus based on multi-locus molecular markers and morphological characteristics. Multiple sequence alignments, phylogenetic trees and species delimitation support that Dimocarpus longan spp. longan var. obtusus and Dimocarpus longan spp. malesianus var. malesianus should be placed into a higher order and are two additional species in the genus Dimocarpus. Therefore these two species require nomenclatural changes as Dimocarpus malesianus and Dimocarpus obtusus, respectively.


Results
Data analysis. The sizes of PCR products amplified from ITS2, matK, rbcL, trnH-psbA, trnL-I and trnL-trnF primer were about 300, 690, 540, 520, 340 and 380 base pairs (bp), respectively. Observation of PCR products after electrophoresis though 1.5% agarose gels revealed different product sizes of the trnH-psbA PCR fragment amongst the longan samples (Fig. 1A). The different PCR product sizes may be due to an InDel mutation which was found only in Dimocarpus longan spp. longan var. obtusus (Thao) (lane no. 6 in Fig. 1A). The PCR amplification was performed with another 4 DNA samples extracted from different longan trees which were all Thao cultivar, and the results showed a smaller trnH-psbA PCR fragment in all samples in comparison with other longan cultivars (Fig. 1B).
Conservation of the matK and rbcL gene sequences was observed after multiple sequence alignment. However, significant diversity was observed in the trnH-psbA gene amongst the 26 longan and 2 lychee cultivars. Three locations of InDel mutations and 2 locations of nucleotide substitution were found after the multiple sequence alignment. Interestingly, a 70 nucleotide deletion at position 109 to 178 was observed only in the Thao cultivar. A six nucleotide deletion at position 254 to 259 was found in Thao, Daw Kaew Yee, Baan-Hong 60, Phuen-Mueang and the 2 lychee cultivars. An adenine base insertion at position 289 was found in 4 longan cultivars, namely Thao, Daw Kaew Yee, Baan-Hong 60 and Phuen-Mueang. An adenine base substitution was detected in only 2 lychee cultivars whereas a guanine base substitution was found in the 2 lychee samples and the Thao cultivar. Moreover, a six base pair deletion was found in Thao, Daw Kaew Yee, Baan-Hong 60 and Phuen-Mueang samples. A second nucleotide substitution (guanine; G) occurred at position 277 in Thao and the 2 lychee samples. Finally the insertion of an adenine was detected in the Thao, Daw Kaew Yee, Baan-Hong 60 and Phuen-Mueang cultivars. The result of multiple sequence alignment of trnH-psbA gene is shown in Fig. 2.

Phylogenetic analysis.
A total of 40 individual samples (including out groups) consisting of 3 species 2 subspecies and 26 longan cultivars were used to reconstruct the phylogenetic trees based on the nuclear ITS2 region and 5 plastid markers (matK, rbcL, trnH-psbA, trnL-i and trnL-trnF). A partition homogeneity test by PAUP 4.0b10, using 100 replicates 22 showed no significant differences were found between markers (P = 0.095). The uncorrected p-distance between the taxa ranged from 0.003 to 0.023 [inter/intraspecific p-distances = 0.013 and 0.001, respectively].
The phylogeny based concatenate of nuDNA and chDNA showed the evolutionary relationship among Dimocarpus and its position (Fig. 3). All trees from each DNA dataset were almost congruent in topology. The phylogenetic tree was divided into two main clades with high statistical support (Clade A and Clade B in Fig. 3) with 100, 100 of NJ and ML bootstraps and 1.00 of BI support. Dimocarpus was monophyletic and well separated from the out group. Clade A also divided into 4 sub-clades as Clade 1a, 2a, 3a and 4a with moderate to high statistical support (Fig. 3). The 4 sub-clades consist of D. longan spp. longan var. obtusus (Thao) (Clade 1a in Fig. 3) with 99, 100 of NJ and ML bootstraps and 1.00 of BI support, the D. longan spp. longan var. longan were grouped together (Clade 2a in Fig. 3) with 0.83 of BI support, D. autralianus (from Australia) and D. fumatus (from Malaysis) were grouped together (Clade 3a in Fig. 3) with 99, 100 of NJ and ML bootstraps and 1.00 of BI support and D. longan spp. malesianus var. malesianus (Clade 4a in Fig. 3) with 70, 73 of NJ and ML bootstraps and 0.95 of BI support. The result shows that Dimocarpus longan was polyphyletic and separated into three sub-clades as 1a, 2a and 4a (Fig. 3). This result shows that the taxonomy of Dimocarpus longan is confusing and needs to be clarified.

Species Delimitation.
The PTP analysis revealed that the likelihood of the null model in that all sequences belong to a single species was found to be significantly lower than the maximum likelihood species delimitation (P < 0.001). For the PTP analysis, the results revealed delimitation of five in group PSHs, hereafter denoted PSH-A to PSH-E (Fig. 4). Dimocarpus longan sensulato was delimited as three PSHs (PSH-A, PSH-B and PSH-E in Fig. 4). The tree showed the cryptic groups of Dimocarpus longan from both Thailand and the Malaysia region. Cryptic group designations of both PTP-and ABGD-delimited PSHs labels are almost similar, except for PSH-E that ABGD-delimited divided to be 2 sub-groups (Fig. 4). The GMYC-delimited method resulted in recovery of 13 PSHs within Dimocarpus. The results did not conflict with PTP-and ABGD-delimited PSHs but suggested additional phylogenetic species within PTP-delimited PSHs for the Dimocarpus cultivars from Thailand (PSH-B in Fig. 4) and Dimocarpus longan spp. malesianus var. malesianus from Malaysia. The GMYC, ABGD results were almost consistent with the five PSHs identified by PTP, except for PSH-B and PSH-E.

Discussion
The PCR product of the trnH-psbA gene run on 1.5% agarose gels showed a smaller PCR fragment in lane number 6 which was the Thao cultivar (Dimocarpus longan spp. longan var. obtusus) ( Fig. 1A and B). The differentiation of the PCR product might be caused by an InDel mutation occurring inside this gene. The PCR amplification was not only performed on one sample of a Thao cultivar, but 4 more DNA samples were extracted from other Thao cultivar trees which were used to confirm the 70 bp deletion of this gene in the Thao cultivar. The trnH-psbA gene fragments amplified from the 5 Thao samples all showed the same size which confirmed the unique deletion of the trnH-psbA gene in the Thao cultivar (Fig. 1B). This unique genetic pattern which is found only in the Thao cultivar can be applied as an easy and cost effectively genetic marker to identify the Thao cultivar. The maximum likelihood tree based on concatenation of nuDNA and chDNA (Fig. 3) revealed the relationship and positions of Dimocarpus spp. in Thailand and nearby countries. The examination of individual trees showed slight differences, but results were broadly consistent with the concatenated tree topologies. Dimocarpus longan was polyphyletic and divided into 3 sub-clades, designated as Clade 1a, 2a and 4a with moderate to high statistical support (Fig. 3). The multi-gene phylogenetic analyses, in combination with species delimitation methods, revealed evidence that Dimocarpus longan sensulato showed morphological cryptic diversity (Figs 3 and 4). Additionally, longan historically recognized as Dimocarpus longan with various subspecies and varieties (D. longan spp. longan var. obtusus (Clade 1a in Fig. 3), D. longan spp. longan var. longan (Clade 2a in Fig. 3), D. longan spp. malesianus var. echinatus (Clade 2a in Fig. 3) and D. longan spp. malesianus var. malesianus (Clade 4a in Fig. 3) segregated into different clades. This result challenges the validity of the species and subspecies of Dimocarpus. This cryptic diversity strongly supports the taxonomic description asa new species, albeit in concert with other characteristic such as sympatric/allopatric speciation, ecology, hybridization and morphology 23 . The phylogenetic trees were supported by three of the species delimitation approaches (PTP, ABGD, GMYC in Fig. 3) which identified additional PSHs within not only D. longan spp. longan var. longan, but also suggested that some of subspecies should be rearranged and recognized as species, especially, D. longan spp. longan var. obtusus (Clade 1a in Fig. 3 and PSH-A in Fig. 4) and D. longan spp. malesianus var. malesianus (Clade 4a in Fig. 3 and PSH-E in Fig. 4). Notably, D. longan was not monophyletic in any of our phylogenetic trees. These results are also supported by the different morphological characters between D. longan spp. longan var. longan (Clade 2a in Fig. 3 and PSH-B in Fig. 4), D. longan spp. longan var. obtusus (Clade 1a in Fig. 3 and PSH-A in Fig. 4), and D. longan spp. malesianus var. malesianus (Clade 4a in Fig. 3 and PSH-E in Fig. 4) which show large differences in morphological features such as habit, twigs, petals, fruits, petioles and rachis, and leaflets as detailed in Table 1. According to our results based on molecular and morphological approaches, it is strongly suggested that three subspecies/varieties of the D. longan species complex should be recognized as three distinct species, two of which are elevated to species rank: D. malesianus (Clade 4a in Fig. 3) and D. obtusus (clade 1a in Fig. 3), while D. longan spp. malesianus var. echinatus is reclassified as D. longan var. echinatus (clade 2a in Fig. 3 Fig. 3 and PSH-B in Fig. 4) might have different genetic backgrounds. This is supported by the GMYC species delimitation method, and revealed the diversity of longan cultivars in Thailand that will be useful for conservation management of this plant in the future. Finally, there is a need for more sensitive markers to be used to clarify the relationship of these longan cultivars in the future.
On the basis of phylogenetic tree reconstruction (Fig. 3), species delimitation analyses (Fig. 4) and morphological differentiation (Table 1), we propose that in the D. longan species complex three species are recognized, one of which is D. longan, corresponding to clade 2a in Fig. 3. The other two species require nomenclatural changes as follows:  24 Leenh whereas others longan cultivar in Thailand are the Dimocarpus longan ssp. longan var. longan. From this information the Thao cultivar is classified to be the same subspecies as other longan cultivars but just a different variety 3,10 . Nevertheless, the DNA barcoding result from our study make the information more clear by supporting that the Thao cultivar should be classified as a different species from other longan cultivars as noted above. This is further supported by Jaroenkit, T. 18 , who noted that the special character of the Thao cultivar was due to its creeping plant-like nature as opposed to others longan cultivar which are perennial plants.  Table 2. DNA extraction, PCR amplification and sequencing. Total genomic DNA was extracted from young leaves using the cetyltrimethylammonium bromide (CTAB) method 28 . While the herbarium DNA extractions were performed using a CTAB method 29 modified as according to Bakker 30 . The quantity and quality of the genomic DNA was analyzed by electrophoresis through 1% agarose gels and the 260/280 nm absorbance ratio as determined by spectrophotometry. The genomic DNA was used as a template for PCR amplification using 6 specific primer pairs directed to the nuclear internal transcribed spacer (ITS2) 31, 32 , matK 33 , rbcL 34,35 , trnH-psbA 36 , trnL (UAA) intron (trnL-i) 37 and trnL-trn intergenic spacer (trnL-trnF) 38 . Sequences of DNA barcoding primers are shown in Table 3. The PCR amplification step was an initial 95 °C for 5 min followed by 35 cycles of denaturation at 95 °C for 30 sec, annealing for 45 sec and extension at 72 °C for 1 min, and the final extension step was performed at 72 °C for 10 min. The PCR products were run on 1.5% agarose gels in 0.5X TBE buffer. The PCR fragments were visualized under UV light after staining with SYBR SAFE DNA Gel Stain (Invitrogen, U.S.A). The PCR products amplified from the 6 loci were further analyzed by DNA sequencing.

Molecular Analyses.
The BioEdit Sequence Aligment Editor program 39 was used to analyze the DNA sequences resulting from sequence analysis of the PCR products generated with the six barcoding primer pairs. The ClustalW program with additional manual curation was used to analyze multiple sequence alignments in order to observe the sequence conservation among longan cultivars. The nuclear ITS2 region sequence and 5 plastid markers consist of both coding (matK and rbcL exons) and non-coding regions (trnH-psbA, trnL-trnF and trnL-i) were used to reconstruct the phylogeny. All sequences were checked for ambiguous nucleotide sites and saturation before being subjected to phylogenetic analysis. The uncorrected pairwise distances for transition and transversion substitutions were plotted to visualize saturation and detect the taxa responsible. Analysis of genes separately and in combination was performed using the neighbor-joining (NJ), maximum likelihood (ML) 40 and Bayesian inference (BI) criteria. jModeltest2.1.1 41 was used to calculate and determine the best evolutionary substitution model by the Akaike Information Criterion (AIC) 42 and showed that HKY + G (G = 0.05) model for ITS2, HKY for matK, trnH-psbA, trnL-i and trnL-trnF, and JC for rbcL.
The incongruence length difference test 43 in the partition homogeneity test in PAUP 4.0b10 using 100 replicates 22 were performed to test the concatenated data sets. To assess support at each node, non-parametric bootstrap analyses 44,45 were performed using PAUP* version 4.0b10 22 . For coding genes, first and second codon, third codon and all codon positions were tested. The mutation rates were partitioned among genes in concatenated data sets based on model as above. The NJ analysis and the likelihood scores of different data partitions were carried out using PAUP* version 4.0b10 22 with bootstrap re-sampling 44,45 with 1000 replicates. The maximum likelihood 40 analysis was undertaken using RAxML v. 7.2.7 46,47 . The bootstrap resampling 44 with 1000 replicates were performed to support the individual branches of the ML tree.
Bayesian inference (BI) analysis was undertaken using MrBayes version 3.2.5 48 . The 4 chains of a Markov chain Monte Carlo algorithm (MCMC) were used in this criterion. The analysis was run for 10 million generations with png which is licensed under the "Creative Commons Attribution-Share Alike 3.0 Unported" that is free to share (to copy, distribute and transmit the work) and remix (to adapt the work).
SCiEntifiC REPORTS | 7: 6716 | DOI:10.1038/s41598-017-07045-7 a 0.05 heating parameter. The convergence of analysis was estimated using Tracer 1.4.1 49 , and reliable ESS values (>200) were ensured. The sampling was done for every 100 generations and then the first 25% of trees were discarded using a burn-in procedure. Support for nodes was defined as posterior probabilities (P).
The tree topological differences between single-gene phylogenetic trees were compared at the level of resolution obtained by each marker and its bootstrap support. Topological differences of the trees with bootstrap support (BS) and posterior probability (P) less than 75% were not considered. Two lychee cultivars, namely Brewster and Hong Houy were used to root the tree as the out group 50 .
For ABGC 52 , genetic distances between samples were evaluated using the Kimura two parameters (K2P) model, a standard metric in DNA barcoding studies. The ABGD was run via web server http://wwwabi.snv.jussieu.fr/public/abgd/abgdweb.html using default values, except for the relative gap width (X) that was set to 10 to avoid the capture of smaller local gaps.  For PTP and GMYC 53 , all samples of Dimocarpus were included. These methods use a phylogenetic input tree from which the fit of speciation and coalescent processes are modeled to delineate a Primary Species Hypotheses (PSHs). The branch lengths were estimated under a relaxed log-normal clock algorithm as an implement in BEAST v1.8.2 package 54 . HKY + G model was applied to construct the tree. The MCMC chains were run for 10 × 10 6 generations with a sampling step performed for every 100 and 10% burnin. The MCMC output was determined by examination of traces in Tracer 1.6 55 and analyzed with TreeAnnotator 1.7.4 using all trees after the burnin. A posterior probability limit of 0.5 with maximum clade credibility tree was set. Both the single-threshold and the multiple-threshold versions of the GMYC model 53 were optimized onto the output tree with the help of the SPLITS v.1.0-19 package for R. The PTP method was executed using the best-scoring ML tree produced earlier using RAxML v. 7.2.7 56 , and was run in Python using the Environment for Tree Exploration package 57 .
Data availbility statement. The data sets generated and analysed during the current study are available within the paper. All GenBank accession numbers (KY174077-KY174314) o f nucleotide sequences of six loci from individual samples analysed in this study can be retrieved through the NCBI database.

Conclusion
This is the first report of the genetic relationship of Dimocarpus based on multi-locus molecular markers and morphological characteristics. Multiple sequence alignment, phylogenetic tree analysis and species delimitation supported that Dimocarpus longan spp. longan var. obtusus and Dimocarpus longan spp. malesianus var. malesianus should be classified to be different species from Dimocarpus longan spp. longan. Moreover, sequencing of the DNA barcode revealed the possibility of different species among Thai longan cultivar such as Daw Kaew Yee, Baan-Hong 60 and Phuen-Mueang cultivars. However, more evidence is required to confirm this proposition.