Phylogenetic relationship of Paramignya trimera and its relatives: evidence for wide sexual compatibility

The genus Paramignya (Rutaceae) comprises about 30 species distributed mainly in tropical Asia. As in other genera of the family Rutaceae, the considerable morphological variation among Paramignya species makes taxonomic study and accurate identification difficult. In Vietnam, Paramignya species have been found mostly in Khanh Hoa and Lam Dong provinces, where they are used as traditional medicines. Recently, Paramignya trimera, known locally as “Xao tam phan”, has drawn attention and has been intensively exploited to treat liver diseases and cancers. However, the considerable morphological variation and the different local names of P. trimera have caused confusion and hindered the accurate identification and medicinal application of this plant. In this study, the combination of morphological and DNA sequence data effectively supported the taxonomic identification of P. trimera and some relatives collected in Khanh Hoa and Lam Dong provinces. Morphological comparison and phylogenetic analysis suggested significant variation within P. trimera. In addition, some accessions of P. trimera with morphological characteristics similar to those of Atalantia buxifolia were likely intergeneric hybrids between the two species. Analyses of genetic variation and of interspecific and intraspecific distances based on ITS, matK and rbcL sequences showed that P. trimera was closely related to A. buxifolia, Severinia monophylla and Luvunga scandens. Furthermore, matK sequences represented an effective candidate DNA barcode for identifying and distinguishing Paramignya species from other members of the family Rutaceae.


Scientific Reports | (2020) 10:21662 | https://doi.org/10.1038/s41598-020-78448-2 | www.nature.com/scientificreports/
… region trnH-psbA 16-19. Recent case studies on the identification of medicinal plants based on DNA barcoding revealed that, among the universal barcodes, the matK and ITS regions showed high success rates of PCR amplification and high discriminatory power, followed by the rbcL region. The trnH-psbA region provided low discriminatory power owing to its low success rate of DNA sequencing 18,19. Because the morphological taxonomy of Paramignya species is complex and uncertain, the analysis of molecular data, especially DNA barcode sequences, is necessary to identify and distinguish these species. However, studies on the genetic variation and phylogeny of the genus Paramignya using molecular data are relatively limited 11,16. To date, reports on the genus Paramignya have predominantly focused on the physicochemical and biopharmaceutical properties of its extracts 4,20-23. A broad range of valuable secondary compounds, such as coumarins, tirucallanes, acridone alkaloids, phenols, flavonoids, limonoids, sterols and their glycoside derivatives, have been characterized as promising resources for the development of novel natural drugs 4,20-30. In recent years, the physicochemical properties and the antioxidant, anti-proliferative and anti-inflammatory capacities of leaf, stem and root extracts of P. trimera have been reported 24-27. P. trimera was found to be an abundant source of natural compounds such as phenols, saponins, flavonoids, proanthocyanidins and antioxidant agents 20,28,29. The in vitro anticancer activity of P. trimera extracts against human pancreatic cancer cell lines and the breast cancer cell line MCF-7 via apoptosis induction has also been investigated 25,30. Although the phytochemical and biopharmaceutical properties of P. trimera have been characterized, its morphology and its phylogenetic relationships with its relatives remain undescribed. The high genetic diversity of natural P. trimera populations and the existence of numerous local names make it difficult to identify and distinguish P. trimera from its relatives. Therefore, a systematic analysis to identify and distinguish P. trimera from its relatives in the family Rutaceae is necessary for the protection and conservation of this plant.
In the present study, morphological and DNA sequence data from accessions of P. trimera and its relatives distributed in Khanh Hoa and Lam Dong provinces were used to clarify the phylogenetic relationships among these species. In particular, the morphological similarity between accessions of P. trimera and A. buxifolia was investigated to test the hypothesis of intergeneric hybridization between P. trimera and its relatives. Additionally, the intraspecific and interspecific distances between accessions were analyzed based on ITS, matK and rbcL sequences to identify a candidate DNA barcode for discriminating P. trimera from other species in the genus and in related genera.

Results
Collecting plant specimens. In the present study, 10 accessions assigned to 4 genera, Atalantia, Luvunga, Paramignya and Severinia, were collected from different sites in Khanh Hoa and Lam Dong provinces of Vietnam (Fig. 1). Of these, six accessions of P. trimera (Oliv.) Burkill were collected at different sites in Khanh Hoa province, including Ninh Van … (Table 1).
L. scandens (Roxb.) Wight was found in Lam Dong province, Vietnam, with the local name "Xao leo". L. scandens is a woody climber or scrambling shrub, rough, tufted from the ground, with strong, sharp, straight or slightly recurved axillary spines. Leaves … flowering time has been described. In traditional medicine, this plant is used to treat rheumatism, liver disease and ascites (Fig. 2d).
Phylogenetic relationship analysis. The phylogenetic tree from ITS sequences included 3 groups (Fig. 5a).
The first monophyletic group consisted only of S. monophylla (PC.DD), which served as the outgroup. The second monophyletic group included 2 accessions of L. scandens (PR.DL and PR.CT). The third, paraphyletic group comprised 9 accessions clustered into 2 sub-groups. The first sub-group included only P. trimera, whereas the second sub-group included 3 accessions of P. trimera nested with P. confertifolia and A. buxifolia. Within the second sub-group, the accession of P. trimera collected in Dien Khanh, Vietnam (PT1.DK) and P. confertifolia from Mensong, China, formed one monophyletic clade, whereas A. buxifolia (PA.VN) was clearly separated from the others. The unrooted tree from matK sequences included 3 groups (Fig. 5b): the first monophyletic group comprised 2 species, P. lobata and P. scandens (Australia); the second monophyletic group included only P. confertifolia (China); and the third, paraphyletic group included 3 sub-groups. The first sub-group included all accessions of P. trimera, the second included only S. monophylla, and the third included L. scandens and A. buxifolia.
The unrooted tree from rbcL sequences included 2 main groups: the first group included 3 species, P. scandens, P. monophylla and P. lobata (Australia), and the second, paraphyletic group included 5 species, P. trimera, P. confertifolia (China), S. monophylla (Japan), A. buxifolia and L. scandens (Fig. 5c). In this group, some accessions of P. trimera were nested in paraphyletic sub-groups because they did not share an immediate common ancestor.
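The distinction between monophyletic and paraphyletic groupings used in these descriptions can be made concrete. The following is a minimal Python sketch, not part of the original analysis (which used MEGA X): it assumes a rooted tree encoded as nested tuples, and the accession labels (PT_a, PT_b, PT_c, Pconf, Abux) are hypothetical. A set of accessions counts as monophyletic only if some node of the tree has exactly that set as its descendant leaves.

```python
from typing import FrozenSet, Iterator, Tuple, Union

# A rooted tree as nested tuples: a leaf is a label string,
# an internal node is a tuple of its child subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> FrozenSet[str]:
    """Return the set of leaf labels descending from a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree: Tree) -> Iterator[FrozenSet[str]]:
    """Yield the leaf set of every node in the tree, root first."""
    yield leaves(tree)
    if not isinstance(tree, str):
        for child in tree:
            yield from clades(child)


def is_monophyletic(tree: Tree, taxa: set) -> bool:
    """True if some node has exactly `taxa` as its descendant leaves."""
    target = frozenset(taxa)
    return any(clade == target for clade in clades(tree))


# Hypothetical rooted tree: three P. trimera accessions (PT_a, PT_b, PT_c),
# with PT_c grouped alongside P. confertifolia and A. buxifolia.
example = (("PT_a", "PT_b"), ("PT_c", ("Pconf", "Abux")))
print(is_monophyletic(example, {"PT_a", "PT_b"}))          # True
print(is_monophyletic(example, {"PT_a", "PT_b", "PT_c"}))  # False
```

Because monophyly is only defined relative to a root, applying such a check to the unrooted matK and rbcL trees would first require choosing an outgroup.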
The pattern of the phylogenetic tree constructed from the concatenated sequences was similar to that of ITS sequences (Fig. 5d). The tree included one monophyletic group with only L. scandens and one paraphyletic group with the accessions of P. trimera nested within P. confertifolia, A. buxifolia and S. monophylla.
Genetic distance analysis. The overall genetic distances for ITS, matK, rbcL and concatenated sequences were 0.11 ± 0.01, 0.29 ± 0.02, 0.48 ± 0.05 and 0.05 ± 0.0, respectively (Table 2). An overlap between the maximum intraspecific distance and the minimum interspecific distance was observed for ITS, rbcL and the concatenated sequences (Table 2, Fig. 6a,c,d). For matK, a clear barcode gap was found between the maximum intraspecific distance (0.0028) and the minimum interspecific distance (0.0056). The histograms and ranked pairwise (K2P) distances showed a clear gap for matK and rbcL (Fig. 6b,c).

Discussion
The controversy over morphological classification. According to The Plant List database (www.theplantlist.org, 2020), 30 plant name records matched the query Paramignya. In the BOLD system, 10 published specimen records representing 4 Paramignya species (P. cf. scandens, P. mindanaensis, P. lobata and P. confertifolia) and 6 barcode sequences were described and linked to the NCBI database.
This study is the first to sample all accessions of P. trimera (Oliv.) distributed in Khanh Hoa and its relatives in Khanh Hoa and Lam Dong provinces of Vietnam. Morphologically, accessions of P. trimera differed from other Paramignya species in their trimerous flowers with 3 petals (Fig. 2c) 1,31. However, a wide range of variation in leaf characteristics was observed in P. trimera (Fig. 3). In addition, some accessions of P. trimera had leaves similar to those of A. buxifolia, which caused uncertainty about their taxonomic classification (Fig. 4a,b). The intermediate leaf forms found in some accessions of P. trimera collected in Khanh Hoa province suggested that these accessions were likely "graft chimeras" or intergeneric hybrids between P. trimera and close relatives such as A. buxifolia or S. monophylla (Fig. 4). The lack of information on the holotype or lectotype of P. trimera, combined with the complex interplay of self-incompatibility, apomixis and outcrossing, makes it difficult to discriminate hybrid lines of P. trimera from its relatives. However, based on the phylogenetic relationships shown in the cladograms (Fig. 5), the "graft chimeras" or hybrid lines are more likely to result from outcrossing between P. trimera and A. buxifolia than between P. trimera and S. monophylla, because P. trimera and A. buxifolia were clustered in one clade. Historically, most Paramignya species were listed in the genus Atalantia 30. P. trimera has had synonyms such as Triphasia monophylla DC. and A. recurva Benth. 3,6, and has been considered possibly congeneric with Luvunga species 7. According to Mabberley, Paramignya species are so closely related to the genus Luvunga that they were listed in Luvunga under the scientific name L. monophylla (DC.) 7. However, in this study, accessions of P. trimera were separated from Luvunga in the trees from all DNA barcode sequences.
Morphologically, the genus Luvunga Wight & Arn. was held to differ from Paramignya in having 3-5 petals, 6-10 stamens and 2-4 locules in the ovary 6. In 1931, Burkill renamed A. trimera as P. trimera because Paramignya species have small fruits containing fluid mucus and lacking pulp vesicles 3. However, misidentification of Paramignya species has sometimes been caused by the characteristics of their leaves or axillary spines 31. Simple leaves, like those of some Paramignya species including P. trimera, sometimes also occur in the genus Luvunga; these simple leaves have petioles shorter than those of the usual trifoliate leaves (Fig. 4d). This observation was also mentioned in the notes on the genus Paramignya 6. According to a study of the phylogenetic relationships of the subfamily Aurantioideae inferred from chloroplast DNA sequence data, nearly all species in the tribe Citreae develop axillary spines, single or paired, sometimes curved as in Luvunga and Paramignya 31,32. In our study, some accessions of P. trimera were similar to A. buxifolia, especially in their leaves and stems (Fig. 4a,b). This explains the difficulty in providing an unequivocal identification for Paramignya species. However, combining morphological and DNA sequence data made it possible to distinguish P. trimera from other species of the genus Paramignya.
Analysis of DNA barcodes. At the time of this study, there were 39 records in the NCBI Entrez system matching the query "Paramignya", including P. lobata, P. scandens, P. monophylla, P. trimera and P. confertifolia. A total of 26 barcode sequences of ITS1, ITS2, matK, the ribulose-1,5-bisphosphate carboxylase/oxygenase large subunit gene (rbcL), hyper-variable regions of the chloroplast ribosomal protein S genes (Rps4, Rps16), psbA-trnH, the ATPase beta-subunit gene (atpB), trnG, trnF and trnD were found in GenBank. In the phylogenetic tree from ITS sequences, most accessions assigned to P. trimera were clustered with P. trimera (KM111544.1), whereas the other accessions of P. trimera were nested within a clade with P. confertifolia (HG004846.1) (Fig. 5a). This pattern was likely due to paraphyly or polyphyly of conspecific DNA sequences caused by incomplete lineage sorting or by interspecific hybrids among these closely related species. This result is also consistent with the overlap between the maximum intraspecific and the minimum interspecific distances (Table 2). Although the ITS data did not fully support the identification and discrimination of all species of the genus Paramignya, the ITS tree supported the classification of P. trimera and its relatives. These results were also consistent with the notes on the subfamily Aurantioideae (Rutaceae) 6. The tree created from matK sequences supported Swingle and Reece's classification of the subfamily 1. The results were consistent with the notes on the molecular phylogeny of the orange subfamily using cpDNA sequences 2 and on the genetic relationships of Citrus and its relatives 16. The clear barcode gap between the maximum intraspecific distance and the minimum interspecific distance (Table 2) also agreed with the phylogenetic tree, suggesting that matK is a candidate DNA barcode for the classification and discrimination of P. trimera from other relatives in the genus Paramignya and from other genera in the subfamily Aurantioideae.
The presence of paraphyletic groups in the phylogenetic tree based on rbcL sequences partly reflected the unresolved relationships of closely related taxa. These results were also consistent with the intraspecific and interspecific distance data (Table 2). The overlap between the intraspecific and interspecific distances among P. trimera, A. buxifolia and S. monophylla suggested that rbcL would not be a suitable barcode marker for discriminating P. trimera from other Paramignya species or from other genera. These results also matched those of the study on the molecular phylogeny of the subfamily Aurantioideae using cpDNA sequences 2. The overlap between the maximum intraspecific distance and the minimum interspecific distance suggested that the accessions of P. trimera with intermediate leaf forms were likely interspecific hybrids between P. trimera and A. buxifolia (Fig. 4c,d).
Based on both the morphological and DNA sequence data of P. trimera and its relatives, we suggest that the identification and discrimination of Paramignya species using DNA barcodes is reliable only if the minimum interspecific distance consistently exceeds the maximum intraspecific distance. Using the mean instead of the minimum interspecific distance could exaggerate the size of the "barcoding gap" and lead to misidentification 33. Therefore, for P. trimera, the reliable approach to detecting a barcoding gap is to compare the maximum intraspecific and the minimum interspecific distances 33. On this criterion, only the matK sequence would be a suitable candidate DNA barcode for the identification and discrimination of P. trimera from other Paramignya species (Table 2). Although the histograms and ranked pairwise (K2P) distances generated by the ABGD program showed a clear gap for both matK and rbcL (Fig. 6b,c), a barcode gap between the maximum intraspecific and the minimum interspecific distances was found only for matK, not for rbcL. The other DNA sequences, including ITS, rbcL and the concatenated sequences, failed to resolve the relationships among Paramignya species and some close relatives such as P. confertifolia, L. scandens, A. buxifolia and S. monophylla (Table 2 and Fig. 5). Compared with matK and rbcL sequences, ITS showed a higher mutation rate and more informative sites (data not shown), but it was likely insufficient as a DNA barcode for distinguishing Paramignya species. Although this comparison was relative because of the heterogeneous datasets available, the results provide additional insights into the effectiveness of DNA sequences as barcode markers for the accurate identification of P. trimera and other Paramignya species.
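The point that using the mean rather than the minimum interspecific distance can exaggerate the barcoding gap can be illustrated with a toy example. The distances below are invented for illustration and are not values from Table 2: the closest pair of species overlaps with intraspecific variation, yet the mean-based comparison still reports an apparent gap.

```python
from statistics import mean

# Invented pairwise K2P distances for illustration only (not data from Table 2).
intra = [0.0000, 0.0014, 0.0028]          # within species
inter = [0.0020, 0.0300, 0.0450, 0.0600]  # between species; one close pair

gap_min = min(inter) - max(intra)    # criterion used in this study
gap_mean = mean(inter) - max(intra)  # mean-based alternative

print(f"min-based gap:  {gap_min:+.4f}")   # negative: distributions overlap
print(f"mean-based gap: {gap_mean:+.4f}")  # positive: overlap is hidden
```

The single close interspecific pair is exactly the case in which a specimen could be misassigned, and averaging dilutes it with the more distant pairs.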
This persistent problem was likely due to factors such as an inadequate number of samples from different geographical locations, a shortage of both morphological and molecular data from well-characterized phylogenies, or interspecific hybrids resulting from outcrossing enforced by self-incompatibility. The wide sexual compatibility of the genus Paramignya and other genera of the family Rutaceae was likely also one of the reasons for the difficulty in taxonomic identification. Although comprehensive databases such as the BOLD system have grown rapidly, few well-sampled datasets, especially for the genus Paramignya, are available to test their performance. Thus, the considerable promise of barcoding will be realized only if solid taxonomic foundations and thoroughly sampled clades are established. DNA barcoding identifies species by using a short standardized sequence as a "barcode" to assign an unknown specimen to a known species; however, the question of which DNA region should serve as the standard barcode still needs to be adequately addressed for P. trimera and other species of the genus Paramignya.
Based on these results, we suggest that DNA barcodes are helpful for identifying and distinguishing Paramignya species, and that combining them with other data would minimize the probability of misidentification. Nevertheless, further systematic studies and species identification of Paramignya are still needed to provide reference data for DNA barcode screening and species discrimination, which could provide a theoretical basis for the identification and conservation of valuable medicinal plants.

Conclusion
This is the first study to analyze the morphology and phylogenetic relationships of P. trimera and some relatives of the family Rutaceae collected in Khanh Hoa and Lam Dong provinces of Vietnam. The combination of morphological data, the BOLD platform and DNA barcode sequences effectively supported the identification and discrimination of P. trimera from its relatives. In addition, the intermediate forms of P. trimera were likely interspecific hybrid lines resulting from outcrossing between P. trimera and closely related species, notably A. buxifolia. The results also suggest that wide sexual compatibility could cause difficulty in the taxonomic identification of P. trimera and other Paramignya species. This study supports the accurate identification of P. trimera for its exploitation in Vietnam as a valuable indigenous medicinal plant.
Similar PCR conditions were applied for the amplification of matK and rbcL sequences. The matK sequences of all accessions were amplified using the universal primers matK-390F (TAATTTACRATCAATTCATTCAATATTTCC) and matK-1326R (GARGAYCCRCTRTRATAATGAGAAAGATTT) following Kyndt et al. (2005), with an annealing temperature of 52 °C 36. The rbcL sequences of all accessions were amplified using the universal primers rbcL-F (ATGTCACCACAAACAGAGACTAA) and rbcL-R (TTCGGCACAAAATACGAAACGATCTCTC) with an annealing temperature of 56 °C 37.
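The letters R and Y in the matK primers are IUPAC degenerate codes (R = A or G, Y = C or T), so each degenerate primer is in practice a mixture of oligonucleotides. A small Python sketch, assuming the standard IUPAC nucleotide codes and the primer sequences exactly as listed above, counts each primer's length and the number of distinct oligonucleotides it represents; it is illustrative only and plays no role in the reported PCR protocol.

```python
from itertools import product

# Standard IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def clean(primer: str) -> str:
    """Remove spaces and normalize to uppercase."""
    return primer.replace(" ", "").upper()


def degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides encoded by a degenerate primer."""
    n = 1
    for base in clean(primer):
        n *= len(IUPAC[base])
    return n


def expand(primer: str) -> list:
    """List every concrete sequence a degenerate primer stands for."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in clean(primer)))]


PRIMERS = {
    "matK-390F": "TAATTTACRATCAATTCATTCAATATTTCC",
    "matK-1326R": "GARGAYCCRCTRTRATAATGAGAAAGATTT",
    "rbcL-F": "ATGTCACCACAAACAGAGACTAA",
    "rbcL-R": "TTCGGCACAAAATACGAAACGATCTCTC",
}

for name, seq in PRIMERS.items():
    print(f"{name}: {len(clean(seq))} nt, degeneracy {degeneracy(seq)}")
```

For these sequences the sketch gives lengths of 30, 30, 23 and 28 nt, with matK-390F representing 2 variants, matK-1326R 32 variants (four R positions and one Y position), and each rbcL primer a single sequence.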
PCR products were examined by electrophoresis and purified using the QIAquick PCR Purification Kit (Qiagen, Cat. No. 28104) following the manufacturer's protocols. Purified PCR products were sequenced directly in both directions using the ABI PRISM dye terminator cycle sequencing ready kit with AmpliTaq DNA Polymerase (Applied Biosystems Inc.). Unincorporated dye terminators were removed using the DyeEX Dye-Terminator removal Kit (Qiagen, Cat. No. 63204) following the manufacturer's recommendations. Sequencing samples were automatically loaded and injected on the ABI 3500 XL (Applied Biosystems Inc.) following the manufacturer's instructions.
Sequence splicing and correction. Forward and reverse nucleotide sequences were visualized and aligned using BioEdit (Ver. 7.0.5.3). Sequences were checked manually, and any sequencing errors were corrected. Erroneous and ambiguous low-quality base calls were trimmed from both ends. BLAST searches of the consensus sequences were performed to identify the best matches in GenBank at NCBI. The ITS, matK and rbcL sequences from this study have been deposited in GenBank under accession numbers MT215517-MT215536 and MT193825-MT193834 (Table 1). Based on multiple sequence alignment, the ITS, matK and rbcL datasets of Paramignya species were trimmed to a maximum of 715, 774 and 570 nt, respectively. The concatenated dataset (ITS + matK + rbcL) for each accession was 2059 nt long.
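Concatenation joins the per-locus alignments of each accession end to end, so the combined length is the sum of the locus lengths (715 + 774 + 570 = 2059 nt). A minimal Python sketch under stated assumptions: alignments are held as dictionaries mapping hypothetical accession IDs (acc1, acc2) to aligned sequences, every sequence within a locus has the same aligned length, and an accession missing a locus is padded with "?" as missing data. This illustrates the bookkeeping only and is not the procedure used in MEGA X.

```python
def concatenate(loci, order, missing="?"):
    """Join per-locus alignments end to end for each accession.

    loci:  {locus_name: {accession_id: aligned_sequence}}
    order: locus names in the order they should be joined
    """
    lengths = {}
    for locus in order:
        seq_lengths = {len(s) for s in loci[locus].values()}
        if len(seq_lengths) != 1:
            raise ValueError(f"{locus}: aligned sequences differ in length")
        lengths[locus] = seq_lengths.pop()

    # Every accession seen in any locus; missing loci are padded with `missing`.
    accessions = sorted(set().union(*(loci[locus] for locus in order)))
    return {
        acc: "".join(loci[locus].get(acc, missing * lengths[locus]) for locus in order)
        for acc in accessions
    }


# Placeholder alignments with the aligned lengths reported above (dummy content).
loci = {
    "ITS": {"acc1": "A" * 715, "acc2": "C" * 715},
    "matK": {"acc1": "G" * 774, "acc2": "T" * 774},
    "rbcL": {"acc1": "A" * 570},  # acc2 lacks rbcL in this toy example
}
supermatrix = concatenate(loci, ["ITS", "matK", "rbcL"])
for acc, seq in supermatrix.items():
    print(acc, len(seq))
```

Padding with a missing-data symbol keeps every row the same length, which alignment-based distance and tree programs require.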
Genetic distance and phylogenetic analysis. Nucleotide divergence was estimated from the multiple sequence alignments of ITS, matK, rbcL and the concatenated sequences in MEGA X (Ver. 10.1.7) using the Kimura 2-parameter (K2P) model 38. Rate variation among sites was set to a uniform distribution. Maximum likelihood (ML) trees were generated for each DNA region separately and for the concatenated sequences using MEGA X with 1000 bootstrap replications 39,40. The tree with the highest log likelihood is shown, and the percentage of trees in which the associated taxa clustered together is shown next to the branches. Because the branch lengths were very small, which could impair the display of the trees, all trees in this study were plotted as cladograms. The cut-off value for condensed trees was set at 50% to better represent the hypothetical phylogenetic relationships among accessions. Gaps and missing data were treated by partial deletion with a 95% site coverage cut-off (i.e., fewer than 5% alignment gaps, missing data and ambiguous bases were allowed at any position).
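The K2P model separates transitions (A<->G, C<->T) from transversions. With P and Q the proportions of compared sites differing by a transition and by a transversion, the distance is d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q). The Python sketch below pairs this formula with a column filter modeled on the partial-deletion rule described above (drop a column when fewer than 95% of sequences have an unambiguous base there). It assumes uppercase A/C/G/T sequences, skips any remaining gaps or ambiguity codes pairwise, and is an illustration rather than a reimplementation of MEGA X, whose handling of edge cases may differ.

```python
import math

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")
BASES = PURINES | PYRIMIDINES


def partial_deletion(alignment, coverage=0.95):
    """Keep only columns where at least `coverage` of sequences have A, C, G or T."""
    n_seqs = len(alignment)
    keep = [
        i for i in range(len(alignment[0]))
        if sum(seq[i] in BASES for seq in alignment) / n_seqs >= coverage
    ]
    return ["".join(seq[i] for i in keep) for seq in alignment]


def k2p(a, b):
    """Kimura 2-parameter distance between two aligned sequences.

    Sites with a gap or ambiguity code in either sequence are skipped.
    Raises ValueError if no sites are comparable or the distance is undefined.
    """
    sites = transitions = transversions = 0
    for x, y in zip(a, b):
        if x not in BASES or y not in BASES:
            continue
        sites += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    w1, w2 = 1 - 2 * p - q, 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


# Toy alignment: column 3 has a gap in one of three sequences, so it is dropped.
aln = ["ACGTACGTAA", "ACGTACGTAG", "AC-TACGTCA"]
trimmed = partial_deletion(aln)
print(trimmed)
print(round(k2p(trimmed[0], trimmed[1]), 4))  # one transition in 9 sites
```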
The overall genetic distances for the ITS, matK, rbcL and concatenated sequences were estimated using MEGA X. To determine the barcoding gap between pairwise genetic distances among and within species, the intraspecific and interspecific distances were calculated with the ExcaliBAR program based on the distance matrices computed in MEGA X. The barcoding gap was calculated as the difference between the maximum intraspecific distance and the minimum interspecific distance 33,41. Automatic Barcode Gap Discovery (ABGD) (http://wwwabi.snv.jussieu.fr/public/abgd/abgdweb.html) was used to generate distance histograms and distance ranks with two values of the relative gap width X (1.0 and 1.5) and the K2P distance metric 42. Default values were used for all other parameters: P (prior intraspecific divergence) ranged from 0.001 to 0.1, Steps was set to 10, and Nb bins (for the distance distribution) was set to 20 42.
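Given a distance matrix and a species label for each accession, splitting the pairwise distances into intraspecific and interspecific sets and taking the gap as the minimum interspecific minus the maximum intraspecific distance requires only a few lines. The Python sketch below pools all species, whereas ExcaliBAR may also report per-species values; the function names, labels and distances are hypothetical and are not taken from Table 2.

```python
from itertools import combinations


def split_distances(labels, dist):
    """Split pairwise distances (upper triangle) into intra- and interspecific lists."""
    intra, inter = [], []
    for i, j in combinations(range(len(labels)), 2):
        (intra if labels[i] == labels[j] else inter).append(dist[i][j])
    return intra, inter


def barcoding_gap(labels, dist):
    """Minimum interspecific minus maximum intraspecific distance (pooled over species)."""
    intra, inter = split_distances(labels, dist)
    if not intra or not inter:
        raise ValueError("need at least one intraspecific and one interspecific pair")
    return min(inter) - max(intra)


# Hypothetical K2P matrix for four accessions (values invented for illustration).
labels = ["P. trimera", "P. trimera", "A. buxifolia", "L. scandens"]
dist = [
    [0.000, 0.002, 0.010, 0.020],
    [0.002, 0.000, 0.012, 0.022],
    [0.010, 0.012, 0.000, 0.018],
    [0.020, 0.022, 0.018, 0.000],
]
gap = barcoding_gap(labels, dist)
print(f"gap = {gap:.3f}", "(clear gap)" if gap > 0 else "(overlap)")
```

A positive value corresponds to the situation reported for matK; a zero or negative value corresponds to the overlap reported for ITS, rbcL and the concatenated sequences.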