Introduction

Orchids are significant sources of secondary metabolites (e.g. alkaloid, flavonoid, terpenoid etc.) with chemical compounds (phytochemicals) such as Moscatin, Erianin, Gastrodin1,2. These phytochemicals have numerous medicinal properties that are important for human healthcare. For example, Goodyerin isolated from Goodyera schlectendaliana3 has sedative and anticonvulsant activities4; Gastrodin found in Gastrodia elata5 is effective in variety of neurological disease6; Dendrobine extracted from Dendrobium nobile7 is effective for influenza A virus8. Indeed, orchids were earlier considered more for their medicinal values rather than their beauty of colorful flowers and have long been used as medicine in different parts of the world2,9,10,11. Although the medicinal usage of orchids was apparently first recognized in the twenty-eighth century BC9,12, the history of medicinal usages of orchids dates back to the seventeenth century based on the official literature such as “Chinese Pharmacopeia”13,14. Presently, there are about 600 species of orchids that are widely used for traditional medicine in different parts of the world11.

Asia is considered as a hotspot of medicinal orchids due to the widespread commercial usages of orchids species as traditional medicine and the occurrence of these species in natural habitats. Based on various literatures, more than 70% medicinal orchid species are found in Asia10,11 representing all five subfamily (i.e. Apostasioideae, Vanilloideae, Cypripedioideae, Orchidoideae and Epidendroideae) of Orchidaceae (for example, Fig. 1). Usages of medicinal orchids are very popular in East Asia (mainly in China, Japan and Korea)11,15, South East Asia (especially in Thailand, Malaysia, Indonesia, Myanmar and Philippines)11,16,17, South Asia (usually in Bangladesh, Bhutan, India, Nepal, Pakistan)10,11 and western Asia (such as in Iran)11.

Figure 1
figure 1

Representative species of medicinal orchids from five subfamilies. (A) Apostasia wallichii (Apostasioideae), (B) Vanilla aphylla (Vanilloideae), (C) Cypripedium himalaicum (Cypripedioideae), (D) Brachycorythis obcordata (Orchidoideae), (E) Bulbophyllum careyanum (Epidendroideae), (F) Dendrobium amoenum (Epidendroideae). Photographs by Bhakta B. Raskoti. Images merged in Adobe Illustrator CC v17.0. https://www.adobe.com.

Growing popularity of herbal medicine in the twenty-first century has increased the demand of medicinal orchids in the global markets that has led to the overharvesting and illegal trade of wild orchids. There are several legislations for conservation and sustainable management of wild orchids. The most notable one is inclusion of all orchid species in the Convention on International Trade of Endangered Species of Fauna and Flora (CITES) in Appendices I or II18, which means that the collection and trade of orchids from wild habitat are banned. Despite several efforts to adopt national legislations and international treaties for conservation of wild orchids, high volumes of medicinal orchids are widely and illegally traded in the national, regional and international markets19,20,21,22. Overexploitation and illegal trade of wild orchids for medicinal usages have driven many orchid species towards extinction, for example; Dendrobium officinale Kimura & Migo and Dendrobium huoshanense C. Z. Tang et S. J. Cheng (a closely related species of Dendrobium moniliforme based on molecular phylogeny23) have been harvested from the wild habitats for medicinal use since 1200 years ago19,20, which enforced to enlist these species under critically endangered species24,25. To control overexploitation and illegal trade, the first and foremost crucial step is to correctly identify illegally traded orchids at their species level which in turn can help to understand their original locality (natural habitats) and to monitor that specific area for conservation and sustainable management.

Medicinal orchids are usually traded in the form of dried or fresh parts of plants such as pseudobulbs, tubers, leaves, stems and flowers. Morphological characters of these dried or fresh parts of different species look very similar and are difficult to differentiate from each other, for example, tuber of Gymnadenia and Dactylorhiza (Fig. 2A,B), stem of different species of Dendrobium (Fig. 2C,D), leaves of Vanda and Aerides, pseudobulb of many species of Coelogyne and Bulbophyllum. Accurate identification of orchid species from such dried or fresh materials based on morphological observation is almost impossible. To overcome this problem, DNA barcoding method can play a vital role for proper species level identification of medicinal orchids. Although few studies have been conducted on DNA barcoding of orchids using nuclear and plastid markers26,27,28,29,30,31, no concrete effort has been made so far focusing on DNA barcoding of medicinal orchids. Furthermore, in the vast majority of prior molecular works on orchids (in DNA barcoding and phylogenetic analysis), genomic DNA was extracted from leaf samples (e.g.28,29,30,31). But for the medicinal orchids, DNA extraction from tuber, stem and pseudobulb are equally important because these parts are major commodities of trade. Therefore, this research aims to evaluate DNA barcoding of medicinal orchids in Asia (where collection and trade of huge quantity of medicinal orchids have long existed) by using five barcodes (ITS, ITS2, matK, rbcL and trnH-psbA), which have been used in previous studies for DNA barcoding of angiosperms including orchids29,30,31,32,33,34. The specific objectives of this study were to: (1) test the use of existing protocol for DNA extraction from tuber, stem and pseudobulb of medicinal orchids and evaluate the success of amplification and sequencing, (2) assess efficacy of barcodes for identification of medicinal orchids and (3) establish a barcode dataset as a resource library for identification of medicinal orchids.

Figure 2
figure 2

Tuber of Gymnadenia orchidis (A) and Dactylorhiza incarnata (B), stems of Dendrobium officinale (C) and Dendrobium moniliforme (D). Photographs by Bhakta B. Raskoti. Images merged in Adobe Illustrator CC v17.0. https://www.adobe.com.

Results

Sequence success and characteristics

In this study, a total of 6986 sequences (including 431 newly generated sequences) were assembled to evaluate the five candidate barcodes (ITS, ITS2, matK, rbcL and trnH-psbA and their possible combinations). Sampling comprised 380 species belonging to 94 genera from five subfamily of Orchidaceae (Table 1). Our sample represents 90% species of medicinal orchids in Asia, which included all highly traded species. Sequence in the data matrix comprised 1823 (ITS), 1833 (ITS2 mainly excised from the aforementioned ITS sequences), 1414 (matK), 1109 (rbcL) and 807 (trnH-psbA) (Table 2). Summary of taxon sampling including sample ID, tissue source, gene bank accessions of newly generated and accessions of downloaded sequences are provided in Table 1, Table 2, Supplementary Table S1 and S2. In the aligned datasets, the maximum number of representative sequences of a species was limited to 20 individuals, and in the final data matrix about 80% species represented at least two individual sequences except for trnH-psbA.

Table 1 Summary of taxon sampling in this study.
Table 2 Sequence characteristics of different DNA markers evaluated in this study.

DNA extraction from tuber, pseudobulb and stem was 99% successful whereas extraction for the few samples that failed were mainly from stem and pseudobulb. The PCR amplification and sequencing success rates were high for ITS, rbcL (99%) and trnH-psbA (97%) (Table1). For matK, 96% of the sequences were successfully amplified and sequenced. Sequencing failed for 5 individuals because of polymorphic sites (double peaks) or a poly-G structure in the trace file. Failed sequences were re-sequenced. Some sequences were successfully obtained in the second attempt, but few samples still failed to generate readable sequences which were mainly from Orchidoideae. In total, about 97% of sequences were successfully sequenced. Sequence alignment was most consistent for rbcL and matK, followed by ITS. Conversely, trnH-psbA contains inversions as well as insertions and high level of sequence length variation, which makes the alignment extremely time consuming in comparison to other markers.

Genetic distances and barcode gaps

The results of genetic distance indicated that ITS had the highest interspecific variation (0–0.64), followed by ITS2 (0–0.60), whereas rbcL was the most conserved and displayed the lowest interspecific divergence (0–0.02). Likewise, intraspecific genetic distance ranged highest in ITS (0–0.46) followed by ITS2 (0–0.43) and lowest in rbcL (0–0.10) and matK (0–06) (Table 2).

The genetic distance method based on histograms did not detect distinct barcoding gaps and showed overlap between intra- and interspecific distance (Supplementary Fig. S1). In contrary to histograms, results based on scatter plots approach did detect barcoding gaps, which had different resolution between barcodes (Fig. 3, Table 3). Among single locus, ITS demonstrated the highest barcode gaps (84%), followed by ITS2 (80%), trnH-psbA (69.41%), matK (64.33%) and rbcL (60.33%). Among 2-loci combinations, ITS + matK exhibited highest (80.13%) barcode gaps followed by ITS2 + matK (71.11%), whereas ITS + rbcL and ITS2 + rbcL exhibited nearly similar barcode gaps (i.e. 69.05% and 69.84% respectively). The lowest barcode gaps were detected by matK + rbcL (63.20%). The combinations of 3-loci, ITS + matK + rbcL and ITS2 + matK + rbcL exhibited nearly equal barcode gaps i.e. 67.42% and 66.03% respectively (Table 3).

Figure 3
figure 3

Scatter plots of the maximum intraspecific versus minimum interspecific K2P distance for five single markers and seven combinations (I, ITS; I2, ITS2; M, matK; R, rbcL; T, trnH-psbA).

Table 3 Identification success rates obtained using Distance and NJ tree methods for the five single markers and seven combinations.

Result based on the BM analysis, ITS performed highest (91.62%) identification ability among the single barcode followed by trnHpsbA (84%), ITS2 (83.45%), matK (71.20%) and rbcL (43.10%). Among the multi-loci combinations, ITS + rbcL had the highest identification power (84.60%) followed by ITS + matK (80.51%), whereas the lowest capacity was exhibited by matK + rbcL (49.04) (Table 3). Based on the BCM method, the best-performing barcode for single locus was ITS (90.83%) followed by trnHpsbA (84.85%), whereas identification power was lowest for rbcL (43.10%) (Table 4). Among the combinations of 2-loci, the species identification performance was ranked as ITS + rbcL > ITS + matK > ITS2 + rbcL > ITS2 + matK > matK + rbcL. At the 3-locus combinations, the identification abilities were relatively lower than 2-loci combinations with resolution of 56.51% (ITS2 + matK + rbcL) to 63.55% (ITS + matK + rbcL) (Table 4).

Table 4 Species identification success based on best match and best close match.

Species discrimination

Neighbour-joining (NJ) trees obtained from the majority of barcodes exhibited nearly similar topology (Supplementary Figs. S2-S13) and agreed with the core phylogenetic hypothesis of Orchidaceae (e.g.35). Species discrimination rates among single-locus ranged from 53% (rbcL) to 90% (ITS), where matK and trnHpsbA performed 67.56% and 66.11% respectively. Among the 2-loci combinations, discrimination range comprised 70.12% (matK + rbcL) to 80% (ITS + matK). In the 3-loci combinations, highest species discrimination was exhibited by ITS + matK + rbcL (70.20%), whereas ITS2 + matK + rbcL had 68.24% (Table 3).

Discussion

In the vast majority of the prior studies of DNA barcoding of Orchidaceae, genomic DNA was extracted from leaves (e.g.28,29,30,31) but illegal trade of medicinal orchids usually occurs by exporting their different parts such as tuber, stem, pseudobulb etc. Therefore, it is important to test and develop a protocol for DNA extraction from illegally traded parts of orchids for their accurate identification. In this study, we extracted DNA from the tuber, pseudobulb and stem samples, which were collected from traders or directly from wild habitats. Our study observed that DNA extraction is possible from tuber, pseudobulb and stem of medicinal orchids following existing protocol. This study found a high rate of PCR amplification and sequencing success for ITS, rbcL, matK as well as trnH-psbA, which is consistent with previous studies29,30,31,34. Comparatively, trnH-psbA seems to be slightly less successful due to presence of poly (T). Besides, some trnH-psbA sequences obtained with indels (i.e. insertion and inversions) caused complexity in data alignment. Such issue was also observed in previous studies29,31. Similarly, sequencing success rate of matK was relatively low due to presence of Poly (G). But generally, DNA extraction, PCR amplification and sequencing for ITS, matK, rbcL and trnH-psbA have no serious issues for DNA barcoding of medicinal orchids.

Based on different analytical methods (genetic distance, BM, BCM, NJ), ITS performed the highest identification rate among the single barcode region (Fig. 4, Tables 3, 4, Supplementary Fig. S1). This rapidly evolving nuclear gene has the highest variable sites that contribute to the efficacy of species discrimination36,37. Our result is consistent with previous studies29,31,32,36. On the other hand, matK, rbcL and trnH-psbA (plastid barcodes) exhibited lower resolution than the ITS (nuclear barcode). This could be due to lower substitution rates found in the plastid region. Therefore, these plastid barcodes alone are not recommended for DNA barcoding of medicinal orchids. The low resolution of the plastid region has been reported in different seed plants including orchids38,39. In some studies27,40, other plastid barcodes such as ndhF, ycf1, trnL-trnlF (not evaluated in this study) were also found effective in DNA barcoding of Orchidaceae. However, in general these plastid barcodes are not commonly used in Orchidaceae. Therefore, additional studies are required to evaluate efficacy of ndhF, ycf1, trnL-trnlF for the DNA barcoding of the medicinal orchids.

Figure 4
figure 4

Neighbour-joining tree based on the ITS barcode. Coloured clades (except black) represent species that were correctly identified, and different sub-families are colour-coded. Black clades represent species that were not identified successfully. Details are included in Fig. S2.

Different combinations of two and three markers from ITS, ITS2, matK, rbcL were analysed in this study. Due to the comparatively low number of sampled species and sequences along with several indels, we excluded trnH-psbA in concatenation that may create robust missing data in the data matrix. Such missing data may have a negative impact on the results of phylogeny or tree-based DNA barcoding41,42,43,44. Although few studies have proposed trnH-psbA as a key barcode36,45, in this study trnH-psbA performed weak in species resolution except in BM and BCM analysis (Table 4). Moreover, several problems of trnH-psbA such as high frequency of length variation and the presence of inversions and insertions (which create complications to use it as a DNA barcode) have been reported46,47,48.

In the combinations of two barcode region, ITS + matK exhibited highest degree of species discrimination capability (Table 3) which is consistent with previous studies29,31. As an alternative to ITS + matK, a combination of ITS + rbcL could be a supplementary choice for the DNA barcoding of medicinal orchids (Table 4). The usefulness of this option (i.e. ITS + rbcL) is important particularly when matK amplification fails. The use of matK as a barcode has been criticized mainly because primers may be taxon specific or universal primers may not be available for all taxa49. Although matK amplification is not an immense problem in orchids, some studies from other groups of plants reported that matK sequencing is successful only after using up to 10-primer pairs47.

The combination of matK + rbcL exhibited relatively low efficacy for species identification, possibly because these two plastid markers are more conserved and lack sufficient variable sites. Therefore, matK + rbcL cannot be an ideal barcode for the medicinal orchids. Similar results were also reported in previous studies in different groups of plants32,38. By contrast, matK + rbcL was proposed as a core barcode for land plants36,46,50. Our samples were delimited to a single family (Orchidaceae) i.e. relatively more closely related species than in the sampling by36,46,50, indicating that matK + rbcL is not effective for barcoding of medicinal orchids.

Combinations of 3-locus candidates were unable to increase resolution rates, as they exhibited comparatively lower resolution than the combinations of 2-locus. Such kinds of results have been also reported in the previous studies30,31. Thus, combinations of 3-barcode regions are not recommended as efficient DNA barcode for the identification of medicinal orchids.

In this study, some species belonging to Coelogyne, Cymbidium, Dendrobium (Epidendroideae) and Goodyera (Orchidoideae) were not correctly identified by NJ method with majority of markers (Fig. 4, Supplementary Figs. S2-S12). Within the Coelogyne, two species i.e. Coelogyne fimbriata and Coelogyne ovalis were not distinctly identified. These two species are morphologically very similar and also share their geographical distribution ranges, which may lead to misidentification during sample collection. Besides, taxonomic identity of these two species is based on morphological studies but lacks assessment at molecular level, therefore likely to be the same species as assigned in taxonomic revision of Coelogyne section Fuliginosae (Orchidaceae)51. In the Dendrobium, D. moniliforme was not monophyletic, although a species should be monophyletic for the effectiveness of DNA barcoding52. D. moniliforme was also not resolved in the phylogenetic analysis of Dendrobium23 where several samples with the name ‘Dendrobium moniliforme’ were nested into different lineages. Possible reason could be improper taxonomic treatment or existence of cryptic species within D. moniliforme. In the Goodyera, G. kwangtungensis failed to distinguish itself from G. schlechtendaliana. It may be due to incorrect identification during sample collection possibly caused by their similar morphological characters. Another possibility is that these two species may be the same; the assignment of separate taxonomic identity of these species may be due to presence of ambiguous characters from the result of hybridization and polyploidization. G. kwangtungensis and G. schlechtendaliana are also not resolved in the molecular phylogenetic analysis53. Further studies are necessary to clarify the taxonomic status of these unresolved species.

A perfect DNA barcode usually should exhibit high interspecific but low intraspecific distances36,45, which can be clearly demonstrated either using histograms (for e.g.31,40) or scatter plots (for e.g.29,32,33). In this study, histograms approach failed to detect clear barcoding gaps (Supplementary Fig. S1), indicating that intraspecific genetic distance and interspecific genetic distance distributions overlapped with each other. This result is also in line with previous studies29,32,33,34. Conversely, barcoding gaps were detected using scatter plots where ITS performed the best among five single barcodes and ITS + matK presented the best among the multi-locus barcodes (Fig. 3). In the previous studies, barcode gaps were detected using scatter plots in varying degree between the barcodes29,32,33. The results of this study revealed that a rapidly evolving gene ITS is a powerful barcode in DNA barcoding gap assessment as well as efficacy of species identification success rate of medicinal orchids. Besides, ITS region is necessary in each of the most powerful multi-locus barcodes (i.e. ITS + matK and ITS + rbcL) indicating that ITS (having maximum intra- and interspecific genetic divergence comparisons) plays an important role to enhance barcode performance. Therefore, we recommend ITS region to be incorporated into the core barcode of medicinal orchids. This condition was also suggested by authors such as38,45, and strong positive effects of ITS locus have been reported in prior studies in different groups of plants29,31,39. Although few concerns have been raised about the use of ITS locus mainly due to fungal contamination54,55, but in orchids, ITS amplification and sequencing are already established and most commonly used in vast majority of molecular phylogenetic as well as DNA barcoding studies (for e.g.26,29,31).

Conclusions

Based on three different analytical methods (genetic distance, similarity-based and tree-based) with sampling nearly 7000 sequences from five single barcodes (ITS, ITS2, matK, rbcL, trnH-psbA and their seven combinations), this study revealed that DNA barcoding is effective for identifying medicinal orchids. Among single locus, ITS performed the best barcode (amplification, sequencing and species identification). Among combined barcode loci, ITS + matK exhibited the most efficient barcode for the DNA barcoding of medicinal orchids. Alternative to ITS + matK, a combination of ITS + rbcL could be another multi-locus barcode option. This study indicated that a rapidly evolving gene ITS is important for the DNA barcoding of the medicinal orchids. Based on genetic distance analysis, we also suggest using scatter plots instead of histograms to detect the presence of DNA barcoding gaps in the medicinal orchids. Furthermore, the success rate of amplification and sequencing is high and the existing protocol is applicable for DNA extraction from tuber, stem and pseudobulb of medicinal orchids. A barcode library (assembling sequences from five loci ITS, ITS2, matK, rbcL and trnH-psbA) as a resource for identifying medicinal orchids has been established which contains about 7,000 sequences of 380 species (i.e. 90%) of medicinal orchids of Asia. Future studies will enhance this barcode library mainly by adding sequences from the remaining 10% species.

Methods

Sampling strategy

We compiled a checklist of medicinal orchids from Asia based on published literatures (e. g.10,11, and then the accepted species name was assigned following The Plant List and various recently published papers. Medicinal orchids in the checklist (within Asia) comprised 422 species and represent all five subfamily (i.e. Apostasioideae, Vanilloideae, Cypripedioideae, Orchidoideae and Epidendroideae) of Orchidaceae.

In total, 6,555 sequences were retrieved from the National Center for Biotechnology Information (NCBI) (see Supplementary Table S1). For this, priorities were given to those sequences, which are already published in papers or have provided voucher specimens so that misidentified sequences can be avoided. In cases where large numbers of sequences were available for a species per marker, we selected the ones, which have good quality, the longest sequences and represent from different geographical regions. Furthermore, the downloaded sequences from NCBI were filtered according to the following criteria: (1) omitted sequences having length less than 300 bp but this criteria is not applied for some ITS2 region; (2) excluded sequences lacking voucher specimens; and (3) discarded sequences having taxa without specific names (such as Habenaria sp. and Bulbophyllum cff. etc.).

In this study, 431 sequences were newly generated from 134 individuals representing 48 species that were collected from different localities of Nepal (mainly from community based forest). The national guidelines were followed for the collection and use of plants. The plant samples collected for the present study are currently neither included in the IUCN red list nor listed as protected plants. Although these plants are included in CITES Appendix II, there was no any provision to take collection permit during the time of fieldwork. The localities of species collected in this work are not from protected area; hence no permits were required. However, we did inform the related community forest user groups (local institutions under the district forest based on Forest Act 1993 enacted by Ministry forest and Environment, Government of Nepal) and took verbal consent for specimen collection.

We reviewed related floras, monographs and compared specimens with printed as well as online images including available images of type specimens for the species identification. Species were formally identified by Dr. Bhakta Bahadur Raskoti, Biodiversity Conservation Initiative, Nepal. All newly generated sequences have been submitted to NCBI (Supplementary Table S2). Voucher specimens were deposited in National Herbarium and Plant Laboratories (KATH), voucher number are also available publicly in NCBI gene bank accession records.

DNA extraction, PCR amplification and sequencing

Total genomic DNA was extracted from plant leaves dried in silica-gel following modified CTAB protocol56. We also extracted genomic DNA from tuber, stem, pseudobulb (total 47 samples from 20 species) dried in silica-gel. For this DNA was extracted following STE-CTAB protocol57. Amplification of DNA regions was performed using a polymerase chain reaction (PCR) following the reference58. The sequencing reactions were performed using the Applied Bio-systems Prism Bigdye Terminator Cycle Sequencing (Applied Bio-systems, Foster City, CA) following the manufacture’s instructions. Primer pairs for PCR and sequencing used in this study are provided in Supplementary Table S3.

Data analysis

Forward and reverse sequencing output files were edited and assembled using ContigExpress Application 6.0 (InforMax, Inc.). Assembled sequences were initially aligned using Clustal X59 and then manually adjusted in BIOEDIT version 760. Altogether twelve barcodes were evaluated including five single loci (ITS, ITS2, matK, rbcL, trnH-psbA) and seven combinations (ITS + matK, ITS + rbcL, ITS2 + matK, ITS2 + rbcL, matK + rbcL, ITS + matK + rbcL, ITS2 + matK + rbcL) using following methods.

Genetic distance-based method

The genetic pairwise distance for each marker was calculated in MEGA X61 using the Kimura 2-parameter model, and we investigated the minimum interspecific distance and maximum intraspecific distance for each species using custom R script. To detect the barcode gaps, scatter plots were generated using R version 3.6.362. In scatter plots, each dot represents a species and the dot above the 1:1 slope indicates a barcoding gap32,33,63. We counted the number of species having barcoding gaps for each marker; finally these barcode gaps were calculated in percentage. We also used histograms to detect barcoding gaps for every single and multi-loci barcode. Histograms were generated from the distribution of intraspecific and interspecific genetic distances obtained from pairwise summary function using the program TaxonDNA64.

Similarity-based method

To assess the proportion of accurate species identification, best match (BM) and best close match (BCM) functions were implemented in the TaxonDNA64. For BM analysis, identification was considered correct when query and best match sequences were from the same species, ambiguous when they were from both the same and different species, or incorrect when they belonged to different species33,64. For BCM, species identification was considered correct if a query matched all conspecific sequences within the 95% pairwise genetic threshold33,64. In BM and BCM analysis we deleted all species represented by a single sequence.

Tree-based method

To evaluate discriminatory power of single and multi-locus barcodes, unrooted neighbour-joining (NJ) trees were constructed in MEGA X61. For this pairwise deletion based on the p-distance model following protocols for species level discrimination in the closely related species were applied34,64,65. A species was considered successfully identified only when all conspecific individuals formed a monophyletic clade.