DNA metabarcoding to unravel plant species composition in selected herbal medicines on the National List of Essential Medicines (NLEM) of Thailand

Traditional medicines are widely traded across the globe and have received considerable attention in the recent past, with expectations of heightened demand in the future. However, there are increasing global concerns over admixture, which can affect the quality, safety, and efficacy of herbal medicinal products. In this study, we aimed to use DNA metabarcoding to identify 39 Thai herbal products on the Thai National List of Essential Medicines (NLEM) and assess species composition and admixture. Among the products, 24 samples were in-house-prepared formulations, and 15 samples were registered formulations. In our study, DNA metabarcoding analysis using ITS2 and rbcL barcode regions were employed to identify herbal ingredients mentioned in the products. The nuclear region, ITS2, was able to identify herbal ingredients in the products at the genus- and family-levels in 55% and 63% of cases, respectively. The chloroplast gene, rbcL, enabled genus- and family-level identifications in 58% and 73% of cases, respectively. In addition, plant species were detected in larger numbers (Family identified, absolute %) in registered herbal products than in in-house-prepared formulations. The level of fidelity increases concerns about the reliability of the products. This study highlights that DNA metabarcoding is a useful analytical tool when combined with advanced chemical techniques for the identification of plant species in highly processed, multi-ingredient herbal products.

www.nature.com/scientificreports/ origin of components. This is mostly applicable to traditional medicines for which the raw herbal materials are extremely processed and prepared in various dosage forms, such as capsules, tablets, and powders 2 . In Thailand, most traditional herbal products are multi-herb formulations. Despite appeals from the traditional healthcare system, there is limited scientific evidence to support the utilization of polyherbal formulations 7 . Many biological activities related to those known in the human healthcare system have been documented for polyherbal formulations. Furthermore, the use of multiple herbals or combinations of plants has been established in Indian Ayurveda 8 , Chinese traditional medicine 9 , and Thai traditional medicine 10 . Many studies have confirmed that herbal extracts from polyformulations have better efficacies than equal doses of individual active constituents and/or herbs, highlighting the significance of synergistic action in herbal medicines [11][12][13] . Hence, it is important to authenticate these multiple mixtures of plant samples, polyherbal formulations or herbal products to ensure their quality, safety, and efficacy 6,14 . DNA metabarcoding has been established as an appropriate molecular technique for the authentication of plant species along with animal supplements used in different forms of traditional medicine [15][16][17][18] . Many studies have shown that herbal products/market samples are often not what they claim according to the listed plant ingredients. Using a multilocus DNA metabarcoding approach, it has been shown that only 65% of identified plant species correspond to the listed ingredients for traditional herbal products 17 . Similarly, an assessment of Hypericum perforatum L. samples in an herbal market revealed that only 68% of the samples were authentic 16 . Recent findings successfully demonstrated the authentication of sixteen Veronica herbal products that are widely used in European traditional medicine. Only 15% of the products were identified as Veronica officinalis L., while 62% of the products were identified as Veronica chamaedrys L 19 . Using next-generation sequencing, a study was performed to demonstrate the authenticity of the plant Salvia officinalis L. Interestingly, the additional presence of fungal species in market samples was reported 20 . Similarly, the botanical and entomological sources of honey in different commercial honey products have been identified using DNA metabarcoding regions 21 . Seventynine Ayurvedic herbal products were authenticated using DNA metabarcoding, and the results revealed that 67% contained a single ingredient, 21% contained multiple ingredients, and some did not contain all of the plant species mentioned on the label 15 . Much evidence supports species adulteration in herbal products, which might adversely affect consumer health and safety 6,22 . Precise and consistent sequences can be obtained for improved species determination using metabarcoding, which generates repeatable results with a reasonable cost and high-throughput sequencing 23 . Therefore, metabarcoding can be implemented for the quality control and validation of contaminants and species composition in highly processed herbal products 24 . Recent improvements in next-generation sequencing methods further reduced sequencing costs and helped improve the technique for identifying complex multi-ingredient herbal products 25 . Unless regulated, such adulteration may decrease the efficacy of herbal medicines and have adverse effects on consumer health, which will eventually result in economic losses in the herbal market 22 .
In this study, DNA metabarcoding was used to examine the plant ingredients mentioned for Thai herbal products on the NLEM from markets and hospitals. Here, we highlight metabarcoding as a potential method for use in the quality control of herbal products. Our ultimate aims are to assess the ability of DNA metabarcoding to detect the presence of plant species mentioned on the labels of herbal products and to identify the presence of any adulterants in the products.

Results
DNA isolation and PCR amplification. The DNA extracted from 39 samples was very inconsistent in quality and quantity. The genomic DNA concentrations of the samples were between 0.01 and 99.0 ng/μl, according to NanoDrop measurements (Table S1). The total DNA concentrations of all the samples are provided in Table S1. PCR amplification reactions were conducted for all the samples. Out of 39 samples, 33 yielded PCR amplicons from both the ITS2 and rbcL DNA barcode regions. The success of PCR amplification from both ITS2 and rbcL for the powder, tablet and capsule dosage forms was 88%, 80% and 78%, respectively. The blank (negative control) contained distilled water and yielded no molecular operational taxonomic units (MOTUs) with either ITS2 or rbcL primers. Six of 39 herbal samples (16%) yielded no MOTUs with the ITS2 and rbcL primer pairs. The failed samples (5, 9, 21, 30, 37, and 39) were not included in the results and discussion.
Plant species detection in Thai herbal products. The labels of thirty-nine herbal products (Table 1) listed 175 medicinal plant species belonging to 136 genera and 72 families. By ITS2 metabarcoding, 1290 different plant species belonging to 138 genera and 146 families were identified, while rbcL analyses revealed 1,611 plant species corresponding to 138 genera and 123 families. The number of plant species detected ranged from 16 to 74 and 22 to 78 with an average of 39 and 49 species per sample when using ITS2 and rbcL, respectively. The raw data consisted of 5,478,664 reads, with an average of 83,010 reads per sample for both markers. After trimming and filtering, the numbers of high-quality reads were 1,989,478 and 2,552,489 with an average of 60,287 and 77,348 reads per sample for ITS2 and rbcL, respectively (Table S2). The total numbers of MOTUs obtained from the 33 herbal samples were 24 for the samples in powder form, 4 for the samples in tablet form, and 7 for the samples in capsule form.
The number of plant families detected in the herbal products ranged from 1 to 17 (Table S3). The plants listed on the labels were identified in 100% of the two single-component products and 45% of the multiple-ingredient samples (n = 31). Family-level identification of listed and non-listed (undeclared) species was performed using both ITS2 and rbcL (Table 2). In the ITS2 analysis, sample no. 1 in powder form contains the largest number of identified plant families (Table 2). For many samples, the families mentioned on the labels could not be identified; among those samples, sample no. 38 showed the smallest number of families that could be identified (  (Fig. 1). The results from rbcL detection of all the herbal samples are presented in Table 2. Among the three different dosage forms, the tablet form yielded the most identifications when using both ITS2 and rbcL (Fig. 2).

Plant ingredients and undeclared (undetected) taxa identified in herbal samples.
In our analysis for thirty-three herbal samples at the family level, 210 (63%) and 243 (73%) out of 337 families were identified by using ITS2 and rbcL, respectively ( Table 2; Table S3). In separated analysis at the genus level, 242 (545%) and 256 (58%) out of 447 were identified using ITS2 region and rbcL region respectively ( Fig. 1; Table S3). At least two to three plant species were identified in all the multiple-ingredient samples. However, among all the herbal samples, 205 and 191 of 447 species could not be identified when using ITS2 and rbcL, respectively (Table S4). Sample nos. 1, 2, 3, 4, 7, 8, 15, 18 and 24, which contained the largest numbers of ingredients according to the labels, yielded identifications at the family or genus level ( Table 2; Fig. 1). In addition to the detected plant families, many unidentified families were observed among the herbal samples (Table S4), predominantly Apiaceae, Apocynaceae, Asteraceae, Lamiaceae, Lauraceae, Myristicaceae, Rutaceae, and Zingiberaceae (Table S4). Some Theaceae. Using a correlation matrix, principal component analysis (PCA) was performed. The plants used in the preparations of herbal samples were clustered into 3 groups. Axis 1 yielded the largest number of herbal samples clustered in groups, which were used for dizziness relief and as cardiotonic, antidiarrheal and antiflatulence agents, among other uses. Axis 2 contained fewer herbal samples clustered in groups, which were used as antipyretics and to relieve cough (Fig. 3).

Validation of registered herbal formulations and in-house-prepared formulations.
Among the thirty-nine herbal products analysed in this study, twenty-four samples were from household remedies, and fifteen samples were from registered formulations prescribed in hospitals. Our results showed that registered herbal formulations yielded the largest percentages of family-level detection, at 72% and 77% (absolute) when using ITS2 and rbcL metabarcoding, respectively (Table 3). Both ITS2 and rbcL analyses revealed that sample nos. 26, 27, 32, 33, and 38 were authentic ( Table 2, Fig. 1). Similar results were obtained for all the herbal samples sample no. 34. However, genus-and family-levels detection for sample no. 34 could not be achieved with the ITS2 region, whereas rbcL identified two of the four families mentioned on the label. For sample no. 34, rbcL seems to be a good candidate region that may prove to be of greater utility. In the overall analysis, rbcL resulted  Fig. 1). Registered formulations showed larger percentages of detection at both the family-and genus-levels than household preparations when using ITS2 and rbcL metabarcoding. Unfortunately, we could not identify 100% authentic samples for any of the household formulations or in-house-prepared formulations using both ITS2 and rbcL metabarcoding. Most of the herbal samples were polyherbal formulations and highly processed powder forms. Many families that were undeclared or not identified on the label (39% and 29%, relative) were detected with metabarcoding using ITS2 and rbcL, respectively (Table 3).

Discussion
Medicinal plants and their importance in traditional healthcare systems are receiving increasing attention as solutions to resolve healthcare problems. People around the world are trying thousands of different kinds of herbal medicines each year, causing rapid growth of herbal usage and herbal industries. The quality and authenticity of herbal products directly affect the safety and efficacy of the drugs. To confirm the identity of raw ingredients  www.nature.com/scientificreports/ or processed herbal products, quality control involving a series of standard processes and screening of labelled species along the trade chain is very important 16 . In general, herbal products are extremely processed and manufactured into capsules, tablets, powders or other forms containing many components. Although chemical and DNA-based methods can be used to identify specific targeted compounds or species of origin, they are limited in their ability to identify intrageneric substitutions and do not provide any evidence of other plant constituents in herbal samples 26 . In our DNA metabarcoding study, we used two DNA loci, ITS2 and rbcL, to assess the species used as ingredients in Thai herbal products. The metabarcoding method, which makes use of DNA markers with universal applicability across a wide variety of plants and animal species, enables the identification of species in herbal samples containing degraded DNA. Many reports have shown the applications of the metabarcoding technique in herbal sample validation and pharmacovigilance 4,15,16,18,27,28 due to its cost effectiveness and ability to reveal the plant species diversity within products. However, this technique has not yet been authenticated for use in a regulatory framework for the quality control of herbal drugs 29 .
Our studies have shown that DNA barcode markers (ITS2 and rbcL) are useful in the identification of plant species in Thai herbal products on the NLEM. ITS2 has been revealed to be the most useful DNA marker for the identification of plants at the family level. Chen et al. 30 demonstrated that ITS2 can serve as a universal barcode for the identification of plants at the genus-and family-levels, particularly for medicinal plants 30 , and it has been used to calculated intraspecific and interspecific distances in many studies 31 . For example, Veronica officinalis and Hypericum perforatum, which are closely related species, were successfully identified in herbal products using ITS2 metabarcoding 16 . In this study, the ITS2 and rbcL regions were successfully used for the identification of plants in herbal samples at the family-and genus-levels. Fungal contamination was detected in some samples, which could be due to harvesting or storage conditions 25 .
Many reports have demonstrated that the quality of extracted DNA affects PCR amplification and the success of sequencing 16,18,32 . The occurrence of DNA in processed materials is affected by degradation during harvesting, storage and manufacturing 33 . In our study, 6 of 39 herbal samples (16%) did not yield MOTUs for either the ITS2 or rbcL  www.nature.com/scientificreports/ barcode. Hence, the results for these samples were excluded and not used for further analysis. Very high processing or assembly of herbs can decrease the quality of DNA, particularly if the material is extracted in hot water or liquor at high temperature. A total of 171 plant species were present in the 39 herbal products, including cultivated species (26 species), wild species (89 species) and both cultivated and wild species (56 species) ( Table S5). The wild plant species detected included threatened species, such as Enhalus acoroides (L.f.) Royle. More of the cultivated species were herbs than were shrubs and trees. Plants in the herbal products included ferns, grasses and climbers (Table S5). In this study, sequences clustered with a 99% threshold were used to identify the best-matching species, as mentioned in published reports 16,34 . In addition to sequencing error limitations, other factors were considered to minimize errors associated with the Illumina MiSeq platform 35 , which might lead to the incorrect preparation of MOTUs. Furthermore, the strict trimming of the sequences, filtering according to nucleotide bases, quality, and length, and strict grouping criteria for MOTU formation increased our confidence in the results. In previous reports, DNA metabarcoding was used for the authentication of herbal products, and the results focused on the presence and absence of species or ingredients mentioned on the labels and whether there were any contaminants 16,18 . The presence of non-listed plant species may be due to many factors, including but not limited to unintentional adulteration and deliberate substitutions that may take place from the primary source of the medicinal plants (i.e., growing or cultivation, harvesting, storage, and transport) to the industrial processing and commercialization of herbal end-products 15 . DNA metabarcoding is an exceptionally sensitive technique, and even a small amount of DNA, e.g., contamination from pollen grains or plant residue during industrial processing, can be distinguished and recognized 15 .
According to the NLEM, many Thai herbal formulations are prepared using multiple plant species. A multitude of reports have shown that herbal formulations including single or multiple herbs have higher efficacies than formulations with single active constituents and/or herbs used alone, underlining the consequences of synergistic action in herbal medicines [11][12][13] . In this study, the results revealed that two herbal samples containing single ingredients were authentic. However, polyherbal samples could not be identified because most of the polyherbal formulations contained a mixture of powered samples. In ITS2 analysis, sample no. 2, in powder form, contained the most undeclared (unidentified) families (15). In a separate rbcL analysis, sample no. 1, in powder form, contained 14 of the families listed on the label. This could be because polyherbal powered formulations have a higher chance of admixture than other forms of samples, such as bark, leaves, stems, and roots 22 . For the polyherbal formulations, at least two to three plant species mentioned on the labels were identified. In our analyses, both regions had similar results except for sample no. 34. In general, ITS region is used extensively in herbal drug authentication including systematic and phylogenetic analysis and successfully discriminates among plant species. However, a number of concerns about the use of the ITS region for phylogenetic inference have been noted 36 . Perhaps the greatest challenge to barcoding is the presence of paralogous copies of ITS observed in some plant genera 37 ; therefore, in many DNA metabarcoding studies, chloroplast regions were used along with the nuclear ribosomal ITS 15,17,18 . In sample no 34, rbcL seems to be a suitable region that may prove to be of great use in the identification of plant families. Among the 447 mentioned plant species, 221 and 191 could not be identified using ITS2 and rbcL, respectively (Table S4). Interestingly, the majority of the samples included materials from the Apiaceae, Apocynaceae, Asteraceae, Lamiaceae, Lauraceae, Myristicaceae, Rutaceae, and Zingiberaceae families (Table S4). Many families not mentioned on the label were detected, including Araceae, Bignoniaceae, Convolvulaceae, Cucurbitaceae, Euphorbiaceae, Fabaceae, Geraniaceae, Lamiaceae, Malvaceae, Polygonaceae, Rubiaceae, Rutaceae, and Theaceae. PCA was performed using a correlation matrix. The largest number of herbal samples was clustered along axis 1 (22.51%), and these herbal samples were recommended for their antidiarrheal, antiflatulence, anti-inflammation, antipyretic, and anticough effects, among others (Fig. 3, Table 1). Our results revealed that registered formulations yielded larger numbers of identifications than in-house-prepared herbal samples. Registered formulations undergo stringent quality checking by regulatory agencies before entering the market, whereas in-house formulations are prepared by local practitioners; in many cases, the latter may not pass through proper quality and safety checks. Samples in tablet and capsule dosage forms showed better results than those in powder form (Fig. 2). The samples in powder form had a higher chance of adulteration; for example, ginger powder (Zingiber officinale Roscoe) adulterated with chilli (Capsicum annuum L.) 38 and black pepper powder (Piper nigrum L.) adulterated with chilli (Capsicum annuum L.) 39 were reported.
The regulatory policies for herbal products vary among countries. In a few nations, such as Australia, Canada, the United States, and the European Union (EU), governing bodies/agencies evaluate the quality and safety of traditional medicines prior to permitting herbal product entry into the market 3,40 , but in rehearsal implementation, actions to control the quality and authenticity of herbal products in the marketplace seem to be limited. In several other nations, there is no well-established regulatory framework with which to evaluate the efficacy and safety of herbal drugs prior to their marketing 40 . The European Medicines Agency (EMA) regularly revises the European Pharmacopoeias 41 and provides monographs on the quality and verification of specific herbals to provide practical methods for their quality evaluation. More recently, the British Pharmacopoeia (BP) incorporated the first universal DNA-based identification techniques for quality control of herbal drugs, including plant sampling; DNA barcoding, isolation, and purification; PCR amplification, and a reference sequence database 40,42 . Although DNA metabarcoding is useful in the authentication of species and for analysing the species diversity in multiple mixtures of samples, it does not provide additional necessary information, such as the presence of the target compounds and their concentrations or the occurrence of chemical contaminants, which include heavy metals, irritating and disruptive dyes, and synthetic drugs 25 .

Conclusion
Medicinal plants and their herbal products have received significant attention in healthcare systems worldwide. However, due to increasing commercial interest in using traditional medicine-based herbal products, intentional (through blatant adulteration mainly aimed at turning a profit) or unintentional (through a lack of www.nature.com/scientificreports/ proper quality control measures) adulteration of the plant species used in the compositions can occur. There is an urge to validate herbal products in order to gain consumer confidence by promoting and endorsing quality standards for herbal products. In this study, DNA metabarcoding provided significant testing methods for the identification of multiple mixtures of Thai polyherbal formulations/remedies. Registered herbal formulations yielded larger numbers of identifications than in-house-prepared herbal samples when the rbcL region was used as a marker. However, many herbal products were composed of plant species that were not listed on the labels, which should be taken into consideration by authorities to establish a requirement for herbal products in order to improve the quality control framework in the herbal market. DNA metabarcoding is not yet used as a wideranging authentication method in the quality control step. There is limited advocacy in relation to its importance for herbal product validation and pharmacovigilance both as a regular and a complementary technique. Hence, standardization procedures must be developed before DNA metabarcoding can be applied as a regular systematic technique and approved by expert authorities for use in a regulatory framework. A novel systematic method should ultimately use a combination of suitable chemical approaches and advanced high-throughput sequencing to analyse such mixed herbal products, concentrating on their complete trade chain, from the crude material to the processed, finalized product.

Materials and methods
Traditional Thai medicine (TTM) samples. In this study, 39 samples of medicines on the NLEM were collected in Thailand. Among the samples, 24 were in-house-prepared formulations (by local individuals or medicine practitioners), while 15 were formulations registered with a regulatory agency by the Thailand Food and Drug Administration (FDA) ( Table S6). Herbal samples in various dosage forms, such as tablets (n = 5), capsules (n = 9) and powders (n = 25), were purchased directly from local herbal markets (n = 24) and hospitals (n = 15) in different locations in Thailand. Two groups were categorized based on label information: single and polyherbal samples. Two herbal products were composed of a single plant. For polyherbal products, the combinations of herbs included 2 to 55 plants (Table 1). In the thirty-nine Thai herbal products, 175 plant species belonging to 136 genera and 72 families were listed (Table S7). The details of the herbal products/formulations, including their Thai and English names, dosage forms, label information, and sources of collection, are provided in Table 3. The binomial names and authorities of the ingredients were validated according to The Plant List (2013) 43 .
DNA isolation and sequencing. Genomic DNA was extracted from 100 mg of herbal products (tablets, capsules and powders) using a DNeasy Plant Mini Kit (Qiagen, Germany) according to the manufacturer's protocol and further purified using a GENECLEAN Kit (MP Biomedicals, USA) according to the developer's instructions. The quality and quantity of the DNA were determined by a NanoDrop One UV-Vis Spectrophotometer (Thermo Scientific, USA) (Table S3) and confirmed by agarose gel electrophoresis. Purified genomic DNA was kept at − 20 °C for longer storage. Further PCR amplification with universal primers was carried out using two DNA barcoding regions, namely, the nuclear ITS2 (ITS-2F 30 and ITS-4R 44 primers) and chloroplast rbcL (rbcL2-F 45 and rbcLa-R 46 primers) regions. Extracted DNA from all herbal products was used for PCR in a final reaction volume of 25 µl, including 12.5 µl of template genomic DNA (20 ng/µl) and 12.5 µl of 2X sparQ HiFi PCR master mix (Quantabio, USA). The PCR procedure consisted of initial denaturation at 98 °C for 30 s, followed by 25 cycles of denaturation at 98 °C for 20 s, annealing at 55 °C for 30 s, and elongation at 72 °C for 30 s and a final elongation step at 72 °C for 5 min. Subsequently, PCR amplicons were cleaned using AMPure XP beads and indexed using 5 µl of each Nextera XT index primer in a 50 µl PCR, followed by 8-10 cycles of the PCR conditions listed above. The final PCR amplification products were purified, pooled and diluted to a final loading concentration of 6 pM. Cluster generation and 300-bp paired-end read sequencing were completed with the Illumina MiSeq platform at the Omics Sciences and Bioinformatics Center (Chulalongkorn University, Bangkok, Thailand).
Bioinformatics pipeline. We implemented the bioinformatics pipeline described by Sickel et al. 47 to categorize ITS2 and rbcL reads. In brief, raw sequence reads were obtained via the Illumina MiSeq platform and demultiplexed by MiSeq Reporter version 3.1. Forward and reverse reads were merged using the join_paired_ ends.py command in QIIME v.1.8.0. Low-quality reads were filtered out with USEARCH v8.0.1477. Subsequently, high-quality reads were taxonomically categorized with the UTAX algorithm. The raw UTAX output was analysed using a custom Perl script, which tallied the number of assignments for each taxon and combined the numbers into a single table. This table was transformed into community matrix format, with columns representing samples and rows representing species, and a distinct file with the taxonomic lineage of each species was built. Species-level identifications were accepted if the queried sequences showed 99-100% matches with reference sequences and E-values below the threshold. For MOTUs based on ITS2 and rbcL, identifications were made to the family level and, in few cases, to the genus level because full-length rbcL barcodes may not provide consistency at the genus level 48 . The reference sequence database was the monthly updated NCBI/GenBank nucleotide database, and the parameters were applied to all of the samples.
Presence and absence of species across samples. To assess the presence and absence of species in each sample, three categories were segregated based on the MOTUs from next-generation sequencing using both ITS2 and rbcL with the best match results. The MOTUs detected with metabarcoding in each sample were characterized as identified species (MOTUs corresponding to species listed on the sample label versus the best species match detected with metabarcoding), unidentified (undeclared) species (MOTUs corresponding to species listed on the sample label but not detected among the best matches), and other detected species (MOTUs www.nature.com/scientificreports/ corresponding to species not mentioned on the sample label but identified in the analysis) ( Table S4). The total presences and absences of species per category (identified and not identified) were evaluated (Table S4). The counts of detected plant families and genera (identified with metabarcoding) were normalized by dividing the number of identified (identified with metabarcoding) taxa by the total number of taxa listed among the ingredients on the labels. The undeclared species (not identified with metabarcoding) were calculated by dividing the number of undeclared species (not identified with metabarcoding) by the total number of species listed among the ingredients detected with metabarcoding ( Fig. 1; Table S5). We used the MOTU table as input for QIIME2 software to compute the Aitchison distance and obtained a count sum less than 10, which may represent contamination or chimeric sequences, were filtered out. Principal component analysis (PCA) was performed using the Emperor Plot functionality of QIIME2 49 . The PCA represents the beta diversity differentiating taxonomic composition between communities, and this analysis enables an overview of plant species similarity across herbal products.

Data availability
The datasets generated during and/or analysed during the current study are available in the NCBI GenBank repository, SRA accession: PRJNA650007.