An authenticity survey of herbal medicines from markets in China using DNA barcoding

Adulterant herbal materials are a threat to consumer safety. In this study, we used DNA barcoding to investigate the proportions and varieties of adulterant species in traditional Chinese medicine (TCM) markets. We used a DNA barcode database of TCM (TCMD) that was established by our group to investigate 1436 samples representing 295 medicinal species from 7 primary TCM markets in China. The results indicate that ITS2 barcodes could be generated for most of the samples (87.7%) using a standard protocol. Of the 1260 samples that yielded ITS2 barcodes, approximately 4.2% were identified as adulterants. The adulterants were concentrated in medicinal species such as Ginseng Radix et Rhizoma (Renshen), Radix Rubi Parvifolii (Maomeigen), Dalbergiae odoriferae Lignum (Jiangxiang), Acori Tatarinowii Rhizoma (Shichangpu), Inulae Flos (Xuanfuhua), Lonicerae Japonicae Flos (Jinyinhua), Acanthopanacis Cortex (Wujiapi) and Bupleuri Radix (Chaihu). The survey revealed that adulterant species are present in the Chinese market, and these adulterants pose a risk to consumer health. Thus, regulatory measures should be adopted immediately. We suggest that a traceable platform based on DNA barcode sequences be established for TCM market supervision.

definitively identify plant genera, such as Crataegus and Salix. Chemical analyses, such as high-performance liquid chromatography-mass spectrometry (HPLC-MS) 21 , near-infrared spectroscopy (NIRS) 22 and liquid chromatography-mass spectrometry (LC-MS) assays 23 , can be used to detect chemical compositions to identify adulterant products. However, none of these methods alone can definitively identify closely related species that share remarkably similar morphological characteristics and chemical profiles. These techniques produce only indirect evidence of fraud and cannot definitively determine the identity of the given species. Therefore, there is an urgent need for rapid and simple identification procedures for the inspection of raw herbal materials.
DNA barcoding is a new molecular diagnostic technology that was first proposed by Canadian zoologist Paul Hebert in 2003, and it identifies species by using a recognized standard, short genomic sequence 24 . DNA barcoding provides consistent and reliable results regardless of the age, plant part, or environmental factors of the sample 25 . Researchers can evaluate species information accurately by analysing DNA sequences. Other investigators have suggested that a global DNA barcode revolution would become a "big science" research programme after the human genome project 26 , and Miller published "the Renaissance of DNA barcode and taxonomy" in PNAS 27 . This approach has been repeatedly reported in academic journals (e.g., Nature, Science) and in media outlets (e.g., National Geographic News, The New York Times) stating that DNA barcode technology has become a global innovation for academic research on biological taxonomy. Chen et al. have analysed more than 6600 plant samples belonging to 4800 species from 753 distinct genera by using the chloroplast regions psbA-trnH, matK, rbcL, rpoC1, ycf5 and the nuclear loci ITS and ITS2. These investigators suggested that the internal transcribed spacer (ITS) fails to be amplified and sequenced in most samples and that ITS2 is the most suitable locus for DNA barcoding research, followed by psbA-trnH as a complementary region 28 . By using an ITS2 + psbA-trnH two-loci barcode combination, our group developed a TCM barcode platform, called the Traditional Chinese Medicine Database (TCMD) 29 , which contains 78,847 barcodes belonging to 23,262 medicinal species listed in the Chinese, European, Indian, Japanese, Korean and American Herbal Pharmacopoeias [30][31][32][33][34][35] . There are more than three samples per species in this database 29 . At present, the TCMD is the largest DNA barcode database of medicinal materials. 
The TCMD also contains the DNA barcoding standard operating procedure (SOP) and provides bioinformatics tools to assist in data analysis for researchers in the herbal identification industry. The TCMD can be accessed at http://www.tcmbarcode.cn/en/.
In this study, we investigated the proportions and varieties of adulterant medicine in herb markets with the aim of protecting consumers from health risks associated with herbal product substitution and contamination by using a standard DNA barcoding method. A total of 1436 raw herbal samples representing 295 medicinal species were collected from the 7 primary markets in China. The advantages and limitations of DNA barcoding for the authentication of complex TCM materials by using the TCMD database are also discussed. Additional details are described in subsequent sections.
Proportions and varieties of adulterant species revealed by the TCMD. BLAST1 was used to estimate the reliability of species identification by the TCMD 29 . We searched the 1260 ITS2 sequences generated in this study in the TCMD database. The ITS2 region results indicated that 4.2% of the sample names were not in accordance with the commercial name (Table 1).
No adulterants were found in the fungal and folium samples (Fig. 1). All of the 18 fungi and 56 folium samples were authenticated. Approximately 13.9% of the cortex samples were found to be adulterants, including samples of Albiziae Cortex (Hehuanpi), Pseudolaricis Cortex (Tujingpi) and Acanthopanacis Cortex (Wujiapi) from different markets. Of the 410 total sequences generated from the fruit and seed samples, only 2 were adulterants: one sample of Sojae Semen Praeparatum (Dandouchi) and one sample of Alpiniae oxyphyllae Fructus (Yizhi). For the flos samples, the adulterant rate and the failed amplification rate were approximately 8.1% and 12.2%, respectively. Of the 438 total ITS2 sequences generated from the radix et rhizome samples, approximately 7.31% were adulterants.
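The headline proportions in this survey follow directly from the reported counts. The short Python sketch below recomputes them as a consistency check; it uses only figures stated in the text (1436 samples, 1260 ITS2 sequences, 53 adulterants, and 2 adulterants among 410 fruit and seed sequences), and the helper name `percent` is introduced here purely for illustration.

```python
# Counts as reported in the text of this survey.
TOTAL_SAMPLES = 1436          # raw herbal samples collected
ITS2_SEQUENCES = 1260         # samples that yielded an ITS2 barcode
ADULTERANTS = 53              # sequences identified as adulterants
FRUIT_SEED_SEQUENCES = 410    # ITS2 sequences from fruit and seed samples
FRUIT_SEED_ADULTERANTS = 2    # adulterants among those sequences


def percent(part: int, whole: int, digits: int = 1) -> float:
    """Return part/whole as a percentage rounded to `digits` decimal places."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(100.0 * part / whole, digits)


# Share of samples for which an ITS2 barcode could be generated.
success_rate = percent(ITS2_SEQUENCES, TOTAL_SAMPLES)
# Share of samples that could not be amplified and sequenced.
failure_rate = percent(TOTAL_SAMPLES - ITS2_SEQUENCES, TOTAL_SAMPLES, 2)
# Share of barcoded samples identified as adulterants.
adulterant_rate = percent(ADULTERANTS, ITS2_SEQUENCES)
# Adulterant share within the fruit and seed category.
fruit_seed_rate = percent(FRUIT_SEED_ADULTERANTS, FRUIT_SEED_SEQUENCES, 2)

print(success_rate, failure_rate, adulterant_rate, fruit_seed_rate)
```

Note that the adulterant rate is taken relative to the 1260 barcoded samples rather than to all 1436 collected samples, since samples that failed amplification could not be assessed.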
Of the 295 medicinal species in this study, 198 could be amplified successfully and were validated, including species that are commonly used in TCM, such as Fritillariae cirrhosae Bulbus (Chuanbeimu), Rhei Radix et Rhizoma (Dahuang), Angelicae Sinensis Radix (Danggui), Codonopsis Radix (Dangshen), Saposhnikoviae Radix (Fangfeng), Glycyrrhizae Radix et Rhizoma (Gancao), and Polygoni Multiflori Radix (Heshouwu), whereas the other 97 varieties showed failed amplification or adulteration to some extent. The adulterants included species such as Ginseng Radix et Rhizoma (Renshen), Radix Rubi Parvifolii (Maomeigen), Dalbergiae odoriferae Lignum (Jiangxiang), Acori Tatarinowii Rhizoma (Shichangpu), Inulae Flos (Xuanfuhua), Lonicerae Japonicae Flos (Jinyinhua), Acanthopanacis Cortex (Wujiapi) and Bupleuri Radix (Chaihu). The original species of Albiziae Cortex (Hehuanpi) was Albizia julibrissin, but five of the 9 Albiziae Cortex (Hehuanpi) samples were found to be derived from the Cortex of Albizia kalkora Prain (Shanhehuanpi) (Fig. 2) from Caesalpinia sappan). In total, 53 samples were adulterants, and for 9 samples, the exact species could not be determined (Table 1). 37 . Previous studies have described the uncertainties of assigning unknown herbal products when the reference barcode databases in GenBank and BOLD are incomplete. One of the goals of the Herb-BOL (barcode of life) research programme was to build an herbal barcode library that covered all 1800 known medicinal species used in commercial products. Because of the importance of authenticating medicinal plant materials, it is vital to develop an exclusive, extensive herbal database 25 . The GenBank database (http://www.ncbi.nlm.nih.gov/genbank/) is possibly one of the largest sequence databases and is one of the most frequently used databases for species identification. An unknown DNA sequence can be rapidly compared to known species sequences with the BLAST program 38 .
However, at present, many medicinal sample sequences are not adequately represented in GenBank, and in some cases investigators could only declare results at the genus level based on sequence similarity. The TCMD is a barcode database that is exclusively devoted to medicinal species, and it contains 23,262 medicinal and closely related species, including adulterants and substitutions. The TCMD covers almost all the medicinal materials listed in herbal pharmacopoeias from around the world, including China, Europe, India, Japan, Korea and the United States. Currently, the TCMD is the largest DNA barcode database of medicinal materials in the world 29 . Thus, the TCMD platform is the most suitable for the rapid screening of crude medicinal materials. The establishment of the TCMD has greatly improved the resources available for medicinal species identification.

Survey of 7 herb markets. The 7 herb markets investigated in this study included Guangxi Yulin (GX),
Given that some medicinal samples are heavily processed and that some artificial adulterant samples do not contain DNA, DNA barcoding cannot confirm the identity of every sample. In the current investigation, we found that at least 50% of the medicinal materials on the market had been fumigated with sulphur to extend the storage time and prevent insect infestation and mildew. In some cases, samples treated with sulphur, such as Lycium barbarum and Dioscorea opposita, appeared very clean and bright in colour and could be sold at a high price. Sulphur fumigation may also affect the amplification efficiency of the sample. In addition, many herbs contain secondary compounds such as polysaccharides and pigments. We washed the precipitants with wash buffer three times to remove sticky residues before extraction, but some of the residues could not be removed, which could also make it difficult to extract DNA from these samples. Approximately 12.26% of the samples evaluated in this study could not be successfully amplified and sequenced.
DNA barcoding is an efficient tool for the identification of herbs and for the determination of various adulterants. However, DNA barcoding does not currently yield information regarding the concentration of active ingredients. Thus, DNA barcoding cannot be used to determine whether medicinal samples meet pharmacopoeia standards. In other words, DNA barcoding can be used to establish herbal authenticity but cannot be used to evaluate herbal quality. This drawback indicates that a combination of DNA barcoding and chemical analysis is necessary for a comprehensive quality assessment of herbal samples. HPLC has been used for the differentiation of accessions collected from different geographic regions. DNA barcoding has been used for the differentiation of inter- and intraspecific variations and to detect adulterations. Attempts have also been made by the author to match the results of DNA barcoding to the chemical analysis techniques of Salvia L. 39 .
Building a traceable platform for traditional Chinese medicine using DNA barcoding. In many developing countries, the introduction of herbal medicine products into the marketplace is not adequately monitored. Genuine (Daodi) herbs are usually considered to be high-quality medicinal materials that are produced in the Daodi area. However, because many genuine medicinal plants are transported to other places, their characteristics will be changed. In TCM markets, many sellers advertise that their herbs come from the genuine area, but there are no methods to evaluate genuine characteristics. Furthermore, herbal medicine contamination is higher because of the lower stringency of the rules and regulations governing the quality of these herbs in different countries 40,41 . In the present study, a survey of TCM markets identified approximately 4.2% of the samples as adulterants. Such adulterant incidents will only increase if measures are not taken to prevent them. Thus, it is necessary to build a traceable platform to ensure the safe use of TCM.
At present, DNA barcode technology is the best technology for providing traceability. Liu et al. 42 successfully converted DNA barcoding sequences into two-dimensional barcodes (2D-barcodes). In addition, our research group has developed an automated process that converts DNA barcode sequences into 1D- and 2D-barcodes.
Scientific Reports | 6:18723 | DOI: 10.1038/srep18723
Other information, including planting, processing and additional consumer information, can also be databased and converted into a 2D-barcode. Smartphones can be used as 2D-barcode readers so that consumers can conveniently scan samples to access information. This type of traceability system would not only help to manage TCM authentication but would also provide a valuable tool to improve TCM quality. Consumers could obtain all the information regarding a commercial TCM that was on the market, including planting, production, processing and circulation information, by scanning the 2D-barcode on the package. A workflow outlining such a system is shown in Fig. 3. In view of the above information, the establishment of a traceability system for TCM based on DNA barcode sequences is urgently needed.
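To illustrate how a barcode sequence might be prepared for storage in a 2D-barcode, the following minimal Python sketch packs an A/C/G/T sequence into a compact binary payload and restores it. It is not the conversion procedure of Liu et al. 42 or the automated process mentioned above, whose details are not given here. The payload layout (a two-byte length, two bits per base and a CRC32 checksum) and the function names `pack_sequence` and `unpack_sequence` are assumptions made for this example only.

```python
import zlib

# Two-bit code for each unambiguous base (an assumed, illustrative mapping).
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def pack_sequence(seq: str) -> bytes:
    """Pack a DNA sequence as: 2-byte length + 2 bits per base + 4-byte CRC32."""
    seq = seq.upper()
    if len(seq) > 0xFFFF:
        raise ValueError("sequence too long for a 2-byte length field")
    value = 0
    for base in seq:
        if base not in BASE_TO_BITS:
            raise ValueError(f"unsupported base: {base!r}")
        value = (value << 2) | BASE_TO_BITS[base]
    n_bytes = (2 * len(seq) + 7) // 8  # bytes needed for 2 bits per base
    body = len(seq).to_bytes(2, "big") + value.to_bytes(n_bytes, "big")
    return body + zlib.crc32(body).to_bytes(4, "big")


def unpack_sequence(payload: bytes) -> str:
    """Reverse pack_sequence, verifying the checksum first."""
    body, checksum = payload[:-4], payload[-4:]
    if zlib.crc32(body).to_bytes(4, "big") != checksum:
        raise ValueError("payload is corrupted")
    length = int.from_bytes(body[:2], "big")
    value = int.from_bytes(body[2:], "big")
    bases = []
    for _ in range(length):
        bases.append(BITS_TO_BASE[value & 3])  # read the last base first
        value >>= 2
    return "".join(reversed(bases))


payload = pack_sequence("ACGTTGCA")
print(len(payload), unpack_sequence(payload))
```

Under these assumptions, a 200-base sequence occupies 56 bytes, and the resulting payload could then be passed to any standard 2D-barcode encoder. Ambiguous bases such as N are rejected in this sketch and would need a wider code in practice.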
The future of DNA barcoding. Traditionally, commonly used identification methods require special skills acquired through extensive experience; thus, only experts can identify taxa accurately. The current study showed that ITS2 sequences could be used to efficiently identify medicinal species. The herbal industry should adopt DNA barcoding to authenticate the raw materials used to manufacture its products.
DNA barcoding can be easily implemented and will play an increasingly important role in medicinal identification because of its ability to rapidly evaluate samples from leaves, seeds, flowers, dry materials, museum specimens, powders or medicinal materials from which DNA can be obtained. DNA barcoding and next-generation sequencing technology are powerful tools for identifying herbal ingredients in patent medicines 43,44 . There are limitations to the four common methods of identification, namely, original, microscopy, morphological, and physicochemical identification. The DNA barcoding tool can provide supplementary information to improve classifications and to enable a critical examination of the precision of the four common methods used in medicinal material identification. Descriptions of "medicinal materials" in the pharmacopoeia of China with attached DNA sequences should be actively encouraged. Identification approaches that integrate DNA barcoding, morphological characters and chemical attribute information will achieve maximum efficiency for medicinal material identification. Researchers will have easy access to all the related herbal information in the database. With the development of pyrosequencing, sequencing costs have been dramatically reduced, which opens the way to the high-throughput sequencing of ITS2 sequences, facilitating a wide range of research possibilities using medicinal species. However, for some closely related species, such as the 9 unidentified samples in this study, identification will be very difficult when using universal primers, in which case a better approach would be to use the whole chloroplast genome as a super barcode 45,46 .
In conclusion, the current TCM markets are unregulated. The consideration of simple and low-cost measures, such as DNA barcoding, has the potential to make a major contribution to the detection of adulterant products in TCM markets. The present work effectively demonstrates the feasibility of this approach. According to the TCMD, 4.2% of the samples we evaluated were adulterants. The TCMD provides users with easy access for sequence comparisons. The improvement of the TCMD will fulfil its important role in the authentication of medicinal ingredients, which will be beneficial to the entire Chinese herbal industry.

(Fig. 4). Of the 295 medicinal species, 294 were listed in the Chinese Pharmacopoeia, and they accounted for approximately 96.4% (133 varieties) of the commonly used varieties in TCM (total of 138 varieties). Thus, the number of samples collected was large enough to be representative. All the specimens were deposited in the herbarium at the Institute of Medicinal Plant Development. The entire list of 1436 samples can be found in Supplementary Table S1 online. The locations of the 7 markets are shown in Fig. 4, which was created using an open source web site (http://www.dituhui.com/) with the latitude and longitude information for the 7 herb markets. The photographs were obtained from AG, which is the largest market in China, and were taken by co-author Baosheng Liao. The map and photographs were combined with Photoshop software.

DNA extraction and polymerase chain reaction (PCR) amplification. A 75% alcohol solution was used to clean the surfaces of the herbal material prior to DNA extraction to prevent fungal DNA contamination, and then one piece of each sample was ground into powder with a FastPrep bead mill (Retsch MM400, Germany). Total DNA was extracted with a Plant Genome DNA Kit (Tiangen Biotech Co., China), which is based on the CTAB approach. The key procedure was modified as follows. First, the powder was washed with wash buffer three times to remove sticky residues from the precipitant before extraction. Second, after the extraction buffer was added, the samples were incubated at 58 °C for 8-12 hours. Third, an equal amount of ice-cold isopropanol was used to precipitate the DNA at −20 °C in a refrigerator for at least 30 minutes. Other procedures were routinely performed as indicated in the CTAB method. The ITS2 was amplified using universal primers 28 .

Sequencing and analysis. The PCR products were purified with a QIAquick PCR purification kit (Tiangen Biotech, Beijing, China) and were directly sequenced on an ABI 3730XL sequencer (Applied Biosystems, USA) by using the original amplification primer as the sequencing primer. The original forward and reverse sequences were assembled with CodonCode Aligner 3.0. The assembled sequences were annotated and delimited with a hidden Markov model (HMM)-based method 47 , and the complete ITS2 sequences were pasted into the identification module on TCMD (http://www.tcmbarcode.cn/en/). After the query sequence was submitted, a BLASTN search was run against all the reference sequences, and the nearest neighbours of the query were reported. When a best match to a reference sequence is found, the identification module provides a species-level identification and gives the Latin name of the best-match species 29 .
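The best-match logic described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the TCMD identification module: the similarity score from `difflib.SequenceMatcher` is a simple stand-in for a BLASTN score, the three reference strings are invented toy fragments rather than real ITS2 barcodes, and the names `best_match` and `check_label` and the 0.95 threshold are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, Tuple

# Toy reference library: species name -> invented sequence fragment.
# These strings are made up for illustration and are NOT real barcodes.
REFERENCES: Dict[str, str] = {
    "Albizia julibrissin": "ACGTTGCCATGGACCTTAGC",
    "Albizia kalkora": "ACGTTGCTATGGACGTTAGC",
    "Caesalpinia sappan": "TTGACCGGTAACGTCAGGTA",
}


def best_match(query: str, references: Dict[str, str]) -> Tuple[str, float]:
    """Return the most similar reference species and its similarity score.

    SequenceMatcher.ratio() is used here only as a stand-in for BLASTN.
    """
    if not references:
        raise ValueError("reference library is empty")
    scores = {
        name: SequenceMatcher(None, query.upper(), seq.upper(),
                              autojunk=False).ratio()
        for name, seq in references.items()
    }
    name = max(scores, key=scores.get)
    return name, scores[name]


def check_label(query: str, labelled_species: str,
                references: Dict[str, str], min_score: float = 0.95) -> str:
    """Compare the best-match species with the species named on the label."""
    species, score = best_match(query, references)
    if score < min_score:
        return "undetermined"  # no sufficiently close reference sequence
    return "authentic" if species == labelled_species else "adulterant"


# A sample sold as Albizia julibrissin whose sequence matches A. kalkora.
print(check_label("ACGTTGCTATGGACGTTAGC", "Albizia julibrissin", REFERENCES))
```

The "undetermined" outcome corresponds to cases in which no reference is close enough for a species-level call; in a real pipeline, the score and threshold would come from the alignment tool rather than from this simplified ratio.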