Barcode ITS2: a useful tool for identifying Trachelospermum jasminoides and a good monitor for medicine market

Trachelospermum jasminoides is commonly used in traditional Chinese medicine. However, local alternatives to the plant are frequently used in its place, causing potential clinical problems. The T. jasminoides sold in the medicine market is commonly dried and sliced, which makes traditional identification methods difficult to apply. In this study, the ITS2 region was evaluated on 127 sequences representing T. jasminoides and its local alternatives according to PCR and sequencing success rates, intra- and inter-specific divergences, secondary structure, and discrimination capacity. The success rates of PCR and sequencing were 100%, and a clear barcoding gap was present. The BLAST1, nearest distance, and neighbor-joining tree methods showed that barcode ITS2 could successfully identify all the tested samples. The secondary structures of the ITS2 region provided another dimension for species identification. Two-dimensional images were obtained for better and easier identification. Previous studies on DNA barcoding concentrated more on the same family, genus, or species. However, an ideal barcode should be variable enough to identify closely related species while remaining conservative enough to identify distantly related species. This study highlights the application of barcode ITS2 in solving practical problems involving the distantly related local alternatives of medicinal plants.

accurate identification of T. jasminoides is especially important. However, the T. jasminoides sold in the medicine market is commonly dried, sliced, or shredded, making traditional identification methods more difficult to apply. An accurate and fast tool is required, and DNA barcoding is a natural candidate.
DNA barcoding is a milestone in taxonomy in its use of short sequences from standard genome regions to identify species [12][13][14] . Animal barcoding relies on a standard region, mitochondrial cytochrome c oxidase 1. Plant barcoding, by contrast, has taken a long time to develop since it was first proposed, because this region has a lower mutation rate in plants and the typically used plastid phylogenetic markers show very little variation 15,16 . Several DNA regions, including atpF-atpH, matK, rbcL, ndhJ, ycf5, accD, rpoB, rpoC1, psbK-psbI, trnH-psbA, trnL-F, and ITS, as well as their combinations, have been advocated as a standard plant barcode [17][18][19][20][21][22][23] . Internal transcribed spacer 2 (ITS2), a part of the nuclear rDNA, has been proposed as a candidate DNA barcode 24,25 and has attracted increasing attention in recent years. The ITS2 region is an ideal barcode because of its short length, easy amplification with a single pair of primers, high sequencing efficiency, and high variation between species 26,27 . Moreover, molecular morphological characteristics based on the secondary structures of ITS2 provide a further basis for distinguishing species and improve discrimination 28,29 . Many studies have indicated that ITS2 has a good capacity for species identification 30,31 . Furthermore, the ITS2 region has been used to monitor the proportions and varieties of adulterant species 32,33 .
Previous studies on DNA barcoding concentrated more on the same family, genus, or species. However, the substitutes or adulterants of medicinal plants are generally not closely related species 34,35 . An ideal barcode should be variable enough to identify closely related species while being conservative enough to identify distantly related species. This study highlights the application of barcode ITS2 in solving practical problems involving the distantly related local alternatives of medicinal plants. The ITS2 region was used to barcode 127 sequences, including those of T. jasminoides and its local alternatives. In addition, the proportion of adulterants of T. jasminoides in the medicine market was investigated using barcode ITS2.

Results
Amplification, sequencing, and alignment. Genomic DNA was extracted from the 101 samples. Electrophoresis showed bright bands for the six leaf extractions, whereas the bands for the dried samples collected from markets or drugstores were smeared or invisible. Nevertheless, the amplification success rate of the ITS2 region for all 101 samples was 100%. All the PCR products of the ITS2 barcode were successfully sequenced, and all the bidirectional sequences were of high quality. Before alignment, the lengths of the ITS2 sequences of T. jasminoides, F. pumila, F. tikoua, and E. fortunei were 222, 239, 241, and 220 bp, respectively. The G + C contents of T. jasminoides, F. pumila, F. tikoua, and E. fortunei were 61.3%, 71.1%, 67.2%, and 74.1%, respectively, and thus ranged from 61.3% to 74.1%. The aligned length was 270 bp, with 126 variable sites, a rate of 46.7% (Table 1). Therefore, the ITS2 sequences of the collected species were relatively variable.
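The summary statistics above (G + C content and the proportion of variable sites) can be reproduced with a few lines of Python. The following is a minimal sketch, not the procedure used in the study (the values in the text were obtained with MEGA). The function names and the toy alignment are hypothetical, and gaps and ambiguous bases are assumed to be ignored when a column is scored.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def variable_sites(alignment: list[str]) -> int:
    """Count alignment columns holding more than one distinct nucleotide.

    Assumption: gaps and ambiguous symbols are ignored within a column.
    """
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("aligned sequences must have equal length")
    count = 0
    for column in zip(*alignment):
        bases = {b.upper() for b in column if b.upper() in "ACGT"}
        if len(bases) > 1:
            count += 1
    return count


def variable_site_rate(n_variable: int, aligned_length: int) -> float:
    """Variable sites as a percentage of the aligned length."""
    return 100.0 * n_variable / aligned_length


# Toy alignment (hypothetical data): only the second column is variable.
toy = ["ACGT-A", "ACGTTA", "ATGTTA"]
print(variable_sites(toy))

# Values reported in the text: 126 variable sites in a 270 bp alignment.
print(round(variable_site_rate(126, 270), 1))
```

With the reported counts, the last line gives the 46.7% rate stated in the text.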
Genetic divergence within and between species. All 127 ITS2 sequences were used to calculate the K2P genetic distances. In the present study, only one haplotype of T. jasminoides was found, no variable sites were observed, and the intraspecific distance of T. jasminoides was 0.000. The interspecific distance between T. jasminoides and its local alternatives varied from 0.488 to 0.607. The minimum interspecific distance of 0.488, between T. jasminoides and F. pumila, was far larger than the maximum intraspecific distance (Table 2).
Barcoding gap assessment. Barcoding gap is an important index in determining whether a DNA barcode is suitable or not. In the present study, the distributions of intra- and inter-specific divergences in ITS2 barcode were examined at a scale bar of 0.03 distance units. The intra-specific distance of the ITS2 region was low because only one haplotype of the T. jasminoides species was present. The ITS2 region presented a very high inter-specific variation and an obvious barcoding gap was noted because the inter-specific variation never overlapped with the intra-specific distance (Fig. 1). Results indicate that the ITS2 region was variable and consistent with the high inter-specific divergence in medicinal plants 24 .

Table 2. Analysis of intra-specific variation and inter-specific divergence of the ITS2 sequences (K2P genetic distances).
Intra-specific distance of T. jasminoides: 0.000
Inter-specific distance between T. jasminoides and E. fortunei: 0.567-0.607
Inter-specific distance between T. jasminoides and F. tikoua: 0.505-0.522
Inter-specific distance between T. jasminoides and F. pumila: 0.488-0.502
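The barcoding gap criterion can be illustrated with the distance ranges reported in Table 2. The sketch below is only an arithmetic check, not the TaxonDNA analysis used in the study. It takes the gap to be the minimum inter-specific distance minus the maximum intra-specific distance, and the variable and function names are hypothetical.

```python
# K2P distance ranges taken from Table 2 (min, max) for each local alternative.
INTER_SPECIFIC = {
    "E. fortunei": (0.567, 0.607),
    "F. tikoua": (0.505, 0.522),
    "F. pumila": (0.488, 0.502),
}
MAX_INTRA_SPECIFIC = 0.000  # a single T. jasminoides haplotype was found


def barcoding_gap(max_intra: float, inter: dict[str, tuple[float, float]]) -> tuple[str, float]:
    """Return the closest species and the gap (min inter minus max intra)."""
    closest = min(inter, key=lambda species: inter[species][0])
    return closest, inter[closest][0] - max_intra


closest_species, gap = barcoding_gap(MAX_INTRA_SPECIFIC, INTER_SPECIFIC)
# A positive gap means the intra- and inter-specific distributions do not overlap.
print(closest_species, round(gap, 3), gap > 0)
```

A positive value indicates that the two distributions do not overlap, which is the condition described in the text.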
Species identification efficiency. The identification efficiency of the ITS2 region was evaluated by the BLAST1 and nearest distance methods. With both methods, the ITS2 region successfully identified all the samples, with no incorrect or ambiguous identifications (Table 3).
Analysis of secondary structure. In this study, the secondary structures of ITS2 regions were predicted to identify the species. A central ring and four similar helices, namely, Helix I, II, III, and IV, existed in the ITS2 secondary structures of T. jasminoides and its common local adulterants. However, the secondary structures of ITS2 among these species displayed significant differences in the four helices in terms of stem loop number, size, position, and degree of angles from the center of the spiral arm (Fig. 2). Thus, the secondary structure of ITS2 provides another dimensionality for species identification at the molecular morphological characteristics level.

Figure 1. Relative distribution of interspecific divergence between congeneric species and intraspecific distances for ITS2 locus.

NJ tree analysis. An NJ tree was constructed using all 127 ITS2 sequences. All 92 T. jasminoides sequences clustered into one clade, while F. pumila, F. tikoua, and E. fortunei clustered into their own clades. Notably, samples LS006 and LS041 were identified as E. fortunei and F. pumila, respectively, indicating that adulterants of T. jasminoides exist in the medicine market and that ITS2 can be used as a good monitor (Fig. 3). Overall, the NJ tree clearly distinguished T. jasminoides from its common local adulterants.
Two-dimensional DNA barcoding. At present, "DNA barcode" refers only to DNA sequences, which are not well suited to information storage, recognition, and retrieval. The QR code was found to be superior in representing DNA barcode sequences efficiently 36 . In this study, the ITS2 sequences of T. jasminoides and its local adulterants were transformed into QR codes, yielding two-dimensional DNA barcoding images (Fig. 4). In the colored DNA image on the left, the different colors represent different nucleotides and the numbers represent the lengths of the sequences, so that the sequence information can be read clearly. Scanning the two-dimensional code on the right with a scanner (e.g., a mobile terminal) retrieves the species sequence. After the sequence is sent to the database for identification, the result is returned to the scanner, making identification more convenient and rapid.

Discussion
Barcode ITS2: a good tool for identification. The substitution of medicinal herbs with local alternatives in clinical treatment is prevalent. Therefore, a rapid and accurate method for identifying medicinal herbs is urgently needed. DNA barcoding is not influenced by external factors, biological development stage, or organ tissue, and thus provides a basis for identification at the gene level. The ITS2 region was proposed as the standard marker for medicinal plant species identification and for phylogenetic analysis 37,38 . In this study, the ITS2 sequence was used to distinguish T. jasminoides from distantly related local alternatives with one universal pair of primers for PCR amplification. DNA was isolated from dried samples, which were degraded to varying degrees. For dried stem samples, DNA extraction benefits from selecting tissues with more living cells, appropriately increasing the sample quantity, and extending the water bath time. The success rate was 100% because ITS2 is a multi-copy region and is therefore easier to amplify from degraded DNA than other regions. In the present study, high-quality bidirectional sequences were obtained for 100% of the ITS2 amplicons without manual editing of the trace files. The ITS2 sequence length ranged from 220 bp to 241 bp, satisfying the short-length criterion of a good barcode. All the ITS2 sequences were used to perform BLAST, and the results indicated the absence of fungal contamination in the samples. Barcoding gap is an important criterion for an ideal DNA barcode 39,40 . The ITS2 region exhibited conservative intra-specific divergence and high inter-specific variation, and the barcoding gap was obvious.
In addition, the secondary structures of the ITS2 region provided morphological characteristics that can help improve species identification 28,41 . The secondary structures of T. jasminoides and its local alternatives displayed different stem loop numbers, positions, sizes, and angle degrees. Discriminatory power is another criterion for a good DNA barcode, and the results of the BLAST1 and nearest distance methods indicated that the ITS2 region successfully identified all the tested samples. Furthermore, according to the NJ tree results, T. jasminoides and its local alternatives clustered into their own clades. Therefore, barcode ITS2 successfully identified T. jasminoides and its local alternatives and met the criteria of an ideal barcode. Our results corresponded well with the results of previous studies 42,43 .
Notably, because the internal transcribed spacer of nuclear rDNA exists in multiple copies within each cell, it has been questioned whether the sequences obtained by PCR are stable and representative, which complicates the use of ITS2 barcoding 14,56 . Song et al. suggested that the major variants of the ITS2 region are sufficient for phylogenetic analysis and species identification in most cases 31 . The current study indicated that the multi-copy nature of the ITS2 sequences was not a problem in identifying T. jasminoides and its adulterants, which highlights the universality of the ITS2 region as a DNA barcode. To our knowledge, this is the first time that T. jasminoides and its adulterants have been identified using the ITS2 region with such a large sample size. In addition, each species was represented by more than three samples for better determination of intraspecific variation. Moreover, in previous studies most of the samples were fresh or silica gel-dried, whereas 93.5% of the samples in the current study were dried and sliced, which broadens the practical application of the ITS2 region in the herbal plant field.
DNA barcoding for medicine market supervision. The past decades have witnessed a global trade in raw drugs, and the herbal product market has become much more promising. However, unethical commercial practices aimed at making more profit have increased, whereby authentic herbs are substituted with cheap, less effective, and often deleterious herbs. Accurate and fast species identification is the key to herbal market safety 44 . Traditional methods commonly require professionals such as taxonomists, take a significant amount of time, and sometimes yield inconsistent opinions on identification at the industrial level. DNA is more stable than other macromolecules (protein and RNA), is not affected by external factors, and is easily isolated from all tissues 45 .
Many studies have used DNA barcoding to supervise the medicine market 46,47 . The present study broadens the use of the ITS2 barcode to market products. Two adulterants, one F. pumila and one E. fortunei, were detected among the 79 dried T. jasminoides samples purchased from drugstores and herbal markets. Barcode ITS2 is therefore proposed for monitoring T. jasminoides in the medicine market. Similarly, DNA barcoding should be embraced for identifying herbal products through the testing of raw materials used in the herbal industry 48 .
Scientific RepoRts | 7: 5037 | DOI:10.1038/s41598-017-04674-w
DNA barcoding in two-dimension. The final product of DNA barcoding is a handheld DNA barcoder, which contains components for DNA extraction, amplification, and sequencing and a DNA barcode analysis engine with the associated software tools and database. To realize the "life barcoder" better, an appropriate format is essential. At present, "DNA barcode" specifically refers to DNA sequences. However, the large printout size of the sequences and the difficulty of retrieving information from them limit barcoding in practical applications. Therefore, a new format for retrieving DNA barcode information is urgently required. Barcode technology has been applied in the manufacturing and retailing industries for a couple of decades. Liu et al. drew on this well-developed technology to investigate which symbology could best represent DNA barcode sequences 36 . Their results indicated that the QR code had the largest coding capacity and a relatively high compression ratio. In the present study, the ITS2 sequences of T. jasminoides and its local alternatives were converted to QR codes, and two-dimensional images were obtained along with clear sequence information. Because assuring the genuine origin of herbal materials is crucial, two-dimensional DNA barcoding can be used to monitor medicinal materials from their source.
Using two-dimensional DNA barcoding, herbal materials can be authenticated in the field and random inspections can be carried out. QR code-based DNA barcodes bring DNA barcoding applications to a more practical level and illustrate the potential of DNA barcoding in the identification of medicinal plants to ensure product safety. At present, two-dimensional DNA barcoding encompasses only Latin names and sequence information for species; pictures of medicinal herbs and property descriptions should be included for better identification.
Challenge and promise for DNA barcoding. In the present study, barcode ITS2 exhibited a powerful discrimination capacity toward T. jasminoides and its local alternatives, resolving the practical problem at hand. However, the ITS2 barcode is not sufficient to identify heavily processed materials with degraded DNA, which are difficult to amplify. A nucleotide signature within the ITS2 region that is specific to the tested sample is also an effective approach. The ideal nucleotide signature should be completely conserved within a specific species 49 . In addition, whether the genuine T. jasminoides samples tested in this study satisfy the quality standard of the Chinese pharmacopoeia, in which the content of tracheloside must be more than 0.45%, requires further study. Moreover, attention should be given to the possibility that herbal products may be contaminated at levels other than the plant species. Substitution with a non-prescribed plant part, or with a prescribed part harvested during the wrong season, also produces different medical effects. DNA barcoding will fail to identify lower quality products under these circumstances 44 . Therefore, merely relying on DNA barcoding to perform quality control for medicinal plants is insufficient. DNA barcoding should be supplemented with morphological and biochemical traits for raw drug supervision to guarantee clinical safety 50 .

Sampling. A total of 101 samples were collected from the Hebei, Henan, Hubei, Hunan, Shanxi, Sichuan, Anhui, Shandong, Liaoning, Jilin, Heilongjiang, Xiamen, and Guangxi provinces and Beijing municipality. Among them, 82 samples were T. jasminoides (79 stem and 3 leaf samples), six were F. pumila (5 stem and 1 leaf samples), eight were F. tikoua (7 stem and 1 leaf samples), and five were E. fortunei (4 stem and 1 leaf samples). The collected data are listed in Appendix S1. The leaf samples collected from the field were silica gel-dried, while the stem samples purchased in the medicine markets and drugstores were already dried. All corresponding voucher samples were deposited in the Beijing Forestry University, Beijing, China. In addition, 12 published T. jasminoides, 5 F. pumila, 6 F. tikoua, and 3 E. fortunei ITS2 sequences (or sequences containing ITS2) were downloaded from GenBank (Appendix S1).
DNA isolation, amplification, and sequencing. The DNA extraction and amplification of the ITS2 locus of the 101 samples were conducted in the Laboratory of Molecular Biology at the Department of Biology. The samples were first scraped and then wiped with 75% ethanol to prevent fungal contamination. Then, 40 mg of each sample was ground in liquid nitrogen to a powder. Total genomic DNA was isolated using a plant genomic DNA kit (Tiangen, China), which is based on the CTAB approach. The ITS2 sequences were amplified using a pair of universal primers: ITS2-2F, 5′-ATGCGATACTTGGTGTGAAT-3′ and ITS2-3R, 5′-GACGCTTCTCCAGACTACAAT-3′ 24 . The primer pair was synthesized by the Shanghai Sangon Biotech and Service, Beijing branch (Beijing, China). PCR amplifications were performed in 25 µL reaction volumes containing 12.5 µL of 2 × EasyTaq PCR SuperMix (Beijing Baierdi Biotech Co., China), 8.5 µL of molecular grade water, 1 µL of each primer (2.5 µM), and 2 µL of the DNA template. The PCR reactions proceeded at 94 °C for 5 min, followed by 40 cycles of 94 °C for 45 s, 56 °C for 45 s, and 72 °C for 1.5 min, and a final extension step at 72 °C for 10 min. The PCR products were sequenced in both directions by the Institute of Crop Science, Chinese Academy of Agricultural Sciences (Beijing, China).
Data analysis. The raw trace files were trimmed and assembled using CodonCode Aligner 6.0.2 (CodonCode Co., USA). All ITS2 regions were annotated using the Hidden Markov model to remove the 5.8S and 28S sections 51 . Sequences shorter than 100 bp and sequences contaminated by fungi or other unnamed species were eliminated 52 . The effectiveness of the ITS2 locus was evaluated with the following methods.
Intra- and inter-specific divergences. All 127 sequences were analyzed in MEGA 5.2.2 53 to obtain the aligned length, G + C content range, and the number of variable sites. The sequences were aligned using MUSCLE, and the intra- and inter-specific genetic distances were computed with the Kimura 2-parameter (K2P) model in MEGA 5.2.2. The distributions of intra- and inter-specific pairwise divergences were compared using TaxonDNA 54 to assess the barcoding gap.
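For readers unfamiliar with the K2P model, the distance between two aligned sequences is d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the proportions of sites showing transitions and transversions, respectively. The following minimal sketch implements this formula directly. It is not the MEGA implementation; the function name is hypothetical, and pairwise deletion of gaps and ambiguous bases is an assumption, since the gap treatment used in the study is not stated.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance between two aligned sequences.

    Assumption: sites with a gap or ambiguous base in either sequence are
    skipped (pairwise deletion).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    term1 = 1 - 2 * p - q
    term2 = 1 - 2 * q
    if term1 <= 0 or term2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    d = -0.5 * math.log(term1) - 0.25 * math.log(term2)
    return abs(d)  # abs() only normalises -0.0 to 0.0 for identical sequences


# Toy sequences (hypothetical): one transition in ten compared sites.
print(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"))
```

Identical sequences give a distance of zero, which corresponds to the intraspecific distance of 0.000 reported for the single T. jasminoides haplotype.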
Authentication efficacy evaluation. Two identification methods, BLAST1 and nearest distance, were used to evaluate the authentication efficacy of the ITS2 region 55,56 . In the BLAST1 method, every ITS2 sequence was used as a query in a BLAST search, and three outcomes were considered. If the best BLAST hit of the query sequence is from the expected species, the identification is correct. If the best BLAST hits of the query sequence are from several species, including the expected species, the identification is ambiguous. If the best BLAST hit is not from the expected species, the identification is incorrect. The nearest distance method is based on the smallest genetic distance: the identification is correct when the nearest hit comes from the same species as the query, ambiguous when several hits share the same smallest genetic distance to the query, and incorrect when the nearest hit is not from the same species as the query.
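The three outcomes described above follow the same decision rule in both methods, differing only in whether the best hit has the highest score (BLAST1) or the smallest distance (nearest distance). The sketch below is a hypothetical illustration of that rule, not the software used in the study. The function name and all scores and distances are invented for the example, and ties are treated as ambiguous only when they involve more than one species.

```python
def classify_identification(expected_species, hits, higher_is_better):
    """Classify one query as 'correct', 'ambiguous', or 'incorrect'.

    hits: list of (species, value) pairs for the reference sequences, with
    the query itself excluded. Use higher_is_better=True for BLAST scores
    and False for genetic distances.
    """
    if not hits:
        raise ValueError("at least one reference hit is required")
    values = [value for _, value in hits]
    best = max(values) if higher_is_better else min(values)
    best_species = {species for species, value in hits if value == best}
    if best_species == {expected_species}:
        return "correct"
    if expected_species in best_species:
        return "ambiguous"
    return "incorrect"


# BLAST1-style example with invented bit scores.
blast_hits = [("T. jasminoides", 410.0), ("F. pumila", 150.0)]
print(classify_identification("T. jasminoides", blast_hits, higher_is_better=True))

# Nearest-distance example with invented distances: a market sample labelled
# T. jasminoides whose nearest reference is F. pumila.
distance_hits = [("T. jasminoides", 0.49), ("F. pumila", 0.0)]
print(classify_identification("T. jasminoides", distance_hits, higher_is_better=False))
```

The second call shows how a mislabelled market sample would be flagged: its nearest reference belongs to a different species than the label.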
Tree-based method. A phylogenetic analysis of all 127 sequences was performed using the neighbor-joining (NJ) method to evaluate the identification capacity of the ITS2 locus, and node support was assessed with 1000 bootstrap replicates. For species represented by multiple individuals, identification was considered successful when the individuals clustered into one clade with a bootstrap value above 60% 57 .
Secondary structure prediction. The deep level phylogeny derived from the ITS2 data largely agreed with the phylogenetic hypotheses from morphologic and other molecular evidence. The secondary structure of all the ITS2 sequences was predicted using the ITS2 Workbench (http://its2.bioapps.biozentrum.uni-wuerzburg.de/) 58 .
Two-dimensional DNA barcoding. Quick response (QR) code was found to be a suitable symbology for the DNA barcode sequences. Based on the QR Code coding approach, the ITS2 sequences were transformed into two-dimensional images (http://qrfordna.dnsalias.org) 36 .
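To give a sense of why ITS2 sequences fit comfortably in a QR code, the following sketch checks a text payload against the QR alphanumeric mode, which accepts digits, uppercase letters, and a few symbols, and holds up to 4296 characters in the largest symbol (version 40, error correction level L). This is not the encoding scheme of Liu et al. or of the web tool cited above. The payload layout (species name, a colon, then the sequence) and the function name are hypothetical, and no image is generated.

```python
# Characters permitted in QR alphanumeric mode.
QR_ALPHANUMERIC = set("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:")
# Maximum alphanumeric capacity of a QR code (version 40, level L).
QR_MAX_ALPHANUMERIC = 4296


def build_payload(species: str, sequence: str) -> str:
    """Build an uppercase 'SPECIES:SEQUENCE' string and validate it.

    The layout is a hypothetical example, not a published format.
    """
    payload = f"{species.upper()}:{sequence.upper()}"
    illegal = sorted(set(payload) - QR_ALPHANUMERIC)
    if illegal:
        raise ValueError(f"characters not allowed in alphanumeric mode: {illegal}")
    if len(payload) > QR_MAX_ALPHANUMERIC:
        raise ValueError("payload exceeds the QR alphanumeric capacity")
    return payload


# Placeholder sequence of 241 bases, the longest ITS2 length reported here.
placeholder = "ACGT" * 60 + "A"
payload = build_payload("T. jasminoides", placeholder)
print(len(payload), len(payload) <= QR_MAX_ALPHANUMERIC)
```

Even the longest ITS2 sequence in this study (241 bp) uses only a small fraction of that capacity, which leaves room for additional fields such as the Latin name.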