DNA Barcode Authentication and Library Development for the Wood of Six Commercial Pterocarpus Species: the Critical Role of Xylarium Specimens

DNA barcoding has been proposed as a useful tool for forensic wood identification, and development of a reliable DNA reference library is an essential first step. Xylaria (wood collections) are potentially enormous data repositories if DNA information can be extracted from wood specimens. In this study, 31 xylarium wood specimens and 8 leaf specimens of six important commercial species of Pterocarpus were selected to investigate the reliability of DNA barcodes for authentication at the species level and to determine the feasibility of building wood DNA barcode reference libraries from xylarium specimens. Four DNA barcodes (ITS2, matK, ndhF-rpl32 and rbcL) and their combinations were tested to evaluate their discrimination ability for Pterocarpus species with both TaxonDNA and tree-based analytical methods. The results indicated that the combination barcode of matK + ndhF-rpl32 + ITS2 yielded the best discrimination for the Pterocarpus species studied. The mini-barcode ndhF-rpl32 (167–173 bp) performed well in distinguishing P. santalinus from P. tinctorius, a species that cannot be separated from it by wood anatomy. Results from this study verified not only the feasibility of building DNA barcode libraries using xylarium wood specimens, but also the importance of using wood rather than leaves as the source tissue when wood is the botanical material to be identified.

similar to intervessel pits; perforation plates simple; axial parenchyma aliform, confluent and in narrow bands 1-4 cells wide; prismatic crystals in chambered axial parenchyma cells; axial parenchyma cells storied; fibres thick-walled, storied; rays exclusively uniseriate, occasionally 2 cells wide, 2 to 10 cells high, homocellular, consisting of procumbent cells; all rays storied. In addition to these anatomical similarities, we report that ethanol extract color, heartwood surface fluorescence, heartwood water extract fluorescence, and heartwood ethanol extract fluorescence are all also indistinguishable. It is thus impossible to make a forensically valid separation of P. santalinus from P. tinctorius based on wood anatomical features.
The features of the four DNA barcodes are shown in Table 1. The length of the aligned rbcL sequences was 350 bp, with 47 variable sites and 15 informative sites. The aligned matK sequence was 239 bp long, with 10 variable sites and 10 informative sites. The ITS2 sequence was 234 bp in length, with 75 variable sites, 69 informative sites and 16 indels. For ndhF-rpl32, the aligned length was 173 bp, with 14 variable sites, 12 informative sites and six indels. Among the four DNA barcodes, ITS2 had the highest proportion of variable (32.05%) and informative (29.49%) sites, followed by rbcL (13.43% and 4.29%) and ndhF-rpl32 (8.09% and 6.94%), with matK showing the lowest values (4.18% and 4.18%).
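As a quick check, the percentages quoted above follow directly from the reported site counts and aligned lengths. The short Python sketch below recomputes them; the dictionary layout and the function name `site_percentages` are illustrative choices, not part of the original analysis.

```python
# Aligned length, variable sites and informative sites, as reported in the text.
barcodes = {
    "rbcL": (350, 47, 15),
    "matK": (239, 10, 10),
    "ITS2": (234, 75, 69),
    "ndhF-rpl32": (173, 14, 12),
}


def site_percentages(length, variable, informative):
    """Return (% variable sites, % informative sites), rounded to two decimals."""
    return (round(100 * variable / length, 2), round(100 * informative / length, 2))


percentages = {name: site_percentages(*counts) for name, counts in barcodes.items()}

# List the barcodes from the most to the least variable.
for name, (var_pct, inf_pct) in sorted(percentages.items(), key=lambda kv: -kv[1][0]):
    print(f"{name}: {var_pct}% variable, {inf_pct}% informative")
```

Sorting by the proportion of variable sites reproduces the ranking given in the text: ITS2, rbcL, ndhF-rpl32, matK.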
DNA Barcoding Gap Assessment. Barcoding gaps, i.e. the absence of overlap between intra- and interspecific distances, were evaluated from the distance distributions produced by the "pairwise summary" function in TaxonDNA (Supplementary Figure S1). In this study, no single- or multi-locus barcode exhibited a clear barcoding gap; for all barcodes the intra- and interspecific distances overlapped. However, the mean interspecific divergence was higher than the corresponding intraspecific variation for each of the barcodes (Table 2). Among the single barcodes, ITS2 had the highest variation in interspecific divergence compared to the range of intraspecific distances (Table 2). When barcodes were individually analyzed, ITS2 presented the best barcoding gap performance, with 69.6% of pairwise interspecific distances greater than 0.05 and 95.9% of pairwise intraspecific distances lower than 0.05. Conversely, unsatisfactory results were observed for matK, ndhF-rpl32 and rbcL separately, with almost total overlap of intra- and interspecific variation for each (Supplementary Figure S1).
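The quantities used in this assessment can be illustrated with a minimal Python sketch. It is not the TaxonDNA implementation; the function name `gap_summary` and the example distances are hypothetical and serve only to show how a strict gap and the 0.05 cutoff proportions would be computed from two lists of pairwise distances.

```python
def gap_summary(intra, inter, cutoff=0.05):
    """Summarize the overlap between intra- and interspecific pairwise distances."""
    return {
        # Strict barcoding gap: every interspecific distance exceeds every intraspecific one.
        "has_gap": max(intra) < min(inter),
        # Share of interspecific distances above, and intraspecific distances below, the cutoff.
        "inter_above_cutoff": sum(d > cutoff for d in inter) / len(inter),
        "intra_below_cutoff": sum(d < cutoff for d in intra) / len(intra),
        "mean_intra": sum(intra) / len(intra),
        "mean_inter": sum(inter) / len(inter),
    }


# Hypothetical distances for illustration only (not data from this study).
intra = [0.00, 0.01, 0.02, 0.06]
inter = [0.03, 0.07, 0.09, 0.12]

summary = gap_summary(intra, inter)
print(summary)
```

In this toy case the distributions overlap, so there is no strict gap, even though the mean interspecific distance exceeds the mean intraspecific distance. This is the same qualitative pattern described above.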
As for the barcode combinations, the best results were found for matK + ITS2 and matK + ndhF-rpl32 + ITS2, with 98.9% and 95.1% of pairwise interspecific distances greater than 0.05, respectively, and 91.8% of pairwise intraspecific distances lower than 0.05; both combinations also outperformed any single barcode. All other barcode combinations showed clear overlap (Supplementary Figure S1).
Species Discrimination based on TaxonDNA and Tree-based Analysis. The "best match" and "best close match" functions of TaxonDNA were used to analyze all sequences generated in this study as well as those downloaded from the GenBank database (Fig. 3). For single-locus barcodes, the "best match" and "best close match" methods provided similar species discrimination success rates. ITS2 showed the highest success rate (85.1%), followed by ndhF-rpl32 (20.0%) and rbcL (18.2%), while matK exhibited the lowest rate (10.7%). The identification success rates for all barcode combinations were generally higher than those of the single barcodes. The highest success rate (100%) among barcode combinations in the "best match" and "best close match" analyses was obtained by the two-barcode combination matK + ITS2 and the three-barcode combination matK + ndhF-rpl32 + ITS2. The ndhF-rpl32 + rbcL combination exhibited the lowest performance for correct identification. All barcode combinations that included ITS2, i.e. matK + ITS2, ndhF-rpl32 + ITS2, rbcL + ITS2, matK + ndhF-rpl32 + ITS2, matK + rbcL + ITS2, ndhF-rpl32 + rbcL + ITS2 and matK + ndhF-rpl32 + rbcL + ITS2, provided higher identification success rates than combinations of chloroplast DNA barcodes alone (Fig. 3). Bootstrap support for species-specific clusters was calculated from unrooted neighbor-joining (NJ) trees for the four barcodes and their combinations (Supplementary Figure S2).
When barcodes were individually analyzed, the highest species discrimination success was obtained by ITS2 and rbcL (16.7%), whereas the barcodes matK and ndhF-rpl32 could not distinguish any Pterocarpus species (Supplementary Table S4). The mini-barcode ndhF-rpl32, 167-173 bp in length, can separate the two anatomically similar species P. santalinus and P. tinctorius in neighbor-joining tree analysis (Fig. 4B). Furthermore, six contiguous diagnostic characters (an insertion/deletion) at nucleotide positions 112 to 117 (TTATTA) were found within the ndhF-rpl32 region (Fig. 4C), providing a distinguishing feature under the character-based approach. No single barcode provided sufficient resolution to discriminate all six Pterocarpus species studied here. When combining two to four barcodes, the highest discrimination rate (100%) was obtained by matK + ndhF-rpl32 + ITS2 and matK + rbcL + ITS2 (Fig. 5). Moreover, the barcode combinations that included ITS2 yielded higher success rates than combinations of chloroplast DNA barcodes alone (Supplementary Table S4).
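The character-based comparison reported above for ndhF-rpl32 can be sketched in Python as a search for alignment columns that are fixed within each species but differ between them. The function name `diagnostic_columns` and the toy alignment are hypothetical; the toy sequences only mimic a six-column indel and are not the sequences from this study, so the column numbers differ from the reported positions 112 to 117.

```python
def diagnostic_columns(seqs_a, seqs_b):
    """Return 1-based alignment columns fixed within each group but different between groups.

    Gaps ('-') are treated as a character state, so indels can be diagnostic.
    """
    columns = []
    for i in range(len(seqs_a[0])):
        states_a = {seq[i] for seq in seqs_a}
        states_b = {seq[i] for seq in seqs_b}
        if len(states_a) == 1 and len(states_b) == 1 and states_a != states_b:
            columns.append(i + 1)
    return columns


# Hypothetical toy alignment: species A carries a 6-bp insertion that species B lacks.
species_a = ["ACGTTATTAGC", "ACGTTATTAGC"]
species_b = ["ACG------GC", "ACG------GC"]

diagnostic = diagnostic_columns(species_a, species_b)
print(diagnostic)
```

A run of consecutive diagnostic columns in which one species shows only gaps corresponds to a contiguous indel of the kind described above.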

Assessment of DNA Barcodes for Pterocarpus. An ideal DNA barcode should be short, making it easy to recover, and should carry sufficient information to provide maximal species discrimination [30][31][32]. While this is true for any barcode as a general principle, it is a key concern for barcodes for wood identification, because wood is a DNA-poor botanical material in the living tree, and the quality and quantity of DNA in wood degrade with industrial processing, necessitating barcodes known to be recoverable from dry wood. In these Pterocarpus, shorter amplicons showed a generally higher recovery rate than longer ones, with the shortest fragment, ndhF-rpl32, having the highest success rate, which is in line with several previous studies 22,28,33. We expect that the DNA in xylarium wood specimens is typically highly fragmented 33,34. Additionally, the nuclear ribosomal DNA region ITS2 yielded a lower recovery success rate (67%) than the chloroplast DNA regions, although it is present in multiple copies in the genome. In spite of some amplification disadvantages, ITS2 provided the best discrimination performance among the four barcodes. The superior identification power of the nuclear DNA region ITS2 over plastid barcodes is also consistent with the results of previous studies 15,19,[35][36][37].
Scientific Reports | (2018) 8:1945 | DOI:10.1038/s41598-018-20381-6
Although the chloroplast DNA regions rbcL and matK were proposed as core barcodes for seed plants 31, the two regions gave low species resolution for Pterocarpus in this study. Both rbcL and matK are widely used in phylogenetic analyses, with over 130,000 sequences available in GenBank. Kress et al. 30 showed that the rbcL sequence evolves slowly, and this barcode has been recognized as having the lowest divergence among the plastid genes studied in flowering plants 30. Consequently, on average it is not likely to be useful for identification at the species level 15,31,[38][39][40]. matK is reported to show different discrimination success rates in different taxonomic groups, e.g. discriminating more than 90% of species in the Orchidaceae 41 but less than 49% of species in the nutmeg family 40,42. Meanwhile, despite its power in phylogenetic studies of other species 17,18, ndhF-rpl32 showed low resolution for distinguishing all six Pterocarpus species in this study.
No single barcode was able to distinguish all six Pterocarpus species in this study. Overall, combined barcodes provided higher species resolution than any single barcode, which is consistent with previous studies 12,43,44. The CBOL Plant Working Group recommended the combination barcode rbcL + matK as the core barcode for land plants. Yan et al. 32 also demonstrated that the three-barcode combination ITS + psbA-trnH + matK gave better discrimination performance than single barcodes and was the best choice for the genus Rhododendron 32. In this study of Pterocarpus, the highest success rate of barcode combinations based on the "best match" and "best close match" analyses of TaxonDNA was obtained by matK + ITS2 and matK + ndhF-rpl32 + ITS2. When the tree-based (NJ) analyses were conducted, the combinations matK + ndhF-rpl32 + ITS2 and matK + rbcL + ITS2 gave the best results. Because matK + ndhF-rpl32 + ITS2 was the only combination that performed best under both methods, we conclude that it is the best combination DNA barcode to resolve the six Pterocarpus species (Figs 3 and 5).
Although the barcode matK, individually or in combination with other chloroplast DNA barcodes, yielded a low success rate for species discrimination, it interestingly was able to cluster the studied Pterocarpus species according to their broad geographic origins (Fig. 6). We found that Asian species and African species each clustered together, except for 1 or 2 samples of P. angolensis (Fig. 6). Taking the recovery success rate into account, we suggest the two-locus chloroplast combination matK + ndhF-rpl32 as a potential barcode for tracking the geographic origin of Pterocarpus species. It has been reported that chloroplast DNA barcodes that are variable enough to reveal geographic structure could be used to differentiate the origin of timber [45][46][47]. Additionally, Lee et al. 12 showed that the DNA barcode combination matK + trnL-trnF + ITS2 was capable of geographic clustering for Aquilaria species 12.

Species Discrimination between P. santalinus and P. tinctorius based on the Special Mini-barcode. Inasmuch as P. santalinus and P. tinctorius cannot be separated by wood anatomy but are mixed in trade, an effective method to separate these woods is critically needed. A single DNA barcode targeted to this question alone would be an effective tool, especially if the barcode were easily recovered from both species. The DNA mini-barcode ndhF-rpl32 performed well in distinguishing these two closely related Pterocarpus species.
DNA mini-barcodes, short DNA sequences of 100-250 bp, are suitable for species identification within a given taxonomic group of old herbarium/museum specimens when high-quality DNA is not available and only seriously degraded DNA can be retrieved 9,48,49. We suggest that the DNA mini-barcoding approach is suitable for species identification of woody tissues, especially in narrow cases where a small number of anatomically indistinguishable woods must be separated. In this study, the recovery success rate of ndhF-rpl32 was the highest among the four DNA barcodes (Fig. 4), and this parameter has been used as an important criterion to determine whether DNA can be effectively isolated from wood tissues 11,22,23,34. The primary drawbacks of this approach are the reduced taxonomic discriminatory power of a mini-barcode compared to that of a full-length barcode, and the fact that the most effective mini-barcode is taxon-specific. If a new mini-barcode is needed for every group of taxa, the basic principle of standardization is violated. Therefore, the position in the genome from which a mini-barcode is chosen strongly affects its ability to discriminate species [48][49][50]. A good DNA mini-barcode candidate should have high PCR and sequencing success without much loss of species discrimination power, and should be as broadly applicable as possible.

Regarding the Utility of DNA Barcoding in the Conservation of and Controlled Trade in Pterocarpus Wood. Biodiversity conservation has rapidly become a focus of attention due to the sharp increase in global trade in forest resources, over-exploitation and illegal logging activities. For forest protection and global trade monitoring, developing accurate species-level identification and geographic traceability for wood is a crucial technical prerequisite 2. The application of DNA barcoding to identify the species and track the geographic origin of internationally traded timber has attracted increasing interest as a potential part of global systems to support sustainable forestry and especially to reduce illegal logging 51. In addition to this work, previous studies have reported the potential of DNA barcoding to support conservation efforts for wood species, e.g. Aquilaria 11,12, Dalbergia 23,51,52 and Populus 22.
DNA barcoding can play an increased role in the identification and conservation of Pterocarpus species, and of wood species worldwide. The limited availability of reliable reference DNA barcode libraries is likely to remain the main obstacle to the application of DNA barcoding for the next few years. Our study confirms that xylarium wood specimens are rich sources of reliable DNA sequence data. Xylarium wood specimens could certainly enhance the construction of global DNA barcode reference libraries to support species conservation worldwide, and thus continue to play a critical role as repositories of wood anatomical, chemical, and molecular information for the future.

Materials and Methods
Plant Materials. All wood specimens were taken from the xylarium (wood collections) of the Chinese Academy of Forestry (WOODPEDIA), the largest wood collection in China. A total of 39 specimens of 6 species of Pterocarpus were sampled. Four types of specimens, i.e., heartwood, sapwood, twig, and silica gel-dried leaf, were collected in this study. Between 4 and 11 individuals per species were sampled. Details of the collected reference samples, including the location of vouchers, are listed in Supplementary Table S1.
Molecular Methods. Exposed surfaces of xylarium wood specimens were removed with a sterile scalpel to avoid external contamination. Each wood sample of 500 mg was frozen in liquid nitrogen and then ground into a fine powder in a 6770 Freezer/Mill (SpexSamplePrep, Metuchen, NJ, USA).
All DNA isolations were carried out under sterile conditions. DNA from the wood specimens was extracted following the DNeasy Plant Maxi Kit (Qiagen, Hilden, Germany) protocol, modified according to Jiao et al. 11. For silica gel-dried leaves, DNA was isolated using the DNeasy Plant Mini Kit (Qiagen, Hilden, Germany) following the manufacturer's recommendations. Supplementary Table S2. The PCR products were purified using a UNIQ-10 Spin Column DNA Gel Extraction Kit (Sangon, Shanghai, China) and sequenced in both directions, with the same primers used for PCR, on an ABI PRISM 3730xl (Applied Biosystems, Foster City, CA, USA).
In addition to the sequences generated in this work, we downloaded sequences of the loci ITS2, matK, ndhF-rpl32 and rbcL for specimens of Pterocarpus from GenBank for analysis (Supplementary Table S3).
Light Microscopy. Sectioning blocks [10 mm (L) × 10 mm (R) × 10 mm (T)] were cut with razor blades and then softened in 2% ethylenediamine at 60 °C for 48 hours. Thereafter, 15 μm thick transverse, radial and tangential sections were cut on a sliding microtome. Sections were stained with a 1% aqueous safranin solution, rinsed, mounted on glass slides and observed under a light microscope (Olympus BX61, Japan).
Data Analysis. Raw sequences for each region were assembled and edited using ContigExpress in Vector NTI Advance v. 10.1 (Invitrogen InforMax, Frederick, MD, USA), saved in FASTA format and deposited in GenBank (Supplementary Table S1). The edited sequences were then aligned with Clustal X 1.81 53, followed by adjustment with BioEdit software 54. To assess the barcoding gap, the relative distribution of pairwise genetic distances was calculated using TaxonDNA 55 under the K2P-corrected pairwise distance model 32.
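For readers unfamiliar with the K2P (Kimura 2-parameter) correction named above, the distance between two aligned sequences is d = -(1/2) ln[(1 - 2P - Q) * sqrt(1 - 2Q)], where P and Q are the proportions of transitional and transversional differences. The Python sketch below is a minimal illustration of that formula, not the TaxonDNA code; the function name `k2p_distance` and the choice to skip sites with gaps or ambiguous bases are assumptions made for this example.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Sites where either sequence has a gap or a non-ACGT symbol are skipped
    (an assumption of this sketch). Raises ValueError if no sites are
    comparable or if the sequences are too divergent for the correction.
    """
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Hypothetical 10-bp sequences differing by a single transition (A<->G).
print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"), 4))
```

With one transition in ten sites (P = 0.1, Q = 0), the corrected distance is slightly larger than the uncorrected proportion of 0.1, which is the intended effect of the model.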
To evaluate species discrimination success, two widely used methods, TaxonDNA and a neighbor-joining tree-based approach, were applied to the four single barcodes and all their possible combinations. For the TaxonDNA analysis, we used the "best match" and "best close match" functions in the software to test the species-level discrimination rates under the K2P-corrected distance model for each barcode singly and for all possible combinations of barcodes 52,56. The "best close match" method requires a threshold value, which was calculated for each barcode from the pairwise summary. All results above the threshold were treated as "no match". For the tree-based method, unrooted neighbor-joining (NJ) trees were constructed in MEGA 5 57 with pairwise deletion and the p-distance model 32,51,[58][59][60]. Species discrimination was considered successful only when all conspecific individuals clustered in a single clade and at least one specimen in each clade was derived from a botanically vouchered collection.
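The logic of the two TaxonDNA criteria described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the TaxonDNA implementation: the function name `classify_query`, the species labels, the distance matrix and the 0.05 threshold are all hypothetical, and ties are handled by calling a query "ambiguous" when its closest sequences include both conspecific and allospecific matches.

```python
def classify_query(distances, labels, query, threshold=None):
    """Classify one query sequence against all other sequences.

    threshold=None mimics "best match"; a numeric threshold mimics
    "best close match", where a closest hit farther than the threshold
    is reported as "no match".
    """
    others = [(distances[query][j], labels[j]) for j in range(len(labels)) if j != query]
    best = min(d for d, _ in others)
    if threshold is not None and best > threshold:
        return "no match"
    closest = {label for d, label in others if d == best}
    if closest == {labels[query]}:
        return "correct"
    if labels[query] in closest:
        return "ambiguous"
    return "incorrect"


# Hypothetical labels and symmetric pairwise distances, for illustration only.
labels = ["sp_A", "sp_A", "sp_B", "sp_B"]
distances = [
    [0.00, 0.01, 0.08, 0.09],
    [0.01, 0.00, 0.07, 0.08],
    [0.08, 0.07, 0.00, 0.06],
    [0.09, 0.08, 0.06, 0.00],
]

best_match = [classify_query(distances, labels, q) for q in range(len(labels))]
best_close_match = [classify_query(distances, labels, q, threshold=0.05) for q in range(len(labels))]
print(best_match)
print(best_close_match)
```

In this toy example every query is identified correctly under "best match", but the two sp_B sequences become "no match" under "best close match" because their nearest conspecific neighbour lies beyond the threshold.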