Development and technical application of SSR-based individual identification system for Chamaecyparis taiwanensis against illegal logging convictions

Chamaecyparis taiwanensis is an endemic plant suffering illegal logging in Taiwan for its high economic value. Lack of direct evidence to correlate stump and timber remains a hurdle for law enforcement. In this report, 23 polymorphic Genomic Simple Sequence Repeat (gSSR) and 12 Expressed Sequence Tag (EST)-SSR markers were developed and their transferability was assessed. The individual identification system built from selected non-linkage 30 SSR markers has a combined probability of identity as 5.596 × 10–12 equivalents to identifying an individual in a population of up to 18 million C. taiwanensis with 99.99% confidence level. We also applied the system in an actual criminal case by selecting 19 of these markers to correlate illegally felled timbers and victim trees. Our data demonstrate that molecular signals from three timbers hit with three victim trees with confidence level more than 99.99%. This is the first example of successfully applying SSR in C. taiwanensis as a court evidence for law enforcement. The identification system adapted advanced molecular technology and exhibits its great potential for natural resource management on C. taiwanensis.

The problem of illegal logging has been paid attention since 1995. More and more national and international regulations mandate tracking systems that ensure traceability on wood market [8][9][10] . Wood anatomy and dendrochronology are common visual identification method. The former is based on the anatomical characteristics to identify the wood, and can usually be identified to the genus 11 ; the latter is often used to illustrate past climates, but may also provide the age and origin of the trees 12 . Compounds synthesized by trees and other plants are often called phytochemicals and are often used to identify species or distinguish genera. Intraspecific variation can also be detected in some species through some chemical analysis such as mass spectrometry 12,13 , near infrared spectroscopy 14 , detector dogs 15 , stable isotopes 16 , and radiocarbon 17 . Genetic analysis can provide species-level identification, which is usually achieved by DNA sequence polymorphism 18 . Simple sequence repeats (SSRs) and Single nucleotide polymorphisms (SNPs) can be used to identify individuals and can be used in population genetics or systematic geography to determine the geographical region of origin within a species 19 . DNA fingerprinting is built into each organism itself and cannot be forged 20 . When enough markers are developed, in principle every individual has its own unique DNA fingerprint. DNA fingerprinting has the potential to track wood products independently within complex global supply network 21 . Theoretically, DNA fingerprinting is the only forensic wood identification technology that could be used to connect seized timber to illegally felled stumps 8 .
SSR is the most common marker used in individual identification for its short length, high polymorphism, easy polymerase chain reaction (PCR) amplification, high reproducibility, and high sensitivity 20,22,23 . SSRs are divided into two broad categories by different sources: Genomic (g)-SSR and expressed sequence tag (EST)-SSR 24 . gSSR markers are derived from amplified genomic libraries. EST-SSRs are markers mined from EST sequence collections. gSSR markers have been reported to be more polymorphic when compared with EST-SSR in gymnosperms 4 and crops 25,26 because of a more diversified nucleotide sequence. Since the development of highthroughput sequencing technology, the marker development technique has been continuously advanced. Wang et al., 2018 published the first report on gSSR developed by De novo genome sequencing 27 . In contrast, EST-SSR, derived from the expressed sequence, is fast-acting, cost-effective and labor-saving alternative for non-model organisms 24 . Because of the conservative nature in gene coding regions 24 , newly developed EST-SSRs usually can be transferred in closely related species for marker development. The first EST-SSR based on Illumina-based de novo transcriptome was also published by Zhou et al. in 2018 28 . A study to develop both markers would avail of their merits and functions simultaneously.
For C. taiwanensis, evaluation of genetic variation or population structure is necessary for its preservation 2,29 because this species is used extensively. After mid-twentieth century, the number of C. taiwanensis plunged, which also led to a significant decrease in both genetic variation and population structure. As an important tool for genetic and subsequent breeding, SSR markers are helpful for breeding polymorphic maternal plants and increasing the diversity of progeny. The objective of this study is to establish a scientifically valid SSR mediated individual identification system for C. taiwanensis in order to provide court evidence to link the seized wood and the victim tree, and to provide traceability proof for wood supply network. In the beginning of the research, we used Next Generation Sequence (NGS) technology to establish the DNA and RNA libraries of C. taiwanensis to accelerate the development of gSSR and EST-SSR markers. A total of 96 samples from four populations were used to evaluate the polymorphism, discriminative power, and random match rate of the selected SSRs. The linkage disequilibrium between markers was calculated to estimate the availability and credibility of the individual identification system. In this study, we successfully linked 3 stolen timbers back to 3 victim trees (case number MJIB-DNA-1080413 combine 1080328), marked the first successful application of C. taiwanesis individual identification system. Finally, our work would deter illegal felling toward these precious species by manifesting law enforcement effectively.

Result and discussion
Developing C. taiwanensis individual identification system. Choice of template and library preparation. The gSSR are characterized by high polymorphism and is suitable for developing individual identification markers. The EST-SSR are highly conservative which could be used for developing markers to categorize species and populations 20,22,23 . In this study, both DNA and RNA libraries were constructed simultaneously as gSSR and EST-SSR markers, respectively (Fig. 1, Supplementary Sect. 1). From the three DNA libraries and from a RNA library prepared for the study, the sequences were compared between individual plants as well as between groups (Supplementary Sect. 1). With these two nucleic acid markers, we envisioned to differentiate samples within or among species.
Nucleic acid sequencing and analysis. Next-generation sequencing technology enables the possible procurement of large number of sequences in a short time. In this study, we used the Illumina MiSeq platform (2 × 300 bp) to sequence the DNA and RNA libraries (Fig. 1). A total of 13,651,578 and 11,763,646 raw reads were produced from DNA and RNA libraries, respectively. The raw reads were deposited in the NCBI Sequence Read Archive (PRJNA506084). The sequences were then subjected to quality-trimming and merging and afterwards 4,236,284 contigs of the DNA pool and 4,392,534 RNA contigs were assembled. The base lengths of contigs ranged from 120-579 and 120-529, at an average of 420 for DNA and RNA, respectively. According to the work published by timber researchers 23,30 , the nucleic acid markers with fragment lengths of around 250 bases best meet our research goals. The lengths of contigs derived from the four libraries we have prepared were found to be suitable for screening markers within 250 bp length. A target band size below 250 bp implies a higher PCR success rate as the DNA of wood samples from seized timber and victim trees were mostly severely degraded.
SSR discovery and primer design. A sum of 318,153 gSSR and 63,390 EST-SSR candidate sequences were screened by Simple Sequence Repeat Identification Tool (SSRIT) 31 (Fig. 1) www.nature.com/scientificreports/ amplifying (null) alleles and other factors. Therefore, about 90% of the designed markers could be screened out.
We designed a total of 395 gSSR and 105 EST-SSR primer pairs for testing in C. taiwanensis.
Marker validation. From the PCR results, 23 gSSR and 12 EST-SSR markers with polymorphism were selected ( Fig. 1, Tables 1, 2), and the success rate for gSSR and EST-SSR marker was found to be 5.82% and 11.42%, respectively. Our data showed that it is easier to select SSR markers from the RNA library than from the DNA library, which is akin to previous studies 4, 32,33 . Other reports 24,34,35 suggest that SSR occurs more frequently in EST sequence than in the genome. In addition, the fact that the information content in EST is markedly lower than that in the genome promotes the calculation and analysis of EST in silico 24,33 .  (Table 2). Heterozygosity, being one of the first parameters that appear often in a data set, reveals lot of information including population structure and other historical clue. High heterozygosity means a lot of genetic variation, whereas low heterozygosity means almost no genetic variation. The heterozygosity data echo the results of PIC, PD and PI, suggesting that the SSR marker developed in this study has moderate genetic variation. In addition, most of these markers show Ho < He (except CoTW495, CoTW556, CoTW559, CoTW598, CoTW424), which suggests that the population of C. taiwanensis is an inbred. A total of 15 sets of SSR marker used in this study deviated from HWE, which suggest the population may be not under the ideal status of HWE. The reason for this deviation could be artificial selection, non-panmixia or genetic drift 36 . Generally, EST-SSR markers are less polymorphic than gSSR in plants because of high conservation in transcribed regions 24 . Moreover, other factors 33,37 such as SSR motif type, sample size, population and species may also differentiate gSSR and EST-SSR markers. However in this study, in terms of polymorphism and cross-species transferability, there was no significant difference between gSSR and EST-SSR groups (Supplementary Sects. 2, 3), but the difference rather occurred among individual markers. This fact might be explained by polymorphism and detection limit as markers with higher PD are often selected for individual identification. Also in our study, the differences in polymorphism and cross-species transferability between gSSR and EST-SSR are not significant, but those among markers are significant. It might be because of the giant genome size in taxa Chamaecyparis (20.03-27.40 pg/2C) 38,39 which leads a deviation from random sampling in marker selection. When using the system to perform individual identification assay, a marker with higher PD should be considered as priority.
Probability of identity and power of discrimination analysis. Continued multiplication can be used to calculate the cumulative random probability of identity (CP I ) and the combined power of discrimination (CPD) for non-linked markers, where CP I is the probability of two individuals most likely the same genotype, CPD is the probability of individuals being identified, and CP I + CPD = 1. The credibility of the system is calculated based on "Random match probability in population size and confidence levels" published by Budowle et al. 43 : While applying the system in criminal cases, for the sake of objective and impartiality, practically the court will use the credibility of 95%, 99%, or 99.99% as aacceptance criteria 40 (Wall 2002, ISO ISO/IEC 17025). In this study, only one marker in the same linkage group is used for CP I analysis, and up to 30 markers can be continuously accumulated (Table 5). Also, the individual identification system's CP I is as small as 5.596 × 10 -12 , and the CPD is as high as 0.99999999994404 (extremely close to 1). Applying the court's strictest credibility standard of 99.99%, when the number of markers reached up to 30, the system can identify 18 million individuals, which actually exceed the whole C. taiwanensis population of 7.39 ± 0.73 million in Taiwan 41 . While applying to the lowest acceptable credibility standard of 95%, the system could identify at least 2300 plants with a minimum number of 6 markers (Table 5).
Aligning seized timbers to victim trees. In this case (MJIB-DNA-1080413 combine 1,080,328), we successfully matched 3 seized timbers back to 3 victim trees by using 19 pairs of non-linkage SSR markers (Fig. 2, Table 6, Supplementary Sect. 4). The credibility values of the 3 cases are all above 99.99%. In our experiments, DNA samples were extracted twice or more from each sample in order to optimize the DNA concentration. Since 2007, forestry researchers have noticed that molecular markers can be used to provide direct evidence linking stolen timber and victim trees 42 . Although many techniques have developed for extracting DNA from fresh and dried leaves (including published literature 43,44 and commercial reagents), yet few studies have reported on extracting DNA from dried wood, which is still considered the most challenging part in this field of research 45 46 . False negative result cannot be ruled out from over-concentrated sample and vice versa. Therefore, it is necessary to extract DNA two or more times for dry timber, as abovementioned, because its DNA extraction is challenging. Several studies suggested that the error rate increases along with PCR cycles 47,48 . Base misincorporation incurred by PCR occurs randomly throughout the sequence without hot spots 48 . The probability of base misincorporation is 1.85 × 10 -5 per base per cycle 48 . After comparing the results of positive and negative endpoint, we discovered 36 cycles is the upper limit which leads to positive PCR product without false-positive result. From comparing the results of positive and negative endpoint, we discovered 36 cycles is the upper limit which leads to positive PCR product without false-positive result. Therefore, the cycles were controlled below 35 cycles in our study, but not increasing cycles without limit. In addition, the SSR types of each marker were analyzed at least twice with ABI 3130XL. Signals below 150 RFU peak height threshold were considered not detectable. We developed a protocol of two sessions of instrumental operation and setup threshold value from pilot test result for illegal felling investigation cases. By comparing the profiles from positive and negative controls with test samples, we can obtain objective data with least erroneous possibility to conclude our investigation for court evidence. Thirty-seven victim trees were reported by Luodong Forest District Office in December 2018. According to census data, the crime scene forest area is 281.03 hectare and the density of C. taiwanensis is 16 ± 1.6 individuals/ hectare. However, in order to protect suspects' rights, we took an excess of the maximum possible population size: 10,000 into the calculation. Among 22 samples in this case, 7 succeeded in analysis, which is, by our definition, showing positive result in just 35 PCR cycles. The rest were denoted "Not detected" due to low positive PCR result (all tests comply the standard of accredited laboratory ISO/IEC 17025) or CL < 95%. It is worthwhile to note that seized illegally-felled timber 6TC matched the victim tree 6 TB (CP I = 3.342 × 10 -13 , CPD = 0.999999999999666, CL = 99.9999999%). In addition, seized illegally-felled timber 7TC matched with victim tree 7TA and 7 TB (CP I = 1.631 × 10 -13 , CPD = 0.999999999999837, CL = 99.9999999%), and seized illegally-felled timber 8 TB matched with victim tree 8TA (CP I = 4.468 × 10 -10 , CPD = 0.999999999553151, CL = 99.999532%). In this individual identification system test case (Table 6, Fig. 2, Supplementary Sect. 4), the minimal amount of matching marker was 17 among the positive-matched groups (CL = 99.999532%). The credibility increased along with the matching marker amount. The credibility is dependent on population size and matching marker amount. In addition, while aligning the evidence to the victim individuals, it is a common scenario that the sample DNA might have been degraded. Successful extraction is one of the crucial steps to identify same individual using DNA matching techniques. The extractable DNA in desiccated timber is low in quantity and poor in quality. The extracted DNA can only be used for individual identification using markers developed for specific species. In this regard, SSR marker is a traditional marker for individual identification, which has been widely used in human and gradually extended to other species. All the SSR markers designed in this study are shorter than 300 bp, which would be suitable for amplifying the lysed DNA fragments from desiccated timbers. Although SSR marker has the merits above mentioned, the overall success rate of DNA extraction and genotyping from timber is relatively low (31.81%, 7 out of 22 samples tested successfully). The low quantity and quality of DNA in timber sample might have limited our success rate. An improvement on DNA extraction method would enhance our success rate on timber samples. In addition, increasing the number of SSR markers capable of individual identification would decrease the overall CP I and increase CL. Overall, we provide scientific proof that can be used directly as court evidence in illegal felling cases. This is the first time study reporting the SSR individual identification system which could be applied to various precious species. A warning to forestall illegal felling is the most valuable impact of this study: DNA types of these precious trees have been filed. The illegal  www.nature.com/scientificreports/ felling crime rate is dropping after public propagation of cypress individual identification system. Moreover, the individual identification system would also provide certificate for legal timber trading 21 . This system would also deter dishonest businessman piggybacking illegal material in legal timber auction, which would further forestall illegal logging. In addition, these markers can be also used in population genetic analysis studies that facilitate the conservation and breeding of C. taiwanensis.

Conclusions
In this study, we developed an individual identification system for C. taiwanensis and provided the scientific evidence. This methodology can be adopted by the courts to link seized timber and victim trees. The C. taiwanensis individual identification system of this study includes 23 gSSR and 12 EST-SSR markers revealing polymorphism. When the 30 non-linkage markers were applied to C. taiwanensis identification, the lowest CP I was 5.596 × 10 -12 and the highest CPD was 0.999999999994404, which was sufficient to identify 18 million random samples of C. taiwanensis (CL = 99.99%). While applied in the criminal cases of C. taiwanensis illegal logging, this SSR marker system successfully matched five seized illegally-felled timbers to three victim trees with minimal 99.99% CL. To the best of our knowledge, this is the first time the SSR technology is being applied to provide molecular evidence for court conviction on C. taiwanensis illegal logging. Our study would provide not only the scientific evidence correlating seized timber and victim tree, but also could inherent unique serial number to identify every single C. taiwanensis timber. We demonstrated the feasibility of matching seized/ illegally-felled timber with victim tree by modern SSR technology, which would prevent illegal logging by warning the criminals that the woodland trees    Table 4. Genetic characterization of 12 polymorphic EST-SSR loci of Chamaecyparis taiwanensis. A number of alleles, Ho observed heterozygosity, He expected heterozygosity, PIC polymorphism information content or power of information content, PD power of discrimination, PE power of exclusion or probability of exclusion, P I probability of identity, PD is equal to 1 − P I , N number of individuals. *Highly significant from Hardy-Weinberg equilibrium (P < 0.001). www.nature.com/scientificreports/ could be identified on the basis of molecular level. Additionally, these markers can be also used in population genetic analysis studies that facilitate the conservation and breeding of C. taiwanensis.  Sequencing and analysis. Three DNA and one RNA libraries were sequenced using the Illumina MiSeq System (2 × 300 bp paired-end; Illumina, San Diego, California, USA) at Tri-I Biotech (New Taipei City, Taiwan). The raw reads were prescreened to remove adapter sequences and reads with greater than 0.1% error or with an www.nature.com/scientificreports/ average quality less than QV30. High-quality filtered DNA and cDNA reads were merged by CLC Genomics Workbench version 7.5 (QIAGENE, Aarhus, Denmark).

SSR screening and primer design.
SSRIT was applied to screen the gSSR and EST-SSR containing sequences from contigs. To design gSSR and EST-SSR primers, sequences with at least five di-, tri-, tetra-, penta-, and hexa-nucleotide repeats were selected using BatchPrimer3 52 , with optimized conditions set length at 18-23 bp, melting temperature 45-62 ℃, and a product size of 80-300 bp.

Aligning seized timbers to victim trees. Samples from five seized timbers of Taiwan Yilan District
Prosecutors Office, six illegally-felled timbers found at crime scene woodland and seven victim trees (Supplementary Sect. 4) were collected. Duplicates of a victim tree (7TA and 7TB) was sourced out in order to ensure the reproducibility of the identical SSR type in individual tree. Two grams of each sample was powdered in liquid nitrogen and the total genomic DNA was extracted following the protocol of VIOGENE plant DNA extraction kit (VIOGENE, New Taipei City, Taiwan). Nineteen non-linkage markers were selected for DNA typing. The sample succeeded in typing were further combined to the aforementioned database to calculation the CP I .