DNA databases of an important tropical timber tree species Shorea leprosula (Dipterocarpaceae) for forensic timber identification

Ng, Chin Hong; Ng, Kevin Kit Siong; Lee, Soon Leong; Zakaria, Nurul-Farhanah; Lee, Chai Ting; Tnah, Lee Hong

doi:10.1038/s41598-022-13697-x

Download PDF

Article
Open access
Published: 09 June 2022

DNA databases of an important tropical timber tree species Shorea leprosula (Dipterocarpaceae) for forensic timber identification

Chin Hong Ng¹,
Kevin Kit Siong Ng¹,
Soon Leong Lee¹,
Nurul-Farhanah Zakaria¹,
Chai Ting Lee¹ &
…
Lee Hong Tnah¹

Scientific Reports volume 12, Article number: 9546 (2022) Cite this article

3 Citations
1 Altmetric
Metrics details

Subjects

Abstract

International timber trade communities are increasingly demanding that timber in the wood supply chain be sourced from sustainably harvested forests and certified plantations. This is to combat illegal logging activities to prevent further depletion of our precious forests worldwide. Hence, timber tracking tools are important to support law enforcement officials in ensuring only sustainably harvested timbers are traded in the market. In this study, we developed chloroplast DNA (cpDNA) and simple sequence repeat (SSR) databases as tracking tools for an important tropical timber tree species, Shorea leprosula from Peninsular Malaysia. A total of 1410 individual trees were sampled from 44 natural populations throughout Peninsular Malaysia. Four cpDNA regions were used to generate a cpDNA haplotype database, resulting in a haplotype map comprising 22 unique haplotypes derived from 28 informative intraspecific variable sites. This cpDNA database can be used to trace the origin of an unknown log at the regional level. Ten SSR loci were used to develop the SSR allele frequency database. Bayesian cluster analysis divided the 44 populations into two genetic clusters corresponding to Region A and Region B. Based on conservativeness evaluation of the SSR databases for individual identification, the coancestry coefficients (θ) were adjusted to 0.1900 and 0.1500 for Region A and B, respectively. These databases are useful tools to complement existing timber tracking systems in ensuring only legally sourced timbers are allowed to enter the wood supply chain.

Genome skimming and exploration of DNA barcodes for Taiwan endemic cypresses

Article Open access 26 November 2020

The genetic legacy of fragmentation and overexploitation in the threatened medicinal African pepper-bark tree, Warburgia salutaris

Article Open access 12 November 2020

Genetic diversity and conservation of Siberian apricot (Prunus sibirica L.) based on microsatellite markers

Article Open access 11 July 2023

Introduction

Shorea leprosula Miq. (locally known as Meranti Tembaga) is a tropical tree species belonging to the Dipterocarpaceae family native to Southeast Asia, where it is widely distributed throughout the tropical rainforests of Peninsular Malaysia, Sumatra, and Borneo¹. It is commonly found in lowland and hill dipterocarp forests below 700 m elevation¹. It is classified as a near-threatened species under the International Union for Conservation of Nature (IUCN) Red List². The harvested wood is internationally traded under the Light Red Meranti timber group as a general utility timber for the production of furniture, panelling, flooring, and plywood³. Due to easy accessibility of the species from lowland forests, it is prone to be targeted for illegal logging. Illegal logging is a lucrative business, which is associated with a total global market value between USD30-157 billion annually⁴. To address the issue of illegal logging, consumer countries have developed measures to ban the import of illegally-logged timber by implementing legality verification systems; such as through the United States Lacey Act (2008), the European Union Timber Regulation (2010), Australia Illegal Logging Prohibition Act (2012), the Japanese Clean Wood Act (2017); and most recently, the United Kingdom Timber Regulation (2021). In addition, the United Nations Office on Drugs and Crime (UNODC) has also produced a guide on best practices for forensic timber identification, aiming to overcome the challenges posed by illegal logging and environmental degradation ⁵. Such a complex set of urgent issues need to be addressed as reports have shown that an area of forest approximately equivalent to the size of Austria (83,871 km²) disappears worldwide every year as the result of illegal logging⁶.

To prevent illegal timber from entering the wood supply chain, enforcement authorities use timber tracking methods to verify the legality of the harvested wood. The most commonly used methods include paper based documentation, painted identification marks and radio frequency identification (RFID) tags⁷. In comparison, methods based on inherent wood characteristics such as visual identification (wood anatomy), genetics (DNA barcoding, haplotype map, DNA profiling), and chemical methods (stable isotopes, mass spectrometry and near infrared spectroscopy) can provide more reliable forensic timber identification⁸. Each tool has its own strength and in combination they complement one another allowing authorities to overcome limitations of more traditional methods in species identification, geographic origin verification or linking illegal logs to the stumps of origin.

Genetic approaches have been used to determine the origin of wood samples from many important species, including Neobalanocarpus heimii^9,10, Gonystylus bancanus¹¹, Acer macrophyllum¹², Cedrela odorata¹³ and Chamaecyparis taiwanensis¹⁴. To develop timber tracking tools suitable for these species, researchers applied the principles of population genetics such as mutation, genetic drift, migration, adaptation, and speciation⁸. These methodologies utilise genetic material (genetic markers) common across groups of individuals to define populations for provenance testing or to define species for species identification⁸. During forensic timber identification, enforcement officers need to identify unknown samples at genus or species level correctly from the start, before further investigating geographic origin or individual identification. This process is commonly supported by wood anatomists through the examination of the internal structure of the specimen in comparison to reference materials¹⁵. In addition, such identifications can also be achieved using DNA barcoding technology based on nucleotide variation at specific gene regions¹⁶. One example of this is the CITES listed species with the genus Gonystylus which can be distinguished from other closely related species using a combination of genetic markers including internal transcribed spacer (ITS2), trnH-psbA intergenic spacer and trnL¹¹. Once the particular species is identified, we can track the geographic region of origin using a population identification database developed from cpDNA^9,17 or single nucleotide polymorphisms (SNP) markers¹³. If the suspected log can be traced back to a particular geographic region, we can use an individual identification database to link the log to the original stump^11,12. Subsequently, the confidence level of the match probability can be tested by calculation of random match probability between the log and stump.

The Forest Research Institute Malaysia (FRIM) has developed comprehensive DNA profiling databases for several important tropical timber species for timber tracking, namely N. heimii¹⁰, G. bancanus¹¹, S. platyclados¹⁸, Intsia palembanica¹⁹ and Aquilaria malaccensis²⁰. As an extension, this study aimed to develop tracking tools for S. leprosula in the context of forensic identification. Specifically, we utilised cpDNA markers to develop a haplotype database and SSR markers to establish an allele frequency database for this important tropical timber species. We can use the cpDNA haplotype database to infer the geographic origin of an unknown sample to the regional level. Subsequently, by using the SSR allele frequency database, we can calculate the random match probability to support the strength of evidence in cases where the suspect log matches the tree stump. This gives a new impetus for higher acceptance of evidence by the judge, which will improve the success rate of prosecutions of illegal logging perpetrators.

Results

cpDNA haplotype database

DNA sequencing of the choloroplast (cp) markers produced sequences of the following lengths: 573 bp (atpB-rbcL); 487 bp (petG-trnP); 500 bp (trnL1-trnL2); and 593 bp (psbM-trnD). Alignment of the 352 individuals from the 44 populations yielded a total 28 variable sites: 11 in the atpB-rbcL spacer, seven in both the petG-trnP and psbM-trnD spacers, and three in the trnL1-trnL2 spacer (Supplementary Table S1). Based on these 28 variable sites (21 base substitutions and 7 deletions) across the combined intergenic regions, a total of 22 unique haplotypes were found (Fig. 1a).

SSR allele frequency database

The reproducibility of SSR genotyping was confirmed by achieving consistent genotypes from five independent PCR amplifications on a single individual for each of the ten SSR loci. Individual bar plots from STRUCTURE analysis are presented in Fig. 1b. At the highest Delta K likelihood scores, the best representation of the data was K = 2 suggesting that the 44 populations in Peninsular Malaysia can be divided into two main genetic clusters: Region A and Region B. The first cluster, ‘Region A’ consists of 12 populations, namely SBadak, BPerangin, BEnggang, GJerai, RTelui, GInas, GBongsu, Belum, Piah, BHijau, Korbu and Bubu. The second cluster, ‘Region B’ consists of 32 populations, namely Behrang, Ampang, HGombak, HLangat, SLalang, PPanjang, Berembun, Angsi, Kenaboi, Triang, Pasoh, BSenggeh, GLedang, Krau, TNegara, Terenggun, SBetis, USat, CTongkat, HTerengganu, Jengai, AGading, Tekam, Beserah, Jengka, Lentang, Lesong, ERompin, GArong, Labis, AHitam and Panti. Similarly, the UPGMA dendrogram analysis also divided the 44 populations into two genetic clusters (Fig. 2) corresponding to Region A and B of the STRUCTURE result.

SSR allele frequency databases were established according to Region A and B, and characterized to evaluate the relative usefulness of each SSR marker in forensic investigation. The distribution of allele frequencies for each locus is listed in Table S2 (Region A database) and Table S3 (Region B database). Forensic parameters are shown in Table 1, with a total of 143 alleles and 174 alleles detected in the Region A and B databases, respectively. The observed (H_o) and expected (H_e) heterozygosity ranged from 0.3570 to 0.8346 and 0.4375 to 0.8795, respectively for populations in the Region A database; and ranged from 0.3298 to 0.8356 and 0.3469 to 0.8793, respectively for populations in the Region B database. The power of discrimination (PD) for the SSR loci ranged from 0.601 to 0.972 and 0.554 to 0.975, in Region A and B databases, respectively. The most discriminating locus was Sle605 in both the Region A (PD = 0.972) and Region B (PD = 0.975) databases. Minimum allele frequency was adjusted for alleles falling below the thresholds of 0.0066 (Region A) and 0.0024 (Region B).

Table 1 Genetic diversity and forensic variables (A: total number of alleles; H_o: observed heterozygosity; H_e: expected heterozygosity; PIC: polymorphic information content; HWE: Hardy–Weinberg equilibrium; MP: matching probability; PD: power of discrimination) for each the 10 SSR loci of Shorea leprosula in the Region A and B databases.

Full size table

Deviations from HWE were detected in four of the SSR loci for Region A (SleT11, SleT15, SleT17 and Sle465) and six SSR loci in Region B (SleT01, SleT11, SleT15, SleT17, SleT29 and SleT31). We evaluated these loci in each population independently to rule out the possible presence of null alleles. There were four populations in Region A (GJerai, RTelui, GBongsu and Piah) where a single one locus deviated from HWE; whereas there were eight populations in Region B (Behrang, HGombak, SLalang, Angsi, Klau, USat, Jengka and Panti) with a single locus and a single population (GLedang) with two loci that deviated from HWE (Table S4). Observed deviation from HWE was substantially lower in each population (either absence or not more than two loci) and thus it might be due to Wahlund effect caused by population substructuring in both Region A and B. Linkage disequilibrium (LD) testing was used to evaluate the independence of frequencies for all the SSR genotypes. A total of 13.3% and 28.9% of the 45 pairwise loci were found significant evidence of LD for Region A and B, respectively. Some of the loci might be linked as a result of population substructuring and inbreeding (inbreeding coefficient = 0.0822 [Peninsular Malaysia]). These results are in line with observations in real populations, where the assumption of completely random mating and zero migration required for HWE and LD are unlikely to be met, either in humans, animals or plants ^21,22,23.

Mean self-assignment, the proportion of individuals correctly assigned back to their population, was 45.9% and ranged from 14.3% (Kenaboi) to 81.3% (CTongkat) between population (Table 2). At the regional level, correct assignment rate of individuals to their region of origin was higher, 87.4% for Region A and 90.0% for Region B, (average of 88.7%).

Table 2 Self-assignment test outcomes for Shorea leprosula individuals at the population and regional levels.

Full size table

Conservativeness of the database

The coancestry coefficient (θ) for Peninsular Malaysia (0.0579) was higher than those of Region A (0.0454) and Region B (0.0500) (Table 3). A total of 4.54% and 5.00% of the genetic variability was distributed among populations within Region A and Region B, respectively. In terms of inbreeding coefficient (f), the value for the Region A database (f = 0.0892) was highest, followed by Peninsular Malaysia (f = 0.0822) and Region B (f = 0.0666). All the θ and f values were significantly greater than zero, demonstrated by the 95% confidence intervals not overlapping with zero. Both of the θ and f values were used to calculate the conservativeness of each database by testing the cognate database (P_origin) against the regional database (P_combined). The databases were non-conservative at the calculated θ value. In order for both the Region databases (A and B) to be conservative, the value of θ was adjusted from 0.0454 to 0.1900 for Region A and from 0.0500 to 0.1500 for Region B. For the Region A database, the most common SSR profile frequency is 2.69 × 10^–7 or 1 in 3.72 million and the rarest profile frequency is 1.84 × 10^–14 or 1 in 54.3 trillion. For the Region B database, the most common SSR profile frequency is 1.06 × 10^–7 or 1 in 9.43 million and the rarest profile frequency is 4.03 × 10^–16 or 1 in 2.48 quadrillion.

Table 3 Coancestry (θ) and inbreeding (f) coefficients for Shorea leprosula at each hierarchical level.

Full size table

Discussion

At the moment, the database is not accessible to the public. However, the public can contribute by reporting suspicious illegal logging activities to the relevant enforcement unit, so that actions can be taken and suspicious samples collected for forensic timber identification. In addressing forensic timber identification questions, we use a cpDNA haplotype database to investigate the geographic origin of a suspect log at a regional level. Subsequently, we use an SSR allele frequency database to narrow down the geographic origin to population level. After the population is identified, the state forest department’s enforcement officials can verify in their system if the area was permitted for logging activities. If that area is a forest reserve where no logging permit is being issued, efforts to locate potential stumps belonging to the sampled log can be initiated. Once potential stumps are found, we can try to link the log to the potential stumps by comparing their SSR DNA profiles. A random match probability between the log and the potential stumps can be established by using the SSR allele frequency database.

From the cpDNA haplotype database, haplotypes H1 and H2 were most prevalent in Peninsular Malaysia, with a frequency of 47.2% and 42.6%, respectively. The distribution of cpDNA haplotypes is overlaid by the division of populations in Region A and B as suggested by the STRUCTURE analysis. For Region A, haplotype H2 was found in all the populations, either in all the samples (BPerangin and BEnggang) or part of the samples. The less common haplotypes, H3 (1.7%) and H4 (1.1%) were also found in Region A. Overall, we observed haplotype H2 dominates the populations in this region. Whereas for Region B, 78% of populations carried haplotype H1 in all the samples (Terenggun, Krau, Behrang, HLangat, Berembun, Kenaboi, Angsi, Triang, Pasoh, BSenggeh, Labis and ERompin), with the exception of some populations which exhibited part of their samples carried haplotype H1 (TNegara, AGading, Tekam, Jengka, Beserah, Ampang, Lentang, HGombak, SLalang, Lesong, GLedang, PPanjang and AHitam). In addition, the less common haplotype H5 (1.1%) is found solely in this region. As a whole, haplotype H1 dominates the populations in Region B. Those haplotype H2 found in the populations of Region B might be due to the retention of ancestral polymorphism by the maternally inherited cpDNA marker²⁴. The remaining rare haplotypes, H6-H22, present in one or two individuals are endemic to certain populations, as shown in Fig. 1a.

Based on the cpDNA haplotypes, S. leprosula individuals from Peninsular Malaysia can be traced back to their geographical origin in either Region A or B. In forensic investigation, if the generated haplotype of an unknown log belongs to haplotype H3 or H4, we can postulate that it might have originated from Region A. Similarly, if haplotype H1 or H5 were detected, then Region B would be the most likely source of origin. However, based only on the cpDNA haplotype database, it is impossible to track an unknown log back to a specific population or forest reserve because forest reserve boundaries were defined according to political governance and thus may not necessarily reflect the distribution of natural populations of the species. It should be noted that some rare haplotypes might not be represented in the database, as it is impossible to collect all S. leprosula trees from every forest reserve in Peninsular Malaysia. We can include more sampling sites in the future to improve the comprehensiveness of the cpDNA haplotype database. Particularly, the inclusion of populations from other distributions such as Sumatra and Borneo could provide some insights on the evolutionary history and gene flow of the species due to isolation and separation by South China Sea between Peninsular Malaysia and Borneo as well as by Straits of Malacca between Peninsular Malaysia and Sumatra.

Once the geographical origin at regional level is ascertained, an assignment test based on the SSR allele frequency database can be used to trace the samples origin to population level. In this study, we observed low assignment rates to origin populations, which may be due to the weak genetic structure (θ = 0.058) observed in this species. The value of θ shows that only 5.8% of genetic variability was found distributed among populations, thus suggesting high genetic similarity. This θ value (0.058) was higher than I. palembanica (0.026)¹⁹ and S. platyclados (0.033)¹⁸ but lower than G. bancanus (0.067)¹¹, A. malaccensis (0.097)²⁰ and N. heimii (0.127)¹⁰. Previous study suggested that populations of S. leprosula sampled from Peninsular Malaysia were a continuous, connected forest in the past, particularly in the low inland forests²⁵. Continuous distribution would promote gene flow among populations through the sharing of a common gene pool, as shown by the common haplotypes H1 and H2 observed in the cpDNA population database. The current mean assignment rate at the population level is 45.90%, which is lower than those seen in other tropical species such as G. bancanus (54.80%)¹¹, I. palembanica (62.20%)¹⁹, S. platyclados (77.78%)¹⁸ and A. malaccensis (92.09%)²⁰. At the regional level, the mean assignment rate to region is 88.70%, which is higher than seen in I. palembanica (80.21%)¹⁹ but lower than A. malaccensis (94.96%)²⁰, S. platyclados (99.11%)¹⁸ and G. bancanus (100%)¹¹.

The identification of illegal logging sites can be achieved under two circumstances. Firstly, by utilising assignment tests based on the SSR allele frequency database to locate the original population for the suspected illegal log. Secondly, if the Forest Department has received report on illegal logging activities in a specified area. As such, with help from experienced foresters and local indigenous people who are familiar with their local forest area, it is possible to find and sample the potential stumps which potentially match the suspect log within the forest. Once potential stumps are found, a tissue sample can be collected for DNA testing following FRIM’s standard operating protocol on DNA forensics for plant species identification and wood tracking²⁶. If the suspect log shows a similar SSR profile to a particular stump, we can calculate a random match probability by using the SSR allele frequency database with corrected θ value. By considering both population substructuring and inbreeding coefficient, the adjusted θ value will increase the profile frequency but conversely, understating the weight of the DNA evidence against a defendant²⁷, should the matter be brought before the legal system. Random match probability is the reciprocal of profile frequency (1/profile frequency), representing the estimated frequency at which a particular SSR profile would be expected to occur in a population²¹. This will help to determine the probability of a match between an unknown log and its potential origin stump. The possible profile frequency based on the 10 SSR loci ranges between the profile frequency of the most common genotype which would be the least powerful in terms of differentiating between two unrelated individuals ²⁸, and the rarest theoretical profile. Based on the Region A database, the possible SSR profile frequencies range from 2.69 × 10^–7 to 1.84 × 10^–14, and for the Region B database, from 1.06 × 10^–7 to 4.03 × 10^–16. With such low profile frequencies, we can rule out the possibility of a random match between the DNA profiles of any log and stump²¹.

In this study, we obtained cpDNA fragment and SSR loci using high quality samples such as inner bark or leaf tissue preserved in liquid nitrogen. However, many seized woods or logs are usually have been dried or processed in practice. Thus, this may pose a challenge to extract sufficient and good quality DNA from dry wood for subsequent DNA analysis. To close the DNA extraction gap in S. leprosula, our future study is to develop a suitable DNA extraction method for dry wood and processed sample. The extracted DNA is then tested by PCR amplification on both cpDNA and SSR markers utilized in the DNA databases.

Conclusions

We report on the development of cpDNA haplotype and SSR allele frequency databases for an important timber species, S. leprosula in Peninsular Malaysia. The cpDNA haplotype database enables the tracing of unknown log at the regional level. The SSR allele frequency database was validated for specificity and accuracy for the calculation of random match probability of an unknown log to a potential origin stump. This database along with the existing reference databases in other important forest timber species will serve as an impetus and increase the use of DNA technology in illegal logging investigations and verification of legality in wood supply chains.

Methods

Sample collection and DNA extraction

In this study, 1,410 S. leprosula wild samples representing 44 populations from the natural forests distributed throughout Peninsular Malaysia (Table 4) were collected. The sample collection was carried out with the permissions granted from the State Forest Departments (Kedah, Perak, Kelantan, Terengganu, Pahang, Selangor, Negeri Sembilan, Melaka and Johor), the Department of Wildlife and National Parks, Royal Belum State Park and Johor National Parks Corporation. The voucher specimen was identified by Ramli Ponyoh and deposited in FRIM herbarium centre (voucher number = A4363). Cambium or leaf tissues was collected from each sample and kept in liquid nitrogen during transportation from the field to laboratory. Total genomic DNA was extracted using the 2× cetyltrimethylammonium bromide (CTAB)²⁹ procedure and purified using the High Pure PCR Template Preparation Kit (Roche Diagnostics, GmbH, Penzberg, Germany). For cpDNA analysis, eight purified DNA samples per population were used (8 × 44 populations = 352 samples), while all the purified DNA samples (1,410 samples) were used for SSR analysis.

Table 4 Names, geographic locations, altitude and sample number for 44 populations of Shorea leprosula in Peninsular Malaysia.

Full size table

cpDNA haplotype database

The target cpDNA regions were amplified using four intergenic spacer primers namely atpB-rbcL, petG-trnP, trnL1-trnL2 and psbM-trnD (Supplementary Table S5). These four cpDNA markers were selected based on their informative intraspecific variability identified in eight individuals from different populations of S. leprosula. Each reaction was performed in a 10 μL total volume composed of 1× Type-it Multiplex PCR Master Mix (Qiagen), 0.2 μM each primer and 10 ng of template DNA on a 2720 Thermal Cycler (Applied Biosystems, Foster City, CA). The thermal cycling conditions consist of a first activation step at 95 °C for 5 min, followed by 35 cycles of denaturation at 95 °C for 30 s, annealing at 50 °C for 90 s, and extension at 72 °C for 1 min; with a final extension step at 60 °C for 30 min. An ABI 3130xl capillary sequencer (Applied Biosystems) was used to sequence both forward and reverse directions of each PCR product. The sequence data was edited and analysed using Sequencher v.5.1 (Gene Codes Corporation, Michigan, USA). We identified chloroplast haplotypes by taking into account insertion/deletion, and substitution among sequences. A total of 32 sequences were deposited in GenBank with accession numbers ranging from MZ419000 to MZ419031.

SSR allele frequency database

We used ten SSR loci, namely SleT01, SleT09, SleT11, SleT15, SleT17, SleT29, SleT31 (unique to this study: Supplementary material S1), Sle267, Sle465 and Sle605³⁰ to genotype the 1410 individuals. Information on primer sequence, repeat motifs, allele size range, fluorescent label and GenBank accession number for the ten SSR markers are listed in supplementary Table S6. The PCR mix consists of 1× Type-it Multiplex PCR Master Mix (Qiagen), 0.4 μM for each primer and 10 ng of template DNA. The forward primer was fluorescently labelled and mixed with nonlabelled primer at a ratio of 1:10. PCR amplification was performed using the programme: activation step at 95 °C for 5 min, followed by 40 cycles of a denaturation step at 95 °C for 30 s, annealing at 55 °C for 90 s, and extension at 72 °C for 30 s; and a final extension at 72 °C for 45 min. PCR products were electrophoretically separated using an ABI 3130xl capillary sequencer (Applied Biosystems) with GeneScan 400HD ROX size standard (Applied Biosystems). Fragment sizes were determined using GeneMarker v2.6.4 software (Soft Genetics LLC, Pennsylvania, USA). To evaluate the reproducibility of all SSR loci, five independent PCR amplifications were performed on one individual³¹.

Statistical analysis

For cluster analysis, we used STRUCTURE v2.3.4 to run our dataset using a burn-in length of 100,000 and 200,000 steps for the Markov Chain Monte Carlo (MCMC)³². We applied models of admixture with sampling locations included as prior population information. Correlated allele frequencies were applied with K values ranging from 1 to 10 for 10 repetitions. The optimal number of genetic clusters was estimated based on the Delta K method³³ of STRUCTURE SELECTOR³⁴. For the optimal K, data from the 10 independent runs of STRUCTURE analyses were graphically represented using CLUMPAK³⁵. To support the analysis of the population structure, a UPGMA dendrogram was constructed based on Nei’s D_A using POPTREE2³⁶. 1000 bootstrap replicates were applied to determine the relative strength of the nodes.

The populations of S. leprosula were divided into two regions, Region A and Region B based on the optimal value of K = 2 derived above. Subsequently, the SSR database was built for Region A and B, comprising 381 (12 populations) and 1029 (32 populations) individuals, respectively. Allele frequency for each locus was calculated using Microsatellite Toolkit³⁷. The number of alleles per locus (A), observed (H_o) and expected heterozygosity (H_e), conformity to Hardy–Weinberg equilibrium (HWE) expectations and linkage disequilibrium (LD) between loci were assessed using Fisher’s exact tests in Genetic Data Analysis (GDA) v1.1 ³⁸. The p value for departure from HWE and LD was adjusted by Bonferroni correction³⁹. Forensic parameters including polymorphic information content (PIC), matching probability (MP) and power of discrimination (PD) were assessed using PowerStats v1.2⁴⁰. Coancestry (θ) and inbreeding (f) coefficients for the combined database (Peninsular Malaysia) and regional database (Region A and B) were calculated with 1000 bootstrap replicates in GDA⁴¹. Self-assignment tests were used to evaluate the proportion of correctly assigned individuals to the designated population and region as implemented in GENECLASS2⁴².

The subpopulation-cum-inbreeding model was used to calculate the profile frequency by multiplying the frequency of each locus across all the loci⁴³. The most common and rarest profile frequencies were calculated by considering an individual sample that is heterozygous at all loci possessing the two most common alleles and rarest alleles at each locus, respectively. The conservativeness of the database was estimated by calculating the full profile frequency of each individual using genotype frequencies derived from the cognate database (P_origin) against profile frequency of each individual using genotype frequencies derived from the regional database (P_combined). The relative difference between the databases (d) were defined as d = log₁₀ (P_origin/P_combined). If the d value was negative, in the case of P_origin was less than P_combined, it suggests that the database is conservative²⁷. For a non-conservative database, in the case of positive d value, a series of θ adjustments were applied to recalculate P_combined until all samples present a negative d value.

Plant collection declaration

We declare that all our experimental research and field sampling of plant material comply with local, national or international guidelines and legislation.

Data availability

Raw sequence information and SSR primer pairs have been deposited to NCBI; GenBank accession numbers are provided in Table S6.

References

Symington, C. F. Foresters’ Manual of Dipterocarps. 519 (Forest Research Institute Malaysia & Malaysian Nature Society, 2004).
Google Scholar
Pooma, R. & Newman, M. F. Shorea leprosula. The IUCN Red List of Threatened Species 2017: e.T33123A2833148. https://doi.org/10.2305/IUCN.UK.2017-3.RLTS.T33123A2833148.en. http://www.iucnredlist.org (2017).
Gan, K. S. & Lim, S. C. Common Commercial Timbers of Peninsular Malaysia (Forest Research Institute Malaysia, 2004).
Google Scholar
Montero, J. J. M., Wright, E. & Khan, M. N. Illegal Logging, Fishing, and Wildlife Trade: The Costs and How to Combat It (World Bank, 2019).
Google Scholar
UNODC. Best Practice Guide for Forensic Timber Identification. (United Nations Office on Drugs and Crime, 2016).
Magrath, W., Younger, P. & Phan, H. Chainsaw Project. An Interpol Perspective on Law Enforcement in Illegal Logging (The World Bank and Interpol, 2009).
Google Scholar
Dormontt, E. E. et al. Forensic timber identification: It’s time to integrate disciplines to combat illegal logging. Biol. Conserv. 191, 790–798. https://doi.org/10.1016/j.biocon.2015.06.038 (2015).
Article Google Scholar
Schmitz, N. et al. Overview of Current Practices in Data Analysis for Wood Identification. A Guide for the Different Timber Tracking Methods (Global Timber Tracking Network, GTTN secretariat, European Forest Institute and Thunen Institute, 2020).
Google Scholar
Tnah, L. H. et al. Geographical traceability of an important tropical timber (Neobalanocarpus heimii) inferred from chloroplast DNA. For. Ecol. Manag. 258, 1918–1923. https://doi.org/10.1016/j.foreco.2009.07.029 (2009).
Article Google Scholar
Tnah, L. H., Lee, S. L., Ng, K. K. S., Faridah, Q.-Z. & Faridah-Hanum, I. Forensic DNA profiling of tropical timber species in Peninsular Malaysia. For. Ecol. Manag. 259, 1436–1446. https://doi.org/10.1016/j.foreco.2010.01.017 (2010).
Article Google Scholar
Ng, K. K. S. et al. Forensic timber identification: A case study of a CITES listed species, Gonystylus bancanus (Thymelaeaceae). Forensic Sci. Int. Genet. 23, 197–209. https://doi.org/10.1016/j.fsigen.2016.05.002 (2016).
Article CAS PubMed Google Scholar
Dormontt, E. E. et al. Forensic validation of a SNP and INDEL panel for individualisation of timber from bigleaf maple (Acer macrophyllum Pursch). Forensic Sci. Int. Genet. 46, 102252. https://doi.org/10.1016/j.fsigen.2020.102252 (2020).
Article CAS PubMed Google Scholar
Finch, K. et al. Predicting the geographic origin of Spanish Cedar (Cedrela odorata L.) based on DNA variation. Conserv. Genet. https://doi.org/10.1007/s10592-020-01282-6 (2020).
Article Google Scholar
Huang, C.-J. et al. Development and technical application of SSR-based individual identification system for Chamaecyparis taiwanensis against illegal logging convictions. Sci. Rep. 10, 1–14 (2020).
Article Google Scholar
Carlquist, S. Comparative Wood Anatomy: Systematic, Ecological, and Evolutionary Aspects of Dicotyledon Wood (Springer, 2001).
Book Google Scholar
Kress, W. J. Plant DNA barcodes: Applications today and in the future. J. Syst. Evol. 55, 291–307 (2017).
Article Google Scholar
Deguilloux, M. F., Pemonge, M. H., Bertel, L., Kremer, A. & Petit, R. J. Checking the geographical origin of oak wood: Molecular and statistical tools. Mol. Ecol. 12, 1629–1636. https://doi.org/10.1046/j.1365-294X.2003.01836.x (2003).
Article CAS PubMed Google Scholar
Ng, C. H. et al. Geographic origin and individual assignment of Shorea platyclados (Dipterocarpaceae) for forensic identification. PLoS ONE 12, e0176158. https://doi.org/10.1371/journal.pone.0176158 (2017).
Article CAS PubMed PubMed Central Google Scholar
Ng, C. H. et al. A geographical traceability system for Merbau (Intsia palembanica Miq.), an important timber species from peninsular Malaysia. Forensic Sci. Int. Genet. https://doi.org/10.1016/j.fsigen.2019.102188 (2020).
Article PubMed Google Scholar
Lee, S. L. et al. DNA databases of a CITES listed species Aquilaria malaccensis (Thymelaeaceae) as the tracking tools for forensic identification and chain of custody certification. Forensic Sci. Int. Genet. https://doi.org/10.1016/j.fsigen.2021.102658 (2022)
Butler, J. M. Forensic DNA Typing: Biology, Technology, and Genetics of STR Markers (Elsevier Science, 2005).
Google Scholar
Waits, L. P., Luikart, G. & Taberlet, P. Estimating the probability of identity among genotypes in natural populations: Cautions and guidelines. Mol. Ecol. 10, 249–256 (2001).
Article CAS Google Scholar
Triggs, C. M. & Buckleton, J. S. Logical implications of applying the principles of population genetics to the interpretation of DNA profiling evidence. Forensic Sci. Int. 128, 108–114 (2002).
Article CAS Google Scholar
Zong, Y. et al. Chloroplast DNA-based genetic diversity and phylogeography of Pyrus betulaefolia (Rosaceae) in Northern China. Tree Genet. Genom. 10, 739–749. https://doi.org/10.1007/s11295-014-0718-0 (2014).
Article Google Scholar
Lee, S. L., Wickneswari, R., Mahani, M. C. & Zakri, A. H. Genetic diversity of a tropical tree species, Shorea leprosula Miq. (Dipterocarpaceae), in Malaysia: Implications for conservation of genetic resources and tree improvement. Biotropica 32, 213–224 (2000).
Article Google Scholar
Lee, S. L. et al. Standard Operating Procedure on DNA Forensics for Plant Species Identification and Wood Tracking. Version 1.5 edn. (Forest Research Institute Malaysia, 2015).
Gill, P., Foreman, L., Buckleton, J. S., Triggs, C. M. & Allen, H. A comparison of adjustment methods to test the robustness of an STR DNA database comprised of 24 European populations. Forensic Sci. Int. 131, 184–196. https://doi.org/10.1016/S0379-0738(02)00423-1 (2003).
Article CAS PubMed Google Scholar
Edwards, A., Hammond, H. A., Jin, L., Caskey, C. T. & Chakraborty, R. Genetic variation at five trimeric and tetrameric tandem repeat loci in four human population groups. Genomics 12, 241–253. https://doi.org/10.1016/0888-7543(92)90371-X (1992).
Article CAS PubMed Google Scholar
Murray, M. & Thompson, W. F. Rapid isolation of high molecular weight plant DNA. Nucleic Acids Res. 8, 4321–4325 (1980).
Article CAS Google Scholar
Lee, S., Tani, N., Ng, K. & Tsumura, Y. Isolation and characterization of 20 microsatellite loci for an important tropical tree Shorea leprosula (Dipterocarpaceae) and their applicability to S. parvifolia. Mol. Ecol. Notes 4, 222-225. (2004).
Article CAS Google Scholar
Dawnay, N. et al. A forensic STR profiling system for the Eurasian badger: A framework for developing profiling systems for wildlife species. Forensic Sci. Int. Genet. 2, 47–53. https://doi.org/10.1016/j.fsigen.2007.08.006 (2008).
Article PubMed Google Scholar
Pritchard, J. K., Stephens, M. & Donnelly, P. Inference of population structure using multilocus genotype data. Genetics 155, 945–959 (2000).
Article CAS Google Scholar
Evanno, G., Regnaut, S. & Goudet, J. Detecting the number of clusters of individuals using the software structure: A simulation study. Mol. Ecol. 14, 2611–2620. https://doi.org/10.1111/j.1365-294X.2005.02553.x (2005).
Article CAS PubMed Google Scholar
Li, Y.-L. & Liu, J.-X. StructureSelector: A web-based software to select and visualize the optimal number of clusters using multiple methods. Mol. Ecol. Resour. 18, 176–177. https://doi.org/10.1111/1755-0998.12719 (2018).
Article PubMed Google Scholar
Kopelman, N. M., Mayzel, J., Jakobsson, M., Rosenberg, N. A. & Mayrose, I. Clumpak: A program for identifying clustering modes and packaging population structure inferences across K. Mol. Ecol. Resour. 15, 1179–1191. https://doi.org/10.1111/1755-0998.12387 (2015).
Article CAS PubMed PubMed Central Google Scholar
Takezaki, N., Nei, M. & Tamura, K. POPTREEW: Web version of POPTREE for constructing population trees from allele frequency data and computing some other quantities. Mol. Biol. Evol. 31, 1622–1624 (2014).
Article CAS Google Scholar
Park, S. Excel Microsatellite Toolkit. Version 3.1. 1. Animal Genomics Lab Website. (University College, 2008).
Lewis, P. O. & Zaykin, D. Genetic Data Analysis (GDA): Computer Program for the Analysis of Allelic Data. Version 1.1. http://phylogeny.uconn.edu/software/ (University of Connecticut, 2001).
Rice, W. R. Analyzing tables of statistical tests. Evolution 43, 223–225 (1989).
Article Google Scholar
Tereba, A. Tools for analysis of population statistics. Profiles DNA 2, 14–16 (1999).
Google Scholar
Weir, B. S. & Cockerham, C. C. Estimating F statistics for the analysis of population structure. Evolution 38, 1358–1370 (1984).
CAS PubMed Google Scholar
Piry, S. et al. GENECLASS2: A software for genetic assignment and first-generation migrant detection. J. Hered. 95, 536–539 (2004).
Article CAS Google Scholar
Ayres, K. L. & Overall, A. D. J. Allowing for within-subpopulation inbreeding in forensic match probabilities. Forensic Sci. Int. 103, 207–216. https://doi.org/10.1016/S0379-0738(99)00087-0 (1999).
Article Google Scholar

Download references

Acknowledgements

This study was supported by the Government of Malaysia under the 11th Malaysia Plan 2016-2020 and 12th Malaysia Plan 2021–2025. We acknowledge the State Forest Departments (Kedah, Perak, Kelantan, Terengganu, Pahang, Selangor, Negeri Sembilan, Melaka and Johor), the Department of Wildlife and National Parks, Royal Belum State Park and Johor National Parks Corporation in granting permissions for sample collection and logistic assistance in the forest. The authors thank Ghazali Jaafar, Yahya Marhani, Ramli Ponyoh, Yasri Baya and Sharifah Talib for excellent assistance in the field and laboratory.

Author information

Authors and Affiliations

Genetics Laboratory, Forest Research Institute Malaysia, 52109, Kepong, Selangor, Malaysia
Chin Hong Ng, Kevin Kit Siong Ng, Soon Leong Lee, Nurul-Farhanah Zakaria, Chai Ting Lee & Lee Hong Tnah

Authors

Chin Hong Ng
View author publications
You can also search for this author in PubMed Google Scholar
Kevin Kit Siong Ng
View author publications
You can also search for this author in PubMed Google Scholar
Soon Leong Lee
View author publications
You can also search for this author in PubMed Google Scholar
Nurul-Farhanah Zakaria
View author publications
You can also search for this author in PubMed Google Scholar
Chai Ting Lee
View author publications
You can also search for this author in PubMed Google Scholar
Lee Hong Tnah
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

C.H.N., K.K.S.N. and S.L.L. conceptualizing the work and study design; C.H.N., K.K.S.N., S.L.L., N.F.Z., C.T.L. and L.H.T. contributed to the experimental work and data analysis. All authors reviewed the manuscript.

Corresponding author

Correspondence to Chin Hong Ng.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Ng, C.H., Ng, K.K.S., Lee, S.L. et al. DNA databases of an important tropical timber tree species Shorea leprosula (Dipterocarpaceae) for forensic timber identification. Sci Rep 12, 9546 (2022). https://doi.org/10.1038/s41598-022-13697-x

Download citation

Received: 01 April 2022
Accepted: 26 May 2022
Published: 09 June 2022
DOI: https://doi.org/10.1038/s41598-022-13697-x

This article is cited by

Genetic diversity and population structure of Rubroshorea curtisii, a hill dipterocarp species: implications for conservation
- Chin Hong Ng
- Kevin Kit Siong Ng
- Khairunnisa M. Mamat
Tree Genetics & Genomes (2024)
Tracing the geographic origin of planted tropical timber Neobalanocarpus heimii (chengal) with DNA approach
- Lee Hong Tnah
- Chai Ting Lee
- Soon Leong Lee
Conservation Genetics Resources (2022)

Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.

Subjects

Abstract

Similar content being viewed by others

Genome skimming and exploration of DNA barcodes for Taiwan endemic cypresses

The genetic legacy of fragmentation and overexploitation in the threatened medicinal African pepper-bark tree, Warburgia salutaris

Genetic diversity and conservation of Siberian apricot (Prunus sibirica L.) based on microsatellite markers

Introduction

Results

cpDNA haplotype database

SSR allele frequency database

Conservativeness of the database

Discussion

Conclusions

Methods

Sample collection and DNA extraction

cpDNA haplotype database

SSR allele frequency database

Statistical analysis

Plant collection declaration

Data availability

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Competing interests

Additional information

Publisher's note

Supplementary Information

Supplementary Information.

Rights and permissions

About this article

Cite this article

Share this article

This article is cited by

Genetic diversity and population structure of Rubroshorea curtisii, a hill dipterocarp species: implications for conservation

Tracing the geographic origin of planted tropical timber Neobalanocarpus heimii (chengal) with DNA approach

Comments

Search

Quick links