A DNA barcode library for woody plants in tropical and subtropical China

Jin, Lu; Shi, Hao-You; Li, Ting; Zhao, Nan; Xu, Yong; Xiao, Tian-Wen; Song, Feng; Ma, Chen-Xin; Li, Qiao-Ming; Lin, Lu-Xiang; Shao, Xiao-Na; Li, Bu-Hang; Mi, Xiang-Cheng; Ren, Hai-Bao; Qiao, Xiu-Juan; Lian, Ju-Yu; Du, Hu; Ge, Xue-Jun

doi:10.1038/s41597-023-02742-7

Data Descriptor
Open access
Published: 22 November 2023

Lu Jin¹^na1,
Hao-You Shi²^na1,
Ting Li³,
Nan Zhao⁴,
Yong Xu⁵,
Tian-Wen Xiao¹,
Feng Song ORCID: orcid.org/0000-0002-1332-312X⁶,
Chen-Xin Ma¹,
Qiao-Ming Li⁷,
Lu-Xiang Lin⁷,
Xiao-Na Shao⁷,
Bu-Hang Li⁸,
Xiang-Cheng Mi ORCID: orcid.org/0000-0002-2971-5881⁹,
Hai-Bao Ren⁹,
Xiu-Juan Qiao^10,11,
Ju-Yu Lian^1,12,
Hu Du¹³ &
Xue-Jun Ge¹

Scientific Data volume 10, Article number: 819 (2023)

Abstract

The application of DNA barcoding has been significantly limited by the scarcity of reliable specimens and inadequate coverage and replication across all species. The deficiency of DNA barcode reference coverage is particularly striking for highly biodiverse subtropical and tropical regions. In this study, we present a comprehensive barcode library for woody plants in tropical and subtropical China. Our dataset includes a standard barcode library comprising the four most widely used barcodes (rbcL, matK, ITS, and ITS2) for 2,520 species from 4,654 samples across 49 orders, 144 families, and 683 genera, along with 79 samples identified at the genus level. This dataset also provides a super-barcode library consisting of 1,239 samples from 1,139 species, 411 genera, 113 families, and 40 orders. This newly developed library will serve as a valuable resource for DNA barcoding research in tropical and subtropical China and bordering countries, enable more accurate species identification, and contribute to the conservation and management of tropical and subtropical forests.

Background & Summary

Accurate species identification is crucial for biological research, particularly in the areas of biodiversity conservation and utilization. However, traditional morphology-based identification has significant limitations, including incorrect identifications, unrecognized cryptic species, the absence of diagnostic characters in specific developmental stages, and the need for specialized expertise¹. Moreover, woody plant identification in tropical or subtropical regions poses a formidable challenge owing to the lack of access to reproductive organs necessary to differentiate similar species during field surveys². To address these challenges, DNA barcoding has emerged as a powerful tool that can help circumvent the limitations of morphological identification^1,3.

DNA barcodes are short standardized sequences that can be used to identify species based on materials from the entire organism, fragmented tissue, or even environmental DNA⁴. However, while cytochrome c oxidase subunit 1 (CO1) performs well universally for animals, it is not appropriate for plants owing to the lower rates of divergence in plant compared to animal mitochondrial genomes³. The plant working group of the Consortium for the Barcoding of Life (CBOL) has recommended rbcL and matK as core barcodes for land plants after comparing the performance of 7 candidate plastid loci⁵. Further, the internal transcribed spacer (ITS) or ITS2 has been reported to have the highest degree of species discrimination for seed plants⁶. Based on these findings, plastid (rbcL and matK) and nuclear fragments (ITS/ITS2) have been widely used as standard DNA barcodes for plants.

Despite their wide application, standard DNA barcodes have insufficient variation, which limits their usefulness in identifying recently diverged and rapidly radiated groups^7,8. To address this issue, the use of whole plastid genomes as super-barcodes has been proposed^9,10,11. Ranging from 110 to 160 kbp, the plastid genome can provide more variation than standard DNA barcodes to distinguish closely related species, thus improving phylogenetic resolution at lower taxonomic levels in plant phylogenetic and population genetic analyses^11,12. Genome skimming, a low-coverage shotgun sequencing approach, has been applied widely to obtain complete plastid genomes and high-copy nuclear ribosomal sequences (nrDNA)^13,14,15,16. This method recovers all plastid loci and ITS simultaneously, which overcomes problems of low PCR efficiency and sequence retrieval for the standard barcode sequences, and contributes to the reference database for standard barcodes¹⁰.

The tropical and subtropical moist biomes of all continents have the highest tree species richness, with southeast Asia being one of the most diverse regions¹⁷. Within this region, China has exceptionally high biodiversity and endemism. The region of South-Central China is recognized as a hotspot for biodiversity but has experienced significant loss of habitats due to human activities¹⁸. According to the Atlas of Woody Plants in China¹⁹, there are 11,405 woody species in China, of which 244 (2.1%) are gymnosperms, 10,480 (91.9%) are dicots, and 664 (5.8%) are monocots. Woody plant species richness in China is concentrated primarily in the southern mountainous regions, which are dominated by subtropical evergreen broad-leaved and tropical monsoon rain forests. These regions include the south and southeast areas of Yunnan, mountains at the borders of Guangxi and Yunnan, and the Hengduan, Wuyi, Nanling, and Wuling Mountains.

Here, we developed a comprehensive barcode library that includes both standard barcodes and super-barcodes for woody plants in tropical and subtropical China. The standard barcode library contains the four most widely used barcodes (rbcL, matK, ITS, and ITS2) for 2,520 species from 4,733 samples across 49 orders, 144 families, and 683 genera, and includes 79 samples identified to the genus level, while the super-barcode library consists of 1,239 samples from 1,139 species, 411 genera, 113 families, and 40 orders. Our library generated 5,937 novel standard barcode sequences for 1,696 species and 262 new plastid genome sequences for 258 species that will enrich the current barcode database for woody plants in subtropical and tropical China. This barcode library represents a valuable resource for taxonomic identification, ecological and evolutionary research, and biodiversity conservation in subtropical and tropical China and bordering countries. Furthermore, by integrating this DNA barcode library with other datasets, such as datasets containing functional traits²⁰ and geographic distribution information¹⁹, we can expand our comprehension of the evolutionary history and temporal dynamics of the flora in this region, which will provide valuable insights for conservation efforts in the face of global climate change²¹.

Methods

Sample collection and identification

To create a comprehensive library of standard barcodes and super-barcodes for woody plants in subtropical and tropical China, we conducted fieldwork in 11 provinces and 29 cities, representing a significant proportion of the plant diversity of tropical and subtropical China (Fig. 1, Table S1). The scientific names of species in our dataset were standardized with reference to The Plant List (http://www.theplantlist.org/) using the ‘status’ function of the R package ‘plantlist’ version 0.7.2²² and the Flora of China. For each species, one to nine individuals were sampled, and fresh leaf material was dried in silica gel for subsequent DNA extraction. Voucher specimens were identified by professional taxonomists using morphological characters and were deposited in the herbarium of the South China Botanical Garden (IBSC).

DNA extraction, sequencing, and assembly

Total genomic DNA was isolated from silica-dried leaf tissue using the cetyltrimethylammonium bromide (CTAB) method²³. rbcL was amplified with one universal primer set (rbcLa-F/-R). Because matK had a low amplification success rate, three primer pairs were used (Kim_3F/1R, xF/5r, and Gym_F1A/R1A), of which Gym_F1A/R1A was used for gymnosperms²⁴. For the ITS marker, two primer pairs (ITS-Leu/4 and ITS5/4) were used initially; samples that failed to amplify were re-amplified for ITS2 with one universal primer set (ITS2_S2F/S3R). Each 25 μl PCR reaction mixture included 2.5 μl 10× PCR buffer, 2 μl dNTPs (2.5 mM), 0.5 μl of each primer (10 μM), 0.2 μl rTaq (5 U/μl), 0.5 μl DNA template, and 18.8 μl ddH₂O. Mg²⁺ (5%) or dimethyl sulfoxide (DMSO) (5%) was added to improve the sequence recovery success rate of matK and ITS/ITS2. Mg²⁺ can act as a cofactor during polymerization²⁵, and DMSO can resolve secondary DNA structures by binding the major and minor grooves of DNA strands²⁶. The primers and their references are listed in Table S2. All PCR products were sequenced using the Sanger method on an ABI 3730 DNA analyzer. All original trace files were assembled and checked using Geneious v11.0.2²⁷.
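The reaction recipe can be sanity-checked and scaled with a few lines of arithmetic. The following is a minimal sketch, not the authors' protocol: it takes the per-reaction reagent volumes from the text (assuming a 0.5 μl template volume), derives the water volume needed to reach 25 μl, and scales a master mix for a batch. The function names and the 10% pipetting overage are assumptions made for illustration.

```python
from decimal import Decimal

# Per-reaction volumes in microlitres, taken from the text (water excluded).
# The 0.5 ul template volume is the value assumed for this sketch.
REAGENTS_UL = {
    "10x PCR buffer": Decimal("2.5"),
    "dNTPs (2.5 mM)": Decimal("2"),
    "forward primer (10 uM)": Decimal("0.5"),
    "reverse primer (10 uM)": Decimal("0.5"),
    "rTaq (5 U/ul)": Decimal("0.2"),
    "DNA template": Decimal("0.5"),
}
TOTAL_UL = Decimal("25")


def water_volume(reagents=REAGENTS_UL, total=TOTAL_UL):
    """Return the ddH2O volume that brings one reaction up to `total`."""
    water = total - sum(reagents.values())
    if water < 0:
        raise ValueError("reagents exceed the total reaction volume")
    return water


def master_mix(n_reactions, overage=Decimal("0.10")):
    """Scale every component except the template for a batch of reactions.

    The 10% overage is a hypothetical allowance for pipetting loss.
    """
    factor = Decimal(n_reactions) * (1 + overage)
    mix = {name: vol * factor
           for name, vol in REAGENTS_UL.items()
           if name != "DNA template"}
    mix["ddH2O"] = water_volume() * factor
    return mix


print(water_volume())
for name, vol in master_mix(10).items():
    print(f"{name}: {vol} ul")
```

Decimal arithmetic is used so that volumes such as 0.2 μl add up exactly; the derived water volume agrees with the 18.8 μl stated in the recipe.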

For the samples collected for super-barcodes, we implemented the genome-skimming method to acquire complete plastid genome and nrDNA sequences. DNA extracts were sent to Beijing Genomics Institute (BGI, Shenzhen, China) for library preparation and genome-skimming sequencing. Following the MGIEasy Universal DNA Library Prep Set user manual v.1.0 (MGI Tech, https://en.mgi-tech.com/download/files.html), the DNA extracts were sheared into 300 to 500 bp fragments for library construction. Paired-end sequencing (2 × 150 bp) was performed on the Illumina HiSeq X Ten platform at BGI. Phred quality scores and %GC content of raw reads were determined using FastQC 0.11.5 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Subsequently, low-quality reads and adapters were removed using Trimmomatic v.0.35²⁸, generating approximately 2–3 Gb of clean read data for each sample. The plastid genomes and nrDNA were de novo assembled from the clean read data using GetOrganelle v1.7.6²⁹. Then, the plastomes were annotated using DOGMA³⁰ and GeSeq³¹, with start and stop codons manually adjusted in Geneious v11.0.2. The nrDNAs were annotated using Geneious v11.0.2 as well. For subsequent barcode analyses, plastid markers (rbcL and matK) were extracted from plastomes, while ITS/ITS2 were extracted from nrDNA.

Data verification

All the sequences obtained by Sanger sequencing were verified by using the BLASTn tool. If query sequences with top hits were from the same species or genus as the submitted sequences, they were retained for further analyses²⁸. Sequences with conflicts between the search outcomes and taxonomic identification were examined carefully to determine whether there was contamination (e.g., mixed ITS sequences of insects and fungi), incorrect sequencing (e.g., mix-up of DNA samples), or incorrect identification (i.e., a mismatch between sequence Blast results and specimen identification). The contaminated or incorrect sequences were excluded from further analyses, while the samples with incorrect identifications were re-identified by taxonomic experts. However, 79 samples were not identified with certainty and thus were not included in further analyses. To minimize the impact of missing data, we only included species with samples from at least two individuals in our subsequent analyses.
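The retention rule for BLASTn results can be expressed as a small decision function. This is a hedged sketch of the logic described above, not the authors' script: it assumes the top-hit taxon has already been parsed from the BLAST output as a "Genus species" string, and the function names and return labels are hypothetical.

```python
def genus_of(name):
    """Return the genus (first word) of a 'Genus species' binomial."""
    return name.strip().split()[0].lower()


def check_top_hit(specimen_name, top_hit_name):
    """Compare the specimen identification with the BLASTn top-hit taxon.

    Returns 'same species' or 'same genus' (sequence retained), or
    'conflict' (flag for manual review: possible contamination,
    sample mix-up, or misidentification).
    """
    specimen = " ".join(specimen_name.lower().split())
    hit = " ".join(top_hit_name.lower().split())
    if specimen == hit:
        return "same species"
    if genus_of(specimen) == genus_of(hit):
        return "same genus"
    return "conflict"


def is_retained(specimen_name, top_hit_name):
    """A sequence is kept when the top hit matches at species or genus level."""
    return check_top_hit(specimen_name, top_hit_name) != "conflict"


# Illustrative names only; these are not records from the dataset.
print(check_top_hit("Castanopsis fissa", "Castanopsis fissa"))
print(check_top_hit("Castanopsis fissa", "Castanopsis hystrix"))
print(check_top_hit("Castanopsis fissa", "Aspergillus niger"))
```

A "conflict" result does not by itself say which of the three error types occurred; that distinction still requires the manual examination described in the text.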

We used three common methods to assess the discriminatory power of the four standard barcodes. First, genetic distances were used to identify the presence of “barcode gaps”, which occur when minimum inter-specific genetic distances are higher than maximum intra-specific genetic distances³². Following the methods of Gill et al.³³, the uncorrected intra- and inter-specific genetic distances were calculated for each barcode separately and for their combinations with the function ‘DistanceMatrix’ in DECIPHER³⁴. Second, we used TaxonDNA v1.8³⁵ to perform identification based on genetic distances. For the “Best match” (BM) approach, an identification was considered correct if the query and its closest sequence matches were from the same species, and incorrect if the closest matches were from a different species. Queries whose closest matches belonged to multiple species were considered ambiguous. For the “Best close match” (BCM) method, a threshold was set at the distance below which 95% of all intra-specific distances fall³⁵. Queries without any sequence match below the threshold were considered unidentified, while correct, ambiguous, and incorrect identifications were defined as for the BM method. Third, we used a tree-based method, in which a species whose samples clustered in a monophyletic group was considered successfully resolved. We aligned the standard barcode sequences using MAFFT v7.4³⁶ and adjusted them manually in Geneious v11.0.2. Alignment of rbcL and matK was performed with default parameters. ITS and ITS2 were aligned by family and the alignments were then concatenated. The gymnosperm sequences were removed to avoid inaccuracy of ITS alignment caused by the higher variation of internal transcribed spacer-1 (ITS1) in these species. We constructed maximum-likelihood (ML) trees for each marker and their combinations using RAxML 8.2.12³⁷ under the GTRGAMMA model. Node supports were evaluated with 1,000 bootstrap replicates, and monophyletic clades with support greater than or equal to 50% were defined as successful identifications³⁸.
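The two distance-based criteria can be illustrated on a toy alignment. The sketch below is a simplified stand-in written for this purpose; it is neither DECIPHER nor TaxonDNA, and the function names, the treatment of gaps (sites with a gap or ambiguous base are skipped), and the nearest-rank percentile are assumptions. It computes uncorrected p-distances, tests each species for a barcode gap, derives a 95% intra-specific threshold, and classifies a query under the best match and best close match rules as described in the text.

```python
from itertools import combinations
from math import ceil


def p_distance(a, b):
    """Uncorrected p-distance: share of differing sites among sites where
    both aligned sequences carry an unambiguous base."""
    sites = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in sites) / len(sites)


def intra_inter(records):
    """records: list of (species, aligned_sequence).
    Returns (max intra-specific, min inter-specific) distances per species."""
    intra, inter = {}, {}
    for (sp1, s1), (sp2, s2) in combinations(records, 2):
        d = p_distance(s1, s2)
        if sp1 == sp2:
            intra[sp1] = max(intra.get(sp1, 0.0), d)
        else:
            for sp in (sp1, sp2):
                inter[sp] = min(inter.get(sp, 1.0), d)
    return intra, inter


def barcode_gap(records):
    """A species shows a gap when its minimum inter-specific distance
    exceeds its maximum intra-specific distance."""
    intra, inter = intra_inter(records)
    return {sp: inter[sp] > intra[sp] for sp in intra if sp in inter}


def bcm_threshold(records, share=0.95):
    """Nearest-rank cut-off below which `share` of intra-specific distances fall."""
    dists = sorted(p_distance(s1, s2)
                   for (sp1, s1), (sp2, s2) in combinations(records, 2)
                   if sp1 == sp2)
    return dists[ceil(share * len(dists)) - 1]


def identify(true_species, query, references, threshold=None):
    """Best match (threshold=None) or best close match (threshold given)."""
    scored = [(p_distance(query, seq), sp) for sp, seq in references]
    best = min(d for d, _ in scored)
    if threshold is not None and best > threshold:
        return "unidentified"
    hits = {sp for d, sp in scored if d == best}
    if hits == {true_species}:
        return "correct"
    return "ambiguous" if len(hits) > 1 else "incorrect"


# Toy data: two species, two individuals each (made-up sequences).
records = [("A", "ACGTACGTAC"), ("A", "ACGTACGTAT"),
           ("B", "ACGTTCGGAC"), ("B", "ACGTTCGGAC")]
print(barcode_gap(records))
print(identify("A", "ACGTACGTAC", records[1:]))
print(identify("A", "ACGTACGTAC", records[1:], threshold=bcm_threshold(records)))
```

Each query is scored against all other sequences, which is why the usage lines pass `records[1:]` as the reference set for the first record. Requiring at least two individuals per species, as the authors did, is what makes the intra-specific distances defined at all.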

To confirm that the super-barcode gives higher phylogenetic resolution than the standard barcodes, we evaluated the node supports of the plastid genome tree. We extracted all protein-coding genes from the assembled plastid genomes using a Python script (https://github.com/Kinggerm/PersonalUtilities). A total of 78 genes that occurred most frequently across all species were selected to construct a plastid genome tree. Sequences were aligned with MAFFT v7.4³⁶ for each locus and then concatenated to generate a supermatrix. Model selection was performed using jModelTest v2.0³⁹, and the maximum-likelihood tree was constructed under the best model, GTRGAMMA, with RAxML 8.2.12³⁷. Node supports were evaluated with 1,000 bootstrap replicates. As we had fewer replicated samples for super-barcodes, we did not test the ability of super-barcodes to discriminate closely related species.
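The concatenation step can be sketched as follows. This is an illustrative implementation, not the script the authors used: it assumes each locus alignment is held as a dictionary mapping taxon name to aligned sequence, pads taxa that lack a locus with gap characters, and returns 1-based partition coordinates. The function name and data layout are hypothetical.

```python
def concatenate(alignments, missing="-"):
    """Join per-locus alignments into a supermatrix.

    alignments: list of dicts {taxon: aligned_sequence}, one dict per locus.
    Returns (supermatrix, partitions), where supermatrix maps each taxon to
    its concatenated sequence and partitions lists the 1-based (start, end)
    columns occupied by each locus.
    """
    taxa = sorted(set().union(*alignments))
    pieces = {taxon: [] for taxon in taxa}
    partitions, start = [], 1
    for locus in alignments:
        lengths = {len(seq) for seq in locus.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within a locus must have equal length")
        n = lengths.pop()
        for taxon in taxa:
            # Taxa without this locus are filled with missing-data characters.
            pieces[taxon].append(locus.get(taxon, missing * n))
        partitions.append((start, start + n - 1))
        start += n
    supermatrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return supermatrix, partitions


# Toy example with made-up sequences for two loci.
loci = [
    {"sp1": "ATG", "sp2": "ATA"},
    {"sp1": "CC"},
]
matrix, parts = concatenate(loci)
print(matrix)
print(parts)
```

Padding absent loci keeps every row the same length, which phylogenetic programs require; selecting genes present in most species, as described above, limits how much of the matrix consists of such padding.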

Data Records

All standard and super-barcode sequences, sequence records, and specimen pictures from this study are stored at Figshare⁴⁰. The raw read data for all newly generated plastid genomes in this study have been deposited in the NCBI Sequence Read Archive (SRA) database under the accession numbers SRX22362678⁴¹-SRX22362939⁴². We newly generated standard barcodes for 1,696 species from 2,524 individuals across 48 orders, 130 families, and 547 genera. In addition, we identified 79 samples at the genus level. We also incorporated partial standard barcode data from our previous study on Dinghushan National Nature Reserve⁴³, which included 517 woody species from 969 samples. Furthermore, we extracted rbcL, matK, and ITS/ITS2 from our genome-skimming dataset (plastid genomes and nrDNA; see below). Overall, we constructed a standard barcode library containing 2,520 species from 4,733 samples across 49 orders, 144 families, and 683 genera. This library, which also includes 79 samples currently identified to the genus level, comprises a total of 15,090 accessions for the four most commonly used barcodes (rbcL, matK, ITS, and ITS2). Excluding the 79 genus-level samples, the standard barcode library covers 2,520 species from 4,654 individuals, with a total of 14,837 sequences⁴⁰: 4,451 rbcL sequences, 4,055 matK sequences, 2,905 ITS sequences, and 3,426 ITS2 sequences (Table 1). These sequences cover 683 genera, 144 families, and 49 orders of woody plants in tropical and subtropical China.
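The counts reported in this paragraph can be cross-checked with simple arithmetic. The short sketch below uses only figures quoted in the text; the helper `srx_count` is a hypothetical convenience that assumes the SRA experiment accessions form one contiguous numeric range.

```python
# Per-marker sequence counts for species-level samples, as quoted in the text.
per_marker = {"rbcL": 4451, "matK": 4055, "ITS": 2905, "ITS2": 3426}
total_sequences = sum(per_marker.values())

# Samples in the full library minus those identified only to genus.
species_level_samples = 4733 - 79


def srx_count(first, last):
    """Number of accessions in an inclusive, contiguous SRX range."""
    return int(last[3:]) - int(first[3:]) + 1


print(total_sequences)
print(species_level_samples)
print(srx_count("SRX22362678", "SRX22362939"))
```

The three results agree with the 14,837 sequences, 4,654 individuals, and 262 newly generated plastid genomes reported in the paper.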

Table 1 Summary of standard barcodes for woody plants in tropical and subtropical China (excluding samples identified only to genus level, “sp.”).

For super-barcodes, 977 plastid genomes were obtained from our previous research^44,45. In addition, 262 plastid genomes belonging to 71 families, 170 genera, and 258 species were newly generated in the present study. In total, the super-barcode library included 1,239 samples belonging to 40 orders, 113 families, 411 genera, and 1,139 species⁴⁰.

The sequence records file has two separate sheets for standard and super-barcode libraries. Each record in the list for super-barcodes contains (1) associated species information including sample ID, order, family, genus, and species; (2) sequence information including GenBank accession numbers and the presence or absence of the four standard barcodes; and (3) specimen information including collection sites, latitude and longitude, elevation, collectors, collection date, identifier, museum ID, and the storing institution. The list for standard barcodes contains additional information including BOLD ID, sequence length, trace count, and image count. Moreover, all specimen details and standard DNA barcode sequences were uploaded to the BOLD system, which is open to the public, in the dataset “DS-EBLF” (https://doi.org/10.5883/DS-EBLF).
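For readers who load the records file programmatically, the three groups of fields described above map naturally onto a record type. The following is only a sketch: the attribute names are hypothetical and do not reproduce the column headers of the published spreadsheet, and the example values are invented.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

MARKERS = ("rbcL", "matK", "ITS", "ITS2")


@dataclass
class SuperBarcodeRecord:
    # (1) species information
    sample_id: str
    order: str
    family: str
    genus: str
    species: str
    # (2) sequence information
    genbank_accession: Optional[str] = None
    markers_present: Dict[str, bool] = field(default_factory=dict)
    # (3) specimen information
    collection_site: Optional[str] = None
    latitude: Optional[float] = None
    longitude: Optional[float] = None
    elevation_m: Optional[float] = None
    collectors: Optional[str] = None
    collection_date: Optional[str] = None
    identifier: Optional[str] = None
    museum_id: Optional[str] = None
    storing_institution: Optional[str] = None

    def missing_markers(self) -> List[str]:
        """Standard barcodes not recorded as present for this sample."""
        return [m for m in MARKERS if not self.markers_present.get(m, False)]


# Invented example values, for illustration only.
rec = SuperBarcodeRecord(
    sample_id="S001",
    order="Fagales",
    family="Fagaceae",
    genus="Castanopsis",
    species="Castanopsis fissa",
    markers_present={"rbcL": True, "matK": True, "ITS": False},
)
print(rec.missing_markers())
```

A record for the standard barcode sheet would carry the additional fields named in the text (BOLD ID, sequence length, trace count, and image count).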

Technical Validation

The discriminatory power of the standard barcodes among species was evaluated with multiple individuals per species using three common methods (Table 2). The results of the distance-based BM/BCM method showed that BM and BCM had almost the same correct, ambiguous, and incorrect identification rates for all barcodes, with BM having slightly higher rates than BCM. ITS had the highest rate of correct identifications (72.66% for BCM), while the resolution for rbcL and matK was lower, with more ambiguous identifications (Table 2). The combination RMI (rbcL + matK + ITS) had the highest species resolution for the barcoding gap and tree-based methods (59.07% and 66.61%, respectively) (Table 2). While rbcL and matK had the lowest resolution in this species-rich dataset, ITS performed best among the four single barcodes under all three methods (71.68%/72.66%, 58.05%, and 61.33% for BM/BCM, the barcoding gap, and the tree-based method, respectively) (Table 2), which is consistent with previous DNA barcode studies (e.g., Hu et al.³⁸; Liu et al.⁴³; Gill et al.³³; Huang et al.²). Moreover, we observed marked improvements in node supports for the plastid genome tree compared to the standard barcode tree, particularly for species-rich families (Fig. 2, Table 3). In the standard barcode tree, 20.44% of the nodes showed low bootstrap support values (0 < BS < 50), and only 57.27% of the nodes had high bootstrap support values (BS > 85). In contrast, in the plastid genome tree, 5.49% of the nodes had low bootstrap support values, and 85.86% of the nodes had high bootstrap support values (Fig. 2, Table 3). Both the standard barcode tree and the plastid genome tree can be found on Figshare⁴⁰.
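The node-support comparison reduces to binning bootstrap values with the two cut-offs given above (low: 0 < BS < 50; high: BS > 85). The sketch below shows that tally on an invented list of support values; the function name and the input format (a flat list of per-node bootstrap percentages) are assumptions, and extracting the values from a tree file is left out.

```python
def support_summary(values, low=50, high=85):
    """Percentage of nodes with low (0 < BS < low) and high (BS > high) support."""
    if not values:
        raise ValueError("no bootstrap values supplied")
    n = len(values)
    low_n = sum(0 < v < low for v in values)
    high_n = sum(v > high for v in values)
    return {
        "low_pct": round(100 * low_n / n, 2),
        "high_pct": round(100 * high_n / n, 2),
    }


# Invented support values, for illustration only.
example_values = [100, 90, 86, 85, 60, 49, 10, 0]
print(support_summary(example_values))
```

Note that with these cut-offs a node with BS of exactly 0, 50, or 85 falls into neither reported category, so the two percentages need not sum to 100.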

Table 2 Species identification rates for standard barcodes based on three methods.

Table 3 Comparisons of bootstrap values for all nodes and for the ten most-sampled families between the standard barcode tree and the plastid genome tree.

Code availability

The code used to check species names can be found in the R package ‘plantlist’ version 0.7.2.

References

1. Hebert, P. D. N., Cywinska, A., Ball, S. L. & deWaard, J. R. Biological identifications through DNA barcodes. Proc Biol Sci 270, 313–321 (2003).
2. Huang, X., Ci, X., Conran, J. G. & Li, J. Application of DNA barcodes in Asian tropical trees – A case study from Xishuangbanna nature reserve, southwest China. PLOS ONE 10, e0129295 (2015).
3. Kress, W. J., Wurdack, K. J., Zimmer, E. A., Weigt, L. A. & Janzen, D. H. Use of DNA barcodes to identify flowering plants. Proceedings of the National Academy of Sciences of the United States of America 102, 8369–8374 (2005).
4. de Vere, N., Rich, T. C. G., Trinder, S. A. & Long, C. DNA Barcoding for Plants. in Plant Genotyping 101–118, https://doi.org/10.1007/978-1-4939-1966-6_8 (Humana Press, New York, NY, 2015).
5. CBOL Plant Working Group. et al. A DNA barcode for land plants. Proceedings of the National Academy of Sciences of the United States of America 106, 12794–12797 (2009).
6. China Plant BOL Group. et al. Comparative analysis of a large dataset indicates that internal transcribed spacer (ITS) should be incorporated into the core barcode for seed plants. Proceedings of the National Academy of Sciences of the United States of America 108, 19641–19646 (2011).
7. van Velzen, R., Weitschek, E., Felici, G. & Bakker, F. T. DNA barcoding of recently diverged species: relative performance of matching methods. PLOS ONE 7, e30490 (2012).
8. Yan, H.-F. et al. DNA barcoding evaluation and its taxonomic implications in the species-rich genus Primula L. in China. PLOS ONE 10, e0122903 (2015).
9. Coissac, E., Hollingsworth, P. M., Lavergne, S. & Taberlet, P. From barcodes to genomes: extending the concept of DNA barcoding. Molecular Ecology 25, 1423–1428 (2016).
10. Hollingsworth, P. M., Li, D.-Z., van der Bank, M. & Twyford, A. D. Telling plant species apart with DNA: from barcodes to genomes. Philosophical Transactions of the Royal Society B: Biological Sciences 371, 20150338 (2016).
11. Li, X. et al. Plant DNA barcoding: from gene to genome. Biol Rev Camb Philos Soc 90, 157–166 (2015).
12. Parks, M., Cronn, R. & Liston, A. Increasing phylogenetic resolution at low taxonomic levels using massively parallel sequencing of chloroplast genomes. BMC Biology 7, 84 (2009).
13. Fu, C.-N. et al. Testing genome skimming for species discrimination in the large and taxonomically difficult genus Rhododendron. Molecular Ecology Resources 22, 404–414 (2022).
14. Ji, Y. et al. Testing and using complete plastomes and ribosomal DNA sequences as the next generation DNA barcodes in Panax (Araliaceae). Molecular Ecology Resources 19, 1333–1345 (2019).
15. Yu, X.-Q. et al. Species discrimination in Schima (Theaceae): Next-generation super-barcodes meet evolutionary complexity. Molecular Ecology Resources 22, 3161–3175 (2022).
16. Zeng, C.-X. et al. Genome skimming herbarium specimens for DNA barcoding and phylogenomics. Plant Methods 14, 43 (2018).
17. Cazzolla Gatti, R. et al. The number of tree species on Earth. Proceedings of the National Academy of Sciences 119, e2115329119 (2022).
18. Myers, N., Mittermeier, R. A., Mittermeier, C. G., da Fonseca, G. A. B. & Kent, J. Biodiversity hotspots for conservation priorities. Nature 403, 853–858 (2000).
19. Fang, J., Wang, Z. & Tang, Z. Atlas of Woody Plants in China: Distribution and Climate. (Springer Science & Business Media, 2011).
20. Wang, H. et al. The China Plant Trait Database: toward a comprehensive regional compilation of functional traits for land plants. Ecology 99, 500–500 (2018).
21. Henniges, M. C. et al. A taxonomic, genetic and ecological data resource for the vascular plants of Britain and Ireland. Sci Data 9, 1 (2022).
22. Zhang, J., Liu, B., Liu, S., Feng, Z. & Jiang, K. Plantlist: looking up the status of plant scientific names based on the plant list database, searching the Chinese names and making checklists of plants. (2021).
23. Doyle, J. J. & Doyle, J. L. A rapid DNA isolation procedure for small quantities of fresh leaf tissue. Phytochemical bulletin 19, 11–15 (1987).
24. Li, Y., Gao, L.-M., Poudel, R. C., Li, D.-Z. & Forrest, A. High universality of matK primers for barcoding gymnosperms. Journal of Systematics and Evolution 49, 169–175 (2011).
25. Lorenz, T. C. Polymerase chain reaction: basic protocol plus troubleshooting and optimization strategies. J Vis Exp 3998 https://doi.org/10.3791/3998 (2012).
26. Sa, F. & Sb, G. Effect of dimethyl sulfoxide concentration on specificity of primer matching in PCR. BioTechniques 12 (1992).
27. Kearse, M. et al. Geneious Basic: An integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics 28, 1647–1649 (2012).
28. Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
29. Jin, J.-J. et al. GetOrganelle: a fast and versatile toolkit for accurate de novo assembly of organelle genomes. Genome Biol 21, 241 (2020).
30. Wyman, S. K., Jansen, R. K. & Boore, J. L. Automatic annotation of organellar genomes with DOGMA. Bioinformatics 20, 3252–3255 (2004).
31. Tillich, M. et al. GeSeq - versatile and accurate annotation of organelle genomes. Nucleic Acids Res. 45, W6–W11 (2017).
32. Collins, R. A. & Cruickshank, R. H. The seven deadly sins of DNA barcoding. Molecular Ecology Resources 13, 969–975 (2013).
33. Gill, B. A. et al. Plant DNA-barcode library and community phylogeny for a semi-arid East African savanna. Molecular Ecology Resources 19, 838–846 (2019).
34. Wright, E. S. Using DECIPHER v2.0 to analyze big biological sequence data in R. The R Journal 8, 352–359 (2016).
35. Meier, R., Shiyang, K., Vaidya, G. & Ng, P. K. L. DNA barcoding and taxonomy in Diptera: a tale of high intraspecific variability and low identification success. Syst Biol 55, 715–728 (2006).
36. Katoh, K. & Standley, D. M. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol 30, 772–780 (2013).
37. Stamatakis, A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313 (2014).
38. Hu, J.-L. et al. Assessing candidate DNA barcodes for Chinese and internationally traded timber species. Molecular Ecology Resources 22, 1478–1492 (2022).
39. Darriba, D., Taboada, G. L., Doallo, R. & Posada, D. jModelTest 2: more models, new heuristics and parallel computing. Nature Methods 9, 772–772 (2012).
40. Jin, L. et al. A DNA barcode library for woody plants in tropical and subtropical China, Figshare, https://doi.org/10.6084/m9.figshare.22715128.v4 (2023).
41. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRX22362678 (2023).
42. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRX22362939 (2023).
43. Liu, J. et al. The use of DNA barcoding as a tool for the conservation biogeography of subtropical forests in China. Diversity and Distributions 21, 188–199 (2015).
44. Jin, L. et al. Stronger latitudinal phylogenetic patterns in woody angiosperm assemblages with higher dispersal abilities in China. Journal of Biogeography https://doi.org/10.1111/jbi.14746 (2023).
45. Jin, L. et al. Plastome-based phylogeny improves community phylogenetics of subtropical forests in China. Molecular Ecology Resources 22, 319–333 (2022).

Acknowledgements

This study was supported by the Strategic Priority Research Program of the Chinese Academy of Sciences, Grant No. XDB31000000. The authors acknowledge the Chinese Forest Biodiversity Monitoring Network (CForBio) for installing and supporting the forest dynamic plots in China, and all the field technicians who have helped census the plots. They also thank Yu-Ying Zhou and Feng Song for their kind help in molecular experiments and data analysis.

Author information

These authors contributed equally: Lu Jin, Hao-You Shi.

Authors and Affiliations

Key Laboratory of Plant Resources Conservation and Sustainable Utilization, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou, 510650, China
Lu Jin, Tian-Wen Xiao, Chen-Xin Ma, Ju-Yu Lian & Xue-Jun Ge
Central South Academy of Inventory and Planning of NFGA, Changsha, 410014, China
Hao-You Shi
Yiyang Forestry Bureau, Yiyang, 413000, China
Ting Li
Hunan Police Academy, Changsha, 410138, China
Nan Zhao
Conghua Middle School, Guangzhou, 510900, China
Yong Xu
College of Forestry, Central South University of Forestry & Technology, Changsha, 410004, China
Feng Song
CAS Key Laboratory of Tropical Forest Ecology, Xishuangbanna Tropical Botanical Garden, Chinese Academy of Sciences, Kunming, 650201, China
Qiao-Ming Li, Lu-Xiang Lin & Xiao-Na Shao
School of Life Sciences, Sun Yat-Sen University, Guangzhou, 510275, China
Bu-Hang Li
State Key Laboratory of Vegetation and Environmental Change, Institute of Botany, Chinese Academy of Sciences, Beijing, 100093, China
Xiang-Cheng Mi & Hai-Bao Ren
Key Laboratory of Aquatic Botany and Watershed Ecology, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, China
Xiu-Juan Qiao
Center of Conservation Biology, Core Botanical Gardens, Chinese Academy of Sciences, Wuhan, 430074, China
Xiu-Juan Qiao
Center of Plant Ecology, Core Botanical Gardens, Chinese Academy of Sciences, Guangzhou, 510650, China
Ju-Yu Lian
Institute of Subtropical Agriculture, Chinese Academy of Sciences, Changsha, Hunan, 410125, China
Hu Du

Contributions

X.-J.G. conceived and designed the study; T.L., N.Z., Y.X., T.-W.X., C.-X.M., Q.-M.L., L.-X.L., X.-N.S., B.-H.L., X.-C.M., H.-B.R., X.-J.Q., J.-Y.L. and H.D. collected the voucher specimens and fresh leaf materials; H.-Y.S., L.J., T.L., N.Z., Y.X., T.-W.X., F.S., C.-X.M., and X.-N.S. performed the experiments; H.-Y.S. and L.J. analysed the data; H.-Y.S. and L.J. wrote the manuscript, with significant contributions from X.-J.G.

Corresponding author

Correspondence to Xue-Jun Ge.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Table

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Jin, L., Shi, HY., Li, T. et al. A DNA barcode library for woody plants in tropical and subtropical China. Sci Data 10, 819 (2023). https://doi.org/10.1038/s41597-023-02742-7

Received: 02 May 2023
Accepted: 10 November 2023
Published: 22 November 2023
DOI: https://doi.org/10.1038/s41597-023-02742-7