Introduction

Wikstroemia Endl. (Thymelaeaceae) is a diverse genus of approximately 70 species. Members of Wikstroemia are widely distributed in the Asian and Oceanian regions and scattered around the Hawaiian Islands1. The species are mostly fibrous trees, shrubs or subshrubs with a woody rhizome. Several species are cultivated as raw material for pulp production2,3, while a handful are reported to have medicinal properties4,5. However, studies of Wikstroemia have been confined to its utilization in pulp production and pharmacological applications; reports on genetic studies of Wikstroemia are scarce.

The only reports on genetic diversity to date are one on Wikstroemia ganpi in Korea using inter simple sequence repeat (ISSR) markers6 and two published complete plastome sequences, those of Wikstroemia chamaedaphne and Wikstroemia indica7,8. Due to the lack of molecular evidence, taxonomic studies of Wikstroemia have relied solely on morphological characteristics9. Ironically, the continuous nature of morphological variation in members of Wikstroemia has led to much taxonomic confusion in attempts to distinguish species and has resulted in ambiguities in taxonomic classifications between Wikstroemia and its sister genera9,10. Among the key morphological characteristics proposed to differentiate Wikstroemia from allied genera is the presence of petaloid scales in the flower11. However, the presence and characteristics of the disc in the flowers of Wikstroemia were not emphasized. Failure to analyse this character may result in misidentifications due to overlap in this feature during classification10. The subgeneric classification of Wikstroemia, consisting of only the subgenera Wikstroemia and Diplomorpha, is generally accepted12,13. Another problem in classification is the difficulty in detecting natural hybridization among the species due to the possibility of low reproductive isolation and high genetic similarity, suggesting that Wikstroemia represents a large complex of species14.

The plastome is a circular double-stranded DNA molecule. In plants, the plastome is mostly maternally inherited and not disturbed by gene recombination15. A typical plant plastome ranges in size from 120 to 217 kb16. The complete plastome has a typical quadripartite structure, including a large single-copy (LSC) region, a small single-copy (SSC) region, and two inverted repeat regions (IRs)17. Owing to its slow rate of evolution and its ease of sequencing and assembly due to its small size, the plastome has received much attention among biologists and taxonomists because it is highly informative and provides evolutionary and genetic insights18,19.

The taxonomic placement of Wikstroemia has been controversial. This genus has experienced a complicated classification history in reviews of members of the Thymelaeaceae. Stellera chamaejasme of the monotypic genus Stellera was reported to be sister to Wikstroemia based on combined plastid DNA sequences (trnT-trnL, trnL-trnF, trnL intron, and rpl16 intron)20, while Wikstroemia, along with 14 sister genera based on palynology findings, has been taxonomically placed in the Daphne group of the tribe Daphneae21,22. Although phylogenetic studies in Thymelaeaceae are ongoing23, phylogenetic relationships in Wikstroemia are likely to be understudied. Constituent genera in Thymelaeaceae have experienced similar molecular challenges, in which poor phylogenetic resolution is likely due to low genetic variation in the selected molecular markers23. Such conflicts can be overcome by utilizing genome-scale datasets24. At the same time, highly divergent regions may be identified through genome comparisons, which could aid in future phylogenetic studies of such a diverse genus as Wikstroemia.

In this study, we sequenced the complete plastomes of six species of Wikstroemia, W. alternifolia, W. canescens, W. capitata, W. dolicantha, W. micrantha, and W. scytophylla, to analyse and compare genomes using bioinformatic tools. Our aims were to (1) characterize the plastomes of the six species of Wikstroemia; (2) examine the variation in sequence repeats and codon usage in the six plastome sequences; (3) identify highly divergent regions in the plastome sequences; and (4) improve the understanding of the intrageneric/intergeneric phylogeny of Wikstroemia within Thymelaeaceae based on plastome sequences and the nuclear ribosomal DNA internal transcribed spacer (ITS) region.

Results

Plastome features of six species of Wikstroemia

The total length of the plastomes of the six species of Wikstroemia analysed in this study ranged from 172,610 bp (W. micrantha) to 173,697 bp (W. alternifolia). All six plastomes exhibited a typical quadripartite structure (Table 1, Fig. 1) consisting of a pair of inverted repeat (IR) regions (41,850–42,073 bp) separated by an LSC region (86,111–86,701 bp) and an SSC region (2799–2871 bp). All six plastomes had the same GC content at 36.7%. However, the GC content in the plastome of each species of Wikstroemia was unevenly distributed. The IR region accounted for the highest GC content (38.8–38.9%), followed by the LSC region (34.8–34.9%), while the SSC region showed the lowest GC content (28.7–29.6%).
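The per-region GC percentages reported above can be reproduced from the region sequences with a few lines of code. The following is a minimal sketch, not the procedure used in this study; the function name gc_content is hypothetical, and ambiguous bases such as N are assumed to be excluded from the denominator.

```python
def gc_content(seq):
    """Return the GC percentage of a nucleotide sequence.

    Assumption: only unambiguous bases (A, C, G, T) are counted,
    so characters such as N or gaps do not affect the result.
    """
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


# Toy fragments standing in for the LSC, SSC and IR regions (not real data).
regions = {"LSC": "ATGCATAT", "SSC": "ATATATGA", "IR": "GGCCATGC"}
for name, fragment in regions.items():
    print(name, round(gc_content(fragment), 1))
```

Applied to the LSC, SSC and IR sequences extracted from an annotated plastome, the same function would give the regional values that are compared in the text.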

Table 1 Plastome features of six species of Wikstroemia.
Figure 1

Gene map for the plastomes of six species of Wikstroemia used in this study. Genes on the inside of the map are transcribed in the clockwise direction; genes on the outside of the map are transcribed in the counterclockwise direction. Darker grey in the inner circle represents the GC content, whereas light grey corresponds to the AT content. Different functional groups of genes are shown in different colours. The gene map was generated using OGDRAW45.

The six plastomes of Wikstroemia displayed an identical gene content and gene order with no structural reconfigurations. A total of 138 to 139 genes were detected in the six species used in this study, comprising 92 to 93 protein-coding genes, 38 transfer RNA (tRNA) genes, and 8 ribosomal RNA (rRNA) genes (Table 1). Of these, 27 genes were duplicated in the IR regions, including 15 protein-coding genes (ccsA, ndhA, ndhB, ndhD, ndhE, ndhH, ndhG, ndhI, psaC, rpl2, rpl23, rps7, rps15, ycf1, ycf2), eight tRNA genes (trnA-UGC, trnI-CAU, trnI-GAU, trnL-CAA, trnL-UAG, trnN-GUU, trnR-ACG and trnV-GAC) and four rRNA genes (rrn4.5, rrn5, rrn16 and rrn23) (Table 2). Fifteen genes contained a single intron, five of which (ndhA, ndhB, rpl2, trnA-UGC and trnI-GAU) were located in the IR region, and the remaining 10 genes (atpF, petB, petD, rpl16, rpoC1, rps16, trnG-UCC, trnL-UAA, trnK-UUU and trnV-UAC) were located in the LSC region (see Supplementary Table S1 online). Only the ycf3 gene, which was present in the LSC region, was found to contain two introns. Upon comparison, we found that the trnK-UUU gene had the longest intron, ranging from 2498 to 2508 bp, in all six genomes.

Table 2 Genes present in the plastomes of six species of Wikstroemia used in this study.

Repetitive sequence analysis

The total numbers of simple sequence repeats (SSRs) in the plastome sequences of W. alternifolia, W. canescens, W. capitata, W. dolicantha, W. micrantha, and W. scytophylla were 127, 128, 110, 87, 90 and 110, respectively (Fig. 2A). No hexanucleotide repeats, however, were detected in the plastome sequences of W. alternifolia, W. canescens and W. scytophylla. The majority of SSRs (W. alternifolia: 70.87%; W. canescens: 70.31%; W. capitata: 68.18%; W. dolicantha: 63.22%; W. micrantha: 61.11%; W. scytophylla: 63.64%) were located in the LSC regions rather than in the other two regions of the plastome (Fig. 2B).

Figure 2

Distribution of simple sequence repeats (SSRs) in the plastomes of six accessions of Wikstroemia. (A) Number of different SSR types detected in the plastomes of six species of Wikstroemia; (B) Frequencies of identified SSRs in large single-copy (LSC), small single-copy (SSC) and inverted repeat (IR) regions.

All six species of Wikstroemia contained the same number of long repeats (Fig. 3A). In general, all of them contained 24 forward repeats and 25 palindromic repeats, except for W. canescens and W. capitata. Long forward repeats that ranged between 30 and 40 bp were the most abundant in W. dolicantha and W. micrantha, while W. alternifolia, W. canescens, W. capitata, and W. scytophylla were noted to have a higher number of long forward repeats with lengths of 41 to 60 bp (Fig. 3B). Long palindromic repeats were equally abundant in W. alternifolia and W. canescens, ranging from 40 to 60 bp and above 60 bp (Fig. 3C), while long palindromic repeats were abundant in the range of 30 to 60 bp in W. capitata, W. dolicantha, W. micrantha and W. scytophylla. Long reverse repeats, mostly within the range of 30 to 40 bp, were detected only in W. canescens and W. capitata (Fig. 3D).

Figure 3

Analysis of long repeat sequences in the plastomes of six species of Wikstroemia. (A) Quantities of long repeats based on type; (B) frequencies of forward repeats by length; (C) frequencies of palindromic repeats by length; and (D) frequencies of reverse repeats by length.

Analysis of codon usage

Thirty preferred codons (relative synonymous codon usage; RSCU > 1.00) were recorded in W. alternifolia, W. canescens, W. capitata, W. dolicantha, W. micrantha and W. scytophylla (see Supplementary Table S2 online). The stop codon UAA was most abundant and preferred over the other two stop codons, UAG and UGA, in all six species. Preferred codons mostly ended with the nucleotides A or U, except for the leucine-encoding (Leu) codon UUG. Leu-encoding codons had the greatest occurrence (9.38%), while cysteine-encoding (Cys) codons had the fewest occurrences (3.13%) among all six species of Wikstroemia.

Sequence divergence analysis

The plastome sequence alignment of the eight species of Wikstroemia, using the W. chamaedaphne plastome as a reference, indicated high sequence conservation across the plastomes, with the exception of the plastome of W. indica (Fig. 4). Overall, the size and gene order of the plastomes in Wikstroemia were well conserved, but a distinct large gap was observed in W. indica, beginning within the ycf1 gene sequence of the IRa and extending to the 5′ region of trnL-UAG in the IRb. Both single-copy regions showed greater sequence divergence than the IR region (Fig. 5). With a Pi-value cut-off point of 0.025, eight highly variable regions were identified: ndhD-ndhF, ndhF-rpl32, ndhJ, petL-petG, psbI-trnS-GCU, trnG-UCC, trnK-UUU-rps16 and the trnL-UAA-trnF-GAA intergenic spacer region. Six of the highly variable regions were located in the LSC region, while two were in the SSC region.

Figure 4

Complete plastome comparison of eight species of Wikstroemia using the plastome of W. chamaedaphne as reference.

Figure 5

Sliding window analysis of complete plastome sequences among eight species of Wikstroemia (window length: 1000 bp; step size: 500 bp).

Contraction and expansion in the IR region

Genes adjacent to the IR borders were consistent across members of Wikstroemia, except in W. indica, which varied in its adjacent genes at the IRb/SSC (JSB) and IRa/SSC (JSA) borders (Fig. 6). Whereas the rpl32 and ndhF genes lay in the SSC region adjacent to JSB and JSA, respectively, in the other species, the ycf1 gene was located across both JSA and JSB in the plastome of W. indica. The trnL-UAG gene was also adjacent to JSA in the SSC region of the W. indica plastome. In comparison, six species (W. alternifolia, W. chamaedaphne, W. dolicantha, W. indica, W. micrantha and W. scytophylla) had their rps19 gene crossing the IRb/LSC (JLB) border.

Figure 6

Comparison of borders between LSC, SSC and IR regions across the plastomes of eight species of Wikstroemia. Image was generated with IRscope50.

Selection pressure

Sixty-nine shared protein-coding genes were included in the selection pressure analysis between Aquilaria sinensis and W. capitata (Table 3). When analysed separately, the Ka/Ks values indicated that five genes, namely, rpl2, rps7, rps18, ycf1 and ycf2, displayed positive selection; 61 genes indicated purifying selection, and three genes did not exhibit any synonymous (Ks) values indicative of selection due to the constraints of the model used. The Ka/Ks value for the combined dataset revealed that the overall selection pressure of the 69 shared protein-coding genes was 0.435, showing signals of purifying selection.

Table 3 Selection pressure analysis of 69 shared protein-coding gene sequences for Aquilaria sinensis (GenBank accession MN720647) and Wikstroemia capitata, analysed separately and combined.

Phylogenetic analysis

The maximum-likelihood (ML) and Bayesian inference (BI) trees based on the complete plastome sequences excluding the IRa sequences and the dataset of the intergenic spacer (IGS) sequences revealed that all the branch nodes for eight species of Wikstroemia included in the phylogenetic tree were supported with high bootstrap values and Bayesian posterior probabilities (ML: ≥ 90%; BI: ≥ 95%) (Fig. 7). For the dataset of the total gene sequences containing protein-coding genes, tRNAs, and rRNAs that are shared by all species, strong posterior probabilities were recorded in most of the branch nodes of the BI tree but not in the ML tree, in which moderate bootstrap support was recorded for the backbone structure of the Wikstroemia clade (see Supplementary Figure S1 online). The molecular placement of W. capitata and W. indica, which were recovered as sister to each other with low branch support in the ML tree and BI tree based on the dataset of all protein-coding genes, was incongruent with the phylogenetic trees based on the datasets using complete plastome sequences excluding the IRa and intergenic spacer (IGS) sequences. The ML trees and BI trees based on the datasets of the first, second, and third codon positions of the protein-coding sequences did not display matching molecular placement of Wikstroemia when compared with each other; most of the branches were poorly supported in the Wikstroemia clade (see Supplementary Figure S2 online). The phylogenetic tree using complete plastome sequences excluding the IRa sequences suggested that a paraphyletic relationship was present in Wikstroemia. Two species, W. alternifolia and W. canescens, were clustered with Stellera chamaejasme, while six species of Wikstroemia (W. capitata, W. chamaedaphne, W. dolicantha, W. indica, W. micrantha and W. scytophylla) formed a monophyletic group.

Figure 7

Maximum likelihood (ML) and Bayesian inference (BI) of Wikstroemia and allied genera based on the complete plastome sequences excluding the inverted repeat A (IRa) region, and a dataset of the intergenic spacer (IGS) regions of 17 taxa representing 5 genera of Thymelaeaceae, analysed separately. Branch nodes that were calculated with reliable support values (ML: bootstrap ≥ 75%; BI: posterior probability ≥ 0.90) are indicated with an asterisk (*). Sequences obtained through this study are indicated in bold; two species, Psidium guajava (KY635879) and Gossypium gossypioides (HQ901195), were included as outgroups.

The ITS-based ML tree revealed a paraphyletic relationship between Wikstroemia and S. chamaejasme, while most of the branch nodes within the Wikstroemia clade were not highly supported (Fig. 8A). Strong bootstrap support was recorded for the sistership between W. alternifolia and W. canescens and between W. micrantha and W. stenophylla. Weakly supported sisterships were present between W. dolicantha and W. scytophylla and between W. capitata and W. ligustrina. In contrast, the BI analysis displayed a monophyletic relationship within the Wikstroemia clade (Fig. 8B). Similar to the ML tree, sisterships were strongly supported between W. alternifolia and W. canescens and between W. micrantha and W. stenophylla but not between W. dolicantha and W. scytophylla or between W. capitata and W. ligustrina in the BI tree.

Figure 8

Phylogenetic analyses of Thymelaeaceae based on nuclear ribosomal DNA internal transcribed spacer (ITS) sequences of 34 taxa representing 6 genera of Thymelaeaceae. (A) Maximum-likelihood (ML) and (B) Bayesian inference (BI) tree analyses were conducted with 1000 bootstrap replicates. Branch nodes that were calculated with reliable support values (ML: bootstrap ≥ 75%; BI: posterior probability ≥ 0.90) are indicated with an asterisk (*). Two species, Psidium guajava (MN295360) and Gossypium australe (AF057763), were included as outgroups.

Discussion

The plastomes of the species of Wikstroemia examined in this study were highly conserved, which is similar to the situation in other angiosperms. The lengths of the plastomes of the six species of Wikstroemia varied little and were similar to those of typical angiosperms25. The same number and content of genes were predicted in this study, suggesting that the evolution of the gene sequences was consistent across the six species. Similar to most angiosperms, A/T sequence repeats were more abundant than G/C repeats in the Wikstroemia plastomes, which may reflect a bias in base composition that is potentially affected by the tendency of the genome to change to A-T rather than to G-C26. An additional validation step for these SSRs, for which five novel SSR primer sets were designed, was conducted for the six species of Wikstroemia reported in this study. Details of the newly designed SSR primer sets and the resulting pherograms are included for reference (see Supplementary Table S3 and Data S1 online).

Expansion and contraction of the IR region are major evolutionary events that influence the length of the plastomes27. The IR junctions in the plastomes reported in this study were placed and annotated with Geneious Prime28 and further validated with GeSeq29 as well as Sanger sequencing using novel specific primer sets (see Supplementary Table S4 and Data S2 online). Our study indicated that the contractions and expansions of the IR regions exhibited relatively stable patterns within Wikstroemia, with slight variation; gene recombination between the repetitive sequence or poly-A structure and tRNA could be one of the reasons for the change in length in the IR region30. However, W. indica showed dissimilarity in its IR borders, which differed from those of most angiosperms31. We suspect that the plastome IR contraction and expansion in W. indica is severe and may be due to extensive gene transfer and larger IR expansion resulting from the double-strand break repair mechanism32,33,34. Interestingly, when compared to the species of Wikstroemia sequenced in this study, the plastome of W. indica was smaller (151,731 bp) and had a greater GC content (37.4%)8. We found that the plastome of W. indica had a shorter IR region and a larger SSC region than the other species of Wikstroemia. Changes in the placement of the IR borders in the plastome of W. indica were likely due to contraction of the IR region, causing a loss in the number and content of the genes. Among the genes that were not found in W. indica but were present in other species of Wikstroemia, ndhA, ndhG, and ndhI were expected to be present in the IR region; genes such as ccsA, ndhD, ndhE, ndhH, psaC, rps15, and trnL-UAG, which are duplicated in the IR regions of the other species, were reduced to only one copy and were transferred to the SSC region, while the ndhF and rpl32 genes, common genes in the SSC region, were not detected. Therefore, it can be concluded that the contraction of the IR region that caused gene loss contributed to the difference in plastome content between W. indica and the other seven species of Wikstroemia.

Molecular evidence based on plastome sequences revealed a nonmonophyletic relationship between the species of Wikstroemia due to W. alternifolia and W. canescens clustering with Stellera chamaejasme. Information on the phylogenetic relationships of Wikstroemia species is scarce. Although taxonomic work is challenging in a genus with diverse species, continuous efforts among taxonomists studying members of the Thymelaeaceae have provided some insights into the taxonomic status of Wikstroemia. To provide better insight into the phylogenetic relationships at the nuclear level, we used ITS sequences to perform ML and BI analyses. Unlike phylogenomic tree analyses on complete plastome sequences, low bootstrap support and Bayesian posterior probabilities were observed at the species level in Wikstroemia. The molecular placement of the species of Wikstroemia, however, was identical in both the ML and BI trees, while the most distinct difference between both phylogenetic trees was the placement of S. chamaejasme. In the ML tree based on the ITS sequences, S. chamaejasme clustered within the Wikstroemia clade, but it was sister to Wikstroemia in the BI tree. The discordance between the plastid and nuclear phylogenies in this study may be due to phylogenetic sorting, convergence, unequal rates of evolution, long branch attraction, and introgression35. However, low branch node support in both the ITS-based ML and BI trees suggested that either the inclusion of additional nuclear gene sequences or the application of the restriction site-associated DNA sequence (RAD-Seq) technique that integrates up to 10% of the nuclear genome36 could be helpful in resolving the phylogenetic relationships within Wikstroemia. Evidently, in this study, the use of a single nuclear gene sequence, i.e., ITS, which was suspected to be useful in delimiting many plants at the species level37, was insufficient for resolving the phylogenetic relationships between Stellera and Wikstroemia.

Members of Wikstroemia currently comprise species previously placed under Capura L., Daphne L., Diplomorpha Meisn., Daphnimorpha Nakai, Lonicera L., Passerina L., Restella Pobed., and Stellera L.1,38. The monotypic genus Stellera, which exhibits strikingly similar morphological characteristics, has troubled some taxonomists who compared it to Wikstroemia. At least five species were placed under Stellera before they were transferred to Wikstroemia; others were transferred to allied genera, such as Daphne, Diarthron and Thymelaea in the tribe Daphneae38. This is understandable, as Stellera has a longer taxonomic history, i.e., back to 1747, when compared to other genera in the Daphneae. As a result, S. chamaejasme, as the type species, is the only species left in the genus. Based on the literature, we found that Wikstroemia has an interesting nomenclatural history in which two genera, Diplomorpha and Daphnimorpha, were synonymized and excluded. Combining Stellera with Wikstroemia was previously proposed by transferring the type species S. chamaejasme to the monotypic subgenus Chamaejasme11,39. However, the proposal was rejected, as Stellera has priority over Wikstroemia40, and based on the Rules of Nomenclature, the combination can only be accepted if Stellera is proposed as a nomen genus rejiciendum (nom. gen. rejic.)12. Therefore, we do not exclude the possibility that Stellera should be synonymized with Wikstroemia. In that case, Wikstroemia would be synonymized under Stellera. One should not jump to such a conclusion rashly, based on the current situation, as the taxonomic dispute on whether Wikstroemia should be synonymized with Daphne is yet unresolved41. Unless Daphne is considered in a subsequent taxonomic treatment, based on the phylogenetic trees in this study, we could only conclude that Wikstroemia is not monophyletic and that Stellera is unquestionably closely related to Wikstroemia.

While phylogenetic analyses based on the plastome sequences of Wikstroemia have proven to be promising, we suggest that larger sampling is required to resolve the taxonomic dispute in Wikstroemia through a molecular approach. We expect that the genetic information in the complete plastome sequences of Wikstroemia will be sufficient to aid in the classification of Wikstroemia, both at the genus level and at the species level.

Conclusion

To the best of our knowledge, this study presents the first genome-scale analysis of species of Wikstroemia. The findings revealed high conservation of genes in the plastomes. The identification of highly variable gene regions in the plastome sequences of Wikstroemia could potentially be useful in resolving phylogenetic relationships in the genus. A strong sistership between Wikstroemia and the monotypic genus Stellera was present. The ML and BI trees based on the plastome sequences revealed that all the branch nodes for eight species of Wikstroemia included in the phylogenetic tree were supported with high bootstrap values and Bayesian posterior probabilities (ML: ≥ 90%; BI: ≥ 95%), while the ITS-based tree analyses could not properly resolve the phylogenetic relationship between Stellera and Wikstroemia. Nevertheless, the molecular data obtained in this study will serve as a valuable resource for providing greater insights into the taxonomy and phylogeny of Thymelaeaceae.

Materials and methods

Plant materials and DNA extraction

Fresh leaf materials of six species of Wikstroemia, W. alternifolia, W. canescens, W. capitata, W. dolicantha, W. micrantha and W. scytophylla, were collected from botanical gardens and natural populations in China (Table 1). Species identification was carried out by Yonghong Zhang, and the voucher specimens were deposited in the Herbarium of Yunnan Normal University (YNUB)42. Based on the local guidelines and legislation on plant study, permissions for collections and research were unnecessary, as the samples were not collected in protected areas or recorded as threatened species. However, W. scytophylla was collected under permit record number w2021005, which was authorized by the Kunming Botanical Garden, Chinese Academy of Sciences, China. All collections were permitted and legal. Total genomic DNA was extracted using the Axygen AxyPrep Multisource Genomic Miniprep DNA kit (Corning, USA) following the manufacturer’s protocol.

Plastome sequencing, assembly and annotation

A sequence library was constructed, and sequencing was performed on the Illumina HiSeq 2500-PE150 platform (Illumina, USA). All raw reads were filtered using NGS QC Toolkit version 2.3.3 with default parameters to obtain clean reads43. The plastome was de novo assembled using NOVOPlasty44 with the rbcL gene sequence of Daphne kiusiana (GenBank accession KY991380) as the seed sequence. Gene annotation was performed in Geneious Prime28 using the complete plastome sequence of W. chamaedaphne (GenBank accession MN563132) as the reference genome. The circular physical map of the plastome was generated using OGDRAW45.

Repeat analyses

SSRs were identified using MISA-web46, in which parameters for the identification of perfect mono-, di-, tri-, tetra-, penta-, and hexanucleotide motifs were set for a minimum of 10, 5, 4, 3, 3, and 3 repeats, respectively. Long repeats, including forward, palindromic, reverse and complement repeats, were determined using REPuter47 with a Hamming distance of 3 and a minimal repeat size of 30 bp.
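The SSR thresholds described above can be illustrated with a short script. This is a simplified sketch of perfect-SSR detection under those thresholds and not a reimplementation of MISA-web; the names MIN_REPEATS, is_redundant and find_ssrs are hypothetical, and compound SSRs are not handled.

```python
import re

# Minimum number of repeats per motif length (mono- to hexanucleotide),
# taken from the thresholds stated in the text.
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_redundant(motif):
    """True if the motif is itself a repeat of a shorter unit (e.g. AA, ATAT)."""
    n = len(motif)
    return any(n % k == 0 and motif[:k] * (n // k) == motif for k in range(1, n))


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs.

    Start positions are 0-based. Motifs that are repeats of a shorter unit
    are skipped so that, for example, a poly-A run is reported only once.
    """
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        pos = 0
        while True:
            match = pattern.search(seq, pos)
            if match is None:
                break
            if is_redundant(match.group(1)):
                # Step forward by one base so overlapping candidates are kept.
                pos = match.start() + 1
                continue
            hits.append((match.start(), match.group(1), len(match.group(0)) // k))
            pos = match.end()
    return sorted(hits)


# Toy sequence (not real data) containing A x 12, AT x 6 and AGC x 4.
toy = "GC" + "A" * 12 + "GCGG" + "AT" * 6 + "CCG" + "AGC" * 4 + "T"
print(find_ssrs(toy))
```

Counting the hits by motif length and by plastome region (LSC, SSC or IR) would give summaries of the kind shown in Fig. 2.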

Codon usage

Coding sequences of each plastome were extracted, and the RSCU was analysed using MEGA748.
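For readers unfamiliar with RSCU, the calculation can be sketched as follows: the RSCU of a codon is its observed count divided by the count expected if all synonymous codons for that amino acid were used equally. The code below is an illustrative sketch, not the MEGA7 implementation; the function name rscu is hypothetical, and the standard codon assignments are assumed.

```python
from collections import Counter
from itertools import product

# Standard codon assignments in TCAG order (assumed to apply to the
# plastid protein-coding genes analysed here).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds_list):
    """Return {codon: RSCU} computed over a list of in-frame coding sequences."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper().replace("U", "T")
        for i in range(0, len(cds) - len(cds) % 3, 3):
            codon = cds[i:i + 3]
            if codon in CODON_TABLE:  # skip codons with ambiguous bases
                counts[codon] += 1

    # Group synonymous codons by the amino acid (or stop) they encode.
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from the input
        expected = total / len(codons)
        for c in codons:
            values[c] = counts[c] / expected
    return values


# Toy coding sequence (not real data): ATG TTA TTA TTG TAA.
table = rscu(["ATGTTATTATTGTAA"])
preferred = sorted(c for c, v in table.items() if v > 1.0)
print(preferred)
```

Codons with RSCU > 1.00 are the ones referred to as preferred codons in the Results.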

Comparative genome and divergence analyses

The complete plastome sequences of two species of Wikstroemia, W. chamaedaphne (GenBank accession MN563132) and W. indica (GenBank accession MN453832), which were available in NCBI GenBank, were downloaded and included in subsequent analyses. By using the plastome sequences of W. chamaedaphne as the reference genome, nucleotide variation in the plastome sequence alignment of the eight species of Wikstroemia was visualized using mVISTA49 in Shuffle-LAGAN mode. To detect the expansion and contraction of the IR region in the plastomes across the eight species, the IR/SC boundaries of the plastomes were visualized using IRscope50. To detect the mutational hotspots and divergence regions in the plastomes of the eight species, sequence alignment of the plastome sequences was carried out using Geneious Prime28. Calculations of the nucleotide variability (Pi) among the eight plastomes were performed using DnaSP v551 with a window length of 1000 bp and a step size of 500 bp.
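The sliding-window calculation of nucleotide variability can be sketched in a few lines. This is an illustrative approximation and not the DnaSP implementation: Pi is taken here as the average number of pairwise differences per site, alignment columns containing gaps or ambiguous bases are assumed to be excluded, and the function names pi_window and sliding_pi are hypothetical.

```python
from itertools import combinations


def pi_window(aln, start, end):
    """Average pairwise differences per site over columns [start, end).

    Columns with a gap or ambiguous base in any sequence are skipped
    (an assumption of this sketch).
    """
    aln = [s.upper() for s in aln]
    cols = [i for i in range(start, end) if all(s[i] in "ACGT" for s in aln)]
    pairs = list(combinations(aln, 2))
    if not cols or not pairs:
        return 0.0
    diffs = sum(a[i] != b[i] for a, b in pairs for i in cols)
    return diffs / (len(pairs) * len(cols))


def sliding_pi(aln, window=1000, step=500):
    """Return (window midpoint, Pi) tuples along an alignment of equal-length rows."""
    length = len(aln[0])
    results = []
    for start in range(0, max(length - window, 0) + 1, step):
        end = min(start + window, length)
        results.append((start + (end - start) // 2, pi_window(aln, start, end)))
    return results


# Toy alignment (not real data) scanned with a 4-bp window and a 4-bp step.
toy = ["AAAAAAAA", "AAAAAAAT", "AAAAAATT"]
for midpoint, pi in sliding_pi(toy, window=4, step=4):
    print(midpoint, round(pi, 4))
```

With the default window of 1000 bp and step of 500 bp, windows whose Pi exceeds a chosen cut-off (0.025 in the Results) would be flagged as candidate divergence hotspots.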

Selection pressure analysis

The ratio of nonsynonymous to synonymous substitutions (Ka/Ks) of protein-coding genes was calculated for Aquilaria sinensis (GenBank accession MN720647) and Wikstroemia capitata. Calculations were conducted for two sets of data: (1) shared genes analysed separately and (2) a combined dataset containing all shared genes. Prior to sequence alignment using MUSCLE embedded in MEGA748, the plastome sequence of A. sinensis was reannotated to ensure uniformity. For the combined dataset, the coding sequences were concatenated manually. Selection pressure acting on these genes was estimated using KaKs_Calculator 2.052 based on the Yang and Nielsen codon frequency (YN) model, with parameters for the initial ratio of transition to transversion frequency (K) set between 0.3 and 0.7. A Ka/Ks value equal to or less than 1.0 was taken to indicate purifying selection, in which amino acid-changing (nonsynonymous) substitutions are selected against, leaving an excess of synonymous over nonsynonymous substitutions, whereas a Ka/Ks value greater than 1.0 was taken to indicate positive selection.
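The interpretation rule above can be written as a small helper. This is a sketch of the classification step only (the Ka and Ks estimates themselves come from KaKs_Calculator); the function name classify_selection and the example values are hypothetical, and genes with Ks equal to zero are treated as undetermined because the ratio is undefined.

```python
import math


def classify_selection(ka, ks):
    """Classify a gene from its Ka and Ks estimates.

    Follows the convention stated in the text: Ka/Ks <= 1.0 is read as
    purifying selection and Ka/Ks > 1.0 as positive selection. A missing
    or zero Ks gives no usable ratio.
    """
    if ks is None or math.isnan(ks) or ks == 0:
        return "undetermined"
    return "positive" if ka / ks > 1.0 else "purifying"


# Hypothetical Ka and Ks values, for illustration only.
examples = {"geneA": (0.020, 0.050), "geneB": (0.030, 0.010), "geneC": (0.010, 0.0)}
for gene, (ka, ks) in examples.items():
    print(gene, classify_selection(ka, ks))
```

Tallying the three categories across all shared genes would give a summary of the kind reported in Table 3.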

Polymerase chain reaction and Sanger sequencing

Polymerase chain reaction (PCR) amplification was carried out in a 20 µL reaction volume using the ITS universal primer set: 5F: 5ʹ-GGAAGTAAAAGTCGTAACAAGG-3ʹ (forward) and 4R: 5ʹ-TCCTCCGCTTATTGATATGC-3ʹ (reverse). The PCRs for the nuclear ribosomal DNA ITS region contained 10 µL of 2× Taq PCR Starmix with loading dye (Genstar Biosolutions, China), 0.4 µM of each primer and 20 ng of genomic DNA as a template. PCR amplifications were conducted on a T100 Thermal Cycler (Bio-Rad, USA), with initial denaturation at 93 °C for 5 min; 40 cycles of denaturation at 93 °C for 30 s, annealing at 60 °C for 30 s, and extension at 72 °C for 30 s; and a final extension at 72 °C for 5 min. PCR products were sent for direct Sanger sequencing at both ends using an ABI 3730 DNA Analyzer (Applied Biosystems, USA).

Phylogenetic analyses

Phylogenetic analyses were conducted based on the plastome or gene sequences of 17 selected taxa from Thymelaeaceae. Two species, Psidium guajava (Myrtaceae; GenBank accession KY635879) and Gossypium gossypioides (Malvaceae; GenBank accession HQ901195), were included as outgroups. Seven datasets, including (1) the complete plastome sequences excluding IRa, (2) the total gene sequences containing protein-coding genes, tRNAs, and rRNAs that are shared by all species, (3) the intergenic spacer (IGS) sequences, (4) all protein-coding genes that are shared by all species, and (5) three additional subdatasets at the codon level for the first, second, and third codon positions of the protein-coding sequences, were used to perform phylogenetic inferences. The portion of the complete plastome sequences excluding the IRa, and the targeted genic and intergenic regions in the plastomes, were extracted and concatenated using Geneious Prime28, while the first, second, and third codon positions of the shared genes were extracted using MEGA748. Sequence alignment was carried out using MAFFT v7.45053. The ML tree was constructed based on all the sequence datasets using RAxML 8.2.1154. The general time-reversible (GTR) DNA substitution model with gamma-distributed rate variation (GTR + G) was selected, and all branch nodes were calculated under 1000 bootstrap replicates. BI analysis was conducted for all the datasets54,55. BI analysis was executed through the MrBayes55 pipeline available in the CIPRES Science Gateway web portal56. Markov chain Monte Carlo (MCMC) was conducted with 2,000,000 generations, and trees were sampled every 100 generations. The final tree was visualized using FigTree57 and edited manually.
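The codon-level subdatasets can be illustrated by splitting an in-frame coding sequence into its first, second and third codon positions. The sketch below is not the MEGA7 procedure used in this study; the function name split_codon_positions is hypothetical, and the input is assumed to be an in-frame (optionally aligned) coding sequence.

```python
def split_codon_positions(cds):
    """Split an in-frame coding sequence into its three codon positions.

    Returns a tuple (first, second, third). Trailing bases that do not
    form a complete codon are dropped.
    """
    usable = len(cds) - len(cds) % 3
    cds = cds[:usable]
    return cds[0::3], cds[1::3], cds[2::3]


# Toy coding sequence (not real data): ATG GCC TAA.
first, second, third = split_codon_positions("ATGGCCTAA")
print(first, second, third)
```

Applying the same split to every taxon in a codon-aligned alignment, and concatenating the results across genes, would give three position-specific matrices analogous to the three subdatasets described above.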

The ITS sequences were aligned, and their primer sequences were manually trimmed to obtain clean sequences. A total of 26 additional ITS sequences derived from members of the Thymelaeaceae were downloaded from NCBI GenBank and MUSCLE-aligned against the ITS sequences of the six species of Wikstroemia used in this study using MEGA748. Two species, P. guajava (Myrtaceae; GenBank accession MN295360) and Gossypium australe (Malvaceae; GenBank accession AF057763), were included as outgroups. The alignment was trimmed using trimAl v1.258 with the gappyout method to reduce systematic errors produced by poor alignment. The optimal DNA substitution model for the ML analysis, determined using the “Find Best DNA/Protein Model (ML)” function embedded in MEGA748, was the Kimura two-parameter (K2P) model with the discrete Gamma model (+ G4) and invariant sites included (+ I) (= K2P + G + I). ML analysis was performed using MEGA748 with 1000 bootstrap replicates. BI analysis was conducted with a previously described method55.