Complete plastomes of six species of Wikstroemia (Thymelaeaceae) reveal paraphyly with the monotypic genus Stellera

Wikstroemia (Thymelaeaceae) is a diverse genus that extends from Asia to Australia and has been recorded on the Hawaiian Islands. Despite its medicinal properties and resource utilization in pulp production, genetic studies of the species in this important genus have been neglected. In this study, the plastome sequences of six species of Wikstroemia were sequenced and analysed. The plastomes ranged in size between 172,610 bp (W. micrantha) and 173,697 bp (W. alternifolia) and exhibited a typical genome structure consisting of a pair of inverted repeat (IR) regions separated by a large single-copy (LSC) region and a small single-copy (SSC) region. The six plastomes were similar in the 138 or 139 genes predicted, which consisted of 92 or 93 protein-coding genes, 38 tRNA genes, and 8 rRNA genes. The overall GC contents were identical (36.7%). Comparative genomic analyses were conducted with the inclusion of two additional published species of Wikstroemia in which the sequence divergence and expansion of IRs in the plastomes were determined. When compared to the coding sequences (CDSs) of Aquilaria sinensis, five genes, namely, rpl2, rps7, rps18, ycf1 and ycf2, indicated positive selection in W. capitata. The plastome-based phylogenetic analysis inferred that Wikstroemia in its current state is paraphyletic to Stellera chamaejasme, while the ITS-based tree analyses could not properly resolve the phylogenetic relationship between Stellera and Wikstroemia. This finding rekindled interest in the proposal to synonymize Stellera with Wikstroemia, which was previously proposed but rejected due to taxonomic conflicts. Nevertheless, this study provides valuable genomic information to aid in the taxonomic implications and phylogenomic reconstruction of Thymelaeaceae.

Wikstroemia Engl. (Thymelaeaceae) is a diverse genus of approximately 70 species. Members of Wikstroemia are widely distributed in the Asian and Oceanian regions and scattered around the Hawaiian Islands 1 . The species are mostly fibrous trees, shrubs or subshrubs with a woody rhizome. Several species are cultivated as raw material for pulp production 2,3 , while a handful of them are reported to have medicinal properties 4,5 . However, studies of Wikstroemia have been confined to its utilization in pulp production and pharmacological applications; reports on genetic studies of Wikstroemia are scarce.
The only reports on the genetic diversity to date include one on Wikstroemia ganpi in Korea using inter simple sequence repeat (ISSR) markers 6 and two, published, complete plastome sequences of Wikstroemia chamaedaphne and Wikstroemia indica 7,8 . Due to the lack of molecular evidence, taxonomic studies of Wikstroemia have relied solely on morphological characteristics 9 . Ironically, the continuous nature of morphological variation in members of Wikstroemia has led to much taxonomic confusion in attempts to distinguish species and has resulted in ambiguities in taxonomic classifications between Wikstroemia and its sister genera 9,10 . Among the key morphological characteristics proposed to differentiate Wikstroemia from allied genera is the presence of petaloid scales in the flower 11 . However, the presence and characteristics of the disc in the flowers of Wikstroemia was not emphasized. Failure to analyse this character may result in misidentifications due to overlap in this feature during classification 10 . The subgeneric classification of Wikstroemia, consisting of only the subgenera Wikstroemia and Diplomorpha, is generally accepted 12,13 . Another problem in classification is the difficulty in detecting natural hybridization among the species due to the possibility of low reproductive isolation and high genetic similarity, suggesting that Wikstroemia represents a large complex of species 14 .
The plastome is a circular double-stranded DNA molecule. In plants, the plastome is mostly maternally inherited and not disturbed by gene recombination 15 . A typical plant plastome ranges in size from 120 to 217 kb 16 . The complete plastome has a typical quadripartite structure, including a large single-copy (LSC) region, a small single-copy (SSC) region, and two separate inverted regions (IRs) 17 . Owing to its slow rate of evolution and ease of sequencing and assembly due to its small size, the plastome has been receiving much attention among biologist and taxonomist because it is highly informative and provides evolutionary and genetics insights 18,19 .
The taxonomic placement of Wikstroemia has been controversial. This genus has experienced a complicated classification history in reviews of members of the Thymelaeaceae. Stellera chamaejasme of the monotypic genus Stellera was reported to be sister to Wikstroemia based on combined plastid DNA sequences (trnT-trnL, trnL-trnF, trnL intron, and rpl16 intron) 20 , while Wikstroemia, along with 14 sister genera based on palynology findings, has been taxonomically placed in the Daphne group of the tribe Daphneae 21,22 . Although phylogenetic studies in Thymelaeaceae are ongoing 23 , phylogenetic relationships in Wikstroemia are likely to be understudied. Constituent genera in Thymelaeaceae have experienced similar molecular challenges, in which poor phylogenetic resolution is likely due to low genetic variation in the selected molecular markers 23 . Such conflicts can be overcome by utilizing genome-scale datasets 24 . At the same time, highly divergent regions may be identified through genome comparisons, which could aid in future phylogenetic studies of such a diverse genus as Wikstroemia.
In this study, we sequenced the complete plastomes of six species of Wikstroemia, W. alternifolia, W. canescens, W. capitata, W. dolicantha, W. micrantha, and W. scytophylla, to analyse and compare genomes using bioinformatic tools. Our aims were to (1) characterize the plastomes of the six species of Wikstroemia; (2) examine the variation in sequence repeats and codon usage in the six plastome sequences; (3) identify highly divergent regions in the plastome sequences; and (4) improve the understanding of the intrageneric/intergeneric phylogeny of Wikstroemia within Thymelaeaceae based on plastome sequences and the nuclear ribosomal DNA internal transcribed spacer (ITS) region.
Repetitive sequence analysis. The  63.64%) were located in the LSC regions rather than in the other two regions of the plastome (Fig. 2B).   (Fig. 3B). Long palindromic repeats were equally abundant in W. alternifolia and W. canescens, ranging from 40 to 60 bp and above 60 bp (Fig. 3C), while long palindromic repeats were        www.nature.com/scientificreports/ Sequence divergence analysis. The plastome sequence alignment of the eight species of Wikstroemia, using the W. chamaedaphne plastome as a reference, indicated high sequence conservatism across the plastomes of eight species but not in the plastome of W. indica (Fig. 4). Overall, the size and gene order of the plastomes in Wikstroemia were well conserved, but a distinct large gap was observed beginning within the ycf1 gene sequence of the IRa to 5′ region of the trnL-UAG in the IRb of W. indica. Both single-copy regions were recorded as having greater sequence divergence than the IR region (Fig. 5). With a Pi-value cut-off point of 0.025, eight highly variable gene regions were identified: ndhD-ndhF, ndhF-rpl32, ndhJ, petL-petG, psbI-trnS-GCU, trnG-UCC, trnK-UUU-rps16 and the trnL-UAA-trnF-GAA intergenic spacer regions. Six of the highly variable regions were located in the LSC, while two of them were in the SSC region.
Contraction and expansion in the IR region. Genes adjacent to the IR borders were consistent across members of Wikstroemia, except in W. indica, which varied in its adjacent genes at the IRb/SSC (JSB) and IRa/ SSC (JSA) borders (Fig. 6). In contrast to the rpl32 and ndhF genes in the SSC region, adjacent to JSB and JSA, respectively, the ycf1 gene was located across both JSA and JSB in the plastomes of W. Selection pressure. Sixty-nine shared protein-coding genes were included in the selection pressure analysis between Aquilaria sinensis and W. capitata (Table 3). When analysed separately, the K a /K s values indicated that five genes, namely, rpl2, rps7, rps18, ycf1 and ycf2, displayed positive selection; 61 genes indicated purifying selection, and three genes did not exhibit any synonymous (K s ) values indicative of selection due to the constraints of the model used. The K a /K s value for the combined dataset revealed that the overall selection pressure of the 69 shared protein-coding genes was 0.435, showing signals of purifying selection.
Phylogenetic analysis. The maximum-likelihood (ML) and Bayesian inference (BI) trees based on the complete plastome sequences excluding the IRa sequences and the dataset of the intergenic spacer (IGS) sequences revealed that all the branch nodes for eight species of Wikstroemia included in the phylogenetic tree were supported with high bootstrap values and Bayesian posterior probabilities (ML: ≥ 90%; BI: ≥ 95%) (Fig. 7). For the dataset of the total gene sequences containing protein-coding genes, tRNAs, and rRNAs that are shared by all species, strong posterior probabilities were recorded in most of the branch nodes of the BI tree but not in the ML tree, in which moderate bootstrap support was recorded for the backbone structure of the Wikstroemia clade (see Supplementary Figure S1 online). The molecular placement of W. capitata and W. indica, forming sister to each other under low branch support, in the ML tree and BI tree based on the dataset of all protein-coding genes was incongruent with the phylogenetic trees based on the datasets using complete plastome sequences excluding the IRa and intergenic spacer (IGS) sequences. The ML trees and BI trees based on the datasets of the first, second, and third codons of each amino acid in the protein-coding sequences did not display matching molecular placement of Wikstroemia when compared with each other; most of the branches were poorly supported in the Wikstroemia clade (see Supplementary Figure S2  formed a monophyletic group. The ITS-based ML tree revealed a paraphyletic relationship between Wikstroemia and S. chamaejasme, while most of the branch nodes within the Wikstroemia clade were not highly supported (Fig. 8A). Strong bootstrap support was recorded for the sistership between W. alternifolia and W. canescens and between W. micrantha and W. stenophylla. Weakly supported sisterships were present between W. dolicantha and W. scytophylla and between W. capitata and W. ligustrina. In contrast, the BI analysis displayed a monophyletic relationship within the Wikstroemia clade (Fig. 8B). Similar to the ML tree, sisterships were strongly supported between W. alternifolia and W. canescens and between W. micrantha and W. stenophylla but not between W. dolicantha and W. scytophylla or between W. capitata and W. ligustrina in the BI tree.

Discussion
The plastomes of the species in Wikstroemia examined in this study were highly conserved, which is similar to the situation in other angiosperms. The length of the plastomes of the six species of Wikstroemia varied little and were similar in size to typical angiosperms 25 . The same number and contents of the genes were predicted in this study, suggesting that the evolution of the gene sequences was consistent across the six species. Similar to most angiosperms, sequence repeats for A/T were more abundant than those of G/C in the Wikstroemia plastomes and may represent bias in the base composition, which is potentially affected by the tendency of the genome to change to A-T rather than to G-C 26 . An additional validation step for these SSRs, for which five novel SSR primer sets were designed, was conducted for the six species of Wikstroemia reported in this study. Details of the newly designed SSR primer sets and the resulting pherograms are included for reference (see Supplementary  Table S3 and Data S1 online).
Expansion and contraction of the IR region are major evolutionary events that influence the length of the plastomes 27 . The IR junctions in the plastomes reported in this study were placed and annotated with Geneious Prime 28 and further validated with GeSeq 29 as well as Sanger sequencing using novel specific primer sets (see Supplementary Table S4  www.nature.com/scientificreports/ between the repetitive sequence or poly-A structure and tRNA could be one of the reasons for the change in length in the IR region 30 . However, W. indica indicated dissimilarity in its IR borders, which differed from most angiosperms 31 . We suspect that the plastome IR contraction and expansion in W. indica is severe and may be due to extensive gene transfer and larger IR expansion due to the results of the double strand break repair mechanism [32][33][34] . Interestingly, when compared to other species of Wikstroemia sequenced in this study, the plastome of W. indica was smaller (151,731 bp) and had a greater GC content (37.4%) 8 . We found that the plastome of W. indica had a shorter IR region and larger SSC region than other species of Wikstroemia. Changes in the placement of the IR borders in the plastome of W. indica were likely due to contraction of the IR region, causing a loss in the number and content of the genes. Among the genes that were not found in W. indica but were present in other species of Wikstroemia, ndhA, ndhG, and ndhI were supposed to be present in the IR region; genes such as ccsA, ndhD, ndhE, ndhH, psaC, rps15, and trnL-UAG that are commonly duplicated in the IR regions were reduced to only one copy and were transferred to the SSC region, while the ndhF and rpl32 genes, common genes in the SSC region, were not detected. Therefore, it can be concluded that the contraction of the IR region that caused gene loss contributed to the difference in plastome content between W. indica and the other seven species of Wikstroemia. Molecular evidence based on plastome sequences revealed a nonmonophyletic relationship between the species of Wikstroemia due to W. alternifolia and W. canescens clustering with Stellera chamaejasme. Information on the phylogenetic relationships of Wikstroemia species is scarce. Although taxonomic work is challenging in a genus with diverse species, continuous efforts among taxonomists studying members of the Thymelaeaceae have provided some insights into the taxonomic status of Wikstroemia. To provide better insight into the phylogenetic relationships at the nuclear level, we used ITS sequences to perform ML and BI analyses. Unlike phylogenomic tree analyses on complete plastome sequences, low bootstrap support and Bayesian posterior probabilities were observed at the species level in Wikstroemia. The molecular placement of the species of Wikstroemia, however, was identical in both the ML and BI trees, while the most distinct difference between both phylogenetic trees was the placement of S. chamaejasme. In the ML tree based on the ITS sequences, S. chamaejasme clustered within the Wikstroemia clade, but it was sister to Wikstroemia in the BI tree. The discordance between the plastid and nuclear phylogenies in this study may be due to phylogenetic sorting, convergence, unequal rates of evolution, long branch attraction, and introgression 35 . However, low branch node support in both the ITS-based ML and BI trees suggested that either the inclusion of additional nuclear gene sequences or the application of the restriction site-associated DNA sequence (RAD-Seq) technique that integrates up to 10% of the nuclear genome 36 could be helpful in resolving the phylogenetic relationships within Wikstroemia. Evidently, in this study, the use of a single nuclear gene sequence, i.e., ITS, which was suspected to be useful in delimiting many plants at the species level 37 , was insufficient for resolving the phylogenetic relationships between Stellera and Wikstroemia.
Members of Wikstroemia currently comprise species previously placed under Capura L., Daphne L., Diplomorpha Meisn., Daphnimorpha Nakai, Lonicera L., Passerina L., Restella Pobed., and Stellera L. 1,38 . The monotypic genus Stellera, which exhibits strikingly similar morphological characteristics, has troubled some taxonomists who compared it to Wikstroemia. At least five species were placed under Stellera before they were transferred to Wikstroemia; others were transferred to allied genera, such as Daphne, Diarthron and Thymelaea in the tribe Daphneae 38 . This is understandable, as Stellera has a longer taxonomic history, i.e., back to 1747, when compared to other genera in the Daphneae. As a result, S. chamaejasme, as the type species, is the only species left in the genus. Based on the literature, we found that Wikstroemia has an interesting nomenclatural history in which two genera, Diplomorpha and Daphnimorpha, were synonymized and excluded. Combining Stellera with Wikstroemia was previously proposed by transferring the type species S. chamaejasme to the monotypic subgenus Chamaejasme 11,39 . However, the proposal was rejected, as Stellera has priority over Wikstroemia 40 , and based  www.nature.com/scientificreports/ on the Rules of Nomenclature, the combination can only be accepted if Stellera is proposed as a nomen genus rejiciendum (nom. gen. rejic.) 12 . Therefore, we do not exclude the possibility that Stellera should be synonymized with Wikstroemia. In that case, Wikstroemia would be synonymized under Stellera. One should not jump to such a conclusion rashly, based on the current situation, as the taxonomic dispute on whether Wikstroemia should be synonymized with Daphne is yet unresolved 41 . Unless Daphne is considered in a subsequent taxonomic treatment, based on the phylogenetic trees in this study, we could only conclude that Wikstroemia is not monophyletic and that Stellera is unquestionably closely related to Wikstroemia. While phylogenetic analyses based on the plastome sequences of Wikstroemia have proven to be promising, we suggest that larger sampling is required to resolve the taxonomic dispute in Wikstroemia through a molecular approach. We foresee that the genetic information in the complete plastome sequences of Wikstroemia is deemed sufficient and could aid in the classification of Wikstroemia, both at the genus level and at the species level.

Conclusion
To the best of our knowledge, this study presents the first genome-scale analysis of species of Wikstroemia. The findings revealed high conservation of genes in the plastomes. The identification of highly variable gene regions in the plastome sequences of Wikstroemia could potentially be useful in resolving phylogenetic relationships in the genus. A strong sistership between Wikstroemia and the monotypic genus Stellera was present. The ML and BI trees based on the plastome sequences revealed that all the branch nodes for eight species of Wikstroemia included in the phylogenetic tree were supported with high bootstrap values and Bayesian posterior probabilities (ML: ≥ 90%; BI: ≥ 95%), while the ITS-based tree analyses could not properly resolve the phylogenetic relationship www.nature.com/scientificreports/ between Stellera and Wikstroemia. Nevertheless, the molecular data obtained in this study will serve as a valuable resource for providing greater insights into the taxonomy and phylogeny of Thymelaeaceae. Repeat analyses. SSRs were identified using MISA-web 46 , in which parameters for the identification of perfect mono-, di-, tri-, tetra-, penta-, and hexanucleotide motifs were set for a minimum of 10, 5, 4, 3, 3, and 3 repeats, respectively. Long repeats, including forward, palindrome, reverse and complement repeats, were determined using REPuter 47 with a Hamming distance of 3 and a minimal repeat size of 30 bp.  Comparative genome and divergence analyses. The complete plastome sequences of two species of Wikstroemia, W. chamaedaphne (GenBank accession MN563132) and W. indica (GenBank accession MN453832), which were available in NCBI GenBank, were downloaded and included in subsequent analyses. By using the plastome sequences of W. chamaedaphne as the reference genome, nucleotide variation in the plastome sequence alignment of the eight species of Wikstroemia was visualized using mVISTA 49 in Shuffle-LAGAN mode. To detect the expansion and contraction of the IR region in the plastomes across the eight species, the IR/ SC boundaries of the plastomes were visualized using IRscope 50 . To detect the mutational hotspots and divergence regions in the plastomes of the eight species, sequence alignment of the plastome sequences was carried out using Geneious Prime 28 . Calculations of the nucleotide variability (Pi) among the eight plastomes were performed using DnaSP v5 51 with a window length of 1000 bp and a step size of 500 bp.

Materials and methods
Selection pressure analysis. The ratio of nonsynonymous to synonymous substitutions (K a /K s ) of protein-coding genes was calculated for Aquilaria sinensis (GenBank accession MN720647) and Wikstroemia capitata. Calculations were conducted for two sets of data: (1) shared genes analysed separately and (2) a combined dataset containing all shared genes. Prior to sequence alignment using MUSCLE embedded in MEGA7 48 , the plastome sequence of A. sinensis was reannotated to ensure uniformity. For the combined dataset, the coding sequences were concatenated manually. Selection pressure acting on these genes was estimated using KaKs_Calculator 2.0 52 based on the Yang and Nielsen codon frequency (YN) model, with parameters for the initial ratio of transition to transversion frequency (K) set between 0.3 and 0.7. A K a /K s value equal to or less than 1.0 indicates the presence of purifying selection, in which changes in gene residues of amino acids that may favour excess synonymous versus nonsynonymous substitutions have been avoided, while the presence of positive selection is specified if the K a /K s value is more than 1.0. Phylogenetic analyses. Phylogenetic analyses were conducted based on the plastome or gene sequences of 17 selected taxa from Thymelaeaceae. Two species, Psidium guajava (Myrtaceae; GenBank accession KY635879) and Gossypium gossypioides (Malvaceae; GenBank accession HQ901195), were included as outgroups. Seven datasets, including the (1) complete plastome sequences excluding IRa, (2) the total gene sequences containing protein-coding genes, tRNAs, and rRNAs that are shared by all species, (3) the intergenic spacer (IGS) sequences, (4) all protein-coding genes that are shared by all species, and (5) three additional subdatasets at the codon level for the first/second/third codons of each amino acid in the protein-coding sequences, were used to perform phylogenetic inferences. Part of the complete plastome sequences excluding the Ira and the targeted genic and intergenic regions in the plastomes, was extracted and concatenated using Geneious Prime 28 , while the first/second/third codons of each amino acid in the shared genes were extracted using MEGA7 48 . Sequence alignment was carried out using MAFFT v7.450 53 . The ML tree was constructed based on all the sequence datasets using RAxML 8.2.11 54 . The general-time-reversible (GTR) and gamma distributed (+ G) (+ GTR + G) DNA substitution model was selected, and all branch nodes were calculated under 1000 bootstrap replicates. BI analysis was conducted for all the datasets 54,55 . BI analysis was executed through the MrBayes 55 pipeline available in the CIPRESS Science Gateway web portal 56 . Markov chain Monte Carlo (MCMC) was conducted with 2,000,000 generations, and sampling was collected every 100 cycles. The final tree was visualized using FigTree 57 and edited manually. The ITS sequences were aligned and manually trimmed for their primer sequences to obtain clean sequences. A total of 26 additional ITS sequences derived from members of the Thymelaeaceae were downloaded from the NCBI GenBank and MUSCLE-aligned against the ITS sequences of the six species of Wikstroemia used in this study using MEGA7 48 . Two species, P. guajava (Myrtaceae; GenBank accession MN295360) and Gossypium australe (Malvaceae; GenBank accession AF057763), were included as outgroups. The alignment was trimmed using trimAL v1.2 58 with the gappyout method to reduce systematic errors produced by poor alignment. The optimal DNA substitution model for the ML analysis using the "Find Best DNA/Protein Model (ML)" function embedded in MEGA7 48 was calculated to be the Kimura two-parameter (K2P) model with the discrete Gamma model (+ G4) and invariant sites included (+ I) (= K2P + G + I). ML analysis was performed using MEGA7 48 with 1000 bootstrap replicates. BI analysis was conducted with a previously described method 55 .

Data availability
The complete chloroplast sequences generated and analysed in this paper are available in GenBank (https:// www. ncbi. nlm. nih. gov/ genba nk/, accession numbers listed in the text).