Characterization and comparative analysis among plastome sequences of eight endemic Rubus (Rosaceae) species in Taiwan

Genus Rubus represents the second largest genus of the family Rosaceae in Taiwan, with 41 currently recognized species across three subgenera (Chamaebatus, Idaoeobatus, and Malochobatus). Despite previous morphological and cytological studies, little is known regarding the overall phylogenetic relationships among the Rubus species in Taiwan, and their relationships to congeneric species in continental China. We characterized eight complete plastomes of Taiwan endemic Rubus species: subg. Idaeobatus (R. glandulosopunctatus, R. incanus, R. parviaraliifolius, R rubroangustifolius, R. taitoensis, and R. taiwanicolus) and subg. Malachobatus (R. kawakamii and R. laciniastostipulatus) to determine their phylogenetic relationships. The plastomes were highly conserved and the size of the complete plastome sequences ranged from 155,566 to 156,236 bp. The overall GC content ranged from 37.0 to 37.3%. The frequency of codon usage showed similar patterns among species, and 29 of the 73 common protein-coding genes were positively selected. The comparative phylogenomic analysis identified four highly variable intergenic regions (rps16/trnQ, petA/psbJ, rpl32/trnL-UAG, and trnT-UGU/trnL-UAA). Phylogenetic analysis of 31 representative complete plastomes within the family Rosaceae revealed three major lineages within Rubus in Taiwan. However, overall phylogenetic relationships among endemic species require broader taxon sampling to gain new insights into infrageneric relationships and their plastome evolution.

A partial ycf1 gene (1,107-1,248 bp) was located in the IRb/SSC junction region, while the complete ycf1 gene (5,820-5,862 bp) was located in the IR region at the SSC/IRa junction. The infA gene of the eight Taiwanese endemic Rubus plastomes located in the LSC region became a pseudogene. Interestingly, the highly conserved group II intron of atpF was lost, and a frameshift mutation via ATT deletion of the ndhF gene was identified in R. kawakmii of subg. Malachobatus (with a CDS size variation length of 2,241 bp), and caused early termination of translation (Fig. 2b). Also, it displayed point mutations, altering from transcriptions of tyrosine (Tyr) to phenylalanine (Phe). Two other point mutations on the ndhF gene altered phenylalanine (Phe) to isoleucine (Ile) and tyrosine (Tyr) to phenylalanine (Phe); the former mutation was in R. glandulosopunctatus, R. rubroangustifolius, R. taiwanicolus (Fig. 2d) and the latter in R. laciniastipulatus (Fig. 2c). Additionally, a point mutation was detected in R. laciniastipulatus, altering asparagine (Asn) to lysine (Lys) (Fig. 2c). Three Taiwanese Rubus plastomes from subg. Idaeobatus (R. incanus, R. parviaraliifolius, and R. taitoensis) contained the same consistent distribution of amino acids, sequences (Fig. 2e), and the CDS length of 2242 bp as four Rubus plastomes of two subgera Idaeobatus and Anoplobatus sampled from the Korean peninsula and Japan (Fig. 2a) 39 .
The frequency of codon usage of the eight complete Taiwanese endemic Rubus plastomes was calculated for the cp genome on the basis of the sequences of protein-coding genes and tRNA genes (Fig. 3) Table 2). Those genes included photosynthesis genes (ndA, ndhB, ndhD, ndhF, ndhG, petB, psaI, psbE, and psbF), self-replication genes (rpl23, rpoA, rpoB, rpoC2, rps2, rps14, and rps16) and others (accD, clpP, and matK). No RNA editing sites at ndhF and ndhG genes were found in R. glandulosopunctatus, R. rubroangustifolius, and R. taiwanicolus. In addition, RNA editing sites at the rpoC1 gene was observed in R. kawakamii. Compared with other species, the ndhA gene in R. glandulosopunctatus showed exceptionally higher frequencies (i.e., five times) in RNA editing sites. The ndhB gene had the highest number of potential editing sites (a total of 11 sites), followed by the ndhD gene (a total of eight sites). It showed consistent results from previous reports [42][43][44] , except for R. glandulosopunctatus that had the second highest number of potential editing sites in ndhA (10). All editing sites showed base transition from cytosine (C) to thymine (T), and the most frequent transition serine (Ser) conversion to leucine (Leu) (Fig. 3). Consequently, the amino acids with hydrophobic chains (isoleucine, leucine, methionine, and phenylalanine) formed in 88.6% of the 29 RNA editing sites.
Comparative analysis of genome structure. The eight complete plastome sequences of endemic Rubus species in Taiwan were plotted using mVISTA analysis using the annotated R. taiwanicolus plastome as a refer- Figure 1. The eigth endemic Rubus plastomes in Taiwan. The genes located outside of the circle are transcribed clockwise, while those located inside are transcribed counterclockwise. The gray bar area in the inner circle denotes the guanine-cytosine (GC) content of the genome, whereas the lighter gray area indicates the adenosine-thymine (AT) content of the genome. Large single copy, small single copy, and inverted repeat are indicated with LSC, SSC, and IR, respectively. Gene map was generated with the OrganellarGenomeDRAW (OGDRAW) 1 www.nature.com/scientificreports/ ence (Fig. 4). The results indicated that the LSC region was most divergent, that the two IR regions were highly conserved, and that the non-coding regions were more divergent and variable than the coding regions. In addition, the R. incanus, R. parviaraliifolius and R. taitoensis plastomes showed high sequence similarity (i.e., 97% sequence identity; 149,200 bp identical sites) to the R. taiwanicolus plastome, while R. kawakamii was least similar (97% sequence identity; 149,178 bp identical sites) to R. taiwanicolus. The sliding window analysis using DnaSP program revealed highly variable regions in the continental Island endemic taxa of the Rubus chloroplast genome (Fig. 5). As the eight plastomes of Rubus from Taiwan were compared, the average value of nucleotide diversity (Pi) over the entire cp genome was 0.010. The most variable region was the rps16/trnQ intergenic region with a 0.05018 Pi value. Also, highly variable regions included three other intergenic regions: petA/psbJ (Pi = 0.04567), rpl32/trnL-UAG (Pi = 0.04165), and trnT-UGU/trnL-UAA (Pi = 0.04147). Therefore, four highly variable regions with a Pi value greater than 0.04 identified in eight endemic Rubus plastomes in Taiwan can be useful for population genetic and phylogeographic study.
The positive selection analysis using DnaSP program with synonymous and nonsynonymous substitution options revealed positively selected genes (Fig. 6). Overall, the average Ka/Ks ratio of the 73 common proteincoding genes in the eight endemic plastomes was 0.45. For each conserved gene, a total of 44 (out of 73 genes) had an average Ka/Ks ratio below 1 for the eight comparison groups, suggesting that these genes were subjected   www.nature.com/scientificreports/ to strong purifying selection pressures in the Rubus chloroplast. The Ka/Ks ratio of > 1 had a total of 29 genes out of 73, suggesting that these genes were positively selected in the eight endemic plastomes in Taiwan. Those genes included one ATP subunit gene (atpI), three photosystem subunit genes (psbB, psbC, and psbD), two cytochrome b6/f complex gene (petA and petB), eight NADH oxidoreductase genes (ndhA, ndhD, ndhF, ndhG, ndhH, ndhI, ndhJ, and ndhK), Rubisco gene (rbcL), four encoded DNA dependent RNA polymerase genes (rpoA, rpoB, rpoC1, and rpoC2), four ribosomal subunit genes (rpl20, rps2, rps3, and rps4), maturase gene (matK), one subunit Acetyl-CoA carboxylase gene (accD), one c-type cytochrome synthesis gene (ccsA), one envelop membrane protein gene (cemA), and two unknown genes (ycf1 and ycf 2). Surprisingly, our results from the Taiwanese endemic Rubus suggested that ca. 40% of the protein-coding genes underwent positive selection pressures.
Phylogenetic analysis. Maximum likelihood (ML) analysis conducted on the best-fit model of "TVM + F + R4" revealed the first preliminary phylogenetic relationships among endemic species in Taiwan (Fig. 7). Phylogenetic analysis of 31 representative plastomes within the family Rosaceae supported both the monophyly of Rubus (100% bootstrap support) and the sister relationship between Rubus and the clade containing Fragaria and Rosa (100% bootstrap support). Based on limited available complete plastome sequences of Rubus from four subgenera, Anoplobatus and Malachobatus appear to be monophyletic, while Idaeobatus is not monophyletic (Fig. 7 . 7), requiring further confirmation based on rigorous sampling from both continental and island species.

Discussion
Chloroplast genome structure and evolution in Taiwanese Rubus endemics. For the first time, we characterized the eight complete plastomes of Taiwan endemic Rubus species, including two species from subg. Malachobatus. The size of the complete plastome sequences are highly conserved with the total length ranging from 155,566 to 156,236 bp (Table 1). In addition, despite their representations from two subgenera (Idaeobatus and Malachobatus), the plastomes are highly conserved, with no structural variations or content rearrangements. Interestingly, the highly conserved group II intron of atpF gene was lost in all eight plastomes regardless of their subgeneric assignments, as we demonstrated previously in the case of R. boninensis, R. crataegifolius, R. takesimensis and R. trifidus 35,39 . Within the major lineages of the family Rosaceae 45 , we found the complete atpF gene in members of the newly circumscribed subfamily Amygdaloideae, such as Prunus (KP732472), Pyrus (HG737342, AP012207), and Malus (NC040170, NC031163). However, loss of the atpF intron was also detected in other members of Rosoideae, such as Fragaria (KY769126, 769125, 434061), Hagenia (KX0088604), Potentilla (HG931056), and Rosa (KY419918, KX768420, KY419934). It still remains to be determined whether the loss of the atpF intron, that frequently occurs in Rosa and all subgenera of Rubus, has occurred in other major lineages within the family Rosaceae.
In this study, we detected mutations in the 3′ region of the ndhF gene; which is known to have frameshift mutations and alterations on transcription termination, as a result of higher substitution rate, a wide range of insertion and deletion (indel) variations, a low homoplasy rate, and a high AT content 46,47 . In comparative phylogenomic analysis of genus Rosa sect. Synstylae, Jeon and Kim 30 revealed frameshift and point mutations on the 3′ end of the ndhF gene. However, our previous study on the comparative analysis of four Rubus plastomes (two from each subgenus Anoplabatus and Idaeobatus) only showed nucleotide substitutions and transcription alterations without size variations 39 . A frameshift mutation via ATT deletion that caused early termination and translation was identified in R. kawakamii (subg. Malachobatus) from the eight Taiwan endemic Rubus plastomes (Fig. 2). Although the other endemic R. laciniatostipulatus belongs to the same subg. Malachobatus, it only showed a point mutation, which altered transcriptions from asparagine (Asn) to lysine (Lys) and tyrosine (Tyr) to phenylalanine (Phe). We also detected point mutations in other endemic Rubus in Taiwan for the same www.nature.com/scientificreports/ 2,242 bp sequences, which altered transcripts from tyrosine (Tyr) to phenylalanine (Phe), phenylalanine (Phe) to isoleucine (Ile), tyrosine (Tyr) to phenylalanine (Phe), and asparagine (Asp) to lysine (Lys). Furthermore, three Taiwanese endemic Rubus plastomes in subg. Idaeobatus (R. incanus, R. parviaraliifolius, and R. taitoensis) showed the same distribution of amino acids and sequence lengths (2,242 bp) with the four previously analyzed members of Rubus plastomes (two in subg. Idaeobatus and two in Anoplobatus) 39 . It appears that these changes are neither subgenus specific nor geographically confined to the East Sea (Ulleung Island), the northwestern Pacific Ocean (Bonin Islands), and the western rim of Pacific Ocean (Taiwan). The codon usage bias, which could be manifested primarily by the balance between mutational bias and natural selective forces, provide crucial information to our understanding of molecular evolution and environmental adaptation 41,48,49 . In the case of R. trifidus (continental progenitor)-R. boninensis (insular derivative) species pairs in subg. Anoplobatus, we found similar patterns in codon usage with some exceptions 39 . When compared with the pair of R. trifidus-R. boninensis, AUG (trnfM-CAU , trnI-CAU , and trnM-CAU), UCA (trnS-UGA), UAG (no usage), and CAA codon usage (trnQ-UUG ) showed different patterns in R. crataegifolius-R. takesimensis species pairs. When compared with these two species pairs, AUG (trnfM-CAU , trnI-CAU , and trnM-CAU), CAA (trnQ-UUG), and UCA (trnS-UGA) codon usage of the eight Taiwanese Rubus endemics showed similar patterns to R. cratageifolius, R. takesimensis, and R. trifidus from Korea. The specific codon usage amongst eight endemic Rubus was detected in R. laciniatostipulatus (subg. Malachobatus) with AUG codon usage (trnV-GAU). The UAG codon usage (stop codon) was similar only to R. crataegifolius and R. takesimensis on Ulleung Island, and R. boninensis on Bonin Islands. Rubus trifidus showed a different UAG codon usage of trnI-CAT. However, the Bonin Islands endemic R. boninensis showed different patterns of codon usage at AUG (trnfM-CAU and trnI-CAU), CAA (trnK-UUG) and UCA (no usage). Given that R. boninensis occurs on Minamiiwojima Island, which is estimated to be as young as 30,000 years old, it was interesting to notice that the endemic Rubus species on geologically younger islands showed more diverse patterns of codon usage than other insular endemics on Ulleung Island (R. takesimensis) and Taiwan (eight species in this study). The biased patterns toward high RSCU values of U and A, at the third codon usage in eight endemic Rubus plastomes in Taiwan, was similar to other angiosperms and algal lineages 40,41 .
Owing to their important function in plant metabolism, proteins and RNA molecules encoding plastid genes are subject to selective pressures 50 . While purifying selection acts to maintain protein functions, positive selection may come into play in response to environmental changes, novel ecological adaptation, or results from coevolutionary processes 50,51 . Previous studies showed that Ka/Ks values are usually less than one because synonymous nucleotide substitutions occur more frequently than nonsynonymous substitutions 52 . Interestingly, our current results from the eight endemic Rubus species from Taiwan suggested that ca. 40% protein-coding genes experienced positive selection pressures. All but two genes (ndhB and ndhC) of the NADH oxidoreductase genes of the eight Taiwan Rubus endemics showed that they were under positive selection pressure. Under strong light conditions, NADH dehydrogenase can protect plants from photoinhibition or photooxidation stress by stabilizing the NADH complex, and adjusting the photosynthetic rate and growth delay caused by drought 53,54 . In addition, all Rubus plastid gene encoding proteins related to transcription and post-transcriptional modification (matK, rpoA, rpoB, rpoC1, and rpoC2) underwent positive selection. In the same Rosaceae family, Fragaria vesca ssp. vesca showed six positively selected genes: rpoC2, ndhD, ndhF, psbB, ycf1, and ycf4 55 . We can only speculate that the positive selection pressure among eight endemic species of Rubus in Taiwan experienced by many genes was likely a result of their adaptation to subtropical climates in the island of Taiwan. This speculation, however, has to be investigated further.
A highly variable region or hotspot region in the whole chloroplast genome scale can help elucidate the phylogenetic relationships and complex evolutionary history of Rubus as a maternally inherited marker. Recently, several hot spot regions including genic and non-coding regions across the entire plastome were reported in several members of Rosaceae, such as Malus 56 , Prunus 57 , Pyrus 27 , and Rosa 30 . In our current study, we also detected four highly variable regions, including rps16/trnQ (Pi = 0.0518) and petA/psbJ (Pi = 0.0466), as two of the most variable hotspots within the eight Taiwan Rubus endemics, with an average nucleotide diversity (Pi) value of 0.01. When compared with our early study (average Pi value of 0.01), which included subg. Idaeobatus (four taxa) and subg. Cylactis (one taxon), the two most variable noncoding regions, trnL/trnF and rps16/trnQ, were detected with a Pi value of 0.05 and 0.046, respectively 35 . Yang et al. 39 compared the four taxa of progenitor-derivative species pairs in subg. Idaeobatus (R. crataegifolius-R. takesimensis on Ulleung Island) and subg. Anoplobatus (R. trifidus-R. boninensis on the Bonin Islands), and found that the average Pi value (0.005) was substantially lower than that found between subg. Idaeobatus and subg. Cylactis (Pi = 0.01). The same study also suggested that the trnT/trnL region was the most variable region with a Pi value of 0.027. Thus, considering all previous studies on Rubus for the identification of hotspot regions throughout the complete plastomes, two intergenic regions rsp16/trnQ and trnT/trnL were found to be the most variable hotspot regions within genus Rubus, including members of four subgenera (Anoplobatus, Cylactis, Idaeobatus and Malachobatus. Therefore, four hotspot regions from this study, i.e., rps16/trnQ, petA/psbJ, rpl32/trnL-UAG, and trnT-UGU/trnL-UAA, can be used as effective chloroplast DNA markers for population genetic and phylogeographic studies of Rubus species in Taiwan. Phylogenetic position and relationships of Taiwanese endemic Rubus. Given the scarcity of available complete plastome sequences of genus Rubus, we were not able to meticulously assess the phylogenetic relationships among the eight endemic Rubus species from Taiwan and their congeneric species in Taiwan and mainland China. Nevertheless, this study provides some insights into preliminary assessments of the relationships among Rubus endemics in Taiwan. First of all, the clade of Rubus endemics in Taiwan included three species, R. taitoensis, R. parviaraliifolius, and R. incanus, which belong to subg. Idaeobatus (Fig. 7)  www.nature.com/scientificreports/ central mountains in Taiwan, was treated as a synonym of R. niveus, which occurs widely in South Asia and Southeast Asia 3 . It was considered that R. incanus (narrow cymose panicles or short thyrses) and R. niveus (umbellate corymbs) are two distinct taxa on the basis of the inflorescence type, and our preliminary data based on the complete plastome sequences support the statement of Huang and Hu 17 (Fig. 7). In addition, R. incanus has red drupelets at maturity, while R. niveus has red immature and black drupelets at maturity, further supporting that they are distinct taxonomic entities. R. parviaraliifolius occurs in low to medium altitudes (300-1,800 m) throughout the island and has 5-foliolate leaves with red fruits at maturity, and its sister species in this phylogeny, R. taitoensis, has simple leaves (not divided or 3-lobed) with orange to yellow fruits, occurring in medium altitudes (1,500-2,800 m) in the central mountains. It is necessary to include three species of this clade into a broader phylogenetic framework of Rubus in Taiwan and mainland China to precisely determine species relationships among these taxa.
The second clade includes three species in subg. Idaeobatus, R. glandulosopunctatus, R. rubroangustifolius, and R. taiwanicolus, which is more closely related to the clade of subg. Anoplobatus (R. trifidus and R. boninensis) and subg. Idaeobatus (R. crataegifolius and R. takesimensis) than other members of the same subgenus (R. amabilis, R. coreanus, R. niveus, R. taitoensis, R. parviaraliifolius, and R. incanus) (Fig. 7). In general, the clade of subg. Idaeobatus is not well resolved, especially subg. Anoplobatus deeply embedded within subg. Idaeobatus clade containing R. leucanthus, R. corchorifolius, R. glandulosopunctatus, R. rubroangustifolius, and R. taiwanicolus, R. crataegifolius and R. takesimensis (Fig. 7) 14 . Rubus taiwanicolus is a small subshrub up to 15 cm tall with 9-15 foliolate leaves, occurring in the central mountains from medium to high altitudes (1,500-3,000 m), and no previous hypothesis regarding its relationship to other congeneric species exists. In the case of R. glandulosopunctatus, it is considered a synonym of R. rosifolius, which occurs widely in Asia (East Asia, South Asia, and Southeast Asia), Africa, and Australia. As pointed out by Huang and Hu 17 , this widespread species exhibits tremendous morphological variations, requiring more investigation on this complex taxon. Given that the chloroplast and nuclear combined phylogenies ( Figs. 1 and 2) 15 . In our current complete plastome-based phylogeny (Fig. 7), the clade containing R. glandulosopunctatus is closely related to the clade containing R. trifidus and R. craetaegifolius, which is consistent with the target capture sequencing phylogeny 15 . Extensive sampling regarding the phylogenetic relationships of this complex taxon still have to be further determined. R. rubroangustifolius, which is endemic to eastern and northern Taiwan, was treated as a synonym of R. croceacanthus var. glaber 17 . Rubus croceacanthus has never been included in previous phylogenetic studies, and it has been known for its tremendous morphological variations in Taiwan. In Huang and Hu 17 , R. rubroangustifolius was treated as a synonym of R. cardotii, and R. croceacanthus var. glabra was treated as a synonym of R. cardotii. It was also pointed out that R. cardotii can be easily distinguished from R. croceacanthus on the basis of several morphological characteristics. However, little is known about the phylogenetic position of R. rubroangustifolius and its relationship to R. croceacanthus and R. cardotii.
Lastly, the third clade of two endemic species of subg. Malachobatus, R. laciniatostipulatus, and R. kawakamii form a clade with R. lambertianus var. glaber, another species of the same subgenus. R. fockeanus from subg. Cylactis is sister to the clade of subg. Malachobatus. Based on three concatenated regions of chloroplast DNA among Chinese Rubus species, it was shown that subg. Cylactis is closely related to subg. Malachobatus, including few exceptional species from subg. Idaeobatus (e.g., R. pungens complex and R. peltatus) and Dalibardastrum (R. amphidasys and R. tsangorum). Subg. Cylactis has also shown to be highly polyphyletic on the basis of target capture sequencing study 15 . Rubus foeckeanus typically occurs in high elevation (2,000-4,000 m in glassy slopes and forests) and has 3-foliolate compound leaves. Rubus laciniatostipulatus occurs widely in southern China and southeastern Asia. In Taiwan, it occurs in forest edges in the northern and central parts of the island at low elevations (20-300 m) 17 . In the case of R. kawakamii, it is commonly distributed in forests at medium altitudes (1,000-2,500 m) throughout the central mountains. This species is very difficult to distinguish from R. swinhoei, which occurs at low altitudes (20-1,200 m) in northern and central parts, and is often treated as a variety of R. swinhoei 58 . Nevertheless, both R. laciniatostipulatus and R. kawakamii have simple leaves, shallowly 5-7 lobed or not divided, respectively, and display a close relationship 14 . Overall, phylogenetic relationships among the endemic species, including infraspecific taxa, require broader taxon sampling from China and Taiwan to gain new insights into infrageneric relationships, as well as their plastome evolution. It is also necessary to assemble plastome sequences from members of subg. Rubus and several other subgenera to fully understand plastome evolution and to reveal the complex evolutionary history of Rubus on a global scale.  65 and adjusted manually with Geneious 61 . Using DnaSP v. 6.10 software 66 , a sliding window analysis with a step size of 200 bp and a window length of 800 bp was conducted to determine the nucleotide diversity (Pi) of the plastome. The codon usage frequency was calculated using MEGA7 67 with the relative synonymous codon usage (RSCU) value 68 , which is a simple measure of non-uniform usage of synonymous codons in a coding sequence. The DNA code used by bacteria, archaea, prokaryotic viruses, and chloroplast proteins was used 69 . Protein-coding genes were run using the online program predictive RNA editor for plants (PREP) suite 70 , with 22 genes as reference, based on a cut off value of 0.8 to predict the possible RNA editing sites in eight endemic Rubus from Taiwan. Analyses based on the complete cp genomes and concatenated sequences of 75 common protein-coding genes among the studied species were performed via MAFFT v. 7 65 71 . Prinsepia utilis was used as an outgroup, and a non-parametric bootstrap analysis was performed with 1000 replicates.