Introduction

Araliaceae, the ginseng family, is a highly diversified family containing approximately 45 genera and 1500 species; several morphological characteristics of the Araliaceae, such as flower morphology, overlap with those of the closely related Apiaceae1,2,3,4,5. The Araliaceae were previously merged into the Apiaceae (Umbelliferae) owing to their similar morphology, but eventually previous studies treated the Araliaceae as an independent family2,6,7,8. This family consists of two subfamilies: the species-rich Aralioideae, and the Hydrocotyloideae, containing two genera (Hydrocotyle and Trachymene). Species in the family Araliaceae are widely distributed in tropical and subtropical regions of both the northern and southern hemispheres, with less diversity in temperate regions1,9. These species are well known as useful biological resources for both medicinal (Panax, Aralia, Eleutherococcus, Heteropanax, Kalopanax, Tetrapanax, and Fatsia) and ornamental (Hedera, Dendropanax, and Schefflera) purposes, with many studies having investigated the medicinal properties of Araliaceae species10,11,12,13,14,15,16.

Phylogenetic questions around the Araliaceae remain despite several previous studies investigating them1,2,4,17,18,19,20. Phylogenetic studies using the internal transcribed spacer (nrITS) region of the nuclear ribosomal DNA (nrDNA) and several plastid markers suggest four major groups within the Aralioideae subfamily—the Asian Palmate group, the AraliaPanax group, the Greater Raukaua group, and the PolyciasPseudopanax group—and clearly support the monophyly of the two subfamilies, Aralioideae and Hydrocotyloideae1,2,4,17,18,19,20. However, phylogenetic discordance has emerged between different phylogenetic trees and regarding the poly- or paraphyly of the genera Schefflera, Pseudopanax, Aralia, and Polyscias1,2,4,17,18,19,20. This highlights the limitations, such as polytomy, of phylogenetic analyses using a few DNA barcoding regions, indicating that genome-based phylogenetic studies are required to shed light on phylogenetic relationships within the Araliaceae.

Plastids, which have an endosymbiotic origin, are organelles specific to plant cells and contain their own genomes. Unlike the mitochondrial genomes (mitogenomes), which are another endosymbiotic organelle genome of plant cells and have complicated genome structures, plastid genomes (plastomes) have a conserved quadripartite structure and are around 150 kb in size21,22. In most land plants, plastomes contain two copies of an inverted repeat (IR) region separating a large single-copy (LSC) region and a small single-copy (SSC) region21. Reports of complete plastid genomes and phylogenomic studies using plastome data have rapidly increased with high-throughput sequencing technologies. Although maternally inherited plastid DNA often exhibited phylogenetic discordance with biparentally inherited nuclear DNA, plastome data are still widely used for revealing phylogenetic relationships and many studies have demonstrated the utility of plastome data for resolving relationships among various species23,24,25,26,27. In the Araliaceae, numerous plastomes have been used to develop molecular markers for species identification and to reveal phylogenetic relationships in the genus Panax and the Asian Palmate group; Panax species include popular medicinal plants such as Korean ginseng, and phylogenetic relationships within the Asian Palmate group remain unclear28,29,30,31. In addition, evolution of the Panax species has been examined because of their intercontinental disjunction pattern between East Asia and North America29,32,33,34,35. However, the plastome evolution and phylogenetic relationships of the Araliaceae have been poorly studied at the family level using plastome data. The nrDNA consists of two transcription units, 45S and 5S, thousands of copies of which are tandemly repeated in the nuclear genome36,37. The tandemly repeated nrDNA structure results in conserved sequence mutations, which contribute to the ease of sequencing and use for phylogenetic studies38,39.

Low-coverage whole-genome sequencing allows stimultaneous assembly of maternally inherited plastome and biparentally inherited nrDNAs40,41. In this study, we newly assembled the complete plastomes and 45S nrDNAs, including nrITS regions, of 13 species (12 Araliaceae species and one Apiaceae species) and compared these with previously reported Araliaceae plastomes. Furthermore, we inferred phylogenetic relationships among the Araliaceae using previously reported plastome and nrDNA sequences from NCBI GenBank and explored plastome evolution within the family Araliaceae.

Results

Plastid genome and 45S nrDNA assemblies

Twelve species were newly sequenced using Illumina sequencing platforms (Supplementary Table S1), and the genome sequence of Hydrocotyle vulgaris (ERX5309978) was downloaded from the NCBI SRA database. We successfully assembled 13 plastomes as circular genomes comprising two copies of the IR region (25,060–25,969 bp) separated by the LSC (84,198–86,630 bp) and the SSC (17,869–18,708 bp) regions (Supplementary Fig. S1A, Supplementary Table S1). The mean coverage of the assembled plastomes ranged from 227 × in Hedera helix to 2108 × in Centella asiatica (Apiaceae) with no gaps. All newly assembled plastomes were similar to previously reported Araliaceae and Apiaceae plastomes in genome structure and gene content, even in the IR and SC boundaries, with Hydrocotyle vulgaris having the smallest plastome (153,026 bp) and Schefflera arboricola the largest one (156,704 bp). The GC content of plastomes ranged from 37.7% in Hydrocotyle vulgaris and Centella asiatica to 38.1% in the two Aralia plastomes (Supplementary Table S1), similar to previously reported Araliaceae plastomes. All the plastomes assembled in this study contained 80 protein-coding genes, 31 tRNAs, and four rRNAs with 115 unique genes overall (Supplementary Table S2). Seventeen genes were duplicated in the IR region, giving a total of 132 genes in the Araliaceae plastomes. Eighteen genes contained one or two introns: 15 genes contained one intron and three genes (rps12, clpP, and ycf3) contained two introns. Six of the intron-containing genes were tRNAs (Supplementary Table S1).

The complete 45S nrDNA transcription unit assemblies, including 18S, 5.8S, 15S, and two internal transcribed spacers (ITS1 and ITS2), were almost identical in length, with mean sequence coverage ranging from 305 × in Hydrocotyle vulgaris to 3094 × in Kalopanax septemlobus (Supplementary Fig. S1B; Supplementary Table S1). The 5.8S rRNA sequence was 160 bp in all species, whereas the lengths of 18S, ITS1, ITS2, and 26S varied by species: 1807–1808 bp for 18S rRNA, 222–228 bp for ITS1, 231–237 bp for ITS2, and 3395–3399 bp for 26S rRNA. The GC content was more than 53%, ranging from 53.6% in Hydrocotyle vulgaris to 55.8% in Centella asiatica (Supplementary Table S1).

Phylogenetic relationships in the Araliaceae

We performed phylogenetic reconstructions based on plastome data using four different data matrices: whole-plastome sequences, concatenation of 78 protein-coding gene sequences (CDSs), first and second sites of the 78 CDSs, and translated protein sequences (Fig. 1, Supplementary Figs.S2S5). Phylogenetic results from the four data matrices indicated that the two subfamilies within the Araliaceae, Aralioideae and Hydrocotyloideae, are monophyletic, while monophyly of the three major groups in the Aralioideae, the Asian Palmate group, the AraliaPanax group, and the Greater Raukaua group, was well resolved (Fig. 1, Supplementary Figs.S2S5). However, the genus Aralia was not monophyletic in all four phylogenies (Fig. 1, Supplementary Figs.S2S5), similar to previous studies1,2,4,17,18,19. We were not able to examine the monophyly of the PolyciasPseudopanax group or the Osmoxylon group because we had only one sample from each group. Topologies within the Asian Palmate group differed using the four data matrices. Within the Asian Palmate group, we observed phylogenetic discordance in the positions of the MacropanaxMetapanaxKalopanax clade (Clade 2), the FatsiaOreopanax clade (Clade 4), and the DendropanaxChengiopanaxGamblea clade (Clade 5), and in the monophyly of the ScheffleraHeteropanaxTetrapanax clade (Clade 6) and the Oplopanax clade (Clade 7) (Fig. 2A–C). The clades 2, 4, and 5 were each located in two positions; however, the topology inferred from whole-plastome sequences and protein-coding gene sequences had stronger branch support than the topology inferred from the first and second sites of the 78 CDSs or the translated protein sequences (Fig. 2B,D). Monophyly of clades 6 and 7 was inferred from the protein-coding gene sequences and the translated protein sequences, although the branch support values were relatively low (Fig. 2C,D).

Figure 1
figure 1

A summary of Araliaceae phylogenies inferred from four different datasets. Bootstrap values calculated by maximum-likelihood analyses of whole-plastome sequences, concatenated protein-coding gene sequences, first and second sites of CDSs, and amino acid sequences, respectively, are shown near branches. Solid diamonds indicate full support (100%) from four different datasets, and asterisks indicate full support from the corresponding dataset. Hyphens indicate missing values owing to different results for the topology of the corresponding dataset. Clades marked by numbers in gray circles correspond to the clades shown in Fig. 2.

Figure 2
figure 2

Conflicting phylogenetic results for the Asian Palmate group in the Araliaceae. (A) Four topologies inferred from four different datasets (whole-plastome sequence, protein-coding gene sequence, first and second sites in CDS, and protein sequence); simplified topologies for each conflict are illustrated in B, C, D, and E. Detailed information for clades marked by numbers in gray circles can be found in Fig. 1. (B) Conflicts within the Asian Palmate Core. (C) Conflict of the Oplopanax position. (D) Topologies supported and bootstrap values estimated by maximum-likelihood analyses based on four different datasets. Topologies correspond to those in (B and C). The greatest bootstrap value among the four phylogenies is indicated in bold. Pf: Polyscias fruticosa.

The phylogeny based on 45S nrDNA sequences, including 18S, ITS-1, 5.8S, ITS2, and 26S regions, indicated strong support for monophyly of the Asian Palmate group and the AraliaPanax group, with bootstrap support values of 90% and 80%, respectively (Fig. 3). However, phylogenetic relationships within the Asian Palmate group conflicted with those in the plastome-based phylogeny and remained unclear owing to a lack of bootstrap support (less than 50%), indicating polytomy (Fig. 3). The monophyly of the genus Panax also collapsed because of the position of Panax trifolius. P. trifolius, distributed in North America, is known to be the most basal group of the genus Panax, which was supported by our plastome-based phylogenies (Fig. 1, Supplementary Figs.S2S5); however, phylogenetic positions of P. trifolius, Aralia cordata and A. elata were unclear with low support (less than 50%) in the nrDNA phylogeny (Fig. 3). While P. stipuleanatus occupied a consistent position as sister to all Panax species (except P. trifolius) in the two phylogenies, phylogenetic positions of other Panax species were incongruent (Fig. 3). P. notoginseng was sister to a clade of four Panax species, P. bipinnatifidus, P. wangianus, P. vietnamensis, and P. zingiberensis, in the plastome-based phylogeny; however, it was sister to a clade of these four Panax species and two tetraploid Panax species, P. ginseng and P. quinquefolius, in the nrDNA phylogeny (Fig. 3). Polyscias fruticosa also showed a difference in phylogenetic position between the plastome and nrDNA phylogenies, being sister to the Asian Palmate group in the plastome phylogeny but showing uncertain relationships with both the Asian Palmate and AraliaPanax groups in the 45S nrDNA phylogeny (Fig. 3). Consequently, we inferred the species network of the family Araliaceae by considering the topologies and branch support values obtained from the plastome-based phylogenies using the four data matrices as well as the 45S nrDNA phylogeny, and this network was used to estimate divergence times of Araliaceae species by constraining the topology (Fig. 1).

Figure 3
figure 3

Comparison between plastome-based and nrDNA trees of the Araliaceae. nrDNA sequences of 25 species were available for phylogenetic reconstruction. Phylograms of plastome (left) and nrDNA (right) trees inferred by the maximum-likelihood method using RAxML; bootstrap support values greater than 50% are indicated above branches, and nodes with less than 50% support were collapsed. Each group is marked in a different color. Long branches are indicated with slashes and shortened; each slash represents one scale bar. E.: Eleutherococcus, K.: Kalopanax, D.: Dendropanax, S.: Schefflera, T.: Tetrapanax, Po.: Polyscias, Pa.: Panax, Hy.: Hydrocotyle.

Nucleotide substitution analyses

Using the plastome sequence of Centella asiatica (Apiaceae) as a reference, we estimated nucleotide substitution rates using 78 plastid genes of 63 Araliaceae plastomes. Synonymous (dS) and nonsynonymous (dN) nucleotide substitution rates of three Hydrocotyle plastomes in the subfamily Hydrocotyloideae were relatively higher than those of plastomes in the subfamily Aralioideae (Table 1), and differences in nucleotide substitution rates were supported statistically by analyses of variance (ANOVA) and Bonferroni post-hoc tests (P < 0.001) (Supplementary Fig. S6). Compared with the Aralioideae plastomes, the dS was relatively faster than the dN in the Hydrocotyloideae plastomes (Table 1), leading to a lower dN/dS ratio (Supplementary Fig. S6). Within the Aralioideae, the dS of the AraliaPanax group was statistically higher (P < 0.001) than those of the Asian Palmate group and the Greater Raukaua group. The difference in dS between the AraliaPanax group and Osmoxylon/Polyscias groups was not significant owing to having just single samples of Osmoxylon and Polyscias (Supplementary Fig. S6). Hydrocotyle plastomes in the Hydrocotyloideae showed a statistically significant difference from other groups in the dN analysis, whereas differences among the five groups in the Aralioideae were not supported in any case (Supplementary Fig. S6). The AraliaPanax group had an equivalent dN and high dS compared with the other four groups in the Aralioideae, resulting in a relatively low dN/dS ratio (P < 0.001), and this phenomenon was also confirmed in the Hydrocotyle plastomes (P < 0.001).

Table 1 Transition and transversion between subfamilies and among four groups in the Araliaceae family.

Using the Centella asiatica plastome as the standard, we analyzed transitions and transversions to determine which nucleotide substitution types are increased in the Hydrocotyloideae plastomes showing significantly higher substitution rates compared with Aralioideae plastomes (Table 1; Supplementary Fig. S7). The most common substitutions in protein-coding regions of Araliaceae plastomes were transitions, with the most common transition type being a T-to-C (A-to-G) transition, occurring in a range from 987 sites in Oreopanax obtusifolius to 1270 sites in Hydrocotyle sibthorpioides (Table 1; Supplementary Fig. S7). All the plastomes of the subfamily Aralioideae had more T-to-C transitions than C-to-T (G-to-A) transitions; however, plastomes in the subfamily Hydrocotyloideae had more C-to-T transitions than T-to-C transitions. The three Hydrocotyle (Hydrocotyloideae) plastomes had relatively higher numbers of both transitions and transversions than the Aralioideae plastomes, but the number of C-to-T transitions in the Hydrocotyloideae plastomes was approximately double that in the Aralioideae plastomes (Table 1; Supplementary Fig. S7). This increase in C-to-T transitions was found in all protein-coding genes (not any specific gene) used in the analyses of substitution rate and type, and appeared to be a major driver of the accelerated nucleotide substitution rates of Hydrocotyloideae plastomes compared with Aralioideae plastomes (Table 1; Supplementary Figs. S6, S7).

Divergence time and ancestral distribution estimation

We performed molecular clock analysis to estimate divergence time based on concatenated sequences of 78 plastid genes using BEAST (Supplementary Fig. S8). Our results indicated that the split within the Araliaceae began around 45 million years ago (Ma) during the Middle Eocene. The first split occurred between the two subfamilies, Aralioideae and Hydrocotyloideae, at an estimated divergence time of 44.76 Ma during the Middle Eocene (68.23–25.37 Ma in 95% HPDs). Within the Aralioideae, splits between the five major groups (AraliaPanax group, Asian Palmate group, Greater Raukaua group, Polycias–Pseudopanax group, and Osmoxylon group) occurred during the Late Oligocene and the Early Miocene (Supplementary Fig. S8). During the Middle Miocene, the split of Aralia and Panax occurred at 13.84 Ma, while rapid divergence of clades showing phylogenetic discordance in the Asian Palmate group occurred from 14.36 to 9.61 Ma (Figs. 1, 2B,C, 4A, Supplementary Fig. S8). Both the divergence of the two subfamilies and the divergences within the AraliaPanax group (the split between Aralia and Panax) and the Asian Palmate group (rapid divergence of clades showing phylogenetic discordance) were estimated to coincide with climatic optima in the Middle Eocene and the Middle Miocene (Fig. 4B). Within the genus Panax, two Panax species distributed in North America diverged from other Panax species at separate times. The first North American species, P. trifolius (dwarf ginseng), phylogenetically located in the basal group of the genus Panax, diverged from other Panax species at 11.45 Ma; the second basal species, P. stipuleanatus, distributed in South East Asia, diverged around 9.93 Ma during the Late Miocene (Fig. 5A, Supplementary Fig. S8). The other Panax species then diverged into two lineages, South Asian and Northeast Asian (Figs. 1, 5A, Supplementary Fig. S8). The split between these two lineages occurred at the boundary of the Miocene and Pliocene (6.42 Ma), and the other North American species, P. quinquefolius (American ginseng), diverged from Panax ginseng (Korean ginseng) of the Northeastern Asian lineage at 1.42 Ma (3.83–0.06 in 95% HPDs) (Fig. 5A, Supplementary Fig. S8).

Figure 4
figure 4

Ancestral distribution and divergence time estimation of the Araliaceae with global climate over the last 65 million years. (A) Graphical results of ancestral distribution areas at each node of the phylogeny of the Araliaceae obtained by S-DIVA, and divergence times for Araliaceae species were estimated by BEAST. The split between Apiaceae and Araliaceae was set as the calibration point. The map with ancestral ranges was illustrated using MapChart (https://www.mapchart.net/). Colour key indicates possible ancestral ranges; letter combinations indicate a combination of ranges. (B) Climate for the Cenozoic (0–65 Ma). The climatic curve is a stacked deep-sea benthic foraminiferal oxygen-isotope curve and was modified from Zachos et al.51. Q.: Quaternary; Pl.: Pliocene; P.: Pleistocene.

Figure 5
figure 5

Evolutionary model of Panax species. (A) Estimated divergence times for Panax species with information on their distribution and ploidy level. The predicted climate for the most recent 20 million years and the curve were modified from Zachos et al.51. (B) Supplementary hypothesis involving two-stage migration of Panax to North America: (1) migration of ancestral diploid Panax through floristic exchange and (2) migration of tetraploid Panax mediated by recurrent glaciations. The map indicating the distribution areas of Panax species was downloaded from MapChart (https://www.mapchart.net/). SE: southeastern; S.: southern; NE: northeastern; W.: western; N.: north; Pl.: Pliocene; P.: Pleistocene.

The optimal ancestral reconstruction by the Statistical Dispersal-Vicariance Analysis (S-DIVA) on the BEAST tree proposed 17 vicariance events and 26 dispersal events within the Araliaceae (Fig. 4A). Ancestral area reconstruction suggested Oceania (G) as the base for the Araliaceae, which may be due to the basal Hydrocotyle and the Greater Raukaua groups being distributed in Oceania. After the Hydrocotyle divergence, the remaining Araliaceae species dispersed from Oceania (G) to Southeast Asia, the Himalayas, up to Central China (BD) (node 100) for the Asian Palmate group, and farther into South and Central Asia to the Americas (BCDF) for the AraliaPanax group (node 118). The ancestral distribution area for both the Asian Palmate group and the AraliaPanax group was predicted to be South and Central China (B). A total of five vicariance events were detected in the AraliaPanax group, four of which were in the Panax genus (nodes 105–107, 110). Likewise, five dispersal occurrences took place in this group, with four occurring within the Panax genus (nodes 106, 108–110).

Discussion

Phylogenetic discordance and the evolution of the Asian Palmate group

Phylogenetic discordance and poly- or paraphyly of several genera have previously emerged as scientific questions in the Araliaceae1,2,4,17,18,19,20. Despite the use of plastome and nuclear data in recent phylogenomic studies31,42, these questions remain uncertain owing to the limited sampling. The phylogenetic discordance between nuclear and organelle gene trees, which is referred to as cytonuclear discordance, is known to be caused by gene paralogy, hybridization, introgression (including chloroplast capture), and incomplete lineage sorting43,44,45. However, phylogenetic discordance has often been reported in plastome-based phylogenomic studies as well. It can be caused by evolutionary rate heterogeneity between plastid genes46, but also by how partitioning of the plastome data was performed47,48,49. Differences in data partitioning can result in different tree topology, branch length, and bootstrap support values50. All the phylogenies constructed in this study supported the monophyly of the Asian Palmate group, the AraliaPanax group, the Greater Raukaua group, and the genus Hydrocotyle, which was consistent with previous studies using few genes or genome data1,2,4,17,18,19,20,31,42. However, our analyses inferred different phylogenetic topologies and branch support values from four different data matrices of plastome data and the nrDNA phylogeny, particularly in the Asian Palmate group (Fig. 2). While the phylogenetic relationships within the AraliaPanax group, the Greater Raukaua group, and the genus Hydrocotyle were consistent in all four plastome-based phylogenies (Fig. 1), those of the Asian Palmate group differed by data partitioning of plastome data and between plastome and nrDNA phylogenies as well.

Within the Asian Palmate group, phylogenetic discordance was observed in the position of four clades, the MacropanaxMetapanaxKalopanax clade (Clade 2), the FatsiaOreopanax clade (Clade 4), the DendropanaxChengiopanaxGamblea clade (Clade 5), and the Oplopanax clade (Clade 7). While these clades were monophyletic with strong branch support, phylogenetic relationships among the clades differed with the different data matrices, and their branch support values were relatively lower than those of other branches (Figs. 1, 2D). Most previous studies using few gene sequences were often failed to resolve phylogenetic relationships among groups and clades of the Asian Palmate group by showing polytomy and collapsing monophyly of them1,2,17,18,19. Our topology inferred from first and second sites of CDSs and protein sequences (Fig. 2Bii,C) was similar to the topology inferred from nrITS and six plastid regions by Li and Wen19, but relationships among the MerrilliopanaxHedera clade (Clade 3), the FatsiaOreopanax clade (Clade 4), Dendropanax (Clade 5), the EleutherococcusBrassaiopsisTrevesia clade (Clade 1), and the MacropanaxMetapanaxKalopanax clade (Clade 2) remained unclear19. Recently, plastid and nuclear (Hyb-seq) phylogenomic studies were conducted to resolve the phylogenetic relationships of the Asian Palmate group, respectively31,42. The topology inferred from plastome data by Valcárcel and Wen31 was consistent with the topology inferred from whole-plastome sequence (Fig. 2Bi,Ci). However, the phylogenomic study by Gallego-Narbón et al.42 using nuclear genes indicated a novel topology that differed from any other topology inferred from plastome and nrDNA data1,2,4,19,31. In the topology, the EleutherococcusBrassaiopsisTrevesia clade (Clade 1) was collapsed and divided into two places. The Eleutherococcus genus was nested within the MacropanaxMetapanaxKalopanax clade (Clade 2), and the BrassaiopsisTrevesia clade was sister to the FatsiaOreopanax clade (Clade 4)42. The Oplopanax clade (Clade 7) was sister to all other clades of the Asian Palmate group in recent studies18,19,31,42, and this position was supported by whole-plastome sequence in this study (Fig. 2Ci). However, different positions of the Oplopanax were reported, e.g., sister to the Gamblea1 and Fatsia2,18. In this study, a sister relationship between the ScheffleraHeteropanaxTetrapanax clade (Clade 6) and the Oplopanax clade (Clade 7), inferred from 78 CDS and protein sequences, has not been reported so far, and the relationship might be an alternative hypothesis for the phylogenetic position of the Oplopanax clade (Clade 7).

The phylogenetic problems of the Asian Palmate group, such as incongruences between different gene tree and unsettled relationships have prompted several phylogenetic studies, and it has been hypothesized that the phylogenetic discordance might be the result of rapid diversification18,31,42. Rapid diversification of the Asian Palmate group was also supported by phylogenetic analyses and divergence time estimation in this study. It has recently been hypothesized that the rapid diversification of the Asian Palmate group was triggered by hybridization42. The limitations of maternally inherited plastome data preclude us from discussing the hypothesis proposed by Gallego-Narbón et al.42. Alternatively, we propose a hypothesis based on our dating results and the climatic data in the Cenzoic predicted by Zachos et al.51. Our molecular dating results indicated that the clades showing phylogenetic discordance in the Asian Palmate group diverged during the Middle Miocene (Fig. 4A), coinciding with the Mid-Miocene Climatic Optimum (Fig. 4B)51. We therefore hypothesize that the global warming event in the Middle Miocene may have triggered rapid diversification, leading to phylogenetic discordance within the Asian Palmate group. Further study using extensive sampling and reliable fossil data for more accurate molecular dating of the Asian Palmate group is needed to demonstrate how the species of the Asian Palmate group evolved.

Accelerated AT-biased mutation in the plastomes of the semi-aquatic Hydrocotyle genus

Although the Araliaceae plastomes are under purifying selection (dN/dS < 1), the accelerated substitution rates in the Hydrocotyle plastomes observed in this study (Supplementary Fig. S6) were also reported in the nuclear and plastid phylogenies with quite long branches, even compared with one Apiaceae species42. Mutational pressure and gene conversion affect the nucleotide landscape of the genome, and these two factors are known as non-adaptive mechanisms52. Owing to the influence of AT-biased mutational pressure in non-coding regions, the majority of land plants have AT-rich plastomes53; conversely, genic regions appear to be under GC-biased mutational pressure, referred to as GC-biased gene conversion52,53,54,55,56. The Araliaceae species in this study have AT-rich plastomes with a GC content of approximately 38%, suggesting that the Araliaceae plastomes are under AT-biased mutational pressure similar to other land plant plastomes. The plastomes of the subfamily Aralioideae were found to be GC-biased in the genic region, with T-to-C transitions being significantly more common than C-to-T transitions (Table 1; Supplementary Fig. S7). By contrast, the number of C-to-T transitions in Hydrocotyle plastomes of the subfamily Hydrocotyloideae was almost double that in Aralioideae plastomes, resulting in a slight reduction in plastome GC content (Table 1; Supplementary Fig. S7). AT-biased mutations in the Hydrocotyle plastomes contributed to accelerated nucleotide substitution rates and long branches in the phylogenetic trees compared with Aralioideae plastomes (Supplementary Figs. S2S7), implying that the Hydrocotyle plastomes have their own evolutionary history not experienced by other species in the Araliaceae. This may be an evolutionary result of adaptation. The Hydrocotyle genus is the only aquatic or semi-aquatic plant in the Araliaceae family. Hydrocotyle species, including the three species in this study, usually grow in places that are marshy, boggy, and wet, but can even survive under water57,58,59. Accelerated nucleotide substitution rates in some species or clades exhibiting distinct environments and growth habitats have been reported60,61,62, and associated mutation rates among the plastome, mitogenome, and nuclear genome were discussed in previous studies56,63,64,65,66. The Hydrocotyle species had a relatively long branch in both plastome and nrDNA phylogenies (Fig. 3), implying that accelerated substitution rates occurred at least in plastome and nrDNA. The accelerated substitution rates in the three genome compartments may have been influenced by generation time, plant height, the replication mechanism of organelle genomes, and an improper organelle DNA repair system56,60,61,62,63,64,65,66. Therefore, we speculate that the accelerated substitution rate in the Hydrocotyle genus occurred not only in their plastome but also in their mitogenome and nuclear genomes as a consequence of their evolutionary history and environmental adaptation. Further research, including three genome compartments in the plant cell, may be able to reveal why Hydrocotyle plastomes have such a distinctive evolutionary pattern.

Evolution of the AraliaPanax group and two independent migrations of Panax species

The AraliaPanax group consists of three genera: Aralia, Panax, and Sciadodendron (a monotypic genus that includes only S. excelsum and was recently considered Aralia excelsa)17. Two genera, Aralia L. and Panax L., include approximately 60 and 18 species, respectively, and are mainly distributed in Asia and the Americas17,35,64. The two genera can be distinguished by leaf and flower morphology. The Aralia has usually pinnate leaves and five to eight carpels, whereas the Panax has usually palmately compound leaves and whorled arrangements, a single terminal flower, and a bi- or tricarpellate ovary2,17,67. However, the phylogenetic relationships between these two genera remain uncertain so far. Previous studies using a few DNA barcoding regions showed that the Aralia was not monophyletic and the Panax was nested within the Aralia clade1,2,4,17,18. In addition, monophyly of the Panax was often collapsed by P. trifolius locating with Aralia species17, as shown in our nrDNA phylogeny (Fig. 3). In contrast, cases that supported monophyly of the two genera have also been reported19,31,42,68. When the Sciadodendron (S. excelsum = currently Aralia excelsa) was treated as an independent genus, it was sister to the Panax, and they were sister to the monophyletic Aralia clade; conversely, when it was considered a member of Aralia, the monophyly of the Aralia genus collapsed19,31, as shown in our plastome phylogenies (Fig. 1, 3). Two recent studies utilized a large amount of nuclear data to reconstruct the phylogenetic relationships of the Araliaceae42,68. Both nuclear phylogenomic studies successfully resolved and supported the monophyly of the Panax; however, the generic delimitation between the Aralia and Panax was still controversial42,68. In the two studies, Panax species formed their own clade with strong support42,68, but they were still nested within the genus Aralia, owing to the position of A. humilis (section Humiles)68. Therefore, collapsing the monophyly of the genus Panax is likely caused by insufficient data, but the generic delimitation of the genus Aralia should be revised with extensive sampling of both Aralia and Panax from Asia and the Americas, based on a large amount of nuclear data and morphology.

The Panax genus exhibits interesting evolutionary patterns in terms of its distribution and ploidy levels. Panax species are mostly distributed in East Asia and rarely in North America35,69. Only two Panax species, diploid P. trifolius (dwarf ginseng) and tetraploid P. quinquefolius (American ginseng), are distributed in the Northeastern region of North America, and their phylogenetic positions are far from each other: P. trifolius is sister to all Panax species, and P. quinquefolius is sister to P. ginseng (Korean ginseng) in our plastome phylogeny (Figs. 1, 3). The remaining East Asian Panax species can be divided into two groups: the diploid group distributed in Southeast Asia (Geographical area B in Fig. 4) and the tetraploid group distributed in Northeast Asia (Geographical area A in Fig. 4)30,34,35,70. While the Northeast Asian tetraploid Panax species were sister to the Southeast Asian diploid Panax species in the plastome phylogeny, they were located within the Southeast Asian diploid Panax species in the nrDNA phylogeny (Fig. 3). The cytonuclear discordance between plastome and nrDNA phylogenies could be evidence for allotetraploidization. Allotetraploidization of the Northeast Asian tetraploid Panax species was hypothesized previously33 and supported by phylogenetic evidence that the two subgenomes of the tetraploid species were located in two different clades; one subgenome (subgenome B) was sister to the Panax notoginseng genome and the other subgenome (subgenome A) was not71. These phylogenetic results imply that the allotetraploidization occurred in the common ancestor of the tetraploid Panax species, P. ginseng, P. japonicus, and P. quinquefolius, and that the migration of P. quinquefolius from East Asia to North America should have occurred after the allopolyploidization. Therefore, a two-step migration hypothesis has previously been proposed33,34,35, and our biogeographical analysis also supported the two independent migrations (dispersal) from East Asia to North America (Fig. 4A).

The Bering land bridge is proposed as the migration route of North American Panax species33,34,35,72. The first dispersal of Panax species (P. trifolius) from East Asia to North America is thought to have occurred through the Bering land bridge during the Middle Miocene (Figs. 4A, 5A,B), similar to most intercontinental disjunct plants that diverged from 3 to 25 Ma through floristic exchanges between East Asia and North America72,73,74. The Bering Strait was not formed until the Middle Miocene, with East Asia and North America instead connected by land, facilitating floristic exchanges between the two continents74. However, the second migration, that of tetraploid Panax species (P. quinquefolius), occurred during the Pleistocene, after the formation of the Bering Strait (Figs. 4A, 5A,B); the Bering Strait was also dynamic, experiencing recurrent glaciations and global warming75. This implies that natural dispersal may have been difficult owing to geographical and climatic isolation and that some mediator was needed for migration. It has previously been suggested that birds or small mammals might have carried Panax species into North America76,77,78. Furthermore, during the Early Pleistocene, the steppe mammoth (Mammuthus trogontherii), a trunked mammal, was estimated to have migrated from Eurasia to North America around 1.3–1.5 Ma79, coinciding with the split between P. ginseng and P. quinquefolius in this study (Figs. 4A, 5A). The migration route of the steppe mammoth reached the northeastern region of North America, where North American ginseng is currently distributed80. Consequently, we propose a supplementary hypothesis for the intercontinental disjunction of Panax species in which the North American Panax species migrated from East Asia to North America in two stages: the first stage was dispersal of P. trifolius through floristic exchange via the Bering land bridge (node 110 in Fig. 4)72,73,74, and the second stage was migration of P. quinquefolius (node 106 in Fig. 4), which occurred during recurrent glaciations and was likely mediated by animals, including birds and small mammals33,35,76,77,78, as well as trunked mammals.

In conclusion, our findings provide evidence that rapid diversification within the Asian Palmate group may be responsible for the phylogenetic discordance in the group, and that the Hydrocotyle plastome has undergone a different evolutionary history compared with other Araliaceae plastomes. In addition, we complement the hypothesis for the intercontinental disjunction of Panax species by combining evidence from molecular dating, distribution, and ploidy level. We believe that our study furthers our understanding of the diverse evolutionary patterns in the plastomes of the Araliaceae and complements the hypothesis for the evolution of Araliaceae species. Further studies may be able to demonstrate how the Hydrocotyle plastomes are under different mutational pressure from other Araliaceae plastomes as well as reveal the origin of the genus Aralia and whether this genus is monophyletic or evolved convergently.

Methods

Plant materials, DNA extraction, and Illumina sequencing

Fresh leaves of 12 species were collected from the Medicinal Plant Garden, College of Pharmacy, Seoul National University (Goyang, Korea), with permission from the garden authorities (Supplementary Table S3). Total genomic DNA was extracted using an Exgene Plant SV Midi Kit (Geneall Biotechnology, Seoul), and DNA quality was examined using gel electrophoresis and a NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific, USA). Library preparation and sequencing were performed by Phyzen Co. Ltd. (Seongnam, South Korea). Approximately 1 μg of genomic DNA for each sample was sequenced using two NGS platforms. Of the 12 newly sequenced species, seven were sequenced using the Illumina MiSeq platform with paired-end reads of 2 × 300 bp and and five were sequenced using the Illumina Hiseq Xten with paired-end reads of 2 × 150 bp (Supplementary Table S1).

Plastome and nrDNA assembly and gene annotation

In addition to 12 newly sequenced species, raw reads of Hydrocotyle vulgaris (ERX5309978) were downloaded from the NCBI Sequence Read Archive (SRA) database for plastome and 45S nrDNA assemblies. Plastid genome and 45S nrDNA sequences were assembled using the dnaLCW method40,41. Sequence trimming and de novo assembly of plastid genomes and 45S rDNA were performed using the CLC assembly cell v. 4.21 (CLC Bio, Denmark). Plastome (KM088019) and 45S rDNA (KM036295) sequences of Panax ginseng cultivar ‘Chunpoong’ were used as a reference. Contigs belonging to the plastome were extracted using MUMmer81 and BLASTZ82, and assembled contigs were curated manually. Gene annotation of the plastid genome was performed using GeSeq83 and Artemis84. For the 45S nrDNA contig, annotation of 18S, ITS-1, 5.8S, ITS2, and 26S was determined through sequence comparison with the reference. Circular maps of plastome structures were drawn using Circos85.

Phylogenetic reconstruction

To reconstruct the plastome-based phylogeny of the Araliaceae, 53 previously reported plastome sequences, including 10 plastome sequences reported by the authors’ laboratory, were downloaded from NCBI GenBank (Supplementary Table S3). These 53 sequences plus the 13 plastomes newly assembled in this study (66 plastome sequences including three Apiaceae species as the outgroup) were used to reconstruct the phylogenetic relationships of the Araliaceae (Supplementary Table S3). Phylogenetic reconstruction was performed using four different matrices: (1) whole-plastome sequences, (2) concatenation of 78 protein-coding gene sequences, (3) first and second sites of 78 coding sequences (CDSs), and (4) protein sequences of 78 genes. Sequence alignments for whole-plastome sequences and for each gene were conducted using MAFFT86, and the aligned sequences of each gene were then concatenated. For phylogenetic reconstruction using protein sequences, the CDS of each gene was translated into an amino acid sequence using the translator embedded in BioEdit v. 7.2.587; amino acid sequences were then aligned using MAFFT. Phylogenetic analyses based on the four different matrices were conducted using RAxML v. 888 with 1000 bootstrap replicates and models selected by jModeltest v. 2.1.589. The nrDNA sequences of 25 species (including Centella asiatica as an outgroup) were used to reconstruct a phylogenetic tree of 45S nrDNA sequences, including 18S, 5.8S, and 26S rDNAs and two ITS regions. Alignment and phylogenetic reconstruction were performed using MAFFT and RAxML, respectively. In order to compare phylogenetic results between nrDNA and plastome phylogenies, a plastome-based phylogeny using 78 protein-coding gene sequences of these 25 species was reconstructed according to the alignment and phylogenetic analysis described above.

Nucleotide substitution analyses

Synonymous (dS) and nonsynonymous (dN) substitution rates of 78 plastid genes were estimated using the codeml application in PAML v. 4.990. The topology obtained from phylogenetic reconstruction was used for nucleotide substitution rate analysis. Plastid gene sequences of Centella asiatica (Apiaceae) were included as a standard, and the F3 × 4 model was used for codon frequencies. Statistical significances between different groups were examined using analysis of variance (ANOVA) with Bonferroni post hoc tests using R v. 3.5.191. To determine the presence of any specific mutational pressure in the genomes, substitution types in protein-coding regions, such as C-to-T or T-to-C transitions and transversions between purine and pyrimidine bases were investigated.

Divergence time estimation

Bayesian divergence time estimation was performed using a lognormal relaxed clock model in BEAST 2 v. 2.6.292. A total of 66 species were included, and 78 concatenated plastid gene sequences (excluding rps12 and ycf15 genes) were used. The GTR model with invariant sites and a number of six gamma categories was selected as the best-fit model. One secondary nodal calibration was employed, with the root age of Apiaceae and Araliaceae constrained using a normal prior distribution with mean of 60.19 and SD of 6.6, according to the results of Magallon et al.93. The dataset was run using a birth–death speciation prior, and 78 independent runs of 10 million generations with sampling every 1000 generations were performed. The log files obtained from independent runs were combined using LogCombiner in BEAST after burning the first 10% of generations; the combined logs were then checked using Tracer v. 1.7.294 to confirm whether effective sample sizes for relevant parameters were over 200. Tree files were then resampled using LogCombiner with the same burn-in strategy, and a time-calibrated maximum clade credibility (MCC) tree with 0.5 posterior probability limit and mean branch lengths was generated using TreeAnnotator in BEAST. The estimated divergence time tree was displayed using FigTree v.1.4.3 to check mean and 95% high posterior densities (HPDs) of the chronogram.

Ancestral area estimation

Biogeographical analysis was performed with the S-DIVA to reconstruct the possible ancestral ranges of the Araliaceae on the phylogenetic tree, implemented in the Reconstruct Ancestral State in Phylogenies (RASP) v. 4.395. A total of 66 species were assigned to seven geographical regions according to their native distribution provided by Plants of the World Online (https://powo.science.kew.org/): (A) Northeast Asia, (B) Southeast China, Central China, Himalayas, (C) Central Asia, Southern Asia, (D) Southeast Asia, (E) Europe, West Asia, Africa, (F) North, Central, South America, (G) Australia, Oceania, Pacific Islands. The seven geographical regions were color-coded and illustrated using MapChart (https://www.mapchart.net/). Maximum areas at each node were set to seven. The S-DIVA analysis was run allowing for extinction.

Sample collection and experiment statement

All the plant materials in this study were collected with permission from the garden authorities, and all the methods in this study comply with relevant institutional, national, and international guidelines and legislation.