Complete chloroplast genomes shed light on phylogenetic relationships, divergence time, and biogeography of Allioideae (Amaryllidaceae)

Allioideae includes economically important bulb crops such as garlic, onion, leeks, and some ornamental plants in Amaryllidaceae. Here, we reported the complete chloroplast genome (cpDNA) sequences of 17 species of Allioideae, five of Amaryllidoideae, and one of Agapanthoideae. These cpDNA sequences represent 80 protein-coding, 30 tRNA, and four rRNA genes, and range from 151,808 to 159,998 bp in length. Loss and pseudogenization of multiple genes (i.e., rps2, infA, and rpl22) appear to have occurred multiple times during the evolution of Alloideae. Additionally, eight mutation hotspots, including rps15-ycf1, rps16-trnQ-UUG, petG-trnW-CCA, psbA upstream, rpl32-trnL-UAG, ycf1, rpl22, matK, and ndhF, were identified in the studied Allium species. Additionally, we present the first phylogenomic analysis among the four tribes of Allioideae based on 74 cpDNA coding regions of 21 species of Allioideae, five species of Amaryllidoideae, one species of Agapanthoideae, and five species representing selected members of Asparagales. Our molecular phylogenomic results strongly support the monophyly of Allioideae, which is sister to Amaryllioideae. Within Allioideae, Tulbaghieae was sister to Gilliesieae-Leucocoryneae whereas Allieae was sister to the clade of Tulbaghieae- Gilliesieae-Leucocoryneae. Molecular dating analyses revealed the crown age of Allioideae in the Eocene (40.1 mya) followed by differentiation of Allieae in the early Miocene (21.3 mya). The split of Gilliesieae from Leucocoryneae was estimated at 16.5 mya. Biogeographic reconstruction suggests an African origin for Allioideae and subsequent spread to Eurasia during the middle Eocene. Cool and arid conditions during the late Eocene led to isolation between African and Eurasian species. African Allioideae may have diverged to South American taxa in the late Oligocene. Rather than vicariance, long-distance dispersal is the most likely explanation for intercontinental distribution of African and South American Allioideae species.


Results
Comparative analysis of cpDNA features in Allioideae and related taxa. The cpDNA genomes of Allioideae have a quadripartite structure that includes a large single copy (LSC), a small single copy (SSC), and two inverted repeat (IR) regions (Fig. 1). However, cpDNA genome size varies from 145,819 to 157,735 bp (Table 1). Among the four tribes of Allioideae, Allieae species have the smallest cpDNA (Allium paradoxum; 145,819 bp) and the largest cpDNA (Allium tuberosum; 157,735 bp). Most cpDNA of Allioideae is smaller than those of Agapanthoideae (157,055 bp) and Amaryllidoideae (ranging from 158,355 to 159,998 bp). Additionally, the GC content of cpDNA sequence in Allium species (generally ≤ 37.1%) is lower than those of other examined taxa (Table 1).
Although cpDNA size is variable, its gene content and order are quite stable among Allioideae and related taxa; it includes 80 protein-coding genes, 30 tRNAs, and four rRNAs (Fig. 1, Table 1, and Table S1). The loss of infA was observed in Allium monanthum, A. karataviense, A. ampeloprasum, A. macleanii, and A. spicatum, whereas complete deletion of rpl22 and rps16 was recorded in A. monanthum and A. platyspathum, respectively (Table 1). In addition to the loss of protein-coding regions, pseudogenization was annotated in some regions of the examined species, including rps2 in most examined species of Allieae (   In Allium, three subclades (a-c) were recognized: "a" included A. nigrum through A. cyathophorum; "b" included A. spicatum through A. tricoccum; and "c" comprised A. cernuum through A. siculum. Although monophyly of three subgenera: Cepa, Anguinum, and Anguinum was supported, our results clearly demonstrate that two subgenera, Cyathophora and Melanocrommyum, represented by multiple species, were not monophyletic (Fig. 2).

Molecular dating analyses.
Divergence time estimates for the tribes of Allioideae based on the combination of 74 coding gene sequences in the chloroplast genome are shown in Fig. 3   Ancestral area reconstruction. The ancestral ranges of the nodes of clades in Allioideae inferred using the Bayesian binary method (BBM) and statistical dispersal variance analysis (S-DIVA) are summarised in Fig. 4 and Table 2. The BBM reconstruction suggests that Africa (C) is the most probable ancestral area of Allioideae (node 1, 82%), whereas the S-DIVA range reconstruction for this node is Eurasia + Africa (AC, 50%) or Eura-  Table 2). www.nature.com/scientificreports/   Table 2  www.nature.com/scientificreports/ sia + Africa + South America (ACD, 50%). Both methods suggest Eurasia (A) as the ancestral area for Allieae (Clade I; node 2). S-DIVA suggests Africa + South America (CD) as the most probable ancestral area for clade II (node 3), which includes the remaining tribes of Allioideae, whereas BBM indicated Africa (C) with 87% marginal probability. BBM and S-DIVA reconstructions both suggest that South America (D) is the most probable ancestral area for the node of Gilliesieae-Leucocoryneae (node 4).

Discussion
Chloroplast genome evolution in Allioideae. The newly sequenced chloroplast genome revealed a highly conserved genome structure in terms of GC content and gene composition and order among Allioideae and related taxa (Table 1 and Fig. 1). In comparison to GC content of Amborella trichopoda (38.34%), Nicotiana tabacum (37.85%), and Oryza sativa (39%), those of Allioideae exhibited a lower percentage, especially members of Allieae (generally ≤ 37.1%). However, only some representatives of over 1000 species in Alliodeae were used in this study. Therefore, a larger number of Allioideae samples is needed to clarify the fluctuation of GC content which contributed to RNA editing and stability of genome structure [23][24][25] . Gene content varied among species due to pseudogenization and loss of genes in some Allioideae (i.e., Tulbaghieae, Gillesieae, and Allieae; Table 1). The size of chloroplast genome was affected by the reduction and expansion of IR regions and gene loss and duplication 23 . Among Allioideae species, chloroplast genome size fluctuation was caused by pseudogenization and loss of genes (Table 1). For example, the smallest cpDNA in Allioideae was found in Allium paradoxum of which three and nine genes were lost and pseudogenized, respectively. Further observation on the gene loss and pseudogenization revealed that the gene loss and pseudogenization are not corresponded to the recognized clades indicating parallel evolution of these events in Allium (Table S5). For example, the loss of infA was recorded in representative species of three evolutionary lines in Allium. A similar trend was found in the pseudogenization of rps2 of which the intact sequences were also recorded (Table S5). In monocots, the parallel loss or pseudogenization of genes has been reported. For instance, in Liliales representatives of both photosynthetic and mycoheterotrophic groups show the gene loss and pseudogenization of rps16, infA, and cemA 26 . Previously, the loss of infA was surveyed in angiosperms, revealing that the loss of infA from cpDNA can be mitigated by infA in the nuclear genome 27 . Various gene deletions have been reported in Allium (section Daghestanica) 21 . However, the mechanism leading to and outcomes of these events have not been studied in Allioideae species. In the present study, the sequence of the lost gene was not found in the current raw NGS data, suggesting that these genes were not transferred to nuclear or mitochondrial genomes. However, to confirm the final destination of the lost genes, the NGS data of nuclear and mitochondrial genomes among Allium species should be generated. Additionally, only 13 out of over 800 species of Allium were examined in the present study; therefore, further studies that cover all members of Allium should be conducted to provide a comprehensive understanding of the evolution of gene loss and pseudogenization in Allieae and related taxa. Aside from the loss and pseudogenization of genes, which affect genome size, the expansion and contraction of IR regions resulted in differing junctions among LSC-IR-SSC regions and thus caused length variations in the cpDNA of Allioideae (Table 1). Previously, Wang et al. 28 described different junction types in monocot species, ranging from trnH-GUG to rpl22. The LSC-IR junctions of basal angiosperms and monocots were also reported and divided into five types 26 . In the present study, the LSC-IR junction varied from trnH-GUG (type II, Nothoscordum bonariense) to rpl22 (type IV, most of Allium; Table 1). Notably, type III of LSC-IR junction (located in the IGS between rps19-rpl22) was found in Allium monanthum (Table 1), suggesting high variability of this boundary in Allioideae. Similar to the LSC-IR junction, the SSC-IR border feature is variable among Allioideae species, which may show overlap, adjunction, or a gap between ycf1 and ndhF as described in a previous study 26 (Table 1). This junction is located within ycf1 in cpDNA due to its long length. These characteristics of the LSC-IR-SSC junction have also been reported in other monocot groups 28,29 , suggesting similar patterns of structural variation among the cpDNA of monocots.
Analysis of nucleotide diversity and repeats in cpDNA sequences provides useful information for identifying molecular markers, reconstructing phylogenetic relationships, and exploring population genetics in angiosperms 30,31 . In this study, different SSRs were identified among Allioideae that may be useful for studies of molecular markers and population genetics of Allium in particular and Allioideae in general (Tables S3 and S4). Furthermore, eight hotspot regions of cpDNA were identified, which can be used in future studies of interspecies relationships among Allium species (Table S2). Another study on the complete plastomes of Allium revealed different genes with high nucleotide diversity (including ndhK, ndhE, ndhA, rps16, psaI, rpl22, rpl32, and trnK-UUU ) in comparison with the present study 15 . These various findings might be caused by different taxon sampling and an insufficient number of samples among the studies. However, these results provided preliminary data on nucleotide diversity of plastomes for further studies that include all Allium taxa to identify the common hotspot regions across Allium.
Phylogenetic relationships of Allioideae. Our MP and BI analyses consistently recovered Allioideae as sister to Amaryllidoideae (Fig. 2). This result is in line with previous molecular phylogenetic studies of Amaryllidaceae 5,6 . By contrast, Allioideae was found to be sister to a clade of Amaryllidoideae and Agapanthoideae inferred from data of nuclear ITS and plastid matK, ndhF, and rbcL 7 . Although Allioideae has superior ovary and solid style (vs. inferior ovary and hollow style in Amaryllidoideae), these characteristics are homoplasious in Asparagales 32 . Our phylogenomic study recovered Allieae as sister to the rest tribes of Allioideae (Fig. 2). The unique position of Allieae is also corroborated by having the synapomorphic, gynobasic style (vs. terminal in other tribes). Tulbaghieae, sister to Leucocoryneae-Gilliesieae, could be distinguished by the presence of corona in the flower. Moreover, the pseudogenization of cemA gene was only detected in Tulbaghieae. Gilliesieae and Leucocoryneae were strongly supported as sister in agreement with Sassone  www.nature.com/scientificreports/ is supported by several morphological characteristics such as terminal style position and absence of corona in the flower. In addition, both tribes were distributed in South America. In particular, Gilliesieae is restricted to Chile and Patagonia in Argentina, while Leucocoryneae is located in Argentina, Chile, Bolivia, Peru, Paraguay, Uruguay, and Brazil. Therefore, molecular phylogenetic relationships among tribes of Allioideae were supported by morphological and geographical evidence.
In the present study, Allium subg. Melanocrommyum and A. subg. Cyathophora were found to be non-monophyletic although 74 protein-coding genes were used (Fig. 2). Previous molecular phylogenetic studies of Allium revealed the non-monophyly of some subgenera 8,9,33 . For example, Li et al. 33 reported paraphyly of the subgenera Anguinum, Cepa, Allium, Reticulatobulbosa, and Polyprason inferred from ITS and rps16 sequences. Similarly, the monophyly of subgenera Rhizirideum, Polyprason, and Cyathophora was not corroborated by ITS and external transcribed spacer sequences 9 . Additionally, the phylogeny of Allium based on whole plastome sequences revealed the polyphyly of subgenera Cepa and Polyprason 8 . Albeit different molecular datasets have been used and resulted in non-monophyletic relationships, Allium species are always placed into three distinct clades, and accordingly, the hypothesis of three evolutionary lineages was proposed 10,33 . Among members of the genus Allium, the basic chromosome numbers are x = 7, 8, 9, 10, 14 7,34,35 . Additionally, natural interspecific hybridization has been reported in Allium 35 . The high chromosome diversity and hybridization in this genus might blur to propose a clear classification of Allium. Although 74 protein-coding genes were used in the present study, subgeneric relationships within Allium were not fully resolved. Therefore, further studies using more Allium samples and more molecular data (i.e., coding sequences in nuclear and mitochondrial genomes, and hotspot regions) should be conducted to provide better subgeneric classification of this complex genus of Allioideae and an explanation for the three distinct groups of Allium.

Divergence time and biogeographic origins of Allioideae. Accurate estimation of divergence time in
a certain plant group is important to understanding its biogeographic history. However, like most plant groups, the fossil record in Allioideae is sparse. When paleontological data are lacking, molecular estimates provide the only means for inferring the age of lineages, and multiple DNA regions are used to ensure the accuracy of divergence time estimates. Here, we used 74 cpDNA coding regions to estimate the divergence times of major clades in Allioideae. Previous studies also analyzed divergence times of Allioideae and resulted in different outcomes (Table S6). Our molecular dating analysis suggests that Allioideae diverged from its sister clade in the early Eocene (mean = 47.7 mya; 95% HPD = 40.8-56.5 mya). Similar divergence time of Allioideae (41.9 mya, 95% HDP = 34.5-47.6 mya) was estimated based on 48 shared chloroplast genes among 19 monocots families 8 . The diversification of Allioideae, which resulted in the formation of two major lineages, is estimated to have occurred in the middle Eocene (40.1 mya, 95% HPD = 28.5-55.3 mya; node 1 in Fig. 3). This estimate of the crown age of Allioideae is similar to that obtained in a previous research (37.0 mya, 95% HPD = 27.8-44.5 mya) 6 . This result is also supported by the fossil genus Paleoallium, which is similar to extant Allium, recently reported during the Eocene 36 . Thus, we believe that this is the most reliable estimate of the divergence time for Allioideae to date. However, Costa et al. 7 presented an older divergence time of Allioideae (Table S6). In particular, Allioideae diverged in the Paleocene (63.2 mya, 95% HDP = 67.5-53.7 mya) followed by splits of Allieae (52.2 mya, 95% HDP = 58.1-44.4 mya) and Tulbaghieae and Gilliesieae (54.1 mya, 95% HDP = 65.1-37.11 mya) 7 . In comparison to the results of the current study, the older times might be caused by different sequence data matrix (four loci of which missing data were accounted for 20.5% of the matrix), and different calibration points (fossil leaf of Amaryllidaceae) 7 .
The species in four tribes of Allioideae distributed discontinuously, with complete separation between the Northern and Southern Hemispheres (Allieae, Eurasia, and North America; Tulbaghieae, Africa; Gilliesieae and Leucocoryneae, South America). In contrast to Dubouzet and Shinoda 37 , who suggested that the major lineages of Allioideae originated in the Northern Hemisphere, our biogeographic reconstructions based on BBM analysis suggest that this subfamily originated in Africa with high marginal probability, while S-DIVA suggests Eurasia + Africa or Eurasia + Africa + South America as the origin of Allioideae (Fig. 4, Table 2). The deepest branches of the topology originate in Africa, including the sister groups of subfamilies Amaryllidoideae and Agapanthoideae. Moreover, the age of the crown node of clade II (mean = 25.3 mya), which includes Tulbaghieae, Gilliesieae, and Leucocoryneae from the Southern Hemisphere, is older than that of Allieae (mean = 21.3 mya) from the Northern Hemisphere (Fig. 3). The initial diversification of Allioideae likely occurred due to climatic conditions. During the late Paleocene and early Eocene, a warming period occurred, producing a pronounced climate optimum that favored the diversification of major Allioideae lineages in Africa. The ancestor of Allioideae is believed to have originated in Africa, with the Allieae lineage then migrating towards warmer areas of the Northern Hemisphere when the global climate shifted to cooler conditions around 50-34 mya 38 . Dispersal from Africa to Europe is common among land plants with disjunct distributions in both regions 39 .
The ancestral range of the crown node of clade II, which includes Tulbaghieae, Gilliesieae, and Leucocoryneae, is in Africa according to our BBM analysis ( Fig. 4 and Table 2). Two mechanisms have been proposed to explain the intercontinental distribution of this clade in the Southern Hemisphere, attributing it to either dispersal or vicariance (continental drift). We observed disjunct populations in Africa and South America. Our age estimate for the divergence of these two regions is 25.3 mya, followed by diversification approximately 16.5 mya and the subsequent emergence of the monophyletic Gilliesieae-Leucocoryneae lineage in South America. Thus, continental drift does not appear to have played a role in the disjunct distribution of the Allioideae species in Africa and South America, as the great southern continent of Gondwanaland is thought to have broken up in the early Cretaceous. The possibility of biological exchange between Africa and South America since the late Oligocene occurred too recent to support a vicariance explanation based on continental drift. Instead, long-distance dispersal may explain the intercontinental distribution of African and South American Allioideae species. www.nature.com/scientificreports/ Similar origins have been postulated for Caricaceae 40 and Canellaceae 41 . The latest study on the biogeography of Allioideae suggested an "Out-of-India" hypothesis for the colonization of Allieae in the northern hemisphere from India tectonic plate 7 . However, the absence of Allieae species in India questioned the reliability of "Out-of-India" hypothesis although the authors demonstrated that aridification during the collision of India and Eurasia caused the extinction of Allium in India. The present study presents the most detailed molecular phylogenetic and biogeographic information available to date for Allioideae and illustrates the need to investigate relationships at the tribe level more thoroughly, especially Gilliesieae-Leucocoryneae. Givnish et al. 42 recently suggested an "out of Gondwana" origin for Liliales and emphasized the importance of vicariance in the ancient past for determining its current distribution. However, the biogeographic origin and their distribution pattern of Asparagales in the Southern Hemisphere have not yet been addressed. Thus, future works should include additional sampling to establish the biogeographic history of Asparagales in Southeast Asia, India, South America, Australia, and Africa.

Conclusions
This study provided new data on the evolution of chloroplast genomes in Allioideae. Specifically, there were parallel events of gene loss (infA, rps16, ndhF, ndhG, and rpl22) and pseudogenization (i.e., rps2, ycf15, rps16 and matK) across Allieae despite the division of Allium into evolutionary lines. The phylogeny inferred from 74 protein-coding genes revealed the monophyly of tribes in Allioideae; however, the subgenera classification of Allium was polyphyletic, suggesting further studies on phylogeny of Allium with more samples and molecular data (i.e., single copy genes in nuclear and mitochondrial genomes and non-coding regions). Divergence time estimation and biogeographic analysis resulted in the origin from Africa in the Eocene of Allioideae species of which the expansion to the northern hemisphere may infer from long-distance dispersal.

Materials and methods
Taxon sampling, DNA extraction, genome assembly, and annotation. Allioideae samples were collected from various sources (Table S7). Samples were dried with silica gel and used for extraction of total genomic DNA with a modified 2 × cetyltrimethylammonium bromide (CTAB) method 43 . High-quality DNA samples (> 200 ng/ul) were applied to NGS using the MiSeq sequencing platform with Miseq Reagent Kit v3 following manufacturer's instruction (Illumina, Korea). The raw reads (2 × 300 bp paired-end reads) obtained were trimmed to remove regions with error probabilities greater than 0.01% per base using Geneious v.7.1.9 44 . Also, the adapter sequences were removed using the function "Trims Ends" of Geneious v.7.1.9. The paired-end reads (300 bp) were assembled using the reference chloroplast genomes of Allium cepa (GenBank no. KM088013), Allium obliquum (GenBank no. NC037199), Allium sativum (GenBank no. NC031829), Allium ursinum (Gen-Bank no. MH157875), and Allium victorialis (GenBank no. MF687749) based on minimum similarity of 95% to the reference. Then, the isolated reads were subjected to de novo assembly in Geneious to complete the chloroplast genome sequences. The number of total reads, number of assembled reads, and coverage are summarised in Table S7 (over 15x). To confirm the newly completed sequences of Allium chloroplast genome, NOVOPlasty was used following the manual instructions 45 . In the case of having gaps during the assembly process, specific primer pairs were designed using Primer3 and the PCR products were sequenced using Sanger method to cover the gaps 46 . The newly completed chloroplast genome sequences were annotated using previously published Allium cpDNA as listed above with Geneious. Then, the protein-coding regions were checked and manually adjusted to include a start codon at the beginning and a stop codon at the end of the region. The tRNA sequences were confirmed using tRNAScan-SE 47 . A circular chloroplast genome map was obtained using the OGDraw program 48 .  (Table S7). The DNASP 5.0 program was used to calculate the nucleotide diversity (Pi values) of noncoding and coding cpDNA regions among Allioideae species 49 . The REPuter program was used to identify repeats in the cpDNA of Allioideae with a minimum length of 19 bp 50 . The Phobos program embedded in Geneious was used to identify simple single repeats, including mono-, di-, tri-, tetra-, penta-, and hexa-nucleotides with repeated numbers of 10, 5, 4, 3, 3, and 3, respectively [http://www.rub.de/ecoev o/cm/cm_phobo s.htm].

Comparative genomic analyses in
Phylogenetic analysis. Twenty-eight species were subjected to phylogenetic analysis, including Allioideae (21 species), Amaryllidoideae (5), and Agapanthoideae (1) within Amaryllidaceae. Within Allioideae, all four tribes (Allieae [18 species], Gilliesieae [1], Leucocoryneae [1], and Tulbaghieae [1]) recognized in the most recent accounts of the subfamily were sampled. For rooting, five species of Asparagaceae, Xanthorrhoeaceae, and Iridaceae were included based on previous phylogenetic studies 6 . Taxa sampled, voucher information, and GenBank accession numbers for the cp genome data are listed in Table S4. Among 80 coding genes in the chloroplast genome, six genes (rpl22, infA, ycf15, rps2, rps16, and accD) were excluded from the data matrix due to pseudogenization and loss events. Thus, the phylogenetic analyses were done on a dataset of 74 coding genes of the cp genome. Multiple-sequence alignment was performed using MAFFT v.6 51 with the default alignment parameters. Gaps were treated as missing data.
Phylogenetic reconstructions based on the combined sequences of 74 coding genes were performed using the maximum parsimony (MP) method in the program PAUP * 4.0b10 52 . All characters and character states were weighted equally and unordered. The most parsimonious trees were identified with a heuristic algorithm comprising tree bisection-reconnection, branch swapping, the MULPARS function, and the alternative character www.nature.com/scientificreports/ state. Bootstrap analyses (1000 pseudoreplicates) were conducted to examine the relative level of support (BP) for individual clades on each of the resulting cladograms. Phylogenetic analysis of the combined cpDNA dataset was also conducted using Bayesian inference (BI) in MrBayes v.3.12 53 . Applying the Akaike information criterion, jModelTest v.2.1.7 54 assigned the GTR + I + Г model of molecular evolution to the combined dataset. Four MCMC chains were run simultaneously and sampled every 1000 generations for a total of 20 million generations. We plotted the log-likelihood scores of sample points against generation time using Tracer v.1.5; this ensured that stationarity was achieved after the first 2 million generations by determining whether the log-likelihood values of the sample points reached a stable equilibrium. In addition, we used the AWTY graphical system 55 to compare split frequencies among runs and plot the cumulative split frequencies to ensure that stationarity was reached. The first 1000 (10%) sample trees from each run were discarded (representing burn-in), as determined using Tracer v.1.5. A maximum a posteriori tree was constructed by summarising the remaining trees from parallel runs into a majority-rule consensus tree, yielding posterior probability (PP) values for each clade.

Molecular dating analysis.
To estimate the divergence times of tribes in Allioideae, we used BEAST v.1.8 56 based on 74 cpDNA coding regions. The BEAUti interface was used to generate input files for BEAST, in which the GTR + I + Г model, Yule speciation tree prior, and uncorrelated lognormal molecular clock model were applied. Two runs of 200 million generations were set for the MCMC chains, sampling every 1000 generations. Convergence of the stationary distribution was checked through visual inspection of the plotted posterior estimates using Tracer v.1.6. After discarding the first 20,000 (10%) trees as burn-in, the samples were summarised in a maximum clade credibility tree in TreeAnnotator v.1.6.1 using a PP limit of 0.50 and summarising the mean node heights. The mean and 95% HPD of each age estimate were obtained from the combined outputs using Tracer. The results were visualized using Figtree v.1.4.2 [http://tree.bio.ed.ac.uk/softw are/figtr ee/].
Age calibration was constrained to the phylogeny of Allioideae and its close relatives. The crown node (C1 in Fig. 3) of Yucca-Hosta was constrained with a uniform distribution from 20.7 to 37.5 mya following McKain et al. 57 , who estimated the divergence time of Agavoideae using 69 cpDNA coding genes. Three further calibration processes were implemented, as uniform distribution from 50.0 to 67.4 mya for the stem group of Amaryllidaceae (C2); from 42.0 to 61.7 mya for the crown group of Amaryllidaceae (C3); and from 38.1 to 56.5 mya for the stem node of Allioideae (C4).
Ancestral area reconstruction. Biogeographic data for species within Allioideae were compiled from their distributions described in the literature and herbarium specimens. The distribution range of Allioideae species and outgroups was divided into five areas: (A) Eurasia, (B) North America, (C) Africa, (D) South America, and (E) Australia. We coded each species based on the entire range of the species regardless of the sample's biogeographic source. Ancestral area reconstruction and estimation of spatial patterns of geographic diversification within Allioideae were inferred using the BBM and S-DIVA as implemented in RASP v.2.1b (Reconstruct Ancestral State in Phylogenies, formerly S-DIVA) 58 . The BBM was run using the fixed state frequencies model (Jukes-Cantor) with equal among-site rate variations over two million generations, 10 chains each, and two parallel runs. In S-DIVA, the frequencies of ancestral ranges at a given node in ancestral reconstructions are averaged over all trees. For these analyses, we used all post burn-in trees obtained from BEAST analysis. The consensus tree used to map the ancestral distribution of each node was obtained using the Compute Condense option in RASP from stored trees. The maximum number of ancestral areas was set to five.  46-49 (2015).