Comparative analyses of Mikania (Asteraceae: Eupatorieae) plastomes and impact of data partitioning and inference methods on phylogenetic relationships

We assembled new plastomes of 19 species of Mikania and of Ageratina fastigiata, Litothamnus nitidus, and Stevia collina, all belonging to tribe Eupatorieae (Asteraceae). We analyzed the structure and content of the assembled plastomes and used the newly generated sequences to infer phylogenetic relationships and study the effects of different data partitions and inference methods on the topologies. Most phylogenetic studies with plastomes ignore that processes like recombination and biparental inheritance can occur in this organelle, using the whole genome as a single locus. Our study sought to compare this approach with multispecies coalescent methods that assume that different parts of the genome evolve at different rates. We found that the overall gene content, structure, and orientation are very conserved in all plastomes of the studied species. As observed in other Asteraceae, the 22 plastomes assembled here contain two nested inversions in the LSC region. The plastomes show similar length and the same gene content. The two most variable regions within Mikania are rpl32-ndhF and rpl16-rps3, while the three genes with the highest percentage of variable sites are ycf1, rpoA, and psbT. We generated six phylogenetic trees using concatenated maximum likelihood and multispecies coalescent methods and three data partitions: coding and non-coding sequences and both combined. All trees strongly support that the sampled Mikania species form a monophyletic group, which is further subdivided into three clades. The internal relationships within each clade are sensitive to the data partitioning and inference methods employed. The trees resulting from concatenated analysis are more similar among each other than to the correspondent tree generated with the same data partition but a different method. The multispecies coalescent analysis indicate a high level of incongruence between species and gene trees. The lack of resolution and congruence among trees can be explained by the sparse sampling (~ 0.45% of the currently accepted species) and by the low number of informative characters present in the sequences. Our study sheds light into the impact of data partitioning and methods over phylogenetic resolution and brings relevant information for the study of Mikania diversity and evolution, as well as for the Asteraceae family as a whole.

). All plastomes sequenced in this study encode 113 unique genes, including 79 protein-coding genes (CDS), 17 of which contain introns, 30 tRNA genes, and four rRNA genes ( Table 2). The plastomes of Mikania and the other three Eupatorieae have identical structure and order. The boundaries between the four main plastome regions are very conserved within Mikania species and among the three Eupatorieae genera sampled here: the LSC/IRb border is within rps19, the IRb/SSC is within ycf1, the SSC/IRa is between ndhF and a partial ycf1 (ψycf1), and the IRa/LSC is between a truncated rps19 ( †rps19) and trnH GUG (Supplementary Figs. S1, S2).

Analyses of SSR and tandem repeats.
In the 19 Mikania plastomes, the total number of SSRs range from 34 to 44 SSRs, while 51, 49, and 38 SSRs are recovered in A. fastigiata, L. nitidus, and S. collina, respectively (Supplementary Fig. S4A-C). The most abundant SSRs are A or T mononucleotide repeats, which account for 54.3-69% of the total SSRs in Mikania, 70.6% in A. fastigiata, 53.1% in L. nitidus, and 52.6% S. collina; G or C repeats, on the other hand, are rare (Supplementary Table S3). Among the total number of SSR motifs in Mikania, 20-29 (57.5-69%) are mono-repeats, 4-6 (9.5-14.3%) are di-repeats, 2-5 (5.4-12.5%) are tri-repeats, 5-7 (12.2-17.6%) are tetra-repeats, 0-1 (0-2.3%) is penta-repeat, and 0-1 (0-2.9%) is hexa-repeat ( Supplementary  Fig. S4B, Supplementary Table S3). Furthermore, most of the SSRs in the Mikania species are located in the LSC We also used REPuter to identify tandem repeat sequences longer than 30 bp in the plastomes sequenced here. In all 22 plastomes, repeats with 30-33 bp are the most common. Most repeats are found in the LSC, a few in the IRs, and none in the SSC ( Supplementary Fig. S4D Phylogenetic relationships of twenty Mikania species. Phylogenetic analyses using two different methods, concatenated maximum likelihood and multispecies coalescent, and three datasets (only coding regions, only non-coding regions and both combined) generated six different topologies with different degrees of support (Fig. 2). Ageratina fastigiata, Litothamnus nitidus, Stevia collina, and Helianthus annuus (NC007977; Heliantheae) were used as outgroups and the trees were rooted using H. annuus (Fig. 3). Within each clade, the relationships between some species pairs are stable, but the position of some taxa (e.g., M. smaragdina, M. ternata) consistently change, especially in Clade III (Fig. 2). The adjusted Robinson-Foulds distances fitted in a multidimensional scaling model show that topologies are considerably different among themselves, especially the three coalescent trees (Fig. 4, Supplementary Table S5). Although not directly comparable, support values are generally higher in the concatenated analyses (BS) than in the multispecies coalescent analyses (LPP), and in the total dataset in comparison with the coding-only or non-coding-only datasets (Fig. 2). Gene discordance analyses ran with the coalescent trees show a high level of incongruence between the species tree and the gene trees, especially in the dataset containing only coding regions (Fig. 5, Supplementary Fig. S5A).

Discussion
In this study, we assembled 19 complete plastomes of Mikania species and of three other species from tribe Eupatorieae (i.e., Ageratina fastigiata, Litothamnus nitidus, and Stevia collina), and conducted phylogenetic analyses with different datasets and inference methods. The organization of Mikania plastomes is similar across www.nature.com/scientificreports/     25 and Praxelis clematidea 26 , these inversions do not start between the trnS GCU and trnC GCA genes, as in other Asteraceae taxa, but between the trnC GCA and petN genes 25,26 . We also noticed an inversion within the ycf1 gene in the SSC region in the Ageratina adenophora 25 and Praxelis clematidea 26 plastomes, which was not observed in the plastomes assembled here, nor in M. micrantha (NC031833.1 22 ) or Ageratum conyzoides (MK905238 27 ) ( Supplementary Fig. S6). The gene content found in the 22 plastomes assembled here resembles that found in other Asteraceae genomes 25,31,33 . They encode 113 unique genes, including 79 protein-coding genes (CDS), 30 tRNA genes, and four rRNA genes. All plastomes include 17 intron-containing genes (14 contain one, while three contain two introns; Table 3). Within Eupatorieae, a duplication of the trnF GAA gene was detected in Ageratina adenophora 25 and Praxelis clematidea 26   www.nature.com/scientificreports/ Cichorioideae, Asteroideae, and Heliantheae alliance) 37 . The rpoC1 gene in Ageratina adenophora 25 , Ageratum conyzoides 27 , and Praxelis clematidea 26 contains two introns, while in all plastomes assembled here it contains only one intron, similarly to other Asteraceae plastomes sequenced to date 25,33 . Previous plastome comparative studies within Heliantheae detected a ~ 450 bp deletion in the ycf2 gene for some taxa 30,33 , which was not observed in the previously published Eupatorieae plastomes or those newly sequenced here. The nucleotide variability is relatively low within Mikania plastome sequences (mean π value of 0.0036). Yet, another comparative plastome study in Asteraceae, with 36 species of Aldama (Heliantheae), found an even lower mean π value, 0.00118 33 . The rpl32-ndhF and rpl16-rps3 intergenic regions are the most variable loci  www.nature.com/scientificreports/ found within Mikania plastomes, making them candidate markers for phylogenetic studies at the species level within the genus. Other regions with higher nucleotide variability within Mikania are: rbcL, ycf1, petN-psbM, rps16-trnQ UUG , trnH GUG -psbA, and atpI-atpH (Fig. 4A). The noncoding regions are more variable that the coding regions, as expected 38 (Supplementary Fig. S3, Supplementary Table S2). Some of the noncoding regions that are variable within Mikania have been reported to be likely useful for molecular studies at lower taxonomic levels 39,40 . Considering only the coding regions and the percentage of variable sites, the ycf1 gene is the most variable (3.83%) followed by rpoA (3.43%) (Fig. 4B,C, Supplementary Table S1). The ycf1 gene is well known as a variable coding region at lower taxonomic levels, including within Asteraceae, and has been used in phylogenetic studies within distinct plant groups 33,41 . In addition, the rpo genes (rpoA, rpoB, rpoC1, and rpoC2) have been previously reported to be relatively rapidly evolving 42,43 and divergent within Asteraceae 25,33,41 . The number of Single Sequence Repeats (SSRs), 34 to 44, identified within Mikania plastomes is similar to that reported for other Asteraceae, such as within Heliantheae, where 38-57 SSRs were found in a study with 15 species 33 . In all plastomes assembled here, most SSRs found are mononucleotide repeats (59-61% within Mikania and 22-37% in the other genera), followed by tetranucleotide repeats (5-7% within Mikania and 57-72% in the other genera). The A or T motifs are the most common SSR repeat, in agreement with other studies 31,44 (Supplementary Fig. S4A-C, Supplementary Table S3). Dispersed repeats are considered to have important influence in genome structure, size, recombination, and rearrangements 25 . The number of repeats ≥ 30 bp found in the plastomes sequenced in this study range from 17 to 45. The maximum repeat size found within all Mikania species was 48 bp (Supplementary Fig. S4D,E, Supplementary Table S4). In Myripnois dioica (Pertyeae) 58 repeats ≥ 20 bp were found and the maximum repeat length was the same found for Mikania (48 bp) 31 . In Lactuca sativa and Helianthus annuus, 15 and 33 repeats ≥ 23 bp were found, respectively, of which most were smaller than 40 bp, with only two larger than 90 bp 30 . In Ageratina denophora, 59 repeats ≥ 15 bp were found, most ranging between 15 and 50 bp, but repeats > 100 bp were also present 25 .
The phylogenetic analyses performed here sampled only 20 out of ~ 450 Mikania species (Fig. 3). Yet, the relationships within this genus were never investigated using complete plastomes and represent an advance in our knowledge of infrageneric evolutionary relationships. The only phylogenetic study of the genus to date 21 was focused on species delimitation of a few highly variable taxa, such as M. micrantha and M. cordifolia, based on AFLPs and two nuclear ribosomal markers. The divergent sampling, with only four species overlapping between both studies, precludes a proper comparison between the topologies from the previous study and the ones found here. The differences in the genomic compartment used by both studies further hinders a proper comparison, given the frequent occurrence of discrepancies between nuclear and plastidial phylogenies. The comparison among trees obtained with different reconstruction methods and datasets show a scenario of incongruence among topologies, especially in the higher nested clades (Figs. 2, 4). The backbones of most trees show Clade I as sister to a clade formed by Clade II and Clade III, except for the Astral coding tree, which shows Clade II as sister to Clade I and Clade III (Fig. 2B). Clade II presents the same relationships in all three concatenated analyses, while in the coalescent trees the relative positions of some taxa, such as Mikania ternata and M. purpurascens, change in all trees, but especially when comparing the coding tree with the non-coding and total trees (Fig. 2).  (Fig. 2).
The gene tree discordance analyses (Fig. 5, Supplemental Fig. S5) show strong discordance across the three datasets (total, coding, non-coding), with few gene trees agreeing with the relationships shown in the species tree. The multispecies coalescent has been extensively used in the context of multi-locus phylogenies obtained from target capture data 45 , but few studies have applied it to plastid data, due to the widespread misconceptions about the lack of biparental inheritance and recombination in this organelle 11 . Among recent studies that used the multispecies coalescent in plastid data, three of them refer to higher-level phylogenies, i.e., among Angiosperms 46 , among Rosids 10 and among tribes of Asteraceae 47 , while one deals with a single genus 48 . Most of these studies found incongruences between concatenated and coalescent analyses, but only two of them presented information about gene tree/species tree discordance 46,47 , both showing wide discordance between the inferred species tree and the gene trees.
Walker et al. 46 proposes that uninformative genes are one of the reasons for discordance, and this is likely one of the issues in our trees. In the coding dataset, the number of variable sites in each gene across species of Mikania varies from 0 to 3.83%, while the non-coding dataset presents a little more variation, from 0.5% to 8.8% (Supplementary Tables S1, S2). The large variation in gene tree topologies, summarized by the gene tree discordance analysis (Fig. 5, Supplementary Fig. S5), leads to weakly resolved gene trees and consequently to poorly supported species trees, as the calculation of local posterior probabilities (LPP) depends on the concordance among the three possible topologies for a determined quartet of branches 49 . The length of each individual locus alignment also influences on the degree of conflict, as shorter loci tend to have less informative sites, contributing to the lack of resolution in gene trees 46 . In our dataset, these two factors seem to be correlated, with most of the shorter alignments presenting very few variable sites ( Supplementary Fig. S7, Supplementary Table S1, S2).
Recent studies have shown that in concatenated analysis, a few outlier genes can drive topology inference 45,50 . Our concatenated analyses present topologies more similar to each other than the coalescent topologies, regardless of the dataset (Fig. 4). The concatenated coding tree is more similar to the concatenated total topology (Figs. 2, 4), possibly indicating that one or more specific genes are responsible for defining most of the topology, whereas the three coalescent topologies are all different from each other, due to lack of resolution in individual gene trees. Incomplete lineage sorting (ILS) is usually considered a source of conflict in the multispecies www.nature.com/scientificreports/ coalescent 45 , and one metric that can be used to assess its occurrence is the normalized quartet score of a coalescent tree, which measures the percentage of quartet trees found in the species tree from all calculated quartet trees 51 . The normalized quartet scores calculated for the three Astral topologies are considered very high (total: ~ 52%, coding: ~ 49%, non-coding: ~ 57%), but considering the sparse sampling in our study, which included ca. 0.45% of Mikania species, it is difficult to delimit the occurrence of ILS in relation to the lack of actual sampling. Further studies with more complete sampling could help untangle cases of ILS and lack of resolution due to uninformative genes, by also increasing the likelihood of sequence variation. Although applying the multispecies coalescent methods to chloroplast sequences makes sense biologically, due to the possibility of evolutionary process that could lead to different parts of the genome evolving in different rates, in practice the results tend to be confounding, as seen here and in previous studies 10,46 . The causes of plastome conflict are still poorly understood 46 , and in lower-level phylogenies, as the case presented here, it might be hard to untangle sources of conflict inherent to plastome biology from lack of sequence variability due to rapid radiations over short evolutionary times. In Mikania, where the phylogenetic relationships are poorly known, especially in relation to the nuclear genome, it is difficult to map out other potential root causes for conflict, such as hybridization, plastome capture, or incomplete lineage sorting. An expanded sampling, both in terms of species and genome compartment (e.g., adding nuclear markers), could bring a clearer picture of the evolutionary relationships in the genus and of other biological factors that might impact phylogenetic reconstructions in Mikania.

Comparative analyses of the assembled Mikania plastomes.
We conducted comparative analyses within 20 Mikania plastomes (i.e., 19 sequenced here plus M. micrantha, NC031833.1) and among Mikania and the three outgroup taxa plastomes assembled in this study (i.e., Ageratina fastigiata, Litothamnus nitidus, and Stevia collina). We used MAFFT 7 63 with the FFT-NS-2 method 64 to perform the alignment of the complete plastome sequences, with one copy of the IRs manually excluded to avoid data duplication. To search variable regions, we used mVISTA 65 with Shuffle-LAGAN 66 with the previously annotated M. decora plastome as reference, plus two Mikania species and the three outgroup taxa sequenced here. Based on the phylogeny recovered in this study, we selected one species from each of the three main recovered clades of Mikania (i.e., M. decora, M. decumbens, and M. sylvatica). We calculated nucleotide variability values (π) within 20 Mikania plastomes. We used DnaSP 6.10 67 to conduct a sliding window analysis with a 200 bp step size and 800 bp window length. The resulting π values were plotted using R 68 . We analyzed the variable sites using MEGA 7 69 in the alignments of the 20 Mikania complete plastomes and of 79 protein-coding genes (Supplementary Table S1) extracted from these genomes. Each gene was extracted from the complete plastome alignment and separately re-aligned in Geneious with the ClustalW plugin 70 considering codon positions.
Analyses of repeated regions. We searched for microsatellites or Simple Sequence Repeats (SSRs; i.e., tandemly arranged repeats of short DNA motifs of 1-6 bp in length) and repeated elements using MISA 71 and REPuter 72 , respectively, in the plastomes of the 19 Mikania species and three other Eupatorieae representatives sequenced here. We analyzed SSRs with motifs between 1 and 6 nucleotides and a minimum number of repetition units as follows: 10 for mono-, 5 for di-, and 4 for trinucleotide, and 3 for tetra-, penta-, and hexanucleotide SSRs. We identified repeated elements ≥ 30 bp (forward, palindrome, reverse, and complement) using ≥ 90% of sequence identity and hamming distance = 3.
Phylogenetic reconstruction. We reconstructed phylogenetic relationships among 20 Mikania plastomes (i.e., 19 sequenced here plus M. micrantha; NC031833.1) and three species from other Eupatorieae genera assembled in this study plus Helianthus annuus (NC007977; Heliantheae) as outgroup. Three concatenated matrices were assembled: one containing the whole plastome sequence with one IR removed (total), one containing only the CDS regions of all 79 protein-coding genes (coding) and one containing all intergenic regions and introns (non-coding). All matrices were aligned using MAFFT 7 63 using the FFT-NS-2 method 64 . Maximum likelihood reconstructions were carried out in in RAxML 8.2.9 73 using the GTR + G model with node support assessed by rapid bootstrap (-f a) using 1000 non-parametric bootstrap pseudo-replicates. The multispecies pseudocoalescent model from Astral III 51 was used to obtain species trees from individual gene trees. Three datasets were used in these analyses: one containing only each individual CDS region from all 79 protein-coding genes (coding), www.nature.com/scientificreports/ one containing intergenic regions longer than 300 bp (non-coding), and one combining both datasets (total). Character evolution models for each gene matrix were calculated with PartitionFinder v.1.1.0 [74][75][76] , evaluating the GTR + G and GTR + G + I models in the RAxML version with rcluster search option and unlinked branch lengths, using the corrected Akaike Information Criterion to choose between models. Unrooted gene trees were obtained in RAxML 8.2.9, using the rapid bootstrap mode and 100 pseudo-replicates. Branch support was calculated using local posterior probabilities (LPP) 51 .
Gene tree discordance. Discordance between the species tree and gene trees, expressed as the proportion of gene trees presenting each of the clades found in the species tree, was calculated using phyparts with the thorough conflict analysis options (-a 1) 77 . All species and gene trees were rooted using Helianthus annuus as outgroup using the function pxrr in the package phyx 78 . The proportion of gene trees in agreement with the species tree in each node, as well as the proportion of uninformative gene trees or those supporting alternative topologies, were plotted as pie charts at each node of the tree using the phypartspiecharts.py script 79 .
Topological comparisons. The adjusted Robinson-Foulds (RF) distance was used to calculate the distance among the six topologies. The RF distance was calculated between all pairs of rooted trees using PAUP* v4.0a 80 and adjusted by the number of nodes in the trees (RFadj = RF/(2n − 6)), resulting in values ranging from 0 to 1. A multidimensional scaling approach was used to observe the level of similarity among the topologies, using the "cmdscale" command in the R package "stats", and subsequently plotted.
Data archiving statement. The complete plastome sequence data of the 19 Mikania plastomes and that of Ageratina fastigiata, Litothamnus nitidus, and Stevia collina are available in GenBank (NCBI) with the accession numbers MT793834-MT793855.