Abstract
Ants are the most ubiquitous and ecologically dominant arthropods on Earth, and understanding their phylogeny is crucial for deciphering their character evolution, species diversification, and biogeography. Although recent genomic data have shown promise in clarifying intrafamilial relationships across the tree of ants, inconsistencies between molecular datasets have also emerged. Here I re-examine the most comprehensive published Sanger-sequencing and genome-scale datasets of ants using model comparison methods that model among-site compositional heterogeneity to understand the sources of conflict in phylogenetic studies. My results under the best-fitting model, selected on the basis of Bayesian cross-validation and posterior predictive model checking, identify contentious nodes in ant phylogeny whose resolution is modelling-dependent. I show that the Bayesian infinite mixture CAT model outperforms empirical finite mixture models (C20, C40 and C60) and that, under the best-fitting CAT-GTR + G4 model, the enigmatic Martialis heureka is sister to all ants except Leptanillinae, rejecting the more popular hypothesis supported under worse-fitting models, that place it as sister to Leptanillinae. These analyses resolve a lasting controversy in ant phylogeny and highlight the significance of model comparison and adequate modelling of among-site compositional heterogeneity in reconstructing the deep phylogeny of insects.
Similar content being viewed by others
Introduction
Ants are the most ubiquitous and ecologically dominant arthropods in terrestrial ecosystems, largely due to the evolution of eusociality1,2. A robust phylogeny of the group is key to understanding ant character evolution, species diversification, and biogeography. Over the past two decades, developments in molecular phylogenetics1,2,3,4,5,6, along with the discovery of exceptional Cretaceous fossils7,8,9,10 have propelled substantial advances in our understanding of ant evolutionary history.
Among the three major groups (formicoids, leptanilloids and poneroids; Fig. 1) of extant ants, the intrafamilial relationships within the formicoids, a clade encompassing the vast majority of ant species, have crystallised from recent molecular phylogenies2,6,11,12. By contrast, the intersubfamilial relationships within the poneroid clade are not clear in recent phylogenetic studies, although the monophyly of the clade is strongly supported2,3,4. The most contentious open questions of ant phylogeny lie in the morphologically peculiar leptanilloid clade (Leptanillinae and Martialinae; Fig. 1)2,4,5, with some phylogenetic studies rejecting the monophyly of this group13,14. The subfamily Martialinae, represented by a sole Neotropical species Martialis heureka Rabeling & Verhaagh, was earlier recovered as the sister group to the remainder of the extant ants13. However, this hypothesis has been questioned by Kück et al.14, who recovered Leptanillinae as a sister group to the extant Formicidae based on improved alignment of the Sanger-sequencing data, alignment masking, and data partitioning. Such inconsistencies may result from confounding factors in phylogenetic analyses such as long-branch attraction15 and compositional heterogeneity across lineages4. Recently, a genome-scale phylogenomic study based on protein-coding genes and ultra-conserved elements under both concatenation and coalescence methods consistently and strongly supported the monophyly of the leptanilloid clade2. Whether Martialis or the Leptanillinae are sisters to all other ants is thus the most pressing outstanding question of ant systematics and evolution1, since it is fundamental to our understanding of ant phenotype and ecology.
In the phylogenomic era, molecular phylogenies, whether they are maximally supported or not, are often misled by systematic errors when the properties of molecular evolution are not adequately modelled16,17,18,19,20. To tackle this issue, I thoroughly explored the recently published Sanger-sequencing4 and genome-scale2 datasets of ants by testing the fit of substitution models and modelling among-site compositional heterogeneity. My analyses of large-scale datasets under the best-fitting model, selected based on model comparison and posterior predictive model checking, identify modelling-dependent signals and shed light on the contentious early divergences in the tree of ants.
Results
Sanger-sequencing datasets of Borowiec et al.4
My phylogenetic analyses of three (complete 11-gene matrix and two matrices with the most AT-rich and the most GC-rich outgroups excluded, with 7451 NT sites) of the four datasets presented by Borowiec et al.4 under the site-heterogeneous CAT-GTR + G4 model yielded trees (Fig. 2) largely consistent with the ones obtained under the partition model in the original study4. The trees based on the complete dataset (Fig. 2a) and the one with the most GC-rich outgroups removed (Fig. 2c) were identical to each other in terms of the ingroup topology and support. Leptanillinae were recovered as sister to all other ants, and Martialis was weakly supported as the second-branching lineages (Bayesian posterior probability [BPP] = 0.58 and 0.74, respectively). For the matrix with the most AT-rich outgroups removed (Fig. 2b), Martialis was very weakly supported as a sister to Leptanillinae (BPP = 0.35), but all other relationships were congruent with the complete dataset. Due to site filtering, that reduced the phylogenetic signal and compositional bias4, the fourth compositionally homogeneous matrix (3995 NT sites) under CAT-GTR + G4 yielded an overall weakly supported tree (Fig. 2d): Martialis was weakly supported as sister to Leptanillinae (BPP = 0.77), and the intersubfamilial relationships within the poneroid and formicoid clades too were weakly supported. Overall, the deep relationships among ants, i.e., the systematic placements of Leptanillinae and Martialis, were not resolved with confidence based on these four datasets.
Nuclear genomic datasets of Romiguier et al.2
In my maximum likelihood analyses using IQ-TREE, Matrix 1, Matrix 2 and Matrix 4 under LG4X + R and LG + C20 + F + G models yielded consistent results (Figs. 3a and 4a; Supplementary Figs. 1, 2 and 4): the clade Martialis + Leptanillinae was maximally supported (maximum likelihood bootstrap [MLB] = 100; except for Matrix 4 under the LG4X + R model, MLB = 92), as was the monophyly of the poneroid and formicoid clades (MLB = 100). Similarly, Matrix 3 under the site-homogeneous LG + F + G model and the empirical site-heterogeneous LG + C20, C40 and C60 models resulted in a similar topology to that derived from Matrix 2, although the support values for Martialis + Leptanillinae were low (MLB = 56, 89, 90 and 80, respectively; Supplementary Fig. 3). Interestingly, Matrix 5 under LG4X + R, LG + C20 + F + G and GHOST (LG + FO*H4) models consistently yielded different topologies in terms of the placement of Martialis: Martialis was supported as sister to poneroids + formicoids (MLB = 81, 97 and 73, respectively; Supplementary Figs. 5 and 6).
By contrast, under the site-heterogeneous CAT-GTR + G4 mixture model, all datasets (Matrices 1–5) yielded a consistent topology in terms of the systematic positions of Martialis and Leptanillinae (Figs. 3b, c and 4b; Supplementary Figs. 1c, 2c, 3f, 4c and 5c). All nodes in all Bayesian analyses of Matrices 1, 2 and 4 were maximally supported (BPP = 1), suggesting that the phylogenetic signal was strong in these matrices. Leptanillinae was a sister group to all other subfamilies of ants, and Martialis was recovered as a sister to the monophyletic poneroids and formicoids. The intersubfamilial relationships of formicoids were identical in all analyses, agreeing with the currently accepted topology based on recent phylogenetic studies1,2,3,4. The interrelationships of poneroid subfamilies were slightly different between the 4151-gene Matrix 1 and the 2343-gene Matrices 2 and 3. Based on Matrix 1, Ponerinae were maximally supported as sister to the remaining five subfamilies (Fig. 3a; Supplementary Fig. 1c). Based on Matrices 2 and 3, however, Apomyrminae + Amblyoponinae was sister to other lineages (Fig. 4b; Supplementary Figs. 2c and 3f).
Model comparison
The LOO-CV (leave-one-out cross-validation) and the wAIC (widely applicable information criterion) scores were obtained based on Matrix 3, considering the huge computational burden. The scores were close to each other, suggesting that wAIC is a close approximation of LOO-CV for the 2343-gene amino acid dataset. ∆CV and ∆wAIC were calculated as the difference in the estimated predictive performance between the best-fitting model and another model under consideration. As shown in Fig. 4c, the CAT-GTR + G4 mixture model fitted the dataset better than any of the other models, including the site-homogeneous LG + F + G model, and the LG + C20 + F + G, LG + C40 + F + G, and LG + C60 + F + G models on the amino acid dataset, both according to LOO-CV (∆CV = −28.28 + 29.44 = 1.16) and according to wAIC (∆wAIC = −28.28 + 29.44 = 1.16). The LG + C20 + F + G model used by Romiguier et al.2 was clearly better fitting than the LG + F + G model (∆CV = −29.66 + 30.39 = 0.73), but less well fitting than the CAT-GTR + G4 model. Therefore, topologies reconstructed with the CAT-GTR + G4 model were used as the preferred trees for elucidating the relationships of ant subfamilies.
Posterior predictive model checking
To test whether available models could adequately describe among-site amino acid preferences, I used analyses of site-specific amino acid diversity (PPA-DIV), or the mean number of distinct amino acids observed at each site, based on Matrix 3, as shown in Table 1. The Z score was adopted here since the MCMC estimate of the p-value is 0. PPA-DIV has a broad and distinctive distribution of Z scores across the different tested models (from 156.891 under LG + F + G to 3.974 under CAT-GTR + G4). For the tested dataset, absolute Z scores greater than 5 were obtained under the LG + F + G and LG + C20, C40 and C60 models, indicating a strong rejection of the null hypothesis that the model adequately describes the data. PPAs indicated that CAT-GTR + G4 (Z-score = 3.974), the model that was favoured based on my model comparison test, describes site-specific amino acid preferences substantially better than other tested models. This result was not surprising, since CAT-GTR + G4 is known to be by far the best-fitting model that can explicitly accommodate among-site compositional heterogeneity21,22,23,24.
Discussion
Impacts of outgroup choice and data filtering on tree topology
My detailed analyses of both Sanger-sequencing and genome-scale datasets provide a basis for reassessing the subfamilial relationships of ants. As found in the recent phylogenomic study2, my results based on multiple supermatrices (variants of the less-outgroup 4151-gene and more-outgroup 2343-gene datasets) demonstrate that outgroup selection does not affect the internal phylogeny of ants when the phylogenetic signal is strong. By contrast, outgroup choice does influence the basal relationships when the much smaller Sanger-sequencing nucleotide datasets are analysed4 and this may stem from sequence divergence and a lack of sufficient phylogenetic signal. My phylogenomic analyses also show that data filtering of various degrees and taxon subsampling do not affect the topology of the basal ant phylogeny, when the large-scale matrices are used. Moreover, my comparative phylogenomic results under both site-homogenous and site-heterogeneous models, integrated with formal model comparison, clearly demonstrate that modelling among-site compositional heterogeneity is the key to a natural ant tree of life.
Significance of model comparison and modelling compositional heterogeneity
Modelling of amino acid replacement is central to phylogenomic inference, particularly so when dealing with deeper relationships and rapid radiations. As such, model comparison is a crucial yet computationally challenging step in modern phylogenomics. In the most recent phylogenomic study of ants2, the empirical finite mixture model LG + C20 + F + G was selected to mitigate long-branch-attraction artefacts by modelling among-site compositional heterogeneity. The selection of this particular model (instead of LG + C40 + F + G, LG + C60 + F + G, or other theoretically better-fitting models) was apparently a compromise since runs of supermatrices under the C40 and C60 models (in IQ-TREE) are computationally expensive in terms of both running time and memory requirements. A recent study focusing on model comparison under Bayesian cross-validation, however, shows that amino acid mixture models (CAT models) outperform all single-matrix models (LG, WAG) and free finite mixtures (CAT-GTR + G4) consistently outperform empirical finite mixtures (e.g., LG + C20, C40 and C60)25. Not surprisingly, my cross-validation analyses based on the ant dataset reached the same conclusion that CAT-GTR + G4 outperforms other tested models (including LG + C20 + F + G and LG + C60 + F + G). In addition, similarly to a recent simulation study on the rooting of the animal tree26, my posterior predictive analyses demonstrate that CAT-GTR + G4 can best describe site-specific amino acid preferences in ant phylogenomics, so CAT should be preferred to C60. Overall, my analyses of genome-scale data highlight the significance of model comparison and adequate modelling of among-site compositional heterogeneity in deciphering the deep phylogeny of ants.
My reanalyses of the 11-loci datasets from Borowiec et al. 4 suggest that deeper phylogeny of ants and the position of Martialinae cannot be unambiguously resolved when the phylogenetic signal is weak. Among all of my analyses, the highest support regarding the placement of Martialis was 0.77, under the compositionally homogeneous matrix (Fig. 1d). However, in this particular analysis, support for the nodes within formicoids were exceptionally weak, and more importantly, the relationships within formicoids were inconsistent with the widely accepted topology1,2 and my phylogenomic results.
Position of Martialis in the ant Tree of Life
The discovery of Martialis heureka (Martialinae) based on a single stray worker from the Amazon (north of Manaus, Brazil)13 is exciting and perplexing. Since M. heureka displays a bizarre combination of both pleisiomorphic and autapomorphic traits, it was placed into its own subfamily. Its precise phylogenetic position has been contentious since its discovery. Rabeling et al. 13 recovered M. heureka as the sister to all remaining extant subfamilies, while a reanalysis by Kück et al. 14 recovered it as the second-branching lineage after Leptanillinae. Subsequent integrated analyses based on a handful of loci4,27 continued to reach divergent conclusions. Based a broad sampling and genome sequencing, Romiguier et al.2. retrieved high support for the leptanillomorph clade (Leptanillinae and Martialinae) as the sister group to all other extant ants. My reanalyses of multiple datasets from the most comprehensive studies2,4 under better-fitting models show that M. heureka is sister to all ants except Leptanillinae, agreeing with the conclusion of Kück et al.14 but rejecting other topologies2,4,10,13. My analyses resolve a lasting enigma in ant phylogeny and offer a backbone topology for investigations of character evolution, biogeography, and ecology of early ants. For instance, Boudinot et al.10 recently have adduced some potential synapomorphies of Leptanillomorpha (Martialis + Leptanillinae), but the present phylogenomic results suggest that these morphological similarities could be a consequence of convergent acquisition of features adapting these ants to a subterranean lifestyle.
The relationships of formicoid subfamilies have long been concerted1, but the intersubfamilial relationships of poneroids remain unsettled. Monophyly of the morphologically heterogeneous poneroids has been recently consistently recovered in molecular phylogenetic studies, including Moreau et al.6, some analyses of Brady et al.11, Ward and Fisher28, Borowiec et al.4 and phylogenomic studies2,5. In more recent studies4,5, many deeper nodes within poneroids were weakly to moderately supported. The resolution of poneroid relationships has been much improved in Romiguier’s et al.2 genome-based phylogenomic study. The remaining incongruences in poneroid relationships between the aforementioned study and the present analyses of the filtered 4151-gene dataset (Matrix 1) under CAT-GTR + G4 remain to be addressed by future studies with a broader taxon sampling. Ponerinae are a sister group to the clade (Amblyoponinae, Apomyrminae) + (Paraponerinae, (Agroecomyrmecinae, Proceratiinae)) (Fig. 2a). My phylogenomic study will provide a foundation for understanding ant evolution and comparative studies of evolutionary innovations among ants.
Methods
Dataset collation
I used the most comprehensive Sanger-sequencing and nuclear genome alignments from Borowiec et al.4 and Romiguier et al.2, respectively. The datasets (Sanger-sequencing datasets29 and genome-scale datasets30) were downloaded from the Zenodo data repository.
For the Sanger-sequencing (11 nuclear loci) data, I used all four nucleotide [NT] matrices generated in Borowiec et al.:4 (1) Full 11-locus matrix (123 taxa, 7451 NT sites); (2) Full matrix with the most AT-rich outgroups excluded (117 taxa, 7451 NT sites); (3) Full matrix with the most GC-rich outgroups excluded (117 taxa, 7451 NT sites); and (4) Homogeneous matrix with heterogeneous partitions removed (123 taxa, 3995 NT sites).
For the nuclear genomic data, I used the two BUSCO-gene amino acid [AA] supermatrices from Romiguier et al.:2 (1) Fewer-outgroup AA dataset (83 taxa, 4151 single-copy protein-coding genes, 1,692,050 AA sites); and (2) More-outgroup amino acid dataset (188 taxa, 2343 single-copy protein-coding genes, 983,951 AA sites), which was designed to test the impact of outgroup selection on tree inference. To balance taxon sampling of subfamilies, focus on the deeper phylogeny of ants, and, more importantly, speed up computationally heavy Bayesian runs, I subsampled AA sites of the 4151-gene supermatrix and filtered constant sites (to speed up analyses) using BMGE v.1.131, resulting in Matrix 1 (38 taxa, 647,114 AA sites). Similarly, I randomly pruned the outgroup taxa and selected all representative ingroup genera of the 2343-gene supermatrix, and filtered ambiguously aligned sites using BMGE with default setting (-m BLOSUM62, -h 0.5), yielding Matrix 2 (47 taxa, 623,908 AA sites). Additionally, as sensitivity tests of the potential impact of my data filtering and subsampling methods on tree inference, I (1) filtered the 2343-gene supermatrix using a stringent setting (-m BLOSUM30, -h 0.1:0.5) to select slow-evolving AA sites, resulting in Matrix 3 (47 taxa, 95,201 AA sites); (2) removed remotely related outgroup and selected representative ant genera, but kept all AA sites of the 4151-gene supermatrix, resulting in Matrix 4 (17 taxa, 1,692,050 AA sites); and (3) filtered the 4151-gene supermatrix using BMGE with a very stringent setting (-m BLOSUM30 -h 0.2:0.3), resulting in Matrix 5 (82 taxa, 21,902 AA sites).
Phylogenetic analyses
Phylogenomic analyses of the nuclear genomic datasets, Matrices 1–5, were conducted using the simpler LG4X + R model32,33 and the site-heterogeneous model (LG + C20 + F + G)34 with IQ-TREE v.2.1.335. For the site-heterogeneous LG + C20 + F + G model, the posterior mean site frequency (PMSF) model36 was applied using the respective LG4X + R tree as the guide tree. In addition, the comparatively small Matrix 3 was analysed using the site-homogeneous model (LG + F + G) and the site-heterogeneous models (LG + C40 + F + G, and LG + C60 + F + G) with IQ-TREE v.2.1.3, corresponding to the models used on the following model comparison (see below). For Matrix 5, the heterotachous model (General Heterogeneous evolution On a Single Topology, GHOST37) was also tested using IQ-TREE v.2.1.3.
As the subfamilial interrelationships of ants are expected to be affected by long-branch attraction artefacts1,2, I used the compositionally site-heterogeneous infinite mixture model CAT-GTR + G4 implemented in PhyloBayes MPI 1.938, which has been proven to be effective for mitigating such a systematic error by modelling across-site compositional heterogeneity. Four Sanger-sequencing matrices (as nucleotides) and five genome-based supermatrices (as amino acids) were analysed under the CAT-GTR + G4 model. For each analysis, two Markov chain Monte Carlo chains were run, and convergence was assessed using the bpcomp and tracecomp tools implemented in PhyloBayes39. Approximately 30% of samples were discarded as burn-in. Detailed information about the PhyloBayes runs (burnin samples, total number of cycles, bpcomp maxdiff and tracecomp minimal overall effective size) is given in the figure caption of each analysis.
Model comparison
For the filtered genome-scale AA dataset (Matrix 3), I used the comparatively efficient and reliable approaches, i.e., the leave-one-out cross-validation (LOO-CV) and the widely applicable information criterion (wAIC)40, to estimate the relative fit of alternative models (CAT-GTR, LG + G, LG + C20, LG + C40, and LG + C60) in the latest PhyloBayes MPI 1.9. The general idea of cross-validation is to split the data set into two subsets, using one subset for training the model and then evaluating the fit of the model over the remaining subset. In the context of Bayesian inference, a natural procedure to implement CV is to average the validation likelihood over the training posterior distribution. The resulting CV score is then log-transformed and averaged over multiple random splits of the original data set into training and validation sets. In leave-one-out CV, each observation is taken in turn and set aside for validation, using the n − 1 remaining observations to train the model40. The LOO-CV and wAIC scores were compared to determine and select the best-fitting model, based on which my preferred tree of ants was selected.
Testing model adequacy
Posterior predictive analyses (PPA)39 were performed on Matrix 3 using PhyloBayes MPI 1.9 to test whether LG + F + G, LG + C20 + F + G, LG + C40 + F + G, LG + C60 + F + G, or CAT-GTR + G can adequately describe site-specific amino acid preferences for the dataset. These models (especially LG + F + G and LG + C20 + F + G) were selected and tested because they had previously been used in the recent study of ant phylogenomics2 that yielded a contradictory topology to my preferred tree.
Statistics and reproducibility
For Bayesian phylogenetic analyses, I followed the practice as indicated in the manual of PhyloBayes39,40, which integrates detailed methods (bpcomp and tracecomp) for statistical evaluations. Detailed information on the PhyloBayes runs (burnin samples, total number of cycles, bpcomp maxdiff and tracecomp minimal overall effective size) is given in the supplemental figure caption of each analysis. In addition, as sensitivity tests of the potential impact of my data filtering and subsampling methods on tree inference, I further designed three additional supermatrices for comparison.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
The data sets and output files generated in our phylogenomic analyses have been deposited in the Dryad Digital Repository [https://doi.org/10.5061/dryad.pk0p2ngsj]41.
References
Ward, P. S. The phylogeny and evolution of ants. Annu. Rev. Ecol. Evol. Syst. 45, 23–43 (2014).
Romiguier, J. et al. Ant phylogenomics reveals a natural selection hotspot preceding the origin of complex eusociality. Curr. Biol. 32, 2942–2947 (2022).
Borowiec, M. L., Moreau, C. S. & Rabeling, C. Ants: Phylogeny and Classification, In Encyclopedia of Social Insects. (ed C. K. Starr), pp. 1–18. (Springer International Publishing, Cham, 2020).
Borowiec, M. L. et al. Compositional heterogeneity and outgroup choice influence the internal phylogeny of the ants. Mol. Phylogenet. Evol. 134, 111–121 (2019).
Branstetter, M. G., Longino, J. T., Ward, P. S. & Faircloth, B. C. Enriching the ant tree of life: enhanced UCE bait set for genome‐scale phylogenetics of ants and other Hymenoptera. Methods Ecol. Evol. 8, 768–776 (2017).
Moreau, C. S., Bell, C. D., Vila, R., Archibald, S. B. & Pierce, N. E. Phylogeny of the ants: diversification in the age of angiosperms. Science 312, 101–104 (2006).
Barden, P., Perrichot, V. & Wang, B. Specialized predation drives aberrant morphological integration and diversity in the earliest ants. Curr. Biol. 30, 3818–3824 (2020).
Barden, P. & Grimaldi, D. A. Adaptive radiation in socially advanced stem-group ants from the Cretaceous. Curr. Biol. 26, 515–521 (2016).
LaPolla, J. S., Dlussky, G. M. & Perrichot, V. Ants and the fossil record. Annu. Rev. Entomol. 58, 609–630 (2013).
Boudinot, B. E. et al. Evolution and systematics of the Aculeata and kin (Hymenoptera), with emphasis on the ants (Formicoidea: †@@@idae fam. nov., Formicidae). Preprint at bioRxiv https://doi.org/10.1101/2022.02.20.480183 (2022).
Brady, S. G., Schultz, T. R., Fisher, B. L. & Ward, P. S. Evaluating alternative hypotheses for the early evolution and diversification of ants. Proc. Natl Acad. Sci. USA 103, 18172–18177 (2006).
Ward, P. S., Brady, S. G., Fisher, B. L. & Schultz, T. R. The evolution of myrmicine ants: phylogeny and biogeography of a hyperdiverse ant clade (Hymenoptera: Formicidae). Syst. Entomol. 40, 61–81 (2015).
Rabeling, C., Brown, J. M. & Verhaagh, M. Newly discovered sister lineage sheds light on early ant evolution. Proc. Natl Acad. Sci. USA 105, 14913–14917 (2008).
Kück, P., Hita Garcia, F., Misof, B. & Meusemann, K. Improved phylogenetic analyses corroborate a plausible position of Martialis heureka in the ant tree of life. PLoS ONE 6, e21031 (2011).
Jermiin, L. S., Ho, S. Y., Ababneh, F., Robinson, J. & Larkum, A. W. The biasing effect of compositional heterogeneity on phylogenetic estimates may be underestimated. Syst. Biol. 53, 638–643 (2004).
Cai, C. Y. et al. Integrated phylogenomics and fossil data illuminate the evolution of beetles. R. Soc. Open Sci. 9, 211771 (2022).
Kapli, P., Flouri, T. & Telford, M. J. Systematic errors in phylogenetic trees. Curr. Biol. 31, R59–R64 (2021).
Kapli, P., Yang, Z. & Telford, M. J. Phylogenetic tree building in the genomic age. Nat. Rev. Genet. 21, 428–444 (2020).
Tihelka, E. et al. The evolution of insect biodiversity. Curr. Biol. 31, R1299–R1311 (2021).
Tihelka, E. et al. Fleas are parasitic scorpionflies. Palaeoentomology 3, 641–653 (2020).
Lartillot, N., Brinkmann, H. & Philippe, H. Suppression of long-branch attraction artefacts in the animal phylogeny using a site-heterogeneous model. BMC Evol. Biol. 7, S4 (2007).
Feuda, R. et al. Improved modeling of compositional heterogeneity supports sponges as sister to all other animals. Curr. Biol. 27, 3864–3870 (2017).
Cai, C. Y., Tihelka, E., Liu, X. Y. & Engel, M. S. Improved modelling of compositional heterogeneity reconciles phylogenomic conflicts among lacewings. Palaeoentomology 6, 49–57 (2023).
Li, Y. D., Engel, M. S., Tihelka, E. & Cai, C. Phylogenomics of weevils revisited: data curation and modelling compositional heterogeneity. Biol. Lett. 19, 20230307 (2023).
Bujaki, T. & Rodrigue, N. Bayesian cross-validation comparison of amino acid replacement models: contrasting profile mixtures, pairwise exchangeabilities, and gamma-distributed rates-across-sites. J. Mol. Evol. 90, 468–475 (2022).
Giacomelli, M., Rossi, M. E., Lozano-Fernandez, J., Feuda, R. & Pisani, D. Resolving tricky nodes in the tree of life through amino acid recoding. iScience 25, 105594 (2022).
Moreau, C. S. & Bell, C. D. Testing the museum versus cradle tropical biological diversity hypothesis: phylogeny, diversification, and ancestral biogeographic range evolution of the ants. Evolution 67, 2240–2257 (2013).
Ward, P. S. & Fisher, B. L. Tales of dracula ants: the evolutionary history of the ant subfamily Amblyoponinae (Hymenoptera: Formicidae). Syst. Entomol. 41, 683–693 (2016).
Borowiec, M. L. et al. Compositional heterogeneity and outgroup choice influence the internal phylogeny of the ants [Data set]. Zenodo https://doi.org/10.5281/zenodo.2549806 (2019).
Romiguier, J. et al. Ant phylogenomics reveals a natural selection hotspot preceding the origin of complex eusociality [Data set]. Zenodo https://doi.org/10.5281/zenodo.5705739 (2022).
Criscuolo, A. & Gribaldo, S. BMGE (Block Mapping and Gathering with Entropy): selection of phylogenetic informative regions from multiple sequence alignments. BMC Evol. Biol. 10, 210 (2010).
Le, S. Q., Dang, C. C. & Gascuel, O. Modeling protein evolution with several amino acid replacement matrices depending on site rates. Mol. Biol. Evol. 29, 2921–2936 (2012).
Yang, Z. A space-time process model for the evolution of DNA sequences. Genetics 139, 993–1005 (1995).
Le, S. Q., Gascuel, O. & Lartillot, N. Empirical profile mixture models for phylogenetic reconstruction. Bioinformatics 24, 2317–2323 (2008).
Minh, B. Q. et al. IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era. Mol. Biol. Evol. 37, 1530–1534 (2020).
Wang, H. C., Minh, B. Q., Susko, E. & Roger, A. J. Modeling site heterogeneity with posterior mean site frequency profiles accelerates accurate phylogenomic estimation. Syst. Biol. 67, 216–235 (2018).
Crotty, S. M. et al. GHOST: recovering historical signal from heterotachously-evolved sequence alignments. Syst. Biol. 69, 249–264 (2020).
Lartillot, N., Rodrigue, N., Stubbs, D., Richer, J. & PhyloBayes, M. P. I. phylogenetic reconstruction with infinite mixtures of profiles in a parallel environment. Syst. Biol. 62, 611–615 (2013).
Lartillot, N. PhyloBayes: Bayesian phylogenetics using site-heterogeneous models. In Phylogenetics in the Genomic Era. (eds C. Scornavacca, F. Delsuc & N. Galtier), pp. 1.5:1–1.5:16. No commercial publisher, authors open access book (2020).
Lartillot, N. Identifying the best approximating model in Bayesian phylogenetics: Bayes factors, cross-validation or wAIC? Syst. Biol. 72, 616–638 (2023).
Cai, C. Data sets for phylogenomic analyses in: Ant backbone phylogeny resolved by modelling compositional heterogeneity among sites in genomic data [Data set]. Dryad https://doi.org/10.5061/dryad.pk0p2ngsj (2024).
Acknowledgements
This research was supported by the National Natural Science Foundation of China (42222201, 42288201) and the Second Tibetan Plateau Scientific Expedition and Research project (2019QZKK0706). I thank Prof. Philip Ward, Dr. Jonathan Romiguier, Dr. Marek Borowiec, and Mr. Erik Tihelka for their helpful discussions.
Author information
Authors and Affiliations
Contributions
C.C.: conceptualisation, project administration, data curation, formal analysis, funding acquisition, investigation, visualisation, writing.
Corresponding author
Ethics declarations
Competing interests
The author declares no competing interests.
Peer review
Peer review information
Communications Biology thanks the anonymous reviewers for their contribution to the peer review of this work. Primary Handling Editors: Katie Davis and Joao Valente.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Cai, C. Ant backbone phylogeny resolved by modelling compositional heterogeneity among sites in genomic data. Commun Biol 7, 106 (2024). https://doi.org/10.1038/s42003-024-05793-7
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s42003-024-05793-7
Comments
By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.