Macroecology and macroevolution of the latitudinal diversity gradient in ants

The latitudinal diversity gradient—the tendency for more species to occur toward the equator—is the dominant pattern of life on Earth, yet the mechanisms responsible for it remain largely unexplained. Recently, the analysis of global data has led to advances in understanding, but these advances have been mostly limited to vertebrates and trees and have not provided consensus answers. Here we synthesize large-scale geographic, phylogenetic, and fossil data for an exemplar invertebrate group—ants—and investigate whether the latitudinal diversity gradient arose due to higher rates of net diversification in the tropics, or due to a longer time period to accumulate diversity due to Earth’s climatic history. We find that latitudinal affinity is highly conserved, temperate clades are young and clustered within tropical clades, and diversification rate shows no systematic variation with latitude. These results indicate that diversification time—and not rate—is the main driver of the diversity gradient in ants.

This topology was used for the node calibration (NC) dating analysis, and highly supported nodes were used to justify monophyly constraints for the FBD dating analysis in which topology was not otherwise fixed. Node numbers reflect bootstrap support. Scale in substitution per unit branch length.   For a given species, the species median was taken across over the estimate inferred from each of 100 trees in the posterior sample. In b), the black line is the median for the band and the blue shaded area represents the standard deviation across all species. FBD and NC refer to the two dating methods, stem and crown to the two grafting methods. We also inferred net diversification with BAMM directly on the backbone using the clade-specific sampling fraction option. , where each blue dot is the standardized correlation coefficient on one tree from the posterior, and bar is the 95% CI. The results are stacked vertically to display variation among posterior trees. STRAPP refers to the lineage-based permutation test, and here the blue dot is the Pearson correlation coefficient, and the shaded area encloses 95% null distributions from 1000 replicates. The columns represent combinations of different tree dating methods (FBD or NC) and different methods of inferring net diversification rate (stem or crown grafting, or using BAMM directly on the backbone phylogeny with clade-specific sampling fractions).
Supplementary Figure 5 | Sensitivity test of the diversification rate-latitude relationship to potential latitudinal sampling bias. a-h) In test 1, we pruned different fractions of extratropical species off of the grafted all-ant trees to compensate for potential oversampling in the extratropics, and inferred diversification rates for different dating/grafting method combinations. In test 2, we used the backbone trees and adjusted the clade-specific sampling fraction to reflect both uncertainty in the number of undescribed ant species (5K-15K) and the fraction of those that are tropical (0.7-1.0). In top rows (a-e, i-k) , the dot is the standardized coefficient of the clade-wise PGLS with bars reflecting 95%CI, and in the bottom rows (e-h, l-n) the dots reflect the Pearson correlation and bars the 95% null distribution from the structured rate permutation test. For i-n) the black and gray dots represent correlations and null distributions for diversification used for the NC and FBD backbone trees, respectively (the x values are slightly offset for visibility).

Time-homogeneous symmetric model
Time-homogeneous asymmetric model

FBD trees NC trees
Likelihood profile of epoch boundary . All models agree on there being few high-confidence extratropical lineages reconstructed to be older than the Oligocene, with most arising since the Miocene. To test if the ancestral state estimation was sensitive to sample size, we dropped 25% of taxa and recalculated the ancestral state pattern in Figure 3. a) Ancestral state pattern recovered across 100 posterior trees using the full data (NC dating). b-c) Ancestral states reconstructed on a reduced dataset after dropping 25% of taxa from all trees. The two tests differ in that in test 1, the same 25% of taxa were dropped from all trees, while in test 2, a different random set was dropped from each tree.

Time (Million years ago)
Probability extratropical

Supplementary Note 1: Phylogenetic Reconstruction Methods
Overview. We sought to reconstruct phylogenetic tree sets that represent current knowledge of ant phylogenetic history while integrating over uncertainty due to the fact that most species lack molecular data. In this regard, we follow similar efforts reconstructing comprehensive phylogenies for plants, 1 birds, 2 and mammals 3 representing taxa both with and without molecular data. Recent studies have established the main backbone of the ant tree of life, 4-8 with molecular data available for all ant subfamilies and most genera. Subfamily and genus-level taxonomy has been extensively reorganized to be consistent with phylogenetic relationships, and with a few notable exceptions (e.g. Monomorium, Cerapachys), are mostly monophyletic (or a combination of a few genera are monophyletic) and taxonomy can be used to place them on the tree. Given this, our approach was to a) first reconstruct backbone phylogenies with a broad set of data representing the main ant lineages, b) date the backbone trees using a dataset of 509 fossil taxa, and then c) identify well-supported clades that can be grafted onto the appropriate backbone lineages leaving trees with all (or nearly all) ant taxa represented. We discuss each of these steps in turn. The alignment, ML tree, BEAST xml files, dated backbone posterior tree sets (with and without fossil taxa), and all-ant posterior trees generated with the different method permutations are all available in a Dryad repository (doi:10.5061/dryad.g579t7k).
Molecular Data and Taxon Selection. We retreived nucleotide sequences provided by eight different ant phylogenetic studies as either nexus alignments from Treebase 9 or as FASTA sequences from GenBank. 10  . 8 We chose to use alignments from published studies rather than mine genetic data from GenBank directly, because we have found some GenBank data to represent misidentified loci and/or taxa, and believe those used in published phylogenetic studies are most reliable. Data in nexus alignment files were de-aligned to retrieve the nucleotide sequences and converted into FASTA format. The full dataset included 7,283 sequences representing 13 loci, and 948 taxa representing 687 ant species from 276 genera and 16 Formicidae subfamilies. It also included 43 sequences from six outgroup taxa obtained from the Schmidt 2013 study. 13 With the help of AntCat.org, 15 we manually updated taxonomic names to represent the latest taxonomic changes and to correct misidentifications. Finally, we filtered the dataset to maximize coverage and eliminate redundancy which reduced the dataset to 5,465 sequences representing 11 loci, 673 ant species (269 genera) and six outgroup taxa. The 11 genetic loci are identified as follows: 18S ribosomal RNA (18S ), 28S ribosomal RNA (28S ), abdominal-A (abdA), arginine kinase (argK ), carbamoyl-phosphate synthetase 2 (CAD), elongation factor 1-alpha F1 (EF1aF1 ), elongation factor 1alpha F2 (EF1aF2 ), long-wavelength rhodopsin (LwRh), DNA topoisomerase (Top), ultrabithorax (Ubx ), wingless (wg). GenBank accession numbers for all sequences as well as detailed taxonomic information are available in Supplementary Data 1.
Nucleotide Alignment. Since nine out of the 11 loci are protein-coding genes, we generated codon-aware alignments for these loci using MACSE v1.01b. 16 Apart from its primary utility as a codon-aware aligner, MACSE allows the user to align reliable and unreliable protein coding sequences, the latter defined as sequences containing errors like stop codons or frameshift mutations. For each protein coding locus in our dataset, amino acid sequences classified under the Formicidae rank that represented these genes were retrieved from NCBI GenBank Protein database. We reverse transcribed them and passed them to MACSE as reliable input sequence. Then, for each of these loci, a random sample of 50 sequences were passed as unreliable sequences. The resulting multiple sequence alignment generated by MACSE was manually cleaned and curated using Geneious version R8 17 to reduce artifactual gaps. The remaining sequences were then aligned to this multiple sequence alignment using MAFFT v7.205 software 18 with the '-add' and '-keep-length' options. The resulting nine multiple sequence alignments, each containing the full set of sequences, were also manually cleaned to remove poorly aligned or non-coding regions and examined to ensure that a fully codon-based alignment was generated. The two non-coding ribosomal genes, 18S and 28S, were aligned with MAFFT and the resulting alignments manually curated. Finally, we concatenated the multiple sequence alignments for all 11 loci into a single matrix containing 679 taxa and 10,767 sites, with 42.2% missing data.
Partitioning Schemes and Model Selection. We used PartitionFinder v1.1.1 software 19 to determine suitable partitioning schemes and corresponding models of molecular evolution for the concatenated multiple sequence alignment generated in the previous step. We designated 29 data blocks: two representing the ribosomal genes 18S and 28S, and 27 blocks representing the three codon positions for the nine protein coding genes. Models searched included HKY, HKY+Γ, SYM, SYM+Γ, GTR, GTR+Γ, TrN, TrN+Γ, K80, K80+Γ, TrNef, TrNef+Γ, JC and JC+Γ. We We used ClockstaR v2 20 to determine whether multiple unlinked relaxed-clock models were advisable for the partitions determined by PartitionFinder. Based on the "firstSEmax" criterion and the PAM clustering algorithm, multiple ClockstaR runs preferred a single clock model (k=1), so clock models were linked for all relaxedclock analyses.

Supplementary
Backbone Tree ML Estimation. We reconstructed phylogenetic relationships of 673 ant taxa using the maximum likelihood method of inference implemented in RAxML v8.0.1 21 with the nucleotide alignment generated from data from published sources. We performed 100 ML searches to find the best tree and 1000 bootstraps to obtain support values. The partitioning scheme determined by PartitionFinder was employed with the GTR+Γ model of evolution. This undated backbone tree represents 269 major genera out of 323 described genera, although most of the remaining genera have very few species (98% of ant species are in the included 269 genera).
Backbone Tree Divergence Dating: Overview. The timescale of ant evolution has been a contentious issue in the past 4, 5, 7 and is relevant for the hypotheses we test in this study. Thus, we performed three dating methods to infer a set of time-calibrated backbone phylogenies. First, we used a traditional approach of calibrating nodes with fossil evidence (henceforth, NC), using and comparing two different choices of prior distributions (uniform and lognormal). Second, we implemented the more recently developed fossilized birth-death process (henceforth, FBD) approach. 22 The FBD approach circumvents problems that arise with node calibration methods and allow the integration of a broader range of fossil data into analyses. However, it may be sensitive to taphonomic biases in the fossil record. In general, each of these approaches has strengths and weaknesses, although the performance of different methods under different conditions is still a point of active research.
We constrained our trees to follow the rooting of the phylogenomic study of Johnson et al. (2013), 23 with ants sister to Apidae and nested within the Aculeata.
Backbone Tree Divergence Dating: Node Calibration (NC) Approach. In this approach, the backbone tree was dated in absolute time using BEAST2. 24 We employed two different methods for the dating process: Yule model of speciation with uniform prior distributions and Yule model of speciation with lognormal prior distributions, each using 51 fossil calibrations. See Supplementary Table 2 for de-13 tails regarding these fossil calibrations and their distributions used in BEAST2. The calibration of the root node was set to a uniform prior distribution in both methods reflecting the bounds on that node argued by Brady et al. (2006). 5 For both BEAST2 runs, the partitioning scheme and models specified by PartitionFinder were used (Supplementary Table 1), along with a single (linked) Lognormal Relaxed-Clock model (based on the ClockstaR analysis). For both dating methods we sought to use reasonable starting trees to reduce burnin time. We generated these by dating the ML backbone tree using the 51 age calibrations with chronos function provided in APE package 64 in R. The starting trees satisfied the monophyly constraints enforced by the fossil calibrations and were fixed (i.e. we did not estimate tree topology with BEAST2). Four independent analyses were performed for each of the two methods. Each analysis was monitored using Tracer software v1.6.0 65 to evaluate burnin. We wrote a custom script to monitor convergence of node-age estimates across runs (producing log files readable in Tracer), which occurred as the runs ranged from 129 million states to 141 million states. We discarded the leading 40% of the saved states, and combined the trees from the different runs into a single posterior sample.

Supplementary
Backbone Tree Divergence Dating: Fossilized Birth-Death Process (FBD) Approach. We first generated a list of candidate ant fossil taxa from a recently consolidated global dataset of ant fossils. 66 The FBD analysis is implemented by specifying a tip date for all taxa (extant taxa have a tip date of zero, whereas fossils appear on the phylogeny at some past date), and a set of monophyly constraints which constrain fossils to fall within particular clades (thereby affecting the age of those clades). Here, we did not use ML backbone tree directly in the analysis, but instead used the backbone tree and previous work on ant phylogenetics to justify 217 monophyly constraints reflecting well-supported relationships. These constraints included 25 subfamily level constraints, 177 genus level constraints, and 15 constraints on groups of species that are monophyletic but within non-monophyletic genera (see Supplementary Data 2). Genera which have not been recovered as monophyletic in the ML tree and previous studies (e.g. Monomorium and Cerapachys) were not constrained to be monophyletic. We relied on taxonomy as a guide to placing fossils into clades. For example, an extinct genus that is classified in the subfamily Myrmicinae is constrained to fall within the myrmicine clade, however, it would be excluded from falling within individual genera by genus-level monophyly constraints. We excluded any fossils that are incertae sedis or were identified as falling in a non-monophyletic genus. A list of fossil taxa used and their tip dates can be found in Supplementary Data 1.
We used the Sampled Ancestors plugin 67 in BEAST2 24 to implement this analysis. Combined with fossil taxa tip dates, the Sampled Ancestors package automatically generates starting trees using these constraints. These trees, however, had either extremely short or extremely long internal branch lengths, which led to undesirably long burnin times on this large a tree. To reduce the burnin, we manually adjusted these branches (without changing fossil tip dates) to give them more realistic lengths and in this way generated four starting trees. Unlike the NC approach, in which topology was fixed to the ML topology, the FBD analysis integrates over tree space consistent the monophyly constraints. For each start tree, we ran four independent analyses, making a total of 16 analyses. As before, the runs were monitored with Tracer software using a script to evaluate node age convergence (of post-burnin states) across runs. We stopped the runs when they were ranging from 170 million states to 229 million states. We discarded the first 40% of the states, and combined the runs into a single posterior sample.
Backbone Tree Divergence Dating: Method Comparison. The FBD and NC approaches both have their strengths and weaknesses, and a full evaluation of those is best explored elsewhere. We compared the recovered node ages found using the different dating methods, by comparing median ages of a set of 279 nodes present in all trees. We found that the two Node-Calibration approaches (lognormal and uniform priors) gave very similar results, but the FBD was more divergent from the other two, although still highly correlated (Supplementary Fig. 8). Part of the reason could be that the two Node-Calibration approaches were based on the same (ML) topology, while the FBD analysis sampled over the topology posterior as well as node ages. Given the insensitivity of the NC approach to the choice of calibration prior, from this point we used trees dated with FBD and only one of the NC methods: the version based on less restrictive uniform prior calibrations.
Clade Grafting and All-Ant Trees. We sought to use the posterior trees from the Bayesian divergence dating analyses to generate trees representing the full range of known taxonomic diversity in the Formicidae family. This follows similar efforts for plants, 1 birds, 2 and mammals. 3 Those comprehensive phylogenies, while not fully resolved or based on molecular data for all taxa, have proven to be very useful for evolutionary biology and applications in ecology (e.g. community phylogenetics). Note that we also use backbone trees for certain downstream analyses when all-ant trees were not appropriate or to corroborate results.
Of the 14,912 taxa in our geographic dataset, 14,594 can be placed onto the phylogeny. The remainder belong to (often very small) genera for which no molecular data are available. To place taxa onto the backbone trees, we identified 262 terminal ant clades on the backbone tree (Supplementary Data 3). These terminal clades mostly corresponded to genus names. However, in cases of non-monophyletic genera, the terminal clades were composed of either sub-genus or multi-genus clades. If a genus was split into two terminal clades, we assigned species randomly to the two clades in proportion to the number of taxa in each clade on the backbone tree (but in this case which species was assigned to each clade varied ranodmly across each all-ant tree we constructed). For each of these sets of species, we generated 2000 pure-birth trees with a matching richness using the pbtree function provided in the phytools package 68 in R. Then, we randomly retrieved 800 dated backbone trees from the posterior set of each of the two dating methods, NC and FBD, -1,600 trees in total. On each of the backbone trees, we replaced the 262 terminal clades (having incomplete species composition) with random trees, without replacement, from the respective pure-birth tree sets (having near complete species composition).
The key decision in grafting the terminal clades onto the backbone is how to set the crown age of the grafted clade. The backbone trees usually contain more than one specimen from the same terminal clade. This gives a lower-bound on the crown age of the clade, whereas the upper bound is given by the stem node. In many cases, the crown node on the backbone is likely the true crown node of the clade, for others it is almost certainly not. Assuming the crown nodes on the backbone are the true nodes can underestimate clade age and overestimate diversification rate. Assuming the true crown is older (located somewhere between the backbone crown age and stem age) may lead to overestimating ages and underestimating diversification rates. Because neither solution is perfect, we performed the analyses using both approaches and compared results for those analyses that used these ant-wide trees.
We call the two approaches the "stem grafting" and "crown grafting" methods. For the stem grafting method, to make a single all-ant tree we 1) first chose a backbone tree from the posterior, then 2) for each terminal clade, we found the crown and stem node for that clade on the backbone tree, 3) we set the crown of the terminal clade to a randomly chosen age between the backbone crown and stem ages, and scaled the edge lengths of the grafted clade using the randomly chosen age. For the crown grafting method, we followed the same procedure except we set the crown age of the grafted clade to match the crown age on the backbone tree, and scaled the edge lengths using this age. In both approaches, if only one taxon appears on the backbone tree (i.e. thus it is a single branch there is no crown on the backbone), we grafted the crown of the clade to a random age between the stem node and the present.
In total, the above procedures produced four sets of 800 all-ant trees, each with 14,594 species, constructed with combinations of the two dating methods (NC and FBD) and the two grafting methods ("stem grafting" and "clade grafting"). Individual trees will contain artifactual variation due to the random structure within terminal clades, and reflect individual samples from the posterior of backbone node ages, but collectively the tree sets integrate over these variations.
We additionally created a single tree for each method where we first set the ages of the backbone tree to their median values across the backbone posterior tree set (for FBD trees, where topology changes as well, we used the maximum clade credibility topology). These trees were used to give a single "average" tree for each method (e.g. for visualization purposes), and for the sampling bias tests, but were not used for the main statistical analyses and conclusions. We refer to these as the "mcc" trees.

Supplementary Note 2: Diversification Rate Inference and Tests for Correlations with Latitude
Overview. Statistical inference of macroevolutionary rates from phylogenetic data is a fundamental, but technically challenging task in evolutionary biology. New methods are actively being developed, however these are controversial with the performance of different methods under debate. Ideally, one would like to infer speciation rates, extinction rates, and net diversification rates in a way that is robust to lineagespecific heterogeneity in those rates and accounts for missing data across lineages. However, this has proven to be a challenging task. For example, some previous studies on the latitudinal gradient have used methods to detect latitudinal dependency that are prone to false inference when there is background lineage-specific heterogeneity in macroevolutionary rates, which always be the case for a tree as large as an entire group like ants (or birds, mammals, etc.) 69 Likewise, methods such as MEDUSA 70 that infer heterogeneity in clade rates and are intended to account for missing data can also be problematic. 71 We designed our analyses around BAMM (Bayesian Analysis of Macroevolutionary Mixtures), 72 but very recently this method has been under debate 73, 74 as well (see additional discussion below).
In this context, we took a conservative approach in what we sought to infer, while foregoing opportunities to infer some macroevolutionary rates that we might ideally like to know. Specifically, we focused on net diversification rates, the difference between speciation and extinction rates, rather than inferring variation in both speciation and extinction across the tree. The reason we chose to do this is that teasing apart speciation and extinction rates is more challenging than inferring net diversification rate, and the former depends strongly on the details of branching patterns within clades in the tree. Tree shape, in turn, can be affected by taxa not included in the tree, and for a group as large as ants (including birds and mammals) no phylogeny is completely sampled with molecular data for all species. While methods exist that attempt to account for lineage specific sampling, the validity of these has recently been questioned. 71 We thus used two alternate methods for representing unsampled species on the tree. First, we ran BAMM on the backbone trees, assigning sampling fractions to each terminal clade based on known richness of the clade. Second, we used the completely sampled trees (with respect to known ant species) that were constructed with the grafting methods described above, but without clade-specific sampling fractions. The species assigned to each clade reflect well-supported relationships in ant systematics, but the internal branching pattern is generated with a pure birth process. This should allow for inference of overall net diversification rates but would give problem-20 atic estimates of clade-specific speciation and extinction rates, thus we ignored the latter. For similar reasons, we focused on average net diversification rate of different lineages and clades by assuming constant diversification rate models rather than those that allow for density-dependence (accelerating or decelerating diversification within clades), because the latter would be biased by the grafting approach.
This focus on net diversification rate is sufficient to test the hypothesis that variation in macroevolutionary rates is responsible for the disparity of richness across latitude. Although teasing apart any variation in speciation and extinction with latitude would be interesting for many reasons, if systematic variation in those rates does not lead to variation in net diversification rate with latitude, then variation in those rates cannot be a causal factor in the latitudinal diversity gradient.
Bayesian Inference of Diversification Rates. To infer net diversification across ant clades, we used BAMM v2.5.0. 72 We randomly selected 100 trees from each of the four groups of all-ant trees and ran BAMM on each tree. We ran between 18 and 20 million generations using MEDUSA-like settings (enforcing constant rate birthdeath) in BAMM, saving 1000 states per tree. Speciation/Extinction priors were set using BAMMtools 75 function, setBAMMpriors, for each of the 400 trees. We set expectedNumberOfShifts to 250, and minCladeSizeForShift to 5. We discarded 50% of the states as burn-in, and used the posterior set for summarizing the net diversification rates of the 262 terminal clades on each tree. We also performed tests to ensure that the results were not prior-sensitive, by altering the prior and examining the number of rate-shifts.
We also ran BAMM on the backbone trees (100 each from NC and FBD dating methods) using the sampling fraction option to represent missing species. Here we set the sampling fraction for each tip to be the number of species on the backbone phylogeny for each clade divided by the total known richness of that clade. For these runs, we set the expectedNumberOfShifts to 1 and minCladeSizeForShift to 1. The BAMM control files can be found in the Dryad repository (doi:10.5061/dryad.g579t7k).
Cross-validation of BAMM results. BAMM has very recently been debated on both theoretical and practical grounds, 73,74 and it is likely this debate will continue for some time. To our knowledge, there is no alternative, uncontroversial method that infers heterogeneity in diversification rates across large trees with many potential rate-shifts. In light of these developments, it is worth questioning whether our BAMM results are reasonable or artifactual in some way due to the performance of the program.
As described above, we intentionally took a conservative track in what we hoped 21 to achieve with our diversification analysis. We did not seek to infer speciation and extinction rates individually, only net diversification rates, we did not rely solely on corrections that adjust for "incomplete sampling fractions" (i.e. we also constructed all-ant trees), we only considered constant-rate models (i.e. no accelerating or decelerating models), and we did not focus on the number and location of individual rate shifts. In this context, diversification rates should be mostly be a function of clade age and richness. BAMM provides a convenient framework to infer this across entire trees, integrate over uncertainty, and allow for sharing of information across individual clades that are close in the phylogeny. But does it give reasonable results that are driven by clade ages and richness? To check this, we compared the net diversification rate parameter inferred with BAMM (median from posterior) with that estimated from bd.ms function in geiger v2.0 76 that fits a single constant-rate model that depends only on clade age and richness. We used the all-ant trees derived from the mcc-median trees from the four methods NC-stem, NC-crown, FBD-stem, FBD-crown. Parameters epsilon and missing in bd.ms were set to 0. On a species level the correlation between the results from BAMM and geiger's bd.ms function is very high (Spearman's ρ: 0.93, 0.90, 0.93, 0.94, for NC-crown, NC-stem, FBD-stem, and FBD-crown respectively). We also compared estimates on a clade-by-clade basis, excluding very small clades (<20 species) which should have uncertain estimates with any method. On this clade-by-clade basis, the results are highly correlated (Spearman's ρ: 0.94, 0.89, 0.88, 0.94, for NC-crown, NC-stem, FBD-stem, and FBD-crown respectively). Thus, while we cannot comment on the validity of BAMM for all purposes, we believe it represents a valid approach that gives reasonable values for the quantities we want to infer, and our overall conclusions would be similar using different approaches.
Testing for latitude-diversification rate correlations. After estimating diversification rates with BAMM, we took two complementary methodological approaches to evaluating correlations between diversification rate and latitude. We used multiple methods in order to ensure our results were robust to variations in data coding and methodology. First, we performed clade-wise regression approach in a phylogenetic framework. Second, we used lineage-wise approach based on structured rate permutations.
Clade-wise weighted PGLS approach. We performed a series of clade-wise regressions to evaluate if there was any correlation between the latitudinal affinity of a clade and its inferred diversification rate. First, as a baseline, we calculated an unweighted correlation between clade diversification rate and the average of ab-solute midpoint latitudes from all species in the clade, and calculated this across all trees from the posterior sample (GLS). Second, to correct for phylogenetic nonindependence among clades, we performed regressions weighted by phylogeny using a PGLS regression approach. 77 To represent phylogenetic covariation across trees, for each posterior tree we pruned each terminal clade producing new termini located at the crown node of each clade.
Third, given that our estimates of clade age and thus diversification rate have uncertainty, we weighted the regression to compensate for this uncertainty (wGLS and wPGLS). Specifically, for the regression on each tree, we weighted data points by the inverse coefficient of variation for the clade's diversification rate estimate across all trees. We also weighted each clade by the confidence intervals on its average absolute latitude. Thus, clades that have more uncertainty in their diversification rate (for example due to uncertainty in clade age in the backbone or arising from the grafting process), or are more spread across latitude, contribute less to the regression than those clades that have more confidently-estimated rates or are more concentrated at a given latitude.
For all regressions, we scaled predictor and response variables by their standard deviation and centered their distribution on zero so as to produce a standardized regression coefficient. To assess the relative effects of phylogenetic correction and weighting on model parameters, we repeated these analyses for each combination of phylogenetically corrected and un-corrected, weighted and un-weighted GLS settings.
We performed these regressions on BAMM results using both the clade grafting method (i.e. adding species to the tree) and also on the BAMM results from the backbone trees using the sampling fraction option to represent missing species. For each terminal clade, we exported the average net diversification rate of the clade since the crown node. If there was only one species in the clade, we took the average rate along the branch from the stem node.
Lineage-wise permutation test approach (STRAPP). As a complementary test, we used the traitDependentBAMM function, also referred to as STRAPP -Structured Rate Permutations on Phylogenies, 78 in the BAMMtools 75 R package. This function calculates a correlation coefficient and tests for significance by permuting diversification rates on the phylogeny while preserving the underlying covariation in macroevolutionary rates and species traits.
We ran STRAPP on each posterior tree: 100 each from the four groups of all-ant grafted trees (NC-stem, NC-crown, FBD-stem, FBD-crown) and 100 each of the two groups of backbone/sampling-fraction trees (NC and FBD). For the all-ant trees, we used the species level midpoint latitudes. For the backbone trees, we assigned each clade the average absolute midpoint latitude for all species in the whole terminal clade (reasoning that sampling fraction is being assigned based on whole clade richness, thus whole-clade tropical affinity should be reflected). We also tested using individual species midpoint latitudes using only those taxa found on the backbone tree, and found no difference in results. We tested statistically whether there was a significant correlation between absolute midpoint latitude and diversification rate using the Pearson correlation as a test statistic and 1000 null replications.

Supplementary Note 3: Evaluating potential effects of sampling bias on results
Our analysis is near-comprehensive with respect to current knowledge of ant species and their geographic distributions, but the global inventory of ant biodiversity is far from complete. Furthermore, it is probable that tropical fauna are not as well documented as extratropical fauna. Here we consider how the three main analyses presented in this paper, phylogenetic diversity, and ancestral states/lineages through time, and diversification rates, might or might not be biased by latitudinal sampling given our results.
Phylogenetic diversity and latitudinal sampling bias. For phylogenetic diversity, we found that extratropical species were clustered on the phylogeny, especially in the northern hemisphere (Figs. 1, 2, Supplementary Fig. 2). Overall, MPD should be rather insensitive to sampling bias because it reflects the overall spread of species in a sample across the tree. If new species are added to the tropical clades more or less in the same proportions as their current richness (a reasonable assumption), the MPD in the tropics should be stable. We found phylogenetic diversity to be lower in the extratropics because lineages there are clustered into relatively few clades. If we sampled more species from the tropics, that would not make the extratropics less clustered on the phylogeny. This may not be true for all PD measures, for example mean nearest taxon distance (MNTD) would be sensitive to density of species within clades rather than spread among them.
Diversification rate and latitudinal sampling bias. In principle, the undersampling of the tropics could bias our estimate of diversification rates and obscure potential latitudinal correlations. To investigate the likelihood of this possibility, we performed experiments to mimic potential latitudinal sampling biases and evaluate their effects on our results using two methods.
In the first method (Test 1), we used the grafted "all-ant" trees and randomly pruned different fractions of extratropical tips from the tree and re-ran BAMM. This is intended to account for a lower probability of discovery of tropical species relative to extratropical species (i.e. by undiscovering extratropical species) without using sampling fractions. Here we randomly pruned off extratropical species with a range of probabilities (0.2, 0.4, 0.6). The maximum value (0.6) would imply that even if extratropical species are completely sampled, tropical species are only 40% sampled. Since there are over 10,000 described ant species in the tropics, this would mean there are actually 25,000 tropical species and thus close to 30,000 ant species total, which is at the high end for estimates of total ant species. 79 We performed two replicates for each tree/parameter combination, but since the results were similar across replicates, for clarity of presentation we only present one replicate.
In the second test (Test 2), we used the backbone trees and the clade-specific sampling fraction option in BAMM, and manipulated the sampling fractions to mimic undersampling of tropical taxa. There are two uncertainties we need to account for: the number of total remaining ant species and the fraction of those that are tropical. For total ant species, the current number is approximately 15,000. We estimate the true number to be within the range of 20,000-30,000. For the fraction tropical (i.e. absolute midpoint latitude < 23.5), the current number is approximately 0.7, which should provide a minimum value, thus we estimate the fraction of remaining species to be between 0.7−1.0, with 1.0 reflecting the extreme assumption that all undiscovered ant species are tropical. We used these numbers to adjust the clade-specific sampling fractions in BAMM. We assumed that remaining tropical and temperate species are assigned to clades in the same proportions of current species. For example, if 1% of all known tropical ant species are found in clade A, then we assume 1% of undiscovered tropical ant species are also in that clade. Thus, using the ranges above, we first calculated how many tropical and extratropical species are missing from the tree, and then how many should be assigned to each clade. Finally, we adjust the sampling fractions accordingly and re-ran BAMM.
Both the analyses above were performed on the mcc trees for each dating method. After inferring the new rates in BAMM, we put the results through the same analytical pipeline for testing diversification rate-latitude correlations. For test 1 (pruning method) we recalculated the average absolute latitude for each clade after dropping tips. For the sampling fraction test (test 2), we performed the PGLS and STRAPP analyses with a new independent variable: the fraction of species in the clade that are tropical (according to the binary classification). We used the latter so we can account for the fact that if sampling is biased by latitude, it also will affect the clade-level average tropicality. In other words, we can adjust the fraction tropical in each clade more straightforwardly than the average latitude in each clade, because the latter requires a model of specifically how bias changes with latitude. We also, however, ran the regressions on the original average absolute latitude variable and the results were comparable.
Both of our tests agree that it is unlikely that latitudinal sampling bias is obscuring a diversification rate-latitude correlation ( Supplementary Fig. 5). Even making extreme assumptions about latitudinal sampling bias (there are 15,000 ant species out there undiscovered, and they are all in the tropics), the correlation goes from weakly positive to near zero, and never becomes significantly negative. We believe this is due to the extreme heterogeneity in richness and diversification rates among ant clades at all latitudes, which assuming that missing species are distributed in proportion to current richness, still dominates the variation in diversification rate even if accounting for a latitudinal sampling bias.
Ancestral state estimation and sampling bias. For reconstructing ancestral states/lineages through time, having more species on the phylogeny would increase the accuracy of those reconstructions, but would a latitudinal sampling bias systematically skew the results? Our main finding was that there are many old tropical lineages relative to few old extratropical lineages. If we oversampled extratropical lineages, this could only increase the relative number of old extratropical lineages relative to tropical lineages. In addition, tropical undersampling could bias the extratropical lineages to look older than they are, because we might miss recent tropical relatives. Thus, if there is a bias, we believe it should be in the opposite direction of our results.
A related concern is sample size itself. Our ancestral state estimation uses the backbone trees constructed with molecular data. Out of total 679 taxa on the backbone trees, 512 have species range data (i.e. because they are described species) and can be used for ancestral state reconstruction. Ancestral state reconstruction does not require a fully sampled phylogeny and should not be biased by using random subsets of taxa. However, it is reasonable to question whether our ancestral state results are sensitive to sample size or inclusion of particular taxa, given that the actual taxa on the tree are far fewer than the total number of ant species. To check this, we performed two simple subsampling tests. First, in test 1 we randomly chose 25% of the taxa, and dropped this same set of taxa from all the trees. Second, in test 2 we dropped a different randomly chosen 25% of taxa from each posterior tree. We then recalculated the ancestral state pattern across trees as we did for the full dataset ( Figure 3, Supplementary Figs. 6-7). We performed these tests using the time-homogeneous asymmetric model, using NC backbone trees.
We found that the overall ancestral state pattern, showing old tropical lineages but few or no old temperate lineages, was highly stable even after dropping 25% of the taxa in both tests (Supplementary Fig. 10). This indicates that the pattern we recovered is not sensitive to sample size or the particular choice of taxa included.

Supplementary Note 4: Additional discussion of ancestral state estimation results
We successfully fit the time-homogeneous symmetric and asymmetric models ( Figure  3, Supplementary Fig. 6), and the 2-epoch symmetric model (Supplementary Fig.  7). We also attempted to fit 2-epoch asymmetric models, but found they exhibited convergence issues and inconsistent results. We believe these problems arise from fitting asymmetric models to scenarios of accelerating evolution (scenarios involving decelerating evolution can more easily be fit successfully, see ref. 80 ). For example, a pattern where old lineages are mostly in one of the two states (which appears to be our case) could be caused by either very low evolutionary rates and an ancestor in that state, or very fast but very asymmetric evolution, causing most lineages to accumulate in one state. Thus, the recent emergence of a minority trait can be caused by either an acceleration or deceleration in evolutionary rate with a similar resulting pattern. Due to these issues we do not further consider 2-epoch asymmetric model in this study.
The time-homogenous asymmetric model (FBD: ∆AIC = 0.04 ± 0.004, NC: ∆AIC = 0.005 ± 0.004, with the means and standard errors taken across posterior trees) and the symmetric epoch-model (FBD: ∆AIC = 2.5 ± 0.14, NC: ∆AIC = 3.3±0.12) fit the data comparably well, and both outcompeted the time-homogeneous symmetric model (FBD: ∆AIC = 12.0 ± 0.16, NC: ∆AIC = 16.7 ± 0.17). Since the asymmetric model had the lowest AIC for most trees, we presented it in the main text. Both models predict a recent emergence of extratropical lineages, but for somewhat different reasons. The symmetric-epoch models predict this is due to there being a tropical ancestor and acceleration of transition rates in the recent past. The time-homogenous asymmetric predict this due to a sustained 3-fold higher transition rate from the extratropics back into the tropics, resulting in few old extratropical lineages persisting. For the epoch model, the optimal breakpoint was actually more recent than 34mya Oligocene cooling, falling during the Miocene (NC mean time shift: 19.5 ± 0.65mya, FBD mean time shift: 16.9 ± 0.3mya). However, the likelihood of an Oligocene breakpoint is not dramatically different (Supplementary Fig.  7). Further work is needed to distinguish strongly asymmetric evolution versus more symmetric, but time-inhomogeneous trait evolution.